Book Chapter
- Optimizing And Tuning Scientific Codes
Qing Yi
SCALABLE COMPUTING AND COMMUNICATIONS: THEORY AND PRACTICE.
Samee U. Khan and Albert Y. Zomaya and Lizhe Wang
Wiley-IEEE Computer Society Press
pages 255-276. Jan 2013.
Journal Publications
- Compiler-Driven Approach for Automating Non-Blocking Synchronization In Concurrent Data Abstractions,
Jiange Zhang, Qing Yi, Christina Peterson, and Damian Dechev.
Concurrency and Computation: Practice and Experience. Feb 2024; 36(5):e7935. doi: 10.1002/cpe.7935
- Enhancing the Effectiveness of Inlining in Automatic Parallelization,
Jichi Guo, Qing Yi, and Kleanthis Psarris.
Int J Parallel Prog 50, 65–88. Feb 2022.
- Layout-oblivious compiler optimization for matrix computations
Huimin Cui, Qing Yi, Jingling Xue, and Xiaobing Feng.
ACM Transactions on Architecture and Code Optimization. Vol 9, No 4, pages 35:1-20. Jan, 2013.
- POET: A Scripting Language For Applying Parameterized Source-to-source Program Transformations
Qing Yi
Software Practice & Experience. John Wiley & Sons.
Vol 42, issue 6, pages 675-706. May, 2012.
- Transforming Complex Loop Nests For Locality
Qing Yi, Ken Kennedy, and Vikram Adve
The Journal Of Supercomputing, Vol 27, pages 219-264, 2004
- Improving Memory Hierarchy Performance Through Combined Loop Interchange and Multi-level Fusion,
Qing Yi and Ken Kennedy
International Journal of High Performance Computing Applications,
Vol 18, No.2, 2004
- Advanced Optimization Strategies in the Rice dHPF compiler
John Mellor-Crummey, Vikram Adve, Bradley Broom, Daniel Chavarria-Miranda,
Robert Fowler, Guohua Jin, Ken Kennedy and Qing Yi
Concurrency and Computation: Practice and Experience, 14(8-9):741-767, 2002
Conference Publications
- Modeling Optimization of Stencil Computations Via Domain-level Properties,
Brandon Nesterenko, Qing Yi, Pei-Hung Lin, Chunhua Liao, and Brandon Runnels.
PMAM '22: Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores. April 2022. Pages 35–44. https://doi.org/10.1145/3528425.3529103
- An Adaptive Overlap-Pipelined Multitasking Superscalar Processor,
Mong T. Sim and Qing Yi,
In the 2020 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), 2020.
- An Adaptive Multitasking Superscalar Processor,
Mong T. Sim and Qing Yi,
In the 2019 IEEE 5th International Conference on Computer and Communications (ICCC), Chengdu, China, 2019, pp. 1293-1299, doi: 10.1109/ICCC47050.2019.9064185.
- Automating Non-Blocking Synchronization In Concurrent Data Abstractions
Jiange Zhang, Qing Yi, and Damian Dechev.
In The 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019). Nov, 2019. San Diego, CA, USA
- Transitioning Scientific Applications to using Non-Volatile Memory for Resilience
Brandon Nesterenko, Xiao Liu, Qing Yi, Jishen Zhao, and Jiange Zhang.
In The International Symposium on Memory Systems (Memsys’2019). Sep 2019. Washington DC, USA.
- Accelerating Parallel Graph Computing with Speculation
Shuo Ji, Yinliang Zhao, and Qing Yi,
In The ACM International Conference on Computing Frontiers (CF'2019). May 2019. Alghero, Sardinia, Italy.
- Improving Resource Utilization through Demand Aware Process Schedulings
Brandon Nesterenko, Qing Yi, and Jia Rao.
In The 2018 International Conference on Parallel Processing (ICPP '18). Aug, 2018. Oregon, USA.
- Automating the Exchangeability of Shared Data Abstractions
Jiange Zhang, Qian Wang, Qing Yi, and Huimin Cui.
In The 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC'18), Salt Lake City, UT, USA
- Using Memory-style Storage to Support Fault Tolerance in Data Centers
Xiao Liu, Qing Yi, and Jishen Zhao.
In The 2016 USENIX Workshop on Cool Topics in Sustainable Data Centers (CoolDC '16). Mar 19, 2016. Santa Clara, CA, USA.
- Compiler-Assisted Overlapping of Communication and Computation in MPI Applications
Jichi Guo, Qing Yi, Jiayuan Meng, Junchao Zhang, and Pavan Balaji.
In IEEE Cluster 2016, Sep 12-16, 2016. Taipei, Taiwan.
- Automatically Optimizing Stencil Computations on Many-core NUMA Architectures
Pei-Hung Lin, Qing Yi, Daniel Quinlan, Chunhua Liao and Yongqing Yan.
In The 29th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2016). Pages 116-130. Sep 28-30, 2016. Rochester, NY, USA.
- Automatic Algorithm Selection in Computational Software Using Machine Learning
Matthew C. Simpson, Qing Yi, and Jugal Kalita.
In The 15th IEEE International Conference on Machine Learning and Applications (IEEE ICMLA'16). Pages 355-360. Dec 18-20, 2016. ISBN: 9781509061686. Anaheim, California, USA.
- Characterizing and Optimizing the Performance of Multithreaded Programs Under Interference
Yong Zhao, Jia Rao, and Qing Yi.
In the 25th International Conference on Parallel Architectures and Compilation Techniques (PACT ’16), September 11-15, 2016, Haifa, Israel
- Interactive Composition Of Compiler Optimizations
Brandon Nesterenko, Wenwen Wang, and Qing Yi.
In The 28th International Workshop on Languages and Compilers for Parallel Computing (LCPC'15). Sep 9-11, 2015. Raleigh, NC, USA.
- Just-in-time Component-wise Power and Thermal Modeling
Shah Mohammad Faizur Rahman, Qing Yi and Houman Homayoun.
In ACM International Conference on Computing Frontiers (CF'15).
May 18-21, 2015. Ischia, Italy.
- Automatic Detection of Information Leakage Vulnerabilities in Browser Extensions
Rui Zhao, Chuan Yue and Qing Yi.
In the 24th International World Wide Web Conference (WWW'15). May 18-22, 2015. Florence, Italy.
- Specializing Compiler Optimizations Through Programmable Composition For Dense Matrix Computations
Qing Yi, Qian Wang, and Huimin Cui.
In The 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'14).
Dec 13-17, 2014. Cambridge, UK.
- Analytically Modeling Application Execution for Software-Hardware Co-Design
Jichi Guo, Jiayuan Meng, Qing Yi, Vitali Morozov, and Kalyan Kumaran.
In 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS'14).
May 19-23, 2014. Phoenix, Arizona, USA.
- AUGEM: Automatically Generate High Performance Dense Linear Algebra Kernels on x86 CPUs
Qian Wang, Xianyi Zhang, Yunquan Zhang, and Qing Yi.
In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13). Denver, CO. Nov, 2013.
- Enhancing Performance Portability of MPI Applications Through Annotation-Based Transformations
Md. Ziaul Haque, Qing Yi, James Dinan, and Pavan Balaji.
In the 42nd International Conference on Parallel Processing (ICPP'13). Lyon, France. Oct, 2013.
- Effective Use of Non-blocking Data Structures in a Deduplication Application
Steven Feldman, Akshatha Bhat, Pierre Laborde, Qing Yi, and Damian Dechev.
In The ACM International Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH'13). Wavefront experience report. Indianapolis, USA. Oct, 2013.
- Vectorization Past Dependent Branches Through Speculation
Majedul Haque Sujon, R. Clint Whaley, and Qing Yi.
In the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT'13).
Edinburgh, Scotland. Sep, 2013.
- A Highly Parallel Reuse Distance Analysis Algorithm on GPUs
Huimin Cui, Qing Yi, Jingling Xue, Lei Wang, Yang Yang, and Xiaobing Feng.
In 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS'12). May 21-25, 2012. Shanghai, China.
- Studying The Impact Of Application-level Optimizations On The Power Consumption Of Multi-Core Architectures
S. Faizur Rahman, Jichi Guo, Akshatha Bhat, Carlos Garcia, Majedul H. Sujon, Qing Yi, Chunhua Liao, and Daniel Quinlan.
In ACM International Conference on Computing Frontiers (CF'12). May 15-17, 2012. Cagliari, Italy.
- Enhancing the Role of Inlining in Effective Interprocedural Parallelization
Jichi Guo, Mike Stiles, Qing Yi, and Kleanthis Psarris
In International Conference On Parallel Processing (ICPP'11), Sep, 2011. Taipei, Taiwan.
- Collective Specification and Verification of Behavioral Models and Object-oriented Implementations
Qing Yi, Jianwei Niu, and Anitha R. Marneni
ICSOFT'11: International Conference On Software And Data Technologies.
Seville, Spain. July 18-21, 2011.
- Extensive Parameterization And Tuning of Architecture-Sensitive Optimizations
Qing Yi and Jichi Guo
IWAPT'11: International Workshop on Automatic Performance Tuning.
Singapore. June, 2011.
- Understanding Stencil Code Performance On MultiCore Architectures
S. Faizur Rahman, Qing Yi, and Apan Qasem,
CF'11: ACM International Conference on Computing Frontiers.
Ischia, Italy, May, 2011.
- Automated Programmable Control and Parameterization of Compiler Optimizations
Qing Yi
CGO'11: IEEE/ACM International Symposium on Code Generation and Optimization.
Chamonix, France, Apr, 2011.
- Automated Empirical Tuning of Scientific Codes For Performance and Power Consumption
Shah Faizur Rahman, Jichi Guo, and Qing Yi
HIPEAC'11: High-Performance and Embedded Architectures and Compilers.
Heraklion, Greece. Jan, 2011.
- Exposing Tunable Parameters in Multi-threaded Numerical Code
Apan Qasem, Jichi Guo, Faizur Rahman, and Qing Yi
NPC'10: The 7th IFIP International Conference On Network and Parallel Computing (best paper).
Zhengzhou, China. Sep, 2010.
- Improving Autotuning Efficiency and Portability Through Feedback Diagnostics
Qing Yi, Santosh Sarangkar, and Apan Qasem
IWAPT'10: The Fifth International Workshop on Automatic Performance Tuning (position paper).
Berkeley, CA. June, 2010.
- Automated Timer Generation for Empirical Tuning
Josh Magee, Qing Yi, and R. Clint Whaley
SMART'10: The 4th Workshop on Statistical and Machine learning approaches to ARchitecture and compilaTion,
Pisa, Italy. Jan, 2010.
- Exploring the Optimization Space of Dense Linear Algebra Kernels
Qing Yi and Apan Qasem
LCPC'08: The 21st International Workshop on Languages and Compilers for Parallel Computing,
Edmonton, Canada. Aug, 2008.
- Automated Transformation for Performance-Critical Kernels
Qing Yi and Clint Whaley
LCSD'07: ACM SIGPLAN Symposium on Library-Centric Software Design.
Montreal, Canada. Oct, 2007.
- POET: Parameterized Optimizations for Empirical Tuning
Qing Yi, Keith Seymour, Haihang You, Richard Vuduc and Dan Quinlan
POHLL'07: Workshop on Performance Optimization for High-Level Languages and Libraries.
Long Beach, California. Mar, 2007.
- Annotating user-defined abstractions for optimization
Dan Quinlan, Markus Schordan, Richard Vuduc, and Qing Yi
Workshop on Performance Optimization for High-Level Languages and Libraries,
Rhodes Island, Greece. April 2006.
- Applying Data Copy to Improve Memory Performance of General Array Computations
Qing Yi
The 18th International Workshop on Languages and Compilers for Parallel Computing,
Hawthorne, New York. Oct 2005.
- Toward the Automated Generation of Components from Existing Source Code
Dan Quinlan, Qing Yi, Gary Kumfert, Thomas Epperly, Tamara Dahlgren, Markus Schordan, and Brian White
The Second Workshop on Productivity and Performance in High-end Computing,
San Francisco, Feb, 2005.
- Classification and Utilization of Abstractions for Optimization,
Dan Quinlan, Markus Schordan, Qing Yi, and Andreas Saebjornsen,
The First International Symposium on Leveraging Applications of Formal Methods,
Paphos, Cyprus, Oct, 2004.
- Applying Loop Optimizations to Object-oriented Abstractions Through General Classification of Array Semantics
Qing Yi and Dan Quinlan
The 17th International Workshop on Languages and Compilers for Parallel Computing,
West Lafayette, Indiana, USA. Sep. 2004.
- Automatic Blocking Of QR and LU Factorizations for Locality
Qing Yi, Ken Kennedy, Haihang You, Keith Seymour, and Jack Dongarra
The Second ACM SIGPLAN Workshop on Memory System Performance,
Washington, DC, USA. June. 2004.
- Semantic-Driven Parallelization of Loops Operating on User-Defined Containers,
Dan Quinlan, Markus Schordan, Qing Yi and Bronis de Supinski
The 16th Annual Workshop on Languages and Compilers for Parallel Computing,
College Station, TX, USA. Oct. 2003.
- A C++ infrastructure for Automatic Introduction and Translation of OpenMP Directives,
Dan Quinlan, Markus Schordan and Qing Yi
Workshop on OpenMP Applications and Tools, Toronto, Ontario, Canada.
June. 2003
- Improving Memory Hierarchy Performance Through Combined Loop Interchange and Multi-level Fusion
Qing Yi and Ken Kennedy
LACSI Symposium, Santa Fe, NM. Oct. 2002.
- Transforming Loops To Recursion For Multi-Level Memory Hierarchies
Qing Yi, Vikram Adve, and Ken Kennedy
ACM SIGPLAN Conference on Programming Language Design and Implementation,
Vancouver, British Columbia, Canada. June. 2000
- High Performance Fortran Compilation Techniques for Parallelizing Scientific Codes
Vikram Adve, Guohua Jin, John Mellor-Crummey, and Qing Yi
Supercomputing. Orlando, FL, USA. Nov. 1998
Technical Reports
- POET: A Scripting Language For Applying Parameterized Source-to-source Program Transformations
Qing Yi
Technical report CS-TR-2010-012, Computer Science, University of Texas at San Antonio.
- Evaluating the Role of Optimization-Specific Search Heuristics in Effective Autotuning
Jichi Guo, Qing Yi, and Apan Qasem
Technical report CS-TR-2010-010, Computer Science, University of Texas at San Antonio. July, 2010.
- Automated Programmable Code Transformation For Portable Performance Tuning
Qing Yi
Technical report CS-TR-2010-002, Computer Science, University of Texas at San Antonio. Apr, 2010.
- Automated Timer Generation for Empirical Tuning
Josh Magee, Qing Yi, and R. Clint Whaley
Technical report CS-TR-2009-006, Computer Science, University of Texas at San Antonio. July, 2009.
- Collective Specification and Verification of Behavioral Models and Object-oriented Implementations
Qing Yi, Jianwei Niu, and Anitha R. Marneni
Technical report CS-TR-2010-011, Computer Science, University of Texas at San Antonio. May, 2010.
- Automated Transformation for Performance-Critical Kernels
Qing Yi and Clint Whaley
Technical report CS-TR-2007-003, Computer Science, University of Texas at San Antonio.
- POET: Parameterized Optimizations for Empirical Tuning
Qing Yi, Keith Seymour, Haihang You, Richard Vuduc, and Dan Quinlan
Technical report CS-TR-2006-006, Computer Science, University of Texas at San Antonio.