NSF smallcore grant CCF-1261584
Total amount $500,000, with $250,000 allocated to UCCS
Period : 06/01/12 - 05/31/15 (extended through 8/31/18)
Principal investigators: Damian Dechev (UCF) and Qing Yi (UCCS)
co-investigators: none
Students supported : Akshatha Bhat (MS, 2010-2012), Yong Zhao (Ph.D, 2013-2014), Qian Wang (Ph.D, 2013-2014), Jiange Zhang (Ph.D, 2014-present)
Project Summary :
A key challenge in developing multi-threaded applications on modern architectures is correctly synchronizing data shared among the threads while avoiding excessive performance penalties. Unsafe low-level synchronization mechanisms can easily introduce errors (e.g. race conditions and deadlock) that are extremely difficult to debug. At the same time, application performance and scalability are frequently compromised due to inefficient implementations of synchronous operations on shared data.
This research develops a library of highly concurrent, scalable data containers with an associated programming interface and optimization support to significantly enhance the productivity and performance of multi-threaded C/C++ applications on multicore architectures. The library provides an easy-to-use and composable interface similar to that of the C++ Standard Template Library (STL) and enhances each container type with internal support for nonblocking synchronization of its data accesses. Compared with traditional blocking synchronization, this provides better safety and performance by eliminating hazards such as deadlock, livelock, and priority inversion, and by scaling to large numbers of threads. A higher-level programming interface, similar to that of OpenMP, is supported by a preprocessing compiler associated with the runtime. It eases the transition of existing sequential or multi-threaded C/C++ applications to the nonblocking synchronous template library and provides optimization and tuning support for the use of the library abstractions. The developed deliverables are expected to demonstrate a seamless integration of developer input, compiler optimization, and multicore runtimes to support systematic migration of C/C++ applications to continuously evolving architectures.
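To make the idea of nonblocking synchronization concrete, the following is a minimal sketch of a lock-free stack whose push operation retries a compare-and-swap instead of taking a lock. It is an illustration only and is not the project's library or its interface: the names `LockFreeStack`, `size_unsafe`, and `concurrent_push_demo` are hypothetical. A concurrent pop is deliberately omitted, because a correct one needs safe memory reclamation to avoid the ABA problem.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration of a nonblocking container; not the project's API.
template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head_{nullptr};

public:
    LockFreeStack() = default;
    LockFreeStack(const LockFreeStack&) = delete;
    LockFreeStack& operator=(const LockFreeStack&) = delete;

    // Frees all nodes; assumes no other thread is still using the stack.
    ~LockFreeStack() {
        Node* n = head_.load();
        while (n != nullptr) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

    // Nonblocking push: no thread ever holds a lock, so a stalled thread
    // cannot prevent the others from making progress.
    void push(const T& value) {
        Node* node = new Node{value, head_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak stores the current head into
        // node->next, so the loop simply retries with the fresh value.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Counts elements; only valid while no other thread modifies the stack.
    std::size_t size_unsafe() const {
        std::size_t n = 0;
        for (Node* p = head_.load(std::memory_order_acquire); p != nullptr; p = p->next) {
            ++n;
        }
        return n;
    }
};

// Pushes from several threads at once and returns the final element count.
std::size_t concurrent_push_demo(int threads, int per_thread) {
    LockFreeStack<int> stack;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&stack, per_thread, t] {
            for (int i = 0; i < per_thread; ++i) {
                stack.push(t * per_thread + i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return stack.size_unsafe();
}
```

Calling `concurrent_push_demo(4, 1000)` pushes 4,000 elements from four threads without any mutex; every push eventually succeeds because a failed compare-and-swap means some other thread's push completed.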
The scalable template library and the associated programming interface and tuning support are expected to provide a substantial productivity and performance boost for developers of high-end scientific and systems applications, including branch and bound, graph analysis, complex scene rendering, and goal propagation in autonomous embedded systems. The developed programming techniques and tools can enable the transformation of such applications into software that is substantially more reliable, efficient, and scalable than the existing state of the art. The software techniques are also expected to be employed as an educational toolkit in the teaching of programming languages, compilers, systems software, and parallel programming courses.
Publications
- Automating the Exchangeability of Shared Data Abstractions
Jiange Zhang, Qian Wang, Qing Yi, and Huimin Cui.
In The 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC'18), Salt Lake City, UT, USA
- Characterizing and Optimizing the Performance of Multithreaded Programs Under Interference
Yong Zhao, Jia Rao, and Qing Yi.
In the 25th International Conference on Parallel Architectures and Compilation Techniques (PACT ’16), September 11-15, 2016, Haifa, Israel
- Effective Use of Non-blocking Data Structures in a Deduplication Application
Steven Feldman, Akshatha Bhat, Pierre Laborde, Qing Yi, and Damian Dechev.
In The ACM International Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH'13). Wavefront experience report. Indianapolis, USA. Oct, 2013.
- Layout-oblivious compiler optimization for matrix computations
Huimin Cui, Qing Yi, Jingling Xue, and Xiaobing Feng.
ACM Transactions on Architecture and Code Optimization. Vol 9, No 4, pages 35:1-20. Jan, 2013.