CSR: Autonomous Performance and Power Control on Virtualized Servers (NSF CNS-1217979, 09/2012-08/2017)
Project description and goals
Modern data centers hosting popular Internet services face significant and multifaceted challenges in performance and power control. These challenges stem mainly from the complex interaction of highly dynamic and heterogeneous workloads in complex virtualized computing systems. In this research project, the investigators take an organized approach to autonomic performance and power control on virtualized servers. The project designs and develops automated, agile and scalable techniques for server parameter tuning, virtual machine capacity planning, non-invasive energy-efficient performance isolation, and elastic power-aware resource provisioning. The deliverables are innovative and practical approaches and mechanisms that provide performance assurance for applications, maximize the effective system throughput of data centers within resource and power budgets, mitigate performance interference among heterogeneous applications, and achieve performance and power targets with flexible tradeoffs while assuring control accuracy and system stability. The research methodology integrates the strengths of reinforcement learning, fast online learning neural networks, fuzzy logic control, model predictive control, and distributed and coordinated control. The project broadens its impact by developing a testbed in a university prototype data center to demonstrate the orchestration of the developed approaches and mechanisms for autonomous management of virtualized computing systems, middleware, and services. Its success will guide autonomous resource management for sustainable computing in next-generation data centers.
The research project is executed in a cutting-edge lab located in the new science and engineering building. The server room is furnished with an HP data center blade facility that has three racks of HP ProLiant BL460C G6 blade server modules and a 40 TB HP EVA storage area network with 10 Gbps Ethernet and 8 Gbps Fibre/iSCSI dual channels. It has three APC InRow RP Air-Cooled units and UPS equipment rated for a maximum of 40 kW in an n+1 redundancy design.
Participants
- Dr. Xiaobo Zhou, Principal Investigator
- Shaoqi Wang, PhD student, 2016 - 2017
- Palden Lama, PhD student, 2008 - 2013
- Yanfei Guo, PhD student, 2010 - 2014
- Dazhao Cheng, PhD student, 2011 - 2014
Project-sponsored Publications
- "Network-Adaptive Scheduling of Data-Intensive Parallel Jobs in Clusters", Shaoqi Wang, Xiaobo Zhou, Liqiang Zhang, and Changjun Jiang, Proc. of the 14th IEEE International Conference on Autonomic Computing (ICAC), Columbus, July 2017.
- "Autonomic Performance and Power Control for Co-located Web Applications in Virtualized Datacenters", Palden Lama, Yanfei Guo, Changjun Jiang, and Xiaobo Zhou, IEEE Transactions on Parallel and Distributed Systems, Vol. 27, No. 5, pages: 1289-1302, May 2016.
- "Elastic Power-Aware Resource Provisioning of Heterogeneous Workloads in Self-Sustainable Datacenters", Dazhao Cheng, Jia Rao, Changjun Jiang, and Xiaobo Zhou, IEEE Transactions on Computers, Vol. 65, No. 2, pages: 508-521, February 2016.
- "Towards Energy Efficiency in Heterogeneous Hadoop Clusters by Adaptive Task Assignment", Dazhao Cheng, Palden Lama, Changjun Jiang, and Xiaobo Zhou, Proc. of the 35th IEEE ICDCS (acceptance rate 12.8%), 10 pages, June/July 2015.
- "Heterogeneity-aware Workload Placement and Migration in Distributed Sustainable Datacenters", Dazhao Cheng, Changjun Jiang, and Xiaobo Zhou, Proc. of the 28th IEEE IPDPS (acceptance rate 21%), 10 pages, May 2014.
- "Autonomic Provisioning with Self-Adaptive Neural Fuzzy Control for Percentile-based Delay Guarantee", Palden Lama and Xiaobo Zhou, ACM Transactions on Autonomous and Adaptive Systems, 8(2):1-31, July 2013.
- "iShuffle: Improving Hadoop Performance with Shuffle-on-Write", Yanfei Guo, Jia Rao, and Xiaobo Zhou, won the Best Paper Award of the 10th USENIX ICAC (1 out of 90 submissions), 11 pages, San Jose, June 2013.
- "V-Cache: Towards Flexible Resource Provisioning for Multi-tier Applications in IaaS Clouds", Yanfei Guo, Palden Lama, Jia Rao, and Xiaobo Zhou, In Proc. of the 27th IEEE IPDPS (acceptance rate 21%), 12 pages, Boston, May 2013.
- "Self-tuning Batching with DVFS for Improving Performance and Energy Efficiency in Servers", Dazhao Cheng, Yanfei Guo, and Xiaobo Zhou, In Proc. of the 21st IEEE MASCOTS (acceptance rate 27%), 10 pages, San Francisco, August 2013.
- "Autonomic Performance and Power Control for Co-located Web Applications on Virtualized Servers", Palden Lama, Yanfei Guo, and Xiaobo Zhou, In Proc. of the 21st ACM/IEEE IWQoS (acceptance rate 28%), 10 pages, Montreal, June 2013.
- "Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments", Palden Lama, Yan Li, Ashwin Aji, Pavan Balaji, James Dinan, Shucai Xiao, Yunquan Zhang, Wuchun Feng, Rajeev Thakur, and Xiaobo Zhou, In Proc. of the 33rd IEEE ICDCS (acceptance rate 13%), 10 pages, Philadelphia, July 2013.
- "Automated and Agile Server Parameter Tuning by Coordinated Learning and Control", IEEE Transactions on Parallel and Distributed Systems, 15 pages, accepted, April 2013.
- "Optimizing Virtual Machine Scheduling in NUMA Multicore Systems", Jia Rao, Kun Wang, Xiaobo Zhou, and Cheng-Zhong Xu, In Proc. of the 19th IEEE HPCA (acceptance rate 20%), 12 pages, Feb 2013. The paper is one of four Best Paper Award candidates (out of 249 submissions).
Acknowledgement
This material is based upon work supported by the National Science Foundation under Grant CNS-1217979. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).