"The important thing in life is to have a great aim, and the determination to attain it." - Goethe

DISCO: Distributed and Cloud Computing Systems Lab

The DISCO Lab aims to build an in-depth understanding of distributed and cloud computing with augmented services, and to develop open-source techniques that enhance system performance, dependability, scalability, and sustainability. The research was supported in part by the National Science Foundation.

The DISCO Lab is located in the Osbourne science and engineering building. The server room is furnished with a datacenter blade facility that has three racks of HP ProLiant BL460C G6 blade server modules and a 40 TB HP EVA storage area network with 10 Gbps Ethernet and 8 Gbps Fibre/iSCSI dual channels. It has three APC InRow RP air-cooled units and UPS equipment supporting a maximum of 40 kW in an N+1 redundancy design.


Recent Projects

SHF: Small: Lightweight Virtualization Driven Elastic Memory Management and Cluster Scheduling (Sponsor: NSF SHF-1816850, PI: X. Zhou. 07/2018 - 06/2021)

Datacenters are evolving to host heterogeneous workloads on shared clusters to reduce operational cost and achieve high resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and performance constraints on heterogeneous hardware. Data-parallel processing often suffers from interference and significant memory pressure, resulting in excessive garbage collection and out-of-memory errors that harm application performance and reliability. Cluster memory management and scheduling remain inefficient, leading to low utilization and poor multi-service support. Existing approaches focus on either application awareness or operating system awareness, and thus are not well positioned to address the semantic gap between application runtimes and the operating system. This project aims to improve application performance and cluster efficiency via lightweight virtualization-enabled elastic memory management and cluster scheduling. It combines system experimentation with rigorous design and analysis to improve performance and efficiency and to tackle the memory pressure of data-parallel processing. The developed system software will be open-sourced, providing opportunities to foster a large ecosystem that spans system software providers and customers.

CSR: Small: Moving MapReduce into the Cloud: Flexibility, Efficiency, and Elasticity (Sponsor: NSF CNS-1422119, PI: X. Zhou, CoPI: J. Rao. 10/2014 - 09/2018)

MapReduce, a parallel and distributed programming model for clusters of commodity hardware, has emerged as the de facto standard for processing large data sets. Although MapReduce provides a simple and generic interface for parallel programming, it suffers from several problems, including low cluster resource utilization, suboptimal scalability, and poor multi-tenancy support. This project explores and designs new techniques that let MapReduce fully exploit the benefits of flexible and elastic resource allocation in the cloud while addressing the overhead and issues caused by server virtualization. It broadens impact by enabling a flexible and cost-effective way to perform big data analytics. The project also involves industry collaboration and curriculum development, and provides more avenues to bring women, minority, and underrepresented students into research and graduate programs.