Status | Published |
Improving Concurrent GC for Latency Critical Services in Multi-tenant Systems | |
Zhao, Junxian¹; Pi, Aidi¹; Zhou, Xiaobo¹; Chang, Sang Yoon¹; Xu, Chengzhong² | |
2022-10-24 | |
Conference Name | 23rd ACM/IFIP International Middleware Conference, Middleware 2022 |
Source Publication | Middleware 2022 - Proceedings of the 23rd ACM/IFIP International Middleware Conference |
Pages | 43-55 |
Conference Date | 2022/11/07-2022/11/11 |
Conference Place | Quebec City, Quebec, Canada |
Publisher | Association for Computing Machinery, Inc |
Abstract | For resource utilization efficiency, latency-critical (LC) services are commonly co-located with best-effort batch jobs on datacenter servers. Many LC services, such as Cassandra and HBase, run in the Java Virtual Machine (JVM). We find that LC services often experience heavy-tailed latency due to performance interference from concurrent garbage collection (GC) as well as from multi-tenancy. The root cause is a semantic gap in resource allocation between the JVM and the underlying Linux OS in multi-tenant systems. That is, the OS is unaware of the characteristics of the different kinds of threads in the JVM (i.e., GC threads and LC worker threads), which may lead to GC threads competing with worker threads for CPUs; the JVM is unaware of resource utilization in the OS, so it may trigger CPU-intensive GC operations when CPUs are busy. Furthermore, we find that co-located batch jobs can interfere with LC services due to Simultaneous Multi-Threading (SMT). We propose iGC, a middleware that bridges the semantic gap between the JVM and the Linux OS and improves concurrent GC performance in multi-tenant systems. iGC adaptively triggers GC based on CPU utilization at runtime, which speeds up the GC process and reduces its CPU contention. Furthermore, iGC deploys a dynamic CPU scheduling and thread placement strategy that avoids or mitigates the interference due to concurrent GC and multi-tenancy and also improves cache performance. We implement iGC upon two state-of-the-art concurrent GC mechanisms, ZGC and G1 GC. We evaluate it using three NoSQL databases as LC services. Experimental results show that iGC significantly improves the performance of concurrent GC for LC services and the throughput in multi-tenant systems. iGC reduces the p95 tail latency by 83%, 37% and 22% for the three LC services Cassandra, HBase and Solr, respectively. It also increases the throughput of LC services by up to 2.56x. |
Keyword | Garbage Collection; Interference; Job Co-location; Tail Latency |
DOI | 10.1145/3528535.3531515 |
Language | English |
Scopus ID | 2-s2.0-85132298339 |
Document Type | Conference paper |
Collection | Faculty of Science and Technology; The State Key Laboratory of Internet of Things for Smart City (University of Macau) |
Corresponding Author | Zhou, Xiaobo |
Affiliation | 1. University of Colorado, Colorado Springs, United States; 2. University of Macau, Macao |
Recommended Citation GB/T 7714 | Zhao, Junxian, Pi, Aidi, Zhou, Xiaobo, et al. Improving Concurrent GC for Latency Critical Services in Multi-tenant Systems[C]: Association for Computing Machinery, Inc, 2022, 43-55. |
APA | Zhao, Junxian., Pi, Aidi., Zhou, Xiaobo., Chang, Sang Yoon., & Xu, Chengzhong (2022). Improving Concurrent GC for Latency Critical Services in Multi-tenant Systems. Middleware 2022 - Proceedings of the 23rd ACM/IFIP International Middleware Conference, 43-55. |