PARTIES: QoS-Aware Resource Partitioning for Multiple Interactive Services

Abstract

Multi-tenancy in modern datacenters is currently limited to a single latency-critical, interactive service running alongside one or more low-priority, best-effort jobs. This limits the efficiency gains from multi-tenancy, especially as most cloud applications progressively shift from batch jobs to services with strict latency requirements. We present PARTIES, a QoS-aware resource manager that enables an arbitrary number of interactive, latency-critical services to share a physical node without QoS violations. PARTIES leverages a set of hardware and software resource partitioning mechanisms to adjust allocations dynamically at runtime, both to meet the QoS target of each co-scheduled workload and to maximize throughput for the machine. We evaluate PARTIES on state-of-the-art server platforms across a set of diverse interactive services. Our results show that PARTIES improves throughput under QoS by 61% on average compared to existing resource managers, and that the rate of improvement increases with the number of co-scheduled applications per physical host.

Publication
In the 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2019), ACM.