ReTail: Opting for Learning Simplicity to Enable QoS-Aware Power Management in the Cloud

Abstract

Many cloud services have QoS requirements, with most requests needing to complete within a given latency constraint. Recently, researchers have begun to investigate whether it is possible to meet QoS for these latency-critical applications while saving power on a per-request basis. Existing work shows that one can indeed hand-tune a request latency predictor offline for a particular cloud application and consult it at runtime to modulate CPU voltage and frequency, resulting in substantial power savings. In this paper, we propose ReTail, an automated and general solution for request-level power management of latency-critical services with QoS constraints. We present a systematic process to select the features of any given application that best correlate with its request latency. ReTail uses these features to predict latency and adjusts the CPU’s power consumption accordingly. ReTail’s predictor is trained fully at runtime. We show that, contrary to previous findings, simple techniques perform better than complex machine learning models when given the right application features as input. For a web search engine, ReTail outperforms prior mechanisms based on complex hand-tuned predictors for that application domain. Furthermore, ReTail’s systematic approach also yields superior power savings across a diverse set of cloud applications.
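
The pipeline described above (select the feature that best correlates with latency, predict latency from it, then choose a CPU frequency) can be illustrated with a minimal Python sketch. This is not ReTail's implementation: the function names, the use of Pearson correlation and single-feature linear regression as the "simple technique", and the assumption that latency scales inversely with CPU frequency are all hypothetical choices made for illustration only.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_feature(samples, latencies):
    """Pick the feature name whose values correlate most strongly with latency.

    samples: list of dicts mapping (hypothetical) feature names to numbers.
    """
    names = list(samples[0].keys())
    return max(
        names,
        key=lambda n: abs(pearson([s[n] for s in samples], latencies)),
    )


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    vx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / vx
    return slope, my - slope * mx


def pick_frequency(latency_at_fmax, freqs, qos_target):
    """Return the lowest frequency predicted to still meet the QoS target.

    Assumption (not from the paper): latency scales as f_max / f.
    Falls back to the highest frequency if no setting meets the target.
    """
    f_max = max(freqs)
    for f in sorted(freqs):
        if latency_at_fmax * f_max / f <= qos_target:
            return f
    return f_max


# Toy data: latency (ms) observed at the highest frequency; values are made up.
samples = [
    {"query_len": 1, "noise": 3},
    {"query_len": 2, "noise": 1},
    {"query_len": 3, "noise": 4},
    {"query_len": 4, "noise": 1},
    {"query_len": 5, "noise": 5},
]
latencies = [3.0, 5.0, 7.0, 9.0, 11.0]

best = select_feature(samples, latencies)
slope, intercept = fit_line([s[best] for s in samples], latencies)

# Predict the latency of a new request and choose a frequency (GHz) for it.
predicted = slope * 2 + intercept
chosen = pick_frequency(predicted, freqs=[1.0, 1.5, 2.0], qos_target=8.0)
```

In this sketch the fit is computed once over a fixed batch; a predictor trained fully at runtime, as the abstract describes, would instead refit or update the model as new request measurements arrive.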

Publication
In the 28th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2022), IEEE.