From Feature Selection to Resource Prediction: An Analysis of Commonly Applied Workflows and Techniques [Experiments & Analysis]

Abstract

Understanding and predicting database workload performance on different hardware settings in the cloud is crucial for both users and providers to optimize resource allocation. Recently, machine learning (ML) based techniques have been applied to parts of the end-to-end three-step pipeline for workload prediction: feature selection, workload similarity, and performance prediction. However, despite the practical importance of such pipelines, no principled analysis has studied their performance. In this paper, we examine the state-of-the-art strategies for these three components, with the goal of identifying which techniques work best in practice. Our experimental results reveal that while no universal solution exists for the prediction pipeline, certain best practices can improve prediction performance and reduce computation overhead. Based on our results, we outline important topics for future work that will benefit ML-driven recommendation systems for resource allocation.

Publication
In International Conference on Extending Database Technology 2025
Ling Zhang
Ph.D. student

My research focuses on log processing and management, structured text search, and database query processing in general.