Understanding and predicting database workload performance across different hardware configurations in the cloud is crucial for both users and providers to optimize resource allocation. Recently, machine learning (ML) based techniques have been applied to parts of the end-to-end, three-step pipeline for workload prediction: feature selection, workload similarity, and performance prediction. However, despite the practical importance of this pipeline, no principled analysis of its end-to-end performance exists. In this paper, we examine state-of-the-art strategies for each of the three components, with the goal of identifying which techniques work best in practice. Our experimental results reveal that while no universal solution exists for the prediction pipeline, certain best practices can improve prediction performance and reduce computational overhead. Based on these results, we outline important topics for future work that will benefit ML-driven recommendation systems for resource allocation.
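
To make the three-step pipeline structure concrete, the sketch below wires the components together using off-the-shelf scikit-learn estimators. The particular choices here (mutual-information feature scores, nearest-neighbor similarity, a random-forest regressor) and all data are illustrative assumptions, not the specific techniques evaluated in the paper.

```python
# Minimal sketch of the three-stage workload-prediction pipeline.
# Every model and dataset below is a hypothetical placeholder.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: rows are workloads, columns are workload
# features; y is observed performance (e.g., latency) on some hardware.
rng = np.random.default_rng(0)
X_train = rng.random((100, 20))
y_train = rng.random(100)

# Step 1: feature selection -- keep features most informative about performance.
selector = SelectKBest(score_func=mutual_info_regression, k=8).fit(X_train, y_train)
X_sel = selector.transform(X_train)

# Step 2: workload similarity -- index historical workloads for lookup.
nn = NearestNeighbors(n_neighbors=5).fit(X_sel)

# Step 3: performance prediction -- fit a regressor on the selected features.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_sel, y_train)

# For a new workload: retrieve similar past workloads, then predict performance.
x_new = selector.transform(rng.random((1, 20)))
_, neighbor_idx = nn.kneighbors(x_new)
print("similar workloads:", neighbor_idx[0])
print("predicted performance:", model.predict(x_new)[0])
```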