Unveiling patterns in employee compensation: A feature-driven analysis using machine learning algorithms

Authors

Vinit Patwa, Pooja

Issue Date

2024-05

Degree

MSc in Business Analytics

Publisher

Dublin Business School

Rights

Abstract

Employee compensation is a crucial driver of organizational performance in today’s workplace. Because it strongly influences employee satisfaction and attrition, which in turn affect how companies function, improving how compensation is forecast is important. This project pursues that goal by applying machine learning to compensation forecasting. To build robust forecasting models, features are engineered from job family, department, union affiliation, and several compensation items such as salary, overtime, benefits, and many others. A detailed analysis determines the relative importance of variables such as job family code, department, and compensation specifics. This involves data preprocessing and exploratory data analysis to examine the distribution of each variable, correlation patterns, and each variable's contribution relative to the others, which helps identify feature relationships and leads to improved feature selection and engineering. Subsequently, the dataset is cleaned and engineered to be suitable for machine learning. Several regression algorithms are applied, including linear regression, random forest, gradient boosting, and XGBoost, each tuned with a grid search that adjusts the model's hyperparameters to optimize R² and root mean square error (RMSE) until the best achievable evaluation metrics are reached. The models are validated through their performance measures, with techniques such as cross-validation used to reach the best R² or RMSE. Mean absolute error (MAE) is also used to evaluate model performance on the total compensation measure; accurate error measures help show that the model is unbiased. Interpreting the key aspects of compensation and understanding how different factors are related is critical to developing a working model, which is built using various analytical methods.
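The evaluation loop described in the abstract (a grid search over hyperparameters, scored by cross-validated RMSE and reported with R² and MAE) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration, not the thesis's implementation: the data are synthetic, the function names are hypothetical, and a one-feature ridge regression with a single penalty `alpha` stands in for the linear regression, random forest, gradient boosting, and XGBoost models the project actually tunes.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ridge(x, y, alpha):
    """Closed-form one-feature ridge fit; only the slope is penalised by alpha."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + alpha)
    return slope, my - slope * mx


def cross_val_rmse(x, y, alpha, k=5):
    """Average RMSE over k folds; fold f holds out indices f, f+k, f+2k, ..."""
    n = len(x)
    scores = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        x_tr = [x[i] for i in range(n) if i not in held_out]
        y_tr = [y[i] for i in range(n) if i not in held_out]
        x_te = [x[i] for i in sorted(held_out)]
        y_te = [y[i] for i in sorted(held_out)]
        slope, intercept = fit_ridge(x_tr, y_tr, alpha)
        preds = [slope * v + intercept for v in x_te]
        scores.append(rmse(y_te, preds))
    return sum(scores) / k


def grid_search(x, y, alphas, k=5):
    """Return the alpha with the lowest cross-validated RMSE, plus all scores."""
    results = {a: cross_val_rmse(x, y, a, k) for a in alphas}
    best = min(results, key=results.get)
    return best, results


# Synthetic stand-in data (assumption): one numeric feature, linear target plus noise.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [3.0 * v + 5.0 + rng.gauss(0, 1) for v in x]

alphas = [0.0, 1.0, 10.0, 100.0, 1000.0]
best_alpha, cv_results = grid_search(x, y, alphas)

# Refit on all data with the selected hyperparameter and report the three metrics.
slope, intercept = fit_ridge(x, y, best_alpha)
preds = [slope * v + intercept for v in x]
print("best alpha:", best_alpha)
print("R2:", r2(y, preds), "RMSE:", rmse(y, preds), "MAE:", mae(y, preds))
```

With real compensation data, the same pattern would be run once per candidate model, each with its own hyperparameter grid, and the cross-validated scores compared across models.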