Predicting Health-Related Absenteeism with Machine Learning: A Case Study

Aleksander Piciga and Matjaž Kukar

Abstract
Health-related absenteeism, or sick leave, is a complex issue with significant financial and operational implications for businesses. We present a machine learning approach to predicting employee absenteeism in a Slovenian company. The study involved preprocessing the dataset, augmenting it with domain knowledge, and evaluating various machine learning models. Gradient Boosted Regression Trees emerged as the most effective model, significantly outperforming a baseline that merely predicted the previous year’s absenteeism rate. We identified the key attributes influencing absenteeism, notably current absenteeism, performance evaluations, and various features related to job type and location. The results highlight the potential of machine learning for proactively managing absenteeism, and we offer recommendations for future research, such as modeling absenteeism as a time series and incorporating additional data sources. We also show that the current data are not sufficiently detailed or granular to improve the results further.
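
The comparison summarized above, a gradient boosted regression model set against a baseline that repeats the previous year's rate, can be illustrated with a minimal sketch. It is not the study's implementation. The data are synthetic, the two features (`prev_rate`, `perf`) and the rule that generates the target are hypothetical, and mean absolute error is an assumed metric. The booster is a bare-bones version using depth-1 trees and squared loss, written in plain Python so that it is self-contained.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree: the one split minimizing squared error."""
    n = len(X)
    total = sum(residuals)
    best = None  # (gain, feature index, threshold, left value, right value)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            left_mean = left_sum / k
            right_mean = (total - left_sum) / (n - k)
            # Minimizing squared error is equivalent to maximizing this gain.
            gain = k * left_mean ** 2 + (n - k) * right_mean ** 2
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbrt(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict_gbrt(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Synthetic data (assumption): next year's rate depends on last year's rate
# and on a hypothetical performance score, plus a small amount of noise.
rng = random.Random(0)
X, y = [], []
for _ in range(300):
    prev_rate = rng.uniform(0.0, 0.2)   # last year's absenteeism rate
    perf = rng.uniform(1.0, 5.0)        # hypothetical performance score
    X.append([prev_rate, perf])
    y.append(0.5 * prev_rate + 0.01 * (5.0 - perf) + rng.gauss(0.0, 0.005))

X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

model = fit_gbrt(X_train, y_train)

# Baseline: predict that this year's rate equals last year's rate.
baseline_mae = mae(y_test, [row[0] for row in X_test])
gbrt_mae = mae(y_test, [predict_gbrt(model, row) for row in X_test])

print(f"baseline MAE: {baseline_mae:.4f}")
print(f"GBRT MAE:     {gbrt_mae:.4f}")
```

In practice a library implementation, for example scikit-learn's `GradientBoostingRegressor`, would normally replace the hand-written booster. The point of the sketch is only the structure of the comparison: both predictors are scored on the same held-out employees with the same metric.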