Which employees are most frequently absent from work? Clustering the employee absenteeism dataset in R. – Wonder Mahembe

In today’s fast-paced work environments, organizations face numerous challenges, and one persistent issue that can hinder productivity and disrupt operations is absenteeism. High rates of absenteeism can lead to reduced efficiency, increased costs, and decreased employee morale. Understanding the underlying patterns and factors contributing to absenteeism is crucial for organizations to proactively address this issue.

This is where unsupervised learning comes in. Techniques such as clustering, combined with feature importance analysis, can reveal patterns in employee data that are not obvious at first glance. Once we know which factors are most strongly associated with absenteeism, we can develop targeted strategies to reduce its impact.

In this project, I use unsupervised learning to explore absenteeism and the factors that influence it. Join me as we dig into the data and look for patterns that could support a more efficient and engaged workforce. The project has four main parts: data exploration (with some visuals), feature selection (I selected the top 5 most important features), k-means clustering (with k = 3), and the visualisation and interpretation of the clusters.
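To give a feel for the clustering step before we touch the real data, here is a minimal sketch of k-means with k = 3 in base R. It runs on synthetic data, and the column names (`absence_hours`, `commute_km`, `age`) are hypothetical stand-ins rather than the dataset's actual columns. Standardising the features and using `nstart = 25` are my assumptions for the sketch, not necessarily the exact settings used later in the project.

```r
# Synthetic stand-in data: three loosely separated groups of employees.
# Column names and values are hypothetical, for illustration only.
set.seed(42)
n_per <- 50
employees <- data.frame(
  absence_hours = c(rnorm(n_per, 2, 0.5), rnorm(n_per, 8, 1), rnorm(n_per, 40, 5)),
  commute_km    = c(rnorm(n_per, 10, 2),  rnorm(n_per, 30, 4), rnorm(n_per, 20, 3)),
  age           = c(rnorm(n_per, 30, 3),  rnorm(n_per, 45, 4), rnorm(n_per, 38, 3))
)

# Standardise each feature so that no single column dominates
# the Euclidean distances that k-means relies on.
scaled <- scale(employees)

# Fit k-means with k = 3. nstart = 25 reruns the algorithm from 25 random
# starting points and keeps the solution with the lowest within-cluster
# sum of squares.
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Attach the cluster labels to the original (unscaled) data.
employees$cluster <- factor(fit$cluster)

# Profile each cluster by its mean on the original scale.
profile <- aggregate(. ~ cluster, data = employees, FUN = mean)
print(profile)

# Number of employees in each cluster.
print(table(employees$cluster))
```

The cluster numbers themselves are arbitrary labels. What matters for interpretation is the profile table, which shows how the average employee in each cluster differs on every feature.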

The dataset is widely used and has 21 columns and 740 observations. You can find the original dataset below:

Download full dataset here

Download metadata file here