
In 2022, YouTuber Alex, better known as Alex the Analyst, conducted an online survey of data professionals around the world. The purpose of the project was to measure how satisfied professionals in the data sector are with their jobs. The survey also asked respondents about their salary range, their favorite programming language, and how difficult they found it to enter the data industry. Alex has made the data publicly available so that other data professionals can view and analyse it, draw insights from it, and build visualisations as they see fit.
Against this background, I decided to dig into the dataset myself and visualise some of the interesting information emerging from the survey. The data were downloaded directly from the survey instrument and were not pre-cleaned. To make the dataset usable for my project, I did some bare-minimum cleaning before loading it. All cleaning was done in Power BI and can be traced in the project's Power Query Editor. The following is a brief overview of the steps I took to clean and analyse the dataset.
Data cleaning
- I began by importing the raw Excel dataset, then opened it in Power Query for transformation before loading it into Power BI.
- The salary variable was collected as a range in the format "45k-60k", but I wanted a single numeric amount that I could use in the analysis. I therefore split the column by delimiter, first on "-" and then on "k", until the lower and upper bounds sat in separate columns. I then created a custom column, "average salary", by adding the two bounds and dividing by 2 (so "45k-60k" becomes 52.5, in thousands).
- Columns such as job title, industry, and country of residence had a very large number of free-text entries submitted under "Other". To simplify cleaning, I used the Split Column by Delimiter operation to strip these additional entries, keeping only the options that were part of the survey's multiple-choice list. This left me with consistent categories plus a single "Other" value. Ideally, in a corporate setting, I would go through each "Other" entry and code it into the dataset for the most accurate result. I skipped this here because it would have been time-consuming and would have added little value to a project meant to demonstrate my data cleaning skills.
- I also deleted the metadata columns that the survey platform added, since they were irrelevant to the analysis.
- Once I was satisfied with the cleaning, I applied the changes and loaded the data into Power BI.
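The two main Power Query steps above can be sketched outside Power BI. The Python below is a minimal illustration, not the project's actual Power Query (M) code. It assumes every salary band has the form "45k-60k" (amounts in thousands, exactly one dash), and it assumes the free-text "Other" answers begin with an opening parenthesis, as in the hypothetical value "Other (Please Specify):Business Analyst". Both function names are hypothetical.

```python
from typing import Tuple


def parse_salary_band(band: str) -> Tuple[float, float, float]:
    """Turn a band like '45k-60k' into (lower, upper, average), in thousands.

    Assumes exactly one '-' in the value; anything else raises ValueError.
    """
    lower_text, upper_text = band.split("-")  # first split: on the dash
    # second split: drop the trailing 'k' (case-insensitive) from each bound
    lower = float(lower_text.strip().lower().rstrip("k"))
    upper = float(upper_text.strip().lower().rstrip("k"))
    return lower, upper, (lower + upper) / 2  # "average salary" column


def keep_choice(answer: str, delimiter: str = "(") -> str:
    """Keep only the text before the delimiter (delimiter is an assumption).

    'Other (Please Specify):Business Analyst' -> 'Other'; answers without
    the delimiter, e.g. 'Data Scientist', pass through unchanged.
    """
    return answer.split(delimiter, 1)[0].strip()


if __name__ == "__main__":
    print(parse_salary_band("45k-60k"))  # (45.0, 60.0, 52.5)
    print(keep_choice("Other (Please Specify):Business Analyst"))  # Other
```

Splitting on the dash and then stripping the "k" gives the same two numeric columns as splitting by both delimiters in Power Query. An open-ended band with no dash would need a separate rule.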
Data visualisation
- I added two cards showing the total number of survey participants and their average age.
- I wanted to see the average salary by job title, so I added a chart showing it. I added a color legend to make the chart more appealing and edited its title.
- I also added a stacked column chart showing the favorite programming languages of data professionals, using job title as the color legend to show which languages were popular with each role.
- To visualise satisfaction with salary and with work-life balance, I created two gauges, each showing the average rating against the full rating scale (1-10).
- I also used a tree map to show the participants' countries of residence, and a pie chart showing how difficult participants found it to break into data.
- I added a slicer that lets users filter the results by participants' sex. In addition, because all the visualisations are interactive, clicking a section of any visualisation filters the rest of the dashboard by that selection, e.g., a specific country.
- Lastly, I customised the dashboard theme to my liking and published the project.
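Behind each visual is a simple aggregation that Power BI computes for you. As a rough sketch of that logic only, the Python below reproduces the card values, the average-salary-by-job-title chart, a gauge value, and slicer-style filtering. The column names and the three sample rows are hypothetical placeholders, not the survey data, and the functions are illustrative rather than what Power BI runs internally.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical sample rows; column names are assumptions, not the real schema.
SAMPLE_ROWS: List[Row] = [
    {"sex": "Female", "age": 30, "job_title": "Data Analyst",
     "avg_salary": 52.5, "salary_satisfaction": 4},
    {"sex": "Male", "age": 40, "job_title": "Data Scientist",
     "avg_salary": 97.5, "salary_satisfaction": 6},
    {"sex": "Male", "age": 26, "job_title": "Data Analyst",
     "avg_salary": 20.0, "salary_satisfaction": 2},
]


def filter_rows(rows: List[Row], column: str, value: Any = None) -> List[Row]:
    """Mimic a slicer or click-to-filter: value=None means no selection."""
    return rows if value is None else [r for r in rows if r[column] == value]


def card_values(rows: List[Row]):
    """The two cards: participant count and average age."""
    return len(rows), mean(r["age"] for r in rows)


def average_salary_by_title(rows: List[Row]) -> Dict[str, float]:
    """Mean of the cleaned 'average salary' per job title, highest first."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for r in rows:
        groups[r["job_title"]].append(r["avg_salary"])
    averages = [(title, mean(vals)) for title, vals in groups.items()]
    return dict(sorted(averages, key=lambda kv: kv[1], reverse=True))


def average_rating(rows: List[Row], column: str) -> float:
    """Gauge value: mean rating, to be shown against a 1-10 scale."""
    return mean(r[column] for r in rows)


if __name__ == "__main__":
    print(card_values(SAMPLE_ROWS))
    print(average_salary_by_title(SAMPLE_ROWS))
    print(card_values(filter_rows(SAMPLE_ROWS, "sex", "Male")))
```

Because every visual starts from the same filtered rows, applying one filter and recomputing each aggregation is what makes the slicer and cross-filtering behave consistently across the dashboard.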
Insights obtained
- In terms of average salary, data scientists earned far more on average than any other group, followed by data engineers and data architects.
- When asked about their favorite programming language, the majority of participants named Python. R was a distant second.
- In terms of work-life balance, the average participant was mildly satisfied. On average, however, participants were much less satisfied with their salary.
- Lastly, the majority of participants said it was neither easy nor difficult to break into data, followed by those who found it difficult and then those who found it easy.