specialist with formal education or experience?

To mark the start of our flagship data science course, we are sharing a small study of salaries based on Stack Overflow survey data, along with a very brief introduction to the Bambi Bayesian modeling library. Details are below.


Why?

As a developer or data scientist, you may be wondering:

  • Will you earn more with experience? If yes, how much?

  • Do other factors, such as academic degree, position, and company size, affect salary?

Let’s answer these questions using visualization and Bayesian inference.

Data

We use the 2018 Stack Overflow Developer Survey. Let’s download the data into the current directory and run the code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("survey_results_public.csv")

Let’s select developers from the USA who are employed full-time and are not students:

sample = df.query(
    "Country == 'United States' & Employment == 'Employed full-time' & Student == 'No'"
)

Let’s keep only salaries below $300,000:

normal_salary = sample[sample["ConvertedSalary"] < 300_000].copy()

And group the salaries into ranges:

# Group by salary range
bins = [0, 25_000, 50_000, 75_000, 100_000, 125_000, 150_000, 300_000]

normal_salary["Salary Range"] = pd.cut(
    normal_salary.dropna(subset=["ConvertedSalary"])["ConvertedSalary"], bins=bins
)

salary_groupby = normal_salary.groupby("Salary Range").agg({"Respondent": "count"})

# Create a bar plot
salary_groupby.plot(kind="bar")
plt.show()

It looks like the most common salary range is $75,000 to $100,000 a year, which sounds plausible for a developer in the US.
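To make the binning step concrete, here is a minimal sketch of how pd.cut assigns values to intervals. The toy salaries are invented for illustration and are not taken from the survey; the bin edges are the ones used above.

```python
import pandas as pd

# Toy salaries, invented for illustration (not survey data)
toy = pd.Series([20_000, 50_000, 50_001, 99_999, 100_000, 250_000])

bins = [0, 25_000, 50_000, 75_000, 100_000, 125_000, 150_000, 300_000]

# By default each bin is open on the left and closed on the right,
# so 50_000 falls into (25000, 50000] and 50_001 into (50000, 75000]
ranges = pd.cut(toy, bins=bins)

# Count how many values landed in each bin, keeping the bin order
counts = ranges.value_counts(sort=False)
print(counts)
```

Note that a value equal to a bin edge belongs to the lower bin, and that a salary of exactly 0 would fall outside the first bin.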

Salary and experience

Let’s show the effect of experience on salary using a box plot. First, let’s look at the YearsCoding column:

normal_salary.YearsCoding.head(10)

Let’s convert the YearsCoding strings to pandas.Interval:

def str_to_interval(text: str):
    if isinstance(text, str):
        years = text.split()[0].split("-")
        if len(years) == 2:
            return pd.Interval(int(years[0]), int(years[1]))
        else:
            return pd.Interval(int(years[0]), float("inf"))

    return text

# Convert string to pandas.Interval    
normal_salary.YearsCoding = normal_salary.YearsCoding.apply(str_to_interval)

# Sort by years
years_sorted = normal_salary.dropna(
    subset=["ConvertedSalary", "YearsCoding"]
).sort_values(by="YearsCoding")
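The reason for converting the labels to intervals is ordering: plain strings sort lexicographically, while intervals compare by their numeric bounds. The sketch below illustrates this on a few labels in the survey's format; `to_interval` is a hypothetical helper that mirrors the parsing idea above, not the article's function.

```python
import pandas as pd

labels = ["3-5 years", "12-14 years", "0-2 years", "30 or more years"]

# Plain strings sort lexicographically, so "12-14 years" lands before "3-5 years"
print(sorted(labels))


def to_interval(label: str) -> pd.Interval:
    """Hypothetical helper: parse '3-5 years' or '30 or more years'."""
    bounds = label.split()[0].split("-")
    left = int(bounds[0])
    # An open-ended label such as '30 or more years' gets an infinite upper bound
    right = int(bounds[1]) if len(bounds) == 2 else float("inf")
    return pd.Interval(left, right)


# Intervals compare by their numeric bounds, which gives the intended order
intervals = sorted(to_interval(label) for label in labels)
print(intervals)
```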

And visualize the data:

sns.boxplot(data=years_sorted, x="YearsCoding", y="ConvertedSalary")
plt.show()

Great, the average salary seems to increase with experience.

Please note that the data above is just a sample. How can we check whether there is a pronounced difference in average salary between developers with 3 to 5 years of experience and those with up to 2 years, or between those with 6 to 8 years and those with up to 2 years? This can be done with a Bayesian t-test for independent samples.
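For intuition about what such a comparison estimates, here is a rough sketch using only the standard library. It is not the Bayesian model fitted later in the article: it uses a large-sample normal approximation to the uncertainty in the difference of two group means, and the salaries are invented for illustration.

```python
from statistics import NormalDist, mean, stdev

# Toy salaries in thousands of dollars, invented for illustration
group_a = [52, 61, 58, 70, 49, 66, 55, 63]  # e.g. less experienced
group_b = [68, 75, 71, 82, 64, 79, 73, 77]  # e.g. more experienced

# Point estimate of the difference in means
diff = mean(group_b) - mean(group_a)

# Standard error of the difference for two independent samples
se = (stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)) ** 0.5

# Normal approximation to the uncertainty about the difference
approx = NormalDist(mu=diff, sigma=se)

# Central 94% interval and the probability that the difference is positive
low, high = approx.inv_cdf(0.03), approx.inv_cdf(0.97)
p_positive = 1 - approx.cdf(0)
print(f"difference: {diff:.1f}, 94% interval: ({low:.1f}, {high:.1f})")
print(f"P(difference > 0) is about {p_positive:.4f}")
```

If the interval lies entirely above zero, the data support a real difference between the groups. A full Bayesian model answers the same question with an explicit posterior distribution instead of this approximation.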

Bayesian t-test with Bambi

What is Bambi?

Bambi is a high-level interface for building Bayesian models in Python. It is built on top of PyMC3. In other words, Bambi is similar to PyMC3 but much easier to use. Install it with pip:

pip install bambi

Bayesian t-test by years of experience

Let’s start by comparing the average salary of developers with up to 2 years of experience with that of developers with 3 to 5 years.

# Filter
sample1 = years_sorted[
    (years_sorted.YearsCoding == pd.Interval(0, 2))
    | (years_sorted.YearsCoding == pd.Interval(3, 5))
]

# visualize
sns.boxplot(data=sample1, x="YearsCoding", y="ConvertedSalary")
plt.show()

Let us denote the average salary of developers with up to 2 years of experience by μ₁ and the average salary of developers with 3 to 5 years of experience by μ₂.

So we can write:

μ₂ = μ₁ + β

It can be seen from the equation above that if there is no difference in average salaries between the two groups, then β = 0. Let’s use Bambi to draw 4000 samples of β and find its distribution. Assume that salaries in both groups (up to 2 years and 3 to 5 years of experience) are normally distributed. Then:

import arviz as az
import bambi as bmb

# Convert to string (assign returns a new DataFrame, avoiding SettingWithCopyWarning)
sample1 = sample1.assign(YearsCoding=sample1.YearsCoding.astype(str))

# Build model
model1 = bmb.Model("ConvertedSalary ~ YearsCoding", sample1)
result1 = model1.fit(random_seed=0)

In the code above, the formula ConvertedSalary ~ YearsCoding says that ConvertedSalary is modeled as a function of YearsCoding. Let’s summarize the results:

az.summary(result1)

In the table above, YearsCoding[(3.0, 5.0)] is the β of the equation above. We can see that the 94% highest density interval (HDI) is between 8,418.925 and 14,348.040. Let’s plot the distribution of β:

az.plot_posterior(result1.posterior["YearsCoding"], ref_val=0)

Based on the distribution above, we can say that:

  • the mean difference in salary between the two groups is about $11,400;

  • 94% of the most plausible values of the difference lie between $8,418.925 and $14,348.040.

Since all sampled values of β are well above 0, we can be confident that there is a meaningful difference between the average salary of developers with up to 2 years of experience and those with 3 to 5 years of experience. Let’s check that all values of β are greater than 0:

(result1.posterior["YearsCoding"] > 0).values.mean()  # Output: 1.0

Yes, that is right.
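Both summaries used above, the 94% HDI and the share of draws above zero, can be computed directly from posterior draws. The sketch below uses synthetic draws as a stand-in (they are not the article's results), and `hdi` is a hypothetical helper for the unimodal case, not ArviZ's implementation.

```python
import random

random.seed(0)
# Synthetic stand-in for posterior draws of a difference (not the article's results)
draws = sorted(random.gauss(10.0, 2.0) for _ in range(4000))


def hdi(sorted_draws, prob=0.94):
    """Narrowest interval spanning about `prob` of the draws (unimodal case)."""
    n = len(sorted_draws)
    width = round(prob * n)  # number of steps the interval must span
    # Slide a window of that size over the sorted draws and keep the narrowest one
    start = min(
        range(n - width),
        key=lambda i: sorted_draws[i + width] - sorted_draws[i],
    )
    return sorted_draws[start], sorted_draws[start + width]


low, high = hdi(draws)

# Share of draws above zero: an estimate of P(difference > 0)
share_positive = sum(d > 0 for d in draws) / len(draws)
print(f"94% HDI: ({low:.2f}, {high:.2f}), share above 0: {share_positive}")
```

For a symmetric, unimodal posterior the HDI is close to the central interval; for skewed posteriors the two can differ, which is why the narrowest window is searched for explicitly.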

Bayesian t-test and work experience

Let’s compare the other experience groups:

years_sorted.YearsCoding = years_sorted.YearsCoding.astype(str)

all_model = bmb.Model("ConvertedSalary ~ YearsCoding", years_sorted)
all_results = all_model.fit(random_seed=0)

yearsCoding_summary = (
    az.summary(all_results)
    .drop(["ConvertedSalary_sigma", "Intercept"])
    .sort_values(by="mean")
)

The average salary increases with experience, but its growth slows down over the years.
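The coefficient reported for each experience group can be read as a difference from a baseline group. The sketch below shows why, assuming the categorical predictor is encoded with dummy (treatment) coding. It uses ordinary least squares on made-up numbers rather than a Bayesian fit, so it only illustrates the encoding, not the article's model.

```python
from statistics import mean

# Toy salaries in thousands of dollars, invented for illustration
baseline = [50, 54, 58, 62]  # baseline group, dummy x = 0
other = [60, 66, 70, 72]     # comparison group, dummy x = 1

x = [0] * len(baseline) + [1] * len(other)
y = baseline + other

# Ordinary least squares for y = intercept + beta * x
x_bar, y_bar = mean(x), mean(y)
beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - beta * x_bar

# The intercept recovers the baseline mean; beta recovers the difference in means
print(intercept, mean(baseline))
print(beta, mean(other) - mean(baseline))
```

With several groups the same logic applies: one group serves as the baseline and every other group gets its own coefficient measuring the gap from that baseline.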

Salary and other factors

What about other factors such as formal education, company size, and type of developer? Let’s look at the influence of these factors. You can find the code for the graphs in this notebook.

Salary and formal education

Does a formal degree affect how much you earn? We can find this out using a box plot whose boxes are sorted by level of education.

Based on the graph above, people without a formal degree seem to earn less than those with a master’s degree but more than those with a bachelor’s degree. Overall, a formal academic degree appears to correlate only weakly with salary.

Salary and company size

How does the size of the company affect salary? In the graph below, the average salary in large companies seems to be higher than the average salary in small companies.

Salary and type of developer

The graph below shows that the average salary of managers and executives, such as engineering managers, general managers, and CTOs, is higher than the average salary for other positions. However, the differences are small. Students seem to earn significantly less than full-time developers.

The salary distribution of C-suite executives (CEO, CTO, etc.) is more spread out than the others. It is possible that the salaries of top executives vary greatly depending on the size of the company.

Outcomes and next steps

Based on the above analysis, we can say that:

  • there is a significant difference in the average salary between developers with less experience and developers with more experience;

  • having a formal degree has practically no effect on salary;

  • the median salary increases slightly with company size and with a more senior position.

So should you get a degree or gain more years of experience to earn a bigger salary? The analysis above suggests that the best investment is to gain more years of experience in the field that interests you.

What’s next?

Can you use a Bayesian t-test to determine if there is a significant difference in average salary between companies of different sizes or between different job types? Feel free to fork the article’s repository.

Links

Stack Overflow. 2018-05-15. Stack Overflow 2018 Developer Survey, Version 2. Database Contents License. Retrieved 2021-01-06.
