How to build beautiful graphs in Python with Seaborn


Data visualization is a technique that enables data scientists to transform raw data into charts and graphs that provide valuable information. Charts reduce the complexity of the data and make it easier for any user to understand.

There are tons of data visualization tools like Tableau, Power BI, ChartBlocks and others that are no-code tools. They are very powerful and each has its own audience. However, for working with raw data that requires processing, and also as a sandbox, Python is best suited.

Despite the fact that this path is more difficult and requires programming skills, Python will allow you to perform any manipulation, transformation and visualization of your data. It is ideal for data scientists.

Python is a leading data science tool for many reasons, but the most important is its library ecosystem. There are many great libraries for working with data in Python, such as numpy, pandas, matplotlib and tensorflow.

Matplotlib is probably the best-known plotting library in Python. Its level of customization and its usability are what put it first. However, some actions and customizations can be difficult to get right when using it.

To address this, developers created a new library based on matplotlib, called seaborn. Seaborn is as powerful as matplotlib, but at the same time it provides a high-level abstraction that simplifies plotting and brings in some unique features.

In this article, we will focus on how to work with seaborn to create top-notch charts. If you want, you can create a new project and repeat all the steps, or just consult my seaborn tutorial on GitHub.

What is Seaborn?

Seaborn is a library for generating statistical graphs in Python. It builds on matplotlib and works closely with pandas data structures.

Seaborn’s architecture allows you to quickly explore and understand your data. Seaborn takes entire data frames or arrays containing all of your data and internally performs the semantic mapping and statistical aggregation needed to turn that data into informative graphs.

It abstracts away complexity, allowing you to design graphics according to your needs.

Installing Seaborn

Installing seaborn is as easy as installing any other library; all you need is your favorite Python package manager. Installing seaborn also installs its dependencies, including matplotlib, pandas and numpy (older releases also pulled in scipy).

Let’s install seaborn and, of course, the notebook package as well, to get a sandbox for working with the data.

pipenv install seaborn notebook
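A quick way to confirm that the installation worked is to import the packages and print their versions. This is only a minimal sketch; the `versions` dictionary is a name chosen here for illustration, and the numbers printed will depend on your environment.

```python
# Import the packages and collect their version strings.
import matplotlib
import numpy as np
import pandas as pd
import seaborn as sns

versions = {
    "seaborn": sns.__version__,
    "matplotlib": matplotlib.__version__,
    "pandas": pd.__version__,
    "numpy": np.__version__,
}

# Print one line per package, e.g. "seaborn: <version>".
for name, version in versions.items():
    print(f"{name}: {version}")
```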

Apart from that, let’s import a few modules before getting started.

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # the examples below call plt.subplot

Building the first charts

Before we start plotting graphs, we need data. The charm of seaborn is that it works directly with pandas dataframe objects, which makes it very convenient. Moreover, the library comes with some built-in datasets that can be loaded directly from code, so you do not have to load files manually.

Let’s see how this works on an airplane flight dataset.

flights_data = sns.load_dataset("flights")
flights_data.head()

   year month  passengers
0  1949   Jan         112
1  1949   Feb         118
2  1949   Mar         132
3  1949   Apr         129
4  1949   May         121

All the magic happens when the load_dataset function is called: it expects the name of the dataset to load and returns a dataframe. All these datasets are available in a repository on GitHub.

Scatter Plot

A scatter plot is a chart that displays points based on two dimensions of a dataset. It is very easy to create a scatterplot in seaborn, as it only requires one line of code.

sns.scatterplot(data=flights_data, x="year", y="passengers")

Easy, isn’t it? The scatterplot function takes the dataset to be rendered and the columns that will act as the x and y axes.

Line Plot

This graph draws a line that represents the development of continuous or categorical data. It is a popular, well-known kind of graph and is easy to create. As before, we use the lineplot function with a dataset and the columns representing the x and y axes. Seaborn will do the rest for us.

sns.lineplot(data=flights_data, x="year", y="passengers")

Bar Plot

This is probably the most well-known type of chart and, as you guessed, we can build it with seaborn just as we did for line and scatter plots, this time using the barplot function.

sns.barplot(data=flights_data, x="year", y="passengers")

It’s very colorful, I know. Later we will learn how to customize it.
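One simple customization, sketched here as an illustration rather than as the article's own method, is to pass a single color to barplot so that all bars look the same. The five rows are the 1949 rows from the table above; the names `sample` and `ax`, and the choice of "steelblue", are assumptions made for this example.

```python
import pandas as pd
import seaborn as sns

# The first five rows of the flights dataset shown earlier.
sample = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "passengers": [112, 118, 132, 129, 121],
})

# color= paints every bar with the same matplotlib color.
ax = sns.barplot(data=sample, x="month", y="passengers", color="steelblue")
ax.set_title("Passengers per month, 1949")
```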

Extending functionality with matplotlib

Seaborn builds on matplotlib, extending its functionality and abstracting complexity. At the same time, seaborn does not lose its power. Any seaborn chart can be customized using functions from the matplotlib library. This mechanic can come in handy in certain cases and allows seaborn to take advantage of matplotlib’s capabilities without having to rewrite all of its functions.

Let’s say you want to plot several charts at the same time using seaborn. In this case, you can use the subplot function from matplotlib.

diamonds_data = sns.load_dataset('diamonds')
plt.subplot(1, 2, 1)
sns.countplot(x='carat', data=diamonds_data)
plt.subplot(1, 2, 2)
sns.countplot(x='depth', data=diamonds_data)

With the subplot function, several charts can be placed on one figure. The function takes three parameters: the number of rows, the number of columns, and the index of the subplot to draw on next (counting from 1).

We render one seaborn plot in each part of the figure, mixing matplotlib and seaborn functionality.
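An alternative that some readers may find easier to follow is to create all the axes up front with `plt.subplots` and hand each one to seaborn through the `ax` parameter. The sketch below assumes a tiny made-up frame with `cut` and `color` columns (names borrowed from the diamonds dataset; the values are placeholders, not real data).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder rows that only mimic the shape of the diamonds dataset.
sample = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Ideal", "Good", "Ideal", "Premium"],
    "color": ["E", "E", "G", "H", "G", "E"],
})

# One row, two columns: axes[0] is the left chart, axes[1] the right one.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.countplot(x="cut", data=sample, ax=axes[0])
sns.countplot(x="color", data=sample, ax=axes[1])

axes[0].set_title("Count by cut")
axes[1].set_title("Count by color")
fig.tight_layout()
```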

Seaborn and Pandas

We have already said that seaborn loves pandas, to the point that all of its functions are built around the dataframe structure. Up to this point we have been using seaborn with preloaded data, but what if we want to render data that has already been loaded with pandas?

drinks_df = pd.read_csv("data/drinks.csv")
sns.barplot(x="country", y="beer_servings", data=drinks_df)
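If you do not have the CSV file at hand, the same idea can be tried with a dataframe built in code. In the sketch below the country names and serving counts are made-up placeholders, and selecting the top rows with `nlargest` before plotting is a suggestion of this example (a bar per country quickly becomes unreadable when there are many countries), not something the original code does.

```python
import pandas as pd
import seaborn as sns

# Hypothetical values standing in for the contents of data/drinks.csv.
drinks_df = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D"],
    "beer_servings": [100, 250, 50, 300],
})

# Keep only the three countries with the most servings, largest first.
top = drinks_df.nlargest(3, "beer_servings")

# Any pandas dataframe can be passed to seaborn through data=.
ax = sns.barplot(x="country", y="beer_servings", data=top)
```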

Create beautiful graphs with styles

Seaborn gives you the ability to change the appearance of your charts. To do this, we have five styles at our disposal out of the box: darkgrid, whitegrid, dark, white and ticks.

sns.set_style("darkgrid")
sns.lineplot(data=flights_data, x="year", y="passengers")

Here’s another example.

sns.set_style("whitegrid")
sns.lineplot(data=flights_data, x="year", y="passengers")

Cool use cases

We’ve learned the basics of seaborn, and now let’s put this knowledge into practice and build several charts from the same dataset. In our case, we will use the “tips” dataset, which can be loaded directly from seaborn.

First, let’s load the dataset.

tips_df = sns.load_dataset('tips')
tips_df.head()

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

I like to display the first few rows of a dataset to get an idea of the columns and the data itself. I usually use several pandas functions to deal with data problems such as null values, or to add information to the dataset that may be useful. You can read more about this in the pandas documentation.

Let’s add another column that holds the tip as a share of the total bill.

tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"]
tips_df.head()

Now the data looks like this:

   total_bill   tip     sex smoker  day    time  size  tip_percentage
0       16.99  1.01  Female     No  Sun  Dinner     2        0.059447
1       10.34  1.66    Male     No  Sun  Dinner     3        0.160542
2       21.01  3.50    Male     No  Sun  Dinner     3        0.166587
3       23.68  3.31    Male     No  Sun  Dinner     2        0.139780
4       24.59  3.61  Female     No  Sun  Dinner     4        0.146808
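Note that the new column holds a fraction, not a percent: 0.059447 means a tip of about 5.9%. The sketch below reproduces the calculation for the five rows shown above; the extra `tip_percent` column is an addition made here for illustration.

```python
import pandas as pd

# total_bill and tip for the first five rows of the tips dataset.
rows = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# Same formula as in the article: tip divided by total bill (a fraction).
rows["tip_percentage"] = rows["tip"] / rows["total_bill"]

# Multiply by 100 to get a human-readable percent.
rows["tip_percent"] = (rows["tip_percentage"] * 100).round(1)
print(rows)
```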

And now we will start building graphs.

Tip percentage

Let’s try to understand how the tip relates to the bill. For this we use the histplot function, which generates a histogram.

sns.histplot(tips_df["tip_percentage"], binwidth=0.05)

To make the chart readable, we had to set the binwidth parameter, but now we can understand and evaluate the data faster. Most customers tip 15 to 20% of the bill, but there are a few cases where the tip exceeds 70%. Such values are called anomalies or outliers, and they are always worth a look to check whether they are errors.
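Once the histogram hints at outliers, a boolean filter in pandas is a simple way to look at the rows behind them. The sketch below uses the five rows shown earlier plus one extra row with an unusually high tip, added here purely for illustration; the 0.7 threshold mirrors the 70% mentioned above.

```python
import pandas as pd

# Five rows from the table above plus one illustrative high-tip row.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 7.25],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.15],
})
sample["tip_percentage"] = sample["tip"] / sample["total_bill"]

# Keep only the rows where the tip is more than 70% of the bill.
outliers = sample[sample["tip_percentage"] > 0.7]
print(outliers)
```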

I was also wondering if the tip percentage changes with the time of day.

sns.histplot(data=tips_df, x="tip_percentage", binwidth=0.05, hue="time")

This time, we created the chart from the entire dataset rather than a single column, and set the hue parameter to the time column. This uses a different color for each value of time and adds a legend to the chart.

Total tip for a specific day of the week

Another interesting metric is the total amount of tips that staff receive depending on the day of the week.

sns.barplot(data=tips_df, x="day", y="tip", estimator=np.sum)

It seems like Friday is a good day to stay at home.
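By default barplot shows the mean of each group; passing `estimator=np.sum` makes each bar show the group total instead. The sketch below checks this against an explicit pandas `groupby` on a small made-up sample (the tip values are placeholders, not rows from the real dataset).

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder tips for a handful of visits.
sample = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sun"],
    "tip": [2.0, 3.0, 1.5, 4.0, 2.5, 3.5],
})

# Each bar shows the sum of the tips for that day.
ax = sns.barplot(data=sample, x="day", y="tip", estimator=np.sum)

# The same totals computed directly with pandas.
totals = sample.groupby("day")["tip"].sum()
print(totals)
```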

Influence of table size and day of the week on tip

Sometimes you need to understand how several variables affect the final result. For example, how do the day of the week and the table size affect the tip percentage?

To build the following chart, we use the pivot_table function from pandas for preprocessing and then draw a heatmap with seaborn.

pivot = tips_df.pivot_table(
    index=["day"],
    columns=["size"],
    values="tip_percentage",
    aggfunc=np.average)
sns.heatmap(pivot)
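To make the heatmap easier to read, you can print the value inside each cell. The sketch below shows the pivot step and an annotated heatmap on a tiny made-up sample; the rows, the "viridis" colormap and the percent format are illustrative choices, not part of the original example.

```python
import pandas as pd
import seaborn as sns

# Placeholder rows: two visits on Saturday for size 2, one for each other cell.
sample = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sun", "Sun"],
    "size": [2, 2, 3, 2, 3],
    "tip_percentage": [0.10, 0.20, 0.20, 0.15, 0.25],
})

# Rows become days, columns become table sizes, cells hold the mean tip share.
pivot = sample.pivot_table(
    index="day",
    columns="size",
    values="tip_percentage",
    aggfunc="mean",
)

# annot=True writes each value into its cell; fmt formats it as a percent.
ax = sns.heatmap(pivot, annot=True, fmt=".1%", cmap="viridis")
```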

Conclusion

Of course, there are many more things we can do with seaborn. You can find out about them in the official documentation.

I hope you enjoyed this article as much as I did. Thanks for reading!

