Features of the scatterplot() function of the seaborn library
Relational plots are a type of graph that shows the relationships between two or more variables. These graphs allow you to find dependencies and patterns in the data.
The seaborn library provides two main types of relational plots:
Scatter plot. A scatter plot shows the relationship between two variables as points on a graph. In seaborn, scatter plots are built with the scatterplot() function.
Line plot. A line plot connects data points in sequence with a line. It is especially effective for showing how a variable changes over time. In seaborn, line plots are built with the lineplot() function.
seaborn also has a general-purpose function, relplot(), which can create both scatter plots and line plots.
scatterplot() function
As mentioned above, scatter plots in seaborn are created with the scatterplot() function. Here are its main parameters:
data: DataFrame or array with the data;
x: column name or data vector for the X axis;
y: column name or data vector for the Y axis;
hue: column name or data vector used to color the points by group;
size: column name or data vector used to vary the size of the points;
style: column name or data vector used to vary the style of the points (for example, different marker shapes);
palette: color palette for the hue variable;
hue_order: order in which the categories of the hue variable are displayed;
hue_norm: normalization of the values of the hue variable;
sizes: range of sizes for the size variable;
size_order: order in which the categories of the size variable are displayed;
size_norm: normalization of the values of the size variable, given as a range of values;
markers: marker styles for the style variable;
style_order: order in which the categories of the style variable are displayed.
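Several of the parameters listed above (size, sizes, style, markers, style_order) can be combined in one call. The sketch below assumes a small made-up DataFrame; the column names weight and group are hypothetical and chosen only for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 3, 6, 5, 7],
    "weight": [10, 20, 30, 40, 50, 60],
    "group": ["a", "b", "a", "b", "a", "b"],
})

fig, ax = plt.subplots()
sns.scatterplot(
    data=toy, x="x", y="y",
    size="weight",                   # numeric column mapped to point size
    sizes=(20, 200),                 # smallest and largest point size
    style="group",                   # categorical column mapped to marker shape
    markers={"a": "o", "b": "s"},    # circle for group a, square for group b
    style_order=["a", "b"],          # order of the groups in the legend
    ax=ax,
)

# Sizes actually assigned to the points, one value per row of `toy`
point_sizes = ax.collections[0].get_sizes()
```

With a numeric size variable, the smallest value in the column receives the lower bound of sizes and the largest value receives the upper bound.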
The examples in this article use one of the datasets built into seaborn, namely 'penguins', which contains information about three penguin species living on the islands Biscoe, Dream, and Torgersen.
Let's load this dataset and look at the data it contains:
import pandas as pd
import seaborn as sns
penguins = sns.load_dataset('penguins').dropna()
penguins.head()
Result:
penguins.info()
Result:
<class 'pandas.core.frame.DataFrame'>
Index: 333 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   species            333 non-null    object
 1   island             333 non-null    object
 2   bill_length_mm     333 non-null    float64
 3   bill_depth_mm      333 non-null    float64
 4   flipper_length_mm  333 non-null    float64
 5   body_mass_g        333 non-null    float64
 6   sex                333 non-null    object
dtypes: float64(4), object(3)
memory usage: 20.8+ KB
The penguins dataset contains the following columns:
'species' – penguin species (Adelie, Chinstrap, Gentoo);
'island' – the island where the data was collected (Biscoe, Dream, Torgersen);
'bill_length_mm' – bill length in millimeters;
'bill_depth_mm' – bill depth in millimeters;
'flipper_length_mm' – flipper length in millimeters;
'body_mass_g' – body mass in grams;
'sex' – sex of the penguin (Male, Female).
Let's build a scatter plot with bill length on the X axis and bill depth on the Y axis:
import matplotlib.pyplot as plt
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm')
plt.show()
Result:
At the moment the plot has no grid. Whether grid lines are shown depends on the selected style, which can be changed with the set_style() function. The white style without a grid is called 'white'. To display grid lines, pass the value 'whitegrid' to set_style():
sns.set_style('whitegrid')
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm')
plt.show()
Result:
seaborn also offers a dark style with grid lines, called 'darkgrid':
sns.set_style('darkgrid')
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm')
plt.show()
Result:
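To see which styles turn the grid on without drawing anything, one option is to inspect the settings returned by sns.axes_style(). The sketch below also applies a style temporarily through a context manager, so the global style is left unchanged; the plotted numbers are made up for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# axes_style() returns the matplotlib settings of a style
# without changing the global state
style_names = ["white", "whitegrid", "dark", "darkgrid", "ticks"]
grid_flags = {name: sns.axes_style(name)["axes.grid"] for name in style_names}
print(grid_flags)

# Used as a context manager, the style applies only to axes created inside it
with sns.axes_style("whitegrid"):
    fig, ax = plt.subplots()
    sns.scatterplot(x=[1, 2, 3], y=[2, 1, 3], ax=ax)
```

Only the two styles with "grid" in their name enable the grid; 'white', 'dark' and 'ticks' leave it off.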
In addition to selecting styles, users can change the color, size, and shape of the points on the graph. Next, let's look at these possibilities.
color parameter of scatterplot() function
The color of the points is set by the color parameter, which takes the name of a color or its hex code. For example, let's make the points on the graph green:
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                color="green")
plt.show()
Result:
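The same color can be given as a hex code instead of a name. As a small sketch on made-up data, the example below uses '#008000', which matplotlib resolves to the same RGBA value as the name 'green'.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.colors import to_rgba

# A color name and its hex code resolve to the same RGBA tuple
same_color = to_rgba("green") == to_rgba("#008000")
print(same_color)

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [3, 1, 4, 2]})

fig, ax = plt.subplots()
sns.scatterplot(data=toy, x="x", y="y", color="#008000", ax=ax)

# Fill color that was applied to the points
face_color = ax.collections[0].get_facecolors()[0]
```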
alpha parameter of scatterplot() function
The alpha parameter controls the transparency of the points on the scatter plot. It ranges from 0 to 1, where 0 means completely transparent points and 1 means completely opaque points. Adjusting the transparency can improve the readability of the graph, especially when points overlap.
For example, let's make the points semi-transparent:
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                color="green",
                alpha=0.5)
plt.show()
Result:
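The effect of alpha is easiest to see when several points share the same coordinates. In the sketch below, which uses made-up data, three points sit on top of each other at (1, 1), so that spot is drawn darker than the single points.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: the first three rows overlap exactly
toy = pd.DataFrame({"x": [1, 1, 1, 2, 3], "y": [1, 1, 1, 2, 3]})

fig, ax = plt.subplots()
sns.scatterplot(data=toy, x="x", y="y", color="green", alpha=0.5, s=200, ax=ax)

# Transparency stored on the drawn points
applied_alpha = ax.collections[0].get_alpha()
```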
s parameter of scatterplot() function
The s parameter of the scatterplot() function controls the size of the points on the scatter plot. It accepts a numeric value.
Let's increase the size of the points on the graph:
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                color="green",
                alpha=0.5,
                s=100)
plt.show()
Result:
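The value of s is passed on to matplotlib, where the marker size is measured as an area in points squared. A point that should look twice as wide therefore needs four times the s value. A minimal sketch on made-up data:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"x": [1, 2, 3], "y": [1, 2, 3]})

fig, (ax_small, ax_large) = plt.subplots(1, 2)

# s is an area: s=100 gives markers about twice as wide as s=25
sns.scatterplot(data=toy, x="x", y="y", s=25, ax=ax_small)
sns.scatterplot(data=toy, x="x", y="y", s=100, ax=ax_large)

small_sizes = ax_small.collections[0].get_sizes()
large_sizes = ax_large.collections[0].get_sizes()
```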
marker parameter of scatterplot() function
The marker parameter of the scatterplot() function determines the shape of the markers (points) on the scatter plot.
Here are some of the available marker codes:
'o': circle;
's': square;
'^': triangle up;
'v': triangle down;
'>': triangle right;
'<': triangle left;
'x': cross;
'*': star;
'D': diamond;
'H': hexagon.
For example, let's change the circles to squares:
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                color="green",
                alpha=0.5,
                s=100,
                marker="s")
plt.show()
Result:
hue parameter of the scatterplot() function
The hue parameter adds an extra dimension to a scatter plot by coloring the points according to the values of a specified variable, which lets you see the differences between groups of data on the same plot.
The hue parameter takes the name of the variable by which we want to split the points into groups. For example, let's group the penguins by the island they live on:
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="island")
plt.show()
Result:
Or we can group the points by penguin species:
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="species")
plt.show()
Result:
Comparing these two graphs suggests that Adelie penguins live on all three islands, unlike the other two species.
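When two plots are compared like this, it can help to fix the color of each category with the palette parameter from the list above. The sketch below passes a dictionary that maps category values to colors; the data and the group labels a, b, c are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 3, 6, 5, 7],
    "group": ["a", "b", "c", "a", "b", "c"],
})

# Each category gets an explicitly chosen color
group_colors = {"a": "tab:blue", "b": "tab:orange", "c": "tab:green"}

fig, ax = plt.subplots()
sns.scatterplot(data=toy, x="x", y="y", hue="group", palette=group_colors, ax=ax)

# Category labels shown in the legend
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

palette also accepts the name of a predefined palette or a list of colors, in which case seaborn assigns the colors to the categories itself.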
The hue parameter accepts not only a categorical variable but also a quantitative one. For example, let's color the points by penguin body mass:
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="body_mass_g")
plt.show()
Result:
In this case the color of each point encodes the penguin's body mass: the numeric values are mapped to shades of a single palette, so penguins of different mass receive different shades.
From this figure we can conclude that the penguins in the lower right part of the graph tend to have a greater body mass than the rest.
hue_norm parameter of the scatterplot() function
The hue_norm parameter normalizes the color scale when data is displayed using the hue parameter. It takes a tuple with a minimum and a maximum value, between which the palette changes from the lightest tone to the darkest. This is useful when you want to focus on a specific range of values.
For example, let's make the shades vary only for points corresponding to penguin masses from 2700 to 4000 grams:
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="body_mass_g",
                hue_norm=(2700, 4000))
plt.show()
Result:
In this example, all points for penguins heavier than 4000 grams are drawn in the darkest color of the palette.
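This clipping behaviour can be checked on a small example. The sketch below uses a made-up DataFrame with a hypothetical mass column and compares the colors that seaborn assigns to the points: the two values above the upper limit should share one color, while the values inside the range should differ.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: two masses inside the range, two above it
toy = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [1, 2, 3, 4],
    "mass": [2700, 3300, 4500, 6000],
})

fig, ax = plt.subplots()
sns.scatterplot(data=toy, x="x", y="y", hue="mass",
                hue_norm=(2700, 4000), ax=ax)

# One RGBA row per point, in the same order as the rows of `toy`
colors = ax.collections[0].get_facecolors()

# Both points above the upper limit get the same color
same_above_limit = bool(np.allclose(colors[2], colors[3]))
# Points inside the range are still distinguishable
differ_inside = not np.allclose(colors[0], colors[1])
print(same_above_limit, differ_inside)
```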
hue_order parameter of the scatterplot() function
The hue_order parameter sets the order of the categories displayed in the scatter plot when the hue parameter is used. It accepts a list of values in the desired order, which lets you control which colors are assigned to which categories.
For example, let's change the display order of penguin species:
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="species",
                hue_order=['Gentoo', 'Chinstrap', 'Adelie'])
plt.show()
Result:
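The effect of hue_order can also be read from the legend, which lists the categories in the order given. A minimal sketch on made-up data with hypothetical group labels a, b, c:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 3, 6, 5, 7],
    "group": ["a", "b", "c", "a", "b", "c"],
})

fig, ax = plt.subplots()
sns.scatterplot(data=toy, x="x", y="y", hue="group",
                hue_order=["c", "b", "a"], ax=ax)

# Legend entries follow hue_order rather than the order of appearance in the data
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Because colors are taken from the palette in the same order, the first category in hue_order also receives the first color of the palette.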
You can learn about other possibilities of the scatterplot() function and how to work with the lineplot() function of the seaborn library from the free part of my course on the Stepik platform: https://stepik.org/204124