Features of the scatterplot() function of the seaborn library

Relational plots are a type of graph that shows the relationships between two or more variables. These graphs allow you to find dependencies and patterns in the data.

The seaborn library has two main types of relational plots:

  • Scatter plot. A scatter plot shows the relationship between two variables as points on a graph. In seaborn, scatter plots are built with the scatterplot() function.

  • Line plot. A line plot connects data points in sequence with a line. It is especially effective for showing how a variable changes over time. In seaborn, line plots are built with the lineplot() function.

seaborn also provides a general-purpose function, relplot(), which can create both scatter plots and line plots.

scatterplot() function

As mentioned above, scatter plots in seaborn are created with the scatterplot() function. Here are its main parameters:

  • data: DataFrame or array with data;

  • x: column name or data vector for the X axis;

  • y: column name or data vector for the Y axis;

  • hue: column name or data vector to group points by color;

  • size: column name or data vector that determines the size of the points;

  • style: column name or data vector to change the style of the points (for example, different shapes);

  • palette: color palette for the variable hue;

  • hue_order: sets the order in which categories are displayed for the variable hue;

  • hue_norm: sets the normalization range for the values of the hue variable;

  • sizes: range of point sizes for the size variable;

  • size_order: sets the order in which categories are displayed for the variable size;

  • size_norm: sets the normalization range for the values of the size variable;

  • markers: marker styles for the style variable;

  • style_order: sets the order in which categories are displayed for the variable style.
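Several parameters from this list (size, sizes, style, markers, style_order) work together, so a small combined sketch may help. The DataFrame below is toy data with made-up column names (length, depth, mass, group); it only serves to show how the arguments fit together.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data invented for illustration only
df = pd.DataFrame({
    "length": [39.1, 46.5, 50.0, 38.6, 47.2, 49.9],
    "depth": [18.7, 17.9, 15.2, 17.2, 13.7, 16.1],
    "mass": [3750, 3500, 5700, 3450, 4925, 5400],
    "group": ["a", "b", "c", "a", "c", "c"],
})

ax = sns.scatterplot(
    data=df, x="length", y="depth",
    size="mass",                             # point size follows the numeric column
    sizes=(30, 200),                         # smallest and largest point size
    style="group",                           # marker shape follows the categorical column
    markers={"a": "o", "b": "s", "c": "D"},  # explicit marker for each category
    style_order=["a", "b", "c"],             # order of the categories in the legend
)
plt.show()
```

Here size and style map data columns to the appearance of the points, while sizes, markers and style_order only tune how that mapping looks.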

To demonstrate the examples in this article, we will use one of seaborn's built-in datasets, 'penguins', which contains information about three penguin species living on three islands: Biscoe, Dream, and Torgersen.

Let's load this dataset and look at its contents:

import pandas as pd
import seaborn as sns

penguins = sns.load_dataset('penguins').dropna()
penguins.head()

Result:

penguins.info()

Result:

<class 'pandas.core.frame.DataFrame'>
Index: 333 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            333 non-null    object 
 1   island             333 non-null    object 
 2   bill_length_mm     333 non-null    float64
 3   bill_depth_mm      333 non-null    float64
 4   flipper_length_mm  333 non-null    float64
 5   body_mass_g        333 non-null    float64
 6   sex                333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 20.8+ KB

The penguins dataset contains the following columns:

  • 'species' – penguin species (Adelie, Chinstrap, Gentoo);

  • 'island' – the island where the data was collected (Biscoe, Dream, Torgersen);

  • 'bill_length_mm' – bill length in millimeters;

  • 'bill_depth_mm' – bill depth in millimeters;

  • 'flipper_length_mm' – flipper length in millimeters;

  • 'body_mass_g' – body mass in grams;

  • 'sex' – sex of the penguin (Male, Female).

Let's build a scatter plot with bill length on the X axis and bill depth on the Y axis:

import matplotlib.pyplot as plt

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm')
plt.show()

Result:

At the moment the plot has no grid. Whether a grid is displayed depends on the selected style, which can be changed with the set_style() function. The white style without a grid is called 'white'. To display grid lines on the plot, pass the value 'whitegrid' to set_style():

sns.set_style('whitegrid')
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm')
plt.show()

Result:

seaborn also has a “dark” theme with grid lines, called 'darkgrid':

sns.set_style('darkgrid')
sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm')
plt.show()

Result:
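To see which styles turn the grid on, one option is to inspect the settings behind each style name with axes_style(). The sketch below prints the axes.grid setting for seaborn's five built-in styles and then applies 'whitegrid' temporarily through a context manager. The three plotted points are toy values chosen for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# The five style presets that ship with seaborn
for name in ["white", "whitegrid", "dark", "darkgrid", "ticks"]:
    params = sns.axes_style(name)  # rc parameters that this style would set
    print(f"{name:10s} axes.grid = {params['axes.grid']}")

# Used as a context manager, axes_style() applies a style only inside the block
with sns.axes_style("whitegrid"):
    ax = sns.scatterplot(x=[1, 2, 3], y=[3, 1, 2])  # toy points for illustration
plt.show()
```

The context-manager form is convenient when only one figure should differ from the style set globally with set_style().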

In addition to selecting styles, users can change the color, size, and shape of the points on the graph. Next, let's look at these possibilities.

color parameter of the scatterplot() function

The color of the points is set by the color parameter, which accepts the name of a color or its HEX code. For example, let's make the points on the plot green:

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                color="green")
plt.show()

Result:
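The same parameter also accepts a HEX code instead of a color name. A minimal sketch, using toy coordinates and an arbitrarily chosen green shade ("#2ca02c"):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Toy coordinates invented for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 3, 5]

# "#2ca02c" is a green shade written as a HEX code (red=2c, green=a0, blue=2c)
ax = sns.scatterplot(x=x, y=y, color="#2ca02c")
plt.show()
```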

alpha parameter of the scatterplot() function

The alpha parameter controls the transparency of the points on the scatter plot. It ranges from 0 to 1, where 0 means completely transparent and 1 means completely opaque. Adjusting transparency can improve the readability of the plot, especially when points overlap.

For example, let's make the points semi-transparent:

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                color="green",
                alpha=0.5)
plt.show()

Result:

s parameter of the scatterplot() function

The s parameter of the scatterplot() function controls the size of the points on the scatter plot. It accepts a numeric value.

Let's increase the size of the points on the graph:

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                color="green",
                alpha=0.5,
                s=100)
plt.show()

Result:

marker parameter of the scatterplot() function

The marker parameter of the scatterplot() function determines the shape of the markers (points) on the scatter plot.

Here are some of the available marker codes:

  • 'o': circle;

  • 's': square;

  • '^': triangle up;

  • 'v': triangle down;

  • '>': triangle right;

  • '<': triangle left;

  • 'x': cross;

  • '*': star;

  • 'D': diamond;

  • 'H': hexagon.

For example, let's change the circles to squares:

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                color="green",
                alpha=0.5,
                s=100,
                marker="s")
plt.show()

Result:
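To compare several marker codes side by side, one possible approach is to draw the same points once per marker on a row of subplots. The coordinates are toy values, and the sketch covers only four of the codes listed above.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Toy coordinates invented for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 3, 5]
markers = ["o", "s", "^", "D"]  # circle, square, triangle up, diamond

# One small subplot per marker code
fig, axes = plt.subplots(1, len(markers), figsize=(10, 2.5))
for ax, marker in zip(axes, markers):
    sns.scatterplot(x=x, y=y, marker=marker, s=100, ax=ax)
    ax.set_title(repr(marker))  # show the code itself as the subplot title
plt.show()
```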

hue parameter of the scatterplot() function

The hue parameter adds an extra dimension to a scatter plot by coloring the points according to the values of a specified variable, which lets you see differences between groups of data on the same plot.

The hue parameter takes the name of the variable by which we want to split the points into groups. For example, let's group the penguins by island:

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="island")
plt.show()

Result:

Or we can group them by penguin species:

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="species")
plt.show()

Result:

Comparing these two plots suggests that Adelie penguins live on all three islands, unlike the other two species.
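A conclusion like this, read off two plots, can be checked directly with a cross-tabulation. The sketch below uses a tiny hand-written stand-in frame, not the real dataset, so that it runs on its own. With the article's data you would pass penguins['species'] and penguins['island'] instead.

```python
import pandas as pd

# Hand-written stand-in rows for illustration only, not the real penguins data
sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "island": ["Biscoe", "Dream", "Torgersen", "Biscoe", "Dream"],
})

# Rows are species, columns are islands, cells are observation counts
counts = pd.crosstab(sample["species"], sample["island"])

# Number of distinct islands on which each species was observed
islands_per_species = (counts > 0).sum(axis=1)

print(counts)
print(islands_per_species)
```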

The hue parameter accepts not only a categorical variable but also a quantitative one. For example, let's color the points by penguin body mass:

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="body_mass_g")
plt.show()

Result:

In this case, the points are shaded according to body mass: penguins with different masses receive different shades of the same color scale.

From this plot we can conclude that penguins in the lower right part of the plot are heavier than the rest.
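The palette parameter from the list at the start of the article also applies to a numeric hue variable. As a minimal sketch, the example below colors toy data (invented column names length, depth, mass) with the "viridis" colormap. The choice of colormap is only an assumption for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data invented for illustration only
df = pd.DataFrame({
    "length": [39.1, 46.5, 50.0, 38.6, 47.2, 49.9],
    "depth": [18.7, 17.9, 15.2, 17.2, 13.7, 16.1],
    "mass": [3750, 3500, 5700, 3450, 4925, 5400],
})

# A numeric hue column is mapped onto a continuous colormap
ax = sns.scatterplot(data=df, x="length", y="depth",
                     hue="mass", palette="viridis")
plt.show()
```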

hue_norm parameter of the scatterplot() function

The hue_norm parameter normalizes the color scale when data is displayed with the hue parameter. It takes a tuple with minimum and maximum values; within this range the palette changes from the lightest tone to the darkest. This is useful when you want to focus on a specific range of values.

For example, let's make the shades change only for points corresponding to penguin masses from 2700 to 4000 grams:

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="body_mass_g",
                hue_norm=(2700, 4000))
plt.show()

Result:

In this example, the points for penguins heavier than 4000 grams are all drawn in the darkest color of the palette.
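The arithmetic behind such a range can be sketched with matplotlib's Normalize class. This is an illustration of the idea, assuming seaborn treats the tuple in an equivalent way; the "viridis" colormap and the sample masses are arbitrary choices.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

norm = Normalize(vmin=2700, vmax=4000)  # same range as hue_norm=(2700, 4000)
cmap = plt.get_cmap("viridis")          # arbitrary colormap chosen for illustration

for mass in [2700, 3350, 4000, 5000]:
    # Position on the color scale: (mass - 2700) / (4000 - 2700)
    position = float(norm(mass))
    # Positions above 1 receive the same color as position 1
    print(mass, round(position, 2), cmap(position))
```

Values outside the range are not dropped; they simply take the color at the nearest end of the scale, which is why all masses above 4000 grams look the same.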

hue_order parameter of the scatterplot() function

The hue_order parameter sets the order of the categories shown on the scatter plot when the hue parameter is used. It accepts a list of values that defines this order, which lets you control which colors are assigned to which categories.

For example, let's change the display order of penguin species:

sns.scatterplot(data=penguins, x='bill_length_mm', y='bill_depth_mm',
                hue="species",
                hue_order=['Gentoo', 'Chinstrap', 'Adelie'])
plt.show()

Result:
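Because hue_order changes which palette color each category receives, it can be combined with a palette dictionary when the colors should stay fixed. A minimal sketch on toy data; the column names and the specific colors are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data invented for illustration only
df = pd.DataFrame({
    "length": [39.1, 46.5, 50.0, 38.6, 47.2, 49.9],
    "depth": [18.7, 17.9, 15.2, 17.2, 13.7, 16.1],
    "species": ["Adelie", "Chinstrap", "Gentoo", "Adelie", "Gentoo", "Chinstrap"],
})

order = ["Gentoo", "Chinstrap", "Adelie"]

ax = sns.scatterplot(
    data=df, x="length", y="depth",
    hue="species",
    hue_order=order,  # controls the order of the legend entries
    palette={         # pins a color to each category regardless of the order
        "Adelie": "tab:blue",
        "Chinstrap": "tab:orange",
        "Gentoo": "tab:green",
    },
)
plt.show()
```

With a dictionary palette the legend order follows hue_order, while each species keeps its own color.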

You can learn about other possibilities of the scatterplot() function and how to work with the lineplot() function of the seaborn library from the free part of my course on the Stepik platform: https://stepik.org/204124
