How to highlight time intervals on charts


Plotting time intervals on a time series chart with Python

Time series analysis often means going through many factors in search of possible relationships. Usually those factors are simply events that happened during a certain period and did not directly affect the target indicator. So I want to “highlight” time ranges on the time series chart, with each type of event getting its own color.
As it turned out, finding a solution was much harder than implementing it. The code is based on an article from geeksforgeeks.org; however, my solution is wrapped in a function (albeit not a very elegant one) that automatically generates different colors for different time ranges and also builds the legend.
To illustrate, let’s see whether there is any relationship between volcanic eruptions in Russia and the mean deviation of global temperature from the baseline. We take the temperature deviation curve from here, and the database of volcanic eruptions can be downloaded here. To avoid cluttering the chart, we select only eruptions with a VEI of 4 or more (Volcanic Explosivity Index, a measure of eruption strength).

Sequence of steps

In general, everything that needs to be done fits into 4 steps:

  1. create the subplots

  2. build a scatter or plot with the target value

  3. use axvspan to add the necessary time intervals

  4. plt.show() and, hooray, the time intervals are highlighted on the chart
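
The four steps can be sketched on a tiny example. This is only an illustration: the yearly values, the interval dates and the label "event A" are made up, and the Agg backend is selected so the sketch also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly values, for illustration only
years = pd.to_datetime([str(y) for y in range(2000, 2010)], format="%Y")
values = [0.1, 0.3, 0.2, 0.4, 0.5, 0.4, 0.6, 0.7, 0.6, 0.8]

fig, ax = plt.subplots(figsize=(10, 4))              # 1. create the subplots
ax.plot(years, values, marker="x", label="target")   # 2. plot the target values
ax.axvspan(pd.Timestamp("2003-03-01"),               # 3. highlight one interval
           pd.Timestamp("2004-06-01"),
           alpha=0.3, color="tab:red", label="event A")
ax.legend()
plt.show()                                           # 4. display the chart
```

With real data, only the inputs to ax.plot and ax.axvspan change.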

Building our own chart

Libraries and data loading

To build the chart, you need the functions subplots and axvspan from the matplotlib.pyplot library. It is most convenient to store the data in dataframes, so we also load pandas. To generate colors, given that a large number of them may be needed, we use the random module from numpy.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Loading the data. For some reason, the database of volcanic eruptions is exported in a very old Excel format, so the file either has to be re-saved manually or read with a suitable engine parameter (I have not found one).
It is important that all time data have the same type. For convenience I use datetime everywhere, but strictly speaking this is not necessary: you can plot plain numeric data as well.
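
As a small sketch of the numeric case: axvspan accepts plain numbers just as well as datetime values, as long as the target curve uses the same kind of x values. The numbers below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# x values are plain years stored as numbers (made-up data)
ax.plot([1950, 1955, 1960], [0.0, -0.1, 0.05], marker="x")
# the interval is given in the same numeric units as the x axis
ax.axvspan(1955.8, 1956.2, alpha=0.3, color="tab:orange")
```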

Data loading and preprocessing
# load the data into dataframes
df_temp = pd.read_csv('annual_temp.csv')
df_eruptions = pd.read_excel('eruptions.xlsx')
# convert the years of the temperature data to datetime
df_temp['Year'] = pd.to_datetime(df_temp['Year'], format="%Y")
df_temp.dropna(inplace=True) # drop missing values
# each year is reported by several sources, so group by year and take the mean
# (numeric_only=True skips any non-numeric columns)
df_temp = df_temp.groupby('Year').mean(numeric_only=True)
df_temp['Year'] = df_temp.index # for convenience, keep a column with the years
# the eruption data needs filtering
date_cols = ['Start Year', 'Start Month', 'Start Day', 'End Year', 'End Month', 'End Day']
df_eruptions = df_eruptions[['Volcano Name', 'VEI'] + date_cols].dropna() # needed columns, no missing values
# keep eruptions with VEI >= 4 that started in 1880 or later; copy() avoids SettingWithCopyWarning
df_eruptions = df_eruptions[(df_eruptions['VEI'] >= 4) & (df_eruptions['Start Year'] >= 1880)].copy()
# the lines below build the columns with the start and end date of each eruption
df_eruptions[date_cols] = df_eruptions[date_cols].astype(str)
# the literal ".0" in the formats matches values that were floats before conversion to str, e.g. "10.0"
df_eruptions['start_date'] = pd.to_datetime(df_eruptions['Start Year'] + '/'
                                            + df_eruptions['Start Month'] + '/' + df_eruptions['Start Day'], format="%Y/%m.0/%d.0")
df_eruptions['end_date'] = pd.to_datetime(df_eruptions['End Year'] + '/'
                                          + df_eruptions['End Month'] + '/' + df_eruptions['End Day'], format="%Y.0/%m.0/%d.0")
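
The string formats with a literal ".0" work, but they depend on how pandas happened to read each column. A possible alternative, shown here only as a sketch on made-up rows, is to cast the date parts to integers and let pd.to_datetime assemble the date from columns named year, month and day. The small dataframe below is an assumption for illustration and is not the real eruption database.

```python
import pandas as pd

# Made-up rows that mimic date parts read as floats
df = pd.DataFrame({
    "Start Year": [1955.0, 1964.0],
    "Start Month": [10.0, 11.0],
    "Start Day": [22.0, 12.0],
})

# Cast to int, then rename to the column names pd.to_datetime recognizes
parts = df[["Start Year", "Start Month", "Start Day"]].astype(int)
parts.columns = ["year", "month", "day"]
df["start_date"] = pd.to_datetime(parts)
```

This removes the dependence on whether a column was read as an integer or a float, but it still fails on invalid dates (for example, a day recorded as 0), so such rows would need to be handled separately.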

Chart

The data is ready – it’s time to create a visualization.
plt.subplots() creates a figure and a set of subplots. Among other things, this is where you can set the size of the final image.
ax.plot draws the curve of target values (instead of plot you can use, for example, scatter or another plot type). We pass x and y to the function; label sets the name shown in the legend.
ax.axvspan highlights a time range. It takes the start and end of the interval as input; color and label can be passed as additional parameters.
In this code a color is assigned to each eruption, so the Shiveluch eruption of 1964 and the one recorded as a single long eruption since 1999 get different colors. As a result, the legend is inflated. The function further down gives all eruptions of one volcano the same color.
Since there are many time intervals, a loop runs through lists with the data. The conversion to lists is done for convenience and clarity of the code.

fig, ax = plt.subplots(figsize=(20, 6)) # create the subplot and set the figure size
ax.plot(df_temp['Year'], df_temp['Mean'], marker="x", label="Mean deviation from baseline temperature")
eruption_started = df_eruptions['start_date'].to_list()
eruption_ended = df_eruptions['end_date'].to_list()
volcano_names = df_eruptions['Volcano Name'].to_list()
for start, end, name in zip(eruption_started, eruption_ended, volcano_names):
    # every interval gets its own random RGB color and its own legend entry
    ax.axvspan(start, end, alpha=0.3, color=np.random.rand(3), label=name)
plt.legend()
plt.show()
The resulting graph with temperature changes and periods of eruptions

Function

Let’s implement this as a function, highlighted_date. It takes as input:

  • two pandas.Series with the x and y coordinates of the target plot;

  • str: the name of the target plot for the legend;

  • two pandas.Series with the start and end of each time interval;

  • pandas.Series with the time interval labels.

The function displays a chart with the target values drawn as a scatter (to connect the points, it can be replaced by plot). Time ranges are colored according to their label, i.e. all Shiveluch eruptions have the same color and appear in the legend once. The legend is displayed.

def highlighted_date(x, y, label, x_2, y_2, label_2):
    """
    highlighted_date(series, series, str, series, series, series)
    highlights time intervals and plots the target values
    """
    fig, ax = plt.subplots(figsize=(20, 6)) # set the figure parameters
    ax.scatter(x, y, marker="x", label=label) # plot the target values
    x_2 = x_2.to_list() # interval starts
    y_2 = y_2.to_list() # interval ends
    # color dictionary for the unique labels; the colors are random
    color_dict = {name: np.random.rand(3) for name in label_2.unique()}
    already_labeled = set() # labels that are already in the legend
    for start, end, name in zip(x_2, y_2, label_2.to_list()):
        if name in already_labeled:
            ax.axvspan(start, end, alpha=0.3, color=color_dict[name])
        else:
            ax.axvspan(start, end, alpha=0.3, color=color_dict[name], label=name)
            already_labeled.add(name)
    plt.legend()
    plt.show()

The call looks like this:

highlighted_date(df_temp['Year'], df_temp['Mean'],
                 'Mean deviation from baseline temperature',
                 df_eruptions['start_date'], df_eruptions['end_date'],
                 df_eruptions['Volcano Name'])
Now each volcano has its own color and occurs once in the legend
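
Random colors change on every run and can occasionally come out very similar. If reproducible colors matter, one option is to take them from a qualitative matplotlib colormap instead. The sketch below is an assumption rather than part of the original function: build_color_dict is a hypothetical helper, and "tab10" is just one possible colormap choice.

```python
import matplotlib.pyplot as plt

def build_color_dict(labels, cmap_name="tab10"):
    """Map each unique label to a colormap color (hypothetical helper)."""
    cmap = plt.get_cmap(cmap_name)
    unique = list(dict.fromkeys(labels))  # unique labels in first-seen order
    # wrap around if there are more labels than colors in the colormap
    return {name: cmap(i % cmap.N) for i, name in enumerate(unique)}

colors = build_color_dict(["Shiveluch", "Bezymyanny", "Shiveluch"])
```

The resulting dictionary could stand in for a dictionary filled with np.random.rand, since axvspan accepts RGBA tuples as color. With more labels than the colormap has colors, the colors start to repeat.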

Conclusion

I wonder how much the 1955 eruption of the Bezymyanny volcano contributed to the sharp jump in temperature?

With Python, you can display time intervals on a time series chart clearly and attractively. Different events can be given different colors or grouped under one. Sometimes such a visualization noticeably improves the quality of the analysis and helps find relationships.
