How to use numbers to find the reasons behind a football club's successes and failures

https://plus3s.site – football analytics, like a game…

To extract information from numbers about what happens on the football pitch, I propose first evaluating how strongly each indicator affects the result of a match, and then finding out in which indicators the team underperforms and how to fix that.

The heat map shows how strongly the main features are related to the team’s goals scored. Goals scored reflect the results of any football team, although you can experiment with other target variables. Only the main features are shown; of course there are others, and there are many, many of them.

We can create such a heat map in Python; the code is below. We need data (a DataFrame, df) and a chosen target variable to investigate (here, goals scored, Gls). The result is the heat map presented above, in other words, the correlation of each of the main features with the target variable.

# libraries we will need
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sb
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# read the data file (replace '*.csv' with the path to your own file)
df = pd.read_csv('*.csv')
# correlation of every numeric feature with goals scored (Gls), strongest first
corr_gls = df.corr(numeric_only=True)[['Gls']].sort_values(by='Gls', ascending=False)
# draw the heat map on a single tall figure (one row per feature)
fig, ax = plt.subplots(figsize=(10, 18), dpi=200)
sb.heatmap(corr_gls, vmin=-1, vmax=1, annot=True,
           cmap='rocket', linecolor="white", linewidths=0.7, ax=ax)
# save the file in the current folder
fig.savefig('correlation_Gls.png', dpi=200)
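By default, pandas `DataFrame.corr` computes Pearson correlation coefficients, so each number in the heat map is, for a feature X with values x_i and goals scored g_i over n rows of the data:

```latex
r_{X,\mathrm{Gls}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(g_i - \bar{g})}
                          {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (g_i - \bar{g})^2}}
```

The coefficient always lies between -1 and 1, which is why the heat map call fixes vmin=-1 and vmax=1. It measures only linear association, so a strong value suggests, but does not prove, that the feature drives goals.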

Let’s look at some descriptive statistics. For this, we take data covering several seasons of the league. The statistics are built for the features that seem most important to us and, of course, to the computer (those with the strongest correlations in the heat map). As an example, here are the statistics for shots on target (SoT).

According to these statistics, teams that score from 1.7 to 2.43 goals per match take on average 5.84 shots on target. If, for example, a team averages 5.2 shots on target per match but still scores 2.0 goals, then it is falling short on shots on target and owes its good goal tally to other indicators. So if the team tightens up its shooting accuracy, its results should improve. Got the idea?
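The reasoning in the paragraph above comes down to one subtraction: the league-average SoT for the team's goals interval minus the team's own SoT. Here is a small sketch, assuming a statistics table indexed by goals intervals with a 'mean' column of shots on target (the shape pandas `describe()` returns after grouping by `pd.qcut` bins). The helper name `sot_gap` is hypothetical, and the table holds only the single interval quoted in the text.

```python
import pandas as pd


def sot_gap(stats: pd.DataFrame, team_gls: float, team_sot: float) -> float:
    """League-average SoT for the goals interval containing team_gls, minus team_sot."""
    # find the goals-per-match interval that contains the team's goals
    for interval, row in stats.iterrows():
        if team_gls in interval:
            return row['mean'] - team_sot
    raise ValueError(f"{team_gls} goals per match is outside every interval")


# only the interval quoted in the text: 1.7 to 2.43 goals -> 5.84 shots on target
stats = pd.DataFrame({'mean': [5.84]},
                     index=pd.IntervalIndex.from_tuples([(1.7, 2.43)]))
gap = sot_gap(stats, team_gls=2.0, team_sot=5.2)
print(f"shortfall: {gap:.2f} shots on target per match")
```

A positive result means the team shoots on target less often than teams with a similar goal tally, which is the "flaw" described above.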

Here is a small but useful snippet for getting descriptive statistics :)

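# df_2: per-match averages for several league seasons (loading not shown)
# split goals scored into 10 quantile bins, each holding roughly the same number of rows
df_2['Gls'] = pd.qcut(df_2['Gls'], q=10)
# descriptive statistics of shots on target (SoT) within each goals bin
df_m = df_2.groupby('Gls', observed=True)['SoT'].describe()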
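Note that `pd.qcut` cuts by quantiles rather than into equal-width intervals: each bin receives about the same number of rows, while the bin widths can differ. A quick check on toy data (20 hypothetical, unevenly spaced goals-per-match values and made-up SoT figures):

```python
import numpy as np
import pandas as pd

# hypothetical data: 20 distinct, unevenly spaced goals-per-match values (0.64 to 2.56)
gls = np.linspace(0.8, 1.6, 20) ** 2
rng = np.random.default_rng(0)
toy = pd.DataFrame({'Gls': gls, 'SoT': 2.0 + 1.8 * gls + rng.normal(0, 0.3, size=20)})

# q=4 gives equal-frequency bins: 5 rows each, but different widths
toy['Gls_bin'] = pd.qcut(toy['Gls'], q=4)
counts = toy['Gls_bin'].value_counts(sort=False)
widths = [interval.length for interval in counts.index]
stats = toy.groupby('Gls_bin', observed=True)['SoT'].describe()
print(counts)
print(stats[['count', 'mean', '50%']])
```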

Now let’s add more visualizations and see what the data we’re working with looks like.

The colored boxes span the middle half of the observations, from the first to the third quartile. The line inside each box is the median, the most robust "typical" value. The whiskers show how far the remaining values spread, and points beyond them are outliers.
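These elements follow matplotlib's default box-plot rule, which seaborn's boxplot also relies on: quartiles are taken with linear interpolation, and the whiskers stop at the most extreme observations within 1.5 × IQR (IQR = Q3 - Q1) of the box. A minimal NumPy sketch of that computation, with hypothetical shots-on-target values and a hypothetical helper name `box_stats`:

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Box-plot summary using matplotlib's default rule (quartiles, whis * IQR whiskers)."""
    v = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    # lower whisker: smallest value not below q1 - whis * IQR (never above q1)
    low = min(v[v >= q1 - whis * iqr].min(), q1)
    # upper whisker: largest value not above q3 + whis * IQR (never below q3)
    high = max(v[v <= q3 + whis * iqr].max(), q3)
    # everything beyond the whiskers is drawn as a separate outlier point
    outliers = v[(v < low) | (v > high)]
    return q1, med, q3, low, high, outliers


# hypothetical shots on target per match for teams in one goals bin
q1, med, q3, low, high, outliers = box_stats([4, 5, 5, 6, 6, 7, 12])
print(q1, med, q3, low, high, outliers)
```

Here the box runs from 5 to 6.5 with the median at 6, the whiskers reach 4 and 7, and the single value 12 is plotted as an outlier.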

output, var2 = 'SoT', 'Gls'
fig, ax = plt.subplots(figsize=(14, 9))
# one box of shots on target (SoT) per goals-scored bin created with pd.qcut above
sb.boxplot(x=var2, y=output, data=df_2, ax=ax)
ax.grid(linestyle="--")
ax.set(xlabel="goals scored (average per match)", ylabel="SoT")
fig.savefig('SoT.png', dpi=300)

So far, we have studied only one target variable, goals scored, with shots on target as the main feature. However, a team can score a lot and lead the league in goals scored while also conceding a lot, which actually leaves it closer to the middle of the standings. We need to investigate several features together! Then we will have a clearer idea of which aspects of its game a team should improve to get better results.

Let’s imagine that we did similar research on metrics such as assists per match (Ast, passes that directly led to a goal), ball possession (Poss), shots on target conceded (SoTA), and saves by our team’s goalkeeper (Save).

All indicators should ultimately be expressed as averages per match. Next, we take the analyzed team’s current-season figures and compare them with the league average over several seasons. A radar chart of this comparison is easy to build in Excel.
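If you prefer to stay in Python, here is a minimal matplotlib sketch of such a radar chart. The function name `radar` and all numbers are hypothetical placeholders. Each team value is divided by the league average so that the league forms a circle at 1.0; this normalization is an assumption that lets features on different scales (possession in percent, saves per match) share one chart. Keep in mind that for SoTA a value above 1.0 is bad, since it counts shots conceded.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np


def radar(features, team, league, title="Team vs league average"):
    """Radar chart of a team's per-match averages divided by the league averages."""
    ratio = np.asarray(team, dtype=float) / np.asarray(league, dtype=float)  # 1.0 = league level
    angles = np.linspace(0, 2 * np.pi, len(features), endpoint=False)
    # repeat the first point so the polygon closes
    closed_angles = np.append(angles, angles[0])
    closed_ratio = np.append(ratio, ratio[0])

    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={"projection": "polar"})
    ax.plot(closed_angles, closed_ratio, label="team")
    ax.fill(closed_angles, closed_ratio, alpha=0.25)
    ax.plot(closed_angles, np.ones_like(closed_angles), linestyle="--", label="league average")
    ax.set_xticks(angles)
    ax.set_xticklabels(features)
    ax.set_title(title)
    ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1))
    return fig, ratio


# hypothetical per-match averages, for illustration only
features = ["SoT", "Ast", "Poss", "SoTA", "Save"]
team = [5.2, 1.3, 52.0, 4.1, 2.9]
league = [4.5, 1.0, 50.0, 4.5, 3.1]
fig, ratio = radar(features, team, league)
```

Call `fig.savefig("radar.png", dpi=150)` to write the chart to a file.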

The correspondence between a feature and goals scored can also be determined with Python, using regression.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

Xg = np.array(df['SoT'])  # feature values (shots on target)
yg = np.array(df['Gls'])  # target values (goals scored)
X_g = Xg.reshape(-1, 1)
# shots-on-target values for which we want the corresponding goals scored
X_Gpred = np.array([2.4, 2.5, 2.75, 3, 3.27, 3.5, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.6, 5.9, 6.2, 6.4])
X_train, X_test, y_train, y_test = train_test_split(X_g, yg, test_size=0.5, random_state=20)
# polynomial regression: degree=1 fits a straight line, degree=2 a quadratic curve
pr = make_pipeline(PolynomialFeatures(degree=1), LinearRegression())
pr.fit(X_train, y_train)  # train the model
# predicted goals scored for each value in X_Gpred
y_pr = pr.predict(X_Gpred.reshape(-1, 1))
print(f'polynomial regression \n {y_pr}')
# quality check on the held-out half: mean squared error of predicted vs actual goals
print('Mean squared error of the polynomial model:', mean_squared_error(y_test, pr.predict(X_test)))
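To see what the regression step does without the original data, here is a sketch on synthetic numbers; the 0.35 goals per shot on target and the noise level are invented for illustration. It fits a plain straight line (equivalent to degree=1), scores it on the held-out half, and then reads off the goals the line expects from a team with 5.2 shots on target, echoing the 5.2-shots, 2.0-goals example earlier in the article. The residual (actual minus expected goals) is an added interpretation, not part of the original method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# synthetic league (hypothetical): goals per match = about 0.35 * shots on target + noise
rng = np.random.default_rng(42)
sot = rng.uniform(2.5, 6.5, size=200)
gls = 0.35 * sot + rng.normal(0, 0.15, size=200)

X = sot.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, gls, test_size=0.5, random_state=20)
model = LinearRegression().fit(X_train, y_train)

# quality on the held-out half: compare predictions with the actual goals
mse = mean_squared_error(y_test, model.predict(X_test))

# goals the fitted line expects from a team with 5.2 shots on target per match
expected = model.predict(np.array([[5.2]]))[0]
# actual minus expected: positive means the team outscores its shots on target
residual = 2.0 - expected
print(f"slope={model.coef_[0]:.3f}, test MSE={mse:.4f}")
print(f"expected goals at 5.2 SoT: {expected:.2f}, residual at 2.0 goals: {residual:+.2f}")
```

A positive residual matches the article's reading: the team's goal tally comes partly from indicators other than shots on target.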

Summary:

We have looked at the obvious features. Yes, if a team scores little or concedes a lot, you can see that without additional analysis, but the art of football analytics also lies in picking the right features for each team individually! In the end, the analysis tells us which indicators the team needs to improve.

Join plus3s.site!

Wishing you victories and the strength to overcome every difficulty! :)
