NumPy for the little ones

Mathematics is everywhere in our lives, and in programming – especially in ML – there is twice as much of it. Python is usually held up as the most “scientific” programming language because of its mathematical libraries: it handles mathematical abstractions so well that many researchers use it exclusively for all kinds of scientific work. Today we will talk about the NumPy library and working with arrays.

NumPy, together with companion tools such as SciPy and Matplotlib, is designed for working with multidimensional arrays. It is also the foundation for many other machine learning libraries, such as SciPy, Pandas, Scikit-learn and TensorFlow.

Pandas, for example, is built on top of NumPy and allows you to work with high-level data structures like DataFrame and Series. Using NumPy, you can convert categorical data to a numeric format, for example, using one-hot encoding.
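As a quick illustration – a minimal sketch with made-up integer labels, not taken from the original text – one-hot encoding can be done in pure NumPy by indexing an identity matrix:

import numpy as np

labels = np.array([0, 2, 1, 2])  # hypothetical category codes 0..2
one_hot = np.eye(3)[labels]      # each label selects a row of the 3x3 identity matrix
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]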

NumPy is implemented in C, at a low level of abstraction, so working with the library is not a matter of “waiting two hours for the code to finish” – the low-level implementation gives maximum speed and efficiency.

Arrays mean working with classic matrices and vectors – in other words, multidimensional arrays (ndarray). There are element-wise functions such as sin, cos and logical or/and, linear-algebra operations on the matrices themselves, from computing the determinant to multiplying them, and support for vectorization – all of which is our shortest path to mathematical abstractions and ML.

You can install the library in advance with pip install numpy and import it in the classic way: import numpy as np.

Let's go over the simplest “operations” in the library. The main object is the array, and NumPy provides several ways to create one.

The most common ways to create them:

Creating arrays from lists.

Creating arrays of a given size or shape with initial values: arrays of zeros, ones, or arrays with random elements.

Generating arrays from standard sequences.

From the list:

arr = np.array([1, 2, 3, 4, 5], float)

The function takes two arguments: the list to convert into an array and the data type.

From a given size and initial values:

zeros_arr = np.zeros((3, 3))  # 3x3 array of zeros
ones_arr = np.ones((2, 2))    # 2x2 array of ones

Yes, in NumPy you can quickly define an array of any size filled with identical elements.

An array with randomized values.

rand_arr = np.random.rand(3, 3)  # 3x3 array with random values

Arrays have several features:

– The size of the array is fixed and cannot be changed after creation. On the one hand, we lose flexibility and some functionality; on the other hand, the library performs operations with arrays faster.

– All elements must have the same data type.

But you are not limited to integer values – you can choose the data type explicitly.
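A tiny sketch of choosing the data type explicitly (assuming the usual import numpy as np):

arr_int = np.array([1, 2, 3], dtype=np.int32)      # stored as 32-bit integers
arr_float = np.array([1, 2, 3], dtype=np.float64)  # the same values stored as floats
print(arr_int.dtype, arr_float.dtype)              # int32 float64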

To create multidimensional arrays, simply nest the lists for each axis, separated by commas.

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# a two-dimensional 3x3 array

tensor = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]])
# our little three-dimensional tensor of shape 3x2x2
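A quick sanity check of the shapes of the two arrays above (the .shape attribute lists the length along each axis):

print(matrix.shape)  # (3, 3)
print(tensor.shape)  # (3, 2, 2)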

How do we operate on our mini-matrices?

Indexing and slicing are ways of accessing elements and subarrays in NumPy arrays.

We remind you that indexing starts from 0.

To access array elements, specify their index or indices in square brackets, separated by commas for multidimensional arrays. You can also use negative indices to count from the end of an array – a short example is below.
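A minimal sketch of plain and negative indexing (the values are made up for illustration):

arr = np.array([10, 20, 30, 40])
print(arr[0])    # 10 – the first element
print(arr[-1])   # 40 – the last element

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(m[1, 2])   # 6 – row 1, column 2
print(m[-1, 0])  # 7 – last row, first column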

Slicing allows you to select subarrays by given index ranges using the syntax [start:stop:step],

where start is the starting index (included),

stop – ending index (not included),

step – step.

It looks something like this:

arr1d = np.array([1, 2, 3, 4, 5])

# Slicing to select a subarray
print(arr1d[1:4])    # Gives [2 3 4] (elements with indices 1 through 3)
print(arr1d[:3])     # Gives [1 2 3] (elements up to index 3)
print(arr1d[::2])    # Gives [1 3 5] (every second element)

# The same can be done with two-dimensional arrays
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Slicing to select a submatrix
print(arr2d[0:2, 1:])
# Output: [[2 3] [5 6]] (columns 1 to the end, rows 0 through 1)
print(arr2d[:2, :2])
# Output: [[1 2] [4 5]] (rows 0 through 1, columns 0 through 1)

How to manage matrix elements (do whatever you want with them)

In this material we will not describe, for example, the use of logical “or” and “and” – that is too simple; a complete beginner can open the documentation. But there are a couple of interesting functions you can use going forward to cut down on hand-written workarounds.

For example, NumPy has statistical functions (a quick demonstration follows the list):

np.mean() – the mean of the array elements.

np.median() – the median of the array elements.

np.std() – the standard deviation of the array elements.

np.var() – the variance of the array elements.

np.percentile() – percentiles (quantiles) of the array elements.
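A minimal demonstration on a small made-up array (the exact numbers are easy to check by hand):

data = np.array([1, 2, 3, 4, 100])
print(np.mean(data))            # 22.0
print(np.median(data))          # 3.0
print(np.std(data))             # standard deviation (≈ 39.01)
print(np.var(data))             # variance (1522.0)
print(np.percentile(data, 50))  # 3.0 – the 50th percentile is the median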

Or broadcasting, which some people don’t know about.

For example, the np.broadcast() object describes how element-wise operations between arrays of different shapes and dimensions will be carried out. It is not an array itself; it provides information on how to perform operations on arrays of different shapes without explicitly copying the data or changing its shape.

arr1 = np.array([1, 2, 3])         # Shape (3,)
arr2 = np.array([[4], [5], [6]])   # Shape (3, 1)

broadcasted = np.broadcast(arr1, arr2)

print(broadcasted.shape)
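In everyday code you rarely need np.broadcast() itself – ordinary arithmetic applies the same broadcasting rules automatically. A tiny sketch with the same two arrays:

print(arr1 + arr2)
# [[5 6 7]
#  [6 7 8]
#  [7 8 9]]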

We even borrowed a small diagram from Pinterest showing “array views”.

You can also work with polynomials in NumPy – if you really love this kind of mathematics, read to the end. You can create a polynomial using the np.poly1d() function (a short sketch follows the list below), and there are some “special” polynomial families:

np.polynomial.legendre.Legendre() – Legendre polynomials.

np.polynomial.chebyshev.Chebyshev() – Chebyshev polynomials.

np.polynomial.laguerre.Laguerre() – Laguerre polynomials.
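A minimal np.poly1d sketch (coefficients are listed from the highest power down; the numbers here are made up):

p = np.poly1d([1, -3, 2])  # x^2 - 3x + 2
print(p(0))                # 2 – the value at x = 0
print(p.roots)             # the roots, 2 and 1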

Chebyshev polynomials, for example, are used in numerical approximation and interpolation methods.

# Create a Chebyshev polynomial of the first kind
chebyshev_poly = np.polynomial.chebyshev.Chebyshev([1, 0, -1])  # 1*T0 - 1*T2 = 2 - 2x^2

# Evaluate the polynomial at a given point
x = 0.5
value = chebyshev_poly(x)

# Integrate the polynomial (antiderivative)
integral = chebyshev_poly.integ()

Working with matrices/arrays like adults

The NumPy library provides many linear algebra operations that allow you to work with vectors, matrices, and other linear data structures.

Matrix multiplication (np.dot()), matrix transposition (np.transpose()), finding the inverse matrix (np.linalg.inv()), solving systems of linear equations (np.linalg.solve()), and finding eigenvalues and eigenvectors (np.linalg.eig()).
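A compact sketch of a few of these operations on a made-up 2x2 system:

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

print(np.transpose(A))        # the transpose (same as A.T)
print(np.linalg.inv(A))       # the inverse matrix
print(np.dot(A, b))           # matrix-vector product: [4. 7.]
print(np.linalg.solve(A, b))  # solution of A @ x = b: [0.2 0.6]
vals, vecs = np.linalg.eig(A) # eigenvalues and eigenvectors
print(vals)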

Using the example of solving a system of linear equations:

Let's say we have a supervised learning problem where we have a training data set consisting of features (inputs) and corresponding target variables. We want to find the parameters of the model that best approximates our data set.

In the context of a linear model, where we build a relationship between inputs and target values, we look for the optimal weights and bias for that model. We represent this relationship as an equation where the inputs are multiplied by the weights and a bias is added. Our goal is to find the weights and bias to minimize the model error.

The np.linalg.solve() function solves the system of linear equations that models this relationship. We represent this system as a matrix, where the rows are data examples and the columns are features. We find the optimal values for the weights and biases by solving this system of equations. These optimal parameters allow us to build and evaluate a linear model for our data in machine learning problems.

In code it looks something like this:

# Feature matrix X and target vector y (toy example)
X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([3, 7, 11])

# Solve the normal equations (X.T @ X) theta = X.T @ y
# (a bias term can be added by appending a column of ones to X,
#  as in the full example at the end of the article)
theta = np.linalg.solve(X.T @ X, X.T @ y)

print("Here are the optimal weight values:")
print(theta)

Non-obvious functions:

Yes, NumPy really does have a lot of functions, and listing them all would stretch this piece to tens of thousands of characters. But there are several non-obvious ones that will simplify your work right away.

np.where() allows conditional indexing: it returns the indices of elements that satisfy a given condition. You can also use it to replace values in an array based on a condition.

arr = np.array([1, 2, 3, 4, 5])
indices = np.where(arr > 2)
print(indices)  # Output: (array([2, 3, 4]),)

np.unique() – the function returns the unique values in an array in sorted order. This can be useful for removing duplicates or analyzing unique values.

arr = np.array([1, 2, 3, 1, 2, 4, 5])
unique_values = np.unique(arr)
print(unique_values)  # Output: [1 2 3 4 5]

np.clip() – the function clips array values so that they fall within a given range. This can be useful for limiting the values of an array from above and below.

arr = np.array([1, 2, 3, 4, 5])
clipped_arr = np.clip(arr, 2, 4)
print(clipped_arr)  # Output: [2 2 3 4 4]

np.ravel() “flattens” an array, turning a multidimensional array into a one-dimensional one. This is handy for quickly converting an array to one-dimensional form.

arr = np.array([[1, 2, 3], [4, 5, 6]])
raveled_arr = np.ravel(arr)
print(raveled_arr)  # Output: [1 2 3 4 5 6]

A simple example or test of NumPy on linear regression without third-party libraries and frameworks

We create NumPy arrays to represent the training data, add a column of ones to the input features to account for the intercept term in the model, and calculate the model parameters (weights) using the normal equation for linear regression.

X_train = np.array([[1], [2], [3], [4], [5]])  # Input features (a 5x1 column)
y_train = np.array([2, 4, 5, 4, 5])            # Target variable

# Add a column of ones to account for the intercept (bias) term
X_train_with_bias = np.c_[np.ones((X_train.shape[0], 1)), X_train]

# Train the linear regression model via the normal equation
theta = np.linalg.inv(X_train_with_bias.T.dot(X_train_with_bias)).dot(X_train_with_bias.T).dot(y_train)

# Print the model coefficients
print("Model coefficients:", theta)

# Prediction on new data
X_test = np.array([[6], [7]])                  # New input features
X_test_with_bias = np.c_[np.ones((X_test.shape[0], 1)), X_test]
y_pred = X_test_with_bias.dot(theta)

# Print the predicted values
print("Predicted values:", y_pred)

The np.linalg.inv() function is used to compute the inverse matrix.
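As an aside (not part of the original example): explicitly inverting X.T @ X works for a toy demo, but for real data np.linalg.lstsq (or np.linalg.solve on the normal equations) is usually preferred for numerical stability. An equivalent one-liner:

theta_lstsq, *_ = np.linalg.lstsq(X_train_with_bias, y_train, rcond=None)
print(theta_lstsq)  # matches the theta computed above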

We then create new input features for prediction and predict the values of the target variable using the trained model parameters.

And please don't write to us saying “just use regular lists” – we'll only laugh. The example is cartoonish, but that is the whole point of NumPy: it is the basic library for linear operations, of which there are plenty in ML, with more to come.

In this article, we quickly went over the library's functionality and even showed, with an example, how to build a simple linear regression. The biggest piece of advice for any beginning ML/Data specialist is that mathematics is everything. Thanks to hand-done mathematical work we got excellent GPTs, also known as transformers, and thanks to linear algebra we can implement gradient boosting.

Working at a low level of abstraction – not assembly or C, but expressing the mathematics through libraries such as NumPy – is the path to a better understanding of machine learning in general.
