Getting Started with Linear Algebra in NumPy

The NumPy library for Python is the foundation of data science and bioinformatics. However, even though every Python programmer is familiar with the package installation command:

pip install numpy

and the library import command:

import numpy as np

few people use it explicitly in practical tasks. This is because libraries for applied data analysis, such as pandas, Matplotlib, and scikit-learn, rely on NumPy and call it “under the hood”. In real work the higher-level libraries are used far more often, but understanding NumPy at a conceptual level is still very important.

What can NumPy do?

The key feature of NumPy is that it operates on one-dimensional and two-dimensional arrays of numbers (known in linear algebra as vectors and matrices, respectively) without the user writing a loop. Loops do run, but they execute behind the scenes, and the operations are written in the same way as in a linear algebra notebook.

For example, if array1 is a two-dimensional matrix, then the operation

array1 + 3

will add the scalar 3 to each element of the matrix.

If vector1 and vector2 are vectors of the same dimension, then the operation

vector1*vector2

will perform their Hadamard (element-wise) multiplication.
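As a minimal sketch of both operations (the array values here are made up purely for illustration):

import numpy as np

array1 = np.array([[1, 2], [3, 4]])      # a small 2×2 matrix
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

print(array1 + 3)          # [[4 5] [6 7]]: the scalar is added to every element
print(vector1 * vector2)   # [ 4 10 18]: Hadamard (element-wise) product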

NumPy has other useful functions that make calculations easier. For example, if you want to take the square root of every number in an array (say, array1), then the function

np.sqrt(array1)

will take the root of each element and return the results as a new NumPy array (not a Python list). Such functions are called universal functions, or ufuncs.
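For instance, with a hypothetical array:

import numpy as np

array1 = np.array([1.0, 4.0, 9.0, 16.0])
print(np.sqrt(array1))   # [1. 2. 3. 4.]: the root is taken element by element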

Detailed documentation on NumPy is available on the website numpy.org, where you can also work with NumPy interactively in a browser window. If you're just starting out in Python, play around with it like a linear algebra calculator.

By the way, if you want to study applied linear algebra the way it is implemented in NumPy, the following book is a good fit:

Cohen M.I. Applied Linear Algebra for Data Scientists / translated from English by A.V. Logunov. – M.: DMK-Press, 2023. – 328 p.: ill.

Indexing and transposition

Some operations of the NumPy library come up particularly often, and indexing and transposition are among them. When working with big data, you will often have to call them explicitly.

Indexing means selecting an element of a NumPy array by its index, that is, by its position or “number” in the matrix.

If my_array is your matrix, then a command of the form

my_array[i, j]

will return the element at the intersection of row i and column j.

But it's not that simple!

Pay attention to the left part of Fig. 1. Let my_array be a 4×3 matrix surrounded by a rectangular frame. What happens if we request the element my_array[1, 2]? Intuitively, it seems that the program should return 4: after all, there is a four in the first row and second column. But Python returns 47, because it counts from zero. The first element of any array is, for Python, always element number zero. Such languages are called zero-indexed.

Fig. 1. Indexing and transposition in NumPy
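You can check this behavior with a small matrix of your own (the values below are made up and are not the ones shown in Fig. 1):

import numpy as np

my_array = np.array([[10, 20, 30],
                     [40, 50, 60],
                     [70, 80, 90],
                     [100, 110, 120]])   # a hypothetical 4×3 matrix

print(my_array[1, 2])   # 60: index 1 is the second row, index 2 is the third column
print(my_array[0, 0])   # 10: the "first" element lives at index [0, 0]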

This makes accessing table rows and columns by their numbers extremely inconvenient. Fortunately, there is a good solution: the pandas library with its data frames, which works “on top” of NumPy. We already talked about it briefly in the previous article.

Besides indexing, you often need to lay a two-dimensional array on its side, swapping rows and columns. NumPy has two solutions for this!

First, every NumPy array is an object (Python is an object-oriented language, where almost everything is an object), and every array object has a one-letter attribute, T, which performs the transposition:

my_array.T

But there is also a more familiar option: calling the transpose function from the NumPy library and passing it our matrix as an argument. If you imported NumPy as np (its standard alias), the transposition is performed as follows:

np.transpose(my_array)
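Both options give the same result; a quick sketch with a hypothetical 4×3 matrix:

import numpy as np

my_array = np.arange(12).reshape(4, 3)   # a 4×3 matrix: 4 rows, 3 columns

print(my_array.T.shape)              # (3, 4): rows and columns swapped
print(np.transpose(my_array).shape)  # (3, 4): the same result via the function call
print(np.array_equal(my_array.T, np.transpose(my_array)))   # True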

The transposition operation itself comes up in bioinformatics tasks all the time, almost at the routine level: some libraries and methods work better with data in rows, others with data in columns. Knowing the transposition commands lets you perform it “in one fell swoop”, without getting distracted by data conversion and writing new code.

Multiplication of vectors and matrices

Of all the operations that can be performed with the NumPy library, vector and matrix multiplication deserves special attention. Most other operations on vectors and matrices are intuitive, but the dot (scalar) product of vectors and standard matrix multiplication look unexpected at first glance.

At the same time, many basic concepts and methods of machine learning require an understanding of these operations. For example, without the dot product of vectors it is impossible to implement the mathematical operation of convolution, and hence the convolutional neural networks used in computer vision and even in the first version of AlphaFold. And the simplest linear regression requires matrix multiplication: with two variables everything can still be written without it, but multiple linear regression cannot do without matrices.

Of course, vector and matrix operations are usually performed “under the hood” of higher-level libraries like pandas and scikit-learn. But knowing how to implement them in NumPy is also useful. The ability to implement machine learning functions “from scratch” will come in handy when developing your own mathematical apparatus (often needed in bioinformatics) or when porting code to another language, for example, R.

The dot (scalar) product of vectors is a product of two vectors expressed as a single number that shows the relationship between them. To find it, multiply the vectors element by element and then sum the elements of the resulting vector. The formula for this operation is given in Fig. 2, top left.

Example: [1 11 6 4] • [22 3 5 4] = 1·22 + 11·3 + 6·5 + 4·4 = 22 + 33 + 30 + 16 = 101
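This definition translates directly into code; a minimal sketch using the same vectors:

import numpy as np

v1 = np.array([1, 11, 6, 4])
v2 = np.array([22, 3, 5, 4])

elementwise = v1 * v2      # [22 33 30 16]: Hadamard product
print(elementwise.sum())   # 101: summing the elements gives the dot product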

Fig. 2. Dot (scalar) product of vectors and standard product of matrices

The dot product of vectors is used directly in standard matrix multiplication, which reduces to dot products of rows with columns (the rows and columns of the matrices are treated as vectors).

In standard matrix multiplication the commutative law does not hold: the order of the factors matters, and swapping them changes the product. Moreover, not all matrices can be multiplied by each other: the number of columns of the first factor must equal the number of rows of the second. The product is a matrix with the same number of rows as the first factor and the same number of columns as the second. Each (i, j)-th element of the product matrix is the dot product of the i-th row of the first factor and the j-th column of the second. The formula is shown in Fig. 2 at the bottom left, and a visualization of the operation is on the right of Fig. 2.
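A short sketch with made-up matrices shows the shape rule and the non-commutativity:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2×3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])       # 3×2

print(np.dot(A, B))   # 2×2 product: A's 3 columns match B's 3 rows
print(np.dot(B, A))   # 3×3 product: a different shape, so the order of factors matters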

The most interesting thing is that in NumPy both operations, the dot product of vectors and standard matrix multiplication, are provided by a single function: np.dot(a, b). If a and b are vectors, NumPy returns their dot product; if they are matrices, it returns their standard product.

import numpy as np
c = np.dot(a, b)   # dot product if a and b are 1-D vectors, standard matrix product if they are 2-D matrices

This makes mathematical sense: when computing the dot product of vectors, NumPy effectively treats the first vector as the only row of one matrix and the second vector as the only column of another matrix. The result is a “matrix” of one element, that is, just a scalar, as it should be for the dot product of vectors. This is a special case.
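This special case can be made visible by reshaping the vectors explicitly (an illustration of the idea, not of NumPy's internals):

import numpy as np

a = np.array([1, 11, 6, 4])
b = np.array([22, 3, 5, 4])

row = a.reshape(1, 4)     # a as a 1×4 "row matrix"
col = b.reshape(4, 1)     # b as a 4×1 "column matrix"
print(np.dot(row, col))   # [[101]]: a 1×1 matrix holding the scalar
print(np.dot(a, b))       # 101: for plain 1-D vectors NumPy returns the scalar itself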

The standard product of matrices makes it possible to “encode” in a single matrix all the connections between all the elements of the two matrix factors. That is why this operation is indispensable in statistics and machine learning. For example, special cases of multiplying two matrices lead to covariance matrices and permutation matrices, widely used in statistics, which we will return to in the following posts.
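As a teaser of where this leads (a rough sketch with a made-up data matrix X, not the full story from the upcoming posts), a covariance matrix can be obtained with a single matrix multiplication of the centered data by its own transpose:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # 100 observations of 3 made-up variables

Xc = X - X.mean(axis=0)                     # center each column
cov = np.dot(Xc.T, Xc) / (X.shape[0] - 1)   # 3×3 covariance matrix via matrix multiplication

print(np.allclose(cov, np.cov(X, rowvar=False)))   # True: matches NumPy's built-in np.cov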

And if you have read this post to the end and understood at least something, your chances of mastering machine learning are already quite good. Stay with us!

More educational materials, including short formats, can be found on our Telegram channel eduopenbio.
