Python package registry in GitLab

What the article is about: when developing projects, especially distributed applications, it often becomes necessary to reuse parts of the application as separate modules. For example, compiled gRPC classes or modules for working with a database can be used unchanged in the code base of a dozen microservices. Let's set copy-paste aside as a “good” bad practice. Git submodules are one option, but they are not very convenient: first, developers need access to the specific repositories with the code; second, you have to work out which commit to pin; and third, installing the dependencies of code included as a submodule is left to the developer's conscience. Package managers (pip or, better, poetry) resolve dependencies out of the box, and in general a package manager is much easier to work with than a submodule. In this article we will look at how to organize a package registry in GitLab, and at the pitfalls that await on the way to working with it comfortably.

Who it is for: developers who need to set up a private package registry and want a guide to organizing one in GitLab.

About package registries in general, and why GitLab

The steps for organizing a private package registry are described in the documentation. In the deployment option described there, the registry is essentially a directory served over HTTP that contains .tar.gz and/or .whl packages, sorted into folders named after the packages. Automatically uploading packages to such a repository is left as an exercise for the reader. With GitLab, the package code base and the package registry can live in the same space, which gives two options:

  • use each repository as a package registry;

  • create a shared registry containing all packages.

The second option is, in my view, preferable: installing packages from several registries means generating more credentials and registering more registry addresses in the configuration than with a single registry.
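To make the "folders named after the packages" layout of a plain HTTP registry more concrete, here is a minimal sketch of how a package name maps to its folder. It assumes the index follows the name normalization from PEP 503; the helper names normalize and simple_index_path are hypothetical and exist only for illustration.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    # Runs of "-", "_" and "." collapse into a single "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_index_path(name: str) -> str:
    """Path of the package folder relative to the index root."""
    return f"/{normalize(name)}/"


if __name__ == "__main__":
    for raw in ("hello_world_package", "Hello.World-Package"):
        print(raw, "->", simple_index_path(raw))
```

Both spellings end up in the same folder, which is why a package can be requested under any of them.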

Registry preparation

We will use a separate project as the registry for the packages we create. Simply create a new project and, optionally, give it a name that makes it clear this is a package registry.

The list of uploaded packages and additional information about them can be viewed under Packages & Registries > Package Registry.

By the way, GitLab can be a registry not only of PyPI packages, but also npm, NuGet, etc.

Credentials

To upload packages to the registry and install them in a project, we need credentials. The documentation states that the following types of tokens can be used:

  • personal access token: authenticates the token owner according to their rights to the repository. This type is best for granting registry access to specific users: the user is given read rights to the registry and generates the access token themselves.

  • project access token: suitable if you need to give a large number of users access to the package registry. If you hand this token out to users, there is a danger of it leaking. Besides, you cannot revoke access for one particular user when everyone shares a single token: it is all or nothing.

  • group access token: gives access to all packages in a group of projects.

In my case, the following scheme is used:

  • To upload packages to the registry, a project access token with the write_registry right is used. This token is used exclusively for automated package builds and is stored in Variables of the repositories that contain the module source code;

  • Users get access with a personal access token.

Create a package

Let’s create a new project that will store the module code. We will use poetry to build the package.

To start, let’s install poetry.

pip install poetry

Next, if the repository for the module already exists and contains code, you can use the command:

poetry init

Otherwise, you can create a new project (along with the directory structure) with the command:

poetry new <project_name>

In both cases, a pyproject.toml file will appear in the project. Poetry uses it to store information about the project (name, description, dependencies).

If you already have a requirements.txt file and do not want to transfer by hand everything that has accumulated during development, you can move the dependencies to poetry with one command:

cat requirements.txt | xargs poetry add
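The one-liner above passes every word of the file to poetry add, so comments, blank lines and options such as -r other.txt would get in the way. As a hedged alternative, the sketch below filters the file first and only prints the resulting command. The function name requirement_specs and the sample contents are made up for illustration.

```python
def requirement_specs(text: str) -> list[str]:
    """Keep only the requirement lines from requirements.txt contents."""
    specs = []
    for line in text.splitlines():
        # Drop inline comments ("pkg==1.0  # note") and surrounding spaces.
        line = line.split(" #", 1)[0].strip()
        # Skip blank lines, full-line comments and pip options (-r, -e, --index-url).
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        specs.append(line)
    return specs


if __name__ == "__main__":
    sample = """\
# web
requests==2.31.0

-r base.txt
grpcio>=1.50  # gRPC runtime
"""
    # Print the command instead of running it, so it can be reviewed first.
    print(["poetry", "add", *requirement_specs(sample)])
```

The printed list can be passed to subprocess.run once it looks right.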

Now let’s take a look at pyproject.toml:

[tool.poetry]
name = "hello-world-package"
version = "0.1.3"
description = ""
authors = ["Dmitry <dmitry8912@gmail.com>"]
readme = "README.md"
packages = [{include = "hello_world_package"}]

[tool.poetry.dependencies]
python = "^3.10"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Here you need to pay attention to the first section of the file, [tool.poetry]. It holds the general information: package name, version, description, and the subdirectories that contain the module code (the line packages = [{include = "hello_world_package"}]). If the code files are located in the root of the project, move them to a separate subdirectory and specify its name in the packages line.

Also create README.md with a detailed description of the package.

These settings are enough to build the package files.

The poetry build command creates a dist folder in the root of the project, which will contain the .tar.gz and .whl package files (one pair per version).
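A quick way to check the result of a build is to compare the dist folder with the file names you expect. The sketch below is only an illustration: the naming it assumes (underscores in the file name and the py3-none-any tag of a pure-Python wheel) may differ for other build backends or for packages with compiled extensions, and both function names are hypothetical.

```python
import re
from pathlib import Path


def expected_artifacts(name: str, version: str) -> list[str]:
    """File names a pure-Python build is assumed to produce."""
    # Assumption: the backend writes the project name with underscores.
    dist_name = re.sub(r"[-_.]+", "_", name).lower()
    return [
        f"{dist_name}-{version}.tar.gz",
        f"{dist_name}-{version}-py3-none-any.whl",
    ]


def missing_artifacts(dist_dir: str, name: str, version: str) -> list[str]:
    """Return the expected files that are absent from dist_dir."""
    folder = Path(dist_dir)
    return [f for f in expected_artifacts(name, version) if not (folder / f).is_file()]


if __name__ == "__main__":
    # An empty list means both files for this version are in place.
    print(missing_artifacts("dist", "hello-world-package", "0.1.3"))
```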

Versioning

A package rarely lives by the “fire and forget” principle. The source code changes during development, and it becomes necessary to tell one version from another. The version stored in pyproject.toml is the current one, and if you rebuild the package without updating it, the files for that version in the dist folder are simply overwritten. So before building a new version, you need to change the number in pyproject.toml. Doing it by hand is, of course, more fun, but I suggest a healthier way of updating versions: poetry has the poetry version command, whose optional argument (patch, minor, major…) increments the version according to the rule for that type of increment.

For example, to release a new version that contains minor bug fixes and does not break compatibility, you can use poetry version patch.

When releasing a new version that contains significant changes, use poetry version major.

Looking ahead, note that GitLab will not accept a package if that version already exists in its registry, so the version has to be bumped before uploading the package.
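To show what the increment rules do to a version number, here is a small sketch that mimics the three basic rules for plain MAJOR.MINOR.PATCH versions. It is not poetry's own implementation, and the function name bump is made up. Pre-release rules and suffixes are deliberately left out.

```python
def bump(version: str, rule: str) -> str:
    """Increment a MAJOR.MINOR.PATCH version by the given rule."""
    major, minor, patch = (int(part) for part in version.split("."))
    if rule == "major":
        return f"{major + 1}.0.0"
    if rule == "minor":
        return f"{major}.{minor + 1}.0"
    if rule == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported rule: {rule}")


if __name__ == "__main__":
    for rule in ("patch", "minor", "major"):
        print(rule, "->", bump("0.1.3", rule))
```

Starting from 0.1.3, patch gives 0.1.4, minor gives 0.2.0 and major gives 1.0.0: every part to the right of the incremented one is reset to zero.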

Uploading packages to the registry

A package can be uploaded to the registry with the poetry publish command; however, in my experience it consistently returned a 422 error when I tried to upload. So I will describe how to upload the package with twine instead. First, of course, install it:

pip install twine

We will need to create a .pypirc file to store registry information (address, access token):

[distutils]
index-servers =
    gitlab

[gitlab]
repository = https://<gitlab_address>/api/v4/projects/<project_id>/packages/pypi
username = <token_name>
password = <token>

After substituting the real registry address, username and password, we can upload the previously built package.

[root@srv-dev-core0 hello_world_package]$ python -m twine upload --repository gitlab dist/* --config-file ./.pypirc

Uploading distributions to http://gitlab.local/api/v4/projects/29/packages/pypi
Uploading hello_world_package-0.0.1-py3-none-any.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.0/4.0 kB • 00:00 • ?
Uploading hello_world_package-0.0.1.tar.gz
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 kB • 00:00 • ?

The package information should appear on the package registry page.
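If you would rather not keep a .pypirc with a real token on disk, the file can be generated right before the upload. The sketch below uses the standard configparser module and follows the layout of the .pypirc shown earlier. The function name render_pypirc and the sample values are assumptions for illustration; in practice the token would come from an environment variable.

```python
import configparser
import io


def render_pypirc(gitlab_address: str, project_id: int, token_name: str, token: str) -> str:
    """Build the contents of a .pypirc file for a GitLab PyPI registry."""
    # interpolation=None keeps "%" characters in a token from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config["distutils"] = {"index-servers": "gitlab"}
    config["gitlab"] = {
        "repository": f"https://{gitlab_address}/api/v4/projects/{project_id}/packages/pypi",
        "username": token_name,
        "password": token,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Sample values only; do not print a real token in CI logs.
    print(render_pypirc("gitlab.local", 29, "ci-upload", "example-token"))
```

Writing the returned text to ./.pypirc gives a file that can be passed to twine with --config-file, as in the command above.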

Installation from the registry

I also recommend installing the package through poetry. pip does not store information about where a package came from, so pip install -r requirements.txt will look for the package in the wrong place; otherwise you have to split the dependencies into parts, keep them in separate requirements.txt files, and tell pip where to get each. Poetry, on the other hand, stores the package source in pyproject.toml. When deploying the project, all that remains is to add the authentication data.

In general, the procedure is as follows:

  1. Add a new registry address to the list of sources (gitlab here is just a name)

poetry source add gitlab https://<gitlab_address>/api/v4/projects/<project_id>/packages/pypi/simple

  2. Add the authentication data

poetry config http-basic.gitlab <token_name> <token>

  3. Tell poetry which source to install the package from

poetry add --source gitlab <your_package_name>

In pyproject.toml, the package entry will now be tied to a specific registry, and the remote registry itself will be described:

your_package = {version = "^0.1.2", source = "gitlab"}

...

[[tool.poetry.source]]
name = "gitlab"
url = "https://<gitlab_address>/api/v4/projects/<project_id>/packages/pypi/simple"
default = false
secondary = false

Now, for other developers to install the package, they need to be told:

  • the registry address;

  • the package name;

  • the credentials (or a note that they need to generate their own).
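Since these three pieces of information fully determine the commands a colleague has to run, they can be turned into a ready-made instruction. The sketch below only builds the command strings from the steps described earlier; the function name install_commands is hypothetical, and the token placeholders are left for each developer to fill in.

```python
def install_commands(gitlab_address: str, project_id: int, package_name: str,
                     source: str = "gitlab") -> list[str]:
    """Return the poetry commands needed to install a package from the registry."""
    index_url = (
        f"https://{gitlab_address}/api/v4/projects/{project_id}/packages/pypi/simple"
    )
    return [
        f"poetry source add {source} {index_url}",
        # Each developer substitutes their own token name and token here.
        f"poetry config http-basic.{source} <token_name> <token>",
        f"poetry add --source {source} {package_name}",
    ]


if __name__ == "__main__":
    for command in install_commands("gitlab.local", 29, "hello-world-package"):
        print(command)
```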

CI/CD

Finally, some automation. It is much more convenient when each repository with module code is set up to build the package and upload it to the registry automatically, right in the CI/CD pipeline.

You can use the following script for this:

#!/bin/sh
# Stop at the first failed command (for example, if version.py exits with an error)
set -e

# The script relies on the following env variables:
# TOKEN_VALUE - token with write access to the registry (write_registry privilege)
# BUMP_VERSION - patch|minor|major|..., i.e. how exactly to increment the version

echo 'Preparing .pypirc'
# In .pypirc the token is set to the placeholder {REGISTRY_TOKEN}, which sed replaces on the fly
sed -i "s/{REGISTRY_TOKEN}/${TOKEN_VALUE}/g" .pypirc
echo "Package building stage"

# version.py fetches the most recently uploaded version through the GitLab API
VERSION=$(python version.py)
# Line number of the first "version = ..." entry in pyproject.toml
CURRENT_VERSION=$(grep -n -m 1 '^version' pyproject.toml | cut -d : -f 1)
# Write the fetched version into pyproject.toml
sed -i "${CURRENT_VERSION}s/.*/version = \"${VERSION}\"/" pyproject.toml

# Bump the package version
echo "Bumping version with rule ${BUMP_VERSION}"
poetry version "$BUMP_VERSION"

# Build the package and upload it to the registry
echo "Building package"
poetry build

echo "Uploading package to registry"
python -m twine upload --repository gitlab dist/* --config-file ./.pypirc

This script runs inside a docker container launched by the runner. Authentication data is passed to the container as env variables. The .pypirc file is pre-populated with information about the address of the registry, and all that remains is to replace the token.

But there is a problem with updating the version. Two developers can set the same version in pyproject.toml, and as a result some changes may be lost, since GitLab will not accept a new package with a version that already exists in the registry. There is only one way out: determine the latest version in the registry, write it to pyproject.toml, and then raise it with poetry version [patch|minor|major|…].

To get the latest package version from the registry, you can use the script:

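import os
import sys

import requests

if __name__ == '__main__':
    token = os.getenv('TOKEN_VALUE')
    package_name = os.getenv('CI_PROJECT_NAME')

    try:
        # Packages are returned newest first (sort=desc), filtered by name
        result = requests.get(
            'https://<gitlab_address>/api/v4/projects/<project_id>/packages',
            params={'sort': 'desc', 'package_name': package_name},
            headers={'Authorization': f'Bearer {token}'},
            timeout=30,
        )
    except requests.RequestException as e:
        # Errors go to stderr: stdout is captured by the shell script as the version
        print(e, file=sys.stderr)
        sys.exit(1)

    if result.status_code != 200:
        print(f'Unexpected status code: {result.status_code}', file=sys.stderr)
        sys.exit(1)

    data = result.json()
    if not data:
        print('No packages found in the registry', file=sys.stderr)
        sys.exit(1)

    # Print only the version so that the shell script can read it
    print(data[0]['version'])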

The script uses the GitLab API to get the list of packages as JSON, sorted in descending order (the most recently uploaded package comes first). All that remains is to write that version to pyproject.toml, bump it, and upload the package to the registry.
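The most recently uploaded package is not necessarily the one with the highest version, for example when a fix for an older release is uploaded later. As a hedged variation, the sketch below picks the highest version from the returned list instead of the first element. It assumes plain MAJOR.MINOR.PATCH versions and a list of objects with a version field, as in the script above; the sample data and function names are invented for illustration.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn '0.1.10' into (0, 1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_version(packages: list[dict], default: str = "0.0.0") -> str:
    """Return the highest version among the packages, or default if there are none."""
    versions = [package["version"] for package in packages]
    return max(versions, key=version_key, default=default)


if __name__ == "__main__":
    # Invented sample in upload order, newest first.
    sample = [
        {"name": "hello-world-package", "version": "0.1.9"},
        {"name": "hello-world-package", "version": "0.1.10"},
        {"name": "hello-world-package", "version": "0.1.2"},
    ]
    print(latest_version(sample))
    print(latest_version([]))
```

The default value also covers the very first upload, when the registry has no packages yet and taking the first element of an empty list would fail.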

Conclusion

Managing PyPI packages in GitLab is fairly straightforward. In just a couple of hours, you can replace the parts of the application that were copy-pasted or pulled in through git submodules with a more elegant solution that, moreover, updates the version with each push to the repository.

