A small guide to writing an object storage script in Python

MinIO, as an object storage system, deservedly enjoys the love of developers: the tool is pleasant and quite easy to use and master. For one of our large projects at work, the need for S3 storage recently arose; however, for corporate reasons we chose a different tool for production, namely IONOS (our company is German, and plenty of other things are tied to IONOS), while for testing and local script runs nothing better than MinIO came to mind. This combination created the need for a Python library that could serve both masters, in our case both MinIO and IONOS: change the parameters in the config, and the same code that worked locally starts working with production. That library turned out to be Boto3 (the standard minio package was not suitable for this purpose). It is precisely this constellation, Python, MinIO and Boto3, that I would like to tell you about; and if you want to use something other than MinIO, the same principle applies: change the parameters in the config, and the code that worked locally starts working with production.

In the beginning there was a docker compose file…

So, since we mainly need MinIO for local development and testing, we will start with a local launch. To do this, create a docker-compose.yml file in the project and put the following instructions in it:

services:
  minio:
    image: minio/minio
    entrypoint: sh
    # The mkdir creates the bucket we need (test-bucket) in MinIO right away;
    # keep comments out of the command itself, or sh treats the rest as a comment
    command: >
      -c 'mkdir -p /data/test-bucket
      && minio server /data'
    ports:
      - 9000:9000
      - 9001:9001
    environment:                       #  This part is needed to log in
      MINIO_ROOT_USER: 'USERNAME'      #  to the user console
      MINIO_ROOT_PASSWORD: 'PASSWORD'
      MINIO_ADDRESS: ':9000'
      MINIO_CONSOLE_ADDRESS: ':9001'

Next, run docker compose with the command docker compose up and wait until the image is pulled and the container is launched. Once that is done, go to http://localhost:9001/login, enter the login and password (USERNAME and PASSWORD) and check that our test bucket has been created. You should see something like this:
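
Before a script or test hits the storage, it can also be handy to wait in code until MinIO is actually up. A minimal sketch, assuming MinIO's standard liveness endpoint /minio/health/live (the function name wait_for_minio is mine, not from any library):

```python
import time
import urllib.error
import urllib.request


def wait_for_minio(base_url: str, timeout: float = 30.0) -> bool:
    """Poll MinIO's liveness endpoint until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(
                f"{base_url}/minio/health/live", timeout=2
            ) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, keep polling
        time.sleep(1)
    return False
```

Calling wait_for_minio("http://localhost:9000") returns True as soon as the container answers; the CI job at the end of the article does the same check with curl.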

S3 service on Boto3

So, MinIO is running; now we can move on to writing a script with the boto3 library that will let us interact with our bucket. Install the library with the command pip install boto3 (or whatever command your favorite dependency manager uses), then create a file s3_service.py and put the following code in it:

from io import BytesIO
from pathlib import Path
from typing import Union

import boto3
from botocore.client import Config


class S3BucketService:
    def __init__(
        self,
        bucket_name: str,
        endpoint: str,
        access_key: str,
        secret_key: str,
    ) -> None:
        self.bucket_name = bucket_name
        self.endpoint = endpoint
        self.access_key = access_key
        self.secret_key = secret_key

    def create_s3_client(self) -> boto3.client:
        client = boto3.client(
            "s3",
            endpoint_url=self.endpoint,
            aws_access_key_id=self.access_key,
            aws_secret_access_key=self.secret_key,
            config=Config(signature_version="s3v4"),
        )
        return client

    def upload_file_object(
        self,
        prefix: str,
        source_file_name: str,
        content: Union[str, bytes],
    ) -> None:
        client = self.create_s3_client()
        destination_path = str(Path(prefix, source_file_name))

        if isinstance(content, bytes):
            buffer = BytesIO(content)
        else:
            buffer = BytesIO(content.encode("utf-8"))
        client.upload_fileobj(buffer, self.bucket_name, destination_path)

    def list_objects(self, prefix: str) -> list[str]:
        client = self.create_s3_client()
        response = client.list_objects_v2(Bucket=self.bucket_name, Prefix=prefix)
        # the "Contents" key is absent when nothing matches the prefix
        return [item["Key"] for item in response.get("Contents", [])]

    def delete_file_object(self, prefix: str, source_file_name: str) -> None:
        client = self.create_s3_client()
        path_to_file = str(Path(prefix, source_file_name))
        client.delete_object(Bucket=self.bucket_name, Key=path_to_file)
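
Two small details of the class above deserve a note. First, upload_fileobj expects a binary file-like object, which is why a string is encoded to UTF-8 and wrapped in BytesIO. Second, the object key is built with pathlib, which is fine on Linux and macOS but would produce backslash keys on Windows; PurePosixPath avoids that. A standalone sketch of both (the helper names are mine, not part of the class):

```python
from io import BytesIO
from pathlib import PurePosixPath
from typing import Union


def to_buffer(content: Union[str, bytes]) -> BytesIO:
    # upload_fileobj wants a binary file-like object, so str is encoded first
    data = content if isinstance(content, bytes) else content.encode("utf-8")
    return BytesIO(data)


def object_key(prefix: str, name: str) -> str:
    # PurePosixPath guarantees forward slashes in the key even on Windows,
    # which is what S3 expects (the article's Path works fine on Linux/macOS)
    return str(PurePosixPath(prefix) / name)


assert to_buffer("test").read() == b"test"
assert object_key("test", "test.txt") == "test/test.txt"
```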
        

In the code above, we created the S3BucketService class, which configures a connection to our storage and lets us add, remove, and list objects. Now it is time to set the connection parameters for MinIO and check that everything works correctly. Let's create a configuration file (you can use any format you are used to; personally I will use an .ini file and configparser) and call it default.ini, with the following contents:

[s3_storage]
bucket_name = test-bucket
endpoint = http://localhost:9000
access_key = USERNAME
secret_key = PASSWORD
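
While experimenting, the file on disk is not even required: configparser can parse the same config from a string, which is convenient in quick throwaway scripts. A small sketch with the same values:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[s3_storage]
bucket_name = test-bucket
endpoint = http://localhost:9000
access_key = USERNAME
secret_key = PASSWORD
""")

print(config["s3_storage"]["bucket_name"])  # -> test-bucket
```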

Then we add the following function to the already created s3_service.py file (it needs import configparser at the top):

def s3_bucket_service_factory(config: configparser.ConfigParser) -> S3BucketService:
    return S3BucketService(
        config["s3_storage"]["bucket_name"],
        config["s3_storage"]["endpoint"],
        config["s3_storage"]["access_key"],
        config["s3_storage"]["secret_key"],
    )

Everything is ready; now we can call our S3 service factory, pass it the config and connect to the storage. Let's check that this is true: create a new Python file with an arbitrary name, test.py, and type the following code:

import configparser

from s3_service import s3_bucket_service_factory

config = configparser.ConfigParser()
config.read('default.ini')

s3 = s3_bucket_service_factory(config)
s3.upload_file_object("test", "test.txt", "test")

Let's run the resulting script, wait for it to finish, then go to the MinIO user console, click on the bucket and enjoy the test directory that has appeared, with the test.txt file and its content “test” (well, to see the content you'll have to download the file first, but that can also be done through the GUI provided by MinIO)!

For curiosity's sake, you can do the same with other methods defined in the S3BucketService class, such as getting a list of objects and deleting an object.

Tests (or something like that)

In the example described, there is no business logic that would require uploading, downloading or deleting objects, so writing tests generally seems redundant (we are not seriously going to test MinIO itself with a straight face), but for illustrative and educational purposes, why not. Let's write a small test that checks that our functions for uploading and deleting objects work. Rename the existing test.py to test_minio.py and write the following code in it:

import configparser

from s3_service import s3_bucket_service_factory


OBJECTS_TO_UPLOAD = [1, 2, 3]
config = configparser.ConfigParser()
config.read("default.ini")
S3 = s3_bucket_service_factory(config)


def test_object_is_created():
    for obj in OBJECTS_TO_UPLOAD:
        S3.upload_file_object("test", f"{obj}.txt", "")

    objects_in_bucket = S3.list_objects("test")
    for obj in OBJECTS_TO_UPLOAD:
        # membership checks stay correct even if other objects (for example
        # test/test.txt from the earlier run) are still in the bucket
        assert f"test/{obj}.txt" in objects_in_bucket


def test_object_is_deleted():
    for obj in OBJECTS_TO_UPLOAD:
        S3.delete_file_object("test", f"{obj}.txt")

    objects_in_bucket = S3.list_objects("test")
    for obj in OBJECTS_TO_UPLOAD:
        assert f"test/{obj}.txt" not in objects_in_bucket

Let's run pytest and voila, everything works.

GitLab CI for pytest and MinIO

To complete the picture, I will also provide a GitLab CI job that runs pytest in the pipeline; it might be useful to someone. Note that inside the pipeline MinIO is reachable at http://minio:9000 rather than localhost, so the endpoint in default.ini has to be adjusted accordingly (the MINIO_BASE_URL variable below is used for the health check):

test-pytest:
  image: 'python:3.9-slim-bullseye'
  stage: test
  needs: []
  variables:
    MINIO_BASE_URL: http://minio:9000
  services:
    - name: minio/minio
      alias: minio
      entrypoint: ['sh']
      command:
        - -c
        - >
          mkdir -p /data/test-bucket
          && minio server /data
      variables:
        MINIO_ROOT_USER: 'USERNAME'
        MINIO_ROOT_PASSWORD: 'PASSWORD'
  before_script:    # here we need to make sure MinIO is up
    - apt update    # and only then start our tests
    - apt install -y curl
    - | 
      until curl --output /dev/null --silent --head --fail $MINIO_BASE_URL/minio/health/live; do
        printf '.'
        sleep 1
      done
    - pip install -r requirements_dev.txt
  script:
    - |
      pytest -v .
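
One way to bridge the localhost-versus-minio endpoint difference is to let the job's MINIO_BASE_URL variable override the endpoint from the config, so neither the code nor default.ini has to change between local and CI runs. A sketch under that assumption (the load_s3_settings helper is my own, not from the article):

```python
import configparser
import os


def load_s3_settings(path: str = "default.ini") -> dict:
    """Read the [s3_storage] section, letting MINIO_BASE_URL override the endpoint."""
    config = configparser.ConfigParser()
    config.read(path)
    settings = dict(config["s3_storage"])
    if "MINIO_BASE_URL" in os.environ:
        # in CI this points at http://minio:9000, locally it is simply unset
        settings["endpoint"] = os.environ["MINIO_BASE_URL"]
    return settings
```

The resulting dict can then be unpacked into S3BucketService in place of the plain config lookups in s3_bucket_service_factory.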
