“A DIY Midjourney, now with S3”: how to store generated images with their prompts in object storage

But how do you save the results? After all, if you ever need to recreate the virtual machine running the neural network on which the files are stored, you will lose them completely. Under the cut, we explain how to connect it to object storage and keep prompts in object metadata.

In this article we will use a ready-made notebook template for working with the Diffusers library. If you are already familiar with creating a virtual server for it and with the key advantages of S3, feel free to skip ahead to the storage setup.

Use navigation if you don't want to read the entire text:

S3 Features
Briefly about DAVM
Creating DAVM
Setting up S3
DAVM in practice
Connecting to S3
Conclusion

S3 Features


Today, S3 is one of the most popular cloud storage services. The technology lets you work with large volumes of data of any type and scale quickly. Its advantages include automatic backups and the ability to attach metadata to files.

Metadata lets you conveniently sort objects by type, creation date, and so on. Objects are located and managed via unique URLs, and the access algorithms are simple: even when the storage holds a lot of data, you reach individual objects just as quickly. At the same time, S3 can store petabytes of data.
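To make URL-based access concrete, here is a minimal sketch using boto3 (any S3-compatible SDK works): it generates a temporary presigned link to a single object. The endpoint, keys, bucket, and object name are placeholders for illustration.

import boto3

# Illustrative client setup; credentials and endpoint are placeholders
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.storage.selcloud.ru",
    aws_access_key_id="your_access_key_id",
    aws_secret_access_key="your_secret_access_key",
)

# A temporary link to one object, valid for an hour
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "your_bucket_name", "Key": "images/cat.png"},
    ExpiresIn=3600,
)
print(url)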

Although the technology is gaining popularity among companies and individual users, there are nuances. Working with the storage requires specialized, correctly configured software, and access speed is limited compared to other storage types.

Choose based on the tasks you plan to solve with the service. For example, object storage is not suitable for hosting a database. Common S3 scenarios include storing personal data, backups, and Big Data.

For more details on S3, see our review.

Briefly about DAVM



Data Analytics Virtual Machine (DAVM) is a virtual server with an OS image and a set of tools for data analysis and machine learning. You can customize DAVM for your tasks and quickly recreate the platform if something goes wrong. For the latter scenario, it is especially convenient to store data outside the virtual server, and S3 object storage is a good fit.

The DAVM image can be deployed from the control panel in minutes. It runs Ubuntu 22.04, supports as many GPUs as you need, and comes with a ready-to-use toolchain:

  • JupyterLab — a development environment for working with Jupyter notebooks, data, and code.
  • Prefect — software for managing data collection, monitoring, and processing tasks.
  • Apache Superset — a web application for visualizing and exploring data and building dashboards and reports.
  • PostgreSQL — a relational DBMS for data storage.
  • Various machine learning libraries, for example TensorFlow and PyTorch.

The DAVM virtual machine architecture.

Docker is installed inside, with Prefect, Superset, Keycloak, and JupyterHub running in containers. Note the standalone PostgreSQL instance that Superset works with: data can be loaded into PostgreSQL via Prefect and then visualized in Superset.
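As an illustration of that data path, here is a minimal sketch of a Prefect flow that loads a small table into PostgreSQL for Superset to chart. It assumes a Prefect 2-style API and a local PostgreSQL instance; the connection string, table, and data are hypothetical.

import pandas as pd
from prefect import flow, task
from sqlalchemy import create_engine

@task
def extract() -> pd.DataFrame:
    # Stand-in for a real data source
    return pd.DataFrame({"prompt": ["a cat in space"], "rating": [5]})

@task
def load(df: pd.DataFrame) -> None:
    # Hypothetical local PostgreSQL credentials
    engine = create_engine("postgresql://user:password@localhost:5432/davm")
    df.to_sql("generations", engine, if_exists="append", index=False)

@flow
def etl():
    load(extract())

if __name__ == "__main__":
    etl()  # Superset can then visualize the "generations" table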

Storage can be organized inside DAVM itself, but then you risk losing data if the platform fails or is recreated. It is better to use S3 or a cloud database such as PostgreSQL: you can then recreate the platform whenever something goes wrong, and your data remains safe.

A complete list of pre-installed libraries and frameworks, their versions and other technical details about DAVM can be found in the product documentation. All components of the platform can be customized depending on the tasks.

Creating DAVM

Let's take the configuration with the Tesla T4 video card as a basis.

1. Go to the Cloud platform section in the control panel.

2. Select the ru-7a or ru-9a pool and create a cloud server with the Ubuntu 22.04 LTS Machine Learning 64-bit distribution and a suitable configuration. Use a virtual machine with an NVIDIA Tesla® T4 16 GB GPU.

3. It is important that the server is accessible from the Internet; otherwise you will not be able to connect from your computer. To do this, select a new public IP address.

4. Press Create. The system will boot within a couple of minutes. To set up the environment, connect to the server via SSH: it will then show the credentials for logging in to the DAVM environment. The connection command can be found on the Configuration tab.

Created server. Configuration tab.

5. Copy the connection link and the login and password for the first sign-in from the terminal.

Screenshot from the terminal. Link to connect and login details.

6. Follow the link and log in to DAVM. You can now launch JupyterLab, Keycloak, Prefect, or Superset from your browser. For the purposes of this article, we will only use JupyterLab.

DAVM start page.

Setting up S3


1. In the control panel, go to Object Storage and click Create container.

2. Choose the St. Petersburg region and the ru-1 pool.

3. Set the type to public. Such a container is available without authorization; if you need to restrict direct access to files, choose private. Set the class to standard storage, the optimal choice for frequently used data. Cold storage suits backups, archives, and other data that is rarely accessed.

4. Turn off addressing. Press Create container.

Create a container in the control panel.

Create a service user

To interact with S3 via the API, you must create a service user.

1. Go to the Access control tab and select Service users.

2. Click Add user. You can leave the default username and generate a password.

3. In the Role field, choose Object Storage Administrator.

4. After selecting the required project, click Add user.

Adding a user in the Access Management tab.

Getting the keys

Several keys can be issued for a single project, but each new project requires its own key.

1. On the User management tab, open the created user.

2. In the S3 keys field, click Add key.

3. In the Adding an S3 key window, select our project; the name can be left as default. Click Generate.

Two values are generated: the Access key (key identifier) and the Secret key. Copy and save them: once the window is closed, the secret key cannot be viewed again.
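Since the secret key cannot be recovered later, it is also worth keeping both keys out of notebook code. A minimal sketch, assuming you exported them as environment variables (the variable names here are our own choice):

import os
import boto3

# S3_ACCESS_KEY and S3_SECRET_KEY are assumed to be set in the environment
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.ru-1.storage.selcloud.ru",
    aws_access_key_id=os.environ["S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["S3_SECRET_KEY"],
)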

Now that we have created and configured S3 and DAVM, we can move on to practice and organize the connection between them.

DAVM in practice


To test the platform, let's take a ready-made notebook template for working with the Diffusers library. We have already described it in more detail in our review.

Diffusers is a library from Hugging Face that lets you work with hundreds of pretrained Stable Diffusion models to generate images, audio, and even 3D molecular structures. You can use it to experiment with existing models or to train your own.

We deploy a neural network to generate images. The process takes no more than ten minutes, since the drivers and necessary software are already installed and configured in DAVM.

By default, inference runs on GPU cores; on a CPU, image generation takes significantly longer. For convenient “communication” with the neural network, we created a Telegram bot: the user submits a prompt and quickly receives a generated image. The model itself can be customized through JupyterLab.
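To give a sense of what the notebook does under the hood, here is a minimal Diffusers sketch, assuming a CUDA GPU and the public runwayml/stable-diffusion-v1-5 checkpoint; the actual template may use a different model and parameters.

import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained Stable Diffusion pipeline in half precision for the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # CPU inference also works, but is far slower

prompt = "a cat in an astronaut suit, digital art"
image = pipe(prompt).images[0]
image.save("cat.png")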

Generating images in a chatbot. One image takes about 5-10 seconds.

Connecting to S3


Using a connector

To connect to object storage, you can use the S3 connector in DAVM. Enter the key identifier in the Access Key ID field and the secret key in Secret Access Key. In the Endpoint URL field, enter https://s3.storage.selcloud.ru. Click Connect. Done: you now store your data outside the virtual machine, in separate, scalable storage.

S3 connector in DAVM.

However, this method may not be flexible enough for some tasks. Suppose you want to set up the S3 connection inside a project: then you can add metadata to files, write your own read and write scripts, and adapt them to your purposes.

Another of the many tools for working with S3 is the s3fs utility, which lets you use object storage as a file system on Linux. More details are in the instructions.
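Note that alongside the FUSE utility there is also a Python package with the same name, s3fs, which exposes S3 through a filesystem-like API. A minimal sketch with placeholder keys, endpoint, and bucket:

import s3fs

# Filesystem-style access to an S3-compatible endpoint
fs = s3fs.S3FileSystem(
    key="your_access_key_id",
    secret="your_secret_access_key",
    client_kwargs={"endpoint_url": "https://s3.storage.selcloud.ru"},
)

print(fs.ls("your_bucket_name"))  # list objects like directory entries
with fs.open("your_bucket_name/test.txt", "w") as f:
    f.write("hello from s3fs")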

Using Jupyter Notebook

Let's look at connecting to an S3 bucket in Jupyter Notebook. A bucket is an entity for storing objects in S3.

1. As before, open the link in the browser (.pl.davm.selcloud.ru) and select the JupyterLab icon.

2. Import the boto3 SDK, designed for working with AWS-compatible storage. You can find out more about the library and its use in the documentation.

import boto3

Create a boto3 session, specifying the Selectel S3 endpoint (here, for the ru-1 pool). 'your_access_key_id' and 'your_secret_access_key' are the Selectel S3 access credentials we obtained earlier.

session = boto3.session.Session()
s3 = session.client(
    service_name="s3",
    endpoint_url="https://s3.ru-1.storage.selcloud.ru",  # pool-specific endpoint
    aws_access_key_id="your_access_key_id",
    aws_secret_access_key="your_secret_access_key",
)
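Before wiring the client into the bot, you can run a quick smoke test: upload a small object, list your buckets, and delete it again. The bucket name is a placeholder.

# Quick smoke test of the connection
s3.put_object(Bucket="your_bucket_name", Key="test.txt", Body=b"hello")
print(s3.list_buckets()["Buckets"])  # should include your bucket
s3.delete_object(Bucket="your_bucket_name", Key="test.txt")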

Example of reading a file from a bucket:

bucket_name="your_bucket_name"
response = s3.list_objects_v2(Bucket=bucket_name)

for obj in response['Contents']:
    key = obj['Key']
    obj = s3.get_object(Bucket=bucket_name, Key=key)
    body = obj['Body'].read()
    print(f'Key: {key}, Body: {body}')
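One caveat: list_objects_v2 returns at most 1,000 keys per call, so for larger buckets a paginator is safer. A short sketch:

# A paginator walks all result pages instead of stopping at 1,000 keys
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket_name):
    for obj in page.get('Contents', []):
        print(obj['Key'], obj['Size'])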

Accordingly, 'your_bucket_name' is the name of our S3 bucket in Selectel.

Example of writing a file to a bucket with the upload_file method (a fragment of the bot's handler):

user_image_path = f"./images/{user_id}.png"
if os.path.exists(user_image_path):
    os.remove(user_image_path)
try:
    image.save(user_image_path)
    # ExtraArgs attaches the user's prompt as object metadata
    s3.upload_file(user_image_path, 'your_bucket_name', 'your_file_name',
                   ExtraArgs={'Metadata': {'UserPrompt': user_prompt}})
    return user_image_path
except Exception as e:
    print(f'Upload failed: {e}')

Here we set up metadata writing via ExtraArgs. In the case of a neural network bot, it is convenient to store user prompts there. You can print an object's metadata to the console as follows:

metadata = s3.head_object(Bucket="your_bucket_name", Key="your_file_name")
print(metadata)  # the response includes a 'Metadata' dict with user-defined keys

Storing the user's request in the file's metadata.

Each file stores the prompt from which the image was generated. There are many scenarios for using metadata. In image generation, it is useful for running tests: we can track the least successful or controversial images and the prompts behind them, exclude suspicious generations at the model level, and improve its output.
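Putting the pieces together, a hedged sketch of such an audit might walk the bucket and print the prompt stored with each image. Note that S3 normally returns user-defined metadata keys in lowercase.

# Print every stored prompt for review; bucket name is a placeholder
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='your_bucket_name'):
    for obj in page.get('Contents', []):
        head = s3.head_object(Bucket='your_bucket_name', Key=obj['Key'])
        prompt = head['Metadata'].get('userprompt', '<no prompt stored>')
        print(f"{obj['Key']}: {prompt}")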

Conclusion


S3 is a convenient and functional tool for cloud data storage. It offers high reliability, scalability and flexibility, and is ideal for storing large volumes of data.

However, object storage is not a one-size-fits-all solution. For example, it is not suitable for hosting databases. Before choosing S3 for your project, carefully evaluate its capabilities and limitations. For a deeper look, we recommend our review and the documentation on working with Selectel S3 storage.

Check out the project repository on GitHub: open issues, fork it, and use it as a reference if you want to start a similar project. Share your experience with S3 and suggestions for improvement in the comments!
