Bulk download, processing and upload of images in Bitrix with Python 3

And in more detail?

A month ago, our team received a task that went roughly as follows: a new online store running on Bitrix is being launched. All the products are already entered in its database, but the photos for each product are incomplete: each product has only a detailed photo, while the preview photos and additional images are missing.

For each product, we had to:

  • Download each detailed photo

  • Resize it to the required resolution and process it depending on the status of the product (photo resolution for a luxury product is higher than for a product from the “mass market” category)

  • Upload this photo as an additional picture

  • Resize the same photo to the resolution required for the three types of product preview images and upload it to the preview image fields

  • Download additional photos from the old site for each product and adjust them to the resolution of the main photo (for the purposes of this post, we will assume that there is no access to the database of the old site)

  • Upload additional photos to the additional pictures section

Since we mostly speak Python, are not familiar with Bitrix, and find writing and using PHP modules a dark forest, we started looking for a simple way to do this, preferably in Python. There was almost no information, so this article is written for fellow fans of throwing hard-core Python at a problem instead of solving everything with built-in tools. We assume that you know Python and SQL and are familiar with the requests, bs4 and sqlite3 libraries; if not, the information should still be useful.

Step 1. Getting image links from Bitrix in bulk

We spent the most time on this stage because we did not notice one small button that makes exporting image links much easier and spares you from writing parsers that emulate a session and so on.

The first thing to know: to download an image with Python, you need a direct link to it. You can get it by parsing the product card in Bitrix, or (as it turned out) you can simply export a table with each product ID and a link to its detailed image.

To do this, go to Content > Catalog > Product Catalog > Elements

Bitrix product catalog

See the gear highlighted in the photo? Click on it and configure the fields that will be displayed in the table depending on your task.

Table setup

For our task we only needed the detailed picture; the fields you need may differ.

We press the “Apply” button and get this neat table of our products:

Now we need to download this table as a regular Excel table. To do this, click on this button:

Congratulations, you have downloaded a complete Excel spreadsheet with all products, their IDs, SKUs and links to the detailed photo.

If you are already reaching for your favorite Python module for Excel spreadsheets, I have to stop you. Open the downloaded file in a text editor:

It turns out to be a regular HTML document, which is good news for us. It makes sense to convert the data into a more or less convenient format, that is, load it into a local database. Before loading anything into a table, you need to create it:

(Note that this step is optional: if you prefer to store the data in JSON, a text file or the same table, go ahead.)

CREATE TABLE "elements" (
	"id"	INTEGER, 
	"article"	INTEGER,
	"detail"	TEXT,
	"additional"	TEXT,
	"prev1"	TEXT,
	"prev2"	TEXT,
	"prev3"	TEXT,
	"additional_handled"	TEXT
);

What are these fields:

  1. id – product identifier in Bitrix (logical)

  2. article – its article number (SKU)

  3. detail – link to the detailed photo we downloaded

  4. additional – semicolon-separated links to additional photos that will be obtained when parsing the old site

  5. prev1 – file name of the processed image for the first preview image

  6. prev2 – filename of the processed image for the second preview image

  7. prev3 – file name of the processed image for the third preview image

  8. additional_handled – semicolon-separated names of files with processed additional images
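If you would rather create the table from Python than from a database GUI, a minimal sketch with the standard sqlite3 module could look like this. The function name create_elements_table and the IF NOT EXISTS clause are assumptions made for illustration, and the demo uses an in-memory database, whereas the scripts in this article use base.db.

```python
import sqlite3

# Same columns as the CREATE TABLE statement above; IF NOT EXISTS is an
# addition that makes the script safe to run more than once.
SCHEMA = """
CREATE TABLE IF NOT EXISTS elements (
    id INTEGER,
    article INTEGER,
    detail TEXT,
    additional TEXT,
    prev1 TEXT,
    prev2 TEXT,
    prev3 TEXT,
    additional_handled TEXT
)
"""


def create_elements_table(db_path: str) -> sqlite3.Connection:
    """Open (or create) the database and make sure the elements table exists."""
    connection = sqlite3.connect(db_path)
    connection.execute(SCHEMA)
    connection.commit()
    return connection


# ":memory:" keeps the demo off the disk; pass 'base.db' for real use
connection = create_elements_table(":memory:")
columns = [row[1] for row in connection.execute("PRAGMA table_info(elements)")]
print(columns)
connection.close()
```

The PRAGMA table_info query is only there to confirm that the columns were created in the expected order.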

Now you can transfer the downloaded “Excel” file into the database:

import sqlite3
from bs4 import BeautifulSoup  # the "lxml" parser requires the lxml package to be installed

# our table is stored in the file detail_images.xls
# it is important to specify the encoding when opening the file,
# otherwise reading will break
with open('detail_images.xls', 'r', encoding='utf-8') as file:
    # load everything we read into the soup, specifying the parser via features
    soup = BeautifulSoup(file.read(), features="lxml")

table = soup.find('table')  # find the table in the file by its tag
rows = table.find_all('tr')  # rows holds all rows of the table

# open a connection to the database and get a cursor
database_connection = sqlite3.connect('base.db')
cursor = database_connection.cursor()

for row in rows[1:]:  # the slice skips row zero, which holds the headers
    cols = row.find_all('td')  # cells of this particular row
    detail = cols[0].text
    element_id = int(cols[1].text)
    try:
        # article numbers are sometimes non-numeric strings or empty
        article = int(cols[2].text)
    except ValueError:
        # such products are invalid and we do not process them, so mark them with -1
        article = -1
    # insert the values into the elements table; placeholders avoid quoting problems
    cursor.execute(
        "INSERT INTO elements VALUES (?, ?, ?, '', '', '', '', '')",
        (element_id, article, detail),
    )

database_connection.commit()
database_connection.close()

Now if we open the database, we will see such a gratifying picture:

Step 2. Parsing images from the old Bitrix site

Describing this step is probably the least useful part, since scrapers are written for each specific site. Still, some useful ideas can be taken from it.
We will search for products on the old site by their article number. I suspect that search on most Bitrix sites works in a similar way, so this strategy should carry over to them.

First of all, we import the requests library and add an exception class that is raised when a search by article number finds no product, or more than one (yes, this happens):

import requests

class ArticleException(Exception):
    pass

Let’s store the necessary information in variables:

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:45.0) Gecko/20100101 Firefox/45.0'}
oldSiteSearchUrlTemplate : str = "https://bma.kz/catalog/?search={}"
# this string is the browser address bar value when you run a search
# on the site, with the search parameters replaced by {}
old_site_domain = "https://bma.kz"
# just the address of the site

headers holds the GET request headers that make our automated requests look like they come from a regular browser.

Next, we collect a list of the article numbers of our products:

database_connection = sqlite3.connect('base.db')
cursor = database_connection.cursor()
cursor.execute("SELECT * FROM elements")  # fetch all elements from the database
# keep only the field with index 1, that is, the article number
articles = [x[1] for x in cursor.fetchall()]

Iterating over the article numbers:

for article in articles:
    # request the search results for this article number by substituting it
    # into the search URL template
    response = requests.get(oldSiteSearchUrlTemplate.format(article),
                            headers=headers)
    response_soup = BeautifulSoup(response.text, features="lxml")
    try:
        # find all elements of class cata-list-item: these are the product
        # cards returned for a search query
        elementDivs = response_soup.find_all("div", {'class': "cata-list-item"})
        print(len(elementDivs), end='\t')
        if len(elementDivs) == 1:
            # exactly one card means one article number maps to one product
            elementDiv = elementDivs[0]
            # extract the link to the product card
            url = elementDiv.find('a')['href']
            # request the product card page
            response = requests.get(old_site_domain + url, headers=headers)
            response_soup = BeautifulSoup(response.text, features="lxml")
            # cata-d-dopphoto is the container for additional photos
            dop_photo = response_soup.find('div', {'class': 'cata-d-dopphoto'})
            # collect every img element inside that container
            dop_photo_containers = dop_photo.find_all('img')
            # the src attribute of each img holds the link to the picture;
            # join the links into one semicolon-separated string
            additional_photo_urls = ";".join(
                [old_site_domain + i['src'] for i in dop_photo_containers]
            )
            # save the string to the database for later processing
            cursor.execute("UPDATE elements SET additional=? WHERE article=?",
                           (additional_photo_urls, article))
            database_connection.commit()
        else:
            # zero or several products found for one article number:
            # do not hesitate to raise...
            raise ArticleException
    except ArticleException:  # ...and then handle the error by skipping the product
        continue
    except Exception:
        # any other failure (network error, missing container) also skips the product
        continue
database_connection.close()

Again, note that the specific class names will differ from site to site. The point of the code above is the strategy for collecting this data.
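One detail that also varies between sites is whether the src and href attributes hold relative or absolute addresses. The code above simply prepends the site address, which only works for relative ones. A minimal sketch of a more tolerant variant uses urljoin from the standard library; the helper name absolute_urls and the example paths are hypothetical.

```python
from typing import Iterable
from urllib.parse import urljoin


def absolute_urls(base_url: str, sources: Iterable[str]) -> str:
    """Resolve each src against the site address and join with semicolons."""
    return ";".join(urljoin(base_url, src) for src in sources)


# Hypothetical src values: one relative, one already absolute
example = absolute_urls(
    "https://bma.kz",
    ["/upload/a.jpg", "https://cdn.example.com/b.jpg"],
)
print(example)
```

urljoin leaves an already absolute address untouched and resolves a relative one against the base, so the same helper covers both cases.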

Step 3. Image processing

I will not describe the image processing in detail, otherwise this would turn into either a short PIL tutorial or a useless description of code written for one task. I will only point out the main step that may be useful to you:

PIL opens images from a file path or a file-like object, not from a URL, so you need a function that downloads the images first.

def load_image(url: str, temporary_path: str, article: int) -> str:
    # temporary_path is the path of the file the downloaded picture is written to
    try:
        p = requests.get(url)
        with open(temporary_path, "wb") as out:
            out.write(p.content)
        return temporary_path
    except Exception:
        # log the article number of the product whose image failed to download
        with open("Exceptions.txt", 'a', encoding='utf-8') as file:
            file.write("{}\n".format(article))
        return "ERROR"

Iterate over the database -> get image URLs from it -> download images -> process them for your tasks -> save them with a name that you can uniquely match with the product and the place in the card where this image needs to be loaded -> profit.
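The last requirement, a name that identifies both the product and the place in the card, is worth a small illustration. Below is a minimal sketch of one possible naming scheme; the scheme itself, the slot names and the function names make_name and parse_name are assumptions for illustration, not the scheme used in this project.

```python
import re

PREVIEW_SLOTS = ("prev1", "prev2", "prev3")

# "<element id>_<preview slot>.jpg" or "<element id>_add_<index>.jpg"
_NAME_RE = re.compile(r"^(\d+)_(?:(prev[123])|add_(\d+))\.jpg$")


def make_name(element_id: int, slot: str, index: int = 0) -> str:
    """Build a file name that encodes the product ID and the target slot."""
    if slot in PREVIEW_SLOTS:
        return "{}_{}.jpg".format(element_id, slot)
    if slot == "add":
        return "{}_add_{}.jpg".format(element_id, index)
    raise ValueError("unknown slot: {}".format(slot))


def parse_name(filename: str):
    """Return (element_id, slot, index), or None for a foreign file name."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    element_id, preview_slot, add_index = match.groups()
    if preview_slot is not None:
        return int(element_id), preview_slot, 0
    return int(element_id), "add", int(add_index)


print(make_name(15, "prev2"))
print(parse_name("15_add_3.jpg"))
```

Because the original file name is preserved in the Bitrix file table, a name like this lets you recover the product ID and the slot later without keeping a separate lookup file.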

Step 4. Uploading processed images to the server.

Life would be all strawberries and chocolate if pictures uploaded through the Bitrix tools were stored on the server under the names originally given to the files. But it is not that simple.

To get started, the photos need to be uploaded to the server in some form. To do this, go to Content > Site Structure > Media Library > Images

And with the help of the multiple upload element, we upload all our photos.

Let’s try to open one of the uploaded photos.

The rectangle highlights the name the file had in our computer’s file system, and the arrow points to the file name on the server. The next step is to match the server addresses of the files with the products whose cards need these files.

Step 5. Matching file names

The ability to query the Bitrix database directly through the built-in tools helps with this difficult task. Go to
Settings > Tools > SQL query and enter the following in the query field:

SELECT * FROM b_file

What will this query return? A table with complete data about the files in the Bitrix file system. It looks like this:

In the “On page” setting, select “All”, wait a couple of minutes until the page with the full table loads, and save this page as an HTML file.

Here is the code that extracts the data from the table:

from bs4 import BeautifulSoup

# SQL_response.html is the page we saved earlier
with open('SQL_response.html', 'r', encoding='utf-8') as file:
    soup = BeautifulSoup(file.read(), features="lxml")
print("soup loaded")

# rows is the list of table rows; adm-list-table-row is the class of a row
rows = soup.find_all('tr', {'class': 'adm-list-table-row'})
total = len(rows)
for i, row in enumerate(rows):
    # progress indicator
    print("{}% \t {}/{}".format(int(i / total * 100), i, total), end='\r')
    values = row.find_all('td')
    subdir = values[7].text
    filename = values[8].text
    originalname = values[9].text

As a result:

  • originalname – the name the file had on your local file system

  • filename – file name on the server

  • subdir – the name of the subdirectory in which this file is located on the server

The general file address looks like this:
/upload/$subdir/$filename

Congratulations, you now have the addresses of all uploaded files.
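The loop above only reads the three values, so here is a minimal sketch of what can be done with them: collect a dictionary that maps each original file name to its address on the server. The function name build_server_paths and the original name 15_prev1.jpg are hypothetical; the subdirectory and server file name are taken from the PHP example in Step 6, and the sketch uses the /upload prefix that appears in those PHP paths.

```python
def build_server_paths(records):
    """Map each original file name to its address on the server.

    records is an iterable of (subdir, filename, originalname) tuples,
    for example collected inside the loop over the table rows.
    """
    paths = {}
    for subdir, filename, originalname in records:
        # strip() guards against whitespace around the cell text
        address = "/upload/{}/{}".format(subdir.strip(), filename.strip())
        # if two uploads share an original name, the later row wins
        paths[originalname.strip()] = address
    return paths


records = [
    ("medialibrary/84e", "jwhfc3nj1z5nqtkld4myyno74s97z45h.jpg", "15_prev1.jpg"),
]
server_paths = build_server_paths(records)
print(server_paths["15_prev1.jpg"])
```

With such a dictionary, looking up the server address for a processed image is a single key access by its local file name.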

Step 6 and last. Uploading photos to cards.

Product cards in Bitrix work as follows: each picture in a card is stored as a property of that card. If this explanation is not entirely clear, we will figure it out along the way.

Let’s look at the PHP code that updates the images in a particular card:

if (CModule::IncludeModule("iblock")) {
$el = new CIBlockElement;
$PRODUCT_ID = 15;
$tmpFilePath=$_SERVER['DOCUMENT_ROOT']."/upload/medialibrary/84e/jwhfc3nj1z5nqtkld4myyno74s97z45h.jpg";
$arFile=array("VALUE" => \CFile::MakeFileArray($tmpFilePath),"DESCRIPTION"=>"");
\CIBlockElement::SetPropertyValueCode($PRODUCT_ID, 'PICTURE_455_430', $arFile);
$tmpFilePath=$_SERVER['DOCUMENT_ROOT']."/upload/medialibrary/ef5/osdcpwdynvfvn7icp2bmxxp2ukajvuf8.jpg";
$arFile=array("VALUE" => \CFile::MakeFileArray($tmpFilePath),"DESCRIPTION"=>"");
\CIBlockElement::SetPropertyValueCode($PRODUCT_ID, 'PICTURE_682_430', $arFile);
$tmpFilePath=$_SERVER['DOCUMENT_ROOT']."/upload/medialibrary/fe5/lekdifxhamcxt2vedzjzb6c3oasidjo2.jpg";
$arFile=array("VALUE" => \CFile::MakeFileArray($tmpFilePath),"DESCRIPTION"=>"");
\CIBlockElement::SetPropertyValueCode($PRODUCT_ID, 'PICTURE_682_962', $arFile);
$tmpFilePath=$_SERVER['DOCUMENT_ROOT']."/upload/medialibrary/42c/0j5lhnu3px1ppmtf12x04l1tc888rpxw.jpg";
$arFiles[]=array("VALUE" => \CFile::MakeFileArray($tmpFilePath),"DESCRIPTION"=>"");
$tmpFilePath=$_SERVER['DOCUMENT_ROOT']."/upload/iblock/f50/xi47bkmj83qpm0jg6z20ms8399uoazc1.jpg";
$arFiles[]=array("VALUE" => \CFile::MakeFileArray($tmpFilePath),"DESCRIPTION"=>"");
$tmpFilePath=$_SERVER['DOCUMENT_ROOT']."/upload/iblock/a38/knzilyxmibfjlozrr40dhnjxieir8z4i.jpg";
$arFiles[]=array("VALUE" => \CFile::MakeFileArray($tmpFilePath),"DESCRIPTION"=>"");
\CIBlockElement::SetPropertyValueCode($PRODUCT_ID, 'UF_ADDPHOTO', $arFiles);
$arFiles = array(); 
}

Not much is clear. You do not need to understand all of it; we did not fully understand it ourselves. Now let’s go through it in order:

if (CModule::IncludeModule("iblock")) {
$el = new CIBlockElement;
$PRODUCT_ID = 15;

The $PRODUCT_ID variable contains the product ID in Bitrix. I think it’s clear

$tmpFilePath=$_SERVER['DOCUMENT_ROOT']."/upload/medialibrary/84e/jwhfc3nj1z5nqtkld4myyno74s97z45h.jpg";
$arFile=array("VALUE" => \CFile::MakeFileArray($tmpFilePath),"DESCRIPTION"=>"");
\CIBlockElement::SetPropertyValueCode($PRODUCT_ID, 'PICTURE_455_430', $arFile);

  • The $tmpFilePath variable contains the file address for the first preview image.

  • CFile::MakeFileArray turns that address into a file description, which is stored in the $arFile array.

  • The $arFile array is assigned to the 'PICTURE_455_430' property of the element with ID = $PRODUCT_ID.

  • The two similar blocks below load the files of the second and third preview images into the properties PICTURE_682_430 and PICTURE_682_962, respectively.

Then for each additional photo (and there can be a variable number of them), the following code is written:

$tmpFilePath=$_SERVER['DOCUMENT_ROOT']."/upload/medialibrary/42c/0j5lhnu3px1ppmtf12x04l1tc888rpxw.jpg";
$arFiles[]=array("VALUE" => \CFile::MakeFileArray($tmpFilePath),"DESCRIPTION"=>"");

Then, once per product, write:

\CIBlockElement::SetPropertyValueCode($PRODUCT_ID, 'UF_ADDPHOTO', $arFiles);

This line sets the UF_ADDPHOTO property to $arFiles.

$arFiles = array(); 
}

After that we clear the $arFiles array and close the code block.

Following these rules and substituting the right IDs and file addresses, use Python to generate this code for every product in the store. You can do this in batches; in our experience, Bitrix handles code for 2000 elements at a time without trouble.
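A minimal sketch of such a generator is shown below. The function names php_for_product and generate_php and the shape of the input dictionaries are assumptions for illustration; the PHP lines follow the example above, except that the unused $el line is omitted and $arFiles is reset before it is filled rather than after.

```python
# PHP line templates copied from the example above
FILE_LINE = '$tmpFilePath=$_SERVER[\'DOCUMENT_ROOT\']."{path}";'
ARRAY_VALUE = 'array("VALUE" => \\CFile::MakeFileArray($tmpFilePath),"DESCRIPTION"=>"");'
SET_LINE = "\\CIBlockElement::SetPropertyValueCode($PRODUCT_ID, '{code}', {var});"


def php_for_product(product_id, previews, additional):
    """Return the PHP lines for one product.

    previews maps a property code to a server path; additional is a list
    of server paths. Paths are assumed to contain no quotes or $ signs.
    """
    lines = ["$PRODUCT_ID = {};".format(int(product_id))]
    for code, path in previews.items():
        lines.append(FILE_LINE.format(path=path))
        lines.append("$arFile=" + ARRAY_VALUE)
        lines.append(SET_LINE.format(code=code, var="$arFile"))
    if additional:
        lines.append("$arFiles = array();")
        for path in additional:
            lines.append(FILE_LINE.format(path=path))
            lines.append("$arFiles[]=" + ARRAY_VALUE)
        lines.append(SET_LINE.format(code="UF_ADDPHOTO", var="$arFiles"))
    return lines


def generate_php(products):
    """Wrap the code for all products in a single iblock module check."""
    lines = ['if (CModule::IncludeModule("iblock")) {']
    for product in products:
        lines.extend(php_for_product(
            product["id"], product["previews"], product["additional"]))
    lines.append("}")
    return "\n".join(lines)


php_code = generate_php([{
    "id": 15,
    "previews": {
        "PICTURE_455_430":
            "/upload/medialibrary/84e/jwhfc3nj1z5nqtkld4myyno74s97z45h.jpg",
    },
    "additional": ["/upload/iblock/f50/xi47bkmj83qpm0jg6z20ms8399uoazc1.jpg"],
}])
print(php_code)
```

To work in batches, call generate_php on slices of the product list, for example 2000 products at a time, and run each result separately.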

This PHP code will run to approximately 40,000 lines. To execute it, go to:

Settings > Tools > PHP command line

And paste this huge pile of ugly code there. Most importantly, do not forget to turn off code highlighting: it greatly affects performance, and with highlighting enabled, code for 2000 elements will most likely (for sure) crash your site 🙂

A brief response to possible criticism of this approach:

While working on the project, we discovered a more elegant way to update product properties by uploading a CSV table through the built-in Bitrix tools. However, this method is already quite well described in other sources, and it also takes a little more time than the one described above 🙂

Finally

We are open to criticism in the comments, and even hope that the experts will show us an easier way to complete the above tasks, because additional work on this project involves approximately similar actions, so we kindly ask Bitrix experts to join the discussion 🙂

Yours, room304
