Automating the download of contract documents from the EIS platform with Python

Information on government contracts is an important resource for many research and analytical projects, but collecting it can be time-consuming, especially when a large number of contracts is involved. In this article, we will look at how to use Python and a few of its libraries to automate downloading documents from the public procurement portal, and how to log exceptions with the Loguru library.

Preparing the environment

Before we begin, make sure you have Python and the required libraries installed: requests, beautifulsoup4, tqdm, and loguru. You can install them using pip:

pip install requests beautifulsoup4 tqdm loguru

Code

Let's walk through the example code. It downloads the documents for a list of contract numbers from the public procurement portal and logs exceptions with the Loguru library.

import os
import time
from random import randint

import requests
from bs4 import BeautifulSoup
from loguru import logger
from tqdm import tqdm

# Write errors to a log file as well (loguru logs to stderr by default)
logger.add('download_errors.log')


# List of contract registry numbers to process
contract_number = ['0830500000224001380', '0830500000224001379']  # ... add the rest

# Headers that imitate a regular browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# Temporary list of links to the contract document pages
temporary_list_links = []

# Join the static URL with each contract number
for number in contract_number:
    url = 'https://zakupki.gov.ru/epz/order/notice/ok20/view/documents.html?regNumber=' + number
    temporary_list_links.append(url)


# Check whether the folder exists and create it if it does not
def create_folder_if_not_exists(folder):
    if not os.path.exists(folder):
        os.makedirs(folder)


try:
    for number, page_url in zip(contract_number, temporary_list_links):
        response = requests.get(page_url, headers=headers)

        if response.status_code == 200:
            time.sleep(randint(4, 12))
            soup = BeautifulSoup(response.text, 'html.parser')
            time.sleep(randint(6, 18))
            links = soup.find_all('a', href=True)
            urls = []
            titles = []

            # Keep only the anchors that point to document files
            for link in links:
                href = link['href']
                if 'file.html' in href:
                    titles.append(link['title'].strip())
                    urls.append(href.strip())

            # One subfolder per contract under the local base path
            folder = os.path.join(r'specify the local path for the contract folders', number)

            # Check that the folder exists and create it if it does not
            create_folder_if_not_exists(folder)

            with tqdm(total=len(urls), desc='Downloading files') as pbar:
                for file_url, filename in zip(urls, titles):
                    time.sleep(randint(16, 40))
                    r = requests.get(file_url, headers=headers)
                    with open(os.path.join(folder, filename), 'wb') as f:
                        f.write(r.content)
                    pbar.update(1)

except Exception as e:
    logger.error('Exception: ' + str(e))

How it works

This code starts by importing the necessary libraries and defining initial variables, including a list of contract numbers and headers to simulate a browser. A temporary list of links to government procurement pages is then created for each contract.
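
Hardcoding the registry numbers works for a quick test, but in practice it is easier to keep them in a separate file. A minimal sketch, assuming a hypothetical contract_numbers.txt with one registry number per line:

# Read registry numbers from a plain-text file, one per line,
# skipping blank lines (the file name contract_numbers.txt is an assumption)
with open('contract_numbers.txt', encoding='utf-8') as f:
    contract_number = [line.strip() for line in f if line.strip()]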

The main loop walks through each link in the list, requests the page's HTML, and parses it with BeautifulSoup. It then scans all anchor tags, keeps those whose href contains file.html, and collects the download URLs together with the document titles.
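
One caveat: the script treats every extracted href as an absolute URL. If the site serves relative links instead, they would need to be resolved before downloading; a minimal sketch using urllib.parse.urljoin (the base_url value is an assumption):

from urllib.parse import urljoin

# Resolve possibly-relative hrefs against the site root;
# urljoin leaves already-absolute URLs unchanged
base_url = 'https://zakupki.gov.ru'
urls = [urljoin(base_url, link['href'].strip())
        for link in links if 'file.html' in link['href']]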

Before downloading, the code checks whether the contract's folder exists and creates it if it does not. It then uses tqdm to display download progress in the terminal and downloads the files one by one, saving them into the appropriate folders.
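
Note that the script as written fetches everything again on every run. A minimal sketch of a helper with a skip-if-exists guard that could replace the body of the download loop (the function download_file is hypothetical, not part of the original code):

import os

import requests


def download_file(file_url, folder, filename, headers):
    """Download file_url into folder/filename, skipping files already on disk."""
    filepath = os.path.join(folder, filename)
    if os.path.exists(filepath):
        # Already fetched on a previous run: nothing to do
        return False
    r = requests.get(file_url, headers=headers)
    r.raise_for_status()  # surface HTTP errors instead of writing an error page
    with open(filepath, 'wb') as f:
        f.write(r.content)
    return True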

If an exception occurs, such as a request error, the code catches it and writes it to a log file for later analysis using the Loguru library.
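
One limitation of wrapping the whole loop in a single try/except is that the first failed contract aborts all the remaining ones. A minimal sketch of per-contract error handling, so a failure is logged and the loop moves on (process_contract is a hypothetical routine standing in for the body of the original loop):

from loguru import logger

logger.add('download_errors.log')  # file sink, as in the main script

for number, page_url in zip(contract_number, temporary_list_links):
    try:
        process_contract(number, page_url)  # hypothetical per-contract routine
    except Exception:
        # logger.exception records the message together with the full traceback
        logger.exception(f'Failed to process contract {number}')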

Conclusion

This code demonstrates how Python can be used to automate the downloading of government procurement documents, and how to handle exceptions and log errors with the Loguru library so the process is easier to manage.

I hope this example proves useful for managers like me who, as part of their job, need to download and analyze contract documentation.
That said, this is only a stopgap: a better approach is to build a proper application that works with the official FTP server of the public procurement system. I wrote this script as a temporary tool while I develop full-fledged software.
