Scaling Telegram bots with aiogram

Scaling is not just about adding capacity; it is the art of managing resources efficiently to meet growing user demand without sacrificing quality of service. For Telegram bots, where the number of users can grow exponentially, the bot's ability to adapt to increasing load is key to its success.

aiogram, an asynchronous library for creating bots in Python, stands out for its flexibility and performance. It allows you to build more responsive and scalable bots using Python's modern asynchronous capabilities.

aiogram basics

aiogram requires Python 3.7 or higher (the examples in this article use the aiogram 2.x API). Install it with pip:

pip install aiogram

This command will install aiogram along with the necessary dependencies.

Before creating a bot, you need to obtain an API token from BotFather in Telegram. This token is used to authenticate your bot against the Telegram API. Save it; you will need it in the next step.

Let’s create a basic echo bot. This bot will simply respond to any message received with the same text.

Create a file, for example echo_bot.py, and add the following code:

from aiogram import Bot, Dispatcher, types
from aiogram.utils import executor

bot_token = 'YOUR_BOT_TOKEN'  # Replace with your token
bot = Bot(token=bot_token)
dp = Dispatcher(bot)

@dp.message_handler()
async def echo(message: types.Message):
    await message.answer(message.text)

if __name__ == "__main__":
    executor.start_polling(dp, skip_updates=True)

Launch the bot by running the command in the terminal:

python echo_bot.py

The bot should now respond to any messages it receives.

Aiogram uses asynchronous programming, which allows the bot to handle multiple tasks simultaneously without blocking the main thread of execution.

The main feature of asynchrony in aiogram is the async keyword before function definitions and await when calling asynchronous functions. This lets Python perform other tasks while an asynchronous operation, such as sending a message, waits to complete.
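The effect of await is easy to see in isolation. In this sketch, send_message is a hypothetical stand-in for a Telegram API request (it just sleeps); because both calls are awaited concurrently via asyncio.gather, the total time is roughly one delay, not two:

```python
import asyncio
import time

async def send_message(text: str) -> str:
    # Simulated network call; while this coroutine sleeps,
    # the event loop is free to run other coroutines.
    await asyncio.sleep(0.1)
    return f"sent: {text}"

async def main():
    start = time.monotonic()
    # Both calls run concurrently, so this takes ~0.1 s, not ~0.2 s.
    replies = await asyncio.gather(send_message("hello"), send_message("world"))
    print(f"{replies} in {time.monotonic() - start:.2f}s")
    return replies

asyncio.run(main())
```

The same mechanism is what lets an aiogram bot keep answering other users while one handler waits on the network.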

Architecture for scalability

The right choice of architecture can significantly improve the performance and responsiveness of the bot, as well as simplify its further scaling and maintenance.

Modular Architecture

Modular architecture divides the bot's functionality into separate, independent parts (modules), which are easier to maintain and develop. Each module is responsible for its own area: command processing, API interaction, message handling, and so on:

# main.py
from aiogram import Bot, Dispatcher
from aiogram.utils import executor
from modules import echo_module, admin_module

bot = Bot(token='YOUR_BOT_TOKEN')  # Replace with your token
dp = Dispatcher(bot)

echo_module.register_handlers(dp)
admin_module.register_handlers(dp)

if __name__ == "__main__":
    executor.start_polling(dp)

# modules/echo_module.py
from aiogram import Dispatcher, types

async def echo_handler(message: types.Message):
    await message.answer(message.text)

def register_handlers(dp: Dispatcher):
    dp.register_message_handler(echo_handler)

Asynchronous tasks

Executing tasks asynchronously and using queues to process requests helps distribute the load and improves the bot's responsiveness. This is especially important for long operations, such as calls to external APIs or processing large amounts of data:

from aiogram import types
from some_queue_lib import enqueue  # placeholder: any task queue (Celery, RQ, an asyncio wrapper, ...)

async def long_running_task(message: types.Message):
    result = await some_long_operation()  # placeholder for a long operation
    await message.answer(result)

@dp.message_handler(commands=['start'])
async def start_command(message: types.Message):
    # The handler returns immediately; the heavy work runs in the background
    enqueue(long_running_task, message)
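The some_queue_lib import above is a placeholder. One minimal way to get the same effect inside the bot's own event loop is an asyncio.Queue drained by a background worker task — a sketch of the idea, not a production queue:

```python
import asyncio

results = []  # collected only so the example's effect is observable

async def worker(queue: asyncio.Queue):
    # Pull jobs off the queue one at a time, so long operations
    # never block the handlers that enqueued them.
    while True:
        job, args = await queue.get()
        try:
            await job(*args)
        finally:
            queue.task_done()

async def slow_job(x: int):
    await asyncio.sleep(0.01)  # stand-in for a slow external call
    results.append(x)

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    worker_task = asyncio.create_task(worker(queue))
    for i in range(3):          # a handler would do queue.put_nowait(...)
        await queue.put((slow_job, (i,)))
    await queue.join()          # wait until every queued job has finished
    worker_task.cancel()

asyncio.run(main())
print(results)  # → [0, 1, 2]
```

An in-process queue like this disappears on restart and is not shared between instances; for durable or distributed task queues, a broker such as RabbitMQ (shown below) or Celery is the usual choice.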

Distributed architecture

A distributed Telegram bot architecture uses multiple servers or bot instances to distribute the load. This is ideal for popular bots that handle a large number of requests. In this architecture, each bot instance runs independently but is synchronized with a centralized database server or message broker.

Example of task distribution:

# Module that sends tasks to a message broker (e.g. RabbitMQ)
import json
import pika

def send_task_to_queue(task_data):
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    channel.queue_declare(queue='bot_tasks')
    channel.basic_publish(exchange='',
                          routing_key='bot_tasks',
                          body=json.dumps(task_data))  # JSON is easier to parse on the consumer side than str()
    connection.close()

# This function can be called anywhere in the bot to submit a task
send_task_to_queue({"user_id": 123, "message": "Example task"})
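On the receiving side, a separate worker process consumes these tasks. A rough sketch with pika, assuming the producer serializes tasks as JSON (json.dumps) rather than str(); handle_task and start_worker are illustrative names, not part of aiogram or pika:

```python
import json

def handle_task(body: bytes) -> dict:
    # Decode one task from the queue; kept separate from the pika
    # callback so the logic can be tested without a running broker.
    task = json.loads(body)
    # ... act on task["user_id"] / task["message"] here ...
    return task

def start_worker(host: str = 'localhost'):
    import pika  # imported lazily so handle_task stays usable without pika installed

    def on_message(channel, method, properties, body):
        handle_task(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)

    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    channel = connection.channel()
    channel.queue_declare(queue='bot_tasks')
    channel.basic_consume(queue='bot_tasks', on_message_callback=on_message)
    channel.start_consuming()  # blocks; run one worker per process
```

Several such workers can run on different machines against the same broker, which is exactly the load distribution this section describes.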

Caching and optimization

Caching frequently requested data and optimizing database queries help reduce the load and speed up the bot.

Using Redis for caching:

import redis
import json

redis_client = redis.StrictRedis(host="localhost", port=6379, db=0)

def cache_user_data(user_id, user_data):
    redis_client.set(f"user_data:{user_id}", json.dumps(user_data))

def get_cached_user_data(user_id):
    cached_data = redis_client.get(f"user_data:{user_id}")
    if cached_data:
        return json.loads(cached_data)
    else:
        user_data = load_user_from_db(user_id)  # placeholder: load the data from your database
        cache_user_data(user_id, user_data)
        return user_data

# Example of using the cache
user_data = get_cached_user_data(123)

Monitoring and logging

Robust monitoring and logging systems are critical to tracking bot health and quickly responding to problems.

While integrating with the ELK Stack (Elasticsearch, Logstash, Kibana) at the code level can be complex, it is typically done by setting up logging in Python to send the logs to Logstash, which then indexes them into Elasticsearch for analysis in Kibana.

import logging
from logstash_async.handler import AsynchronousLogstashHandler

# Configure logging to ship records to Logstash
logger = logging.getLogger('python-logstash-logger')
logger.setLevel(logging.INFO)
logstash_handler = AsynchronousLogstashHandler('localhost', 5000, database_path=None)
logger.addHandler(logstash_handler)

# Log a message
logger.info('Test message for Logstash')

These code examples demonstrate how distributed architecture, data caching, and monitoring can be implemented in the context of a Telegram bot. Such solutions help scale bots and ensure their stable operation even under high loads.

Optimizing interaction with the database

Optimizing database queries can significantly improve bot performance. Use connection pools and asynchronous requests to reduce latency.

Using asynchronous database connection pooling:

from aiogram import types
from aiomysql import create_pool

# The pool is created once at startup, e.g.:
#   pool = await create_pool(host='localhost', user='bot', password='...', db='botdb')

async def get_user_data(user_id):
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute("SELECT * FROM users WHERE id=%s", (user_id,))
            return await cur.fetchone()

@dp.message_handler(commands=['start'])
async def start_command(message: types.Message):
    user_data = await get_user_data(message.from_user.id)
    # Process the user data

Optimizing the use of external APIs

When using external APIs, optimize the number and size of requests. Use asynchronous requests and cache frequently requested data.

Asynchronous request to external API:

import aiohttp

async def fetch_external_data(api_url):
    async with aiohttp.ClientSession() as session:
        async with session.get(api_url) as response:
            return await response.json()
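Both recommendations can be combined by memoizing the response for a short time. A minimal in-memory TTL cache (the fetch_cached name and the dict-based store are illustrative; in a multi-instance deployment you would keep this in Redis instead):

```python
import time

_cache: dict = {}  # url -> (timestamp, data); per-process, illustrative only

async def fetch_cached(url: str, fetcher, ttl: float = 60.0):
    # Serve a cached response while it is fresh; otherwise call the
    # async fetcher (e.g. fetch_external_data above) and remember the result.
    entry = _cache.get(url)
    if entry is not None and time.monotonic() - entry[0] < ttl:
        return entry[1]
    data = await fetcher(url)
    _cache[url] = (time.monotonic(), data)
    return data
```

A second call for the same URL within ttl seconds then returns the cached data without touching the network at all.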

Profiling and Debugging

Regular profiling and debugging of the bot helps identify and eliminate bottlenecks, improving its performance. For Python, one of the most popular profiling tools is cProfile, which gives detailed information about the execution time of different parts of the code.

Suppose you have a message-processing function and want to analyze how long it takes to execute. You can use cProfile to collect performance data for it.

import asyncio
import cProfile
import pstats
from aiogram import types

async def process_message(message: types.Message):
    # Some work, e.g. a database query
    pass

# Profile the message-processing function
pr = cProfile.Profile()
pr.enable()
asyncio.run(process_message(message))  # await is only valid inside a coroutine, so run it via asyncio.run
pr.disable()

stats = pstats.Stats(pr)
stats.sort_stats('cumulative').print_stats(10)  # Print the 10 most expensive calls

You can profile the entire bot as it runs to understand which features or methods take the most time.

import cProfile
import pstats
from aiogram.utils import executor

# Assume dp is a Dispatcher instance

# Run the bot under the profiler
pr = cProfile.Profile()
pr.enable()

executor.start_polling(dp)

pr.disable()
stats = pstats.Stats(pr)
stats.sort_stats('time').print_stats()  # Print statistics sorted by execution time

cProfile can be used to collect performance data about different parts of the bot.

Vertical and horizontal scaling

Vertical scaling

Vertical scaling, in the context of bots, is the process of increasing the power of the server on which the bot is running to improve its performance. This may include increasing the amount of RAM, processor power, network bandwidth, or disk space.

Strengthening the hardware of the server the bot runs on can improve response times and increase the number of concurrent requests the bot handles. The main advantage is ease of implementation, since it requires no changes to the code.

Application scenarios:

  • High CPU load: If the bot is performing many computationally complex tasks, increasing CPU power can improve performance.

  • Lack of RAM: For bots that store a lot of data in memory or use large databases, increasing the RAM can prevent slowdowns and crashes.

  • Network Bandwidth Issues: Improving network performance will help bots that frequently exchange data with external services.

Regularly monitoring your bot's performance helps determine when scaling is required; use monitoring tools to track CPU, memory, disk space, and network traffic usage.
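As a rough starting point, a few of these metrics can be sampled from the standard library alone (the resource module is Unix-only; a real deployment would typically use psutil or an external agent such as a Prometheus exporter — resource_snapshot is an illustrative name):

```python
import logging
import resource  # Unix-only standard-library module
import shutil

def resource_snapshot(path: str = "/") -> dict:
    # Sample a few coarse metrics for the current process and host.
    usage = resource.getrusage(resource.RUSAGE_SELF)
    disk = shutil.disk_usage(path)
    return {
        "peak_rss": usage.ru_maxrss,                  # KB on Linux, bytes on macOS
        "cpu_time_s": usage.ru_utime + usage.ru_stime,
        "disk_free_ratio": disk.free / disk.total,
    }

logging.basicConfig(level=logging.INFO)
logging.info("bot resource snapshot: %s", resource_snapshot())
```

Logging such a snapshot periodically (or shipping it to the ELK stack configured earlier) gives the trend data needed to decide when to scale.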

Advantages:

  1. Ease of implementation: Usually does not require changes to the bot code.

  2. Less Difficulty: There is no need to set up a network infrastructure or distribute the load between different servers.

Disadvantages:

  1. Scaling limitations: There is an upper limit to how much you can increase the power of a single server.

  2. Possible downtime: Scaling often requires temporarily shutting down the server.

  3. Increase in cost: Updating hardware can be expensive.

Horizontal scaling

Horizontal scaling, in the context of bots, means adding additional instances or nodes to handle an increased volume of requests, rather than increasing the capacity of a single server (as in vertical scaling).

Horizontal scaling typically includes the following components:

  1. Load distribution: Distribution of incoming requests among multiple bot instances.

  2. State and sessions: Synchronize state and session data between different instances.

  3. Database and storage: A centralized or distributed database accessible to all instances.

Implementing Horizontal Scaling

  1. Deploying multiple bot instances: This can be accomplished through containerization (e.g. Docker) and orchestration (e.g. Kubernetes), making it easier to deploy and manage multiple instances.

  2. Using a Load Balancer: The load balancer distributes incoming requests among bot instances to distribute the load evenly.

  3. State synchronization: It is important to ensure data consistency between instances. This can be achieved using state synchronization solutions such as Redis.

  4. Centralized database: All instances must have access to the same database to maintain data integrity.

Load balancing, for example, might look like this:

http {
    upstream bot_cluster {
        server bot_instance1:port;
        server bot_instance2:port;
        server bot_instance3:port;
        # Additional bot instances
    }

    server {
        listen 80;

        location / {
            proxy_pass http://bot_cluster;
        }
    }
}

State Synchronization with Redis

import redis

# Configure the Redis client
redis_client = redis.StrictRedis(host="redis_server", port=6379, db=0)

# Save session state
def save_session(user_id, session_data):
    redis_client.set(f"session:{user_id}", session_data)

# Retrieve session state
def get_session(user_id):
    return redis_client.get(f"session:{user_id}")

Advantages

  1. Scaling flexibility: You can add as many nodes as needed, allowing for almost unlimited scalability.

  2. High Availability: The failure of one node does not lead to a complete system failure.

  3. Load distribution: Improved performance through load balancing.

Disadvantages:

  1. Difficulty of control: More complex infrastructure and management required.

  2. State synchronization: It is necessary to ensure data synchronization between different nodes.

  3. Infrastructure costs: May require investment in load balancers and network infrastructure.

The right choice depends on your budget, technical requirements, available resources, and long-term plans. Vertical scaling is often a good initial choice for small and medium-sized projects due to its simplicity.


Scaling telegram bots is a strategic step towards ensuring sustainable and effective interaction with users. Through asynchronous programming, efficient data management, and the use of caching techniques, we can significantly improve the performance and responsiveness of our bots.

You can gain even more practical programming skills in the online courses from my colleagues at OTUS. I also remind you that anyone interested can sign up for free webinars in the events calendar.
