Sourcing candidates from Telegram chats with Telethon and Snoop

In this case study, you will learn how IT recruiters can automate the search for candidates by nicknames in Telegram and transform it from a manual process into an almost industrial one.

We will use the participant list of a Telegram chat as our source of nicknames, but you can use any other data source and start from step 4.

Step 1. Registering an account in Telegram via a virtual number

We won't go into details at this stage. You can rent a virtual number through services like sms-activate.io, 5sim.biz or onlinesim.io and register a new account so you don't have to use your personal or work number. If Telegram doesn't send you a verification code when you register, just buy a ready-made account on sites like Plati.Market, lzt.market or funpay.com.

Step 2. Register a new application in Telegram

  • Go to the Telegram for Developers page:

    • Go to the following link in your browser: my.telegram.org.

    • Log in to your Telegram account using the phone number you use in the Telegram app.

  • Create a new application:

    • Once logged in, you will be taken to the “App Configuration” page.

    • Click on the “API Development Tools” or “Create Application” button.

    • Fill out the form with information about your new application:

      • App title: The name of your application (e.g. MyParserApp).

      • Short name: A short name for your application (e.g. parser).

      • Platform: Select the platform on which the application will run (e.g. Desktop).

      • Description: You can leave a description (for example, Application for Telegram data parsing).

    • Click “Create”.

  • Get API ID and API Hash:

    • After creating the application, you will see your API ID and API Hash.

    • Save these values: you will need them to work with the Telethon library.

    Here is a third-party guide with pictures.

    If you get an error when registering the application, try refreshing the page, using only Latin letters and digits without spaces, or changing or disabling your VPN.
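If you would rather not paste the API ID and API Hash straight into a script, one option is to keep them in environment variables. This is only a minimal sketch under that assumption: TG_API_ID, TG_API_HASH and load_telegram_credentials are names invented for this example, not something Telegram or Telethon defines.

```python
import os


def load_telegram_credentials(env=os.environ):
    """Read the API ID and API Hash from environment variables.

    TG_API_ID and TG_API_HASH are hypothetical variable names chosen
    for this sketch; they are not defined by Telegram or Telethon.
    """
    api_id = env.get("TG_API_ID", "").strip()
    api_hash = env.get("TG_API_HASH", "").strip()
    # The API ID is a number, the API Hash is a string.
    if not api_id.isdigit():
        raise ValueError("TG_API_ID must be set to the numeric API ID")
    if not api_hash:
        raise ValueError("TG_API_HASH must be set to the API Hash")
    return int(api_id), api_hash
```

On Windows you could set the variables in the same cmd window before running the script, for example with set TG_API_ID=... and set TG_API_HASH=..., and then call load_telegram_credentials() where the credentials are needed.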

Step 3. Parsing Telegram chat participants

As an example, I will use a DevOps-related chat, “@kubernetes_ru”. Your new account must be a member of this chat.

I run the code in PyCharm, and my main development environment is ChatGPT. I do not accept any claims regarding the quality of the code.

Install the Telethon library: pip install telethon
Copy the code below and fill in your own values in the credentials block.**

from telethon import TelegramClient
from telethon.tl.types import User

# Credentials for authorization
api_id = 'FILL_IN'  # Your api_id (a number)
api_hash = 'FILL_IN'  # Your api_hash
phone_number = 'FILL_IN'  # Your phone number
channel = '@kubernetes_ru'  # Username or ID of the channel/chat

# Build the output file name from the channel name
output_file = f"{channel.replace('@', '')}_users.txt"

# Create the Telegram client; the session is saved to disk
client = TelegramClient('session_name', api_id, api_hash)

async def main():
    # Authorization
    await client.start(phone=phone_number)
    if not await client.is_user_authorized():
        print("Authorization failed.")
        return

    print(f"Collecting participants from chat: {channel}")

    # Fetch all chat participants
    participants = await client.get_participants(channel)

    # Build a list of usernames without the leading @
    usernames = [user.username for user in participants if isinstance(user, User) and user.username]

    # Save the usernames to a file
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write('\n'.join(usernames))

    print(f"Saved {len(usernames)} usernames to '{output_file}'.")

# Run the program
with client:
    client.loop.run_until_complete(main())

On the first run, the program asks for your phone number and the Telegram login code, which arrives as a message in the app. You will not need to enter them again on later runs.

When the script finishes, the usernames are saved to a txt file.
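Chats also contain bots and deleted accounts, which are of no use to a recruiter. As a rough sketch, the username filter could be extended as shown here. It assumes the Telethon User objects expose bot and deleted flags in addition to username; clean_usernames is a made-up helper name, and the sample objects only stand in for real participants.

```python
from types import SimpleNamespace


def clean_usernames(participants):
    """Return unique usernames of real accounts, in first-seen order."""
    seen = set()
    result = []
    for user in participants:
        username = getattr(user, "username", None)
        # Skip accounts without a public username, bots and deleted accounts.
        if not username or getattr(user, "bot", False) or getattr(user, "deleted", False):
            continue
        key = username.lower()  # Telegram usernames are case-insensitive
        if key not in seen:
            seen.add(key)
            result.append(username)
    return result


# Stand-in objects for demonstration; in the script these would be
# the items returned by client.get_participants(channel).
sample = [
    SimpleNamespace(username="alice_dev", bot=False, deleted=False),
    SimpleNamespace(username="helper_bot", bot=True, deleted=False),
    SimpleNamespace(username=None, bot=False, deleted=False),
    SimpleNamespace(username="Alice_Dev", bot=False, deleted=False),
]
print(clean_usernames(sample))
```

In the script, the list comprehension that builds usernames could then be swapped for usernames = clean_usernames(participants).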

Our nickname database

** This script can only collect nicknames from chats where the participant list is visible. I will cover what to do when the participants are hidden in upcoming posts on my channel.

chat participants are visible

Step 4. Working with Snoop

Snoop is a powerful OSINT tool that searches for profiles by nickname on more than 4,400 sites. It is a Russian project that covers popular resources in the CIS and offers convenient settings, such as limiting the search by region. Snoop can search for a single nickname or for entire lists.

Additionally, Snoop provides a number of options to customize your search:

  • You can control the response time from servers to avoid errors caused by slow internet connections.

  • The tool supports excluding or including search regions. For example, you can search only Russian-language resources.

  • Snoop also offers the ability to search specific sites from the database or use a dynamically updated web database of over 4,000 sites.

Let's assume that you are working on Windows.
I have no idea how to do all this on macOS 🙂 See the documentation in the repository.
If you work on Linux, you will figure everything out without my advice.

The demonstration below uses the demo version, which limits the search to 290 sites, but even that is quite enough for a recruiter's work. An annual license with full access costs $20 and covers more than 4,400 sites, which is very affordable.

Instructions:

  1. Download the Snoop_for_Windows.rar archive from GitHub.

  2. Unpack the archive and open the command line (Win+R, then type cmd).

  3. To search for a single nickname, drag the exe file into the command-line window and add the parameters -f YOUR_USERNAME -t 9

    Launching the program

    Search results

    Next, you can simply copy all the links from the program's terminal into the candidate's card. You can close the window that opens in the browser.

  4. If you want to process a whole list of nicknames, copy the path to your nickname file: hold Shift, right-click the file, and choose “Copy as path”.

    It will be something like this "C:\Users\User\PycharmProjects\pythonProject1\kubernetes_ru_users.txt"

  5. Drag the exe file into the terminal as in step 3 and add the parameters --userlist "PATH_TO_YOUR_FILE_WITH_LIST" -t 9

    You will see something like this

  6. Run the program and wait for it to complete.

    Some prohibited and unwanted links are blurred out. The list was reduced to 20 nicknames (see note).

    As a result of execution, you will receive a set of separate files in several formats: html, txt and csv.

    You can open them by copying the path from the browser into File Explorer.

    The program hints that it would be worth buying a license

    Let's open one of them, for example the one in txt format.

    This format is not convenient for further work, so we will take a few extra steps to build a more usable database in the form of a table.

  7. Open the command line directly in the results folder. To do this, type cmd in the File Explorer address bar and press Enter.

  8. In the terminal that opens, run copy *.txt all_results.txt

  9. As a result, we get one combined file with the data for all our nicknames.

    It is still impossible to work with such a document.

  10. Copy the following code into your IDE. It cleans the data and converts it into a more convenient form. It needs the pandas and openpyxl packages (pip install pandas openpyxl).

    import pandas as pd
    import re


    def parse_data(filename):
        data = []
        current_username = None
        current_links = []

        with open(filename, 'r', encoding='utf-8') as file:
            for line in file:
                line = line.strip()

                # Skip empty lines
                if not line:
                    continue

                # Look for the username header; the Russian text ("Requested object") is the wording in Snoop's report
                username_match = re.match(r"Запрашиваемый объект: <(.+?)>", line)
                if username_match:
                    # Save the links collected for the previous username
                    if current_username and current_links:
                        data.append({"Username": current_username, "Links": "\n".join(current_links)})

                    # Switch to the new username and reset the link list
                    current_username = username_match.group(1)
                    current_links = []
                    continue

                # Skip the header line "©2020-2024 «Snoop Project» (demo version).Адрес | ресурс"
                if line.startswith("©2020-2024"):
                    continue

                # Result lines look like "URL | resource"; keep only the URL
                if " | " in line:
                    url = line.split(" | ", 1)[0]
                    current_links.append(url.strip())

            # Append the data accumulated for the last username
            if current_username and current_links:
                data.append({"Username": current_username, "Links": "\n".join(current_links)})

        return data


    # Name of your input text file
    input_filename = r'PATH_TO_YOUR_FILE'  # Path to your file

    # Parse the data from the file
    parsed_data = parse_data(input_filename)

    # Convert the data to a DataFrame
    df = pd.DataFrame(parsed_data)

    # Save the data to Excel
    output_filename = "TABLE_NAME.xlsx"
    df.to_excel(output_filename, index=False)

    print(f"Data successfully saved to {output_filename}!")

  11. Copy the path to the combined file with all nicknames (Shift + right-click) and paste it into the code as the value of input_filename (line 47 of the script).

  12. We run the script and receive a message that the table has been successfully generated.

  13. We open a new table and see that the data has now been converted into a format that is convenient for working with.

  14. If everything worked out, then I congratulate you. Now you are a professional sourcer. You can write the word OSINT in your resume and ask for a raise.
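Steps 7 to 9 can also be done in Python if you prefer to stay in the IDE. Below is a minimal sketch, not part of the workflow described above: merge_reports is a name made up for this example, and it assumes the reports are UTF-8 text files, as the parsing script does.

```python
from pathlib import Path


def merge_reports(folder, output_name="all_results.txt"):
    """Concatenate every .txt file in `folder` into one file and return its path."""
    folder = Path(folder)
    output_path = folder / output_name
    # Sort for a stable order and skip the output file itself, so a rerun
    # does not copy the previous merge into the new one.
    sources = sorted(p for p in folder.glob("*.txt") if p.name != output_name)
    with open(output_path, "w", encoding="utf-8") as out:
        for source in sources:
            text = source.read_text(encoding="utf-8", errors="replace")
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")  # keep reports from running into each other
    return output_path
```

Calling merge_reports(r"PATH_TO_RESULTS_FOLDER") returns the path of the combined file, which you can then pass to the parsing script as input_filename.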

Note:
In this example, I have reduced the list of nicknames from 8 thousand to 20 for demonstration purposes. It is possible to check all 8 thousand, but it will take much more time.
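If you do decide to run all 8 thousand, one option is to split the list into smaller files and feed them to Snoop one at a time, so an interrupted run costs you only one batch. This is a rough sketch under assumptions: the function names, the batch size of 500 and the file naming are all invented here; the --userlist and -t 9 parameters are the ones from step 5.

```python
from pathlib import Path


def split_into_batches(usernames_file, batch_size=500):
    """Split a username list into numbered batch files and return their paths."""
    source = Path(usernames_file)
    lines = source.read_text(encoding="utf-8").splitlines()
    names = [line.strip() for line in lines if line.strip()]
    batch_paths = []
    for start in range(0, len(names), batch_size):
        number = start // batch_size + 1
        # Batch files are written next to the original list.
        batch_path = source.with_name(f"{source.stem}_batch{number:03d}.txt")
        batch_path.write_text("\n".join(names[start:start + batch_size]), encoding="utf-8")
        batch_paths.append(batch_path)
    return batch_paths


def snoop_command(snoop_exe, batch_path, timeout=9):
    """Build the argument list from step 5 for one batch file."""
    return [str(snoop_exe), "--userlist", str(batch_path), "-t", str(timeout)]
```

Each batch could then be started with subprocess.run(snoop_command(r"PATH_TO_SNOOP_EXE", batch)), where the path to the exe is whatever you dragged into the terminal in step 3.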

Also, remember that the links you find to resources are not necessarily related to the target candidate's nickname. The more popular the nickname, the more often it is used by different people.

Conclusion

As you can see, even minimal Python skills and access to any modern LLM open up quite non-trivial ways for an IT recruiter to search for candidates, which means you can build your own almost unique databases and work with them.

If you are interested in advanced sourcing, subscribe to my group “Sourcing for perverts”. There we will look at how to extract participants from chats even when the administrator has hidden them with privacy settings. Also, if you need an internal or external IT recruiter, you can always contact me on Telegram: @rudenko_telegram.
