Working with the HeadHunter API in Python


Let's get hands-on experience with the HeadHunter API using Python.

We have been given the task of analyzing job vacancies on the labor market using the HeadHunter database: we need to retrieve all vacancies of a particular company across all cities of Russia. After reviewing the API documentation on GitHub (https://github.com/hhru/api), let's get to work.

We will use Python to solve the problem. First, import the libraries we need:

import requests      # For API requests
import json          # For parsing the responses
import time          # For delays between requests
import os            # For working with files
import pandas as pd  # For building a DataFrame with the results

First, it is worth understanding the concept of areas. In total there are 9 top-level zones (countries):

Country ID    Country name
5             Ukraine
9             Azerbaijan
16            Belarus
28            Georgia
40            Kazakhstan
48            Kyrgyzstan
97            Uzbekistan
113           Russia
1001          Other regions

Each country has its own internal zones, which can be viewed by querying HH (https://api.hh.ru/areas) with the area parameter equal to the country ID. For example, Russia alone contains over 4 thousand different cities, villages, and other settlements.

To get all countries with all their internal zones, use the following code snippet:

def getAreas():
    req = requests.get('https://api.hh.ru/areas')
    data = req.content.decode()
    req.close()
    jsObj = json.loads(data)
    areas = []
    for k in jsObj:
        for i in range(len(k['areas'])):
            if len(k['areas'][i]['areas']) != 0:    # The zone has internal zones
                for j in range(len(k['areas'][i]['areas'])):
                    areas.append([k['id'],
                                  k['name'],
                                  k['areas'][i]['areas'][j]['id'],
                                  k['areas'][i]['areas'][j]['name']])
            else:                                   # The zone has no internal zones
                areas.append([k['id'],
                              k['name'],
                              k['areas'][i]['id'],
                              k['areas'][i]['name']])
    return areas

areas = getAreas()
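Note that getAreas handles exactly two levels of nesting. Since the API's area tree is recursive (every node carries its own `areas` list), a recursive flattener covers arbitrary depth. Here is a sketch, demonstrated on a hand-made sample rather than a live response; the IDs and names in the sample are purely illustrative:

```python
def flattenAreas(node, country_id=None, country_name=None):
    # Recursively collect [country_id, country_name, area_id, area_name]
    # rows for every leaf node of an area tree.
    country_id = country_id if country_id is not None else node['id']
    country_name = country_name if country_name is not None else node['name']
    children = node.get('areas') or []
    if not children:
        return [[country_id, country_name, node['id'], node['name']]]
    rows = []
    for child in children:
        rows.extend(flattenAreas(child, country_id, country_name))
    return rows

# Hand-made sample mimicking the shape of an https://api.hh.ru/areas node
sample = {
    'id': '113', 'name': 'Russia', 'areas': [
        {'id': '1', 'name': 'Moscow', 'areas': []},
        {'id': '2019', 'name': 'Moscow Oblast', 'areas': [
            {'id': '9000', 'name': 'Balashikha', 'areas': []},
        ]},
    ]
}

print(flattenAreas(sample))
```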

If you are interested in a request for a specific zone (country), specify the ID of that zone in the request parameters; for example, for Russia: {'area': 113}

Here is part of what will be stored in the areas variable:

The next step is to find employer IDs.

To do this, you need to know the current number of employers, and account for two things: not every sequential ID actually exists, and the HH API limits paged search to a depth of only 2000 items.
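That depth limit matters when planning a paged search: with the maximum of 100 items per page, a single query can return at most 20 pages. A quick sanity check of the arithmetic:

```python
# The HH API caps paged search depth at 2000 items,
# so per_page * pages cannot exceed that bound.
PER_PAGE = 100     # maximum allowed per_page value
MAX_DEPTH = 2000   # deepest reachable item in one search

max_pages = MAX_DEPTH // PER_PAGE
print(max_pages)   # the deepest page you can request at per_page=100
```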

def getEmployers():
    req = requests.get('https://api.hh.ru/employers')
    data = req.content.decode()
    req.close()
    count_of_employers = json.loads(data)['found']
    employers = []
    i = 0
    j = count_of_employers
    while i < j:
        req = requests.get('https://api.hh.ru/employers/' + str(i + 1))
        data = req.content.decode()
        req.close()
        jsObj = json.loads(data)
        try:
            employers.append([jsObj['id'], jsObj['name']])
            i += 1
            print([jsObj['id'], jsObj['name']])
        except KeyError:         # This ID does not exist: skip it and extend the search
            i += 1
            j += 1
        if i % 200 == 0:
            time.sleep(0.2)
    return employers
    
employers = getEmployers()

Part of what will end up in the employers variable:

Let's take 2GIS (ID 64174) as an example and find all of its vacancies in each zone of Russia (ID 113). The getPage function takes only two input parameters: the page number for paged search and the zone in which to look for vacancies.

def getPage(page, area):
    params = {
        'employer_id': 64174,    # 2GIS employer ID
        'area': area,            # Zone to search in
        'page': page,            # Page number
        'per_page': 100          # Number of vacancies per page
    }
    req = requests.get('https://api.hh.ru/vacancies', params)
    data = req.content.decode()
    req.close()
    return data
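For clarity, this is the URL that such a request effectively hits; the snippet below builds it with only the standard library, using the 2GIS ID 64174 from the text:

```python
from urllib.parse import urlencode

# The same query string that requests.get builds from the params dict
params = {'employer_id': 64174, 'area': 113, 'page': 0, 'per_page': 100}
url = 'https://api.hh.ru/vacancies?' + urlencode(params)
print(url)
# https://api.hh.ru/vacancies?employer_id=64174&area=113&page=0&per_page=100
```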

Part of the code where the getPage function is used:

for area_list_id, area in enumerate(areas):
    for page in range(0, 20):
        jsObj = json.loads(getPage(page, area[2]))
        if not os.path.exists('./areas/'):
            os.makedirs('./areas/')
        nextFileName = "./areas/{}.json".format(str(area[2]) + '_' + str(area[3]) + '_' + str(page))
        f = open(nextFileName, mode="w", encoding='utf8')
        f.write(json.dumps(jsObj, ensure_ascii=False))
        f.close()
        if (jsObj['pages'] - page) <= 1:
            print('[{0}/{1}] Area: {3} ({2}) - {5} ({4}) Vacancies: {6}'.format(area_list_id + 1,
                                                                                len(areas),
                                                                                area[0],
                                                                                area[1],
                                                                                area[2],
                                                                                area[3],
                                                                                jsObj['found']))
            break
    time.sleep(0.2)

We save the intermediate results in JSON format for each zone separately, including zones where no vacancies were found. Now let's combine them into one dataset:

dt = []
for fl in os.listdir('./areas/'):
    f = open('./areas/{}'.format(fl), encoding='utf8')
    jsonText = f.read()
    f.close()
    jsonObj = json.loads(jsonText)
    if jsonObj['found'] != 0:
        for js in jsonObj['items']:
            if js['salary'] is not None:
                salary_from = js['salary']['from']
                salary_to = js['salary']['to']
            else:
                salary_from = None
                salary_to = None
            if js['address'] is not None:
                address_raw = js['address']['raw']
            else:
                address_raw = None
            if js['department'] is not None:    # department can also be null
                department_name = js['department']['name']
            else:
                department_name = None
            dt.append([
                js['id'],
                js['premium'],
                js['name'],
                department_name,
                js['has_test'],
                js['response_letter_required'],
                js['area']['id'],
                js['area']['name'],
                salary_from,
                salary_to,
                js['type']['name'],
                address_raw,
                js['response_url'],
                js['sort_point_distance'],
                js['published_at'],
                js['created_at'],
                js['archived'],
                js['apply_alternate_url'],
                js['insider_interview'],
                js['url'],
                js['alternate_url'],
                js['relations'],
                js['employer']['id'],
                js['employer']['name'],
                js['snippet']['requirement'],
                js['snippet']['responsibility'],
                js['contacts'],
                js['schedule']['name'],
                js['working_days'],
                js['working_time_intervals'],
                js['working_time_modes'],
                js['accept_temporary']
                ])
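The repeated None checks for nested fields can be factored into a small helper that walks nested dictionaries safely. This is just a sketch; `dig` is our own name, not part of the HH API:

```python
def dig(obj, *keys):
    # Walk nested dicts; return None as soon as any level is missing or null.
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj

# Example on a vacancy-like fragment
vacancy = {'salary': {'from': 100000, 'to': None}, 'address': None}
print(dig(vacancy, 'salary', 'from'))   # 100000
print(dig(vacancy, 'address', 'raw'))   # None
```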

We load the collected rows into a DataFrame and save it as an Excel file.

df = pd.DataFrame(dt, columns = [
                                'id',
                                'premium',
                                'name',
                                'department_name',
                                'has_test',
                                'response_letter_required',
                                'area_id',
                                'area_name',
                                'salary_from', 
'salary_to',
                                'type_name',
                                'address_raw',
                                'response_url',
                                'sort_point_distance',
                                'published_at',
                                'created_at',
                                'archived',
                                'apply_alternate_url',
                                'insider_interview',
                                'url',
                                'alternate_url',
                                'relations',
                                'employer_id',
                                'employer_name',
                                'snippet_requirement',
                                'snippet_responsibility',
                                'contacts',
                                'schedule_name',
                                'working_days',
                                'working_time_intervals',
                                'working_time_modes',
                                'accept_temporary'
                                ])   
df.to_excel('result_2gis.xlsx')
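Keep in mind that DataFrame.to_excel relies on an Excel writer backend (openpyxl for .xlsx files), which has to be installed separately. If that dependency is unwanted, CSV works out of the box; the DataFrame below is a tiny illustrative stand-in for the real result:

```python
import pandas as pd

# Illustrative stand-in for the vacancies DataFrame built above
df = pd.DataFrame([[1, 'example']], columns=['id', 'name'])
df.to_csv('result_2gis.csv', index=False)   # no extra backend needed
```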

A screenshot of part of the final result in Excel:

Without much difficulty, we completed the task and retrieved all available vacancies for a particular employer across different territories.

This guide covers the basic functionality of the HeadHunter API. To get a full picture of its features and capabilities, you can explore the HH.ru GitHub repository (https://github.com/hhru/api) on your own, or wait for our next article on this topic, where we will look at more complex examples.