Be Your Own Rosstat


Guidelines for organizing primary health care require a comparative analysis of the population attached to each medical site (the standard is 1,700 adults for a therapeutic site, 800 children for a pediatric site, 3,300 women aged 15 and older for an obstetric-gynecological site, and so on).
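As a rough illustration of that comparison, the sketch below checks a few sites against the standards quoted above. The site identifiers, the column names and the population figures are invented for the example; only the three standards come from the text.

```python
import pandas as pd

# Population standards per site profile, as quoted in the text.
STANDARDS = {
    "therapeutic": 1700,
    "pediatric": 800,
    "obstetric-gynecological": 3300,
}

# Hypothetical attached-population counts (illustrative numbers only).
sites = pd.DataFrame({
    "site": ["T-1", "T-2", "P-1"],
    "profile": ["therapeutic", "therapeutic", "pediatric"],
    "population": [1650, 2100, 790],
})

# Attach the standard for each profile and compute the load relative to it.
sites["standard"] = sites["profile"].map(STANDARDS)
sites["load_ratio"] = (sites["population"] / sites["standard"]).round(2)
sites["over_standard"] = sites["population"] > sites["standard"]

print(sites)
```

A ratio above 1 flags a site whose attached population exceeds the standard for its profile.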

Rosstat publishes annual population estimates for the constituent entities of the Russian Federation as of January 1 of the current year. In large cities, the territories served by medical organizations often do not coincide with the administrative division, so distributing the population across the service areas of medical organizations becomes more of a creative exercise than a repeatable procedure. How to break the Rosstat bulletin data down to the level of a medical site remains an open question for me.

We will take a different route. Population data can be obtained from the medical information system (MIS). The MIS database is regularly updated with data from insurance companies on persons insured under the Compulsory Medical Insurance (CMI) program.

To do this, we need an anonymized export from the MIS with the following patient data: gender, date of birth, registered address, address of actual residence, the medical organization, and the number of the therapeutic or pediatric site. I loaded it into a pandas DataFrame.

We calculate each patient's age as of the current date and sort in ascending order. In our case the MIS stores the date of birth in several formats, so the function takes this into account.

from datetime import date, datetime

def calculate_age(birth_date):
    """Return the age in full years as of today, or -1 if the date cannot be parsed."""
    # The MIS stores birth dates in several formats; try each one in turn.
    format_str = ["%d.%m.%Y", "%d.%m.%Y %H:%M:%S", "%Y-%m-%d %H:%M:%S"]
    today = date.today()
    parsed = None

    for fmt in format_str:
        try:
            parsed = datetime.strptime(birth_date, fmt)
            break
        except (ValueError, TypeError):
            continue
    if parsed is None:
        return -1

    age = today.year - parsed.year
    # Subtract a year if this year's birthday has not happened yet.
    if (today.month, today.day) < (parsed.month, parsed.day):
        age -= 1
    # Every database has its own veteran of the Battle of Kulikovo
    # if age > 100:
    #     print(parsed)
    return age


df['age'] = df['birthdate'].apply(calculate_age)
df = df.sort_values(by='age', ascending=True)
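The birthday comparison is easy to get backwards, so it helps to check the arithmetic against a fixed reference date rather than today's date. The sketch below uses a hypothetical helper, `age_on`, which is not part of the original code; it accepts the same three date formats and takes the reference date as a parameter.

```python
from datetime import date, datetime

FORMATS = ["%d.%m.%Y", "%d.%m.%Y %H:%M:%S", "%Y-%m-%d %H:%M:%S"]

def age_on(birth_str, ref):
    """Hypothetical helper: age in full years on date `ref`, or -1 if unparseable."""
    for fmt in FORMATS:
        try:
            born = datetime.strptime(birth_str, fmt)
        except (ValueError, TypeError):
            continue
        # One year less if the birthday has not yet occurred in the reference year.
        before_birthday = (ref.month, ref.day) < (born.month, born.day)
        return ref.year - born.year - int(before_birthday)
    return -1

ref = date(2024, 6, 15)
print(age_on("15.06.2000", ref))            # birthday falls on the reference date: 24
print(age_on("16.06.2000 00:00:00", ref))   # birthday is one day later: 23
print(age_on("2000-01-01 00:00:00", ref))   # birthday already passed: 24
print(age_on("not a date", ref))            # unparseable input: -1
```

Passing the reference date explicitly also makes it possible to build the bulletin as of any date, for example January 1, to match the Rosstat reporting date.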

pandas methods let you aggregate and group data. Let's group the population into the age brackets used in common information requests and build a chart:

import pandas as pd

# Grouping into children/adults, etc. with pandas
# (an age of -1, i.e. an unparsed birth date, falls into 'Not specified')

df['age_lvls_child_adults'] = pd.cut(df['age'],bins=[-999,0,18,199],
                                     labels=['Not specified','0-17','18+'],right=False)

df['age_lvls_foms'] = pd.cut(df['age'],bins=[-999,0,5,18,60,199],
                             labels=['Not specified','0-4','5-17','18-59', '60+'],right=False)
df['age_lvls_stat']=pd.cut(df['age'],
                             bins=[-999,0,
                                  5,10,15,20,25,
                                  30,35,40,45,50,
                                  55,60,65,70,199],
                             labels=['Not specified',
                                     '0-4','5-9','10-14','15-19','20-24',
                                     '25-29','30-34','35-39','40-44','45-49',
                                     '50-54','55-59','60-64','65-69','70+'],right=False)
df['age_lvls_death']=pd.cut(df['age'],
                             bins=[-999,0,1,2,3,4,
                                  5,10,15,20,25,
                                  30,35,40,45,50,
                                  55,60,65,70,75,
                                  80,85,199],
                             labels=['Not specified','0..','1','2','3',
                                     '4','5-9','10-14','15-19','20-24',
                                     '25-29','30-34','35-39','40-44','45-49',
                                     '50-54','55-59','60-64','65-69','70-74',
                                     '75-79','80-84','85+'],right=False)

import plotly.express as px

# Count people in each age group (observed=False keeps empty groups on the chart)
df_age_lvls_stat = df.groupby(by=['age_lvls_death'],
                              as_index=False, observed=False)[['ID','age']].count()
fig_stat = px.bar(df_age_lvls_stat, x='age_lvls_death', y='ID',
                  labels={'age_lvls_death':'Age group','ID':'Number of people'}
                 )
fig_stat.show()
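The binning above relies on `right=False`, which makes each interval closed on the left and open on the right, so an 18-year-old lands in '18+' rather than '0-17'. A minimal sketch on a handful of made-up ages shows where the boundary values go:

```python
import pandas as pd

# Made-up ages chosen to sit on the bin boundaries; -1 stands for an unparsed date.
ages = pd.Series([-1, 0, 17, 18, 59, 60])

groups = pd.cut(
    ages,
    bins=[-999, 0, 18, 199],
    labels=["Not specified", "0-17", "18+"],
    right=False,  # intervals are [left, right)
)

print(groups.tolist())
# ['Not specified', '0-17', '0-17', '18+', '18+', '18+']
```

With the default `right=True` the same bins would put age 0 into 'Not specified' and age 18 into '0-17', which is why the flag matters here.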

Let’s split the data by gender and build a graph:

# Age and sex composition; the basis for the "Population mortality" grouping.
# Column names ('ID', 'SEX') follow the other snippets in this post.
df_gb = df.groupby(by=['age_lvls_death','SEX'],
                   as_index=False, observed=False)[['ID']].count()

fig_gender = px.bar(df_gb, x='age_lvls_death', y='ID',
                    color="SEX",
                    # the keys must match the values stored in the SEX column
                    color_discrete_map={
                        'женский': 'red',
                        'мужской': 'blue'
                    },
                    labels={'age_lvls_death':'Age group',
                            'SEX':'Sex','ID':'Number of people'},
                    barmode="group",
                    height=400
                   )
fig_gender.show()

For more complex queries that depend on gender, such as the different retirement ages for men and women, I used NumPy's select function:

import numpy as np

# These must match the values stored in the SEX column
male = "мужской"
female = "женский"
# Note: men 16-61, women 16-56.
# Working age according to the Rosstat order:
# 2021: 16-60 for men and 16-55 for women;
# 2022: 16-61 for men and 16-56 for women.
# The conditions below use the 2021 boundaries.

conditions_mzrf = [
    df['age'] < 0,
    df['age'].between(0, 15),
    (df['age'].between(16, 60)) & (df['SEX'] == male),
    (df['age'].between(16, 55)) & (df['SEX'] == female),
    (df['age'] >= 61) & (df['SEX'] == male),
    (df['age'] >= 56) & (df['SEX'] == female)
 ]

# The leading space makes the 'Not specified' label sort first
choices_mzrf = [' Not specified','0-15','16-55/16-60','16-55/16-60','56+/61+','56+/61+']

# A string default is needed: the built-in default of 0 does not mix with string choices
df['age_lvls_trud'] = np.select(conditions_mzrf, choices_mzrf, default=' Not specified')

# Count people per working-age group and sex
df_trud = df.groupby(['age_lvls_trud','SEX'], as_index=False)[['ID']].count()

fig_trud = px.bar(df_trud.sort_values('age_lvls_trud', ascending=True),
                    x='age_lvls_trud', y='ID',
                    color="SEX",  # builds the legend from this DataFrame column
                    color_discrete_map={'женский': 'red',
                        'мужской': 'blue'},
                    labels={'age_lvls_trud':'Age group','SEX':'Sex',
                            'ID':'Number of people'},
                    barmode="group",
                    text_auto=True,
                    height=400,
                    color_discrete_sequence=px.colors.diverging.Spectral
                   )
fig_trud.update_traces(textfont_size=14, textangle=-90, textposition="inside",
                       cliponaxis=True)
fig_trud.update_layout(uniformtext_minsize=8, uniformtext_mode="hide")
fig_trud.show()
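To see how `np.select` resolves the gender-dependent boundaries, here is a small self-contained sketch on invented rows. It uses the 2021 limits (16-60 for men, 16-55 for women); the group names and the sample data are assumptions made for the example, and the `default` argument catches every row that no condition matches.

```python
import numpy as np
import pandas as pd

male, female = "мужской", "женский"

# Invented rows placed on the boundaries of the working-age ranges.
people = pd.DataFrame({
    "age": [-1, 10, 16, 55, 56, 60, 61],
    "SEX": [female, male, female, female, female, male, male],
})

# Conditions are checked in order; the first one that holds wins.
conditions = [
    people["age"] < 0,
    people["age"].between(0, 15),   # between() includes both ends
    people["age"].between(16, 60) & (people["SEX"] == male),
    people["age"].between(16, 55) & (people["SEX"] == female),
]
choices = ["unknown", "below working age", "working age", "working age"]

# Anything left over is older than the working-age range for its sex.
people["group"] = np.select(conditions, choices, default="above working age")

print(people)
```

Switching to the 2022 limits only requires changing the upper bounds in the two `between` calls to 61 and 56.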

So, with an up-to-date population database from the medical information system, you can compile a statistical bulletin as of the current date in a few minutes for a region, a district of a region or a city, a medical organization, a medical site, or a single building. It is just as easy to compare your data with the official Rosstat figures.
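A bulletin at the level of a medical organization and site could be sketched roughly as follows. The column names `org` and `site`, the clinic names and the ages are invented for the example; the real MIS export will have its own field names.

```python
import pandas as pd

# Invented patient-level rows: organization, site number and age in full years.
patients = pd.DataFrame({
    "org": ["Clinic A", "Clinic A", "Clinic A", "Clinic A", "Clinic B", "Clinic B"],
    "site": [1, 1, 2, 2, 1, 1],
    "age": [3, 40, 67, 15, 29, 81],
})

# Age brackets as plain strings so that they can be used as column headers.
patients["age_group"] = pd.cut(
    patients["age"], bins=[0, 18, 60, 199],
    labels=["0-17", "18-59", "60+"], right=False,
).astype(str)

# One row per organization and site, one column per age bracket.
bulletin = (
    patients.groupby(["org", "site", "age_group"])
    .size()
    .unstack("age_group", fill_value=0)
)
bulletin["total"] = bulletin.sum(axis=1)

print(bulletin)
```

The same table can be built per building or per district by swapping the grouping columns, and the totals can then be set against the Rosstat figures for the matching territory.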
