Analysis of open data on the websites of MedSwiss and MEDSI, part 1

Hello everyone! Today we will talk about parsing data from the websites of medical clinic networks. We chose this area for two reasons: first, it is a highly profitable business; second, whatever upheavals happen in the world, they do not push this sector into decline. You can skip a trip to a restaurant, put off buying a new iPhone, or even accept that a mortgage is out of reach. But if, God forbid, health problems appear, there is no time for saving. Unfortunately, even with a compulsory medical insurance policy, the earliest realistic appointment with the specialist you need is 2-3 weeks away. This is especially true for residents of large cities. That is why we began by parsing data from Moscow clinics.

Let's be clear up front: this article will probably interest business analysts most. We will not dwell on technical details this time and will limit ourselves to the minimum: the parser is written in PHP using the DiDom library. In our previous article about parsing Twitter we used Python + Selenium, but DiDom is faster than Selenium, so we chose this stack.

Back to the main story. The division of labor in medicine keeps growing: new, narrowly focused specialists are emerging, and there is demand for them. I myself have two policies, OMS (compulsory medical insurance) and VHI (voluntary health insurance), yet a couple of times over the past two years neither policy covered the specialist I needed. I had to choose between getting the service in a state hospital through a referral (slow but free) and paying extra at a specialized commercial clinic.

So we took two popular networks of medical centers in Moscow: MedSwiss and MEDSI. The data was taken from the public sites https://www.medswiss.ru/ and https://medsi.ru/ respectively. MedSwiss has 13 clinics in Moscow and 3 in St. Petersburg. All of them offer more or less the same set of services.

MEDSI turned out to be significantly larger, although it operates in only one region (Moscow):

- 3 clinical diagnostic centers;
- 27 children's clinics;
- 52 primary care clinics;
- 3 premium segment clinics;
- 1 multifunctional medical center;
- 1 centralized home care department;
- 1 MEDSI Mental Health Center.

Accordingly, obtaining the MEDSI data was somewhat harder, both because of the site's complex structure and because of the hierarchy and size of the network itself. Nevertheless, we managed. True, so far we have limited ourselves to only a few service sections. But more on that later.

So, let's look at the structure of the results. MedSwiss yielded 6,777 paid services across 75 sections. Example data:

If we chart only the sections with more than 100 services, 18 of the 76 sections make the cut:

Dentistry, infectious disease diagnostics and allergology are far ahead of the rest in this ranking.
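The filter behind such a chart can be sketched in a few lines. This is only an illustration, not the code we used: the function name `leading_sections`, the `(section, service name)` row layout and the toy rows are assumptions made for the example.

```python
from collections import Counter


def leading_sections(services, threshold=100):
    """Return (section, count) pairs with more than `threshold` services, largest first."""
    # Count how many services fall into each section.
    counts = Counter(section for section, _name in services)
    # Keep only sections strictly above the threshold.
    leaders = [(section, n) for section, n in counts.items() if n > threshold]
    return sorted(leaders, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy rows, NOT real MedSwiss data.
toy = (
    [("Dentistry", f"service {i}") for i in range(150)]
    + [("Allergology", f"service {i}") for i in range(120)]
    + [("Cardiology", f"service {i}") for i in range(40)]
)

for section, n in leading_sections(toy):
    print(section, n)
```

With the real data, the same filter would be applied to all parsed rows before plotting.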

Now let's look at the price breakdown on a logarithmic scale, by percentile, for values above 300K rubles. On a linear scale, Dentistry shoots up like a rocket and dwarfs everything else.

Out of curiosity, let's remove Dentistry as the leader and look at the shares in % for the top 20:
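As a rough sketch of that calculation: drop the leader, then express each remaining section as a percentage of what is left. The function name `top_shares` and the counts below are made up for illustration, and the sketch assumes the shares are based on the number of services per section.

```python
def top_shares(counts, exclude=(), top_n=20):
    """Percentage share of each remaining section, largest first, limited to top_n."""
    # Drop the excluded sections (for example, the dominant leader).
    kept = {section: n for section, n in counts.items() if section not in exclude}
    total = sum(kept.values())
    if total == 0:
        return []
    ranked = sorted(kept.items(), key=lambda pair: pair[1], reverse=True)
    # Shares are relative to ALL remaining sections, not only the top_n shown.
    return [(section, round(100 * n / total, 1)) for section, n in ranked[:top_n]]


# Hypothetical counts, NOT real MedSwiss figures.
toy_counts = {"Dentistry": 600, "Allergology": 150, "Cardiology": 50}

print(top_shares(toy_counts, exclude={"Dentistry"}))
```

Whether the percentages should be taken against all remaining sections or only against the top 20 is a presentation choice; the sketch uses the former.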

From here you could dig deeper into the largest service sections. But for now, let's stop here.

Now let's see what we have for MEDSI. As mentioned above, we limited ourselves to only 5 service sections:

- Treatment of insomnia and sleep disorders;
- Treatment of headaches;
- Treatment of polyneuropathies;
- MRI (magnetic resonance imaging);
- Appointment with a neurologist.

Comparing the number of services in these 5 categories, we get:

The picture above is very large, so let's limit ourselves to a single section, for example Treatment of headaches:

Perhaps one of the more interesting points is comparing the cost of services between the networks. So far we have not automated such a dashboard. But as an example, let's compare an appointment with a neurologist:

It is noteworthy that MedSwiss prices for the same service vary with the clinic category, while MEDSI prices vary with the specific clinic where the service is provided. It is also clear that the lower bound of the cost of a neurologist appointment at MedSwiss is comparable to the upper bound for the same service at MEDSI. This is an interesting observation, although it concerns only a neurologist. At the same time, remember that MEDSI is many times larger than MedSwiss. Perhaps the networks have different business strategies. But this, of course, should be checked across all services if that is the goal. Perhaps we will do this in the next article, if the comments show that the topic is of interest.
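The comparison above boils down to finding the minimum and maximum price of one service in each network. A minimal sketch, assuming the parsed rows are `(network, service, clinic, price)` tuples; the function name `price_range`, the clinic labels and all prices below are invented for the example and are not the real figures.

```python
def price_range(rows, network, service):
    """Return (min_price, max_price) for a service in one network, or None if absent."""
    prices = [
        price
        for net, svc, _clinic, price in rows
        if net == network and svc == service
    ]
    return (min(prices), max(prices)) if prices else None


# Hypothetical rows in rubles, NOT real MedSwiss or MEDSI prices.
toy_rows = [
    ("MedSwiss", "Neurologist appointment", "clinic A", 4000),
    ("MedSwiss", "Neurologist appointment", "clinic B", 6000),
    ("MEDSI", "Neurologist appointment", "clinic C", 2500),
    ("MEDSI", "Neurologist appointment", "clinic D", 4000),
]

for network in ("MedSwiss", "MEDSI"):
    print(network, price_range(toy_rows, network, "Neurologist appointment"))
```

Running the same function over every service name present in both networks would be one way to check whether the pattern holds beyond the neurologist appointment.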

This concludes part 1 on parsing medical services and analyzing the open data we obtained. We will be glad to receive comments, questions and advice. If you have proposals for cooperation or any other requests, you can write in a personal message here or on Telegram: @gromstorm

P.S. A few words about us. We are a team of 2 people: the product owner and the developer. We have quite a lot of experience in corporate IT. Our first applications were written back in the early 2000s (back then they were called programs).
Over the past few years we have tried many different ideas and hypotheses, and have taken part in and won hackathons, Digital Breakthrough among them. We work at large companies, in website design/development and in the financial sector. This is far from our first attempt at parsing: we have parsed large RuNet portals before, but only now decided to write about it. Until next time, and thanks for reading!
