The Best of Kaggle: What Competitive Data Science Is and How to Succeed in It



Hello, Habr! On the blog on our website, we regularly publish articles about data and everything related to it. We also republish some of those materials here.

When hiring, how do companies tell which data scientist is stronger? How can you show your talent and make a name for yourself in the community? What is the rating based on, the one that can later land you a prestigious position? We will tell you about the most famous competitive platform, the opportunities it offers and the rules of its game, and also reveal the list of the best participants from Russia.


Data science is, by definition, a science. That is why, for a long time, developers and analysts have been evaluated with the Hirsch index, a metric widely used among scientists. Based on the number of publications and their citations, it shows how much a researcher's work is in demand, and therefore how much its author is.

The Hirsch index h is the largest number such that the author has h articles, each cited at least h times. To calculate it, take all of the scientist's articles that colleagues have cited, sort them in decreasing order of citation count, and number them starting from 1. Then find the last article whose number does not exceed its citation count. That article's number is the Hirsch index.
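The procedure just described translates almost line for line into code. Below is a minimal Python sketch; the function name `h_index` and the sample citation counts are illustrative and not taken from any particular library or dataset.

```python
def h_index(citations: list[int]) -> int:
    """Return the Hirsch index for a list of per-article citation counts."""
    # Sort citation counts in decreasing order.
    ranked = sorted(citations, reverse=True)
    h = 0
    # Number the articles from 1 and keep the last number that does not
    # exceed that article's citation count.
    for number, cites in enumerate(ranked, start=1):
        if number <= cites:
            h = number
        else:
            # Counts only decrease from here while numbers grow,
            # so no later article can satisfy the condition.
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # → 4
```

In the usage line, the fourth article has 4 citations, but the fifth has only 3, which is fewer than its number 5, so the index stops at 4.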

Complicated? Not very. But practicing data scientists see right away that it is simply not well suited to evaluating their work. After all, the result of their work is far more often code than a scientific paper. Besides, data scientists are in demand on the job market, and the market cares more about working examples of algorithms than about scientific achievements.

However, companies often keep information about their employees and their work secret. Data scientists are guarded especially closely in Russia, where there is a severe shortage of specialists in this field.

In response to this demand, competitive platforms for developers have grown in popularity. The best-known service is Kaggle (pronounced "KAG-ul"), which is owned by Google. It is used by students, and professional developers share tips on how to boost your rating. The solutions used there set trends among data scientists, and companies in Russia and around the world look at candidates' positions in the Kaggle rankings when hiring.

By 2017, Kaggle had more than a million registered users, and in August 2020 users from Russia searched Google for the service almost as often as for the phrase "Big Data".

Kaggle is completely free: anyone can host a data analysis competition or take part in an existing one. The platform hosts open datasets and also provides cloud tools for data processing and machine learning. It also offers learning materials and a job board, where competitions can help employers select the best candidates.

How it works

One of Kaggle's most interesting features, and one of the reasons it has become so popular in the data science community, is its rating system.

Users can earn points and improve their ranking in four different categories:

  • Competition. Alone or as part of a team, you solve machine learning problems. Competitions vary widely: from the simple and straightforward task of predicting which passengers survived the Titanic to evaluating defensive performance against passing plays in the NFL Big Data Bowl 2021.
  • Code. Share your code with the community by running it in Kaggle Notebooks, a cloud computing environment.
  • Datasets. You can help other data scientists by sharing new data.
  • Discussions. Discuss tasks and share your best solutions, as well as rate other users’ posts.

Progress in each category is independent of the others. Each category has several achievement tiers:

  • Novice. You just need to register.
  • Contributor. You have filled out your profile, interacted with the community, and tried all of the platform's main features:
    – Ran one script.
    – Entered one competition.
    – Posted one comment.
    – Cast one vote for another participant.
  • Expert. You have done a significant amount of work on Kaggle in one or more categories and earned bronze medals. The required number of medals differs by category, and once you reach this tier, you appear in the Kaggle ranking for that category.
  • Master. To reach this tier, you must demonstrate excellence in one or more Kaggle categories and earn silver or gold medals, depending on the category. Masters in the "Competition" category are eligible for exclusive competitions that are closed to other users.
  • Grandmaster. You consistently show outstanding performance and receive gold medals. You are the best of the best.

Medals are awarded for top competition results, popular code, or useful datasets, and they are kept permanently. Points, on the other hand, lose value over time, which keeps the overall ranking up to date.
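The article does not give the formula behind competition points. As an illustration only, the sketch below assumes the formula Kaggle published when it revised its ranking system in 2015: points shrink with the square root of team size, with rank raised to the power -0.75, grow slowly with the number of teams, and decay by a factor of e^(-t/500), where t is the number of days since the competition ended. Kaggle's current formula may differ, and the function name and parameters are hypothetical.

```python
import math


def competition_points(rank: int, n_teams: int, n_teammates: int,
                       days_elapsed: float) -> float:
    """Points one team member earns from a single competition.

    Assumes Kaggle's 2015 published formula; the live system may differ.
    """
    team_share = 100_000 / math.sqrt(n_teammates)      # split among teammates
    rank_factor = rank ** -0.75                         # better ranks earn more
    size_factor = math.log10(1 + math.log10(n_teams))   # bigger contests earn more
    decay = math.exp(-days_elapsed / 500)               # points fade over time
    return team_share * rank_factor * size_factor * decay


fresh = competition_points(rank=1, n_teams=1000, n_teammates=1, days_elapsed=0)
year_later = competition_points(rank=1, n_teams=1000, n_teammates=1, days_elapsed=365)
print(round(fresh), round(year_later / fresh, 2))
```

Under these assumptions, a solo first place in a 1,000-team competition is worth about 60,206 points at the deadline and keeps only about 48% of that a year later, while medals for the same result would not expire.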

Who comes first?

Most registered Kaggle users are from India and the USA. Russia holds a steady fifth place in the overall country ranking, between China and Japan. First place in the overall ranking of data science competitions belongs to Guanshuo Xu, a data scientist based in New York. In five years, he has earned more than 255 thousand points in Kaggle competitions, an all-time record.

Guanshuo received a bachelor's degree in Electrical and Electronic Engineering from Tongji University in Shanghai and then enrolled in a master's program at the University of New Jersey. He has been working on image recognition and machine learning algorithms since 2010, became a Kaggle Grandmaster for the first time in 2017, and since 2019 has worked as a Data Scientist at H2O.ai (Cisco, Intel and PayPal use this company's algorithms).

The best data scientists from Russia according to Kaggle

To compile the list of the best practicing data scientists in Russia, we used data on Kaggle competition participants who have filled in personal information in their profiles.

The strongest Russian developer in Kaggle competitions, Dmitry Gordeev (dott), also works at H2O.ai. He signed up for Kaggle eight years ago and currently has 114,000 points.

He is ninth in the overall Kaggle ranking. Dmitry graduated from Moscow State University in 2010, where he worked on image recognition and data mining. He joined a bank's retail risk modeling group in 2008, rose to division director, and moved to Austria in 2013. In 2014 he took a data science course on Coursera, and in 2020 he joined the H2O.ai team.

Second place among Russian data scientists in the Kaggle competition ranking goes to Arthur Kuzin (n01z3), who is 28th in the overall Kaggle ranking with more than 71 thousand points.

Arthur graduated from the Moscow Institute of Physics and Technology in 2011 and worked in research analytics from 2008 to 2016. He then joined Avito as a Data Scientist, and for the past few years he has led the Computer Vision team at X5 Retail Group. Arthur has several publications in physics and a patent for a device for calibrating transmission electron microscopes.

Third place among Russians in the Kaggle competition ranking goes to Artem Kulakov (Art): he is 29th overall with 71 thousand Kaggle points, which he earned in two years of competing. Artem is studying Computer Science at the Higher School of Economics and has already worked as a Data Analyst at Tinkoff Bank and Megafon. He now works as a freelancer, specializing in Computer Vision and NLP tasks.

In fourth place is Roman Soloviev (ZFTurbo), with 69 thousand points and 31st place in the overall Kaggle competition ranking. Roman is a leading researcher at the Institute for Design Problems in Microelectronics of the Russian Academy of Sciences.

In fifth place is Ilya Larchenko (ilialar), currently 37th in the overall Kaggle ranking with 65 thousand points. Ilya graduated from the Moscow Institute of Physics and Technology in 2014 and then worked as an analyst and developer. From 2017 he led the data science team at DOC+, and in 2020 he moved to Thailand, where he works as a Data Science Manager at Agoda.

A small element of gamification that allows users to earn points and medals in Kaggle competitions has changed the hiring game.

The example of the best data scientists from Russia shows that formal education and years of experience with data matter less for a successful career than one might think. Artem Kulakov, for instance, is still at university and started competing on Kaggle only two years ago. He is already on the list of the best data scientists in Russia and works as a freelancer. Guanshuo Xu graduated with a bachelor's degree in Electrical and Electronic Engineering and now works at H2O.ai, a leader in open source data science solutions.

Start with simple tasks today, and who knows: in a year or two you may join the ranks of the best data scientists and drive progress by building technology for HIV research, models that predict highway congestion, and much more. The main thing is the desire to grow in data science and to practice as much as possible.


