How I Would Study Data Science If I Had to Start Over, or A Guide to Learning Data Science Effectively

When I first started my journey into data science, I spent a lot of time figuring out where to start, what to learn first, and which resources to use. Over the past two years, I have learned a few things I wish I had known earlier, such as whether to focus on programming or statistics first, which resources to use to learn new skills, and how to approach learning them. This article is written to provide direction and ideas for those who are studying data science.


Table of contents:

Introduction
1. Mathematics and statistics
2. Fundamentals of programming
3. Algorithms and concepts of machine learning
4. Projects in the field of data science

Introduction

My guess is that as a budding data scientist, you will want to fully understand the concepts and details of various machine learning algorithms, data science concepts, and so on.
Therefore, I recommend that you start with the fundamentals before you even look at machine learning algorithms or data analysis applications. If you do not have a basic understanding of calculus and integrals, linear algebra, and statistics, it will be difficult for you to understand the mechanics behind the various algorithms. Likewise, if you don’t have a basic understanding of Python, it will be difficult for you to translate your knowledge into real-world applications. Below is the order of topics that I recommend studying:

  1. Mathematics and statistics.
  2. Basics of programming.
  3. Machine learning algorithms and concepts.


1. Mathematics and statistics

As with everything else, you should learn the basics before getting into the fun stuff. Trust me, it would have been much easier for me if I had learned math and statistics before getting started with machine learning algorithms. The three general topics that I recommend looking at are calculus / integrals, statistics, and linear algebra (in no particular order).

Integrals

Integrals are important when it comes to probability distributions and hypothesis testing. While you don’t need to be an expert, it’s in your best interest to learn the basics of integrals. The first two articles are intended for those who want to get an idea of what integrals are, or for those who just need to brush up on their knowledge. If you know absolutely nothing about integrals, I recommend that you take the Khan Academy course. Finally, here are links to a number of practice problems to hone your skills.
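To make the link between integrals and probability concrete, here is a minimal sketch (my own illustration, not taken from any of the resources mentioned). It assumes a standard normal distribution and approximates the probability of landing within 1.96 standard deviations of the mean by integrating the density numerically. The helper names `normal_pdf` and `integrate` are made up for this example.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Probability that a standard normal variable lies between -1.96 and 1.96,
# computed as the area under the density curve.
p_numeric = integrate(normal_pdf, -1.96, 1.96)

# The same probability in closed form, via the error function.
p_exact = math.erf(1.96 / math.sqrt(2.0))

print(f"numeric: {p_numeric:.4f}, closed form: {p_exact:.4f}")
```

Both numbers come out at roughly 0.95, which is where the familiar "95% within 1.96 standard deviations" rule used in hypothesis testing comes from.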

Statistics

If there is any topic that you should focus on, it is statistics. After all, a data scientist is truly a modern statistician, and machine learning is a modern term for statistics. If you have time, I recommend that you take the Georgia Tech course called “Statistical Methods”, which covers the basics of probability, random variables, probability distributions, hypothesis testing, and more. If you don’t have time to devote to this course, I highly recommend watching the Khan Academy videos on statistics.
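As a taste of what hypothesis testing looks like in practice, here is a small sketch of a two-sided permutation test for a difference in means, using only the Python standard library. The data is made up for illustration, and `permutation_test` is a hypothetical helper written for this example, not something taken from the courses above.

```python
import random


def permutation_test(a, b, n_resamples=5_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    n_a = len(a)
    observed = sum(a) / n_a - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance guards against floating-point rounding in ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Made-up measurements for two groups (illustrative data only).
group_a = [12.1, 11.8, 12.6, 12.9, 12.4, 13.0, 12.2, 12.7]
group_b = [11.2, 11.5, 11.9, 11.1, 11.7, 11.4, 12.0, 11.3]

diff, p_value = permutation_test(group_a, group_b)
print(f"difference in means: {diff:.2f}, p-value: {p_value:.4f}")
```

A small p-value means that a difference this large would rarely appear if the two groups really came from the same distribution.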

Linear algebra

Linear algebra is especially important if you want to dive into deep learning, but even if you don’t, it is useful for other fundamental machine learning concepts such as principal component analysis and recommender systems. For mastering linear algebra, I also recommend Khan Academy!
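To show where linear algebra appears in principal component analysis, here is a minimal sketch with NumPy (assuming it is installed). It is a from-scratch illustration on synthetic data, not a replacement for a library implementation: center the data, compute the covariance matrix, take its eigen-decomposition, and project onto the leading eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data stretched along one direction (illustrative only).
x = rng.normal(size=200)
data = np.column_stack([x, 0.5 * x + rng.normal(scale=0.1, size=200)])

# 1. Center the data so each column has zero mean.
centered = data - data.mean(axis=0)

# 2. Covariance matrix of the features (columns).
cov = np.cov(centered, rowvar=False)

# 3. Eigen-decomposition; eigh is intended for symmetric matrices
#    and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort components by decreasing variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Project the data onto the first principal component.
projected = centered @ eigenvectors[:, :1]

explained = eigenvalues / eigenvalues.sum()
print("explained variance ratio:", np.round(explained, 3))
```

Because the second column is almost a multiple of the first, nearly all of the variance ends up in the first component.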


2. Fundamentals of programming

Just as a fundamental understanding of math and statistics is important, a fundamental understanding of programming will make your life so much easier, especially when it comes to implementation. Therefore, I recommend that you take the time to learn two basic languages, SQL and Python, before diving into machine learning algorithms.

SQL

It doesn’t matter which one you start with, but I would start with SQL. Why? It is easier to learn, and it is useful to know if you work at a company that deals with data, even if you are not a data scientist.

If you are new to SQL, I recommend starting with the Mode tutorials on SQL, as they are very concise and detailed. If you want to learn more advanced concepts, take a look at the list of resources where you can learn advanced SQL.
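If you want to try a query right away without installing a database, here is a small sketch that runs SQL through Python's built-in `sqlite3` module. The `orders` table and its rows are made up for illustration. The query shows the kind of filtering and aggregation (`GROUP BY`, `HAVING`, `ORDER BY`) that introductory tutorials typically cover.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5), ("bob", 2.5)],
)

# Total spend per customer, keeping only customers who spent more than 10.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for customer, n_orders, total in rows:
    print(customer, n_orders, total)

conn.close()
```

Note that SQLite's dialect differs in small ways from PostgreSQL or MySQL, but the query above uses only standard constructs.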

Below are a few resources that you can use to practice SQL:

Python

I started out with Python and will probably stay with this language for the rest of my life. It is far ahead in terms of open-source contributions and is easy to learn. Feel free to turn to R if you want, but I have no opinions or advice on R. I have found that learning Python through practice is much more rewarding. That said, after taking several Python crash courses, I came to the conclusion that this course is the most complete (and free!).

Pandas

Perhaps the most important library to know is Pandas, which is specifically designed for data manipulation and analysis. Below are two resources that should speed up your learning. The first link is a tutorial on how to use Pandas, and the second link contains many practice problems that you can solve to solidify your knowledge!
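For a feel of what data manipulation with Pandas looks like, here is a short sketch (assuming `pandas` is installed) that builds a small DataFrame from made-up sales records, derives a column, filters rows, and aggregates per group. The column names are invented for this example.

```python
import pandas as pd

# Made-up sales records (illustrative data only).
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south", "north"],
        "units": [10, 4, 7, 3, 6, 5],
        "price": [2.0, 3.5, 2.0, 5.0, 3.5, 2.0],
    }
)

# Derive a new column from existing ones.
df["revenue"] = df["units"] * df["price"]

# Keep rows with at least 4 units, then aggregate per region.
summary = (
    df[df["units"] >= 4]
    .groupby("region", as_index=False)
    .agg(total_units=("units", "sum"), total_revenue=("revenue", "sum"))
    .sort_values("total_revenue", ascending=False)
)
print(summary)
```

This is the same filter-then-aggregate pattern as a SQL `WHERE` followed by `GROUP BY`, which is one reason the two skills reinforce each other.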


3. Algorithms and concepts of machine learning

If you’ve gotten to this part of the article, it means you’ve built your foundation and are ready to learn the interesting things. This part is split into two sections: machine learning algorithms and machine learning concepts.

Machine learning algorithms

The next step is to learn about the various machine learning algorithms, how they work and when to use them. Below is a partial list of the various machine learning algorithms and resources that you can use to learn each of them.
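As one small illustration of what "how they work" can mean, here is a sketch of simple linear regression fitted with the closed-form least-squares solution in plain Python. Choosing linear regression as the example is my assumption, the data is made up, and `fit_line` is a hypothetical helper. In practice you would normally reach for a library, but writing an algorithm out once helps the mechanics stick.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie roughly on a straight line (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept
print(f"slope: {slope:.2f}, intercept: {intercept:.2f}, prediction at x=6: {prediction:.2f}")
```

The slope formula is exactly the covariance of x and y divided by the variance of x, which is where the statistics from part 1 pays off.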

Machine learning concepts

In addition, there are a few fundamental machine learning concepts that you will want to learn as well. Below is a (non-exhaustive) list of concepts that I highly recommend learning. Many interview questions are based on these topics!


4. Projects in the field of data science

By this point, you will not only have built a solid foundation, but you will also have a solid understanding of the fundamentals of machine learning. Now it’s time to work on some personal side projects. If you want to see some simple examples of data science projects, check out some of my projects:

  • Predicting wine quality using classification methods (article, Github).
  • Visualizing coronavirus data with Plotly (article, Github).
  • Movie recommendation system with collaborative filtering (Github).

Here is a list of data science projects that you can look through to come up with an interesting side project.

I hope this post will give you direction and help in your career in Data Science. There is no silver bullet, so feel free to take this post with a grain of salt, but I do believe that learning the basics will pay off in the future.