Custom Machine Learning

Time series

This section covers materials on time series. There are relatively few of them, so, as in several previous sections, they are not divided by type.

Time series are used in statistics, signal processing, pattern recognition, econometrics, finance, weather forecasting, earthquake prediction, electroencephalography, astronomy, and any other field in which measurements change over time.

Time series

Time series example

An article on time series from the SHAD machine learning textbook. It first introduces the concept of a time series with examples, then explains how to reduce time series forecasting to a regression problem, and ends with a section on time series decomposition.
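
The reduction to regression mentioned above can be sketched as follows. This is a minimal illustration, not the textbook's own code: the helper name `make_lag_features`, the toy series, and the choice of plain least squares are all assumptions made for the example. Each training row holds the previous `n_lags` values, and the target is the value that follows them.

```python
import numpy as np

def make_lag_features(series, n_lags):
    """Turn a 1-D series into (X, y): each row of X holds n_lags
    consecutive values, and y holds the value that follows them."""
    values = np.asarray(series, dtype=float)
    X = np.array([values[i:i + n_lags] for i in range(len(values) - n_lags)])
    y = values[n_lags:]
    return X, y

# Toy series, made up for illustration: a linear trend 0, 2, 4, ...
series = [2.0 * t for t in range(20)]
X, y = make_lag_features(series, n_lags=3)

# Ordinary least squares with an intercept column appended to X.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# One-step-ahead forecast from the last three observed values.
last_window = np.array(series[-3:] + [1.0])
forecast = float(last_window @ coef)
print(forecast)
```

Once the series is in this tabular form, any regressor (linear models, gradient boosting, and so on) can be trained on `X` and `y`. Splits for validation should still respect time order.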

Topic 9. Time Series Analysis with Python

The Open Machine Learning Course, discussed in the second article, devotes Topic 9 to time series analysis.

It covers how to work with time series in Python and which methods and models can be used for forecasting: what double and triple exponential smoothing are, what to do when a series is not stationary, how to build a SARIMA model without losing your mind, and how to forecast with XGBoost.
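
As a rough idea of what exponential smoothing looks like in code, here is a minimal sketch of the simple and double (Holt) variants in plain Python. It is not taken from the course; the function names and the initialization of the level and trend are assumptions chosen for brevity.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing:
    s_t = alpha * x_t + (1 - alpha) * s_{t-1}, with s_0 = x_0."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def double_exponential_smoothing(series, alpha, beta):
    """Holt's linear method: track a level and a trend.
    Returns the smoothed levels plus a one-step-ahead forecast."""
    # Assumed initialization: first value as level, first difference as trend.
    level, trend = series[0], series[1] - series[0]
    result = [series[0]]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        result.append(level)
    result.append(level + trend)  # forecast for the next, unseen step
    return result

simple = exponential_smoothing([3.0, 5.0], alpha=0.5)
holt = double_exponential_smoothing([0.0, 1.0, 2.0, 3.0, 4.0], alpha=0.5, beta=0.5)
print(simple, holt)
```

Triple exponential smoothing (Holt-Winters) adds a seasonal component on top of the level and trend. In practice a library implementation is usually preferable to hand-written code.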

Time series forecasting

Lecture by K.V. Vorontsov about time series.

Time Series

Kaggle Learn's introduction to time series.

Forecasting time series with gradient boosting: Skforecast, XGBoost, LightGBM, Scikit-learn and CatBoost by Joaquín Amat Rodrigo, Javier Escobar Ortiz

This guide shows how to use the skforecast library for time series forecasting with models from XGBoost, LightGBM, Scikit-learn and CatBoost.

Gruzdev A.V., Rafferty G. Time Series Forecasting with Prophet, sktime, ETNA and Greykite (paid)

Forecasting is a data science task that is central to many activities within an organization. The book is dedicated to the popular time series forecasting libraries Prophet, sktime, ETNA and Greykite. It analyzes the mathematical foundations and the API of each library, shows examples of forecasting, classification and clustering of time series, and illustrates feature engineering and feature selection for time series.
The forecasting examples use data from a variety of areas: the level of carbon dioxide in the atmosphere, sunspot cycles, local precipitation, the number of likes on popular social networks, and so on. The publication will be of interest to data scientists who regularly solve problems with time series.

On his Telegram channel, the author also offers “Applied analysis of time series in Python in 4 volumes”, which, in addition to this book, includes a set of materials called “The classics are immortal” (preliminary analysis of the series, simple models, feature construction, validation strategies, AR, MA, (S)ARIMA(X), ETS, VAR, TBATS/BATS, gradient boosting, series clustering, hierarchical series). The manual runs to 1200 pages.

Big Data

This topic stands somewhat apart from the rest of the selection, because it describes not the problem being solved but the tool used to solve it. However, companies with large amounts of data often require knowledge of these tools (namely, Spark) and ask about them in interviews, so I decided to include this section in the article.

Analyzing data using the Spark framework from VK

A great, quick guide to using the Spark API.

Introducing Apache Spark from DataLearn

Module 7 of the course Introduction to Data Engineering and Analytics introduces Apache Spark, an open source solution for analytics and data engineering, along with its commercial version Databricks and the analogous services AWS Glue and Azure Synapse. You will learn about examples of using Spark in industry and its popular use cases. The author talks about his experience with Apache Spark at Amazon and Microsoft, teaches you how to work with data using PySpark and Spark SQL, and shares the best books and materials on the topic.

  1. Introduction

  2. What is Apache Spark

  3. Getting started with Apache Spark

  4. Introducing the Spark API

  5. Spark SQL and operations in Spark

⭐ Spark in Action by Jean-Georges Perrin

Enterprise data analysis begins with reading, filtering, and combining files and streams from many sources. Spark's data processing engine is a proven leader in handling these diverse volumes of information, delivering speeds up to 100 times faster than Hadoop on some workloads. With SQL support, an intuitive interface, and a simple, clear multi-language API, you can use Spark without having to dig deep into a complex new ecosystem.

This book will teach you how to create complete analytics applications. As an example, a complete pipeline for processing data coming from NASA satellites is used.

Learning Spark by Jules S. Damji, Brooke Wenig, Tathagata Das & Denny Lee

This book offers a structured approach to learning Apache Spark, covering new developments in the project.

Data Analysis with Python and PySpark by Jonathan Rioux

This is your guide to running successful data science projects using Python. Filled with relevant examples and techniques, this practical book will teach you how to create pipelines for reporting, machine learning, and more. The exercises in each chapter will help you practice what you've learned and get you started using PySpark quickly.

Let's sum it up

If you haven't done so yet, I recommend reading the blocks “Learning How to Learn” and “Let's summarize the results” from the first article, since everything said there also applies to preparing for the section on specialized machine learning.

The materials collected in this article will be useful when preparing for interviews for various positions at MegaFon Big Data.

And if you are just starting your career in Data Science, pay attention to internships at large companies, where you can not only improve your knowledge but also gain valuable experience applying theory to practical business problems. At MegaFon, an example of such an internship is the accelerator (email with the subject “internship in big data”), through which Data Scientists, Data Analysts and Data Engineers find their first jobs every year.

What's next?

In the next article we will look at materials for preparing for machine learning system design.

You can find the latest resources for this series of articles in the repository Data Science Resources, which will be maintained and updated. This time, more materials have been posted in the repository than were included in this article (because there are a lot of them), so I advise you to go there and find what you need.

You can also subscribe to my Telegram channel Data Science Weekly, in which I share interesting and useful materials every week.

If you know of any cool resources that I didn't include in this list, please write about them in the comments.

P.S. Thanks to Daria Shatko for editing and proofreading this post!
