AutoML Tools in 2024

Let's define the terminology. You can find a dozen definitions of the form “AutoML is…”, with varying levels of detail, but they all boil down to: “AutoML automates and simplifies working with data.” And this is where the difficulties begin: the boundaries of AutoML are blurred. Some frameworks work in “3 lines” of code, some run as a GUI platform, and there are libraries for professionals as well as for beginners.

People turn to AutoML for several reasons: inexperience, laziness, lack of time, or great intelligence. The author most likely belongs to the first three categories.

Perhaps the main thing to understand when introducing AutoML into your projects is that it is not a magic pill for every task. Different types of problems call for different tools. I like to think of the AutoML landscape as a craftsman's workshop filled with screws, screwdrivers, hammers, drills, lathes, and sanders. Entering the workshop is easy, but choosing the right tool, getting a result, and leaving without injury is not always possible.

Which practical problems can AutoML help with?

Data preparation for models
EDA
Feature engineering
Model and hyperparameter selection
Model explainability
Blending and stacking
Deployment*

* For tabular data and classic ML problems solved through regression and classification, AutoML will definitely do well; with time series, multimodal data, and deployment of solutions, there are still open questions.
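To make “model and hyperparameter selection” less abstract, here is a minimal sketch of the loop that AutoML tools automate at a much larger scale: fit several candidate models with different hyperparameters and rank them on held-out data. It uses only the Python standard library; the toy dataset, the k-NN and majority-class candidates, and all function names are hypothetical and chosen purely for illustration. Real frameworks add cross-validation, smarter search strategies, and far more model families.

```python
import random

def make_toy_data(n=200, seed=0):
    """Hypothetical two-class toy data: points scattered around (0, 0) and (2, 2)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2                      # balanced classes 0 and 1
        center = 2.0 * label
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    rng.shuffle(data)
    return data

def majority_model(train):
    """Baseline: always predict the most frequent training label."""
    counts = {}
    for _, y in train:
        counts[y] = counts.get(y, 0) + 1
    top = max(counts, key=counts.get)
    return lambda x: top

def knn_model(train, k):
    """k-nearest-neighbours classifier; k is the hyperparameter being tuned (use odd k)."""
    def predict(x):
        nearest = sorted(
            train,
            key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
        )[:k]
        votes = sum(y for _, y in nearest)
        return 1 if 2 * votes > k else 0
    return predict

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def select_model(train, valid, candidates):
    """Fit every (model, hyperparameter) candidate and rank them by validation accuracy."""
    leaderboard = []
    for name, build in candidates.items():
        model = build(train)
        leaderboard.append((accuracy(model, valid), name))
    leaderboard.sort(reverse=True)         # best score first
    return leaderboard

data = make_toy_data()
train, valid = data[:150], data[150:]

candidates = {"majority": majority_model}
for k in (1, 5, 15):
    # the default argument binds the current k for each candidate
    candidates[f"knn(k={k})"] = lambda tr, k=k: knn_model(tr, k)

leaderboard = select_model(train, valid, candidates)
for score, name in leaderboard:
    print(f"{name:>10}: {score:.2f}")
```

The leaderboard at the end is the same idea you get from AutoML tools, just with four candidates instead of hundreds.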

Which AutoML should I try?

Below is a brief overview of AutoML tools (current as of May 2024).

AutoGluon – “Fast and Accurate ML in 3 Lines of Code”. A library from the Amazon team. In 2023–2024, AutoGluon is perhaps the most promising library for squeezing everything out of your data. Note, however, that the “3 lines of code” are mostly marketing: the API documentation runs well beyond a single page. AutoGluon has three modules: Tabular, Multimodal, and Time-series. AutoGluon's superpower is blending and stacking models.
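To show what “blending” means in practice, here is a minimal sketch of greedy ensemble selection (the Caruana-style technique): starting from nothing, repeatedly add, with replacement, the base model whose inclusion lowers the validation loss the most, and keep the best ensemble seen. As far as I know, AutoGluon's final weighted ensemble is built with a procedure of this kind on top of its stacked models, but this is my own simplified sketch, not AutoGluon's code. The model names, labels, and predictions are hypothetical toy values.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def blend(weights, preds):
    """Weighted average of each model's predicted probabilities."""
    n = len(next(iter(preds.values())))
    return [sum(w * preds[m][i] for m, w in weights.items()) for i in range(n)]

def greedy_ensemble(preds, y_true, n_iter=20):
    """Greedy forward selection with replacement (Caruana-style).

    preds: model name -> out-of-fold probabilities for the positive class.
    Returns (weights, validation loss) of the best ensemble seen.
    """
    chosen = []
    running_sum = [0.0] * len(y_true)
    best_loss, best_size = float("inf"), 0
    for _ in range(n_iter):
        size = len(chosen) + 1
        # try adding each model once more and keep the one that helps most
        name, loss = min(
            (
                (m, log_loss(y_true, [(s + q) / size for s, q in zip(running_sum, p)]))
                for m, p in preds.items()
            ),
            key=lambda t: t[1],
        )
        chosen.append(name)
        running_sum = [s + q for s, q in zip(running_sum, preds[name])]
        if loss < best_loss:
            best_loss, best_size = loss, size
    best = chosen[:best_size]
    # a model picked twice gets twice the weight
    weights = {m: best.count(m) / best_size for m in preds if m in best}
    return weights, best_loss

# Hypothetical validation labels and out-of-fold predictions of three base models.
y_val = [1, 0, 1, 1, 0, 0, 1, 0]
val_preds = {
    "model_a": [0.9, 0.3, 0.6, 0.4, 0.2, 0.6, 0.8, 0.1],
    "model_b": [0.6, 0.1, 0.8, 0.7, 0.4, 0.2, 0.5, 0.3],
    "model_c": [0.7, 0.5, 0.5, 0.6, 0.5, 0.4, 0.6, 0.5],
}

weights, ens_loss = greedy_ensemble(val_preds, y_val)
print(weights, round(ens_loss, 4))
```

Because the first step always picks the best single model and only improvements are kept, the resulting blend is never worse on validation than the best individual model. Selecting with replacement is what turns a plain “pick the winner” into integer-ratio weights.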

H2O-3 – an AutoML library from H2O.ai. It is built and maintained by a brilliant team of data scientists whose names you can see near the top of almost any Kaggle competition leaderboard. The library's superpowers are its Java core, its GUI, and its Python interface.

DriverlessAI is a commercial product and platform from H2O.ai. If your organization has an impressive spare budget for data science, your data scientists will feel like kids in a candy store with DriverlessAI. Just look at this interface…

However, if you have a budget for DriverlessAI, then why do you need a team of data scientists?

BlueCast is a framework created by a single developer, Kaggler and enthusiast Thomas Meissner. BlueCast's superpowers are EDA, model explainability, speed, and its product philosophy. Thomas has published many Kaggle notebooks with usage examples, and BlueCast has grown significantly over the past year. Support the author with a star on GitHub; it means a great deal to him.

LightAutoML (LAMA) – a powerful open-source AutoML framework backed by Sber AI Lab, one of the strongest teams in terms of DS expertise. LAMA's superpowers are blending and custom experiments. At the same time, LAMA is more of a scalpel for professionals. There hasn't been an update for a long time; I really hope we see one soon.

MLJAR – an AutoML project created in 2016 that is still regularly updated and supported by its creators. MLJAR's superpowers are its stability and ease of configuration. In almost any test or comparison of AutoML frameworks, MLJAR ends up close to the leaders.

PyCaret – “Low-Code Machine Learning”. A well-known open-source project created by Moez Ali and community enthusiasts, with more than 8,000 stars on GitHub. PyCaret's superpowers are its modularity, its low-code approach, and its documentation. If you're new to DS and want to try AutoML while understanding what's going on under the hood, start with PyCaret and its excellent website.

What didn't I cover in AutoML (but would have liked to)?

A few notable AutoML tools that I hope someone will cover in the comments:

What about AutoML benchmarks?

If you want to compare AutoML frameworks objectively, based on results rather than impressions, this recent study is perhaps the ideal starting point:

AMLB: an AutoML Benchmark

https://jmlr.org/papers/volume25/22-0493/22-0493.pdf

https://automlbenchmark.streamlit.app/
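When a benchmark covers dozens of datasets, per-dataset scores have to be collapsed into one comparison. One common way to do this (it is, for example, what critical-difference diagrams are built on) is the average rank: rank the frameworks on each task, give tied frameworks the mean of their positions, and average over tasks. The sketch below shows only that arithmetic; I'm not claiming it reproduces AMLB's exact methodology, and the framework names and scores are made up.

```python
def average_ranks(scores):
    """Average rank of each framework across tasks (rank 1 = best; ties share the mean rank).

    scores: task -> {framework -> score}, higher is better, every framework on every task.
    """
    totals = {}
    for by_framework in scores.values():
        ordered = sorted(by_framework.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            # find the block of frameworks tied with position i
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            rank = (i + j) / 2 + 1        # mean of 1-based positions i+1 .. j+1
            for framework, _ in ordered[i : j + 1]:
                totals[framework] = totals.get(framework, 0.0) + rank
            i = j + 1
    return {fw: total / len(scores) for fw, total in totals.items()}

# Made-up scores for three hypothetical frameworks, only to show the arithmetic.
scores = {
    "task_1": {"framework_a": 0.91, "framework_b": 0.89, "framework_c": 0.85},
    "task_2": {"framework_a": 0.78, "framework_b": 0.80, "framework_c": 0.78},
    "task_3": {"framework_a": 0.66, "framework_b": 0.64, "framework_c": 0.60},
}

ranks = average_ranks(scores)
for fw, r in sorted(ranks.items(), key=lambda kv: kv[1]):
    print(f"{fw}: {r:.2f}")
```

Here framework_a averages 1.50: it is first on task_1 and task_3 and shares ranks 2 and 3 with framework_c on task_2. Ranks ignore how large the score gaps are, which is exactly why benchmarks usually show them alongside normalized scores.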

You will see this picture on most tests:

Where to try AutoML in 2024?

On May 1, 2024, the 2024 AutoML Grand Prix starts on Kaggle. It is a 5-month hackathon-style competition that runs in parallel with the Playground competitions.

However, in spirit, this competition is not about choosing the best AutoML framework but about “you have 24 hours – do what you want”, and with a framing like that, almost any approach fits.