How we automated data exports and other ad-hoc analyst tasks using Zeppelin

At the time of writing, Cardsmobile, which develops the Wallet mobile app, employs 195 people: 8 analysts and 187 potential customers of analytics. We build an app for end users and also work with retailers, banks, brands and other partners. For a long time, an analyst's work at Wallet consisted not only of studying user behavior but also of assorted data exports, standard analyses for partners and forecasts for prospective clients. Dashboards were, of course, a huge lifesaver and let the whole company track product performance. But we still lost time on the rest of the routine, and as the team (our customers) and the business grew, we got stuck: there were too many ad-hoc tasks, and research, professional growth and that bright future were on hold for lack of time.

There are so many great conferences around, and so many interesting articles about analytical research, data science, data-driven this and data-happiness that. We looked at all this beauty and could not see where to find time for experiments amid the constant stream of routine. Many people talk about how to do things right, but few explain how to cope with a growing routine workload and free up resources for interesting, creative tasks. In this article I will share our experience of getting to that brighter future, with examples of how we automate analysts' ad-hoc tasks in Zeppelin.


What is Zeppelin

Zeppelin is an open-source notebook from Apache that lets you work with different databases through interpreters for different languages (Python, R, SQL, Spark). But what makes it especially exciting is its set of interactive elements: dynamic forms.

In a single notebook we can pull data from Amplitude via its API, quickly read aggregates from ClickHouse, supplement the result with data from MSSQL and process it all in Python. The finished reports can then be packaged into Excel in a format convenient for the customer and exposed as an HTML link from which they are easy to download.
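
The last step, handing a file over through a link, can be done in several ways. Below is a minimal sketch, assuming the export is small enough to embed in the page: the CSV is base64-encoded into a data URI, and the paragraph prints the link with Zeppelin's %html display directive, which makes the output render as HTML. The function names are hypothetical; for large files it is more sensible to save the file on a shared server and link to it instead.

```python
import base64
import csv
import html
import io


def rows_to_csv(header, rows):
    """Serialize a header and rows into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def csv_download_link(csv_text, filename, label="Download"):
    """Build an <a> tag that embeds the CSV as a base64 data URI.

    utf-8-sig adds a byte-order mark so Excel detects the UTF-8 encoding.
    """
    payload = base64.b64encode(csv_text.encode("utf-8-sig")).decode("ascii")
    return (
        f'<a download="{html.escape(filename)}" '
        f'href="data:text/csv;base64,{payload}">{html.escape(label)}</a>'
    )


csv_text = rows_to_csv(["card_id", "issued_at"], [["1001", "2021-03-01"]])
# In a Zeppelin Python paragraph, output starting with %html is rendered as HTML.
print("%html " + csv_download_link(csv_text, "cards.csv"))
```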

Initially we used it simply as a notebook in which it was convenient to write in different languages. Then we studied Zeppelin's capabilities more closely and found the built-in dynamic forms: text inputs, drop-down lists and checkbox lists, and a light bulb went on! We immediately saw how much we could automate. We had many standard tasks with ready-made code in which we only had to change the values of variables. We moved all that code to Zeppelin, turned the variables into dynamic forms and let customers fill them in and run the scripts on their own. Both we and the rest of the team liked the idea!

Which dynamic forms are available

Input – a text field. We use it to set a time range or to enter identifiers; in short, for any value that can vary widely.


Select – a drop-down list. You can attach a ready-made piece of code to each item in the list. We let the user choose one of several standard options, for example one of the metrics for a standard report.


Checkbox – a form for selecting several options at once. We give it to the user so that they can, for example, pick the fields they need in an export. This is probably our most popular case. We also use it to let users select several metrics or user segments.
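
A sketch of how such form values can turn into a query, assuming a ClickHouse-style events table (the table, column and metric names are hypothetical). The select maps each option to a ready-made SQL snippet, the checkbox picks the grouping columns, and two text inputs set the date range. Column names cannot be passed as query parameters, so the sketch checks them against a whitelist. In a Zeppelin Python paragraph the values would come from z.textbox, z.select and z.checkbox (older releases use z.input instead of z.textbox, so check the docs for your version); SQL paragraphs can also declare forms inline with ${name=default} templates.

```python
from datetime import date

# Hypothetical select options: each metric maps to a ready-made ClickHouse expression.
METRICS = {
    "active_users": "uniqExact(user_id)",
    "cards_added": "countIf(event = 'card_added')",
}
# Hypothetical columns a user may tick in the checkbox form.
ALLOWED_COLUMNS = ["partner_id", "platform", "event_date"]


def build_report_query(date_from, date_to, metric, columns):
    """Turn dynamic-form values into a SQL query string."""
    # fromisoformat raises ValueError on malformed dates.
    start, end = date.fromisoformat(date_from), date.fromisoformat(date_to)
    if start > end:
        raise ValueError("date_from must not be later than date_to")
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    if not columns or any(c not in ALLOWED_COLUMNS for c in columns):
        raise ValueError(f"bad column selection: {columns!r}")
    dims = ", ".join(columns)
    return (
        f"SELECT {dims}, {METRICS[metric]} AS {metric}\n"
        f"FROM events\n"
        f"WHERE event_date BETWEEN '{start}' AND '{end}'\n"
        f"GROUP BY {dims}"
    )


# In Zeppelin the arguments would come from forms, for example:
#   metric = z.select("metric", [("active_users", "Active users"),
#                                ("cards_added", "Cards added")], "active_users")
#   columns = z.checkbox("columns", [(c, c) for c in ALLOWED_COLUMNS], ["partner_id"])
print(build_report_query("2021-01-01", "2021-01-31", "cards_added", ["partner_id"]))
```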


What tasks do we automate in Zeppelin

Data exports, simple and complex, with filters by date and partner and a configurable set of columns.

Most export requests come from account managers, and more often than not they arrive suddenly and urgently. The export tasks themselves are standard and quick to do. But in practice they distract from the most interesting analytical studies, and their number grows along with the partner network.

Typical requests they come with:

  • A potential partner wants to evaluate the audience that already uses their loyalty card in our app. Our sales managers can log into Zeppelin right at the meeting and export the list of cards. They hand over the materials before the conversation cools down and interest in our service fades. We have not measured it, but this may be helping to speed up sales.
  • A new store joins our partner network. Its cards already existed in plastic form: our users could add them to the app and show the barcode at the checkout. With the integration, we gave the partner new capabilities: the user now sees special offers and personal discounts under the card. The partner's manager wants to assess how purchasing behavior has changed among those who added their card to the Wallet app. Our account manager helps by exporting the numbers and barcodes of all plastic cards issued in a given period.
  • A partner wants to boost sales of a certain product and launches a promotion. They inform their audience in the Wallet app about special discounts using push notifications. To assess the effectiveness of this communication, we export a report for them showing who received and read the push.

We have built reports for all the frequent export requests that came to us. This sped up our colleagues' work and freed up our own time and attention for more interesting tasks. Now we only refine these reports as needed.
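
The card export from the second example boils down to one parameterized query. A minimal sketch using Python's built-in sqlite3 on an in-memory table so that it runs anywhere; the cards table and its columns are invented for illustration, and real ClickHouse or MSSQL drivers use their own placeholder syntax instead of ?. Passing the partner and dates as parameters, rather than pasting them into the SQL string, keeps a mistyped form value from breaking or altering the query.

```python
import csv
import io
import sqlite3


def export_cards(conn, partner_id, issued_from, issued_to):
    """Return CSV text with the partner's cards issued in [issued_from, issued_to]."""
    cur = conn.execute(
        "SELECT card_number, barcode, issued_at FROM cards "
        "WHERE partner_id = ? AND issued_at BETWEEN ? AND ? "
        "ORDER BY issued_at",
        (partner_id, issued_from, issued_to),
    )
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # column names as header
    writer.writerows(cur)
    return buf.getvalue()


# Demo data in an in-memory database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cards (card_number TEXT, barcode TEXT, partner_id INTEGER, issued_at TEXT)"
)
conn.executemany(
    "INSERT INTO cards VALUES (?, ?, ?, ?)",
    [
        ("C-1", "460001", 7, "2021-02-10"),
        ("C-2", "460002", 7, "2021-04-01"),
        ("C-3", "460003", 9, "2021-02-15"),
    ],
)
print(export_cards(conn, 7, "2021-01-01", "2021-03-31"))
```

ISO-formatted date strings sort chronologically, which is why BETWEEN works on the text column in this demo.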


Standard tasks where you just need to run a ready-made script. Here too we use filters and let the user set the values of variables. For example, recalculating a metric or report that is needed too rarely to be worth putting on a schedule.

A more sophisticated real-life case. The marketing team, together with our strategic partners, decided to run a promotion with a specific mechanic: users of our app had to complete a chain of actions to become participants in a prize draw. Once a week we wanted to get the list of that week's participants, randomly pick the winners, congratulate them and send the prizes. One of our analysts created a Zeppelin notebook that collected the users who had become eligible for the draw during the past calendar week. The marketer ran the notebook on their own and picked up the week's participants.
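
The weekly draw comes down to two steps: computing the boundaries of the previous calendar week and sampling winners among users who completed the chain within it. A sketch under assumptions: the completions mapping (user id to the date the chain was completed) stands in for whatever query the notebook actually ran, and the optional seed is an addition of ours so that a draw can be reproduced if anyone questions the result.

```python
import random
from datetime import date, timedelta


def previous_calendar_week(today):
    """Return (monday, sunday) of the calendar week before the one containing `today`."""
    this_monday = today - timedelta(days=today.weekday())
    last_monday = this_monday - timedelta(days=7)
    return last_monday, last_monday + timedelta(days=6)


def draw_winners(completions, today, n_winners, seed=None):
    """Pick winners among users who completed the action chain last week.

    `completions` maps user_id -> completion date (hypothetical input shape).
    Returns (eligible users, winners).
    """
    start, end = previous_calendar_week(today)
    # Sorting makes the result independent of the input order.
    eligible = sorted(u for u, d in completions.items() if start <= d <= end)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return eligible, rng.sample(eligible, min(n_winners, len(eligible)))


completions = {"u1": date(2021, 3, 1), "u2": date(2021, 3, 7), "u3": date(2021, 3, 8)}
eligible, winners = draw_winners(completions, today=date(2021, 3, 10), n_winners=1, seed=42)
print(eligible)  # ['u1', 'u2']
```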

Summarizing A/B test results: measuring baseline metrics in the test and control groups. When we test a new feature or a trigger communication, we look not only at the change in the target metric but also at how overall user behavior changes. We have identified four baseline user-behavior metrics:

  • Activity in the app
  • Issuing loyalty cards and other products
  • Unsubscribes
  • Support contacts

Here Zeppelin gives us freedom in how we summarize results, which metrics we compute, how we draw charts and how we explain the outcome to the people who will use the tool.
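
One common way to compare such metrics, not necessarily the one behind our reports, is a two-proportion z-test: each baseline metric is treated as the share of users in a group who did something (were active, issued a card, unsubscribed, contacted support). A minimal sketch; the metric keys and all numbers in the usage example are made up.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(x_test, n_test, x_ctrl, n_ctrl):
    """Two-sided z-test for a difference between two proportions (pooled variance).

    x_* = users with the event, n_* = users in the group.
    Returns (difference in shares, z statistic, p-value).
    """
    p_test, p_ctrl = x_test / n_test, x_ctrl / n_ctrl
    pooled = (x_test + x_ctrl) / (n_test + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_ctrl))
    if se == 0:
        return p_test - p_ctrl, 0.0, 1.0  # both groups all-0 or all-1: nothing to test
    z = (p_test - p_ctrl) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_test - p_ctrl, z, p_value


def summarize_baseline(test, control, alpha=0.05):
    """Compare each baseline metric; both args map metric -> (users_with_event, group_size)."""
    lines = []
    for metric, (x_t, n_t) in test.items():
        x_c, n_c = control[metric]
        diff, z, p = two_proportion_ztest(x_t, n_t, x_c, n_c)
        flag = "CHANGED" if p < alpha else "ok"
        lines.append(f"{metric:<18} diff={diff:+.3%} z={z:+.2f} p={p:.3f} {flag}")
    return lines


# Illustrative numbers only.
test = {"active_in_app": (4200, 10000), "issued_card": (900, 10000),
        "unsubscribed": (120, 10000), "contacted_support": (30, 10000)}
control = {"active_in_app": (4150, 10000), "issued_card": (880, 10000),
           "unsubscribed": (80, 10000), "contacted_support": (28, 10000)}
for line in summarize_baseline(test, control):
    print(line)
```

With several baseline metrics checked at once, some will cross the threshold by chance, so a flag is a prompt to look closer rather than a verdict.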


We build recipient lists for communications and retargeting campaigns from cohorts exported from Amplitude. At one point we abandoned off-the-shelf communication platforms in favor of our own in-house solution (perhaps a topic for a separate article, but not this one). That solution was tailored primarily to partner mailings: choose a partner and send a message to its entire base. Preparing recipient lists for product and marketing communications, that is, Wallet's own communications, fell on the analysts' shoulders. Reducing every marketing and product request to a fixed set of types seemed impossible. We tried to single out the most relevant segments without limiting what we could do. Here is a fictitious request, though we have had requests of similar complexity:

  • Users who arrived during the period …
  • Added 5 or fewer cards from the top 10 loyalty programs
  • Started a certain scenario but did not finish it
  • Used the app more than twice in the last month
  • Plus filters by device model, mobile carrier and geography

Of course, we saved the code after each such task and assembled it into a kind of monstrous constructor. But it still took the analyst's time and attention. And a careless mistake could cost us waves of angry users receiving communications that were obviously irrelevant to them.

And so it went until one analyst got lazy: instead of writing code to select users in ClickHouse, they built a cohort in Amplitude and exported it via the API, which, you must admit, is much easier and faster. The Amplitude interface is familiar and understandable: any manager can assemble a cohort with all the filters from the example above, check its size, double-check their own work and verify that the users in the cohort really belong there.
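
Exporting a cohort programmatically is an asynchronous job in Amplitude's Behavioral Cohorts API. The sketch below follows our reading of its v5 endpoints (request the cohort, poll the request status, download the file) with HTTP Basic auth built from the project's API key and secret key. Treat the paths, the "JOB COMPLETED" status value and the file format as assumptions to verify against the current Amplitude documentation. The HTTP call is injectable so that the flow can be tested without network access.

```python
import base64
import json
import time
import urllib.request

API_BASE = "https://amplitude.com/api/5/cohorts"  # assumption: verify in Amplitude docs


def auth_header(api_key, secret_key):
    """HTTP Basic auth header from the project API key and secret key."""
    token = base64.b64encode(f"{api_key}:{secret_key}".encode()).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def http_get(url, headers):
    """Plain GET that returns the response body as bytes."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()


def export_cohort(cohort_id, api_key, secret_key, get=http_get, poll_seconds=5, max_polls=120):
    """Request a cohort export, wait for the async job and return the raw file bytes."""
    headers = auth_header(api_key, secret_key)
    started = json.loads(get(f"{API_BASE}/request/{cohort_id}", headers))
    request_id = started["request_id"]
    for _ in range(max_polls):
        status = json.loads(get(f"{API_BASE}/request-status/{request_id}", headers))
        if status.get("async_status") == "JOB COMPLETED":
            return get(f"{API_BASE}/request/{request_id}/file", headers)
        time.sleep(poll_seconds)
    raise TimeoutError(f"cohort export {request_id} did not finish in time")


# Usage in a notebook (keys are placeholders; the id is copied from the cohort's URL):
#   raw = export_cohort(cohort_id, API_KEY, SECRET_KEY)
```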

What the process looks like:

  • A product manager or marketer creates a cohort in Amplitude; for complex cases, the analysts help out
  • Copies the cohort id from the browser's address bar
  • Pastes it into the Zeppelin notebook
  • Sets additional filters for data that Amplitude does not have
  • Assigns the mailing a unique sub_id and runs the notebook

What happens meanwhile:

  • The script takes the cohort id and exports the cohort from Amplitude via the API
  • The resulting DataFrame is cleaned of extraneous rows in Python
  • If needed, the recipient list is further filtered by gender and/or age
  • A control group is set aside if we want to measure the mailing's effectiveness (and we rarely don't)
  • Recipients are recorded in the history database and written to a CSV file, which we expose as a clickable link for easy download
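
Once the cohort file has been read, the steps above can be sketched in plain Python. Assumptions: each row is a dict with user_id, gender and age (gender and age would come from our own database, since Amplitude lacks them), the control share defaults to an arbitrary 10%, and history records are returned rather than written to a real table. The function name and columns are hypothetical.

```python
import csv
import io
import random


def prepare_mailing(rows, sub_id, control_share=0.1, gender=None, age_range=None, seed=None):
    """Clean, filter and split a cohort; return (recipients CSV text, history records)."""
    # 1. Drop empty ids and duplicates, keeping the first occurrence.
    seen, users = set(), []
    for row in rows:
        uid = (row.get("user_id") or "").strip()
        if uid and uid not in seen:
            seen.add(uid)
            users.append({**row, "user_id": uid})
    # 2. Optional filters for data Amplitude does not have.
    if gender is not None:
        users = [u for u in users if u.get("gender") == gender]
    if age_range is not None:
        lo, hi = age_range
        users = [u for u in users if u.get("age") is not None and lo <= u["age"] <= hi]
    # 3. Random control group that will not receive the message.
    rng = random.Random(seed)
    shuffled = users[:]
    rng.shuffle(shuffled)
    control_ids = {u["user_id"] for u in shuffled[:round(len(shuffled) * control_share)]}
    # 4. History records for every user, tagged with the mailing's sub_id.
    history = [
        {"sub_id": sub_id, "user_id": u["user_id"],
         "group": "control" if u["user_id"] in control_ids else "test"}
        for u in users
    ]
    # 5. CSV with the recipients only; the control group is left out.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["user_id"])
    writer.writerows([h["user_id"]] for h in history if h["group"] == "test")
    return buf.getvalue(), history


rows = [{"user_id": f"u{i}", "gender": "f" if i % 2 else "m", "age": 20 + i} for i in range(20)]
rows += [{"user_id": "u1", "gender": "f", "age": 21}, {"user_id": "", "gender": "f", "age": 30}]
csv_text, history = prepare_mailing(rows, sub_id="push_demo", control_share=0.2, gender="f", seed=1)
print(csv_text)
```

Keeping the control group in the history table is what makes a later comparison between those who received the message and those who did not possible.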

I used push mailings as the example, but our colleagues quickly came up with other uses for a similar tool: any export of a list of users with specific behavior. We now also use Amplitude cohorts to launch retargeting campaigns, and I expect we will use them for many other tasks.


Monitoring systems

There is one more handy feature that, strictly speaking, is not a dynamic form and is perhaps not quite about automation: scheduled runs. We use it to recalculate dashboards and launch various computations. But the most useful analytical task it solves for us is monitoring: anomalies in events and in metric behavior, anything an analyst should watch regularly but would rather automate. We set up an alerting system in Slack, and now we can react in time to the changes we want to know about:

  • A rise or fall in key product metrics and scenario conversions, which reflect the quality of the user experience and affect retention.
  • An increase in the number of errors users may run into. Not every such anomaly leads to more support requests, but many of them can worsen conversions and ultimately increase churn. And even if they are not critical and merely inconvenience our audience, it is important for us to learn about them in time and reduce their number.
  • Anomalies in the total number of events and in each event separately. This monitoring lets us catch cases we did not think of in advance.
  • An alert for when some of our regular calculations that run in Zeppelin on a schedule fail with an error. We create many useful tools, but we cannot keep checking their quality by hand.
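
A minimal sketch of both kinds of alerts, assuming daily event counts are already aggregated per event. The anomaly rule here, flagging a count more than k standard deviations from the trailing mean, is a simple stand-in and not necessarily the rule we use; min_std keeps very stable series from alerting on tiny wiggles. Alerts go to a Slack incoming webhook, which accepts a JSON body with a text field; the webhook URL is a placeholder. The decorator posts a message when a scheduled job raises and then re-raises the error so the failure stays visible in Zeppelin.

```python
import functools
import json
import statistics
import urllib.request


def find_anomalies(history, today, k=3.0, min_std=1.0, min_days=7):
    """history: event -> list of past daily counts; today: event -> today's count."""
    alerts = []
    for event, count in today.items():
        past = history.get(event, [])
        if len(past) < min_days:
            continue  # not enough data for a baseline
        mean = statistics.mean(past)
        std = max(statistics.stdev(past), min_std)
        z = (count - mean) / std
        if abs(z) > k:
            alerts.append(f"{event}: {count} vs mean {mean:.0f} (z={z:+.1f})")
    return alerts


def post_to_slack(webhook_url, lines):
    """Send alert lines to a Slack incoming webhook."""
    body = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def alert_on_failure(notify):
    """Decorator: report a failing scheduled job via `notify`, then re-raise."""
    def wrap(job):
        @functools.wraps(job)
        def run(*args, **kwargs):
            try:
                return job(*args, **kwargs)
            except Exception as exc:
                notify([f"Scheduled job {job.__name__} failed: {exc!r}"])
                raise
        return run
    return wrap


# Wiring in a scheduled notebook (placeholder URL, not executed here):
#   notify = lambda lines: post_to_slack("https://hooks.slack.com/services/...", lines)
history = {"card_added": [100, 102, 98, 101, 99, 100, 100], "push_opened": [50] * 7}
print(find_anomalies(history, {"card_added": 160, "push_opened": 51}))
```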

Success: we beat the routine and freed up time for developing analytics in the company

The most enjoyable part of this article: the bright future has arrived! We have already automated most of our ad-hoc tasks, and they now take up less than 10% of a sprint. In the freed-up time we do research, put forward and test hypotheses, make our analytics products more sophisticated and apply approaches from those very articles and conference talks. In other words, we are finally doing interesting analytical work. And most importantly, we now have time to take an active part in developing Wallet.

Advice to budding automators: put all common, standard code fragments into libraries. This lets you write faster, improves code quality across the whole analyst team and lets you fix code in one place rather than in every notebook. And don't forget that you are building a tool not for yourself but for your colleagues, who have different backgrounds. Don't scare them with complex interfaces and jargon; keep things simple and clear.

Data happiness is still ahead, but we are already inspired, full of energy and running toward it.
