How I Created a Bot for Mobile Musicians

Hi! I'm Victor, a project manager at Selectel. For the last 20 years, my hobby has been writing music. The success has varied (of the music, of course), but it is part of my life. First there was Fruity Loops 3, then Reaper, but my soul has always been drawn to hardware and new technologies.

Below is the story of a pet project designed to make life easier for mobile musicians. Or to make technical specialists say *meh* (a minor octave C). Be careful: inside there are neural networks, musical hardware and bad Python code.

Use navigation if you don't want to read the full text:

→ Idea
→ About the Stem Splitter Bot's working principle
→ What was used
→ How many resources are needed?
→ What's next?

Idea

In January 2024, there was a rumor that the AKAI MPC Live was about to get a stem splitter: a tool that divides a song into STEMS (often explained as short for Stereo Masters), that is, isolated audio tracks (drums, vocals, bass, etc.). The public was excited: now you could split a track on the road, sample it right away and be a little happier.

The reality turned out to be harsher. At the beginning of 2024, the company released the splitter only in its desktop software and then spent months promising to bring it to the hardware. That was when I wondered whether such a function could live in a Telegram bot.

UPD. While I was writing this article, AKAI released MPC STEMS for standalone devices, but it can hardly be called a universal, no-compromise solution.

For whom

For users of the Roland SP-404mk2 (I have one), Elektron Digitakt, DirtyWave M8 and other devices. The bot helps you make music on the go and gives you functions that the hardware itself lacks.

Here, for example, is how Novosibirsk musician Zhenya PNV plays live
(this is live: there is a recorder under the T-shirt, everything is fine):

Below is my travel kit:

Sampler Roland SP-404MK2, headphones KZ ZSN Pro, USB cable and stereo pair 6.3-3.5 for connecting to the phone, powerbank just in case.

Why, if there is software in the browser

Yes, there is, but you have to use websites like lalal.ai or vocalremover.org from your smartphone's browser. It's just inconvenient. Plus, their free plans limit the length and number of tracks.

Screenshot of the site lalal.ai.

There is also a solution for those who do not want to use the web versions – Koala Sampler. Good software, but it costs money (from 450 ₽ one-time). Many people create beauty on it, because it is already a full-fledged studio in your pocket.

How stems work in Koala Sampler.

Knowing all this, I wanted to make a project that works straight from your pocket, with no additional software. Telegram helps here because it is already installed on most devices.

Someone will say: “But there are already bots.” Everything is simpler than it seems. At the time of the first release, I simply had not come across such developments. And I wanted to create a pet project, understand the libraries and implementation paths.

About the Stem Splitter Bot's working principle

The bot accepts any file with an mp3, ogg or wav extension as input, then offers a list of actions. As you can see, I did not limit myself to simply splitting a song into tracks.

Screenshot from the Stem Splitter Telegram bot.

By the way, this track is also on whosampled.com: someone has already been inspired by it.

Let's go over the functions.

BPM: determines the tempo of the track (the number of beats per minute). Useful when the hardware does not detect BPM automatically.
Key: shows the key of the track. Helps keyboardists, and users who rely on chromatic mode on samplers, to play a sample live.
MIDI: converts a track into a MIDI file. Useful for those who can't or don't want to play a part themselves but need to transcribe the notes. You can then load the resulting MIDI into any sequencer (Fruity Loops, Ableton Live, Reaper) and play it with any synthesizer. However, you shouldn't expect a “Wow!” effect or any especially clever features.
STEM MP3 or STEM WAV: splits a track and sends the user the separate stems in mp3 or wav.
/bp: a calculator that tells you how much to change the pitch when you change the BPM so that the track stays in key. You can just enter the command; no track is needed for this.
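The arithmetic behind a calculator like /bp can be sketched as follows. This is a minimal sketch, assuming the sampler changes tempo by resampling, so speed and pitch move together. The function names are hypothetical and are not taken from the bot's code.

```python
import math


def pitch_change_semitones(old_bpm: float, new_bpm: float) -> float:
    """Pitch change (in semitones) caused by resampling a loop from old_bpm to new_bpm."""
    if old_bpm <= 0 or new_bpm <= 0:
        raise ValueError("BPM values must be positive")
    # Playing a sample faster by a factor r raises its pitch by 12 * log2(r) semitones.
    return 12 * math.log2(new_bpm / old_bpm)


def compensation_semitones(old_bpm: float, new_bpm: float) -> float:
    """Pitch correction that brings the resampled loop back to its original key."""
    return -pitch_change_semitones(old_bpm, new_bpm)


# Speeding a 90 BPM loop up to 100 BPM raises it by a bit under two semitones.
shift = pitch_change_semitones(90, 100)
fix = compensation_semitones(90, 100)
print(f"pitch change: {shift:+.2f} st, compensate with {fix:+.2f} st")
```

Doubling the tempo gives exactly one octave (12 semitones), which is a handy sanity check for any calculator of this kind.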

What was used

Deezer's Spleeter Library

In short, I was delighted with the library. I'll run through the key benefits.

It works with tensorflow-cpu, so it needs only the processor and memory, no GPU. Even so, it runs quite fast.
It integrates into the bot in no time (but there are some nuances, more on that later).
There is a pack of pre-trained models for two, four and five stems.
It is a near-production solution: it is used in plugins from iZotope and others.
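To show how little code the integration takes, here is a minimal sketch of a splitting helper. `Separator` and `separate_to_file` come from Spleeter's Python API; the helper names `model_name` and `split_track`, as well as the file paths, are hypothetical and not taken from the bot. Spleeter is imported lazily so the module loads even where it is not installed.

```python
from pathlib import Path

# Stem counts covered by the pre-trained models mentioned above.
SUPPORTED_STEMS = (2, 4, 5)


def model_name(stems: int) -> str:
    """Build the Spleeter configuration name, e.g. 'spleeter:4stems'."""
    if stems not in SUPPORTED_STEMS:
        raise ValueError(f"unsupported stem count: {stems}")
    return f"spleeter:{stems}stems"


def split_track(audio_path: str, output_dir: str, stems: int = 4) -> Path:
    """Split one audio file into stems and return the folder with the results."""
    # Lazy import: TensorFlow is only loaded when a split is actually requested.
    from spleeter.separator import Separator

    separator = Separator(model_name(stems))
    # With the default filename format, stems land in <output_dir>/<track name>/.
    separator.separate_to_file(str(audio_path), str(output_dir))
    return Path(output_dir) / Path(audio_path).stem


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(split_track("track.mp3", "stems_out", stems=4))
```

Creating a `Separator` is expensive, so a long-running bot would normally create it once and reuse it rather than building one per request as this sketch does.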

Why not Demucs

The library works, but it didn't suit me for several reasons. First, it is optimized for CUDA cores. You can use the CPU-only option, but it won't help much. Second, it is slow: in my case, Demucs was 10 times slower than Spleeter on four-stem models. This is critical for the bot.

What else

Aiogram: a basic library for those who build bots on asynchronous calls. There is a slightly toxic community that will help if needed.
Librosa: one of the best libraries for working with music files. In addition, I used forks of two micro-projects, Tonal_Fragment and sound_to_midi.
Ffmpeg: not a library, but a set of software and codecs. Spleeter needs it to work.
Tdlib: a library for working with the Telegram Local API. It raises the maximum size of files that can be sent to the bot to 2 GB.
Aiosqlite: SQLite is the de facto standard for pet projects in Python, and its asynchronous wrapper suits bots a little better. It is not the best choice for production, but I simply find Aiosqlite more convenient to work with.
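For readers curious how key detection can work at all, here is a minimal sketch of the classic Krumhansl-Schmuckler approach: correlate a 12-bin pitch-class (chroma) vector with major and minor key profiles and pick the best match. Whether the bot does exactly this is an assumption on my part; the function names are hypothetical, and the profile values are the standard Krumhansl-Kessler ones.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles, listed starting from the tonic.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rotate(profile, tonic):
    """Shift a profile so that its tonic weight sits on pitch class `tonic`."""
    return [profile[(i - tonic) % 12] for i in range(12)]


def estimate_key(chroma):
    """Return (key name, correlation) for a 12-element chroma vector (index 0 = C)."""
    best_key, best_score = None, -2.0
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            score = pearson(chroma, rotate(profile, tonic))
            if score > best_score:
                best_key, best_score = f"{NOTE_NAMES[tonic]} {mode}", score
    return best_key, best_score


# A chroma vector shaped exactly like the major profile on D is detected as D major.
key, score = estimate_key(rotate(MAJOR_PROFILE, 2))
print(key, round(score, 3))
```

In a real pipeline the chroma vector would come from the audio itself, for example by averaging the output of `librosa.feature.chroma_cqt` over time. Real recordings give much lower correlations than this synthetic input.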

How many resources are needed?

The advantage of Spleeter is that it can do without a GPU. It works quite quickly, and in skilled hands with a well-tuned tensorflow-cpu build, even faster.

Provided that I don't store user tracks and final tracks, a virtual machine with a fairly simple configuration will suffice.

Configuration

6 vCPU,
12GB RAM + 4GB SWAP,
50GB network drive (reading – 320, writing – 120 IOPS, 100 MB/s),
CentOS 9 Stream.

However, it is possible to optimize the processes and improve the code. According to my calculations, this will reduce the consumption to 4 vCPU and 8 GB of RAM.

And what about the numbers?

Today I “dance about literature” more than write directly about sound. As tests show, the bot divides tracks acceptably for most mobile musicians. If desired, you can test the Spleeter library or look at the research. Let's dig into the statistics a little:

Bot usage statistics.

The bot spreads by word of mouth: it was shared once in channels for beatmakers and DIY musicians. The combined audience of those channels is about 8,000 people (nominally about 10,000, but 20-30% of subscribers follow more than one of them).

The total audience of unique users across the two bots is about 250 people. Only 37 people “visited” both versions. The final conversion of the bot is therefore about 3% (roughly 250 out of 8,000).

Initially, there was an assumption that the two-track bot would be used more often, because it is convenient for mashups and funny remixes for social networks. I also remember how in the late 2000s everyone was looking for instrumentals for vocal schools in local community centers and used Adobe Audition with its wonderful plugins. But the time has apparently passed.

However, it is interesting that the four-track bot is used by the real target audience – musicians. I have feedback from a live audience – here are the top reasons:

“You have to take a piece of melodic sample and 'make magic'”
“I want to take the vocal track and do a remix”
“Trying to understand how the bass or drums 'work' in a track”
“Determine the number of actual layers (tracks) in a track”
“I can quickly determine the key, BPM, etc.”

So the audience of mobile DIY musicians is simply making music and constantly learning from whatever they find. And, of course, they use the tools for non-commercial purposes.

An attentive reader will ask where split_wav, which I mentioned earlier, has gone, or how fast the “splitter” itself is. I'll give you a short teaser: logging and optimization are my passion.

Problems

In this article I planned to show how quickly the project works even with my bad code and a minimum of resources. However, while preparing it I found that I lack experience with logging.

import datetime

start_time = datetime.datetime.now()
# ... call the function being timed here ...
download_time = (datetime.datetime.now() - start_time).total_seconds()

The code above works in ordinary synchronous code, but with aiogram and asynchronous handlers it apparently needs its own “black magic”.

~~A failure~~ attempt to collect statistics on track download speed.
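One possible explanation (an assumption, since the handler code is not shown here) is that the timed call was not awaited inside the measured span, so only the creation of the coroutine object was measured. A minimal sketch of timing asynchronous code with a reusable context manager follows; `fake_download` is a hypothetical stand-in for the real download call.

```python
import asyncio
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


async def fake_download():
    """Hypothetical stand-in for downloading a user's track."""
    await asyncio.sleep(0.05)
    return b"audio"


async def main():
    timings = {}
    # The await must happen inside the `with` block, otherwise nothing is measured.
    with timed("download", timings):
        data = await fake_download()
    return data, timings


data, timings = asyncio.run(main())
print(f"download took {timings['download']:.3f} s")
```

Note that this measures wall-clock time for the whole await, including time the event loop spends serving other users, which is usually what matters for a bot's perceived speed.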

You'll have to believe it: the bot works equally well on different VPS and is not demanding on disk performance. There is no noticeable difference when using HDD or NVMe disks.

The second thing I couldn't cope with was Spleeter's excessive memory consumption at startup and while it runs. It is a well-known topic on Reddit: many people run into the default 50% limit. The problem is that when Spleeter starts and runs for the first time, it takes up a little more than 2 GB of memory. Various methods and advice from colleagues did not help to get around this.

The surefire solution is to take more memory and add swap, which is what I did. 🙂 But you can share tips and possible solutions in the comments – it will be interesting to read! However, you won't need much memory if you spend time on optimization.

Why two bots? Why not fit all the functionality into one if so few resources are needed? Perhaps you will understand everything after seeing a code fragment:

if __name__ == '__main__':
    # One Separator (one stem model) is created per bot process, before polling starts.
    separator = Separator('spleeter:{0}stems'.format(stem_type_default))
    executor.start_polling(dp, skip_updates=True)

If not, I'll briefly describe the problem. Separator has to be created in the main program block. All my attempts to define it elsewhere, for example as two instances in one process, failed miserably. Then came humility and an idea: new functions can be tested on the two-track bot first. A kind of A/B testing on a real audience.

What's next?

I don't know. 🙂 Programming and music are my hobbies, brain training and a way of spiritual enrichment. There were attempts to collect donations to stimulate my own interest. In all this time, only once was Telegram Premium given as a gift, but that was also very nice.

If you have any suggestions or want to take my code and make something awesome (without a paid subscription, of course), then write to me in Telegram — I will share the project and, if possible, add new functions to the bot.

I think the project will develop as the community comes up with ideas. The entire base is already there.

Make music, create useful projects and share your suggestions for improving the bot in the comments! Until next time.
