MusicGen – generate music on your PC. New local neural network – introduction and installation

Facebook* recently rolled out a new neural network – MusicGen (repository).
Apparently, it was not enough for the guys at the company to release the LLaMA text model, which gave local neural networks a huge boost, so they decided to do the same in the field of music.

Today we will learn a little more about the model, think about who needs it, and launch it locally.

About the model

The published weights let you generate new music from a text description and, optionally, from an audio example as well.
In total, 4 versions of the models were released:

  • small – 300M parameters, the smallest model, txt2music only.

  • medium – 1.5B parameters, the mid-size model, also txt2music only.

  • melody – 1.5B parameters, but able to additionally take an audio sample of a composition as input (txt+music2music).

  • large – 3.3B parameters, the largest model, txt2music only.
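By the way, besides the GUI the repository also lets you call the models directly from Python. Here is a minimal sketch based on the examples in the repository (the file name reference.mp3 is my placeholder; generate_with_chroma needs the melody checkpoint):

import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load one of the released checkpoints: 'small', 'medium', 'melody' or 'large'
model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=10)  # seconds of audio per clip

# Plain txt2music: one clip per text description
wav = model.generate(['lo-fi hip hop beat with warm piano'])

# txt+music2music: additionally condition on a reference melody (melody model only)
melody, sr = torchaudio.load('reference.mp3')  # placeholder file name
wav_melody = model.generate_with_chroma(['upbeat electronic remix'], melody[None], sr)

# Save the results as .wav with loudness normalization
audio_write('plain', wav[0].cpu(), model.sample_rate, strategy='loudness')
audio_write('with_melody', wav_melody[0].cpu(), model.sample_rate, strategy='loudness')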

Despite the note in the repository that the medium model requires 16GB of VRAM, everything worked for me even on 8GB, on my old GTX 1070. Some people also report that it runs stably on 6GB of VRAM.

The maximum length of an audio file that can be generated in the GUI is currently 30 seconds. I did not find any further information about the possible output durations, but it is known that generating longer passages requires much more video memory. Perhaps that is why the interface has this limit for now.

One more note: the smaller the model, the less computing power it requires, so it is likely we will be able to generate fairly long compositions – and with a context shift, essentially endless ones.
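To show what I mean by a context shift, here is a rough sketch using the library's generate_continuation method: each step feeds the tail of what we already have back in as an audio prompt and appends only the newly generated part. The overlap length and the assumption that the returned audio starts with the re-encoded prompt are my own reading of the repository code, so treat this as an illustration, not a recipe:

import torch
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('small')
model.set_generation_params(duration=30)  # each call produces ~30 s in total
sr = model.sample_rate
description = ['calm ambient with soft pads']

# First chunk comes from the text prompt alone
track = model.generate(description)  # tensor of shape [1, channels, samples]

overlap = 10 * sr  # feed the last ~10 s back in as context (my choice)
for _ in range(3):
    prompt = track[..., -overlap:]
    cont = model.generate_continuation(prompt, sr, description)
    # the output should include the prompt itself, so append only the new tail
    track = torch.cat([track, cont[..., prompt.shape[-1]:]], dim=-1)

audio_write('long_track', track[0].cpu(), sr, strategy='loudness')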

About the data that was used when training the model, I will just quote:

We use 20 thousand hours of licensed music to train MusicGen. In particular, we rely on an internal dataset of 10,000 high-quality music tracks, as well as on ShutterStock and Pond5 music data.

Sounds pretty good, I guess. But as we know, a high-quality dataset is better than merely a big one, so by itself this tells me nothing – unlike the quality of the music that actually comes out.
Here is a small playlist created with MusicGen:

There is also a large table with interactive examples and comparisons of the algorithm against its peers – here.

They promise to roll out the training code for their models soon.

The model is licensed under CC-BY-NC 4.0; the code in the repository is MIT.

Who needs this at all – my opinion

My take on this model: lower-tier composers should hurry up and study how this tool works.

The 30 seconds that the network can currently generate is not much, but I think we all remember how Stable Diffusion developed – first dull pictures with crooked cats, then custom models, ControlNet, and everything else.

It will be roughly the same here, only a little slower, because people are not as interested in music as they are in pictures of naked women.

I see at least three groups that will be interested in this network:

  1. Indie game development – my own field; I live in it and know how things work here. I would not say we are short of composers (sometimes there are even too many, though not all of them are useful), but we badly need free music, preferably with many options in a short time.
    As this neural network matures, that need will quickly be met.

  2. Lower-tier singers, bedroom rappers, and the like – the music scene is overheated, so backing tracks are unpleasantly expensive, and beatmakers ask for royalties on top.
    Those who want to pour their soul into songs but cannot write music will be very happy with the new tool.

  3. YouTubers/streamers – they will use neural networks to avoid fussing over music copyright, since ContentID hands out strikes for any little thing.
    You describe what the video is about, what mood and tempo you need – and you get a suitable background track that no one will ever strike you over.

If you belong to one of these three groups, it is worth monitoring what is happening, because in terms of quality the model can genuinely either take money away from you or save it for you.

How to start

Getting started is not difficult at all – let me show you.

First of all, note once again that you will need video memory, and therefore a video card. A top-tier card is optional, but you should have at least 6GB of VRAM (preferably 8GB). An NVIDIA card is required.

First, we need to install three ordinary programs that will run the neural network for us.

1) If it is not installed already, install Git – a program that downloads the developers’ code to your PC in a digestible form. Get it here.

2) Install Anaconda – the program inside which we will run the neural network code. (Techies, forgive the explanation; I’m trying to keep it simple for the sake of the future of Russian rap.)
You can download the installer from the official site.

3) Install ffmpeg – here is a pretty clear guide, though you can also use this one-click solution (download the .msi and install it).

You can check that everything installed correctly with the git --version, conda --version, and ffmpeg commands.

And now let’s move on to installing the neuron:

1) Go to any folder where you want the downloaded neural network to live, right-click an empty space while holding Shift, and select “Open PowerShell window here”.

2) Write the following command to the console:

git clone https://github.com/facebookresearch/audiocraft.git

It will download the neural network code to your PC.

3) Create and configure a new virtual environment with the following commands entered in sequence:

conda create -n musicGen python=3.9 // Create a virtual environment

conda activate musicGen // Activate it

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117 // Install the required packages
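Optionally, you can check right away that PyTorch actually sees your video card – this is a standard torch call, not part of the official instructions:

python -c "import torch; print(torch.cuda.is_available())" // Should print True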

4) After everything is installed, go to the repository folder using

cd audiocraft

And install some more dependencies:

pip install -e . // Enter the command along with a dot at the end

5) Ready! You did it! Enter the last command that will start the neural network:

python app.py

After a while, everything will load, and you can open the MusicGen interface directly in your browser by clicking on the link http://127.0.0.1:7860/

In the input text field, enter your request for music, choose the model size, and press “Execute”. That’s it!

And if you want to run the neural network again in the future, you will need to open a terminal in the folder, type conda activate musicGen, and run python app.py.
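If you plan to do that often, you can save those two commands into a small run.bat file in the same folder (the file name is my own suggestion, and it assumes conda is available from the regular command line):

call conda activate musicGen
python app.py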

And with that, I think I can wrap up. If you have any questions, write to me on any social network convenient for you, or in the comments. 🙂


Also, here is my Telegram channel – where would we be without one – additional news may appear there.

Thank you!

* The organization is banned in the Russian Federation
