ERNIE-ViLG – a free Chinese neural network

Neural networks that generate images are at the peak of their popularity. While everyone is having fun with DALL-E 2, Midjourney and Stable Diffusion, there is another model that is less well known online. Its name is ERNIE-ViLG.

ERNIE-ViLG is an open source image generator developed by Chinese tech giant Baidu. The name of the neural network stands for Enhanced Representation through Knowledge Integration – Vision Language Generation.

Despite some similarities with Stable Diffusion, these are different neural networks, at least according to this document. You can test the new Chinese tool here: ERNIE-ViLG Demo. It is free, and we did not notice any limit on the number of generated images. Generation is not fast, though, so you will have to be patient.

The interface is quite simple: a basic text input form, almost two dozen artistic styles and an image generation button. Below them are further hints: examples of popular queries.

It is also important to remember that ERNIE was designed for Chinese. This means that an English phrase is first translated into Chinese, and only then does image generation start.

Some Cloud4Y employees have already played around with popular neural networks, so we decided to test the newcomer as well. Here’s what we got.

Testing imagination

We decided to start with a query that requires some imagination: A cat with glasses fights for a laptop with a robot. We deliberately kept all queries as simple as possible, because the machine understands them better that way.

Here is a version in the style of “Futurism” generated by ERNIE.


As you can see, there are a couple of interesting options, but in general the result is not impressive. The first DALL-E produced something similar.

Okay, we thought. What if we take something more concrete rather than an abstraction? For example: Dracula is learning Python program code. Alas, here we were bitterly disappointed. The neural network stubbornly refused to produce any more or less decent options.

Here’s what happened

The neural network does not know Count Dracula. But what about just vampires? So, vampire learns Python program code on a laptop.

Well, also nothing special

Something vampiric is already visible here, but the results were still unimpressive. Maybe we should ask it to draw something better known? Let’s feed IT requests to ERNIE.

Clouds, neural networks and Russia

Many people have tried Docker containers, so we decided to start with them. And to increase the chances of success, we added a couple more elements. Kubernetes with blackjack and kittens.

It looks like ERNIE only likes cats in this set of words

Quite far from what we wanted. Let’s make a request like this: Docker container, photos with laptop and kittens. The kittens were added for the simple reason that without them the result was a completely bizarre abstraction.

Something close but not right

Let’s refine the request a bit: Kubernetes container, photos with laptop and kittens. We also chose the cartoon style.

Well, there is something container-like in it

Yes, ERNIE is not very friendly with container technologies. I wonder if the neural network is familiar with cloud technologies? Let’s check: Russian cloud technology.


Some of the images look like damaged photographs. And almost everywhere we are shown St. Basil’s Cathedral. How about simplifying it even more? Russian technology.

Here are the technologies

The logic of the machine is not entirely clear, but okay. Let’s not get hung up on geopolitics. Imagine that a happy elf bought a video card. How will ERNIE show it? Elf brings home a video card (cartoon).

Peculiar elves, of course

The cartoon options are scary. What about the realistic style? Will it get worse or better? Trying…

Elf brings home a video card (Realistic)

No, put everything back the way it was. It turned out quite strange. It seems that the Chinese neural network still has a lot to learn before it can show results close to those of its Western counterparts. However, the Chinese are learning fast.

Experiment with different styles

So far, the Chinese neural network has not produced very successful images. But what if we take the simplest things and generate them in several popular styles? Let’s take, for example, sun, flowers and children. Style – realistic.


What if it’s an oil painting?

Oil painting

Let’s add some doll-like charm: Lolita style.


The results are quite good. If you do not look too closely at what they depict, the dresses can be called beautiful. The next style is cyberpunk.


Some images definitely have the right to exist, although they need work. But we will not stop there; the experiments continue: Baroque.


The style is definitely there, but realism is not. Let’s break away from reality completely, then, and test the anime style.


Like many other images generated by neural networks, these show visible problems with limbs, eyes and poses, although they have a certain beauty. So ERNIE is quite an interesting tool for playing around and experimenting.

Some more pictures

ERNIE is available via API

If you want to try the API, follow the instructions in the GitHub manual (but remember that this is a Chinese neural network, so many of the instructions are also in Chinese). The signature of the API call looks like this.

from typing import Optional

def generate_image(
          text_prompts: str,
          style: Optional[str] = "探索无限",
          topk: Optional[int] = 6,
          output_dir: Optional[str] = 'ernievilg_output'):
    ...  # signature only; body omitted
  • text_prompts – the text of the prompt

  • style – the image style

  • topk – the number of images to generate (up to 6)

  • output_dir – the directory where the output images are saved
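
To repeat our style experiment through the API rather than the demo page, the arguments can be prepared in a small script. The sketch below is only an illustration: build_generate_kwargs and sweep_styles are our own hypothetical helpers, not part of ERNIE-ViLG. They check the parameters described above and build one set of arguments per style. The "油画" (oil painting) style string is an assumption; check the GitHub manual for the style names the API actually accepts.

```python
from typing import Any, Dict, List

MAX_TOPK = 6  # the parameter list above says "up to 6" images per call


def build_generate_kwargs(text_prompts: str,
                          style: str = "探索无限",
                          topk: int = 6,
                          output_dir: str = "ernievilg_output") -> Dict[str, Any]:
    """Validate the arguments and return them as keyword arguments (hypothetical helper)."""
    if not text_prompts.strip():
        raise ValueError("text_prompts must not be empty")
    if not 1 <= topk <= MAX_TOPK:
        raise ValueError(f"topk must be between 1 and {MAX_TOPK}, got {topk}")
    return {
        "text_prompts": text_prompts,
        "style": style,
        "topk": topk,
        "output_dir": output_dir,
    }


def sweep_styles(text_prompts: str, styles: List[str], topk: int = 6) -> List[Dict[str, Any]]:
    """Build one set of arguments per style, each with its own output directory."""
    return [
        build_generate_kwargs(text_prompts, style=s, topk=topk,
                              output_dir=f"ernievilg_output/{s}")
        for s in styles
    ]


if __name__ == "__main__":
    # "油画" is an assumed style name; only "探索无限" appears in the signature above.
    for kwargs in sweep_styles("sun, flowers and children", ["探索无限", "油画"], topk=4):
        print(kwargs)
        # With the real API, each dictionary would be passed on as
        # generate_image(**kwargs) on the object described in the GitHub manual.
```

Keeping the argument check separate from the actual call means that a typo in topk or an empty prompt is caught before waiting for a slow generation run.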

Show us the interesting results you get!
