Machine Learning Highlights November 2021

Availability: article / repository

A multimodal transformer that, like DALL-E, can generate photos and videos from a text description, using a single stream of tokens rather than a separate stream for each of the three data formats. The model can also handle several other types of tasks: converting sketches into photos and text, completing photos and videos, and text-guided manipulation of photos and videos.


Availability: blog post / online demo

Microsoft also introduced a multimodal model with 2.5 billion parameters for working with images. The model is similar to OpenAI’s CLIP, but it can learn in 94 languages and demonstrates SOTA performance in zero-shot image classification.
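
Zero-shot classification with a CLIP-style model comes down to comparing an image embedding with the text embeddings of candidate labels and picking the closest one. Here is a minimal sketch in plain Python, assuming the embeddings have already been produced by some encoder. The vectors, labels, and function names are made up for illustration and are not the Microsoft model's API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, label_embs):
    """Return the label whose text embedding is closest to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    return max(scores, key=scores.get), scores

# Hypothetical 3-dimensional embeddings; a real model uses far more dimensions.
label_embs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "ein Foto von einem Hund": [0.1, 0.9, 0.1],  # labels can be in different languages
    "une photo d'une voiture": [0.0, 0.2, 0.9],
}
image_emb = [0.8, 0.2, 0.1]  # stand-in for the encoded input image

best_label, scores = zero_shot_classify(image_emb, label_embs)
print(best_label)
```

No classifier is trained here: new classes are added simply by writing new label texts, which is what makes the approach "zero-shot".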


Availability: project page / article

Researchers at NVIDIA have unveiled a new method for semantic image editing. Users can edit images segment by segment in a graphical interface. This lets you edit the mask of an individual part of an object, such as a car's headlights, using vectors in latent space. The method helps identify these “editing” vectors and their number so that they can then be applied to other images.
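
Once an editing vector is known, applying it is, at its simplest, a scaled vector addition in latent space. The sketch below assumes a direction has already been found; `apply_edit`, the latent codes, and the direction values are hypothetical and do not reproduce NVIDIA's code. Decoding the edited code back into an image with the generator is left out.

```python
def apply_edit(latent, direction, strength=1.0):
    """Shift a latent code along an editing direction: w' = w + strength * d."""
    if len(latent) != len(direction):
        raise ValueError("latent and direction must have the same length")
    return [w + strength * d for w, d in zip(latent, direction)]

# Hypothetical editing vector, e.g. one found for "larger headlights".
headlight_direction = [0.0, 0.5, -0.25, 0.0]

# The same vector can be reused on the latent codes of other images.
car_a = [1.0, 2.0, 3.0, 4.0]
car_b = [-1.0, 0.0, 1.0, 2.0]

edited_a = apply_edit(car_a, headlight_direction, strength=2.0)
edited_b = apply_edit(car_b, headlight_direction, strength=2.0)
print(edited_a, edited_b)
```

The `strength` parameter is an illustrative way to control how pronounced the edit is.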


Availability: project page / article / repository / online demo

Adobe Research has introduced a new editor for interactive image manipulation. Unlike approaches in which the user has to mask an object so that the neural network can then generate a fragment to fill the mask, here it is enough to draw contours over the image.


Availability: project page / article

Google presented an algorithm for extending images beyond their borders: a 256×256 image is fed in, and the model predicts what lies outside the frame in two ways, as a panorama or as a camera zoom-out.

In the first case, the model extends the image to the left and right in four steps to obtain a final 256×1280 image.

In the second, the original frame is extended on all four sides. This algorithm can also be used for general image2image tasks. It is based on a diffusion model, which we covered in more detail in July using SR3 as an example.
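
The panorama numbers can be checked with a little arithmetic. The text gives only the starting size, the step count, and the final size; assuming every step adds a strip of equal width (an assumption, not something the source states), each strip is (1280 − 256) / 4 = 256 pixels wide. The helper `panorama_widths` is a hypothetical name used only for this check.

```python
def panorama_widths(start_width, final_width, steps):
    """Canvas width after each extension step, assuming equal-width strips."""
    extra = final_width - start_width
    if steps <= 0 or extra % steps != 0:
        raise ValueError("final width is not reachable with equal-width strips")
    strip = extra // steps
    widths = [start_width + strip * i for i in range(1, steps + 1)]
    return strip, widths

strip, widths = panorama_widths(start_width=256, final_width=1280, steps=4)
print(strip, widths)
```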


Availability: project page / article

Google Research has unveiled a way to train NeRF on RAW images. This makes it possible to render new HDR views of a scene while controlling not only the viewpoint but also exposure, focus, and tone mapping. The method can reconstruct scenes from extremely noisy images captured in near-total darkness.


Availability: project page / article

While NeRF can generate lifelike photos from new vantage points, it needs many shots for high-quality results. Otherwise, the images contain artifacts caused by errors in the predicted scene geometry. Researchers at the Max Planck Institute have unveiled a new method that regularizes unobserved views during optimization, which makes it possible to generate new views from just three photographs.

WMT21 Winner

Availability: blog post / article / repository

Most machine translation systems use groups of bilingual models, which usually require large datasets for each language pair and task. FAIR proposed an approach in which a single model translates several language pairs at once, including low-resource pairs (for example, Icelandic to English) and high-resource pairs (for example, English to German), using new methods of mining training data and a mixture-of-experts (MoE) architecture. This allows the model to be scaled up to 52 billion parameters. The model outperformed the best specially trained bilingual models in 10 out of 14 language pairs and won the WMT competition.
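
In a mixture-of-experts layer, a small gating function scores the experts for each token and only the best-scoring ones are run, so the parameter count can grow without a matching growth in compute per token. Below is a minimal top-1 routing sketch in plain Python. The gate weights, the toy experts, and all names are illustrative assumptions, not FAIR's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, gate_weights, experts):
    """Route one token to its single best expert (top-1 gating)."""
    # One gating logit per expert: dot product of the token with that expert's gate row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(logits)
    chosen = max(range(len(experts)), key=lambda i: probs[i])
    # Only the chosen expert is evaluated; its output is scaled by the gate probability.
    output = [probs[chosen] * y for y in experts[chosen](token)]
    return chosen, output

# Toy experts: each is just a different elementwise function.
experts = [
    lambda t: [2.0 * x for x in t],
    lambda t: [x + 1.0 for x in t],
]
gate_weights = [
    [1.0, 0.0],  # expert 0 scores high when the first component is large
    [0.0, 1.0],  # expert 1 scores high when the second component is large
]

chosen_a, out_a = moe_forward([3.0, 0.0], gate_weights, experts)
chosen_b, out_b = moe_forward([0.0, 3.0], gate_weights, experts)
print(chosen_a, chosen_b)
```

In a real model the experts are feed-forward sub-networks and the gate is trained jointly with them; the toy functions here only show the routing step.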

In November, the following became available:

  • the GPT-3 and Codex beta. The service is paid, but on registration you receive 300 thousand tokens, worth $18;

  • the source code and a demo of Demucs v3, the new version from FAIR – a model for separating audio tracks into individual source streams;

  • an interactive demo of GauGAN2 from NVIDIA, where you can create landscapes from a sketch in a simple graphical interface.
