Getting started with the Galactica language model

Galactica is a scientific language model with up to 120 billion parameters. It predicts protein annotations, creates lecture notes, and renders mathematical formulas as text.

GitHub repository: https://github.com/paperswithcode/galai

Photo by Akbar Nemati via Unsplash

Introduction

Galactica is a large open-source language model from Meta AI. A single model handles many scientific tasks: it performs logical reasoning, creates lecture notes, predicts citations, and has many other talents.

The model expresses mathematical formulas and Python code in text.

The model stands out because it achieves the strongest performance on science-oriented benchmarks. For example, it produces less toxic output on the TruthfulQA dataset compared to the latest GPT-3 or OPT.

The model can also cite sources for mathematical formulas.

The complete model is available as an open-source project. Let's get started with it.

Getting Started

The model can be accessed via GitHub with a few lines of code.

The model is installed using the command:

!pip install git+https://github.com/paperswithcode/galai

Galactica currently works with Python 3.8 and 3.9; installation fails on 3.10 and above. This limitation currently comes from the promptsource library dependency.
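If you want to fail fast before installing, you can check the interpreter version first. A minimal sketch (the version bounds simply mirror the constraint above):

import sys

# galai currently requires Python 3.8 or 3.9 (a constraint inherited
# from the promptsource dependency).
if sys.version_info[:2] not in ((3, 8), (3, 9)):
    raise RuntimeError(
        f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "galai needs 3.8 or 3.9."
    )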

Model import:

import galai as gal

Model loading:

model = gal.load_model("base", num_gpus=1)

Here we specify the size of the model. We use the “base” version, which has 1.3 billion parameters, because of its modest memory requirements.

The “base” version consumes about 11 GB of memory; larger versions require even more. The “standard” version, for example, simply ran our laptop out of memory.

The second parameter is optional, but we found it necessary to specify the number of GPUs because the model ran into an error without it. Your environment may have a different number of GPUs.
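If you are unsure how many GPUs your environment exposes, you can detect the count with PyTorch (already installed as a galai dependency) instead of hard-coding it. A sketch, assuming at least one CUDA device is present:

import torch
import galai as gal

# Detect the available CUDA devices rather than hard-coding num_gpus.
num_gpus = torch.cuda.device_count()
if num_gpus == 0:
    raise RuntimeError("No GPU detected; the 'base' model needs about 11 GB of memory.")

model = gal.load_model("base", num_gpus=num_gpus)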

Now we can start using the model. Let's feed it a sentence:

model.generate("We can explain Brain as", new_doc=True, top_p=0.7, max_length=200)

The generate call accepts additional parameters that let you control sampling and limit the output length. These parameters are similar to those of GPT-3.
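For example, lowering top_p and max_length trades variety for a shorter, more focused completion. A sketch using the same prompt (the parameter values here are our own choices, not recommended defaults):

# Lower top_p narrows the sampling pool; max_length caps the output tokens.
short_output = model.generate(
    "We can explain Brain as",
    new_doc=True,
    top_p=0.3,
    max_length=60,
)
print(short_output)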

The model prints the result:

“We can explain the brain as a computer program that takes in data from the outside world and produces a result. The brain is a machine that decides what to do. The brain is the part of the brain that is made up of neurons, the basic building blocks of the brain. Neurons are the smallest units of the brain. Each neuron contains a membrane and a set of synapses that allow it to interact with other neurons.\n\n[IMAGE]\n\nFigure caption: Figure 10.2.1: Neurons are the smallest units of the brain.\n\n## What are the functions of neurons? Neurons are the basic building blocks of the brain. The brain is the part of the body that is made up of neurons. Neurons communicate with each other using chemical signals called neurotransmitters. There are many different types of neurons in the brain. Different types of neurons in the brain are called different types of neurons. Neurons of different types”

The Galactica Model

Galactica is a large language model based on the Transformer architecture. It is trained mainly on scientific articles and research papers. Documents were converted from PDF to text using the GROBID library.

  • The model comes in five sizes, from 125 million to 120 billion parameters.

  • The model uses a context window of 2,048 tokens.

  • Tokenization takes a “specialized” approach to handle specific data types. For example, step-by-step reasoning is wrapped in the special tokens <work> and </work>, which act as working memory (see the sketch after this list). Similarly, DNA sequences start with [START_DNA] and end with [END_DNA].
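To see the working-memory tokens in action, you can end a prompt with <work> so the model writes out its intermediate reasoning before the final answer. A minimal sketch based on the prompt style described in the Galactica paper; the question itself is our own example:

# Everything the model emits between <work> and </work> is scratch
# space for step-by-step reasoning; the answer follows the closing token.
output = model.generate(
    "Question: What is the average of 4, 8, 15, 16, 23 and 42?\n\n<work>",
    max_length=300,
)
print(output)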

Galactica's main novelty is its working-memory token. We believe that future models will extend this approach toward adaptive computation.

Conclusion

Galactica makes a good impression as a useful tool for a scientifically oriented computing interface.

The model is particularly impressive for its tokenized working memory (the working memory token, WMT). We believe this is a key feature of the Galactica model.

At the time of its launch, Galactica is among the easiest large models to use:

  • It only takes a few lines of code.

  • Truly open source, with all model sizes available to all users.

  • The easiest model to install on a regular computer.

We believe that these aspects make the model attractive to developers.

Larger language models require even more memory to run locally, which pushes users toward service providers via an API. Future models will likely pay more attention to memory requirements in order to expand their user base.

Meta AI removed the Galactica web demo within 24 hours due to misuse. We tested the demo version, and the responses were not impressive. However, the model produces excellent results when run locally.

Meta AI continues to release large AI models as open source. We believe this encourages developers to use Meta AI tools, making open source a good strategy.

Bibliography

[1] R. Taylor et al., “Galactica: A Large Language Model for Science,” Meta AI, 2022. arXiv:2211.09085.
