A pivot to fine-tuning models? Google acquires Character.ai's researchers and technology

Daniel De Freitas (left) and Noam Shazeer, founders of startup Character.ai


Google and the startup Character.ai have announced a partnership. Under the agreement, Google will receive non-exclusive rights to Character.ai's large language model technology, while the startup's CEO, Noam Shazeer, and its other co-founder, Daniel De Freitas, will join the DeepMind division. Character.ai itself intends to switch to fine-tuning open models.

From its earliest days, it was obvious that Google was merely masquerading as a search engine while its real core business was artificial intelligence. So argues one paragraph of a 2018 New Yorker article. As an illustration, it gives an example: in 2001, a Google employee named Noam Shazeer got fed up with the behavior of the company's spell checker and wrote his own, with elements of AI.

At the time, Google licensed its spell checker from a third party. The article doesn't say what the product was used for, but it was most likely correcting misspelled search queries.

This spell checker made comically absurd mistakes. The New Yorker article cites the example of TurboTax (a popular US tax-preparation software package) being corrected to "turbot ax" – two grammatically valid but, in that order, meaningless nouns: the turbot fish and an axe. Spell checkers of that kind are only as good as their dictionaries.

At the time, Shazeer was a young engineer sharing an office with Jeff Dean and Sanjay Ghemawat; all of them would become famous much later. Shazeer realized that the indexed Web was the largest dictionary in history. To tap that wealth of information, he wrote a program that assessed the statistical properties of text on the World Wide Web and determined which words were likely typos. For example, "pritany spears" and "brinsley spears" are clearly misspellings of Britney Spears, a singer popular in the early 2000s.
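A minimal sketch of this frequency-based idea, in the spirit of the well-known corpus-driven spelling correctors (the tiny corpus, the single-edit candidate set, and all names here are illustrative assumptions, not Shazeer's actual system):

```python
# Frequency-based spelling correction: treat a text corpus as the dictionary
# and prefer the candidate word that occurs most often. The tiny corpus below
# is a stand-in for the indexed Web; a real system would use far larger counts.
from collections import Counter
import re

corpus = """
britney spears released a new single and britney spears topped the charts
while fans searched for britney spears lyrics and britney spears tour dates
"""

counts = Counter(re.findall(r"[a-z]+", corpus.lower()))

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known candidate, or the word itself."""
    candidates = ({word} & counts.keys()) or (edits1(word) & counts.keys()) or {word}
    return max(candidates, key=counts.__getitem__)

print(correct("britny"))  # -> britney
```

On the toy corpus, "britny" resolves to "britney" because that is the only corpus word one edit away; no hand-built vocabulary is involved, which is exactly what let the approach fix errors a licensed dictionary-based checker could not.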

Back then, Google's corporate culture still included Friday TGIF [Thank God It's Friday] meetings; they would be significantly reworked only in 2019. Shazeer presented his system at one of these TGIFs. Attendees tried to trip it up, but the spell checker made no mistakes.

Rumor has it the story went slightly differently: Noam allegedly pitched a similar idea during his Google job interview, and the engineer was then hired to implement the spell checker in practice. What is known for certain is that Shazeer and Dean later applied similar AI principles to matching ads to page context.

Ad targeting still generates the bulk of Google's revenue. Shazeer left Google in October 2021, and in November he founded the startup Character.ai with another former Google employee, Daniel De Freitas.

Character.ai is an app built on a large language model (LLM) and offering a set of chatbots that converse with users. The bots take on the role of real people or fictional characters and then act the part in chat. Users create the bots themselves: they write a text description and upload an avatar.

Both De Freitas and Shazeer were key AI researchers at Google. Suffice it to say that Noam's name is on the landmark research paper "Attention Is All You Need," associated with the invention of the transformer. Daniel worked on the experimental AI project Meena, which later grew into LaMDA, the same LLM at the center of the story in which a Google engineer declared that the neural network was conscious.

Noam Shazeer's (nshazeer) activity on the Mesh TensorFlow project repository. GitHub


In November 2022, OpenAI opened its ChatGPT service to everyone, and it almost immediately gained huge popularity: ChatGPT reached 100 million users by January 2023, just two months after launch.

Google almost immediately sounded the alarm: internal deliberations began, and the company's founders Larry Page and Sergey Brin were drawn into the work. It turned out that Google had LaMDA, an LLM comparable to GPT-3.5, but had not launched it as a ChatGPT-style service because of certain reputational risks.

By February 2023, Google had a test-ready version of the Bard chatbot. In March, Character.ai reached a one-billion-dollar valuation in its latest investment round. In a statement to the press, unicorn co-founder De Freitas answered questions about competition: although the two products share a user base, Google would not produce anything interesting. Daniel explained his confidence by the fact that he used to work at Google.

In general, it is not known for certain why the two Google employees quit and started their own company. Media insiders claim (archive.is/bNxEQ) that Google's management deliberately suppressed attempts to create a ChatGPT-like system.

Sundar Pichai, head of both Google and its parent holding Alphabet, allegedly personally forbade Shazeer and De Freitas from promising the release of a LaMDA-based chatbot. Meanwhile, the Character.ai co-founders had reportedly tried to integrate LaMDA into Google Assistant back in 2020 and experimented with the LLM's responses to user questions.

It was Shazeer who wrote the famous words: "We offer no explanation as to why these architectures [transformers] seem to work; we attribute their success, as all else, to divine benevolence." They appear in the conclusion of the paper "GLU Variants Improve Transformer" [arXiv:2002.05202].

Noam himself mentions the conflicts only vaguely. In one interview, at the 46-minute mark, he covers in a single sentence the LaMDA experiments, the controversy they led to, his departure from Google, and the founding of Character.ai.

Further development of Google's chatbots eventually led to Gemini. The product competes with the best solutions on the market, beating them in synthetic benchmarks and surpassing them on some metrics. No other LLM can boast a context window of two million tokens.

However, it has failed to win over users. For example, a CNET review of the paid Gemini Advanced criticizes the bot for low-quality answers to even basic questions and poor information-analysis capabilities. The review delivers its verdict: the $20 a month would be better spent on another product.

Yesterday, August 2, 2024, the startup Character.ai announced a partnership with Google. Under the agreement, Google will receive non-exclusive rights to Character.ai's LLM technology. The startup receives funding from the search company in order, as stated, to keep growing and to develop personalized AI-based products.

Character.ai also reports that Noam Shazeer, Daniel De Freitas, and several other unnamed members of the startup's research team will be joining Google. For comment, Shazeer limited himself to an official statement to TechCrunch about being happy to return to Google and join the DeepMind team. Google does not explain what roles Noam and Daniel will take on.

The Character.ai blog assures readers that most of the startup's employees will stay and continue developing the product. Since the announcement, Dominic Perella, who has leadership experience at Snap Inc., has taken over as interim CEO of Character.ai.

As the Character.ai blog post explains, the first versions of the product required pre-training and fine-tuning the company's own LLMs. The state of the industry has changed significantly over the past two years, and many pre-trained models are now available. Character.ai says it plans to make greater use of third-party LLMs.

Going forward, pre-training may indeed stop making sense, with smaller players focusing on fine-tuning, distillation, and other techniques for adapting models to their own needs. Such opinions were voiced in response to the news (1, 2) by industry observers on microblogs.

Indeed, pre-training large language models is an extremely expensive operation. For example, to create Llama, Meta* assembled two clusters of 24,576 Nvidia H100 accelerators each. The exact price of an H100 is unknown; the most commonly cited figure is $25 thousand per accelerator. At that price, more than $1.2 billion was spent on accelerators alone. Operating such data centers also requires a lot of other expensive equipment, including proprietary Quantum-2 InfiniBand platforms.
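A quick back-of-the-envelope check of these figures (the $25,000 unit price is the commonly cited estimate, not a confirmed purchase price):

```python
# Sanity-check the accelerator cost cited above, using the rumored unit price.
per_gpu_usd = 25_000       # commonly cited H100 price estimate, not official
gpus_per_cluster = 24_576  # size of each of Meta's two clusters
clusters = 2

total_gpus = gpus_per_cluster * clusters  # 49,152 accelerators in total
total_usd = total_gpus * per_gpu_usd      # cost of the GPUs alone
print(f"{total_gpus} GPUs, ${total_usd / 1e9:.2f}B on accelerators alone")
# -> 49152 GPUs, $1.23B on accelerators alone
```

That $1.23 billion covers only the accelerators themselves, before networking, power, and the rest of the data-center build-out.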

Before that, Meta* made do with a more modest cluster of 16 thousand A100s, on which Llama 2 was pre-trained. The Llama 3 family of models was trained on the H100: to produce the 405-billion-parameter Llama 3.1, 16 thousand H100 accelerators processed a 15-trillion-token dataset over 54 days.

A scheme in which generous investment in infrastructure is supposed to lead to artificial general intelligence. Meta*


Third-party bloggers estimate that pre-training Llama 3.1 alone cost at least $100 million. At the same time, the LLMs were released to everyone under a relatively permissive license, which as of 3.1 has become even more permissive and now allows training other models on the output of the Meta* product.

Such open LLMs can be fine-tuned further and, where the license allows, used for commercial purposes. Fine-tuning is much cheaper than creating a model from scratch.
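One reason fine-tuning is so much cheaper: adapter methods such as LoRA train small low-rank matrices instead of the full weights. A rough parameter count illustrates the savings (the layer dimensions below are illustrative assumptions, not those of any real model):

```python
# Estimate what fraction of parameters a LoRA-style adapter actually trains.
# Assumption: each of n_layers has one d_model x d_model weight matrix that
# gets adapted; LoRA adds two low-rank factors A (d_model x r) and B (r x d_model).
def lora_trainable_fraction(d_model: int, n_layers: int, rank: int) -> float:
    full_params = n_layers * d_model * d_model        # frozen base weights
    adapter_params = n_layers * 2 * d_model * rank    # trainable A and B factors
    return adapter_params / full_params               # simplifies to 2*rank/d_model

frac = lora_trainable_fraction(d_model=4096, n_layers=32, rank=8)
print(f"trainable fraction: {frac:.4%}")  # -> trainable fraction: 0.3906%
```

With these toy numbers, under half a percent of the weights are updated, which is why a fine-tuning run fits on hardware orders of magnitude smaller than a pre-training cluster.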

In the future, Meta* plans to grow its H100 fleet by an order of magnitude: the company claims it will have 350 thousand H100s by the end of the year. Even by the most conservative estimates, that hardware will cost billions of dollars. Meta* CEO Mark Zuckerberg himself recently gave an indirect explanation of why so much computing power is needed: pre-training the next Llama 4 will require ten times more resources than Llama 3.

Of course, some companies can afford such expenses. If the estimates are to be believed, OpenAI will spend $7 billion this year on training and running models, and that is with Microsoft's discounts on the Azure cloud. The company is said to be on track to lose $5 billion by the end of the year, and there is even talk of a threat of bankruptcy.

Character.ai is far more modest in scale. In September 2023, a new investment round at a $5 billion valuation was discussed, but it never took place, and the valuation never rose beyond a billion. Total investment in the startup stands at about $200 million.

It is quite possible that for AI startups of Character.ai's caliber, a move toward fine-tuning open LLMs will soon become the only viable path. A service for chatting with a fake Elon Musk or the animated Yor Forger simply does not have billions to train models from scratch.


* – an extremist organization whose activities are prohibited
