Hosting your own LLM – why do it, and is it even necessary?

Clearly, the IT industry is showing growing interest in large language models (LLMs). Many companies – startups and individual developers included – prefer self-hosting open LLMs to working with the APIs of proprietary solutions. At beeline cloud, we decided to weigh the pros and cons of this approach, including the financial side.

Image – Bernd Dittrich – Unsplash.com

Self-hosting and its potential

One of the key advantages of self-hosting is the ability to further train and fine-tune a language model for specific tasks. Unlike API-based solutions, working with an open LLM like LLaMA 2 on your own infrastructure gives full control over its parameters and environment, which makes effective adaptation possible. At the same time, a number of studies show that even compact models perform decently on a narrow range of tasks. For instance, the developers of LLaMA-7B, a model with 7 billion parameters, claim that the quality of its answers continues to improve even after the first trillion training tokens.
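In practice, such adaptation on your own hardware is often done with parameter-efficient fine-tuning. Below is a minimal sketch using Hugging Face transformers and peft with LoRA; the model name, dataset, and hyperparameters are illustrative assumptions, not a recipe from the article.

```python
# A minimal LoRA fine-tuning sketch (illustrative; model name,
# dataset and hyperparameters are assumptions, not a fixed recipe).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "meta-llama/Llama-2-7b-hf"  # assumes access to the gated repo
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Train small LoRA adapters instead of all 7B base parameters.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))

# Any in-house text corpus works here; wikitext is just a stand-in.
data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
data = data.map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=512),
    remove_columns=data.column_names,
)

Trainer(
    model=model,
    args=TrainingArguments(
        "llama2-lora", per_device_train_batch_size=1,
        gradient_accumulation_steps=8, num_train_epochs=1,
        learning_rate=2e-4, fp16=True, logging_steps=50,
    ),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

The point of LoRA here is that only a few million adapter weights are updated, so the job fits on a single GPU instead of a training cluster.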

As specialists from the AIRI Institute of Artificial Intelligence and the National Research University Higher School of Economics note, the quality of compact models' responses can be improved by training on different types of data – for example, not only text corpora but also images. By introducing new modalities into the training process, smarter models can be obtained. This is quite difficult to do when working through the API of a proprietary solution, especially with an organization's own data. Another advantage of hosting open LLMs is the availability of open tools for customization, deployment and inference, developed by a broad community of developers.
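To illustrate how low the barrier is on the inference side of that open toolchain, here is a minimal sketch that runs an open model locally with the Hugging Face transformers pipeline; the model name is an assumption.

```python
# Minimal local inference sketch with an open toolchain
# (model choice is an assumption; any open causal LM works).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # assumes access to the gated repo
    device_map="auto",
)

print(generator("Self-hosting an LLM makes sense when",
                max_new_tokens=60)[0]["generated_text"])
```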

But if we touch on the economic side of the issue, at first glance the situation does not play out in favor of self-hosting. According to experts, a self-hosted model that processes about 10 thousand requests daily requires a budget of 40–60 thousand dollars per month. This amount covers purchasing and maintaining data center equipment, as well as hiring specialists. Processing the same number of requests through a commercial API would cost approximately $1,000 per month. However, many proprietary products set limits on the number of requests. Once the threshold is reached, each additional thousand requests becomes progressively more expensive and can turn into a serious expense for companies that actively use AI systems. Dependence on an external API provider also carries risks, such as changes in pricing policy or even discontinuation of the service, which may force a migration to another solution and increase costs for the business.
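To make the comparison concrete, here is a back-of-the-envelope calculation using the article's rough figures; the per-request breakdown is just arithmetic over those estimates.

```python
# Back-of-the-envelope monthly cost comparison
# (figures are the article's rough estimates, not quotes from a provider).
REQUESTS_PER_DAY = 10_000
DAYS_PER_MONTH = 30

self_host_monthly = 50_000  # midpoint of the $40-60k/month estimate
api_monthly = 1_000         # commercial API at the same volume

requests = REQUESTS_PER_DAY * DAYS_PER_MONTH
print(f"{requests:,} requests/month")
print(f"self-hosted: ${self_host_monthly:,} "
      f"(~${self_host_monthly / requests:.3f}/request)")
print(f"API:         ${api_monthly:,} "
      f"(~${api_monthly / requests:.4f}/request)")
```

At this volume the API is cheaper by more than an order of magnitude per request, which is why the caveats about rate limits and provider risk matter so much to the overall picture.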

Self-hosting also addresses important issues related to the security of personal data. For example, many AI API providers stipulate in their terms of use that the company may further train the model on customer data. Finding out exactly what data the developer of an AI system collects (let alone getting it removed) is usually very difficult. Information security specialists even note that such a practice may conflict with the laws of a number of countries. Self-hosting, in turn, provides full control, allowing companies to store and process personal data in accordance with regulatory requirements.

A different future

Open LLMs, which can be self-hosted, have their advantages. The problem, however, lies in limited access to training data. Corporations today hold colossal amounts of information, so the models they develop and train often turn out more accurate and capable than their open counterparts.

For instance, GPT-4 performs about 20% better than LLaMA 2 on the MMLU benchmark, which includes roughly 16 thousand questions from 57 academic fields. According to the Artificial Analysis website, which ranks large language models, the top places on the list are occupied by API solutions (for example, GPT-4o and Gemini 1.5 Pro), which lose to open models only on price.

A key aspect of self-hosting an LLM is the need for computing resources. Even compact machine learning models require significant power, and the costs of hosting and maintenance may be prohibitive for individual developers or small companies. In this context, it may be cheaper to start with a commercial model accessed via API. Calculations show that an LLM deployed on your own hosting becomes cheaper only once the number of dialogues exceeds 8 thousand.
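As a sanity check on that break-even claim, here is a hypothetical calculation; both prices are made-up parameters chosen for illustration, not figures from the cited source.

```python
# Hypothetical break-even sketch: fixed self-hosting cost vs.
# pay-per-dialogue API. Both prices are illustrative assumptions.
import math

FIXED_SELF_HOST = 4_000.0  # $/month for a small GPU server (assumption)
API_PER_DIALOGUE = 0.50    # $/dialogue via a commercial API (assumption)

break_even = math.ceil(FIXED_SELF_HOST / API_PER_DIALOGUE)
print(f"Self-hosting wins above ~{break_even:,} dialogues/month")
# With these made-up prices the threshold lands at 8,000 dialogues,
# matching the order of magnitude cited in the article.
```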

Decentralization as a compromise

A compromise between powerful all-in-one LLMs and self-hosting may be found in decentralized language models. One example of this approach is the Petals project. At its core lies BitTorrent-style technology for exchanging data between network participants, each of whom downloads only part of the model. Output is generated at up to six tokens per second for LLaMA 2 (70B) and up to four tokens per second for Falcon (180B), which is enough for chatbots and interactive applications. You can try a demo of the solution on the developers' official website.
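For a sense of how this looks in practice, below is a short sketch patterned on the Petals documentation; the exact model name and API details may differ between Petals versions, so treat it as illustrative.

```python
# Illustrative Petals client usage (patterned on the project's docs;
# model name and API details may vary between Petals versions).
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-70b-chat-hf"  # assumes access to the gated repo
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Only a fraction of the model is loaded locally; the remaining
# layers run on other peers in the public swarm.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A decentralized LLM is", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))
```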

Image – Kelvin Ang – Unsplash.com

But despite the appeal of distributed platforms, there is an opinion that a centralized future still awaits us – as has happened in other industries. For example, blockchain and cryptocurrencies were originally created to decentralize the financial system, yet in practice most activity happens on centralized exchanges, and regulation is also pushing the crypto sphere towards centralization.

A similar story could happen with AI systems. It is easier to open an inference page in a browser or integrate an API into an application than to set up decentralized nodes or retrain your own model. Regulators, too, are pushing the industry towards centralization: states are introducing new rules for developers of AI systems to follow. The European AI Act, for example, will impose obligations on businesses related to data security and ethical standards. Small and medium-sized companies may struggle to comply with these requirements, so they may find it easier to turn to a provider's API toolkit and work with its AI system. Obviously, it is easier for a major player like OpenAI to comply with the new regulations (and the company will have to, so as not to lose markets and audiences in developed countries).

However, these are all mid- to long-term issues. If you want to try a self-hosted LLM right now, we have a guide on how to assemble a computer for working with large language models.

beeline cloud – a secure cloud provider. We develop cloud solutions so that you can provide your customers with the best services.
