Building a computer to work with large language models

Artem Chebykin, ML engineer and writer at WHITE media. In this article, I will discuss which type of computer (desktop, laptop, or MacBook) is best suited for machine learning and why. We will also look at a beginner build and an advanced build for working with large language models (LLMs).

LLM

An LLM is a model trained on large amounts of text data and used to understand, summarize, and generate text. I am considering builds specifically for this type of model because LLMs are highly relevant right now: many companies are trying to create a personal assistant that helps employees solve work tasks.

Work with large language models involves two processes: training and fine-tuning. The difference is that during fine-tuning, the model adapts to data from the specific domain it is being immersed in, while during training, the model builds its ability to generalize on a large corpus of data, that is, the ability to analyze data and give correct answers to questions.

Fine-tuning is simpler and requires fewer resources than training, but it matters: if the model has seen very little or none of your data, it will start to hallucinate, that is, invent things that do not exist. It is better to use locally run models, because unlike online services or APIs, all of them can be fine-tuned.
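To make the idea concrete, here is a minimal sketch of fine-tuning a local causal language model, assuming the Hugging Face transformers library is installed; the model name and the two in-memory example texts are placeholders chosen only for illustration, not a real training setup.

```python
# Minimal local fine-tuning sketch (assumes: pip install torch transformers).
# "gpt2" and the two example strings are placeholders for a real local model and dataset.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: any locally hosted causal LM you are allowed to fine-tune
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.train()

texts = ["Domain-specific example text 1.", "Domain-specific example text 2."]
enc = tokenizer(texts, return_tensors="pt", padding=True).to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for step in range(3):                              # toy loop: a few gradient steps
    out = model(**enc, labels=enc["input_ids"])    # causal LM loss on the domain texts
    out.loss.backward()                            # (padding tokens are not masked here for brevity)
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {out.loss.item():.3f}")
```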

In this article I will focus on a build specifically for LLM training. One of the tasks LLMs solve is code generation. The link leads to leaderboards ranking models by their code-generation results. These are local, open models that can be trained on your own system, and it is for their training that we will select the hardware.

What type of computer is best for machine learning?

To answer this question, let’s compare the hardware in a desktop computer, laptop and MacBook according to several criteria.

1. Performance

All laptops come with a cut-down version of the video card, limited in clock frequency and memory capacity. This is necessary so that the cooling system can keep the hardware running reliably in the small space of the laptop case. If you installed a full desktop video card in a laptop, the cooling system would not cope, the laptop would overheat, and the components would fail.

For example, the Machenike L16 Pro Supernova, a laptop with expensive top-end hardware, has a laptop RTX 4090: a cut-down video card whose characteristics roughly match a desktop RTX 4080, so it sits one step below the desktop RTX 4090. This applies to any laptop.

Comparison of the laptop and desktop versions of the video card

(Data source)

2. Possibility of replacement

Another disadvantage of laptops and MacBooks: the central and graphics processors are soldered to the board, so they are difficult to replace. Laptop processors use BGA mounting, in which an array of solder balls acts as the contacts. To remove such a chip, you need a soldering station, and you have to get the desoldering time right so as not to burn the electronics. Even if you manage to remove the processor correctly and replace it with a more powerful one, it may not work due to incompatibility with the other laptop components. The only upgrades that do not require special skills are adding an extra hard drive or SSD and replacing the RAM sticks.

In MacBooks, even the RAM cannot be replaced.

A desktop computer is a different kind of platform: to replace the video card, you disconnect the supplemental power cable, unscrew the mounting bolt, release the latch, and pull the card out. To replace the processor, you remove its cooling system, lift the socket cover, and take the chip out. Neither operation requires a soldering station.

3. Memory capacity

However, MacBooks compensate for the inability to replace the CPU and GPU with unified memory and a unique architecture.

In laptops from other manufacturers, the CPU and GPU are separate, and each has its own memory. In a MacBook, the GPU and CPU are not separated and access the same memory blocks. Because of this, data does not need to be copied between different memory pools, which saves time and improves system performance.

Tests have shown that the GPU can occupy approximately 75% of the unified memory. With 128 GB of unified memory, the GPU can take at most 96 GB; the rest remains for the central processor. In a traditional architecture, the GPU keeps its video memory entirely to itself, whereas with Apple the processor takes its share out of the common pool.

Thanks to this, even with 64 or 96 GB of total RAM, a MacBook offers an impressive amount of GPU-accessible memory, so you can train large models on it. However, MacBooks still lose to desktop computers, because their graphics processors do not reach the performance of NVIDIA video cards, which are the best fit for machine learning.

GPU working set size — 75%

(Video source)
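To put these numbers in context, here is a rough back-of-the-envelope check of whether a model's weights fit in GPU-accessible memory. The parameter counts are illustrative, and the 75% usable fraction follows the figure above; real runs also need room for activations and framework overhead, so treat this as a sketch only.

```python
# Back-of-the-envelope check: do a model's weights fit in GPU-accessible memory?
# Parameter counts are illustrative; the 75% usable fraction for unified memory
# follows the figure above. Real runs also need room for activations and overhead.
def weights_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GiB (2 bytes/param = fp16/bf16)."""
    return params_billion * 1e9 * bytes_per_param / 2**30

def fits(params_billion: float, memory_gib: float, usable_fraction: float = 1.0) -> bool:
    return weights_gib(params_billion) <= memory_gib * usable_fraction

for name, size_b in [("7B", 7.0), ("13B", 13.0), ("70B", 70.0)]:
    print(
        f"{name}: {weights_gib(size_b):5.1f} GiB |",
        "RTX 4090 (24 GB):", fits(size_b, 24),
        "| 128 GB unified, ~75% for GPU:", fits(size_b, 128, 0.75),
    )
```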

4. Video card

To understand why NVIDIA video cards are best suited for machine learning, let's look at the advantages of the company's products compared to other major video card manufacturers – AMD and Intel.

NVIDIA advantages:

  • The company began developing unified hardware and software for machine learning much earlier than AMD and Intel, and the community around NVIDIA's libraries is much larger and more active. Although AMD publishes news that its data-center cards can outperform NVIDIA cards, these are usually benchmarks, that is, tests the manufacturer runs under artificial conditions.

  • NVIDIA video cards have tensor cores. The name itself is an attention-grabbing piece of NVIDIA marketing; in essence, tensor cores are small computational units that NVIDIA designed specifically for the tensor calculations used in neural networks.

  • NVIDIA video cards have CUDA cores, unified multitasking graphics cores. They are used, for example, for rendering, for graphics in games and graphics applications, and for accelerating machine learning.

  • Many frameworks and libraries are optimized for NVIDIA technology; the sketch after this list shows what that looks like in practice.
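As a concrete illustration, here is a minimal PyTorch sketch (assuming a CUDA build of PyTorch) of how a framework engages tensor cores through TF32 and mixed precision; on a machine without a suitable GPU the same code simply falls back to the CPU.

```python
# Minimal sketch of how PyTorch engages tensor cores: TF32 matmuls plus
# float16 autocast. Without a suitable GPU the code falls back to the CPU.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.backends.cuda.matmul.allow_tf32 = True        # allow TF32 on tensor cores (Ampere and newer)

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(64, 1024, device=device)

autocast_dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.autocast(device_type=device, dtype=autocast_dtype):
    y = model(x)                                     # matmul runs on tensor cores in FP16 when available

print(device, y.dtype)
```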

What to look for when choosing an NVIDIA video card for training:

  • Number of tensor cores. They perform matrix multiplications on tensors in a single clock cycle. The more tensor cores, the better: the more operations we can perform per unit of time and the faster the computations.

AMD has Matrix Core technology, an analogue of tensor cores, but it does not enjoy the same level of support. Since AMD's software stack is not as good as NVIDIA's, the card's effective speed is lower. With the release of Ada, the latest generation of NVIDIA graphics cards, the gap between NVIDIA and AMD cards has grown even larger.

  • Number of CUDA cores. They also accelerate machine learning, though not as effectively as tensor cores.

How many tensor and CUDA cores are needed? The typical numbers depend on the budget. For example, the RTX 4090, an entry-level card for machine learning, has 16,384 CUDA cores, 512 fourth-generation tensor cores, 512 texture units, and 176 rasterization units. Professional cards have even more.

  • The amount of video memory on the card does not affect the training process itself, but it is very important. I believe the current minimum for training large networks is 24 GB. That capacity is available either in top NVIDIA gaming cards (3090, 3090 Ti, 4090) or in cards from the professional segment. If the network is too large, it simply will not fit into video memory. To shrink it, you can use the LoRA compression technique, but this reduces the model's quality; the sketch after the table below illustrates why LoRA saves memory.

  • Video chip frequency. The higher the frequency, the faster the card runs and the more operations it performs per unit of time.

  • Memory bus width and memory type. NVIDIA gaming cards and some professional cards currently use the GDDR6X or GDDR6 standard. But there are also cards, such as NVIDIA Tesla accelerators, that use HBM2 memory with enormous bandwidth, meaning far more data can be sent and received per clock cycle.

A table of video card characteristics and prices, decreasing from top to bottom. The RTX 3090 and 3060 are not suitable for larger models. The video cards above the RTX 4090 are professional solutions.
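Here is the promised sketch of why LoRA saves memory: instead of training a full weight matrix, only two low-rank factors are trained. The layer size and rank below are hypothetical, chosen only to resemble a single large transformer projection.

```python
# Why LoRA saves memory: instead of training a full d_out x d_in weight matrix,
# only two low-rank factors of rank r are trained. Sizes below are hypothetical.
def full_params(d_in: int, d_out: int) -> int:
    return d_in * d_out

def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)          # factor A is r x d_in, factor B is d_out x r

d_in = d_out = 8192
r = 16
full, lora = full_params(d_in, d_out), lora_params(d_in, d_out, r)
print(f"full layer: {full:,} trainable weights")                                     # 67,108,864
print(f"LoRA (r=16): {lora:,} trainable weights, {100 * lora / full:.2f}% of full")  # ~0.39%
```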

5. Price

At equal performance, a laptop or a MacBook costs more than a desktop computer. The MacBook has a unique architecture, but its price is unreasonably high, so the fair comparison is between laptops and desktop computers.

Because laptops use cut-down versions of video cards, they will never catch up with desktop computers in power. The full version of a video card in a desktop runs at its maximum, which the stripped-down laptop version of the same card can never reach. That is why, at the same price and with the same video card model, it is better to choose a desktop computer over a laptop.

Conclusion

The worst option for machine learning is a laptop, because its hardware is cut down and less performant than the same models in a desktop computer. MacBooks come second: their unified memory makes machine learning possible, but they use Apple's own graphics cores rather than the NVIDIA video cards best suited for training. The best option is a desktop computer with an NVIDIA card, because it has full versions of the hardware and, at equal performance, costs less than a MacBook or a laptop.

Options for building a computer for machine learning

In this section, we will look at two options for building a desktop computer for machine learning. These configurations are not prescriptive; you can substitute analogues from other manufacturers. They illustrate the general concept of a machine learning build.

Initial assembly

A budget build for machine learning and training large models.

Concept: we build the system around the video card and invest as much as possible in it, because that is where the models are trained. The remaining components are budget-friendly; their job is to let the video card reach its potential when training neural networks.

Entry Level Build Components

RAM. The minimum amount of RAM is 64 GB. Four memory sticks will allow the system to operate stably.

CPU. The processor runs memory in dual-channel mode: while two sticks are being accessed, the other two are being prepared. I recommend AMD processors, because they are cheaper and also consume less power and emit less heat than processors from other companies. AMD has also said that the AM5 platform will be supported until 2026, so it remains relevant.

Storage devices. It is not enough to partition a single SSD; I recommend having two: a small one, say 256 GB, for the system, and a second for the current project. That way, if the system or its drive fails, you only lose the small SSD, and your data stays safe on the second drive. Losing a system that can be reinstalled is not nearly as bad as losing 2 TB of unique data.

Also, even a fast SSD may not keep up if you put the system and a large amount of data on one drive; the system will slow down.

Hard disks. Needed to store the project after its completion. They are more reliable than SSDs.

Power supply. An 850 W unit is sufficient to power such a system.

Price. Approximately 300,000 rubles.

What can be done with this build. The link lists the parameters required to train various models. Pay attention to the GPU requirement: the entry-level build gives us 24 GB of VRAM. BERT and Falcon, for example, fit these criteria, and 24 GB is also the minimum for Zephyr.
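For intuition about what fits, here is a rough estimate of full fine-tuning memory under an assumed mixed-precision Adam setup of about 16 bytes per parameter (fp16 weights and gradients, fp32 master weights and two optimizer moments). Activation memory is ignored and real requirements vary, so this is only a sketch of why larger models push you toward LoRA or QLoRA.

```python
# Rough estimate of memory for full fine-tuning with Adam in mixed precision:
# about 16 bytes per parameter (fp16 weights + fp16 grads + fp32 master weights
# + two fp32 Adam moments). Activations are ignored; real needs are higher.
def full_finetune_gib(params_billion: float, bytes_per_param: int = 16) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

for size_b in (0.3, 1.3, 7.0):
    need = full_finetune_gib(size_b)
    print(f"{size_b}B params -> ~{need:.0f} GiB; fits in 24 GB VRAM: {need <= 24}")
```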

Advanced build

A build with the most powerful machine learning components as of April 1, 2024.

Concept: professional cards are used; not top-of-the-line, but more powerful than the RTX 4090.

Each part of the assembly is selected to ensure the system performs to its maximum.

A special motherboard is used.

Advanced Build Components

CPU. This build, like the first, uses an AMD processor, but it is nearly a server-class one. AMD has a separate class of processors that sits between consumer and server chips; they are called high-end desktop (HEDT) processors and are installed in very advanced systems. These are exactly what is needed for the intensive computations found in neural networks.

RAM. The amount of RAM differs as well: where the previous build had 64 GB, here you can install almost 2 TB.

The motherboard can accommodate up to four video cards on a single board.

Price. The price of all components, excluding custom water cooling and assembly, is approximately 6.5–7 million rubles.

What can be done with this build. You can train models such as LLaMA 70B, BLOOM, and Mixtral-8x7B.

What will the advanced build give?

  • A large amount of RAM (192 GB), so you can hold a lot of data in memory.

  • High speed.

  • High quality results.

The more memory you have, the less you need to think about compression and the higher the quality, because you train a full model rather than a compressed one. You do not have to use additional LoRA or QLoRA techniques to squeeze into 24 GB of VRAM; the advanced build has 192 GB, which by itself is a lot. NVIDIA professional cards also support technologies for linking several cards together so they work as one, which further increases training speed.
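On the software side, spreading training across several cards in such a build often looks like the following minimal PyTorch DistributedDataParallel sketch; the linear layer is a hypothetical stand-in for a real model, and the script is assumed to be launched with torchrun.

```python
# Minimal multi-GPU training sketch with PyTorch DistributedDataParallel.
# The Linear layer is a hypothetical stand-in for a real LLM; launch with:
#   torchrun --nproc_per_node=4 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                  # NCCL uses the fast GPU-to-GPU links
    local_rank = int(os.environ["LOCAL_RANK"])       # set by torchrun for each process
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                           # toy loop; each GPU sees its own batch
        x = torch.randn(8, 4096, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()                              # gradients are all-reduced across GPUs
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```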

Finally

The type of computer you build for machine learning depends on the problems you need to solve. Consider your tasks, your budget, and your comfort. Perhaps portability will be decisive for you, and you will prefer a laptop to a desktop computer.

WHITE is a DIY media outlet for IT specialists. Share personal stories about solving all kinds of IT problems and receive rewards.
