One More Machine Learning PC for the Price of an RTX4090
“Artificial intelligence may be the best or worst thing that ever happened to humanity.” — Stephen Hawking
Hello everyone! The much-hyped march of Artificial Intelligence (AI) across the planet is not fading; on the contrary, it draws more attention every day, giving no one peace and sparking debates from the professional to the superstitious. LLMs such as ChatGPT, Gemini, YandexGPT and others keep improving, services built on them are becoming more widespread and accessible, and there are ever more interesting things you can do with them.
There is already a large number of open pre-trained LLMs: run your own GPT at home, as they say, and work wonders. I have also long wanted to run Stable (and not-so-stable) Diffusion. On top of that, my job now involves programming and training simpler networks for time-series prediction. So there is no shortage of models to experiment with; computing resources for training are the real bottleneck. Renting servers with video cards works for small short-term experiments and can make sense for a business, but the high cost makes it a poor choice for a home experimental lab.
In short, I decided to assemble my own PC to run large and small, smart and not-so-smart, but completely artificial models, and here is what came of it.
“Porridge from an axe”
As in the famous fairy tale, where the porridge appears around the axe, a machine-learning PC appears around the video card, so choosing the card is the most important step. As I said, I was interested in large language models, which means you can never have too much memory. Among the non-professional gaming video cards available on the market, the maximum memory you can count on is 24 GB.
The current leader is the Nvidia RTX 4090: unmatched performance, the maximum amount of new, fast GDDR6X memory, fast as an electric broom and beautiful as a new refrigerator on sale. The downside of all these outstanding properties is the inhumane price, starting at roughly 200K rubles. The toad of frugality sitting inside me strangled any thoughts of perfection at this stage and sent me looking for something more modest. I considered various options, including AMD, by the way, but they still fall well short on a number of characteristics. It turned out that although Nvidia is already shipping a whole 4xxx line, no other card in it has the desired memory capacity. The previous 3xxx generation, however, offers the RTX 3090. It is slower than the 4090, of course, but far more affordable and still carries 24 GB on board.
And since I am not proud and was ready for a used horse, I turned to the latest ads on a well-known classifieds site. And right on the first page, bam, the stars aligned: an MSI GeForce RTX 3090 Ti Gaming X Trio 24G for only 80K rubles! Not just an RTX 3090, but a Ti. I didn't hesitate, and my axe for the porridge was in my hands that same day.
A little about the RTX 3090 Ti
A few words about the main star of my build, the MSI GeForce RTX 3090 Ti Gaming X Trio 24G. Yes, it is not the flagship RTX 4090, but it also costs significantly less. Although the RTX 3090 Ti trails the newcomer in performance, it has a number of advantages that make it an excellent choice for machine learning.
Why RTX 3090 Ti?
24 GB of video memory: Allows you to work with large models and datasets without having to resort to quality-reducing optimizations.
Tensor cores: The Ti version is equipped with 3rd generation Tensor Cores, which significantly accelerate operations related to machine learning and deep learning.
Availability: A price of 80,000 rubles (in my case) makes it far more attractive than the RTX 4090, which starts at around 200,000 rubles.
RTX 3090 Ti Capabilities
Machine learning: Tensor cores and large memory capacity allow training complex neural networks faster and more efficiently.
Image generation: Working with Stable Diffusion and other image generation models becomes more comfortable and faster.
Big data processing: High throughput and performance speed up analysis and processing of large volumes of information.
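To make the "24 GB lets you work with large models" point concrete, here is a rough back-of-envelope sketch of how much VRAM just the weights of an LLM occupy at different precisions. The figures are approximate: real usage also includes the KV cache, activations and framework overhead, so treat the numbers as a lower bound.

```python
# Rough VRAM estimate for loading an LLM: bytes per parameter depend on
# precision (fp32 = 4, fp16/bf16 = 2, int8 = 1, 4-bit quantization = 0.5).

def weights_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights alone, in gigabytes."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

for params, prec, bpp in [(7, "fp16", 2), (13, "fp16", 2), (13, "4-bit", 0.5)]:
    gb = weights_gb(params, bpp)
    verdict = "fits" if gb < 24 else "does not fit"
    print(f"{params}B @ {prec}: ~{gb:.1f} GB of weights -> {verdict} in 24 GB")
```

So a 7B model in fp16 fits comfortably, a 13B model in fp16 already needs quantization, and a 4-bit 13B model leaves plenty of room for context.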
Comparison of the NVIDIA GeForce RTX 3090 Ti and NVIDIA GeForce RTX 4090:
Characteristic | NVIDIA GeForce RTX 3090 Ti | NVIDIA GeForce RTX 4090 |
---|---|---|
Architecture | Ampere | Ada Lovelace |
Process node | 8 nm | 5 nm |
CUDA cores | 10,752 | 16,384 |
Tensor cores | 336 | 512 |
RT cores | 84 | 128 |
Base clock | 1,560 MHz | 2,235 MHz |
Boost clock | 1,860 MHz | 2,520 MHz |
Video memory (VRAM) | 24 GB GDDR6X | 24 GB GDDR6X |
Memory bus | 384-bit | 384-bit |
Memory bandwidth | 1,008 GB/s | 1,008 GB/s |
Power consumption (TDP) | 450 W | 450 W |
Recommended PSU | 850 W | 850 W |
PCIe version | PCIe 4.0 | PCIe 4.0 |
Tensor core generation | 3rd generation | 4th generation |
Price (at time of release) | About $1,999 | About $1,599 |
Training neural networks: The RTX 4090 delivers between 50% and 100% performance gains over the RTX 3090 Ti depending on the specific model and framework.
Inference of models: Thanks to improved Tensor Cores, the RTX 4090 delivers faster inference, especially when using new data formats like FP8.
Tensor cores and their role:
The RTX 3090 Ti is equipped with 336 3rd-generation Tensor Cores, which deliver high performance in machine learning and deep learning operations.
The RTX 4090 has 512 4th-generation Tensor Cores, which offer even higher performance and power efficiency, as well as new features such as accelerated sparse tensor operations and FP8 support.
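A naive sanity check of the 50–100% gap can be made directly from the spec table above by multiplying CUDA core count by boost clock. This deliberately ignores architectural differences (Ada IPC improvements, FP8, sparsity), so it is a raw peak-compute ratio, not a benchmark.

```python
# Raw compute ratio from the spec table: CUDA cores x boost clock (GHz).
cores_3090ti, boost_3090ti = 10_752, 1.860
cores_4090, boost_4090 = 16_384, 2.520

ratio = (cores_4090 * boost_4090) / (cores_3090ti * boost_3090ti)
print(f"Raw compute ratio, RTX 4090 / RTX 3090 Ti: ~{ratio:.2f}x")
```

The raw ratio comes out around 2x; real-world training gains land in the 50–100% range because workloads are rarely pure-compute bound.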
Performance in teraflops (FP16): (performance graphs omitted)
RTX 3090 Ti Advantages
Availability: With the release of the RTX 4090, prices for the RTX 3090 Ti may drop, making it more affordable.
Memory capacity: Same as RTX 4090, 24GB VRAM, which is critical for larger models.
Compatibility: Time-tested Ampere architecture with broad support across various frameworks and drivers.
Advantages of RTX 4090
Increased core count: More CUDA, Tensor and RT cores.
New technologies: Improved 4th generation Tensor Cores and 3rd generation RT Cores.
Power Efficiency: Despite the same TDP, the new architecture offers better performance per watt.
Conclusions
The RTX 3090 Ti is still a powerful graphics card for machine-learning workloads, especially with its 24 GB of VRAM, which lets you work with large models and datasets.
The RTX 4090 offers significant gains in performance and energy efficiency, making it the preferred choice for those who want maximum training and inference speed and have the budget to match.
The main thing is not to oversalt
With the video card in hand, all that was left was to add the remaining components to the mix that could reveal its (the card’s) potential.
CPU
I chose the AMD Ryzen 9 7950X3D. I had long wanted to try an AMD CPU instead of Intel, and the time had come. 16 cores and 32 threads deliver high performance in multi-threaded tasks, which is ideal for machine learning. The price of 59,406 rubles seemed justified to me and more attractive than Intel's, as did the thermal envelope, by the way.
Motherboard
Such a powerful processor-and-GPU duo needs a reliable foundation. The choice fell on the ASRock X670E STEEL LEGEND DDR5 for 30,293 rubles. The word "legend" in the name inspired confidence, and support for DDR5 and PCIe 5.0 provides headroom for the future. I especially liked that it takes four M.2 SSDs, three of them under decent heatsinks.
RAM
RAM is critical when working with large models. I settled on G.Skill Flare X5 DDR5 5200 MHz 2×32 GB for 18,722 rubles. 64 GB is enough to work comfortably with large datasets and models, and the motherboard allows expanding to 128 GB later.
Cooling system
After several unsuccessful attempts with other coolers, I chose the Deepcool LS520 WH for 10,464 rubles. This liquid cooler not only cools the processor effectively but also looks stylish in white and fits the case perfectly. I'll say right away that the larger Deepcool LS720 will not fit into this case.
Storage
For storage I chose a SmartBuy 1 TB Stream P16 SSD for 8,283 rubles. This NVMe SSD provides high read and write speeds, which matters when working with large amounts of data, and its price is quite modest.
Power supply
The PSU must deliver at least 850 W; otherwise the card will not even start. Taking into account the power draw of the card and the other components, I chose a supply with some reserve: the Deepcool PQ1000M 1000 W 80+ Gold for 13,844 rubles. It handles the job with ease and ensures stable operation of the system.
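The reserve is easy to justify with a quick power budget. The GPU's 450 W TDP comes from the table above; the CPU figure and the allowance for the rest of the system are my own assumptions (AMD rates the 7950X3D at 120 W TDP with peak package power around 162 W), so treat this as a sketch, not a measurement.

```python
# Back-of-envelope power budget for the build (TDP-class figures).
budget_watts = {
    "GPU (RTX 3090 Ti, TDP)": 450,
    "CPU (Ryzen 9 7950X3D, assumed peak package power)": 162,
    "Motherboard, RAM, SSD, fans, AIO pump (rough allowance)": 80,
}
total = sum(budget_watts.values())
headroom = 1000 - total  # Deepcool PQ1000M is a 1000 W unit
print(f"Estimated sustained draw: {total} W, PSU headroom: {headroom} W")
```

The ~300 W of headroom matters because Ampere cards are known for short transient spikes well above their rated TDP, which can trip a PSU sized too tightly.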
Case
I chose the be quiet! PURE BASE 500DX for 11,449 rubles. I had already bought this case for another PC: it's good-looking, easy to build in, with good ventilation and quiet operation; just what a powerful system needs.
Final assembly and cost
Here is the final table of components:
Component | Model | Price (rubles) |
---|---|---|
Video card | MSI GeForce RTX 3090 Ti Gaming X Trio 24G | 80,000 |
CPU | AMD Ryzen 9 7950X3D | 59,406 |
Motherboard | ASRock X670E STEEL LEGEND DDR5 | 30,293 |
RAM | G.Skill Flare X5 DDR5 5200 MHz 2×32 GB | 18,722 |
Cooling | Deepcool LS520 WH | 10,464 |
Storage | SmartBuy 1 TB SSD Stream P16 | 8,283 |
Power supply | Deepcool PQ1000M 1000W 80+ Gold | 13,844 |
Case | be quiet! PURE BASE 500DX | 11,449 |
Total | | 232,461 |
Conclusion
In the end, I built a powerful PC that can handle machine learning, image generation, and other resource-intensive applications. The cost of the whole system ended up comparable to the cost of an RTX 4090 alone.
Now I can experiment with large language models, train neural networks, and work with big data at home. I hope my experience will be useful to those who are also thinking about assembling their own PC for machine learning.
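Before the full Linux + CUDA setup (coming in the follow-up post), a minimal first sanity check is just confirming the OS sees the card. Here is a small sketch that queries `nvidia-smi` if it is installed and degrades gracefully if it is not; the function name is my own, not from any library.

```python
# Probe for NVIDIA GPUs via nvidia-smi, returning None when the tool is absent.
import shutil
import subprocess

def detect_gpus():
    """Return a list of GPU names reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

print(detect_gpus())  # e.g. ['NVIDIA GeForce RTX 3090 Ti'], or None without drivers
```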
P.S. If you are also faced with choosing a video card, remember that it is not always worth chasing flagships. Sometimes compromise solutions can be a great option, especially when it comes to the balance between price and performance.
P.P.S. In a follow-up post I will provide a description and scripts for setting up Linux + CUDA.