How to tame a neural network

For example, OpenAI has openly stated that, to improve the quality of answers in its system, it uses query histories, that is, everything its users have ever typed. That is why some organizations strictly prohibit using ChatGPT and pasting fragments of their documentation, source code, and so on into it. But such services are too attractive to be ignored completely. They really can be useful if you use them correctly and, most importantly, clearly understand what you need them for.

By the way, local models can come to the rescue here. They can be run on a home machine, if its performance allows, or on a company server.

Please note that neural networks run much faster on GPUs with CUDA cores, so a powerful video card is welcome.

There are plenty of models in the public domain that have already been trained by companies and even individual enthusiasts. You can take any of them, fine-tune it further (they are published openly, the open source of the neural-network world), and tailor it to your needs.

For example, you can “feed” such a model all the documentation of your project. Imagine you have a large knowledge base in Confluence or YouTrack. All you need to do is export it, strip out the unnecessary stuff (especially the markup, so that the model can “digest” the information better), and then calmly ask questions about the content.
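A Confluence or YouTrack export usually arrives as HTML, so stripping the markup can be done with nothing but the standard library. Here is a minimal sketch (the `TagStripper` class and the sample snippet are made up for illustration):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collects only the text content of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        # Collapse the whitespace runs left behind by the removed tags
        return " ".join(" ".join(self.chunks).split())

def strip_markup(html: str) -> str:
    parser = TagStripper()
    parser.feed(html)
    return parser.text()

sample = "<h1>Deploy</h1><p>Run <code>make deploy</code> on the <b>staging</b> host.</p>"
print(strip_markup(sample))  # Deploy Run make deploy on the staging host.
```

For a real export you would loop this over every exported page and write the cleaned text into plain `.txt` files, which is exactly the format tools like PrivateGPT expect.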

Something like PrivateGPT is worth a try here. According to the documentation, it installs without any problems. You can throw some text files at it and see how the model answers questions about them. One caveat: the author recommends the Mistral model. It is good, of course, but it handles Russian rather poorly, so I advise looking towards the modified version, Saiga Mistral.

Of course, it has flaws compared to ChatGPT, but it is still a good solution for simple local tasks. For example, it can be useful during onboarding, when new employees don’t want to wade through a large pile of boring documentation or annoy colleagues with endless trivial questions about work.

Well, now I’ll briefly tell you what to look for when choosing a model. First of all, find out what it specializes in (as the example above shows, not all models handle Russian well). Next, look at your computer: if it doesn’t have a powerful graphics card or an M-series chip from Apple, the process can be extremely slow. Then take a look at the model name. As a rule, it contains an abbreviation such as 7B, 13B or 30B: the number of parameters in billions. If you have, say, 16 GB of RAM, 7B/13B models are recommended. With 32 GB you can try both 15B and 30B.
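The reasoning behind these RAM recommendations is simple arithmetic: every parameter costs a fixed number of bits, and the rest of the budget goes to activations and context. A back-of-the-envelope sketch (the real footprint depends on the quantization format and the runtime, so treat these figures as rough lower bounds):

```python
def model_memory_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Rough RAM/VRAM needed just to hold the weights, in GB.

    bits_per_weight: 16 for fp16, 8 or 4 for common quantized formats.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return round(bytes_total / 1e9, 1)

for size in (7, 13, 30):
    print(f"{size}B @ 4-bit: ~{model_memory_gb(size)} GB,"
          f" @ 16-bit: ~{model_memory_gb(size, 16)} GB")
```

A 4-bit 13B model fits comfortably in 16 GB, while an unquantized 30B model wants roughly 60 GB, which is why quantized builds dominate local usage.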

Why is this? Because, as a rule, the more parameters, the better the answers.

When everything is up and running, watch how loaded your video card is and how much memory it uses. Try not to push the values to peak load, as that reduces the speed of operation. Unfortunately, the only way to tune this is trial and error. I use an i7-9700K, an RTX 2080 Ti and 64 GB of RAM. That is enough for a 30B model, but it runs quite slowly.

Most likely, your local model will be noticeably dumber than ChatGPT and will hallucinate more often, because open-source folks have far fewer resources than a company with a huge pile of money and employees.

Now let’s move on to more interesting things. There is such a thing as LM Studio. You pick any of the available models there, download it to your computer and play around with it. One very big advantage is that it can open up an API and act as a server. On top of that you can write an application or integrate it into a flow. You can read about the models and generally get acquainted with what exists here. This is something like GitHub from the world of neural networks; there are even models for generating pictures and much more. Overall, I definitely recommend it.
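LM Studio’s local server speaks an OpenAI-compatible chat API, so talking to it needs nothing beyond the standard library. A hedged sketch follows: the address and the model name are assumptions (check the Server tab in the app for your actual values), and `ask` is a helper name I made up:

```python
import json
import urllib.request

# Default LM Studio server address is an assumption; verify it in the app.
API_URL = "http://localhost:1234/v1/chat/completions"

def build_request(question: str, model: str = "local-model") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You answer questions about our docs."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,
    }

def ask(question: str) -> str:
    payload = json.dumps(build_request(question)).encode()
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("How do I deploy the staging environment?"))
```

Because the wire format matches OpenAI’s, any existing client code can usually be pointed at the local server just by changing the base URL.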

If you want something unusual and even more experimentation, it’s time to wire several neural networks together into that very flow. I recommend something like FlowiseAI. It lets you easily build a flow between models in a nice UI, and it is also quite easy to connect to LM Studio.

In addition, it is worth exploring approaches that improve the quality of the retrieved information and combine the results of several models. Take, for example, the BM25 ranking algorithm. Or at least just google what flows other people have built. In short, I highly recommend reading up on this topic.
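BM25 itself is just a weighted term-frequency formula, small enough to sketch in pure Python (the toy documents below are made up for illustration; real retrieval pipelines use tuned libraries, but the math is the same):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against the query tokens."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: in how many documents each term appears
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores

docs = [
    "how to deploy the staging server".split(),
    "holiday schedule for the office".split(),
    "deploy checklist for production server".split(),
]
print(bm25_scores("deploy server".split(), docs))
```

In a flow, scores like these are typically used to pick the top few documentation chunks and feed only those into the model’s prompt, which noticeably cuts down on hallucinations.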

Well, I have kept this as brief as possible and without going too deep. If you find the topic interesting, I suggest reading my next article, which will appear very soon. From it you will learn how to build a pretty good assistant with knowledge-base search and get the highest-quality answers.
