10 Open WebUI Tips to Help You Work with Neural Networks

Open WebUI is an open-source web interface designed to work with various LLM backends, such as Ollama or other OpenAI-compatible APIs. It offers a wide range of features, most of them aimed at making model management and querying easier. You can deploy Open WebUI both on a server and locally on your home machine, giving you an all-in-one neural network workbench on your desk.

The platform lets users easily interact with and manage large language models (LLMs) through an intuitive graphical interface on both desktop and mobile devices, including OpenAI-style voice chat.

Open WebUI continues to evolve thanks to an active community of users and developers. On the one hand, the platform is constantly improving, with new features added and existing ones optimized; on the other hand, the documentation does not keep up with all these changes. So we decided to share 10 tips that will help you unleash the potential of Ollama, Stable Diffusion and Open WebUI itself. All examples are given for the English interface, but you will easily find the corresponding items and sections in Open WebUI in other languages.

Try AI chatbot on your own GPU server.

HOSTKEY offers a personal AI chatbot based on Ollama and Open WebUI with a pre-installed Llama 3.1 8b model, running on your own server! This solution is designed for those who value data security, scalability and cost savings.

Main advantages

  • Data Security and Privacy. All data is processed on your server, which prevents third parties from accessing confidential information.

  • Saving money. Pay only for the server rent – using the neural network is free, and the cost does not depend on the number of users.

  • Scalability. Easily migrate your chatbot between servers, scale up or down computing power, and control costs.

  • Flexible customization. Connect and customize various LLM models, including Phi3, Mistral, Gemma and Code Llama.

Try it

Tip 1: In Open WebUI you can install any model for Ollama from the list of supported ones

To do this, simply go to https://ollama.com/library and use the search bar to find the desired model. If the model name has no prefix before a /, it was uploaded by the Ollama developers and tested for functionality; if it does, the model was uploaded by the community.

Pay attention to the model's tags: the number shows its size in billions of parameters. The higher this number, the more “capable” the model, but the more memory it takes up. You can also filter models by type:

  1. Tools — models both for general use in the “request-response” mode and for specialized tasks (mathematical, etc.).

  2. Code – models that are trained to write code.

  3. Embedding — models that transform a complex data structure into a simpler vector form. They are needed for searching information in documents, parsing Internet search results, building RAG pipelines, etc.

  4. Vision — multimodal models that can recognize an uploaded image, answer questions about it, etc.

To install a model in Open WebUI, go to its card, select its size and quantization from the drop-down list, and copy the command of the form ollama run gemma2. Then go to Open WebUI, paste this command into the search bar that appears when you click on the model name, and click on the text Pull ollama run gemma2 from Ollama.com.

After some time (depending on the size of the model) the model will be downloaded and installed in Ollama.
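If you have shell access to the machine running Ollama, the same installation can be done outside Open WebUI. A minimal sketch, assuming Ollama listens on its default port 11434 and the gemma2 model from the example above:

# Pull a model via the Ollama CLI (same effect as Open WebUI's Pull button)
ollama pull gemma2

# Or pull via the Ollama REST API
curl http://localhost:11434/api/pull -d '{"name": "gemma2"}'

# List the models that are now installed
ollama list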

Tip 2: In Open WebUI, you can run a model larger than your video card's memory

Some users worry that the model they have chosen will not fit into the available video memory. Ollama (or rather llama.cpp, which underlies it) can work in GPU offload mode: the neural network layers are split during computation between the video card and the CPU, and cached in RAM and on disk. This affects processing speed, but on a server with a 4090 video card with 24 GB of video memory and 64 GB of RAM, even the Reflection 70b model ran, albeit very slowly (about 4–5 characters per second), while Command R 32b worked quickly. On local machines with 8 GB of video memory, gemma2 9B, which does not fully fit into video memory, will also work fine.
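Layer splitting happens automatically, but you can inspect it and nudge it if needed. A sketch using the Ollama CLI and API with default settings; num_gpu is the llama.cpp option controlling how many layers are offloaded to the video card:

# Show loaded models and their CPU/GPU split
ollama ps

# Force fewer layers onto the GPU for a single request
curl http://localhost:11434/api/generate -d '{
  "model": "gemma2",
  "prompt": "Hello!",
  "options": { "num_gpu": 20 }
}'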

Tip 3: You can delete and update models directly from Open WebUI

Ollama models are updated and corrected quite regularly, so it is worth downloading new versions from time to time. A model often keeps the same name but gets improved contents.

To update all models uploaded to Ollama, go to Settings – Admin Settings – Models by clicking on the user name in the lower left corner and navigating through the menus. Then you can either update all models by clicking the button in the Manage Ollama Models section, or delete an unwanted model by selecting it from the drop-down menu of the Delete a model item and clicking the trash can icon.
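If you also have command-line access to Ollama, the same housekeeping can be scripted; re-pulling a tag updates the model in place. A sketch (the one-liner assumes a bash shell):

# Update a single model by re-pulling its tag
ollama pull gemma2

# Update every installed model at once
ollama list | awk 'NR>1 {print $1}' | xargs -n1 ollama pull

# Delete a model you no longer need
ollama rm gemma2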

Tip 4: In Open WebUI, you can increase the context of models both globally and for the current chat

By default, Open WebUI queries Ollama models with a context size of 2048 tokens. Because of this, the model quickly “forgets” the current discussion and becomes difficult to work with. The same applies to “Temperature” and other parameters that affect the response.

If you want to increase the context size or temperature for the current chat only, click the settings icon next to the account circle in the upper right corner and set the desired values. Remember that by increasing the context, you increase the amount of data passed to the model and decrease its response speed.

If you want to change the global settings, click on the user name in the lower left corner, select the Settings – General menu, open the Advanced Parameters submenu by clicking Show next to it, and change the desired values.
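These sliders correspond to Ollama request options, so if you query Ollama directly, the same values can be passed per request. A sketch with assumed values; num_ctx is the context size in tokens:

# One-off request with a larger context and lower temperature
curl http://localhost:11434/api/generate -d '{
  "model": "gemma2",
  "prompt": "Summarize our discussion so far.",
  "options": { "num_ctx": 8192, "temperature": 0.5 }
}'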

Tip 5: In Open WebUI, you can ask the model to use information from the Internet to answer

When asking a model something, you can either point it to a specific site or ask it to search the Internet using a particular search engine.

In the first case, when making a request, specify the URL of the desired site in the chat after the # symbol, press Enter, wait until the page loads, and then write your request.

In the second case, specify the web search provider and its parameters in the Settings – Admin Settings – Web Search menu (we recommend the free duckduckgo, or get an API key for Google search). And don't forget to save the settings by clicking the Save button in the lower right corner.
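If you deploy Open WebUI yourself, the same search settings can be pre-set through environment variables at startup. A sketch for a pip installation; the variable names below match recent Open WebUI releases, but check the documentation for your version:

# Enable web search with the free duckduckgo provider
export ENABLE_RAG_WEB_SEARCH=true
export RAG_WEB_SEARCH_ENGINE="duckduckgo"
open-webui serve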

Then, in the chat, turn on the Web Search switch before sending the request. The search will work for the entire session of communication in this chat.

The only downside is that searching the Internet takes some time and depends on the embedding model. If it is set up incorrectly, you will get the answer No search results found. If this happens, the following tip will help you.

Tip 6: Searching documents and sites in Open WebUI can be improved

By default, Open WebUI uses the SentenceTransformers library and its models. But even these need to be activated by going to Settings – Admin Settings – Documents and clicking on the download button next to the model name in the Embedding Models item.

It is better to install the embedding model directly into Ollama, which will significantly improve the quality of searching documents and the Internet, as well as RAG construction. How to do this is described in Tip 1; we recommend installing the paraphrase-multilingual model. After installing it in Ollama, in the section mentioned above change the Embedding Model Engine parameter to Ollama and Embedding Models to paraphrase-multilingual:latest. Again, don't forget to save the settings by clicking the green Save button.
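The Ollama side of this setup boils down to two commands. A sketch, assuming the default Ollama port:

# Install the multilingual embedding model into Ollama
ollama pull paraphrase-multilingual

# Quick check: request an embedding vector directly
curl http://localhost:11434/api/embeddings -d '{
  "model": "paraphrase-multilingual:latest",
  "prompt": "test sentence"
}'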

Tip 7: Open WebUI lets you quickly enable temporary chats that aren't saved

Open WebUI saves all user chats, allowing you to return to them later. But this is not always necessary, and sometimes (during translations or other one-off tasks) saving even gets in the way, cluttering the interface. The solution is to enable temporary chat mode. To do this, open the model menu at the top and move the Temporary Chat switch to the “On” position. You can return to the chat-saving mode by turning Temporary Chat off.

Tip 8: You can install Open WebUI on Windows without Docker

Previously, working with Open WebUI on Windows was difficult, as the panel was distributed as a Docker container or as source code. But now you can install it on Windows (after installing Ollama) via pip.

All you need is Python 3.11 (yes, specifically 3.11) and running the following in the Windows command line:

pip install open-webui

After installation, you need to run Ollama, then enter the following in the command line:

open-webui serve

Then open the web interface in the browser by typing http://127.0.0.1:8080.

If you get an error when launching Open WebUI

OSError: [WinError 126] The specified module was not found. Error loading “C:\Users\username\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\lib\fbgemm.dll” or one of its dependencies,

download the libomp140.x86_64.dll library from https://www.dllme.com/dll/files/libomp140_x86_64/00637fe34a6043031c9ae4c6cf0a891d/download and copy it to /windows/system32.

The only downside to this solution is that Open WebUI may start to conflict with applications that require a different version of Python (in our case, the WebUI-Forge fork “broke”).
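One way to sidestep such conflicts is to give Open WebUI its own isolated Python 3.11 environment instead of the system-wide interpreter. A sketch for the Windows command line, assuming Python 3.11 is installed with the py launcher:

:: Create and activate a dedicated virtual environment
py -3.11 -m venv %USERPROFILE%\open-webui-env
%USERPROFILE%\open-webui-env\Scripts\activate

:: Install and run Open WebUI inside it
pip install open-webui
open-webui serve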

Tip 9: Open WebUI can generate graphics

To do this, you will need to install the Automatic1111 web interface (now called Stable Diffusion web UI) and models (see its instructions), and configure it to work with Open WebUI.
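Keep in mind that Open WebUI talks to Automatic1111 through its HTTP API, which is off by default: the web UI has to be started with the API enabled. A sketch for Linux; on Windows, add the same flag to COMMANDLINE_ARGS in webui-user.bat:

# Start Stable Diffusion web UI with its HTTP API enabled
./webui.sh --api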

For convenience, you can add a generation button via external tools (Tools) directly to the user's chat message, rather than using Repeat and generation in the response. To do this, register on the site https://openwebui.com, then go to the Image Gen tool, press the Get button, enter the URL of your Open WebUI installation, and then import the tool by clicking Import to WebUI. After this, the code will be added to Workspace – Tools, and all you have to do is click Save in the lower right corner.

If you have correctly configured access to image generation via the API, then in the chat you can click on the plus, turn on the Image Gen switch, and send the request directly to Stable Diffusion. The old method of generating via Repeat will still work.
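The connection to Automatic1111 can also be pre-configured through environment variables rather than the settings pages. A sketch for a self-hosted installation; variable names may differ between Open WebUI versions, so verify against the documentation:

# Point Open WebUI at the Automatic1111 API at startup
export ENABLE_IMAGE_GENERATION=true
export IMAGE_GENERATION_ENGINE="automatic1111"
export AUTOMATIC1111_BASE_URL="http://127.0.0.1:7860"
open-webui serve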

Tip 10: Open WebUI lets you recognize images

Open WebUI allows you to work with vision models that can recognize what is shown in images. To do this, select and install such a model as described in the tips above (for example, llava-llama3). Next, click on the plus in the chat line, select Upload Files, upload a picture, and ask the model to recognize it. The result can then be “fed” to another model.
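The same recognition step can be scripted against the Ollama API, which accepts base64-encoded images for vision models. A sketch in bash, assuming llava-llama3 is installed and a photo.jpg file exists:

# Encode the image and send it to the vision model
IMG=$(base64 -w0 photo.jpg)
curl http://localhost:11434/api/generate -d '{
  "model": "llava-llama3",
  "prompt": "What is in this picture?",
  "images": ["'"$IMG"'"],
  "stream": false
}'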

In fact, this is only the tip of the iceberg. The functionality of Open WebUI is much broader: from creating custom models and chatbots and accessing them via API, to automating work with LLMs through Tools and Functions, which let you filter user requests and cut off unwanted content, build processing pipelines from several sequential models, clean garbage from output code, or even play Doom! If you are interested in learning more about this, write in the comments.

