No, this post is not about how to run Stable Diffusion at a reasonable speed on a simple laptop without NVIDIA.
Yes, this post is about how to run Stable Diffusion at a reasonable speed on a simple laptop without NVIDIA.
But only if you still have a card somewhere. The post is not about freebies 🙂
A question that comes up often: “We have a small agency, how can 4 people work on one video card?” The same goes for “mom, dad and me are a close-knit family, and we all love Stable Diffusion.”
This has become even more relevant because of the difficulties with running SD for free in Google Colab (and Kaggle and other substitutes are not as generous).
In this post, I will describe several options for “single GPU collaboration”, with their pros and cons. Stable Diffusion is the main example, although I also ran Vicuna in parallel.
Hardware and OS
So, there is a certain machine with a good video card. I experimented on an external RTX 3090 (Gigabyte AORUS BOX) and Intel NUC 9. I will call this machine “GPU server” or simply “server”.
Unfortunately, I was not able to share this card in ESXi. Sharing is standard only for professional graphics cards and requires additional licensing. I came across a workaround for Proxmox, but it does not cover Ampere cards (a 2060 is possible, but a 3090 will not work). So I decided to do without virtualization.
Most guides on installing Stable Diffusion and extras like Kohya_ss are written for Windows. There is a nice portable build. Vicuna also comes with a bunch of PowerShell scripts. Although the situation with NVIDIA drivers under, say, Ubuntu is much better now than it was years ago, I decided not to bother and installed Windows.
First, I installed Windows Server 2019. I found that driver support for the NUC 9 is not very good there. As a result, I rolled back to plain Windows 10 Pro.
I wanted every member of the family to be able to connect to this server from their laptop. That is, the server just sits there and runs. Or anyone can walk up and press the power button, and then work not at the server but from their own laptop. (I omit extras like switching on a smart socket remotely, because that is a matter of taste and beyond the scope of this post.)
In other words, the server must be able to serve remote requests without login after being turned on. And here a difficulty arose.
I set everything up with a keyboard and monitor attached and connected from a neighboring laptop via remote access: everything was fine. I turned the server off, put it under the table, turned it on, and connected to the desktop, but there was no card. More precisely, it was visible but did not work. I took the server back out, connected the monitor and keyboard, and everything started perfectly.
It turned out that although I had authorized the device in the Thunderbolt settings (I have an external card) and allowed it to always connect, Windows has an additional layer of protection: Kernel DMA Protection. If you log in fully, everything is fine (which is why everything worked with the monitor and keyboard). But if you just power on the server, the card is visible in Device Manager but does not work fully.
I had to disable this protection (Allow all instead of Only while Logged in).
Web interface via LAN
Everything is simple here, with just a few nuances.
Since we want to connect over the network, we need to add --listen to the command-line options. If we are on a LAN, working through a public address (--share) would be strange. And if you need access from afar, I prefer ZeroTier.
Now the server listens not only on localhost, and it can be reached in the usual way, by the server's address and port.
With --listen, you can no longer simply install extensions. This is for security reasons: so that people who connect remotely cannot install something malicious and gain excessive access to your server. Either remove --listen, switch to 127.0.0.1, install the extensions, and change everything back; or, if you trust your other users, there is a special flag: --enable-insecure-extension-access.
For the portable build, something like this:
set COMMANDLINE_ARGS= --api --listen --port 78XX --theme dark --enable-insecure-extension-access
(I do not specify --xformers; that is individual.)
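Once the web UI is started with these options, a quick way to confirm it is reachable from a laptop is to try opening a TCP connection to its port. This is a minimal sketch using only the Python standard library; the address and port in the usage comment are hypothetical placeholders, not values from this setup.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage (hypothetical address and port, replace with your own):
#   port_is_open("192.168.1.50", 7860)
```

If this returns False while the web UI is running, the usual suspects are a missing --listen option or the Windows firewall on the server.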
A separate nuance for those who want to connect with authorization. Yes, you can specify --listen --share --gradio-auth user:password. And when you connect with a browser, you will indeed be asked for a password. Interestingly, the Photoshop plugin (in my case, StableArt) still connects without any trouble and without asking for a password. Others could connect the same way.
So we must not forget to require a password for the API as well, which has a separate flag. However, after I added that flag, the StableArt plugin never asked me for a password and refused to work in this setup. This is another reason why I don't use --share, but connect as if over a local network (WireGuard, ZeroTier, and so on).
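For a client that does handle credentials, a password-protected API call might look like the sketch below. It rests on assumptions not stated in the post: that the API exposes a /sdapi/v1/txt2img endpoint accepting a JSON body, that API authorization is plain HTTP basic auth, and that the response carries base64-encoded images under an "images" key. The function names are mine, and the details should be checked against the web UI's own API page before relying on them.

```python
import base64
import json
import urllib.request


def build_txt2img_request(base_url, user, password, prompt, steps=20):
    """Build (but do not send) a POST request for the assumed txt2img endpoint."""
    payload = json.dumps({"prompt": prompt, "steps": steps}).encode("utf-8")
    # HTTP basic auth: base64 of "user:password" in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sdapi/v1/txt2img",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def generate(base_url, user, password, prompt):
    """Send the request and return the list of base64-encoded images (assumed key)."""
    request = build_txt2img_request(base_url, user, password, prompt)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["images"]
```

A plugin that sends no Authorization header at all would be rejected by a server configured this way, which would match the behavior described above.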
Now to the point (image generation).
1. Give everyone access to the Web interface
This is the simplest option, and if you do not work at the same time, it is perfectly viable. But if several people try to generate something simultaneously, glitches are possible. The AUTOMATIC1111 web interface is not designed for this.
For example, one user changes some parameters, the other does not know about it, and now that second user's generation comes out different.
Even just running it in different browser tabs can produce strange effects.
2. Launch multiple SD instances
In my opinion, the best option. Simple and convenient.
I took the portable build and installed it into several directories. For heavy content such as the models and their directories, I made links, so as not to store gigabytes in several copies. Then I launched the instances independently on different ports.
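The linking step can be sketched in Python as follows. This is only an illustration under assumptions: the directory names are hypothetical, and the post does not say which kind of link was used. On Windows, creating symbolic links may require administrator rights or Developer Mode; a directory junction made with mklink /J is a common alternative.

```python
import os
from pathlib import Path


def link_shared_dirs(shared_root, instance_root, names=("models",)):
    """Link each shared directory into an instance instead of copying it.

    Returns the list of links that were actually created.
    """
    shared_root, instance_root = Path(shared_root), Path(instance_root)
    created = []
    for name in names:
        target = shared_root / name
        link = instance_root / name
        if link.exists() or link.is_symlink():
            continue  # leave anything that already exists untouched
        link.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(target, link, target_is_directory=True)
        created.append(link)
    return created


# Usage (hypothetical paths):
#   link_shared_dirs(r"D:\sd-shared", r"D:\sd-user1", names=("models",))
```

Each instance then sees the same model files, while its settings, extensions and outputs stay in its own directory.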
Each user can configure their own settings and install their own extensions. Their pictures go into their own folder, and they can view them through the sd-webui-infinite-image-browsing extension or set up breadboard-web for viewing in the browser.
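To keep the instances apart, each one needs its own port in its launch file. The sketch below generates such a file per user. The base port 7861 is a hypothetical choice (the post only gives 78XX), the user names are placeholders, and the file layout is the stock webui-user.bat as I remember it, so compare it with the file shipped in your build.

```python
def plan_ports(users, base_port=7861):
    """Assign each user a distinct port, counting up from base_port (hypothetical default)."""
    return {user: base_port + offset for offset, user in enumerate(users)}


def webui_user_bat(port, extra_args=("--api", "--listen", "--theme", "dark")):
    """Return the text of a webui-user.bat that starts one instance on `port`."""
    args = " ".join([*extra_args, "--port", str(port)])
    return "\n".join([
        "@echo off",
        "set PYTHON=",
        "set GIT=",
        "set VENV_DIR=",
        f"set COMMANDLINE_ARGS={args}",
        "call webui.bat",
        "",
    ])


# Usage: print the launch file for every (hypothetical) user.
#   for user, port in plan_ports(["mom", "dad", "me"]).items():
#       print(f"--- {user} ---")
#       print(webui_user_bat(port))
```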
The first option for starting webui-user.bat automatically is to turn it into a service using, for example, NSSM (“a service helper program similar to srvany and cygrunsrv”). That is exactly how it works for me. In each service's settings, I specify that it runs not as LocalSystem but as a specific user (.\WinUser). You just need to check that everything starts correctly under this user by logging in as that user via RDP. Sometimes, when webui-user.bat starts, it may complain that extensions were installed under a different user and suggest which command to run to fix it.
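As a rough sketch, the NSSM commands for one instance could be assembled like this. The subcommands (install, set ... AppDirectory, set ... ObjectName, start) are the ones I know from NSSM's usage text; the service name, paths and password are hypothetical, and this is not necessarily how the services in this post were configured, so verify against the nssm help output before running anything.

```python
import subprocess


def nssm_commands(service, bat_path, work_dir, account, password, nssm="nssm"):
    """Return the NSSM command lines that register one web UI instance as a service."""
    return [
        [nssm, "install", service, bat_path],             # create the service
        [nssm, "set", service, "AppDirectory", work_dir],  # working directory for the .bat
        [nssm, "set", service, "ObjectName", account, password],  # run as this user
        [nssm, "start", service],
    ]


# Usage (hypothetical names and paths): print the commands instead of running them.
#   for cmd in nssm_commands("sd-user1", r"D:\sd-user1\webui-user.bat",
#                            r"D:\sd-user1", r".\WinUser", "secret"):
#       print(subprocess.list2cmdline(cmd))
```

Printing first and running by hand in an elevated prompt keeps the password out of any script that actually executes.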
Run remotely manually
The disadvantage of the previous method is the lack of console messages. Sometimes useful things are written there. Even just seeing that “nothing is happening” because something big is being downloaded and you simply have to wait is already reassuring.
If that is needed, then instead of a service, users can be allowed to connect to the console via ssh and run the desired batch file. The fact that Win10 does not allow several users to connect via RDP at once is not a problem here: everyone can see the logs of their own instance. And yes, in Win10 you can add an OpenSSH server, not just the client, using standard means.
However, I find this method too difficult for “designers”. Moreover, if you close the window, the web UI processes keep running. Connecting and launching again to watch the logs will not work, because the port is already busy. You first have to kill the leftover processes.
You can, of course, write an additional batch file with tasklist /fi "USERNAME eq WinUser" /v | find "python" and taskkill (or add such lines to webui-user.bat), but I haven't gotten to that yet.
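The same idea can be sketched in Python rather than as a batch file. The sketch assumes the process list comes from tasklist /fi "USERNAME eq WinUser" /fo csv /nh, which prints one quoted CSV row per process with the image name first and the PID second; the function names are mine. It only builds the taskkill command lines, so nothing is killed until you choose to run them.

```python
import csv
import io


def python_pids(tasklist_csv):
    """Extract PIDs of python processes from `tasklist /fo csv /nh` output."""
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Rows look like: "python.exe","1234","Services","0","1,234,567 K"
        if len(row) >= 2 and row[0].lower().startswith("python"):
            pids.append(int(row[1]))
    return pids


def taskkill_commands(pids):
    """Build one forced taskkill command (including child processes) per PID."""
    return [["taskkill", "/PID", str(pid), "/T", "/F"] for pid in pids]


# Usage on the server (Windows only):
#   import subprocess
#   out = subprocess.run(
#       ["tasklist", "/fi", "USERNAME eq WinUser", "/fo", "csv", "/nh"],
#       capture_output=True, text=True).stdout
#   for cmd in taskkill_commands(python_pids(out)):
#       subprocess.run(cmd)
```

Note that this kills every python process of that Windows user, so it fits the one-user-per-instance layout described earlier and not a shared account.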
Maybe there is something “native Windows” (winrm / psremoting, psexec …), which will be easier for such purposes.
I'll note that I did not run into conflicts. If someone starts generating a bunch of pictures through XYZ, the rest can continue to work. Provided the graphics card allows it, of course (has enough VRAM). Even during LoRA training, I could continue to generate images (fortunately, the 3090 has a full 24GB).
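Whether there is room for one more job can be checked by asking the driver how much VRAM is in use. A minimal sketch, assuming nvidia-smi is on the PATH of the server; the query below uses its CSV output, which with these options prints one line per GPU such as "10240, 24576" (values in MiB). The helper names are mine.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(output):
    """Parse lines like '10240, 24576' into (used_mib, total_mib) tuples, one per GPU."""
    result = []
    for line in output.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        result.append((used, total))
    return result


def free_mib(output, gpu_index=0):
    """Return the free VRAM in MiB for the given GPU."""
    used, total = parse_memory(output)[gpu_index]
    return total - used


def query_gpu():
    """Run nvidia-smi on the server and return its raw text output."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


# Usage on the GPU server:
#   print(free_mib(query_gpu()), "MiB free")
```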
And this applies not only to Stable Diffusion. At the same time I ran Vicuna.
This is the picture I was working on while another person was chatting with Vicuna. The pictures were generated in parallel with the text chat. But Vicuna requires a lot of video memory, so this was just an experiment.
Perhaps the most sophisticated way. The point is that the GPU server runs no “client” software, only a server component. Stable Diffusion and the like are installed on the users' laptops, but they are launched through a client component that intercepts calls to the GPU and redirects them to the server.
That is, the laptop's own RAM and CPU are used. There are no problems with console messages or with the security of extensions: all of that lives locally with the user. Only the GPU calls are sent over the network.
There is caching, so there is no need to fear that a multi-gigabyte model will be sent over the network every time. Switching to a completely new model, though, will make you wait for the first generation with it. An error may even pop up at first while the gigabytes squeeze through weak WiFi.
This is done through https://github.com/Juice-Labs/Juice-Labs/wiki
They do not support authorization yet, so anyone on the LAN can use it, but they promise to add it soon.
In terms of speed: for the header image of this post, depending on the batch size, the speed was from 3.25 to 10.78 iterations per second. After upgrading to torch:2.0.0+cu118, it went up to 5.66. However, when I initially connected the eGPU directly to this laptop, the speed was also no higher than 3.5 it/s. I remember being quite alarmed that the 3090 was generating so slowly. It's just that this laptop has a number of problems, and its clock frequency is throttled.
As for me, I settled on method number 2: several SD instances on different ports. One annoyance was that several users could not connect to Win10 via RDP at the same time to work with SD in Photoshop through a plug-in, but with the advent of the Photopea extension, this problem disappeared.
Have you ever wanted to share a GPU?
How did you solve it? I'd love to hear about your experience.