Artificial Artist – a Google from the Text-To-Image World

Hello, my name is Dmitry Karlovsky and I love to draw masterpieces, but I don’t have the patience to finish at least one of them.

Earlier, I already showed you my self-written Google Search killer. I still use it and am happy with how clean its results are. Now we’re making an ArtStation killer for creators who have only a few minutes of patience to create real beauty. And neural networks will help us with this.

$hyoo_artist_app $mol_page title @ \Artificial Artist

Next, we will reverse-engineer the HuggingFace API to use the Kandinsky model, support queries in 100 world languages thanks to the Small100 model, design an infinite virtual feed in a few lines of $mol code, and, of course, look at examples of the Artificial Artist’s work.

Server API

So, first of all, let’s go to huggingface.co. There you can find several hundred thousand different models for every taste. We will be interested in the fresh ai-forever/Kandinsky_2.1:

Very interesting, but nothing is clear

Of course, we will not download it; we will use it from a virtual machine in the cloud. Fortunately, nearby one can find a so-called Space, on whose page you can play with this model:

The Iron Lady

Another piece of luck: this Space runs the networks on a GPU. It could be forked and customized, as I did, for example, here, connecting three neural networks together:

Two gynoids and an LGBT horse

But then the networks would run on the CPU, which is slow and, as practice shows, less stable. So we will use what we have. Fortunately, the Kandinsky Space is not such a starship as, for example, this one:

Stable Diffuser-class starship

The Space code is written with the Python framework Gradio, which describes interfaces from which REST and WS APIs are automatically generated, and a frontend is built on Svelte and Tailwind, accessible via links of the form: https://ai-forever-kandinsky2-1.hf.space/.

Despite the lightness of Svelte, the frontend weighs half a megabyte. And peaceful as I am, the person who came up with Tailwind, which pulls in 200 KB of styles, I’d like to castrate. Twice, to make sure it doesn’t multiply, because customizing its styles is simply inhumane.

Rear wheel drive front end

Had they built the frontend on $mol, the design could be changed in a matter of minutes instead of several hours, without losing one’s mind. And it would have turned out much lighter, too.

Insider drain

Just as in $mol, in Gradio the interface is built as a composition of ready-made components customized for your needs. The customization tools are poorer, of course, but there is an interesting feature: any component can act as a stream of input and output data. That is, you can literally pass not values but other components as arguments, and they will be connected reactively:

import gradio as gr

def imagine( prompt ):
	...  # the handler that runs the model and returns an image

Prompt = gr.Textbox( label="Prompt" )
Image = gr.Image()
Imagine = gr.Button( "Imagine" )

Imagine.click(
	imagine,
	inputs=[ Prompt ],
	outputs=[ Image ],
	api_name="imagine"
)

Here we create a text field, an image, and a button, and then configure the button so that it calls a Python function with an argument taken from the text field and puts the value it returns into the image.

If the code author has set an API name for the button, then the function bound to it becomes available via a REST endpoint of the form hyoo-translate.hf.space/run/translate, and in the footer of the frontend you will find a button that opens a nice auto-generated API doc:
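For illustration, calling such an endpoint directly could look roughly like this (a sketch without $mol; `gradio_rest_uri` and `gradio_rest_run` are made-up names, and the exact response shape may differ between Spaces):

```typescript
// Build the REST endpoint URI for a Gradio space.
function gradio_rest_uri( space: string, method: string ): string {
	return `https://${ space }.hf.space/run/${ method }`
}

// Call the endpoint: Gradio expects { data: [...] } in the body and
// replies with { data: [...] } on success or { error } on failure.
async function gradio_rest_run(
	space: string,
	method: string,
	... data: readonly unknown[]
): Promise< readonly unknown[] > {
	const response = await fetch( gradio_rest_uri( space, method ), {
		method: 'POST',
		headers: { 'Content-Type': 'application/json' },
		body: JSON.stringify({ data }),
	} )
	const json = await response.json() as { data?: unknown[], error?: string | null }
	if( json.error != null ) throw new Error( json.error )
	return json.data ?? []
}
```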

Artificial Hu..

But this API has one problem: if the request takes too long, it falls off with a timeout. And since this is not a rare occurrence, we will use the WebSocket API that the Gradio-generated frontend itself uses. There are no timeouts there, and there are even messages about the request’s progress.

WebSocket API

Unfortunately, there is no documentation for the WS API, so let’s use the debugger and good old trial and error. After running the generation several times, we notice that each time a new connection is established to an endpoint of the form:

wss://ai-forever-kandinsky2-1.hf.space/queue/join

When the task completes, the connection is closed. That is, you can’t just open a connection once and then keep sending RPC requests over it. Okay then; let’s look at what messages are transmitted there. First of all, the server asks us for a hash:

{ "msg": "send_hash" }

This hash is really just an RPC request ID and can be any random string. Apparently this is a reserve for possible multiplexing of several requests in one connection. And although multiplexing is not supported, we still have to carry this hash around in every message, like a hen with her egg.
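Since any random string will do, generating one is trivial (a sketch; `session_hash` is an illustrative name — $mol uses its own $mol_guid for this):

```typescript
// Any random alphanumeric string works as a session hash.
function session_hash( length = 11 ): string {
	const abc = 'abcdefghijklmnopqrstuvwxyz0123456789'
	let hash = ''
	while( hash.length < length ) {
		hash += abc[ Math.floor( Math.random() * abc.length ) ]
	}
	return hash
}
```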

Well, the client sends the hash and function number:

{
	"session_hash": "jnjiyncjiub",
	"fn_index": 2
}

Yes, over the socket all functions – or rather buttons, or more precisely button click handlers – are identified by their ordinal number in the code, not by name as with REST. Needless to say, such an API is very fragile, and one shouldn’t do it this way? Well, since we see this in software from a large company, apparently one should:

💡 It’s better to force the developer to set a semantic name explicitly than to tie it implicitly to the order of declarations in the code, dooming the developer to debugging mystical bugs that pop up like mushrooms after every server update because an old client still calls the function by its old number.

In view.tree, by the way, this requirement is implemented at the syntax level – all entities there have semantic names in general – but let’s not get ahead of ourselves.

Having received our wish to run function number two, the server puts us in a queue and starts periodically sending messages about our progress through it:

{
	"avg_event_concurrent_process_time": 9.63406398266656,
	"avg_event_process_time": 9.63406398266656,
	"msg": "estimation",
	"queue_eta": 125,
	"queue_size": 15,
	"rank": 14,
	"rank_eta": 154.14502372266497
}

Here we see that there are currently 15 requests in the queue, we are in 14th place, and we will have to wait about two and a half minutes for the result. That is at full load, though; at best, a picture can appear in as little as 10 seconds, judging by the average query execution time.
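For example, the rank_eta field can be turned into a readable estimate like this (a sketch; the field semantics are inferred from observation, not from any documentation):

```typescript
// Convert a rank_eta value (seconds, possibly fractional)
// into a short human-readable wait estimate.
function eta_human( rank_eta: number ): string {
	const total = Math.round( rank_eta )
	const minutes = Math.floor( total / 60 )
	const seconds = total % 60
	return minutes ? `${ minutes }m ${ seconds }s` : `${ seconds }s`
}

// eta_human( 154.14502372266497 ) → '2m 34s'
```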

When the server is ready to deal with our request, it will ask us for the function arguments:

{ "msg": "send_data" }

To which the client responds:

{
	"session_hash": "jnjiyncjiub",
	"fn_index": 2,
	"data": [ "Artificial Artist", "ugly mug" ]
}

And the server confirms that it went to work:

{ "msg": "process_starts" }

And if the stars add up successfully, then soon we will get the result:

{
    "msg": "process_completed",
    "success": true,
    "output": {
        "data": [[
            {
                "is_file": true,
                "data": null,
                "name": "/tmp/tmpwln8_d7p/tmprk7jefgn.png"
            }
        ]],
        "is_generating": false,
        "duration": 9.59670877456665,
        "average_duration": 9.432154195723134
    }
}
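Digging the path out of this nested structure might look like this (a sketch matching the sample response above; `GradioFile` and `image_path` are illustrative names):

```typescript
// Shape of one file reference inside a "process_completed" message.
interface GradioFile {
	is_file: boolean
	data: string | null
	name: string
}

// Extract the server-side file path from the first result.
function image_path( output: { data: GradioFile[][] } ): string {
	const file = output.data[ 0 ][ 0 ]
	if( !file.is_file ) throw new Error( 'Expected a file reference, got inline data' )
	return file.name
}
```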

The is_generating flag apparently shows whether the result is final or intermediate. And the is_file flag shows whether the data is inline in data (for pictures it would be a data-uri) or must be downloaded separately from the file whose path is specified in name.

Well, we take the resulting path, glue it onto the file endpoint, and finally get an absolute link to the image, of the form:

https://ai-forever-kandinsky2-1.hf.space/file=/tmp/tmpwln8_d7p/tmprk7jefgn.png

Well, if something goes wrong, the server will send a concise:

{
    "msg": "process_completed",
    "success": false,
    "output": {
        "error": null
    }
}

If you imagine that a task can only either succeed or fail, think again. Here a task can also simply be told to get lost:

{ "msg": "queue_full" }

Leaks in abstractions

TypeScript API

Well, the protocol is not complicated. Let’s implement it in a generalized form, so that HuggingFace models can be used even with the middle toe of the left foot.

Implementing an adapter to the REST API is as easy as shelling pears:

export function $mol_huggingface_rest(
	space: string,
	method: string,
	... data: readonly any[]
) {
	
	const uri = `https://${space}.hf.space/run/${method}`
	const response = $mol_fetch.json( uri, {
		method: 'post',
		headers: { "Content-Type": "application/json" },
		body: JSON.stringify({ data }),
	} ) as any
	
	if( 'error' in response ) {
		$mol_fail( new Error( response.error ?? 'Unknown API error' ) )
	}
	
	return response.data as readonly any[]
	
}

Implementing the WS API adapter is a little harder:

export function $mol_huggingface_ws(
	space: string,
	fn_index: number,
	... data: readonly any[]
) {
	
	const session_hash = $mol_guid()
	const socket = new WebSocket( `wss://${space}.hf.space/queue/join` )
	
	const promise = new Promise< readonly any[] >( ( done, fail )=> {
		
		socket.onclose = event => {
			if( event.reason ) fail( new Error( event.reason ) )
		}
	
		socket.onerror = event => {
			fail( new Error( `Socket error` ) )
		}
	
		socket.onmessage = event => {
			
			const message = JSON.parse( event.data )
			switch( message.msg ) {
				
				case 'send_hash':
					
					return socket.send(
						JSON.stringify({ session_hash, fn_index })
					)
			
				case 'estimation': return
				
				case 'queue_full':
					return fail( new Error( `Queue full` ) )
			
				case 'send_data':
					
					return socket.send(
						JSON.stringify({ session_hash, fn_index, data })
					)
			
				case 'process_starts': return
			
				case 'process_completed':
					
					if( message.success ) {
						return done( message.output.data )
					} else {
						return fail(
							new Error( message.output.error ?? `Unknown API error` )
						)
					}
				
				default:
					
					return fail(
						new Error( `Unknown message type: ${ message.msg }` )
					)
				
			}
			
		}
	
	} )
	
	return Object.assign( promise, {
		destructor: ()=> socket.close()
	} )
	
}

Notice the destructor on the promise. It will be called automatically when the user enters new text and initiates a new call. This mechanism is described in more detail in the article Undo cannot be continued.
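The pattern itself is easy to reproduce outside of $mol: attach a cleanup hook to the promise so its owner can abort the underlying work. A minimal sketch (the `delayed` helper is purely illustrative):

```typescript
// A promise that also carries a destructor its owner can invoke
// when the result is no longer needed.
type Destroyable< Value > = Promise< Value > & { destructor(): void }

function delayed< Value >( value: Value, ms: number ): Destroyable< Value > {
	let timer!: ReturnType< typeof setTimeout >
	const promise = new Promise< Value >( done => {
		timer = setTimeout( ()=> done( value ), ms )
	} )
	return Object.assign( promise, {
		// $mol_huggingface_ws closes its socket here; we just cancel the timer.
		destructor: ()=> clearTimeout( timer ),
	} )
}
```

If a new request supersedes a pending one, calling destructor() on the old promise frees its resources without waiting for it to settle.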

Finally, let’s write a wrapper over these two functions that can automatically retry requests when the server tells us to get lost:

export function $mol_huggingface_run(
	space: string,
	method: string | number,
	... data: readonly any[]
) {
	while( true ) {
		
		try {
			
			if( typeof method === 'number' ) {
				const run = $mol_wire_sync( $mol_huggingface_ws )
				return run( space, method, ... data )
			} else {
				return $mol_huggingface_rest( space, method, ... data )
			}
			
		} catch( error ) {
			
			if( $mol_promise_like( error ) ) $mol_fail_hidden( error )
			
			if( error instanceof Error && error.message === `Queue full` ) {
				$mol_fail_log( error )
				continue
			}
			
			$mol_fail_hidden( error )
		}
		
	}
}

If the method is specified as a number, the request goes through the WS API; if as a string, through the REST API. Since our $mol_huggingface_ws function is asynchronous, it must first be synchronized using magic outside of Hogwarts. $mol_huggingface_rest, on the other hand, is already synchronous, with all the magic hidden inside, so it needs no extra synchronization.

Now let’s create a more specific function that generates a link to an image generated by Kandinsky based on the passed positive and negative requests:

export function $hyoo_artist_imagine(
	prompt: string,
	forbid = '',
) {
	
	if( !prompt ) return ''
	
	const space = 'ai-forever-kandinsky2-1'
	
	const path = $mol_huggingface_run(
		space,
		2,
		prompt,
		forbid,
	)[0][0].name as string
	
	return `https://${space}.hf.space/file=${path}`
	
}

And to teach it to understand different languages, we will also write a function that translates from any language into a given one through the small100 model:

export function $hyoo_lingua_translate(
	lang: string,
	text: string,
) {
	
	if( !text.trim() ) return ''
	
	return $mol_huggingface_run(
		'hyoo-translate',
		0,
		lang,
		text
	)[0] as string
	
}

Ta-dam! All the complexity of using the neural networks we need is encapsulated in a couple of functions with simple signatures:

function $hyoo_lingua_translate( lang: string, text: string ): string
function $hyoo_artist_imagine( prompt: string, forbid?: string ): string

I have an internal… um… not… a neuron

Interface

Now watch my hands carefully, otherwise you will miss everything:

$hyoo_artist_app $mol_page
	title <= title_default @ \Artificial Artist
	head /
		<= Query $mol_search
			hint <= title_default
			query? <=> query_changed? \
			submit? <=> imagine? null
		<= Source $mol_link_source
			uri \https://github.com/hyoo-ru/artist.hyoo.ru
	body /
		<= Images $mol_infinite
			row_ids? <=> images? /
			after* <= images_more* /
			Row* <= Image* $mol_image
				minimal_width 256
				minimal_height 256
				uri <= image* \

Here we have created a typical page, where the header contains a search field and a link to the application’s source code, and the body holds an endless feed of pictures no smaller than 256 pixels each.

If this code does not seem clear to you, try playing with it in the sandbox. Or, if you prefer not to learn by doing, read the documentation, which, as everyone knows, does not exist.

The app name is automatically localized to the user’s language and used as an example query. Texts in different languages can be placed side by side:

{
	"$hyoo_artist_app_title_default": "Искусственный Художник"
}

But if some text is missing, it does not matter: $mol_locale will translate the English text into the desired language on the fly. If that does not work either, it will show the English text. As a last resort, it will display a human-readable key.

artificial example

Wow, just put the component on the page, and already everything looks neat!

Logic

Now let’s add a little boilerplate, inheriting from the component described above:

namespace $.$$ {
	export class $hyoo_artist_app extends $.$hyoo_artist_app {
		// whole logic here
	}
}

The query is synchronized with an unnamed parameter in the URL:

@ $mol_mem
query( next?: string ) {
	return this.$.$mol_state_arg.value( '', next ) ?? ''
}
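For comparison, reading such an unnamed parameter without $mol can be sketched with the standard URLSearchParams (illustrative only; $mol_state_arg additionally makes the value reactive and writable):

```typescript
// Read the unnamed parameter from a search string of the form "?=cat".
// URLSearchParams treats "=cat" as a pair with an empty-string name.
function unnamed_arg( search: string ): string {
	return new URLSearchParams( search ).get( '' ) ?? ''
}
```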

Let’s break it down into two groups of tokens: positive and negative.

@ $mol_mem
tokens() {
	
	const {
		prefer = [],
		forbid = [],
	} = $mol_array_groups(
		this.query().split( /\s+/g ).filter( v => v ),
		token => token.startsWith( '-' ) ? 'forbid' : 'prefer',
	)
	
	return {
		prefer,
		forbid: forbid.map( token => token.slice(1) ),
	}
		
}
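The same split can be expressed in a few self-contained lines without $mol_array_groups (an equivalent sketch):

```typescript
// Split a query into positive tokens and "-"-prefixed negative ones.
function split_tokens( query: string ) {
	const prefer: string[] = []
	const forbid: string[] = []
	for( const token of query.split( /\s+/ ).filter( Boolean ) ) {
		if( token.startsWith( '-' ) ) forbid.push( token.slice( 1 ) )
		else prefer.push( token )
	}
	return { prefer, forbid }
}

// split_tokens( 'iron lady -rust' )
// → { prefer: [ 'iron', 'lady' ], forbid: [ 'rust' ] }
```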

Thus, if the user wants the network to try to keep something out of the picture, it is enough to describe it Google-Search-style, prefixing each unwanted word with a minus:

full color image

Let’s go further. We join each group of tokens with a space and translate it into English:

@ $mol_mem
prompts() {
	const { prefer, forbid } = this.tokens()
	return [ prefer, forbid ].map(
		tokens => this.$.$hyoo_lingua_translate(
			'en',
			tokens.join( ' ' ),
		)
	) as [ string, string ]
}

When the infinite list requests more data, we generate a new image for the translated prompts and return a link to it as a single-element array, or an empty array if the query was empty:

images_more( from: string | null ) {
	const uri = this.$.$hyoo_artist_imagine( ... this.prompts() )
	return uri ? [ uri ] : []
}

In the window title, we display the name of the application and the original request, if it is not empty:

@ $mol_mem
title() {
	if( !this.query() ) return this.title_default()
	return `${ this.query() } / ${ this.title_default() }`
}

When clearing the search field, we also erase the query:

@ $mol_mem
query_changed( next?: string ) {
	if( next === '' ) this.query( '' )
	return next ?? this.query()
}

When the user presses Enter, we change the query to the one entered:

imagine() {
	this.query( this.query_changed() )
}

Voila! The endless feed is ready:

Evening makeup out of control

But what about lifecycle hooks, spinners, exception handling, unloading of scrolled-past images, and other optimizations? In $mol we don’t bother with such chores; we write only clean, uncluttered business logic. Everything else is automated without touching the application logic.

Quite the beauty

It remains only to add a few small touches of statically typed cascading styles, for uniform display of the pictures and the loading indicator on any screen size:

namespace $.$$ {
	
	const Frame: $mol_style_properties = {
		margin: 'auto',
		width: '768px',
		maxWidth: '100%',
		height: 'auto',
		aspectRatio: 1,
	}
	
	$mol_style_define( $hyoo_artist_app, {
		
		Body: {
			padding: [ 0, $mol_gap.block ],
		},
		
		Images: {
			gap: $mol_gap.block,
			After: Frame,
		},
		
		Image: Frame,
		
	} )
	
}

Well, it seems there is nothing more to trim here, so it’s time to try the Artificial Artist for ourselves:

Consequences of reading $mol code

Happy doomscrolling!

Post Meta Sarcasm

Along the way, on the same small100, an online translation service was also built: Lingua Franca. Unfortunately, it translates neither very well nor very quickly, because the network runs on a CPU. But, unlike popular translators, it is much more convenient when you need to check a wording by translating it back and forth. But that’s another story…

If this story was not enough for you and you want more hardcore details about various networks, you can watch the stream with one of the stages of the Artificial Artist’s evolution, under the guidance of an artist identical to a natural one:

You can discuss these and our other projects in the relevant topic on the Hyper Dev forum. And if you are not afraid of innovation and cooperation – write to me on Telegram.


The actual original is on $hyoo_page.
