Hello. Today I would like to talk about how I wrote an orchestrator on top of the UiPath orchestrator: what prompted it and what came of it.
As an introduction, a few words about the company UiPath and its main product: an environment for developing and executing software robots (hereinafter referred to as AWP). The product started out as a UI testing tool for both web and desktop applications. It was a decent product, but only a small circle of people needed it as a testing tool, so the company’s managers came up with the idea of selling it as a platform for robotizing business processes. In both cases the essence is simulating user actions; the only difference is that in testing we do it in a test environment, and in robotization we do it in a real system. To manage a fleet of software robots, you buy, at extra cost, a license for an orchestrator.
Licensing deserves a separate mention. Robot runtime licenses came in two kinds: Attended (runs on the user’s machine, is launched from the tray, and cannot be started from the orchestrator supplied by the company) and UnAttended (usually runs on a virtual machine and can be started from the supplied orchestrator). A second dimension is Named User (runs only under one specific domain account on one workstation) versus Concurrent (one domain account can be used on several workstations at the same time). These types combine with each other, and each combination has its own price: UnAttended Concurrent is the most expensive and Attended Named User is the cheapest. The price difference between them reached a factor of 5 to 7, and that is the price per workplace. Note that a license is purchased for a year, while the need for licenses varies over the year and even over the week. Since we automate user actions, robots mostly run during working hours (from 9 to 18). So even if we count only working hours, a robot is busy about 50% of the time and idle the rest, but the meter keeps ticking, like in a taxi.
Now for the supplied orchestrator, a paid, self-hosted web application running on IIS and MSSQL Server. At the time I wrote my own orchestrator, its functionality was very limited. There were no triggers for external events, so we had to write a robot that ran periodically and checked whether those events had occurred. Robots could be launched only on a schedule or when items appeared in a particular queue. There was also a “pseudo AI” that looked at the number of items in a queue and the deadline for finishing the work, but it was irrelevant during calm periods and did not work during peak periods, because its estimate was based on the average item processing time over the entire history. For example, if a queue item usually took 1 minute to process and 10 minutes at peak, the average could come out at 2. For all robots, regardless of license type, the orchestrator also serves as the system for collecting and storing logs, as the queue database, and as the license store.
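To make the averaging problem concrete, here is a small worked example in Python. It is only an illustration of the arithmetic, not the vendor's actual algorithm; the function name, the history, and the queue size are all hypothetical.

```python
import math


def robots_needed(queue_len, minutes_left, minutes_per_item):
    """Robots required to drain the queue before the deadline."""
    return math.ceil(queue_len * minutes_per_item / minutes_left)


# Hypothetical history: eight calm items at 1 minute, one peak item at 10 minutes.
history = [1] * 8 + [10]
average = sum(history) / len(history)  # 2.0, the "average could be 2" case

# Assume 120 items must be finished within 60 minutes.
by_average = robots_needed(120, 60, average)  # what an average-based planner requests
at_peak = robots_needed(120, 60, 10)          # what a peak period actually requires

print(by_average, at_peak)  # prints: 4 20
```

With these numbers the average-based estimate asks for 4 robots where a peak needs 20, which is why such a planner falls behind exactly when it matters.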
Given all of the above, we decided to write our own orchestrator on top of it. A small digression: in robotics circles, everyone has a process called Bender. In our case, Bender was the self-written orchestrator, which called the Web API methods of the regular (supplied) orchestrator and thereby controlled it completely.
First, the licensed robots had to be made to run without involving the orchestrator. This is handled by uirobot.exe and the arguments passed to it.
Second, the desktop had to be kept active. TightVNC served as a temporary solution for this. To activate the desktop after regular reboots, we wrote a robot that works by image recognition, since there was no time to deal with WinAPI back then. Many will ask here: what about robots working simultaneously on the same Windows Server virtual machine under different accounts? We moved away from that practice, because the behavior of the running processes sometimes became unpredictable.
To launch robots remotely, I wrote a service in C# using WCF with basic authentication, wrapped in a Windows service; a previously written template was at hand. At this stage it became clear that uirobot.exe must be executed with a desktop handle, otherwise the running robot does not see network folders. The “Allow service to interact with desktop” checkbox in the service settings has not worked since Windows 8, so I had to dig into WinAPI after all.
Robots are developed in a dedicated IDE, UiPath Studio, which is also not free. It is based on Microsoft’s Windows Workflow Foundation, that is, block-based programming. One of these blocks, “Should Stop”, tells the running process that it needs to reach some logical point and stop. This component does not work with Attended licenses, so I had to write my own component implementing this functionality, plus a WCF service for it as part of the new orchestrator. UiPath had a good SOAP component that was convenient for debugging, which is partly why WCF was used.
Now for the hardest part, the implementation of the orchestrator itself. Because the license types differed, UnAttended licenses were started and stopped by sending commands to the regular orchestrator, while Attended licenses were managed through the self-written WCF service. There was no time to build a front end, so all management was done by editing a few PostgreSQL tables. I chose PostgreSQL because it is free, I had good experience with it, and it natively supports storing several values in one cell. DBeaver played the role of the front end, and very successfully. The self-written orchestrator used only about 10-15 commands of the regular one: obtaining an authorization token, the list of queues and the number of items in them, the launch history for each robot (the 10 most recent events), and the mapping between environments (Environment) and processes (ProcessName). The last one is needed because a launch requires specifying the pair Environment_ProcessName.
On each pass, the self-written orchestrator first fetched all of this information from the regular one, then walked the process table in the PostgreSQL database, looked at the queues and the start and stop times, and selected the suitable processes. Tasks were prioritized first by launch time alone, then by launch time together with the number of items in the queues. Next, it checked when each idle workstation had last been rebooted and rebooted it if necessary; empirically, rebooting every 24 hours has a beneficial effect on the stability of Windows machines. Then it went through the workstations and sent a stop command to processes that had been running longer than they should. After that, it compared the number of robots already working on each process with the maximum allowed for it, so that all robots would not end up busy with a single process. Only then did it launch anything.
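The selection step of one pass can be sketched roughly as follows. This is a simplified illustration in Python, not the original code: the `ProcessRule` fields, the `plan_launches` function, and the tie-breaking by queue length are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ProcessRule:
    name: str        # hypothetical key in the "Environment_ProcessName" form
    start: time      # earliest launch time
    stop: time       # no launches at or after this time
    queue: str       # queue that feeds the process
    max_robots: int  # cap on simultaneous robots for this process


def plan_launches(rules, queue_sizes, running, free_machines, now):
    """Return a list of (machine, process name) pairs for one pass."""
    eligible = [
        r for r in rules
        if r.start <= now < r.stop and queue_sizes.get(r.queue, 0) > 0
    ]
    # Earlier start time wins; ties go to the longer queue (an assumption).
    eligible.sort(key=lambda r: (r.start, -queue_sizes[r.queue]))
    free = list(free_machines)
    counts = dict(running)  # robots already working on each process
    launches = []
    for rule in eligible:
        # The per-process cap keeps one process from taking every robot.
        while free and counts.get(rule.name, 0) < rule.max_robots:
            launches.append((free.pop(0), rule.name))
            counts[rule.name] = counts.get(rule.name, 0) + 1
    return launches
```

For example, with one robot already running on a process capped at two, only one more free machine is assigned to it and the next machine goes to the next process in priority order.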
To avoid overloading the regular orchestrator with requests, the self-written one ran every 2.5 to 5 minutes, and that was enough to keep idle process launches to a minimum. Collecting the information on all queues and workstations from the regular orchestrator took about 500-1000 requests. All running processes were kept in one PostgreSQL table; once stopped, they were moved to a history table and later used for analysis.
If a process ended with a failure (it crashed with an exception), the machine was rebooted and another free machine took its place.
If a process ended with the Stopped or Successful status, it was not launched again for the interval specified in the table, since a restart was assumed to be idle. Attended licenses, by the way, could not have the Stopped status.
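The two rules for finished runs can be summarized in a few lines of Python. This is a sketch under assumptions: the status strings, the returned dictionary, and the `after_run` name are hypothetical and only mirror the behavior described above.

```python
from datetime import datetime, timedelta


def after_run(status, finished_at, cooldown):
    """Decide what to do once a run has ended.

    Returns whether the machine should be rebooted and the earliest
    moment the process may be launched again.
    """
    if status == "Failure":
        # Crash: reboot this machine; another free machine may take over at once.
        return {"reboot": True, "not_before": finished_at}
    if status in ("Stopped", "Successful"):
        # Normal end: a relaunch would be idle, so wait out the configured interval.
        return {"reboot": False, "not_before": finished_at + cooldown}
    raise ValueError(f"unexpected status: {status}")
```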
Processes could also be chained: the second process was launched only after the first had completely stopped on all machines. This was used when different processes must not overlap because they affect each other in the user application.
The start or stop condition for a process could reference not one but several queues: that they are non-empty or, conversely, that all of the specified queues are empty. This matters for large processes that use several queues in series, when we want to split the day’s work into several steps. The alternative is to split the program code itself into those steps, which complicates further support.
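A multi-queue condition of this kind might look like the Python sketch below. The mode names and the `condition_met` function are hypothetical, and treating "non-empty" as "at least one of the listed queues has items" is an assumption; the original may have required all of them to be non-empty.

```python
def condition_met(queue_sizes, queues, mode):
    """Evaluate a start or stop condition over several queues."""
    sizes = [queue_sizes.get(q, 0) for q in queues]
    if mode == "any_non_empty":
        # Assumption: one queue with items is enough to trigger.
        return any(size > 0 for size in sizes)
    if mode == "all_empty":
        return all(size == 0 for size in sizes)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical usage: start stage 2 once both stage 1 queues are drained.
sizes = {"stage1_in": 0, "stage1_retry": 0, "stage2_in": 40}
start_stage2 = condition_met(sizes, ["stage1_in", "stage1_retry"], "all_empty")
```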
A process could also be started by an external trigger, for example the status of an external system read from its database.
I also added blocking of launches by process name pattern on a given date or day of the week. This is useful when holidays fall on weekdays (on such days the robots mostly did not run), and when a large process has to be split into several software robots, which then share a common name prefix.
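One possible way to express such blocking rules is shown below in Python. The `BlockRule` structure, the shell-style patterns, and the process names are assumptions for illustration; the original stored its rules in database tables and may have used a different pattern syntax.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class BlockRule:
    pattern: str                   # shell-style pattern, e.g. "Billing_*"
    on_date: Optional[date] = None
    weekday: Optional[int] = None  # 0 = Monday ... 6 = Sunday


def is_blocked(process_name, today, rules):
    """True if any rule matches the name and applies to today."""
    for rule in rules:
        if not fnmatchcase(process_name, rule.pattern):
            continue
        if rule.on_date == today or rule.weekday == today.weekday():
            return True
    return False


rules = [
    BlockRule("Billing_*", on_date=date(2024, 1, 1)),  # a holiday on a weekday
    BlockRule("*", weekday=6),                         # nothing runs on Sundays
]
```

A common prefix such as `Billing_` lets one rule cover every robot that a large process was split into.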
For running processes, the orchestrator checked the time of the last log entry. If it was older than a threshold specified in the settings, the robot was considered hung, and the process was killed, followed by a reboot of the virtual machine.
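The hang check reduces to comparing timestamps. Here is a minimal Python sketch; the `find_hung` name and the shape of the input (a mapping from machine name to the time of its last log entry) are hypothetical.

```python
from datetime import datetime, timedelta


def find_hung(last_log_at, now, max_silence):
    """Return machines whose robot has logged nothing for longer than max_silence."""
    return sorted(
        machine
        for machine, logged_at in last_log_at.items()
        if now - logged_at > max_silence
    )


now = datetime(2024, 1, 1, 12, 0)
last_log_at = {
    "vm1": datetime(2024, 1, 1, 11, 58),  # logged 2 minutes ago
    "vm2": datetime(2024, 1, 1, 11, 20),  # silent for 40 minutes
}
hung = find_hung(last_log_at, now, timedelta(minutes=15))  # ["vm2"]
```

Each machine returned here would then have its process killed and be rebooted.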
A process could also be given a time after which it had to be stopped “kindly”; if it had not stopped after a while, it was killed “badly”, the same way as a hung robot.
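The escalation from a polite stop to a kill can be modeled as a small decision function evaluated on every pass. This Python sketch uses hypothetical names (`stop_action`, `stop_after`, `grace`) and return values; it illustrates the idea rather than reproducing the original implementation.

```python
from datetime import datetime, timedelta


def stop_action(now, stop_after, grace, stop_requested_at=None):
    """Decide what to do with a running process on this pass.

    "run"          - the stop time has not been reached yet
    "request_stop" - ask the robot to stop at its next logical point
    "wait"         - a stop was requested, the grace period is still running
    "kill"         - the robot ignored the request for the whole grace period
    """
    if now < stop_after:
        return "run"
    if stop_requested_at is None:
        return "request_stop"
    if now - stop_requested_at >= grace:
        return "kill"
    return "wait"
```

The caller is expected to remember when it sent the stop request and pass that moment back in on later passes.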
The main part of the project was written in 1.5 to 2 months. The stability of the orchestrator was around 99.99%. Despite the addition of another node (the self-written orchestrator), the overall stability of the system increased. The self-written orchestrator made it possible to prioritize processes more correctly, combine them better, and minimize idle launches. At the time of rollout, it reduced the need for licenses by 15%. It easily managed 20 licenses at once; we simply did not have more.
The main question: how legal was all this? I am not a lawyer, but I did not find any clauses in the license agreement about undocumented features or a ban on using particular licenses in a different way.