Local REST service, or how to integrate Smart ID Engine in 5 minutes

At Smart Engines we build document recognition systems and, in particular, know a great deal about recognizing identity documents. We deliver the system as an SDK: a set of libraries and programming interfaces that let customers embed recognition functionality into their own infrastructure.

In general, we tend to favor native integration. The libraries of all our products are developed in C/C++ first, and their main interface, which exposes the most complete set of functionality, is always in C++. We are used to this, so when we need a simple integration example, our reflex is to write a short C++ program (you can see this, for example, in how we described building a simple DOCX template engine). To integrate with their own systems more easily, our clients use the wrappers we provide, most of them generated automatically with SWIG: for Java, C#, Python and PHP.

However, integrating a document recognition system as a native library with direct calls from the program is not always the best option, for several reasons:

First, simple tasks, such as extracting the full name from a scanned passport file on the desktops of customer service points, call for the simplest possible integration. Such tasks come up often: some clients have asked us, for example, for a simple console application or service that would watch a directory and, whenever a new document image file appears there, recognize it and write the result to a table.
Second, within one information system (or one automated workstation), document recognition (of the same or of different documents) sometimes has to be used by several applications at once, so it is natural to want to expose the recognition system as a local service.
Third, in more complex cases, a simple integration interface makes it easy to prototype and work through the changes in business processes that automation inevitably brings.

All this led us to create an option for user interaction with the recognition system as a simple local REST service.

Service API Description

The REST API service for the identification document recognition system, Smart ID Engine, is designed as simply as possible. It has three groups of methods:

1. Service and information methods:

(GET) /doc – service documentation in HTML (Swagger)
(GET) /help – service documentation in JSON form (OpenAPI)
(GET) /diagnostics – information about the service status in the form of JSON
(GET) /version – information about the Smart ID Engine version

2. Simple recognition of one image:

(POST) /api/recognize/simple : performs recognition of the image specified in the body of the request (JPG and PNG images are allowed, as well as a single-channel or RGB buffer of 8-bit pixels encoded in base64)

The recognition method has two required parameters. The first is the recognition mode, mode: the standard mode is called default, but depending on the delivery, modes with automatic document type selection may also be available, such as anypassport (a passport of any country), anydoc (any identity document of any country), anyrus (any document of the Russian Federation), and so on. The second is the document type mask, mask: for example, rus.passport.national to recognize a Russian passport, or an asterisk * to include all documents available in the current mode.

There are also several optional Boolean parameters that control what information is packed into the JSON with the recognition results: for example, the get_img_fields flag controls whether image fields are returned, the get_field_attr flag controls whether additional field attributes are returned, and so on.
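As a minimal sketch of how these parameters fit together, the helper below assembles the request URL for /api/recognize/simple. It assumes the parameters travel in the query string and that the service is reachable at http://localhost:8082; the function name buildRecognizeUrl and the encoding of a disabled flag as false are illustrative assumptions, not part of the service.

```javascript
// Assumption: the service runs locally on port 8082.
const DEFAULT_BASE_URL = "http://localhost:8082";

// Builds the URL for POST /api/recognize/simple (hypothetical helper).
// `mode` and `mask` are required; `flags` holds the optional Boolean output
// modifiers, for example { get_img_fields: true, get_field_attr: false }.
function buildRecognizeUrl({ mode, mask, flags = {}, baseUrl = DEFAULT_BASE_URL }) {
  if (!mode || !mask) {
    throw new Error("Both mode and mask are required");
  }
  const url = new URL("/api/recognize/simple", baseUrl);
  url.searchParams.set("mode", mode);
  url.searchParams.set("mask", mask);
  for (const [name, enabled] of Object.entries(flags)) {
    // Assumption: a disabled flag is sent as the string "false".
    url.searchParams.set(name, enabled ? "true" : "false");
  }
  return url.toString();
}
```

For example, buildRecognizeUrl({ mode: "default", mask: "*", flags: { get_text_fields: true } }) yields a URL with the query string mode=default&mask=*&get_text_fields=true.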

The result of the call will be a JSON object with the results of document recognition. For example, this is how the response comes when recognizing a test image of a Russian passport in a mode where only text field recognition results are required:

{
  "response": {
    "recog_res": {
      "props": {
        "authority": {
          "confidence": 0.9933671355247498,
          "is_rejected": false,
          "value": "ОТДЕЛОМ ВНУТРЕННИХ ДЕЛ ОКТЯБРЬСКОГО ОКРУГА ГОРОДА АРХАНГЕЛЬСКА"
        },
        "authority_code": {
          "confidence": 0.9970306158065796,
          "is_rejected": false,
          "value": "292-000"
        },

        // ... some fields omitted for brevity ...

        "name": {
          "confidence": 0.999942421913147,
          "is_rejected": false,
          "value": "ЕВГЕНИЙ"
        },
        "patronymic": {
          "confidence": 0.9998761415481567,
          "is_rejected": false,
          "value": "АЛЕКСАНДРОВИЧ"
        },
        "surname": {
          "confidence": 0.9999336004257202,
          "is_rejected": false,
          "value": "ИМЯРЕК"
        }
      },
      "type": "rus.passport.national"
    }
  },
  "status": "ok"
}
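To illustrate how such a response might be consumed, here is a small sketch that assembles the holder's full name from it. The field names surname, name and patronymic come from the sample response; the helper name extractFullName and the decision to skip fields with is_rejected set to true are assumptions made for the example, and other document types may expose different fields.

```javascript
// Collects the full name from a /api/recognize/simple response (hypothetical helper).
// Returns null if the call failed or no usable name fields are present.
function extractFullName(responseJson) {
  if (!responseJson || responseJson.status !== "ok") {
    return null;
  }
  const props = responseJson.response?.recog_res?.props ?? {};
  const parts = ["surname", "name", "patronymic"]
    .map((key) => props[key])
    // Assumption: fields flagged as rejected are treated as unreliable and skipped.
    .filter((field) => field && !field.is_rejected && field.value)
    .map((field) => field.value);
  return parts.length > 0 ? parts.join(" ") : null;
}
```

Whether rejected fields should be dropped, highlighted or shown for manual correction depends on the business process; skipping them is simply the most conservative choice for a sketch.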

3. Methods related to session recognition.

Session recognition is a mechanism for recognizing documents represented by multiple images. If the document contains several templates located in different images (for example, the front and back sides of an ID card scanned separately), the session mechanism recognizes each template and combines them into a single document recognition result. And if there are several different images of the same document (for example, the user was asked to photograph the document several times, or there are several frames of a video sequence), session recognition improves recognition quality by combining results across frames.

This group contains the following methods:

(POST) /api/recognize/session/start – creates a new recognition session; accepts the required parameters mode and mask, with the same meaning as in /api/recognize/simple, and returns JSON with the session ID
(GET) /api/recognize/session/list – returns a list of all active sessions
(POST) /api/recognize/session/{id} – within the session with the given identifier, recognizes the image passed in the request body (in the same way as /api/recognize/simple) and returns the current accumulated result of the session. Accepts the same set of output modifiers as /api/recognize/simple.
(POST) /api/recognize/session/{id}/stop – stops and deletes a session with a given identifier
(POST) /api/recognize/session/{id}/reset – returns the session with the given identifier to the initial state (which was at the time of its creation)
(GET) /api/recognize/session/{id}/result – returns the result of recognizing a session with a given identifier. Accepts the same set of output modifiers as /api/recognize/simple.
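The session methods can be strung together roughly as follows. This is only a sketch under stated assumptions: mode and mask are assumed to be passed as query parameters, the service address is assumed to be http://localhost:8082, and because the exact shape of the /start response is not shown here, the session ID is extracted by a caller-supplied callback, getSessionId. The names recognizeInSession, getSessionId and fetchImpl are hypothetical.

```javascript
// Runs one recognition session over several images of the same document
// and returns the result accumulated after the last image.
async function recognizeInSession(images, options) {
  const {
    getSessionId,                        // hypothetical: (startJson) => session ID
    mode = "default",
    mask = "*",
    baseUrl = "http://localhost:8082",   // assumption: local service address
    fetchImpl = globalThis.fetch,        // injectable to simplify testing
  } = options;

  // Assumption: mode and mask are sent in the query string.
  const query = new URLSearchParams({ mode, mask }).toString();
  const startResponse = await fetchImpl(
    `${baseUrl}/api/recognize/session/start?${query}`, { method: "POST" });
  const sessionId = getSessionId(await startResponse.json());
  const sessionUrl =
    `${baseUrl}/api/recognize/session/${encodeURIComponent(sessionId)}`;

  try {
    let accumulated = null;
    for (const { body, contentType } of images) {
      const response = await fetchImpl(sessionUrl, {
        method: "POST",
        headers: { "Content-Type": contentType },
        body,
      });
      // Each call returns the result accumulated over the session so far.
      accumulated = await response.json();
    }
    return accumulated;
  } finally {
    // Always stop the session so it does not linger in the active list.
    await fetchImpl(`${sessionUrl}/stop`, { method: "POST" });
  }
}
```

Stopping the session in a finally block is a design choice of the sketch: it keeps the list of active sessions clean even when one of the images fails to upload.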

Usage example

Consider a simple use case: you need a local HTML page onto which a user can drag a file with a photo or scan of a passport, so that the program reads the full name and displays it on the same page for the user to copy. This "toy" case is a good demonstration of integration with a local REST service.

First, let's install and start the service itself. We run the test on a machine running Arch Linux; for Linux, the Smart ID Engine REST service is supplied as a systemd daemon. After unpacking the archive and running the install.sh script, the idengine_rest_daemon service is available:

$ sudo systemctl status idengine_rest_daemon
● idengine_rest_daemon.service - idengine rest api daemon
  Loaded: loaded (/etc/systemd/system/idengine_rest_daemon.service; enabled; preset: disabled)
  Active: active (running) since Fri 2024-04-26 09:37:51 +04; 6s ago
Main PID: 1482719 (idengine_rest)
   Tasks: 9 (limit: 77030)
  Memory: 29.3M (peak: 30.8M)
     CPU: 22ms
  CGroup: /system.slice/idengine_rest_daemon.service
          └─1482719 /home/user/install-directory/idengine_rest ...

Apr 26 09:37:51 hostname systemd[1]: Started idengine rest api daemon.
Apr 26 09:37:51 hostname idengine_rest[1482719]: Version: 2.3.0
Apr 26 09:37:51 hostname idengine_rest[1482719]: Bundle path: /home/user/install-directory/...

By default, the service listens on port 8082 (this setting can, of course, be changed in the service configuration file before starting it). Let's check that the service is running:

$ curl localhost:8082/version
{"response":{"drogon_version":"1.8.4","idengine_version":"2.3.0"},"status":"ok"}⏎

The response shows the version of the Smart ID Engine recognition engine as well as the version of drogon, a C++ library we liked, which is used inside the binary to wrap our engine as a REST service.
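The same check can be done from a page or script before sending any images. The sketch below reads the idengine_version field from the /version response shown above; the function name getEngineVersion, the injectable fetchImpl parameter and the choice to return null on any failure are assumptions made for the example.

```javascript
// Returns the engine version reported by GET /version, or null when the
// service cannot be reached or answers with a status other than "ok".
async function getEngineVersion(
  baseUrl = "http://localhost:8082",   // default port of the service
  fetchImpl = globalThis.fetch         // injectable to simplify testing
) {
  try {
    const response = await fetchImpl(`${baseUrl}/version`);
    const json = await response.json();
    return json.status === "ok" ? json.response.idengine_version : null;
  } catch (error) {
    // Network failure or a body that is not valid JSON.
    return null;
  }
}
```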

Now we can move on to the HTML page itself. Suppose the drag-and-drop target, when handling the drop event, retrieves the file from the event and calls the function processFile(file):

const processFile = async (file) => {
  // Check the file extension and remember the Content-Type for image files
  const fileName = file.name.toLowerCase();
  let contentType = null;

  if (fileName.endsWith(".jpg") || fileName.endsWith(".jpeg")) {
    contentType = "image/jpeg";
  } else if (fileName.endsWith(".png")) {
    contentType = "image/png";
  } else {
    displayError("Only JPG/JPEG and PNG files are supported");
    return;
  }

  // Show an indicator that processing has started
  displayThinking();

  try {
    // Send a recognition request in the default mode, with all available
    // document types enabled and with text fields returned
    const response = await fetch('http://localhost:8082/api/recognize/simple?mode=default&mask=*&get_text_fields=true', {
      method: "POST",
      headers: {
        "accept": "application/json",
        "Content-Type": contentType
      },
      body: file
    });

    // Wait for the response body to be parsed
    const responseJson = await response.json();

    // Display either the recognition result or an error
    if (responseJson.status === "ok") {
      displayResult(responseJson.response);
    } else {
      displayError("Image processing error");
    }
  } catch (error) {
    // The service is unreachable or the response is not valid JSON
    displayError("Image processing error");
  }
};
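The drop handling that feeds processFile could be wired up roughly like this. It is a sketch rather than the code of the original page: the helper name attachDropTarget is hypothetical, and only the first dropped file is used.

```javascript
// Wires a drop target to a file handler such as processFile.
// `element` is any DOM element; `onFile` receives the first dropped File.
function attachDropTarget(element, onFile) {
  // The browser only fires "drop" if the default "dragover" action is cancelled.
  element.addEventListener("dragover", (event) => event.preventDefault());
  element.addEventListener("drop", (event) => {
    // Stop the browser from navigating to the dropped file.
    event.preventDefault();
    const file = event.dataTransfer?.files?.[0];
    if (file) {
      onFile(file);
    }
  });
}
```

On the page it would be attached with a call such as attachDropTarget(document.getElementById("drop-zone"), processFile), where the element ID drop-zone is an assumption.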

And that is actually all. The remaining functions deal purely with layout and with displaying particular strings in elements of the HTML page. Add layout to this code and the page is ready.

Instead of a conclusion

The Smart ID Engine integration option described here is intended specifically for use as a local service (within a single system). Here we stay true to our principles regarding the advantages of native integration. A REST API is an easy way to communicate with the recognition system, but remember that when working with sensitive documents you have to think every time about how to protect the data as much as possible. That is why we continue to ship libraries and recognition systems, and are not going to build a recognition service that receives and transmits document data over any communication channels.

We consider data security to be perhaps the most important aspect of working with identity documents, and we encourage everyone to design their solutions so as to minimize any risks associated with transmitting images or recognition results anywhere. But if you need convenient and efficient local processing of document images (for example, on a desktop at a customer service point), the REST API may be the best way to integrate our product.

Thank you for your attention!