Face recognition on microcomputers

As a result, we have a solution that handles vector operations excellently, but that is all: if you want to pull metadata, make updates, and so on, you will have to tinker a little. Still, the pros here clearly outweigh the cons.

In addition, I would like to recommend a repository on GitHub [benchmark]. It is a benchmark suite that tests various ANN vector databases on different datasets of descriptors. With it you can estimate the quality you need, the resources consumed, pick index parameters, and so on. Overall, a very useful and convenient tool.

Let's return to option 2: storing vectors in an ordinary database and computing distances by hand, with brute force. It will work, but only up to a point. The problems of exhaustive search are, I think, obvious, so I won't dwell on them; on top of that, the vectors themselves are stored suboptimally. Still, if you have very few vectors and you know for certain there will be no more of them, this solution may well be suitable: the vectors can be enriched with meta-information, and you avoid pulling an extra dependency into the software.
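To make option 2 concrete, here is a minimal sketch (not from the original article) of what "database plus manual distance computation" might look like: descriptors stored as BLOBs in SQLite next to their metadata, with brute-force cosine search in NumPy. The table layout, dimensionality, and helper names are illustrative assumptions.

```python
import sqlite3

import numpy as np

# Illustrative sketch: store face descriptors as BLOBs in SQLite alongside
# metadata, and find the closest face by brute-force cosine similarity.
DIM = 512  # assumed descriptor dimensionality

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faces (id INTEGER PRIMARY KEY, name TEXT, vec BLOB)")

def add_face(name, vec):
    # Normalize once on insert so search reduces to a dot product.
    vec = np.asarray(vec, dtype=np.float32)
    vec = vec / np.linalg.norm(vec)
    conn.execute("INSERT INTO faces (name, vec) VALUES (?, ?)",
                 (name, vec.tobytes()))

def find_nearest(query):
    query = np.asarray(query, dtype=np.float32)
    query = query / np.linalg.norm(query)
    rows = conn.execute("SELECT name, vec FROM faces").fetchall()
    names = [r[0] for r in rows]
    # Brute force: O(N) over all stored descriptors -- fine only for small N.
    mat = np.stack([np.frombuffer(r[1], dtype=np.float32) for r in rows])
    sims = mat @ query  # cosine similarity, since everything is unit-normalized
    best = int(np.argmax(sims))
    return names[best], float(sims[best])

rng = np.random.default_rng(0)
alice, bob = rng.normal(size=DIM), rng.normal(size=DIM)
add_face("alice", alice)
add_face("bob", bob)
name, sim = find_nearest(alice + rng.normal(scale=0.01, size=DIM))
print(name)  # a slightly noisy copy of alice's vector should match "alice"
```

This keeps metadata queries trivial (it is just SQL), which is exactly the advantage the paragraph above describes; the cost is the linear scan at search time.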

And the last option is to use pgvector (an extension for Postgres) or ClickHouse. The advantages are obvious: you get the benefits of a mature OLTP/OLAP database and don't expand the technology stack, since you may already be running one of them. But it is equally obvious that this is an option for a server solution, or at least for an environment where you can easily spin up these databases. For microcomputers it is still too resource-intensive.

Another disadvantage of these solutions is that their vector support is still young. In practice, this means that standard techniques from the likes of Faiss may be missing. For example, in ClickHouse vector indexing is still at the experimental stage, which imposes additional restrictions on production use. pgvector is doing better: it already supports the main index types, but its search speed leaves much to be desired. Benchmarks are also contradictory: one link claims pgvector is the best solution for vectors, while the next states the exact opposite. In my analysis, I relied on the benchmark mentioned above (at least it has code you can inspect and, if you wish, run entirely in Docker), and its results confirm the speed figures I obtained.

What can we conclude about vector storage:

  1. If there are few descriptors, there will be no more of them, and we don't want to drag in an extra dependency – a database (for example, SQLite) plus "manual" distance computation;

  2. If you need to store a large number of vectors, store them optimally, and search them quickly – ANN vector databases, for example Faiss;

  3. If you are not particularly limited in resources and/or already use Postgres/ClickHouse, take a look at their vector-storage solutions. They may be inferior in some respects, but ease of use, enrichment with meta-information, and not having to add a new technology are definite advantages, and at the very least helpful for testing.

Implementation

Here we suggest looking at the code.

Conclusion

In this article, we examined an approach to building a baseline face recognition system on microcomputers, laid out the pipeline for implementing such a system, and discussed where to find models that meet our criteria.

I would like to note that this is a basic option that may be good for a start, since fairly modern models and approaches are used. Naturally, the result can be improved by fine-tuning the models on your own data and/or adding new functionality to the existing solution: for example, anti-spoofing models or re-identification by external attributes, and this is far from a complete list of possible improvements.

Write your questions, suggestions and ideas in the comments!
