How to create a prototype document comparison service in 28 hours and win a hackathon

Hello! After a long break, we have decided to return to Habr and want to share our experience of taking part in a hackathon. In September, Moscow hosted Diversity.Hack, whose partners were Dostavista, Waves, and us, New Cloud Technologies. The hackathon was organized by Phystech.Genesis, a project of the MIPT Business Incubator, and it was our company's first experience of this kind, and a successful one. But let's not get ahead of ourselves: let's see how everything went and what interesting things the participants came up with.

How Diversity.Hack went

208 people in 44 teams gathered for the hackathon at the GrowUp coworking space, which was a record for the organizers. Details on how the teams were formed can be found in the article by Phystech.Genesis. Participants had a little more than a day to solve one of the three proposed tasks, one from each partner company, and they could stay overnight at the venue, with food provided by the organizers. The prize fund of 300,000 rubles was divided equally between the tracks. In addition to generating ideas and coding, participants could attend intensive sessions on how to win at hackathons.

At the beginning of the event, all partner companies described their tasks in detail. Before that, only the general wording and the set of technologies the participants had to use were known. The teams divided up the time for developing the prototype at their own discretion, covering the main steps: market analysis of existing solutions, brainstorming, identifying the key features of the prototype, assigning roles within the team, and coding.

Among the participants were both students and experienced specialists, and we met some very promising people. As a result, we and the other partner companies came away with new ideas. For example, Waves proposed developing a mobile application using the Ride language and the Waves Platform. For the Dostavista delivery service, teams developed a system for optimizing courier routes.

Under the guidance of our colleagues from New Cloud Technologies, participants had to come up with a convenient tool (algorithm and UI) for analyzing changes in documents when comparing two or more versions.

Why exactly this task?

Since 2013, our company has been developing MyOffice, a suite of office applications that runs on all major operating systems, including mobile ones. Initially we had 7-8 candidate tasks, and to choose one for the hackathon we even held a closed vote inside the company. The candidates included, for example, comparing two or more documents (developing both the technology itself and the UX) and automated comparison of document rendering quality (using export to PDF from two different editors as the example). There was also an idea related to analyzing the fonts in a document. We also considered a task on collecting information about meeting dates from chats and adding it automatically to the smartphone's calendar (interesting, for example, for writing Telegram bots), and several others.

As a result, we settled on the task of comparing two or more documents. The general idea is that different employees, each using an office suite on their own computer, can make their own changes to the same source file. You need to create a technology that lets you comfortably compare several versions of the same document and then accept one of the proposed edits. The edits must also be checked for conflicts with one another, after which a single final document is produced.

You might ask how such a task can be relevant at all. Various cloud services are now available that display the entire history of changes: who edited what, and when. However, many companies do not allow documents to be stored in the cloud for security reasons. For this or some other reason, edits to a file are often made offline, without a cloud, and then one version or another of the document is usually sent to colleagues by email. Edits can be made by more than one person, and as a result there are so many versions of the document that confusion inevitably arises.

So we are faced with the need for a tool that compares more than two documents at once. But no such luck! There simply were no solutions to this problem on the market. There are only services for comparing two documents, and even there the choice is not wide. One is ABBYY Comparator, which can compare documents not only in text format, but also PDF, scans and photos. The disadvantage of this service may be its cost. Other services, such as Text Compare!, simply let you paste text into two boxes. This greatly limits the possibilities: you cannot even upload files, only Ctrl + C and Ctrl + V!

Our company decided to look for a remedy for this "pain" among talented developers from all over Russia, and we did not restrict the document format. We suggested using the familiar HTML, since it conveniently represents the tree structure of a document. All participants took us up on this suggestion. We were interested in getting both the comparison technology itself and a convenient, interactive interface.

The winning teams' solutions

On Habr there are many interview articles about how participants organized their workflow during a hackathon and how they managed to win. Eight teams worked on our task, and three of them won prizes: first place went to ZenDocs, and second place was shared by SerotoninMix and SegFault. We selected the winners according to the following criteria: the user-friendliness of the interface (its interactivity and minimalism), how well various cases are handled in this interface (how the comparison of tables, with both structural changes and text changes inside, as well as diagrams and images, will look), and the comparison algorithm itself (time complexity and the ability to recognize different types of edits). A good working prototype was considered a big plus 🙂

The interface solutions of most teams turned out to be largely similar. Perhaps this suggests that the interfaces are user-friendly, since all the developers arrived at the same thing. What differed was how thoroughly the use cases were worked out and the quality of the prototype: some teams managed to do more, some less.

Nevertheless, after the solutions were presented, there was a clear favorite: the ZenDocs team, which put the emphasis on the algorithmic part. They carried out an effective study of the available approaches and found a scientific article with an algorithm for comparing XML trees. This algorithm scales well to N documents: if you compare documents in pairs, as many other algorithms do, the complexity grows polynomially in the number of documents, whereas with this approach it grows linearly. The algorithm returns the ids of the tree elements that have changed and the type of each change: insert, delete or replace.
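To make the response format more concrete, here is a minimal sketch in Python. It is not the algorithm from the article that ZenDocs found, which we do not reproduce here. It only assumes that every element of the document tree carries a stable id that survives editing, and it compares each of the N versions once against a common base, so the work grows linearly with the number of versions. The names Node, index_by_id and diff_against_base are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A document element; the stable `id` is an assumption of this sketch."""
    id: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def index_by_id(root: Node) -> dict[str, str]:
    """Map every element id in the tree to that element's own text."""
    result = {}
    stack = [root]
    while stack:
        node = stack.pop()
        result[node.id] = node.text
        stack.extend(node.children)
    return result


def diff_against_base(base: Node, versions: dict[str, Node]) -> dict[str, list[tuple[str, str]]]:
    """Compare each version once against the base (one pass per version)."""
    base_index = index_by_id(base)
    report = {}
    for name, root in versions.items():
        version_index = index_by_id(root)
        changes = []
        for node_id, text in version_index.items():
            if node_id not in base_index:
                changes.append((node_id, "insert"))
            elif base_index[node_id] != text:
                changes.append((node_id, "replace"))
        for node_id in base_index:
            if node_id not in version_index:
                changes.append((node_id, "delete"))
        report[name] = sorted(changes)
    return report


base = Node("doc", children=[Node("p1", "Hello"), Node("p2", "World")])
alice = Node("doc", children=[Node("p1", "Hello!"), Node("p2", "World")])
bob = Node("doc", children=[Node("p1", "Hello"), Node("p3", "New")])

report = diff_against_base(base, {"alice": alice, "bob": bob})
print(report)
```

A frontend could take such a list of (id, change type) pairs and highlight the corresponding elements directly, without further processing.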

This response format is convenient to pass to the frontend, because it needs no further processing. Pictures and the rows and columns of a table are ordinary structural blocks of the document, just like paragraphs, so the algorithm can detect changes in a picture, in the structure of a table or in its contents. In other words, it copes with many use cases, and this is an important evaluation criterion. The team also proposed optimizing the algorithm further with a Merkle tree. With this optimization only the subtrees whose hash has changed need to be checked, which speeds up the algorithm. The prototype is available at: https://zendocs.ru
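The Merkle tree idea can be sketched as follows. This is an illustration under our own assumptions, not the team's code: a node is modeled as a plain tuple (id, text, children), ids are assumed to be stable between versions, and the names merkle_hash and changed_nodes are hypothetical. The sketch only finds modified subtrees in the new version and does not report deletions.

```python
import hashlib


def merkle_hash(node, hashes):
    """Hash a node from its own text and the hashes of its children."""
    node_id, text, children = node
    h = hashlib.sha256()
    h.update(hashlib.sha256(text.encode("utf-8")).digest())
    for child in children:
        h.update(bytes.fromhex(merkle_hash(child, hashes)))
    hashes[node_id] = h.hexdigest()
    return hashes[node_id]


def changed_nodes(old, new):
    """Return ids whose subtree hash differs, and how many nodes were visited."""
    old_hashes, new_hashes = {}, {}
    merkle_hash(old, old_hashes)
    merkle_hash(new, new_hashes)
    changed = []
    visited = 0

    def walk(node):
        nonlocal visited
        visited += 1
        node_id, _text, children = node
        if old_hashes.get(node_id) == new_hashes[node_id]:
            return  # identical subtree: none of its descendants are checked
        changed.append(node_id)
        for child in children:
            walk(child)

    walk(new)
    return changed, visited


old_doc = ("doc", "", [
    ("sec1", "", [("p1", "Intro", []), ("p2", "Details", [])]),
    ("table", "", [("r1", "a", []), ("r2", "b", [])]),
])
new_doc = ("doc", "", [
    ("sec1", "", [("p1", "Intro", []), ("p2", "Details", [])]),
    ("table", "", [("r1", "a", []), ("r2", "B", [])]),
])

changed, visited = changed_nodes(old_doc, new_doc)
print(changed, visited)
```

In this example the unchanged section sec1 is dismissed after a single hash comparison, so its paragraphs are never visited; only the path from the root to the edited table row is reported.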

Choosing the other prize winners was more difficult, because three worthy solutions were in contention. In the end, we settled on SerotoninMix and SegFault. Some teams were better at interactivity, others at navigation, and this is a normal situation: it is difficult to develop an ideal solution in two days. Some teams implemented the algorithm almost completely on a simple model, ZenDocs and SerotoninMix among them.

We singled out the latter partly for their creative approach: we appreciated their humor (they called their project “NeMoy office”, and there were a lot of fun moments in their presentation) and were impressed that they managed to build a full-fledged prototype.

The SegFault team found a very original approach. They built the prototype's frontend with Vue.js, wrote the server in Python using Flask and Docker, and used the Word2vec and Crochemore algorithms. After adapting the algorithms a little, the participants compared the main document with all the others and highlighted the common and the differing parts. The environment they created has an editing block that displays fragments from the different documents. You can select one of them and edit it if necessary. In the control panel, you can accept an edit or view previously accepted ones. The team also compared images by converting them to base64 and comparing the encoded data, and worked out a method for comparing tables and displaying the changes in them.
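The general idea of comparing a main document with another version can be illustrated with a short Python sketch. It is not SegFault's implementation: instead of Word2vec and Crochemore, whose use the team did not describe in detail, it relies on the standard difflib module, and it treats a document as a list of paragraphs. The function names are hypothetical.

```python
import base64
import difflib


def split_common_and_different(main, other):
    """Split two lists of paragraphs into common and differing parts."""
    matcher = difflib.SequenceMatcher(a=main, b=other, autojunk=False)
    common, different = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            common.extend(main[i1:i2])
        else:
            # tag is "replace", "delete" or "insert"
            different.append((tag, main[i1:i2], other[j1:j2]))
    return common, different


def same_image(image_a: bytes, image_b: bytes) -> bool:
    """Compare two images through their base64 encodings."""
    return base64.b64encode(image_a) == base64.b64encode(image_b)


main_doc = ["Title", "First paragraph", "Second paragraph"]
other_doc = ["Title", "First paragraph, edited", "Second paragraph"]

common, different = split_common_and_different(main_doc, other_doc)
print(common)
print(different)
print(same_image(b"\x89PNG-one", b"\x89PNG-two"))
```

Comparing base64 strings only detects whether two images are byte-for-byte identical. It cannot tell a small retouch from a completely different picture, which is a reasonable trade-off for a prototype built in a little over a day.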

There were few designers at the hackathon, yet a large part of our task consisted of prototyping the interface and creating layouts, which meant using tools such as Sketch and Figma. The Talestorm team won our hearts: one of its members learned to work with these tools from scratch in a single night.

Post-hackathon interaction

At the end of September, the winners met with representatives of New Cloud Technologies, including CEO Dmitry Komissarov, and of Phystech.Genesis. At the meeting, they discussed the company's tasks in which the hackathon winners could take part. We look forward to working with the teams!