How I wrote my search engine to quickly find personal information

Background

It all started when finding the information and files I needed became difficult. The more files and folders I accumulated, the longer it took to find the right one. I realized that scanning endless lists of files and folders every time, especially deeply nested ones, is not an option for large amounts of data.

Searching by file name has its own limits: a name can hold only so many characters, and the search words must appear in exactly the same order as in the name. Moreover, if the system also indexes files that are irrelevant to the search (system files, project files), the results fill up with “garbage”.

Searching by file contents does not give the most relevant results either. It may return files that contain the keywords but have nothing to do with what I actually need.

Moreover, only text files can be searched by content.

Information content structure

The folder structure is a tree. I don’t like it, because each file can live in only one folder, unless you count copies and links.

A real-life example makes this clearer. Suppose I want a green fresh apple of the “Virgin” variety. I have to find the fruit department, then the apple section, then the green apples, then the variety, and then, in this imaginary example, choose between fresh and not-so-fresh ones before I finally have the right apple.

Everything is further complicated by the fact that I don’t remember whether there are any apples at all and, if there are, whether they are stored in the fruit section.

So why not just ask a henchman (everyone already has one, right?): “Bring me a green fresh apple”.

How convenient that would be!

What I mean by all this is that searching through folders works well only when there are few of them and you remember which folders exist, so that you don’t have to go through everything.

But if we don’t know if apples exist at all, then we ask the henchman:

– Are there any apples?

– Yes, sir! Hundreds: toy, red, rotten…

– I need a fresh apple.

– Understood! There is a red fresh apple “Sirota”, a red fresh apple “Apricot”, ….

– And what about a green fresh apple?

– There is! Green fresh apple “Pooh-tibiduh” and Green fresh apple “Virgin”.

– In that case, bring me, perhaps, Green fresh apple “Virgin”.

– Yes, sir.

That last phrase is exactly what became the name of the application: “Yes Sir”, the response to the user’s command.

Back to apples. Did you notice that in the first case you have to look for apples without knowing where, while in the second you narrow the request down with clarifying conditions?

To find the desired result in a tree structure (folders), you may, in the worst case, have to visit every node. With a graph (tags), you can, at best, get the result by visiting a single node.
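A rough sketch of that difference, written in Python for brevity. The folder layout, file names and tag names are invented for illustration. Finding an item in a nested folder tree can mean walking through many nodes, while a tag index answers with one dictionary lookup.

```python
# Hypothetical data: a nested folder tree and a tag index for the same items.
tree = {
    "shop": {
        "vegetables": {"carrot.txt": None},
        "fruit": {
            "pears": {"pear.txt": None},
            "apples": {"green_apple.txt": None, "red_apple.txt": None},
        },
    }
}

tag_index = {
    "apple": ["green_apple.txt", "red_apple.txt"],
    "green": ["green_apple.txt"],
    "pear": ["pear.txt"],
}


def find_in_tree(node, target, visited=None):
    """Depth-first search; returns (found, number_of_nodes_visited)."""
    if visited is None:
        visited = [0]  # shared counter across recursive calls
    for name, child in node.items():
        visited[0] += 1
        if name == target:
            return True, visited[0]
        if isinstance(child, dict):  # a folder: descend into it
            found, _ = find_in_tree(child, target, visited)
            if found:
                return True, visited[0]
    return False, visited[0]


found, steps = find_in_tree(tree, "green_apple.txt")
print(found, steps)        # the walk passes through several folders first
print(tag_index["green"])  # a single lookup by tag
```

The tree walk here has to pass through the vegetable and pear folders before it reaches the apples; the tag lookup does not care where the file is stored.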

Let me give a more realistic example. There is a music folder with subfolders for each genre. But what if at some point I want to listen to French music, regardless of genre? This is where the whole problem with the folder tree shows up. You can, of course, as the forums advise, create separate folders for the language of each work and fill them with links, but that is folders yet again…

But what happens if each file is tagged with a genre, a language and, of course, the fact that it is music, a song?

In this case music can be grouped and sorted far more flexibly. For example, combining three tags (French, Russian, rock) gives a selection that, as far as I know, the standard Windows tools cannot produce.
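A minimal sketch of such tag combinations in Python. The file names and tags are made up, and reading “French, Russian, rock” as “rock in French or Russian” is only one possible interpretation.

```python
# Hypothetical library: each file carries a set of tags instead of living in one folder.
library = {
    "song_a.mp3": {"music", "song", "french", "pop"},
    "song_b.mp3": {"music", "song", "french", "rock"},
    "song_c.mp3": {"music", "song", "russian", "rock"},
    "song_d.mp3": {"music", "song", "english", "rock"},
}


def select(library, all_of=(), any_of=()):
    """Return files that have every tag in all_of and, if any_of is given, at least one of those."""
    all_of, any_of = set(all_of), set(any_of)
    return sorted(
        name
        for name, tags in library.items()
        if all_of <= tags and (not any_of or tags & any_of)
    )


# French music regardless of genre.
french_music = select(library, all_of={"french"})
# Rock in French or Russian.
rock_fr_ru = select(library, all_of={"rock"}, any_of={"french", "russian"})
print(french_music)
print(rock_fr_ru)
```

No file had to be moved or linked to get either selection; only the combination of tags changed.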

Trying to find a solution

The first idea was to “tag” files and folders, so that information can be found by combining tags, regardless of word order. The best applications I found for this are XYplorer and Tagging for Windows. The first is a standalone file manager with a tagging feature; the second is an add-on for the standard file manager. However, both search for files only on a PC, and, of course, you cannot type a natural query as in the Google search engine and have the algorithm pick the tags out of it and rank the results by priority. I later uninstalled both because they often froze and crashed (perhaps my Windows add-ons were to blame; I don’t want to give these excellent programs bad publicity).

Visual search

My attempts to find the best way to search then took a strange turn. I am a visual person, so I uploaded images more or less related to the topic of the information to the social network VKontakte and saved the information itself in the comments under them. This sped up the search somewhat, and it worked from any device. But, as you can probably guess, it could not last long. Eventually I found myself wondering what information a given image referred to: did the picture of rails stand for my friends’ addresses or for places I wanted to travel to? And a single image collecting an endless scroll of text beneath it, with no way to nest anything, was a fiasco.

Desired functionality

I thought it would be great to develop an application that would meet the following criteria:

  1. Can be used from any device without internet connection.

  2. Search for personal information as quickly as possible.

  3. Search should be as simple as Google Search.

  4. Ability to save all text information to a text file.

Choice of technologies

1. To satisfy the first criterion, I decided to build a web application, since any device with a browser can open it. The data is stored in the browser’s localStorage, but when the site is opened it is immediately loaded into a variable for better speed.

To synchronize data with another device or browser, I used a free MySQL database from 000webhost, but stopped because of its size limits. Right now the only way to transfer user data is to export and import a file. However, I do this very rarely, because I mostly use my smartphone. For offline mode I used a service worker: you only need to visit the site once so that all its resources are loaded, and after that it works completely offline in the browser.

2. Quick search.

Since the search should work like the Google search engine, each word of the query has to be checked against the words of the information blocks that have already been created. For me such a block is an object with the following keys: a unique block name, an action (show information, open a link…), content and tags.

Under the “tags” key we store an array of words for that particular block.

Let’s take a block as an example.

Title: how to create a website.

Action: show information.

Content: take html, add js and decorate with css.

Tags: website development, web programming, layout.

The tag array is built from the text entered in the tag and title input fields. Tags are entered separated by a comma and a space, and each word becomes a tag. I did consider making tags whole phrases, as on YouTube, but decided to aim for broader results by using individual keywords. For the block above, the tag array would look like this: [“how”, “to”, “create”, “a”, “website”, “website”, “development”, “web”, “programming”, “layout”].
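One way to build such an array, sketched in Python. The function name `build_tags`, the lowercasing and the exact set of separators are my assumptions here, not a description of the application’s actual code.

```python
import re


def build_tags(title, tags_field):
    """Split the title and the tag field into words; every word becomes a tag."""
    # Lowercasing is an assumption, so that "Website" and "website" match.
    text = f"{title} {tags_field}".lower()
    # Treat commas, periods and whitespace as separators (also an assumption).
    return [word for word in re.split(r"[,\s.]+", text) if word]


tags = build_tags(
    "How to create a website.",
    "website development, web programming, layout",
)
print(tags)
```

Because phrases are broken into single words, a query such as “web layout” can match this block even though no tag phrase contains both words together.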

Now the most important part: deciding how the search works. The first thing that came to mind was to take each word of the query and compare it with each tag word of each block. The idea went as quickly as it came; it is a terrible one. The next idea was to create an object in which each tag is a separate key and the value is an array of the indices of the blocks that contain it.
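This structure is commonly called an inverted index. Here is a small sketch in Python, with invented blocks and a hypothetical `build_index` helper:

```python
from collections import defaultdict

# Hypothetical blocks; "tags" holds the word list of each block.
blocks = [
    {"title": "how to create a website", "tags": ["how", "to", "create", "a", "website", "web", "layout"]},
    {"title": "website hosting", "tags": ["website", "hosting", "free"]},
    {"title": "apple pie recipe", "tags": ["apple", "pie", "recipe"]},
]


def build_index(blocks):
    """Map every tag to the list of indices of the blocks that contain it."""
    index = defaultdict(list)
    for i, block in enumerate(blocks):
        for tag in set(block["tags"]):  # set() avoids listing a block twice for a repeated word
            index[tag].append(i)
    return dict(index)


index = build_index(blocks)
print(index["website"])  # indices of the blocks tagged "website"
```

With this layout a query word costs one key lookup instead of a pass over every tag of every block.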

3. When a query is entered, each of its words is looked up in the tag storage; if it is found, the corresponding blocks are added to the array of results to display. The results then have to be sorted by priority: the higher a block appears in the results, the better it matches the query. I implemented this by counting keywords: the more words of the query are contained in a block’s tag array, the higher its priority.
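A sketch of that lookup and ranking in Python. The tag storage and titles are invented, and breaking ties by block index is my own choice rather than something the application is known to do.

```python
from collections import Counter

# Hypothetical tag storage: tag -> indices of the blocks that contain it.
index = {
    "website": [0, 1],
    "create": [0],
    "hosting": [1],
    "apple": [2],
}
titles = ["how to create a website", "website hosting", "apple pie recipe"]


def search(query, index):
    """Return block indices ordered by how many query words each block matches."""
    hits = Counter()
    for word in set(query.lower().split()):
        for block_id in index.get(word, []):
            hits[block_id] += 1
    # Most matches first; ties keep the lower block index first (an assumption).
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [block_id for block_id, _ in ranked]


results = search("create website", index)
print([titles[i] for i in results])
```

Here the first block matches both query words and the second only one, so the first is listed higher.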

4. On saving to a file, very briefly: data can be exported and imported as a JSON file. Also, my experience of using VKontakte as an image-based search engine gave me the idea of allowing an image to be attached to each block if desired.
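A minimal sketch of such an export and import round trip, in Python. The key names and the `export_blocks` and `import_blocks` helpers are illustrative assumptions; the real file layout may differ.

```python
import json

# Hypothetical block in the shape described earlier: title, action, content, tags.
blocks = [
    {
        "title": "how to create a website",
        "action": "show",
        "content": "take html, add js and decorate with css",
        "tags": ["website", "development", "web", "programming", "layout"],
    }
]


def export_blocks(blocks):
    """Serialize all blocks to a JSON string that can be saved as a file."""
    return json.dumps(blocks, ensure_ascii=False, indent=2)


def import_blocks(text):
    """Parse a previously exported JSON string back into a list of blocks."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of blocks")
    return data


restored = import_blocks(export_blocks(blocks))
print(restored == blocks)
```

`ensure_ascii=False` keeps non-Latin text readable in the saved file, which matters if titles and tags are written in Russian.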

Results

As a result, I built something I have been using for more than a year. Both the web and PC versions have been very helpful; I use them for work and in my personal life. The search speed I ended up with has helped me out many times when I needed to find something very quickly.

Branch to other projects

I liked the web application so much that I wanted to write a program that launches programs on a PC in response to a user’s request. Inspired by voice assistants, I created a program that searches for and runs files. Its search, accordingly, also resembles a web search engine. Its distinctive feature is that you can drag a file or several files directly into the program, and the algorithm automatically assigns tags based on the file name and the folders that contain it. But that is a topic for another post, if this one is of interest.
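Here is roughly how tags could be derived from a path, sketched in Python. The example path, the `tags_from_path` name and the splitting rules are assumptions, and a POSIX-style path is used only to keep the example short.

```python
import re
from pathlib import PurePosixPath


def tags_from_path(path):
    """Derive tags from the file name (without extension) and the folders above it."""
    p = PurePosixPath(path)
    parts = list(p.parent.parts) + [p.stem]
    words = []
    for part in parts:
        # Split on anything that is not a letter or digit; this also drops the root "/".
        words.extend(w.lower() for w in re.split(r"[\W_]+", part) if w)
    return words


tags = tags_from_path("/home/user/Music/French Rock/la_chanson.mp3")
print(tags)
```

A real implementation would probably filter out uninformative folders such as the home directory; this sketch keeps every path component.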

Afterword

I will be glad to hear any comments and your opinion of the idea. Is it complete nonsense? Or, as I strongly suspect, are there already applications that work in a similar way?

Thank you!
