Mixing files into tags

A long, long time ago… I was looking for a way to organize video files.

The idea was to assign markers (tags) to each movie and then select files by tags or, conversely, by the absence of tags. For example: detective, science fiction, not horror.

And all of this should work in an ordinary player: click “Open file…”, choose Detective, Science Fiction, Non-Horror, get a list of movies and pick the right one. No special file managers or browsers.

Back then I did not find such a tool. And now… now I have decided to make it myself.

What came out of it can be seen at the links:
demo video: https://youtu.be/Pi3Cdxc5FV0
code: https://github.com/evgenykislov/tagvfs

Below is an analysis of how this was done, which subtleties came up and which questions arose. Since this is still a tutorial, it also describes basic things, and besides the direct path there are auxiliary explanations that may help with understanding.

General Implementation Issues

If you look for a technology that lets you organize files “in your own way”, you very quickly run into three letters: VFS, the virtual file system. Judging by its description and capabilities, this technology fits the task very well. It is “local”: no external servers, clouds, etc. are needed. And it is “lightweight”: the implementation is a single kernel module, and the data can be stored in a file with a simple structure, with no DBMS and the like. The main disadvantage is precisely that it is a kernel module: each computer needs its own build, which means installing a compiler, kernel header files, etc.

Ok, let’s try to make our own virtual file system. And let’s call it tagvfs.

First of all, it was necessary to decide on the way to store and present video files in tagvfs (this is the virtual file system being created). There are several considerations here:

Firstly, I categorically did not want to keep copies of files inside tagvfs itself: copies take up a lot of space, and creating them takes a significant amount of time (video files are usually large). This kind of storage also gets in the way of working with files in the “normal” mode: after all, not only video files but also other documents can be tagged. Documents are edited, kept next to related documents (for example, the original and its PDF export), and so on. It is therefore desirable that the files keep their “permanent” location in the normal file system and that tagvfs only displays them.

Secondly, it is quite acceptable for the virtual file system to read external files and present them as its own. This approach even has some advantages, for example the ability to implement operations from the category of “abnormal programming”: access to files bypassing permissions, modifying data on read/write, blocking certain areas, etc. The price is that all these file operations (read, write, etc.) have to be implemented in code.

Thirdly, there is the option of symbolic links to files. For managing video files this is a very good option. Its one drawback is some inconvenience when adding files to tagvfs: you have to create symbolic links instead of copying. Otherwise there are only advantages: there is no need to query the files yourself or to store information about them (at least their presence and size), and Linux also takes care of file permissions.

Ok, so tagvfs is implemented by managing symbolic links to files.

Kernel module (for version 5.10)

Kernel modules are specially designed code that the operating system kernel loads and runs.

Programmatically, a kernel module must contain an initialization function and may have a termination function.

Functions can be defined in two ways:

  1. Through macros:

static int tagvfs_init(void) { ... ; return res; }

static void tagvfs_exit(void) { ... }

module_init(tagvfs_init);
module_exit(tagvfs_exit);

  2. Explicit declaration of functions:

int init_module(void) { ... ; return res; }
void cleanup_module(void) { ... }

In general, the first way (through macros) is the recommended one: it makes it easier to build the module into the Linux kernel (if your module ever gets into the kernel sources). In our case the module will be loaded dynamically, and the difference between the two methods is insignificant. Still, we will follow the recommendation.
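For orientation, a complete minimal module in the macro style might look like the sketch below. It is not taken from the tagvfs sources: the __init/__exit annotations and the pr_info messages are additions for illustration, and it can only be built against kernel headers, not as an ordinary program.

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

/* Called once when the module is loaded (insmod/modprobe). */
static int __init tagvfs_init(void)
{
	pr_info("tagvfs: loaded\n");
	return 0; /* 0 = success; a negative errno value aborts loading */
}

/* Called once when the module is unloaded (rmmod). */
static void __exit tagvfs_exit(void)
{
	pr_info("tagvfs: unloaded\n");
}

module_init(tagvfs_init);
module_exit(tagvfs_exit);

MODULE_LICENSE("GPL");
```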

There must be exactly one initialization function per module. Of course, if you do not specify it at all, the module will still compile, but it is not clear what to do with such a module. There may also be at most one termination function per module; it can be omitted if no resource cleanup is required (keep in mind that a module with an initialization function but no termination function cannot be unloaded normally). Moreover, if the module is compiled into the kernel, the termination function will never be called, even if it exists.

The initialization function returns its result as an int. Here everything is fairly obvious: 0 means successful initialization, a negative value indicates an error. Historically, many kernel functions use the return value both for the result (which is kept non-negative) and for error information. Things get a little more complicated when a function returns an address. An address can also be converted to a signed number and checked for negativity, but even a valid address may turn out negative after the conversion. Therefore not every negative number is treated as an error (for addresses as well as for int values), only values from -4095 to -1. (WARNING: this applies to kernel 5.10.)

Of course, there are macros for working with such values:

IS_ERR_VALUE, IS_ERR – for checking integers and addresses;

ERR_PTR, PTR_ERR – for converting an error code into an address and vice versa.

More information about macros can be found here: https://elixir.bootlin.com/linux/v5.10/source/include/linux/err.h .

For the case when 0 is returned on success (and there are many such functions), perhaps the simplest error check is used:

if (res) { ... error handling ... }

As for the value returned on error, it is recommended to use the standard kernel constants that correspond to the error, and not to invent your own values. The reason is that Linux shows not only the error code but also its text description, and if, for example, a programmer returns -1 for any error, the user will go looking for a more powerful sudo, because they will see something like:

insmod: ERROR: could not insert module tagvfs.ko: Operation not permitted

Frequently used constants (with a minus sign): -ENOMEM, -EINVAL. The base list can be seen here: https://elixir.bootlin.com/linux/v5.10/source/include/uapi/asm-generic/errno-base.h .

In the module code you can make a number of declarations that state the module's license and author, give a description, etc. They can be omitted, but at the build stage you will certainly be reminded of a missing license. The kernel also has a function, license_is_gpl_compatible, which decides from the license name whether the license is GPL-compatible. So it is better to specify the license, and an author and description will not hurt either. All of this can be set, for example, like this:

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Author");
MODULE_DESCRIPTION("Description information");

A peculiarity of these macros is that several of them can be specified for one module: you can even specify several licenses (and which one will the kernel use then?). Moreover, this is the recommended way to list multiple authors: each one goes in a separate macro.
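For two authors this could look like the fragment below. The names and addresses are placeholders, not the real tagvfs metadata.

```c
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("First Author <first@example.com>");   /* placeholder */
MODULE_AUTHOR("Second Author <second@example.com>"); /* placeholder */
MODULE_DESCRIPTION("Example description");
```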

This information can then be read with the modinfo utility, which prints something like:

description:    VFS module to organize files by tags
author:         Evgeny Kislov <dev@evgenykislov.com>
license:        GPL
name:           tagvfs
vermagic:       5.10.0-23-amd64 SMP mod_unload modversions 

Besides the data specified in the code, the output has another very important line: vermagic. It contains the kernel version for which the module was built.


To be continued.


The splash screen is an image generated by a neural network.
