My experience with LiteDB

[*]

Recently I was looking for a storage system for my program. It is a desktop application that needs to store a lot of objects and search for text in them. And I thought, “Why not try something new.” Instead of a SQL database like SqLite, I could use a document database. But I would like it not to require a separate server so that it works with a simple file. An Internet search for similar systems for .NET applications quickly led me to LiteDB. Here I want to share what I found in the process of working with this database.


Inheritance

The specifics of my program is as follows. I want to store simple objects like:

internal class Item
{
    public string Title { get; set; }

    public string Description { get; set; }

    public List<Field> Fields { get; set; } = new List<Field>();
}

But here is the class Field I have an abstract one that has a lot of heirs:

internal abstract class Field
{
}

internal sealed class TextField : Field
{
    public string Text { get; set; }
}

internal sealed class PasswordField : Field
{
    public string Password { get; set; }
}

internal sealed class DescriptionField : Field
{
    public string Description { get; set; }
}

...

When working with SQL databases, I had to configure the preservation of various class heirs Field. I thought that in LiteDB I would have to write my own BSON serialization mechanism, since this opportunity is provided. But LiteDB pleasantly surprised me. No effort was required on my part. Saving and restoring various types occurs completely without my participation. You create the required objects:

var items = new Item[]
{
    new Item
    {
        Title = "item1",
        Description = "description1",
        Fields =
        {
            new TextField
            {
                Text = "text1"
            },
            new PasswordField
            {
                Password = "123"
            }
        }
    },
    new Item
    {
        Title = "item2",
        Description = "description2",
        Fields =
        {
            new TextField
            {
                Text = "text2"
            },
            new DescriptionField
            {
                Description = "description2"
            }
        }
    }
};

…and insert them into the database:

using (var db = new LiteDatabase(connectionString))
{
    var collection = db.GetCollection<Item>();

    collection.InsertBulk(items);
}

That’s all. LiteDB comes with a handy program LiteDB.Studio, which allows you to explore the contents of your database. Let’s see how our objects are stored:

{
  "_id": {"$oid": "62bf12ce12a00b0f966e9afa"},
  "Title": "item1",
  "Description": "description1",
  "Fields":
  [
    {
      "_type": "LiteDBSearching.TextField, LiteDBSearching",
      "Text": "text1"
    },
    {
      "_type": "LiteDBSearching.PasswordField, LiteDBSearching",
      "Password": "123"
    }
  ]
}

It turns out that for each object in the field _type its type is preserved, which allows you to correctly restore the object when reading from the database.

Well, we figured out the preservation. Let’s move on to reading.

Text search

As I said, I need to search for objects Item in properties Title and Description which, as well as in the properties of their fields (property Fields) contains some text.

With search inside properties Title and Description all clear. The documentation contains clear examples:

var items = collection.Query()
    .Where(i => i.Title.Contains("1") || i.Description.Contains("1"))
    .ToArray();

But with the search in the fields there is a problem. The point is that the abstract class Field does not define any properties. Therefore, I cannot refer to them here. Luckily, LiteDB allows you to use string notation for queries:

var items = collection.Query()
    .Where("$.Title LIKE '%1%' OR $.Description LIKE '%1%'")
    .ToArray();

But how to use it to search inside the fields? The documentation hints that the expression should look something like this:

$.Title LIKE '%1%' OR $.Description LIKE '%1%' OR $.Fields[@.Text] LIKE '%1%' OR $.Fields[@.Description] LIKE '%1%' OR $.Fields[@.Password] LIKE '%1%'

But this entry results in an error:

Left expression `$.Fields[@.Text]` returns more than one result. Try use ANY or ALL before operant.

Indeed, using the ANY function solves the problem:

$.Title LIKE '%1%' OR $.Description LIKE '%1%' OR ANY($.Fields[@.Text LIKE '%1%']) OR ANY($.Fields[@.Description LIKE '%1%']) OR ANY($.Fields[@.Password LIKE '%1%'])

I would like to make a few remarks about this expression. First, from it it seems that you can use an expression like this as a filter:

ANY($.Fields[@.Text LIKE '%1%'])

But it’s not. An attempt to filter records by this expression results in an error:

Expression 'ANY($.Fields[@.Text LIKE "%1%"])' are not supported as predicate expression.

Strange, isn’t it. It turns out you need to write like this:

ANY($.Fields[@.Text LIKE '%1%']) = true

The 1s and 0s used in predicates in SQL Server immediately come to mind. Well, God be their judge.

Secondly, I was somewhat confused by the phrase Try use ANY or ALL before operant. Somehow it did not agree with the function call for me. It turns out that LiteDB also supports the following syntax:

$.Fields[*].Text ANY LIKE '%1%'

Unfortunately, it is not described in the documentation, I came across it while browsing source code LiteDB tests on GitHib. It, unlike the ANY function, normally works as a predicate without any comparisons with true.

As a result, the search expression can be rewritten as follows:

$.Title LIKE '%1%' OR $.Description LIKE '%1%' OR ($.Fields[*].Text ANY LIKE '%1%') OR ($.Fields[*].Description ANY LIKE '%1%') OR ($.Fields[*].Password ANY LIKE '%1%')

There were two things that bothered me here. First, for each new field type, I might need to append this expression if it used a new text property name that I needed to search. Can something be done about this? Well, you can.

LiteDB supports attribute BsonField, which allows you to specify the name of the database field in which this property will be stored. You can use it like this:

internal sealed class TextField : Field
{
    [BsonField("TextField")]
    public string Text { get; set; }
}

internal sealed class PasswordField : Field
{
    [BsonField("TextField")]
    public string Password { get; set; }
}

internal sealed class DescriptionField : Field
{
    [BsonField("TextField")]
    public string Description { get; set; }
}

Now you can use one search expression for any objects Field:

$.Title LIKE '%1%' OR $.Description LIKE '%1%' OR $.Fields[*].TextField ANY LIKE '%1%'

Adding a new class descendant FieldI can just mark its property with an attribute [BsonField("TextField")]. Then I won’t have to make any changes to the search expression.

Unfortunately, this method does not quite solve all our problems. The fact is that the heir Field there can be an arbitrary number of properties to search. Therefore, it may turn out that they cannot be reduced to fields already in the database.

For this reason, I’ll stick with the old request form for now:

$.Title LIKE '%1%' OR $.Description LIKE '%1%' OR ($.Fields[*].Text ANY LIKE '%1%') OR ($.Fields[*].Description ANY LIKE '%1%') OR ($.Fields[*].Password ANY LIKE '%1%')

She has another problem. I had to write the search string several times '%1%'. Yes, and SQL Injection has not yet been canceled (although I’m not sure if the word SQL is applicable here). In short, my point is that it would be nice to use query parameters. Indeed, the API allows you to do this:

Request parameters
Request parameters

But how exactly do I refer to the parameter in the request body? Unfortunately the documentation let me down again. I had to go into the test code for LiteDB and look for therehow parameters should be used:

var items = collection.Query()
    .Where("$.Title LIKE @0 OR $.Description LIKE @0 OR ($.Fields[*].Text ANY LIKE @0) OR ($.Fields[*].Description ANY LIKE @0) OR ($.Fields[*].Password ANY LIKE @0)", "%1%")
    .ToArray();

Well, we figured out the search. But how quickly does it happen?

Indices

LiteDB supports indexes. Of course, my application will not store such a large amount of data that it is critical. But it would still be nice if my queries used indexes and were fast.

First, how do we know if a given query uses an index or not. To do this, LiteDB has a command EXPLAIN. In LiteDB.Studio I will execute my query like this:

EXPLAIN
SELECT $ FROM Item
WHERE $.Title LIKE '%1%'
    OR $.Description LIKE '%1%'
    OR ($.Fields[*].Text ANY LIKE '%1%')
    OR ($.Fields[*].Description ANY LIKE '%1%')
    OR ($.Fields[*].Password ANY LIKE '%1%')

The output of this command contains the following information about the index used:

"index":
  {
    "name": "_id",
    "expr": "$._id",
    "order": 1,
    "mode": "FULL INDEX SCAN(_id)",
    "cost": 100
  },

As you can see, the full view of all data is now applied. I would like to get better results.

Documentation speaks directlythat it is possible to create an index based on array type properties. In this case, it will be possible to search for any value included in such arrays. For example, this is how you can create an index to search for information in properties Text my fields:

collection.EnsureIndex("TextIndex", "$.Fields[*].Text");

Now this index can be used in the search:

var items = collection.Query()
    .Where("$.Fields[*].Text ANY LIKE @0", "%1%")
    .ToArray();

Team EXPLAIN in LiteDB.Studio shows that this query is indeed using the index we created:

"index":
  {
    "name": "TextIndex",
    "expr": "MAP($.Fields[*]=>@.Text)",
    "order": 1,
    "mode": "FULL INDEX SCAN(TextIndex LIKE \"%1%\")",
    "cost": 100
  },

But how do we combine all our properties into one index? Team comes to the rescue CONCAT. It combines several properties into one array. This is what creating a full index would look like:

collection.EnsureIndex("ItemsIndex", @"CONCAT($.Title,
            CONCAT($.Description,
                CONCAT($.Fields[*].Text,
                    CONCAT($.Fields[*].Password,
                            $.Fields[*].Description
                    )
                )
            )
        )");

To search on it, we have to rewrite our search expression:

var items = collection.Query()
    .Where(
        @"CONCAT($.Title,
            CONCAT($.Description,
                CONCAT($.Fields[*].Text,
                    CONCAT($.Fields[*].Password,
                            $.Fields[*].Description
                    )
                )
            )
        ) ANY LIKE @0",
        "%1%")
    .ToArray();

Now our search actually uses the index:

"index":
  {
    "name": "ItemsIndex",
    "expr": "CONCAT($.Title,CONCAT($.Description,CONCAT(MAP($.Fields[*]=>@.Text),CONCAT(MAP($.Fields[*]=>@.Password),MAP($.Fields[*]=>@.Description)))))",
    "order": 1,
    "mode": "FULL INDEX SCAN(ItemsIndex LIKE \"%3%\")",
    "cost": 100
  },

Unfortunately the operator LIKE still results in a FULL INDEX SCAN. It remains to be hoped that the index still gives some gain. But why should we hope. We can measure everything. We also have BenchmarkDotNet.

I wrote the following class for performance tests:

[SimpleJob(RuntimeMoniker.Net60)]
public class LiteDBSearchComparison
{
    private LiteDatabase _database;
    private ILiteCollection<Item> _collection;

    [GlobalSetup]
    public void Setup()
    {
        if (File.Exists("compare.dat"))
            File.Delete("compare.dat");

        _database = new LiteDatabase("Filename=compare.dat");

        _collection = _database.GetCollection<Item>();

        _collection.EnsureIndex("ItemIndex", @"CONCAT($.Title,
            CONCAT($.Description,
                CONCAT($.Fields[*].Text,
                    CONCAT($.Fields[*].Password,
                            $.Fields[*].Description
                    )
                )
            )
        )");

        for (int i = 0; i < 100; i++)
        {
            var item = new Item
            {
                Title = "t",
                Description = "d",
                Fields =
                {
                    new TextField { Text = "te" },
                    new PasswordField { Password = "p" },
                    new DescriptionField { Description = "de" }
                }
            };

            _collection.Insert(item);
        }
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        _database.Dispose();
    }

    [Benchmark(Baseline = true)]
    public void WithoutIndex()
    {
        _ = _collection.Query()
            .Where("$.Title LIKE @0 OR $.Description LIKE @0 OR ($.Fields[*].Text ANY LIKE @0) OR ($.Fields[*].Description ANY LIKE @0) OR ($.Fields[*].Password ANY LIKE @0)",
                "%1%")
            .ToArray();
    }

    [Benchmark]
    public void WithIndex()
    {
        _ = _collection.Query()
            .Where(@"CONCAT($.Title,
                        CONCAT($.Description,
                            CONCAT($.Fields[*].Text,
                                CONCAT($.Fields[*].Password,
                                        $.Fields[*].Description
                                )
                            )
                        )
                    ) ANY LIKE @0",
                "%1%")
            .ToArray();
    }
}

The results I got for it are as follows:

method

mean

error

StdDev

Ratio

WithoutIndex

752.7 us

14.71 us

21.56 us

1.00

WithIndex

277.5 us

4.30 us

4.02 us

0.37

As you can see, an index can indeed provide a significant performance advantage.

Conclusion

That’s all I wanted to say today. In general, I have a pretty good impression of LiteDB, I would use it as a document repository for small projects. Unfortunately the documentation leaves a lot to be desired in my opinion.

I hope the information provided will be useful to you. Good luck!