how not to get lost in the variety of solutions

Index Types in NoSQL

Primary indexes

Primary indexes provide a unique identification for each record in the database and organize data according to a specific key. In NoSQL databases, primary indexes play a major role in distributing data across nodes.
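To make the idea of distributing data by key more concrete, here is a minimal sketch in plain JavaScript. It assumes a simple hash-modulo scheme; the names `fnv1a`, `nodeForKey`, and the node labels are hypothetical and chosen only for illustration. Real databases use more elaborate schemes (Cassandra, for instance, defaults to a Murmur3-based partitioner with a token ring), so treat this as a model of the principle rather than of any actual implementation.

```javascript
// Hypothetical sketch: route a record to a node by hashing its primary key.

// FNV-1a, a simple 32-bit string hash (operates on UTF-16 code units here,
// which is good enough for an ASCII-key illustration).
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value unsigned 32-bit
  }
  return hash;
}

// Pick the node that owns a key: the same key always maps to the same node.
function nodeForKey(key, nodes) {
  return nodes[fnv1a(key) % nodes.length];
}

const clusterNodes = ["node-a", "node-b", "node-c"];
for (const userId of ["user-1", "user-2", "user-3"]) {
  console.log(userId, "->", nodeForKey(userId, clusterNodes));
}
```

Because the node is derived from the key alone, a lookup by primary key can go straight to the node that holds the record instead of asking every node.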

A primary index is maintained by the database itself and requires no additional effort. In MongoDB, the _id field is the default primary index; in Cassandra, the primary key is defined by the user when the table is created.

In MongoDB, the _id field is indexed automatically:

// find a document by the primary index _id
db.collection.find({ _id: ObjectId("64bd1c9f4a4e19d8f84f06f9") })

In Cassandra, the primary key is defined when the table is created, and is automatically indexed:

-- create a table with a primary key
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    name TEXT,
    email TEXT
);

-- look up by the primary index (UUID literals are written without quotes in CQL)
SELECT * FROM users WHERE user_id = 550e8400-e29b-41d4-a716-446655440000;

Primary indexes provide fast access to records because the data is organized by key (sorted or hashed, depending on the database). However, they only help with queries that use the key.

Secondary indexes

Secondary indexes enable fast searching on fields other than the primary key. They help you perform complex queries that cannot be implemented using primary indexes alone.
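As a rough mental model, a secondary index can be pictured as an extra lookup table from a field value to the primary keys of the matching records. The sketch below is plain JavaScript with hypothetical names (`usersById`, `buildSecondaryIndex`, `findBy`) and made-up sample data; it is not how MongoDB or Cassandra implement indexes internally, only an illustration of why a lookup through an index avoids scanning every record.

```javascript
// Hypothetical in-memory model: records stored by primary key,
// plus a secondary index mapping a field value to matching primary keys.
const usersById = new Map([
  [1, { name: "Ivan", email: "ivan@example.com" }],
  [2, { name: "Artem", email: "artem@example.com" }],
  [3, { name: "Artem", email: "artem2@example.com" }],
]);

// Build the index once: field value -> list of primary keys.
function buildSecondaryIndex(records, field) {
  const index = new Map();
  for (const [id, record] of records) {
    const value = record[field];
    if (!index.has(value)) index.set(value, []);
    index.get(value).push(id);
  }
  return index;
}

// Resolve a value through the index, then fetch each record by primary key.
function findBy(index, records, value) {
  return (index.get(value) ?? []).map((id) => records.get(id));
}

const nameIndex = buildSecondaryIndex(usersById, "name");
console.log(findBy(nameIndex, usersById, "Artem"));
```

The trade-off visible even in this toy version is that every insert or update must also touch the index, which is why secondary indexes speed up reads at some cost to writes.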

MongoDB:

Creating a secondary index to optimize searches on the name field:

// create a secondary index on the name field
db.collection.createIndex({ name: 1 });

// search using the secondary index
db.collection.find({ name: "Artem" });

Cassandra:

Creating a secondary index to optimize searches on the email field:

-- create a secondary index on the email column
CREATE INDEX ON users (email);

-- search using the secondary index (CQL string literals use single quotes)
SELECT * FROM users WHERE email = 'artem@example.com';

Let's look at an example of using primary and secondary indexes, where we work with a collection of users in MongoDB and need to optimize searches on several fields:

// create a users collection; the primary index on _id is created automatically
db.users.insertMany([
    { name: "Ivan", email: "ivan@example.com", age: 28 },
    { name: "Artem", email: "artem@example.com", age: 32 },
    { name: "Kolya", email: "kolya@example.com", age: 25 }
]);

// create secondary indexes
db.users.createIndex({ name: 1 });
db.users.createIndex({ email: 1 });

// find a user by name
db.users.find({ name: "Artem" });

// find a user by email
db.users.find({ email: "ivan@example.com" });

With indexes on the name and email fields, you can quickly find users by these parameters.

Range-based indexes

Range-based indexes let you process queries over intervals of values: timestamps, numeric ranges, or alphabetical ranges of strings.
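The reason an ordered index handles intervals well can be shown with a small sketch: if keys are kept sorted, a binary search finds the start of the interval and a short forward scan collects the rest. The code below is plain JavaScript with hypothetical names (`lowerBound`, `rangeQuery`, `timestampIndex`) and invented sample entries. Real databases typically use B-trees or LSM-based structures rather than a flat array, so this is only a simplified model of the lookup.

```javascript
// Hypothetical sketch: a range index as a sorted array of [key, recordId] pairs.

// Return the first position whose key is >= target (binary search).
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (entries[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Collect record ids whose key lies in [from, to], both ends inclusive.
function rangeQuery(entries, from, to) {
  const ids = [];
  for (let i = lowerBound(entries, from); i < entries.length && entries[i][0] <= to; i++) {
    ids.push(entries[i][1]);
  }
  return ids;
}

// Keys are timestamps in milliseconds, kept in ascending order.
const timestampIndex = [
  [Date.parse("2024-07-30T09:00:00Z"), "log-1"],
  [Date.parse("2024-08-01T10:00:00Z"), "log-2"],
  [Date.parse("2024-08-15T12:00:00Z"), "log-3"],
  [Date.parse("2024-09-02T08:00:00Z"), "log-4"],
];

const augustIds = rangeQuery(
  timestampIndex,
  Date.parse("2024-08-01T00:00:00Z"),
  Date.parse("2024-08-31T23:59:59Z")
);
console.log(augustIds); // [ 'log-2', 'log-3' ]
```

Only the entries inside the interval are visited after the initial search, so the cost grows with the size of the result rather than the size of the whole dataset.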

Elasticsearch makes extensive use of range-based indexes to handle time-sensitive data and complex queries. For example:

PUT /logs
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "message": { "type": "text" }
    }
  }
}

// search by date range
GET /logs/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2024-08-01",
        "lte": "2024-08-31"
      }
    }
  }
}

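Cassandra supports range queries on numeric and temporal columns. They are efficient when the column is a clustering column; on a regular column, as in the table below, the query is only accepted with ALLOW FILTERING, which scans the data:

-- create a table with timestamps
CREATE TABLE events (
    event_id UUID PRIMARY KEY,
    timestamp TIMESTAMP,
    description TEXT
);

-- find events within a given period
-- (timestamp is not a clustering column here, so ALLOW FILTERING is required)
SELECT * FROM events WHERE timestamp >= '2024-08-01' AND timestamp <= '2024-08-31' ALLOW FILTERING;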

Let's imagine a task in which we work with event logs and need to quickly find events for certain time intervals:

// create a collection of event logs
db.logs.insertMany([
    { timestamp: new Date("2024-08-01T10:00:00Z"), message: "User login" },
    { timestamp: new Date("2024-08-02T12:00:00Z"), message: "Data backup" },
    { timestamp: new Date("2024-08-03T14:00:00Z"), message: "System update" }
]);

// create an index on the timestamp field
db.logs.createIndex({ timestamp: 1 });

// find events for August 2024
// (an exclusive upper bound of September 1 also covers all of August 31)
db.logs.find({
    timestamp: {
        $gte: new Date("2024-08-01"),
        $lt: new Date("2024-09-01")
    }
});

Geospatial indexes

Geospatial indexes are designed to work with geographic data such as point coordinates, lines, and polygons. They allow you to perform geographic queries such as “find the nearest” and “within an area”.
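To see what a “find the nearest” query has to compute, here is a minimal sketch in plain JavaScript that does the work without any index: it measures the great-circle distance to every point with the haversine formula and filters by radius. The names `haversine`, `near`, and `samplePlaces`, the sample coordinates, and the mean Earth radius of 6,371 km are assumptions made for illustration. A geospatial index exists precisely to avoid this full scan.

```javascript
// Hypothetical sketch of a radius search done by brute force.

const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters (approximation)

function toRadians(deg) {
  return (deg * Math.PI) / 180;
}

// Haversine distance in meters between two [longitude, latitude] pairs.
function haversine([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Return places within maxDistance meters of center, nearest first.
function near(places, center, maxDistance) {
  return places
    .map((p) => ({ ...p, distance: haversine(center, p.coordinates) }))
    .filter((p) => p.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance);
}

const samplePlaces = [
  { name: "A", coordinates: [0, 0] },
  { name: "B", coordinates: [0.01, 0] }, // roughly 1.1 km east of A
  { name: "C", coordinates: [1, 0] },    // roughly 111 km east of A
];

console.log(near(samplePlaces, [0, 0], 5000).map((p) => p.name)); // [ 'A', 'B' ]
```

Note that coordinates are given as [longitude, latitude], the same order GeoJSON points use.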

MongoDB uses the 2dsphere index to support complex geographic queries:

// create a collection with geographic data
db.places.insertMany([
    { name: "Central Park", location: { type: "Point", coordinates: [-73.9654, 40.7829] } },
    { name: "Golden Gate Park", location: { type: "Point", coordinates: [-122.4862, 37.7694] } }
]);

// create a geospatial index
db.places.createIndex({ location: "2dsphere" });

// find places within 10 km of the given point
db.places.find({
    location: {
        $near: {
            $geometry: { type: "Point", coordinates: [-73.935242, 40.730610] },
            $maxDistance: 10000
        }
    }
});

Elasticsearch supports geospatial indexes for fast coordinate searching:

PUT /places
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}

// find nearby places
GET /places/_search
{
  "query": {
    "geo_distance": {
      "distance": "10km",
      "location": {
        "lat": 40.730610,
        "lon": -73.935242
      }
    }
  }
}

Let's say we are developing an application to find nearby cafes using MongoDB:

// create a collection with cafe data
db.cafes.insertMany([
    { name: "Cafe 1", location: { type: "Point", coordinates: [-73.935242, 40.730610] } },
    { name: "Cafe 2", location: { type: "Point", coordinates: [-73.985428, 40.748817] } }
]);

// create a geospatial index
db.cafes.createIndex({ location: "2dsphere" });

// find cafes within 5 km of the given point
db.cafes.find({
    location: {
        $near: {
            $geometry: { type: "Point", coordinates: [-73.961452, 40.768063] },
            $maxDistance: 5000
        }
    }
});

Full-text indexes

Full-text indexes allow you to process queries related to text content.
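The core structure behind most full-text search is an inverted index: a map from each term to the documents that contain it. The sketch below is plain JavaScript with hypothetical names (`tokenize`, `buildInvertedIndex`, `searchTerms`, `articleTexts`) and a deliberately naive tokenizer. Real engines add stemming, stop words, and relevance scoring, none of which is modeled here.

```javascript
// Hypothetical sketch of an inverted index: term -> set of document ids.

// Lowercase the text and split it into alphanumeric word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Index every document under each term it contains (id = array position).
function buildInvertedIndex(docs) {
  const index = new Map();
  docs.forEach((doc, id) => {
    for (const term of tokenize(doc)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  });
  return index;
}

// Return ids of documents containing at least one of the query terms.
function searchTerms(index, query) {
  const hits = new Set();
  for (const term of tokenize(query)) {
    for (const id of index.get(term) ?? []) hits.add(id);
  }
  return [...hits].sort((a, b) => a - b);
}

const articleTexts = [
  "NoSQL databases have become increasingly popular",
  "Indexes are essential for query performance in MongoDB",
];
const articleIndex = buildInvertedIndex(articleTexts);

console.log(searchTerms(articleIndex, "nosql"));           // [ 0 ]
console.log(searchTerms(articleIndex, "MongoDB indexes")); // [ 1 ]
```

The sketch treats a multi-word query as an OR of its terms; ranking the hits by relevance is what a real search engine layers on top.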

MongoDB supports full-text search through its text index:

// create a collection with text data
db.articles.insertMany([
    { title: "NoSQL: A Comprehensive Guide", content: "This article explains the key concepts of NoSQL databases." },
    { title: "Understanding MongoDB Indexes", content: "Indexes in MongoDB are crucial for performance optimization." }
]);

// create a full-text index
db.articles.createIndex({ content: "text" });

// search by keywords
db.articles.find({ $text: { $search: "NoSQL databases" } });

Elasticsearch, by contrast, was designed for full-text search from the start:

PUT /articles
{
  "mappings": {
    "properties": {
      "content": { "type": "text" }
    }
  }
}

// search by keywords
GET /articles/_search
{
  "query": {
    "match": {
      "content": "NoSQL databases"
    }
  }
}

Let's imagine that we want to implement article search with MongoDB:

// MongoDB: create a collection of articles
db.blog.insertMany([
    { title: "The Rise of NoSQL", content: "NoSQL databases have become increasingly popular..." },
    { title: "Indexing in MongoDB", content: "Indexes are essential for query performance in MongoDB..." }
]);

// create a full-text index on the content field
db.blog.createIndex({ content: "text" });

// find articles containing the word "NoSQL"
db.blog.find({ $text: { $search: "NoSQL" } });

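Indexing is a powerful tool: choosing an index type that matches your query pattern (key lookup, range, geospatial, or full-text) can noticeably improve the performance and flexibility of a project.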

If there are any other topics or questions worth considering, please let me know!
