Training and Practice

In the era of big data, efficient search and analysis of information has become critical to business and development.

Elasticsearch is a search engine written in Java and accessed over HTTP. It lets you process huge amounts of data quickly and efficiently, so users can find the information they need in a matter of seconds.

This article is for those who want to learn the basics of Elasticsearch and use its capabilities in practice. We'll cover key concepts such as indexes, documents, and queries, and learn how to set up the environment and perform basic operations. Whether you're a developer, an analyst, or simply want to expand your knowledge of modern data technologies, this guide will help you step into the world of Elasticsearch with confidence. Get ready for an exciting journey into the world of search technology!

One way to run Elasticsearch is with Docker. A docker-compose file is provided below for convenience.

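If the container fails to start or runs out of memory, limit its resource consumption through JVM options (for example, the heap size passed in the ES_JAVA_OPTS environment variable).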

docker-compose
version: "3.8"
services:
  elasticsearch:
    image: elasticsearch:8.6.2
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - discovery.type=single-node
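      - xpack.security.enabled=false
      # Example JVM heap limit; adjust to the memory available on your machine
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"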
    ulimits:
      memlock:
        soft: -1
        hard: -1
  kibana:
    image: kibana:8.6.2  # keep the Kibana version in step with Elasticsearch
    ports:
      - "5601:5601"

Ways to work with Elasticsearch:

  • Via terminal using cURL

  • Using Kibana

  • Connecting through DBeaver or DataGrip
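Any HTTP client can play the same role as cURL. The following is a minimal sketch in Python using only the standard library. The helper names build_request and send are hypothetical, and the address http://localhost:9200 assumes the single-node setup with security disabled from the docker-compose file above.

```python
import json
import urllib.request

# Assumption: a local single-node cluster with security disabled,
# as in the docker-compose file above.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{ES_URL}/{path.lstrip('/')}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and decode the JSON response (needs a running cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Roughly what `curl -X GET http://localhost:9200/favorite_films/_search` sends.
request = build_request("GET", "/favorite_films/_search", {"query": {"match_all": {}}})
print(request.get_method(), request.full_url)
```

With a cluster running, send(request) would return the parsed JSON response. The sketch only builds the request, so it can be inspected without a live server.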

Indexes

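Index – a store of documents that share the same data schema, identified by a name and configured through settings. Index name restrictions: lowercase only, certain special characters are not allowed (such as \, /, *, ?, ", <, >, |, spaces, commas, and #), and the length is limited to 255 bytes.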
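The naming restrictions can be checked before a request is ever sent. The sketch below is an illustration, not Elasticsearch's own validation code: the function name index_name_problems is hypothetical, and the rule set (including the ban on a leading -, _ or + and on the colon) follows the restrictions as I understand them from the Elasticsearch documentation.

```python
# Characters that Elasticsearch rejects anywhere in an index name.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')


def index_name_problems(name):
    """Return a list of reasons why `name` is not a valid index name."""
    if not name:
        return ["name is empty"]
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(FORBIDDEN_CHARS & set(name))
    if bad:
        problems.append(f"contains forbidden characters: {bad}")
    if name[0] in "-_+":
        problems.append("must not start with '-', '_' or '+'")
    if name in (".", ".."):
        problems.append("must not be '.' or '..'")
    # The limit is in bytes, so multi-byte characters count more than once.
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


for candidate in ("favorite_films", "Favorite Films", "_films"):
    print(candidate, "->", index_name_problems(candidate) or "ok")
```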

Document – an entry in the index corresponding to the data schema.

Document schema (mapping) – a description of the document's fields and their data types (for example, long for numbers, keyword for exact-match strings, text for full-text content).

Search

Indexing data and searching the index are the two main stages of a search engine's work. The general search algorithm involves processing a query, retrieving data from an index, calculating relevance, and ranking results.

Document relevance – a score of how well a result matches the search query, determined by the search algorithm in use.
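The two stages and the relevance step can be illustrated with a toy inverted index. This is a deliberately simplified sketch: the function names are hypothetical, the tokenizer is a plain whitespace split rather than a real analyzer, and the score is a raw count of matching terms, which is much cruder than the BM25 scoring Elasticsearch uses by default.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace (a stand-in for an analyzer)."""
    return text.lower().split()


def build_index(docs):
    """Indexing stage: map each term to {doc_id: occurrences of the term}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Search stage: process the query, look up terms, score, and rank."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    1: "The Hangover",
    2: "The Hangover Part II",
    3: "Catch Me If You Can",
}
index = build_index(docs)
print(search(index, "hangover part"))  # [(2, 2), (1, 1)]
```

Document 2 ranks first because it contains both query terms, while document 1 contains only one of them.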

Data in Elasticsearch can be filtered with the native Query DSL or with SQL syntax. SQL can also be used indirectly: the SQL translate API converts an SQL query into the equivalent Query DSL request.
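As a rough illustration of the translate step, the sketch below prepares a call to the _sql/translate endpoint. The helper sql_translate_request is hypothetical, the index and field names are borrowed from the favorite_films examples in the practice section, and "year" is quoted on the assumption that YEAR is treated as a keyword by Elasticsearch SQL.

```python
import json


def sql_translate_request(sql, fetch_size=10):
    """Describe a call to the SQL translate API as (method, path, body)."""
    return "POST", "/_sql/translate", {"query": sql, "fetch_size": fetch_size}


method, path, body = sql_translate_request(
    'SELECT title FROM favorite_films WHERE "year" BETWEEN 2002 AND 2004'
)
print(method, path)
print(json.dumps(body, indent=2))
```

The response to such a request is a Query DSL body that could then be sent to the _search endpoint of the same index.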

Full text search

Full-text search improves the user experience by letting users look for any word or phrase in a document. It requires a full-text index.

A full-text index consists of full-text documents and full-text fields.

Full-text documents contain the text itself, while full-text fields hold information about the document.

Bulk requests

Bulk requests let you load data into Elasticsearch in batches, which speeds up indexing and reduces the load on both the network and Elasticsearch.

A bulk request body consists of a series of JSON objects, one per line: the odd lines carry the action and its metadata (for example, index and _id), and the even lines carry the document itself. The last line must end with a newline character (\n). Elasticsearch returns a JSON response with information about the saved documents and any errors.

To see only the errors that occurred while saving, send the bulk request with the filter_path=items.*.error parameter, which limits the response to the failed items.
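The line-by-line format is easy to get wrong by hand, so it can help to generate it. The following sketch builds a bulk body and picks the errors out of a parsed response. The function names build_bulk_body and failed_items are hypothetical, and the response shape (an items list whose entries are keyed by the action name) is what the bulk API returns as far as I know.

```python
import json


def build_bulk_body(docs):
    """Turn {doc_id: document} into a newline-delimited body for _bulk."""
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))  # action line
        lines.append(json.dumps(doc))                              # document line
    # Every line, including the last one, must be terminated by "\n".
    return "\n".join(lines) + "\n"


def failed_items(bulk_response):
    """Collect the error objects from a parsed _bulk response."""
    return [
        result["error"]
        for item in bulk_response.get("items", [])
        for result in item.values()
        if "error" in result
    ]


body = build_bulk_body({
    2: {"title": "The Wolf of Wall Street", "year": 2013},
    4: {"title": "Ted", "year": 2012},
})
print(body, end="")
```

When sending such a body over HTTP, the Content-Type header is normally application/x-ndjson rather than application/json.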

Practice

Index creation block

Creating an Index
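The original request for this step is not shown, so the following is only a sketch of what creating the favorite_films index might look like. The field names come from the documents used in the rest of this section. The field types are an assumption, not the author's mapping, and the helper create_index_request is hypothetical.

```python
import json

# Assumption: one reasonable mapping for the favorite_films documents.
FAVORITE_FILMS_INDEX = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},    # analyzed for full-text search
            "type": {"type": "text"},     # genre; text so match queries split words
            "year": {"type": "integer"},  # supports range queries and aggregations
        }
    }
}


def create_index_request(index_name, definition):
    """Describe the create-index call as (method, path, JSON body)."""
    return "PUT", f"/{index_name}", json.dumps(definition, indent=2)


method, path, body = create_index_request("favorite_films", FAVORITE_FILMS_INDEX)
print(method, path)
print(body)
```

In the Kibana console the same call would be written as PUT favorite_films followed by the JSON body. If the index is not created explicitly, Elasticsearch creates it on the first write and infers the field types itself.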

Data entry block

Adding a document with an explicit ID
POST favorite_films/_doc/1
{
  "title": "Catch Me If You Can",
  "type": "drama",
  "year": 2001
}
Adding a document with an auto-generated ID
POST favorite_films/_doc
{
  "title": "EuroTrip",
  "type": "comedy",
  "year": 2004
}
Adding many documents in one bulk request
POST favorite_films/_bulk
{ "index": { "_id": "2" } }
{"title": "The Wolf of Wall Street", "type": "comedy", "year": 2013}
{ "index": { "_id": "4" } }
{"title": "Ted", "type": "comedy", "year": 2012}
{ "index": { "_id": "5" } }
{"title": "Inglourious Basterds", "type": ["crime", "drama", "comedy"], "year": 2009}
{ "index": { "_id": "6" } }
{"title": "The Hangover", "type": "comedy", "year": 2009}
{ "index": { "_id": "7" } }
{"title": "The Hangover Part II", "type": "comedy", "year": 2011}
{ "index": { "_id": "8" } }
{"title": "The Hangover Part III", "type": "comedy", "year": 2013}

Data change block

Replacing a document (the old document is overwritten in full, so every field must be sent again)
POST favorite_films/_doc/1
{
  "title": "Catch Me If You Can",
  "type": "drama",
  "year": 2002
}
Updating individual fields without overwriting the rest
POST favorite_films/_update/1
{
  "doc": {
    "type": ["crime", "drama"]
  }
}

Data retrieval block

Getting all documents (by default, _search returns the first 10 hits)
GET favorite_films/_search
Getting a document by ID
GET favorite_films/_doc/1
Searching for movies released from 2002 to 2004
GET favorite_films/_search
{
  "query": {
    "range": {
      "year": {
        "gte": 2002,
        "lte": 2004
      }
    }
  }
}
Getting all movies whose genre is comedy
GET favorite_films/_search
{
  "query": {
    "match": {
      "type": "comedy"
    }
  }
}

When the query contains several words, match looks for each word separately: a document matches if it contains at least one of them, not necessarily all.

The following two requests are equivalent
GET favorite_films/_search
{
  "query": {
    "match": {
      "type": "comedy drama"
    }
  }
}
GET favorite_films/_search
{
  "query": {
    "match": {
      "type": {
        "query": "drama comedy"
      }
    }
  }
}
Searching with several words where all of them must be present
GET favorite_films/_search
{
  "query": {
    "match": {
      "type": {
        "query": "drama comedy",
        "operator": "and"
      }
    }
  }
}
Search for a phrase
GET favorite_films/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "The Hangover Part"
      }
    }
  }
}
Search across multiple fields
GET favorite_films/_search
{
  "query": {
    "multi_match": {
      "query": "The Hangover",
      "fields": ["title", "type"],
      "type": "phrase"
    }
  }
}
Search for documents that must contain a phrase and have a specific year
GET favorite_films/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "title": "The Hangover Part"
          }
        },
        {
          "match": {
            "year": 2013
          }
        }
      ]
    }
  }
}
Computing an aggregate value (the average year) over the movies whose title matches a phrase
GET favorite_films/_search
{
  "query": {
    "match_phrase": {
      "title": "The Hangover"
    }
  },
  "aggs": {
    "average_year": {
      "avg": {
        "field": "year"
      }
    }
  }
}

Conclusion

Studying Elasticsearch opens up broad horizons in data processing and analysis. This powerful tool not only lets you find information quickly, but also offers flexible options for scaling and for integration with other systems. Once you master the basics of working with indexes, documents, and queries, you can process large amounts of data effectively and extract valuable information from it. We hope this guide has given you a useful start in the world of Elasticsearch. Practice is the key to mastery, so don't be afraid to experiment with the platform's features and capabilities. You can expand your knowledge further with more advanced topics such as configuring clusters, managing performance, and writing complex search queries. Good luck with your learning, and with finding solutions with the help of Elasticsearch!

We recommend reading the book “Elasticsearch in Action”, co-authored by Matthew Lee Hinman, and watching the courses from the Official Elastic Community on YouTube.
