Field scheme for faceted search and filtering in an online store

In this article, I will skip the following aspects of working with Elasticsearch (hereinafter simply ES):

  1. How to install

  2. How to connect

  3. The full mapping scheme for an online store product

  4. A detailed description of the whole structure and of all the queries needed to build a product listing page with search results and a filter.

And anything else not directly related to the topic.

As the title says, I will only describe the scheme for the product characteristic fields and how to write aggregation and filtering queries against them.

Foreword

I came to write this article after an unsuccessful experience with an online store built on a framework and MySQL. It had tens of thousands of products, each with several dozen characteristics and many values for them. Because of the large number of queries needed to get the product filter values, and possibly a completely wrong table structure or some other reason, the site was terribly slow and took a long time to load. It got to the point where Yandex.Webmaster reported an error like this:

Screenshot from the internet.

I had not developed the site, and I had neither the knowledge nor the desire to keep dealing with it. I decided that I would later build an online store myself, but with a different, non-relational data store instead of MySQL. The choice fell on ES. While studying it, figuring out how to structure product characteristics and retrieve their values, so that they could later be changed painlessly without touching the code, took a lot of time. I really missed very simple examples on the Russian-speaking Internet, like the ones that exist for PHP + MySQL.

Everything described here is based only on my personal experience and on my understanding of document schemes and structure for building faceted search in an online store, which I arrived at while studying and developing. The article is therefore aimed mainly at beginners who have started learning ES.

Getting to the point

Elasticsearch is a distributed search and analytics engine based on Apache Lucene. A full description can be found on the official website.

Faceted search (faceted navigation) is searching for products in a section, a category, or on a full-text search results page by their characteristics: color, material, price, manufacturer, etc. For the end user, it is a set of filters. Each filter is a characteristic, and the filter's values are all possible values of that characteristic. For an online store, this is the main search function, and users expect it to be fast.

In the example below, the user is in the "chandeliers" category and has additionally filtered products by price (from 1394 to 42207 rubles) and by black color. 198 products were found, and the filter panel on the left lists the characteristics present in the search results, along with the number of products for each available value (facet counts):

Here you can try the filter yourself and repeat the steps described above (the site uses ES).

To build faceted search, ES offers a fairly powerful tool: aggregations. One of the nice things about aggregations is that they can be nested: you can define top-level aggregations that create "buckets" of documents, and other aggregations that run inside those buckets. For ease of understanding, this is broadly similar to the SQL GROUP BY clause: the matching documents are grouped into buckets by some specific attribute.
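To make the GROUP BY analogy concrete, here is a minimal pure-Python sketch (no ES involved) of what a `terms` aggregation with a sub-aggregation computes. It groups a few hypothetical product documents by `color` and, inside each bucket, finds the minimum and maximum `price`. The data and field names are assumptions for illustration only. Buckets are ordered by document count, descending, which is how a `terms` aggregation orders them by default.

```python
from collections import defaultdict

# Hypothetical product documents (illustrative data, not from a real index).
products = [
    {"color": "black", "price": 1394},
    {"color": "black", "price": 42207},
    {"color": "white", "price": 5000},
]


def terms_with_price_range(docs, field):
    """Mimic a terms aggregation on `field` with a min/max sub-aggregation.

    Roughly equivalent to:
    SELECT field, COUNT(*), MIN(price), MAX(price) FROM docs GROUP BY field
    """
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc["price"])
    # Each bucket: its key, the document count, and the sub-aggregation result.
    return [
        {"key": key, "doc_count": len(prices), "min": min(prices), "max": max(prices)}
        for key, prices in sorted(buckets.items(), key=lambda kv: -len(kv[1]))
    ]


for bucket in terms_with_price_range(products, "color"):
    print(bucket)
```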

Indexing Facet Values

Before aggregations can be created, the document attributes that can serve as facets must be indexed in ES. One way to index them is to list all attributes and their values in a single field, as in the following example:

"facets": {
  "color": "Черный",
  "style": "Лофт",
  "room": "Гостиная"
}

The ES mapping should then look like this:

"facets": {
  "type": "nested",
  "properties": {
    "color": {
      "type": "keyword"
    },
    "style": {
      "type": "keyword"
    },
    "room": {
      "type": "keyword"
    }
  }
}

This approach works, but faceting queries then have to explicitly list the name of every field we want to aggregate on:

"aggs": {
  "facets": {
    "nested": {
      "path": "facets"
    },
    "aggs": {
      "color": {
        "terms": {
          "field": "facets.color"
        }
      },
      "style": {
        "terms": {
          "field": "facets.style"
        }
      },
      "room": {
        "terms": {
          "field": "facets.room"
        }
      }
    }
  }
}

This is clearly impractical and inefficient when there are many product characteristics that may change and grow over time. For example, whenever you delete, change, or add a product characteristic, you have to manually change the mapping, reindex, and update the query by adding the new field name to it.
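To see why this becomes a maintenance burden, here is a hedged Python sketch that generates the aggregation above from a hard-coded list of characteristic names. The helper `build_facet_aggs` and the `FACET_FIELDS` list are hypothetical. Every new characteristic means editing this list, as well as the mapping, before it can appear in the filter.

```python
import json

# Hypothetical hard-coded list: it must be kept in sync with the mapping.
FACET_FIELDS = ["color", "style", "room"]


def build_facet_aggs(fields):
    """Build the nested aggregation with one terms sub-aggregation per field."""
    return {
        "aggs": {
            "facets": {
                "nested": {"path": "facets"},
                "aggs": {
                    name: {"terms": {"field": f"facets.{name}"}} for name in fields
                },
            }
        }
    }


print(json.dumps(build_facet_aggs(FACET_FIELDS), indent=2))
```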

Instead, I came up with the following.

I separated the names and values of the facets sent to the ES index, as follows:

"string_facets": [
  {
    "name": "color",
    "value": "Черный"
  },
  {
    "name": "color",
    "value": "Белый"
  },
  {
    "name": "style",
    "value": "Лофт"
  },
  {
    "name": "style",
    "value": "Техно"
  },
  {
    "name": "room",
    "value": "Гостиная"
  },
  {
    "name": "room",
    "value": "Спальня"
  }
]
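When documents are prepared for indexing, a product's characteristics usually come from some other source, such as a relational table or an admin form, as a mapping from a name to one or more values. The Python sketch below flattens such a dictionary into the `string_facets` list shown above. The input format and the function name `to_string_facets` are assumptions for illustration.

```python
def to_string_facets(attributes):
    """Flatten {name: value or [values]} into [{"name": ..., "value": ...}, ...].

    A multi-valued characteristic (e.g. a chandelier suitable for both the
    living room and the bedroom) produces one entry per value.
    """
    facets = []
    for name, values in attributes.items():
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            facets.append({"name": name, "value": value})
    return facets


doc = {
    "string_facets": to_string_facets(
        {"color": ["Черный", "Белый"], "style": "Лофт", "room": ["Гостиная", "Спальня"]}
    )
}
print(doc)
```

Adding a new characteristic here only means adding a key to the input dictionary. The mapping stays the same.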

Mapping:

"string_facets": {
  "type": "nested",
  "properties": {
    "name": {
      "type": "keyword"
    },
    "value": {
      "type": "keyword"
    }
  }
}

Filtering and aggregating such a structure requires nested filters and nested aggregations in queries.

Aggregation:

"aggs": {
  "aggs_text_facets": {
    "nested": {
      "path": "string_facets"
    },
    "aggs": {
      "name": {
        "terms": {
          "field": "string_facets.name"
        },
        "aggs": {
          "value": {
            "terms": {
              "field": "string_facets.value"
            }
          }
        }
      }
    }
  }
}
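The response groups values under each characteristic name. The pure-Python sketch below imitates what this two-level aggregation returns for two hypothetical products, so you can see the shape of the buckets without a running cluster. Two caveats apply. Inside a `nested` aggregation, `doc_count` counts nested objects rather than products, and a `reverse_nested` sub-aggregation is the usual way to count parent documents. Also, a `terms` aggregation returns only the top 10 buckets by default, so with dozens of characteristics you would probably need to raise its `size`.

```python
from collections import Counter, defaultdict

# Two hypothetical products, already in the string_facets format.
products = [
    {"string_facets": [
        {"name": "color", "value": "Черный"},
        {"name": "style", "value": "Лофт"},
    ]},
    {"string_facets": [
        {"name": "color", "value": "Черный"},
        {"name": "color", "value": "Белый"},
        {"name": "style", "value": "Техно"},
    ]},
]


def aggregate_string_facets(docs):
    """Imitate nested -> terms(name) -> terms(value) over nested facet objects.

    Counts are per nested object, as in an ES nested aggregation.
    """
    by_name = defaultdict(Counter)
    for doc in docs:
        for facet in doc["string_facets"]:
            by_name[facet["name"]][facet["value"]] += 1
    return {
        name: {"doc_count": sum(values.values()), "values": dict(values.most_common())}
        for name, values in by_name.items()
    }


print(aggregate_string_facets(products))
```

Note that the `color` bucket gets a `doc_count` of 3 although only two products exist, because the second product has two color entries.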

Filtering:

"filter": {
  "nested": {
    "path": "string_facets",
    "query": {
      "bool": {
        "must": [
          {
            "term": {
              "string_facets.name": "color"
            }
          },
          {
            "terms": {
              "string_facets.value": [
                "Черный"
              ]
            }
          }
        ]
      }
    }
  }
}
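Because the mapping is `nested`, both conditions are checked against the same nested object. A product matches only if one of its facet entries has `name` equal to `color` and a `value` from the list. With a plain `object` mapping, ES would flatten the array, and the conditions could be satisfied by two different entries. The Python sketch below builds this clause for any characteristic and, to illustrate the nested semantics, also evaluates it in memory. The helpers `facet_filter` and `matches` and the `lampshade_color` characteristic are hypothetical. Note that the `nested` query takes its inner clause under `query`.

```python
def facet_filter(name, values):
    """Build a nested filter clause matching products where one facet entry
    has the given name and one of the given values."""
    return {
        "nested": {
            "path": "string_facets",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"string_facets.name": name}},
                        {"terms": {"string_facets.value": list(values)}},
                    ]
                }
            },
        }
    }


def matches(doc, clause):
    """In-memory evaluation of the clause above (illustration only, not ES)."""
    must = clause["nested"]["query"]["bool"]["must"]
    name = must[0]["term"]["string_facets.name"]
    values = must[1]["terms"]["string_facets.value"]
    # Both conditions must hold for the SAME nested object.
    return any(
        f["name"] == name and f["value"] in values for f in doc["string_facets"]
    )


# A white chandelier with a black lampshade (hypothetical characteristic).
product = {"string_facets": [
    {"name": "color", "value": "Белый"},
    {"name": "lampshade_color", "value": "Черный"},
]}
print(matches(product, facet_filter("color", ["Черный"])))  # False
```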

This applies to characteristics with text values. Characteristics with numeric values must be stored and aggregated separately. Numeric characteristics (for example, dimensions such as width and length) sometimes have a huge number of distinct values. Instead of listing all possible values, it is enough to get the minimum and maximum values and display them as a range selector or slider. This is only possible if the values are stored as numbers.
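At indexing time, this means routing each characteristic to one of two fields depending on its value type. Here is a minimal Python sketch, assuming characteristics arrive as a flat `{name: value}` dictionary. The function name `split_facets` is hypothetical.

```python
def split_facets(attributes):
    """Route numeric values to number_facets and everything else to string_facets."""
    doc = {"string_facets": [], "number_facets": []}
    for name, value in attributes.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            doc["number_facets"].append({"name": name, "value": float(value)})
        else:
            doc["string_facets"].append({"name": name, "value": str(value)})
    return doc


print(split_facets({"color": "Черный", "width": 45, "length": 60.5}))
```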

In the mapping, it looks like this:

"number_facets": {
  "type": "nested",
  "properties": {
    "name": {
      "type": "keyword"
    },
    "value": {
      "type": "double"
    }
  }
}

Aggregation:

"aggs_number_facet": {
  "nested": {
    "path": "number_facets"
  },
  "aggs": {
    "name": {
      "terms": {
        "field": "number_facets.name"
      },
      "aggs": {
        "value": {
          "stats": {
            "field": "number_facets.value"
          }
        }
      }
    }
  }
}
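The `stats` aggregation returns `count`, `min`, `max`, `avg` and `sum` for each bucket. The `min` and `max` values are what a range slider needs. The following pure-Python imitation runs over hypothetical products to show the shape of the result. It is not how ES computes it internally.

```python
from collections import defaultdict

# Hypothetical products in the number_facets format.
products = [
    {"number_facets": [{"name": "width", "value": 45.0}, {"name": "length", "value": 60.0}]},
    {"number_facets": [{"name": "width", "value": 120.0}]},
]


def number_facet_stats(docs):
    """Imitate nested -> terms(name) -> stats(value)."""
    values_by_name = defaultdict(list)
    for doc in docs:
        for facet in doc["number_facets"]:
            values_by_name[facet["name"]].append(facet["value"])
    return {
        name: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "avg": sum(vals) / len(vals),
            "sum": sum(vals),
        }
        for name, vals in values_by_name.items()
    }


stats = number_facet_stats(products)
print(stats["width"])  # {'count': 2, 'min': 45.0, 'max': 120.0, 'avg': 82.5, 'sum': 165.0}
```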

With this approach, there is no need to know the list of available characteristics at query time. You can also change the data in the index at any time, deleting or changing characteristics as needed, without touching the mapping or the queries.

P.S. After structuring the documents this way and writing all the necessary queries, I ran into one more problem. When filtering, only the products with the selected filter value were left, so it was impossible to select several values of the same filter, which in my case hurt usability. Describing the solution requires a separate article.
