Ideal catalog, performance measurements

Hello everyone.

I am developing a library to work with Entity Attribute Value (repository), abbreviated as EAV (database model for storing arbitrary data). At the end of the last article, I asked you what else I need to write about, you asked to show an example of use and make performance measurements.

What is important to us when working with data? Write speed (adding or updating) and reading speed (specifically – filtering by models of one entity). And the search speed is a priority, because we write down once every ten days, but we read every minute / second and even more than once, and maybe more than one hundred times.

The trick of the library is that it would not work with submission built on basic EAV tables, and work with a small part of this data recorded in a separate materialized view or in a separate table

During the New Year holidays, I made performance measurements and I want to share the results with you.

What are we going to measure?

Reading:

  • Subtraction time for all items in the category

  • Time of formation of filtration parameters

  • Filtration time

Record:

  • Time of adding a new characteristic (attribute)

  • Time of adding a new heading

  • Time of updating a commodity item

Data set

The history of the development of the library began with the development of a market place, so consider that we are working with a catalog of commodity items.

Data set requirements

For testing the reading time from MATERIALIZED VIEW and from TABLE, it is not so important how many records were in the original tables of the EAV model, so several hundred categories will be enough to test the hypothesis.

There are no more than 100 characteristics for a category (and, accordingly, for a commodity item) (for an ordinary online store, this is even a lot, usually the characteristics of a product are no more than a couple of dozen).

Data set characteristics

Taking into account such input, using the script data-generator/generate.php, the dataset was created:

  1. number of entities (categories) – 320

  2. number of attributes (characteristics) – 144

  3. number of models (items in all categories) – 29,000

  4. numerical values ​​- 800,000

  5. string values ​​- 600,000

The total volume of EAV tables (with indexes) is 310 MB.

Accounting for data variability

The amount of data in which you need to search (filter) is of great importance for the search time.

We divide all entities into three types: many attributes, an average amount and a small amount, select one entity from each type:

  • Many – 238 models, 112 attributes, total values ​​26656

  • Average – 75 models, 34 attributes, 2550 values

  • Small – 4 models, 5 attributes, 20 values

Measured values

All measurements in milliseconds

Entity with many models and attributes

Many – 238 models, 112 attributes, total values ​​26656.

Reading

Access method

Get all records, no filtering

Get filters

Filter

VIEW

400

382.6

82.8

MATERIALIZED VIEW

5

402.8

40.9

TABLE

4.9

395.9

46.1

It is impossible to get all records from VIEW without filtering, because we must have at least one filter – by category – already. Why such a monstrously long time turned out, I cannot understand.

For the “Filter” dimension, which has many models and attributes by entity, a selection was made by 46 columns (out of 112).

When it comes to reading, filtering by MATERIALIZED VIEW is 10% faster than TABLE.

Recording

Access method

add attribute

add model

update model

VIEW

79.7

155.6

188.7

MATERIALIZED VIEW

1338

1 112

1,462

TABLE

3

196.5

217

Adding an attribute to a view is re-creating it.

Adding an attribute to the MATERIALIZED VIEW is a complete update.

Adding an attribute to a table is ALTER TABLE, so only 3ms.

Adding a model means creating a record in a table thing and creating empty values ​​in tables word and number… Accordingly, 155 ms for presentation, this is the insertion of 112 records into the table of values ​​(numeric – number, string – word).

For MATERIALIZED VIEW, we again need to recalculate it completely – update it entirely.

For the table, we add a record that will have 112 empty columns, this is only 40.9 ms, but the record is added based on the view, so 155.6 + 40.9 = 196.5.

Updating a record (updating all 112 columns) is done by updating the data in the base EAV tables and then:

  • This is enough for the presentation;

  • For MATERIALIZED VIEW, we need to recalculate everything completely again;

  • For the table, the update occurs independently of the base tables, just one row is updated, but this row is updated only after a successful update of the data in the base tables, so we see 217 ms, in fact, the actual update of the record is 30 ms.

conclusions

As we can see, the use of MATERIALIZED VIEW gives the maximum gain in read time (twice as fast as from the view and 10% faster than from the table), but at the same time updating the data takes an order of magnitude more time (loss is 10 times).

Using the table gives a gain in reading twice, and a loss in writing by 15%, and this loss is due to the fact that we are waiting for a successful update of the master data, if we do not wait, then the gain will be 6 times.

Of course, in order not to wait, you need to modify the library with a file, add asynchrony or the logic of creating and executing jobs there. Of course, write logic for eventual consistency

If we update the data in our “universal” catalog once an hour, then the loss of one second for updating is insignificant, and it is definitely worth it to get an additional 10% gain compared to using the table.

If we update data several times per hour, then I would use a table.

An entity with an average number of models and attributes

Average – 75 models, 34 attributes, 2550 values.

Reading

Access method

Get all records, no filtering

Get filters

Filter

VIEW

49.5

83

27.9

MATERIALIZED VIEW

1.7

69

12.3

TABLE

1.4

66

12.6

Recording

Access method

add attribute

add model

update model

VIEW

24.4

50

65

MATERIALIZED VIEW

207

212

242

TABLE

3

90

90

When the record is updated, we again see about 30 ms for the actual update of the record in the table, but compared to 65 ms for the update of the record in the base tables, this is already 50%.

Entity with a small number of models and attributes

Small – 4 models, 5 attributes, 20 values.

Reading

Access method

Get all records, no filtering

Get filters

Filter

VIEW

2.4

10

5.4

MATERIALIZED VIEW

one

thirteen

4

TABLE

0.8

thirteen

2.9

Recording

Access method

add attribute

add model

update model

VIEW

6.2

8.8

8.1

MATERIALIZED VIEW

29

nineteen

27

TABLE

4.4

4.4

10.7

General conclusion

With the decrease in the amount of data, the difference between the way to access this data disappears. For medium categories, there is no difference in reading, the difference in writing is only two times.

Nevertheless, the general trend persists – searching for a materialized view and a table is twice as fast as for a view created on the basis of EAV tables.

If the number of entities is increased by an order of magnitude, then the search time in the “table” will not change, and the search time in the view will increase (it seems to me), therefore, of course, when working with EAV, you should use “tables”.

The question of whether this will be a materialized view, or whether it will be a table itself – must be decided based on the balance between reading and writing.

What’s next ?

In the next article I will give an example of using the library.

I hope you were interested.

Similar Posts

Leave a Reply