Creating a cluster, connecting nodes, and changing table settings in a cluster

About the author

Mike

Hi, I'm Mike.

I recently started working at Manticore as a Developer Advocate. I'm no stranger to IT, but I'm only now actively catching up on modern technologies. In this blog, I'll share my experience and what I learn about Manticore. I plan to keep a diary where I'll explain what Manticore is and how to work with it. Let's figure out together how everything works, find problems, and talk to the developers.

If you're interested in exploring Manticore with me, I'll keep you updated at:

About replication

Why might replication be needed in the first place?

In the previous article we changed the full-text search settings by replacing the word form file, and while that was happening the table was unavailable. For our example store that didn't take critically long: a few seconds of server unavailability is not such a big price for convenience, right? For a small project, maybe. But our store is growing, there are more and more products and clients, and at some point such a change in server settings can limit the system's operation for hours. After all, the larger the database, the longer it takes to reindex it; with Manticore we are not talking about days or weeks, or even a couple of hours, but we would rather not allow even that kind of delay. And what if something happens to our only server? Or the number of clients becomes too large for it alone, and all the advantages of a high-speed search engine are lost? So now is the time to think about keeping several copies of the same database running simultaneously and in parallel, so that a write to one of them is automatically copied to the other linked servers, or nodes.

In Manticore, replication between nodes is implemented through the Galera library. Galera uses synchronous multi-master replication, which provides high availability and fault tolerance for database clusters. When a new record is added on one of the servers (nodes), the change is immediately broadcast to all connected nodes. The process includes three phases: writing to the local transaction log on the source node, replicating the changes to the other nodes, and confirming receipt of the data by all nodes before the transaction is actually applied. The transaction is applied only after confirmation has been received from all nodes in the cluster, which guarantees data consistency on every node. For the user these processes are invisible, and the new data is accessible from any node as soon as the transaction completes on that node.

Initial setup

At the moment, our container from the previous article works fine, but if it is stopped and deleted, the data will be lost irretrievably. The Guide to Launching Containers with Manticore strongly recommends mapping the data disk (folder) to a location outside of Docker. In addition, the configuration we made last time did not forward binary port 9312, which is needed for replication. Let's fix this by taking a snapshot of the data folder from the running container and launching a new one with the correct port and storage settings.
First, we need to ensure the integrity of the data: go into the container and “freeze” the table in Manticore, so that any data that may still be in RAM is reliably flushed to disk and nothing on disk changes while we copy it.

FREEZE products;

Now let's copy the directory with Manticore data from the container to the outside:

docker cp Manticore:/var/lib/manticore .

The command consists of three parts:

  • cp – copy,

  • Manticore:/var/lib/manticore – the name of the container and the path to the folder inside it,

  • . – local path to copy to, in this case it is the current directory.

To let the current container keep working as before, we “unfreeze” the table:

UNFREEZE products;

Now let's create a new container with the settings we need:

docker run -e EXTRA=1 --name manticore_new -v $(pwd)/manticore:/var/lib/manticore -p 10306:9306 -p 10312:9312 -d manticoresearch/manticore
docker exec -it manticore_new mysql

As a result of this manipulation we have a clone of our container with the ports forwarded and the database files stored on the server outside the container. Let's check that everything is in place in the new container:

New container view

Great, all the data has been transferred, including the configuration and the word form file, and now we will create a cluster based on this container.

By the way, a little about the word form file: the folder we copied contains a copy of it, and I recommend copying it out of there, because, as practice has shown, you may want to edit it later, and editing the file that sits right in the table's folder is a bad idea that can eventually lead to problems. I made a copy for myself outside the database folder: cp manticore/products/wf_pet_products.txt wf_pet_products.txt
And another piece of good news: I talked to my colleagues, and soon you won't have to drag the word form file around manually when using mysqldump – everything will be included in the dump automatically. There is an issue for this on GitHub.

Let's create our first cluster

Setting up our new cluster requires no complex operations: it is enough to create a cluster with a name using one command and then attach the necessary table to it. After that, it only remains to check how everything turned out.

  • To add a cluster, use the command CREATE CLUSTER

  • To add a table to a cluster, use the command ALTER CLUSTER ADD. It is important to emphasize here that clustering, replication, and the other joys of Manticore are available only for RT (real-time) tables!

So, let's create our first cluster and immediately add our table to it:

CREATE CLUSTER pet_shop;
ALTER CLUSTER pet_shop add products;

Now let's check what we got. For this we use the command show status, but it outputs a fantastic amount of information, so to avoid getting lost in it you can narrow it down with the LIKE operator:

show status like '%cluster%';
Show status result

At the moment we are interested in the following lines: cluster_name, cluster_pet_shop_status, cluster_pet_shop_indexes. They show the cluster name, its status (if it is Primary, everything is fine) and the tables currently included in the cluster. Separately, note the line cluster_pet_shop_incoming_addresses; in my setup it looks like this: 172.17.0.5:9312,172.17.0.5:9315:replication. From it we will need the address 172.17.0.5:9312. Port 9312 is forwarded to port 10312 on the server outside of Docker, but in this example we will launch the new node inside the same Docker network, 172.17.0.0, which makes the ports easier to deal with.
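If the screenshot is hard to read, the relevant rows in my setup look roughly like this (addresses and values will of course differ in yours):

cluster_name: pet_shop
cluster_pet_shop_status: primary
cluster_pet_shop_indexes: products
cluster_pet_shop_incoming_addresses: 172.17.0.5:9312,172.17.0.5:9315:replication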

Technically, I could have reused the original container from the article about word forms: simply add a cluster to it and connect the table, then build a new container with external storage attached and rest assured that replication would drag everything into the local copy. But then I would not have shown how to solve the problem another way, by saving a dump through copying the folder out of the container… =)
We confidently come up with problems for ourselves and heroically solve them! =)

Our first node is already set up, no further actions are required. Simple? In my opinion, very simple!

Adding another node to the cluster

First, we need to launch another container with Manticore. We won't be transferring anything to it, we'll just connect it to the existing cluster. Except that the local storage should be different (the connected folders should be different if you're doing this on the same server). It's important to mention ports again, since we've already used ports 9306, 10306, and 10312. Therefore, we'll assign other ports, for example, 11306 and 11312.

Let's create another container with an instance of Manticore, call it Manticore_new_1. We specify 11306 and 11312 as ports, for the volume we specify manticore_new_1 (the local folder must already exist), and do not forget to set the environment variable EXTRA=1.

New node

Or all the same in one command:

docker run -e EXTRA=1 --name manticore_new_1 -v $(pwd)/manticore_new_1:/var/lib/manticore -p 11306:9306 -p 11312:9312 -d manticoresearch/manticore

We connect through the mysql client. There is a nuance here: if you connect with a local mysql client rather than the one inside the container, use the external port you specified when creating the node – 11306; if you go through the Docker interface and enter via the container terminal (docker exec), use Manticore's default port – 9306. Either way, we are connected. Let's see whether there is a table – show tables. The result is empty, as expected, since we have just created an empty container with Manticore. Now let's connect it to the existing cluster – join cluster pet_shop at '172.17.0.5:9312';
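To recap, the whole sequence on the new node boils down to a few commands like these (the address is taken from cluster_pet_shop_incoming_addresses on the first node, so yours may differ):

show tables;
join cluster pet_shop at '172.17.0.5:9312';
show status like '%cluster%';
select count(*) from products;

Before the join, show tables returns nothing; after it, the table appears and the record count should match the first node.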

Connecting to the cluster and checking parameters

For clarity, I changed the console color for the second node.

As we can see, the table has been added, the record count matches the original node, and the stemmer configuration and the word form file are correct.
In principle, that's all: the cluster is assembled and working, and data is replicated between the nodes.

Important note. If a node joining a cluster has tables with the same names as tables in the cluster, those local tables will be overwritten with data from the cluster when it connects. The cluster takes priority over local tables, so if you are connecting an existing node that already has some content, make sure the names of its existing tables differ from those in the cluster. And, as usual, in any unclear situation make a backup.

Managing data in a cluster

When working with table data in a cluster there are some differences: the insert command now requires a small modification. In addition to the table name, we also need to specify which cluster it belongs to: insert into <cluster name>:<table name> (<fields>) values (<values>). Don't forget to adjust this command in your client.
Let's add another entry while still in the node we just created:

insert into pet_shop:products (name, info, price, avl) values ('Aquarium ship', 'Decorative ship model for aquarium', 6, 1);
Result from adding new record

Judging by the result, the record has been added, but what about the other node?

Result for second node

And here everything is in place!

Let's try to update the data:

mysql> update products set price = 8.0 where id = 3317338896206921730;
ERROR 1064 (42000): table products: table 'products' is a part of cluster 'pet_shop', use 'pet_shop:products'

For updating, changing, and especially deleting records, it is now also necessary to prefix the table name with the cluster name:

update pet_shop:products set price = 8 where id = 3317338896206921730;
Query OK, 1 row affected (0.01 sec)

We've sorted out the data: it is now transferred between nodes automatically and without any particular changes to the overall setup, except perhaps for a small change in how records are updated.

Changing replicated table settings

But what if we need to change the table's configuration or delete it altogether, for example to change the word form file? In the previous article we had to delete and recreate the table for that, leaving the user without server responses for some time. In that example updating the settings took very little time, but the table was also small. With large databases, tables of many millions of records, and so on, updating and reindexing can take a long time, in many cases measured in hours. To ensure uninterrupted operation of services built on Manticore there are distributed tables, but we will look at those in another article.
In the meantime, we have a database replicated to several nodes, with a table of products. You can change the configuration of this table by prefixing the table name with the cluster name, but you won't be able to delete it, even with that prefix. To change the settings of a replicated table, you first need to disconnect it from the cluster: ALTER CLUSTER <cluster name> DROP <table name>. The table will be deleted only from the cluster, not from the database. From the moment the table leaves the cluster, the application will no longer be able to update its data, since it refers to the cluster (for example, insert into pet_shop:products ...) and the table is no longer in it (it is a good idea to handle this situation in the application). But all deletion and reconfiguration operations become available on the table directly. Let's correct the table configuration: switch from the stemmer to the lemmatizer.

To do this we need to take the following steps:

  • Disconnecting a table from a cluster

  • Changing morphology in a table from stemmer to lemmatizer

  • Reloading data into the table

  • Returning the table to the cluster

  • Checking on the second node

Disconnecting a table from a cluster:

ALTER CLUSTER pet_shop DROP products;

Now the table is disconnected from the cluster on all nodes, and its schema and settings can be modified. The idea is that we do the technical work on one node while the second keeps answering users' select requests; adding new records, however, will no longer work, since the client writes using the <cluster>:<table> format and this table is no longer in the cluster.

update pet_shop:products set price = 9 where id = 3317338896206921730;
ERROR 1064 (42000): table products: table 'products' is not in any cluster, use just 'products'

After disconnecting the table from the cluster, let's try to execute a select query:
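Any ordinary read without the cluster prefix will do here; for example, something like this query for the aquarium ship we added earlier:

select * from products where match('aquarium');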

Result for other node

As we can see, the request is processed, all the data is returned, and the end user is happy.

Now we will change the morphology from stemmer to lemmatizer, reindex the records, and connect everything back. Last time we replaced the word form file and the stemmer with the help of a crowbar and some obscene word forms. Here we will use more civilized tools: replacing the word form file or changing the morphology used in the table can be done with a single command: ALTER TABLE <table name> morphology='<morph type>'. Let's replace our stemmer with a lemmatizer:

ALTER TABLE products morphology='lemmatize_en_all';

After changing any parameters related to text preprocessing in the database, it is necessary to reindex all existing records so that the new morphology and other tokenization settings are applied to them:

mysqldump -P9306 -h0 --replace -etc --skip-comments manticore products|mysql -P9306 -h0;

Here we use mysqldump, piping the dump output straight back into Manticore via mysql. The --replace option makes mysqldump generate REPLACE statements instead of INSERT, which lets us “re-upload” the entire table in one go. Keep in mind that for a large table or on a weak server this command can take a long time, but that does not worry us much, since we have a second node that is currently serving user requests. In addition, mysqldump does not lock the table.
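If the combined -etc looks cryptic, here is the same command with those short flags spelled out – they are standard mysqldump options, nothing Manticore-specific:

mysqldump -P9306 -h0 --replace --extended-insert --no-create-info --complete-insert --skip-comments manticore products | mysql -P9306 -h0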

Having performed this fantastically simple manipulation to reconfigure the table with products, we received a new version:

New table

The new settings and all data are applied, now let's add this table back to the cluster:

ALTER CLUSTER pet_shop add products;

That's it: the table is now updated on all servers, and the whole time we were hunting down missing files, reconfiguring, and checking that everything worked, the data remained available to users from the second node.

Other node

It is worth paying attention to restoring the entire cluster if all nodes have gone down: if the restoration sequence is wrong, there is a chance of losing all of its settings. The restoration procedures are described in detail in the documentation.
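To give a rough idea of what the documentation describes: after a full shutdown, the node with the most recent data (the one whose grastate.dat contains safe_to_bootstrap: 1) is started first as the new reference point, and the remaining nodes are then started normally and rejoin it. A minimal sketch, assuming a standard Linux installation:

searchd --new-cluster
# or, when searchd runs under systemd, the wrapper script:
manticore_new_cluster

Don't treat this as a complete recipe – read the restoration section of the documentation before trying it on real data.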

By the way, you can practice replication and node administration on the interactive courses at play.manticoresearch.com.

That's all for today! High availability, fewer collisions! This was Mike, good luck!
