PostgreSQL – some features of the database and encodings in it

A little more detail on encodings in PostgreSQL relational database.

Default databases

Let’s enter the command:

If we have just installed PostgreSQL, then there we will see the following List of databases:

Depending on the installed operating system in the columns encoding, collate and ctype you can see different encodings.

While we are interested in the names of the databases: template1 and template0

These databases are templates for creating new databases. In other words: in their image and likeness, other databases will be created later.

template0 Is a database template with initial settings. This database should be left as it is.

template1 – this template is used by default when creating new databases. Here we can change the attributes, which will then be borrowed for new databases created according to this template.

Encoding problem

Differences in encodings

In order to migrate data from a local database to a database on a server, or vice versa, the encoding must be the same. Otherwise, the error will be something like this:

Command was: INSERT INTO public.operations (id, user_id) VALUES (333, 1);
pg_restore: error: could not execute query: ERROR: character with byte sequence 0xd0 0x94 in encoding “UTF8” has no equivalent in encoding “LATIN1”

Different sorting in the same encodings

Even if the encodings are the same, it doesn’t matter can lead to excellent sorting results… The author at the link above claims that the encoding lc_collate = C solves these problems.

Changing LATIN1 encodings to UTF8

First, let’s take a look at a couple of errors that can occur in the process.

This error occurs if you skip the first command and immediately try to switch to the database template0:

postgres- # c template0
FATAL: database “template0” is not currently accepting connections
Previous connection kept

The following error has a complicated history, but we can avoid it when we force the values ​​in LC_CTYPE and LC_COLLATE

ERROR: encoding “UTF8” does not match locale “en_US”
DETAIL: The chosen LC_CTYPE setting requires encoding “LATIN1”.

Now let’s define a few basic concepts.

LC_COLLATE – sort order of strings
LC_CTYPE – character classification

Below are the command to change the locale for the database template1… Other values ​​can be used instead of ‘C’, for example: “en_US.UTF-8”.

First comes the command, if its execution is successful, then the value will be output the same as indicated here next. If nothing is displayed or other messages are displayed, then the commands were entered incorrectly or with errors.

update pg_database set datallowconn = TRUE where datname = ‘template0’;
UPDATE 1

c template0
You are now connected to database “template0”

update pg_database set datistemplate = FALSE where datname = ‘template1’;
UPDATE 1

drop database template1;
DROP DATABASE

Next, we make UTF8 encoding with C sorting.

CREATE DATABASE template1 WITH template = template0 ENCODING ‘UTF8’ LC_CTYPE ‘C’ LC_COLLATE ‘C’;
CREATE DATABASE

Or with en_US.UTF-8:

CREATE DATABASE template1 WITH template = template0 ENCODING ‘UTF8’ LC_CTYPE ‘en_US.UTF-8’ LC_COLLATE ‘en_US.UTF-8’;
CREATE DATABASE

We continue to execute the commands:

update pg_database set datistemplate = TRUE where datname = ‘template1’;
UPDATE 1

c template1
You are now connected to database “template1”

update pg_database set datallowconn = FALSE where datname = ‘template0’;
UPDATE 1

Let’s check the result:

Now when we create tables, they will immediately be in the desired encoding.

Alternative option

Although this option is simpler, it is technically correct to edit the reference table, and based on it, create the following tables. However, you can do without this and forcibly create a database with the required encodings every time:

CREATE DATABASE ploshadka_db_1 WITH owner= ploshadka ENCODING = ‘UTF-8’ lc_collate = ‘C’ lc_ctype = ‘C’ template template0;

The result will be the same as if we changed the locale of the template1 table and then outside the psql session entered the command:

Here you can see that the user is postgres, not ploshadka, but this is easily fixable if you create it with the user specified:

createdb ploshadka_db_3 –owner ploshadka

Alternative option to all options

The problem with encodings might not have arisen if you specified the required encoding in the system itself before installing PostgreSQL. This can affect a lot in the system, I have not researched everything, so I will not recommend it, but I will only give examples of commands:

List all available locales on the server:

Reflect the current locale:

Generate the desired locale:

sudo locale-gen en_US.UTF-8

Change locale (will only be changed during the current session):

export LC_ALL=“en_US.UTF-8”

For permanent changes, add here:

sudo nano /etc/default/locale

And fix:

Or automatically:

sudo localectl set-locale LANG= en_US.UTF-8

I should note that the locale from the file does not always work / etc / default / locale

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *