PostgreSQL – some features of the database and encodings in it
A little more detail on encodings in PostgreSQL relational database.
Default databases
Let’s enter the command:
If we have just installed PostgreSQL, then there we will see the following List of databases:
Depending on the installed operating system in the columns encoding, collate and ctype you can see different encodings.
While we are interested in the names of the databases: template1 and template0…
These databases are templates for creating new databases. In other words: in their image and likeness, other databases will be created later.
template0 Is a database template with initial settings. This database should be left as it is.
template1 – this template is used by default when creating new databases. Here we can change the attributes, which will then be borrowed for new databases created according to this template.
Encoding problem
Differences in encodings
In order to migrate data from a local database to a database on a server, or vice versa, the encoding must be the same. Otherwise, the error will be something like this:
Command was: INSERT INTO public.operations (id, user_id) VALUES (333, 1);
pg_restore: error: could not execute query: ERROR: character with byte sequence 0xd0 0x94 in encoding “UTF8” has no equivalent in encoding “LATIN1”
Different sorting in the same encodings
Even if the encodings are the same, it doesn’t matter can lead to excellent sorting results… The author at the link above claims that the encoding lc_collate = C solves these problems.
Changing LATIN1 encodings to UTF8
First, let’s take a look at a couple of errors that can occur in the process.
This error occurs if you skip the first command and immediately try to switch to the database template0:
postgres- # c template0
FATAL: database “template0” is not currently accepting connections
Previous connection kept
The following error has a complicated history, but we can avoid it when we force the values in LC_CTYPE and LC_COLLATE…
ERROR: encoding “UTF8” does not match locale “en_US”
DETAIL: The chosen LC_CTYPE setting requires encoding “LATIN1”.
Now let’s define a few basic concepts.
LC_COLLATE – sort order of strings
LC_CTYPE – character classification
Below are the command to change the locale for the database template1… Other values can be used instead of ‘C’, for example: “en_US.UTF-8”.
First comes the command, if its execution is successful, then the value will be output the same as indicated here next. If nothing is displayed or other messages are displayed, then the commands were entered incorrectly or with errors.
UPDATE 1
c template0
You are now connected to database “template0”…
update pg_database set datistemplate = FALSE where datname = ‘template1’;
UPDATE 1
drop database template1;
DROP DATABASE
Next, we make UTF8 encoding with C sorting.
CREATE DATABASE template1 WITH template = template0 ENCODING ‘UTF8’ LC_CTYPE ‘C’ LC_COLLATE ‘C’;
CREATE DATABASE
Or with en_US.UTF-8:
CREATE DATABASE template1 WITH template = template0 ENCODING ‘UTF8’ LC_CTYPE ‘en_US.UTF-8’ LC_COLLATE ‘en_US.UTF-8’;
CREATE DATABASE
We continue to execute the commands:
UPDATE 1
c template1
You are now connected to database “template1”…
update pg_database set datallowconn = FALSE where datname = ‘template0’;
UPDATE 1
Let’s check the result:
Now when we create tables, they will immediately be in the desired encoding.
Alternative option
Although this option is simpler, it is technically correct to edit the reference table, and based on it, create the following tables. However, you can do without this and forcibly create a database with the required encodings every time:
CREATE DATABASE ploshadka_db_1 WITH owner= ploshadka ENCODING = ‘UTF-8’ lc_collate = ‘C’ lc_ctype = ‘C’ template template0;
The result will be the same as if we changed the locale of the template1 table and then outside the psql session entered the command:
Here you can see that the user is postgres, not ploshadka, but this is easily fixable if you create it with the user specified:
createdb ploshadka_db_3 –owner ploshadka
Alternative option to all options
The problem with encodings might not have arisen if you specified the required encoding in the system itself before installing PostgreSQL. This can affect a lot in the system, I have not researched everything, so I will not recommend it, but I will only give examples of commands:
List all available locales on the server:
Reflect the current locale:
Generate the desired locale:
sudo locale-gen en_US.UTF-8
Change locale (will only be changed during the current session):
export LC_ALL=“en_US.UTF-8”
For permanent changes, add here:
sudo nano /etc/default/locale
And fix:
Or automatically:
sudo localectl set-locale LANG= en_US.UTF-8
I should note that the locale from the file does not always work / etc / default / locale…