Suspicious Types

Nothing about them looks suspicious. On the contrary, they seem familiar, even old friends. But only until you put them to the test. That is when they reveal their insidious nature, behaving quite differently from what you expected. And sometimes they pull stunts that make your hair stand on end, such as losing the secret data entrusted to them. When confronted, they claim not to know one another, although they work hand in hand in the shadows. It is high time to expose them. Let us deal with these suspicious types.

For all its logic, data typing in PostgreSQL sometimes springs very strange surprises. In this article we will try to clear up some of these quirks, understand where the strange behavior comes from, and learn how to avoid problems in everyday practice. Truth be told, I wrote this article partly as a reference for myself, one that can easily be consulted in controversial cases. So it will be expanded as new surprises from suspicious types come to light. Off we go, tireless trackers of databases!

Dossier number one. real / double precision / numeric / money

It would seem that numeric types should be the least likely to surprise us. But no such luck. So we will begin with them. Here goes…

Forgot to count

SELECT 0.1::real = 0.1

false

What is going on? PostgreSQL converts the untyped constant 0.1 to double precision and tries to compare it with 0.1 of type real. And these are completely different values! The root cause is how real numbers are represented in machine memory. Since 0.1 cannot be represented as a finite binary fraction (in binary it is 0.0(0011), with the 0011 group repeating), numbers stored with different precision end up different, hence the result that they are not equal. Generally speaking, this is a topic for a separate article, so I will not go into more detail here.
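A minimal sketch of the workaround: force both sides of the comparison to the same type so no implicit widening happens.

```sql
-- Both operands are real: the comparison is between identical 4-byte values
SELECT 0.1::real = 0.1::real;     -- true

-- numeric stores decimal fractions exactly, so the untyped constant matches
SELECT 0.1::numeric = 0.1;        -- true
```

The general rule: never compare floating-point values of different precision and expect exact equality.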

Where does the error come from?

SELECT double precision(1)

ERROR:  syntax error at or near "("
LINE 1: SELECT double precision(1)
SQLSTATE: 42601
Character: 24

Many people know that PostgreSQL allows function-style type casts. That is, you can write not only 1::int but also int(1), which is the same thing. But not for types whose names consist of several words! Therefore, if you want to cast a numeric value to double precision in functional form, use this type's alias float8, i.e. SELECT float8(1).
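A quick sketch of the cast spellings that do work:

```sql
SELECT int(1);                       -- fine: one-word type name
SELECT float8(1);                    -- fine: float8 is an alias for double precision
SELECT CAST(1 AS double precision);  -- the SQL-standard form always works
SELECT 1::double precision;          -- as does the PostgreSQL-style cast
```

Multi-word names only break the function-style syntax; the `CAST` and `::` forms accept them without complaint.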

What is greater than infinity?

SELECT 'Infinity'::double precision < 'NaN'::double precision

true

There you have it! It turns out there is something greater than infinity, and it is NaN! Meanwhile, the PostgreSQL documentation looks at us with honest eyes and states that NaN is considered greater than any other value, and therefore greater than infinity. The opposite holds for -NaN. Hello, lovers of mathematical analysis! But we must remember that all of this applies in the context of the floating-point types.
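The ordering can be seen directly by sorting a few values; this sketch just illustrates where NaN lands.

```sql
-- NaN sorts above everything, including Infinity
SELECT x
  FROM unnest(ARRAY['-Infinity', '1', 'Infinity', 'NaN']::float8[]) AS x
 ORDER BY x;
-- -Infinity, 1, Infinity, NaN
```

This convention exists so that sorting and indexing over float columns containing NaN remain well-defined.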

Rounding eyes

SELECT round('2.5'::double precision)
     , round('2.5'::numeric)

      round      |  round
double precision | numeric
2                | 3

Another unexpected hello from the database. Once again, remember that double precision and numeric round differently. numeric rounds halves away from zero, so 0.5 goes up; double precision rounds halves to the nearest even integer (banker's rounding).
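A small sketch that makes the two rounding modes visible side by side:

```sql
SELECT x,
       round(x::numeric)          AS numeric_round,   -- halves away from zero
       round(x::double precision) AS float_round      -- halves to nearest even
  FROM unnest(ARRAY[0.5, 1.5, 2.5, 3.5]::float8[]) AS x;
-- numeric_round: 1, 2, 3, 4
-- float_round:   0, 2, 2, 4
```

If your application needs the "schoolbook" rule, cast to numeric before rounding.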

Money is something special

SELECT '10'::money::float8

ERROR:  cannot cast type money to double precision
LINE 1: SELECT '10'::money::float8
SQLSTATE: 42846
Character: 19

According to PostgreSQL, money is not a real number. According to some individuals, too. We need to remember that money can only be cast to numeric, and likewise only numeric can be cast to money. But numeric you can then juggle to your heart's content. Only it will no longer be that same money.
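A sketch of the round trip through numeric (the exact money output formatting depends on the lc_monetary setting):

```sql
-- money -> float8 must go through numeric
SELECT '10'::money::numeric::float8;   -- 10

-- and back: only numeric can become money
SELECT 10.5::numeric::money;           -- e.g. $10.50, depending on lc_monetary
```

Keeping numeric as the intermediate type also avoids the binary-fraction pitfalls described in the first dossier.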

Smallint and sequence generation

SELECT *
  FROM generate_series(1::smallint, 5::smallint, 1::smallint)

ERROR:  function generate_series(smallint, smallint, smallint) is not unique
LINE 2:   FROM generate_series(1::smallint, 5::smallint, 1::smallint...
HINT:  Could not choose a best candidate function. You might need to add explicit type casts.
SQLSTATE: 42725
Character: 18

PostgreSQL does not like to deal in small change. What is this business with smallint-based sequences? int at the very least! So, when trying to execute the query above, the database attempts to cast smallint to some other integer type, and sees that there are several such casts to choose from. Which one should it pick? It cannot decide, and so it fails with an error.
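The fix is to hand generate_series a type for which an exact overload exists; a minimal sketch:

```sql
-- Exact match: generate_series(int, int, int) exists
SELECT * FROM generate_series(1::int, 5::int, 1::int);

-- Or simply let the untyped constants default to int
SELECT * FROM generate_series(1, 5, 1);
-- 1, 2, 3, 4, 5
```

The overloads provided cover int, bigint, numeric, and the timestamp types, but not smallint.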

Dossier number two. "Char" / char / varchar / text

A number of oddities are present in character types. Let's get to know them too.

What are these tricks?

SELECT 'ПЕТЯ'::"char"
     , 'ПЕТЯ'::"char"::bytea
     , 'ПЕТЯ'::char
     , 'ПЕТЯ'::char::bytea

 char  | bytea |    bpchar    | bytea
"char" | bytea | character(1) | bytea
 ╨     | \xd0  | П            | \xd09f

What kind of char is this, what kind of joker? We have no need for those... Because it masquerades as an ordinary char, albeit in quotation marks. And it differs from the regular char, the one without quotes, in that it outputs only the first byte of the string representation, whereas the regular char outputs the first character. In our case, the first character is the letter П, which takes 2 bytes in UTF-8, as the cast of the result to type bytea shows. And the type "char" keeps only the first byte of that representation. So why does this type exist? The PostgreSQL documentation says it is a special type used for special internal needs. We are unlikely to need it. But look it in the eye and make no mistake when you meet it with its special behavior.
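A small sketch of "char" in action; its real home is the system catalogs, for example the relkind column of pg_class:

```sql
-- "char" keeps only the first byte of its input
SELECT 'a'::"char", 'abc'::"char";   -- both yield 'a'

-- pg_class.relkind is of type "char": 'r' for an ordinary table, 'i' for an index, etc.
SELECT relname, relkind FROM pg_class WHERE relname = 'pg_class';
```

For single-byte flags this is a compact one-byte type; for anything user-facing, stick to char/varchar/text.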

Extra spaces. Out of sight, out of mind

SELECT 'abc   '::char(6)::bytea
     , 'abc   '::char(6)::varchar(6)::bytea
     , 'abc   '::varchar(6)::bytea

     bytea     |   bytea  |     bytea
     bytea     |   bytea  |     bytea
\x616263202020 | \x616263 | \x616263202020

Take a look at the example above. I deliberately cast all the results to type bytea so that you can see exactly what is stored there. Where did the trailing spaces go after the cast to varchar(6)? The documentation states concisely: "When character is cast to a different character type, padding spaces are discarded." This antipathy is worth remembering. And note that if a quoted string constant is cast directly to varchar(6), the trailing spaces are preserved. Such are the miracles.
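Trailing spaces also behave differently in comparisons; a short sketch:

```sql
-- char(n) comparison ignores trailing padding spaces
SELECT 'abc   '::char(6) = 'abc'::char(6);        -- true

-- varchar/text comparison does not
SELECT 'abc   '::varchar(6) = 'abc'::varchar(6);  -- false
```

So two char(n) values can compare equal yet produce different bytea representations after casting, which is exactly the trap shown above.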

Dossier number three. json / jsonb

JSON is a separate standard that lives a life of its own. Its entities and PostgreSQL's entities are therefore slightly different things. Here are some examples.

Johnson and Johnson. Feel the difference

SELECT 'null'::jsonb IS NULL

false

The thing is that JSON has its own null entity, which is not the equivalent of NULL in PostgreSQL. At the same time, the JSON value itself may well be NULL, so the expression SELECT null::jsonb IS NULL (note the absence of single quotes) will this time return true.
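A sketch of how the two nulls can be told apart, and where one turns into the other:

```sql
-- A JSON null is a real JSON value, not SQL NULL
SELECT jsonb_typeof('null'::jsonb);            -- 'null'

-- The -> operator returns the JSON null, so IS NULL is false
SELECT ('{"a": null}'::jsonb)->'a' IS NULL;    -- false

-- The ->> operator extracts as text: JSON null becomes SQL NULL
SELECT ('{"a": null}'::jsonb)->>'a' IS NULL;   -- true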

One letter changes everything

SELECT '{"1": [1, 2, 3], "2": [4, 5, 6], "1": [7, 8, 9]}'::json

{"1": [1, 2, 3], "2": [4, 5, 6], "1": [7, 8, 9]}


SELECT '{"1": [1, 2, 3], "2": [4, 5, 6], "1": [7, 8, 9]}'::jsonb

{"1": [7, 8, 9], "2": [4, 5, 6]}

The thing is that json and jsonb are completely different structures. json stores the document as-is, while jsonb stores it as an already parsed, indexed structure. That is why in the second case the value for key 1 was replaced: [1, 2, 3] was overwritten by the [7, 8, 9] that arrived at the very end of the structure under the same key.

Looks aren't everything

SELECT '{"reading": 1.230e-5}'::jsonb
     , '{"reading": 1.230e-5}'::json

          jsonb         |         json
          jsonb         |         json
{"reading": 0.00001230} | {"reading": 1.230e-5}

In its jsonb implementation, PostgreSQL reformats real numbers, bringing them to the classic decimal form. For the json type this does not happen. A little strange, but it is within its rights.

Dossier number four. date / time / timestamp

There are oddities with the date/time types too. Let's look at them. I should say right away that some of these behaviors become clear once you have a good grasp of how time zones work. But that, too, is a topic for a separate article.

Me no understand you

SELECT '08-Jan-99'::date

ERROR:  date/time field value out of range: "08-Jan-99"
LINE 1: SELECT '08-Jan-99'::date
HINT:  Perhaps you need a different "datestyle" setting.
SQLSTATE: 22008
Character: 8

What could possibly be unclear here? Yet the database cannot tell which part we put first, the year or the day, and decides it is January 99th, 2008, which blows its mind. Generally speaking, when passing dates in text form you should carefully check that the database recognizes them correctly (in particular, inspect the datestyle parameter with the SHOW datestyle command), because ambiguities in this matter can be very costly.
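Two ways out of the ambiguity, sketched below: use unambiguous ISO 8601 input, or tell the parser which order to expect.

```sql
-- ISO 8601 input is parsed the same way regardless of DateStyle
SELECT '1999-01-08'::date;     -- 1999-01-08

-- Or declare the field order explicitly for ambiguous input
SET datestyle = 'ISO, DMY';
SELECT '08-Jan-99'::date;      -- 1999-01-08
```

In application code, ISO format is the safer habit: it does not depend on any session setting.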

Where did you come from?

SELECT '04:05 Europe/Moscow'::time

ERROR:  invalid input syntax for type time: "04:05 Europe/Moscow"
LINE 1: SELECT '04:05 Europe/Moscow'::time
SQLSTATE: 22007
Character: 8

Why can't the database understand an explicitly specified time? Because the time zone is given not as an abbreviation but as a full name, which is meaningful only in the context of a date, since it accounts for the history of time zone changes, and without a date it simply does not work. The very wording of the time string raises questions too: what did the programmer really mean? So it is all quite logical, if you look closely.
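A sketch of the forms that do parse: a fixed numeric offset is self-sufficient, while a full zone name needs a date to resolve.

```sql
-- A numeric UTC offset carries no history, so time with time zone accepts it
SELECT '04:05+03'::timetz;                             -- 04:05:00+03

-- A full zone name needs a date, so use timestamptz
SELECT '2019-11-26 04:05 Europe/Moscow'::timestamptz;
```

The date lets PostgreSQL consult the zone's rule history (DST switches, offset changes) to compute the actual offset.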

What is wrong with him?

Imagine the situation: you have a column of type timestamptz in a table and you want to index it. But you realize that building an index directly on this column is not always justified because of its high cardinality (almost all values of this type will be unique). So you decide to lower the selectivity of the index by casting the type to a date. And you get a surprise:

CREATE INDEX "iIdent-DateLastUpdate"
  ON public."Ident" USING btree
  (("DateLastUpdate"::date));

ERROR:  functions in index expression must be marked IMMUTABLE
********** Ошибка **********
ERROR: functions in index expression must be marked IMMUTABLE
SQL-состояние: 42P17

What is going on? To convert the timestamptz type to date, the value of the TimeZone system parameter is used, which makes the cast function dependent on a configurable setting, i.e. not immutable (it is merely stable). Such functions are not allowed in index expressions. In this case you must state explicitly in which time zone the cast is to be performed.
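A sketch of the fix, using the hypothetical table and column names from the example above: pin the time zone with AT TIME ZONE so the expression no longer depends on the session setting.

```sql
-- timestamptz AT TIME ZONE <constant> yields a plain timestamp,
-- and timestamp::date is immutable, so the index builds fine
CREATE INDEX "iIdent-DateLastUpdate"
  ON public."Ident" USING btree
  ((("DateLastUpdate" AT TIME ZONE 'Europe/Moscow')::date));
```

Remember to use the same AT TIME ZONE expression in your WHERE clauses, otherwise the planner will not match the query against this index.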

When now is not even now

We are used to now() returning the current date/time, taking the time zone into account. But look at the following sequence of queries:

SELECT now();

  timestamp with time zone
2019-11-26 13:13:04.271419+03


SELECT now();

  timestamp with time zone
2019-11-26 13:13:04.271419+03


SELECT now();

  timestamp with time zone
2019-11-26 13:13:04.271419+03


The same date/time is returned no matter how much time has passed since the previous query! What is going on? now() is not the current time but the start time of the current transaction, so it does not change within a transaction. Any query run outside an explicit transaction is implicitly wrapped in one, which is why we do not notice that the time returned by a simple SELECT now() is in fact not current... If you want an honest current time, use the clock_timestamp() function.
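A sketch that makes the difference visible inside one transaction:

```sql
BEGIN;
SELECT now(), clock_timestamp();   -- already slightly different
SELECT pg_sleep(2);
SELECT now(), clock_timestamp();   -- now() is frozen, clock_timestamp() has moved on by ~2 s
COMMIT;
```

There is also statement_timestamp(), fixed at the start of each statement rather than of the transaction, if you need something in between.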

Dossier number five. bit

A little bit strange

SELECT '111'::bit(4)

1110

Which side should the padding go on when the type is widened? On the left, it would seem. But the database has a different opinion: it pads on the right. Be careful: if the number of bits does not match during a cast, you will not get what you wanted. This applies both to padding bits on the right and to trimming them. Also on the right...
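A sketch of all three cases, including the way to get left-padding when you actually want a binary number:

```sql
SELECT '111'::bit(4);     -- 1110: padded on the right
SELECT '11101'::bit(4);   -- 1110: truncated on the right

-- Casting via an integer keeps the numeric value, padding on the left
SELECT 7::bit(4);         -- 0111
```

So string-to-bit casts treat the value as a left-aligned bit pattern, while integer-to-bit casts treat it as a number.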

Dossier number six. Arrays

Not even NULL

SELECT ARRAY[1, 2] || NULL

{1,2}

Like normal people raised on SQL, we expect the result of this expression to be NULL. But no. An array is returned. Why? Because in this case the database casts NULL to an integer array and implicitly calls the array_cat function. Still, it remains unclear why this "array cat" does not nullify the result. This behavior, too, simply has to be remembered.
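A sketch that spells out what the operator resolves to, and how a single cast changes the outcome:

```sql
-- Untyped NULL becomes int[], and array_cat keeps its non-null argument
SELECT ARRAY[1, 2] || NULL;            -- {1,2}
SELECT array_cat(ARRAY[1, 2], NULL);   -- {1,2}: the same call, spelled explicitly

-- Typed as an element, NULL is appended like any other value
SELECT ARRAY[1, 2] || NULL::int;       -- {1,2,NULL}
```

The moral: with || and arrays, the declared type of the NULL decides which overload fires.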

Let's sum up. There are plenty of oddities. Most of them, of course, are not critical enough to call the behavior blatantly inappropriate, and others are explained by convenience or by how often they apply in particular situations. But there are still many surprises, so you need to know about them. If you find anything else strange or unusual in the behavior of any type, write in the comments: I will gladly add to the dossiers on file.
