Meet the Next-Generation UUID for Keys in High-Load Systems

On March 31, 2022, the IETF website officially published the text of the working document (copy 1, copy 2) New UUID Formats (hereinafter referred to as the standard), which is meant to formally update, but in practice replaces, the long-outdated and flawed-from-the-start RFC 4122.

The standard introduces new formats for universally unique identifiers (UUID) having the following properties:

  • suitable for use in high-load applications and databases, both monolithic and distributed,

  • increasing with generation time (leap seconds are not counted),

  • containing a timestamp, a counter whose segments are initialized to zero and to a pseudo-random value, and a separate pseudo-random value,

  • able to be combined with metadata.

The standard recommends that DBMS vendors ensure that UUIDs are created and stored in the new formats for use as identifiers or as the leftmost parts of identifiers, such as, but not limited to:

A long and heated debate produced a standard of essentially impeccable quality. Although some vague wording and earlier unsuccessful technical decisions survive in the text, original and elegant solutions were found for all the problems. The creative contribution of a participant from Japan known by the pseudonym LiosK deserves special mention, as do the sound decisions of the standard's initiators, Brad Peabody and Kyzer Davis from the USA. The standard provides the maximum possible speed when searching for records by the UUID value they contain. It contains many correct, well-justified recommendations. Its only significant flaw is the wasteful use of 6 of the 128 UUID bits (the ver and var segments) solely for compatibility with the obsolete RFC 4122. The standard is superior to ULID, KSUID, CUID and other analogues, all of which were studied and are listed in the standard.
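To make the six compatibility bits concrete, here is a minimal sketch that reads the ver and var segments out of a 128-bit UUID. It uses only Python's standard uuid module; the helper name ver_and_var is made up for this illustration and does not come from the standard.

```python
import uuid


def ver_and_var(u: uuid.UUID) -> tuple:
    """Return the 4-bit ver and 2-bit var segments of a 128-bit UUID."""
    value = u.int
    ver = (value >> 76) & 0xF    # 4 bits that follow the first 48 bits
    var = (value >> 62) & 0b11   # 2 bits that follow the first 64 bits
    return ver, var


print(ver_and_var(uuid.uuid4()))  # (4, 2) for any RFC 4122 version 4 UUID
```

These 4 + 2 bits carry the same value in every UUID of a given version, which is why they add nothing to uniqueness or ordering.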

The standard has not yet been approved, but DBMS vendors can already start implementing it. It is hard to imagine that another, better and significantly different version of the standard will appear. The C prototypes attached to the standard are highly simplified and therefore are not a good basis for development. Of the three proposed formats, UUIDv7 has the greatest practical value.

Due to time constraints, the current version of the standard does not include alternative text encodings for UUIDs. The initiators of the standard want to include them in the next version, and they have already approved the Crockford's Base32 encoding.
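Since the standard does not yet define such an encoding, the following is only a sketch of what a Crockford's Base32 text form of a UUID could look like. The 26-character, most-significant-bits-first layout is an assumption borrowed from the ULID convention, and the function names encode_crockford and decode_crockford are made up for this illustration. The decoder is simplified: it does not accept the alias characters (I, L, O) that Crockford's scheme allows on input.

```python
import uuid

# Crockford's Base32 alphabet: digits and letters without I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_crockford(u: uuid.UUID) -> str:
    """Encode 128 bits as 26 characters, most significant bits first."""
    value = u.int
    chars = []
    for _ in range(26):                       # 26 * 5 = 130 bits >= 128 bits
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def decode_crockford(text: str) -> uuid.UUID:
    """Inverse of encode_crockford (canonical characters only)."""
    value = 0
    for ch in text.upper():
        value = (value << 5) | CROCKFORD.index(ch)
    return uuid.UUID(int=value)


u = uuid.uuid4()
print(decode_crockford(encode_crockford(u)) == u)  # True
```

Because the output has a fixed length and the alphabet is in ascending ASCII order, sorting the encoded strings gives the same order as sorting the underlying 128-bit values.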

While the standard gives the developers of UUID generators a great deal of freedom within the outlined limits, the reference UUID structure that was discussed during the development of the standard is as follows:

| Designation in the standard | Segment position in UUID, from left to right | Length, bits | Binary value or calculation algorithm | Purpose |
|---|---|---|---|---|
| unix_ts_ms | 1 | 48 | Number of milliseconds since midnight (00:00:00) on January 1, 1970, Coordinated Universal Time (UTC), not counting leap seconds | Ensures monotonicity of the UUIDs being written. A timestamp with millisecond precision, lagging behind UTC by tens of seconds. A millisecond is the finest precision at which UUIDs coming from different sources can be ordered by generation time |
| ver | 2 | 4 | "0111" | UUIDv7 version. This segment exists only for compatibility with RFC 4122 |
| rand_a | 3 | 1 | Counter segment initialized to zero every millisecond | Protects against counter overflow in the unlikely event that the counter is initialized with a large pseudo-random value |
| rand_a | 4 | 11 | Counter segment initialized to a pseudo-random value every millisecond | The counter ensures monotonicity of UUIDs generated by a single source within one millisecond. Initializing the counter with a pseudo-random value reduces the chance of UUID collisions |
| var | 5 | 2 | "10" | The variant described in the standard and in RFC 4122, as opposed to the other variants mentioned in RFC 4122. This segment exists only for compatibility with RFC 4122 |
| rand_b | 6 | 12 | Counter segment initialized to a pseudo-random value every millisecond | The counter should be long enough to protect against overflow, but not too long, so that binary search on the high-order bits of the UUID stays fast. Within a millisecond, the entire counter is incremented by one for each next UUID |
| rand_b | 7 | 50 | Pseudo-random value generated separately for each UUID | Unlike the counter segment, which is initialized with a pseudo-random value once per millisecond, this segment makes it difficult to guess neighboring UUIDs with the same timestamp |
| no designation | To the right of the UUID, in an identifier used as a unique or surrogate key | any | Custom segment, which can be compound | See the list of possible custom segment elements after the table |
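The reference structure in the table above can be sketched as a small generator. This is an illustration under stated assumptions, not the standard's prototype: the class name UUIDv7Generator, the helper timestamp_ms, and the choice to move to the next millisecond when the counter overflows are all made up for this example. The 24-bit counter is the one-bit zero segment plus the 11-bit and 12-bit counter segments from the table.

```python
import secrets
import threading
import time
import uuid


class UUIDv7Generator:
    """Sketch of a generator for the reference layout in the table above."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_ms = -1   # timestamp used for the previous UUID
        self._counter = 0    # 24-bit counter: 1 guard bit + 11 bits + 12 bits

    def generate(self) -> uuid.UUID:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000  # Unix time, no leap seconds
            if now_ms > self._last_ms:
                # New millisecond: guard bit is zero, the other 23 bits are random.
                self._last_ms = now_ms
                self._counter = secrets.randbits(23)
            else:
                # Same millisecond (or the clock stepped back): count up by one.
                self._counter += 1
                if self._counter >> 24:
                    # Assumption: on overflow, move to the next millisecond early.
                    self._last_ms += 1
                    self._counter = secrets.randbits(23)
            ts, counter = self._last_ms, self._counter

        value = (ts & 0xFFFFFFFFFFFF) << 80   # unix_ts_ms, 48 bits
        value |= 0b0111 << 76                 # ver, 4 bits
        value |= (counter >> 12) << 64        # rand_a: guard bit + 11 counter bits
        value |= 0b10 << 62                   # var, 2 bits
        value |= (counter & 0xFFF) << 50      # rand_b: low 12 counter bits
        value |= secrets.randbits(50)         # rand_b: per-UUID random value
        return uuid.UUID(int=value)


def timestamp_ms(u: uuid.UUID) -> int:
    """Read the 48-bit unix_ts_ms segment back out of a UUID."""
    return u.int >> 80


gen = UUIDv7Generator()
ids = [gen.generate() for _ in range(1000)]
print(ids == sorted(ids))  # True: each UUID is greater than the previous one
```

The lock keeps the timestamp and counter consistent when several threads share one generator; a per-thread or per-process generator would avoid the lock at the cost of losing ordering between those sources within a millisecond.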

Possible custom segment elements:

  • additional pseudo-random value

  • entity type or database table code

  • namespace

  • shard (segment) or partition (section)

  • data source code

  • operation type or message type code

  • checksum

  • other application-specific elements
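As a small illustration of a custom segment placed to the right of the UUID, the sketch below appends two of the elements from the list to form a composite key. The field choice and widths (a 2-byte entity type code and a 2-byte shard number) and the function names make_key and split_key are hypothetical; the standard does not prescribe them.

```python
import struct
import uuid


def make_key(u: uuid.UUID, entity_type: int, shard: int) -> bytes:
    """Build a 20-byte key: 16 UUID bytes, then entity type and shard."""
    return u.bytes + struct.pack(">HH", entity_type, shard)


def split_key(key: bytes):
    """Split a key produced by make_key back into its parts."""
    u = uuid.UUID(bytes=key[:16])
    entity_type, shard = struct.unpack(">HH", key[16:])
    return u, entity_type, shard


key = make_key(uuid.uuid4(), entity_type=7, shard=42)
print(len(key), split_key(key)[1:])  # 20 (7, 42)
```

Because the UUID occupies the leftmost bytes, keys compared byte by byte are ordered by the UUID first, so the time ordering of the UUID carries over to the composite key.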
