A brief look at the TLV, BER, CER, DER and PER formats

I would like to talk about data formats that are common in IT: in public key infrastructure (PKI), in smart cards (including next-generation identity documents based on smart cards) and in mobile communications. Although the formats discussed here are related to ASN.1, some of them have spread far beyond it. Many people know about some of these formats, but not everyone knows them well enough to, say, tell BER from DER, and some PER variants are rather exotic.

I will not dive deep into the topic. I will just introduce the main features so that you understand what these formats are and how they are used. The full details are in the relevant standards: ITU-T X.690, ITU-T X.691 and ISO 7816.

One of my motivations is to sort the topic out in my own head.

ASN.1 (Abstract Syntax Notation One) is used when the format of a data structure needs to be specified. The notation itself is described in the ITU-T X.680-X.683 standards. Perhaps the most common use case is the X.509 certificate format and everything related to it. A textual definition might look something like this:

Certificate  ::=  SEQUENCE  {
        tbsCertificate       TBSCertificate,
        signatureAlgorithm   AlgorithmIdentifier,
        signatureValue       BIT STRING  }

AlgorithmIdentifier  ::=  SEQUENCE  {
        algorithm               OBJECT IDENTIFIER,
        parameters              ANY DEFINED BY algorithm OPTIONAL  }

But this is only a textual description. To turn it into binary form, for example to save specific data to a file in the correct format or to send it over a communication channel, encoding rules have to be attached to it. This is where the encoding rules come onto the scene: BER, CER, DER, PER, XER, OER, JER. I will not touch the last three for now, but we will look at the rest.

To complete the picture, let's add the TLV format from ISO 7816-4 to this lineup. So, I will briefly cover the following formats:

Designation   Name                            Standard
TLV           Tag, Length, Value              ISO 7816-4
BER           Basic Encoding Rules            ITU-T X.690-2021
CER           Canonical Encoding Rules        ITU-T X.690-2021
DER           Distinguished Encoding Rules    ITU-T X.690-2021
PER           Packed Encoding Rules           ITU-T X.691-2021

TLV – tag, length, value

This is perhaps the simplest of the formats covered here. ISO 7816-4 is the base standard for smart cards and their file systems, and among other things it describes SIMPLE-TLV. Strictly speaking, "TLV" is an informal name for a whole family of formats.

In SIMPLE-TLV, each data object (DO) consists of three fields: a mandatory tag field (T), a mandatory length field (L) and an optional value field (V). Textual notation: {TLV}.

The tag field consists of one byte with a value from 01 to FE (1 to 254); the values 00 and FF are not allowed.

The length field consists of one or three bytes. If the first byte is not FF, it is the length itself (0 to 254) and the field is one byte long. If the first byte is FF, the next two bytes encode the length, in the range 0 to 65535.

If the length L is not zero, L bytes of data follow. These bytes can be plain data (a primitive DO) or other data objects (a constructed DO). In the latter case, the structure looks like this: {TL-{T1-L1-V1}-{T2-L2-V2}-…-{Tn-Ln-Vn}}.

Example: the bytes “82 02 D4 AF” denote a primitive object with T=82, L=2 and V=D4AF. The bytes “D1 0A A4 FF 00 02 BD 27 82 02 D4 AF” denote a constructed object {D1-0A-{A4-FF0002-BD27}-{82-02-D4AF}}; note that the inner object A4 uses the three-byte length form FF 00 02.
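To make these rules concrete, here is a minimal Python sketch of a SIMPLE-TLV parser. The function name parse_simple_tlv is my own, error handling is minimal, and the two bytes after FF are read as a big-endian number, which is what the example above implies (FF 00 02 means a length of 2). Since a SIMPLE-TLV tag carries no primitive/constructed flag, the sketch parses one level only and leaves it to the caller to decide which values contain nested objects.

```python
def parse_simple_tlv(data: bytes) -> list[tuple[int, bytes]]:
    """Split data into (tag, value) SIMPLE-TLV objects, one level deep."""
    objects = []
    pos = 0
    while pos < len(data):
        tag = data[pos]
        if tag in (0x00, 0xFF):
            raise ValueError(f"invalid tag {tag:02X} at offset {pos}")
        pos += 1
        if pos >= len(data):
            raise ValueError("missing length field")
        if data[pos] != 0xFF:
            # One-byte length field: 00..FE
            length = data[pos]
            pos += 1
        else:
            # FF followed by a two-byte big-endian length: 0..65535
            if pos + 3 > len(data):
                raise ValueError("truncated three-byte length field")
            length = int.from_bytes(data[pos + 1:pos + 3], "big")
            pos += 3
        if pos + length > len(data):
            raise ValueError("value runs past the end of the data")
        objects.append((tag, data[pos:pos + length]))
        pos += length
    return objects


if __name__ == "__main__":
    # The constructed object from the example: D1 holding two nested objects
    outer = parse_simple_tlv(bytes.fromhex("D1 0A A4 FF 00 02 BD 27 82 02 D4 AF"))
    tag, value = outer[0]
    print(f"{tag:02X}")  # D1
    for inner_tag, inner_value in parse_simple_tlv(value):
        print(f"{inner_tag:02X} {inner_value.hex().upper()}")
    # A4 BD27
    # 82 D4AF
```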

BER – Basic Encoding Rules

The BER format resembles SIMPLE-TLV in that it keeps the idea of three fields (tag, length, value), but each field is encoded differently. You may sometimes see the name BER-TLV. BER is the basis for the next two formats (CER and DER), so let's dwell on it a little longer.

First, note that ITU-T X.690 uses slightly different terminology: “identifier, length, contents” instead of “tag, length, value”. For simplicity, I will keep using the latter below.

Here the tag (identifier) is described by three parameters:

  1. Class;

  2. Primitive/constructed flag;

  3. Number.

A tag belongs to one of four classes: Universal, Application, Context-specific or Private. The classes are defined in the ASN.1 standard, which states that the Universal class is reserved for the ASN.1 specification itself and must not be used for users' own types. There are no significant differences between the other three classes.

The primitive/constructed flag shows whether the object is primitive (value 0), i.e. contains unstructured data, or constructed (value 1), i.e. contains other objects.

The tag number is an unsigned integer that, together with the class, identifies the type of data contained in the object. For example, the tag “Universal 2” is the ASN.1 INTEGER type, and “Universal 6” is an OBJECT IDENTIFIER (OID).

Tags are encoded as follows. In the first byte of the tag, the two high bits 8 and 7 give the class (00 – Universal, 01 – Application, 10 – Context-specific, 11 – Private), and bit 6 is the primitive/constructed flag. If the tag number does not exceed 30, it is encoded in the remaining lower 5 bits (the one-byte short form). Otherwise these 5 bits are set to “11111” and the number is written in the subsequent bytes: each of them except the last has its most significant bit set to 1, and the last one has it set to 0. The lower 7 bits of all these bytes together encode the tag number as an unsigned big-endian integer. This representation is called the long form.

short form
bits      8  7  6  5  4  3  2  1
         ----- --- -------------
           |    |     +--> tag number, 0 to 30
           |    +--> primitive/constructed flag
           +--> tag class
long form
bytes     1                        2,3,...                   last
bits      8  7  6  5  4  3  2  1   8  7  6  5  4  3  2  1    8  7  6  5  4  3  2  1
value     x  x  x  1  1  1  1  1   1  x  x  x  x  x  x  x    0  x  x  x  x  x  x  x
         ----- --- -------------      -------------------       -------------------
           |    |                          tag number               tag number
           |    +--> primitive/constructed flag
           +--> tag class

Tag field encoding examples

Tag                            Encoded field (binary = hex)
Universal, primitive, 6        00000110b = 06h
Application, constructed, 17   01110001b = 71h
Private, primitive, 532        11011111 10000100 00010100b = DF 84 14h
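The tag encoding rules can be sketched in a few lines of Python. This is my own illustration (the names encode_ber_tag, UNIVERSAL and so on are hypothetical, not from any library); it reproduces the three examples from the table, including the long form for 532, which splits into the base-128 digits 4 and 20.

```python
# Tag classes as the values of bits 8-7 of the first identifier byte
UNIVERSAL, APPLICATION, CONTEXT_SPECIFIC, PRIVATE = 0, 1, 2, 3


def encode_ber_tag(tag_class: int, constructed: bool, number: int) -> bytes:
    """Encode a BER identifier (tag) field."""
    if tag_class not in (0, 1, 2, 3) or number < 0:
        raise ValueError("invalid tag class or number")
    first = (tag_class << 6) | (0x20 if constructed else 0x00)
    if number <= 30:
        # Short form: the number fits into the lower 5 bits
        return bytes([first | number])
    # Long form: lower 5 bits are 11111, then base-128 digits, big-endian
    digits = []
    while number:
        digits.append(number & 0x7F)
        number >>= 7
    digits.reverse()
    # Bit 8 is set on every subsequent byte except the last one
    tail = [d | 0x80 for d in digits[:-1]] + [digits[-1]]
    return bytes([first | 0x1F] + tail)


if __name__ == "__main__":
    for args in [(UNIVERSAL, False, 6), (APPLICATION, True, 17), (PRIVATE, False, 532)]:
        print(encode_ber_tag(*args).hex(" ").upper())
    # 06
    # 71
    # DF 84 14
```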

The length field can take one of two forms:

  1. definite form;

  2. indefinite form.

The definite form is encoded as follows.

If the most significant bit of the first length byte is zero, the remaining seven bits are the length itself. This short form can encode lengths from 0 to 127 inclusive.

bits      8  7  6  5  4  3  2  1
value     0  x  x  x  x  x  x  x
            |-------------------|
                  length

If the most significant bit of the first length byte is set, the remaining seven bits give the number of subsequent bytes that hold the length (the first-byte value FF is reserved). That many bytes follow, containing the length as an unsigned big-endian integer.

bytes    1                  2                       n+1
bits     8 7 6 5 4 3 2 1    8 7 6 5 4 3 2 1  ...    8 7 6 5 4 3 2 1
value    1 x x x x x x x    x x x x x x x x         x x x x x x x x
          |-------------|  |---------------|       |---------------|
    number of subsequent     1st length byte         n-th length byte
          bytes (n)

Length field encoding examples

Length (decimal)   Encoded length field (hex)
20                 14, 81 14, 82 00 14, 83 00 00 14, etc.
124                7C, 81 7C, 82 00 7C, 83 00 00 7C, etc.
200                81 C8, 82 00 C8, 83 00 00 C8, etc.
10459              82 28 DB, 83 00 28 DB, etc.

The length field is followed by the data itself in the specified size.
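A minimal Python sketch of the definite form (the function names encode_length and decode_length are my own). The encoder always picks the shortest variant from each row of the table above, while the decoder accepts any of the BER variants, including ones padded with leading zero bytes.

```python
def encode_length(length: int) -> bytes:
    """Encode a definite-form length using as few bytes as possible."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 127:
        return bytes([length])  # short form: 0xxxxxxx
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    if len(body) > 126:
        raise ValueError("length too large")
    return bytes([0x80 | len(body)]) + body  # long form: 1nnnnnnn + n bytes


def decode_length(data: bytes) -> tuple[int, int]:
    """Decode a definite-form length at the start of data.

    Returns (length, number of bytes used by the length field).
    Non-minimal encodings such as 82 00 14 are accepted, as BER allows.
    """
    first = data[0]
    if first < 0x80:
        return first, 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("80 is the indefinite form, not a definite length")
    if count == 0x7F:
        raise ValueError("FF is reserved")
    if len(data) < 1 + count:
        raise ValueError("truncated length field")
    return int.from_bytes(data[1:1 + count], "big"), 1 + count


if __name__ == "__main__":
    print(encode_length(200).hex(" ").upper())            # 81 C8
    print(decode_length(bytes.fromhex("83 00 28 DB")))    # (10459, 4)
```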

If the length field is the single byte 80, it indicates the indefinite form. The data begins immediately after this byte and continues until the two zero bytes “00 00” are encountered. This length form is only allowed for constructed objects, which contain other objects. The two zero bytes are themselves an object (the end-of-contents marker): an object with the “Universal 0” tag and zero length.
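Putting tags and both length forms together, here is a minimal recursive BER parser in Python. It is a sketch with minimal error handling (the names parse_ber, _read_tag, _read_length and the Node shape are my own); it shows how an indefinite-length object is read child by child until the end-of-contents object 00 00 appears where the next tag would start.

```python
from typing import List, Optional, Tuple, Union

# A parsed object: (tag bytes as hex, value). The value is raw bytes for a
# primitive object and a list of child objects for a constructed one.
Node = Tuple[str, Union[bytes, List["Node"]]]


def _read_tag(data: bytes, pos: int) -> Tuple[str, bool, int]:
    """Return (tag as hex, constructed flag, position after the tag)."""
    start = pos
    constructed = bool(data[pos] & 0x20)
    if (data[pos] & 0x1F) == 0x1F:
        # Long form: subsequent bytes continue while bit 8 is set
        pos += 1
        while data[pos] & 0x80:
            pos += 1
    pos += 1
    return data[start:pos].hex().upper(), constructed, pos


def _read_length(data: bytes, pos: int) -> Tuple[Optional[int], int]:
    """Return (length, or None for the indefinite form; position after it)."""
    first = data[pos]
    if first == 0x80:
        return None, pos + 1
    if first < 0x80:
        return first, pos + 1
    end = pos + 1 + (first & 0x7F)
    return int.from_bytes(data[pos + 1:end], "big"), end


def parse_ber(data: bytes, pos: int = 0) -> Tuple[Node, int]:
    """Parse one BER object at pos; return (node, position after the object)."""
    tag, constructed, pos = _read_tag(data, pos)
    length, pos = _read_length(data, pos)
    if length is None:
        if not constructed:
            raise ValueError("indefinite length is only allowed for constructed objects")
        children = []
        # Children follow until the end-of-contents object 00 00
        while data[pos:pos + 2] != b"\x00\x00":
            if pos >= len(data):
                raise ValueError("missing end-of-contents 00 00")
            child, pos = parse_ber(data, pos)
            children.append(child)
        return (tag, children), pos + 2
    if pos + length > len(data):
        raise ValueError("value runs past the end of the data")
    value = data[pos:pos + length]
    if not constructed:
        return (tag, value), pos + length
    children, inner = [], 0
    while inner < len(value):
        child, inner = parse_ber(value, inner)
        children.append(child)
    return (tag, children), pos + length


if __name__ == "__main__":
    # SEQUENCE (30) with indefinite length: INTEGER 5 and OCTET STRING AB CD
    node, end = parse_ber(bytes.fromhex("30 80 02 01 05 04 02 AB CD 00 00"))
    print(node)  # ('30', [('02', b'\x05'), ('04', b'\xab\xcd')])
    print(end)   # 11
```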

An important property of BER should be noted: ambiguity in the representation of data objects. The same value can be encoded in several ways (as the length examples above show), which is why stricter formats exist: CER and DER.

Interestingly, there is a well-known example of BER being misapplied in an international standard: ICAO Doc 9303 on Machine Readable Travel Documents (including the so-called electronic passports with a chip).

First, it uses tags such as 5F01 or 5F08, which are not valid in BER: their tag numbers (1 and 8) do not exceed 30, so BER requires the short form (41 and 48 respectively). But this usage is so widely implemented that it cannot be fixed. The seventh edition of the standard even has a special section explaining this inconsistency (see Doc 9303, edition 7, part 10, clause 4.3.1).

4.3.1 Data elements encoding normative note

There is a mismatch between the LDS (version 1.7 and 1.8) specifications and [ISO/IEC 8825-1] (BER/DER encoding rules). [ISO/IEC 8825-1] states for tags with a number ranging from zero to 30 (inclusive), …

Curiously, this section has disappeared from the current eighth edition of ICAO Doc 9303, while the discrepancy itself, of course, remains.
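The 5F01 mismatch is easy to check mechanically. Below is a minimal Python sketch (the names decode_ber_tag and is_strict_ber_tag are hypothetical) that decodes a tag and reports whether the long form is used only where X.690 permits it: for tag numbers above 30, and without a leading all-zero group, since X.690 also forbids bits 7 to 1 of the first subsequent byte from being all zero.

```python
def decode_ber_tag(data: bytes) -> tuple[int, bool, int, int]:
    """Decode a BER tag at the start of data.

    Returns (class, constructed flag, tag number, bytes used).
    """
    first = data[0]
    tag_class, constructed = first >> 6, bool(first & 0x20)
    if (first & 0x1F) != 0x1F:
        return tag_class, constructed, first & 0x1F, 1  # short form
    number, pos = 0, 1
    while True:
        byte = data[pos]
        number = (number << 7) | (byte & 0x7F)  # base-128, big-endian
        pos += 1
        if not byte & 0x80:  # bit 8 clear marks the last tag byte
            return tag_class, constructed, number, pos


def is_strict_ber_tag(data: bytes) -> bool:
    """True if the tag uses the short form, or the long form only where required."""
    _, _, number, used = decode_ber_tag(data)
    return used == 1 or (number > 30 and (data[1] & 0x7F) != 0)


if __name__ == "__main__":
    for hex_tag in ("5F01", "41", "5F1F"):
        tag = bytes.fromhex(hex_tag)
        print(hex_tag, decode_ber_tag(tag), is_strict_ber_tag(tag))
    # 5F01 (1, False, 1, 2) False
    # 41 (1, False, 1, 1) True
    # 5F1F (1, False, 31, 2) True
```

So a strict BER decoder may reject 5F01, while a decoder for ICAO data has to tolerate it.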

Second, Doc 9303 uses tags to express the meaning of the data rather than its type (arguably a feature rather than a bug, but still). What do I mean? There, for example, tag 5F51 denotes a person's name, tag 5F53 an address, and 5F13 a profession. All these fields have different meanings but the same data type: just a string. In BER, a tag denotes the type of the data, not its meaning; it is in ISO SIMPLE-TLV that the tag denotes meaning. As a result, ICAO ended up with its own variant of BER-TLV that is incompatible with typical implementations, so standard ASN.1 parsers will not work here.

CER – Canonical Encoding Rules

The CER format can be thought of as a restricted BER with the following rules:

  1. If the data object is constructed, the indefinite length form must be used.

  2. If the data object is primitive, the length field must be as short as possible; for example, “81 23” is not allowed, because the length 23h fits into the single byte “23”.

  3. There are further restrictions on particular ASN.1 types. For example, an OCTET STRING must be primitive if it has at most 1000 bytes of contents, and constructed otherwise. The full list of such restrictions is given in the standard.
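Rules 1 to 3 can be seen together in a minimal Python sketch of CER encoding for an OCTET STRING (the names cer_octet_string, encode_length and FRAGMENT are my own). As I read X.690, a long string is split into primitive fragments of exactly 1000 contents bytes each (except possibly the last one), wrapped in a constructed object with indefinite length, while every primitive part uses the shortest definite length.

```python
FRAGMENT = 1000  # CER limit on contents bytes for a primitive string


def encode_length(length: int) -> bytes:
    """Definite length with the fewest possible bytes, as CER requires for primitives."""
    if length <= 127:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def cer_octet_string(data: bytes) -> bytes:
    """Encode an OCTET STRING (Universal 4) following the CER rules."""
    if len(data) <= FRAGMENT:
        return b"\x04" + encode_length(len(data)) + data  # primitive, definite
    out = bytearray(b"\x24\x80")  # constructed (04 | 20), indefinite length
    for i in range(0, len(data), FRAGMENT):
        chunk = data[i:i + FRAGMENT]
        out += b"\x04" + encode_length(len(chunk)) + chunk  # primitive fragment
    out += b"\x00\x00"  # end-of-contents
    return bytes(out)


if __name__ == "__main__":
    encoded = cer_octet_string(bytes(2500))
    print(len(encoded), encoded[:6].hex(" ").upper())  # 2516 24 80 04 82 03 E8
```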

DER – Distinguished Encoding Rules

The DER format, like CER, is a restricted BER, but with different rules:

  1. The length field is always encoded in the definite form, using as few bytes as possible.

  2. Restrictions on the contents of ASN.1 types, whose main goal is an unambiguous representation of data. For example, an OCTET STRING must never be encoded as constructed. A BOOLEAN can only be 00 (FALSE) or FF (TRUE), whereas BER treats 00 as FALSE and any non-zero value as TRUE. The unused bits of a BIT STRING must be zero (in BER they can be anything), and so on.

As for item 2, DER and CER have a lot in common.
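Two of the content restrictions from item 2 can be illustrated with a minimal Python sketch (all function names here are my own). It covers only a plain BIT STRING; DER has an extra rule for BIT STRINGs with named bits (trailing zero bits are removed) that the sketch ignores, and it assumes the contents fit the short length form.

```python
def der_boolean(value: bool) -> bytes:
    """DER BOOLEAN (Universal 1): the contents byte is exactly FF or 00."""
    return bytes([0x01, 0x01, 0xFF if value else 0x00])


def ber_boolean_value(encoded: bytes) -> bool:
    """Decode a BOOLEAN as BER does: any non-zero contents byte means TRUE."""
    if encoded[:2] != b"\x01\x01":
        raise ValueError("expected tag 01 with length 01")
    return encoded[2] != 0


def is_der_boolean(encoded: bytes) -> bool:
    """DER accepts only 01 01 00 and 01 01 FF."""
    return encoded in (b"\x01\x01\x00", b"\x01\x01\xff")


def der_bit_string(bits: str) -> bytes:
    """DER BIT STRING (Universal 3) from a string of '0'/'1' characters.

    The first contents byte is the number of unused bits in the last byte;
    DER requires those padding bits to be zero.
    """
    unused = (8 - len(bits) % 8) % 8
    padded = bits + "0" * unused  # pad with zero bits
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    contents = bytes([unused]) + body
    if len(contents) > 127:
        raise ValueError("this sketch only handles the short length form")
    return bytes([0x03, len(contents)]) + contents


if __name__ == "__main__":
    print(ber_boolean_value(bytes.fromhex("01 01 01")))      # True
    print(is_der_boolean(bytes.fromhex("01 01 01")))         # False
    print(der_bit_string("101").hex(" ").upper())            # 03 02 05 A0
```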

PER – Packed Encoding Rules

The most compact format discussed here is PER. It does not belong to the TLV family, because it has no tags and sometimes no length field either. It contains nothing “superfluous”, i.e. nothing that can be inferred from the specification of the data structure. There is only data! PER offers two encoding variants:

  1. Structures are assembled at the bit level (Unaligned PER);

  2. Structures are assembled from octet-aligned fields (Aligned PER): padding bits are inserted so that certain fields start on a byte boundary.

PER eliminates redundancy as far as possible. For example, if the specification of a data structure says that a number comes first and then a string, there is no point in putting type tags into the binary representation: the first object is known to be a number and the next one a string. Hence an important property: data in PER cannot be parsed without exact knowledge of its structure.

Why, for example, transmit the length of a BOOLEAN field? It is always the same, and one bit is enough.

If there is, say, a number n of type INTEGER (lb..ub), i.e. from lb to ub, it is represented as the value (n-lb) in the range 0 to (ub-lb), using the minimum number of bits that can represent that range. For example, the type INTEGER (250..257) needs only three bits to convey any of its eight possible values.
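A minimal Python sketch of these two rules, in the spirit of the unaligned variant (bits are returned as a '0'/'1' string for readability; the function names and the two-field SEQUENCE in the usage line are hypothetical). Since (ub-lb).bit_length() is the number of bits needed for the largest offset, INTEGER (250..257) gets exactly three bits. Alignment, unconstrained integers and extension markers are not covered.

```python
def per_boolean(value: bool) -> str:
    """A BOOLEAN is a single bit: no tag and no length are transmitted."""
    return "1" if value else "0"


def per_constrained_int(n: int, lb: int, ub: int) -> str:
    """Encode n of type INTEGER (lb..ub) as the offset n - lb in the
    minimum number of bits needed for the range 0..(ub - lb)."""
    if not lb <= n <= ub:
        raise ValueError(f"{n} is outside {lb}..{ub}")
    width = (ub - lb).bit_length()  # 0 bits if lb == ub: the value is known
    return format(n - lb, "b").zfill(width) if width else ""


if __name__ == "__main__":
    # Hypothetical SEQUENCE { flag BOOLEAN, n INTEGER (250..257) } with
    # flag = TRUE and n = 253: just four bits, nothing else
    print(per_boolean(True) + per_constrained_int(253, 250, 257))  # 1011
```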

A BIT STRING value is represented simply as a sequence of bits. If the length of the sequence can vary, it is given before the value as an INTEGER (lb..ub) value, where lb and ub are the minimum and maximum allowed sizes of the bit string.
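The BIT STRING rule can be sketched the same way in Python (hypothetical names; bits as a '0'/'1' string). This is a simplified, unaligned-style illustration that assumes ub is below 65536; it ignores the alignment and fragmentation rules of X.691. Note that for a fixed size (lb equal to ub) the length prefix takes zero bits, because the size is already known from the specification.

```python
def constrained_whole_number(n: int, lb: int, ub: int) -> str:
    """n - lb in the minimum number of bits for the range 0..(ub - lb)."""
    width = (ub - lb).bit_length()
    return format(n - lb, "b").zfill(width) if width else ""


def per_bit_string(bits: str, lb: int, ub: int) -> str:
    """BIT STRING (SIZE (lb..ub)): a length prefix, then the bits themselves."""
    if not lb <= len(bits) <= ub:
        raise ValueError("size constraint violated")
    # The length is sent as a constrained integer in lb..ub, then the value
    return constrained_whole_number(len(bits), lb, ub) + bits


if __name__ == "__main__":
    print(per_bit_string("101", 0, 7))   # 011101 (3-bit length 3, then 101)
    print(per_bit_string("1100", 4, 4))  # 1100 (fixed size: no length at all)
```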

PER is used, for example, in UMTS (3G) and LTE (4G) mobile network protocols.

A number of examples can be found in Rec. ITU-T X.691.