Briefly about TLV, BER, CER, DER, PER formats
I would like to talk about data formats that are widespread in the IT industry: in public key infrastructures (PKI), in smart cards (including next-generation documents built on them) and in mobile communications. Although these formats are related to ASN.1, some of them have gone far beyond it. Many people know some of them, but not everyone knows enough to, say, tell BER from DER, and some variants of PER are downright exotic.
I will not dive deep into the topic. I will just introduce the main features so that it is clear what these formats are and how they are used. The full details are in the relevant standards: ITU-T X.690, ITU-T X.691 and ISO 7816.
One of my motivations is to get the topic sorted out in my own head.
ASN.1 (Abstract Syntax Notation One) is used when the format of a data structure needs to be specified. The notation itself is defined in the ITU-T X.680-X.683 standards. Perhaps the most common use case is the X.509 certificate format and everything related to it. A fragment of such a textual description looks like this:

```
Certificate ::= SEQUENCE {
    tbsCertificate       TBSCertificate,
    signatureAlgorithm   AlgorithmIdentifier,
    signatureValue       BIT STRING }

AlgorithmIdentifier ::= SEQUENCE {
    algorithm            OBJECT IDENTIFIER,
    parameters           ANY DEFINED BY algorithm OPTIONAL }
```

But this is only a textual description. To turn it into binary form, for example to save concrete data in the correct format to a file or to send it over a communication channel, encoding rules have to be attached to it. This is where BER, CER, DER, PER, XER, OER and JER come in (XER and JER are text-based, producing XML and JSON respectively). I will not touch the last three for now, but I will cover the rest.
To complete the picture, let's add the TLV format from ISO 7816-4 to this lineup. So, the formats I will briefly cover are:

| Abbreviation | Name | Standard |
|---|---|---|
| TLV | Tag, Length, Value | ISO 7816-4 |
| BER | Basic Encoding Rules | ITU-T X.690-2021 |
| CER | Canonical Encoding Rules | ITU-T X.690-2021 |
| DER | Distinguished Encoding Rules | ITU-T X.690-2021 |
| PER | Packed Encoding Rules | ITU-T X.691-2021 |

TLV – tag, length, value
This is perhaps the simplest of these formats. ISO 7816-4 is the base standard for smart cards and their file systems, and it defines the SIMPLE-TLV format. Strictly speaking, "TLV" is an informal name for a whole family of formats.
According to SIMPLE-TLV, each data object (DO) consists of three fields: a mandatory tag (T), a mandatory length (L) and an optional value (V). Text notation: {TLV}.
The tag is a single byte with a value from 01 to FE (1 to 254); the values 00 and FF are invalid.
The length field is one or three bytes long. If the first byte is not FF, it is the length itself (0 to 254) and the field is one byte long. If the first byte is FF, the next two bytes give the length in the range 0 to 65535.
If L is not zero, L bytes of data follow. These bytes can be plain data (a primitive DO) or other data objects (a constructed DO). In the latter case, the structure looks like this: {TL-{T1-L1-V1}-{T2-L2-V2}-…-{Tn-Ln-Vn}}.
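The tag and length rules above can be sketched as a small Python encoder. This is a minimal illustration, not code from ISO 7816-4: the function name `encode_simple_tlv` and its `long_length` switch (which forces the three-byte length form) are hypothetical, and the two length bytes after FF are assumed to be big-endian, which is consistent with the “A4 FF 00 02 BD 27” object in the example that follows.

```python
def encode_simple_tlv(tag: int, value: bytes, long_length: bool = False) -> bytes:
    """Encode one SIMPLE-TLV data object: tag byte, length field, value bytes."""
    # Tag: one byte in the range 01..FE; 00 and FF are not allowed.
    if not 0x01 <= tag <= 0xFE:
        raise ValueError(f"invalid SIMPLE-TLV tag {tag:#04x}")
    length = len(value)
    if length > 0xFFFF:
        raise ValueError("value is longer than 65535 bytes")
    if length <= 0xFE and not long_length:
        # One-byte length field: 00..FE (0..254).
        length_field = bytes([length])
    else:
        # Three-byte length field: FF followed by the length in two bytes
        # (assumed big-endian). The hypothetical long_length flag forces this form.
        length_field = b"\xff" + length.to_bytes(2, "big")
    return bytes([tag]) + length_field + value


# Primitive object: T=82, V=D4AF.
print(encode_simple_tlv(0x82, bytes.fromhex("D4AF")).hex(" ").upper())
# 82 02 D4 AF

# Constructed object: its value is just the concatenation of the nested objects.
inner = (encode_simple_tlv(0xA4, bytes.fromhex("BD27"), long_length=True)
         + encode_simple_tlv(0x82, bytes.fromhex("D4AF")))
print(encode_simple_tlv(0xD1, inner).hex(" ").upper())
# D1 0A A4 FF 00 02 BD 27 82 02 D4 AF
```

By default the sketch picks the shortest length form; the three-byte form is only needed for values longer than 254 bytes, but a reader must accept it for short values too.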
Example: the bytes “82 02 D4 AF” denote a primitive object with T=82, L=2 and V=D4AF. The bytes “D1 0A A4 FF 00 02 BD 27 82 02 D4 AF” denote a constructed object {D1-0A-{A4-FF0002-BD27}-{82-02-D4AF}}; note that the inner object A4 uses the three-byte length form even though its length is only 2.
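Going the other way, a parser only needs to read the tag, choose between the one-byte and three-byte length forms, and slice out the value. The sketch below uses a hypothetical function `parse_simple_tlv` that returns a flat list of (tag, value) pairs. Since a SIMPLE-TLV tag byte does not say whether the value is constructed, the caller decides which values to parse again, as is done here for tag D1.

```python
def parse_simple_tlv(data: bytes) -> list[tuple[int, bytes]]:
    """Split data into consecutive SIMPLE-TLV objects, returned as (tag, value) pairs."""
    objects = []
    pos = 0
    while pos < len(data):
        tag = data[pos]
        if tag in (0x00, 0xFF):
            raise ValueError(f"invalid tag {tag:#04x} at offset {pos}")
        pos += 1
        if pos >= len(data):
            raise ValueError("missing length field")
        if data[pos] != 0xFF:
            length = data[pos]  # one-byte length field
            pos += 1
        else:
            if pos + 3 > len(data):
                raise ValueError("truncated three-byte length field")
            length = int.from_bytes(data[pos + 1:pos + 3], "big")  # FF + two bytes
            pos += 3
        if pos + length > len(data):
            raise ValueError("value runs past the end of the data")
        objects.append((tag, data[pos:pos + length]))
        pos += length
    return objects


outer = parse_simple_tlv(bytes.fromhex("D10AA4FF0002BD278202D4AF"))
print([(hex(t), v.hex()) for t, v in outer])
# [('0xd1', 'a4ff0002bd278202d4af')]
inner = parse_simple_tlv(outer[0][1])  # the caller knows that D1 is constructed
print([(hex(t), v.hex()) for t, v in inner])
# [('0xa4', 'bd27'), ('0x82', 'd4af')]
```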
BER – Basic Encoding Rules
BER is similar to SIMPLE-TLV in that it keeps the idea of three fields (tag, length, value), but each field is encoded differently. You will sometimes see the name BER-TLV. BER is the basis for the next two formats, CER and DER, so let's dwell on it a little longer.
First, ITU-T X.690 uses slightly different terminology: “identifier, length, contents” instead of “tag, length, value”. For simplicity, I will stick with the latter in what follows.
In BER, the tag (identifier) is described by three parameters:

- class;
- primitive/constructed flag;
- number.
A tag belongs to one of four classes: Universal, Application, Context-specific or Private. These classes are defined in the ASN.1 standard, which reserves the Universal class for types defined by the ASN.1 specification itself; users should not use it for their own types. There are no significant differences between the other three classes: in the encoding they differ only in the value of the class bits.
The primitive/constructed flag shows whether the object is primitive (value 0), i.e. contains unstructured data, or constructed (value 1), i.e. contains other objects.
The tag number is an unsigned integer that, together with the class, identifies the type of data contained in the object. For example, the tag “Universal 2” denotes the ASN.1 INTEGER type, and “Universal 6” denotes OBJECT IDENTIFIER (OID).
Tags are encoded as follows. In the first byte of the tag, the two high bits 8 and 7 give the class (00: Universal, 01: Application, 10: Context-specific, 11: Private), and bit 6 is the primitive/constructed flag. The remaining low 5 bits hold the tag number if it does not exceed 30; this one-byte encoding is called the short form. For larger numbers, these 5 bits are set to “11111” and the number is carried in the following bytes: every one of them except the last has its most significant bit (bit 8) set to 1, and the last one has it set to 0. The low 7 bits of these bytes, taken together, encode the tag number as an unsigned big-endian integer (base 128, most significant group first). This is called the long form.
Short form

```
bits     8 7 6 5 4 3 2 1
         --- - ---------
          |  |     +--> tag number from 0 to 30
          |  +--> "primitive/constructed" flag
          +--> tag class
```

Long form

```
bytes           1                 2, 3, ...               last
bits     8 7 6 5 4 3 2 1       8 7 6 5 4 3 2 1       8 7 6 5 4 3 2 1
value    x x x 1 1 1 1 1       1 x x x x x x x       0 x x x x x x x
         --- - ---------         -------------         -------------
          |  |                    tag number            tag number
          |  +--> "primitive/constructed" flag
          +--> tag class
```

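To make the bit layout concrete, here is a minimal Python sketch of a decoder for identifier octets, which can be checked against the examples in the table that follows. The name `decode_ber_tag` and its return tuple are illustrative choices, not an API from X.690 or any library. For brevity it skips some validity checks that X.690 imposes, for example it does not reject a long form used for a tag number of 30 or less.

```python
# Class names indexed by the value of bits 8-7 of the first identifier byte.
CLASSES = ("Universal", "Application", "Context-specific", "Private")


def decode_ber_tag(data: bytes, pos: int = 0) -> tuple[str, bool, int, int]:
    """Decode BER identifier octets starting at data[pos].

    Returns (class, constructed, tag number, offset of the first byte after the tag).
    """
    if pos >= len(data):
        raise ValueError("no identifier octets")
    first = data[pos]
    tag_class = CLASSES[first >> 6]   # bits 8-7: class
    constructed = bool(first & 0x20)  # bit 6: primitive (0) / constructed (1)
    number = first & 0x1F             # bits 5-1
    pos += 1
    if number != 0x1F:
        return tag_class, constructed, number, pos  # short form: number 0..30
    # Long form: each following byte contributes its low 7 bits, most significant
    # first; bit 8 = 1 means "more tag bytes follow", bit 8 = 0 marks the last one.
    number = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated long-form tag")
        byte = data[pos]
        pos += 1
        number = (number << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return tag_class, constructed, number, pos


for hex_tag in ("06", "71", "DF8414"):
    print(hex_tag, decode_ber_tag(bytes.fromhex(hex_tag)))
# 06 ('Universal', False, 6, 1)
# 71 ('Application', True, 17, 1)
# DF8414 ('Private', False, 532, 3)
```

Returning the offset after the tag lets a caller continue straight on to the length field of the same object.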
Tag field encoding examples

| Tag | Encoded field (binary = hex) |
|---|---|
| Universal, primitive, 6 | 00000110b = 06h |
| Application, constructed, 17 | 01110001b = 71h |
| Private, primitive, 532 | 11011111 10000100 00010100b = DF 84 14h |
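The reverse direction, building identifier octets from the class, the flag and the number, can be sketched the same way; running it reproduces the rows of the table above. Again, `encode_ber_tag` and the class-name strings are hypothetical names chosen for this illustration, and the long form is always produced with the minimal number of bytes.

```python
# Values of bits 8-7 for each class (hypothetical string keys for illustration).
CLASS_BITS = {"Universal": 0b00, "Application": 0b01,
              "Context-specific": 0b10, "Private": 0b11}


def encode_ber_tag(tag_class: str, constructed: bool, number: int) -> bytes:
    """Encode BER identifier octets: short form for 0..30, long form above."""
    if number < 0:
        raise ValueError("tag number must be non-negative")
    first = (CLASS_BITS[tag_class] << 6) | (0x20 if constructed else 0)
    if number <= 30:
        return bytes([first | number])  # short form
    # Long form: split the number into 7-bit groups, most significant first.
    groups = []
    while True:
        groups.append(number & 0x7F)
        number >>= 7
        if number == 0:
            break
    groups.reverse()
    # Bit 8 is set on every group except the last one.
    tail = [g | 0x80 for g in groups[:-1]] + [groups[-1]]
    return bytes([first | 0x1F] + tail)


for args in [("Universal", False, 6), ("Application", True, 17), ("Private", False, 532)]:
    print(args, encode_ber_tag(*args).hex(" ").upper())
# ('Universal', False, 6) 06
# ('Application', True, 17) 71
# ('Private', False, 532) DF 84 14
```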