Notes on Serialization + System.Text.Json

Formats and markup languages

  • System.Text.Json

  • Serialization

    Serialization is the process of converting a runtime object into a form suitable for further transportation. In simple terms, it is the process of writing the data of an object in memory, i.e. a class or structure, into a form that can be transmitted over a network, stored on disk, used between processes, and so on. Deserialization, accordingly, is the reverse process of restoring the state of an object from the transport format.

    Typically the transport format is a text format like JSON or YAML, or a markup language like XML, SOAP and others. Binary serializers, by contrast, write bytes directly to the destination stream. If you delve deeper into the topic, though, all of the text formats above also go through an encoding step (usually UTF-8), and the parsers developed for each format can restore the state thanks to strict data formatting rules.

    By the way, a markup language is not the same thing as a text format. A markup language is a more complex set of rules for storing and presenting data, using tags, syntax rules, definitions of new entities, etc. Text formats are more lightweight, easy to write and read, and are limited to a small number of control entities.

    Formats and markup languages

    JSON

    JSON (JavaScript Object Notation) is a lightweight text format for representing structured data as key-value pairs, composite objects consisting of these pairs, and JSON arrays. JSON was originally based on JavaScript syntax, but is language-independent and supported by most modern programming languages.

    Representation of data in JSON format:
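
    As a minimal sketch, the shape of a JSON document can be seen by serializing a small object. The Sample type and its members are hypothetical and exist only for illustration; running the program prints an object made of key-value pairs plus one JSON array.

```csharp
using System;
using System.Text.Json;

// Hypothetical type, used only to show what JSON looks like.
public class Sample
{
    public double Pi { get; set; } = 3.14;
    public bool B { get; set; } = false;
    public string S { get; set; } = "Some Text";
    public int[] Values { get; set; } = { 1, 2, 3 };
}

public static class JsonShapeDemo
{
    // Serializes a Sample with indentation so the structure is easy to read.
    public static string ToJson() =>
        JsonSerializer.Serialize(new Sample(), new JsonSerializerOptions { WriteIndented = true });

    public static void Main() => Console.WriteLine(ToJson());
}
```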

    XML

    XML (Extensible Markup Language) is a markup language designed to store and transmit data in a format that is easy for computers to process. XML syntax is based on user-defined tags. XML also supports attributes, escape characters, comments, namespaces, and schema validation. A correctly composed XML document is called well-formed.

    The general XML syntax for a C# object would look like this:

    • Simple properties (elements): the name is specified in the opening and closing tags, and the value is specified between them. A tag can also be self-closing, using the syntax <tag/>.

      public class SomeData
      {
          public float pi { get; set; } = 3.14f;
          public bool b { get; set; } = false;
          public char c { get; set; } = 'c';
          public string s { get; set; } = "Some Text";
      }
      
      <?xml version="1.0" encoding="UTF-8"?> <!--We will touch on this later-->
      <SomeData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <pi>3.14</pi>
        <b>false</b>
        <c>c</c>
        <s>Some Text</s>
      </SomeData>
      
    • You may have noticed attributes like xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance". We will consider the meaning of some of them later. As for attributes themselves: they are part of XML tags, syntactically they look like attribute="value", and several attributes are separated by whitespace, not commas. Attributes carry data written by the user or a program just like elements do, but the generally accepted practice is to put metadata in them, that is, "information about information", where the second "information" is the value of the XML element. For example:

      <Order time="8/18/2010 4:32:00"> <!--Metadata about the time of the order-->
        <ID>114</ID>
      </Order>
      
    • Composite objects work according to the same rules.

      <?xml version="1.0" encoding="UTF-8"?>
      <SomeData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <pi>3.14</pi>
        <b>false</b>
        <c>c</c>
        <ComplexStruct> <!--A composite object. By the way, this is a comment.-->
          <X>1.5</X>
          <Y>2.7</Y>
        </ComplexStruct>
      </SomeData>
      
    • Collections do not have any specific syntax. However, we can notice that element tags inside arrays use the element type as the name, although this behavior can be overridden by different configurations.

      <?xml version="1.0" encoding="UTF-8"?>
      <SomeData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <values>
          <string>first</string>
          <string>second</string>
          <string>third</string>
        </values>
      </SomeData>
      
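    • XML supports escape sequences for the reserved characters < > & ' ". Strictly speaking, only < and & must always be escaped in element content, while quotes need escaping inside an attribute value delimited by the same kind of quote. Inside comments and CDATA blocks all of these characters can be written directly.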

      Escapable characters
      &lt; - represents the < character (opening angle bracket).
      &gt; - represents the > character (closing angle bracket).
      &amp; - represents the & character (ampersand).
      &quot; - represents the " character (double quote).
      &apos; - represents the ' character (single quote).
      
    • The difference between CDATA and a comment is that a comment is not part of the XML document and is ignored by parsers, while CDATA (character data) is part of the XML data, inside which the text is located “raw” (this can be compared to a raw string in C#).

      <?xml version="1.0" encoding="UTF-8"?>
      
      <SomeData>
          <SomeElement>
              <!--Inside this block I can use <>&"", but comments are not a "useful" part of the XML document.-->
      
              <![CDATA[ CDATA means character data, i.e. data that does not require escaping.
              I can use <>&" here, and this text is part of the XML document.]]>
          </SomeElement>
      </SomeData>
      

    You may have already noticed that the XML document begins with a prologue <?xml version="1.0" encoding="UTF-8"?> describing the XML version and encoding. A tag delimited by ? signs is not a data representation but a processing instruction (PI); strictly speaking, the XML declaration itself only shares this syntax and is not formally a PI. Such instructions are often required by the party receiving the XML document, such as a web server.

    Next comes the opening tag of the object, which declares namespaces via xmlns:prefix attributes. Namespaces are needed to avoid name conflicts and to use predefined elements and attributes. Namespace names are usually URIs, although they can also be a simple string like foo. Most XML parsers already know the definitions provided by the standard (non-user) namespaces, so no request to a web server takes place. By standard we mean, for example, http://www.w3.org/2001/XMLSchema-instance, which contains attributes like nil (to mark an element's value as null), type (to specify the element's type), or schemaLocation (to specify the path to the XSD for the current XML document). If we follow that URL, we simply get an information page with links. Once again: all namespaces, attributes and elements are just text data processed by parsers, lexers and validators, which is possible thanks to the deterministic structure of XML.

    XML Schemas

    XSD (XML Schema Definition) is a language for describing the structure, content, and semantics of XML documents. XSD defines the rules and restrictions that XML documents must meet to be considered valid in relation to a given schema and uses XML syntax. The main task of the schema is to validate XML documents. For example, a schema can be passed to a service that will then validate incoming XML, or errors can be avoided by manually filling in an XML file if a schema is available. Please note that a link to a schema and its presence is not a mandatory part of XML.

    I would like to point out that this topic is quite complex and its full analysis would require a separate article. Let's go through the basics.

    XSD Basics:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
        <xs:element name="School"> <!--Root element of the future XML-->
            <xs:complexType>
                <xs:all> <!--all: child elements in any order (as opposed to sequence). Here we only have the array of students-->
                    <xs:element name="Students"> <!--Array-->
                        <xs:complexType>
                            <xs:sequence>
                                <xs:element name="Student" type="Student" minOccurs="0" maxOccurs="unbounded"/> <!--Here we restrict the array items to the Student type (declared below) and state that the element may occur from 0 to an unlimited number of times, which makes it a collection-->
                            </xs:sequence>
                        </xs:complexType>
                    </xs:element>
                </xs:all>
            </xs:complexType>
        </xs:element>
    
    
        <xs:complexType name="Student"> <!--Declaration of the Student type, a complexType directly under the schema root-->
            <xs:sequence>
                <xs:element name="Name" type="xs:string"/>
                <xs:element name="Grades">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="Grade" type="Grade" minOccurs="1" maxOccurs="30"/> <!--Array of elements of type Grade (declared below), holding from 1 to 30 items-->
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute name="ID" type="xs:string"/> <!--The ID attribute of a Student-->
        </xs:complexType>
    
    
        <xs:complexType name="Grade"> <!--Likewise, declaration of the Grade type-->
            <xs:sequence>
                <xs:element name="Subject" type="xs:string"/>
                <xs:element name="Value" type="xs:float"/>
            </xs:sequence>
        </xs:complexType>
    
    </xs:schema>
    

    The final XML composed according to our schema would look like this:

    <School
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="School.xsd">
    
        <Students>
    
            <Student ID="11">
                <Name>Jimmy McGill</Name>
                <Grades>
                    <Grade>
                        <Subject>Math</Subject>
                        <Value>5.6</Value>
                    </Grade>
                </Grades>
            </Student>
    
            <Student ID="01">
                <Name>Pit Ritt</Name>
                <Grades>
                    <Grade>
                        <Subject>History</Subject>
                        <Value>8.7</Value>
                    </Grade>
    
                    <Grade>
                        <Subject>Math</Subject>
                        <Value>7.8</Value>
                    </Grade>
                </Grades>
            </Student>
    
        </Students>
    
    </School>
    

    In addition to XSD, there is DTD (Document Type Definition), which also serves to define the structure of XML. It is considered obsolete, does not use XML syntax, and has far fewer capabilities. Example syntax:

    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    

    SOAP

    SOAP (Simple Object Access Protocol) is a messaging protocol used to transfer data between computers. SOAP is based on XML and defines strict rules for exchanging data between web services and clients. The protocol includes a description of messages, how and in what way they should be transmitted, as well as standard calls and responses.

    That is, SOAP messages represent data transmitted, for example, in the body of an HTTP request/response. To tell the truth, I have not worked with SOAP. The protocol can be classified as obsolete and is used mainly in giant legacy projects, such as banking systems and government services, while modern services more often exchange data in formats such as JSON or YAML.

    An example of a SOAP message looks like this:
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ex="http://example.com/">
       <soapenv:Header/>
       <soapenv:Body>
          <ex:GetStudentDetailsRequest>
             <ex:StudentID>12345</ex:StudentID>
          </ex:GetStudentDetailsRequest>
       </soapenv:Body>
    </soapenv:Envelope>
    

    Explanation of elements:

    • <soapenv:Envelope>: the root element of a SOAP message. It defines the beginning and end of the message. The xmlns:soapenv attribute specifies the namespace associated with the SOAP protocol.

    • <soapenv:Header>: an optional element used to convey meta-information, such as authentication data. In this example it is empty.

    • <soapenv:Body>: a mandatory element containing the main content of the message. Inside the Body is the data that the client sends to the server or receives from it.

    • <ex:GetStudentDetailsRequest>: a custom element that is part of the main message content. It carries the information needed to perform a specific request (in this case, requesting student details by ID).

    • <ex:StudentID>: an element that contains the specific request data (in this case, the student ID).

    YAML

    YAML (originally "Yet Another Markup Language", now "YAML Ain't Markup Language") is a text format (formerly described as a markup language) for data serialization, designed to provide a human-readable representation of structured data. YAML is used for configuration files, data exchange between programs, and various other tasks. .NET has no built-in YAML support, although third-party libraries such as YamlDotNet exist.

    General YAML syntax:

    • A YAML document may start with the line --- and end with the line ... (both are optional; they are useful, for example, when several YAML documents live in one file).

    • Simple properties are stored as key: value pairs

      ---
      pi: 3.14
      b: false
      c: c
      o: null
      s: Some Text
      
    • To form composite types and indicate nesting, spaces (not tabs!) are used:

      ---
      pi: 3.14
      b: false
      c: c
      ComplexStruct: # Nested members X and Y
       X: 1.5
       Y: 2.7
      
    • Collections use either the inline syntax Values: [a, b, c] or the block syntax, where each element goes on a new indented line starting with a dash and a space. The block syntax is especially useful for composite array elements.

      Students: # Array
       - Name: Pam Beasley # Element 1
         Age: 10
         AverageGrade: 3.9
       - Name: Ryan Howard # Element 2
         Age: 11
         AverageGrade: 4.2
      
    • Some data structures may require their own serialization algorithms. For example, dictionaries, just as in JSON (to and from which YAML is easily converted), can look like a composite object; keys that are themselves composite use the explicit ? and : syntax

      Teachers:
       MichaelScott: [Pam Beasley, Ryan Howard] # string key, string[] value; quotes are optional
       JanGofrey: ["Alex Swanson"]

      TeachersComplexMappingKey:
       ? Name: MichaelScott # An explicit key starts with ?
         Age: 41
         Subject: Math
       : # An explicit value starts with :, here the value is an array
         - Name: Pam Beasley
           Age: 10
           AverageGrade: 3.9
         - Name: Ryan Howard
           Age: 11
           AverageGrade: 4.4
    

    Serialization technologies

    We are moving on to practical notes. Note that almost all of the information was taken and digested from the Microsoft documentation, where each section is described in some detail and code examples are provided.

    System.Text.Json

    The most convenient library for serialization to/from JSON today is System.Text.Json. Most of the work with it consists of calling the static methods of JsonSerializer, configuring them with a JsonSerializerOptions instance passed to those methods, applying attributes, and implementing custom handlers and object converters.

    System.Text.Json Basics

    Serialization and deserialization are done with JsonSerializer.Serialize<T>(), JsonSerializer.Deserialize<T>() and their async counterparts. Serialization is possible to/from a Stream, a regular string, or a Utf8JsonWriter/Utf8JsonReader. The methods have a large number of overloads for different needs.
    The most important (but optional) step of serialization is creating a JsonSerializerOptions instance that is passed to the above methods to configure the whole process. So every time you see JsonSerializerOptions mentioned in this material, I mean a construct like:

    JsonSerializerOptions options = new()
    {   // In 99% of cases we configure the options via object-initializer syntax
        WriteIndented = true,
        IncludeFields = true,
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
        // And the other JsonSerializerOptions properties that I mention after the dot
    };
    
    string serialized = JsonSerializer.Serialize<MyData>(myobj, options);
    

    The naming policy and the names themselves in the output JSON are customized with JsonSerializerOptions.PropertyNamingPolicy and the [JsonPropertyName("")] attribute placed before the property/field.
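
    A minimal sketch of both mechanisms together, assuming a hypothetical Product type: the naming policy converts DisplayName to camelCase, while the attribute on Price overrides the name outright.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type for illustration.
public class Product
{
    public string DisplayName { get; set; } = "Lamp";

    [JsonPropertyName("price_usd")] // An explicit name wins over the naming policy
    public decimal Price { get; set; } = 19.5m;
}

public static class NamingDemo
{
    public static string ToJson()
    {
        var options = new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
        };
        return JsonSerializer.Serialize(new Product(), options);
    }

    public static void Main() => Console.WriteLine(ToJson());
}
```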

    By default JsonSerializer serializes only public properties (if their type supports serialization). Properties can be excluded with [JsonIgnore] (optionally with a condition for ignoring) or by configuring JsonSerializerOptions.DefaultIgnoreCondition (and also JsonSerializerOptions.IgnoreReadOnlyProperties). Fields and properties with non-public accessors can be included in the JSON via the [JsonInclude] attribute, and all public fields via JsonSerializerOptions.IncludeFields.
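
    The sketch below shows these switches on a hypothetical Account type: one property is always ignored, one is ignored only when null, one public field is opted in with [JsonInclude], and another public field is left out because fields are skipped by default.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type for illustration.
public class Account
{
    public string Login { get; set; } = "jim";

    [JsonIgnore] // Never written to JSON
    public string Password { get; set; } = "secret";

    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)] // Skipped while null
    public string? Email { get; set; }

    [JsonInclude] // Public field explicitly opted in
    public int LoginCount = 3;

    public int NotSerialized = 7; // Public field, skipped by default
}

public static class IgnoreDemo
{
    public static string ToJson() => JsonSerializer.Serialize(new Account());

    public static void Main() => Console.WriteLine(ToJson());
}
```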

    If the JSON contains elements that cannot be mapped to any property of the target type, we can declare a Dictionary<string, JsonElement> property and mark it with the [JsonExtensionData] attribute. The "lost" JSON properties will be written there, and we read them using the JsonElement type from the DOM model, which we will touch on later.
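
    A small sketch of extension data, assuming a hypothetical Person type and an input document with two properties the type does not declare:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type: only Name is declared, everything else lands in Extra.
public class Person
{
    public string Name { get; set; } = "";

    [JsonExtensionData]
    public Dictionary<string, JsonElement> Extra { get; set; } = new();
}

public static class ExtensionDataDemo
{
    public static Person Read() =>
        JsonSerializer.Deserialize<Person>("{\"Name\":\"Pam\",\"Age\":10,\"City\":\"Scranton\"}")!;

    public static void Main()
    {
        Person person = Read();
        Console.WriteLine(person.Name);
        foreach (var pair in person.Extra) // The unmapped properties: Age and City
        {
            Console.WriteLine($"{pair.Key} = {pair.Value}");
        }
    }
}
```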

    Supported collection types are listed in the Microsoft documentation. Generally speaking, anything that implements IEnumerable is supported for serialization (it is simply iterated over and written to JSON). For deserialization, the type must implement one of the interfaces listed in that article: these are the common collection interfaces, such as ICollection<T>, that provide a way to add elements. There are nuances with Dictionary<K,V> and Stack<T>. In the case of a stack, due to its semantics, the values come back in reverse order after deserialization (i.e. a serialized stack 3 2 1 0 will be deserialized into 0 1 2 3). In the case of a dictionary, the set of key types is limited to simple serializable types (in other words, those that can be written as a property name in "name":value), because a dictionary is serialized as a JSON object, not an array. If we look at the JSON representation of a dictionary:

    { // Dictionary<string, int> looks like this
      "First": 1,
      "Second": 2,
      "Third": 3
    }
    

    then we will see that the string key is written as the name of the JSON property, and the int value is written as its value.

    To serialize keys of a more complex type, you may need to implement your own JsonConverter, which we will look at later.
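
    The stack reversal and the "key becomes property name" rule can be checked with a short sketch (the class and method names are made up for the example):

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class CollectionsDemo
{
    // A stack is enumerated from the top, so pushing 0..3 serializes as [3,2,1,0].
    public static string SerializeStack()
    {
        var stack = new Stack<int>();
        for (int i = 0; i <= 3; i++) stack.Push(i);
        return JsonSerializer.Serialize(stack);
    }

    // On the way back the values are pushed in reading order, so the stack comes out reversed.
    public static string RoundTripStack()
    {
        Stack<int> restored = JsonSerializer.Deserialize<Stack<int>>(SerializeStack())!;
        return string.Join(",", restored);
    }

    // A simple non-string key is written as a JSON property name.
    public static string SerializeDictionary() =>
        JsonSerializer.Serialize(new Dictionary<int, string> { [42] = "answer" });

    public static void Main()
    {
        Console.WriteLine(SerializeStack());      // [3,2,1,0]
        Console.WriteLine(RoundTripStack());      // 0,1,2,3
        Console.WriteLine(SerializeDictionary()); // {"42":"answer"}
    }
}
```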

    Immutable types and populating

    Deserialization into mutable types with public get/set properties is simple. Put very roughly, the serializer finds the member of the object whose name matches the name of the JSON property and writes the data to it (in fact, on first access metadata is generated via reflection, but that is not important now).
    However, deserialization into immutable types, or rather into their get-only/readonly data, is also possible by using a parameterized constructor. If there is more than one constructor, or a parameterless constructor is present, the constructor to use for deserialization must be marked with [JsonConstructor]. All constructor parameter names must match the names of the serialized fields/properties, case-insensitively. For example, the constructor parameter may be named simpleNumber while the field/property of the object is named SimpleNumber/SIMPLENUMBER/simpleNumber; [JsonPropertyName] has no effect on this matching. Also note that the types of the constructor parameters must match the types of the corresponding properties of the object.

    public class ReadonlyData
    {
      [JsonInclude]
      public readonly int ID;
      public string Name { get; }
      public DateOnly BirthDate { get; init; }
      public IReadOnlyCollection<int> Numbers { get; } // IReadOnlyCollection is here to fool you. It is unrelated to the topic.
    
      [JsonConstructor]
      public ReadonlyData(int id, string name, DateOnly birthDate, IReadOnlyCollection<int> numbers)
      {
        ID = id;
        Name = name;
        BirthDate = birthDate;
        Numbers = numbers;
      }
    }
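
    For a smaller, self-contained sketch of the same rule, assume a hypothetical Money type with two constructors; the attribute tells the serializer which one to call, and the parameter names match the property names case-insensitively.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical immutable type with get-only properties.
public class Money
{
    public string Currency { get; }
    public decimal Amount { get; }

    public Money(string currency) : this(currency, 0m) { }

    [JsonConstructor] // Two constructors exist, so the one for deserialization is marked
    public Money(string currency, decimal amount)
    {
        Currency = currency;
        Amount = amount;
    }
}

public static class ImmutableDemo
{
    public static Money Read() =>
        JsonSerializer.Deserialize<Money>("{\"Currency\":\"EUR\",\"Amount\":12.5}")!;

    public static void Main()
    {
        Money money = Read();
        Console.WriteLine($"{money.Amount} {money.Currency}");
    }
}
```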
    

    In addition to deserializing into immutable types, JsonSerializer can also populate already initialized values. Normally, the serializer creates a new object for each JSON property and then assigns a reference to it to the C# property. However, if a C# property/field is already initialized, for example

    [JsonObjectCreationHandling(JsonObjectCreationHandling.Populate)] // At the class level
    class A
    {
      [JsonObjectCreationHandling(JsonObjectCreationHandling.Populate)] // or here
      public List<int> Numbers { get; } = [1, 2, 3];
    }
    

    then, to populate the existing list rather than re-create it, we can use the attribute [JsonObjectCreationHandling(JsonObjectCreationHandling.Populate)]. The serializer takes a reference to the list via get and fills it with the deserialized values via Add(). By the way, this attribute helps with get-only/readonly properties of reference types; for value types (struct), however, a set accessor must be present, since instead of a reference to an object on the heap the serializer gets a copy of the structure via get, fills it, and then has to write it back via set.
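
    A short sketch of populating, assuming .NET 8 or later (where the attribute is available) and a hypothetical Settings type: the values from JSON are appended to the list that already exists.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type with a pre-initialized, get-only list.
public class Settings
{
    [JsonObjectCreationHandling(JsonObjectCreationHandling.Populate)]
    public List<int> Numbers { get; } = new() { 1, 2, 3 };
}

public static class PopulateDemo
{
    public static string Read()
    {
        Settings settings = JsonSerializer.Deserialize<Settings>("{\"Numbers\":[4,5]}")!;
        return string.Join(",", settings.Numbers);
    }

    public static void Main() => Console.WriteLine(Read()); // 1,2,3,4,5
}
```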

    Handling references. $id and $ref

    The objects being serialized may be complex in structure and point to each other. Moreover, the same object may appear multiple times in the serialized data. For example, suppose we have an employee class that has a property pointing to its boss:

    public class Employee
    {
        public string Name { get; set; }
        public Employee? Boss { get; set; }
    }
    

    and let's assume we have a class Corporate containing a reference to the director and a full list of employees (which includes the director). When creating a company, we build a graph of employees where the CEO has Boss = null, two managers have Boss = CEO, and so on.

    public class Corporate
    {
      public Employee CEO { get; set; }
    
      public List<Employee> Employees { get; set; } // Employees pointing to their bosses go here, along with the bosses themselves
    }
    

    If we have 1 director and 2 employees pointing to the director, then after ordinary serialization and deserialization we get 4 director objects: 1 in the property Corporate.CEO, 1 in the list Corporate.Employees, and one more for each employee in the property Employee.Boss:

    {
      "CEO": { // CEO 1
        "Name": "Jim Root",
        "Boss": null
      },
      "Employees": 
      [
        {
          "Name": "Jim Root", // CEO 2. The director is also in the employee list.
          "Boss": null
        },
        {
          "Name": "Jason Tward",
          "Boss": {
            "Name": "Jim Root", // CEO 3
            "Boss": null
          }
        },
        {
          "Name": "Alex Stein",
          "Boss": {
            "Name": "Jim Root", // CEO 4. In total, after deserialization we get 4 (!) identical objects
            "Boss": null
          }
        }
      ]
    }
    

    However, we can set the reference handler predefined by .NET, ReferenceHandler.Preserve, in the JsonSerializerOptions configuration:

    JsonSerializerOptions options = new()
    {
        // Preserve does NOT work with readonly fields and properties and does NOT work with immutable types.
        ReferenceHandler = ReferenceHandler.Preserve,
    };
    

    After this, each object in the output JSON gets a meta-property $id, whose value is a sequence number written as a string. It acts as a key for each entry in the JSON. In turn, for objects that refer to the director (in our case, employees), "Boss" will contain, instead of a full (re)serialization of the director, the meta-property $ref holding the director's $id:

    {
      "$id": "1", // The root object gets an $id as well
      "CEO": {
        "$id": "2", // The key used to point at our director
        "Name": "Jim Root",
        "Boss": null
      },
      "Employees": 
      {
        "$id": "3", // Every serialized entry gets an $id, including the list. These keys are of no interest to us.
        "$values": 
        [
          {
            "$ref": "2" // Reference to the CEO, since he is also in the employee list
          },
          {
            "$id": "4",
            "Name": "Jason Tward",
            "Boss": {
              "$ref": "2" // Reference to the CEO
            }
          },
          {
            "$id": "5",
            "Name": "Alex Stein",
            "Boss": {
              "$ref": "2" // One more
            }
          }
        ]
      }
    }
    

    It works in a fairly simple way: ReferenceHandler holds a link to a strategy class, ReferenceResolver, which keeps a dictionary inside. The serializer reads the director's JSON, finds the meta-property $id, deserializes the object, and places it in that dictionary under the key equal to its $id. Then, on encountering $ref in an employee, it fetches the director object from the dictionary by the key given in $ref and assigns a reference to that object to the deserialized employee's Boss property.

    By the way, precisely because of the reading order it is extremely important that referenced objects come before the objects that refer to them; we can apply the [JsonPropertyOrder] attribute to control the order. Tip: in the absence of other configuration, fields included in serialization via, for example, [JsonInclude] always come after the properties.
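
    The effect can be verified with a round trip. The sketch below redefines the Employee and Corporate types from this section in a self-contained form (the initializers and the CeoIsShared helper are additions for the example) and checks whether the restored graph shares a single CEO instance.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public class Employee
{
    public string Name { get; set; } = "";
    public Employee? Boss { get; set; }
}

public class Corporate
{
    public Employee? CEO { get; set; }
    public List<Employee> Employees { get; set; } = new();
}

public static class ReferenceDemo
{
    // Serializes and deserializes a small company, then reports whether
    // all three places that mention the CEO point at the same object.
    public static bool CeoIsShared(bool preserve)
    {
        var ceo = new Employee { Name = "Jim Root" };
        var corp = new Corporate
        {
            CEO = ceo,
            Employees = { ceo, new Employee { Name = "Jason Tward", Boss = ceo } }
        };

        var options = new JsonSerializerOptions();
        if (preserve) options.ReferenceHandler = ReferenceHandler.Preserve;

        string json = JsonSerializer.Serialize(corp, options);
        Corporate restored = JsonSerializer.Deserialize<Corporate>(json, options)!;

        return ReferenceEquals(restored.CEO, restored.Employees[0])
            && ReferenceEquals(restored.CEO, restored.Employees[1].Boss);
    }

    public static void Main()
    {
        Console.WriteLine(CeoIsShared(preserve: false)); // False: separate copies
        Console.WriteLine(CeoIsShared(preserve: true));  // True: one shared instance
    }
}
```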

    Polymorphism

    Polymorphism is one of the problems of serialization. If a reference of a base type actually points to an object of a derived type, the original state of the object cannot be restored without additional tools: we only know the base type, and the type of the object that was actually serialized remains unknown. For such situations there is the attribute [JsonDerivedType(typeof(DerivedTypeName))]. It is placed on the base class, and its argument names a derived type. In effect we tell the serializer: "If you deserialize an object into a reference of this type, you may be working with an object of one of the derived types listed in its attributes." The attribute can be applied multiple times if several derived types require polymorphic deserialization.

    Just to clarify: when serializing, the serializer relies on the object in memory. Any of your objects serialized via a reference of type object will be serialized as its actual type, since the serializer internally relies on reflection. Deserialization is more interesting, since the serializer has to create that very object and can only rely on the type argument of JsonSerializer.Deserialize<T>(). To support polymorphism, a meta-property $type is added to the JSON representation of the serialized object. It contains the discriminator (key) of the type whose instance needs to be created. The discriminator is the typeDiscriminator argument of [JsonDerivedType(typeof(DerivedTypeName), typeDiscriminator)], to which we can pass an int or a string. If it is omitted, the derived type is still serialized with its own properties, but no $type is written, so it cannot be deserialized back into the derived type.

    [JsonDerivedType(typeof(Developer), "developer")] // "developer" is our discriminator
    [JsonPolymorphic(UnknownDerivedTypeHandling = JsonUnknownDerivedTypeHandling.FallBackToNearestAncestor)] // More on this later
    public class Employee
    {
      // ...
    }
    

    So the output JSON will have a new entry:

    {
      "$type": "developer",
      // Other data
    }
    

    By matching the value of the $type meta-property against one of the discriminators in the JsonDerivedType attributes, the serializer creates an instance of the specified type and deserializes the data into it.

    By the way, for polymorphic serialization and deserialization, the type argument of JsonSerializer.Serialize<T>()/JsonSerializer.Deserialize<T>() must be the base type, that is, the type marked with the [JsonDerivedType] attributes.

    The C# example above also mentioned the [JsonPolymorphic] attribute. It is used for additional configuration of polymorphic serialization. In the example, [JsonPolymorphic(UnknownDerivedTypeHandling = JsonUnknownDerivedTypeHandling.FallBackToNearestAncestor)] says that if the serializer meets a runtime type that is not listed in [JsonDerivedType], it treats the object as an instance of the nearest known (listed) ancestor.
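
    A round-trip sketch of the discriminator mechanism, assuming .NET 7 or later. The Developer type and its Language property are hypothetical; note that both calls use the base type Employee as the type argument.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

[JsonDerivedType(typeof(Developer), "developer")]
public class Employee
{
    public string Name { get; set; } = "";
}

// Hypothetical derived type for illustration.
public class Developer : Employee
{
    public string Language { get; set; } = "";
}

public static class PolymorphismDemo
{
    public static string ToJson()
    {
        Employee employee = new Developer { Name = "Ann", Language = "CSharp" };
        return JsonSerializer.Serialize<Employee>(employee); // Writes "$type":"developer"
    }

    public static Employee FromJson() =>
        JsonSerializer.Deserialize<Employee>(ToJson())!; // Creates a Developer via $type

    public static void Main()
    {
        Console.WriteLine(ToJson());
        Console.WriteLine(FromJson().GetType().Name);
    }
}
```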

    JSON Document Object Model

    JSON DOM (Document Object Model) is a model for representing the structure of a JSON document in memory as an object or tree of objects that can be worked with programmatically. We can deserialize any object into a structure that can be worked with without having a specific type.

    The main types for working with the DOM are JsonNode and JsonDocument. The first allows working with data and changing it, the second is immutable. JsonNode is based on indexer syntax

    { // Source
      "ID": 1,
      "Name": "Jim Carry",
      "Sales":
      [
        1,2,3,4,5
      ]
    }
    
    JsonNode node = JsonNode.Parse(stream)!;
    JsonNode idJsonValue = node["ID"]!;
    int ID = idJsonValue.Deserialize<int>();
    Console.WriteLine(ID);
    

    and represents the data as 3 entities: JsonValue, a primitive value such as the 1 in "id":1; JsonObject, a composite object (enclosed in curly braces); and JsonArray, which represents arrays. All three types inherit from JsonNode, which lets you use its interface, in particular the indexer, to go deeper into the tree of objects.
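
    Since JsonNode is the mutable half of the DOM, here is a minimal sketch of changing a document in place; the input string mirrors the sample above, and the replacement values are made up.

```csharp
using System;
using System.Text.Json.Nodes;

public static class JsonNodeDemo
{
    public static string Edit()
    {
        JsonNode node = JsonNode.Parse("{\"ID\":1,\"Name\":\"Jim Carry\",\"Sales\":[1,2,3]}")!;

        node["Name"] = "Pam Beasley";      // Replace a value through the indexer
        node["Sales"]!.AsArray().Add(4);   // Append an element to the JSON array

        return node.ToJsonString();        // Serialize the modified tree back to text
    }

    public static int ReadId() =>
        JsonNode.Parse("{\"ID\":1}")!["ID"]!.GetValue<int>();

    public static void Main()
    {
        Console.WriteLine(Edit());
        Console.WriteLine(ReadId());
    }
}
```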

    JsonDocument is an immutable implementation for working with the DOM. It is also essentially a graph of objects from the parsed JSON and lets you access nested properties, iterate over them, and so on. Work with JsonDocument is done through the JsonElement type and begins with JsonDocument.RootElement:

    { // Source
      "ID": 1,
      "Name": "Jim Carry",
      "Sales":
      [
        1,2,3,4,5
      ]
    }
    
    using JsonDocument document = JsonDocument.Parse(stream); // IDisposable, hence using
    JsonElement root = document.RootElement; // The root element
    foreach (var obj in root.EnumerateObject()) // Iterate over all entries of the root object
    {
        Console.WriteLine(obj);
    }
    
    JsonElement sales = root.GetProperty("Sales"); // Get the array from the root, then iterate over its elements.
    foreach (JsonElement sale in sales.EnumerateArray())
    {
        Console.Write(sale.GetInt32() + ", ");
    }
    

    UTF8 Json Writer/Reader

    Utf8JsonWriter / Utf8JsonReader are types that write and read JSON at the lowest level. All serialization and deserialization comes down to using them. Brief examples of their work:

    Utf8JsonWriter provides an API for writing objects, properties, and arrays step by step. It is quite easy to use and from a distance resembles a regular StreamWriter. If you take a piece of paper and a pen, sit down to serialize an object by hand, and say every action out loud, you will turn into Utf8JsonWriter:

    using MemoryStream ms = new(); // Create the stream we will write to
    JsonWriterOptions options = new() { Indented = true }; // Writer options (here: pretty-printing)
    using Utf8JsonWriter writer = new(ms, options); // The writer is IDisposable too
    
    writer.WriteStartObject(); // Start writing the serialized object itself
    writer.WriteStartObject("Employee"); // Start writing a JSON object. Employee is the property name in the output JSON, i.e. "Employee":{value}
    
    // Tip: encode the UTF-16 .NET string to UTF-8 yourself via JsonEncodedText for performance.
    JsonEncodedText name = JsonEncodedText.Encode("Jim Carry");
    writer.WriteString("Name", name); // Output JSON: "Name": "Jim Carry"
    
    writer.WriteNull("Null"); // "Null": null
    writer.WriteNumber("ID", 10); // "ID": 10
    
    writer.WriteStartArray("Values"); // Start writing an array inside the JSON object
    writer.WriteNumberValue(10);
    writer.WriteNumberValue(20);
    writer.WriteEndArray(); // Close the array
    
    writer.WriteEndObject(); // Close the Employee JSON object
    writer.WriteEndObject(); // Close the serialized object itself
    writer.Flush(); // Flush what remains in the writer's buffer to the target stream ms
    

    Utf8JsonReader is heavily optimized: it is a ref struct that reads the JSON payload (content) token by token. A token can be the start of an object, a property name, a property value, the end of an object, the start of an array, etc.

    JsonReaderOptions options = new() // Reader options for handling comments and trailing commas
    {
        CommentHandling = JsonCommentHandling.Skip,
        AllowTrailingCommas = true
    };
    
    ReadOnlySpan<byte> jsonBytes = Encoding.UTF8.GetBytes(json); // Quick (and unoptimized) way to get the bytes
    Utf8JsonReader reader = new(jsonBytes, options);
    
    while (reader.Read()) // Until the buffer runs out
    {
        switch (reader.TokenType) // The reader consumes all bytes of one token, then we check what kind of token it is
        {
            case JsonTokenType.StartArray: // Start of an array: the following tokens are its elements
                while (reader.Read() && reader.TokenType != JsonTokenType.EndArray) // Until we reach the end of the array
                {
                    Console.Write(reader.GetInt32() + ", "); // Read the array element and write it to the console
                }
                break; // C# does not allow falling through to the next case
        }
    }
    

    Implementation of JsonConverter

    If a type for some reason requires custom serialization rules, the solution may be to implement your own JsonConverter<T> and add it to the JsonSerializerOptions.Converters collection. Converters come in two kinds: basic (JsonConverter<T>) and factory (JsonConverterFactory, which creates instances of basic JsonConverter<T>).

    With a basic converter everything is simple. First it checks whether it can handle the given type: starting from the root object, the serializer offers every serializable value to the converters, and each one reports through CanConvert(Type typeToConvert) whether it can convert objects of that type. If it can, the serializer calls the methods you overrode: void Write(Utf8JsonWriter writer, T value, JsonSerializerOptions options); or T? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options);. As you can see, the converter receives a Utf8JsonWriter or Utf8JsonReader, through which you write or read objects, as discussed earlier.

    Implementation of Basic Converter
    public class Employee // Type to serialize
    {
        public string Name { get; set; }
        public int ID { get; }
        public readonly DateTime dateOfBirth;
    
        public Employee(string name, int iD, DateTime dateOfBirth)
        {
            Name = name;
            ID = iD;
            this.dateOfBirth = dateOfBirth;
        }
    }
    
    public class EmployeeJsonConverter : JsonConverter<Employee>
    {
      public override void Write(Utf8JsonWriter writer, Employee value, JsonSerializerOptions options)
      {
        // The write algorithm is deliberately primitive: just write all properties and fields of the object separated by hyphens
        string data = $"{value.Name} - {value.ID} - {value.dateOfBirth.ToString(CultureInfo.InvariantCulture)}";
        writer.WriteStringValue(data);
      }
    
      public override Employee? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
      {
        // Variables that will later be passed to the constructor
        string Name = string.Empty;
        int ID = 0;
        DateTime dateOfBirth = DateTime.MinValue;
    
        // Our write algorithm stores the 3 members as one hyphen-separated string, so we read a single string.
        string? data = reader.GetString();
        if (data != null)
        {
          // Split the data from the JSON
          var splitted = data.Split('-', StringSplitOptions.TrimEntries);
          
          // And parse the parts
          Name = splitted[0];
          ID = int.Parse(splitted[1]);
          dateOfBirth = DateTime.Parse(splitted[2], CultureInfo.InvariantCulture);
        }
    
        return new Employee(Name, ID, dateOfBirth);
      }
    }
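
    To show how a basic converter is plugged in through JsonSerializerOptions.Converters, here is a minimal self-contained sketch. Point and PointJsonConverter are hypothetical names introduced only for this example; the converter stores a point as a single "x,y" string, in the same spirit as the Employee converter above.

```csharp
using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

// Register the converter; the serializer will pick it up for every Point it meets.
JsonSerializerOptions options = new();
options.Converters.Add(new PointJsonConverter());

string json = JsonSerializer.Serialize(new Point(3, 4), options);
Point restored = JsonSerializer.Deserialize<Point>(json, options);

Console.WriteLine(json);
Console.WriteLine(restored.X + restored.Y);

// Hypothetical type used only for this sketch.
public readonly record struct Point(int X, int Y);

// Basic converter: writes a Point as one string, "x,y", and parses it back.
public sealed class PointJsonConverter : JsonConverter<Point>
{
    public override void Write(Utf8JsonWriter writer, Point value, JsonSerializerOptions options)
    {
        writer.WriteStringValue(FormattableString.Invariant($"{value.X},{value.Y}"));
    }

    public override Point Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        string[] parts = reader.GetString()!.Split(',');
        return new Point(
            int.Parse(parts[0], CultureInfo.InvariantCulture),
            int.Parse(parts[1], CultureInfo.InvariantCulture));
    }
}
```

    The same options instance has to be passed to both Serialize and Deserialize, otherwise the default contract for the type is used instead of the converter.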
    

    Factory converters are more complicated. Their main purpose is to create the basic converters mentioned earlier, so we use a factory when conversion is needed for a type that is not fully known in advance and therefore cannot be covered by a single hand-written converter. The most common case is open generic types. For example, say we have a class DictionaryKey that is used as a dictionary key. Since it is a composite type with fields and properties, it cannot be used as a key out of the box. This means we need a converter for Dictionary<DictionaryKey, TValue>, where the second generic argument is left unspecified. The generic type stays open, so we do not have to write a separate basic converter for every value type we plan to use in the dictionary.

    Once the factory converter is sure it is dealing with the right type, it creates an instance of the basic converter (also written by us), typically through System.Activator together with reflection (a common technique for instantiating objects at runtime). So, to work with factory converters, we also need to define a regular converter that knows how to read and write our DictionaryKey.

    public override bool CanConvert(Type typeToConvert)
    {
        return typeToConvert.IsGenericType && // If the type is generic
        typeToConvert.GetGenericTypeDefinition() == typeof(Dictionary<,>) &&  // And it is a dictionary
        typeToConvert.GetGenericArguments()[0] == typeof(DictionaryKey); // And the first (known) argument is DictionaryKey, then I can create an instance of the regular converter for you.
    }
    
    public override JsonConverter? CreateConverter(Type typeToConvert, JsonSerializerOptions options)
    {
        var genArgs = typeToConvert.GetGenericArguments(); // Get all generic arguments of the dictionary
        Type valueType = genArgs[1]; // The second argument, here the dictionary's value type, about which we know nothing
    
        // Create an instance of the predefined basic converter, DictionaryKeyJsonConverter<TValue>, closing it over the second generic argument, i.e. valueType. Why only the second one? Because the converter already knows the key type, DictionaryKey; that is the whole reason it exists. The converter itself is shown below.
    
        return 
            (JsonConverter)Activator.CreateInstance(typeof(DictionaryKeyJsonConverter<>).MakeGenericType(valueType), // Close the converter over the dictionary's value type
            BindingFlags.Instance | BindingFlags.Public,
            binder: null,
            args: [options], // The converter's constructor expects JsonSerializerOptions
            culture: null)!;
    }
    
    Implementing the basic, factory-created converter for DictionaryKey

    Let's imagine we have a simple class DictionaryKey. Its conversion into a dictionary key will be deliberately amateurish: we simply join the field values with hyphens. In production code, of course, nobody would lean on ToString() for this.

    public class DictionaryKey
    {
      // Fields instead of properties, just because. It makes no difference.
      [JsonInclude]
      private int first;
      [JsonInclude]
      private float second;
      [JsonInclude]
      private string name;
    
    
      [JsonConstructor]
      public DictionaryKey(int first, float second, string name)
      {
          this.first = first;
          this.second = second;
          this.name = name;
      }
    
    
      public override string ToString()
      {
          return $"{first}-{second}-{name}";
      }
    }
    

    Now we implement the converter that the above factory generates through Activator.

    public class DictionaryKeyJsonConverter<TValue> : JsonConverter<Dictionary<DictionaryKey, TValue>>
    {
        private JsonConverter<TValue> valueConverter; // Converter for TValue, the value type of our dictionary.
    
        
        public DictionaryKeyJsonConverter(JsonSerializerOptions options)
        {
            valueConverter = (JsonConverter<TValue>)options.GetConverter(typeof(TValue)); // We get it from the serializer configuration. If no converter for this type was registered there, the default conversion is used, the same one we have been dealing with all along.
        }
    
    
        public override void Write(Utf8JsonWriter writer, Dictionary<DictionaryKey, TValue> dict, JsonSerializerOptions options)
        {
            writer.WriteStartObject(); // Start writing the object, which is really our dictionary.
    
            foreach ((DictionaryKey key, TValue value) in dict)
            {
                string propertyName = key.ToString()!; // To an observer our key looks like a JSON property name, i.e. "key": value, where key is the DictionaryKey packed into a string via ToString(), to keep the example simple.
    
                // Respecting the naming policy that may have been passed in JsonSerializerOptions, write our DictionaryKey as the property name.
                writer.WritePropertyName(options.PropertyNamingPolicy?.ConvertName(propertyName) ?? propertyName);
    
                valueConverter.Write(writer, value, options); // And write the value through the TValue converter obtained in the constructor.
            }
    
            writer.WriteEndObject(); // The dictionary has been written as an object.
        }
    
    
        public override Dictionary<DictionaryKey, TValue>? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        {
            // Our dictionary is a JSON object, so it must start with this token.
            if (reader.TokenType != JsonTokenType.StartObject)
            {
                throw new JsonException();
            }
    
            Dictionary<DictionaryKey, TValue> result = new();
            
            while(reader.Read())
            {
                if (reader.TokenType == JsonTokenType.EndObject) // Finished reading the dictionary
                {
                    return result;
                }
    
                // A dictionary entry always starts with a PropertyName, into which we packed the DictionaryKey
                if (reader.TokenType != JsonTokenType.PropertyName)
                {
                    throw new JsonException();
                }
    
                string propertyName = reader.GetString()!; // Read the property name
                // We implemented ToString() by joining all members of DictionaryKey with hyphens, so split them back apart.
                string[] data = propertyName.Split('-', StringSplitOptions.RemoveEmptyEntries);
    
                // DictionaryKey consists of 3 members: int, float, string.
                DictionaryKey key = new(int.Parse(data[0]), float.Parse(data[1]), data[2]);
    
                // The reader works token by token, so calling Read moves it from the property name token (our DictionaryKey) to its value, which is the TValue of our <DictionaryKey, TValue> pair
                reader.Read();
    
                // Delegate reading the value to the TValue converter
                TValue value = valueConverter.Read(ref reader, typeof(TValue), options)!;
    
                result.Add(key, value); // Add the pair to the dictionary
            }
    
    
            throw new JsonException(); // We should never get here: it means the JsonTokenType.EndObject that triggers the return was missed.
        }
    
    }
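
    To see the factory pattern assembled end to end in one compilable piece, here is a reduced sketch of the same technique. It does not use the article's types: Box<T>, BoxConverterFactory and BoxConverter<T> are hypothetical names invented for this example, and the converter simply writes the wrapped value as is.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Only the factory is registered; it creates closed converters on demand.
JsonSerializerOptions options = new();
options.Converters.Add(new BoxConverterFactory());

string json = JsonSerializer.Serialize(new Box<int>(42), options);
Box<int>? restored = JsonSerializer.Deserialize<Box<int>>(json, options);

Console.WriteLine(json);
Console.WriteLine(restored!.Value);

// Hypothetical generic wrapper, introduced only for this sketch.
public sealed class Box<T>
{
    public Box(T value) => Value = value;
    public T Value { get; }
}

// Factory: claims every closed Box<T> and builds the matching BoxConverter<T>.
public sealed class BoxConverterFactory : JsonConverterFactory
{
    public override bool CanConvert(Type typeToConvert) =>
        typeToConvert.IsGenericType &&
        typeToConvert.GetGenericTypeDefinition() == typeof(Box<>);

    public override JsonConverter? CreateConverter(Type typeToConvert, JsonSerializerOptions options)
    {
        Type valueType = typeToConvert.GetGenericArguments()[0];
        Type converterType = typeof(BoxConverter<>).MakeGenericType(valueType);
        return (JsonConverter?)Activator.CreateInstance(converterType);
    }
}

// Basic converter created by the factory: a Box<T> is written as its bare inner value.
public sealed class BoxConverter<T> : JsonConverter<Box<T>>
{
    public override void Write(Utf8JsonWriter writer, Box<T> value, JsonSerializerOptions options) =>
        JsonSerializer.Serialize(writer, value.Value, options);

    public override Box<T>? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        new Box<T>(JsonSerializer.Deserialize<T>(ref reader, options)!);
}
```

    The article's DictionaryKey factory follows the same shape; it only adds a constructor argument for the created converter and a stricter CanConvert check.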
    

    Contracts

    Every serializable .NET type requires what is called a contract. The contract specifies whether to include fields, how to write property names in JSON, which properties to ignore, which converter to use for which property, and so on. We typically customize contracts with JSON attributes, by passing a JsonSerializerOptions instance to the serializer, by creating our own converters, etc. However, there is also a way to customize contracts at a higher level.
    To customize the contract, we set JsonSerializerOptions.TypeInfoResolver either to our own implementation or to an instance of the predefined DefaultJsonTypeInfoResolver type. The convenience of the second approach is that we do not need to inherit anything or implement complex logic, as we did with converters: working with DefaultJsonTypeInfoResolver comes down to adding delegates to its Modifiers collection.
    This collection accepts any delegate with the signature Action<JsonTypeInfo>. In other words, we can pass in methods (essentially hooks that intercept contract creation) that return nothing and accept a JsonTypeInfo object. Such an object is produced for every type the serializer plans to include in the JSON, from the root object down to the nested ones.
    In short, JsonTypeInfo is a bit like MemberInfo from reflection: through it we can get the list of the type's properties and fields, find out whether the type will be serialized as a JSON object, a simple value or an array, access its C# type, access its converter if one is defined, and so on.

    Simple examples of contract customization:

    public static void IgnorePasswords(JsonTypeInfo typeInfo)
    {
        // Walk the properties backwards so that RemoveAt does not shift the items we have not visited yet
        for (int i = typeInfo.Properties.Count - 1; i >= 0; i--)
        {
            if (typeInfo.Properties[i].PropertyType == typeof(Password)) // If the property is a password (Password is some class of ours)
            {
                typeInfo.Properties.RemoveAt(i); // Remove it from the property list, so objects of type Password will not appear in the output JSON.
            }
        }
    }
    

    Or, for example, we can use reflection inside a contract modifier to include all fields in serialization (although for public fields, setting JsonSerializerOptions.IncludeFields would of course be much simpler):

    public static void IncludeFieldsModifier(JsonTypeInfo typeInfo)
    {
        if (typeInfo.Kind is not JsonTypeInfoKind.Object) // If this is not a composite JSON object, skip it: arrays and primitive values have no fields
        {
            return;
        }
    
        // Get all fields through reflection, using the .NET type behind typeInfo via its Type property
        foreach (var fieldInfo in typeInfo.Type.GetFields(System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.NonPublic))
        {
            // Create a new JsonPropertyInfo. It describes a JSON property, and the serializer uses it to write a "key":value pair.
            JsonPropertyInfo jsonPropertyInfo = typeInfo.CreateJsonPropertyInfo(fieldInfo.FieldType, fieldInfo.Name);
    
            // We must supply the logic for reading and writing the value of the new JSON property. Here we pass the reflection methods of FieldInfo, since their signatures match the Get/Set delegates
            jsonPropertyInfo.Get = fieldInfo.GetValue; // Getter delegate: reads the field value
            jsonPropertyInfo.Set = fieldInfo.SetValue; // Setter delegate: assigns the field value
    
            typeInfo.Properties.Add(jsonPropertyInfo); // Add the JSON property to the properties of our JSON object
        }
        }
    }
    

    After creating our contract modifiers, we register them in the configuration:

    JsonSerializerOptions options = new()
    {
        TypeInfoResolver = new DefaultJsonTypeInfoResolver()
        {
            Modifiers = { IncludeFieldsModifier } // Add our method to the list of delegates that will be invoked.
        }
    };
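
    For a runnable end-to-end illustration of a modifier at work, here is a small sketch under simplifying assumptions: it filters by JSON property name rather than by type, and it serializes an anonymous object. HidePassword and the property names are hypothetical, chosen only for this example.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization.Metadata;

// Hypothetical modifier: drops any JSON property named "Password" from object contracts.
static void HidePassword(JsonTypeInfo typeInfo)
{
    if (typeInfo.Kind != JsonTypeInfoKind.Object)
    {
        return; // Only object contracts expose a property list.
    }

    // Iterate backwards so that RemoveAt does not skip the next item.
    for (int i = typeInfo.Properties.Count - 1; i >= 0; i--)
    {
        if (typeInfo.Properties[i].Name == "Password")
        {
            typeInfo.Properties.RemoveAt(i);
        }
    }
}

JsonSerializerOptions options = new()
{
    TypeInfoResolver = new DefaultJsonTypeInfoResolver
    {
        Modifiers = { HidePassword }
    }
};

// The modifier runs when the contract for this anonymous type is built.
string json = JsonSerializer.Serialize(new { Login = "jim", Password = "secret" }, options);
Console.WriteLine(json);
```

    Since contracts are cached per options instance, the modifier is expected to run once per type rather than once per serialized object.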
    

    Serialization modes

    By default JsonSerializer works in reflection mode. When a type is serialized for the first time, metadata (contracts) is created that describes how the type should be serialized: which properties and fields it has, their attributes, and so on. This approach satisfies most needs, but in some cases, where optimization is especially important, we may want to avoid collecting metadata at runtime.

    For that, the serializer has a second mode of operation: source generation.
    It is in turn divided into two sub-modes: metadata-based and serialization optimization.

    However, these modes have drawbacks: some serializer features stop working or become limited, depending on the sub-mode and the .NET version, such as reference resolution, deserialization of immutable types, attributes for populating already initialized objects, etc.
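
    As a rough sketch of what the metadata-based sub-mode looks like in code (WeatherReport and AppJsonContext are hypothetical names, and the project is assumed to target a .NET version where the System.Text.Json source generator is available), a context class is declared and its generated type info is passed to the serializer:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// The generated WeatherReport type info replaces contract discovery through reflection.
WeatherReport report = new() { City = "Oslo", TemperatureC = -3 };
string json = JsonSerializer.Serialize(report, AppJsonContext.Default.WeatherReport);
WeatherReport? restored = JsonSerializer.Deserialize(json, AppJsonContext.Default.WeatherReport);

Console.WriteLine(json);
Console.WriteLine(restored!.City);

// Hypothetical type used only for this sketch.
public class WeatherReport
{
    public string City { get; set; } = "";
    public int TemperatureC { get; set; }
}

// The source generator fills in this partial class at compile time.
[JsonSourceGenerationOptions(GenerationMode = JsonSourceGenerationMode.Metadata)]
[JsonSerializable(typeof(WeatherReport))]
public partial class AppJsonContext : JsonSerializerContext
{
}
```

    Switching GenerationMode to JsonSourceGenerationMode.Serialization selects the serialization-optimization sub-mode; with that mode alone, deserialization through the context is not available.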

    Otherwise, this topic is huge, and the pursuit of optimization is often not worth the candle, so I'll just leave a link to the documentation.

    Conclusion

    I hope this article will remain in your bookmarks and will be useful. Write notes, drink tea, bake buns.
