msgspec: a library for serializing and deserializing anything

In everyday tasks, there are many tools for working with different data formats such as JSON, TOML, YAML and others.

msgspec is a tool that works with all of these formats while staying fast and easy to use. A single import covers every format, which makes the library convenient to work with. If you need to parse many different data formats, this library is worth a look.

The library contains:

  • High-performance encoders/decoders for common protocols: JSON, MessagePack, YAML and TOML.

  • Support for a wide range of data types.

  • Fast data validation.

  • Structures that let you represent data (similar to attrs/pydantic, but faster).

First, let's install the library:

 pip3 install "msgspec[toml,yaml]"

Without the extras in brackets, the library is installed with JSON and MessagePack support only.

Serialization/deserialization

To serialize or deserialize data, you choose a format and pass in a value.

import msgspec

a = {"1": 2, "3": 4}
encoded_json_data = msgspec.json.encode(a)
print(encoded_json_data)  # b'{"1":2,"3":4}' (compact output, no spaces)

decoded_data = msgspec.json.decode(encoded_json_data)
print(decoded_data)  # {'1': 2, '3': 4}

yaml_data = msgspec.yaml.encode(a)
print(yaml_data)  # b"'1': 2\n'3': 4\n"

In short, every format follows the same pattern: msgspec.{format}.encode and msgspec.{format}.decode, where format is json, yaml, toml or msgpack.

Validation

For data validation, you create classes that inherit from msgspec.Struct, similar to those in attrs/dataclasses/pydantic.

class User(msgspec.Struct):
    name: str
    surname: str
    email: str | None = None


print(msgspec.json.decode(b'{"name":"vasya","surname":"pupkin"}', type=User))
# User(name='vasya', surname='pupkin', email=None)

print(msgspec.json.decode(b'{"name":"vasya","surname":123}', type=User))
# Raises:
# msgspec.ValidationError: Expected `str`, got `int` - at `$.surname`

The only addition is the type argument, which defines the schema.

Benchmarks

Let's compare the speed of msgspec with the most popular serialization libraries.

To do this, we take large files, run each case a thousand times, and compare the average time.

Input data
tests = [
    # JSON
    {
        "msgspec.json.decode(data)": {"data": json_text},
        "json.loads(data)": {"data": json_text},
    },
    {
        "msgspec.json.encode(data)": {"data": json_data},
        "json.dumps(data)": {"data": json_data},
    },
    {
        "msgspec.json.decode(data)": {"data": json_text},
        "orjson.loads(data)": {"data": json_text},
    },
    {
        "msgspec.json.encode(data)": {"data": json_data},
        "orjson.dumps(data)": {"data": json_data},
    },
    # TOML
    {
        "msgspec.toml.decode(data)": {"data": toml_text},
        "toml.loads(data)": {"data": toml_text},
    },
    {
        "msgspec.toml.decode(data)": {"data": toml_text},
        "tomllib.loads(data)": {"data": toml_text},
    },
    {
        "msgspec.toml.encode(data)": {"data": toml_data},
        "toml.dumps(data)": {"data": toml_data},
    },
    # YAML
    {
        "msgspec.yaml.decode(data)": {"data": yaml_text},
        "yaml.load(data, Loader=yaml.Loader)": {
            "data": yaml_text,
        },
    },
    {
        "msgspec.yaml.decode(data)": {"data": yaml_text},
        "yaml.load(data, Loader=yaml.CLoader)": {
            "data": yaml_text,
        },
    },
    {
        "msgspec.yaml.encode(data)": {"data": yaml_data},
        "yaml.dump(data, Dumper=yaml.Dumper)": {
            "data": yaml_data,
        },
    },
    {
        "msgspec.yaml.encode(data)": {"data": yaml_data},
        "yaml.dump(data, Dumper=yaml.CDumper)": {
            "data": yaml_data,
        },
    },
]

Results (speedup = time of the other library / time of msgspec):

Test                                   Time (ms)   Msgspec (ms)   Speedup
json.loads(data)                           707.8          315.4       2.2
json.dumps(data)                           946.4          118.6       8.0
orjson.loads(data)                         306.2          315.7       1.0
orjson.dumps(data)                          83.9          118.2       0.7
toml.loads(data)                         1,017.9          420.0       2.4
tomllib.loads(data)                        420.4          420.2       1.0
toml.dumps(data)                           154.0          170.3       0.9
yaml.load(data, Loader=yaml.Loader)     16,142.7        1,602.5      10.1
yaml.load(data, Loader=yaml.CLoader)     1,625.4        1,603.1       1.0
yaml.dump(data, Dumper=yaml.Dumper)     10,032.0        1,595.4       6.3
yaml.dump(data, Dumper=yaml.CDumper)     1,600.5        1,594.8       1.0

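The numbers show that msgspec is clearly faster than the standard json module, than toml.loads, and than PyYAML's pure-Python loader and dumper, and roughly on par with tomllib and PyYAML's C implementations. If you use only JSON, it is worth taking a closer look at another library (for example, orjson, which was faster at encoding here), although the difference is not critical. The YAML timings are consistent with msgspec using the standard C implementations under the hood.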

There are also benchmarks from the library's authors that compare the speed of data serialization and validation.

If your code works with several data formats (for example, your program reads a YAML config and serves an API that returns JSON) and is not tied to pydantic, then you should try msgspec. The library also has a nice extra: JSON Schema generation.
