Basics you need to know about JSON Schema

JSON Schema allows you to describe:

  • Data format (numbers, strings, objects, arrays).

  • Acceptable values (minimum, maximum).

  • Required and optional fields.

  • And many other cool things that will save you a lot of time debugging.

Simple JSON Schema

Straight to practice. Let's say you need to describe a JSON document for a user. It will contain a name, an age and an email. Let's start with something simple:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer",
      "minimum": 18
    },
    "email": {
      "type": "string",
      "format": "email"
    }
  },
  "required": ["name", "age", "email"]
}

What's going on here?

  1. $schema is the version of the specification you follow (we use draft-07).

  2. type — indicates the data type. In our case this is an object.

  3. properties – describes the fields, with their types and additional constraints. For example, age must be an integer no smaller than 18.

  4. required – required fields, without which our user has no right to exist.

Yes, in this example we are checking that the age is at least 18 years old. It looks pretty trivial, but there are a lot of possibilities hidden under the hood of JSON Schema, which we'll talk about now.
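To see the schema above in action, here is a minimal sketch. It assumes the Python jsonschema library (installed with pip install jsonschema); any draft-07 validator would do. The sample users and the check_user helper are made up for illustration. One detail worth knowing: in draft-07 the format keyword is only an annotation by default, and jsonschema enforces it only when you pass a FormatChecker.

```python
from jsonschema import Draft7Validator, FormatChecker

user_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 18},
        "email": {"type": "string", "format": "email"},
    },
    "required": ["name", "age", "email"],
}

# Fail early if the schema itself is malformed.
Draft7Validator.check_schema(user_schema)

# Without a format checker, "format" is not enforced.
plain = Draft7Validator(user_schema)
# With one, "email" is checked (the built-in check is deliberately lenient).
strict = Draft7Validator(user_schema, format_checker=FormatChecker())


def check_user(user, validator=strict):
    """Hypothetical helper: return a list of human-readable error messages."""
    return [error.message for error in validator.iter_errors(user)]


# Made-up sample users.
adult = {"name": "Alice", "age": 30, "email": "alice@example.com"}
minor = {"name": "Bob", "age": 16, "email": "bob@example.com"}
no_email = {"name": "Carol", "age": 40}
bad_email = {"name": "Dave", "age": 25, "email": "not-an-address"}

for user in (adult, minor, no_email, bad_email):
    print(user["name"], check_user(user))

# The same bad email passes when no format checker is attached.
print("Dave without format checking:", check_user(bad_email, plain))
```

An empty list means the user passed. If you rely on format in production, check which formats your validator actually asserts.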

Complex schemas

Now let's complicate the task. Let's say you need to describe not just a user, but an array of users. Each user has an ID, name, age and list of addresses. JSON Schema also helps here:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "id": {
        "type": "string"
      },
      "name": {
        "type": "string"
      },
      "age": {
        "type": "integer",
        "minimum": 18
      },
      "addresses": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "street": {
              "type": "string"
            },
            "city": {
              "type": "string"
            },
            "postalCode": {
              "type": "string",
              "pattern": "^[0-9]{5}$"
            }
          },
          "required": ["street", "city", "postalCode"]
        }
      }
    },
    "required": ["id", "name", "age"]
  }
}

There's a lot going on here:

  1. We declared the top-level type as array, where each element is a user object.

  2. The user has a list of addresses, which is also an array.

  3. Each address has a pattern check on the postal code. Here we add a little sweetness – regular expressions (in our case, the postal code must be exactly five digits).

Yes, at first glance everything is simple, but when you need to validate real data (hundreds of users, a bunch of fields, different countries), such little things save the API from collapse.
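With nested data, the useful question is not only "is it valid?" but "where exactly is it broken?". The sketch below assumes the Python jsonschema library and uses made-up users. It rebuilds the schema above (the address part is pulled into its own variable for readability) and prints the path to each failing value.

```python
from jsonschema import Draft7Validator

# Address subschema, extracted into a variable only for readability.
address_schema = {
    "type": "object",
    "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "postalCode": {"type": "string", "pattern": "^[0-9]{5}$"},
    },
    "required": ["street", "city", "postalCode"],
}

users_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "id": {"type": "string"},
            "name": {"type": "string"},
            "age": {"type": "integer", "minimum": 18},
            "addresses": {"type": "array", "items": address_schema},
        },
        "required": ["id", "name", "age"],
    },
}

# Made-up data: the second user has a malformed postal code.
users = [
    {"id": "u1", "name": "Alice", "age": 30,
     "addresses": [{"street": "1 Main St", "city": "Springfield",
                    "postalCode": "12345"}]},
    {"id": "u2", "name": "Bob", "age": 41,
     "addresses": [{"street": "2 Oak Ave", "city": "Shelbyville",
                    "postalCode": "1234A"}]},
]

validator = Draft7Validator(users_schema)

# absolute_path lists the indexes and keys leading to the offending value.
problems = [
    (list(error.absolute_path), error.message)
    for error in validator.iter_errors(users)
]
for path, message in problems:
    print(path, "->", message)
```

The path points at the second user, first address, postalCode field, which is far easier to act on than a bare "invalid input" response.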

AnyOf, AllOf, OneOf

JSON Schema is not only about strict boundaries. It often happens that you don't know exactly what type of data to expect. For example, you can get an object or a string depending on the context. The anyOf, oneOf and allOf keywords help here.

AnyOf: how to say “either this or that”

If you need one of several conditions to be true, use anyOf. For example, I want a field to be either a number or a string:

{
  "type": "object",
  "properties": {
    "price": {
      "anyOf": [
        { "type": "number" },
        { "type": "string" }
      ]
    }
  }
}

The price field can be either a number or a string. This is very useful when an API receives data in different formats and you want to leave some freedom.
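A quick way to get a feel for anyOf is to throw a few values at it. This is a small sketch assuming the Python jsonschema library; the sample values are invented.

```python
from jsonschema import Draft7Validator

price_schema = {
    "type": "object",
    "properties": {
        "price": {
            "anyOf": [
                {"type": "number"},
                {"type": "string"},
            ]
        }
    },
}

validator = Draft7Validator(price_schema)

# is_valid returns True or False instead of raising an exception.
results = {
    "number": validator.is_valid({"price": 19.99}),
    "string": validator.is_valid({"price": "19.99"}),
    "list": validator.is_valid({"price": [19.99]}),
    # "price" is not listed in "required", so leaving it out is fine.
    "missing": validator.is_valid({}),
}
print(results)
```

Note that the string branch accepts any string at all, so "free" would pass too. If that is too loose, add a pattern to the string branch.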

OneOf: strictly one

With oneOf everything is stricter – the value must match exactly one of the listed subschemas:

{
  "type": "object",
  "properties": {
    "discount": {
      "oneOf": [
        { "type": "number", "minimum": 0, "maximum": 100 },
        { "type": "boolean" }
      ]
    }
  }
}

Here the discount field can be either a discount percentage (from 0 to 100) or a Boolean value (for example, discount is on or off).
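The "exactly one" rule has a catch: if two branches overlap, a value that matches both is rejected. The sketch below assumes the Python jsonschema library. The overlapping integer/number schema is an invented example, added only to contrast oneOf with anyOf.

```python
from jsonschema import Draft7Validator

# The discount rule from above, applied directly to a value.
discount = Draft7Validator({
    "oneOf": [
        {"type": "number", "minimum": 0, "maximum": 100},
        {"type": "boolean"},
    ]
})

print(discount.is_valid(15))     # a percentage within range
print(discount.is_valid(True))   # a Boolean flag
print(discount.is_valid(150))    # a number, but out of range
print(discount.is_valid("15%"))  # neither branch accepts a string

# Invented example of overlapping branches: every integer is also a number.
overlapping = Draft7Validator({"oneOf": [{"type": "integer"}, {"type": "number"}]})
relaxed = Draft7Validator({"anyOf": [{"type": "integer"}, {"type": "number"}]})

print(overlapping.is_valid(5))    # matches both branches, so oneOf rejects it
print(overlapping.is_valid(5.5))  # matches only "number"
print(relaxed.is_valid(5))        # anyOf is happy with one or more matches
```

So oneOf works best when the branches are mutually exclusive, as number and boolean are here.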

AllOf: you need everything at once

With allOf, all of the conditions must be met. Useful if there is a complex structure and the data must meet several criteria:

{
  "type": "object",
  "properties": {
    "product": {
      "allOf": [
        { "type": "string" },
        { "minLength": 3 },
        { "pattern": "^[A-Z].*$" }
      ]
    }
  }
}

The product field must be a string, at least 3 characters long, and begin with a capital letter. Yes, such simple requirements sometimes exist in real life.
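For a case this small, the three allOf branches behave the same as writing the keywords side by side in one schema. The sketch below assumes the Python jsonschema library and runs a few invented values through both forms.

```python
from jsonschema import Draft7Validator

# The product rule from above, applied directly to a value.
with_all_of = Draft7Validator({
    "allOf": [
        {"type": "string"},
        {"minLength": 3},
        {"pattern": "^[A-Z].*$"},
    ]
})

# The same constraints written as sibling keywords.
flat = Draft7Validator({
    "type": "string",
    "minLength": 3,
    "pattern": "^[A-Z].*$",
})

samples = ["Tuna", "tuna", "Ab", 42]
all_of_results = [with_all_of.is_valid(value) for value in samples]
flat_results = [flat.is_valid(value) for value in samples]

for value, ok in zip(samples, all_of_results):
    print(repr(value), "valid" if ok else "invalid")
```

allOf earns its keep mostly when the branches are larger reusable schemas rather than single keywords.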

How to validate JSON using JSON Schema

Now for the important part: how do you use all these descriptions in practice? After all, writing a beautiful schema is only half the battle. It also has to be used to validate data.

For example, in Python this is done with the jsonschema library. You can install it with pip:

pip install jsonschema

Validation example:

import jsonschema
from jsonschema import validate

# Define the schema
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 18},
    },
    "required": ["name", "age"]
}

# Sample data
data = {
    "name": "Nikolay Sobolev",
    "age": 5
}

# Validate the data
try:
    validate(instance=data, schema=schema)
    print("The data is valid")
except jsonschema.exceptions.ValidationError as err:
    print(f"Validation error: {err.message}")

Here we described a simple schema for the user and checked whether the data matches it. The age of 5 is below the minimum of 18, so validation fails and the script prints an error message.
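The validate function raises a single exception, even when the data has several problems. If you want the full list, for example to return it in an API response, one option is to build a validator object and walk iter_errors. This is a sketch with the same schema and made-up data that breaks two rules at once.

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 18},
    },
    "required": ["name", "age"],
}

# Made-up data: "name" is missing and "age" is too low.
data = {"age": 5}

validator = Draft7Validator(schema)

# iter_errors yields every violation instead of stopping at the first one.
errors = list(validator.iter_errors(data))

# error.validator holds the name of the keyword that failed.
failed_keywords = sorted(error.validator for error in errors)

for error in errors:
    print(f"{error.validator}: {error.message}")
```

Building the validator once and reusing it also avoids re-checking the schema on every call.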

Case study of JSON Schema

So, let's imagine that we work for a company that produces cat food, and we need to create an API to process orders. Of course, the data that enters the system must be strictly structured. Let’s say there is a large amount of information about the food itself: flavor, package size, expiration date, recommendations for different age groups of cats.

We need to write a JSON Schema that validates the food information. For example, there are the following parameters:

  1. Product name — string, required field.

  2. Description — string, optional field.

  3. Flavors – a list of flavors (for example, chicken, fish, beef), required field.

  4. Package weight – number in grams, minimum value – 50 g, maximum – 10,000 g.

  5. Price — number, required field, minimum value — 0.

  6. Best before date — a string in date format.

  7. Recommended age of a cat – an integer, minimum age – 1 year, maximum – 25 years.

  8. Food rating — floating point number from 1.0 to 5.0.

  9. Availability in stores — an array with city names.

  10. Promotion – an object that can contain a “discount” field, but the discount should not exceed 50%.

Let's create a JSON Schema to validate this data:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "productName": {
      "type": "string"
    },
    "description": {
      "type": "string"
    },
    "flavors": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1
    },
    "weight": {
      "type": "integer",
      "minimum": 50,
      "maximum": 10000
    },
    "price": {
      "type": "number",
      "minimum": 0
    },
    "expirationDate": {
      "type": "string",
      "format": "date"
    },
    "recommendedAge": {
      "type": "integer",
      "minimum": 1,
      "maximum": 25
    },
    "rating": {
      "type": "number",
      "minimum": 1.0,
      "maximum": 5.0
    },
    "availability": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "promotion": {
      "type": "object",
      "properties": {
        "discount": {
          "type": "number",
          "maximum": 50
        }
      }
    }
  },
  "required": ["productName", "flavors", "weight", "price"]
}

What's going on here:

  1. Product name (productName) – required field, must be a string.

  2. Description — an optional field, also a string.

  3. Flavors – an array of strings. Any flavors are allowed, but at least one must be specified.

  4. Package weight (weight) – an integer, limited to values from 50 to 10,000 grams. If someone accidentally enters the weight of a pack in milligrams, JSON Schema will immediately catch it.

  5. Price – required, minimum value – 0 (because food cannot be sold for negative money, although if it were possible, everyone would be happy).

  6. Expiration date (expirationDate) – a string in date format, since it is very important to monitor the freshness of the product.

  7. Recommended age (recommendedAge) – an integer, because cats also age, and babies should receive special food, just like pensioners.

  8. Food rating — a floating point number from 1.0 to 5.0, because this is customary in marketing, where everything should be on a scale.

  9. Availability in stores — an array of strings with the names of cities where the product is available.

  10. Promotions — an object with the “discount” field, where the discount should not exceed 50%.

Now let's look at how to validate incoming food data using Python. For this we use the jsonschema library.

Example of data to check:

import jsonschema
from jsonschema import validate

# JSON Schema for the cat food data
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "productName": {
            "type": "string"
        },
        "description": {
            "type": "string"
        },
        "flavors": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "minItems": 1
        },
        "weight": {
            "type": "integer",
            "minimum": 50,
            "maximum": 10000
        },
        "price": {
            "type": "number",
            "minimum": 0
        },
        "expirationDate": {
            "type": "string",
            "format": "date"
        },
        "recommendedAge": {
            "type": "integer",
            "minimum": 1,
            "maximum": 25
        },
        "rating": {
            "type": "number",
            "minimum": 1.0,
            "maximum": 5.0
        },
        "availability": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "promotion": {
            "type": "object",
            "properties": {
                "discount": {
                    "type": "number",
                    "maximum": 50
                }
            }
        }
    },
    "required": ["productName", "flavors", "weight", "price"]
}

# Sample cat food data
data = {
    "productName": "Kitty Feast Deluxe",
    "description": "The best food for your cat",
    "flavors": ["chicken", "beef"],
    "weight": 500,
    "price": 15.99,
    "expirationDate": "2024-12-31",
    "recommendedAge": 3,
    "rating": 4.7,
    "availability": ["Moscow", "Saint Petersburg"],
    "promotion": {
        "discount": 20
    }
}

# Validate the data
try:
    validate(instance=data, schema=schema)
    print("The data is valid")
except jsonschema.exceptions.ValidationError as err:
    print(f"Validation error: {err.message}")

If everything is correct, the program prints “The data is valid”.

This is how you can easily and simply control incoming product data.
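It is just as instructive to watch a bad record get rejected. The sketch below uses a trimmed copy of the schema (only the fields under test) and an invented order that breaks several rules. It groups the messages by field, the kind of structure an API could return to the client. It also passes a FormatChecker, because without one jsonschema treats "format": "date" as an annotation and lets a malformed date through.

```python
from jsonschema import Draft7Validator, FormatChecker

# Trimmed copy of the cat food schema: only the fields exercised below.
food_schema = {
    "type": "object",
    "properties": {
        "productName": {"type": "string"},
        "flavors": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,
        },
        "weight": {"type": "integer", "minimum": 50, "maximum": 10000},
        "price": {"type": "number", "minimum": 0},
        "expirationDate": {"type": "string", "format": "date"},
        "promotion": {
            "type": "object",
            "properties": {
                "discount": {"type": "number", "maximum": 50}
            },
        },
    },
    "required": ["productName", "flavors", "weight", "price"],
}

# Invented order with four problems.
bad_order = {
    "productName": "Kitty Feast Deluxe",
    "flavors": [],                    # at least one flavor is required
    "weight": 20,                     # below the 50 g minimum
    "price": 15.99,
    "expirationDate": "31.12.2024",   # not in YYYY-MM-DD form
    "promotion": {"discount": 70},    # above the 50% cap
}

validator = Draft7Validator(food_schema, format_checker=FormatChecker())

# Group messages by a dotted path to the offending field.
errors_by_field = {}
for error in validator.iter_errors(bad_order):
    field = ".".join(str(part) for part in error.absolute_path) or "<root>"
    errors_by_field.setdefault(field, []).append(error.message)

for field, messages in sorted(errors_by_field.items()):
    print(field, "->", "; ".join(messages))
```

The dotted path format is just one possible convention; pick whatever your API clients find easiest to map back to their form fields.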


Conclusion

Whether it's simple data type checking or complex conditions, JSON Schema makes your life easier and your API more stable.

Always remember – it’s better to spend 10 minutes writing a schema than to spend nights figuring out why everything went to hell because of some extra space or incorrect type.

