JS. Data validation. Writing your own yup

Why is validation needed during development and when to use it?

In web development, when working with user data, validation should be used when the data is received by the service. Validation can be roughly divided into:

  • Client. When data is entered into forms, it is important to validate it and tell the user when it is incorrect. This gives the user clear feedback on their actions and prevents further incorrect actions in the service.

  • Server. Any code running on the client, as well as any request coming from the client application, cannot be trusted and must be validated. You cannot rely on the client application always preparing correct data, since the data-handling logic on the server and on the client may diverge during development. A client may also craft data manually while masquerading as the application.

In general, data should be validated as often as possible, especially when the development cycle covers both the server and the client. Let's look at the libraries that currently exist for this purpose.

Analysis of existing solutions

Popular solutions that can be used on both the client and the server include yup and zod. Let's look at their features and pay attention to their shortcomings.

In general, both libraries suffer from:

  • Excessive variety of functionality. Both libraries perform type conversion during validation and try to cover every possible validation case. This increases the size of the code base and makes the code harder to understand for developers who decide to read the source. Examples are the getIn method in yup with its impenetrable regexp, and the methods in zod that must account for every configuration option (not to mention files that are 6000 lines long).

  • Ignoring performance issues. Both libraries focus on expanding functionality rather than on the performance of what they already have. This shows in small things: for example, in both yup and zod, adding any new validation rule creates a complete copy of the schema entity.
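The difference between copying a schema on every rule and appending to it in place can be sketched as follows. This is a minimal illustration and not the actual source of yup or zod; the class names CopyingSchema and MutatingSchema are hypothetical.

```typescript
type Checker = (value: unknown) => string;

// Hypothetical immutable builder: every added rule produces a new schema.
class CopyingSchema {
  readonly conditions: readonly Checker[];

  constructor(conditions: readonly Checker[] = []) {
    this.conditions = conditions;
  }

  test(checker: Checker): CopyingSchema {
    // A new instance and a new array are allocated for each rule.
    return new CopyingSchema([...this.conditions, checker]);
  }
}

// Hypothetical mutable builder: rules are appended to one shared array.
class MutatingSchema {
  readonly conditions: Checker[] = [];

  test(checker: Checker): this {
    this.conditions.push(checker);
    return this;
  }
}

const noop: Checker = () => '';

const copyBase = new CopyingSchema();
const copyFinal = copyBase.test(noop).test(noop); // two extra objects allocated

const mutBase = new MutatingSchema();
const mutFinal = mutBase.test(noop).test(noop); // still the same object
```

The copying variant keeps earlier schemas untouched at the cost of extra allocations, while the mutating variant trades that safety for less work per rule.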

Library architecture

Principles

Let's try to create our own library, avoiding the above problems. To do this, we will formulate principles that we should be guided by.

  • The code should be simple

  • The code should be as performant as the previous point allows

Structure

Let's start from the code we expect to write with the finished library. By analogy with yup and zod, it should look something like this:

const schema = string().min(2);
const value = 'hello';

schema.validate(value);

Note that there are at least two validations here:

  • string() – checks that value is a string (by default, the string must not be empty either)

  • min(2) – checks that the string length must be at least 2 characters

We could keep adding conditions, but the main points are already clear:

  • the structure we choose to store rules must support an unlimited list of rules

  • It is necessary to provide a chain of methods so that the following can be written: string().min(2).max(4)

It might look like this:

type Checker = (value: any) => string;
class String {
  conditions: Checker[] = [];

  constructor() {
    // Add a validation rule
    this.conditions.push((value) => {
      if (typeof value !== 'string') {
        return 'Is not a string';
      }
      return '';
    });
  }

  min(min: number) {
    // Add a validation rule
    this.conditions.push((value) => {
      if (value.length < min) {
        return 'Too short string';
      }
      return '';
    });

    // Return the whole entity to allow chaining
    return this;
  }
}

Now, in order to validate the transmitted data, it remains to find out whether there is such a condition that will return a non-empty string when executed:

type Checker = (value: any) => string;
class String {
  conditions: Checker[] = [];
  // ...
  validate(value: any) {
    for (const condition of this.conditions) {
      const error = condition(value);
      if (error !== '') {
        return error;
      }
    }

    return '';
  }
}

Note that we stop at the first error and end the loop of checks. In practice, this makes our solution faster. We will follow the same approach with other data, such as objects: stop the search as soon as the first error is found. This may seem strange to some, but in my opinion it is the most practical choice:

  • If we need every error in the data, for example when validating a form, we can write a separate validation for each entity (input).

  • If we want to know why the server rejected our data, and there may be several reasons, we can fix the reported error first and then fix any new ones.
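The per-input approach for forms can be sketched like this. It is a minimal illustration under assumptions: the class is named StringSchema here to avoid shadowing the built-in String, and the field names login and password and the validateForm helper are hypothetical.

```typescript
type Checker = (value: any) => string;

class StringSchema {
  conditions: Checker[] = [
    (value) => (typeof value === 'string' ? '' : 'Is not a string'),
  ];

  min(min: number): this {
    this.conditions.push((value) => (value.length < min ? 'Too short string' : ''));
    return this;
  }

  validate(value: any): string {
    for (const condition of this.conditions) {
      const error = condition(value);
      if (error !== '') return error; // stop at the first failed rule
    }
    return '';
  }
}

const string = () => new StringSchema();

// One schema per form input (field names are hypothetical).
const formSchemas = {
  login: string().min(3),
  password: string().min(8),
};

type FieldName = keyof typeof formSchemas;

// Validate every input independently and collect one message per field.
function validateForm(values: Record<FieldName, unknown>): Record<FieldName, string> {
  return {
    login: formSchemas.login.validate(values.login),
    password: formSchemas.password.validate(values.password),
  };
}

const formErrors = validateForm({ login: 'al', password: 'correct horse' });
// formErrors.login === 'Too short string', formErrors.password === ''
```

Each schema still stops at its own first error, yet the form as a whole reports one message per input.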

We have already provided several checks, but real life is always more diverse, so it is worth letting the developer add custom rules in addition to the standard ones:

type Checker = (value: any) => string;
class String {
  conditions: Checker[] = [];

  test(checker: (value: any) => string) {
    this.conditions.push(checker);
    return this;
  }
}

Note that validate(), test(), and conditions are common methods and properties that every validation type needs. We will therefore move them into a separate class from which all our specific types will inherit. The final code will look like this:

type Checker = (value: any) => string;
class Schema {
  conditions: Checker[] = [];

  validate(value: any) {
    for (const condition of this.conditions) {
      const error = condition(value);
      if (error !== '') {
        return error;
      }
    }

    return '';
  }

  test(checker: Checker) {
    this.conditions.push(checker);
    return this;
  }
}

class String extends Schema {
  constructor() {
    super();
    this.conditions.push((value) => {
      if (typeof value !== 'string') {
        return 'Is not a string';
      }
      return '';
    });
  }

  min(min: number) {
    this.conditions.push((value) => {
      if (value.length < min) {
        return 'Too short string';
      }
      return '';
    });

    return this;
  }
}

const checkUpperCase = (value: string) => {
  if (value !== value.toUpperCase()) {
    return 'NOT UPPER CASE';
  }

  return '';
};
const string = () => new String();
const schema = string().min(2).test(checkUpperCase);

const valueValid = 'HELLO';
const valueError = 'Hello';

console.log(schema.validate(valueValid)); // ''
console.log(schema.validate(valueError)); // 'NOT UPPER CASE'
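The chain string().min(2).max(4) mentioned earlier needs a max method, which could be added in the same way as min. This is a sketch under assumptions: the class is named StringSchema to avoid shadowing the built-in String, and the 'Too long string' message is made up for the example.

```typescript
type Checker = (value: any) => string;

class Schema {
  conditions: Checker[] = [];

  validate(value: any): string {
    for (const condition of this.conditions) {
      const error = condition(value);
      if (error !== '') return error;
    }
    return '';
  }
}

class StringSchema extends Schema {
  constructor() {
    super();
    this.conditions.push((value) => (typeof value !== 'string' ? 'Is not a string' : ''));
  }

  min(min: number): this {
    this.conditions.push((value) => (value.length < min ? 'Too short string' : ''));
    return this;
  }

  // Assumed counterpart of min(): rejects strings longer than `max`.
  max(max: number): this {
    this.conditions.push((value) => (value.length > max ? 'Too long string' : ''));
    return this;
  }
}

const string = () => new StringSchema();
const lengthSchema = string().min(2).max(4);

const tooShort = lengthSchema.validate('a'); // 'Too short string'
const inRange = lengthSchema.validate('abc'); // ''
const tooLong = lengthSchema.validate('abcde'); // 'Too long string'
const notString = lengthSchema.validate(42); // 'Is not a string'
```

Because the type check is registered first and validation stops at the first error, min and max never run against a non-string value.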

Note that the real implementation is only slightly more complicated, because:

  • conditions must store rule names so that rules can be replaced or removed in certain cases. Therefore, instead of plain functions, it should hold objects containing a name and the function

  • The error message from a checker should be more informative: with a complex nested structure, it helps to include the name of the key where the error occurred
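Both points can be sketched together. This is one possible shape, not necessarily how a real library does it: the Rule type, the setRule method, the rules property, and the path parameter are all assumptions introduced for the example.

```typescript
type Checker = (value: any) => string;
type Rule = { name: string; check: Checker };

class Schema {
  rules: Rule[] = [];

  // Adds a rule, replacing an existing rule with the same name.
  setRule(name: string, check: Checker): this {
    const index = this.rules.findIndex((rule) => rule.name === name);
    if (index === -1) {
      this.rules.push({ name, check });
    } else {
      this.rules[index] = { name, check };
    }
    return this;
  }

  // `path` is the key under which the value lives; it prefixes the message.
  validate(value: any, path = ''): string {
    for (const rule of this.rules) {
      const error = rule.check(value);
      if (error !== '') {
        return path === '' ? error : `${path}: ${error}`;
      }
    }
    return '';
  }
}

class StringSchema extends Schema {
  constructor() {
    super();
    this.setRule('type', (value) => (typeof value !== 'string' ? 'Is not a string' : ''));
  }

  min(min: number): this {
    return this.setRule('min', (value) => (value.length < min ? 'Too short string' : ''));
  }
}

const string = () => new StringSchema();

// The second min() replaces the first one instead of stacking a duplicate.
const nameSchema = string().min(10).min(2);
const ruleNames = nameSchema.rules.map((rule) => rule.name); // ['type', 'min']
const okResult = nameSchema.validate('Bob', 'user.name'); // ''
const badResult = nameSchema.validate('B', 'user.name'); // 'user.name: Too short string'
```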

Nested Structures

We've written great code for primitives, but what about more complex structures? For example:

const user = {
  name: 'Aleksey',
  age: 42,
};

For this we need a separate entity, object, which lets you write nested rules:

const schema = object({
  name: string(),
  age: number(),
});

Its implementation:

class Object extends Schema {
  constructor(objSchema: Record<string, Schema>) {
    super();
    this.conditions.push((obj) => {
      for (const key in objSchema) {
        const innerSchema = objSchema[key];

        // innerSchema knows how to validate its own data; we only need to run it on the matching property
        const error = innerSchema.validate(obj[key]);
        if (error !== '') {
          return `${key} prop has wrong type`;
        }
      }

      return '';
    });
  }
}
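A usage sketch of nested validation follows. It assumes a number() schema analogous to string(), renames the classes (StringSchema, NumberSchema, ObjectSchema) to avoid shadowing built-ins, adds a guard for non-object input, and forwards the inner error message together with the key. All of these are assumptions for illustration, as is the address field.

```typescript
type Checker = (value: any) => string;

class Schema {
  conditions: Checker[] = [];

  validate(value: any): string {
    for (const condition of this.conditions) {
      const error = condition(value);
      if (error !== '') return error;
    }
    return '';
  }
}

class StringSchema extends Schema {
  constructor() {
    super();
    this.conditions.push((value) => (typeof value !== 'string' ? 'Is not a string' : ''));
  }
}

class NumberSchema extends Schema {
  constructor() {
    super();
    this.conditions.push((value) => (typeof value !== 'number' ? 'Is not a number' : ''));
  }
}

class ObjectSchema extends Schema {
  constructor(shape: Record<string, Schema>) {
    super();
    this.conditions.push((obj) => {
      // Guard added for the sketch: reading a property of null would throw.
      if (typeof obj !== 'object' || obj === null) return 'Is not an object';
      for (const key in shape) {
        // Each inner schema validates only the property it describes.
        const error = shape[key].validate(obj[key]);
        if (error !== '') return `${key}: ${error}`;
      }
      return '';
    });
  }
}

const string = () => new StringSchema();
const number = () => new NumberSchema();
const object = (shape: Record<string, Schema>) => new ObjectSchema(shape);

const userSchema = object({
  name: string(),
  age: number(),
  address: object({ city: string() }),
});

const validUser = userSchema.validate({ name: 'Aleksey', age: 42, address: { city: 'Berlin' } });
// ''
const wrongAge = userSchema.validate({ name: 'Aleksey', age: '42', address: { city: 'Berlin' } });
// 'age: Is not a number'
const wrongCity = userSchema.validate({ name: 'Aleksey', age: 42, address: { city: 1 } });
// 'address: city: Is not a string'
```

Since an object schema is itself a Schema, nesting works without any extra code: the outer schema simply calls validate on the inner one.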

TypeScript types

By describing a schema, we essentially already state the types that the validated object should contain. With TypeScript we can spare the developer from describing those types more than once. To achieve this, let's do a little TypeScript magic.

Simple example

const schema = string();
const rawValue = 'hello';

const error = schema.validate(rawValue);
if (error !== '') {
  // do something
}

const value = rawValue as Infer<typeof schema>; // string type

Let's try to implement this. The basic idea is to create an internal types field that stores the entity's type and from which Infer can read the required type:

class Schema<TValue> {
  types!: TValue;
}

class String extends Schema<string> {}

type Infer<TType extends Schema<any>> = TType['types'];
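The same pattern extends to other primitives. The sketch below assumes NumberSchema and BooleanSchema classes that are not part of the original text, and renames String to StringSchema to avoid shadowing the built-in. The typed constants compile only if Infer resolves to the expected primitive type.

```typescript
class Schema<TValue> {
  // Never assigned at runtime; it exists only to carry the type for Infer.
  types!: TValue;
}

class StringSchema extends Schema<string> {}
class NumberSchema extends Schema<number> {}
class BooleanSchema extends Schema<boolean> {}

type Infer<TType extends Schema<any>> = TType['types'];

// Each annotation is checked by the compiler against the inferred type.
const inferredString: Infer<StringSchema> = 'hello';
const inferredNumber: Infer<NumberSchema> = 42;
const inferredBoolean: Infer<BooleanSchema> = true;

// At runtime the field holds nothing useful, which is expected.
const runtimeTypes = new StringSchema().types; // undefined
```

The types field is a compile-time carrier only, which is why it is declared with a definite assignment assertion and never written to.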

Works! Now let's move on to a more complex example:

const rawUser = {
  name: 'Aleksey',
};

const schema = object({
  name: string(),
});

const error = schema.validate(rawUser);
if (error !== '') {
  // do something
}

const user = rawUser as Infer<typeof schema>; // {name: string} type

Let's try to implement it. There's going to be some TypeScript magic now, so send the kids and the Flow followers out of the room.

type Infer<TType extends Schema<any>> = TType['types'];

class Schema<TValue> {
  types!: TValue;
}

class String extends Schema<string> {}

const string = () => new String();

type ObjectValue = Record<string, Schema<any>>;
type PreparedTypes<TValue extends ObjectValue> = {
  [K in keyof TValue]: Infer<TValue[K]>;
};

class ObjectValidator<
  TValue extends ObjectValue,
  TValueTypes = PreparedTypes<TValue>,
> extends Schema<TValueTypes> {
  value: TValue;

  constructor(value: TValue) {
    super();
    this.value = value;
  }
}

function object<TValue extends ObjectValue>(value: TValue) {
  return new ObjectValidator(value);
}

const schema = object({
  name: string(),
});

type User = Infer<typeof schema>; // {name: string} type
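Validation and type inference can be combined so that the caller does not need a manual cast. The parse helper below is hypothetical and not part of any library described here; the sketch also renames the classes (StringSchema, ObjectSchema) and assumes the runtime checks shown earlier.

```typescript
type Checker = (value: any) => string;
type Infer<TType extends Schema<any>> = TType['types'];

class Schema<TValue> {
  types!: TValue;
  conditions: Checker[] = [];

  validate(value: unknown): string {
    for (const condition of this.conditions) {
      const error = condition(value);
      if (error !== '') return error;
    }
    return '';
  }
}

class StringSchema extends Schema<string> {
  constructor() {
    super();
    this.conditions.push((value) => (typeof value !== 'string' ? 'Is not a string' : ''));
  }
}

type ObjectValue = Record<string, Schema<any>>;
type PreparedTypes<TValue extends ObjectValue> = {
  [K in keyof TValue]: Infer<TValue[K]>;
};

class ObjectSchema<TValue extends ObjectValue> extends Schema<PreparedTypes<TValue>> {
  constructor(shape: TValue) {
    super();
    this.conditions.push((obj) => {
      if (typeof obj !== 'object' || obj === null) return 'Is not an object';
      for (const key in shape) {
        const error = shape[key].validate(obj[key]);
        if (error !== '') return `${key}: ${error}`;
      }
      return '';
    });
  }
}

const string = () => new StringSchema();

function object<TValue extends ObjectValue>(shape: TValue) {
  return new ObjectSchema(shape);
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical helper: validates, then returns the value typed via Infer.
function parse<TSchema extends Schema<any>>(
  schema: TSchema,
  raw: unknown,
): ParseResult<Infer<TSchema>> {
  const error = schema.validate(raw);
  if (error !== '') return { ok: false, error };
  // The cast is confined to this one place instead of every call site.
  return { ok: true, value: raw as Infer<TSchema> };
}

const userSchema = object({ name: string() });

const parsed = parse(userSchema, { name: 'Aleksey' });
// parsed.value is typed as { name: string } once `ok` is checked.
const greeting = parsed.ok ? `Hello, ${parsed.value.name}` : parsed.error;

const rejected = parse(userSchema, { name: 42 });
```

The result still reports failure through a plain value rather than an exception, in keeping with the string-based error convention used above.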

Real library

The approaches above describe, at a high level, the concept of a library that can actually be implemented. What remains is to add specific types for number, boolean, and so on. Creating a real library will, of course, require more effort. I followed the path described above when writing my desy library. There you can see what this code looks like in practice and, if you like, use it in your project.

desy – Dead Extraordinary Simple Yup

Thoughts on Performance

After writing the library, I was surprised by how much faster desy turned out to be than other solutions. Of course, I expected better benchmarks, but not the kind of jump that actually occurred. What explains it?

  • not throwing errors

  • stopping validation as soon as an error is found

  • avoiding immutable structures and complicated code with deep branching

Quoting specific numbers is always a dubious matter, so you can study the measurements yourself.
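As a starting point for such measurements, here is a minimal harness that compares reporting an error through the return value with reporting it by throwing. It is only a sketch: the function names are hypothetical, it does not benchmark desy, yup, or zod, and the timings depend entirely on the machine and runtime, so no numbers are given.

```typescript
type Checker = (value: any) => string;

const isString: Checker = (value) => (typeof value !== 'string' ? 'Is not a string' : '');

// Style 1: report invalid data through the return value.
function validateReturning(value: unknown): string {
  return isString(value);
}

// Style 2: report invalid data by throwing.
function validateThrowing(value: unknown): void {
  const error = isString(value);
  if (error !== '') throw new Error(error);
}

// Runs `fn` the given number of times and returns the elapsed milliseconds.
function measure(iterations: number, fn: () => void): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn();
  return Date.now() - start;
}

const iterations = 100000;
let returnedFailures = 0;
let thrownFailures = 0;

const returningMs = measure(iterations, () => {
  if (validateReturning(42) !== '') returnedFailures++;
});

const throwingMs = measure(iterations, () => {
  try {
    validateThrowing(42);
  } catch (e) {
    thrownFailures++;
  }
});

// Compare returningMs and throwingMs on your own machine.
```

Both styles detect the same failures, so any difference in the two timings comes from the reporting mechanism alone.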

Questions that might remain

  • Why is the error indicator a string? A string is the most expressive way to communicate the details of an error. Given that we do not throw errors, true/false will definitely not work for us, since a boolean cannot say what went wrong

  • Why don't we throw errors? Throwing an error should signal that the application is behaving unexpectedly. Data that does not match the schema, inside a program created specifically to check this, cannot be called an unforeseen situation. We are literally asking the program to tell us whether the data is valid. The ordinary ways of working with data should be used for that. It also helps performance

  • Why are all checks synchronous? Supporting asynchronous checks would enlarge the code base and branch the execution logic. At the same time, asynchronous checks are needed extremely rarely, so rarely that in those cases it is easier to do without a ready-made solution

Conclusion

As a result, I would like to say:

  • Many of the libraries we use every day are neither performant, nor understandable, nor extensible. We are simply accustomed to these tools and often perceive them as something fundamental and unshakable. Sometimes you need to reinvent the wheel, and some of those wheels may turn out better than the original. Do not forget that many popular libraries began as an author's response to something they did not like in the existing ones

  • Validate your data. Seriously. The user cannot be trusted. And better yet, use desy for validation
