How to register a user without losing your mind

How do you start working with an application, bot or website?
The answer is simple – by registering a user in your system.

In this article I will describe the non-obvious problems we ran into while building registration for bots, websites and applications. A seemingly simple operation can cause serious trouble if you overlook a few nuances.

What could be simpler than registration? Create a string variable for the full name, call it done and move on to the real functionality. But it is not that easy.

There is a 90% chance that if you do this, everything will work well at first, but problems will surface after a while. You look into your database and find a huge amount of garbage instead of data, and at worst you find your database publicly posted on all sorts of questionable forums.

Why did this happen?

If your date picker looks like this, you can simply remove it; it serves no purpose

Firstly, the user has no motivation to give you real data. For example, in incentivized traffic 93% of users were born on January 1st. The anomaly is easy to explain: the first day of the year is the default in the date-of-birth picker, so the user does not waste time choosing a real date, simply presses OK and goes off to download games. As a result this data is rubbish and no meaningful analytics can be built on it. I will not explain here why that is bad; you understand it yourselves. The second reason is security, but I will get to that a little later.

We have a problem and we need to solve it. As always, the carrot and the stick should be balanced. How to offer the carrot is a job for your product, because every case is unique. The stick, however, is fairly standard and universal across projects.

Regular Expressions

The obvious solution is to check the validity of what the user types. There are all sorts of exceptions, but in general people are not called “X Æ A-12”. So you can safely add regular expressions and run user input through them.

Which specific regular expressions to use is up to you; I repeat, every project needs its own. Next, let’s go through the typical cases that gave us a lot of headaches.

So, first you need to decide on the target audience of your product, since each choice leads down its own path. Which market is the product for? Russian or foreign? If foreign, which one: Chinese or European? Do users write in Latin, Cyrillic or their own unique writing system?

Yes, all of this matters for a task as simple and dull, at first glance, as registering a user. We are Europeans and think accordingly, yet the world is full of people from other cultures with different writing traditions. The options are endless; more experienced people can tell you about rare cases such as the traditional vertical Mongolian script ᠮᠤᠩᠭᠤᠯ (amusingly, Habr cannot display this script correctly) or Arabic script. So we decide what we will work with and cut off everything else. By default we register a person in Cyrillic, and our regular expressions allow only that.

But there are eleven nuances:

  1. Full name in one field.
    Often, instead of three fields, a form has a single one where the user is asked to enter the whole name. What could go wrong? The first word is the surname, the second is the first name, and the third is the patronymic. Not quite: firstly, users often mix up the order, and secondly, they may have a double name or no patronymic at all. This will either block the user's registration or lower the quality of the data.

    Ideally, you should always split the full name into three separate variables, but sometimes that is impossible: a long registration form discourages people from using your product, so you need to collect the data in one field as quickly as possible. In that case you either give up part of the functionality or write a monster that works out which word is which and assigns each part to the appropriate class inside your program.

2. Double surnames.
Since we have already touched on this nuance, let's continue. A simple one-field construction works fine until you first meet a person with a double surname. Then a problem arises: a double surname is essentially one word, but the hyphen is not a Cyrillic letter, so the regex rejects it and registration fails. Keep this in mind.

3. Double names.
The same as the previous point, except now imagine people who have a double first name and a double surname at once. Their full name can be very long, so set the maximum input length with a margin.
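As an illustration of points 2 and 3, here is a minimal sketch of a check for one part of a name (a first name or a surname) that allows an optional hyphenated second half and caps the length. The names `MAX_PART_LENGTH`, `NAME_PART` and `is_valid_name_part` and the limit of 50 are my assumptions for the example, not values from a real project.

```python
import re

# Hypothetical limit, chosen with a margin for double names.
MAX_PART_LENGTH = 50

# One capitalised Cyrillic word, optionally followed by a hyphen and a
# second capitalised word (a double name or a double surname).
# Ё/ё are listed explicitly because they fall outside the А-я range.
NAME_PART = re.compile(r"[А-ЯЁ][а-яё]+(?:-[А-ЯЁ][а-яё]+)?")


def is_valid_name_part(value: str) -> bool:
    """Return True if value looks like a single or hyphenated Cyrillic name."""
    # Check the length first so that oversized input never reaches the regex.
    if len(value) > MAX_PART_LENGTH:
        return False
    return NAME_PART.fullmatch(value) is not None
```

With this pattern both `is_valid_name_part("Иванов")` and `is_valid_name_part("Римский-Корсаков")` pass, while a trailing hyphen or Latin input is rejected.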

4. Full name from a large number of words.
This is a cultural factor: in Russia we are used to full names of three words, although in some countries a name may contain more. So we either account for this in the regular expressions or record which part of the name is which and where to use it.

5. Patronymic.
Because we are all a little Russian-centric, we sometimes forget about other cultures where names are built on completely different principles. Central Asian, Japanese, Chinese and Korean full names can consist of more or fewer words, and none of them will be a patronymic.

6. Problems with Ogly, ibn.
Here the situation is the opposite: in Turkic languages Ogly is essentially a patronymic marker, but it can be written either with a hyphen or as a separate word. Account for this so that the parts do not end up in the wrong variables.

7. Several kinds of dash.

-, −, –, —. No, these are not emoticons; from left to right they are the hyphen, minus, en dash and em dash. If you check for a double name or surname, the separator should be a hyphen, but users may type any of the other characters, and the regex will simply reject the input.
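One way to handle this is to convert the look-alike characters to a plain hyphen before validation. Below is a minimal sketch; the set of characters in `DASH_VARIANTS` and the name `normalize_dashes` are my assumptions, so extend the table for your own data.

```python
# Characters users commonly type in place of a plain hyphen (U+002D).
# The selection is an assumption made for this example.
DASH_VARIANTS = {
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2212": "-",  # minus sign
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
}
DASH_TABLE = str.maketrans(DASH_VARIANTS)


def normalize_dashes(value: str) -> str:
    """Replace dash look-alikes with a plain hyphen."""
    return value.translate(DASH_TABLE)
```

Run this before the regex check, so that a surname typed with an en dash is stored in the same form as one typed with a hyphen.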

8. Latin letters inside Cyrillic words.
Another argument for regular expressions is Latin letters in Russian words. You cannot imagine how painful it is to hunt for a hidden Latin “o” in the surname Ivanov.
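A rough way to catch such words is to flag any word that contains both Latin and Cyrillic letters. This is only a sketch under simple assumptions (basic Latin and Russian Cyrillic only); the name `has_mixed_script` is hypothetical.

```python
import re

LATIN = re.compile(r"[A-Za-z]")
CYRILLIC = re.compile(r"[А-Яа-яЁё]")


def has_mixed_script(word: str) -> bool:
    """Return True if the word mixes Latin and Cyrillic letters."""
    return bool(LATIN.search(word)) and bool(CYRILLIC.search(word))
```

A word made only of Cyrillic letters or only of Latin letters passes; "Иванов" typed with a Latin "o" is flagged.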

9. The letter Ё.
While all the other cases apply to any language, this one is specific to Russian. The point is that a regex range simply does not know that our alphabet contains the letter ё: in Unicode, Ё (U+0401) and ё (U+0451) lie outside the contiguous range А-я (U+0410 to U+044F). That is, if you write a regex for a first and last name of no more than 50 characters each, ([А-Я][а-я]{1,49}) ([А-Я][а-я]{1,49}), it will not match my name (Пётр) or any surname containing this letter. So we add it to the regex explicitly.

We add ё to the regex so that it works correctly.
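To make the difference concrete, here is a small sketch comparing the naive range with one that lists Ё and ё explicitly. The names `NAIVE`, `WITH_YO`, `matches_naive` and `matches_with_yo` are made up for the example.

```python
import re

# The range from the text: Ё (U+0401) and ё (U+0451) are not inside А-я.
NAIVE = re.compile(r"[А-Я][а-я]{1,49}")

# The same pattern with both letters added explicitly.
WITH_YO = re.compile(r"[А-ЯЁ][а-яё]{1,49}")


def matches_naive(name: str) -> bool:
    return NAIVE.fullmatch(name) is not None


def matches_with_yo(name: str) -> bool:
    return WITH_YO.fullmatch(name) is not None
```

Here `matches_naive("Пётр")` is false while `matches_with_yo("Пётr")` would still be false because of the Latin letter, and `matches_with_yo("Пётр")` is true. The sketch assumes ё arrives as the single precomposed character, not as е plus a combining diaeresis.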

10. Several thousand characters.
The dumbest and most reliable way to hang a registration flow. Someone enters tens of thousands of characters and you just wait. The effect depends on the platform you are working with. We often worked with Telegram, so I will use it as the example. It splits huge messages into several parts and sends them all to the server. As a result the trigger fires not for one message but for dozens, if not hundreds. For example, if you send a reply message after registration

hi username,

then the user will receive several hundred huge messages, each including this name thousands of characters long. Because of this, your entire service may hang or partially break.

I personally saw the loading screen of the Sberbank mobile application mangled by a name of several thousand random characters. So this is not only a scourge of small projects.
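The cheapest protection is to check the length before doing anything else with the input: before the regex, before the database and before echoing the name back in a reply. A minimal sketch follows; the limit of 150 and the name `accept_name` are my assumptions. It does not deal with how a particular messenger splits long messages, only with rejecting oversized input early.

```python
from typing import Optional

# Hypothetical upper bound for a full name, with a generous margin.
MAX_NAME_LENGTH = 150


def accept_name(raw: str) -> Optional[str]:
    """Return the trimmed name, or None if the input must be rejected."""
    # Reject oversized input before any further processing.
    if len(raw) > MAX_NAME_LENGTH:
        return None
    name = raw.strip()
    # An empty or whitespace-only name is rejected as well.
    return name or None
```

If `accept_name` returns `None`, reply with a short fixed error text that does not repeat the user's input.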

11. Transliteration (slugs).
This is slightly off topic, but I think it will be useful. If you convert values to the Latin alphabet (for example, a full name for mail sent abroad), you need a single tool for creating slugs (human-readable identifiers) across the whole project. In one project we used two different methods for creating slugs on the frontend and the backend, and for a long time could not understand what the problem was. It turned out that we were using slugs as keys, and they were generated in different ways. For example, the word “transliteration” (транслитерация) looks like this in different systems: Transliteraciya and Transliteratsiya.
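The idea of a single tool can be sketched as one table and one function that every part of the project calls. The table below is just one possible convention (ц becomes "ts", not "c"), and `TRANSLIT` and `slugify` are names I made up for the example; what matters is that there is exactly one such table in the project.

```python
# Hypothetical transliteration table: one convention among several.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def slugify(text: str) -> str:
    """Build a lowercase Latin slug using the single shared table."""
    pieces = []
    for ch in text.lower():
        if ch in TRANSLIT:
            pieces.append(TRANSLIT[ch])
        elif ch.isascii() and ch.isalnum():
            pieces.append(ch)
        else:
            # Anything else (spaces, punctuation) becomes a separator.
            pieces.append("-")
    # Collapse repeated separators and trim them from the ends.
    return "-".join(part for part in "".join(pieces).split("-") if part)
```

With this table `slugify("Транслитерация")` gives `transliteratsiya`. If the frontend needs slugs too, it is safer to request them from the backend than to keep a second implementation of the table.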

Well, now the answer to the main question: why this cover image for the post?

In the screenshot you see an SQL injection. You have probably come across the term often without understanding what it really is.

So what is this SQL injection of yours? Quite simply, during registration, something disguised as a username flies straight into the database, most likely malicious SQL code. Exactly what it does is unknown, but it is a hole leading right into your database.

– What is your name?
– My name is DROP DATABASE base1

And that’s it, your database is gone. Yes, of course, the chance that the attacker will guess the name of the database is very small, but nobody has cancelled brute force and industrial espionage, so leaving such a hole is very dangerous. Hundreds of articles have been written on injection protection, and it is standard practice in information security. So in the overwhelming majority of cases injections succeed only when the developer does not even suspect they are possible.
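Alongside input validation, the usual defence described in those articles is the parameterised query: the name is passed to the database driver as data and is never glued into the SQL text. Here is a minimal sketch using Python's built-in `sqlite3`; the `users` table, the `full_name` column and the function names are assumptions made for the example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)"
    )
    return conn


def register_user(conn: sqlite3.Connection, full_name: str) -> int:
    """Insert a user; the ? placeholder keeps the name out of the SQL text."""
    cur = conn.execute(
        "INSERT INTO users (full_name) VALUES (?)", (full_name,)
    )
    conn.commit()
    return cur.lastrowid
```

With a placeholder, a "name" such as `Robert'); DROP TABLE users;--` is stored as an ordinary string and nothing is executed. The dangerous variant is building the query with string concatenation or formatting.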

This was the final argument for introducing regular expressions.

Main conclusions:

  1. Be less Eurocentric. Other cultures have their own naming rules, and this must be taken into account. Alternatively, you can force names into your own format, but do it by uniform rules.

  2. If it is possible to split the full name into separate fields, by all means do so; it will greatly simplify your life.

  3. Unless there are truly critical reasons not to, add regular expressions just in case.

  4. Remember that if you put user input directly into the database, outside “specialists” can, if they wish, do anything with it.

  5. It is better to allow for even the rarest spelling variations of names, so that later you do not run around hunting for a problem that is often hard to detect.

    I do not claim this is a complete guide; I have described only the cases I personally encountered and that gave me a lot of headaches. The point of the article is to help beginners spot hidden pitfalls and rid their projects of potential problems.
