Checking the complexity of passwords in Python

Users are very fond of simple passwords. The reasons vary: some people simply don’t think about password complexity, some are too lazy to memorize anything harder, and some just like having a common but cool word as their password.

A reasonable response from developers is to check user passwords and, if a password is too simple, ask the user to come up with a stronger one. Let’s take a look at how you can implement the most common checks.

Initial Actions

Let’s create our own ValidationError exception class and build every validation function on the same principle: if the password is valid, the function silently returns nothing; if it is not, the function raises our validation error. For convenience, I will run the validation functions using pytest.

Password format validation

The easiest way is to check the password against a regular expression. The most common requirements are a minimum password length, the presence of upper- and lower-case characters, digits and, sometimes, special characters.

import re

import pytest


class ValidationError(Exception):
    """Raised when a password is not valid."""


# Requires characters in both cases, a digit, a special character
# and a minimum length of 8 characters
pattern1 = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!%*#?&]{8,}$'

# Requires characters in both cases, a digit
# and a minimum length of 8 characters
pattern2 = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$'


def validate_by_regexp(password, pattern):
    """Validate the password against a regular expression."""
    if re.match(pattern, password) is None:
        raise ValidationError('Password has incorrect format.')


def test_validate_by_regexp():
    password1 = 'qWer5%ty'
    password2 = '5qWerty5'
    assert validate_by_regexp(password1, pattern1) is None
    with pytest.raises(ValidationError):
        validate_by_regexp(password2, pattern1)
    assert validate_by_regexp(password2, pattern2) is None
$ pytest main.py::test_validate_by_regexp -v
main.py::test_validate_by_regexp PASSED
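
A single pattern only tells us that the format is wrong, not which requirement failed. As a rough sketch, the same requirements can be checked one at a time to produce a more specific message. The names RULES and find_failed_rules are hypothetical and not part of the code above; the special-character set is copied from pattern1. Unlike pattern1, this sketch does not restrict which characters are allowed.

```python
import re

# Hypothetical rule table: (description, compiled pattern).
# Each pattern mirrors one requirement of pattern1.
RULES = [
    ('at least 8 characters', re.compile(r'.{8,}')),
    ('a lowercase letter', re.compile(r'[a-z]')),
    ('an uppercase letter', re.compile(r'[A-Z]')),
    ('a digit', re.compile(r'\d')),
    ('a special character', re.compile(r'[@$!%*#?&]')),
]


def find_failed_rules(password):
    """Return descriptions of the requirements the password does not meet."""
    return [name for name, rule in RULES if rule.search(password) is None]


print(find_failed_rules('qWer5%ty'))  # []
print(find_failed_rules('5qWerty5'))  # ['a special character']
```

The returned descriptions can then be joined into the text of the validation error.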

Validation against a list of the most common passwords

Regular expression validation is great, but you were probably somewhat suspicious of the 5qWerty5 password, which formally passes our check. And besides qwerty, there are thousands of similar words that users love to use as passwords: password, iloveyou, football, and so on. It would be nice to have a list of such words and check whether the submitted password is among them. The good news is that a wonderful person named Royce Williams has already collected thousands of such passwords. The entire list is available as a gist.

We can download an archive containing a text file with passwords in the format frequency:sha1-hash:plain, that is, how often the password occurs, its hash, and the password itself in plain text. Let’s write a function that opens the file and, iterating through the lines, compares our password with each password in the list:

from itertools import dropwhile
from pathlib import Path


def validate_by_common_list(password):
    """Validate the password against a list of the most common passwords."""
    common_passwords_filepath = Path(__file__).parent.resolve() / 'common-passwords.txt'
    with open(common_passwords_filepath) as f:
        # Skip the commented lines at the top of the file
        for line in dropwhile(lambda x: x.startswith('#'), f):
            common = line.strip().split(':')[-1]  # extract the password itself
            if password.lower() == common:
                raise ValidationError('Do not use such a common password.')


def test_validate_by_common_list():
    with pytest.raises(ValidationError):
        validate_by_common_list('qwerty')
    with pytest.raises(ValidationError):
        validate_by_common_list('flower')
    assert validate_by_common_list('C_$s^8C7') is None  # a good password
$ pytest main.py::test_validate_by_common_list -v
main.py::test_validate_by_common_list PASSED
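
Reading the file on every call is fine for a demo, but the list can also be parsed once and kept in memory. Here is a minimal sketch, assuming the frequency:sha1-hash:plain line format described above. The helpers load_common_passwords and is_common are hypothetical names, and the sample lines are made up for illustration (the hash field is a placeholder, not a real SHA-1).

```python
def load_common_passwords(lines):
    """Build a set of plain-text passwords from 'frequency:sha1-hash:plain' lines."""
    passwords = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments wherever they appear
        if not line or line.startswith('#'):
            continue
        # maxsplit=2 keeps any ':' inside the password itself intact
        passwords.add(line.split(':', 2)[-1])
    return passwords


def is_common(password, common_passwords):
    """Exact, case-insensitive lookup in the preloaded set."""
    return password.lower() in common_passwords


# Made-up sample in the assumed format
sample = [
    '# comment line',
    '100:<sha1>:qwerty',
    '90:<sha1>:flower',
]
common = load_common_passwords(sample)
print(is_common('QWERTY', common))    # True
print(is_common('C_$s^8C7', common))  # False
```

A set gives constant-time exact lookups, but like the function above it only catches passwords that match a list entry exactly.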

Our function easily finds obvious words like qwerty. But what if the user is not so simple and, in response to our remark that the password is too obvious, just adds a dot somewhere or puts a couple of digits at the beginning and end: qwert.y, 0qwerty0 or even q.w.e.r.t.y?

Let’s add these checks to the test:

def test_validate_by_common_list():
    with pytest.raises(ValidationError):
        validate_by_common_list('qwerty')

    with pytest.raises(ValidationError):
        validate_by_common_list('flower')

    with pytest.raises(ValidationError):
        validate_by_common_list('qwert.y')

    with pytest.raises(ValidationError):
        validate_by_common_list('0qwerty0')

    assert validate_by_common_list('C_$s^8C7') is None  # a good password
$ pytest main.py::test_validate_by_common_list -v
main.py::test_validate_by_common_list FAILED
...
 with pytest.raises(ValidationError):
>           validate_by_common_list('qwert.y')
E           Failed: DID NOT RAISE <class 'main.ValidationError'>

Our validator is no longer able to catch such obvious hacks.

One possible fix is to strip dots (and/or other special characters) from the string, with something like password = password.replace('.', ''). However, this approach is, to put it mildly, neither elegant nor reliable. Instead, you can use the difflib module from the Python standard library.

As its description says, this module provides classes and functions for comparing sequences, which suits us well, since strings in Python are sequences. Let’s take a closer look at difflib.SequenceMatcher.

The SequenceMatcher class takes two sequences as input and provides several methods for assessing their similarity. We are interested in the ratio() method, which returns a number in the range [0, 1] characterizing the similarity of the two sequences: 1 means the sequences are identical, and 0 means they have nothing in common.
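
To get a feel for the numbers, here is a small sketch that compares a few of the passwords from this article with qwerty. The helper name similarity is hypothetical. The ratio itself is computed as 2 * M / T, where M is the number of matched characters and T is the total length of both strings.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return the SequenceMatcher ratio of two strings, from 0.0 to 1.0."""
    return SequenceMatcher(a=a, b=b).ratio()


for candidate in ('qwerty', 'qwert.y', '0qwerty0', 'C_$s^8C7'):
    print(candidate, round(similarity(candidate.lower(), 'qwerty'), 3))
# qwerty 1.0
# qwert.y 0.923
# 0qwerty0 0.857
# C_$s^8C7 0.0
```

For qwert.y, six characters match out of 7 + 6 in total, which gives 12 / 13, or about 0.923. C_$s^8C7 shares no characters with qwerty, so its ratio is 0.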

Let’s rewrite our validation function as follows:

from difflib import SequenceMatcher
from itertools import dropwhile
from pathlib import Path


def validate_by_common_list(password):
    """
    Validate the password against a list of the most common passwords,
    also rejecting passwords that are too similar to one of them.
    """
    common_passwords_filepath = Path(__file__).parent.resolve() / 'common-passwords.txt'
    max_similarity = 0.7

    with open(common_passwords_filepath) as f:
        for line in dropwhile(lambda x: x.startswith('#'), f):
            common = line.strip().split(':')[-1]
            diff = SequenceMatcher(a=password.lower(), b=common)
            if diff.ratio() >= max_similarity:
                raise ValidationError('Do not use such a common password.')

A few clarifications:

max_similarity is the maximum allowed similarity. Don’t get carried away and set this parameter too low, otherwise your validator will reject passwords that match a common one in only a couple of characters. In my experience, 0.7 is the minimum threshold you should not go below, while a threshold of 0.75 already lets the password 'q.w.e.r.t.y' through, so pick the value of this parameter for yourself.
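
The effect of the threshold is easy to see on the q.w.e.r.t.y example. This is a small sketch in which the hypothetical helper is_too_similar stands in for the comparison inside the validator:

```python
from difflib import SequenceMatcher


def is_too_similar(password, common, max_similarity):
    """True if the password is at least max_similarity close to a common one."""
    ratio = SequenceMatcher(a=password.lower(), b=common).ratio()
    return ratio >= max_similarity


# 6 matched characters out of 11 + 6 in total: 2 * 6 / 17, about 0.706
print(is_too_similar('q.w.e.r.t.y', 'qwerty', 0.7))   # True, rejected
print(is_too_similar('q.w.e.r.t.y', 'qwerty', 0.75))  # False, accepted
```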

Also, here I use the function

dropwhile(lambda x: x.startswith('#'), f)

from the itertools module to skip the commented lines at the beginning of the common-passwords.txt file. They could just as well be deleted by hand.
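
One detail worth keeping in mind: dropwhile stops skipping at the first line that does not match, so a comment further down the file would still be yielded. Here is a short illustration with made-up lines:

```python
from itertools import dropwhile

# Made-up lines; the hash field is a placeholder
lines = [
    '# header comment',
    '# another comment',
    '100:<sha1>:qwerty',
    '# a comment in the middle',
    '90:<sha1>:flower',
]

kept = list(dropwhile(lambda x: x.startswith('#'), lines))
print(kept)
# ['100:<sha1>:qwerty', '# a comment in the middle', '90:<sha1>:flower']
```

If comments can appear anywhere in the file, filtering every line with startswith('#') is the safer choice.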

Let’s test our rewritten validator:

def test_validate_by_common_list():
    with pytest.raises(ValidationError):
        validate_by_common_list('qwerty')

    with pytest.raises(ValidationError):
        validate_by_common_list('flower')

    with pytest.raises(ValidationError):
        validate_by_common_list('qWer5%ty')

    with pytest.raises(ValidationError):
        validate_by_common_list('5qWerty5')

    with pytest.raises(ValidationError):
        validate_by_common_list('q.w.e.r.t.y')

    assert validate_by_common_list('C_$s^8C7') is None  # a good password
$ pytest main.py::test_validate_by_common_list -v
main.py::test_validate_by_common_list PASSED

Validation against other user fields

So, we have defined the required password format and made sure the password is not too obvious. Another common case is using the value of another user attribute as the password, for example when a user simply copies their email or login into the password field. To detect such cases, you can use the same tool we used to find similar passwords, difflib.SequenceMatcher, only this time we compare the password with the values of other fields:

def validate_by_similarity(password, *other_fields):
    """Check that the password is not too similar to other user fields."""
    max_similarity = 0.75

    for field in other_fields:
        # Compare against each part of the field and the field as a whole
        field_parts = re.split(r'\W+', field) + [field]
        for part in field_parts:
            if SequenceMatcher(a=password.lower(), b=part.lower()).ratio() >= max_similarity:
                raise ValidationError('Password is too similar to another user field.')

Here we split each field into parts with the pattern \W+, which matches runs of non-word characters (anything other than letters, digits and underscores). This covers cases where the user takes only part of their email, without the domain, as the password. For example, the email someemailname@gmail.com gives the following parts: ['someemailname', 'gmail', 'com', 'someemailname@gmail.com'].
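
The splitting step can be tried on its own. In this sketch, split_field is a hypothetical helper that repeats the expression used in the validator:

```python
import re


def split_field(field):
    """Split a field on runs of non-word characters and keep the whole value too."""
    return re.split(r'\W+', field) + [field]


print(split_field('someemailname@gmail.com'))
# ['someemailname', 'gmail', 'com', 'someemailname@gmail.com']
print(split_field('joda777jedi'))
# ['joda777jedi', 'joda777jedi']
```

A field without separators simply appears twice in the result, which is harmless for the comparison.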

Let’s check how our function works:

def test_validate_by_similarity():
    user_login = 'joda777jedi'
    email = 'jedimaster1@jediacademy.co'

    with pytest.raises(ValidationError):
        validate_by_similarity('jedimaster1', user_login, email)

    with pytest.raises(ValidationError):
        validate_by_similarity('joda777jedi', user_login, email)

    with pytest.raises(ValidationError):
        validate_by_similarity('jedimaster1@jediacademy.co', user_login, email)

    with pytest.raises(ValidationError):
        validate_by_similarity('joda777', user_login, email)

    assert validate_by_similarity('C_$s^8C7') is None
$ pytest main.py::test_validate_by_similarity -v
main.py::test_validate_by_similarity PASSED
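
Since every validator follows the same convention (return nothing or raise ValidationError), they are easy to run together. Here is one possible sketch, where collect_errors and the two toy validators are hypothetical stand-ins for the functions above:

```python
class ValidationError(Exception):
    """Raised when a password is not valid."""


def validate_length(password):
    """Toy validator: require at least 8 characters."""
    if len(password) < 8:
        raise ValidationError('Password is too short.')


def validate_not_login(password, login):
    """Toy validator: the password must differ from the login."""
    if password.lower() == login.lower():
        raise ValidationError('Password must differ from the login.')


def collect_errors(checks):
    """Run zero-argument checks and return the messages of those that failed."""
    errors = []
    for check in checks:
        try:
            check()
        except ValidationError as exc:
            errors.append(str(exc))
    return errors


password, login = 'yoda', 'Yoda'
errors = collect_errors([
    lambda: validate_length(password),
    lambda: validate_not_login(password, login),
])
print(errors)
# ['Password is too short.', 'Password must differ from the login.']
```

Collecting all messages at once lets the user fix every problem in one attempt instead of discovering them one by one.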

The methods listed here are usually enough. Don’t get too carried away with them, though, or your interface will stop being friendly to users.

So which methods to use in your projects, and how strict to make them, depends only on how serious the project is, the level of protection it requires and, perhaps, on your sadistic inclinations.
