News of yet another large-scale password leak appears online regularly. Password dictionaries keep growing, brute-force tools keep multiplying, and users find it harder and harder to come up with strong passwords and remember them without outside help.
New research offers scientific methods for creating passwords that are both strong and convenient. Researchers at Carnegie Mellon University (CMU) have worked out how to design a user-friendly password policy without sacrificing security. We’ve translated the key CMU guidelines and supplemented them with a selection of useful tools for checking your own passwords.
What data are the recommendations based on?
Many of the mechanisms for building good password habits have a scientific basis. For example, CMU researchers showed that it is better to increase the minimum password length than to force the use of characters from different classes. Other research groups have developed “password meters” that evaluate passwords based on big data. Separate studies are devoted to password dictionaries.
But there are few specific recommendations on how to combine all these mechanisms and achieve optimal password complexity and convenience. For this reason, CMU conducted 2 experiments and evaluated how different combinations of password policies protect against attacks and are perceived by users.
The participants in the experiments were asked to create a password according to a given password policy. First, scientists measured the strength of these passwords using various methods. Users were then asked to remember their password, and researchers rated how easy it was to remember.
How policies were drafted and tested
Number and classes of characters. All policies were based on requirements for the password length and the number of character classes: upper-case letters, lower-case letters, digits and special characters. The researchers designated each policy with a formula made of the number of classes, the letter c, and the minimum length: for example, 3c8 means at least 3 character classes and at least 8 characters.
All tested policies required a password length of at least 8 characters. The result is a list of policies like this: 1c8, 3c8, 4c8, 1c10, 1c12, 2c12, 3c12, 1c16.
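To make the notation concrete, here is a minimal sketch that parses a policy name and checks a password against it. It assumes that a name such as 3c8 means "at least 3 character classes, at least 8 characters". The function names are ours, and treating every non-alphanumeric character as "special" is a simplification, not something taken from the study.

```python
import re
import string


def parse_policy(name):
    """Split a policy name such as '3c8' into (min_classes, min_length)."""
    match = re.fullmatch(r"([1-4])c(\d+)", name)
    if match is None:
        raise ValueError(f"not a policy name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def count_classes(password):
    """Count how many of the four character classes appear in the password."""
    alphanumeric = string.ascii_letters + string.digits
    checks = (
        any(ch in string.ascii_lowercase for ch in password),
        any(ch in string.ascii_uppercase for ch in password),
        any(ch in string.digits for ch in password),
        # Assumption: anything that is not an ASCII letter or digit is "special".
        any(ch not in alphanumeric for ch in password),
    )
    return sum(checks)


def satisfies(password, policy):
    """True if the password meets the length and class minimums of the policy."""
    min_classes, min_length = parse_policy(policy)
    return len(password) >= min_length and count_classes(password) >= min_classes
```

Note that satisfies("1Password!", "4c8") and satisfies("passwordpassword", "1c16") both return True, which is exactly why character requirements alone are not enough.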
Dictionary check. Character requirements help produce more complex passwords. However, some users may follow the requirement and still come up with a weak password. For example, the policies 4c8 and 1c16 still allow predictable choices such as 1Password! and passwordpassword. To avoid this, policies are often combined with dictionary checks.
A dictionary check requires a list of frequently used, predictable or compromised passwords, as well as the matching algorithm itself. For example, you can search for exact matches in the list, or you can use a fuzzy search that also finds matches differing in case or in individual characters. Say the dictionary contains password. A fuzzy search will also treat passwords like pa$$word or Passw0rd as “dictionary” passwords. The researchers tested the following matching algorithms:
case-insensitive full-string comparison (cifs);
case-sensitive full-string comparison (fs);
stripping digits and special characters, followed by a case-insensitive search for an exact full-string match in the dictionary (strip-cifs);
case-insensitive search for any 5-character fragment of a dictionary entry inside the user’s password (ciss).
Combined with different dictionaries, this gave the following check variants: Pwned-fs, Xato-cifs, Xato-strip-cifs, Xato-ciss, etc.
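The four matching methods can be sketched in a few lines. This is our reading of the descriptions above, not the researchers’ code: the function names mirror the abbreviations, the dictionary is any collection of strings, and "digits and special characters" is simplified to "everything except ASCII letters".

```python
import re


def fs(password, dictionary):
    """Case-sensitive full-string match."""
    return password in dictionary


def cifs(password, dictionary):
    """Case-insensitive full-string match."""
    lowered = {word.lower() for word in dictionary}
    return password.lower() in lowered


def strip_cifs(password, dictionary):
    """Drop digits and special characters, then compare case-insensitively."""
    # Assumption: only ASCII letters are kept.
    letters_only = re.sub(r"[^A-Za-z]", "", password)
    return cifs(letters_only, dictionary)


def ciss(password, dictionary, size=5):
    """Case-insensitive match on any `size`-character fragment."""
    pw = password.lower()
    fragments = {pw[i:i + size] for i in range(len(pw) - size + 1)}
    return any(
        word.lower()[i:i + size] in fragments
        for word in dictionary
        for i in range(len(word) - size + 1)
    )
```

With a dictionary containing only password, fs rejects password but lets Password through, cifs catches both, strip_cifs also catches 1Password!, and ciss catches Passw0rd because the fragment "passw" is shared. Rebuilding the lowered set on every call is fine for a sketch; a real check would precompute it once.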
Number of brute-force attempts. One way to assess the strength of a password is to estimate how many attempts an attacker needs before guessing it. Various models are used for this estimate, including neural networks. The minimum requirements for the number and frequency of attempts depend on many factors, such as the defenses in place and the nature of the attack. In their work, the CMU researchers consider two types of attacks:
in online attacks, an attacker can try passwords only by interacting with the server. Once access is cut off, the attacker cannot continue guessing;
in offline attacks, an attacker can try passwords without any interaction with the server, and therefore for much longer. This is possible if the attacker has a stolen database of password hashes.
To assess strength, CMU used machine learning models. A neural network was trained on a database of compromised passwords to predict how likely a password is to be guessed. The strength requirement was expressed as the minimum number of guesses needed and written as NN followed by a power of ten: for example, NN6 means at least 10^6 guesses, and NN10 means at least 10^10 guesses.
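As a small illustration of this notation, the sketch below turns a requirement name into a guess threshold. It assumes that NN followed by a number k means "at least 10^k guesses", which matches the description of NN10 as 10 billion attempts later in this article. The function names are ours, and the estimated guess count is expected to come from some external model.

```python
def min_guesses(requirement):
    """Turn a name such as 'NN6' into the guess threshold it stands for."""
    exponent = requirement[2:]
    if not requirement.startswith("NN") or not exponent.isdigit():
        raise ValueError(f"not a strength requirement: {requirement!r}")
    # Assumption: NNk means at least 10**k guesses.
    return 10 ** int(exponent)


def meets_requirement(estimated_guesses, requirement):
    """True if the estimated guess count reaches the required threshold."""
    return estimated_guesses >= min_guesses(requirement)
```

For instance, a password estimated at 5 million guesses passes NN6 but fails NN8 and NN10.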
Hypotheses about combining methods. The researchers used different combinations of policies to answer the following questions:
How does dictionary checking work with policies like 1c8 and 3c8?
How are different variants of dictionary checks combined with a policy like 1c8?
To what extent do different length and character-class policies, from 1c8 to 3c12, meet the NN6 strength requirement?
How well does dictionary checking work together with neural network checking?
How well do the NN8 and NN10 strength requirements work with policies that constrain only the password length?
How is dictionary checking combined with different requirements for the number of character classes (from 1c8 to 4c8)?
The result is the following table of test conditions:
How user-friendliness was rated
Now let’s move on to measuring usability. In the first stage, every participant in the experiment had to come up with a password in a specially designed “password meter”. The password policy was chosen at random from all the combinations that the researchers tested.
Participants in the experiment received a list of recommendations, entered a password, and watched the password difficulty indicator change in real time.
Once the minimum requirements were met, the “password meter” made recommendations to strengthen the password:
While the user was coming up with a password, the “password meter” recorded the attempts made to fulfill the requirements and how long it took to create the correct password.
All created passwords were run through the neural network, which predicted how many attempts it would take to brute-force them.
After 2 days, the researchers asked users to take part in the second stage and recall the password they had created. Users who did not respond within 5 days were dropped from the experiment. For the remaining participants, the researchers recorded the share of those who managed to recall the password. They also analyzed the participants’ accompanying behavior: whether they reproduced the password from memory or had written it down, and how long it took to reproduce it.
Participants in both stages were asked to fill out a questionnaire describing their experience of the experiment: how difficult it was to create a password and how annoying they found the assigned password policy.
For each combination of conditions, the researchers recorded the objective indicators of the experiment: the number of participants, the proportion of those who dropped out at the second stage, the actual time to create the password, the actual time to reproduce the password, and the proportion of successful attempts to guess the password. They also recorded the participants’ subjective impressions: how difficult and annoying creating a password seemed to them, and how easy it was to remember.
The resulting table is:
What conclusions did the researchers come to?
The optimal password policy, according to the researchers, combines a minimum password length with a neural-network strength check. They recommend the policy 1c12 + NN10: a password of at least 12 characters with a “guessability” of at least 10 billion attempts. This provides sufficient protection against offline attacks and the required level of convenience.
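A rough sketch of what such a combined check could look like is shown below. The neural network itself is out of scope here, so the estimator is passed in as a function. Both the function names and the stub estimator, with its made-up numbers, are hypothetical and serve only to show how the two requirements are combined.

```python
def check_1c12_nn10(password, estimate_guesses, min_length=12, min_guesses=10**10):
    """Return the reasons a password is rejected (an empty list means accepted)."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if estimate_guesses(password) < min_guesses:
        problems.append(f"could be guessed in under {min_guesses:,} attempts")
    return problems


def stub_estimator(password):
    """Hypothetical stand-in for a trained guessability model.

    The numbers are invented for illustration. A real deployment would
    call a model trained on leaked passwords instead.
    """
    made_up_scores = {"passwordpassword": 10**4}
    return made_up_scores.get(password, 10**12)
```

Returning a list of reasons rather than a single True or False makes it easy to show the user which requirement still needs work, much like the “password meter” described above.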
If verification using neural networks is not suitable, then the researchers recommend setting the password length to at least 8 characters and simultaneously checking passwords using dictionaries. This check provides less protection, but is quite effective against online attacks. At the same time, protection can be increased by increasing the minimum password length.
Dictionary checking is more effective with fuzzy matching. That is, it is better to catch not only exact matches but also variations like P@$$w0rd. If exact matching is used, it is worth checking against the largest available dictionary of compromised passwords.
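One simple way to catch such variations is to undo common character substitutions before the lookup. This is our own illustration, not one of the matching methods tested in the study, and the substitution table is a small, arbitrary sample.

```python
# Assumption: a tiny, hand-picked table of common substitutions.
SUBSTITUTIONS = str.maketrans({"@": "a", "$": "s", "0": "o", "1": "l", "3": "e", "!": "i"})


def normalize(password):
    """Lower-case the password and undo common character substitutions."""
    return password.lower().translate(SUBSTITUTIONS)


def is_blocked(password, dictionary):
    """True if the normalized password matches a normalized dictionary entry."""
    normalized_entries = {normalize(word) for word in dictionary}
    return normalize(password) in normalized_entries
```

With a dictionary containing only password, this check rejects P@$$w0rd, pa$$word and Passw0rd alike, because all three normalize to the same string.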
Users find it less convenient when they have to create a password with different character classes. If the user only needs to meet a minimum length of 8 to 16 characters and pass the guessability check, the password policy is perceived as convenient.
What other tools can you use
The authors of the study urge users not to forget about other tools for protecting accounts: password managers and multi-factor authentication (MFA). If you want to check the strength of your passwords yourself, here are some tools:
You can play with the CMU “password meter” here: https://cups.cs.cmu.edu/meter/…
The same “password meter” on GitHub: https://github.com/cupslab/password_meter…
Password dictionaries that we use to run hashes of user passwords in DataLine:
Full recommendations in the original: Practical Recommendations for Stronger, More Usable Passwords Combining Minimum-strength, Minimum-length, and Blocklist Requirements…
Researchers’ comments to media: Years of Research Reveals Holy Grail of Password Strength