At its heart, a password cracking attack is just a guessing attack. An attacker makes guesses about a user’s password until they guess correctly or they give up. While the defender may limit the number of guesses an attacker is allowed, a password’s strength often depends on how hard it is for an attacker to model and reproduce the way a user created their password.
If humans were good at forming unique habits, or at generating and remembering random values, cracking passwords would be a nearly impossible task. In reality, they are not. The vast majority of people still follow common patterns, from capitalizing the first letter of their password to putting numbers at the end. What is changing, though, is the use of protective techniques that are independent of user behavior. Practices such as salting password hashes prevent an attacker from pre-computing the hashes of candidate passwords. Likewise, password hashing algorithms are becoming more computationally expensive, raising the cost of each guess an attacker makes.
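As a minimal sketch of the two defenses just mentioned, the Python below hashes the same password under two different random salts with an iterated hash. The function name `hash_password`, the choice of PBKDF2-HMAC-SHA256, the 16-byte salt, and the iteration count are illustrative assumptions, not details taken from this paper.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a salted, deliberately slow hash (PBKDF2-HMAC-SHA256).

    The iteration count is an illustrative assumption; every extra
    iteration raises the cost of each guess an attacker makes.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


# Two hypothetical users pick the same password but receive different salts.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = hash_password("Password1", salt_a)
hash_b = hash_password("Password1", salt_b)

# The stored hashes differ, so one pre-computed table cannot cover both users.
print(hash_a.hex())
print(hash_b.hex())
```

Because the salt is stored alongside the hash, the defender can still verify a login by recomputing the hash. The attacker, however, must repeat the full iterated computation for every guess against every salt.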
Where an attacker could once rely on simple brute-force methods and ad hoc models, there is now a growing demand for more effective ways to predict what a user’s password will be. The need is especially strong in the law enforcement community, where strong encryption is encountered regularly. It is also important for the defender to be able to accurately model the security that user-generated passwords provide.
This paper details several new ways that probability information can be applied to maximize the success of password cracking attacks. From evaluating the effectiveness of known probabilistic techniques such as Markov models, to designing novel techniques such as using probabilistic context-free grammars to create password guesses, there are many different ways probability information can be incorporated into modeling user behavior. Furthermore, the techniques described in this paper have been developed using real-life passwords and have been tested in controlled password cracking attacks. This focus on training and testing against large sets of real-life passwords is relatively uncommon, and is only possible due to the increasing availability of disclosed password lists. In addition to allowing the development of more effective attacks, knowledge of how people select passwords can be applied to evaluating the effectiveness of password creation policies. For example, how much stronger is an eight-character password than a seven-character password? In short, a better understanding of how users create passwords can benefit both the attacker and the defender.
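To make the idea of a Markov model concrete, the sketch below trains a first-order (character bigram) model on a toy password list and scores candidate guesses by log-probability. It is an illustration only, not the models evaluated in this paper: the training list, the function names, the add-one smoothing, and the alphabet size of 96 (95 printable ASCII characters plus an end marker) are all assumptions.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary markers, assumed absent from passwords


def train_bigram_model(passwords):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for pw in passwords:
        chars = [START] + list(pw) + [END]
        for prev, cur in zip(chars, chars[1:]):
            counts[prev][cur] += 1
    return counts


def log_probability(model, password, alphabet_size=96, smoothing=1.0):
    """Score a password under the bigram model with additive smoothing."""
    chars = [START] + list(password) + [END]
    total = 0.0
    for prev, cur in zip(chars, chars[1:]):
        seen = model.get(prev, Counter())
        numerator = seen[cur] + smoothing
        denominator = sum(seen.values()) + smoothing * alphabet_size
        total += math.log(numerator / denominator)
    return total


# Toy training data (made up for illustration; a real model needs a large list).
training = ["password1", "Password1", "letmein1", "monkey12", "dragon99"]
model = train_bigram_model(training)

likely = log_probability(model, "password2")
unlikely = log_probability(model, "zq7#Xk!v9")
print(likely > unlikely)
```

A cracking tool built on such a model would try candidate guesses in roughly decreasing order of probability, so that guesses resembling the training passwords are tried before strings that look random.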