-
Notifications
You must be signed in to change notification settings - Fork 0
Introduction
Hash functions are key building blocks of modern cryptography, having applications in encryption, authentication protocols, and integrity verification schemes (http://wiki.theory.org/BitTorrentSpecification). Today’s most widely-used hash functions are inspired by Rivest’s MD2 (http://tools.ietf.org/html/rfc1319) and MD4, the designs of which are characterized by input chunking, substitution, and the repeated application of overlapping bitwise operations.
Although they are important to the integrity of many security schemes, most modern commonly-used hash functions, including SHA-1 and MD5 (hereafter referred to as Rivest functions), have either had serious attacks mounted against them or are in danger of being vulnerable in the near future. MD5 in particular can now be broken in under a minute on a commodity machine. (http://eprint.iacr.org/2005/067) As a result, an NIST competition has been initiated for the creation of a new cryptopgraphically secure function (http://csrc.nist.gov/groups/ST/hash/index.html). Unfortunately, hash function creation is an extremely difficult task, and thorough vetting and verification takes a great deal of analysis. However, we show that it is possible to approximate a good hash function through the use of genetic algorithm.
The genetic algorithm is inspired by biological natural selection. John Holland of the University of Michigan pioneered its use in the 1970s (Holland, “Genetic algorithms and the optimal allocation of trials.” SIAM J. Comput., 2.88-105); however, it wasn’t until the 1980s and 1990s that the computational capacity for practical implementations of the algorithm became readily available. The algorithm as we use it consists of a series of functions – a randomized generator, a fitness evaluator, and a breeder – that operate on chromomsomes, which are strings formed from a genetic description alphabet. The algorithm proceeds as following:
- Create a random starting population of size S.
- Evaluate the fitness of every member of the current generation.
- Breed the fittest members to form the next generation.
- Add random or old members until we have the next generation is once more of size S.
- Go to step 2 if we’re not done.
With proper parameter selection and a sufficient amount of time, the genetic algorithm will converge to some local optimum. Holland’s Schemata Theorem provides a more rigid explanation as to why this occurs (Holland, Adaptation in Natural and Artificial Systems, 1975).