Transforming variable-length to fixed keys

I posted this to sci.crypt recently but the response to it was rather underwhelming, so I thought I'd repost it here to see if anyone has any comments on it. It's a scheme for transforming arbitrary user keys (typically a long passphrase) into a fixed-length key for a particular algorithm. This has the following properties:

1. The user key 'userKey' is transformed into an algorithm-specific key 'key' using a number of 'iterations' of a hash algorithm 'hash()'.
2. The transformation is strongly serialized so that any form of attack involving parallelization or precomputation isn't possible.
3. The transformation is non-reversible, so that recovering the transformed key won't recover the original key.
4. The result of the transformation is algorithm-dependent, so that if an attacker recovers a transformed key for one algorithm they can't recover the transformed key (from the same user key) for another algorithm.
5. The transformation can be iterated as often as required to make password-guessing attacks difficult.
6. The transformation process is algorithm-independent and can use any type of hash algorithm and any original and transformed key size.

The transformation algorithm (which was designed with the help of John Kelsey) is as follows:

    key[] = { 0 };
    state = hash( algorithm, mode, parameters, userKey );
    for count = 1 to iterations
        for length = 1 to keyLength (in hash_output_size blocks)
            state = hash( state );
            key[ length ] = hash( state, userKey );

The state acts as an RNG which ensures that the key hashing is serialized. The initial state depends on all encryption parameters, not just the user key. If we hashed the user key directly and then used it for a number of algorithms, someone who could recover the transformed key for one algorithm could compromise it if it were used for other algorithms (for example recovering a DES key would also recover half an IDEA key). Hashing all algorithm-related parameters means that a successful attack on one algorithm, mode, or configuration won't allow the key for any other algorithm, mode, or configuration to be recovered.

The code which implements the iterated hashing is:

    /* Hash the variable-length input to a fixed-length output */
    memset( key, 0, keyLength );
    for( count = 0; count < iterations; count++ )
        {
        for( keyIndex = 0; keyIndex < keyLength; keyIndex += hashOutputSize )
            {
            /* state = hash( state ); key[ n ] = hash( state, userKey ).
               Arguments to hash() are output, input, input size, operation */
            hash( state, state, hashOutputSize, HASH_ALL );
            hash( NULL, state, hashOutputSize, HASH_START );
            hash( temp, userKey, userKeyLength, HASH_END );

            /* XOR as much of the hashed data as required into the output */
            length = keyLength - keyIndex;
            if( length > hashOutputSize )
                length = hashOutputSize;
            for( i = 0; i < length; i++ )
                key[ keyIndex + i ] ^= temp[ i ];
            }
        }

Peter.
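(For concreteness, here's a minimal self-contained sketch of the same derivation loop with SHA-1 via OpenSSL standing in for hash(). The scheme itself is hash-agnostic; the function name derive_key and the single 'parameters' string standing in for the algorithm, mode and other parameters are illustrative choices, not part of Peter's code.)

    #include <string.h>
    #include <openssl/sha.h>    /* SHA1_Init/Update/Final, SHA_DIGEST_LENGTH == 20 */

    /* Derive keyLength bytes of 'key' from 'userKey', iterating the hash
       'iterations' times; 'parameters' stands in for the algorithm, mode
       and other parameters hashed into the initial state */
    static void derive_key( unsigned char *key, int keyLength,
                            const unsigned char *userKey, int userKeyLength,
                            const char *parameters, int iterations )
        {
        unsigned char state[ SHA_DIGEST_LENGTH ], temp[ SHA_DIGEST_LENGTH ];
        SHA_CTX ctx;
        int count, keyIndex, length, i;

        /* state = hash( algorithm, mode, parameters, userKey ) */
        SHA1_Init( &ctx );
        SHA1_Update( &ctx, parameters, strlen( parameters ) );
        SHA1_Update( &ctx, userKey, userKeyLength );
        SHA1_Final( state, &ctx );

        memset( key, 0, keyLength );
        for( count = 0; count < iterations; count++ )
            for( keyIndex = 0; keyIndex < keyLength; keyIndex += SHA_DIGEST_LENGTH )
                {
                /* state = hash( state ) */
                SHA1_Init( &ctx );
                SHA1_Update( &ctx, state, SHA_DIGEST_LENGTH );
                SHA1_Final( state, &ctx );

                /* temp = hash( state, userKey ) */
                SHA1_Init( &ctx );
                SHA1_Update( &ctx, state, SHA_DIGEST_LENGTH );
                SHA1_Update( &ctx, userKey, userKeyLength );
                SHA1_Final( temp, &ctx );

                /* XOR as much of the digest as is still needed into the key */
                length = keyLength - keyIndex;
                if( length > SHA_DIGEST_LENGTH )
                    length = SHA_DIGEST_LENGTH;
                for( i = 0; i < length; i++ )
                    key[ keyIndex + i ] ^= temp[ i ];
                }
        }

A call like derive_key( key, 16, passphrase, passphraseLength, "IDEA-CBC", 200 ) would then yield a 128-bit key bound to that (made-up) parameter string, so the same passphrase gives an unrelated key for, say, "DES-CBC".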

Peter Gutmann <pgut001@cs.auckland.ac.nz> writes on cpunks:
> I posted this to sci.crypt recently but the response to it was rather underwhelming, so I thought I'd repost it here to see if anyone has any comments on it. It's a scheme for transforming arbitrary user keys (typically a long passphrase) into a fixed-length key for a particular algorithm. This has the following properties:
> 1. The user key 'userKey' is transformed into an algorithm-specific key 'key' using a number of 'iterations' of a hash algorithm 'hash()'.
> 2. The transformation is strongly serialized so that any form of attack involving parallelization or precomputation isn't possible.
If the speed of your key generation is an issue, you could do something like:

    key[] = { 0 };
    const int nhashes = 4;
    typedef void (*hashfnptr)( byte*, byte*, int );

    /* Array of hash functions */
    hashfnptr hash[ nhashes ] = { md5, sha1, haval, ... };

    state = hash[ 0 ]( algorithm, mode, parameters, userKey );
    for count = 1 to iterations
        for length = 1 to keyLength (in hash_output_size blocks)
            /* Select a hash function based on the current state */
            state = hash[ state % nhashes ]( state );
            key[ length ] = hash[ state % nhashes ]( state, userKey );

This makes the transformation more expensive in hardware for the same expense in software, so for the same CPU time on your side you impose more hardware cost on an attacker, and could reduce the iterations for the same security. 'nhashes' is determined by the number of digest algorithms you consider trustworthy (an attacker needs hardware for 'nhashes' different digest algorithms). You need to do something about resolving the differing output and state sizes. Probably speed isn't an issue though.

Adam
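(To make the above concrete, here's a minimal runnable version in C. It uses OpenSSL's legacy one-shot digests MD5, SHA1 and RIPEMD160 purely as examples; the wrapper functions, truncating every digest to the smallest (16-byte) size to get a common state size, and using the state's first byte for 'state % nhashes' are all assumptions about how to resolve the size issues Adam mentions, not part of his post.)

    #include <string.h>
    #include <openssl/md5.h>      /* MD5():       16-byte digest */
    #include <openssl/sha.h>      /* SHA1():      20-byte digest */
    #include <openssl/ripemd.h>   /* RIPEMD160(): 20-byte digest */

    #define STATE_SIZE  MD5_DIGEST_LENGTH   /* 16: truncate all digests to the smallest */
    #define NHASHES     3

    typedef void (*hashfnptr)( unsigned char *out, const unsigned char *in, int inLen );

    /* Wrappers giving every digest the same signature and output size */
    static void md5_wrap( unsigned char *out, const unsigned char *in, int inLen )
        { MD5( in, inLen, out ); }              /* already STATE_SIZE bytes */

    static void sha1_wrap( unsigned char *out, const unsigned char *in, int inLen )
        {
        unsigned char digest[ SHA_DIGEST_LENGTH ];
        SHA1( in, inLen, digest );
        memcpy( out, digest, STATE_SIZE );      /* truncate 20 -> 16 bytes */
        }

    static void rmd160_wrap( unsigned char *out, const unsigned char *in, int inLen )
        {
        unsigned char digest[ RIPEMD160_DIGEST_LENGTH ];
        RIPEMD160( in, inLen, digest );
        memcpy( out, digest, STATE_SIZE );      /* truncate 20 -> 16 bytes */
        }

    static const hashfnptr hash[ NHASHES ] = { md5_wrap, sha1_wrap, rmd160_wrap };

    static void derive_key_multihash( unsigned char *key, int keyLength,
                                      const unsigned char *userKey, int userKeyLength,
                                      int iterations )
        {
        unsigned char state[ STATE_SIZE ], temp[ STATE_SIZE ];
        unsigned char buf[ STATE_SIZE + userKeyLength ];  /* C99 VLA: state || userKey */
        int count, keyIndex, length, i;

        /* Initial state from the user key (Peter's version also hashes in
           the algorithm, mode and parameters here) */
        hash[ 0 ]( state, userKey, userKeyLength );

        memset( key, 0, keyLength );
        for( count = 0; count < iterations; count++ )
            for( keyIndex = 0; keyIndex < keyLength; keyIndex += STATE_SIZE )
                {
                /* state = hash[ state % nhashes ]( state ), using the first
                   byte of the state to pick the hash function */
                hash[ state[ 0 ] % NHASHES ]( state, state, STATE_SIZE );

                /* temp = hash[ state % nhashes ]( state, userKey ) */
                memcpy( buf, state, STATE_SIZE );
                memcpy( buf + STATE_SIZE, userKey, userKeyLength );
                hash[ state[ 0 ] % NHASHES ]( temp, buf, STATE_SIZE + userKeyLength );

                /* XOR as much of the output as is still needed into the key */
                length = keyLength - keyIndex;
                if( length > STATE_SIZE )
                    length = STATE_SIZE;
                for( i = 0; i < length; i++ )
                    key[ keyIndex + i ] ^= temp[ i ];
                }
        }

Truncating to the smallest digest is just the simplest way to get a common size; folding larger digests down by XOR would preserve more of their output.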