rsync and md4

Andrew Tridgell tridge at arvidsjaur.anu.edu.au
Fri Jun 28 23:45:31 PDT 1996


I've recently released a package called rsync that uses a checksum
search to provide very efficient file update over a slow link.  (see
ftp://samba.anu.edu.au/pub/rsync if you are interested)

Now I'd like to calculate some probabilities of failure of the
algorithm. The fundamental thing I need to know to do the calculation
is the probability of a random piece of data of length n having the
same md4 checksum as another given piece of data of the same length.

A first guess might be 2^-128 but I know that this sort of thing is
rarely that simple. Is md4 that good?

Note that I am not interested in "attacks" on md4 as such as the
source of the random data is just another file provided by the same
user, so it won't have been specially designed to defeat md4. 

If the probability is within a few orders of magnitude of 2^-128 then
can I also be sure that if I only use the first b bits of a md4
checksum it will be within a few orders of magnitude of 2^-b ? There
is an option in rsync to use a shorter checksum by truncating
md4. This saves some bytes on the link at the risk of lowering the
confidence. 

Why md4? I chose md4 because it seemed to be the fastest of the
reputedly strong, publicly available checksum algorithms. Suggestions
for alternative algorithms are welcome.

Cheers, Andrew






More information about the cypherpunks-legacy mailing list