In message Tue, 5 Oct 1993 17:11:17 -0400 (EDT), Matthew J Ghio <mg5n+@andrew.cmu.edu> writes:
Seriously tho, just posting a list of MS-DOS filenames is rather useless as filenames do get changed. It is highly likely that a sysop or user might have changed the filenames to something else, especially if their operating system supported filenames longer than 8 characters.
Doesn't this bring up a fundamental question: when are two files equivalent? We can easily use MD5 or brik to identify identical files. But GIFs and other image files (MPEG, JPEG, TIFF, etc.) are subject to both lossy compression and steganographic coding techniques. If you change one pixel of the background, the checksums are different, but it will still show *porm or whatever to a judge who "knows it when he sees it." With strong hashing functions we can show that the chance of two different files sharing a checksum is statistically insignificant. Can we find a way to statistically prove "looks like" on a numerical basis?

Pat

Pat Farrell      Grad Student                 pfarrell@cs.gmu.edu
Department of Computer Science    George Mason University, Fairfax, VA
Public key available via finger          #include <standard.disclaimer>
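
To make the contrast concrete, here is a rough sketch, not a proposal for a legally meaningful test. It compares an exact checksum (MD5) with a simple "average hash" on a synthetic grayscale image. The average hash is one possible technique swapped in for illustration; the function names, the 8x8 grid, and the test images are all assumptions made for this example, and a small Hamming distance is only a numerical hint of "looks like", not a statistical proof.

```python
import hashlib


def md5_hex(pixels):
    """Exact-identity check: MD5 over the raw grayscale pixel bytes."""
    return hashlib.md5(bytes(p for row in pixels for p in row)).hexdigest()


def average_hash(pixels, size=8):
    """Illustrative perceptual fingerprint (assumes image is at least size x size).

    Split the image into size x size blocks, take each block's mean
    brightness, and emit one bit per block: 1 if that block is brighter
    than the mean of all blocks, else 0.
    """
    h, w = len(pixels), len(pixels[0])
    means = []
    for by in range(size):
        for bx in range(size):
            ys = range(by * h // size, (by + 1) * h // size)
            xs = range(bx * w // size, (bx + 1) * w // size)
            block = [pixels[y][x] for y in ys for x in xs]
            means.append(sum(block) / len(block))
    overall = sum(means) / len(means)
    bits = 0
    for m in means:
        bits = (bits << 1) | int(m > overall)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Hypothetical 32x32 test image: a diagonal brightness gradient (values 0..248).
original = [[4 * (x + y) for x in range(32)] for y in range(32)]

# Same picture with the low bit of one background pixel flipped.
tweaked = [row[:] for row in original]
tweaked[0][0] ^= 1

# A visibly different picture: the gradient inverted.
inverted = [[248 - p for p in row] for row in original]

print("MD5 equal after one-pixel change:", md5_hex(original) == md5_hex(tweaked))
print("fingerprint distance, one-pixel change:",
      hamming(average_hash(original), average_hash(tweaked)))
print("fingerprint distance, inverted image:",
      hamming(average_hash(original), average_hash(inverted)))
```

The checksum changes with a single flipped bit, while the block-average fingerprint barely moves. Where to draw the line between "same" and "different" distances is exactly the open question, and any threshold chosen here would be arbitrary.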