Identifying GIFs, was Re: criminal gif upload
In message Tue, 5 Oct 1993 17:11:17 -0400 (EDT), Matthew J Ghio <mg5n+@andrew.cmu.edu> writes:
Seriously tho, just posting a list of MS-DOS filenames is rather useless as filenames do get changed. It is highly likely that a sysop or user might have changed the filenames to something else, especially if their operating system supported filenames longer than 8 characters.
Doesn't this bring up a fundamental question: when are two files equivalent? We can easily use MD5 or brik to identify identical files. But GIFs and other image files (MPEG, JPEG, TIFF, etc.) are subject to both lossy compression and steganographic coding techniques. If you change one pixel of the background, the checksums are different, but it will still show *porm or whatever to a judge who "knows it when he sees it."

We can prove statistical insignificance of duplication using strong hashing functions. Can we find a way to statistically prove "looks like" on a numerical basis?

Pat

Pat Farrell    Grad Student    pfarrell@cs.gmu.edu
Department of Computer Science    George Mason University, Fairfax, VA
Public key available via finger
#include <standard.disclaimer>
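The checksum point above can be illustrated with a short Python sketch. It is only an illustration: the "image" is a hypothetical 8x8 grayscale buffer of raw bytes, not a real GIF, and md5_of_pixels is a helper name made up for this example. It uses the standard hashlib module to show that altering one pixel by one intensity level yields a different MD5 digest.

```python
import hashlib


def md5_of_pixels(pixels):
    """Return the MD5 hex digest of a flat sequence of 8-bit pixel values."""
    return hashlib.md5(bytes(pixels)).hexdigest()


# Hypothetical 8x8 grayscale image stored as 64 raw bytes (an assumption
# made for illustration; a real GIF has headers and LZW-compressed data).
original = [128] * 64

# Copy the image and nudge a single background pixel by one level.
altered = list(original)
altered[0] = 129

digest_original = md5_of_pixels(original)
digest_altered = md5_of_pixels(altered)

# Visually the two images are indistinguishable, yet the digests differ.
digests_differ = digest_original != digest_altered
print(digest_original)
print(digest_altered)
print("digests differ:", digests_differ)
```

So a hash comparison answers "is this the identical file?" and nothing more, which is why the "looks like" question needs a different kind of measure.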
"Pat Farrell" <pfarrell@gmu.edu> writes:
We can prove statistical insignificance of duplication using strong hashing functions. Can we find a way to statistically prove "looks like" on a numerical basis?
Yes. If you were to take an image and divide it into, let's say, about 20 sections horizontally and 20 sections vertically, and then average the intensities of all pixels in each of the 400 rectangles formed, you would create a fuzzy low-resolution version of the original picture. Other pictures could then be compared to it, to determine whether they look like the original, by applying the same averaging method to them and comparing the block-pixel averages. If the pictures differed by less than +/- 5% or so for each block, the original pictures probably look very much alike.

This method works well even if one of the images had been converted to a different resolution, or if its color palette had been changed slightly to fit a different graphic format, or if one was converted to black & white. Such a system would probably be very helpful to sysops to get rid of duplicate pictures on their systems, but unfortunately it would also give the cops an automated system for busting people. :(
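A minimal Python sketch of the block-averaging comparison described above might look like the following. Several things here are assumptions rather than part of the original suggestion: images are taken to be lists of rows of grayscale values in 0..255 (a color image would first have to be reduced to intensities), each image is at least 20 pixels in each dimension, and "+/- 5%" is read as 5% of the full 0..255 intensity range. The names block_averages and looks_alike are hypothetical.

```python
def block_averages(image, grid=20):
    """Average pixel intensity in each cell of a grid x grid partition.

    image: list of rows of grayscale values in 0..255 (an assumption).
    Requires at least `grid` pixels in each dimension so no cell is empty.
    """
    height, width = len(image), len(image[0])
    averages = []
    for by in range(grid):
        # Integer cell bounds, so sizes that are not multiples of the
        # grid are still covered completely.
        y0, y1 = by * height // grid, (by + 1) * height // grid
        row = []
        for bx in range(grid):
            x0, x1 = bx * width // grid, (bx + 1) * width // grid
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        averages.append(row)
    return averages


def looks_alike(image_a, image_b, grid=20, tolerance=0.05):
    """True if every pair of block averages differs by less than
    tolerance * 255 (one possible reading of "+/- 5%")."""
    limit = tolerance * 255
    blocks_a = block_averages(image_a, grid)
    blocks_b = block_averages(image_b, grid)
    return all(
        abs(pa - pb) < limit
        for row_a, row_b in zip(blocks_a, blocks_b)
        for pa, pb in zip(row_a, row_b)
    )


# Synthetic 40x40 diagonal gradient standing in for a picture.
original = [[(x + y) * 255 // 78 for x in range(40)] for y in range(40)]

# The same picture at twice the resolution (each pixel duplicated 2x2).
scaled = [[original[y // 2][x // 2] for x in range(80)] for y in range(80)]

# A clearly different picture: the negative of the original.
inverted = [[255 - value for value in row] for row in original]

same_at_other_resolution = looks_alike(original, scaled)
same_as_negative = looks_alike(original, inverted)
print("original vs. rescaled:", same_at_other_resolution)
print("original vs. negative:", same_as_negative)
```

Because the comparison works on the 20x20 grid of averages rather than on raw pixels, the rescaled copy matches while the negative does not. The grid size and tolerance are tuning knobs; the values here simply follow the figures suggested in the message.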