"Pat Farrell" <pfarrell@gmu.edu> writes:
We can prove statistical insignificance of duplication using strong hashing functions. Can we find a way to statistically prove "looks like" on a numerical basis?
Yes. If you were to take an image and divide it into, let's say, about 20 sections horizontally and 20 sections vertically, and then average the intensities of all the pixels in each of the 400 rectangles formed, you would create a fuzzy low-resolution version of the original picture. You could then determine whether another picture looks like the original by running it through the same averaging method and comparing the block averages. If the pictures differed by less than +/- 5% or so for each block, the original pictures probably look very much alike. This method works well even if one of the images has been converted to a different resolution, or if its color palette has been changed slightly to fit a different graphic format, or if one was converted to black & white. Such a system would probably be very helpful to sysops for getting rid of duplicate pictures on their systems, but unfortunately it would also give the cops an automated system for busting people. :(
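
For anyone who wants to try it, here is a rough sketch of the block-averaging comparison in Python. It is only an illustration under a few assumptions of mine: the image is already a grayscale grid (a list of rows of 0-255 intensities), the "5%" is taken to mean 5% of the full intensity range, and the names block_averages and looks_alike are made up for this example.

```python
def block_averages(pixels, blocks=20):
    """Return a blocks x blocks grid of mean intensities.

    pixels is a list of rows of grayscale values (an assumption; a color
    image would first have to be reduced to intensities).
    """
    height, width = len(pixels), len(pixels[0])
    if height < blocks or width < blocks:
        raise ValueError("image must be at least blocks x blocks pixels")
    grid = []
    for by in range(blocks):
        # Proportional boundaries, so images of any resolution map to the same grid.
        y0, y1 = by * height // blocks, (by + 1) * height // blocks
        row = []
        for bx in range(blocks):
            x0, x1 = bx * width // blocks, (bx + 1) * width // blocks
            total = sum(sum(pixels[y][x0:x1]) for y in range(y0, y1))
            row.append(total / ((y1 - y0) * (x1 - x0)))
        grid.append(row)
    return grid


def looks_alike(a, b, blocks=20, tolerance=0.05, full_scale=255):
    """True if every pair of block averages differs by at most
    tolerance * full_scale (assumed reading of "+/- 5% or so")."""
    limit = tolerance * full_scale
    grid_a, grid_b = block_averages(a, blocks), block_averages(b, blocks)
    return all(abs(p - q) <= limit
               for row_a, row_b in zip(grid_a, grid_b)
               for p, q in zip(row_a, row_b))


# A small gradient image and the same picture at twice the resolution.
img = [[x * 3 + y * 2 for x in range(40)] for y in range(40)]
big = [[img[y // 2][x // 2] for x in range(80)] for y in range(80)]
print(looks_alike(img, big))
```

The block boundaries are computed proportionally rather than in fixed pixel counts, which is what lets a resized copy land on the same 20 x 20 grid as the original.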