Expecting a front view of a subject to match a side view of the same subject is impossible. The two views are disjoint sets of information.
If all the images are frontal, we can match them with high probability; otherwise I doubt this technology has a future.
You are applying pure logic to a very complex subject. I'd bet this is already routinely done by TLAs and whatnot, at least as a pre-screen before human photograph inspectors. The most obvious hole in your statement concerns 2D spatial FFTs of the image: you can probably increase your match probability greatly by applying certain masking criteria to the 2D FFT. From there, lots can be done with colors and other indirect cues, such as (perhaps) camera signatures in the photo (e.g., if there's text that says "Hamamatsu Synchroscan Streak Camera", don't bother doing the FFT; it ain't a picture of your dog). Look, a human being can recognize the side image of a person a lot of the time. There should be no reason this intelligence can't be encoded somehow. -TD
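
For what it's worth, here is a rough sketch of what "masking criteria applied to the 2D FFT" might look like in practice. Everything in it is an assumption for illustration, not anything from the post above: the function names (band_mask, fft_signature, similarity), the ring-shaped band-pass mask, the radii, and the use of a normalized correlation as the score are all hypothetical choices. It only shows that a masked magnitude spectrum gives a signature that ignores where the subject sits in the frame; it makes no claim about matching a front view to a side view. It assumes NumPy and two grayscale images of the same shape.

```python
import numpy as np

def band_mask(shape, r_lo, r_hi):
    """Boolean ring mask around the centre of a shifted 2D spectrum (hypothetical helper)."""
    rows, cols = shape
    y, x = np.ogrid[:rows, :cols]
    r = np.hypot(y - rows // 2, x - cols // 2)
    return (r >= r_lo) & (r <= r_hi)

def fft_signature(img, r_lo=2, r_hi=20):
    """Unit-length vector of log-magnitudes inside the ring mask.

    The radii are arbitrary assumptions; a real system would tune them.
    """
    img = np.asarray(img, dtype=float)
    # Remove the mean so overall brightness does not dominate the spectrum.
    spec = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    mag = np.log1p(np.abs(spec))
    sig = mag[band_mask(img.shape, r_lo, r_hi)]
    sig = sig - sig.mean()
    return sig / (np.linalg.norm(sig) + 1e-12)

def similarity(a, b, **kw):
    """Correlation of the two masked spectra, in [-1, 1]; images must share a shape."""
    return float(np.dot(fft_signature(a, **kw), fft_signature(b, **kw)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    yy, xx = np.mgrid[:64, :64]
    # Synthetic "subject": a blob plus stripes, standing in for a real photo.
    img = np.exp(-((yy - 30) ** 2 + (xx - 25) ** 2) / 80.0) + 0.3 * np.sin(xx / 3.0)
    shifted = np.roll(img, (5, 9), axis=(0, 1))  # same content, moved in the frame
    noise = rng.random((64, 64))                 # unrelated content
    print("shifted copy:", similarity(img, shifted))
    print("random noise:", similarity(img, noise))
```

The magnitude of the FFT does not change under a circular shift, so the shifted copy should score essentially the same as the original, while unrelated content scores lower. Rotation, scale, lighting, and pose changes are not handled by this sketch at all.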