Looks Like It
Thursday, 26 May 2011
For the last few months, I have had a nearly constant stream of queries asking how TinEye works and, more generally, how to find similar pictures.
The truth is, I don't know how the TinEye image search engine works. They don't disclose the specifics of the algorithm(s) that they use. However, based on the type of results it returns, it appears to me to be some variation of a perceptual hash algorithm.
That's Perceptive!
Perceptual hash algorithms describe a class of comparable hash functions. Features in the image are used to generate a distinct (but not unique) fingerprint, and these fingerprints are comparable.

Perceptual hashes are a different concept compared to cryptographic hash functions like MD5 and SHA1. With cryptographic hashes, the hash values are random. The data used to generate the hash acts like a random seed, so the same data will generate the same result, but different data will create different results. Comparing two SHA1 hash values really only tells you two things. If the hashes are different, then the data is different. And if the hashes are the same, then the data is likely the same. (Since there is a possibility of a hash collision, having the same hash values does not guarantee the same data.) In contrast, perceptual hashes can be compared -- giving you a sense of similarity between the two data sets.
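To make the contrast concrete, here is a small hedged sketch (Python 3; 'photo.jpg' is just a placeholder file name): flip a single bit of a file and the SHA1 digest becomes completely unrelated, while the perceptual hashes described below barely change.

import hashlib

with open('photo.jpg', 'rb') as f:          # placeholder file name
    original = f.read()
# Flip one bit near the end of the file.
modified = original[:-1] + bytes([original[-1] ^ 1])

print(hashlib.sha1(original).hexdigest())   # the two digests look completely
print(hashlib.sha1(modified).hexdigest())   # unrelated despite near-identical data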
Every perceptual hash algorithm that I have come across has the same basic properties: images can be scaled larger or smaller, have different aspect ratios, and even minor coloring differences (contrast, brightness, etc.) and they will still match similar images. These are the same properties seen with TinEye. (But TinEye does appear to do more; I'll get to that in a moment.)
Looking Good
So how do you create a perceptual hash? There are a couple of common algorithms, but none are very complicated. (I'm always surprised that the most common algorithms even work, because they seem too simple!) One of the simplest hashes represents a basic average based on the low frequencies.

With pictures, high frequencies give you detail, while low frequencies show you structure. A large, detailed picture has lots of high frequencies. A very small picture lacks details, so it is all low frequencies. To show how the Average Hash algorithm works, I'll use a picture of actress Alyson Hannigan.
- Reduce size. The fastest way to remove high frequencies and detail is to shrink the image. In this case, shrink it to 8x8 so that there are 64 total pixels. Don't bother keeping the aspect ratio, just crush it down to fit an 8x8 square. This way, the hash will match any variation of the image, regardless of scale or aspect ratio.
- Reduce color. The tiny 8x8 picture is converted to a grayscale. This changes the hash from 64 pixels (64 red, 64 green, and 64 blue) to 64 total colors.
- Average the colors. Compute the mean value of the 64 colors.
- Compute the bits. This is the fun part. Each bit is simply set based on whether the color value is above or below the mean.
- Construct the hash. Set the 64 bits into a 64-bit integer. The order does not matter, just as long as you are consistent. (I set the bits from left to right, top to bottom using big-endian.)
[original image] [8x8 grayscale thumbnail] = 8f373714acfcf4d0
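Putting the five steps together, here is a minimal sketch in Python (assuming PIL/Pillow; Image.ANTIALIAS is named Image.LANCZOS in newer Pillow releases). It illustrates the steps above and is not necessarily the exact code that produced the hash shown here.

from PIL import Image

def average_hash(filename):
    # Steps 1 and 2: crush to 8x8 (ignoring aspect ratio) and convert to grayscale.
    img = Image.open(filename).resize((8, 8), Image.ANTIALIAS).convert('L')
    pixels = list(img.getdata())             # 64 grayscale values
    # Step 3: compute the mean of the 64 values.
    avg = sum(pixels) / len(pixels)
    # Steps 4 and 5: one bit per pixel (above or below the mean),
    # packed left to right, top to bottom, into a 64-bit integer.
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits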
If you want to compare two images, construct the hash from each image and count the number of bit positions that are different. (This is a Hamming distance.) A distance of zero indicates that it is likely a very similar picture (or a variation of the same picture). A distance of 5 means a few things may be different, but they are probably still close enough to be similar. But a distance of 10 or more? That's probably a very different picture.
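Counting the differing bits is just a popcount of the XOR of the two hashes. Continuing the sketch above:

def hamming_distance(hash1, hash2):
    # XOR leaves a 1 bit wherever the two 64-bit hashes disagree.
    diff = hash1 ^ hash2
    count = 0
    while diff:
        diff &= diff - 1      # clear the lowest set bit (Kernighan's trick)
        count += 1
    return count

# 0        -> likely the same picture (or a variation of it)
# around 5 -> probably similar
# 10+      -> probably a different picture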
Getting Funky With pHash
While the Average Hash is quick and easy, it may be too rigid of a comparison. For example, it can generate false-misses if there is a gamma correction or a color histogram is applied to the image. This is because the colors move along a non-linear scale -- changing where the "average" is located and therefore changing which bits are above/below the average.

A more robust algorithm is used by pHash. (I use my own variation of the algorithm, but it's the same concept.) The pHash approach extends the average approach to the extreme, using a discrete cosine transform (DCT) to reduce the frequencies.
- Reduce size. Like Average Hash, pHash starts with a small image. However, the image is larger than 8x8; 32x32 is a good size. This is really done to simplify the DCT computation and not because it is needed to reduce the high frequencies.
- Reduce color. The image is reduced to a grayscale just to further simplify the number of computations.
- Compute the DCT. The DCT separates the image into a collection of frequencies and scalars. While JPEG uses an 8x8 DCT, this algorithm uses a 32x32 DCT.
- Reduce the DCT. While the DCT is 32x32, just keep the top-left 8x8. Those represent the lowest frequencies in the picture.
- Compute the average value. Like the Average Hash, compute the mean DCT value (using only the 8x8 DCT low-frequency values and excluding the first term since the DC coefficient can be significantly different from the other values and will throw off the average). Thanks to David Starkweather for the added information about pHash. He wrote: "the dct hash is based on the low 2D DCT coefficients starting at the second from lowest, leaving out the first DC term. This excludes completely flat image information (i.e. solid colors) from being included in the hash description."
- Further reduce the DCT. This is the magic step. Set the 64 hash bits to 0 or 1 depending on whether each of the 64 DCT values is above or below the average value. The result doesn't tell us the actual low frequencies; it just tells us the very-rough relative scale of the frequencies to the mean. The result will not vary as long as the overall structure of the image remains the same; this can survive gamma and color histogram adjustments without a problem.
- Construct the hash. Set the 64 bits into a 64-bit integer. The order does not matter, just as long as you are consistent. To see what this fingerprint looks like, simply set the values (this uses +255 and -255 based on whether the bits are 1 or 0) and convert from the 32x32 DCT (with zeros for the high frequencies) back into the 32x32 image:
[32x32 image reconstructed from the hash bits] = 8a0303f6df3ec8cd
At first glance, this might look like some random blobs... but look closer. There is a dark ring around her head and the dark horizontal line in the background (right side of the picture) appears as a dark spot.
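For completeness, here is a hedged sketch of the DCT-based hash, using SciPy's DCT for the transform. It skips pHash's pre-blur and will not reproduce my exact hash values (different scalers and DCT libraries flip a few bits), but it follows the same steps.

import numpy
from PIL import Image
from scipy.fftpack import dct

def dct_hash(filename):
    # Steps 1 and 2: reduce to 32x32 and convert to grayscale.
    img = Image.open(filename).resize((32, 32), Image.ANTIALIAS).convert('L')
    pixels = numpy.asarray(img, dtype=numpy.float64)
    # Step 3: 2D DCT -- a 1D DCT over the rows, then over the columns.
    freq = dct(dct(pixels, axis=0, norm='ortho'), axis=1, norm='ortho')
    # Step 4: keep only the top-left 8x8 block (the lowest frequencies).
    low = freq[:8, :8]
    # Step 5: average the 64 values, excluding the DC term at (0,0).
    avg = (low.sum() - low[0, 0]) / 63.0
    # Steps 6 and 7: one bit per value, above or below the average.
    bits = 0
    for value in low.flatten():
        bits = (bits << 1) | (1 if value > avg else 0)
    return bits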
Best in Class?
Since I do a lot of work with digital photo forensics and huge picture collections, I need a way to search for similar pictures. So, I created a picture search tool that uses a couple of different perceptual hash algorithms. In my unscientific but long-term-use experience, I have found that Average Hash is significantly faster than pHash. Average Hash is a great algorithm if you are looking for something specific. For example, if I have a small thumbnail of an image and I know that the big one exists somewhere in my collection, then Average Hash will find it very quickly. However, if there are modifications -- like text was added or a head was spliced into place, then Average Hash probably won't do the job. While pHash is slower, it is very tolerant of minor modifications (minor being less than 25% of the picture).

Then again, if you are running a service like TinEye, then you're not going to compute the pHash every time. I am certain that they have a database of pre-computed hash values. The basic comparison system is extremely fast. (There are some heavily optimized ways to compute a Hamming distance.) So computing the hash is a one-time cost and doing a million comparisons in a few seconds (on one computer) is very realistic.
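As a rough illustration of that point, once the hashes are precomputed the search is just an XOR and a popcount per stored image. A sketch reusing the hamming_distance function from above (the find_similar name, the hash_db layout, and the threshold of 10 are my illustration, not TinEye's design):

def find_similar(needle, hash_db, max_distance=10):
    # hash_db: precomputed list of (image_id, 64-bit hash) pairs.
    matches = []
    for image_id, stored in hash_db:
        if hamming_distance(needle, stored) <= max_distance:
            matches.append(image_id)
    return matches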
Variations
There are variations to the perceptual hash algorithm that can also improve performance. For example, the image can be cropped before being reduced in size. This way, extra empty space around the main part of the image won't make a difference. Also, the image can be segmented. For example, if you have a face detection algorithm, then you can compute hashes for each face. (I suspect that TinEye's algorithm does something similar.)

Other variations can track general coloring (e.g., her hair is more red than blue or green, and the background is closer to white than black) or the relative location of lines.
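For instance, here is a hedged sketch of the cropping variation: trim a uniform border (here, whatever shade the top-left pixel happens to be) before shrinking and hashing. Real systems use smarter cropping; this only shows where the step fits.

from PIL import Image, ImageChops

def cropped_average_hash(filename):
    img = Image.open(filename).convert('L')
    # Trim any border that matches the top-left pixel, so empty space
    # around the subject does not influence the hash.
    border = Image.new('L', img.size, img.getpixel((0, 0)))
    bbox = ImageChops.difference(img, border).getbbox()
    if bbox:
        img = img.crop(bbox)
    # Then hash exactly as in the Average Hash sketch above.
    pixels = list(img.resize((8, 8), Image.ANTIALIAS).getdata())
    avg = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits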
When you can compare images, then you can start doing really cool things. For example, the search engine GazoPa [now offline] allows you to draw a picture. As with TinEye, I don't know the details about how GazoPa works. However, it appears to use a variation of the perceptual hash. Since the hash reduces everything down to the lowest frequencies, my crappy line drawing of three stick figures can be compared with other pictures -- likely matching photos that contain three people.
Updates
This blog posting has become very popular among the techie community. Over at Reddit, a person named Wote wrote:
I just quickly implemented his average hash algorithm in Python and tried it on some random pictures, and it seems to be better at finding crops (or reverse crops, like demotivationals) than it has any right to be.
Wote has made his python source code for the Average Hash algorithm public. (Thanks, Wote!)
David Oftedal also wrote an implementation of the image hash, using C#.
reverend_dan posted an excellent example of TinEye performing a complex match. In his example, it matched the picture of a woman in front of a skyline to a picture of the skyline without the woman.
A couple of people at Reddit complained about my use of Alyson Hannigan as the example image. (She's so cute that she is distracting.) However, it is actually part of my master plan. (Don't tell Alyson!) I'm hoping that she'll notice it one day and call me. Maybe she and I can double-date at Defcon this year... *sigh*
Over at Hacker News, the discussion is more technical with plenty of references to other algorithms. Sadly, one person pointed out that GazoPa will be shutting down soon.
Finally, TinEye has noticed the feedback and posted a response on their blog. According to them, the power behind TinEye is magic. (Can anyone cite this algorithm?)
Update 2013-01-21: Another perceptual hash algorithm based on gradients is described under this blog entry.
Oh yeah, dream on.
Locally, I'm using DupDetector (freeware) to find similar images in a collection (I don't know how exactly it works, but it has different algorithms and a variety of filter options such as percentage of similarity to narrow things down). Every now and then I'm looking for better/newer software that does the job, but so far none has really convinced me either in terms of search results or usability/ergonomics. DupDetector's maker, Prismatic Software, apparently has just returned from the grave after years of being dead and gone.
There is also the spectral hashing algorithm, which seems to have quite good performance for image retrieval. The big thing is to compute the hash function after a suitable image transformation.
I did not test it for retrieval tasks, but used it to query image patches (5x5 to 11x11 pixels), and it was quite impressive.
See it here:
http://www.cs.huji.ac.il/~yweiss/SpectralHashing/
Question: if you had two images to compare, and wanted to measure perceptual difference between them, would pHash be most appropriate? We use something similar (but much simpler) now, but it falls down in subtler scenarios (e.g. when the difference is an extra hairline, or bold vs. plain text in an image, etc).
Could a variation of this technique be used to just find faces? (So if you search for someone named Rose, you can limit the search to just faces.) In other words, are there similarities among all the hashes that have faces in them, as opposed to those that don't?
Also, could this technique be refined to do facial recognition like Picasa and iPhoto use? Or would that require an entirely different algorithm altogether?
One thing this article has done for me: It's made me want to relearn how to program (a process I already started, but I'll now accelerate.)
A few comments and questions..
1) I don't understand why compressing an image is going to generate the same 8x8 image each time no matter what aspect ratio it was originally... whether it has been stretched before.. If you stretch and then recompress a bunch of times don't you eventually lose the information?
That math is for some reason totally counter-intuitive to me. Could someone do a proof?
2) I don't truly understand the notion of a DCT and why it invites the possibility of doing cool "magical" tricks. Unfortunately there is no Khan Academy video on this..
3) On Hacker News someone asked, "'With pictures, high frequencies give you detail, while low frequencies show you structure.' I have a vague idea of what this means but can someone please explain it in a bit more detail?", to which I replied:
"a way I like to think of it is that high-frequency means high-detail (highly frequently needing information to specify how it looks) whereas low-frequency means low-detail"
Is my thinking/intuition totally off?
What happens if you want to compare a subset of an image with the full sized one. This algorithm would likely not work, right?
I.e. Comparing the above picture with the picture of only her face.
actually, don't look closer, because unless you run an inverse cosine transform on it, it's still just frequencies.
This is the inverse DCT (iDCT), which converts frequencies back into an image.
Which bit do we use when a pixel or DCT value is exactly equal to the average?
This should be more or less a matter of taste, of course, but could yield vastly different results depending on one's arbitrary choice.
... Guess I could have just checked.
If my GIMPology is correct, pixel # 32 using the first algorithm has a value exactly equal to the average, which appears as a 0 in the hash.
As with the ordering of bits, what to do when the value is equal to the mean does not matter. The only thing that matters is that you handle the situation consistently.
For the comparison, you can either use "greater than" or "greater than or equal". While your choice will change the hash value (for this corner case situation), it won't change the algorithm's accuracy.
Are Perceptual Hashes an instance of Compressive Sensing ?
over at:
http://nuit-blanche.blogspot.com/2011/06/are-perceptual-hashes-instance-of.html
Cheers,
Igor.
I implemented the first algorithm successfully and I am pretty amazed by how well it works! I have been looking for something like this for a long time to get some kind of rough understanding of it.
Now, I was also trying to integrate your pHash like implementation but I somehow don't get it.
I found a DCT implementation in C# which I then translated into my more familiar cross-platform development environment, REALbasic. I have a relatively basic but good understanding of C due to my knowledge of PHP and JavaScript, but sometimes it's just insufficient …
What I need you to explain to me, and what I hope you will, is (sadly) steps 4 to 6.
To make it easier, I'll try to ask my maybe-stupid questions more specifically:
4: Reduce the DCT => Do you mean I should calculate the coefficients with a 32x32 pixel matrix of the picture's gray values (0-255) and then, from the resulting two-dimensional double array, only work with values from 0 to 7 for x and y?
5: If I am right in step 4, I then go through all the DCT values I received (8x8 doubles of DCT coefficients) starting at x and y = 1 to 7 instead of 0-7 to exclude the first term? By "first term" you mean the first two coefficients, right?
6: I then (like in average hash) go through each of the values again and calculate a binary representation of whether the current coefficient is above or below the average of all, right?
But do I exclude the first "term" again? Because then I'd have only 7 values I could work with for each hash. This would also mean that if I included it, each hash would probably start with 00 in hex representation …
So obviously I am doing something wrong here. I don't get it …
I tried to work with different values from DCT and iDCT, and I plotted each of them out to compare the results, but I didn't get good results. At first I thought I had it, but then I compared your pHash implementation against average hash and it actually produces worse results, detecting fewer of the duplicates than it probably should (while you said it could handle color transformations more easily).
So, please, what am I doing wrong?
Cheers
Congratulations on your implementation! Glad to hear it is working!
Regarding your questions:
4. Reduce the DCT.
I think you've got it right.
I have an old blog entry where I talk about the DCT's scaling properties:
http://www.hackerfactor.com/blog/index.php?/archives/252-PNG-and-Cameras.html
Basically, the forward DCT converts a 32x32 pixel square into 32x32 frequencies. The top left are the lowest frequencies and the bottom right are the highest. Zero out everything that isn't in the top-left 8x8. Alternately, construct a new 8x8 DCT and just copy the top-left 8x8 from the 32x32 into the new 8x8. This cuts out all of the high frequencies.
This is the same as doing a DCT scaling, from 32x32 to 8x8. The result that you want is an 8x8 DCT based on a larger DCT frequency decomposition. (You want to throw out the high frequencies.)
5. Compute the average value.
The 8x8 gives you 64 values. You want to process 63 of the 64 values. (Skip the first one.) There are a couple of different ways to do this. You can either omit the first value, or subtract it out afterwards. (Subtracting out is actually faster since you're not testing a conditional 64 times.)
The skip approach:
for(x=0; x < 8; x++)
  for(y=0; y < 8; y++)
  {
    if ((x==0) && (y==0)) continue; /* skip (0x0) */
    Mean += DCT[x][y];
  }
Mean /= 63;
The subtraction approach:
for(x=0; x < 8; x++)
  for(y=0; y < 8; y++)
  {
    Mean += DCT[x][y];
  }
Mean -= DCT[0][0];
Mean /= 63;
NOTE: Your pseudo-code seems to skip an entire column (the y=1 bug). You just want to skip the first term.
Step 6: Just like the average hash. But this time, include the first term. The first term tells you the average intensity of the whole picture. Is the intensity greater or less than the mean computed from the other values?
"So, please, what am I doing wrong?"
I'm not familiar with your DCT or graphics library, but it sure sounds like you have the basic algorithm right. Don't worry about trying to match my exact values. Different libraries can lead to slightly different results. If the hamming distances between your value and mine vary by 3-5 bits, that's good enough to say you've got the algorithm correct. Simply different scaling functions -- to get it down to 32x32 -- can result in bit differences.
NOTE: I just revised this blog entry so the DCT value is "8a0303f6df3ec8cd" and not "8a0303769b3ec8cd". It's a 3-bit difference due to a different test case. (I had originally written the blog entry with a test case showing the picture with text in it, and had pasted the wrong hash.) Thanks for making me look back at my values!
that answer came faster than I expected.
And it pretty much helped me. Thanks!
My hash results are different, but it looks to work, so actually as it stays that way I don't care.
And for the theory behind it, I think I got it. There are still things about DCT I have to read about, but I understood the most important parts that I need right now.
I also uploaded some screenshots of the results http://oi54.tinypic.com/zmfkp2.jpg
Two other things though:
1. I think your e-mail notification for new posts is not working.
2. Do you, by any chance, know about some nice upscaling/interpolation algorithms for images which do a good work in guessing the missing pixels on high zoom factors? Like being able to re-render license plates but without actually tracing or implementing any character recognition?
I am not that much of a math expert and only a self-taught programmer and lacking time to privately study those topics in detail. Half the mathematical algorithms and papers I downloaded over time I may understand partially, but I am not able to extract an algorithm …
I read about basics like nearest neighbour etc. (which I would probably be able to integrate) but things like bicubic or bilinear interpolation, or Lanczos resampling, are pretty much out of my grasp without further explanation and "easier" pseudo-code …
Maybe you can point to some resources that may explain it the easy way, or may be willing to do it yourself in a new post?
Thank you btw for helping me out with understanding the phash implementation!
Cheers!
> I also uploaded some screenshots of the results http://oi54.tinypic.com/zmfkp2.jpg
Oh! I see the problem!
I had this same problem the first time, too!
You're using signed ints for the hash. Your integer values are negative, so you're flooding it with 0xffff.... values. Use "unsigned char[8]" or uint64_t instead. You can print a uint64_t with printf("%llu",...).
> 1. I think your e-mail notification for new posts is not working.
Sorry -- I disabled it due to spammers trying to relay messages when they post comments.
> 2. Do you, by any chance, know about some nice upscaling/interpolation algorithms for images which do a good work in guessing the missing pixels on high zoom factors? Like being able to re-render license plates but without actually tracing or implementing any character recognition?
Hyper-zoom. I'm just beginning to play with it. Nothing ready for showtime yet.
> I am not that much of a math expert and only a self-taught programmer and lacking time to privately study those topics in detail. Half the mathematical algorithms and papers I downloaded over time I may understand partially, but I am not able to extract an algorithm ...
Yeah. I've ranted about that before. Most of those papers are written by academics, and they equate head-scratches with importance: the longer the reader sits and scratches their head, the more important the paper seems.
http://www.hackerfactor.com/blog/index.php?/archives/221-Oh-Baby,-Talk-Technical-To-Me.html
Many times, overly complex papers are really garbage. They leave out critical details so that the work cannot be recreated for validation. The reviewers rarely try to validate the findings, so the bogus papers get published.
> I read about basics like nearest neighbour etc. (which I would probably be able to integrate) but things like bicubic or bilinear interpolation, or Lanczos resampling, are pretty much out of my grasp without further explanation and "easier" pseudo-code …
>
> Maybe you can point to some resources that may explain it the easy way, or may be willing to do it yourself in a new post?
A request! It will be a while (the image analysis posts take more time to write and my work schedule is filling up), but I'll see what I can do next month.
> The reviewers rarely try to validate the findings, so the bogus papers get published.
so, let me try to fill in some of the blanks.
AHASH
sample A (from average hash, introduction of "the waifu") is a JPEG, 69007 bytes, 500x500 pixels, 26577 colors, 67 urls.
lets just put this aside for now.
sample B (from average hash, step "1. reduce size") is a PNG, 289 bytes, 8x8 pixels, 63 colors.
text suggests B is a scaled version of A, visual inspection confirms this as quite possible.
a semi-exhaustive search over different scaling implementations and precisions finds a very
close (though not perfect) match with CImg::resize in "moving average interpolation" mode using high precision math.
the steps 2-4 come without sample data, this makes direct validation complicated.
detail on the step 2 "grayscaling" is missing entirely.
detail on steps 3+4 "conversion to bits" are present, lacking sample data one can not be sure usage of the term "mean" is
meant as detail though. the term "average" is used as well.
the wording in the similar steps 5+6 of the phash section avoids using the term "mean", using "average".
(while phash is actually using median, not mean)
sample C (from average hash, step "5. construct the hash") is a 64bit value in string and graphical representation.
visual inspection suggests both representations may indeed show the same 64bit value.
the sample has 35 bits set to 1, which (assuming the "high precision math" indication on sample B holds) strongly suggests
there was actually the mean value used in steps 3+4. (high precision and median will tend to yield exactly 32 bits set, low
precision and median with >= condition can yield >32)
for now, a placeholder implementation is used:
a simple CImg.RGBtoYCbCr().channel(0) conversion for the grayscaling.
a CImg.unroll("x") followed by a CImg.mean() and using (val >= mean) for determining the bit value.
the hash value X obtained this way for sample A is e7c7cbc2c4f4fae8.
comparing our own hash value X with sample C gives a hamming distance of 29, indicating a significant difference.
at this point headscratching ensued.
the implementation was checked in various different ways, all the way down to making sure the hamming distance calculation worked.
different variants of step 2 (grayscaling) were tested: classic 30+59+11, simple sum, squared sum, varying precisions...
different variants of step 3+4 (bitfolding) were tested: mean, median, varying condition (>= vs >), varying precisions...
no relevant progress was made in getting the selfcomputed hash for sample A closer to sample C.
at this point a visual comparison of samples B and C led to the discovery that there are significant differences in the overall
structure of the sample data, in particular in the left quarter of the square layout.
in sample B (and sample A) there are about 25% on the left taken up by blank wall, before the start of the circular shape.
in sample C the circle seems to clip the left and top boundary.
sir, we may have a case of "cropping the waifu"!
the validation code was reverted to the original state of high precision, YCbCr grayscaling, >= mean.
a semi-exhaustive search of possible cropping parameters applied to sample A was performed.
about 3150 matches for the hash presented as sample C were found when cropping 90-100 pixels left, 0-50 top, 0-15 bottom.
many more close matches (hamming distance < 5) were obtained in the same range.
a less-exhaustive search using low precision and median calculation found 25 matches with crops 97-102 left, 7-11 and 21-37 top, 0-6 bottom.
note the ranges given are those where matches were actually found, the covered searchspace was ... wider.
the observed distribution of matches suggests the hashing is insensitive to about 10 pixels of cropping.
this could also be an artifact from the scaler.
so there might not have been any top/bottom cropping at all, but surely a good chunk of cropping on the left.
a larger cropping on the left with a potential smaller cropping at the top/bottom could be consistent with a
manual "click top-left and drag to bottom-right" cropping operation being performed.
possibility of cropping on the right side was not investigated mainly for reasons related to the impact on workload
of adding a fourth dimension to the search space.
different handling of pixel coordinates (mainly whether they count from 0 or from 1) between the involved cropping implementations (gimp, imagemagick, cimg) was identified as a potential source of error during the cropping search but
was not investigated closer since the hashing seems to be insensitive to off-by-one differences in general and enough
results matching sample C were obtained. the above pixel ranges for crops should be read as having a +/-2 error potential.
the difference between sample B and the local result obtained with CImg is small enough to be explained by differences in
two implementations of the same scaler or even by the same implementation being used with different precisions.
different implementations/precisions of converting a jpeg into RGB data could also explain the observed differences.
no closer investigation was performed.
no closer investigation of the impact of the different grayscalers on the hash was performed.
the data obtained while headscratching indicates no significant differences in the tested implementations.
a grayscaling based on image perception (yuv chan0, 30+59+11) might be better suited than the purely technical channel reductions.
the conversion to bits in step 3+4 compensates for most differences in post-grayscaling value-range, with
median-based calculation being completely insensitive and mean-based being mostly insensitive to scaling of the absolute
values while keeping the relations between the pixels.
so, it seems the lady got cropped somewhere between step 1 and 5.
various evidence in this case can be found in http://tma.no-ip.de/waifu/
of particular interest might be ...
http://tma.no-ip.de/waifu/hashimg.cpp
snapshot of the code used above.
generates average hash (AHASH) and phash (both in mean and median flavors) and RHASH (see comment 11).
littered with a lot of ifed-out debugprints that might help to understand whats going on.
comments on cimg download (no need to install) and compiling at the top.
http://tma.no-ip.de/waifu/test-crop-95-50-10.png
a cropped version of sample A that hashes to sample B for hashimg AHASH mean.
http://tma.no-ip.de/waifu/rgb4x4-cimg-vs-im.jpg
lower end of ten matches found by running the CImg generated RHASH for sample A against a database of 56M imagemagick generated vectors.
results #1 to #7 are all resaved/scaled variants of sample A.
result #8 is sample A, the vector distance of 29 is due to the different implementations used.
result #9 ... i will get to that in a minute.
result #10 indicates the lady got cropped before.
perfdata indicates it was comparing about 486mil images per second, including backend overhead and resulthandling.
http://tma.no-ip.de/waifu/result9.png
there is a clear need for you to clear her reputation by showing in a way of your choice that it was the picture and not herself being violated there.
and it only counts if you provide enough details and perhaps even 20 lines of code that can be used for validation.
and could you please provide the hash for sample A, and/or the image used for generating the hash sample C?
ph_dct_imagehash from pHash.cpp from pHash-0.9.4 is investigated.
you can find a copy to stare at here: http://tma.no-ip.de/pHash-0.9.4/src/pHash.cpp
just search for ph_dct_imagehash, the code should be easy enough to follow. mostly.
you might also want to play around with PHASH_DEBUG in hashimg.cpp to see the gory details.
the image is loaded with a low precision type CImg<uint8_t>.
all following computations are done on higher precision CImg<float>.
the spectrum == 4 handling seems broken, img is in an undefined state at the time, the width/height/depth should probably be taken from src.
the application of meanfilter (after the conversion to grayscale) seems to result mainly in a 7x7 blur and a shift in value range.
the scaling to 32x32 (after grayscale and meanfilter) is done with CImg default of "nearest-neighbor interpolation".
without much investigation, the result looks worse than the "moving average interpolation" used in the AHASH sample.
my lack of understanding of dct operation in general, and the cpp-template mess that is used in the specific implementation, results in me
not understanding what exactly is going on in the magic step even though i have been staring at it for a week now and
stepped through part of it in gdb.
the result in dctImage shows the expected concentration of high-absolute values in the top left section of the 32x32 grid.
from the 32x32 result of the magic step, a 8x8 subsection starting at (1,1) (while counting from 0) is taken.
this discards not just the high value at (0,0), but the whole row and column 0.
the 8x8 subsection is converted to a bit vector using the median (not mean!) value for filtering.
this results in the bits being mainly set depending on whether the output of the magic step in the position was positive or negative.
and a narrowed hash value range from the exactly-50%-bits-set coming with it.
it seems there is some potential for improvement.
only been playing around for it for some hours.
the following is just me thinking out loud, i will probably try some/all of this over the next weekend, and probably report back on it.
- try selecting from 0,0
- the blur filter has to go.
it seems to make phash perform quite horribly on scaled down versions of images.
replace blur + lowquality scaler by f.ex. the scaler used for the AHASH implementation.
- try different order of components.
like replacing the rectangular selection of 8x8 dct by the first 64 elements returned by the jpeg zig-zag pattern scaled to 32x32.
or keeping rectangular selection and going zigzag on the bitfolding.
- the mean vs median question.
i actually like the median since it eliminates the 0,0 problem in an elegant way.
when using mean, exclude 0,0 from mean calculation but not bitcounting.
- try whether scaling before or after grayscaling makes a difference.
- set up proper test cases for "scaling" and "finding in pile" tasks.
- try to find an answer to "how far does using median restrict the hashspace".
or "how many 64 bit values with exactly 32 bit set to 1 do exist".
any suggestions?
the image hash pointed out in comment #11 by Mr Schwartz has actually been my workhorse for image comparison since about 2005.
as the article referenced there contains a commented implementation i will spare you the details.
not that there are a lot, it is pretty straightforward.
the hash data is 48 byte of rgb data, the image scaled to 4x4.
the comparison metric is sum of absolute differences: abs(a[0]-b[0]) + abs(a[1]-b[1]) + ... + abs(a[47]-b[47])
(or the 48 dimensional manhattan distance.)
to use the words someone had on AHASH performance: the result is better than it has any right to be.
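for illustration only, my reading of that fingerprint and metric as a few lines of python (a sketch, not the original code; PIL assumed, and tobytes() may be tostring() in older PIL):

from PIL import Image

def rhash(filename):
    # 48-byte fingerprint: the image scaled to 4x4, kept as rgb.
    img = Image.open(filename).convert('RGB').resize((4, 4), Image.ANTIALIAS)
    return list(img.tobytes())          # 4 * 4 * 3 = 48 values

def rhash_distance(a, b):
    # sum of absolute differences, i.e. the 48-dimensional manhattan distance.
    return sum(abs(x - y) for x, y in zip(a, b))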
now, some tactical magic that can be used with AHASH too.
since there is a welldefined relationship between components of the hash and areas of the input image it is possible
to apply "masks" and "transformations" at comparison time.
"masks" meaning you can do the comparison with restrictions like "ignore the bottom 25% of the image, someone put text there".
with RHASH this goes down to a granularity of color channel or just ignoring some (f.ex. the lowest two) bits of some/all values.
masks need to be implemented in the "performance" codepath of the comparison, but as all involved data has to be at the cpu for
the following distance calculation anyways the performance lost to the additional and and/or or operation(s) is small.
"transformations" meaning you can do searches for rotations in 90degree steps, mirroring, inverts.
with RHASH you get additional options like color channel swaps.
transformations are simply applied to the searchedfor vector, then a regular search with or without mask(s) is performed.
this comes at the price of having six times (48b vs 8b) the memory footprint for the vectors compared with AHASH or PHASH.
on a 2007 vintage Athlon x2 3GHz i am seeing comparison rates of about 115M vectors per second for somewhat optimized
code working on vector data that is already in memory while returning the best 10 results.
the limit on this platform for a straight single-target search is mainly the memory bandwidth.
48byte * 115M is about 5.5GByte/s, well more than the 3GB/s the athlon with its ddr2 can actually handle.
did i mention optimizations?
on platforms with more memory bandwidth than possibly useful (like core-i5, 11-12GB/s) you get about 200-250Mcps for a single core of about 3GHz.
the optimizations there are somewhat different from those for membw-bound operation.
starting to use more cpu cores for vector comparison will put any known CPU into a membw-bound situation.
one way to put excess cpu under membw-bound conditions to use is batch searches.
you take more than one reference vector and put them as close to the cpu as possible, registers or l1 cacheline.
then iterating over the database vectors (the membw-intensive part) you compare each loaded dbvector to your
multiple refvectors.
this puts the comparison back to being cpu-bound. quickly.
but it also means you are getting 1-3 additional comparisons "for free" in the same wallclock time.
now multiply by cores.
besides actual batch searching for different image vectors, this can be combined with transformations, like querying batches
of a reference image and nine of its common transformations (rotations, mirrors) as a batch of ten at a limited performance
impact that shows mostly linear scaling with batchsize beyond the "for free" point of unused cpu.
Appreciate your comments. Thanks.
Regards,
Vilva
I used Wote's Python code and improved the hash function by applying a (Laplacian) filter to the 8x8 image.
Here is the code of the "avhash" function (most likely you will need to correct the whitespace):
-----cut----
import numpy
from PIL import Image
from scipy import signal

def avhash(im):
    im = Image.open(im)
    im = im.resize((8, 8), Image.ANTIALIAS).convert('L')
    in_data = numpy.array(im.getdata()).reshape(8, 8)
    filt = numpy.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])   # Laplacian kernel
    filt_data = signal.convolve2d(in_data, filt, mode='same', boundary='symm').flatten()
    result = reduce(lambda x, (y, z): x | (z << y),          # Python 2 syntax
                    enumerate(map(lambda i: 0 if i < 0 else 1, filt_data)),
                    0)
    return result
---cut---
I used this code to make a simple program which finds an image (from my pictures) based on the example file.
This is needed so I can make a backup copy of the metadata from Panoramio (GPS coordinates) and save it into the original file.
I've written an implementation of the first method into some PHP code that I'm currently testing and it appears to be working relatively well, however I have a few questions maybe you could help with.
- Would any modifications to image sharpness, contrast, etc be beneficial if applied to the images prior to hashing?
- How much more accurate would a DCT type hash be?
I'd like to set up code to perform a DCT hash but this would be done with PHP and although I can look up a DCT formula I can't exactly understand all of the math involved. Are there any resources you would recommend to someone who hasn't done this kind of math since college?
http://pastebin.com/Pj9d8jt5
Not very pretty (I don't really understand any of the maths), but it does appear to work!
Thanks for the great post.
E.
Probably I've found a little bug: When calculating the hash value after step 6, you are adding values only when x!=0 AND y!=0. If this is correct you could simply remove the if(x!=0&&y!=0) statement and start both loops at 1 instead of 0.
However, I think what you wanted to do is rather to add hash values only when x!=0 OR y!=0 because you wanted to exclude only the very first term as noted in step 5.
What you describe sounds right.
Assuming you are encoding it the same way I did, then the last four bits represent the highest frequencies along the vertical axis. Since you scaled the picture 50%x60%, that changes the vertical axis the most, and scaling smaller removes higher frequencies. In effect, you change the highest vertical frequencies and it appears in the pHash.
I'm actually a little surprised that it worked this well. Usually asymmetrical scaling changes a few more bits.
Thanks for your quick reply and explanation.
Will there be any algorithm which will help us to find similar images(even after asymmetrical scaling)?
This will help me a lot,
Thanks a lot.
"Will there be any algorithm which will help us to find similar images(even after asymmetrical scaling)?"
That's exactly what pHash does. The hash can change a little if there is scaling, minor editing, cropping, etc. You're not looking for an absolute match in the hash values. You looking for a "mostly similar" hash.
In your example, the difference is 4 bits. That's close enough to say they are similar. If not identical images, then they are probably based on the same image.
Make no mistake: your asymmetrical scaling is a modification to the image. Your pHash values identified that they are mostly the same picture, but there is a minor modification between them.
Seems to be related to double rounding differences as discussed in these bugs:
https://bugs.openjdk.java.net/browse/JDK-7131459
http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/bc1f16f5566f
We have started looking at modifying algorithm to tolerate the rounding differences but not understanding how it really works makes this a bit difficult for us.
Can anyone point us to the area of the code and type of change to make? I am thinking forcing a precision somewhere with Big Integers or such?
Thanks again,
K
However, I'm having trouble understanding why I need to hash the binary string to compute the Hamming distance. Can't I just find the Hamming distance on the binary string?
Also, I'm having trouble understanding how to construct the hash from the binary string. Can you point me to any examples or information.
Thanks again for the great tutorial!
As I mentioned, my code isn't exactly the same as pHash. Among other things, I don't bother with any pre-blur. (Then again, I didn't look at the pHash source code, so I didn't even know they did that. )
Applying a pre-blur, such as a Gaussian blur, would be just another step to reduce high frequencies. They could be doing it to speed up the file size reduction step. (Blur the picture, then use pixel sampling to shrink the picture. This could be faster than doing a pixel average to shrink the image, depending on how it is implemented.)
You might want to write to David Starkweather over at phash.org and ask him. (He's friendly and responsive and his answer would be authoritative.)
You are right. Wikipedia says the same thing:
Gaussian smoothing is commonly used with edge detection. Most edge-detection algorithms are sensitive to noise; the 2-D Laplacian filter, built from a discretization of the Laplace operator, is highly sensitive to noisy environments. Using a Gaussian Blur filter before edge detection aims to reduce the level of noise in the image, which improves the result of the following edge-detection algorithm. This approach is commonly referred to as Laplacian of Gaussian, or LoG filtering. With more smoothing, fewer edges are detected.
Thank you for the great article!
I need your help please. I am trying to implement the image hash in C. When I compare 2 near-identical test images, I get a hash of length 16 for the first one and length 11 for the second one. Something must be wrong with my implementation. I am almost sure something is wrong in the following code. Please check it line by line for me. Thanks!
double **dct[x][y] // i am here
// Compute average
double total = 0;
for (int x = 0; x < 8; x++) { // should be x = 1 ??
for (int y = 0; y < 8; y++) { // should be y = 1 ??
total += dct[x][y];
}
}
total -= fdct[0][0]; // Excluding the first term, correct??
double avg = total / (double)((8 * 8 ) - 1);
// Compute hash
uint64_t hash; // should be uint64_t hash = 0; ??
for (int x = 0; x < 0; x++) { // should be x = 1 ??
const float *row = (const float*)dct[x]; // is that correct??
for (int y = 0; y < 8; y++) { // should be y = 1 ??
if (row[y] > avg){ // and here correct??
hash |= 1;
}
hash <<= 1;
}
}
printf("HASH: %llx\n", hash);
return hash;
a) you should exclude the "first term" from the same matrix as the total was compiled from. currently you seem to be using different arrays. (dct for summing vs fdct for exclusion)
b) the hash should be initialized to 0, or you should add a "set individual bits to 0 where needed"-else.
c) the compute-hash for-x will not run at all since it will go all the way from 0 to -1, the "x < 0" should probably be a "x < 8". in its current form, the hash will not be computed at all due to this. it will simply print whatever happens to be in the memory hash got assigned to.
d) "hash |= 1" will set the same bit again and again and again and again. you need to shift the applied marker (the "1") or the hash after each bit (even if you didn't set a bit, or set it to 0!).
and some sidenotes while i am at it:
e) there is no real need to have the row pointer, you can access the value by [x][y] just as you did for summing them up.
f) C arrays tend to count from 0, so having the for loops go from 0 is the right thing.
You are partially correct. The hash may change, but it is dependent on the scaling algorithm you use to shrink the large picture to the smaller size. This is because scaling smaller can reduce high-frequency edges and could make the differences between low-frequency regions less distinct.
In the worst case, you might change a few bits (like 4 bits) in the 64-bit hash, but you won't change it enough to have a "significantly different" Hamming distance.
With the algorithm I use, I typically see 0-2 bits difference when scaling from a massively huge image to the 32x32 image for pHash, depending on the size that is being reduced. But since the algorithm compares based on Hamming distance and not absolute hash values, the difference isn't major. A "likely the same picture" match of 64 bits and 62 bits will both yield the same "likely the same" result.
With a lot of digital cameras, the thumbnail image may have sharper edges than if you take the large camera picture and shrink it to the thumbnail size. If I have a picture I am trying to match, I might get a 63-bit match on the big picture and a 60-bit match on the thumbnail -- both matches are close enough to determine "likely the same". You can actually see this same factor on TinEye -- the best matches are usually the same size in log(2) of each other.
Your algorithm is simple and it works well.
In the first simple algorithm, Average Hashing, you didn't use a DCT. May I know why? Also, the DCT-based algorithm is called pHash; is there a name for the Average Hashing algorithm?
Is there any reference paper about the Average Hash algorithm?
Thanks
I got some strange false positives when using the average hash, which I reduced by using the difference.
On the other hand, it's not as fond of your squeezed version of Ms. Hannigan:
./AverageHash.py Alyson_Hannigan_200512.jpg Alyson_Hannigan_200512-8x8.png
0xe7c7dbc2c6fcfae8 Alyson_Hannigan_200512.jpg
0xe7c7dbc2c4fcfae8 Alyson_Hannigan_200512-8x8.png
Similarity: 98.4375%
./DifferenceHash.py Alyson_Hannigan_200512.jpg Alyson_Hannigan_200512-8x8.png
0xcd4a9a53946d22d7 Alyson_Hannigan_200512.jpg
0xcf4ada53946f2ad7 Alyson_Hannigan_200512-8x8.png
Similarity: 93.75%
(Can be found at http://folk.uio.no/davidjo/programming.php .)
I had added some forwarders to the old page, but it seems it's all been deleted.
New page:
http://01101001.net/averagehash.php
http://01101001.net/AverageHash.py
http://01101001.net/Imghash.zip
http://01101001.net/differencehash.php
http://01101001.net/DifferenceHash.py
Does the position of a differing bit tell me anything about how the pictures differ?
Unfortunately, no. By the time the picture has been shrunk, frequency reduced, and summarized, the bits really don't tell you much about specific differences between the images beyond gross-descriptions of the low frequency patterns.
pHash program uses Hamming Distance as one of the ways to find if the images are similar. It uses a threshold hamming distance value of 21.
So, let's say I implement my own version of TinEye, pre-calculate the pHash of all the images in my data set, and store these hashes in a database (persistent or in-memory); then I will need to match the pHash of the query image against the pHash of all the images in my database.
A naive algorithm for this problem will take O(n), where n is the number of images in the database. The search time can increase with a large number of images. I tried looking at another possible solution, which is research done at Google, http://research.google.com/pubs/archive/33026.pdf. However, this technique works for smaller hamming distances, up to 10, and would not be effective in this case.
I am sure this problem must have been faced before too. So, can you please suggest whether there is an existing paper/solution to this problem for a large hamming distance like 21?
Regards,
Piyush
Did you solve it?
but I don't know how to count the number of tables, and I can't understand the implicit function in chapter 3.1.2; can you show me the explicit function? thanks
https://github.com/bitlyfied/js-image-similarity
I've also posted the code on Stack Overflow: http://stackoverflow.com/questions/14106984/how-to-calculate-discrete-cosine-transform-dct-in-php
function imgDTC($img){
// m1 = Matrix1, an associative array with pixel data from the image
// m2 = Matrix2, an associative array with DCT Frequencies
// x1, y1 = coordinates in matrix1
// x2, y2 = coordinates in matrix2
$m1 = array();
$m2 = array();
// iw = image width
// ih = image height
$iw = imagesx($img);
$ih = imagesy($img);
// populate matrix1
for ($x1=0; $x1<$iw; $x1++) {
for ($y1=0; $y1<$ih; $y1++) {
$m1[$x1][$y1] = imagecolorat($img, $x1, $y1) & 0xff;
}
}
// populate matrix2
// for each coordinate in matrix2
for ($x2=0;$x2<$iw;$x2++) {
for ($y2=0;$y2<$ih;$y2++) {
// for each coordinate in matrix1
$sum = 1;
for ($x1=0;$x1<$iw;$x1++) {
for ($y1=0;$y1<$ih;$y1++) {
$sum +=
cos(((2*$x1+1)/(2*$iw))*$x2*pi()) *
cos(((2*$y1+1)/(2*$ih))*$y2*pi()) *
$m1[$x1][$y1]
;
}
}
// apply coefficients
$sum *= .25;
if ($x2 == 0 || $y2 == 0) {
$sum *= 1/sqrt(2);
}
$m2[$x2][$y2] = $sum;
}
}
return $m2;
}
Though I suppose I can now that the stackoverflow thread is there
www.musicdsp.org/showone.php?id=18
Honestly, I've never heard of the Walsh-Hadamard Transform until you mentioned it. Is there a better description of what it does beyond the vague Wikipedia articles?
http://en.wikipedia.org/wiki/Fast_Walsh%E2%80%93Hadamard_transform
http://en.wikipedia.org/wiki/Walsh%E2%80%93Hadamard_transform
From what I gather, WHT is another signal decomposition function, similar to FFTs. The reason we use DCTs instead of FFTs has to do with the number of parameters. 8x8 values yields 64 DCT scalars but 128 FFT scalars (64 real + 64 imaginary). Since WHT would transform 8x8 real inputs into 64 real outputs, I don't see why it wouldn't work as a variant to the DCT approach.
This is my solution to overcome this:
Bitmap squeezedImage = new Bitmap(8, 8, PixelFormat.Format32bppRgb);
Graphics drawingArea = Graphics.FromImage(squeezedImage);
drawingArea.CompositingMode = CompositingMode.SourceCopy;
drawingArea.CompositingQuality = CompositingQuality.HighQuality;
drawingArea.InterpolationMode = InterpolationMode.HighQualityBilinear;
drawingArea.SmoothingMode = SmoothingMode.HighQuality;
using (ImageAttributes wrapMode = new ImageAttributes())
{
wrapMode.SetWrapMode(WrapMode.TileFlipXY);
drawingArea.DrawImage(theImage,new Rectangle(0,0,8,8), 0, 0, theImage.Width, theImage.Height, GraphicsUnit.Pixel, wrapMode);
}
The code could be optimized, but it works and the class is ready to use:
<?php
/* ImageDiff --------------------------------------------
Class to identify difference between two images.
Difference is identified in comparing the delta between average color.
Based on idea from:
http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html
call function Diff to get an Array with the results
Diff Description:
Array Diff( string $im1, string $im2 [, boolean $save[, integer $size]] ):
$im1: Path to first file
$im2: Path to second file
$save: Saves the comparison images in the same folder as source file 1;
details are returned back in the array.
$size: defines the resize image size which is used to compare.
default is 16 pixels, Higher = more Detail, but slower
Example:
$clIM = new ImageDiff;
$ret = $clIM->Diff("./images/image1.jpg","./images/image2.jpg", true);
print_r($ret);
*/
class ImageDiff{
public $dbg = false;
public function Diff($im1, $im2, $save=false, $size=16 ){
// $im1, $im2 = path to two images
// $size default 8, resize size for delta processing
// $save default. saves the resized images (), same target as src with prefix sized_
$dbg = $this->dbg;
// Create comparison image no. 1
$i1 = imagecreatefromjpeg($im1);
list($width1, $height1) = getimagesize($im1);
$id1 = imagecreatetruecolor($size, $size);
imagecopyresized($id1, $i1, 0, 0, 0, 0, $size,$size, $width1, $height1);
imagefilter($id1,IMG_FILTER_GRAYSCALE);
// Vergleichsbilder erzeugen Nr. 2
$i2 = imagecreatefromjpeg($im2);
list($width2, $height2) = getimagesize($im2);
$id2 = imagecreatetruecolor($size, $size);
imagecopyresized($id2, $i2, 0, 0, 0, 0, $size,$size, $width2, $height2);
imagefilter($id2,IMG_FILTER_GRAYSCALE);
// Nun Pixel
$i1_avg = $this->average($id1);
$i2_avg = $this->average($id2);
//Compute Hash
$i1_hash = $this->ComputeHash($id1,$i1_avg);
if($dbg) echo "Hash Bild 1 : ".implode($i1_hash)." <br>";
$i2_hash = $this->ComputeHash($id2,$i2_avg);
if($dbg) echo "Hash Bild 2 : ".implode($i2_hash)." <br>";
// Count Distance:
$fail = 0;
for($x=0;$x < count($i2_hash);$x++){
if($i2_hash[$x] != $i1_hash[$x]){
$fail++;
}
}
$delta = ($fail / ($size*$size))*100;
if($save){
// Create Delta image
$id3 = imagecreatetruecolor($size, $size);
imagecopy($id3, $id1,0,0,0,0,$size,$size);
$col =imagecolorallocate( $id3 , 0 , 255 , 0);
$x=$y=0;
$cnt=0;
for($y=0; $y < $size; $y++){
for($x=0; $x < $size; $x++){
if($i2_hash[$cnt] != $i1_hash[$cnt]){
imagesetpixel($id3, $x, $y,$col);
}
$cnt++;
}
}
$fndelta = pathinfo($im1, PATHINFO_DIRNAME)."/delta_".basename($im1);
imagejpeg($id3,$fndelta);
$ret["deltaimage"]= $fndelta;
// Save temporary images
$fn1 = pathinfo($im1, PATHINFO_DIRNAME)."/sized_".basename($im1);
$fn2 = pathinfo($im2, PATHINFO_DIRNAME)."/sized_".basename($im2);
imagejpeg($id1,$fn1);
imagejpeg($id2,$fn2);
$ret["tempimg1"] = $fn1;
$ret["tempimg2"] = $fn2;
}
$ret["delta"]= $delta;
return $ret;
}
private function ComputeHash($img, $avg = 100){
global $dbg;
$w = imagesx($img);
$h = imagesy($img);
$r = $g = $b = 0;
for($y = 0; $y < $h; $y++) {
for($x = 0; $x < $w; $x++) {
$rgb = imagecolorat($img, $x, $y);
$r = $rgb >> 16;
if($r <= $avg){
$hash [] ="1";
} else {
$hash [] ="0";
}
}
}
return $hash;
}
private function average($img) {
$w = imagesx($img);
$h = imagesy($img);
$r = $g = $b = 0;
for($y = 0; $y < $h; $y++) {
for($x = 0; $x < $w; $x++) {
$rgb = imagecolorat($img, $x, $y);
$r += $rgb >> 16;
//$g += $rgb >> 8 & 255;
//$b += $rgb & 255;
}
}
$pxls = $w * $h;
//$r = (round($r / $pxls));
//$g = (round($g / $pxls));
$r = (round($r / $pxls)); // Da wir graustufen verwenden reicht eine Farbe
//return $r ."," . $g .",". $b;
if($r < 10 )$r=0; // Wenn fast Schwarz, dann schwarz!
if($r > 245 )$r=255; // auch mit Weiß so verfahren.
return $r;
}
}
?>
I accidentally formatted a 2TB hard drive which had nearly 1TB of pictures/movies on it (self-made; some have EXIF info, but most don't).
After noticing my mistake, I made a dd image of the hard drive and tried to restore as much as possible with TestDisk.
Now I have nearly 4-5 million files with temporary names, and some files are there more than once.
In an attempt to sort out the doubles, I tried using md5/sha256/sha512, but came to the conclusion that this gives me too many false positives; the hashes are not unique enough.
So I stumbled across pHash and was wondering: could I use this tool to generate all the hashes and find the duplicates, or is pHash not the right tool to achieve this?
I don't care how long it would take to generate the hashes; I just don't want to wait a few weeks for it to finish only to find out that I used the wrong tool.
What do you think: could I use this tool for this kind of "work", or are there better tools out there that I should use for this particular problem?
Thanks in advance,
StoneCraX
Now I'm doing research on finding the most similar images, and I may use your method and post. Should I cite this web site, or cite one of your published papers on this topic?
Thanks in advance :-)
If you're writing about my blog post and findings, then cite this blog entry.
If you're writing about a specific algorithm (pHash or average hash), then cite http://phash.org/docs/pubs/thesis_zauner.pdf
I have to disagree with the claim that ordering the DCT bits is not important. Or rather, it is true when you compare just the same hashes, but if you want to find "close" matches the ordering is actually quite important (unless you have a secret but efficient Hamming-distance sorting algorithm).
Please see the blog entry here for reasoning and the results obtained by ordering the bits "smartly".
http://nekkidphpprogrammer.blogspot.fi/2014/01/not-all-bits-are-created-equal.html
Thank you for this article; it pointed me in the right direction (all of my blog entries so far mull over perceptual hashing and finding similar images).
I looked over your code. Why are you sorting the bits?!?
The order that you choose to encode does not matter: left-to-right, right-to-left, top-to-bottom, etc. In my example, I chose a raster pattern from left-to-right, top-to-bottom, and I encoded the bits in big-endian format (top left is MSB, lower right is LSB). However, you can choose to do a reverse raster, encode in little-endian, or even some weird hybrid.
However, in order to compare hashes they all have to be encoded *consistently*. If you encoded from left-to-right, then every hash must be encoded from left-to-right.
Sorting breaks the consistency. With sorting, every hash is now encoded in a pseudo-unique way and is not comparable.
As far as comparisons go... Computing the Hamming distance over a 64-bit pattern is extremely efficient. (Doing a for-loop across the 64-bits is very slow.) Even if the lowest frequency position is most significant for the match, it's still faster to compare them all first.
Your idea of weighting the bit positions is very interesting and your results suggest that they can be used to reduce the number of false-positives.
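To give a concrete idea of the comparison cost (this is just an illustrative sketch in PHP, not my production code): XOR the two hashes and count the differing bits one byte at a time with a small lookup table, rather than looping over all 64 bit positions.
// Build a 256-entry popcount table once.
$POPCOUNT = array(0);
for ($i = 1; $i < 256; $i++) {
    $POPCOUNT[$i] = ($i & 1) + $POPCOUNT[$i >> 1];
}
// Hamming distance between two 64-bit hashes: one XOR plus eight table lookups.
function hamming64($a, $b, array $table) {
    $x = $a ^ $b;                            // bits that differ
    $dist = 0;
    for ($i = 0; $i < 8; $i++) {
        $dist += $table[$x & 0xFF];          // count one byte at a time
        $x = ($x >> 8) & 0x00FFFFFFFFFFFFFF; // logical shift (mask off sign bits)
    }
    return $dist;
}
// Usage: $d = hamming64($hash1, $hash2, $POPCOUNT);  // result is 0..64
If the GMP extension is available, gmp_hamdist() does the same thing directly.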
Of course the bit order has to be consistent, but I still maintain (based on my experience) that the ordering of the bits is significant when trying to find matches in a large number of images (in my case currently well over 400k) where comparing each image pairwise is too costly.
I think we are talking from slightly different viewpoints, or I have missed something.
If you have an algorithm that efficiently finds ALL the images within defined Hamming-distance in the image set (say in a database), I fully agree that the ordering is insignificant. If we just compare two (or a few) images this surely is true.
In my scheme every hash has the bits in the same order, but the (constant) order of bits is based on the "weight" of that position in the DCT matrix. So the "importance" of each bit decreases when going from left to right in the hash (MSB -> LSB). E.g., in the basic 8x8 grid, the relatively insignificant bit in the top-right corner of the DCT grid is the eighth bit of the hash, but in my ordering it sits at bit position 50. If we sort the hashes, a change in that insignificant bit affects the sort order much less with my hash than with the traditional hash.
In my code example the sorting of the array (ksort) is just for ordering the bits so that building the hash would be easier (it doesn't really affect the order the bits are situated in the hash in the end, but allows me to use the simple "multiply by two and add 1 if over average"-logic) instead of OR-ing the nth bit.
In other words the $order array is just an index into the "traditional" hash bits to facilitate the transformation to the new ordering of hash bits.
After using this ordering I have gotten a much better hit ratio, so there just MUST be something to it.
Ah, I understand.
You are correct that some bits (the low frequency bits) are usually more important. However, that isn't always the case. (Try a picture from a video capture card -- the horizontal interlace lines and scan line can make a big difference in the low frequency horizontal patterns.)
It would be interesting to see if there is an optimal weighting factor for each bit position that would increase the accuracy.
So far I haven't found any missed duplicates, but given the huge number of images it's probable there are some, or many.
To be on the safe side I'd classify my findings like this: "When used with good-quality photographs this sorting order seems to form a better base for indexing similar images close to each other."
I would also argue that this ordering might be at least close to the optimal one (for photos) -- though the ordering of "same-weight" bits is just a 50-50 decision, which intuitively doesn't matter statistically, but maybe it does if the images are skewed somehow (like video with interlace and scan lines).
If there's a known pattern, one could of course move the bits around that frequency to a lower significance by modifying the significance matrix. So the matrix would be somehow connected to the problem space. Maybe one could try to locate the changes statistically when comparing the similar images, if they can be found with other means -- or generated for the test.
Ah... Bingo! I like that and I can totally understand how that would happen. This would be a great first-guess when looking at hundreds of millions of pictures.
It's very subtle (and I missed it at first), but this is an excellent observation. Thank you for the great idea.
It can be found at:
http://nekkidphpprogrammer.blogspot.fi/2014/01/the-better-dct-perceptual-hash-algorithm.html
After changing the 32x32 to 128x128 and constructing the hash with my bit-reorganizing method, it seems too perfect to be true. No false positives found, and quite a lot of new true positives pruned from the same image set.
http://nekkidphpprogrammer.blogspot.fi/2014/02/yes-it-is-perfect-now.html
I'll rest my case
The enhanced hash not only makes a better initial guess, but what is more valuable -- to me at least -- is that it facilitates finding (almost) all similar images in the image database (not just finding images similar to a given image "X").
With a simple SQL query grouped by the n leftmost hex characters, it lists sets of similar images efficiently. I have settled on a value of 8 (i.e., 32 bits out of 64) as the best with my dataset; just a few (maybe 1 percent) false matches, and it still finds almost all true matches. A value of 7 (the first 28 bits) still finds a few more matches, but the ratio of false matches also rises significantly.
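For example, something along these lines (just a sketch with hypothetical table and column names, assuming MySQL and the hash stored as 16 hex characters):
// Group images by the first 8 hex characters (32 bits) of their hash and
// list every group with more than one member.
$pdo = new PDO('mysql:host=localhost;dbname=photos', 'user', 'pass');
$sql = "SELECT LEFT(hash, 8) AS prefix,
               COUNT(*) AS cnt,
               GROUP_CONCAT(id) AS ids
        FROM images
        GROUP BY prefix
        HAVING cnt > 1
        ORDER BY cnt DESC";
foreach ($pdo->query($sql) as $row) {
    echo $row['prefix'].': '.$row['ids']."\n";  // each row is one set of similar images
}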
Sorry for bombing the blog, I think I now have said enough
Rather than reordering my hashes (requiring me to recompute everything -- which would be a few hours), I just put in a bitmask.
I'm storing values as a 64-bit value, as a raster from top-left to bottom-right. I use a C union structure that allows me to access the value as a uint64_t, uint8_t[8], or uint16_t[4].
typedef union
{
    int64_t  i64;
    uint64_t u64;
    int32_t  i32[2];
    uint32_t u32[2];
    int16_t  i16[4];
    uint16_t u16[4];
    uint8_t  u8[8];
} hash64;   /* requires <stdint.h> */
I tested the top 2x2 of the DCT with a simple bitmask:
if ((needle.u16[0] & 0xC0C0) != (haystack.u16[0] & 0xC0C0)) then it misses the top 2x2 and should fail to match.
The result?
(1) Minor speed improvement that will significantly scale larger. (Awesome!)
(2) None of the matches drop off, and all but one of the false matches drop off.
Increasing to a 3x3 begins to drop positive matches when a border or major edit is present. Even using 8 of the 9 positions in the 3x3 (excluding position [3,3]) drops a few positive matches.
Thanks Nekkid PHP Programmer. This is really a nice optimization.
Update
I spoke too soon.
I did a more thorough test. The optimization excludes matches where high-contrast edges are cropped out of the picture.
As an example:
http://fotoforensics.com/analysis.php?id=6a0b67c3a2cbdad866085f28cc9c5e317d811782.460679
Without the optimization, there are 270 matches -- all are correct matches with different variations of cropping and edits.
With the test for the same upper 2x2, it only finds 35 of the 270 matches. This is because most pictures either crop the space above the guy's head or crop out the high-contrast black thing to the right of the wine glass. Because of how the DCT works, these are significant changes to the upper 2x2 DCT coefficients but still yield visually similar content.
Or more succinctly, the 2x2 check will exclude matches like:
http://fotoforensics.com/analysis.php?id=c9c58579a172378b02e8dab14d812361a19a8542.964469
http://fotoforensics.com/analysis.php?id=64882bacdabe951738f624af71f97644b008a411.158020
Think of it this way: there are many different sets of DCT values that render into visually similar pictures. (Well, almost infinitely many if you use ints.) For example, I can either have a high value at [0,0], or I can have a low value at [0,0] and compensate with a different balance of other DCT values that results in an equivalent picture.
Great article. I'm looking at building an application that has an image search feature and this is a good insight into what I needed. Hashing is important to my application because it provides an extremely concise image representation which makes for a good index key. I've been trying to work around some of its limitations though and I'm struggling.
Images that have a large amount of whitespace around them (pictures of logos, scanned images, etc) often hash to 0 since even at low frequency the average color value is very high.
Another issue is that pHash is very sensitive to portions of images being scaled up or down. To continue with the example of logos, two pictures of the same logo, one where the logo consumes 50% of the image space and one where it consumes 80% may differ by a Hamming distance of around 30. Is this an inherent limitation of pHash?
Keep in mind that I'm not so concerned about the speed of computing these hashes. My plan is to precompute them and store them in a database, then use the entropy of the hash to generate a rough range of images to examine. So the issues I'm having are not so much speed but accuracy.
I tried converting the PHP code, but I missed something along the way and my code doesn't work.
Any help would be appreciated.
Regards
I'm wondering if the instructions here for pHash are correct; maybe someone can help. I've implemented pHash in PHP exactly according to the instructions here, but I'm getting really poor results. I have two images: the first is the original image and the second is just a resized version of the first. My implementation returns a similarity of around 75%, but the pHash demo gives me a Hamming distance of 0, which is way better. Is this description lacking something, or am I just stupid and can't implement it?
The perceptual hashes described in this blog entry are all scale-invariant. That means just scaling the picture bigger or smaller should not change the hash value. (In extreme scaling cases, the hashes may change a little but not a lot.)
The first step in the hash generation process is to reduce the image size. A big picture and a small picture should look the same after being reduced to a common size.
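For example, the reduction step can be as simple as this (a sketch using PHP's GD, just to illustrate; the function name is only for demonstration and it is not the exact code from any implementation above):
// Shrink any input to a common 32x32 grayscale image before computing the DCT/hash.
function reduce_image($path, $size = 32) {
    $src = imagecreatefromstring(file_get_contents($path));  // handles JPEG/PNG/GIF
    $small = imagecreatetruecolor($size, $size);
    imagecopyresampled($small, $src, 0, 0, 0, 0, $size, $size, imagesx($src), imagesy($src));
    imagefilter($small, IMG_FILTER_GRAYSCALE);
    imagedestroy($src);
    return $small;
}
A big picture and a small picture should produce essentially the same reduced image, which is why the resulting hashes should match.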
http://i.imgur.com/XfJMC8K.jpg
Note the vertical banding; this seems to be common with Instagram, and I've seen it nowhere else.
What I've done to try to find the source is undo what I think Instagram did: automatic colour correction in Photoshop, sharpening, and recently removing the banding through an FFT, but Google just won't find an original image.
pHash and aHash (and even dHash from the other blog entry) all scale the image smaller. The scaling removes subtle artifacts like those vertical lines.
I'm not sure how aHash would perform on the dog picture since aHash is very dependent on relative coloring. (Instagram filters alter relative coloring.) However, pHash and dHash work great on other pictures that have gone through various Instagram filters.
If anyone needs a Perl implementation, I have implemented this and made a CPAN module named Image::Hash that is available at http://search.cpan.org/~runarb/Image-Hash/lib/Image/Hash.pm .
http://web.archive.org/web/20110611015845/http://blog.tineye.com/2011/06/03/tineye-is-magic/
Thanks! I found the new link location at TinEye and I've replaced the broken link.
Thanks!
-Neal
Do we need to calculate a 2D DCT or a 1D DCT?
You said that JPEG uses an 8x8 DCT, while this algorithm uses a 32x32 DCT. I used the following equation to calculate the DCT:
\begin{displaymath}
F(u,v) = \left(\frac{2}{N}\right)^{\frac{1}{2}} \left(\frac{2}{M}\right)^{\frac{1}{2}} \Lambda(u)\,\Lambda(v) \sum_{i=0}^{N-1}\sum_{j=0}^{M-1} \cos\left[\frac{\pi u}{2N}(2i+1)\right] \cos\left[\frac{\pi v}{2M}(2j+1)\right] f(i,j)
\end{displaymath}
This is the link where I got the equation: http://www.cs.cf.ac.uk/Dave/Multimedia/node231.html. It says that 8-bit pixels have levels from 0 to 255, and therefore an 8-point DCT would be used, where
\begin{displaymath}
\Lambda(\xi) = \left\{ \begin{array}{ll} \frac{1}{\sqrt{2}} & {\rm for}\ \xi = 0 \\ 1 & {\rm otherwise} \end{array} \right.
\end{displaymath}
But you said to use 32x32. So what do I need to change in this equation? Do I need to change $\Lambda(\xi)$? Please help me. Thank you.
Can you tell me how to generate the hash in step 7? I mean, after doing step 7, I got a string of length 64 which contains 0's and 1's. How do I generate a hash like "8a0303f6df3ec8cd" from that bitstream? I am using PHP.
Can you also tell me how to check the similarity in a position-invariant way?
Thanks for a very good article. I am trying to implement the first, average algorithm you describe, and I was wondering: how do you handle different types of images, which have different internal ways of representing their data? For example, I am trying to implement this in Java, and there are multiple internal representations of a BufferedImage object (e.g. TYPE_INT_ARGB, TYPE_INT_ARGB_PRE, TYPE_3BYTE_BGR, etc.). Would it be a good idea to convert all the images to the same format as you load them, even though that can result in some image data loss? Or does the input type not matter, since you end up with a grayscale image in the end?
I am asking this because I tried the implementation on a PNG image and a JPG image (which use different internal type representations) and, even though they were basically the same image, I got a big difference in their signatures.
The actual input format does not matter since everything gets converted to grayscale.
The one exception is an alpha channel. For average hash, you can just exclude hidden pixels from the computation. For other algorithms, you'll need to decide on a background color. White, black, image-average... the color will make a huge difference in the hash output. But since these algorithms only care about "visually similar images", any color should work.
If you're making a hash database, then I'd suggest 3 hashes for alpha-channel images: white, black, and gray -- because you don't know what background may be rendered under the picture.
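As a rough illustration (a sketch only, using GD; the function name is just for demonstration), flattening an alpha-channel image onto a chosen background before hashing could look like this:
// Flatten a transparent PNG onto a solid background color, then hash the result.
function flatten_onto($pngPath, $r, $g, $b) {
    $src = imagecreatefrompng($pngPath);
    $w = imagesx($src);
    $h = imagesy($src);
    $flat = imagecreatetruecolor($w, $h);
    $bg = imagecolorallocate($flat, $r, $g, $b);
    imagefilledrectangle($flat, 0, 0, $w - 1, $h - 1, $bg);
    imagecopy($flat, $src, 0, 0, 0, 0, $w, $h);  // blends using the source alpha channel
    imagedestroy($src);
    return $flat;  // hash this image instead of the original
}
// Three hashes per alpha-channel image: white, black, and gray backgrounds.
$backgrounds = array(array(255,255,255), array(0,0,0), array(128,128,128));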
Thanks!
I have a slightly different need. I'm looking for the closest wallpapers from a picture taken by a user.
Wallpapers have cyclic patterns and thus are not "regular" images. Details are also important.
So I was thinking that I should not discard the high/low frequencies, but instead remove the "rare" ones.
But what if the reference images and the photos have different sizes? I mean, the user can be closer to or farther away from the wallpaper.
Has anyone any insight/recommendation on this?
Thanks, and congratulations on the great expertise already shared,
Michaël
I learned a lot from your article. Thank you very much! Can you tell me where you found this sentence: "the dct hash is based on the low 2D DCT coefficients starting at the second from lowest, leaving out the first DC term. This excludes completely flat image information (i.e. solid colors) from being included in the hash description."
Thanks again!
That was from a personal correspondence.
This blog entry used to have a joke about Alyson Hannigan being my next wife. (The Boss's next husband was supposed to be George Clooney, before she found a younger actor she liked.)
After 10 years, I've started to see some criticisms about it being more creepy than funny. It was not my intent to be creepy, so I've removed the quip.
Just a sec ago I was watching some unrelated video, and Alyson Hannigan was there - and I was like - wasn't there a joke about this woman in this article? And I came here - and it's gone. Oh man
I guess you can never make everyone happy, but I will thank you anyways for this article.
This is an 11-year-old blog entry. You might have better luck asking your question on Reddit.