> Marko Kreen wrote:
>> On 1/23/08, Andreas Ericsson <ae@op5.se> wrote:
>>> Dmitry Potapov wrote:
>>>> On Wed, Jan 23, 2008 at 09:32:54AM +0100, Andreas Ericsson wrote:
>>>>> The FNV hash would be better (pasted below), but I doubt
>>>>> anyone will ever care, and there will be larger differences
>>>>> between architectures with this one than the lt_git hash (well,
>>>>> a function's gotta have a name).
>>>> Actually, Bob Jenkins' lookup3 hash is twice faster in my tests
>>>> than FNV, and also it is much less likely to have any collision.
>>>>
>>> >From
http://burtleburtle.net/bob/hash/doobs.html
>>> ---
>>> FNV Hash
>>>
>>> I need to fill this in. Search the web for FNV hash. It's faster
>>> than my hash on Intel (because Intel has fast multiplication),
>>> but slower on most other platforms. Preliminary tests suggested
>>> it has decent distributions.
>> I suspect that this paragraph was about comparison with lookup2
>
>
> It might be. It's from the link Dmitry posted in his reply to my
> original
> message. (something/something/doobs.html).
>
>> (not lookup3) because lookup3 beat easily all the "simple" hashes
>
> By how much? FNV beat Linus' hash by 0.01 microseconds / insertion,
> and 0.1 microsecons / lookup. We're talking about a case here where
> there will never be more lookups than insertions (unless I'm much
> mistaken).
>
>> If you don't mind few percent speed penalty compared to Jenkings
>> own optimized version, you can use my simplified version:
>>
http://repo.or.cz/w/pgbouncer.git?a=blob;f=src/
>> hash.c;h=5c9a73639ad098c296c0be562c34573189f3e083;hb=HEAD
>
> I don't, but I don't care that deeply either. On the one hand,
> it would be nifty to have an excellent hash-function in git.
> On the other hand, it would look stupid with something that's
> quite clearly over-kill.
>
>> It works always with "native" endianess, unlike Jenkins fixed-endian
>> hashlittle() / hashbig(). It may or may not matter if you plan
>> to write values on disk.
>> Speed-wise it may be 10-30% slower worst case (in my case sparc-
>> classic
>> with unaligned data), but on x86, lucky gcc version and maybe
>> also memcpy() hack seen in system.h, it tends to be ~10% faster,
>> especially as it does always 4byte read in main loop.
>
> It would have to be a significant improvement in wall-clock time
> on a test-case of hashing 30k strings to warrant going from 6 to 80
> lines of code, imo. I still believe the original dumb hash Linus
> wrote is "good enough".
>
> On a side-note, it was very interesting reading, and I shall have
> to add jenkins3_mkreen() to my test-suite (although the "keep
> copyright note" license thing bugs me a bit).