> On Jan 23, 2008, at 6:39 AM, Andreas Ericsson wrote:
>> Marko Kreen wrote:
>>> On 1/23/08, Andreas Ericsson <ae@op5.se> wrote:
>>>> Dmitry Potapov wrote:
>>>>> On Wed, Jan 23, 2008 at 09:32:54AM +0100, Andreas Ericsson wrote:
>>>>>> The FNV hash would be better (pasted below), but I doubt
>>>>>> anyone will ever care, and there will be larger differences
>>>>>> between architectures with this one than the lt_git hash (well,
>>>>>> a function's gotta have a name).
>>>>> Actually, Bob Jenkins' lookup3 hash is twice faster in my tests
>>>>> than FNV, and also it is much less likely to have any collision.
>>>>>
>>>> >From
http://burtleburtle.net/bob/hash/doobs.html
>>>> ---
>>>> FNV Hash
>>>>
>>>> I need to fill this in. Search the web for FNV hash. It's faster
>>>> than my hash on Intel (because Intel has fast multiplication), but
>>>> slower on most other platforms. Preliminary tests suggested it has
>>>> decent distributions.
>>> I suspect that this paragraph was about comparison with lookup2
>>
>>
>> It might be. It's from the link Dmitry posted in his reply to my original
>> message. (something/something/doobs.html).
>>
>>> (not lookup3) because lookup3 beat easily all the "simple" hashes
>>
>> By how much? FNV beat Linus' hash by 0.01 microseconds / insertion,
>> and 0.1 microsecons / lookup. We're talking about a case here where
>> there will never be more lookups than insertions (unless I'm much
>> mistaken).
>>
>>> If you don't mind few percent speed penalty compared to Jenkings
>>> own optimized version, you can use my simplified version:
>>>
>>>
http://repo.or.cz/w/pgbouncer.git?a=blob;f=src/hash.c;h=5c9a73639ad098c296c0be562c3457...
>>>
>>
>> I don't, but I don't care that deeply either. On the one hand,
>> it would be nifty to have an excellent hash-function in git.
>> On the other hand, it would look stupid with something that's
>> quite clearly over-kill.
>>
>>> It works always with "native" endianess, unlike Jenkins fixed-endian
>>> hashlittle() / hashbig(). It may or may not matter if you plan
>>> to write values on disk.
>>> Speed-wise it may be 10-30% slower worst case (in my case sparc-classic
>>> with unaligned data), but on x86, lucky gcc version and maybe
>>> also memcpy() hack seen in system.h, it tends to be ~10% faster,
>>> especially as it does always 4byte read in main loop.
>>
>> It would have to be a significant improvement in wall-clock time
>> on a test-case of hashing 30k strings to warrant going from 6 to 80
>> lines of code, imo. I still believe the original dumb hash Linus
>> wrote is "good enough".
>>
>> On a side-note, it was very interesting reading, and I shall have
>> to add jenkins3_mkreen() to my test-suite (although the "keep
>> copyright note" license thing bugs me a bit).
>
> Would you, for completeness' sake, please add Tcl and STL hashes to your
> test suite?