login
Header Space

 
 

Re: I'm a total push-over..

Score:
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Linus Torvalds <torvalds@...>
Cc: Git Mailing List <git@...>, Junio C Hamano <gitster@...>
Date: Wednesday, January 23, 2008 - 4:32 am

Linus Torvalds wrote:


Insofar as hashes go, it's not that shabby for hashing filenames.
Here's the test output from a small hash-comparison program I've
got, which runs the test-input through a series of different hashes
to compare dispersion, collisions, lookup- and insert times and
other things that are interesting from a practical PoV.

Fowler/Noll/Vo cheap hash
Collisions: 1829, 7.91%. Depth max: 3, average: 0.09.
Time spent in lookups: 167.859ms. Per entry: 0.145us
Time spent inserting: 2.753ms. Per entry: 0.119us

PHP Zend cheap hash
Collisions: 1908, 8.25%. Depth max: 3, average: 0.09.
Time spent in lookups: 171.819ms. Per entry: 0.149us
Time spent inserting: 2.778ms. Per entry: 0.120us

Phong Vo's linear congruential hash
Collisions: 1996, 8.63%. Depth max: 3, average: 0.09.
Time spent in lookups: 168.276ms. Per entry: 0.146us
Time spent inserting: 2.840ms. Per entry: 0.123us

Phong Vo's second linear congruential hash
Collisions: 1933, 8.36%. Depth max: 3, average: 0.09.
Time spent in lookups: 170.416ms. Per entry: 0.147us
Time spent inserting: 2.774ms. Per entry: 0.120us

Glib string hash
Collisions: 1907, 8.24%. Depth max: 3, average: 0.09.
Time spent in lookups: 192.420ms. Per entry: 0.166us
Time spent inserting: 3.154ms. Per entry: 0.136us

sdbm hash
Collisions: 1899, 8.21%. Depth max: 3, average: 0.09.
Time spent in lookups: 170.797ms. Per entry: 0.148us
Time spent inserting: 2.724ms. Per entry: 0.118us

bfd hash
Collisions: 1949, 8.43%. Depth max: 3, average: 0.09.
Time spent in lookups: 206.504ms. Per entry: 0.179us
Time spent inserting: 3.241ms. Per entry: 0.140us

Linus' GIT hash
Collisions: 1946, 8.41%. Depth max: 4, average: 0.09.
Time spent in lookups: 179.336ms. Per entry: 0.155us
Time spent inserting: 2.781ms. Per entry: 0.120us

This is with 64K buckets, which (with my implementation) means
a total hash-table size of 256KiB. The test-input is a simple
file-listing from the linux kernel (although it has the .git
directory included too).

As you can see, it loses out on mathematical correctness, as it
has more collisions (but not that many). OTOH, the simplicity
of the implementation makes it a viable option anyway, since
the time spent in insertion (which is almost completely inside
the hash-function) is on average so much shorter.

For a temporary hash-table, it will work splendidly. It will
probably have issues scaling to, say, 30 or 40 million input
lines (fairly typical test-data for database hashes, fe).

Note that real-world timings will be shorter. My test-program
doesn't have the luxury of keeping hash-functions inline, so
some compiler optimizations become impossible.

This is on a Intel(R) Core(TM)2 Duo CPU T7700  @ 2.40GHz
(according to /proc/cpuinfo), with a cache size of 4meg. The
use of multiplication is sane though, as it means no arch
will suffer greatly.

The FNV hash would be better (pasted below), but I doubt
anyone will ever care, and there will be larger differences
between architectures with this one than the lt_git hash (well,
a function's gotta have a name).

/*
 * Fowler/Noll/Vo hash
 *
 * The basis of the hash algorithm was taken from an idea sent by email to the
 * IEEE Posix P1003.2 mailing list from Phong Vo (kpv@research.att.com) and
 * Glenn Fowler (gsf@research.att.com).  Landon Curt Noll (chongo@toad.com)
 * later improved on their algorithm.
 *
 * The magic is in the interesting relationship between the special prime
 * 16777619 (2^24 + 403) and 2^32 and 2^8.
 *
 * This hash produces the fewest collisions of any function that we've seen so
 * far, and works well on both numbers and strings.
 *
 * (Last comment from MySQL code)
 *
 */
u32 FNV1(u8 *k, u32 len)
{
	u8 *e;
	u32 h;

	e = k + len;
	for (h = 0; k < e; k++) {
		h *= 16777619;
		h ^= *k;
	}

	return (h);
}


I could provide figures for other table-sizes too, if anyone's
interested.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
I'm a total push-over.., Linus Torvalds, (Tue Jan 22, 7:37 pm)
Re: I'm a total push-over.., Andreas Ericsson, (Wed Jan 23, 4:32 am)
Re: I'm a total push-over.., Linus Torvalds, (Wed Jan 23, 12:06 pm)
Re: I'm a total push-over.., Dmitry Potapov, (Wed Jan 23, 5:15 am)
Re: I'm a total push-over.., Andreas Ericsson, (Wed Jan 23, 5:31 am)
Re: I'm a total push-over.., Dmitry Potapov, (Wed Jan 23, 1:10 pm)
Re: I'm a total push-over.., Andreas Ericsson, (Thu Jan 24, 6:39 am)
Re: I'm a total push-over.., Marko Kreen, (Wed Jan 23, 10:01 am)
Re: I'm a total push-over.., Andreas Ericsson, (Wed Jan 23, 10:39 am)
Re: I'm a total push-over.., Marko Kreen, (Thu Jan 24, 9:19 am)
Re: I'm a total push-over.., Andreas Ericsson, (Thu Jan 24, 12:00 pm)
Re: I'm a total push-over.., Marko Kreen, (Fri Jan 25, 4:08 pm)
Re: I'm a total push-over.., Dmitry Potapov, (Thu Jan 24, 12:28 pm)
Re: I'm a total push-over.., Linus Torvalds, (Thu Jan 24, 1:15 pm)
Re: I'm a total push-over.., Marko Kreen, (Fri Jan 25, 4:52 pm)
Re: I'm a total push-over.., Linus Torvalds, (Fri Jan 25, 6:16 pm)
Re: I'm a total push-over.., Marko Kreen, (Sat Jan 26, 8:37 am)
Re: I'm a total push-over.., Marko Kreen, (Sat Jan 26, 8:16 am)
Re: I'm a total push-over.., Linus Torvalds, (Sun Jan 27, 2:51 am)
Re: I'm a total push-over.., Marko Kreen, (Sun Jan 27, 5:45 am)
Re: I'm a total push-over.., Dmitry Potapov, (Sun Jan 27, 11:06 am)
Re: I'm a total push-over.., Dmitry Potapov, (Sun Jan 27, 4:21 am)
Re: I'm a total push-over.., Johannes Schindelin, (Sun Jan 27, 10:07 am)
Re: I'm a total push-over.., Dmitry Potapov, (Sun Jan 27, 10:48 am)
Re: I'm a total push-over.., Linus Torvalds, (Fri Jan 25, 6:35 pm)
Re: I'm a total push-over.., Dmitry Potapov, (Thu Jan 24, 2:45 pm)
Re: I'm a total push-over.., Linus Torvalds, (Thu Jan 24, 3:08 pm)
Re: I'm a total push-over.., Marko Kreen, (Thu Jan 24, 12:13 pm)
Re: I'm a total push-over.., Luke Lu, (Thu Jan 24, 2:51 am)
Re: I'm a total push-over.., Andreas Ericsson, (Thu Jan 24, 6:24 am)
Re: I'm a total push-over.., Junio C Hamano, (Tue Jan 22, 10:23 pm)
Re: I'm a total push-over.., Linus Torvalds, (Tue Jan 22, 10:58 pm)
Re: I'm a total push-over.., Junio C Hamano, (Wed Jan 23, 3:23 am)
Re: I'm a total push-over.., Johannes Schindelin, (Wed Jan 23, 8:25 am)
Re: I'm a total push-over.., Linus Torvalds, (Wed Jan 23, 12:25 pm)
Re: I'm a total push-over.., Johannes Schindelin, (Wed Jan 23, 12:34 pm)
Re: I'm a total push-over.., Linus Torvalds, (Wed Jan 23, 1:09 pm)
Re: I'm a total push-over.., Linus Torvalds, (Wed Jan 23, 1:29 pm)
Re: I'm a total push-over.., Jeremy Maitin-Shepard, (Fri Jan 25, 1:21 am)
Re: I'm a total push-over.., Johannes Schindelin, (Fri Jan 25, 8:51 am)
Re: I'm a total push-over.., Jeremy Maitin-Shepard, (Fri Jan 25, 2:19 pm)
Re: I'm a total push-over.., Junio C Hamano, (Fri Jan 25, 3:07 pm)
Re: I'm a total push-over.., Johannes Schindelin, (Fri Jan 25, 2:24 pm)
Re: I'm a total push-over.., Linus Torvalds, (Tue Jan 22, 11:19 pm)
Re: I'm a total push-over.., Junio C Hamano, (Fri Jan 25, 2:50 am)
Re: I'm a total push-over.., Linus Torvalds, (Fri Jan 25, 12:24 pm)
Re: I'm a total push-over.., Junio C Hamano, (Tue Jan 22, 10:36 pm)
Re: I'm a total push-over.., Johannes Schindelin, (Wed Jan 23, 8:24 am)
Re: I'm a total push-over.., David Kastrup, (Wed Jan 23, 8:28 am)
Re: I'm a total push-over.., Theodore Tso, (Wed Jan 23, 8:56 am)
Re: I'm a total push-over.., Kevin Ballard, (Tue Jan 22, 9:35 pm)
speck-geostationary