Re: I'm a total push-over..

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Wednesday, January 23, 2008 - 9:06 am

On Wed, 23 Jan 2008, Andreas Ericsson wrote:

Hashing filenames is pretty easy. You can do a reasonable job with any 
"multiply by an odd number, add in value". Picking the odd number is a 
random choice, some are better than others (and it depends on whether you 
end up always having a power-of-two bucket size etc), but it generally 
won't be horrible.

And considering that we generally won't have tons and tons of pathnames 
(ie we'd generally have thousands, not millions), and that the underlying 
hash not only resizes itself but actually uses the ful 32 bits as the 
lookup key, I don't worry too much. I suspect my random choice is fine.

So no, I didn't think my hash would _suck_, although I also didn't 
research which odd numbers to pick.

No, it's a joke because it doesn't really give an example of how to do the 
_expected_ hash collissions.

Here's another hash that is actually going to collide *much* more (and on 
purpose!), but is actually showing an example of something that actually 
hashes UTF strings so that upper-case and lower-case (and normalization) 
will all still hash to the same value:

	static unsigned int hash_name(const char *name, int namelen)
	{
	        unsigned int hash = 0x123;

	        do {
	                unsigned char c = *name++;
			if (c & 0x80)
				c = 0;
			c &= ~0x20;
	                hash = hash*101 + c;
	        } while (--namelen);
	        return hash;
	}

but the above does so by making the hash much much worse (although 
probably still acceptable for "normal source code name distributions" 
that don't have very many same-name-in-different-cases and high bit 
characters anyway).

The above is still fairly fast, but obviously at a serious cost in hash 
goodness, to the point of being totally unusable for anybody who uses 
Asian characters in their filenames.

To actually be really useful, you'd have to teach it about the character 
system and do a lookup into a case/normalization table.

So *that* is mainly why it's a joke. But it should work fine for the case 
it is used for now (exact match).

Picking a better-researched constant might still be a good idea.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
I'm a total push-over.., Linus Torvalds, (Tue Jan 22, 4:37 pm)
Re: I'm a total push-over.., Kevin Ballard, (Tue Jan 22, 6:35 pm)
Re: I'm a total push-over.., Junio C Hamano, (Tue Jan 22, 7:23 pm)
Re: I'm a total push-over.., Junio C Hamano, (Tue Jan 22, 7:36 pm)
Re: I'm a total push-over.., Linus Torvalds, (Tue Jan 22, 7:58 pm)
Re: I'm a total push-over.., Linus Torvalds, (Tue Jan 22, 8:19 pm)
Re: I'm a total push-over.., Junio C Hamano, (Wed Jan 23, 12:23 am)
Re: I'm a total push-over.., Andreas Ericsson, (Wed Jan 23, 1:32 am)
Re: I'm a total push-over.., Dmitry Potapov, (Wed Jan 23, 2:15 am)
Re: I'm a total push-over.., Andreas Ericsson, (Wed Jan 23, 2:31 am)
Re: I'm a total push-over.., Johannes Schindelin, (Wed Jan 23, 5:24 am)
Re: I'm a total push-over.., Johannes Schindelin, (Wed Jan 23, 5:25 am)
Re: I'm a total push-over.., David Kastrup, (Wed Jan 23, 5:28 am)
Re: I'm a total push-over.., Theodore Tso, (Wed Jan 23, 5:56 am)
Re: I'm a total push-over.., Marko Kreen, (Wed Jan 23, 7:01 am)
Re: I'm a total push-over.., Andreas Ericsson, (Wed Jan 23, 7:39 am)
Re: I'm a total push-over.., Linus Torvalds, (Wed Jan 23, 9:06 am)
Re: I'm a total push-over.., Linus Torvalds, (Wed Jan 23, 9:25 am)
Re: I'm a total push-over.., Johannes Schindelin, (Wed Jan 23, 9:34 am)
Re: I'm a total push-over.., Linus Torvalds, (Wed Jan 23, 10:09 am)
Re: I'm a total push-over.., Dmitry Potapov, (Wed Jan 23, 10:10 am)
Re: I'm a total push-over.., Linus Torvalds, (Wed Jan 23, 10:29 am)
Re: I'm a total push-over.., Luke Lu, (Wed Jan 23, 11:51 pm)
Re: I'm a total push-over.., Andreas Ericsson, (Thu Jan 24, 3:24 am)
Re: I'm a total push-over.., Andreas Ericsson, (Thu Jan 24, 3:39 am)
Re: I'm a total push-over.., Marko Kreen, (Thu Jan 24, 6:19 am)
Re: I'm a total push-over.., Andreas Ericsson, (Thu Jan 24, 9:00 am)
Re: I'm a total push-over.., Marko Kreen, (Thu Jan 24, 9:13 am)
Re: I'm a total push-over.., Dmitry Potapov, (Thu Jan 24, 9:28 am)
Re: I'm a total push-over.., Linus Torvalds, (Thu Jan 24, 10:15 am)
Re: I'm a total push-over.., Dmitry Potapov, (Thu Jan 24, 11:45 am)
Re: I'm a total push-over.., Linus Torvalds, (Thu Jan 24, 12:08 pm)
Re: I'm a total push-over.., Jeremy Maitin-Shepard, (Thu Jan 24, 10:21 pm)
Re: I'm a total push-over.., Junio C Hamano, (Thu Jan 24, 11:50 pm)
Re: I'm a total push-over.., Johannes Schindelin, (Fri Jan 25, 5:51 am)
Re: I'm a total push-over.., Linus Torvalds, (Fri Jan 25, 9:24 am)
Re: I'm a total push-over.., Jeremy Maitin-Shepard, (Fri Jan 25, 11:19 am)
Re: I'm a total push-over.., Johannes Schindelin, (Fri Jan 25, 11:24 am)
Re: I'm a total push-over.., Junio C Hamano, (Fri Jan 25, 12:07 pm)
Re: I'm a total push-over.., Marko Kreen, (Fri Jan 25, 1:08 pm)
Re: I'm a total push-over.., Marko Kreen, (Fri Jan 25, 1:52 pm)
Re: I'm a total push-over.., Linus Torvalds, (Fri Jan 25, 3:16 pm)
Re: I'm a total push-over.., Linus Torvalds, (Fri Jan 25, 3:35 pm)
Re: I'm a total push-over.., Marko Kreen, (Sat Jan 26, 5:16 am)
Re: I'm a total push-over.., Marko Kreen, (Sat Jan 26, 5:37 am)
Re: I'm a total push-over.., Linus Torvalds, (Sat Jan 26, 11:51 pm)
Re: I'm a total push-over.., Dmitry Potapov, (Sun Jan 27, 1:21 am)
Re: I'm a total push-over.., Marko Kreen, (Sun Jan 27, 2:45 am)
Re: I'm a total push-over.., Johannes Schindelin, (Sun Jan 27, 7:07 am)
Re: I'm a total push-over.., Dmitry Potapov, (Sun Jan 27, 7:48 am)
Re: I'm a total push-over.., Dmitry Potapov, (Sun Jan 27, 8:06 am)