Re: [PATCH] Use a hashtable for objects instead of a sorted list

Previous thread: We are pros in getting you laid by Rolland on Saturday, February 11, 2006 - 11:27 pm. (1 message)

Next thread: Configuration file musings by Mark Wooding on Sunday, February 12, 2006 - 6:45 am. (2 messages)
From: Junio C Hamano
Date: Sunday, February 12, 2006 - 5:11 am

Thanks.  Now we have three independent numbers to back up that
Johannes is the winner....

Grrrrrrr.  Please, DO NOT USE THIS ONE YET.

At least, not with your production repository.

I am trying to nail it down but it appears at least fsck-objects
using this version gives bogus results.  I am first trying to
see if my primary working repository is sane.

Oh, and thanks again for your initial patch, which was what
started this drastic improvement.

-

From: Johannes Schindelin
Date: Sunday, February 12, 2006 - 7:31 am

Hi,


I am sorry! I tested fsck, but only *once*, since I did not think such a 
creepy bug was in there. And then, I had to run to sing Beethoven's Missa 
Solemnis, and missed all the action about this patch.

Just a few remarks around the comments in this thread:

- the doubling of obj_allocs is arbitrary. Originally, I planned to do the 
growing much faster, which would have been helped by the fact. But it 
turned out my thinking was defective. So, you can grow the hashtable by 
whatever you like (doubling is quite effective, though).

- hashtable has expected O(1) insertion, and that is what boosts the 
performance. Since the table growing is linear in the number of objects 
(both size and computing time), and all operations afterwards are linear 
on the table, *and the hash is already computed*, the hashtable is 
preferrable over other data structures (sorted list has O(n) insertion 
time, and tree still O(log n)).

- the bug Junio fixed was not triggered here, since I did all the testing 
on my venerable iBook. The PowerPC architecture evidently aligns 
all pointers to 32-bit, so I could reinterpret the pointer as to an 
unsigned int. Note that there is a small overhead in Junio's version, but 
it is probably not worth the hassle to make that a compile time option. 
But I agree with Florian that memcpy would be more efficient.

- Arithmetic and Boolean operations on 32-bit integers typically are 
handled very efficiently in modern 32-bit CPUs, so there should be no 
reason to use "&" instead of "%" (especially since understanding the code 
wouldn't be helped by that).

Ciao,
Dscho


-

Previous thread: We are pros in getting you laid by Rolland on Saturday, February 11, 2006 - 11:27 pm. (1 message)

Next thread: Configuration file musings by Mark Wooding on Sunday, February 12, 2006 - 6:45 am. (2 messages)