Re: pack operation is thrashing my server

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Nicolas Pitre <nico@...>
Cc: Shawn O. Pearce <spearce@...>, Geert Bosch <bosch@...>, Andi Kleen <andi@...>, Ken Pratt <ken@...>, <git@...>
Date: Thursday, August 14, 2008 - 1:58 pm

On Thu, 14 Aug 2008, Linus Torvalds wrote:

Btw, it's not that hard to run oprofile (link git statically to get better 
numbers). For me, the answer to what is going on for a kernel rev-list is 
pretty straightforward:

	263742   26.6009  lookup_object
	135945   13.7113  inflate
	110525   11.1475  inflate_fast
	75124     7.5770  inflate_table
	64676     6.5232  strlen
	48635     4.9053  memcpy
	47744     4.8154  find_pack_entry_one
	35265     3.5568  _int_malloc
	31579     3.1850  decode_tree_entry
	28388     2.8632  adler32
	19441     1.9608  process_tree
	10398     1.0487  patch_delta
	8925      0.9002  _int_free
	..

so most of it is in inflate, but I suspect the cost of "lookup_object()" 
is so high becuase when we parse the trees we also have to look up every 
blob - even if they didn't change - just to see whether we already saw it 
or not.

For me, an instruction-level profile of lookup_object() shows that the 
cost is all in the hashcmp (53% of the profile is on that "repz cmpsb") 
and in the loading of the object pointer (26% of the profile is on the 
test instruction after the "obj_hash[i]" load). I don't think we can 
really improve that code much - the hash table is very efficient, and the 
cost is just in the fact that we have a lot of meory accesses.

We could try to use the (more memory-hungry) "hash.c" implementation for 
object hashing, which actually includes a 32-bit key inside the hash 
table, but while that will avoid the cost of fetching the object pointer 
for the cases where we have collisions, most of the time the cost is not 
in the collision, but in the fact that we _hit_.

I bet the hit percentage is 90+%, and the cost really is just that we 
encounter the same object hundreds or thousands of times.

Please realize that even if there may be "only" a million objects in the 
kernel, there are *MANY* more ways to _reach_ those objects, and that is 
what git-rev-list --objects does! It's not O(number-of-objects), it's 
O(number-of-object-linkages).

For my current kernel archive, for example, the number of objects is 
roughly 900k. However, think about how many times we'll actually reach a 
blob: that's roughly (blobs per commit)*(number of commits), which can be 
approximated with

	echo $(( $(git ls-files | wc -l) * $(git rev-list --all | wc -l) ))

which is 24324*108518=2639591832 ie about 2.5 _billion_ times.

Now, we don't actually do anything close to that many lookups, because 
when a subdirectory doesn't change at all, we'll skip the whole tree after 
having seen it just once, so that will cut down on the number of objects 
we have to look up by probably a couple of orders of magnitude.

But this is why the "one large directory" load performs worse: in the 
worst case, if you really have a totally flat directory tree, you'd 
literally see that 2.5 billion object lookup case.

So it's not that git scales badly. It's that "git rev-list --objects" is 
really a very expensive operation, and while some good practices (deep 
directory structures) makes it able to optimize the load away a lot, it's 
still potentially very tough.

			Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
pack operation is thrashing my server, Ken Pratt, (Sun Aug 10, 3:47 pm)
Re: pack operation is thrashing my server, Jakub Narebski, (Wed Aug 13, 8:43 am)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Sun Aug 10, 11:04 pm)
Re: pack operation is thrashing my server, Ken Pratt, (Mon Aug 11, 3:43 am)
Re: pack operation is thrashing my server, Andi Kleen, (Mon Aug 11, 3:10 pm)
Re: pack operation is thrashing my server, Geert Bosch, (Tue Aug 12, 11:12 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Wed Aug 13, 10:35 am)
Re: pack operation is thrashing my server, Geert Bosch, (Wed Aug 13, 12:01 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Wed Aug 13, 1:26 pm)
Re: pack operation is thrashing my server, Dana How, (Wed Aug 13, 1:13 pm)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Wed Aug 13, 10:59 am)
Re: pack operation is thrashing my server, Nicolas Pitre, (Wed Aug 13, 11:43 am)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Wed Aug 13, 11:50 am)
Re: pack operation is thrashing my server, Nicolas Pitre, (Wed Aug 13, 1:04 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Thu Aug 14, 1:21 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Thu Aug 14, 2:38 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Thu Aug 14, 2:55 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Thu Aug 14, 1:58 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Thu Aug 14, 3:04 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Thu Aug 14, 3:44 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Thu Aug 14, 5:50 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Thu Aug 14, 7:14 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Fri Aug 15, 8:34 pm)
Re: pack operation is thrashing my server, Junio C Hamano, (Sat Sep 6, 9:03 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Sat Sep 6, 9:46 pm)
Re: pack operation is thrashing my server, Mike Hommey, (Sun Sep 7, 3:45 am)
Re: pack operation is thrashing my server, Jon Smirl, (Sat Sep 6, 10:50 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Sat Sep 6, 11:07 pm)
Re: pack operation is thrashing my server, Andreas Ericsson, (Sun Sep 7, 4:18 am)
Re: pack operation is thrashing my server, Jon Smirl, (Sat Sep 6, 11:43 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Sun Sep 7, 12:50 am)
Re: pack operation is thrashing my server, Jon Smirl, (Sun Sep 7, 9:58 am)
Re: pack operation is thrashing my server, Nicolas Pitre, (Sun Sep 7, 1:08 pm)
Re: pack operation is thrashing my server, Jon Smirl, (Sun Sep 7, 4:33 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Mon Sep 8, 10:17 am)
Re: pack operation is thrashing my server, Jon Smirl, (Mon Sep 8, 11:12 am)
Re: pack operation is thrashing my server, Jon Smirl, (Mon Sep 8, 12:01 pm)
Re: pack operation is thrashing my server, Junio C Hamano, (Sat Sep 6, 10:33 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Sun Sep 7, 1:11 pm)
Re: pack operation is thrashing my server, Junio C Hamano, (Sun Sep 7, 1:41 pm)
Re: pack operation is thrashing my server, Björn, (Thu Aug 14, 7:39 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Thu Aug 14, 8:06 pm)
Re: pack operation is thrashing my server, Björn, (Sat Aug 16, 8:47 am)
Re: pack operation is thrashing my server, Linus Torvalds, (Thu Aug 14, 8:25 pm)
Re: pack operation is thrashing my server, Andi Kleen, (Thu Aug 14, 5:30 pm)
Re: pack operation is thrashing my server, Linus Torvalds, (Fri Aug 15, 12:15 pm)
Re: pack operation is thrashing my server, Andreas Ericsson, (Thu Aug 14, 2:33 am)
Re: pack operation is thrashing my server, Nicolas Pitre, (Thu Aug 14, 10:01 am)
Re: pack operation is thrashing my server, Thomas Rast, (Thu Aug 14, 6:04 am)
Re: pack operation is thrashing my server, Andreas Ericsson, (Thu Aug 14, 6:15 am)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Thu Aug 14, 6:33 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Thu Aug 14, 9:46 pm)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Wed Aug 13, 1:19 pm)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Tue Aug 12, 11:15 pm)
Re: pack operation is thrashing my server, Geert Bosch, (Tue Aug 12, 11:58 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Wed Aug 13, 10:37 am)
Re: pack operation is thrashing my server, Jakub Narebski, (Wed Aug 13, 10:56 am)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Wed Aug 13, 11:04 am)
Re: pack operation is thrashing my server, Johan Herland, (Wed Aug 13, 12:10 pm)
Re: pack operation is thrashing my server, Ken Pratt, (Wed Aug 13, 1:38 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Wed Aug 13, 1:57 pm)
Re: pack operation is thrashing my server, David Tweed, (Wed Aug 13, 11:26 am)
Re: pack operation is thrashing my server, Martin Langhoff, (Wed Aug 13, 7:54 pm)
Re: pack operation is thrashing my server, David Tweed, (Thu Aug 14, 5:04 am)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Mon Aug 11, 3:22 pm)
Re: pack operation is thrashing my server, Ken Pratt, (Mon Aug 11, 3:29 pm)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Mon Aug 11, 3:34 pm)
Re: pack operation is thrashing my server, Andi Kleen, (Mon Aug 11, 4:10 pm)
Re: pack operation is thrashing my server, Ken Pratt, (Mon Aug 11, 3:15 pm)
Re: pack operation is thrashing my server, Nicolas Pitre, (Tue Aug 12, 10:38 pm)
Re: pack operation is thrashing my server, Andi Kleen, (Tue Aug 12, 10:50 pm)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Tue Aug 12, 10:57 pm)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Mon Aug 11, 11:01 am)
Re: pack operation is thrashing my server, Ken Pratt, (Mon Aug 11, 3:13 pm)
Re: pack operation is thrashing my server, Avery Pennarun, (Mon Aug 11, 11:40 am)
Re: pack operation is thrashing my server, Shawn O. Pearce, (Mon Aug 11, 11:59 am)
Re: pack operation is thrashing my server, Martin Langhoff, (Sun Aug 10, 7:06 pm)
Re: pack operation is thrashing my server, Ken Pratt, (Sun Aug 10, 7:12 pm)
Re: pack operation is thrashing my server, Martin Langhoff, (Sun Aug 10, 7:30 pm)
Re: pack operation is thrashing my server, Ken Pratt, (Sun Aug 10, 7:34 pm)