Re: cleaner/better zlib sources?

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Saturday, March 17, 2007 - 10:55 am

On Fri, 16 Mar 2007, Nicolas Pitre wrote:

Actually, it's even better than that.

If we're walking a certain pathspec (which is reall ythe only thing that 
is expensive), we're pretty much *guaranteed* that we'll hit exactly this 
case. Doing some instrumentation on the test-case I've been using (which 
is just "git log drivers/usb/ > /dev/null") shows:

	[torvalds@woody linux]$ grep Needs delta-base-trace | wc -l
	469334
	[torvalds@woody linux]$ grep Needs delta-base-trace | sort -u | wc -l
	21933

where that delta-base-trace is just a trace of which delta bases were 
needed. Look how we currently generate almost half a million of them, but 
only 22000 are actually unique objects - we just generate many of them 
over and over again. In fact, the top delta bases with counts looks like:

    558 Needs 102398354
    556 Needs 161353360
    554 Needs 161354852
    552 Needs 161354916
    550 Needs 161354980
    526 Needs 161355044
    524 Needs 161355108
    522 Needs 161355174
    520 Needs 161355238
    508 Needs 161445724
    446 Needs 119712387
    425 Needs 133406737
    420 Needs 161513997
    387 Needs 120784913
    331 Needs 127094253
    321 Needs 95694853
    319 Needs 125888524
    303 Needs 155109487
    301 Needs 155627964
    299 Needs 155628028
    .....

ie the top twenty objects were all generated hundreds of times each.

More importantly, the trace also shows that it actually has very good 
locality too - exactly as you'd expect, since when we traverse the trees, 
we'd generally see a particular delta base used as a base when that thing 
is slowly changing, so of the half-million "needs" entries in my trace, if 
I pick the top delta_base (102398354), and use "cat -n" to give them all 
line numbers (from 1 to half a million), and grep for that particular 
delta:

	grep Needs delta-base-trace | cat -n | grep 102398354 | less -S

they are *all* at lines 61624..89352, with the bulk of them being very 
close together (the bulk of those are all around 88k line mark).

In other words, it's not "spread out" over time. It's very clustered, 
which I'd expect anyway, which means that even a simple cache of just a 
few hundred entries (statically sized) will be very effective.

So the cache doesn't need to be "complete". It will get good hit-rates 
even from being very simple. I think I have a very simple and cunning 
plan, I'll try it out asap.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 6:04 pm)
Re: cleaner/better zlib sources?, Shawn O. Pearce, (Thu Mar 15, 6:10 pm)
Re: cleaner/better zlib sources?, Jeff Garzik, (Thu Mar 15, 6:11 pm)
Re: cleaner/better zlib sources?, Matt Mackall, (Thu Mar 15, 6:14 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 6:33 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 6:46 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 6:54 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 7:06 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 7:43 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 7:56 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 8:16 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 9:21 am)
Re: cleaner/better zlib sources?, Davide Libenzi, (Fri Mar 16, 9:24 am)
Re: cleaner/better zlib sources?, Jeff Garzik, (Fri Mar 16, 9:35 am)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 9:35 am)
Re: cleaner/better zlib sources?, Matt Mackall, (Fri Mar 16, 9:42 am)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 9:51 am)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 10:06 am)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 10:12 am)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 10:51 am)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 11:09 am)
Re: cleaner/better zlib sources?, Davide Libenzi, (Fri Mar 16, 12:21 pm)
Re: cleaner/better zlib sources?, Shawn O. Pearce, (Fri Mar 16, 4:22 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 5:01 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 6:11 pm)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 8:28 pm)
Re: cleaner/better zlib sources?, Shawn O. Pearce, (Fri Mar 16, 10:19 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Sat Mar 17, 10:55 am)
Re: cleaner/better zlib sources?, Linus Torvalds, (Sat Mar 17, 12:40 pm)
[PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 12:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 2:45 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Junio C Hamano, (Sat Mar 17, 3:37 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 3:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 4:09 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Junio C Hamano, (Sat Mar 17, 4:12 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 4:24 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Jon Smirl, (Sat Mar 17, 4:52 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 4:54 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 6:13 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Morten Welinder, (Sat Mar 17, 6:14 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 6:29 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 6:38 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 6:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 6:55 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 7:03 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 7:20 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 8:00 pm)
[PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sat Mar 17, 8:06 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 8:31 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Julian Phillips, (Sat Mar 17, 10:30 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Avi Kivity, (Sat Mar 17, 11:28 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Junio C Hamano, (Sun Mar 18, 12:47 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Junio C Hamano, (Sun Mar 18, 2:45 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Robin Rosenberg, (Sun Mar 18, 3:53 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sun Mar 18, 8:54 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sun Mar 18, 8:57 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sun Mar 18, 10:23 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sun Mar 18, 10:34 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Robin Rosenberg, (Sun Mar 18, 11:29 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Shawn O. Pearce, (Sun Mar 18, 2:25 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Sun Mar 18, 2:38 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sun Mar 18, 2:48 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, David Brodsky, (Mon Mar 19, 6:16 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Johannes Schindelin, (Mon Mar 19, 8:05 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Junio C Hamano, (Mon Mar 19, 8:16 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 8:29 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 8:40 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Mon Mar 19, 9:11 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 9:18 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Mon Mar 19, 9:31 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 9:39 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Mon Mar 19, 9:45 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Mon Mar 19, 9:57 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Junio C Hamano, (Mon Mar 19, 10:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Robin Rosenberg, (Mon Mar 19, 11:35 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, David Brodsky, (Tue Mar 20, 2:13 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Tue Mar 20, 7:37 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Tue Mar 20, 7:54 pm)