Re: cleaner/better zlib sources?

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Friday, March 16, 2007 - 6:11 pm

On Fri, 16 Mar 2007, Linus Torvalds wrote:

Damn. I think I know why it's happening, and I'm an idiot. I think it's 
actually an issue I wondered about a *loong* time ago, and then forgot all 
about. And later or Nico made it almost impossible to fix with his "pack 
offset" changes.

The thing that made me realize was one of the callchains into inflate() 
that I looked at:

   (gdb) where
   #0  inflate (strm=0x7fff10d83810, flush=4) at inflate.c:566
   #1  0x000000000044c165 in unpack_compressed_entry (p=0x6d52e0, w_curs=0x7fff10d838e0, curpos=94941911,
       size=<value optimized out>) at sha1_file.c:1348
   #2  0x000000000044c2b6 in unpack_entry (p=0x6d52e0, obj_offset=94941909, type=0x7fff10d85d8c, sizep=0x7fff10d83928)
       at sha1_file.c:1408
   #3  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94942707, type=0x7fff10d85d8c, sizep=0x7fff10d83988)
       at sha1_file.c:1373
   #4  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943021, type=0x7fff10d85d8c, sizep=0x7fff10d839e8)
       at sha1_file.c:1373
   #5  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943382, type=0x7fff10d85d8c, sizep=0x7fff10d83a48)
       at sha1_file.c:1373
   #6  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943531, type=0x7fff10d85d8c, sizep=0x7fff10d83aa8)
       at sha1_file.c:1373
   #7  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943622, type=0x7fff10d85d8c, sizep=0x7fff10d83b08)
       at sha1_file.c:1373
   #8  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94945357, type=0x7fff10d85d8c, sizep=0x7fff10d83b68)
       at sha1_file.c:1373
   #9  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94945447, type=0x7fff10d85d8c, sizep=0x7fff10d83bc8)
       at sha1_file.c:1373
   #10 0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94945571, type=0x7fff10d85d8c, sizep=0x7fff10d85d80)
       at sha1_file.c:1373
   #11 0x000000000044c3f8 in read_packed_sha1 (sha1=<value optimized out>, type=0x7fff10d85d8c, size=0x7fff10d85d80)
       at sha1_file.c:1567
   #12 0x000000000044c741 in read_sha1_file (sha1=0x7fff10d85d60 "�Ab\217��236���031", type=0x7fff10d85d8c,
       size=0x7fff10d85d80) at sha1_file.c:1636
   ....

and notice the deep recursion in sha1_file.

The way we unpack delta chains is that we do

 - find a delta
 - we apply it to "recursively unpack the thing it's a delta to"

which sounds totally obvious and straightforward, right?

EXCEPT it's actually O(n**2) in the delta depth, because we never save the 
intermediate results, so when we have a delta depth of 10 (our default), 
and we decode a lot of these things, we basically will look up the base 
object 10 times, apply the first delta 9 times, apply the second delta 8 
times, etc etc.. 

I didn't worry about it, because it never actually hit as much of a
performance problem (and when you do a *single* tree operation you'd
never see it anyway: you apply the deltas you need, and nothing else),
but what it means is that we actually call inflate on the chain entries
55 times instead of just doing it 10 times. 

It's also somewhat limited by the delta depth that we enforce anyway (I 
say "somewhat", because we only limit the maximum depth, not the number of 
times an object can be used as a base, and if you use an object as a base 
a thousand times, it will literally be unpacked a thousand times too!

I also didn't worry about it, because I felt that if it became a problem, 
it would be easy to just add a cache of base objects (we probably do *not* 
want to keep the whole unpacked object info in memory all the time just 
because of memory pressure issues, so "cache of base objects" is better). 
However, the "pack file + offset" thing makes it harder to do, since we 
now don't even have the SHA1 of the base object before we unpack it.

But I guess we could just index this by a <packfile, offset> tuple.

Anyway, I bet that this is a much bigger issue than the pack format 
itself (and is largely independent).

		Linus
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 6:04 pm)
Re: cleaner/better zlib sources?, Shawn O. Pearce, (Thu Mar 15, 6:10 pm)
Re: cleaner/better zlib sources?, Jeff Garzik, (Thu Mar 15, 6:11 pm)
Re: cleaner/better zlib sources?, Matt Mackall, (Thu Mar 15, 6:14 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 6:33 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 6:46 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 6:54 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 7:06 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 7:43 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 7:56 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 8:16 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 9:21 am)
Re: cleaner/better zlib sources?, Davide Libenzi, (Fri Mar 16, 9:24 am)
Re: cleaner/better zlib sources?, Jeff Garzik, (Fri Mar 16, 9:35 am)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 9:35 am)
Re: cleaner/better zlib sources?, Matt Mackall, (Fri Mar 16, 9:42 am)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 9:51 am)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 10:06 am)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 10:12 am)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 10:51 am)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 11:09 am)
Re: cleaner/better zlib sources?, Davide Libenzi, (Fri Mar 16, 12:21 pm)
Re: cleaner/better zlib sources?, Shawn O. Pearce, (Fri Mar 16, 4:22 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 5:01 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 6:11 pm)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 8:28 pm)
Re: cleaner/better zlib sources?, Shawn O. Pearce, (Fri Mar 16, 10:19 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Sat Mar 17, 10:55 am)
Re: cleaner/better zlib sources?, Linus Torvalds, (Sat Mar 17, 12:40 pm)
[PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 12:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 2:45 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Junio C Hamano, (Sat Mar 17, 3:37 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 3:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 4:09 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Junio C Hamano, (Sat Mar 17, 4:12 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 4:24 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Jon Smirl, (Sat Mar 17, 4:52 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 4:54 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 6:13 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Morten Welinder, (Sat Mar 17, 6:14 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 6:29 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 6:38 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 6:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 6:55 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 7:03 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 7:20 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 8:00 pm)
[PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sat Mar 17, 8:06 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 8:31 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Julian Phillips, (Sat Mar 17, 10:30 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Avi Kivity, (Sat Mar 17, 11:28 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Junio C Hamano, (Sun Mar 18, 12:47 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Junio C Hamano, (Sun Mar 18, 2:45 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Robin Rosenberg, (Sun Mar 18, 3:53 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sun Mar 18, 8:54 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sun Mar 18, 8:57 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sun Mar 18, 10:23 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sun Mar 18, 10:34 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Robin Rosenberg, (Sun Mar 18, 11:29 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Shawn O. Pearce, (Sun Mar 18, 2:25 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Sun Mar 18, 2:38 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sun Mar 18, 2:48 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, David Brodsky, (Mon Mar 19, 6:16 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Johannes Schindelin, (Mon Mar 19, 8:05 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Junio C Hamano, (Mon Mar 19, 8:16 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 8:29 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 8:40 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Mon Mar 19, 9:11 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 9:18 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Mon Mar 19, 9:31 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 9:39 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Mon Mar 19, 9:45 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Mon Mar 19, 9:57 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Junio C Hamano, (Mon Mar 19, 10:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Robin Rosenberg, (Mon Mar 19, 11:35 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, David Brodsky, (Tue Mar 20, 2:13 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Tue Mar 20, 7:37 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Tue Mar 20, 7:54 pm)