Re: cleaner/better zlib sources?

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Linus Torvalds <torvalds@...>
Cc: Davide Libenzi <davidel@...>, Jeff Garzik <jeff@...>, Git Mailing List <git@...>, <mpm@...>, <bcrl@...>
Date: Friday, March 16, 2007 - 1:06 pm

On Fri, 16 Mar 2007, Linus Torvalds wrote:


This is why in pack v4 there will be an alternate tree object 
representation which is not deflated at all.

In short we intend to have 3 tables where common things are factored 
out:

 1) the path component string table (deflated)

 2) author/committer string table (deflated)

 3) sorted SHA1 table (obviously not deflated)

The sorted SHA1 table will be part of the pack instead of being in the 
pack index.  The idea is that most SHA1's are already duplicated in the 
pack already anyway within commit and tree objects.  With a single table 
then commit and tree objects can index into that SHA1 table rather than 
providing the SHA1 value inline for the objects they refer to.

This means that a tree object record would be only 6 bytes according to 
the current design: 2 bytes to index into the path component string 
table (which also include the mode information), and 4 bytes to index 
into the sorted SHA1 table.  And similarly for commit objects.

This means that the pack index will only have a table of offsets 
corresponding to the table of sorted SHA1's.

So... walking revisions will become only a matter of picking the first 
commit object, using the tree index value (which is not deflated), but 
instead of using it in the SHA1 table it could be used in the offset 
table to find the location of the corresponding tree object directly.  
Same goes for tree entries, or for locating the parent's commit object.

No deflating, no binary searching, no SHA1 comparisons.  Plain straight 
pointer dereference.

Then, if you want to filter tree walking on path spec, you only need to 
locate the path component in the path table once and use the 
corresponding index to filter tree entries instead of repeated strcmp().  
Same thing if you want to filter commits based on author/committer.  
One side effect of this is that you can tell straight away that a path 
doesn't exist in the whole pack if one of its components cannot be found 
in the table (that works only if no legacy tree representations are 
present of course).  That should make history walking blazingly fast.

The only thing that gets deflated is the commit message which needs to 
be inflated only when displaying it.

And so far that makes for quite smaller packs too!


Nicolas
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 9:04 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 9:33 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 10:06 pm)
Re: cleaner/better zlib sources?, Jeff Garzik, (Thu Mar 15, 9:11 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 9:46 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 9:54 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 10:43 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Thu Mar 15, 10:56 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Thu Mar 15, 11:16 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 12:21 pm)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 1:06 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 1:51 pm)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 2:09 pm)
Re: cleaner/better zlib sources?, Jeff Garzik, (Fri Mar 16, 12:35 pm)
Re: cleaner/better zlib sources?, Shawn O. Pearce, (Fri Mar 16, 7:22 pm)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 1:12 pm)
Re: cleaner/better zlib sources?, Matt Mackall, (Fri Mar 16, 12:42 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 12:51 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Fri Mar 16, 12:24 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 12:35 pm)
Re: cleaner/better zlib sources?, Davide Libenzi, (Fri Mar 16, 3:21 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 8:01 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Fri Mar 16, 9:11 pm)
Re: cleaner/better zlib sources?, Nicolas Pitre, (Fri Mar 16, 11:28 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Sat Mar 17, 1:55 pm)
Re: cleaner/better zlib sources?, Linus Torvalds, (Sat Mar 17, 3:40 pm)
[PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 3:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 6:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 5:45 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Morten Welinder, (Sat Mar 17, 9:14 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Avi Kivity, (Sun Mar 18, 2:28 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 9:29 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 9:44 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 9:38 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 9:55 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 10:03 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 10:20 pm)
[PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sat Mar 17, 11:06 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Junio C Hamano, (Sun Mar 18, 5:45 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sun Mar 18, 11:54 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Junio C Hamano, (Mon Mar 19, 11:16 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Tue Mar 20, 12:31 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Tue Mar 20, 12:39 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Tue Mar 20, 12:57 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Johannes Schindelin, (Mon Mar 19, 11:05 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 11:29 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Mon Mar 19, 11:40 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Tue Mar 20, 12:11 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Junio C Hamano, (Tue Mar 20, 1:44 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Tue Mar 20, 12:18 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Tue Mar 20, 12:45 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sun Mar 18, 11:57 am)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Shawn O. Pearce, (Sun Mar 18, 5:38 pm)
Re: [PATCH 3/2] Avoid unnecessary strlen() calls, Linus Torvalds, (Sun Mar 18, 5:48 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 11:00 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 11:31 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Robin Rosenberg, (Sun Mar 18, 6:53 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sun Mar 18, 1:34 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Robin Rosenberg, (Sun Mar 18, 2:29 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, David Brodsky, (Mon Mar 19, 9:16 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Robin Rosenberg, (Tue Mar 20, 2:35 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, David Brodsky, (Tue Mar 20, 5:13 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Tue Mar 20, 10:37 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Tue Mar 20, 10:54 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Shawn O. Pearce, (Sun Mar 18, 5:25 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Julian Phillips, (Sun Mar 18, 1:30 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sun Mar 18, 1:23 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Junio C Hamano, (Sat Mar 17, 7:12 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 7:24 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Jon Smirl, (Sat Mar 17, 7:52 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Junio C Hamano, (Sat Mar 17, 6:37 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Nicolas Pitre, (Sat Mar 17, 9:13 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Junio C Hamano, (Sun Mar 18, 3:47 am)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 7:09 pm)
Re: [PATCH 2/2] Implement a simple delta_base cache, Linus Torvalds, (Sat Mar 17, 7:54 pm)
Re: cleaner/better zlib sources?, Shawn O. Pearce, (Sat Mar 17, 1:19 am)
Re: cleaner/better zlib sources?, Matt Mackall, (Thu Mar 15, 9:14 pm)
Re: cleaner/better zlib sources?, Shawn O. Pearce, (Thu Mar 15, 9:10 pm)