Johannes Schindelin wrote:
> Hi,
>
> On Wed, 13 Dec 2006, Andreas Ericsson wrote:
>
>> Junio C Hamano wrote:
>>> "Bahadir Balban" <bahadir.balban@gmail.com> writes:
>>>
>>> There is one thing we could further optimize, though.
>>>
>>> Switching branches with 100k blobs in a commit, even when only
>>> a handful of paths differ between the branches, would still
>>> need to populate the index by reading two trees and collapsing
>>> them into a single stage. In theory, we should be able to do a
>>> lot better if the two-tree case of read-tree took advantage of
>>> cache-tree information. If ce_match_stat() says OK for all
>>> paths in a subdirectory and the cached tree object name for that
>>> subdirectory in the index matches what we are reading from the new
>>> tree, we should be able to skip reading that subdirectory (and
>>> its subdirectories) from the new tree object entirely.
>>>
>>> Anybody interested in giving it a try?
>>>
>> I'm not well-versed enough in git internals to have high hopes of
>> making something useful of it, but if you give me a pointer to where to
>> start I'd be happy to try, and perhaps learn something in the process.
>
> Okay, I'll have a stab at explaining it.
>
> For huge working directories, you usually have a huge number of trees. The
> idea of cache_tree is to remember not only the stat information of the
> blobs in the index, but also the hashes of the trees (until they
> are invalidated, e.g. by an update-index). This avoids recalculating
> the hashes when committing.
>
> This cache is accessible through the global variable active_cache_tree. It is
> best accessed by the function cache_tree_find(), which you call like this:
>
> struct cache_tree *ct = cache_tree_find(active_cache_tree, path);
>
> where the variable "path" may contain slashes. The SHA1 of the
> corresponding tree is in ct->sha1, and you can check if the hash is still
> valid by asking
>
> if (cache_tree_fully_valid(ct)) {
> /* still valid */
> }
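To make the lookup and the validity check concrete, here is a minimal, self-contained C model of them. It is only a sketch, not git's code: struct toy_tree, toy_find() and toy_fully_valid() are hypothetical stand-ins for struct cache_tree, cache_tree_find() and cache_tree_fully_valid(). It assumes that an invalidated node is marked with a negative entry count and that a node only counts as "fully valid" when every subtree below it is valid too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified cache-tree node (NOT git's struct cache_tree). */
struct toy_tree {
	const char *name;        /* path component; "" for the root */
	int entry_count;         /* assumed convention: < 0 means "invalidated" */
	const char *sha1;        /* hex name of the tree object (illustrative) */
	int nr_sub;              /* number of child directories */
	struct toy_tree **sub;   /* child directories */
};

/* Walk a slash-separated path such as "a/b" down from the root node. */
static struct toy_tree *toy_find(struct toy_tree *t, const char *path)
{
	assert(path != NULL);
	while (t && *path) {
		const char *slash = strchr(path, '/');
		size_t len = slash ? (size_t)(slash - path) : strlen(path);
		struct toy_tree *next = NULL;

		/* look for a child whose name equals this path component */
		for (int i = 0; i < t->nr_sub; i++) {
			if (strlen(t->sub[i]->name) == len &&
			    !strncmp(t->sub[i]->name, path, len)) {
				next = t->sub[i];
				break;
			}
		}
		t = next;                          /* NULL if not found */
		path = slash ? slash + 1 : path + len;
	}
	return t;
}

/* Valid only if this node and all of its subtrees are still valid. */
static int toy_fully_valid(const struct toy_tree *t)
{
	if (!t || t->entry_count < 0)
		return 0;
	for (int i = 0; i < t->nr_sub; i++)
		if (!toy_fully_valid(t->sub[i]))
			return 0;
	return 1;
}
```

For example, with a root containing the directory a/b, toy_find(root, "a/b") returns the b node, and once b is marked invalid (say, after an update-index of a file under a/b) both a and the root report that they are no longer fully valid.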
>
> AFAIU Junio would like to take the shortcut of doing nothing at all when
> (twoway) reading a tree whose hash is identical to the hash stored in the
> corresponding cache_tree _and_ when the cache is still fully valid.
>
Seems you wrote half the code for me already. :)
Thanks for the excellent explanation. I'll see if I can grok it further
tonight.
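The shortcut discussed above might look roughly like this in a toy model. Again, this is only a sketch under assumptions: toy_tree, toy_child(), toy_fully_valid() and trees_to_read() are hypothetical, the code merely counts how many tree objects a two-way read would have to open instead of actually populating an index, and it checks only the two conditions Johannes names (identical tree hash, cache still fully valid), not the per-path ce_match_stat() check Junio mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node used for both sides: the index's cache-tree
 * (entry_count < 0 means "invalidated") and the new tree being read
 * (where entry_count is simply unused). */
struct toy_tree {
	const char *name;
	int entry_count;
	const char *sha1;
	int nr_sub;
	struct toy_tree **sub;
};

/* Find the immediate child directory called "name", if any. */
static struct toy_tree *toy_child(const struct toy_tree *t, const char *name)
{
	if (!t)
		return NULL;
	for (int i = 0; i < t->nr_sub; i++)
		if (!strcmp(t->sub[i]->name, name))
			return t->sub[i];
	return NULL;
}

/* Valid only if this node and all of its subtrees are still valid. */
static int toy_fully_valid(const struct toy_tree *t)
{
	if (!t || t->entry_count < 0)
		return 0;
	for (int i = 0; i < t->nr_sub; i++)
		if (!toy_fully_valid(t->sub[i]))
			return 0;
	return 1;
}

/*
 * Count the tree objects of "want" that must actually be read.
 * If the cached tree for the same directory is still fully valid and
 * names the same tree object, its index entries can be kept as they
 * are, so the whole subtree is skipped.
 */
static int trees_to_read(const struct toy_tree *cached,
			 const struct toy_tree *want)
{
	assert(want != NULL);
	if (toy_fully_valid(cached) && !strcmp(cached->sha1, want->sha1))
		return 0;

	int n = 1;  /* this tree object has to be opened */
	for (int i = 0; i < want->nr_sub; i++)
		n += trees_to_read(toy_child(cached, want->sub[i]->name),
				   want->sub[i]);
	return n;
}
```

For an index whose lib/ subtree matches the new tree while doc/ differs, only the root and doc/ would be read; once lib/ has been invalidated, it has to be read as well.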
--
Andreas Ericsson
andreas.ericsson@op5.se
OP5 AB
www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231