Re: [PATCH 5/6] Teach "fsck" not to follow subproject links

To: Sam Vilain <sam@...>
Cc: Git Mailing List <git@...>, Junio C Hamano <junkio@...>
Date: Wednesday, April 11, 2007 - 7:16 pm

On Thu, 12 Apr 2007, Sam Vilain wrote:

I think we'll eventually want that *regardless* of how the object handling 
is done (a kind of "cross-submodule boundary check"), but I think that's 
actually outside of the scope of the current fsck.

The current fsck goes to great lengths to make sure that the internal 
consistency of a repository is good. That's also why it takes so long, and 
why it is such an expensive operation to do (notably when you do a 
"--full" check).

In contrast, the "cross-submodule boundary check" is a much cheaper 
operation, *if* you have already verified that the projects are internally 
consistent. It literally boils down to doing a very simplified commit 
chain walker that only parses tree objects and simply spits out the 
SHA1s of the subproject commits (and their location in the tree), and then 
a separate phase that just verifies those against the submodules.
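The first phase can be sketched in C. This is a minimal, hypothetical illustration, not git's object code: `struct tree`, `struct commit`, `walk_commits()` and the other names are made up for the example, history is assumed to be a simple single-parent chain, and the trees are assumed to be parsed already. The only value borrowed from git is the gitlink mode, 0160000.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TREE_MODE    0040000  /* directory entry */
#define GITLINK_MODE 0160000  /* subproject commit entry ("gitlink") */
#define MAX_GITLINKS 64
#define MAX_PATH_LEN 256

/* Hypothetical in-memory model of already-parsed objects. */
struct tree;

struct tree_entry {
    unsigned mode;
    const char *name;
    const char *sha1;            /* hex object name */
    const struct tree *subtree;  /* only used when mode == TREE_MODE */
};

struct tree {
    size_t nr;
    const struct tree_entry *entries;
};

struct commit {
    const struct tree *tree;
    const struct commit *parent; /* simplified: single-parent history */
};

struct gitlink {
    char path[MAX_PATH_LEN];
    const char *sha1;
};

struct gitlink_list {
    size_t nr;
    struct gitlink items[MAX_GITLINKS];
};

/* Record (path, sha1) once, even if many commits contain the same gitlink. */
static void add_gitlink(struct gitlink_list *out, const char *path, const char *sha1)
{
    for (size_t i = 0; i < out->nr; i++)
        if (!strcmp(out->items[i].path, path) && !strcmp(out->items[i].sha1, sha1))
            return;
    assert(out->nr < MAX_GITLINKS);
    snprintf(out->items[out->nr].path, MAX_PATH_LEN, "%s", path);
    out->items[out->nr].sha1 = sha1;
    out->nr++;
}

/* Parse only trees: recurse into directories, emit gitlinks, ignore blobs. */
static void collect_gitlinks(const struct tree *t, const char *prefix,
                             struct gitlink_list *out)
{
    for (size_t i = 0; i < t->nr; i++) {
        const struct tree_entry *e = &t->entries[i];
        char path[MAX_PATH_LEN];

        snprintf(path, sizeof(path), "%s%s", prefix, e->name);
        if (e->mode == GITLINK_MODE) {
            add_gitlink(out, path, e->sha1);
        } else if (e->mode == TREE_MODE && e->subtree) {
            char subprefix[MAX_PATH_LEN];
            snprintf(subprefix, sizeof(subprefix), "%s/", path);
            collect_gitlinks(e->subtree, subprefix, out);
        }
    }
}

/* The "very simplified commit chain walker": visit each commit's tree. */
static void walk_commits(const struct commit *tip, struct gitlink_list *out)
{
    for (const struct commit *c = tip; c; c = c->parent)
        collect_gitlinks(c->tree, "", out);
}
```

A real walker would also remember which tree objects it has already visited, so that subtrees that did not change between commits are not parsed again; that is left out to keep the sketch short.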

And that separate phase - once you've done the fsck for all the 
*individual* repositories - is truly trivial. It's literally just a matter 
of "is that SHA1 a valid commit object". That's *cheap*.
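The second phase really is just a lookup. In this sketch the submodule's object database is assumed to be a sorted in-memory array searched with `bsearch()`; `struct object_db`, `enum kind`, `is_valid_commit()` and `count_dangling()` are hypothetical names, not git's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object kinds; a stand-in for a real object database lookup. */
enum kind { KIND_COMMIT, KIND_TREE, KIND_BLOB, KIND_TAG };

struct object_entry {
    const char *sha1;   /* hex object name */
    enum kind kind;
};

/* A submodule's objects, assumed already fsck'd and sorted by sha1. */
struct object_db {
    size_t nr;
    const struct object_entry *objects;
};

static int cmp_sha1(const void *key, const void *elem)
{
    const struct object_entry *e = elem;
    return strcmp((const char *)key, e->sha1);
}

/*
 * Phase two: "is that SHA1 a valid commit object?"  Nothing reachable from
 * the commit is checked again; the submodule's own fsck already did that.
 */
static int is_valid_commit(const struct object_db *db, const char *sha1)
{
    const struct object_entry *e;

    assert(db != NULL && sha1 != NULL);
    e = bsearch(sha1, db->objects, db->nr, sizeof(*db->objects), cmp_sha1);
    return e != NULL && e->kind == KIND_COMMIT;
}

/* Count gitlink targets that do not name a commit in the submodule. */
static size_t count_dangling(const struct object_db *db,
                             const char *const *sha1s, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_valid_commit(db, sha1s[i]))
            bad++;
    return bad;
}
```

Because the submodule's own fsck has already verified everything reachable from its commits, a hit here is enough; no history needs to be walked.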

See?


So I think that the way to verify a superproject is:

 - fsck each and every project totally independently. This is something 
   you have to do *anyway*.

 - either as you fsck, or as a separate phase after the fsck, just 
   traverse the trees and spit out "these are the SHA1s of subprojects".

 - finally, just go through the list of SHA1s (after every project has 
   been fsck'd) and verify that they exist (since if they exist, they will 
   have everything that is reachable from them, as that's one of the 
   things that the *local* fsck verifies)
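Put together, the three steps might look like the following sketch. Each `struct project` is a hypothetical summary of one repository after its local fsck (whether it passed, which commits it contains, which gitlinks were collected from its trees); the names and the linear searches are illustrative assumptions. What it shows is the ordering: the connectedness test only runs once every independent fsck has passed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of one gitlink found in a project's trees. */
struct link {
    const char *subproject;      /* which project the gitlink points into */
    const char *sha1;            /* commit it records */
};

/* Hypothetical summary of one project after its own local fsck. */
struct project {
    const char *name;
    int fsck_ok;                 /* step 1: did its independent fsck pass? */
    size_t nr_commits;
    const char *const *commits;  /* commit SHA1s it contains */
    size_t nr_links;
    const struct link *links;    /* step 2: gitlinks collected from its trees */
};

static const struct project *find_project(const struct project *p, size_t n,
                                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (!strcmp(p[i].name, name))
            return &p[i];
    return NULL;
}

static int has_commit(const struct project *p, const char *sha1)
{
    for (size_t i = 0; i < p->nr_commits; i++)
        if (!strcmp(p->commits[i], sha1))
            return 1;
    return 0;
}

/*
 * Returns -1 if any local fsck failed, otherwise the number of gitlinks
 * that do not resolve to a commit in the named subproject.
 */
static int verify_superproject(const struct project *p, size_t n)
{
    int bad = 0;

    assert(p != NULL || n == 0);
    /* Step 1: every project is fsck'd on its own; all must pass first. */
    for (size_t i = 0; i < n; i++)
        if (!p[i].fsck_ok) {
            fprintf(stderr, "%s: local fsck failed\n", p[i].name);
            return -1;
        }
    /* Step 3: the cheap connectedness test over the collected SHA1s. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < p[i].nr_links; j++) {
            const struct link *l = &p[i].links[j];
            const struct project *sub = find_project(p, n, l->subproject);

            if (!sub || !has_commit(sub, l->sha1)) {
                fprintf(stderr, "%s: missing %s commit %s\n",
                        p[i].name, l->subproject, l->sha1);
                bad++;
            }
        }
    return bad;
}
```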

Notice? At no point do you actually need to do a "global fsck". You can do 
totally independent local fscks, and then a really cheap test of 
connectedness once those fscks have completed.

The reason a *full* global fsck is so expensive is that it would have an 
absolutely humongous working set, and effectively keep everything in 
memory through it all. Doing it in stages ("fsck smaller individual trees 
separately") is actually the same amount of absolute work, but the working 
set never grows, so it scales much better.
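A back-of-the-envelope model shows the difference. Assuming, purely for illustration, that fsck costs one unit of work and holds one unit of memory per object, the total work is the same either way, but the peak working set is the sum of all projects for a global fsck and only the largest single project when they are done in stages. `struct fsck_cost` and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model: one unit of work and one unit of memory per object. */
struct fsck_cost {
    unsigned long total_work;   /* objects checked overall */
    unsigned long peak_memory;  /* most objects held in memory at once */
};

/* One global fsck over everything: the working set is every object at once. */
static struct fsck_cost global_fsck_cost(const unsigned long *objects, size_t n)
{
    struct fsck_cost c = { 0, 0 };

    assert(objects != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        c.total_work += objects[i];
    c.peak_memory = c.total_work;
    return c;
}

/* Staged: each project is fsck'd alone, so the peak is the largest project. */
static struct fsck_cost staged_fsck_cost(const unsigned long *objects, size_t n)
{
    struct fsck_cost c = { 0, 0 };

    assert(objects != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        c.total_work += objects[i];
        if (objects[i] > c.peak_memory)
            c.peak_memory = objects[i];
    }
    return c;
}
```

With three projects of 100, 300 and 200 objects, both approaches do 600 units of work, but the peak drops from 600 to 300.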

(fsck'ing projects individually also happens to allow you to do the 
sub-project fscks in parallel across multiple CPUs or multiple machines, 
so it actually scales much better that way too - but the big problem 
tends to be excessive memory use, so the "SMP parallel version" only 
makes sense if you have tons of memory and can afford to do these things 
at the same time!)
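The memory caveat on parallel runs can be made concrete under the same illustrative one-unit-per-object assumption: if at most k local fscks run at once, the worst case is that the k largest ones overlap. `parallel_peak_bound()` and `max_safe_parallelism()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PROJECTS 64

static int cmp_desc(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x < y) - (x > y);   /* larger values sort first */
}

/*
 * Upper bound on peak memory when at most k local fscks run at the same
 * time: in the worst case the k largest ones overlap.
 */
static unsigned long parallel_peak_bound(const unsigned long *mem, size_t n, size_t k)
{
    unsigned long sorted[MAX_PROJECTS];
    unsigned long peak = 0;

    assert(n <= MAX_PROJECTS);
    if (n == 0)
        return 0;
    memcpy(sorted, mem, n * sizeof(*mem));
    qsort(sorted, n, sizeof(*sorted), cmp_desc);
    for (size_t i = 0; i < k && i < n; i++)
        peak += sorted[i];
    return peak;
}

/* Largest k whose worst case fits in the budget; 1 means one at a time. */
static size_t max_safe_parallelism(const unsigned long *mem, size_t n,
                                   unsigned long budget)
{
    size_t k = 1;

    while (k < n && parallel_peak_bound(mem, n, k + 1) <= budget)
        k++;
    return k;
}
```

With projects of 100, 300 and 200 objects and a budget of 550 units, two can safely run side by side (worst case 500) but not all three (600). A result of 1 means running them one at a time, which is also what the function returns when even the largest single project exceeds the budget.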

			Linus