Re: [PATCH 0/6] Initial subproject support (RFC?)

From: Linus Torvalds
Date: Monday, April 9, 2007 - 9:12 pm

Ok, the following is a series of six patches that implement some very 
low-level plumbing for what I consider sane subproject support.

NOTE! I want to make it very clear that this series of patches does not 
make subprojects "usable". They are very core plumbing that allows people 
to think about the issues, and shows how the low-level code could (and in 
my opinion, should) be done.

Some of the early patches are just cleanups and very basic stuff required 
to actually get to the meat of it all. I actually think that they are all 
in a state where they could be applied, if only because they don't 
actually really *do* anything unless you start generating index file 
entries (and trees) that have the "gitlink" entries in them.

I've actually done some testing with a repository that has these kinds of 
subproject pointers in it, and no, it's really not fully fleshed out 
yet, but yes, I can actually do a commit in one of the subprojects, and 
when I do that, the "raw" diff literally looks like this:

	[torvalds@woody superproject]$ git diff --raw
	:160000 160000 5813084832d3c680a3436b0253639c94ed55445d 0000000... M    sub-B

and I can do a "git commit -a" in the superproject to commit the new 
state.

NOTE! This series of six patches does not actually contain everything you 
need to do that - in particular, this series will not actually connect up 
the magic to make "git add" (and thus "git commit") actually create the 
gitlink entries for subprojects. That's another (quite small) patch, but I 
haven't cleaned it up enough to be submittable yet.

I split my original larger patch up into more manageable pieces, so that 
you should be able to actually just read the patches themselves and get a 
reasonable idea about what it's doing, even *without* actually testing it. 
And obviously, "make test" still completes happily, if only because none 
of the tests actually trigger any of the new code.

The patches are all fairly small, and the two first ones are really just ...
From: Linus Torvalds
Date: Monday, April 9, 2007 - 9:13 pm

The diff helpers used to do the magic mode canonicalization and all the
other special mode handling by hand ("trust executable bit" and "has
symlink support" handling).

That's bogus. Use "ce_mode_from_stat()" that does this all for us.

This is also going to be required when we add support for links to other
git repositories.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 diff-lib.c |   15 +++------------
 1 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 5c5b05b..c6d1273 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -357,7 +357,7 @@ int run_diff_files(struct rev_info *revs, int silent_on_removed)
 					continue;
 			}
 			else
-				dpath->mode = canon_mode(st.st_mode);
+				dpath->mode = ntohl(ce_mode_from_stat(ce, st.st_mode));
 
 			while (i < entries) {
 				struct cache_entry *nce = active_cache[i];
@@ -374,8 +374,7 @@ int run_diff_files(struct rev_info *revs, int silent_on_removed)
 					int mode = ntohl(nce->ce_mode);
 					num_compare_stages++;
 					hashcpy(dpath->parent[stage-2].sha1, nce->sha1);
-					dpath->parent[stage-2].mode =
-						canon_mode(mode);
+					dpath->parent[stage-2].mode = ntohl(ce_mode_from_stat(nce, mode));
 					dpath->parent[stage-2].status =
 						DIFF_STATUS_MODIFIED;
 				}
@@ -424,15 +423,7 @@ int run_diff_files(struct rev_info *revs, int silent_on_removed)
 		if (!changed && !revs->diffopt.find_copies_harder)
 			continue;
 		oldmode = ntohl(ce->ce_mode);
-
-		newmode = canon_mode(st.st_mode);
-		if (!trust_executable_bit &&
-		    S_ISREG(newmode) && S_ISREG(oldmode) &&
-		    ((newmode ^ oldmode) == 0111))
-			newmode = oldmode;
-		else if (!has_symlinks &&
-		    S_ISREG(newmode) && S_ISLNK(oldmode))
-			newmode = oldmode;
+		newmode = ntohl(ce_mode_from_stat(ce, st.st_mode));
 		diff_change(&revs->diffopt, oldmode, newmode,
 			    ce->sha1, (changed ? null_sha1 : ce->sha1),
 			    ce->name, NULL);
-- 
1.5.1.110.g1e4c
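
For readers following along, the hand-rolled logic that this patch removes from run_diff_files() can be sketched as a standalone function. This is only an illustration, not git's actual ce_mode_from_stat(): the sk_* names and SK_* constants are made up for this note, modes are kept in host byte order, and the two configuration flags are passed in as parameters instead of being read from globals.

```c
/* Hypothetical constants for this sketch; git itself uses S_IFxxx here. */
#define SK_IFMT  0170000u
#define SK_IFREG 0100000u
#define SK_IFLNK 0120000u
#define SK_IFDIR 0040000u

#define SK_ISREG(m) (((m) & SK_IFMT) == SK_IFREG)
#define SK_ISLNK(m) (((m) & SK_IFMT) == SK_IFLNK)

/* Collapse permissions to 0755 or 0644 depending on the owner-exec bit. */
static unsigned int sk_permissions(unsigned int mode)
{
	return (mode & 0100) ? 0755 : 0644;
}

/* Same shape as the canon_mode() macro before this series. */
static unsigned int sk_canon_mode(unsigned int mode)
{
	if (SK_ISREG(mode))
		return SK_IFREG | sk_permissions(mode);
	if (SK_ISLNK(mode))
		return SK_IFLNK;
	return SK_IFDIR;
}

/*
 * Mode to report for a working tree file, given the mode recorded in
 * the index (oldmode) and the mode that stat() returned (st_mode).
 */
static unsigned int sk_worktree_mode(unsigned int oldmode, unsigned int st_mode,
				     int trust_executable_bit, int has_symlinks)
{
	unsigned int newmode = sk_canon_mode(st_mode);

	/* Filesystem cannot be trusted about the x bit: keep the old mode. */
	if (!trust_executable_bit &&
	    SK_ISREG(newmode) && SK_ISREG(oldmode) &&
	    ((newmode ^ oldmode) == 0111))
		return oldmode;
	/* No symlink support: a symlink is checked out as a regular file. */
	if (!has_symlinks && SK_ISREG(newmode) && SK_ISLNK(oldmode))
		return oldmode;
	return newmode;
}
```

With trust_executable_bit off, an index entry of 0100644 whose file shows up as 0100755 on disk is reported with the old mode, so no spurious mode change appears in the diff.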

-

From: Linus Torvalds
Date: Monday, April 9, 2007 - 9:13 pm

This just makes sure that when we do a read_directory(), we check
that the filename fits in the buffer we allocated (with a bit of
slop)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 dir.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/dir.c b/dir.c
index 7426fde..4f5a224 100644
--- a/dir.c
+++ b/dir.c
@@ -353,6 +353,9 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 			     !strcmp(de->d_name + 1, "git")))
 				continue;
 			len = strlen(de->d_name);
+			/* Ignore overly long pathnames! */
+			if (len + baselen + 8 > sizeof(fullname))
+				continue;
 			memcpy(fullname + baselen, de->d_name, len+1);
 			if (simplify_away(fullname, baselen + len, simplify))
 				continue;
-- 
1.5.1.110.g1e4c
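
The check added above can be tried in isolation with a small helper. This is a sketch under stated assumptions: sk_append_name() is a hypothetical name, and the buffer size and slop are parameters here, whereas the patch uses sizeof(fullname) and a fixed slop of 8.

```c
#include <string.h>

/*
 * Append name to the directory prefix already stored in buf[0..baselen).
 * Returns the new length, or -1 if the name plus slop would not fit,
 * in which case buf is left untouched and the caller skips the entry.
 */
static int sk_append_name(char *buf, size_t bufsize, size_t baselen,
			  const char *name, size_t slop)
{
	size_t len = strlen(name);

	if (len + baselen + slop > bufsize)
		return -1;
	memcpy(buf + baselen, name, len + 1);	/* copy the NUL too */
	return (int)(baselen + len);
}
```

The point of testing before the memcpy() is that an overly long directory entry is ignored instead of overrunning the fixed-size buffer.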

-

From: Linus Torvalds
Date: Monday, April 9, 2007 - 9:14 pm

This new function resolves a ref in *another* git repository.  It's
named for its intended use: to look up the git link to a subproject.

It's not actually wired up to anything yet, but we're getting closer to
having fundamental plumbing support for "links" from one git directory
to another, which is the basis of subproject support.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 refs.c |   79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 refs.h |    3 ++
 2 files changed, 82 insertions(+), 0 deletions(-)

diff --git a/refs.c b/refs.c
index d2b7b7f..229da74 100644
--- a/refs.c
+++ b/refs.c
@@ -215,6 +215,85 @@ static struct ref_list *get_loose_refs(void)
 
 /* We allow "recursive" symbolic refs. Only within reason, though */
 #define MAXDEPTH 5
+#define MAXREFLEN (1024)
+
+static int resolve_gitlink_packed_ref(char *name, int pathlen, const char *refname, unsigned char *result)
+{
+	FILE *f;
+	struct cached_refs refs;
+	struct ref_list *ref;
+	int retval;
+
+	strcpy(name + pathlen, "packed-refs");
+	f = fopen(name, "r");
+	if (!f)
+		return -1;
+	read_packed_refs(f, &refs);
+	ref = refs.packed;
+	retval = -1;
+	while (ref) {
+		if (!strcmp(ref->name, refname)) {
+			retval = 0;
+			memcpy(result, ref->sha1, 20);
+			break;
+		}
+		ref = ref->next;
+	}
+	free_ref_list(refs.packed);
+	return retval;
+}
+
+static int resolve_gitlink_ref_recursive(char *name, int pathlen, const char *refname, unsigned char *result, int recursion)
+{
+	int fd, len = strlen(refname);
+	char buffer[128], *p;
+
+	if (recursion > MAXDEPTH || len > MAXREFLEN)
+		return -1;
+	memcpy(name + pathlen, refname, len+1);
+	fd = open(name, O_RDONLY);
+	if (fd < 0)
+		return resolve_gitlink_packed_ref(name, pathlen, refname, result);
+
+	len = read(fd, buffer, sizeof(buffer)-1);
+	close(fd);
+	if (len < 0)
+		return -1;
+	while (len && isspace(buffer[len-1]))
+		len--;
+	buffer[len] = 0;
+
+	/* Was it a detached head or an ...
From: Alex Riesen
Date: Tuesday, April 10, 2007 - 2:38 am

Can't a subproject be bare?
-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 7:58 am

Not when it is checked out, no. That's what "checked out" means ;)

If a subproject is bare, it never gets resolved, because it's never 
checked out in a superproject.

So a subproject *can* be bare, but when it's bare it is just a totally 
regular independent git project, simply by *definition* of not being 
checked out inside a superproject.

But hey, that was just a design decision of mine, and if people can argue 
for it being wrong, I don't think I'm married to it ;)

		Linus
-

From: Alex Riesen
Date: Tuesday, April 10, 2007 - 8:35 am

I didn't actually have a use case in mind when I asked it.
After a bit of thinking I could imagine a repo which is
used for integration exclusively (no compilation or looking
at the files at all).
-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 8:52 am

Well, you also cannot *commit* to a bare repository, so it's a bit 
pointless for integration reasons. You'd still have to commit all changes 
somewhere else.

That said, it's definitely designed so that if you want to automate 
tracking other people's bare repositories, you can do so: you'd just have 
to *really* script it with something like

	git update-index --cacheinfo 0160000 <sha1> <dirname>

(which is how you could create those commits to a bare repo too, so it's 
not like this is really even any different)

		Linus
-

From: Alex Riesen
Date: Tuesday, April 10, 2007 - 8:57 am

Yes. Subprojects are push-only for storing and reference purposes.

Nice :)
-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 9:16 am

Well, the *really* nice thing about doing it like this is that you can 
actually update subprojects without even having them even be *local* to 
where you do the superproject.

IOW, you could literally build up the superproject by saying that you want 
to track "all git projects I care about" somewhere else, and do a series 
of automated

	git ls-remote sub-project-xyzzy tracking-branch-xyzzy | ...

and basically create the "superproject" without ever actually downloading 
or populating the subprojects at all.

Then, if everything is set up correctly, you can basically use the 
superproject as an "auto-mirror" - whenever you want to get all the 
projects you care about, you just clone that superproject, and (once 
you've taught "git clone" to fetch the subprojects, of course ;^) you'd 
basically fetch them all from their appropriate locations - without ever 
having the actual superproject have to even *really* care about it.

So basically, a superproject could be used as just a "gathering point", 
without having to actually *contain* any of the subprojects. The actual 
sources for subprojects may be on totally different servers. That's what 
real distribution is all about.

		Linus
-

From: Josef Weidendorfer
Date: Tuesday, April 10, 2007 - 8:54 am

It would be nice if a redirection via a "gitdir = ..." line
in .git/link of the subproject (when existing) would be possible.
This was part of the light-weight checkout proposal.

In contrast to contrib/workdir/git-new-workdir, this would allow
for (to be implemented) magic symlinks to stay intact when
moving the submodule directory around.

However, this can be added later.

Josef

PS: I wonder how long it takes to move the official KDE repository over to git ;-)
-

From: Linus Torvalds
Date: Monday, April 9, 2007 - 9:14 pm

This just adds the basic helper functions to recognize and work with git
tree entries that are links to other git repositories ("subprojects").
They still aren't actually connected up to any of the code-paths, but
now all the infrastructure is in place.

The next commit will start actually adding actual subproject support.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 cache.h |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/cache.h b/cache.h
index eb57507..1b3d00e 100644
--- a/cache.h
+++ b/cache.h
@@ -25,6 +25,22 @@
 #endif
 
 /*
+ * A "directory link" is a link to another git directory.
+ *
+ * The value 0160000 is not normally a valid mode, and
+ * also just happens to be S_IFDIR + S_IFLNK
+ *
+ * NOTE! We *really* shouldn't depend on the S_IFxxx macros
+ * always having the same values everywhere. We should use
+ * our internal git values for these things, and then we can
+ * translate that to the OS-specific value. It just so
+ * happens that everybody shares the same bit representation
+ * in the UNIX world (and apparently wider too..)
+ */
+#define S_IFDIRLNK	0160000
+#define S_ISDIRLNK(m)	(((m) & S_IFMT) == S_IFDIRLNK)
+
+/*
  * Intensive research over the course of many years has shown that
  * port 9418 is totally unused by anything else. Or
  *
@@ -104,6 +120,8 @@ static inline unsigned int create_ce_mode(unsigned int mode)
 {
 	if (S_ISLNK(mode))
 		return htonl(S_IFLNK);
+	if (S_ISDIR(mode) || S_ISDIRLNK(mode))
+		return htonl(S_IFDIRLNK);
 	return htonl(S_IFREG | ce_permissions(mode));
 }
 static inline unsigned int ce_mode_from_stat(struct cache_entry *ce, unsigned int mode)
@@ -121,7 +139,7 @@ static inline unsigned int ce_mode_from_stat(struct cache_entry *ce, unsigned in
 }
 #define canon_mode(mode) \
 	(S_ISREG(mode) ? (S_IFREG | ce_permissions(mode)) : \
-	S_ISLNK(mode) ? S_IFLNK : S_IFDIR)
+	S_ISLNK(mode) ? S_IFLNK : S_ISDIR(mode) ? S_IFDIR : S_IFDIRLNK)
 
 #define ...
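
The arithmetic in the comment above (0160000 being S_IFDIR + S_IFLNK) and the patched create_ce_mode() can be checked with a small sketch. In the spirit of the NOTE in the patch, it uses its own constants instead of the system S_IFxxx macros; all SK_* and sk_* names are hypothetical, and modes are in host byte order, so the htonl() calls are left out.

```c
/* Hypothetical internal mode values, independent of <sys/stat.h>. */
#define SK_IFMT     0170000u
#define SK_IFREG    0100000u
#define SK_IFLNK    0120000u
#define SK_IFDIR    0040000u
#define SK_IFDIRLNK 0160000u	/* SK_IFDIR | SK_IFLNK */

enum sk_kind {
	SK_KIND_FILE,
	SK_KIND_SYMLINK,
	SK_KIND_TREE,
	SK_KIND_GITLINK,
	SK_KIND_UNKNOWN
};

/* Classify a tree or index entry by the type bits of its mode. */
static enum sk_kind sk_classify(unsigned int mode)
{
	switch (mode & SK_IFMT) {
	case SK_IFREG:
		return SK_KIND_FILE;
	case SK_IFLNK:
		return SK_KIND_SYMLINK;
	case SK_IFDIR:
		return SK_KIND_TREE;
	case SK_IFDIRLNK:
		return SK_KIND_GITLINK;
	default:
		return SK_KIND_UNKNOWN;
	}
}

/* Index mode for a working tree entry, following the patched create_ce_mode(). */
static unsigned int sk_index_mode(unsigned int mode)
{
	if (sk_classify(mode) == SK_KIND_SYMLINK)
		return SK_IFLNK;
	if (sk_classify(mode) == SK_KIND_TREE ||
	    sk_classify(mode) == SK_KIND_GITLINK)
		return SK_IFDIRLNK;	/* a directory in the index is a gitlink */
	return SK_IFREG | ((mode & 0100) ? 0755 : 0644);
}
```

Note how a plain directory mode such as 040755 maps to 0160000 on the index side: the index never stores directories as such, so a directory entry there can only mean a link to another repository.
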
From: Linus Torvalds
Date: Monday, April 9, 2007 - 9:15 pm

Since the subprojects don't necessarily even exist in the current tree,
much less in the current git repository (they are totally independent
repositories), we do not want to try to follow the chain from one git
repository to another through a gitlink.

This involves teaching fsck to ignore references to gitlink objects from
a tree and from the current index.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 builtin-fsck.c |    9 ++++++++-
 tree.c         |   15 ++++++++++++++-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index 4d8b66c..f22de8d 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -253,6 +253,7 @@ static int fsck_tree(struct tree *item)
 		case S_IFREG | 0644:
 		case S_IFLNK:
 		case S_IFDIR:
+		case S_IFDIRLNK:
 			break;
 		/*
 		 * This is nonstandard, but we had a few of these
@@ -695,8 +696,14 @@ int cmd_fsck(int argc, char **argv, const char *prefix)
 		int i;
 		read_cache();
 		for (i = 0; i < active_nr; i++) {
-			struct blob *blob = lookup_blob(active_cache[i]->sha1);
+			unsigned int mode;
+			struct blob *blob;
 			struct object *obj;
+
+			mode = ntohl(active_cache[i]->ce_mode);
+			if (S_ISDIRLNK(mode))
+				continue;
+			blob = lookup_blob(active_cache[i]->sha1);
 			if (!blob)
 				continue;
 			obj = &blob->object;
diff --git a/tree.c b/tree.c
index d188c0f..dbb63fc 100644
--- a/tree.c
+++ b/tree.c
@@ -143,6 +143,14 @@ struct tree *lookup_tree(const unsigned char *sha1)
 	return (struct tree *) obj;
 }
 
+/*
+ * NOTE! Tree refs to external git repositories
+ * (ie gitlinks) do not count as real references.
+ *
+ * You don't have to have those repositories
+ * available at all, much less have the objects
+ * accessible from the current repository.
+ */
 static void track_tree_refs(struct tree *item)
 {
 	int n_refs = 0, i;
@@ -152,8 +160,11 @@ static void track_tree_refs(struct tree *item)
 
 	/* Count how many entries there are.. */
 ...
From: Sam Vilain
Date: Wednesday, April 11, 2007 - 3:41 pm

Does this consider the case where the intent of the subprojects is to
collate multiple, small projects into one bigger project?

In that case, you might want to keep all of the subprojects in the same
git repository.

Sam.
-

From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 3:48 pm

I assume you mean "you might want to keep all of the subprojects' objects 
in the same git object directory".

And yes, that's absolutely true, but it's technically no different from 
just using GIT_OBJECT_DIRECTORY to share objects between totally unrelated 
projects, or using git/alternates to share objects between (probably 
*less* unrelated repositories, but still clearly individual repos).

So the main point of superproject/subprojects is to allow independence 
(because independence is what allows it to scale), but there is nothing to 
say that things *have* to be kept totally isolated. 

			Linus
-

From: Sam Vilain
Date: Wednesday, April 11, 2007 - 3:59 pm

Would that be the only distinction?


I'm particularly interested in repositories with, say, thousands of
submodules but only a few hundred meg. I really want to avoid the
situation where each of those submodules gets checked or descended into
separately for updates etc.

Sam.
-

From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 4:16 pm

I think we'll eventually want that *regardless* of how the object handling 
is done (a kind of "cross-submodule boundary check"), but I think that's 
actually outside of the scope of the current fsck.

The current fsck goes to great lengths to make sure that the internal 
consistency of a repository is good. That's also why it takes so long, and 
why it is such an expensive operation to do (notably when you do a 
"--full" check).

In contrast, the "cross-submodule boundary check" is a much cheaper 
operation, *if* you have already verified that the projects are internally 
consistent. It literally boils down to doing a very simplified commit 
chain walker that only parses tree objects and simply spits out the 
SHA1's of the sub-tree commits (and their location in the tree), and then 
a separate phase that just verifies those against the submodules.

And that separate phase - once you've done the fsck for all the 
*individual* repositories - is truly trivial. It's literally just a matter 
of "is that SHA1 a valid commit object". That's *cheap*.


So I think that the way to verify a superproject is:

 - fsck each and every project totally independently. This is something 
   you have to do *anyway*.

 - either as you fsck, or as a separate phase after the fsck, just 
   traverse the trees and spit out "these are the SHA1's of subprojects"

 - finally, just go through the list of SHA1's (after every project has 
   been fsck'd) and verify that they exist (since if they exist, they will 
   have everything that is reachable from them, as that's one of the 
   things that the *local* fsck verifies)

Notice? At no point do you actually need to do a "global fsck". You can do 
totally independent local fsck's, and then a really cheap test of 
connectedness once those fsck's have completed.
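
The last two steps of that procedure can be sketched in a few lines. This is a toy model and not fsck code: struct sk_tree_entry and both functions are hypothetical, the tree is assumed to be already flattened into an array, and a plain list of strings stands in for "commits known to exist in an already fsck'd subproject".

```c
#include <string.h>

struct sk_tree_entry {
	unsigned int mode;	/* e.g. 0100644, 0040000, 0160000 */
	const char *sha1_hex;	/* object name as a hex string */
	const char *path;
};

/*
 * Step 2: walk the superproject's tree entries and collect the
 * gitlinks (mode 0160000). Returns how many were stored in out.
 */
static size_t sk_collect_gitlinks(const struct sk_tree_entry *entries, size_t n,
				  const struct sk_tree_entry **out, size_t max)
{
	size_t i, found = 0;

	for (i = 0; i < n && found < max; i++)
		if ((entries[i].mode & 0170000) == 0160000)
			out[found++] = &entries[i];
	return found;
}

/*
 * Step 3: return the first link whose commit is not among the known
 * commits, or NULL when every link resolves.
 */
static const struct sk_tree_entry *
sk_first_missing(const struct sk_tree_entry **links, size_t nlinks,
		 const char **known, size_t nknown)
{
	size_t i, j;

	for (i = 0; i < nlinks; i++) {
		for (j = 0; j < nknown; j++)
			if (!strcmp(links[i]->sha1_hex, known[j]))
				break;
		if (j == nknown)
			return links[i];
	}
	return NULL;
}
```

Neither step touches blobs or file contents, which is why this pass is cheap compared to the per-repository fsck runs that precede it.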

The reason a *full* global fsck is so expensive is that it would have an 
absolutely humungous working set, and effectively keep everything in 
memory through it all. Doing it in ...
From: David Lang
Date: Wednesday, April 11, 2007 - 4:05 pm

would it make sense to have a --multiple-project option for fsck that would let 
you specify multiple 'projects' that share an object set and have the default 
checking not do the reachability checks that cause problems in this case?

Then people can share the objects if they want to and still do a full check, but 
would get warned that the full check would take a lot of time. which is not a 
big problem for a housekeeping thing that's run infrequently to find unreachable 
objects (which is something that should seldom happen in a well managed project)

David Lang
-

From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 4:53 pm

Well, the thing is, sharing object directories actually makes things 
*harder* to check, rather than easier.

It can be a nice space optimization, and yes, if there really is a lot of 
shared state, it can make it much cheaper to do some of the checks, but 
right now we have absolutely *no* way for fsck to then do the reachability 
check, because there is no way to tell fsck where all the refs are (since 
now the refs come in from multiple repositories!)

So the individual objects get cheaper to fsck (no need to fsck shared 
objects over and over again), but the reachability gets much harder to 
fsck.

It's not an insurmountable problem, or even necessarily a very large one, 
but it boils down to one very basic issue:

 - nobody seems to actually *use* the shared object directory model!

The thing is, with pack-files and alternates directories, a lot of the 
original reasons for shared object directories simply don't exist..

		Linus
-

From: Dana How
Date: Wednesday, April 11, 2007 - 5:00 pm

Cool -- my previous email makes me either a git idiot or a git pioneer!

So I'll think through my usage model some more and
look over the fsck source.

Until then,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell
-

From: David Lang
Date: Wednesday, April 11, 2007 - 4:30 pm

this is why I was suggesting a --multiple-project option to let you tell fsck 


I suspect that if it could be checked it would be used more, especially with the 
subproject support.

David Lang
-

From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 7:14 pm

Well, just from a personal observation:
 - I would *personally* actually refuse to share objects with anybody 
   else.

I just find the idea too scary. Somebody doing something bad to their 
object store by mistake (running "git prune" without realizing that there 
are *my* objects there too, or just deciding that they want to play with 
the object directory by hand, or running a new fancy experimental importer 
that has a subtle bug wrt object handling or anything like that).

I'll endorse using "alternates" files, but partly because I know the main 
project is safe (any alternates usage is in the "satellite" clones anyway, 
and they will never write to the alternate object directory), and partly 
because at least for the kernel, we don't have branches that get reset in 
the main project, so there's no reason to fear that a "git repack -a -d" 
will ever screw up any of the satellite repositories even by mistake.

But for git projects, even alternates isn't safe, in case somebody bases 
their own work on a version of "pu" that eventually goes away (even with 
reflogs, pruning *eventually* takes place).

So I tend to think that alternates and shared object directories are 
really for "temporary" stuff, or for *managed* repositories that are at 
git *hosting* sites (eg repo.or.cz), and where there is some other safety 
involved, ie users don't actually access the object directories directly 
in any way.

So I've at least personally come to the conclusion that for a *developer* 
(as opposed to a hosting site!), shared object directories just never make 
sense. The downsides are just too big. Even alternates is something where 
you just need to be fairly careful!

		Linus
-

From: Junio C Hamano
Date: Wednesday, April 11, 2007 - 7:30 pm

Actually that is not even true for repo.or.cz -- the site lets
people create *forks* of the main project, and I recall it is
implemented in terms of alternates.

That's one of the reasons I never asked to take over git.git
repository there.  I have alt-git.git instead, which does not
allow forks.

-

From: David Lang
Date: Thursday, April 12, 2007 - 10:18 am

I was actually thinking that hosting sites (and things like gitorrent) would be 
the ones that would get the most benefit from sharing objects. The amount saved 
for any individual developer is probably fairly minor (and the individual 
developer could run a script to look across their objects and hard-link them 
together if they care about the space)

David Lang
-

From: Dana How
Date: Thursday, April 12, 2007 - 11:32 am

These arguments all seem pretty convincing to me --
maybe the problem is that I'm not a "*developer*" right now.
Instead I'm part of a multi-developer *site*.
Below I talk about a possible way we could use git
without changing it (since I recognize this would be a minority usage pattern).

We use perforce to manage a mixed hardware/software project
(I'm the 55GB check-out guy, remember?).  We have at least 3 different
kinds of data with different usage patterns, and using perforce for
everything in one centralized server was not the best solution.

Each user ("client") has their own worktree and the perforce
repository is on a shared central server.  You can consider perforce
to have the equivalent of git's index, but it is stored on the server,
in one file ("db.have") covering all clients.  Obviously that becomes a
bottleneck -- and recently db.have got larger than the total cache RAM on
the server, which really slowed things down until we moved to a larger
server.  But repository architecture aside,  the real problem has been
perforce's usability.  Frequently one contributor,  having gotten ahead
of the team,  needs to share this more recent work with only a few
people.  This could be done with p4 branching,  but this is really clunky.
So instead the work is pushed out (submitted) to everyone, causing
instability; this is partially remedied by doing it in smaller chunks.
Another perforce problem is that tagging consumes a lot of server
space (and may slow things down as well).

Some of this data will stay in perforce, some will move into revision
control built-in to some of our other tools, and I'd like to try to move some
of it into git.  The main attraction for the last group is the lightweight
branching that would allow early/tentative work to be easily shared.
I think the subproject work currently being discussed is going to
be very helpful as well -- the perforce equivalent is chaotic.

We could give each user a work tree and an object repository,
and then have a ...
From: Linus Torvalds
Date: Thursday, April 12, 2007 - 12:17 pm

Yes.

The issues for hosting sites are very different from the issues of 
individual developers having their own git repositories, and I agree 100% 
that both alternates and shared object directories make tons of sense for 

I hope it wouldn't even be a minority usage pattern. I am a firm believer 
that distributed SCM's and git in particular makes a lot more sense for 
source control hosting than CVS or SVN do. I'm really disappointed with 
things like sourceforge, and part of the problem is literally that a 
centralized SCM is really *fundamentally* wrong for a hosting entity. 

Using a distributed SCM just makes _so_ much more sense for hosting 
projects, and I've actually very much wanted to try to make sure that git 
can help people who host things. 

It's not my *own* primary use, but I think it's a very important usage 
pattern, even though it's very different from a "normal developer" private 
sandbox case.

So I think your case is really very interesting. I'd love to help figure 
out how to help you guys with git, but because it's not how I personally 
work, I can really just try to help when you actually hit a problem - 
you'll have to figure out what your usage patterns actually are on your 
own ;)

And btw, I think the shared object model really works very well, but I 
think it has to be paired with some stricter rules than people who use 
their own repos tend to have. For example, end-point developers have 
become very used to rebasing and generally rewriting history (or just 
resetting to an older state), and that's something that works fine in a 
"local repository" setup, but it's also the kinds of patterns that can 
really screw you in a hosted and shared-object environment.

As to your two setups: I would suggest you go with the "hidden" shared 
version (ie people use the remote access pull/push to a server, and the 
*server* uses a shared object repository for multiple repositories), 
rather than having a user-visible globally shared object ...
From: Rogan Dawes
Date: Friday, April 13, 2007 - 2:00 am

Would it not make sense for a hosting environment to say, if you are 
using alternates, or shared object directories, then you need to include 
*all* the refs in *all* the projects if you ever do an fsck?

I'm not sure how well git will scale in this case, although it should 
just be a matter of how well git scales to dealing with a single 
project with tens of thousands of refs/tags/etc. The only problem might 
be in passing all those refs/tags to fsck in one go. STDIN, I guess?

Rogan
-

From: Linus Torvalds
Date: Friday, April 13, 2007 - 8:23 am

Yes. And it shouldn't be hard to add support to do it. It's just not been 
done.

A lot of git programs already take refs on stdin, but fsck just doesn't do 
it (it can do it from the command line, but you'd run out of command line 
space very quickly).

More natural would be to just list all the git repos by git repo pathname 
(and there, usually the command line probably *is* long enough), but 
somebody would just have to do it. It's probably not very much code: just 
iterate over each repo both when adding refs and when actually doing the 

For a real shared object directory, passing the refs to stdin (and 
teaching fsck about a "--stdin" flag) would be consistent with what we do 
for many other commands, so yes, that would work.

However, fsck actually tends to want not just the refs, but actually 
things like the index files and reflog files too, because those add other 
reachability info, which is why it's probably more natural to just give 
fsck the list of related repositories and let it figure them out.

That's also what you'd want to do for "alternates", since now there is no 
longer a single object directory either, but multiple separate (but 
related) ones.

Somebody would just have to write the code.. The basic rules are really 
all in "git/builtin-fsck.c": cmd_fsck(). Hint hint.
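
As a starting point for anyone taking the hint, the input side of a "--stdin" mode could look roughly like this. It is a sketch only: fsck has no such flag in this thread, sk_parse_sha1_line() is a hypothetical helper, and it assumes one object name per line written as 40 hex digits, which is just one of the forms a ref list could take.

```c
/* Value of one hex digit, or -1 if c is not a hex digit. */
static int sk_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse a line of the form "<40 hex digits>" with an optional trailing
 * newline into 20 raw bytes. Returns 0 on success, -1 if malformed.
 * A short line is rejected at its NUL, so nothing past it is read.
 */
static int sk_parse_sha1_line(const char *line, unsigned char out[20])
{
	int i;

	for (i = 0; i < 20; i++) {
		int hi, lo;

		hi = sk_hexval(line[2 * i]);
		if (hi < 0)
			return -1;
		lo = sk_hexval(line[2 * i + 1]);
		if (lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return (line[40] == '\0' || line[40] == '\n') ? 0 : -1;
}
```

Each parsed name would then be treated as an extra reachability root, the same way refs given on the command line are. As the message points out, this covers refs only; index and reflog files from the other repositories would still be missing, which is the argument for passing repository paths instead.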

			Linus
-

From: Dana How
Date: Saturday, April 14, 2007 - 11:50 pm

For clarity I should have written *office* instead of *site* to
describe my situation.
We did go down the local disk route, but after two significant losses of
individuals' work,  it was decreed that (perforce) work trees must be
on the NetApp.  So we already made the investment in beefiness --
for different reasons -- and I need to conform to these decisions for
the moment.

After reliability, the other big criterion (especially with our
penchant for large files)
will be speed. With perforce,  users now see submit={1 copy to server},
sync={1 copy from server}.  In the short term I can't get away with changing
this to submit={copy working to indiv repo, copy indiv repo to shared repo}
and sync={copy shared repo to indiv repo, copy indiv repo to working},
because at first everyone will be trying to emulate what they did in perforce.

So probably I'll start out with either a very small testgroup,
or one shared object repository with sticky/group tricks on the NetApp.
Once git's collaboration advantages are apparent,
I'll switch to the hidden repository model which I prefer as well.
And hopefully these collaboration advantages will also mean people
will commit more often and local disks can come back into favor --
and then the "extra" local repo file copy operations will be less noticeable.

In any event, I have some scripting to do to learn more about our usage
patterns and pushing our datasets through git.  I also need to finish
the pack-splitting patch (after 64b index goes in). Finally,  before all that,
I'll be out of the country for the next ~10 days...

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell
-

From: Sam Vilain
Date: Wednesday, April 11, 2007 - 5:03 pm

I think that's just the chicken-and-egg problem. Once this happens I
think we'll see people aggregating all sorts of related repositories
with this feature, and possibly making much richer histories by tracking
portions of their trees as subprojects rather than just a subdirectory.

Sam.

-

From: Junio C Hamano
Date: Wednesday, April 11, 2007 - 5:34 pm

The small detail in the last step is wrong, though.  Even if
they EXIST, they may be isolated commits that are not connected
to refs, and fsck in the repository would not have warned about
unreachable trees from such unconnected commits.  So you would
need to do a reachability from these commits to the refs in the
subproject.

This would be similar to the quick-fetch topic I sent out a
couple of patches for, that implements logic to skip fetching
objects from your alternate.  You would have rev-list --objects
traverse from them with "--not --all" in the subproject
repository and make sure it does not trigger "I could not list
all objects reachable from the commits you wanted because such
and such tree/blob are missing".

    That reminds me of one thing I haven't verified.  I am not
    absolutely sure that rev-list --objects makes sure that
    blobs it lists exist (trees are checked as it needs to read
    them, and if they are missing or corrupt it would notice and
    barf).  When it is used for the purpose of this "subproject
    boundary fsck" and the quick-fetch, it should.  Perhaps a
    specialized option to check deeper than usual is needed.  I

This is still true.

-

From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 6:52 pm

The superproject *is* a ref.

You cannot prune the subprojects on their own. That's the *only* real 
special rule about subprojects. Exactly because pruning them on their own 
is not a valid op to do.

It's the same way with a source of "alternate" objects (or a shared 
object directory) - you'd better not prune them, because other projects 
may have refs to them that you don't know about locally. So this isn't 
something new to subprojects.

		Linus
-

From: Junio C Hamano
Date: Wednesday, April 11, 2007 - 7:00 pm

But when you fsck the subproject repository in isolation in the
earlier step in your procedure, that is not taken into account,
is it?

The situation I had in mind was not about pruning, but an
earlier fetch, either the native one that unpacks the objects
into loose form or a http walker, fetched a commit near the tip
but was interrupted/killed before finishing the fetch or
updating the ref.  The tip of such an incomplete commit chain
would be reported dangling.  They are ahead of your refs but
they may lack commits and trees to complete the chain back to
your refs yet.  When the higher-level project points at such a
commit, the existence of the commit is not a proof that
everything needed to complete the commit is available.

We need to prove that separately, and that was my suggestion to
run a "rev-list --objects $those-commits --not --all" in the
subproject repository, similar to what the quick-fetch topic
does.
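
To make the distinction concrete, here is a toy model of that check. It is
only a sketch under stated assumptions, not git's code: the dictionary
`objects` stands in for the subproject's object store (each id mapped to the
ids it references), and `walk` and `missing_from` are hypothetical helper
names. Everything reachable from the subproject's own refs is taken as
already verified, which is the role "--not --all" plays above.

```python
def walk(objects, tips, stop=frozenset()):
    """Walk the toy object graph from `tips`.

    Returns (seen, missing): ids that were found, and ids that were
    referenced but are absent from the store. Ids in `stop` are not visited.
    """
    seen, missing = set(), set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in stop or oid in seen or oid in missing:
            continue
        if oid not in objects:
            missing.add(oid)
            continue
        seen.add(oid)
        stack.extend(objects[oid])
    return seen, missing


def missing_from(objects, wanted, refs):
    """Ids needed to complete `wanted` that are neither present in the
    store nor already covered by `refs` (assumed to be complete)."""
    trusted, _ = walk(objects, refs)
    _, missing = walk(objects, wanted, stop=trusted)
    return missing


# "c3" arrived from an interrupted fetch: the commit object exists, but its
# tree "t3" and its parent "c2" never made it into the store.
store = {
    "c1": ["t1"], "t1": ["b1"], "b1": [],   # complete history on a ref
    "c3": ["t3", "c2"],
}
print(sorted(missing_from(store, ["c3"], ["c1"])))   # ['c2', 't3']
```

The existence of "c3" alone says nothing; only the walk shows that the
chain back to the refs is incomplete.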



-

From: Junio C Hamano
Date: Wednesday, April 11, 2007 - 7:06 pm

Ah, forget about this.  The HEAD, which is in the tree of the
higher-level project, is a ref.  Silly me.

-

From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 7:28 pm

Well, not entirely "silly you".

If you do a "git reset" in the superproject, that will obviously have to 
rewrite the heads in the subproject.

I do suspect that we should always enable reflogs for the subprojects, so 
that pruning is safe even for these kinds of situations, but that 
doesn't resolve all issues.

For example: to manage *cloning* of the extra stuff, you might actually 
want to have externally visible refs, and while I suspect the main 
solution will always be to just do good maintenance (ie "don't do 'git 
bisect' and _never_ rewrite history in the main superproject!!"), I don't 
think it's out of the question to add other safety nets too..

So for example, while I'm not sure it's necessary, I don't think it would 
be *wrong* if we might eventually end up having *other* safety features 
like adding a totally separate "refs/superprojects/xyzzy" ref structure. 

Or something like that.. Just to make the refs more visible both 
externally and internally, and to make it much harder to make stupid 
mistakes without realizing it.

I suspect a lot of this will depend on just how many mistakes people make. 
I don't think we've so far had a single problem with alternates files, 
re-basing, and people then pruning away objects used by other repositories 
by mistake, so maybe people really don't make those kinds of mistakes.

So maybe we don't need any extra safety nets at all. But who knows..

		Linus
-

From: Dana How
Date: Wednesday, April 11, 2007 - 4:30 pm

This seems slightly related to the hazy picture I'm forming of how
I'd like to use git at our site.  Essentially, everyone would have their
own working tree with .git directory, but .git/objects is a symlink
to a shared object repository.  How do you fully run git-fsck on this
shared object repository?  The actual heads (roots) are distributed amongst
many .git/refs directories (I suppose you could do something akin
to git-fsck $(cat /somepaths*/.git/refs/*), but that means you know
where all the repositories are).  So in this setup, maybe I'd want to run
fsck twice: the first time checking everything but not complaining about
dangling commit objects [but listing them?], and maybe a 2nd finding
all these in the users' repos [still need to know where these are].
Please note this is just a thought experiment at this point.
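
As part of the same thought experiment, gathering the roots could look
roughly like the Python sketch below. It is an assumption-laden
illustration: `collect_ref_tips` is a hypothetical helper, the list of
repository paths has to be supplied by the caller (the "you need to know
where they are" problem remains), and only loose ref files under .git/refs
are read, so packed refs and HEAD are ignored.

```python
import os


def collect_ref_tips(repo_dirs):
    """Return the set of object ids found in loose ref files under
    <repo>/.git/refs for every repository path given."""
    tips = set()
    for repo in repo_dirs:
        refs_root = os.path.join(repo, ".git", "refs")
        # os.walk silently yields nothing if refs_root does not exist.
        for dirpath, _dirnames, filenames in os.walk(refs_root):
            for name in filenames:
                with open(os.path.join(dirpath, name)) as fh:
                    value = fh.read().strip()
                # Skip empty files and symbolic refs ("ref: refs/...").
                if value and not value.startswith("ref:"):
                    tips.add(value)
    return tips
```

The resulting ids would then play the part of the `$(cat ...)` argument
list above, one combined set of heads for the shared object store.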

Anyway,  git started out with a 1:1 relationship between working tree,
index, and object repository. Various things could weaken that --
alternates, subprojects with different relationships to their object
repositories, etc. -- so special commands like git fsck which
focus mostly on the object repository may need a little tweaking eventually.

-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell
-

From: Linus Torvalds
Date: Monday, April 9, 2007 - 9:20 pm

This teaches the really fundamental core SHA1 object handling routines
about gitlinks.  We can compare trees with gitlinks in them (although we
can not actually generate patches for them yet - just raw git diffs),
and they show up as commits in "git ls-tree".

We also know to compare gitlinks as if they were directories (ie the
normal "sort as trees" rules apply).
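
For readers who have not met the "sort as trees" rule: a tree entry that is
a directory is ordered as if its name ended in "/". The Python sketch below
illustrates that rule with gitlinks treated like directories, as described
above. It is not the patch itself; `sorts_as_tree` and the Python
`base_name_compare` are illustrative re-implementations, and 160000 is the
gitlink mode shown in the raw diff earlier in the thread.

```python
from functools import cmp_to_key

S_IFMT = 0o170000      # mask for the file-type bits
S_IFDIR = 0o040000     # ordinary tree entry
S_IFDIRLNK = 0o160000  # gitlink entry


def sorts_as_tree(mode):
    """True for entries that are ordered as if their name ended in '/'."""
    return (mode & S_IFMT) in (S_IFDIR, S_IFDIRLNK)


def _end_char(name, n, mode):
    # Byte at position n, or the implied terminator when the name ends there.
    if n < len(name):
        return name[n]
    return ord("/") if sorts_as_tree(mode) else 0


def base_name_compare(name1, mode1, name2, mode2):
    """Compare two entry names (bytes); returns -1, 0 or 1."""
    n = min(len(name1), len(name2))
    if name1[:n] != name2[:n]:
        return -1 if name1[:n] < name2[:n] else 1
    c1 = _end_char(name1, n, mode1)
    c2 = _end_char(name2, n, mode2)
    return (c1 > c2) - (c1 < c2)


entries = [(b"lib", S_IFDIRLNK), (b"lib.c", 0o100644), (b"lib-old", 0o100644)]
entries.sort(key=cmp_to_key(lambda a, b: base_name_compare(a[0], a[1], b[0], b[1])))
print([name for name, _mode in entries])   # [b'lib-old', b'lib.c', b'lib']
```

Because "/" sorts after "-" and ".", the gitlink "lib" lands after both
files, exactly where a directory of the same name would.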

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Ok, that's it for now.

NOTE NOTE NOTE! I'd like to note once more that this doesn't actually get 
you working subproject support. Not only do I need to connect up a few 
more low-level helper functions (things like "git diff" don't know how to 
generate even rudimentary "subproject X changed" patches, nor can you 
actually yet *add* subprojects), but quite apart from that low-level 
stuff, anything more high-level (like "git fetch" and friends) will need 
to know about subprojects.

In general, think of this like the early git plumbing: it's the early
"content-addressable filesystem" part. The actual SCM parts going on top 
of it are yet to be done.

I'm hoping/expecting that there are more people who have the ability and 
the interest to work on the higher-level interfaces once the core plumbing 
support is there. There's still some plumbing to be done, but after that, 
maybe more people (and maybe the SoC people) can start filling out the 
higher-level details..

Comments on the patches/approach so far?

 builtin-ls-tree.c |   20 +++++++++++++++++++-
 cache-tree.c      |    2 +-
 read-cache.c      |   35 +++++++++++++++++++++++++++++++----
 sha1_file.c       |    3 +++
 4 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/builtin-ls-tree.c b/builtin-ls-tree.c
index 6472610..1cb4dca 100644
--- a/builtin-ls-tree.c
+++ b/builtin-ls-tree.c
@@ -6,6 +6,7 @@
 #include "cache.h"
 #include "blob.h"
 #include "tree.h"
+#include "commit.h"
 #include "quote.h"
 #include "builtin.h"
 
@@ -59,7 +60,24 @@ static int show_tree(const ...
From: Frank Lichtenheld
Date: Tuesday, April 10, 2007 - 1:40 am

Not that I have time right now to look up the exact context (only read
the patch), but I would've expected a "case S_IFDIRLNK:" here?

Gruesse,
-- 
Frank Lichtenheld <frank@lichtenheld.de>
www: http://www.djpig.de/
-

From: Alex Riesen
Date: Tuesday, April 10, 2007 - 4:31 am

No, the st_mode comes directly from file system. It knows nothing about
dirlinks.
-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 7:55 am

So we have this strange (and worrying) dualism inside git: we use the same 
macros *both* for "stat data" *and* for "git-internal file modes".

So sometimes a mode is the result of a [l]stat() call like above, and then 
a gitlink is just a directory and we use S_IFDIR. And if it comes from the 
index, then it uses the internal git representation, and is S_IFDIRLNK.

I'm not very happy about it, but I'm actually most unhappy about it since 
I could imagine that the constants themselves are different on different 
OS's (eg VMS - a Unix-related OS will use the same constants for 
historical reasons).

In this particular place (index-path), we obviously not only have a stat() 
result, but more importantly, we never come here for a "normal" directory, 
since a normal directory would have been expanded into its component paths 
by the "read_directory()" logic.

So that interaction with directory expansion is somewhat non-obvious: 
normal directories are expanded recursively into the files they contain, 
while git directories end up being visible to internals as real 
directories, and are turned into gitlinks by code like the above.
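
The split between the two kinds of mode can be sketched as a small
conversion step. This is an illustration in Python, not the code from the
patch: `index_mode_from_stat` is a hypothetical name, 160000 is the gitlink
mode seen in the raw diff earlier in the thread, and the sketch assumes the
caller only passes a directory once it has decided that the directory is a
subproject (normal directories never get this far, as explained above).

```python
import stat

S_IFDIRLNK = 0o160000  # git-internal gitlink mode, never returned by lstat()


def index_mode_from_stat(st_mode):
    """Map a mode from [l]stat() to the mode stored in the index."""
    if stat.S_ISDIR(st_mode):
        # On disk a subproject is just a directory; internally it is a gitlink.
        return S_IFDIRLNK
    if stat.S_ISLNK(st_mode):
        return 0o120000
    if stat.S_ISREG(st_mode):
        # Only the owner-executable bit survives for regular files.
        return 0o100755 if st_mode & 0o100 else 0o100644
    raise ValueError("unsupported file type: %o" % st_mode)
```

The conversion only goes one way: nothing coming from the filesystem ever
carries the gitlink mode, which is why the stat side keeps using S_IFDIR.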

		Linus
-

From: Josef Weidendorfer
Date: Tuesday, April 10, 2007 - 9:28 am

So this does mean that the SHA1 of a gitlink entry corresponds
to the commit in the subproject?

I wonder if it is not useful to be able to add some attribute(s)
to a gitlink, i.e. first reference a gitlink object in the superproject,
which then references the submodule commit, and also holds some
further attributes. These attributes can not be put into the subproject,
as it should be independent.

An example for such an attribute would be a subproject name/ID.
An argument for this: The user should be able to specify some policies
for submodules, like "do not clone/checkout this submodule". But the
path where the submodule resides in a given commit is not useful here,
as a submodule can reside at different paths in the history of the
supermodule.

Josef
-

From: Alex Riesen
Date: Tuesday, April 10, 2007 - 9:50 am

These attributes can be put into a file in superproject tree and
checked in at the same time as the gitlink. No real need for introducing
another object type (right now there is no gitlink object type, just
an entry in tree with special mode).
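
To show how little a gitlink is, the Python sketch below writes one out in
the raw tree-entry layout (octal mode, space, name, NUL, 20-byte SHA-1). The
helper names `encode_tree_entry` and `decode_tree_entry` are hypothetical;
the mode and the commit id are the ones from the raw diff at the top of the
thread.

```python
GITLINK_MODE = 0o160000


def encode_tree_entry(mode, name, sha1_hex):
    """Serialize one tree entry: b"<octal mode> <name>\\0<20 raw bytes>"."""
    return ("%o" % mode).encode("ascii") + b" " + name + b"\0" + bytes.fromhex(sha1_hex)


def decode_tree_entry(buf):
    """Parse the first entry in `buf`; returns (mode, name, sha1_hex)."""
    space = buf.index(b" ")
    nul = buf.index(b"\0", space)
    mode = int(buf[:space].decode("ascii"), 8)
    name = buf[space + 1:nul]
    sha1_hex = buf[nul + 1:nul + 21].hex()
    return mode, name, sha1_hex


raw = encode_tree_entry(GITLINK_MODE, b"sub-B",
                        "5813084832d3c680a3436b0253639c94ed55445d")
```

The only thing that marks the entry as a subproject pointer is the mode;
the 20 bytes name a commit instead of a blob or tree.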
-

From: Josef Weidendorfer
Date: Tuesday, April 10, 2007 - 10:23 am

Like... .gitattributes ? ;-)
Ok, this could work; however, there of course is the possibility of
inconsistencies when e.g. manually moving subprojects around.

How is consistency ensured for .gitattributes ?
I see that for .gitignore consistency, the user is responsible.

Josef
-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 11:45 am

The special "link" object has come up before, and I actually thought I'd 
do it that way first, but there were a few reasons why I didn't:

 - I tend to like "minimal", and the patches I sent out really are pretty 
   minimal, in the sense that they introduce just _one_ new concept, in 
   one place (it's basically a "tree entry" - so it shows up in tree 
   reading and writing, and nowhere else. The index, of course, is the 
   staging area for trees, so the index was also affected, but that was 
   really a very direct result of that "it's a new tree entry" thing).

 - in a "link" object, the only thing that would normally *change* is 
   really just the commit SHA1. Everything else is really pretty static. 
   As such, I decided that it's just a waste of a perfectly fine object to 
   have several thousands of the "link" objects that really only differ in 
   the pointer to the commit.

 - the "static" part, which you might as well have somewhere else, tends 
   to be stuff that you would need to be able to override locally, and as 
   such it does *not* really have a global meaning that is useful 
   historically.

   For example, the things that you'd want to associate with the gitlink 
   are things like "where would I find the repository that the commit is 
   part of" and "what is a description of that submodule" and "what are 
   the relationships between the submodules". These are things that aren't 
   necessarily even totally independent: in CVS, for example, you have 
   module names that are really not submodules themselves, but are really 
   just aliases for *collections* of submodules.

   So a 1:1 link object simply wouldn't make much sense anyway, and you'd 
   want to override those defaults with site-specific ones (maybe there is 
   a "canonical" address for the submodule repository, but if you have a 
   copy of it locally on-site, when you clone, you'd rather use the 
   *local* copy over the standard site, for example).
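
   The override idea amounts to a two-level lookup, sketched here in
   Python. Nothing about the eventual file format is implied: the two
   dictionaries, the function name `resolve_url`, and the example.org
   addresses are all made up for illustration.

```python
def resolve_url(module, tracked_defaults, local_overrides):
    """Prefer a site-specific address; fall back to the tracked default.

    Returns None when neither source knows the module."""
    if module in local_overrides:
        return local_overrides[module]
    return tracked_defaults.get(module)


# Defaults that travel with the superproject (hypothetical values).
tracked = {"kdelibs": "git://example.org/kdelibs.git"}
# Overrides that live only in this clone (hypothetical values).
local = {"kdelibs": "/srv/mirror/kdelibs.git"}
```

   With these values, `resolve_url("kdelibs", tracked, local)` picks the
   on-site mirror, and removing the local entry falls back to the default.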

So all of this just ...
From: Andy Parkins
Date: Tuesday, April 10, 2007 - 12:04 pm

Would it be nicer if .gitmodules were line-based to aid in merging?


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 12:20 pm

I seriously doubt you'll ever be merging or changing this a lot. So I 
don't think it's a huge concern.

		Linus
-

From: Junio C Hamano
Date: Tuesday, April 10, 2007 - 1:19 pm

I think Andy's comment comes from our earlier discussion on the
other in-tree configuration, .gitattributes file.

We were talking about using in-tree .gitattributes for deciding
if we apply crlf to each paths and other things like which 3-way
file-level merge backend to apply, and need to make the system
gracefully degrade even when in-tree .gitattributes have
conflict markers during a merge.  And for that purpose, it is
certainly easier to arrange "pick each line, while ignoring <<<
or === or >>>, and if there are conflicting duplicates do
something sensible about them", if the file is line oriented.

But I do not think the .gitmodules thing needs that.  If we have
conflicting (or non-conflicting for that matter) submodule
moves, that's a _MAJOR_ project re-organization, and I do not
think we would even want to automatically descend into
submodules for merging or checking-out when we have such a
situation in the higher level project.


-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 1:33 pm

100% agreed. 

Also, note that while the ".gitmodules" (or whatever) file will be 
required to do things like "git pull", the basic tree-level logic that I 
sent out obviously doesn't need/use .gitmodules at all.

So there's a very real issue where a repository with submodules still 
"works", even with a .gitmodules file that is totally scrogged and doesn't 
have the right information (yet), it's just that it may simply not be able 
to do all the operations because it cannot figure out where to pull 
missing subproject data from etc..

So there is no reason to believe that we need to magically and 
automatically resolve conflicts - if conflicts happen, functionality is 
reduced, but it's not reduced so much that you cannot use the tree and try 
to resolve them (which is important, btw, since often before you commit 
your fix for the conflicts you'd want to *test* that fix, so we definitely 
don't want these kinds of files to be so central that it gets hard to get 
normal work done without them).

It really boils down to the same design issue: the way I think submodules 
should work is that they are very loosely coupled with the supermodule. 
The fact that the ".gitmodules" file isn't *that* critical comes largely 
from that loose coupling.

		Linus
-

From: Sam Vilain
Date: Wednesday, April 11, 2007 - 5:12 pm

Whoa... "missing" subproject data?

Surely, unless you're doing lightweight/shallow clones, if you have a
gitlink you've also got the dependent repository? Otherwise the
reachability rule will be broken.

Sam.
-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 5:35 pm

hoi :)


With submodules you actually have a natural cutting point where
you can say: no, I don't want to get that.
So for submodules the reachability rule is a little bit more relaxed.

And when you fetch the superproject you now need some way to fetch
the new submodule objects.  They may be in the same upstream repository
but it may make sense to have this configurable.

-- 
Martin Waitz
From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 7:01 pm

[ Dang. Power failure in the middle of writing emails. Can't remember 
  which one was lost. Am rewriting some of this reply in abbreviated form.  ]


Absolutely. Not just subproject data. The whole subproject is often 
missing.

If I fetch the KDE superproject, I generally do *not* want every single 
subproject. In fact, I'd likely just want one or two subprojects.

The notion that all subprojects are populated is a *bug*. I would 
personally refuse to use such a setup. Even CVS can handle that just fine, 
we certainly don't want to be worse than CVS here.

If you just track a project, it's quite common to only check out the "src" 
module, and *not* fetch things like the "validation" or "test" module if 
you're just following along. 

Or you might fetch the "kdebase" module, but that sure doesn't mean that 
you want all the other ones (kdevelop source code? full kdelibs sources? 

The reachability rule *must* be breakable. That's why fsck currently 
doesn't care AT ALL.
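
A toy version of a connectivity check that respects this might look like
the Python sketch below. It is not fsck: `check_tree` and the in-memory
`trees`/`blobs` stores are hypothetical, and the only point is that an
entry with the gitlink mode is treated as a leaf and never followed.

```python
GITLINK_MODE = 0o160000
TREE_MODE = 0o040000


def check_tree(trees, blobs, tree_id):
    """Return a list of problems found below `tree_id`.

    `trees` maps a tree id to a list of (mode, name, id) entries and
    `blobs` is the set of blob ids that are present."""
    if tree_id not in trees:
        return ["missing tree %s" % tree_id]
    problems = []
    for mode, name, oid in trees[tree_id]:
        if mode == GITLINK_MODE:
            continue  # subproject commit: allowed to be absent
        if mode == TREE_MODE:
            problems.extend(check_tree(trees, blobs, oid))
        elif oid not in blobs:
            problems.append("missing blob %s (%s)" % (oid, name))
    return problems


trees = {
    "root": [(0o100644, "Makefile", "b1"),
             (GITLINK_MODE, "kdelibs", "not-fetched"),
             (TREE_MODE, "src", "t2")],
    "t2": [(0o100644, "main.c", "b2")],
}
```

With both blobs present the superproject checks out clean even though the
"kdelibs" commit was never fetched.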

It's much better to break that rule than to even check it! I'd rather 
leave fsck like it is now, than to *ever* fix it, if the "fix" involves 
"you have to always fetch all submodules to shut fsck up".

		Linus
-

From: Sam Vilain
Date: Wednesday, April 11, 2007 - 8:56 pm

Ok, but couldn't this be considered a variation of a lightweight checkout?

The only reason I'm worried about this is the case where the
superproject contains *thousands* of subprojects. Eg, a superproject for
all repo.or.cz projects. Say in a day 200 projects get updated with a
few commits - do you have to do 200 pulls or just one? But maybe that
problem can be solved in another way, or maybe it won't really hurt so
much in practice and still be faster/more efficient than rsync mirroring.

This is especially the case in concert with gittorrent, which will need
modifications to support sharing multiple repositories (not that that's

Well fsck can be fixed easily enough to not descend, like lightweight
checkouts.

What I really want to avoid is the situation where you can't checkout,
even though you didn't indicate a shallow/lightweight clone.

What else might this decision impact? Obviously with a smaller base you
have fewer delta targets, though that's probably not a real issue.

Sam.
-

From: Junio C Hamano
Date: Tuesday, April 10, 2007 - 1:06 pm

I personally feel that if there are cases that merge conflict is
hard to resolve, there is something wrong in the communication
between project members.  In other words, merging this *should*
be hard.

Really, if somebody wants to have project X at directory sub/X/
and somebody else wants the same at directory X/, merging the
modules file would be the least of your concern -- resulting
toplevel would not build correctly until you decide which tree
hierarchy should be picked, and later exchange of results among
project members would not be usable easily to half the people
who picked the hierarchy differently from you did.


-

From: David Lang
Date: Tuesday, April 10, 2007 - 12:41 pm

this is very similar to the problem I asked about with merging config files a 
couple weeks ago. the answer then was that when we get .gitattributes we should 
be able to specify content specific merge programs that could deal with this 
sort of thing on a per-file basis. That sounds like the answer to your concern 
as well, rather than making things order-dependent and otherwise harder to read 
to make it able to be merged with the current tools (which assume line-based 
order-dependent content)

David Lang
-

From: Josef Weidendorfer
Date: Tuesday, April 10, 2007 - 12:29 pm

So when moving the kdelibs submodule around, you would
have to update the .gitmodules file.

I like it.

Josef
-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 12:45 pm

Right. The assumption here is:
 - submodules almost never actually change. You might add a new one 
   occasionally, and once a decade you might do some bigger 
   re-organization, but in general it's pretty much static.
 - when you do move submodules around, it's probably a big flag-day anyway 
   (ie I would expect that it's a big reorg, and that you'd quite likely 
   expect developers to have to re-check out their tree if you did major 
   surgery).

That's certainly how it works under CVS. I bet we can make it much nicer 
than CVS, but the point is, people really don't expect submodules to be 
something that you move around very dynamically. You want to be *able* to 

The advantage with splitting things out like this is that it allows you 
much more flexibility than something automatic and deeply integrated does. 

You can still edit the modules setup even if you yourself might not even 
have that particular module checked out! That may sound insane, but it's 
actually *required* for things like "oh, the standard server for that 
module went away, I need to edit the module settings to get it from xyz 
instead".

		Linus
-

From: Sam Vilain
Date: Wednesday, April 11, 2007 - 4:47 pm

Also, in the Perl 5 Perforce conversion there are a number of
"submodules" (ie, bundled modules with their own history) that move
around a lot. In some tree representations used during the conversion
process they might even appear twice in a given tree with differing
versions.

Sam.
-

From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 5:13 pm

That should actually be something that is fairly natural to handle with 
the current git submodule design - there's absolutely no problem with 
having the same subproject showing up in multiple different places in the 
tree (and each place obviously will have its own commit).

However, it causes some questions at two points:

 - What do you do in the ".gitmodules" file, where you describe the 
   submodule setup?

   This is not so much a _problem_ as a "how do you want to handle it" 
   issue.

   Would people want such a module to show up as "one module" that is just 
   visible in the tree in multiple places? Or do people prefer to think of 
   it as completely separate modules that just happen to have the same 
   base repository?

   I don't think it's clear that one or the other is the "right way" to 
   see things, and I don't think git really should care. I suspect it's 
   more likely to be a detail that some importer script just has to 
   resolve one way or the other.

   The core git infrastructure needs to be able to have one module show up 
   in multiple places over time anyway, so I don't think there is any real 
   reason not to allow the same module to show up in multiple places even 
   within one single commit.. (Ie it's really mostly about the .gitmodules 
   file *syntax* - but if we use the config file syntax, it's actually 
   very natural to allow multiple entries for the module directory name)

   At the same time, there are reasons why you might want to consider them 
   separate modules too - maybe you want to *describe* them separately, and 
   maybe one of the copies is used for "legacy support", and you might be 
   in a situation where you want to check out only one of the copies and 
   not the other (and thus describing them as two *different* modules 
   rather than two versions of the *same* module actually makes sense!).

   So I think this is something where we are technically neutral, but 
   where we may have non-technical ...
From: Torgil Svensson
Date: Wednesday, April 11, 2007 - 5:42 pm

I guess this file could also cover the case where the superproject is
only interested in a small subset of the subproject. For example if I
only use some header-files in a library and want
"/lib1/src/interface" in the subproject end up as "/includes/lib1" in
the superproject. Could single files be handled in a similar way?

Although this is just an example, external links shouldn't be
specified in the same configuration file as project internal things
(which should be version-controlled). If the url configuration gets
overwritten with checkouts there will be problems bisecting if the url
changes over time.
-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 5:56 pm

hoi :)


Conceptually this information would have to be part of the
supermodule tree (after all it changes how your tree is set up).

I think it makes more sense to make users think about which part
of their tree can be reused and make them choose submodule boundaries

Most of the time we may not need to add any per-submodule URL
information anyway.  If you fetch a new supermodule version, you
can get the new submodule from the same source (or from a per-submodule
source which can be determined by looking at and munching the supermodule URL).

-- 
Martin Waitz
From: Torgil Svensson
Date: Thursday, April 12, 2007 - 2:23 pm

I agree. This could be included in the module config file which in

Sometimes you can't control upstream projects the way you want it.
Also, splitting up projects for the potential need of future
superprojects has several obvious disadvantages (multiple changelogs,
versions etc). I don't see the subfolder checkout thing as a problem
since the core plumbing in Linus's implementation doesn't care what's
beneath the commit link. The subfolder checkout can "easily" be done
in a porcelain.

It's more problematic if you want to cherry-pick individual files in a
subproject. Here, I think the tight connection between links and
directories to be too restrictive. Why does a subproject commit-link
have to be represented as a folder?

//Torgil
-

From: Sam Vilain
Date: Wednesday, April 11, 2007 - 4:36 pm

I mentioned this briefly on another strand of this thread, but I think
that the simplest way to do this would be to just make refs/subproject/*
populate itself sensibly when you commit in the superproject.

I mentioned refs/subprojects/path/branch before, but I think it would
probably be the sort of thing that should be in the .git/config

Sam.
-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 1:06 am

hoi :)

thanks Linus for your nice implementation.  Your core code is so much
nicer than my hacked-up prototype :-).

I only had little time to actually have a look at it but the core is
very similar to my approach and I'll try to rebase some of my code on
top of yours in the following days.

The only thing I disagree with you on is using HEAD of the submodule:


Always using HEAD of the submodule makes branches in the submodule
useless.

Whenever you do a checkout in the supermodule you also have to update
the submodule and this update has to change the same thing which is read
above.
Updating the branch which HEAD points to is dangerous.  You could
overwrite some unrelated branch just because the user forgot to switch
back to his supermodule-tracking-branch.  The user would always have to
make sure that all the submodules are in the correct state for an update
of the supermodule.
Updating HEAD directly is possible now and may make some sense, but you
still get problems when you want to switch to some temporary branch in
the submodule.  You have no chance to get back to the original supermodule
version and now your temporary submodule branch gets shown as the new
submodule version which should be part of the supermodule.
The submodule version which is stored in the supermodules tree is kind
of a hidden/remote reference/branch.  When working on a remote branch
we first create a local working branch and then sync it with the remote
one.  I think that it makes sense to use the same model for submodules:
have one local branch in the submodule which is used for all work that
is done in the supermodule context.

So my advice is:
Always read and write one dedicated branch (hardcoded "master" or
configurable) when the supermodule wants to access a submodule.

Then you have two types of branches:
You can branch the supermodule and have your own branch of the entire
project with all submodules.  Use this if you want to commit your
work on the submodule into the ...
From: Alex Riesen
Date: Wednesday, April 11, 2007 - 1:29 am

In this case it does not correspond to the working tree anymore.
HEAD is the "closest" to working tree of submodule.
-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 1:36 am

hoi :)


yes.

This has been discussed at length already.
Please have a look at the archives.

Your working tree now contains a complete git repository which has
features which are not available for normal files.  Notably, you
have the possibility to create branches in the submodule.
If you insist on using HEAD you throw away those submodule capabilities.

-- 
Martin Waitz
From: Alex Riesen
Date: Wednesday, April 11, 2007 - 1:49 am

I should. But at least a short summary of the reasons

In this (a very special, I believe) case, why not use git update-index
--cacheinfo?
-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 2:20 am

hoi :)


Not necessarily, yes.

Branches in the submodule make no sense unless they are independent
from supermodule branches.  And then changing to another branch in
the submodule automatically means that your current submodule working
directory should be independent to the supermodule.

git-status in the supermodule should of course warn when a submodule
is on a different branch, so that you don't accidentally lose submodule

I think we misunderstood each other.
For me branching is not a special case.

-- 
Martin Waitz
From: Junio C Hamano
Date: Wednesday, April 11, 2007 - 2:15 am

Why?  If you are working in the parent module (e.g integration)
and notice breakage due to a bug in a submodule, it is very
plausible that you would want to cd into the directory you have
the submodule checked out, which has its own .git/ as its
repository, and perform a fix-up there, with the goal of coming
up with a commit usable by the parent project pointed at by the
HEAD of the submodule repository.  And while working toward that
goal, you will use branches, rebase, rewind or use StGIT there
in that submodule repository.  It does not forbid you from using
any of these things -- as long as you end up with a good commit
at HEAD that the supermodule can use.

Once you come up with a suitable commit sitting at HEAD of the
submodule repository, you cd up to the parent module.  Top-level
git-diff would notice that the commit recorded at the submodule
path has been updated (because you now have a good commit at
HEAD of the submodule repository, while earlier the one in your
index was a dud).

So it is not clear to me what your argument about throwing away
capabilities is.


-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 3:03 am

hoi :)


that's perfectly fine.
I only require one more thing: make sure that your commit is on
one dedicated branch (simply by merging your working/rebased/whatever
branch into the dedicated one) and not on some random one.

Again: for your above example this is not necessary and using HEAD
would indeed be perfectly fine.

But you also have to update the submodule when you do a checkout in
the supermodule.  So what do you update?  Updating 'HEAD' is not
very concrete, please have a look at my initial mail to Linus.

What is stored in the supermodule?  It stores a reference to a specific
point in the history of the submodule.  As such I am convinced that
the right counterpart inside the submodule is a refs/heads/whatever,
and not the branch selector HEAD.
You can have other branches next to the one which is tracked by the
supermodule.  If you always update HEAD you don't have a clear

If the supermodule just updates some random submodule branch I happen to
use at the time of a supermodule pull then submodule branches are
of much lower value.
Suddenly you have to make sure for yourself that the correct branch
gets updated.
For me, different branches should be independent and I want git to
always update the correct one.

-- 
Martin Waitz
From: Junio C Hamano
Date: Wednesday, April 11, 2007 - 1:01 pm

Because 'submodule' is a project on its own, it can make
progress while the parent project is still using the stable
commit.  Think of this:

 - Your application uses product of another project as a
   library (e.g. you are doing video application and embedding
   ffmpeg).
   
 - Your 'master' commit records a commit in the library
   subproject.  Maybe library subproject declared stable 1.0 and
   that is what you used to integrate.

 - But being an independent project on its own, the library
   project can make progress, outside the context of this
   aggregated work (i.e. your application).  Next time you do:

	$ cd ffmpeg ; git fetch

   there may not be any branch that points at the exact "stable 1.0"
   commit.

When you do a "checkout -f --recurse-into-subprojects" from the
toplevel, I suspect that you would need to detach HEAD in the
subproject repository grafted in your application tree to move
it to the exact commit the toplevel project (i.e. your
application) wants, and match the working tree to that commit.
The toplevel simply should _not_ have to care what branch that
commit comes from.

-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 3:19 pm

hoi :)


yes.

But why does everybody want to detach the submodule HEAD, instead
of creating one 'special' branch which holds the commit which is
used by the supermodule?

If you then want to switch to another submodule branch you lose
the reference that comes from the supermodule.

I want to create the extra branch exactly _because_ there is
independent work going on in the submodule (or the project it is
based on).  As you can switch between detached HEAD and an
independent branch you can also switch between the 'supermodule branch'
and independent branches -- only that you can easily switch back
if you have a branch of your own.

BTW: I also think that your --recurse-into-subprojects should
be implied.
If you check out one index entry, you should be able to read it
back afterwards.  That is a nice property everyone expects from
normal files and we should try to keep that for submodules.
When checkout_entry wants to touch a submodule we can simply rewrite
the 'supermodule branch' in the submodule.  If HEAD happens to point
to it we also read-tree the submodule.
This is easy to understand and implement and I have some good experience
with this model.
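
The model described above (rewrite a dedicated branch, and refresh the working tree only when HEAD points at it) can be sketched in shell against a present-day git. This is a rough illustration under stated assumptions, not the prototype's code: the ref name refs/heads/from-supermodule and the helper move_supermodule_branch are hypothetical, and "git reset --hard" stands in for the read-tree step, so local changes would be discarded.

```shell
#!/bin/sh
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

REF=refs/heads/from-supermodule   # hypothetical name for the dedicated branch

# move_supermodule_branch SUBDIR COMMIT (hypothetical helper)
move_supermodule_branch() {
	git -C "$1" update-ref "$REF" "$2" || return 1
	# Refresh the working tree only if the dedicated branch is checked out.
	if [ "$(git -C "$1" symbolic-ref -q HEAD)" = "$REF" ]; then
		git -C "$1" reset -q --hard   # simplification: drops local changes
	fi
}

# Demo: a submodule with two commits, c1 and c2.
sub=$(mktemp -d)
git init -q "$sub"
echo v1 > "$sub/file"
git -C "$sub" add file
git -C "$sub" commit -q -m v1
c1=$(git -C "$sub" rev-parse HEAD)
echo v2-longer > "$sub/file"
git -C "$sub" commit -q -a -m v2
c2=$(git -C "$sub" rev-parse HEAD)

# HEAD is on the initial branch, so only the dedicated ref moves.
move_supermodule_branch "$sub" "$c1"
head_on_other_branch=$(git -C "$sub" rev-parse HEAD)

# With the dedicated branch checked out, the working tree follows it.
git -C "$sub" checkout -q from-supermodule
move_supermodule_branch "$sub" "$c2"
content_after_move=$(cat "$sub/file")
echo "file now contains: $content_after_move"
```

While some other branch is checked out, the helper leaves HEAD and the working tree alone; once the dedicated branch is checked out, moving it also updates the files.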

-- 
Martin Waitz
From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 3:36 pm

I don't think "everybody" wants it.

But the point is, *regardless* of whether you want a "detached HEAD" or 
you want a "'special' branch", you should always use HEAD to look up the 
commit, and using HEAD *allows* both (ie just make HEAD a symref to the 
'special' branch if you want that behaviour).

And if you *do* use a special branch, HEAD *must* match that special 
branch anyway, since when you commit in the supermodule, the only 

And that is entirely appropriate.

But that still means that HEAD must point to that branch (when in the 
submodule), since that branch must be the one that is checked out. If it 
isn't the branch that is checked out, normal operations like "git diff" 
etc wouldn't make sense from the supermodule.

And that is why *regardless* of whether you use a special branch or not, 
HEAD is the right thing to look up.
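
A minimal sketch of that argument, assuming a present-day git: the same HEAD lookup yields the commit whether HEAD is a symref to a special branch or is detached. The branch name "special" and the temporary repository are made up for illustration.

```shell
#!/bin/sh
# Throwaway identity so the demo commit works anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sub=$(mktemp -d)
git init -q "$sub"
git -C "$sub" commit -q --allow-empty -m "stable 1.0"
commit=$(git -C "$sub" rev-parse HEAD)

# Policy 1: a 'special' branch (hypothetical name), with HEAD a symref to it.
git -C "$sub" update-ref refs/heads/special "$commit"
git -C "$sub" symbolic-ref HEAD refs/heads/special
via_branch=$(git -C "$sub" rev-parse HEAD)

# Policy 2: a detached HEAD that names the commit directly.
git -C "$sub" checkout -q --detach "$commit"
via_detached=$(git -C "$sub" rev-parse HEAD)

# Either way, the supermodule can ask the same question of the submodule.
echo "special branch: $via_branch"
echo "detached HEAD:  $via_detached"
```

Both lookups print the same object name, which is why the choice between the two policies does not have to be visible to the code that reads the submodule's state.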

		Linus
-

From: Andy Parkins
Date: Wednesday, April 11, 2007 - 2:47 am

I know we've had this discussion before, but I'm going to bring it up again - 
mainly because Linus's implementation exactly matches what I envisaged when 
we originally spoke of this.  I think in your "Updating the branch which HEAD 
points to is dangerous" section, the main thing you're not taking into 
account is that git can make detached checkouts.  Updating HEAD is not 
dangerous - updating refs is; and I don't think anyone is proposing that a 
submodule ref should ever be updated by a supermodule.

I think you're also too strongly focussed on the idea that the supermodule 
tracks submodule branches - it cannot: branches are not part of "the" 
repository, they point at "a" repository.  References are outside the 
repository pointing in, and hence the supermodule cannot refer to them at its 
core.

Now, if you check out a revision in the supermodule, that's going to look up 
the submodule revision stored in the DIRLINK tree entry which will recurse 
into the submodule and checkout that revision - almost certainly as a 
detached HEAD.  There are three possibilities then:
 - The submodule revision is in the past and no submodule branch points at it
 - The submodule revision is current and a submodule branch points at it
 - The submodule revision is current and multiple submodule branches point at 
   it
The supermodule checkout will have to make a decision whether to update the 
submodule HEAD (in one case it's obvious: a revision in the past has to be 
detached HEAD as there is no suitable branch).  It's also possible that the 
single submodule branch case is easy - undetach HEAD; however I don't think 
that is universally correct.

I know you're very much in favour of making branches in the submodule 
correspond to branches in the supermodule, but I just don't see a way of 
making it work - the supermodule cannot know about submodule branches, 
branches are not part of the repository, they just point at the repository.  
My branches could be different from your ...
From: Martin Waitz
Date: Wednesday, April 11, 2007 - 4:31 am

hoi :)


Then we already agree on the most important part.
My argument is mostly against updating the ref which is behind HEAD, not
HEAD per se.  And I haven't thought about using detached HEADs until I

No, that may be a misunderstanding because my very first prototype
really did track branches.  In the meantime I changed my mind, my
current prototypes all track submodule commits directly.
But in doing so we create a branch of its own: remember, a branch in
git is just a moving reference into the history.  Such a reference
can be stored in .git/refs/heads or it can be stored in the index/tree of
the supermodule.  The difference is not really big.

I don't like to guess which branches to update.

That would not work, you are right.


Again, doing things conditionally here just adds to confusion.
Just have one dedicated branch and be done with it.

-- 
Martin Waitz
From: Linus Torvalds
Date: Wednesday, April 11, 2007 - 8:16 am

Well, I don't actually see much choice. HEAD is just shorthand for 

No. 

Branches in submodules actually in many ways are *more* important than 
branches in supermodules - it's just that with the CVS mentality, you 
would never actually see that, because CVS obviously doesn't really 
support such a notion.

So I'd argue that branches in submodules give you:

 - you can develop the submodule *independently* of the supermodule, but 
   still be able to easily merge back and forth.

   Quite often, the submodule would be developed entirely _outside_ of the 
   supermodule, and the "branch" that gets the most development would thus
   actually be the "vendor branch", entirely outside the supermodule. Call 
   that the "main" branch or whatever, inside the supermodule it would 
   often be something like the remote "remotes/origin/master" branch.

   So inside the supermodule, the HEAD would generally point to something 
   that is *not* necessarily the "main development" branch, because the 
   supermodule maintainer would quite logically and often have his own 
   modifications to the original project on that branch. It might be a 
   detached branch, or just a local branch inside the submodule.

 - branches inside submodules are *also* very useful even inside the 
   supermodule, ie they again allow topic work to be fetched into the
   submodule *without* having to actually be part of the supermodule,
   or as a way to track a certain experimental branch of the supermodule.

   I suspect that most supermodule usage is as an "integrator" branch, 
   which means that the supermodule tends to follow the "main 
   development", and the whole point of the supermodule is largely to have 
   a collection of "stable things that work together". 

   In contrast, branches within submodules are useful for doing all the 
   development that is *not* yet ready to be committed to the supermodule, 
   exactly because it's not yet been tested in the full "make World" kind 

I ...
From: Sam Vilain
Date: Wednesday, April 11, 2007 - 3:49 pm

To discuss this detail, what about keeping refs, such as
refs/submodules/branch/path/* (or some other convention) which are
updated on commit? Then you can also easily clone just the submodule.

Sam.
-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 4:54 pm

hoi :)


I fully agree with you about the importance of submodule branches.
In fact, I want to make them even more important and useable!



I fully agree.

If you use a detached HEAD then you can no longer switch back to it
once you used some other (independent) branch (for testing or whatever).
This is my main argument: If you just update some 'special'
refs/heads/from-supermodule (or whatever, maybe get it from
.gitmodules/config) you can still switch between branches, making them
more useful IMHO.

If we create some other way to easily get to the commit referenced by
the index of the supermodule then a detached HEAD is ok for me, too.
But why create two things (this not-yet-existing way to get the
supermodule index entry, plus submodules HEAD) for the same thing?
Why not simply create a new refs/heads/whatever?

Fully agree.

Please don't confuse my "I always want to use one dedicated branch" with
"I always want to use one special branch from the submodule project".
This refs/heads/whatever I am talking about is _purely_ for ease of
use of the submodule inside the supermodule.  It is in no way linked
to the branchnames that are used by the submodule project.


So you now have this nice "my-integration" branch lying next to other
independent (not-supermodule-related) branches.
If you want to _switch_ to one of these unrelated branches you obviously
have to change HEAD, and suddenly your unrelated branches are
considered to be part of the supermodule (ok, not yet part of its
index of course, but now all supermodule operations would work on
this unrelated branch).

I want to preserve these unrelated branches and see them as a strong
feature.  Branches in submodules should be independent from the
supermodule _because_ the supermodule has no notion of which branch

Only that you lose your nice detached HEAD view once you start using

In terms of flexibility it is important what you can do with the
submodule.  Being able to use branches just like in a ...
From: Brian Gernhardt
Date: Wednesday, April 11, 2007 - 6:57 pm

Why can't we extend checkout with an option to look for an  
enclosing git project, find the gitlink in the index, and check out  
that commit?  That allows you to return to the original state without  
needing to bother with new special branches.
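
A rough sketch of that lookup, assuming a present-day git in which "git add" on a nested repository records a mode-160000 gitlink entry. The directory names are made up, and the enclosing project is passed explicitly here rather than discovered by searching upward.

```shell
#!/bin/sh
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
git init -q "$top"
git init -q "$top/sub"
git -C "$top/sub" commit -q --allow-empty -m "sub: first"
recorded=$(git -C "$top/sub" rev-parse HEAD)
git -C "$top" add sub 2>/dev/null   # records the gitlink in the enclosing index

# The subproject moves on independently.
git -C "$top/sub" commit -q --allow-empty -m "sub: independent work"

# Find the gitlink for "sub" in the enclosing project's index ...
wanted=$(git -C "$top" ls-files --stage -- sub |
	awk '$1 == "160000" { print $2 }')

# ... and return the subproject to that commit, as a detached HEAD.
git -C "$top/sub" checkout -q --detach "$wanted"
restored=$(git -C "$top/sub" rev-parse HEAD)
echo "restored subproject to $restored"
```

No extra branch is needed: the index entry of the enclosing project is the only record consulted.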

And instead of recording the path in a .gitmodules file, why not a  
list of git directories we search for the commit?  Allows moving of  
subprojects without suddenly breaking configuration files.  When we  
find the appropriate git dir, we can use a .gitlink file or symlinks  
to attach the directory to its repository.

I dislike moving git in the direction of enforcing more policy  
instead of less, and of making it less capable of handling content  
movement instead of more.

~~ Brian
-

From: Josef Weidendorfer
Date: Thursday, April 12, 2007 - 8:12 am

The supermodule checkout could create a .git/SUPER_HEAD for this.
OK, that is a special kind of reference.

Or introduce "git --super ..." which works with the superproject.
From a submodule directory, a "git --super checkout ." could reset the
submodule checkout. 

Josef
-

From: Linus Torvalds
Date: Monday, April 9, 2007 - 9:46 pm

Here is, for your enjoyment, the last patch I used to actually test this 
all. I do *not* submit it as a patch for actual inclusion - the other 
patches in the series are, I think, ready to actually be merged. This one 
is not.

It's broken for a few reasons:

 - it allows you to do "git add subproject" to add the subproject to the 
   index (and then use "git commit" to commit it), but even something as 
   simple as "git commit -a" doesn't work right, because the sequence that 
   "git commit -a" uses to update the index doesn't work with the current 
   state of the plumbing (ie the

	git-diff-files --name-only -z |
		git-update-index --remove -z --stdin

   thing doesn't work right).

 - even for "git add", the logic isn't really right. It should take the 
   old index state into account to decide if it wants to add it as a 
   subproject. 

so this patch really isn't very good, but it allows people who are 
interested to perhaps actually test something. For example, my test repo 
was actually created with this:

	[torvalds@woody superproject]$ git log --raw
	commit 649ad968bdd79cb3b0f50feb819b7e9b134d3a1a
	Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
	Date:   Mon Apr 9 21:36:53 2007 -0700
	
	    This commits the modification to sub-project B
	
	:160000 160000 5813084832d3c680a3436b0253639c94ed55445d 17d246a35f27a46762328281eb6e9d4558f91e9d M      sub-B

	commit f3c55ffcc000a8c0fecc6801e8909d084e3d419e
	Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
	Date:   Mon Apr 9 16:12:29 2007 -0700
	
	    Superproject with two subprojects
	
	:000000 160000 0000000... c0daf4c85d48879ab450a6a887bbb241eb0de00a A    sub-A
	:000000 160000 0000000... 5813084832d3c680a3436b0253639c94ed55445d A    sub-B

	commit 45eb14edb43b10e3d3ac7a495a1ec861e85dc36f
	Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
	Date:   Mon Apr 9 15:36:24 2007 -0700
	
	    Add top-level Makefile for super-project
	
	:000000 100644 0000000... 57e8394... A  ...
From: Alex Riesen
Date: Tuesday, April 10, 2007 - 6:04 am

The other thing which will be missed a lot (I miss it that much)
is a subproject-recursive git-commit and git-status.
It is very possible that the default should be different for
the git-commit and git-status: git-commit is likely to have it
off whereas git-status will very much depend on how fast
the usual response is (or wished for). An integrator on very fast
machine may like it on for both, a subproject developer can have
it off for both (to avoid accidental commits and generally being
not interested in anything besides his code), an occasional person
can have the status defaulting to on and commit to off - to avoid
accidental commits in subprojects which are just tracked.

A separate config option and a command-line switch, probably.
-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 8:13 am

Note that I was definitely planning on adding them too, but they are at a 
higher level. 

So the long-term plan is/was to add a flag to "git diff" (and "git 
ls-tree" etc) to say "recurse into subprojects".

You could perhaps even make that flag the default with some .git/config 
option, if your superproject is small enough.

But this series of 6 (and the seventh ugly hack) is literally meant for 
just the really core object-handling stuff, and even there it's not really 
complete.

For example, you cannot even clone a superproject yet, simply because 
git-upload-pack doesn't know that it's not supposed to follow the gitlink 
things etc. So there's a lot of details left even for the really *core* 
stuff, but I wanted to post the series of six patches because those six 
patches are actually enough to reach the point where you can start looking 
at individual problems (like "git upload-pack") and fix them 
incrementally.

So I'd like this to be merged somewhere, not because "it works" or "it's 
complete", but because it's in a shape where I think a lot of people can 
start fixing small details. 

For example, with just two smallish updates:
 - teach "git upload-pack" not to try to follow gitlinks
 - teach "git read-tree" to check out a git-link as just an empty 
   subdirectory
you should already be pretty close to being able to clone a superproject. 
You'd still have to clone the subprojects one-by-one manually, and that 
would be more of a porcelain'ish issue to teach git clone to fetch 
submodules too (with some ".gitmodules" file that contains the rules for 
that!)

But no, I didn't do any of that. I literally did just the "tree object 
format change" to support the *notion* of gitlinks - not all the pieces to 
then actually *implement* the notion are done by a long shot.

I think everybody agrees that we need some kind of subproject support, and 
the KDE repository certainly shows that subprojects need to be truly 
independent (because if they aren't, you end ...
From: Alex Riesen
Date: Tuesday, April 10, 2007 - 8:48 am

It is already "merged somewhere": as soon as the patches landed
on vger, it is not possible to lose (or even destroy) them.

which also should fix switching between the branches with subprojects.
-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 9:07 am

Well, unless it hits something like Junio's 'pu' (or 'next') branch, or 
somebody (like you?) ends up maintaining a repo with this, it's just 
unnecessarily hard to have lots of people working together on it..

I'm obviously interested in working on it, but at the same time, I don't 
expect to be a primary *user* of it, so I'm hoping others will come in and 
start looking at it.

It looks promising that you're getting involved, but I suspect you may be 
a bit too optimistic when you say "just too much sought after". We've been 
*talking* about subprojects for a long long time, and we've had other 

Yes. It would require either git-read-tree or the git-checkout script 
around it knowing to then also check out the subproject branches.

It's actually not *entirely* obvious what you should do when you switch 
branches (or even just do a "git reset --hard") in the superproject. The 
branches in the subprojects are likely to be totally different from the 
superproject, so as far as I can see, you end up having a few choices when 
you reset a subproject:

 - either basically create a "disconnected HEAD" in the subproject(s) when 
   you switch them around as a consequence of resetting/switching the 
   branch in the superproject.

 - or you'd stay on the same branch in the subproject, and just reset that 
   branch..

 - or you describe the branch name in the ".gitmodules" file in the
   superproject, and use whatever branch in the submodule that is 
   described in the supermodule that you reset/check-out.

 - or possibly other policies.

So there is bound to be various "policy" issues like this worth sorting 
out. I don't think they matter that deeply.

I would _personally_ tend to like the notion of using ".gitmodules" in the 
supermodule to describe things like this, exactly because it's a policy 
decision - not something that git itself should really decide about, but 
that the supermodule maintainers can just decide to agree on.

But I haven't really even thought ...
From: Alex Riesen
Date: Tuesday, April 10, 2007 - 9:43 am

The people who need the feature are still using other VCS.
Some do not even know about git, the others are more interested
in their own projects than in hacking on git (like KDE or Ubuntu
people). And then there are commercial projects with third-party
libraries, components or data. The other VCSs provide the feature,
even if they do it wrong and badly (I never could go back in time in my
day-work project, always asked myself what was the point of using
Perforce at all).
So, I suspect it is the people who are unable or unwilling
to contribute to git (to anything, really) who need the feature most.
-

From: Junio C Hamano
Date: Tuesday, April 10, 2007 - 12:32 pm

Well, I was planning to apply this directly on 'master' after
giving them another pass.

-

From: Linus Torvalds
Date: Tuesday, April 10, 2007 - 1:11 pm

Goodie. I gave them another pass myself, and noticed a small leak and a 
stupid copy-paste problem, fixed thus..

		Linus

---
diff --git a/read-cache.c b/read-cache.c
index 8fe94cd..f458f50 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -279,7 +279,7 @@ int base_name_compare(const char *name1, int len1, int mode1,
 	c2 = name2[len];
 	if (!c1 && (S_ISDIR(mode1) || S_ISDIRLNK(mode1)))
 		c1 = '/';
-	if (!c2 && (S_ISDIR(mode2) || S_ISDIRLNK(mode1)))
+	if (!c2 && (S_ISDIR(mode2) || S_ISDIRLNK(mode2)))
 		c2 = '/';
 	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
 }
diff --git a/refs.c b/refs.c
index 229da74..11a67a8 100644
--- a/refs.c
+++ b/refs.c
@@ -229,6 +229,7 @@ static int resolve_gitlink_packed_ref(char *name, int pathlen, const char *refna
 	if (!f)
 		return -1;
 	read_packed_refs(f, &refs);
+	fclose(f);
 	ref = refs.packed;
 	retval = -1;
 	while (ref) {
-

From: Junio C Hamano
Date: Tuesday, April 10, 2007 - 1:52 pm

By the way,...

People occasionally ask "how would I make a small fix to a
commit that is buried in the history", so let me take a moment
to give them a recipe.

Let's say while reviewing the code after applying all of the
6-series, you noticed the above thinko.  First find out which
commit caused it:

$ git checkout lt/gitlink
$ git blame -L229,+7 master.. -- refs.c
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 229) 	if (!f)
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 230) 		re..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 231) 	read_packe..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 232) 	ref = refs..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 233) 	retval = -..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 234) 	while (ref..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 235) 		if..

The commit to fix is b60108a1 (this is what I have in my private
repo, and I'll be rebuilding the series with this example, so
you will never see this commit object name in the end result
I'll be pushing out).  So I detach the HEAD at that commit and
make a fix:

$ git checkout b60108a1
$ edit refs.c
$ git diff; # just to make sure
$ git commit -a --amend

At this point, the detached HEAD and the original branch look
like this:

$ git show-branch lt/gitlink HEAD
! [lt/gitlink] Teach core object handling functions about gitlinks
 * [HEAD] Add 'resolve_gitlink_ref()' helper function
--
 * [HEAD] Add 'resolve_gitlink_ref()' helper function
+  [lt/gitlink] Teach core object handling functions about gitlinks
+  [lt/gitlink^] Teach "fsck" not to follow subproject links
+  [lt/gitlink~2] Add "S_IFDIRLNK" file mode infrastructure for git links
+  [lt/gitlink~3] Add 'resolve_gitlink_ref()' helper function
+* [HEAD^] Avoid overflowing name buffer in deep directory structures

We fixed lt/gitlink~3 and the fixed-up commit is at HEAD.  We
want to rebase the rest of lt/gitlink on top of HEAD, like this:

$ git rebase HEAD lt/gitlink

This will ...
From: Nicolas Pitre
Date: Tuesday, April 10, 2007 - 2:03 pm

This is definitely good Documentation/howto/ material.


Nicolas
-

From: J. Bruce Fields
Date: Sunday, April 15, 2007 - 4:21 pm

There's actually something similar already in "modifying a single
commit" in the "user manual":

http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#id276844

But it uses a throw-away branch instead of the detached head, and uses
rebase --onto instead of rebasing and then --skip'ing.

--b.
-

From: Sam Ravnborg
Date: Tuesday, April 10, 2007 - 2:02 pm

That recipe looks ummm complicated...
What I usually do is:

git format-patch HEAD~4..HEAD
git reset --hard HEAD~4
patch -p1 < 0004*
...edit...
delete diff from 0004*
git diff >> 0004*
git reset --hard
git am 000*


Maybe this is as complicated as your example but this
is very simple to deal with.
And I do not destroy history or anything.

But that said I do not use topic branches but simply
clone my local repository as needed.
And I always deal with a linear history.


[I post this mostly to check if this is insane
and I need to understand the way you propose to do stuff]

	Sam
-

From: Junio C Hamano
Date: Tuesday, April 10, 2007 - 2:27 pm

It's really the same.  You keep 000* file, I keep them in the
original branch and have "git rebase" take care of the details.

-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 1:32 am

hoi :)


git-status should really point out if a subproject has any changes,
as it does for files.  Only that a submodule may have more types of
possible changes: has new commits which are not yet in the supermodule
index, has a dirty index of its own, or has a dirty working directory.
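
The three kinds of change listed above can be told apart with plumbing that exists in a present-day git. The following is only a sketch: subproject_changes is a hypothetical helper, not something git-status provides, and the repositories are throwaway ones.

```shell
#!/bin/sh
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# subproject_changes TOP PATH (hypothetical): print one word per kind of change.
subproject_changes() {
	top=$1 path=$2
	recorded=$(git -C "$top" ls-files --stage -- "$path" |
		awk '$1 == "160000" { print $2 }')
	current=$(git -C "$top/$path" rev-parse HEAD)
	# Commits in the submodule that the supermodule index does not record yet.
	[ "$recorded" = "$current" ] || echo new-commits
	# Submodule index differs from the submodule HEAD.
	git -C "$top/$path" diff --cached --quiet || echo dirty-index
	# Submodule working directory differs from the submodule index.
	git -C "$top/$path" diff --quiet || echo dirty-worktree
}

top=$(mktemp -d)
git init -q "$top"
git init -q "$top/sub"
echo one > "$top/sub/file"
git -C "$top/sub" add file
git -C "$top/sub" commit -q -m "sub: add file"
git -C "$top" add sub 2>/dev/null   # records the gitlink in the supermodule index

state_clean=$(subproject_changes "$top" sub)
echo "a longer second line" >> "$top/sub/file"
state_edited=$(subproject_changes "$top" sub)
git -C "$top/sub" add file
state_staged=$(subproject_changes "$top" sub)
git -C "$top/sub" commit -q -m "sub: more work"
state_committed=$(subproject_changes "$top" sub)
echo "edited: $state_edited, staged: $state_staged, committed: $state_committed"
```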

But for commit it really does not make any sense.  The commit in the
submodule is totally independent to the commit in the supermodule.
You'd want the submodule commit message to not refer to any
supermodule stuff (as you likely want to reuse the submodule in other
supermodules), while the supermodule commit is much more high-level and
only records that the submodule got changed.

When viewed from the supermodule, a submodule is just part of its tree,
just as normal files.  So a submodule commit is conceptually similar to
changing a file, and you don't change files while you commit, either ;-).

-- 
Martin Waitz
From: Alex Riesen
Date: Wednesday, April 11, 2007 - 1:42 am

Only if I want it to. HEAD change check (which is cheap enough

Right. Perhaps not a commit in submodule but a recursive check
for working directory changes in submodules. So that you can
make sure that you don't make a superproject commit which cannot
be resolved to what you had in all the working directories:

  git commit -a --check-clean-subprojects
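
What such a check might do can be sketched in shell with a present-day git. The --check-clean-subprojects flag is only a proposal in this thread; check_clean_subprojects below is a hypothetical stand-in that walks the gitlink entries of the index and refuses if any subproject has uncommitted changes.

```shell
#!/bin/sh
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# check_clean_subprojects TOP (hypothetical): fail if any gitlinked
# subproject has uncommitted changes. Paths containing whitespace are
# not handled in this sketch.
check_clean_subprojects() {
	top=$1
	for path in $(git -C "$top" ls-files --stage | grep '^160000 ' | cut -f2); do
		if [ -n "$(git -C "$top/$path" status --porcelain)" ]; then
			echo "dirty subproject: $path" >&2
			return 1
		fi
	done
	return 0
}

top=$(mktemp -d)
git init -q "$top"
git init -q "$top/sub"
echo one > "$top/sub/file"
git -C "$top/sub" add file
git -C "$top/sub" commit -q -m "sub: add file"
git -C "$top" add sub 2>/dev/null   # records the gitlink in the superproject index

check_clean_subprojects "$top" && clean_before=yes || clean_before=no
echo "an uncommitted edit" >> "$top/sub/file"
check_clean_subprojects "$top" 2>/dev/null && clean_after=yes || clean_after=no
echo "clean before edit: $clean_before, clean after edit: $clean_after"
```

A wrapper around "git commit -a" could run such a check first and abort when it fails.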
-

From: Martin Waitz
Date: Wednesday, April 11, 2007 - 1:57 am

hoi :)


Yes, that's the equivalent of checking normal files.

For -a such a check may even make sense unconditionally.
And without -a I don't see any value in such a check.
So we can just add that check to -a if we see that dirty submodules
are a problem for users.

-- 
Martin Waitz