Re: [RFC] Submodules in GIT

To: <git@...>
Date: Tuesday, November 28, 2006 - 6:50 am

> branch to track in the submodule?
To: <git@...>
Date: Tuesday, November 28, 2006 - 9:35 am

The reason I thought it would have to be HEAD at all times, is to prevent 
situations where the supermodule commit doesn't reflect the state of the 
current tree.

Let's imagine that we're doing non-HEAD tracking in the supermodule.
  supermodule
   +-------- libsubmodule1
   +-------- libsubmodule2
So, you do a "make" in supermodule; this of course will call make in each of 
the submodules.  You test the output and find that it's all working nicely.  
Time for a supermodule commit.  We want to freeze this working state.  You 
commit and tag "supermodule-rc1"

Unfortunately, during development, you've switched libsubmodule1 to 
branch "development", but supermodule isn't tracking libsubmodule1/HEAD; it's 
tracking libsubmodule1/master.  Your supermodule commit doesn't capture a 
snapshot of the tree you're using.

Now you say to the mailing list, "hey guys, can you test 'supermodule-rc1'?"  
They check it out, and find that everything is broken.  Why?  Because what 
you wanted to check in was libsubmodule1@development, but what actually went 
in was libsubmodule1@master.

I think I've talked myself into the position where it definitely has to be 
HEAD being tracked in the submodules; anything else is a disaster waiting to 
happen because commit doesn't check in your current tree.
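
A minimal sketch of the scenario above, assuming a toy model in which a submodule is just a map from branch names to commit ids plus the name of the checked-out branch. The class, the policy names and the commit ids are hypothetical and are not part of any git implementation; the point is only to show which commit each tracking policy would record.

```python
from dataclasses import dataclass, field


@dataclass
class Submodule:
    """Toy model: branch name -> commit id, plus the checked-out branch."""
    branches: dict = field(default_factory=dict)
    head: str = "master"  # name of the currently checked-out branch

    def head_commit(self) -> str:
        """Commit id of whatever branch is checked out right now."""
        return self.branches[self.head]


def snapshot(sub: Submodule, policy: str) -> str:
    """Return the commit id a supermodule commit would record."""
    if policy == "HEAD":
        return sub.head_commit()
    if policy == "master":
        return sub.branches["master"]
    raise ValueError(f"unknown policy: {policy}")


# libsubmodule1 was switched to "development" while testing (made-up ids).
lib1 = Submodule(branches={"master": "aaa111", "development": "bbb222"},
                 head="development")

tested = lib1.head_commit()
recorded_by_head = snapshot(lib1, "HEAD")
recorded_by_master = snapshot(lib1, "master")

print("tested in the working tree:", tested)
print("recorded when tracking HEAD:", recorded_by_head)
print("recorded when tracking master:", recorded_by_master)
```

Under these assumptions only the HEAD policy records the commit that was actually built and tested.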



Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@gmail.com
To: Andy Parkins <andyparkins@...>
Cc: <git@...>
Date: Wednesday, November 29, 2006 - 12:03 pm

hoi :)


The way I wanted to address this is to show in the supermodule
git-status that the submodule is using another branch.
That way you are warned and can decide not to commit the supermodule.

I implemented tracking of refs/heads/master (not HEAD) without much
thinking, and only recently began to think about possible problems with
this approach.

But I think it is an important design decision to take, so I'd like to
have consensus here.

Pro HEAD:
 - update-index on submodule really updates the supermodule index with
   a commit that resembles the working directory.
Contra HEAD:
 - HEAD is not guaranteed to be equal to the working directory anyway,
   you may have uncommitted changes.
 - when updating the supermodule, you have to take care that your
   submodules are on the right branch.
   You might for example have some testing-throwaway branch in one
   submodule and don't want to merge it with other changes yet.

Pro refs/heads/master:
 - the supermodule really tracks one defined branch of development.
 - you can easily overwrite one submodule by changing to another branch,
   without fearing that changes in the supermodule change anything
   there.
Contra refs/heads/master:
 - after updating the supermodule, you may not have the correct working
   directory checked out everywhere, because some submodules may be on a
   different branch.
 - there is one branch in the submodule which is special compared to all the others.

I think that most of the disadvantages of refs/heads/master can be
solved by printing the above-mentioned warning in git-status when the
submodule is using another branch (similar to the
planned-but-not-implemented warning if the submodule has uncommitted
changes).

I don't yet know how to cope with tracking HEAD directly, so I'm still
in favor of tracking refs/heads/master, as already implemented.

-- 
Martin Waitz
To: <git@...>
Date: Wednesday, November 29, 2006 - 4:00 pm

The problem I see with tracking a particular branch is that it makes it less 
convenient to use git's quick-branching features in the submodules.  Let's 
say I want to try something out quickly in a submodule, I make a branch, 
commit, commit, "hmm, looks good, let's snapshot it in the supermodule", make 
a supermodule branch, "oh no, I've got to tell the supermodule to track the 
new (but temporary) branch in the submodule, do a commit, switch the submodule 
branch back to master, delete the temporary branch, remember that the 
supermodule is tracking that branch and tell the supermodule to track 

Ouch.  Why does the submodule need to update the supermodule index?  That 
should be done by update-index in the supermodule.   Further, how is the 
supermodule index going to represent working directory changes in the 
submodule?  The only link between the two is a commit hash.  It has to be 
like that otherwise you haven't made a supermodule-submodule, you've just 
made one super-repository.  Also, if you don't store submodule commit hashes, 
then there is no way to guarantee that you're going to be able to get back the 

That's the case for every file in a repository, so isn't really a worry.  It's 
the equivalent of changing a file and not updating the index - who cares?  As 
long as update-index tells you that the submodule is dirty and what to do to 

What is the "right" branch though?  As I said above, if you're tracking one 
branch in the submodule then you've effectively locked that submodule to that 
branch for all supermodule uses.  Or you've made yourself a big rod to beat 
yourself with every time you want to do some development on an "off" branch on 


You can always do that anyway by simply not running update-index for the 

This seems like the biggest problem to me - doesn't this negate all the 
advantages of a submodule system?  After a check in, you have no idea if what 
you checked in was what was in your working tree.


Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AM...
To: Andy Parkins <andyparkins@...>
Cc: <git@...>
Date: Thursday, November 30, 2006 - 1:06 pm

hoi :)


What about:
You decide to try something out quickly and create a new branch in the
submodule. After you have verified that it works, you merge it to the
submodule's master branch and commit that to the supermodule.
Not that complicated, is it?
In fact, my current implementation does not even allow to change the

Please excuse that I am not a native English speaker and I may have

That is exactly what I wanted to say. In the supermodule you call
update-index (with the submodule path as argument) to update the index

This is handled in the next paragraph.
The argument really is: HEAD always points to the checked out branch,

Yes, it's not a real counter-argument, but it qualifies the previous



You always know which branch in the submodule is the "upstream" branch
which is managed by the supermodule.
You can easily have several topic-branches and merge updates from the
master branch.
Otherwise you always have to remember which branch holds your current
contents from the supermodule.

When viewed from the supermodule, you are storing one branch per

Suppose you are working on a complicated feature in one submodule.
You create your own branch for that feature and work on it.
Now you want to update your project, so you pull a new supermodule
version. Now this pull also included one change (unimportant to you)
in the submodule.
I think it is clearer to update the master branch with the new
version coming from the supermodule, while leaving your work intact
(you haven't committed it to the supermodule yet, so the supermodule
should not care about your changes, it's just some dirty tree).
Then you can freely merge between your branch and master as you like and
are not forced to merge at once. And perhaps you even do not want to
merge at all, because you are on an experimental branch which really is

Of course you know: git-status will tell it.
This is no different to today, where you can commit while still leaving
a part of the tree dirty.

-- 
Martin W...
To: <git@...>
Date: Friday, December 1, 2006 - 5:02 am

WHAT?  I've got to make merges (that I don't necessarily want) in order to 
commit in the supermodule?  This completely negates any useful functioning of 
branches in the submodule.  I want to be able to make a quick development 
branch in the submodule and NOT merge that code into master and then be able 
to still commit that in the supermodule.

I think you're imagining the binding between the super and sub is very much 
tighter than it should be.  What if I'm working on a development version of 
the supermodule, which includes a stable version of the submodule?  Vice 

That prevents me "trying something out" on a topic branch in the submodule.  
Here's a scenario using my suggested "supermodule tracks submodule HEAD" 
method.

 * You're developerA
 * Make a development branch in the supermodule
 * In the submodule, make a whole load of topic branches
 * Make a development branch in the submodule
 * Merge the topic branches into the development branch of the submodule
 * Commit in the supermodule.  This capture
 * Tag that commit "my-tested-arrangement-of-submodule-features"
 * Push that tag to the central repository - tell the world.
 * DeveloperB checks out that tag and tries it.  Great stuff.

Now: here's the secret fact that I didn't tell you that will break 
your "supermodule tracks submodule branch" method.  DeveloperB has decided to 
have this in his remote:
  Pull: refs/heads/master:refs/heads/upstream/master
Oops. The supermodule, which has been told to track the "master" branch in the 
submodule is tracking different things in developerA's repository from 
developerB's repository.  Worse, what if developerB did this:
  Pull: refs/heads/master:refs/heads/development
  Pull: refs/heads/development:refs/heads/master
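
To illustrate the point about per-repository branch names, here is a small sketch that applies `Pull:` lines like the ones above to a set of remote refs. It assumes plain `<src>:<dst>` mappings only (no wildcards, no `+` prefix), and the commit ids are invented; the helper names are hypothetical.

```python
def parse_pull_line(line: str) -> tuple[str, str]:
    """Split a 'Pull: <src>:<dst>' line into (remote ref, local ref)."""
    spec = line.split(":", 1)[1].strip()  # drop the leading "Pull" key
    src, dst = spec.split(":", 1)
    return src, dst


def fetch(remote_refs: dict, pull_lines: list) -> dict:
    """Return the local refs created by applying each Pull line."""
    local = {}
    for line in pull_lines:
        src, dst = parse_pull_line(line)
        local[dst] = remote_refs[src]
    return local


# developerA's submodule repository (made-up commit ids).
developer_a = {"refs/heads/master": "a1", "refs/heads/development": "d2"}

# developerB, first variant: A's master lands under upstream/master.
b_simple = fetch(developer_a,
                 ["Pull: refs/heads/master:refs/heads/upstream/master"])

# developerB, second variant: the two branch names are swapped.
b_swapped = fetch(developer_a, [
    "Pull: refs/heads/master:refs/heads/development",
    "Pull: refs/heads/development:refs/heads/master",
])

print(b_simple)
print(b_swapped)
```

In the first variant developerB has no local `refs/heads/master` at all after the fetch, and in the second the name `master` points at what developerA calls `development`, so a branch name stored in the supermodule would mean different things in the two repositories.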

Branches are completely arbitrary per-repository.  You cannot rely on them 
being consistent between different repositories.  If you store the name of a 
submodule branch in a supermodule - that supermodule is only valid for that 
one special case of yo...
To: Andy Parkins <andyparkins@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 7:00 am

hoi :)


exactly!

Please think about it.

If you track HEAD, then this means that you track HEAD.
In _both_ directions!

So you not only store your submodule HEAD commit in the supermodule when you
do commit to the supermodule, it also means that your submodule HEAD
will be updated when you update your supermodule.
And what happens if you already committed something to HEAD in the
meantime? Exactly: a merge is needed.

And you are right: you might not want to do this now, because you
branched off, because you _wanted_ to have some development which is
_independent_ of the current supermodule work.

So tracking HEAD really makes branching in the submodule hard to work
with.

What does the supermodule provide to the submodule? It stores one
reference to a commit sha1. Just like a reference inside refs/heads
inside the submodule. There really is not much difference between the
sha1 stored inside the supermodule's tree and one stored inside refs/.
So from the submodule's point of view, the supermodule is not much more
than one special branch.
But it is not possible to use the supermodule index directly as one
"magic" branch for several reasons.
So we need synchronization methods between the index entry for the
submodule which is stored in the supermodule and the references in the
submodule. These are git-update-index/git-commit and git-checkout, both
called explicitly or implicitly in the supermodule.
And I really think it makes sense to have a one-to-one relationship
between the submodule "branch" stored in the supermodule and the


This is still supposed to be a distributed system.
DeveloperB does not only check out the whole project including several
modules. He is also supposed to _work_ with it.

What if DeveloperB also has several topic branches?
When he checks out the new supermodule, only his current HEAD in the
submodule will be updated.
So he first has to change to some supermodule-tracking branch inside the
submodule, then pull the supermodule updates, then eve...
To: <git@...>
Date: Friday, December 1, 2006 - 8:09 am

Martin Waitz wrote:

Why the magic? The typical workflow in git is

1. You work on a branch, i.e. edit and commit and so on.
2. At some point, you decide to share the work you did on that branch 
(e-mail a patch, merge into another branch, push upstream or let it be 
pulled by upstream)

I fail to understand why these two steps have to be mixed up. Someone 
care to explain?

Regards

Stephan

To: sf <sf@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 8:12 am

hoi :)


3. Other people want to use your new work.

-- 
Martin Waitz
To: <git@...>
Date: Friday, December 1, 2006 - 9:05 am

Sorry, if that was not obvious: You actually proceed with one of the 
options I listed in Step 2. What I wanted to state is that with git you 
do not mix up committing (which is local to your repository and your 
branch) and publishing.

Regards

Stephan

To: sf <sf@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 9:35 am

hoi :)


I guess you are referring to not mixing up committing to the submodule and
updating the supermodule index.
These are really two separate steps, I just combined them above because I
wanted to put emphasis on the other part: it is not a one-way flow, it
is bidirectional, so your HEAD would have to be changed if the supermodule
gets updated.
And I consider changing HEAD, without looking at the branch it points
to, to be a bad thing.

-- 
Martin Waitz
To: Martin Waitz <tali@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 9:51 am

The opposite: If you work in the supermodule, even if it is in the code 
of the submodule, you only commit to the supermodule. The submodule does 

Why do you mix up supermodule and submodule? The way I see your proposal 
you cannot change submodule and supermodule independently. That is a 
huge drawback.

Regards

Stephan
To: Stephan Feder <sf@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 10:58 am

hoi :)


I think we are using totally different definitions of "submodule".

For me a submodule is responsible for everything in or below a certain
directory.  So by definition when you change something in this
directory, you have to change it in the submodule.
You can't change the submodule contents in the supermodule without also
changing the submodule.
This is just like you can't commit a change to a file without also
changing the file.

Then the supermodule just records the current content of the entire
tree.  The only new thing is that instead of simple files there are now

No, this is the benefit you get by introducing submodules.
Why would you want to introduce a submodule when it is not linked to the
supermodule?

-- 
Martin Waitz
To: Martin Waitz <tali@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 11:47 am

Not so different. The way I see it is that "I" (meaning with submodules 
implemented as I proposed) could pull regularly from "your" repositories 
(implemented as you proposed) and work with the result (including 

But you do not consider the case where you cannot change the submodule 
because you do not own it.

For example, git has the subproject xdiff. If git had been able to work 
with subprojects as I envision, and if xdiff had been published as a git 
repository (not necessarily subproject enabled), it could have been 
pulled in git's subdirectory xdiff as a subproject. There would not have 
been a separate branch or even repository for xdiff in the git repository.

All changes to xdiff in git could have been committed to the git 
repository only. Independently, they could have been published to 
upstream and be put into the xdiff repository by its author. But the 
last part is what only the owner of the xdiff repository is able to decide.

(Ok, ok... the example sucks badly because xdiff has been massively 
changed for its usage in git so the changes would not be integrated by 
upstream. But you can imagine where you use a library essentially as is, 
only if you discover bugs you fix them immediately in your repository 
and keep those fixes in your version of the library, even on upgrade, 

There is a difference. I would say: If you commit a change to a file in 

Yes, and that is all you need. If the changes are to be part of a branch 

Because the submodule must be independent of the supermodule.

I see where you are coming from. You have one project that is divided 
into subprojects but the subprojects themselves are not independent.

What I would like to solve is the following: You have a project X, and 
this project is made part of two other projects Y and Z (as a submodule 
or subproject or whatever you want to call it). The project X need not, 
must not or cannot care that it was made a subproject. But in projects Y 
and Z, you must be able to bugfix or extend or m...
To: Stephan Feder <sf@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 12:54 pm

hoi :)


Sorry, but with so many people proposing things I am a bit lost
now.  Sometimes I thought you want exactly the same thing as I do,

I do not understand you here.
The submodule is part of the supermodule, and the one who sets up the
repository owns the whole thing, including all submodules, just like all
the files which are part of the project.

If you mean the upstream repository of the submodule, then yes, this is
of course completely separated from the submodule and may be owned by
someone else.  Consequently, this upstream repository of course does not

This could have been done if submodule support had been available


Yes, but if it had been integrated as a submodule it obviously
would have been committed to the xdiff submodule inside the git
repository.
So the changes are really part of the git repository, but you could go
to the "git/xdiff" directory and only see the changes in the submodule,


But you need to change _at_least_ one branch.
Otherwise you cannot commit to a branch.

So if you change something in a submodule, you have to change one branch
in the submodule.
If you call git-checkout in the supermodule this will result in

Of course.

So if you wanted to check out everything, you could have something like
~/src/X, ~/src/Y/X, and ~/src/Z/X.
All of these would be GIT repositories, all of them have their
independent branches.

What I am saying is just that if you update Y, and the new Y contains an
updated version of X, then ~/src/Y/X/.git/refs/heads/master will be
changed by the pull, resulting in the new version of X being checked out
in ~/src/Y/X (alongside all the other updates inside ~/src/Y).

No ;-)

-- 
Martin Waitz
To: Martin Waitz <tali@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 1:33 pm

We are in agreement about two fundamental parts of the implementation 
and their meaning:

1. A submodule is stored as a commit id in a tree object.

2. Every object that is reachable from the submodule's commit is 
reachable from the supermodule's repository.

Please confirm.


If you mean by "owns the whole thing" what I stated above in 2. then we 


That's it: There is no need for a separate branch or repository. If you 
have the subproject's commit in the superproject's object database (and 
we really have that, see 1. and 2. above), why do you _have to_ store it 

No. The xdiff submodule would only exist as part of the git repository. 
You could, f.e., access the xdiff commit in git HEAD as HEAD:xdiff// 
(again my proposed syntax). HEAD:xdiff//~2:xemit.c would give you the 
grandparent of xemit.c in the xdiff submodule. And so on. You can even 





If you mean the submodule repository created by init-module I 

Sorry, have to leave for home so I must leave that uncommented. 
Hopefully I can join in during the weekend.

Regards

Stephan
To: <git@...>, <sf@...>
Cc: Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 3:17 pm

I'm still not convinced about 2.  Why should any of the submodule commits be 
in the supermodule repository?  I know that is what you've implemented, but 
it still feels like too much of a blending of the submodule into the 
supermodule.

In fact, why should the submodule commits be even visible in the supermodule?  
That tree->submodule commit is sufficient; there isn't any need to view 
submodule history in the supermodule.



Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com
To: Andy Parkins <andyparkins@...>
Cc: <git@...>, <sf@...>
Date: Friday, December 1, 2006 - 3:38 pm

hoi :)


Well, but there is a need for a common object traversal.
You need that when sending all objects between two supermodule versions
and also when you determine which objects are still reachable.

The easiest way to implement the common object traversal is to have all
objects in one object repository.

It may be possible to use two object stores and still do the common
object traversal but I do not think that gives you any benefits.
You still don't have a totally separated repository then, because
you can't do a reachability analysis in the submodule repository alone.
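
A rough sketch of what a "common object traversal" could look like, assuming one shared object store modelled as a plain Python dict. The object ids, the `(kind, children)` layout and the idea that a tree simply lists a submodule commit id as one of its children are simplifications made up for this illustration; they are not git's actual object format.

```python
def reachable(objects: dict, roots: list) -> set:
    """Return the ids of all objects reachable from the given roots.

    `objects` maps an id to (kind, [child ids]).  A tree may list a
    commit id as a child: that is the hypothetical submodule link.
    """
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        _kind, children = objects[oid]  # KeyError if the object is missing
        stack.extend(children)
    return seen


# One shared store holding supermodule and submodule objects together.
objects = {
    "super_c1": ("commit", ["super_t1"]),
    "super_t1": ("tree", ["blob_makefile", "sub_c1"]),  # sub_c1 is the link
    "blob_makefile": ("blob", []),
    "sub_c1": ("commit", ["sub_t1"]),
    "sub_t1": ("tree", ["blob_xemit"]),
    "blob_xemit": ("blob", []),
    "sub_c_unused": ("commit", ["sub_t1"]),  # referenced by nothing
}

live = reachable(objects, ["super_c1"])
print(sorted(live))
```

With a single store the walk crosses from the supermodule tree into the submodule commit without any special case; if the submodule objects lived elsewhere, the lookup of `sub_c1` would fail unless the traversal knew where to find the second store.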

-- 
Martin Waitz
To: <git@...>
Date: Friday, December 1, 2006 - 5:04 pm

No you don't; when traversing the supermodule history you will come across 
trees that have submodule commit hashes in them, that is all the other end 
needs to know.  If it wants it can then connect to the submodule and clone 
submodule to submodule.  The whole operation doesn't have to be done in the 

That's true; but is it the right way?  I really really think the submodule 

There is one benefit - you can git-clone the submodule just as you would if it 
were not a submodule.  In fact, from the submodule's point of view it knows 

I'm going to guess by reachability analysis, you mean that the submodule 
doesn't know that some of its commits are referenced by the supermodule.  As 
I suggested elsewhere in the thread, that's easily fixed by making a 
refs/supermodule/commitXXXX file for each supermodule commit that references 
a particular submodule commit.  Then you can git-prune, git-fsck whenever 
you want.


Andy
-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com
To: Andy Parkins <andyparkins@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 5:37 pm

hoi :)


The submodule repository obviously has to be able to reach all its objects.
This is easily doable with the shared object database.


I wouldn't call this "easily".

-- 
Martin Waitz
To: <git@...>
Date: Friday, December 1, 2006 - 5:54 pm

Of course it is; when you write a supermodule commit you have its hash, 
$SUPERMODULE_HASH, you have the commit-hash of the submodule commit you're 
referencing, $SUBMODULE_HASH.  It's not really hard to do

echo "$SUBMODULE_HASH" > "submodule/.git/refs/supermodules/commit$SUPERMODULE_HASH"

Is it?
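
The same idea as a small, self-contained sketch that can be run without a real repository. It assumes the ref layout from the shell line above (`refs/supermodules/commit<hash>` inside the submodule's `.git` directory); the function name and the placeholder hashes are hypothetical, and the sketch writes into a temporary directory rather than a real checkout.

```python
import tempfile
from pathlib import Path


def pin_submodule_commit(submodule_git_dir: Path,
                         supermodule_hash: str,
                         submodule_hash: str) -> Path:
    """Write a ref in the submodule that keeps submodule_hash reachable."""
    ref = (submodule_git_dir / "refs" / "supermodules"
           / f"commit{supermodule_hash}")
    ref.parent.mkdir(parents=True, exist_ok=True)  # the directory may not exist yet
    ref.write_text(submodule_hash + "\n")
    return ref


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / "submodule" / ".git"
    # Placeholder 40-character ids standing in for real commit hashes.
    ref_path = pin_submodule_commit(git_dir, "1" * 40, "2" * 40)
    pinned = ref_path.read_text().strip()
    ref_name = ref_path.relative_to(git_dir).as_posix()

print(ref_name, "->", pinned)
```

One ref is written per supermodule commit, so the number of such files grows with supermodule history; that bookkeeping is the part the replies below argue about.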


Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com
To: Andy Parkins <andyparkins@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 6:08 pm

hoi :)


I guess you are aware that you have to scan _all_ trees inside _all_
supermodule commits for possible references.

So what do you do with deleted submodules?
You wouldn't want them to still sit around in your working directory,
but you still have to preserve them.

-- 
Martin Waitz
To: <git@...>
Date: Saturday, December 2, 2006 - 6:04 am

No you don't; you do it as part of the appropriate normal operations.

 * supermodule commit - scan the current tree for "link" objects in the
   tree.  If you find one write the reference in the submodule.
 * adding a new submodule - if this is a new submodule there can't be any
   references in the supermodule already.
 * cloning a supermodule, every new commit that gets written in the 

Now that is a tricky one.  Mind you, I think that problem exists for any 
implementation.  I haven't got a good answer for that.


Andy

-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com
To: Andy Parkins <andyparkins@...>
Cc: <git@...>
Date: Saturday, December 2, 2006 - 4:40 pm

hoi :)


 * removing a branch from the supermodule.
   OK, this is an infrequent operation and it can be handled by redoing
   everything.

I just don't like to duplicate information which is already available
easily.  We'd need far too many special cases, just to correctly support
reachability analysis.

If you just keep it in a shared object repository you don't have any
problems.

Please note that it is not required to keep it in one physical location.
You can still use alternates/whatever to store some objects in another
repository, but you need to be able to access all objects from the
supermodule.

-- 
Martin Waitz
To: Andy Parkins <andyparkins@...>
Cc: <git@...>
Date: Saturday, December 2, 2006 - 9:50 am

That suggests that it is probably better to separate submodule repositories
from their checked out working trees. Why not put the GITDIRs of the submodules
in subdirectories of the supermodule's GITDIR instead?

Josef
To: Josef Weidendorfer <Josef.Weidendorfer@...>
Cc: Andy Parkins <andyparkins@...>, <git@...>
Date: Saturday, December 2, 2006 - 4:43 pm

hoi :)


Why not simply use a shared object database instead?

You can still have an alternative to some standalone bare repository of
the submodule if you do not like to store submodule objects in the
supermodule repository.

-- 
Martin Waitz
To: Martin Waitz <tali@...>
Cc: Andy Parkins <andyparkins@...>, <git@...>
Date: Saturday, December 2, 2006 - 9:02 pm

Sure. I have no problem with this.

But can we go one step further?
AFAICS your submodules store the .git/ directories of submodules directly
at submodule position in the working tree - but you have a link .git/objects
into the object database of the supermodule.
When the user wants to delete the submodule, he would remove this .git/ directory,
too. So you lose the .git/refs of the submodule etc. I would suggest putting
the submodule .git dirs into the .git dir of the supermodule.

Josef
To: Stephan Feder <sf@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 2:48 pm

Let's see if I understand you correctly:

You don't want to create an additional .git directory for the submodule
and just handle everything with one toplevel .git repository for the
whole project.
Without the .git directory, you of course do not have refs/heads inside
the submodule.

So this is a different user-interface approach to submodules when
compared to my approach.  But the basis is the same and both could
inter-operate.

Now your submodule is no longer seen as an independent git repository
and I think this would cause problems when you want to push/pull between
the submodule and its upstream repository.
No technical problems, but UI-problems because now your submodule is

But you could still call the "xdiff" part of the git repository a
submodule.  And then changes to the xdiff directory result in a new
submodule commit, even when there is no direct reference to it.

git-cat-file commit HEAD:xdiff already works out of the box (even
cat-file tree to get the submodule tree).  But up to now revision
parsing follows the file name only once.

What about just separating things with "/"?

commit HEAD
tree   HEAD/
blob   HEAD/Makefile
commit HEAD/xdiff
tree   HEAD/xdiff/
blob   HEAD/xdiff~2/xemit.c

this may add some confusion when used with hierarchical branches, but
it's still unique:

	refs/heads/master/xdiff/xemit.c

Just use as many path components until a matching reference is found,
then start peeling.
Or just use / between super and submodule:

	refs/heads/master:xdiff/xemit.c
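
A sketch of the first variant ("use as many path components until a matching reference is found"), assuming the set of existing references is known up front. The function name and the example refs are hypothetical; the peeling step that would follow is not modelled.

```python
def split_ref_and_path(name: str, refs: set) -> tuple:
    """Split NAME into (reference, remaining path components).

    Components are added one at a time until the prefix names an
    existing reference; whatever is left is the path to peel through.
    """
    parts = name.split("/")
    for i in range(1, len(parts) + 1):
        candidate = "/".join(parts[:i])
        if candidate in refs:
            return candidate, parts[i:]
    raise KeyError(f"no reference matches a prefix of {name!r}")


# Example references, including a hierarchical branch name.
refs = {"refs/heads/master", "refs/heads/topic/xdiff"}

ref, rest = split_ref_and_path("refs/heads/master/xdiff/xemit.c", refs)
print(ref, rest)
```

Here the lookup stops at `refs/heads/master` and leaves `xdiff` and `xemit.c` as the path to walk from that commit.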

I think this is easier to read than


Because it helps "normal" git operations ;-)

-- 
Martin Waitz
To: Martin Waitz <tali@...>
Cc: <git@...>
Date: Friday, December 1, 2006 - 7:34 pm

Good. For me that is the main point. As I said before the user interface
is not so important because it can be changed anytime, but to change the
object database later is close to impossible.





You can always pick a single commit or several commits out of a larger
repository and have a complete git repository.


Yes and no. You can always have branches that are only concerned with
submodules' code, say, in refs/heads/submodules/<submodule>/.
"submodules" here is simply an example and has no deeper meaning. You
could call it foo or whatever you like. Or you could use
refs/heads/<submodule>/ if it suits you.

But if you mean the submodule as seen from the supermodule, then there

Let's make certain that we understand each other. I see a clear
distinction between the submodule code in a supermodule branch (commits
in the supermodule's tree and nothing else) and submodule branches which
are independent of the superproject. Supermodule branches and submodule

The double slash is the only way I can think of that clearly indicates
that I do not mean the contents named by the path, but the commit that
you find there. Once you have named a commit in that way, you can
continue to apply other revision naming suffixes, paths, and so on.

Let's try. What does git cat-file -p
master:dir/sub//^^^:sub/dir/sub//^:dir/file mean?

Explanation: Take branch master and go to path dir/sub. There you will
find a commit. Take its great-grandparent and go to path sub/dir/sub
(the first sub is a subproject as well but we do not care). There you
will, again, find a commit. Take its parent and go to path dir/file
which happens to be a blob the contents of which you want to cat.
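
The reading above can be written down as a small parser. This is only a sketch of the proposed `//` notation as described in this message, not of any syntax git actually accepts; it assumes that paths contain neither `//` nor `:`, and the step labels are made up for illustration.

```python
def parse_nested_spec(spec: str) -> list:
    """Break a proposed 'rev:path//suffix:path...' spec into ordered steps."""
    steps = []
    for index, segment in enumerate(spec.split("//")):
        # Before the colon: a revision (first segment) or a suffix such as ^^^.
        revision, _, path = segment.partition(":")
        if index == 0:
            steps.append(("start at revision", revision))
        else:
            steps.append(("enter commit, apply suffix", revision))
        if path:
            steps.append(("follow path", path))
    return steps


steps = parse_nested_spec("master:dir/sub//^^^:sub/dir/sub//^:dir/file")
for action, argument in steps:
    print(f"{action}: {argument}")
```

Each `//` marks the point where the path so far is expected to name a commit rather than a tree or blob, after which ordinary revision suffixes and a new path apply.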

In reality you will never see these kinds of complex paths. Have you
ever seen something like git cat-file -p


Let's see. I still have to try.

Regards

Stephan
To: sf <sf-gmane@...>
Cc: <git@...>
Date: Saturday, December 2, 2006 - 3:46 pm

hoi :)


Sure, you are able to make it work, but it needs more work on the UI part.
How do you handle the index? How do you allow cloning only the
submodule?

I really thought about such a setup too, but then decided that it is
much easier to work with submodules when you can really see it as a

Agreed.
I think the thing which caused some discussion is that I make the
current submodule commit which is used by the supermodule available in a
refs/head in the submodule.
So there is one "branch" in the submodule which corresponds to the
version used by the supermodule, but this is just for user interface.
Its most important purpose is to give this special commit a name, so
that it can be used in merges, etc.

By selecting another refs/heads "branch" in the submodule you can also
easily detach the submodule from the supermodule.
It is really important to understand that you can't branch the submodule
alone and still have it connected to the supermodule, because the
supermodule always tracks only one commit for each submodule.
So every branch that affects the project has to be done on project
(topmost supermodule) level.
But of course the submodule can have other branches which are not
tracked by the supermodule.
So by checking out refs/heads/master (as it is used in my
implementation) you can attach the submodule to the supermodule (attach
as in: bring the working directory in sync with the whole project), and
you can detach it by selecting another refs/heads (the submodule is
still part of the supermodule, but not in the state which is currently
visible in the working directory).
This may sound confusing, but it really is the only semantic for
submodule branches that makes sense.
There are fears that you may commit something that does not match your
current working directory.  Sure, but you explicitly asked for it and I

With the current semantics, you can already get to the submodule commit
(just leave out your double slashes), but what is missing is simply to
apply ...
To: Martin Waitz <tali@...>
Cc: sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 9:43 am

So a commit in the supermodule turns into a commit in the submodule? 
That's just plain wrong. If it doesn't, why would the submodule HEAD 
have to change?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Andreas Ericsson <ae@...>
Cc: sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 9:46 am

hoi :)



So how do you update your submodule?

Remember: if you git-pull in the supermodule, you want to update the
whole thing, including all submodules.

-- 
Martin Waitz
To: Martin Waitz <tali@...>
Cc: sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 10:52 am

By committing to it separately, or by getting changes from the upstream 

Only if the new commits I pull into the supermodule DAG include a new 
snapshot of the submodule; otherwise it wouldn't be necessary.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Andreas Ericsson <ae@...>
Cc: sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 11:00 am

hoi :)


Of course.

But if the supermodule contains changes to the submodule, you still
have to change the submodule.  And this implies changing the submodule
HEAD or some branch.

-- 
Martin Waitz
To: Martin Waitz <tali@...>
Cc: sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 12:38 pm

Not really. I fail to see why HEAD needs to be changed so long as the 
commit is in the submodule's odb.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Andreas Ericsson <ae@...>
Cc: sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 12:57 pm

Because I want the submodule to act as a normal git repository.
Please note that I also voted against changing HEAD directly, but that
the new commit which came from the supermodule is just stored in one
branch of the submodule, as part of the supermodule checkout.

-- 
Martin Waitz
To: Martin Waitz <tali@...>
Cc: sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 2:08 pm

You're assuming the super- and sub-module will share HEAD, or at least 
ODB, I think. I'm not convinced this is necessary. Convince me. I'll go 
drink beer and get some dancing done while you're at it ;-)

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Andreas Ericsson <ae@...>
Cc: sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 2:51 pm

hoi :)



Get me a beer and I will convince you :-)

-- 
Martin Waitz
To: Andreas Ericsson <ae@...>
Cc: Martin Waitz <tali@...>, sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 12:49 pm

Right. A commit in the supermodule should _not_ imply a commit in the 
submodule.

Maybe I should take a look at the code, but it sounds like people are 
still trying to "mix" submodules too much. 

Think of it this way: one common use for submodules is really to just 
(occasionally) track somebody else's code. The submodule should be a 
totally pristine copy from somebody else (ie it might be the "intel driver 
for X.org" submodule, maintained within intel), and the supermodule just 
refers to it indirectly (ie the supermodule might be the "Fedora Core X 
group" which contains all the different drivers from different people).

So anything that mixes super-modules and sub-modules too much will always 
break this kind of model.

A supermodule can never "contain changes" to a submodule. A supermodule 
would always just point to the submodule, and not have any changes 
what-so-ever of its own. The submodule is self-sufficient, and always 
contains all its _own_ changes.

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Andreas Ericsson <ae@...>, sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 1:14 pm

hoi :)


Yes, but it is not only about tracking, also about distributing
submodules.

One Fedora X developer fixes a bug in the intel driver, commits that to
the submodule and then updates the supermodule to the new version (by
calling "git-update-index drivers/intel && git-commit" or something).  Then
another Fedora X developer updates his X repository.  By pulling the
supermodule he also gets a new version of the submodule.
And this new version of the submodule is stored in a branch which can be

The supermodule always contains _the_entire_ submodule with its complete
history, so it also does contain changes.  But it does not per-se
contain changes, only indirectly (i.e. the commits in the submodule are

Yes.

-- 
Martin Waitz
To: Linus Torvalds <torvalds@...>
Cc: Martin Waitz <tali@...>, <git@...>
Date: Friday, December 1, 2006 - 1:08 pm

Linus Torvalds wrote:
...
 > Think of it this way: one common use for submodules is really to just
 > (occasionally) track somebody else's code. The submodule should be a
 > totally pristine copy from somebody else (ie it might be the "intel driver
 > for X.org" submodule, maintained within intel), and the supermodule just
 > refers to it indirectly (ie the supermodule might be the "Fedora Core X
 > group" which contains all the different drivers from different people).

Could you please be a little bit more specific about how you would store 
the "pristine copy". There seems to be some agreement to store the 
commit id of the submodule instead of a plain tree id in the 
supermodule's tree object, and that all objects that are reachable from 
this commit are made part of the supermodule repository (either fetched 
or via alternates). Do you agree?

...
 > A supermodule can never "contain changes" to a submodule. A supermodule
 > would always just point to the submodule, and not have any changes
 > what-so-ever of its own. The submodule is self-sufficient, and always
 > contains all its _own_ changes.

That is one of the points Martin Waitz and I are discussing.

If I understand you correctly you cannot make any changes to the 
submodule's code _in the supermodule's repository_, no bugfixes, no 
extensions, no adaptations, nothing. Do you mean that?

That would be a third alternative. In my opinion the usefulness of 
submodules would be unnecessarily restricted if it comes down to a choice 
between using the code from upstream as is and not using submodules at 
all. What is the point of the restriction?

Regards

Stephan
-
To: sf <sf@...>
Cc: <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 4:13 pm

Note that it's not necessarily "pristine", since the submodule clearly is 
a local git repository in its own right. So like _any_ git repository, you 
can (and may well end up) having your own local branches in the submodule, 
with your own local modifications.

So I'm not claiming that a submodule must always match some external git 
tree 100%, and that it must be read-only or anything like that. I'm just 
saying that I suspect that quite often, one of the MOST IMPORTANT parts is 
that the submodule is really something that somebody else technically 
maintains, and that this is actually one of the _reasons_ why it is a 
submodule in the first place. 

For example, a lot of projects end up having some kind of "library 
component" as a submodule. Take something like a video player project, 
which would have something like ffmpeg as a submodule, not because you'd 
maintain ffmpeg yourself, but simply because (let's say) the library 
interface changes enough, or you need a specific version with some of your 
own fixes that haven't been released widely yet, so you want to carry all 
the libraries you need _with_ you, even though you don't really maintain 
that submodule. You at most have some small extensions of your own.

Now, in this situation, it's really, really _important_ that the submodule 
really is totally independent of the supermodule, for several reasons.

For example, since you don't "really" own that project, carrying around 
your own fixes is really really painful. We know it happens all the time, 
and a lot of projects end up needing their own version, but the _last_ 
thing you want is to be in merge hell all the time. So as a supermodule 
maintainer, the best possible thing for you is to be able to push back 
those local changes to the original project maintainer, so that you 
_don't_ have to maintain your own changes.

But you need to realize that the real maintainer of the submodule is 
TOTALLY UNINTERESTED in your supermodule. He's not going to maintain it, 
...
To: Linus Torvalds <torvalds@...>
Cc: sf <sf@...>, Git List <git@...>, Martin Waitz <tali@...>
Date: Friday, December 8, 2006 - 2:29 pm

An implication of this is that the entire administrative
responsibility for having some super-sub module interaction
lies entirely with the supermodule.

Why not have a "glue" object at the "stub"-interface of
the supermodule tree that provides policy mappings to
the sub-modules.  Perhaps indicating git URL location,
mappings of branch names between super- and sub- modules,
special commit SHA1s, user policy or config choices at
the boundary, and things like that.

Is that the sort of direction we are headed?

jdl


-
To: Jon Loeliger <jdl@...>
Cc: Linus Torvalds <torvalds@...>, sf <sf@...>, Git List <git@...>, Martin Waitz <tali@...>
Date: Tuesday, December 12, 2006 - 4:32 am

That's a good thing. I wouldn't want the openssl maintainers to have to 
bother with every project that uses their code, and I'm fairly certain 
they feel the same.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Jon Loeliger <jdl@...>
Cc: Linus Torvalds <torvalds@...>, sf <sf@...>, Git List <git@...>, Martin Waitz <tali@...>
Date: Friday, December 8, 2006 - 2:45 pm

Not unless you have something useful in mind that could be put in
these glue objects.  URLs and branch names, in particular, should
not be stored in the repository itself, but in configuration files,
since they will be different for different copies of the repo.

skimo
-
To: Linus Torvalds <torvalds@...>
Cc: <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 6:35 pm

True. But if you need the changes to the submodule for your supermodule
to function, and upstream either does not want to merge your changes or
the merge will be available only after a long time, then what is the
alternative? You must be able to keep local changes, and you must be
able to keep pulling from upstream. Of course, what you describe is the
ideal case: You find a bug, push the fix upstream, and in no time at all
your fix is merged and you can just pull a new version into your

No! All you need is a naming scheme to address the commit of the
subproject that should be pulled. The extreme case would be to just
address it with its id (well, currently you cannot do that with git
pull, but that is fixable). But I already proposed a syntax for naming
commits which are "hidden" in a superproject: Just name the path as
described in git-rev-parse and append double slashes (to indicate that
you mean the commit, not the tree it contains). So no manual work needs
be done by upstream.



If you want to track some chosen submodules there are two easy solutions:

1. If you want to track their state as it appears from the supermodule's
view, pull from master:<submodule>//
2. If you want to track their state from their own development branches,
 pull from <submodule>/master
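
A minimal sketch, in Python, of how a name of the first kind could be taken apart. The "<rev>:<path>" form is the one git-rev-parse already documents; the trailing double slash is only the syntax proposed in this thread, not something git understands. `parse_spec` and `SubmoduleSpec` are hypothetical names made up for the illustration, and the second form (<submodule>/master) is not handled here.

```python
from typing import NamedTuple


class SubmoduleSpec(NamedTuple):
    rev: str            # revision in the supermodule, e.g. "master"
    path: str           # path of the submodule inside the supermodule tree
    wants_commit: bool  # True when a trailing "//" asks for the commit itself


def parse_spec(spec: str) -> SubmoduleSpec:
    """Split "<rev>:<path>" and detect the proposed "//" suffix."""
    rev, sep, path = spec.partition(":")
    if not sep or not rev:
        raise ValueError(f"expected <rev>:<path>, got {spec!r}")
    # The double slash means "the submodule commit, not the tree it contains".
    wants_commit = path.endswith("//")
    path = path.rstrip("/")
    if not path:
        raise ValueError(f"missing path in {spec!r}")
    return SubmoduleSpec(rev, path, wants_commit)
```

With this, "master:drivers/intel//" would name the submodule commit recorded at drivers/intel in master, while "master:drivers/intel" would keep its usual meaning of the tree at that path.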


Every commit is a git tree in its own right, is it not?


I am not sure I understand what you say.

1. If you are working on a submodule, then the supermodule never enters
the picture. You are working independently. So far, so good.

2. If you are working on the supermodule, git will not be able to
function? How would you work without submodules, in which case you would

I totally agree. When I try to explain why submodules work that only
exist as part of one or more supermodules, I do not mean to say that you
cannot or should not have independent branches or repositories for the

I took that for granted: from a commit you only ever look backwards (in
time/history dimension) or downwards (in ...
To: Linus Torvalds <torvalds@...>
Cc: sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 6:06 pm

If you want to allow this, you have to be able to cut off fetching the
objects of the supermodule at borders to given submodules, the ones you
do not want to track. With "border" I mean the submodule commit in some
tree of the supermodule.

This looks a little bit like a shallow clone, where you introduce
graft points at the border to some of the submodule's object DAGs.
But I am not sure that this is scalable: for supermodules with
a large number of submodules you are not interested in,
your graft file would grow very fast, as there will be new borders
with every change in some submodule, which happens to be tracked
in the supermodule.

So IMHO, instead of a huge graft file, you want to have a fast way
to check at a submodule border which submodule this given border is
going into. Then, at fetch time, you easily can decide that you do
not want to fetch any object from the submodule.
Otherwise, you would have to ask the remote end at cloning time:
"Is this commit from some submodule I am locally not interested in?"

So I think we should introduce a submodule namespace in supermodules.
And at every border from super- to submodules, the name of the
submodule we are going into should be specified.
Which actually means that we need to introduce a "submodule" object,
and trees of a supermodule can have such submodule objects as borders
into a submodule. In a submodule object, of course we have the
SHA1 of the commit into the submodule DAG, and there would be the globally
unique name we have chosen for this submodule in this supermodule.
Something like

 submodule: gcc
 commit: 6287376...
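
To make the proposal concrete, here is a rough Python sketch of such a "submodule" object and of the fetch-time decision it would allow. Nothing like this exists in git; `SubmoduleObject`, `should_fetch`, and the two-line text format are assumptions taken from the example above.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class SubmoduleObject:
    name: str    # unique name in the supermodule's submodule namespace
    commit: str  # SHA1 of the commit in the submodule DAG

    def serialize(self) -> str:
        return f"submodule: {self.name}\ncommit: {self.commit}\n"

    @classmethod
    def parse(cls, text: str) -> "SubmoduleObject":
        fields = {}
        for line in text.splitlines():
            if not line.strip():
                continue
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"malformed line: {line!r}")
            fields[key.strip()] = value.strip()
        if "submodule" not in fields or "commit" not in fields:
            raise ValueError("both 'submodule' and 'commit' are required")
        return cls(name=fields["submodule"], commit=fields["commit"])


def should_fetch(border: SubmoduleObject, wanted: Set[str]) -> bool:
    """At a border, decide by name alone whether to recurse into the submodule."""
    return border.name in wanted
```

The point of the name is that the decision needs no graft file and no question to the remote end: the border itself says which submodule lies behind it.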

Before cloning a supermodule, you should be able to list the names of
the submodules available, and select the submodules you want to have

So in the example, "that/one/submodule" is _not_ the path of the working
tree which happens to be the root of the submodule at current supermodule
HEAD, but the unique name from the submodule namespace.

This is important, as you should be able to move th...
To: Josef Weidendorfer <Josef.Weidendorfer@...>
Cc: sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 6:26 pm

No. 

I would say that it looks more like a "partial checkout" than a shallow 
clone.

A shallow clone limits the data in "time" - we have _some_ data, but we 
don't have all of the history of that data.

In contrast, a submodule that we don't fetch is an all-or-nothing 
situation: we simply don't have the data at all, and it's really a matter 
of simply not recursing into that submodule at all - much more like not 
checking out a particular part of the tree.

So if a shallow clone is a "limit in time", a lack of a module (or a lack 
of a checkout for a subtree in general - you could certainly imagine doing 
the same thing even _within_ a git repository, and indeed, we did discuss 
exactly that at one point in time) is more of a "limit in space".

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 6:55 pm

OK.

I still think it should be about "limit in space" regarding the
objects in the local repository.

For a project containing "gcc" as submodule, and I am not
interested in this submodule, there should be a way to not need
to fetch all the objects from the gcc submodule at clone time.


What about my other argument for a submodule namespace:
You want to be able to move the relative root path of a submodule
inside of your supermodule, but yet want to have a unique name
for the submodule:
- to be able to just clone a submodule without having to know
the current position in HEAD
- more practically, e.g. to be able to name a submodule
independent from any current commit you are on in the supermodule,
e.g. to be able to store some meta information about a submodule:
- "Where is the official upstream of this submodule?"
- "Should git allow to commit rewind actions of this submodule
   in the supermodule?" (which, AFAICS, exactly has the same
   problems as publishing a rewound branch: you will get into
   merge hell when you want to pull upstream changes into the
   supermodule)
- "Should this submodule be checked out?"
and so on.

Josef
-
To: Josef Weidendorfer <Josef.Weidendorfer@...>
Cc: sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 7:30 pm

Umm? I don't get the issue. A submodule is a git repo in its own right, 
and you clone it exactly like you'd clone any other repo. It _does_ have a 
HEAD. It has its own branches. It has everything.

So when you clone a submodule, you always get all those branches. The 
supermodule will not _point_ to them all (the branches are local to the 
submodule, and _will_ depend on things like "which upstream module am I 
tracking"), but they'll have to be there, exactly _because_ the submodule 
has an existence and is tracked on its own.

In the trivial case where the submodule doesn't even _have_ any external 
existence at all (ie it's always maintained as _just_ a submodule, it 
would probably tend to have just one branch, and a clone would get 
whatever that branch is), but that's just a degenerate special case of the 

The current commit within the supermodule would be _totally_ invisible to 
the submodule.

Of course, if HEAD _differs_ from that commit within the supermodule, then 
a "git diff" (when done from within the supermodule) should show that, but 

That's entirely a question for the submodule. You cannot ask that question 
within the confines of the supermodule, because it's not even a relevant 
question in that context. Two different supermodule repositories may well 
decide to get their submodules from different places, just because they 
got cloned from different places (or even just for practical reasons like 
"that other site is closer to me").

So the official upstream of a submodule must NOT be encoded inside the 
supermodule (or at least not within its _objects_). Exactly because the 
upstream location is not a "global" thing - it's per-repository, and thus 
must not be encoded in the global data (ie the objects).
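
A small Python sketch of that split, under stated assumptions: `TRACKED` stands in for what the supermodule's objects record (the same in every clone), while each clone carries its own local mapping from submodule path to URL. The names, the commit id, and the example.org URLs are all made up for illustration; this is not how git stores anything.

```python
from typing import Dict, Optional

# Global data: path -> submodule commit id, identical in every clone.
# The id is a placeholder, not a real commit.
TRACKED: Dict[str, str] = {
    "drivers/intel": "1111111111111111111111111111111111111111",
}


def upstream_for(path: str, local_config: Dict[str, str]) -> Optional[str]:
    """Return where *this* clone fetches the submodule at `path` from."""
    if path not in TRACKED:
        raise KeyError(f"{path} is not a tracked submodule")
    # The answer comes from per-repository settings, never from the objects.
    return local_config.get(path)


# Two clones of the same supermodule, each with its own idea of "upstream".
clone_a = {"drivers/intel": "git://example.org/intel-driver.git"}
clone_b = {"drivers/intel": "git://mirror.example.net/intel-driver.git"}
```

Both clones agree on which commit is tracked, yet `upstream_for("drivers/intel", clone_a)` and `upstream_for("drivers/intel", clone_b)` give different answers, which is exactly why the location cannot live in the shared objects.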

It should be encoded in some _ephemeral_ place, eg in the ".git/config" 
file or in a ".git/remotes/origin"-like file (either in the supermodule or 
the submodule, and I would seriously suggest you do it within the 
submodule itself, beca...
To: Linus Torvalds <torvalds@...>
Cc: sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 8:14 pm

I just thought about the case when you want to clone a submodule directly
out of the supermodule repository, at a given relative path. And that can
be changing.

Of course, every project which happens to be submodule of some supermodule,
also can have its own repository, as it is fully independent. And then,
you of course can clone from it without any knowledge of its relative position


Of course.

Yet, you need some name to store meta information of submodules
into some config file of the supermodule, like whether you want to have
it checked out (see below).

In that case, such a name for a submodule does not have to be global in

Yes. I just gave an example of a policy some project may want for submodule

Exactly. And in this list, you have to specify names.

The thing I wanted to discuss is whether such names would need to be globally
unique in the project containing submodules, or not.

If yes, it IMHO makes a lot of sense to introduce "submodule objects" which contain
these submodule names, and which are used as pointers to submodule commits in
supermodule trees.

Josef
-
To: Josef Weidendorfer <Josef.Weidendorfer@...>
Cc: sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 8:33 pm

Yes, you do need to have a list of submodules somewhere, and you'd need to 
maintain that separately. One of the results of having the submodules be 
independent from the supermodule is that it's not all "automatically 
integrated", and thus the supermodule does end up having to have things 
like that maintained separately. 

And yes, if you screw that up, you wouldn't be able to fetch submodules 
properly etc, even if you see the supermodule, and yes, this sounds more 
like the CVS "Entries" kind of file that is more "tacked on" than really 
deeply integrated. But I think the separation is _more_ than worth the 
fact that you can see things being separate.

In fact, I'm very much arguing for keeping things as separate as possible, 
while just integrating to the smallest possible degree (just _barely_ 
enough that you can do things like "git clone" and it will fetch multiple 
repositories and put them all in the right places, and "git diff" and 
friends will do reasonably sane things).



My preference would be for it to be "local", just because (as I 
mentioned), with mirroring etc, it might well be that you want to fetch 
things from the _closest_ repository. That's really not a global decision, 

You could do it that way, and then it would be global. It would work, and 
in many ways it would probably be "simpler" on a supermodule level.

The advantage of a global namespace is that you can much more easily 
update it - "git fetch" will just fetch the new file(s) that describe the 
subprojects very naturally if they are all global. Putting them in a local 
.git/config file has its advantages (see above), but it also makes it 
very hard to version them, and to update the list - it would have to 
become manual.

There are possibly combinations of the two approaches: have a "global 
namespace" that describes the canonical place to get the subprojects, but 
have some way to add local "translation" of the canonical names into 
locally preferred versions (eg you could just h...
To: Linus Torvalds <torvalds@...>
Cc: Josef Weidendorfer <Josef.Weidendorfer@...>, sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Monday, December 4, 2006 - 2:56 pm

(I wrote most of this a couple of days ago, so it's not at the tip of
the conversational tree, so to speak.  But it's effectively a response
to Linus's "what do you want to do with submodules" question, with
some thoughts on implementation.  Sorry it's so long; like Blaise
Pascal, "I would have written a shorter letter, but I did not have the
time.")

The supermodule concept, implemented right, could really improve
cooperation among embedded platform integrators, boutique distro
publishers, and other editorial contributors to sprawling metaprojects
who don't want to run kernel.org-scale mirrors.  To make this work,
you need sparse repositories (conserving resources when fetching, by
omitting the bulk of currently un-needed submodules that can reliably
be obtained later from elsewhere) and shallow cloning (conserving
resources when publishing, by referring cloners to a third-party
repository for universally available content).

For instance, it would be a wonderful thing if the pile-o-patches
nightmare that is PTXdist (and crosstool and buildtool and every other
approach I have seen for ongoing maintenance of embedded toolchains
and userlands) were obsoleted by a git supermodule.  Its submodules
would mostly track external projects, but would also logically contain
the fix-up patches worked out during platform integration, checked in
to branches anchored at each upstream release point.  The supermodule
would contain all of the build automation, log auditing, and remote
unit testing stuff, as well as the metadata for each submodule
involved in this platform build cycle.

At a content level, the sparsely populated / shallowly published
supermodule wouldn't be much different from today's PTXdist.  But the
pay-off comes when you merge forward to a new release of some base
component (compiler, library, etc.) and discover that some of your
fix-ups have been adopted or obsoleted upstream, and new fix-ups are
needed for components that depend on the updated bit, and the set of
configurabl...
To: Michael K. Edwards <medwards.linux@...>
Cc: Linus Torvalds <torvalds@...>, Josef Weidendorfer <Josef.Weidendorfer@...>, sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Monday, December 4, 2006 - 9:31 pm

Did you see GitTorrent?  http://gittorrent.utsl.gen.nz/  A lot of
similar ideas to what you mention.  Sorry, still no prototype :)

I'd see the submodules thing as a good way to glue together a whole
bunch of repositories, so that the core mirror servers only have to
mirror a small-ish number of repositories.

Sam.

-
To: <git@...>
Date: Saturday, December 2, 2006 - 5:27 am

Why?  You just recursively search for every "link" object in the supermodule.  
That tells you which submodules you need and where they should be.

During a supermodule clone, it can tell the client end to start a new clone 
with the correct path because it knows what the local path is at that moment.
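
A minimal Python sketch of that recursive search, assuming a toy model in which a supermodule tree is a nested dict and a submodule border is a `Link` entry holding a commit id. `Link` and `find_links` are hypothetical names; real git tree objects are not dicts.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


@dataclass(frozen=True)
class Link:
    commit: str  # commit id of the submodule at this path


def find_links(tree: Dict[str, object], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, commit) for every link found anywhere below `tree`."""
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, Link):
            # A border: record it, but do not descend into the submodule.
            yield path, entry.commit
        elif isinstance(entry, dict):
            # An ordinary subtree of the supermodule: keep searching.
            yield from find_links(entry, path + "/")
        # Anything else is a plain file (blob) and is ignored.
```

The paths it yields are the local paths at that moment, which is all a clone would need to know where each submodule should be placed.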



Andy
-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com
-
To: Josef Weidendorfer <Josef.Weidendorfer@...>
Cc: Linus Torvalds <torvalds@...>, sf <sf@...>, <git@...>
Date: Friday, December 1, 2006 - 7:07 pm

hoi :)


you can always have a bare repository for all used modules lying around
in some defined location.  There is no need for a unique submodule-name.

-- 
Martin Waitz
To: Linus Torvalds <torvalds@...>
Cc: sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 6:41 pm

Linus Torvalds wrote:

If you do not want to fetch all of the supermodule then do not fetch the
supermodule. Instead fetch only the submodules you are interested in.
You do not have to fetch the whole repository.

Regards

Stephan
-
To: sf <sf-gmane@...>
Cc: sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 7:09 pm

So why do you want to limit it? There's absolutely no cost to saying "I 
want to see all the common shared infrastructure, but I'm actually only 
interested in this one submodule that I work with".

Also, anybody who works on just the build infrastructure simply may not 
care about all the submodules. The submodules may add up to hundreds of 
gigs of stuff. Not everybody wants them. But you may still want to get the 
common build infrastructure.

In other words, your "all or nothing" approach is
 (a) not friendly
and
 (b) has no real advantages anyway, since modules have to be independent 
     enough that you _can_ split them off for other reasons anyway.

So forcing that "you have to take everything" mentality only has 
negatives, and no positives. Why do it?

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: sf <sf-gmane@...>, sf <sf@...>, <git@...>
Date: Saturday, December 2, 2006 - 4:12 pm

hoi :)


An interesting way to support this "only fetch some modules" use-case is
to use several supermodules.

So you could have one supermodule which is geared towards developers and
only contains the modules they use.  Another supermodule contains all
the toolchain sources.  And then there is the supermodule used for
releases which is just a merge of all the other supermodules.

The concept is so flexible that you don't have to introduce lots of
other things as module namespaces.  Just use the tools you have in a
creative way ;-)

-- 
Martin Waitz
To: Linus Torvalds <torvalds@...>
Cc: sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Friday, December 1, 2006 - 7:49 pm

If you need a common infrastructure to be able to work with the
submodule, then the submodule is not independent of the supermodule.


(There have been lots of use cases for shallow clones but for a long
time git did not support them).

If you can extend this partial fetch feature to the non-subproject case
I would agree with your reasoning. What makes the subprojects so special
in this regard? Do I have to turn a plain tree into a subproject to be
able to ignore it? Once you can restrict fetches to parts of the
contents you get the ability to restrict fetches to the "common
infrastructure" and selected submodules for free.

Regards

Stephan

-
To: <sf-gmane@...>, Linus Torvalds <torvalds@...>, sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Saturday, December 2, 2006 - 2:57 pm

Here's a real-world example that doesn't contradict:

http://amarok.kde.org/wiki/Installation_HowTo#From_Anonymous_SVN

"svn co -N svn://anonsvn.kde.org/home/kde/trunk/extragear/multimedia
cd multimedia
svn co svn://anonsvn.kde.org/home/kde/branches/KDE/3.5/kde-common/admin
svn up amarok

To compile the sources (from the multimedia directory):"

and there's probably very few people that want to clone the entire KDE
multimedia sub&super-module in this case.

//Torgil


-
To: Torgil Svensson <torgil.svensson@...>
Cc: <sf-gmane@...>, sf <sf@...>, <git@...>, Martin Waitz <tali@...>
Date: Saturday, December 2, 2006 - 3:41 pm

And I'll add the note that people who do things like submodules aren't 
generally even _used_ to them being "seamless", and most of the time 
probably don't even want complete seamlessness.

As the example that Torgil points to shows, people are quite used to 
actually even naming the submodules separately, and things like having the 
"default" set of submodules not equal the "complete" set. 

In other words, I don't think people expect or want something hugely more 
complicated than the CVS/modules kind of file. 

What people _do_ want (and that CVS in general is horribly bad at, and 
this is not a module-specific issue) is to have the _versioning_ work 
well. When you check out a specific version of a module, you want any 
_linked_ modules to follow along too.

This is the same reason why CVS users use tags a lot: because even 
_within_ a single project (no modules, no nothing), it's often hard to 
re-create the exact state of a version any other way. So you tag every 
single file and do insane things like that, because CVS just isn't very 
good at guaranteeing consistency across the whole project.

The exact same thing is true about subprojects. I don't think that people 
who have used CVS subprojects a lot really mind the CVS/modules file 
itself (but hey, maybe I'm wrong - I've seen _other_ people maintain 
modules in CVS, but I've never done it myself), but they do mind the fact 
that it's hard as hell to do something as simple as "get all modules back 
to version X" without lots and lots of careful crud (ie tagging every 
single module, things like that).

Now, I'm not exactly sure who wants to use git modules, so this is the 
time to ask: did you hate the CVS/modules file? Or was it something you 
set up once, and then basically forgot about? People clearly use the 
ability to mark certain modules as depending on each other, and aliases to 
say "if you ask for this module, you actually get a set of _these_ 
modules".

_I_ suspect that that isn't the problem peopl...
To: Linus Torvalds <torvalds@...>
Cc: <git@...>
Date: Saturday, December 9, 2006 - 5:34 pm

Here's some thoughts on subprojects from my company's perspective.  I 
apologize for the long message.

Abstract: We use submodules heavily in CVS and SVN.  I like what I've read 
from Linus about the "