git submodules and commit

Previous thread: [PATCH] Fix buffer overflow in git-grep by Dmitry Potapov on Wednesday, July 16, 2008 - 6:15 am. (10 messages)

Next thread: [PATCH] builtin-describe.c: make a global variable "pattern" static by Nanako Shiraishi on Wednesday, July 16, 2008 - 6:42 am. (1 message)
To: Git Mailing List <git@...>
Date: Wednesday, July 16, 2008 - 6:32 am

I wonder if this is a fairly common pattern. We tend to have modules
as git repositories, and projects that tie together those git
repositories as submodules. In general, > 90% of the work is done in
one module, and the following stanza gets used a lot:

cd /proj/modA
git commit -s -m "Some change"
git push

cd ..
git add modA
git commit -s -m "Some change (modA)"
git push

But since this is much more cumbersome than (say) "svn ci", what often
happens is developers just commit into modA, then carry on. Or, for
people just learning git, they sometimes screw up and push the parent
proj but not the child modA.

This is a shame, as it means any external people pulling updates
directly from proj will not get this change (e.g. CI tools
speculatively compiling against every developer tree).

For me, in some really high proportion of cases, I think I want 'git
commit' to mean 'commit to any child repositories, any sibling
repositories, and any parent repositories (updating the submodule sha1
as appropriate)'. In other words, 'pretend like the whole thing is one
big repo'.
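
A minimal sketch of the simplest case of that wish, assuming a single
submodule and no siblings: commit in the submodule, then record the new
submodule sha1 in the parent. The function name commit_through and its
arguments are hypothetical, and pushing is deliberately left out.

```shell
#!/bin/sh
# Sketch only: commit staged changes in one submodule, then record the new
# submodule commit in the parent project. commit_through is a hypothetical
# helper, not a git command.
commit_through() {
    sub=$1   # submodule path relative to the parent, e.g. modA
    msg=$2   # commit message, reused for both commits

    # Commit in the submodule first so the parent has a new commit to point at.
    (cd "$sub" && git commit -s -m "$msg") || return 1

    # Stage the updated submodule pointer and commit it in the parent.
    git add "$sub" &&
    git commit -s -m "$msg ($sub)"
}

# Usage, from the top of the parent project:
#   commit_through modA "Some change"
```

This does nothing about the merge-conflict cases mentioned next; it only
removes the second manual commit.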

I guess it probably gets sticky when there are merge conflicts. Is
anyone working on this kind of thing? I might be able to give some
time to help work on it.
--

To: Nigel Magnay <nigel.magnay@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, July 16, 2008 - 11:43 am

I have exactly the same problem as you, and have been working on
improving my own workflow so that someday I can offer patches that
might be generally applicable.

In the meantime, my solution is... some shell scripts checked in at
the top level of my project. :)

In one of my applications, I have a /wv submodule, which provides a
cross-platform build environment. That environment in turn
contains a /wv/wvstreams submodule, which is a library that we use.

When I make a change to wvstreams that's needed for my application, I
need to check into wvstreams, then check that link into wv, then check
that link into the application. Then, when I push, I have to make
sure to always push wvstreams first, then wv, then application, or
else other users can end up with "commit id xxxxxx not found" type
errors.

So basically, committing is always harmless, since I can do anything I
want in my own repo (and I want to be able to update wvstreams
*without* always updating wv, and so on). The tricky part is pushing.
Here's the script I wrote to make sure I don't screw up when pushing:

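~/src/vx-lin $ cat push-git-modules
#!/bin/sh -x
# Push innermost repositories first so that outer gitlinks never point at
# commits that are missing from the server.
set -e
# Sanity check: make sure the nested submodule is actually checked out.
test -e wv/wvstreams/Makefile
(cd wv/wvstreams && git push origin HEAD:master) &&
(cd wv && git push origin HEAD:master) &&
git push origin HEAD:master ||
{ echo "Failed!" >&2; exit 1; }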

Now, this script is pretty flawed. Notably, it always pushes to the
'master' branch, which is stupid. However, it works in our particular
workflow, because wvstreams isn't being modified by too many
developers and it's okay if we all commit to master. This is also
aided by the fact that people are trained to push only after they've
made all the unit tests pass, etc. And further, individual apps don't
have to update their wvstreams to the latest anyway unless they really
need the latest changes, which is a wonderful feature of git
submodules.
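
A minimal sketch of how the same ordering could be kept without
hard-coding 'master' or the directory names. The function name
push_in_order and its argument order are hypothetical; it only
generalizes the script above and still assumes one shared remote branch
name for every repository in the chain.

```shell
#!/bin/sh
# Sketch only: push a chain of nested repositories, innermost first, and stop
# at the first failure so an outer repository is never pushed while it points
# at an unpublished inner commit. push_in_order is a hypothetical helper.
push_in_order() {
    branch=$1; shift           # remote branch to update, e.g. master
    top=$PWD
    for dir in "$@"; do        # directories listed innermost first
        if ! (cd "$top/$dir" && git push origin "HEAD:$branch"); then
            echo "Failed while pushing $dir" >&2
            return 1
        fi
    done
}

# Usage, from the top of the application:
#   push_in_order master wv/wvstreams wv .
```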

Now, sometimes the above push script will fail. In my experience,
this is only when someone...

To: Avery Pennarun <apenwarr@...>
Cc: Nigel Magnay <nigel.magnay@...>, Git Mailing List <git@...>
Date: Friday, July 18, 2008 - 12:11 pm

See http://article.gmane.org/gmane.comp.version-control.git/69834
([PATCH] Added recurse command to git submodule)
Or search "submodule recursive" in gmane.

The recursive pull, diff and status for submodules are implemented by
Imran M Yousuf. And IIRC, with this patch, you can walk through the
submodule hierarchy to execute any command.
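
A minimal sketch of that kind of walk, assuming a Git release newer than
this thread, where "git submodule foreach --recursive" is available. This
is not the 'recurse' patch referenced above, and the wrapper name
in_all_modules is hypothetical.

```shell
#!/bin/sh
# Sketch only: run the same git subcommand in every nested submodule and then
# in the superproject itself. in_all_modules is a hypothetical helper.
# Arguments containing spaces are not preserved inside the foreach command.
in_all_modules() {
    git submodule foreach --recursive "git $*" &&
    git "$@"
}

# Usage, from the top of the superproject:
#   in_all_modules status --short
```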

--
Ping Yin
--

To: Avery Pennarun <apenwarr@...>
Cc: Git Mailing List <git@...>
Date: Thursday, July 17, 2008 - 5:47 am

Yes - I use something rather similar on my desktop. The unfortunate
thing is that I know how submodules work, and am happy with the
scripts. My users are sometimes in the 'git gui' types - not as

Yeah - this happens a lot. If someone else commits to the
super-project before you, it's always a conflict. What's annoying is
there's no way around it (though resolution is easy - force to current)
- and this is a big part of what confuses my users. They say 'but I
already resolved the merges in the submodule itself'. I'm not sure
there's an easy way around it though - and this is part of my worry
that there's hidden complexity with trying to make it 'look like 1 big

Yeah. I have an additional usecase, which is around pulling from
another user. If they've made changes in their tree(s) that they want
to get reviewed, normally I could do something like

git fetch ssh://joebloggs.computer/blah +refs/heads/*:refs/remotes/joebloggs/*

But if they've made cross-module changes, I'm SOL, as fetching their
super-project will have references to commits that aren't in the repo
mentioned in .gitmodules (only in joebloggs's tree) - so doing git
submodule update doesn't help. I have to go into each submodule and
explicitly fetch. It feels weirdly centralised for this otherwise
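
A minimal sketch of that manual step, assuming (hypothetically) that the
other user's submodule checkouts are reachable at the same relative paths
underneath their super-project URL. The function name fetch_all_from and
its arguments are made up for illustration.

```shell
#!/bin/sh
# Sketch only: fetch a colleague's branches in the super-project and then in
# each listed submodule. Assumes the colleague's submodule trees live at
# <base>/<submodule path>; fetch_all_from is a hypothetical helper.
fetch_all_from() {
    name=$1; base=$2; shift 2
    spec="+refs/heads/*:refs/remotes/$name/*"

    # Super-project first, using the same refspec as the one-liner above.
    git fetch "$base" "$spec" || return 1

    # Then each submodule, so the commits the super-project refers to exist.
    for sub in "$@"; do
        (cd "$sub" && git fetch "$base/$sub" "$spec") || return 1
    done
}

# Usage, from the top of the super-project:
#   fetch_all_from joebloggs ssh://joebloggs.computer/blah modA modB
```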

What's bugging me is I'm not sure that it's the right place. It seems
(to me) that having the only place that knows about submodules being
the 'git submodules' script isn't right. What users want is 'git fetch
<blah>' to do the lot - that, for the most part, it ought to do the
submodule init, update and clever stuff automatically. That if 'git
fetch' is porcelain, then the porcelain needs to call the
git-submodule stuff.

Hm - I'd be happy with the same commit message in all modules. What I
want is to be able to do (from the top) 'git commit -a' or the same
with the GUI, and see all the files to be committed regardless of
whether they're in a submodule or not.

I'm guessing you probably need to build a tree of submodul...

To: Nigel Magnay <nigel.magnay@...>
Cc: Git Mailing List <git@...>
Date: Thursday, July 17, 2008 - 11:12 am

This might not be as hard as it sounds. We probably just need to
teach the supermodule how to merge gitlinks safely. So basically, if
I moved the gitlink from A to B, and he moved it from A to C, then it
needs to check whether a fast forward merge already exists for the
submodule to combine B and C. This is easier than it sounds, because
if I *already* ran my newest-git-modules script in the inner module,
then I've already manually resolved the merge in question, so that B
*does* actually contain C.

Right now, such a thing results in a conflict. It isn't really a
conflict though, it's a fast forward, and the supermodule's merge
should ideally just notice that and run with it.
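
A minimal sketch of that check done by hand, outside the merge code. It
assumes a Git release newer than this thread, where
"git merge-base --is-ancestor" is available; the function name
resolve_gitlink and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: given two candidate gitlink values for one submodule, print the
# one that already contains the other, or fail if they have really diverged.
# resolve_gitlink is a hypothetical helper, not part of git.
resolve_gitlink() {
    sub=$1; ours=$2; theirs=$3   # submodule directory, commit B, commit C

    if (cd "$sub" && git merge-base --is-ancestor "$theirs" "$ours"); then
        echo "$ours"             # ours already contains theirs: keep ours
    elif (cd "$sub" && git merge-base --is-ancestor "$ours" "$theirs"); then
        echo "$theirs"           # theirs contains ours: fast-forward to theirs
    else
        echo "conflict" >&2      # diverged or commit not fetched: leave to the user
        return 1
    fi
}

# Usage, from the top of the supermodule:
#   resolve_gitlink path/to/submodule <commit B> <commit C>
```

If either commit is missing from the submodule, the ancestry test fails
and the function reports a conflict, which matches today's behaviour.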

Sadly I know very little about the merge code. But I would be happy
to help you test a patch that implemented this :)

A slightly more advanced version of the same would automatically walk
into the submodule and ask it to merge B and C. I suspect that is way
more complicated than it sounds at first glance, though (particularly
if the new B or C gitlink doesn't have A as a parent at all, which
couldn't happen in a unified git repo, but is perfectly allowable with
submodules).

With anything like this, there's always the question of what happens
if you haven't done a "fetch" in the submodule yet; I think reverting
to the current behaviour is fine in that case, because I can make

One slightly non-obvious option here is to actually use the *same*
repo for all your supermodules and submodules, then use "." as the
repo path in your .gitmodules. The original clone is huge that way,
but it makes it obvious how to get any objects that you're missing.

Then you could construct your submodules using --reference the
supermodule. Thus, doing a "fetch" of your user's supermodule, you'll
also get all the objects it references.

Note that I've only basically tried out this technique. I think it's
the one for me, but I haven't experimented with it enough to know any
pitfalls. When I've brought it up on the...

To: Nigel Magnay <nigel.magnay@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, July 16, 2008 - 6:47 am

And I think that this is the problem: If this way of committing your
changes is *required* in the *majority* of cases, then you are IMO outside
the intended use-case of submodules. You are better served by really
making this one big repo.

IMO, submodules are to be used if you can afford to advance parent project
and submodules at different paces; i.e. if the parent project can work
with newer versions of the submodules (and possibly in a degraded mode
even with outdated versions).

-- Hannes
--

To: Johannes Sixt <j.sixt@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, July 16, 2008 - 7:02 am

Hm - then my contention is that the scope of submodules needs to be
expanded (or something needs to be built on top).

One-big-repo doesn't fly - > 75% of the code volume (the 'other'
modules) are shared between multiple projects. In SVN these are just
svn:externals (which has its own imperfections).

I think it's a common usecase. You have 'shared' modules and
'project-specific' modules[*]. The 'shared' modules you hope don't
change very much, but they are part of the overall project
configuration - it's really nice that you can branch so easily in git,
then get the module owner to merge those changes into the next release
at their leisure. The superproject then represents the correct
configuration of submodule trees to make a valid build.

The machinery has everything that's required, it's just the user
experience sucks :(

[*] actually there's more subtlety, there's 'shared', 'product' and
'project', so some 'specific' modules are potentially re-shared
--

To: Nigel Magnay <nigel.magnay@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, July 16, 2008 - 7:35 am

Ah, is this your actual scenario? Just to make sure we are talking about
the same thing:

- You own superproject P.
- $Maintainer owns submodule S.
- You use S in P.
- You make changes to S that you would like $Maintainer to include in the
next release.
x You use in P your changes to S while $Maintainer has not yet released a
new version of S with your changes.
- Finally your changes arrive via the new release of S.

That *is* the intended use-case for submodules. But you have to play the
game by the rules:

- $Maintainer defines the official states of S.

- You must never commit an unofficial state of S in P.

The critical step in above list I marked with x:

- During the period where only *you* have the new changes to S, you must
*not* commit your submodule state to P. Instead, you write P in such a way
that it can work with both the old version of S and the upcoming release
that will have your changes[*]. This way you make sure that your consumers
of P always have a working version regardless of which version of S they use.

- After you have received the new release of S from $Maintainer, you
commit the new state of S in P. And if you are nice to your consumers of
P, then you *do not* remove the workaround from P just yet, so that you
don't force them to upgrade S. You will remove it later only if it becomes
a maintenance burden.

[*] If it is not possible to make P work with old and new versions, then
you have to work closely with the $Maintainer so that you never need
to commit an unofficial state of S into P.

-- Hannes

--

To: Johannes Sixt <j.sixt@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, July 16, 2008 - 8:48 am

Yes, that is the situation - with the proviso that it's not always
clear in company environments who $Maintainer actually is. For
example, if the only changes occurring in S come from me, then chances
are come release cycle, $Maintainer == me.

Yes - there is one branch ('master') which the changes eventually

If by that you mean that the only person to move the branch 'master'
is $Maintainer, then I agree.
If by that you mean that you can't commit at all to the S tree (and
the S submodule pointer) then I don't agree, and I think that's a

Just to be clear - there's more than just 'me' working on P - there's
a whole team of people working on it. And there's Q R S and T teams
also working on projects that also have S.

Changes that happen to S are, often, new features or bug fixes. We
can't just stop because there isn't an 'official' version of S yet
(and the official version might end up simply being a FF anyway), so
saying 'don't commit your submodule state to P' is unrealistic.

And that should be the big advantage of git. If we suddenly find we
need some additional functionality in S, we just add it to our
P-branch-of-S. The $Maintainer (if he exists) can review these
upcoming changes in the tree, and merge them to master as appropriate
(or work with the projects to iron out cross-branch
incompatibilities). The best example is that S is a "product", and (by
management decree), the only product changes that happen will occur
because of *projects* (like P). And we can do this (and it's
infinitely better than svn, where 'ooh, branches too hard, everyone in
--

To: Nigel Magnay <nigel.magnay@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, July 16, 2008 - 9:38 am

And I'm saying that submodules are designed for *loosely* coupled projects.

It's no wonder that this tool is awkward to use in your workflow.

-- Hannes
--

To: Johannes Sixt <j.sixt@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, July 16, 2008 - 10:03 am

Ok in a sense. I don't think it's particularly clear from the
documentation that this is a limitation of submodules though.

Given that
- The only way in git to separate out re-usable modules is by the use
of submodules
and
- It's a pretty common usecase for these submodules to be interrelated
and
- Looking over the list archives, it seems this is quite a common complaint

"I really like the git submodule implementation, I just don't like how
hard it is to work with"

"The current behaviour strongly encourages me to avoid submodules
when I would otherwise like to use them, just to keep the rest of my
team members (who are not git experts) from going insane."

"For my use case, I passionately dislike the fact that a submodule is
not updated automatically. There's never a time when I don't want to
update the submodule. The submodule is a very important piece of our
project and the super-project depends on it being at the right
version."

and
- All the technical capability is there, it's just the porcelain
that's causing the friction.
then
would this not seem to be an area that could be improved? Even if it
were an optional mode of working?
--

To: Nigel Magnay <nigel.magnay@...>
Cc: Git Mailing List <git@...>, Johannes Sixt <j.sixt@...>
Date: Wednesday, July 16, 2008 - 10:17 am

So, were there already any patches posted to add such a functionality
that were rejected? If not, apparently no one cared _enough_, yet. ;-)
You may be the first!

I don't know if there are any _present_ "free developers" willing to
pick up this task now. For many (most?) Git developers, submodules
simply aren't a priority. For me, they actually currently are, but I
probably won't want to use them in your way either (even though I can
agree that your sentiments are valid), so I will personally invest my
time in doing other things than figuring out the precise semantics
these operations should have etc.

--
Petr "Pasky" Baudis
GNU, n. An animal of South Africa, which in its domesticated state
resembles a horse, a buffalo and a stag. In its wild condition it is
something like a thunderbolt, an earthquake and a cyclone. -- A. Pierce
--

To: Petr Baudis <pasky@...>
Cc: Git Mailing List <git@...>, Johannes Sixt <j.sixt@...>
Date: Wednesday, July 16, 2008 - 10:31 am

That's cool. I was guessing it might be the case (or alternatively
that someone might say 'yeah, but it's 25% of the way there'); my
original query was also one of an offer of help ;-) My guess though is
that the core-devs have much more connected neural pathways at
thinking about the problems around the edge cases to be able to give
warnings of 'there be dragons'!

Nigel
--

To: Johannes Sixt <j.sixt@...>
Cc: Nigel Magnay <nigel.magnay@...>, Git Mailing List <git@...>
Date: Wednesday, July 16, 2008 - 8:11 am

I think the issue here is that $Maintainer = him (or Maintainers(P) =
Maintainers(S), in general); the workflow you described still works, but
is overly complicated and that is the original complaint.

Petr "Pasky" Baudis
--
