Hi all,

I have a fairly pressing need for git-submodule-like behaviour, but having tried git-submodule, it doesn't really work the way I'd like. A super-simplified example of what I do:

- Project A (app) includes project B (build environment), which includes project C (tool library).
- The projects are all open source, but B includes some binary packages, so it's a big download. People who don't need the binary packages want to download just C (hence the separation). But everyone using A wants B and C, because they're lazy and bandwidth isn't a problem.
- We have a local repository at work with mirrors of A, B, and C, which are also available publicly (but there's no reason for everyone in our office to be uploading/downloading the same big blobs all the time).
- We frequently change B and C as part of building A (as well as other A-like applications).

Here are the main problems, all in a jumble:

- It's a pain to check out / mirror / check in / push.
- git-submodule doesn't even init automatically when you check out A, so you have to run it yourself.
- The relative paths of A, B, and C on your mirror have to be the same as upstream.
- You can't make a local mirror of A without mirroring B and C.
- B and C start out with a disconnected HEAD, so if you check in, it goes nowhere, and then when you push, nothing happens, and if you're unlucky enough to pull someone else's update to A and then "git-submodule update", it forgets your changes entirely.
- When you check in to C, you then have to check in to B, and then to A, all by hand; and when you git-pull, you'd better do C, then B, then A, or risk having A try to check out a revision from B that you haven't pulled, etc.

...phew.

It would probably be possible to fix each of these problems individually, but it would be a whole series of different fixes. I'd like to propose a rather different way of doing things that I think would solve most of these problems, and get some feedback:

What if *all* the objects ...
Well, that would create a lot of unnecessary work when cloning. Partitioning by project is a natural way to divide the projects up. It's worth noting that the early implementations of submodules were based on this design, of keeping everything together. However, what you are suggesting should IMHO be allowed to work. In particular, if the submodule path is ".", then I think there's a good case that they should come from within the same project. If it's a relative URL, it should initialize based on the remote URL that was used for the original fetch (or, rather, the remote URL for the current branch). And, if it happens that after a checkout, that the commit of a submodule is already in the object directory (ie, there's another branch), then It could easily, if someone would allow clone to have a --track option like git remote does: git init This push failure thing is regrettable; however it's not clear which branch name the submodules should get. A given commit might exist on I think this could be a switch to git clone/pull, configurable to be the There is also a Google Summer of Code project for this - see Well, no, it's true that the current workflow has interface niggles; however it's important to understand why the current implementation is the way it is, and make sure that new designs build on top of the parts which are already designed well, where they can. Sam --
I solved that by adding a "submodule push" that pushes the detached head of each submodule to its own ref ("refs/submodule-update/commit-$sha1", imaginatively). I also made "submodule update" try to fetch that ref when looking for a sha1. I ran into trouble trying to avoid pushing every submodule for each "submodule push", and then more or less decided not to use submodules, so it's not quite fit for public consumption. I still think it's a sound idea in principle, so I'll clean it up and send it to the list if there's any interest. -- Eyvind Bernhardsen --
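A minimal sketch of the "submodule push" idea described above, not the actual patch. Only the ref layout `refs/submodule-update/commit-$sha1` comes from the message; the function names and the default remote `origin` are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; only the ref layout comes from the message above.

# Build the per-commit ref name for a given sha1.
submodule_update_ref() {
    printf 'refs/submodule-update/commit-%s\n' "$1"
}

# Push the current (possibly detached) HEAD of the repository in $1
# to its own ref on the remote $2 (assumed default: origin).
push_detached_head() {
    dir=$1
    remote=${2:-origin}
    sha=$(git -C "$dir" rev-parse HEAD) || return 1
    git -C "$dir" push "$remote" "$sha:$(submodule_update_ref "$sha")"
}

# Fetch commit $2 into the repository $1 through the ref that
# push_detached_head created for it; the result lands in FETCH_HEAD.
fetch_by_sha() {
    dir=$1
    sha=$2
    remote=${3:-origin}
    git -C "$dir" fetch "$remote" "$(submodule_update_ref "$sha")"
}
```

Because every pushed commit gets its own ref, a push of this kind cannot be rejected as a non-fast-forward, at the cost of accumulating one ref per commit on the remote.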
Hmm, a reasonable decision, but I think it would be better to force the user to choose which branch they want to push to. Leaving breadcrumbs Indeed - it can only become "fit for public consumption" if people submit their usability enhancements! Sam. --
Well, the point of "submodule push" was to avoid having to push in each submodule manually; not enforcing the requirement that commits in submodules must be publicly available before pushing from the main module is a recipe for disaster, or at least annoyance. And nobody likes an annoying git.

Pushing to a branch works except that I couldn't figure out what to do if the push doesn't succeed, ie, the branch has advanced on the remote end. That's a problem if more than one module references the submodule or there are multiple branches in the main module.

One solution that occurred to me was to have a branch in each submodule for every main module and branch. A branch name would be provided for each submodule in .gitmodules, used by "submodule push" but not "submodule update". In this case, if the push to the branch fails, the main module branch is probably behind too.

This seemed like a good idea, but it's racy. If two simultaneous "submodule push"es try to push to the same branch on a submodule, one of them will be rejected, but it might already have updated branches on other submodules. Ick.

I briefly toyed with creating tags named after the main module and its branch, with the submodule sha1 included for good measure, but that leaves a _real_ mess in refs/tags. Figuring out that I could use refs/submodule-push instead seemed like an epiphany at the time.

As an aside, my mental model of what the submodule needs is a fetchable reflog for every main module and branch that uses it, containing the history of commits used by that module/branch. It's a reflog, not a branch, because a submodule can be changed to a different branch, rewound, etc between commits in the main module; there's no requirement that the old commit is in the new commit's history. You actually don't want to fetch the whole thing, but you have to be able to fetch every sha1 contained in it, by sha1.

...so that's what refs/submodule-push is ...
It's simple. You just fail and tell the user what happened, and let If it is a rewind there is no issue, because you don't even need to push. But again it comes back to - let the user sort it out, don't try to be too clever. Sam. --
I think you misunderstood: what I'm saying is that submodules' _current_ behaviour is annoying, since you're guaranteed to forget to push a submodule before pushing the main module at least once. My attempt to solve that became too complicated, so I dropped it, and since the current behaviour is annoying, I gave up on submodules Sure, that solves the annoyance problem, but I wanted something more Yep, my problem was wanting to be cleverer than my limited git skills will allow. -- Eyvind Bernhardsen --
On Sun, Mar 30, 2008 at 3:50 PM, Eyvind Bernhardsen That's easy: just error out in that case. If the current system would just error out when I screwed up, I'd at least be able to deal with it. Right now I silently create un-check-outable parent repositories because I failed silently to upload my latest checkins to the child What is unsafe about "submodule update"? Thanks, Avery --
As I tried to explain, all the automatic push solutions I could come up with were flawed, so I decided not to use submodules at all and just have the build tool check out every module (that's what we currently do with CVS, so it's the easy way out anyway). If I understand you correctly, you want to be forced to create a branch and push to that? I don't think that works well with many developers pushing to a shared repository (my situation), and is in any case not the "automagical push" solution that I want. I agree If you have local changes committed in a submodule that is updated by a pull in the main module, "submodule update" will silently overwrite them. I was wrong, though, because you can fix that just by making "submodule update" error out when a submodule doesn't have its HEAD where the main module thinks it should be. -- Eyvind Bernhardsen --
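A minimal sketch of the check suggested above, under stated assumptions: the function name is hypothetical, and the comparison is done by hand with `git ls-tree` and `git rev-parse` rather than inside "submodule update" itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the submodule at path $2 inside
# the superproject $1 has its HEAD at the commit recorded for it in the
# superproject revision $3 (default: HEAD).
submodule_head_matches() {
    super=$1
    path=$2
    rev=${3:-HEAD}
    # ls-tree prints "<mode> <type> <sha><TAB><path>"; a gitlink entry
    # has mode 160000.
    recorded=$(git -C "$super" ls-tree "$rev" "$path" |
        awk '$1 == "160000" { print $3 }')
    actual=$(git -C "$super/$path" rev-parse HEAD 2>/dev/null)
    [ -n "$recorded" ] && [ "$recorded" = "$actual" ]
}
```

After a pull in the main module, passing the pre-pull revision (for example `ORIG_HEAD`) as the third argument would show whether the submodule still sits where the main module last left it; if it does not, an update could stop instead of overwriting local commits.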
On Mon, Mar 31, 2008 at 5:29 AM, Eyvind Bernhardsen

I even *use* git-submodule and had to modify my build scripts because "git submodule init" and "git submodule update" don't seem to kick in automatically for some reason. The ideal situation would be to have git just manage the version control without having to babysit it, of course. That's hard to do in the general case, but should be quite

Hmm, this is curious. If you're *not* using submodules, then I don't think you can push successfully without being on a branch, can you? So the suggestion merely extends this behaviour to submodules.

(To be more precise, 'git push' seems only to be able to push branch heads. When you're not using git-submodule, commits are by default attached to branch heads, so this doesn't cause a problem. If you disconnect your HEAD, trying to push will silently do nothing, because it'll push some other branch head that hasn't changed, or maybe no branch at all. But with git-submodule, the *default* is a disconnected HEAD, which is too dangerous. I propose to simply have it fail out in this case.)

If you 'git checkout -b branchname' inside a submodule, then 'git push' will do the right thing, so I'm not sure what you'd want to be

Shouldn't "git merge" get a merge conflict if you've made a checkin that changed the submodule pointer, then try to pull someone else's checkin that changes the submodule pointer to something else? It would seem there's no better option than that.

While we're here, it's inconvenient to have to call "git submodule update" at all when there *isn't* a conflict. It should always be safe for git checkout or git merge to do that for you, no?

Thanks, Avery --
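A minimal sketch of the "fail out on a disconnected HEAD" proposal, written as a wrapper check rather than a change to git itself. The function names are hypothetical; `git symbolic-ref -q HEAD` is the real command that exits non-zero when HEAD does not point at a branch.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the repository in $1 (default: .)
# has a detached HEAD. Note that a directory that is not a git
# repository at all is also reported as detached by this sketch.
is_detached_head() {
    ! git -C "${1:-.}" symbolic-ref -q HEAD >/dev/null 2>&1
}

# Refuse to continue on a detached HEAD instead of silently doing nothing.
require_branch() {
    if is_detached_head "${1:-.}"; then
        echo "error: HEAD is detached; create a branch first" \
             "(git checkout -b <name>)" >&2
        return 1
    fi
}
```

A script could run `require_branch path/to/submodule || exit 1` before committing or pushing there, so the mistake surfaces immediately rather than after the commits have become hard to find.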
The reason is that not everyone wants that by default. Perhaps it is a good idea for it to be default behaviour; but all in good time. It can

I can understand the motivation to write such disparaging remarks; however it may be more productive to come up with good ideas about how it can be made to work better for you, without getting in the way of

Sure, you could;

    git push origin HEAD:branchname

However I think the right solution to this is to name the branch

Well, where did you get the branch name from? That's the part that requires user intervention. You could make an educated guess, such as with git name-rev, but it would not necessarily be the right guess - so user confirmation of the choice would be desirable.

Sam. --
I didn't mean anything disparaging. I have nothing against babysitters :) I'll be happy to work on patches once we have some sort of consensus

Okay, yes. But that's just arbitrarily avoiding a local branch and creating a remote one instead. I can't imagine a situation where you'd really want the local branch to be anonymous while the remote one is not. When doing a normal "git clone" without submodules, git automatically creates you a local branch with the same name as the remote's .git/HEAD - which is rather arbitrary, but even an arbitrary local name is better than no name, and when checking out a brand new submodule, there are *no* local branches, so a name conflict is

Here's a paraphrase of what I suggested earlier. I don't think it got a response:

Instead of storing only the commitid of each submodule in the parent tree, store the current branch name as well. Use this as a hint to 'submodule update' so that when it checks out commitid, it names the local branch with the same name as it used to have. (This is rather user-friendly since if I check in, push, and clone, my new submodule checkout will have the same branchname as it used to have.)

Note that the newly checked-out submodule branch will probably have the same name as a remote branch. However, the remote branch may refer to a different commitid (for example, if someone has pushed to that branch after the parent repo was last updated). This is exactly right; it means that if I cd into the submodule and "git push", it'll fail because I'm not up to date (I can always switch to a new branch if I want), and if I "git pull", it'll pull from the place where it should.

This way, cloning a project with submodules will work much like cloning the parent project; pushing and pulling the parent and the submodules will do as you expect.

The bad news is that this would require a change to the tree format for submodules (to contain the branch name). Is that a problem? Can it be done in a ...
That goes quite against the fundamental design of git submodules in that the submodules are by themselves independent entities.

An often-cited example is an appliance project, where superproject bundles a clone of Linux kernel and a clone of busybox repositories as its submodules. Each submodule is an independent project, and as such, must not know anything about the containing superproject (iow, the superproject can know what the submodules are doing, but submodules should not know they are contained within a particular superproject).

If your superproject (i.e. the appliance product) uses two branches to manage two product lines, named "v1" and "v2", these names are local to the superproject. It should not force the projects you borrow your submodules from to have branches with corresponding name.

Also, the submodules and the superproject are meant to be loosely coupled. A single branch in superproject (say "v1") may have many branches in a submodule ("add frotz to v1 product", "improve nitfol in v1 product") that can potentially be merged and bound to.

The work flow for updating a tree would look like:

- People "git fetch" in superproject and in its submodules. They obviously prime their tree with "git clone" of superproject, and the submodules they are interested in, and a single fetch will update all the remote tracking branches, so it does not really matter which branch is checked out. However, if you employ a central repository model to keep them, an invariant must hold: all the necessary commits in submodules must be _reachable_ from some branch in them.

- When not working in a particular submodule, but using it as a component to build the superproject, it would be better to leave its HEAD detached to the version the superproject points at. IOW, usually you won't have to be on any branch in submodules unless you are working in them.

- Sometimes you need to work in a submodule; e.g. you would want to add 'frotz' tool ...
Not sure what you mean here; the supermodule already stores the commitid of the submodule. All I'm proposing is that it also store the default branchname (ie. the branchname that the submodule was using when its gitlink was checked into the supermodule) along with that commitid. The submodule never knows anything about the I meant that we should store the submodule's branch name when committing the superproject, and put it back when checking out the I agree that the submodule should have its HEAD pointing at exactly the superproject-specified commit. However, I believe this commit should have a local branch name (in the subproject) attached to it, or else (as I and my co-workers have frequently experienced) people will accidentally check in to a nameless branch, causing 'git push' to silently not upload anything, and thus lose track of their commits. I have lost work this way. The idea of naming the local-subproject-branch with the same name as it had on checking is that then "git pull" in the subproject will work exactly as expected: it'll get you the latest version of the branch the superproject developer was on. But if you *don't* explicitly "git pull" in the subproject, I'd expect (of course) the checkout to stick to the commit specified by the superproject - and also to leave its This is where my workflow is a bit different. One of my subprojects is a library that gets used by several application superprojects. I often add features to my library in the process of editing a particular superproject. I also expect my co-developers to want to do the same. Thus, the difference from your example is that I want to streamline the process of working in a subproject as well as a superproject, and minimize the chances of losing data in this case. With the current system the way it is, it's too easy to make mistakes, As an orthogonal secondary wish, I'd like to have the subproject and superproject hosted in the same remote repository. This appears to ...
How about this. This could be an optional disambiguator in .gitmodules in the superproject, to allow you to "store the branch it was made from". Glue to make this automatic/easy optional. When updating a submodule, with an option set (or configured; which might even later become a default if people like it enough), it will try to figure out a reasonable branch for that commit, using git-name-rev, and check out the branch with that name. It first uses the hint above as an argument to git-name-rev --refs=XX, and if that doesn't provide a reasonable answer then look for any branch. I think this approach would not get in the way of people who don't want I think this is a separate argument against git-push, the default behaviour of which also causes me to tell people not to use the argument-less form of git-push until they understand how to use the two-argument form. In the context of git-submodule, adding features to it to avoid this if It's not really the local branch name anyway, it's how the default push gets configured; perhaps it's worth distinguishing which part you are Yes - you've already seen the SoC plan for that, although I believe no students applied for that one, and if you think it's minor enough to do, I'd appreciate that feature - though I'm more interested in making sure that I don't push anything where the submodule commit is not available via the URL listed in .gitmodules. Presuming such a check, would that check happen at push time, or do you check at a different time, such as when committing, or when adding the submodule to the index? I think checking that referential integrity is something perhaps easier to bite off and get people to agree on. I think it would solve the overall process problem, by forcing people to push the submodule before the commit of the superproject can succeed without forcing. Thoughts/comments? Sam --
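A minimal sketch of the branch-guessing step described above, under stated assumptions: the function names are hypothetical, the hint is passed in as a ref pattern, and the fallback pattern `refs/heads/*` stands in for "any branch". `git name-rev --name-only --refs=<pattern>` is the real command being wrapped.

```shell
#!/bin/sh
# Hypothetical helper: reduce name-rev output such as
# "remotes/origin/master~2" or "topic/foo^0" to a bare branch name.
branch_from_name_rev() {
    name=${1%%[~^]*}                      # drop ~N / ^N suffixes
    case $name in
        remotes/*/*) name=${name#remotes/*/} ;;  # drop "remotes/<remote>/"
    esac
    printf '%s\n' "$name"
}

# Guess a branch for HEAD of the repository in $1, trying the hint
# pattern $2 first and then falling back to any local branch.
guess_branch() {
    dir=$1
    hint=$2
    for pattern in "$hint" 'refs/heads/*'; do
        [ -n "$pattern" ] || continue
        found=$(git -C "$dir" name-rev --name-only \
            --refs="$pattern" HEAD) || continue
        [ "$found" = undefined ] && continue   # no ref matched
        branch_from_name_rev "$found"
        return 0
    done
    return 1
}
```

The result is only a guess: a commit reachable from several branches can be named after any of them, so asking the user to confirm the choice would still be reasonable.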
It's not just racy, but I think it's wrong to limit to _one_ branch in
each submodule.
A submodule is an independent project on its own.
Suppose the commit DAG in the submodule looked like this:
            o---o
           /     \
--o---o---o---o---o---X---o---Z
       \                 /
        o---o---o---o---o---o
             \     /
              o---o
and the superproject points at commit X. You may need to tweak the
submodule to make it work better with the change you are making to the
superproject.
You have two choices:
(1) update to some "stable" branch head that is descendant of X first,
and make sure it works with the superproject. Then develop on top of
it, and bind the tip of such development trail to the superproject:
            o---o
           /     \
--o---o---o---o---o---X---o---Z---o---o---Y (your changes are Z..Y)
       \                 /
        o---o---o---o---o---o
             \     /
              o---o
I think this is what you are suggesting. But the superproject may not be
ready to use the submodule with the history from the lower side branch
merged in. You would
(2) fork off of X and develop; bind the tip of such development trail to
the superproject. IOW, you make the submodule DAG like this, and
then "git add" commit Y in superproject.
            o---o       o---o---Y (your changes)
           /     \     /
--o---o---o---o---o---X---o---Z
       \                 /
        o---o---o---o---o---o
             \     /
              o---o
Sometimes forked branches need to be maintained until they prove stable
(and then your "tip" Y may be merged back to the tip of a public branch
Z). So you would at least need to allow a set of topic branches in
submodules that corresponds to a single lineage of superproject history.
Then when both Z (with the changes from the lower side branch) and ...

What unnecessary work do you mean? Certainly fetching only a particular set of refs from a remote repository is possible, as that's what 'git pull' does.

I agree that partitioning by project makes sense... but it also seems to me that throwing extra objects into a repository that requires them anyhow shouldn't have any major negative results. After all, if you can't build A without B, then downloading A might as well download the objects from B too. Which is not to say that B shouldn't *also* have

I'd like to read about the rationale behind this change. Is there a

I agree, there's no reason to take away the existing functionality of allowing split repos. I was more suggesting a new functionality so

One option is to make a simple "git push origin" operation fail if you're not on any branch; iirc, if you try that now, it just silently *succeeds* without uploading anything at all, which is one reason I so frequently screw it up.

Alternatively, is there a reason I can't upload an object *without* giving it a branch name? I guess that would cause problems with garbage collection.

Now, the fail-on-branchless-push option still isn't really perfect, because then I'll screw up like this:

- make change
- check in
- try to push: fails
- switch to branch
- realize I've lost my checkin(s) and have to go scrounge in the reflog to try to find it

If we could disallow checkins to disconnected heads, then I'd get an error at step 1, before I had a chance to screw up. I think that would be a usability improvement to git in general. For example, if I screw up a git-rebase and forget to abort, my HEAD ends up disconnected and I occasionally check things in by accident and then lose them (only to be saved by the reflog). Perhaps an extra option to git-commit that must be used if you want to check into a non-branch? Is that too harsh?
Another option would be to simply *always* create/update a branch tag when doing "git submodule update". But then the question is ...
A full clone takes a few shortcuts, especially over dumb transports like HTTP. I think there might be shortcuts in the git-daemon code as well. Forcing these to be partial might make these full fetches involve more If you think it is simpler, then I'm sure that submodules users would appreciate you sharing your ideas as a patch. Sorry if I am starting to sound like a parrot ;-). Sam. --
Would a "recurse" sub-command help your workflow? http://thread.gmane.org/gmane.comp.version-control.git/69834 -- Hannes --
Well, typing "git submodule recurse push" or something would allow me to lose the same data without typing quite as much, so strictly speaking I guess it would be an improvement :) I'd like it even more if "git push" actually somehow refused to push at all if I forgot to push in the submodules. Have fun, Avery --
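A minimal sketch of the kind of check wished for above, assuming that "pushed" can be approximated as "reachable from some remote-tracking branch". The function name is hypothetical, and the check relies on local remote-tracking refs, so it reflects only what the last fetch or push saw.

```shell
#!/bin/sh
# Hypothetical helper: succeed if commit $2 in the repository $1 is
# reachable from at least one remote-tracking branch.
commit_is_published() {
    dir=$1
    sha=$2
    # "branch -r --contains" lists remote-tracking branches whose
    # history includes the commit; it fails if the commit is unknown.
    branches=$(git -C "$dir" branch -r --contains "$sha" 2>/dev/null) ||
        return 1
    [ -n "$branches" ]
}
```

A wrapper around "git push" in the superproject could call this for each submodule path with the commit recorded for it, and refuse to push while any of them reports failure.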
