> branch to track in the submodule?
The reason I thought it would have to be HEAD at all times is to prevent situations where the supermodule commit doesn't reflect the state of the current tree. Let's imagine that we're doing non-HEAD tracking in the supermodule.

  supermodule
   +-------- libsubmodule1
   +-------- libsubmodule2

So, you do a "make" in supermodule; this of course will call make in each of the submodules. You test the output and find that it's all working nicely. Time for a supermodule commit. We want to freeze this working state. You commit and tag "supermodule-rc1".

Unfortunately, during development, you've switched libsubmodule1 to branch "development", but supermodule isn't tracking libsubmodule1/HEAD; it's tracking libsubmodule1/master. Your supermodule commit doesn't capture a snapshot of the tree you're using.

Now you say to the mailing list, "hey guys, can you test supermodule-rc1?" They check it out and find that everything is broken. Why? Because what you wanted to check in was libsubmodule1@development, but what actually went in was libsubmodule1@master.

I think I've talked myself into the position where it definitely has to be HEAD being tracked in the submodules; anything else is a disaster waiting to happen because commit doesn't check in your current tree.

Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@gmail.com
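The failure mode in the message above can be made concrete with a toy model. This is only a sketch: the `Submodule` class, the `snapshot` function, and the commit ids are hypothetical stand-ins and are not part of git or of either proposed implementation.

```python
from dataclasses import dataclass


@dataclass
class Submodule:
    """Toy submodule: branch names mapped to (made-up) commit ids."""
    branches: dict
    head: str  # name of the currently checked-out branch

    def head_commit(self):
        return self.branches[self.head]


def snapshot(sub, track):
    """Return the commit id a supermodule commit would record.

    track == "HEAD" records whatever is checked out;
    any other value is treated as a fixed branch name.
    """
    if track == "HEAD":
        return sub.head_commit()
    return sub.branches[track]


# libsubmodule1 was switched to "development" before building and testing.
lib1 = Submodule(
    branches={"master": "aaa111", "development": "bbb222"},
    head="development",
)

tested = lib1.head_commit()                  # what "make" actually built
recorded_by_head = snapshot(lib1, "HEAD")    # HEAD tracking
recorded_by_branch = snapshot(lib1, "master")  # fixed-branch tracking

print(tested == recorded_by_head)    # True
print(tested == recorded_by_branch)  # False
```

With fixed-branch tracking the tag would point at `aaa111` although `bbb222` was the tree that got tested, which is the mismatch the message warns about.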
hoi :)

The way I wanted to address this is to show in the supermodule's git-status that the submodule is using another branch. That way you are warned and can decide not to commit the supermodule.

I implemented tracking of refs/heads/master (not HEAD) without much thinking, and only recently began to think about possible problems with this approach. But I think it is an important design decision, so I'd like to have consensus here.

Pro HEAD:
 - update-index on the submodule really updates the supermodule index with a commit that resembles the working directory.

Contra HEAD:
 - HEAD is not guaranteed to be equal to the working directory anyway; you may have uncommitted changes.
 - when updating the supermodule, you have to take care that your submodules are on the right branch. You might, for example, have some testing-throwaway branch in one submodule and not want to merge it with other changes yet.

Pro refs/heads/master:
 - the supermodule really tracks one defined branch of development.
 - you can easily overwrite one submodule by changing to another branch, without fearing that changes in the supermodule change anything there.

Contra refs/heads/master:
 - after updating the supermodule, you may not have the correct working directory checked out everywhere, because some submodules may be on a different branch.
 - there is one branch in the submodule which is special compared to all the others.

I think that most of the disadvantages of refs/heads/master can be solved by printing the above-mentioned warning in git-status when the submodule is using another branch (similar to the planned-but-not-implemented warning if the submodule has uncommitted changes).

I don't yet know how to cope with tracking HEAD directly, so I'm still in favor of tracking refs/heads/master, as already implemented.

--
Martin Waitz
The problem I see with tracking a particular branch is that it makes it less convenient to use git's quick-branching features in the submodules. Let's say I want to try something out quickly in a submodule, I make a branch, commit, commit, "hmm, looks good, let's snapshot it in the supermodule", make a supermodule branch, "oh no, I've got to tell the supermodule to track the new (but temporary) branch in the submodule do a commit, switch the submodule branch back to master, delete the temporary branch, remember that the supermodule is tracking that branch and tell the supermodule to track Ouch. Why does the submodule need to update the supermodule index? That should be done by update-index in the supermodule. Further, how is the supermodule index going to represent working directory changes in the submodule? The only link between the two is a commit hash. It has to be like that otherwise you haven't made a supermodule-submodule, you've just made one super-repository. Also, if you don't store submodule commit hashes, then there is no way to guarantee that you're going to be able get back the That's the case for every file in a repository, so isn't really a worry. It's the equivalent of changing a file and not updating the index - who cares? As long as update-index tells you that the submodule is dirty and what to do to What is the "right" branch though? As I said above, if you're tracking one branch in the submodule then you've effectively locked that submodule to that branch for all supermodule uses. Or you've made yourself a big rod to beat yourself with everytime you want to do some development on an "off" branch on You can always do that anyway by simply not running update-index for the This seems like the biggest problem to me - doesn't this negate all the advantages of a submodule system? After a check in, you have no idea if what you checked in was what was in your working tree. Andy -- Dr Andrew Parkins, M Eng (Hons), AM...
hoi :)

What about: You decide to try something out quickly and create a new branch in the submodule. After you have verified that it works, you merge it to the submodule's master branch and commit that to the supermodule. Not that complicated, is it?

In fact, my current implementation does not even allow to change the

Please excuse that I am not a native English speaker and I may have

That is exactly what I wanted to say. In the supermodule you call update-index (with the submodule path as argument) to update the index

This is handled in the next paragraph.

The argument really is: HEAD always points to the checked out branch,

Yes, it's not a real counter-argument, but it puts the previous

You always know which branch in the submodule is the "upstream" branch which is managed by the supermodule. You can easily have several topic branches and merge updates from the master branch. Otherwise you always have to remember which branch holds your current contents from the supermodule.

When viewed from the supermodule, you are storing one branch per

Suppose you are working on a complicated feature in one submodule. You create your own branch for that feature and work on it. Now you want to update your project, so you pull a new supermodule version. Now this pull also included one (for you unimportant) change in the submodule. I think it is clearer to update the master branch with the new version coming from the supermodule, while leaving your work intact (you haven't committed it to the supermodule yet, so the supermodule should not care about your changes; it's just some dirty tree). Then you can freely merge between your branch and master as you like and are not forced to merge at once. And perhaps you do not even want to merge at all, because you are on an experimental branch which really is

Of course you know: git-status will tell you. This is no different from today, where you can commit while still leaving a part of the tree dirty.

--
Martin W...
WHAT? I've got to make merges (that I don't necessarily want) in order to commit in the supermodule? This completely negates any useful functioning of branches in the submodule. I want to be able to make a quick development branch in the submodule and NOT merge that code into master and then be able to still commit that in the supermodule.

I think you're imagining the binding between the super and sub is very much tighter than it should be. What if I'm working on a development version of the supermodule, which includes a stable version of the submodule? Vice

That prevents me "trying something out" on a topic branch in the submodule.

Here's a scenario using my suggested "supermodule tracks submodule HEAD" method.

 * You're developerA
 * Make a development branch in the supermodule
 * In the submodule, make a whole load of topic branches
 * Make a development branch in the submodule
 * Merge the topic branches into the development branch of the submodule
 * Commit in the supermodule. This capture
 * Tag that commit "my-tested-arrangement-of-submodule-features"
 * Push that tag to the central repository - tell the world.
 * DeveloperB checks out that tag and tries it. Great stuff.

Now: here's the secret fact that I didn't tell you that will break your "supermodule tracks submodule branch" method. DeveloperB has decided to have this in his remote:

  Pull: refs/heads/master:refs/heads/upstream/master

Oops. The supermodule, which has been told to track the "master" branch in the submodule, is tracking different things in developerA's repository from developerB's repository. Worse, what if developerB did this:

  Pull: refs/heads/master:refs/heads/development
  Pull: refs/heads/development:refs/heads/master

Branches are completely arbitrary per-repository. You cannot rely on them being consistent between different repositories. If you store the name of a submodule branch in a supermodule - that supermodule is only valid for that one special case of yo...
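The point about per-repository branch names can be sketched with a small simulation. It assumes a simplified reading of the `Pull:` lines as plain `source:destination` pairs (no `+` prefix or wildcards); the `apply_pull` helper and the commit ids are hypothetical.

```python
def apply_pull(remote_refs, pull_lines):
    """Map remote refs to local refs according to 'src:dst' pull lines."""
    local = {}
    for line in pull_lines:
        src, dst = line.split(":", 1)
        local[dst] = remote_refs[src]
    return local


# The same upstream submodule repository, seen by both developers.
upstream = {
    "refs/heads/master": "c0ffee1",
    "refs/heads/development": "deadbee2",
}

# developerA keeps the upstream names.
local_a = apply_pull(upstream, [
    "refs/heads/master:refs/heads/master",
    "refs/heads/development:refs/heads/development",
])

# developerB swaps them, as in the message above.
local_b = apply_pull(upstream, [
    "refs/heads/master:refs/heads/development",
    "refs/heads/development:refs/heads/master",
])

print(local_a["refs/heads/master"])  # c0ffee1
print(local_b["refs/heads/master"])  # deadbee2
```

A supermodule that stores only the name "master" would resolve to different commits in the two repositories, whereas a stored commit id is the same everywhere.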
hoi :) exactly! Please think about it. If you track HEAD, then this means that you track HEAD. In _both_ directions! So you not only store your submodule HEAD commit in the supermodule when you do commit to the supermodule, it also means that your submodule HEAD will be updated when you update your supermodule. And what happens if you already commited something to HEAD in the mean time? Exactly: a merge is needed. And you are right: you might not want to do this now, because you branched off, because you _wanted_ to have some development which is _independent_ to the current supermodule work. So tracking HEAD really makes branching in the submodule hard to work with. What does the supermodule provide to the submodule? It stores one reference to a commit sha1. Just like a reference inside refs/heads inside the submodule. There really is not much difference between the sha1 stored inside the supermodules tree and one stored inside refs/. So from the submodules point of view, the supermodule is not much more then one special branch. But it is not possible to use the supermodule index directly as one "magic" branch for several reasons. So we need synchronization methods between the index entry for the submodule which is stored in the supermodule and the references in the submodule. These are git-update-index/git-commit and git-checkout, both called explicitly or implicitly in the supermodule. And I really think it makes sense to have a one-to-one relationship between the submodule "branch" stored in the supermodule and the This is still supposed to be a distributed system. DeveloperB does not only check out the whole project including several modules. He is also supposed to _work_ with it. What if DeveloperB also has several topic branches? When he checks out the new supermodule, only his current HEAD in the submodule will be updated. So he first has to change to some supermodule-tracking branch inside the submodule, then pull the supermodule updates, then eve...
Martin Waitz wrote:

Why the magic? The typical workflow in git is

 1. You work on a branch, i.e. edit and commit and so on.
 2. At some point, you decide to share the work you did on that branch (e-mail a patch, merge into another branch, push upstream or let it be pulled by upstream)

I fail to understand why these two steps have to be mixed up. Someone care to explain?

Regards
Stephan
hoi :)

 3. Other people want to use your new work.

--
Martin Waitz
Sorry if that was not obvious: you actually proceed with one of the options I listed in Step 2. What I wanted to state is that with git you do not mix up committing (which is local to your repository and your branch) and publishing.

Regards
Stephan
hoi :)

I guess you are referring to not mixing up committing to the submodule and updating the supermodule index. These are really two separate steps; I just combined them above because I wanted to put emphasis on the other part: it is not a one-way flow, it is bidirectional, so your HEAD would have to be changed if the supermodule gets updated. And I consider changing HEAD, without looking at the branch it points to, to be a bad thing.

--
Martin Waitz
The opposite: If you work in the supermodule, even if it is in the code of the submodule, you only commit to the supermodule. The submodule does Why do you mix up supermodule and submodule? The way I see your proposal you cannot change submodule and supermodule independently. That is a huge drawback. Regards Stephan -
hoi :)

I think we are using totally different definitions of "submodule". For me a submodule is responsible for everything in or below a certain directory. So by definition when you change something in this directory, you have to change it in the submodule.

You can't change the submodule contents in the supermodule without also changing the submodule. This is just like you can't commit a change to a file without also changing the file.

Then the supermodule just records the current content of the entire tree. The only new thing is that instead of simple files there are now

No, this is the benefit you get by introducing submodules. Why would you want to introduce a submodule when it is not linked to the supermodule?

--
Martin Waitz
Not so different. The way I see it is that "I" (meaning with submodules implemented as I proposed) could pull regularly from "your" repositories (implemented as you proposed) and work with the result (including

But you do not consider the case where you cannot change the submodule because you do not own it. For example, git has the subproject xdiff. If git had been able to work with subprojects as I envision, and if xdiff had been published as a git repository (not necessarily subproject enabled), it could have been pulled in git's subdirectory xdiff as a subproject. There would not have been a separate branch or even repository for xdiff in the git repository. All changes to xdiff in git could have been committed to the git repository only. Independently, they could have been published to upstream and be put into the xdiff repository by its author. But the last part is what only the owner of the xdiff repository is able to decide.

(Ok, ok... the example sucks badly because xdiff has been massively changed for its usage in git so the changes would not be integrated by upstream. But you can imagine where you use a library essentially as is, only if you discover bugs you fix them immediately in your repository and keep those fixes in your version of the library, even on upgrade,

There is a difference. I would say: If you commit a change to a file in

Yes, and that is all you need. If the changes are to be part of a branch

Because the submodule must be independent of the supermodule. I see where you are coming from. You have one project that is divided into subprojects but the subprojects themselves are not independent. What I would like to solve is the following: You have a project X, and this project is made part of two other projects Y and Z (as a submodule or subproject or whatever you want to call it). The project X need not, must not or cannot care that it was made a subproject. But in projects Y and Z, you must be able to bugfix or extend or m...
hoi :) Sorry, but with all that many people proposing things I am a bit lost now. Sometimes I thought you want exactly the same thing as I do, I do not understand you here. The submodule is part of the supermodule, and the one who sets up the repository owns the whole thing, including all submodules, just like all the files which are part of the project. If you mean the upstream repository of the submodule, then yes, this is of course completely separated from the submodule and may be owned by someone else. Consequently, this upstream repository of course does not This could have been done if submodule support would have been available Yes, but if it would have been integrated as a submodule it obviously would have been committed to the xdiff submodule inside the git repository. So the changes are really part of the git repository, but you could go to the "git/xdiff" directory and only see the changes in the submodule, But you need to change _at_least_ one branch. Otherwise you cannot commit to a branch. So if you change something in a submodule, you have to change one branch in the submodule. If you call git-checkout in the supermodule this will result in Of course. So if you wanted to check out everything, you could have something like ~/src/X, ~/src/Y/X, and ~/src/Z/X. All of these would be GIT repositories, all of them have their independent branches. What I am saying is just that if you update Y, and the new Y contains an updated version of X, then ~/src/Y/X/.git/refs/heads/master will be changed by the pull, resulting in the new version of X being checked out in ~/src/Y/X (alongside all the other updates inside ~/src/Y). No ;-) --=20 Martin Waitz
We are in agreement about two fundamental parts of the implementation and their meaning:

 1. A submodule is stored as a commit id in a tree object.
 2. Every object that is reachable from the submodule's commit is reachable from the supermodule's repository.

Please confirm.

If you mean by "owns the whole thing" what I stated above in 2. the we

That's it: There is no need for a separate branch or repository. If you have the subproject's commit in the superproject's object database (and we really have that, see 1. and 2. above), why do you _have to_ store it

No. The xdiff submodule would only exist as part of the git repository. You could, e.g., access the xdiff commit in git HEAD as HEAD:xdiff// (again my proposed syntax). HEAD:xdiff//~2:xemit.c would give you the grandparent of xemit.c in the xdiff submodule. And so on. You can even

If you mean the submodule repository created by init-module I

Sorry, have to leave for home so I must leave that uncommented. Hopefully I can join in during the weekend.

Regards
Stephan
I'm still not convinced about 2. Why should any of the submodule commits be in the supermodule repository? I know that is what you've implemented, but it still feels like too much of a blending of the submodule into the supermodule. In fact, why should the submodule commits be even visible in the supermodule? That tree->submodule commit is sufficient; there isn't any need to view submodule history in the supermodule. Andy -- Dr Andrew Parkins, M Eng (Hons), AMIEE andyparkins@gmail.com -
hoi :)

Well, but there is a need for a common object traversal. You need that when sending all objects between two supermodule versions and also when you determine which objects are still reachable.

The easiest way to implement the common object traversal is to have all objects in one object repository.

It may be possible to use two object stores and still do the common object traversal but I do not think that gives you any benefits. You still don't have a totally separated repository then, because you can't do a reachability analysis in the submodule repository alone.

--
Martin Waitz
No you don't; when traversing the supermodule history you will come across trees that have submodule commit hashes in them; that is all the other end needs to know. If it wants it can then connect to the submodule and clone submodule to submodule. The whole operation doesn't have to be done in the

That's true; but is it the right way? I really really think the submodule

There is one benefit - you can git-clone the submodule just as you would if it were not a submodule. In fact, from the submodule's point of view it knows

I'm going to guess that by reachability analysis you mean that the submodule doesn't know that some of its commits are referenced by the supermodule. As I suggested elsewhere in the thread, that's easily fixed by making a refs/supermodule/commitXXXX file for each supermodule commit that references a particular submodule commit. Then you can git-prune, git-fsck whenever you want.

Andy
--
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com
hoi :)

The submodule repository obviously has to be able to reach all its objects. This is easily doable with the shared object database.

I wouldn't call this "easily".

--
Martin Waitz
Of course it is; when you write a supermodule commit you have its hash, $SUPERMODULE_HASH, and you have the commit hash of the submodule commit you're referencing, $SUBMODULE_HASH. It's not really hard to do

  echo $SUBMODULE_HASH > submodule/.git/refs/supermodules/commit$SUPERMODULE_HASH

Is it?

Andy
--
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com
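The one-line `echo` above amounts to writing a small file under the submodule's git directory. A minimal sketch of the same step follows; the `refs/supermodules/` namespace is the proposal from this thread and not an existing git convention, and `record_supermodule_ref` and the short hashes are hypothetical names used only for illustration.

```python
import tempfile
from pathlib import Path


def record_supermodule_ref(submodule_git_dir, supermodule_hash, submodule_hash):
    """Write refs/supermodules/commit<supermodule_hash> containing the submodule hash."""
    ref_path = (
        Path(submodule_git_dir)
        / "refs"
        / "supermodules"
        / f"commit{supermodule_hash}"
    )
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    ref_path.write_text(submodule_hash + "\n")
    return ref_path


# Use a throwaway directory instead of a real repository.
workdir = tempfile.mkdtemp()
git_dir = Path(workdir) / "submodule" / ".git"

ref = record_supermodule_ref(git_dir, "1111aaaa", "2222bbbb")
print(ref.read_text().strip())  # 2222bbbb
```

Because every such file names a commit, tools that walk refs would treat the referenced submodule commits as reachable, which is the effect the proposal is after.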
hoi :)

I guess you are aware that you have to scan _all_ trees inside _all_ supermodule commits for possible references.

So what do you do with deleted submodules? You wouldn't want them to still sit around in your working directory, but you still have to preserve them.

--
Martin Waitz
No you don't; you do it as part of the appropriate normal operations.

 * supermodule commit - scan the current tree for "link" objects in the tree. If you find one, write the reference in the submodule.
 * adding a new submodule - if this is a new submodule there can't be any references in the supermodule already.
 * cloning a supermodule, every new commit that gets written in the

Now that is a tricky one. Mind you, I think that problem exists for any implementation. I haven't got a good answer for that.

Andy
--
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com
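The "scan the current tree for link objects" step can be sketched as a single pass over a commit's tree entries. The entry layout `(kind, path, object_id)`, the `"link"` kind, and the `refs_for_commit` helper are assumptions made for this illustration; they do not describe how either implementation in the thread stores its data.

```python
def refs_for_commit(supermodule_hash, tree_entries):
    """Return {submodule path: (ref name, submodule commit id)} for one commit.

    Only entries whose kind is "link" (a hypothetical marker for a
    submodule entry) produce a reference.
    """
    refs = {}
    for kind, path, object_id in tree_entries:
        if kind == "link":
            ref_name = f"refs/supermodules/commit{supermodule_hash}"
            refs[path] = (ref_name, object_id)
    return refs


# A made-up supermodule tree with one ordinary file and two submodules.
tree = [
    ("blob", "Makefile", "b1b1b1"),
    ("link", "libsubmodule1", "5a5a5a"),
    ("link", "libsubmodule2", "6b6b6b"),
]

refs = refs_for_commit("1111aaaa", tree)
for path, (ref_name, commit_id) in sorted(refs.items()):
    print(path, ref_name, commit_id)
```

This covers only the commit case from the list above; deleted submodules and removed supermodule branches, raised elsewhere in the thread, are not handled.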
hoi :)

 * removing a branch from the supermodule.

OK, this is an infrequent operation and it can be handled by redoing everything.

I just don't like to duplicate information which is already easily available. We'd need far too many special cases, just to correctly support reachability analysis. If you just keep it in a shared object repository you don't have any problems.

Please note that it is not required to keep it in one physical location. You can still use alternates/whatever to store some objects in another repository, but you need to be able to access all objects from the supermodule.

--
Martin Waitz
That suggests that it is probably better to separate submodule repositories from their checked out working trees. Why not put the GITDIRs of the submodules in subdirectories of the supermodules GITDIR instead? Josef -
hoi :)

Why not simply use a shared object database instead? You can still have an alternative to some standalone bare repository of the submodule if you do not like to store submodule objects in the supermodule repository.

--
Martin Waitz
Sure. I have no problem with this. But can we go one step further?

AFAICS your submodules store the .git/ directories of submodules directly at the submodule position in the working tree - but you have a link .git/objects into the object database of the supermodule. When the user wants to delete the submodule, he would remove this .git/ directory, too. So you lose the .git/refs of the submodule etc.

I would suggest putting the submodule .git dirs into the .git dir of the supermodule.

Josef
Let's see if I understand you correctly: You don't want to create an additional .git directory for the submodule and just handle everything with one toplevel .git repository for the whole project. Without the .git directory, you of course do not have refs/heads inside the submodule. So this is a different user-interface approach to submodules when compared to my approach. But the basis is the same and both could inter-operate.

Now your submodule is no longer seen as an independent git repository and I think this would cause problems when you want to push/pull between the submodule and its upstream repository. No technical problems, but UI problems because now your submodule is

But you could still call the "xdiff" part of the git repository a submodule. And then changes to the xdiff directory result in a new submodule commit, even when there is no direct reference to it.

git-cat-file commit HEAD:xdiff already works out of the box (even cat-file tree to get the submodule tree). But up to now revision parsing follows the file name only once.

What about just separating things with "/"?

  commit  HEAD
  tree    HEAD/
  blob    HEAD/Makefile
  commit  HEAD/xdiff
  tree    HEAD/xdiff/
  blob    HEAD/xdiff~2/xemit.c

This may add some confusion when used with hierarchical branches, but it's still unique:

  refs/heads/master/xdiff/xemit.c

Just use as many path components until a matching reference is found, then start peeling. Or just use / between super and submodule:

  refs/heads/master:xdiff/xemit.c

I think this is easier to read than

Because it helps "normal" git operations ;-)

--
Martin Waitz
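The rule "use as many path components until a matching reference is found" can be sketched as a prefix search. This is a toy illustration of the suggested "/"-separated syntax only: `split_ref_and_path` is a hypothetical helper, the set of known refs is made up, and the later "peeling" through commits and trees is not modelled.

```python
def split_ref_and_path(spec, known_refs):
    """Split 'ref/path...' at the shortest prefix that names a known ref.

    Returns (ref, remaining_path), or None if no prefix matches.
    """
    parts = spec.split("/")
    for n in range(1, len(parts) + 1):
        candidate = "/".join(parts[:n])
        if candidate in known_refs:
            return candidate, "/".join(parts[n:])
    return None


known = {"refs/heads/master", "refs/heads/topic/xdiff-cleanup"}

print(split_ref_and_path("refs/heads/master/xdiff/xemit.c", known))
# ('refs/heads/master', 'xdiff/xemit.c')
print(split_ref_and_path("refs/heads/topic/xdiff-cleanup/Makefile", known))
# ('refs/heads/topic/xdiff-cleanup', 'Makefile')
```

Since git does not allow one branch name to be a directory prefix of another, at most one prefix can match in a real repository, which is presumably what "still unique" refers to.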
Good. For me that is the main point. As I said before, the user interface is not so important because it can be changed anytime, but to change the object database later is close to impossible.

You can always pick a single commit or several commits out of a larger repository and have a complete git repository.

Yes and no. You can always have branches that are only concerned with submodules' code, say, in refs/heads/submodules/<submodule>/. "submodules" here is simply an example and has no deeper meaning. You could call it foo or whatever you like. Or you could use refs/heads/<submodule>/ if it suits you.

But if you mean the submodule as seen from the supermodule, then there

Let's make certain that we understand each other. I see a clear distinction between the submodule code in a supermodule branch (commits in the supermodule's tree and nothing else) and submodule branches which are independent of the superproject. Supermodule branches and submodule

The double slash is the only way I can think of that clearly indicates that I do not mean the contents named by the path, but the commit that you find there. Once you have named a commit in that way, you can continue to apply other revision naming suffixes, paths, and so on.

Let's try. What does

  git cat-file -p master:dir/sub//^^^:sub/dir/sub//^:dir/file

mean? Explanation: Take branch master and go to path dir/sub. There you will find a commit. Take its grand-grandparent and go to path sub/dir/sub (the first sub is a subproject as well but we do not care). There you will, again, find a commit. Take its parent and go to path dir/file which happens to be a blob the contents of which you want to cat.

In reality you will never see these kinds of complex paths. Have you ever seen something like git cat-file -p

Let's see. I still have to try.

Regards
Stephan
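The walk described for `master:dir/sub//^^^:sub/dir/sub//^:dir/file` can be sketched as a purely syntactic split of the proposed `//` notation. This is not something git understands; `parse_nested_spec` is a hypothetical helper, and it assumes that paths contain neither `//` nor `:`.

```python
def parse_nested_spec(spec):
    """Split a proposed 'rev:path//suffix:path//...' spec into steps.

    Each step is (revision_or_suffix, path): the first names a revision in
    the supermodule, later ones apply a suffix such as '^' to the commit
    found at the previous path.
    """
    steps = []
    for segment in spec.split("//"):
        rev, _, path = segment.partition(":")
        steps.append((rev, path))
    return steps


steps = parse_nested_spec("master:dir/sub//^^^:sub/dir/sub//^:dir/file")
for rev, path in steps:
    print(f"{rev!r} then path {path!r}")
```

Each step corresponds to one sentence of the explanation above: resolve the revision or suffix, then follow the path to the next commit, or to the final blob in the last step.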
hoi :) ts. Sure it you are able to make it work, but it needs more work on the UI part. How do you handle the index? How do you allow to clone only the submodule? I really thought about such a setup too, but then decided that it is much easier to work with submodules when you can really see it as a Agreed. I think the thing which caused some discussion is that I make the current submodule commit which is used by the supermodule available in a refs/head in the submodule. So there is one "branch" in the submodule which corresponds to the version used by the supermodule, but this is just for user interface. It's most important purpose is to give this special commit a name, so that it can be used in merges, etc. By selecting another refs/heads "branch" in the submodule you can also easily detach the submodule from the supermodule. It is really important to understand that you can't branch the submodule alone and still have it connected to the supermodule, because the supermodule always tracks only one commit for each submodule. So every branch that affects the project has to be done on project (topmost supermodule) level. But of course the submodule can have other branches which are not tracked by the supermodule. So by checking out refs/heads/master (as it is used in my implementation) you can attach the submodule to the supermodule (attach as in: bring the working directory in sync with the whole project), and you can detach it by selecting another refs/heads (the submodule is still part of the supermodule, but not in the state which is currently visible in the working directory). This may sound confusing, but it really is the only semantic for submodule branches that makes sense. There are fears that you may commit something that does not match your current working directory. 
Sure, but you explicitly asked for it and I With the current semantics, you can already get to the submodule commit (just leave out your double slashes), but what is missing is simply to apply ...
So a commit in the supermodule turns into a commit in the submodule? That's just plain wrong. If it doesn't, why would the submodule HEAD have to change? -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
hoi :)

So how do you update your submodule? Remember: if you git-pull in the supermodule, you want to update the whole thing, including all submodules.

--
Martin Waitz
By committing to it separately, or by getting changes from the upstream Only if the new commits I pull into the supermodule DAG has commits which includes a new shapshot of the submodule, otherwise it wouldn't be necessary. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
hoi :)

Of course. But if the supermodule contains changes to the submodule, you still have to change the submodule. And this implies changing the submodule HEAD or some branch.

--
Martin Waitz
Not really. I fail to see why HEAD needs to be changed so long as the commit is in the submodule's odb. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Because I want the submodule to act as a normal git repository. Please note that I also voted against changing HEAD directly; instead, the new commit which came from the supermodule is just stored in one branch of the submodule, as part of the supermodule checkout.

--
Martin Waitz
You're assuming the super- and sub-module will share HEAD, or at least ODB, I think. I'm not convinced this is necessary. Convince me. I'll go drink beer and get some dancing done while you're at it ;-)

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
hoi :)

Get me a beer and I will convince you :-)

--
Martin Waitz
Right. A commit in the supermodule should _not_ imply a commit in the submodule. Maybe I should take a look at the code, but it sounds like people are still trying to "mix" submodules too much. Think of it this way: one common use for submodules is really to just (occasionally) track somebody elses code. The submodule should be a totally pristine copy from somebody else (ie it might be the "intel driver for X.org" submodule, maintained within intel), and the supermodule just refers to it indirectly (ie the supermodule might be the "Fedora Core X group" which contains all the different drivers from different people). So anything that mixes super-modules and sub-modules too much will always break this kind of model. A supermodule can never "contain changes" to a submodule. A supermodule would always just point to the submodule, and not have any changes what-so-ever of its own. The submodule is self-sufficient, and always contains all its _own_ changes. Linus -
hoi :)

Yes, but it is not only about tracking, also about distributing submodules.

One Fedora X developer fixes a bug in the intel driver, commits that to the submodule and then updates the supermodule to the new version (by calling "git-update-index drivers/intel && git-commit" or something).

Then another Fedora X developer updates his X repository. By pulling the supermodule he also gets a new version of the submodule. And this new version of the submodule is stored in a branch which can be

The supermodule always contains _the_entire_ submodule with its complete history, so it also does contain changes. But it does not per se contain changes, only indirectly (i.e. the commits in the submodule are

Yes.

--
Martin Waitz
Linus Torvalds wrote: ... > Think of it this way: one common use for submodules is really to just > (occasionally) track somebody else's code. The submodule should be a > totally pristine copy from somebody else (ie it might be the "intel driver > for X.org" submodule, maintained within intel), and the supermodule just > refers to it indirectly (ie the supermodule might be the "Fedora Core X > group" which contains all the different drivers from different people). Could you please be a little bit more specific about how you would store the "pristine copy"? There seems to be some agreement to store the commit id of the submodule instead of a plain tree id in the supermodule's tree object, and that all objects that are reachable from this commit are made part of the supermodule repository (either fetched or via alternates). Do you agree? ... > A supermodule can never "contain changes" to a submodule. A supermodule > would always just point to the submodule, and not have any changes > what-so-ever of its own. The submodule is self-sufficient, and always > contains all its _own_ changes. That is one of the points Martin Waitz and I are discussing. If I understand you correctly you cannot make any changes to the submodule's code _in the supermodule's repository_, no bugfixes, no extensions, no adaptations, nothing. Do you mean that? That would be a third alternative. In my opinion the usefulness of submodules would be unnecessarily restricted if it comes to the choice of either using the code from upstream as is or not using submodules at all. What is the point of the restriction? Regards Stephan -
Note that it's not necessarily "pristine", since the submodule clearly is a local git repository in its own right. So like _any_ git repository, you can (and may well end up) having your own local branches in the submodule, with your own local modifications. So I'm not claiming that a submodule must always match some external git tree 100%, and that it must be read-only or anything like that. I'm just saying that I suspect that quite often, one of the MOST IMPORTANT parts is that the submodule is really something that somebody else technically maintains, and that this is actually one of the _reasons_ why it is a submodule in the first place. For example, a lot of projects end up having some kind of "library component" as a submodule. Take something like a video player project, which would have something like ffmpeg as a submodule, not because you'd maintain ffmpeg yourself, but simply because (let's say) the library interface changes enough, or you need a specific version with some of your own fixes that haven't been released widely yet, so you want to carry all the libraries you need _with_ you, even though you don't really maintain that submodule. You at most have some small extensions of your own. Now, in this situation, it's really, really _important_ that the submodule really is totally independent of the supermodule, for several reasons. For example, since you don't "really" own that project, carrying around your own fixes is really really painful. We know it happens all the time, and a lot of projects end up needing their own version, but the _last_ thing you want is to be in merge hell all the time. So as a supermodule maintainer, the best possible thing for you is to be able to push back those local changes to the original project maintainer, so that you _don't_ have to maintain your own changes. But you need to realize that the real maintainer of the submodule is TOTALLY UNINTERESTED in your supermodule. He's not going to maintain it, ...
An implication of this is that the entire administrative responsibility for having some super-sub module interaction lies entirely with the supermodule. Why not have a "glue" object at the "stub"-interface of the supermodule tree that provides policy mappings to the sub-modules. Perhaps indicating git URL location, mappings of branch names between super- and sub- modules, special commit SHA1s, user policy or config choices at the boundary, and things like that. Is that the sort of direction we are headed? jdl -
That's a good thing. I wouldn't want the openssl maintainers to have to bother with every project that uses their code, and I'm fairly certain they feel the same. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Not unless you have something useful in mind that could be put in these glue objects. URLs and branch names, in particular, should not be stored in the repository itself, but in configuration files, since they will be different for different copies of the repo. skimo -
True. But if you need the changes to the submodule for your supermodule to function, and upstream either does not want to merge your changes or the merge will be available only after a long time, then what is the alternative? You must be able to keep local changes, and you must be able to keep pulling from upstream. Of course, what you describe is the ideal case: You find a bug, push the fix upstream, and in no time at all your fix is merged and you can just pull a new version into your No! All you need is a naming scheme to address the commit of the subproject that should be pulled. The extreme case would be to just address it with its id (well, currently you cannot do that with git pull, but that is fixable). But I already proposed a syntax for naming commits which are "hidden" in a superproject: Just name the path as described in git-rev-parse and append double slashes (to indicate that you mean the commit, not the tree it contains). So no manual work needs be done by upstream. If you want to track some chosen submodules there are two easy solutions: 1. If you want to track their state as it appears from the supermodule's view, pull from master:<submodule>// 2. If you want to track their state from their own development branches, pull from <submodule>/master Every commit is a git tree in its own right, is it not? I am not sure I understand what you say. 1. If you are working on a submodule, then the supermodule never enters the picture. You are working independently. So far, so good. 2. If you are working on the supermodule, git will not be able to function? How would you work without submodules, in which case you would I totally agree. When I try to explain why submodules work that only exist as part of one or more supermodules, I do not mean to say that you cannot or should not have independent branches or repositories for the I took that for granted: from a commit you only ever look backwards (in time/history dimension) or downwards (in ...
If you want to allow this, you have to be able to cut off fetching the objects of the supermodule at borders to given submodules, the ones you do not want to track. With "border" I mean the submodule commit in some tree of the supermodule. This looks a little bit like a shallow clone, where you introduce graft points at the border to some of the submodule's object DAGs. But I am not sure that this is scalable: for supermodules with a large number of submodules you are not interested in, your graft file would grow very fast, as there will be new borders with every change in some submodule, which happens to be tracked in the supermodule. So IMHO, instead of a huge graft file, you want to have a fast way to check at a submodule border which submodule this given border is going into. Then, at fetch time, you easily can decide that you do not want to fetch any object from the submodule. Otherwise, you would have to ask the remote end at cloning time: "Is this commit from some submodule I am locally not interested in?" So I think we should introduce a submodule namespace in supermodules. And at every border from super- to submodules, the name of the submodule we are going into should be specified. Which actually means that we need to introduce a "submodule" object, and trees of a supermodule can have such submodule objects as borders into a submodule. In a submodule object, of course we have the SHA1 of the commit into the submodule DAG, and there would be the globally unique name we have chosen for this submodule in this supermodule. Something like submodule: gcc commit: 6287376... Before cloning a supermodule, you should be able to list the names of the submodules available, and select the submodules you want to have So in the example, "that/one/submodule" is _not_ the path of the working tree which happens to be the root of the submodule at current supermodule HEAD, but the unique name from the submodule namespace. This is important, as you should be able to move th...
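To make the proposed "submodule" object concrete, here is a minimal sketch in Python. It is not an existing git object type or git API; the class name SubmoduleObject, the helper wanted_borders and the two-field text layout are assumptions modelled on the "submodule: / commit:" example in the message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmoduleObject:
    """Hypothetical 'submodule' object: a border entry in a supermodule tree."""

    name: str    # unique name inside the supermodule's submodule namespace
    commit: str  # SHA-1 of the commit in the submodule's own DAG

    def serialize(self) -> str:
        # Assumed text layout, following the example in the message.
        return f"submodule: {self.name}\ncommit: {self.commit}\n"

    @classmethod
    def parse(cls, text: str) -> "SubmoduleObject":
        fields = {}
        for line in text.splitlines():
            if not line.strip():
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        return cls(name=fields["submodule"], commit=fields["commit"])


def wanted_borders(borders, wanted_names):
    """Keep only the borders whose submodule name the user asked to fetch.

    Because the name is stored at the border itself, the decision needs
    no graft file and no question to the remote end.
    """
    return [b for b in borders if b.name in wanted_names]
```

With such an object, a fetch could decide at each border whether to descend just by comparing the stored name against the set of submodules selected at clone time, independent of where the submodule currently sits in the working tree.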
No. I would say that it looks more like a "partial checkout" than a shallow clone. A shallow clone limits the data in "time" - we have _some_ data, but we don't have all of the history of that data. In contrast, a submodule that we don't fetch is an all-or-nothing situation: we simply don't have the data at all, and it's really a matter of simply not recursing into that submodule at all - much more like not checking out a particular part of the tree. So if a shallow clone is a "limit in time", a lack of a module (or a lack of a checkout for a subtree in general - you could certainly imagine doing the same thing even _within_ a git repository, and indeed, we did discuss exactly that at one point in time) is more of a "limit in space". Linus -
OK. I still think it should be about "limit in space" regarding the objects in the local repository. For a project containing "gcc" as submodule, and I am not interested in this submodule, there should be a way to not need to fetch all the objects from the gcc submodule at clone time. What about my other argument for a submodule namespace: You want to be able to move the relative root path of a submodule inside of your supermodule, but yet want to have a unique name for the submodule: - to be able to just clone a submodule without having to know the current position in HEAD - more practically, e.g. to be able to name a submodule independent from any current commit you are on in the supermodule, e.g. to be able to store some meta information about a submodule: - "Where is the official upstream of this submodule?" - "Should git allow to commit rewind actions of this submodule in the supermodule?" (which, AFAICS, exactly has the same problems as publishing a rewound branch: you will get into merge hell when you want to pull upstream changes into the supermodule) - "Should this submodule be checked out?" and so on. Josef -
Umm? I don't get the issue. A submodule is a git repo in its own right, and you clone it exactly like you'd clone any other repo. It _does_ have a HEAD. It has its own branches. It has everything. So when you clone a submodule, you always get all those branches. The supermodule will not _point_ to them all (the branches are local to the submodule, and _will_ depend on things like "which upstreams module am I tracking"), but they'll have to be there, exactly _because_ the submodule has an existence and is tracked on its own. In the trivial case where the submodule doesn't even _have_ any external existence at all (ie it's always maintained as _just_ a submodule, it would probably tend to have just one branch, and a clone would get whatever that branch is), but that's just a degenerate special case of the The current commit within the supermodule would be _totally_ invisible to the submodule. Of course, if HEAD _differs_ from that commit within the supermodule, then a "git diff" (when done from within the supermodule) should show that, but That's entirely a question for the submodule. You cannot ask that question within the confines of the supermodule, because it's not even a relevant question in that context. Two different supermodule repositories may well decide to get their submodules from different places, just because they got cloned from different places (or even just for practical reasons like "that other site is closer to me"). So the official upstream of a submodule must NOT be encoded inside the supermodule (or at least not within its _objects_). Exactly because the upstream location is not a "global" thing - it's per-repository, and thus must not be encoded in the global data (ie the objects). It should be encoded in some _ephemeral_ place, eg in the ".git/config" file or in a ".git/remotes/origin"-like file (either in the supermodule or the submodule, and I would seriously suggest you do it within the submodule itself, beca...
I just thought about the case when you want to clone a submodule directly out of the supermodule repository, at a given relative path. And that can be changing. Of course, every project which happens to be submodule of some supermodule, also can have its own repository, as it is fully independent. And then, you of course can clone from without any knowledge of its relative position Of course. Yet, you need some name to store meta information of submodules into some config file of the supermodule, like whether you want to have it checked out (see below). In that case, such a name for a submodule does not have to be global in Yes. I just gave an example of a policy some project may want for submodule Exactly. And in this list, you have to specify names. The thing I wanted to discuss is whether such names would need to be globally unique in the project containing submodules, or not. If yes, it IMHO makes a lot of sense to introduce "submodule objects" which contain these submodule names, and which are used as pointers to submodule commits in supermodule trees. Josef -
Yes, you do need to have a list of submodules somewhere, and you'd need to maintain that separately. One of the results of having the submodules be independent from the supermodule is that it's not all "automatically integrated", and thus the supermodule does end up having to have things like that maintained separately. And yes, if you screw that up, you wouldn't be able to fetch submodules properly etc, even if you see the supermodule, and yes, this sounds more like the CVS "Entries" kind of file that is more "tacked on" than really deeply integrated. But I think the separation is _more_ than worth the fact that you can see things being separate. In fact, I'm very much arguing for keeping things as separate as possible, while just integrating to the smallest possible degree (just _barely_ enough that you can do things like "git clone" and it will fetch multiple repositories and put them all in the right places, and "git diff" and friends will do reasonably sane things). My preference would be for it to be "local", just because (as I mentioned), with mirroring etc, it might well be that you want to fetch things from the _closest_ repository. That's really not a global decision, You could do it that way, and then it would be global. It would work, and in many ways it would probably be "simpler" on a supermodule level. The advantage of a global namespace is that you can much more easily update it - "git fetch" will just fetch the new file(s) that describe the subprojects very naturally if they are all global. Putting them in a local .git/config file has its advantages (see above), but it also makes it very hard to version them, and to update the list - it would have to become manual. There are possibly combinations of the two approaches: have a "global namespace" that describes the canonical place to get the subprojects, but have some way to add local "translation" of the canonical names into locally preferred versions (eg you could just h...
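The combination suggested at the end of that message (a versioned, global list of canonical locations plus a per-repository translation) can be sketched as a simple lookup. This is only an illustration in Python under stated assumptions: the function name resolve_submodule_urls is hypothetical, and both mappings are plain dictionaries standing in for a tracked file and a local config file whose formats the thread does not specify.

```python
def resolve_submodule_urls(canonical, local_overrides):
    """Return a name -> URL mapping for fetching submodules.

    canonical       -- versioned, global list: submodule name -> canonical URL
    local_overrides -- per-repository list: submodule name -> preferred URL
                       (for example a closer mirror); never versioned

    A local override wins over the canonical URL. Overrides for names
    that the global list does not know are ignored.
    """
    resolved = dict(canonical)
    for name, url in local_overrides.items():
        if name in resolved:
            resolved[name] = url
    return resolved
```

The global list can then be updated by an ordinary fetch, while the choice of mirror stays a local decision that never enters the objects.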
(I wrote most of this a couple of days ago, so it's not at the tip of the conversational tree, so to speak. But it's effectively a response to Linus's "what do you want to do with submodules" question, with some thoughts on implementation. Sorry it's so long; like Blaise Pascal, "I would have written a shorter letter, but I did not have the time.") The supermodule concept, implemented right, could really improve cooperation among embedded platform integrators, boutique distro publishers, and other editorial contributors to sprawling metaprojects who don't want to run kernel.org-scale mirrors. To make this work, you need sparse repositories (conserving resources when fetching, by omitting the bulk of currently un-needed submodules that can reliably be obtained later from elsewhere) and shallow cloning (conserving resources when publishing, by referring cloners to a third-party repository for universally available content). For instance, it would be a wonderful thing if the pile-o-patches nightmare that is PTXdist (and crosstool and buildtool and every other approach I have seen for ongoing maintenance of embedded toolchains and userlands) were obsoleted by a git supermodule. Its submodules would mostly track external projects, but would also logically contain the fix-up patches worked out during platform integration, checked in to branches anchored at each upstream release point. The supermodule would contain all of the build automation, log auditing, and remote unit testing stuff, as well as the metadata for each submodule involved in this platform build cycle. At a content level, the sparsely populated / shallowly published supermodule wouldn't be much different from today's PTXdist. But the pay-off comes when you merge forward to a new release of some base component (compiler, library, etc.) 
and discover that some of your fix-ups have been adopted or obsoleted upstream, and new fix-ups are needed for components that depend on the updated bit, and the set of configurabl...
Did you see GitTorrent? http://gittorrent.utsl.gen.nz/ A lot of similar ideas to what you mention. Sorry, still no prototype :) I'd see the submodules thing as a good way to glue together a whole bunch of repositories, so that the core mirror servers only have to mirror a small-ish number of repositories. Sam. -
Why? You just recursively search for every "link" object in the supermodule. That tells you which submodules you need and where they should be. During a supermodule clone, it can tell the client end to start a new clone with the correct path because it knows what the local path is at that moment. Andy -- Dr Andrew Parkins, M Eng (Hons), AMIEE andyparkins@gmail.com -
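The recursive search described above can be sketched as a tree walk. This is a toy model in Python, not git's object format: a tree is a nested dict, a plain file is any other value, and the Link type and the function find_links are hypothetical names for the proposed "link" entry and the search.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """Hypothetical 'link' entry: points at a commit in a submodule."""

    commit: str


def find_links(tree, prefix=""):
    """Yield (path, commit) for every Link reachable in a supermodule tree.

    Directories are modelled as dicts and are recursed into; any other
    entry is treated as a plain blob and skipped. The walk stops at each
    Link and never descends into the submodule itself.
    """
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, Link):
            yield path, entry.commit
        elif isinstance(entry, dict):
            yield from find_links(entry, prefix=path + "/")
```

The paths produced this way are exactly the "where they should be" information: during a supermodule clone, each pair says which commit to clone and at which local path to place it.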
hoi :) you can always have a bare repository for all used modules lying around in some defined location. There is no need for a unique submodule-name. -- Martin Waitz
Linus Torvalds wrote: If you do not want to fetch all of the supermodule then do not fetch the supermodule. Instead fetch only the submodules you are interested in. You do not have to fetch the whole repository. Regards Stephan -
So why do you want to limit it? There's absolutely no cost to saying "I
want to see all the common shared infrastructure, but I'm actually only
interested in this one submodule that I work with".
Also, anybody who works on just the build infrastructure simply may not
care about all the submodules. The submodules may add up to hundreds of
gigs of stuff. Not everybody wants them. But you may still want to get the
common build infrastructure.
In other words, your "all or nothing" approach is
(a) not friendly
and
(b) has no real advantages anyway, since modules have to be independent
enough that you _can_ split them off for other reasons anyway.
So forcing that "you have to take everything" mentality only has
negatives, and no positives. Why do it?
Linus
-
hoi :) An interesting way to support this "only fetch some modules" use-case is to use several supermodules. So you could have one supermodule which is geared towards developers and only contains the modules they use. Another supermodule contains all the toolchain sources. And then there is the supermodule used for releases which is just a merge of all the other supermodules. The concept is so flexible that you don't have to introduce lots of other things as module namespaces. Just use the tools you have in a creative way ;-) -- Martin Waitz
If you need a common infrastructure to be able to work with the submodule, then the submodule is not independent of the supermodule. (There have been lots of use cases for shallow clones but for a long time git did not support them). If you can extend this partial fetch feature to the non-subproject case I would agree with your reasoning. What makes the subprojects so special in this regard? Do I have to turn a plain tree into a subproject to be able to ignore it? Once you can restrict fetches to parts of the contents you get the ability to restrict fetches to the "common infrastructure" and selected submodules for free. Regards Stephan -
Here's a real-world example that doesn't contradict: http://amarok.kde.org/wiki/Installation_HowTo#From_Anonymous_SVN
"svn co -N svn://anonsvn.kde.org/home/kde/trunk/extragear/multimedia
cd multimedia
svn co svn://anonsvn.kde.org/home/kde/branches/KDE/3.5/kde-common/admin
svn up amarok
To compile the sources (from the multimedia directory):"
and there are probably very few people that want to clone the entire KDE multimedia sub&super-module in this case. //Torgil -
And I'll add the note that people who do things like submodules aren't generally even _used_ to them being "seamless", and most of the time probably don't even want complete seamlessness. As the example that Torgil points to shows, people are quite used to actually even naming the submodules separately, and things like having the "default" set of submodules not equal the "complete" set. In other words, I don't think people expect or want something hugely more complicated than the CVS/modules kind of file. What people _do_ want (and that CVS in general is horribly bad at, and this is not a module-specific issue) is to have the _versioning_ work well. When you check out a specific version of a module, you want any _linked_ modules to follow along too. This is the same reason why CVS users use tags a lot: because even _within_ a single project (no modules, no nothing), it's often hard to re-create the exact state of a version any other way. So you tag every single file and do insane things like that, because CVS just isn't very good at guaranteeing consistency across the whole project. The exact same thing is true about subprojects. I don't think that people who have used CVS subprojects a lot really mind the CVS/modules file itself (but hey, maybe I'm wrong - I've seen _other_ people maintain modules in CVS, but I've never done it myself), but they do mind the fact that it's hard as hell to do something as simple as "get all modules back to version X" without lots and lots of careful crud (ie tagging every single module, things like that). Now, I'm not exactly sure who wants to use git modules, so this is the time to ask: did you hate the CVS/modules file? Or was it something you set up once, and then basically forgot about? People clearly use the ability to mark certain modules as depending on each other, and aliases to say "if you ask for this module, you actually get a set of _these_ modules". _I_ suspect that that isn't the problem peopl...
Here's some thoughts on subprojects from my company's perspective. I apologize for the long message. Abstract: We use submodules heavily in CVS and SVN. I like what I've read from Linus about the "thin
