Packed tag objects break Cogito when using the git+ssh:// transport. Example:

    cg-clone -s git+ssh://master.kernel.org/pub/scm/libs/klibc/klibc.git

-hpa
-
Dear diary, on Sat, Sep 24, 2005 at 12:24:06AM CEST, I got a letter

I changed the code to use the git-*-fetch tools to fetch the objects referenced by tags, so this works properly now. Thanks for the report.

It takes a loooong time, unfortunately - scp -r takes its time itself on many small files, and then we have to make a separate call to git-ssh-fetch for each tag. Isn't that braindamaged... :/

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which it doesn't.
-
Dear diary, on Sat, Sep 24, 2005 at 03:18:33AM CEST, I got a letter

And now thanks to "walt" I realized that this is a completely wrong way to go. The problem is that the tags don't have to tag anything on your branch, and if you are fetching a given branch, you want only commits from that branch. But fetching the tags will cause all the commits connected to the tags to get slurped too, and we didn't want that.

So the strategy I'm thinking of now is to manually (I think no GIT tool can do that for me) dereference the possible tag chain until I end up at some non-tag object. Now, if it is a commit and I don't have it yet, it means that it is not interesting to me because it does not belong to a branch I'm following, so I will just ignore the tag (won't download anything else and won't record it in the refs/tags directory).

If it's NOT a commit, well, that's a question. On the assumption that it won't be a great deal of data and it's likely to be assumed that we have it, I would be inclined to fetch it, but I don't feel strongly about it.

The ideal and the least expensive solution for this, obviously, would be having this logic in git-fetch-pack. :-)

Opinions?

--
Petr "Pasky" Baudis
-
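The dereferencing step described above could be sketched roughly as follows. This is only an illustration, not Cogito's code: tagged_object_of and peel_tag are made-up helper names, the commands use the "git cat-file" spelling of the thread's git-cat-file, and the sketch trusts the "type" header recorded in each tag object instead of inspecting the tagged object itself.

```shell
# Hypothetical helpers, not part of Cogito or GIT.

# Read a tag object's text on stdin and print "<type> <sha1>" of the
# object it tags. Only the header (up to the first blank line) is read.
tagged_object_of() {
	awk '$1 == "object" { obj = $2 }
	     $1 == "type"   { type = $2 }
	     /^$/           { exit }
	     END            { print type, obj }'
}

# Follow a chain of tag objects starting at $1 and print "<type> <sha1>"
# of the first non-tag object. Fails if a tag object in the chain
# cannot be read from the local object store.
peel_tag() {
	type=tag obj=$1
	while [ "$type" = tag ]; do
		# Split "<type> <sha1>" into $1 and $2; empty output leaves $# at 0.
		set -- $(git cat-file tag "$obj" 2>/dev/null | tagged_object_of)
		[ $# -eq 2 ] || return 1
		type=$1 obj=$2
	done
	echo "$type $obj"
}
```

A caller could then ignore the tag when peel_tag reports a commit that "git cat-file -e" says is not in the local object store.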
v2.6.11 tag is not a commit but presumably it would slurp in a lot of data. -
git-rev-parse $tagname^0
You need "--verify". Otherwise git-rev-parse will think that you just have a strange filename or other random thing:

    prompt$ git-rev-parse 000^0
    000^0
    prompt$ git-rev-parse --verify 000^0
    fatal: Needed a single revision

Now, if the tag doesn't point to a commit, then the "^0" thing will fail. What you could use instead is

    git-rev-list --max-count=1 "$tag"

since git-rev-list will actually follow the tag. Of course, whether it does so correctly or not if the tagged object doesn't exist, I dunno. Testing needed.

Finally, you might just do it by hand

    type=$(git-cat-file -t "$obj") || exit
    if [ "$type" = "tag" ]; then
        tagged=$(git-cat-file tag "$obj" | sed 's/object // ; q')
        git-rev-parse --verify "$tagged"
    fi

untested, of course.

Linus
-
Hmm:

    $ cat .git/refs/tags/v2.6.13-rc4
    7eab951de91d95875ba34ec4c599f37e1208db93
    $ git-rev-parse v2.6.13-rc4
    7eab951de91d95875ba34ec4c599f37e1208db93
    $ git-rev-parse v2.6.13-rc4^0
    63953523341bcafe5928bf6e99bffd7db94b471e
    $ git-rev-parse 63953523341bcafe5928bf6e99bffd7db94b471e^0
    63953523341bcafe5928bf6e99bffd7db94b471e
    # The typo that demonstrates what you did:
    $ git-rev-parse 7eab951de91d95875ba34ec4c599f37e1208db93^-
    7eab951de91d95875ba34ec4c599f37e1208db93^-
    $ git-rev-parse 7eab951de91d95875ba34ec4c599f37e1208db93^0
    63953523341bcafe5928bf6e99bffd7db94b471e

So I think --verify is beneficial if you want errors returned, but if you know you have real tags or commits, git-rev-parse without the --verify seems to do the right thing.

Or, at the very least, in the case where I used this (linux/scripts/setlocalversion), this behavior is fine.

--
Ryan Anderson
sometimes Pug Majere
-
The point being that if you want to test whether you have the thing the
tag _points_ to, you should verify it.
And that's where the "--verify" flag comes in:
[torvalds@g5 linux]$ git-rev-parse v2.6.11^0 ; echo $?
error: Object 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c is a tree, not a commit
v2.6.11^0
0
and if the object the tag points to didn't exist at _all_ in your object
store, you'd have silently gotten
[torvalds@g5 linux]$ git-rev-parse v2.6.11^0 ; echo $?
v2.6.11^0
0
but if you used "--verify", you'd have at least gotten
[torvalds@g5 linux]$ git-rev-parse --verify v2.6.11^0 ; echo $?
fatal: Needed a single revision
1
which is what you want, I thought.
Linus
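In a script, the exit status shown above is the part that matters. A minimal sketch, assuming a made-up helper name tag_commit and the "git rev-parse" spelling of git-rev-parse:

```shell
# Hypothetical helper: print the commit that $1 resolves to, or fail
# (non-zero exit status, nothing on stdout) when "--verify" rejects it,
# for example because the tag points at a tree or at a missing object.
tag_commit() {
	git rev-parse --verify "$1^0" 2>/dev/null
}

# Possible use (left as a comment so that sourcing this file does nothing):
#   if commit=$(tag_commit "$tagname"); then
#       echo "$tagname -> $commit"
#   else
#       echo "skipping $tagname" >&2
#   fi
```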
-
If it _is_ a commit, you could use

    git-rev-list --max-count=1 $tag

It won't help you though if it isn't.

skimo
-
What is the objective here? If you fetch a tag without the object being tagged (or commit without its tree), you will end up with smaller object database but you would get yelled at by git-fsck-objects. -
Having said that, I am sympathetic to what you are trying to do here; if what I understand you are trying to do matches what you are actually trying to do, that is. I think there should be a way to say "I do not care if this repository does not have all the history back to root -- as long as I can operate on reasonably recent commits, do not complain about missing objects" to fsck-objects and various fetch engines. We can cauterize the commit history chain using the grafts file so that 'git log', 'git whatchanged', and 'gitk' would stop somewhere. Commit walkers can help you, albeit somewhat differently, if you do not give the -a flag to them. -
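For reference, cutting the history with the grafts file amounts to adding one line to info/grafts inside the GIT directory. Each line names a commit followed by the parents it should appear to have, so a line with no parents makes that commit look like a root. A minimal sketch, with cauterize_at as a made-up helper name and GIT_DIR defaulting to .git as an assumption:

```shell
# Hypothetical helper: make commit $1 appear parentless by recording it
# in the grafts file with an empty parent list.
cauterize_at() {
	git_dir=${GIT_DIR:-.git}
	mkdir -p "$git_dir/info" &&
	echo "$1" >>"$git_dir/info/grafts"
}
```

With such a line in place, history-walking commands would stop at that commit instead of following its real parents. Later GIT versions have moved away from the grafts file, so treat this as an illustration of the mechanism discussed here.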
Dear diary, on Tue, Sep 27, 2005 at 12:23:41AM CEST, I got a letter

Yes - so you can't save the tag objects either, but then you'll re-slurp them again and again, which is kind of silly.

Alternatively, you could actually make git-fsck-objects silent about the case when an unreachable (not referenced in refs/) tag object references a non-existing object - perhaps unless --strict is passed to it. If you think the rest of my logic is ok, I think this change to facilitate this "tags caching" is not unreasonable.

The alternative solution would be to have the tags cache with the tag objects separate from the main object database, but that'd be very dirty.

--
Petr "Pasky" Baudis
-
Now you completely lost me. I really do not understand what you mean by tags caching and re-slurping. If your user _is_ interested in the tag, say v0.99.7d, wouldn't it make sense to make sure that, after the user fetches the tag, the user can build v0.99.7d point release as well? What do you think the reason is when your user says he is interested in another tag, junio-gpg-pub? Wouldn't it be the most natural interpretation that he wants to get the blob the tag refers to, so that he can use it with git-verify-tag? What good does it do for the user if you get only the tag object and do not get the blob the tag refers to? Yes, he can say "git cat-file tag junio-gpg-pub", but that by itself is not that interesting if it cannot be used to validate the other tags (or itself). If the users ask for a tag, I think it is easier for them to understand if you made sure you give them the complete set of objects that need to support that tag, at least by default. Giving the user an option to override it to make a sparse, incomplete, fsck-unclean repository is fine as a spacesaver option, but I think that should be left for "more advanced users" who understand the ramification of using the option. I happen to publish maint branch, but I could have done without. I can make a temporary branch out of v0.99.7c tag, add fixes to extend that branch, tag the branch head as v0.99.7d, and delete the temporary branch without publishing it at all. The tree needed to build v0.99.7d point release would be only reachable by fetching that tag (and here, "fetching the tag" really means "making sure the receiving repository has the tag object, and all the objects that are reachable from that tag object"), so "fetching only the tag object and not the object it refers to" in that case does not make much sense for the end user. Yes, he can say "git cat-file tag v0.99.7d", but that by itself is not that interesting if he cannot use it to build that release. -
I think Petr is interested in the case where the user hasn't asked for a particular tag. He wants to automatically grab all the tags in a repository, or at least those that refer to a branch being downloaded. Of course, if somebody asks for a specific tag, then everything necessary should be downloaded. If somebody is fetching your maint branch, Petr wants to automatically download all the tags v0.99.7[a-d], without the user specifying them explicitly. Or, more complex, somebody is tracking your master but NOT maint. Then Petr wants to download tags v0.99.[0-9] but not v0.99.7[a-d]. Tom -
Ah, _automatically_ was the key. If all you had were tags and there were no branches (the "I could have done without maint branch"), that kind of automatic grabbing would not work well anyway. I personally feel that is a lost cause. The user can run 'git ls-remote' himself to find out if there are new tags on the remote side and ask for them if needed. Also, I feel names under refs/ are local to the repository, but if the tags are automatically grabbed, I presume they are stored directly under the same name in refs/tags as the remote side has them? -
Dear diary, on Tue, Sep 27, 2005 at 07:28:16AM CEST, I got a letter

I don't think that's a realistic situation. IMHO it is a reasonable requirement for Cogito fetch that you are primarily fetching a _head_. Then, you also grab tags which are meaningful for that head - that's what I want to do. If you want to also specifically grab some extra tags, you should be able to tell cg-fetch about that too (cg-fetch -t tagname) or something. Being able to do this, I'm inclined to agree that

Yes, that's perhaps a fine solution for the core GIT plumbing, but in

Yes. And I certainly don't say that what Cogito does now is perfect, not even that it's very good. But we (well, rather the users) certainly _do_ need some kind of automatic tags fetching - that's something that has to Just Work (tm).

As I already said in the past (without much feedback, unfortunately), we certainly need to distinguish between private tags (specific for given repository) and public tags (should be propagated by fetching).

Another thing I proposed back then (I think it was in June) was having the refs/tags directory further divided based on heads, so all tags for head A would be in refs/tags/A/, etc. I didn't pursue this idea now because it seemed that there would be way too much duplicate stuff in refs/tags/ since most tags are likely to be shared across heads, but perhaps it is the best and cleanest solution after all.

Dear diary, on Tue, Sep 27, 2005 at 12:37:48AM CEST, I got a letter

Well, this wasn't something I had on my mind in this thread, but it is actually what I want to do too (I have such a loooong TODO list).

Sure, you can work around the problem with grafts, but I think that this hack should be really used only in specific cases (like grafting big history pack after importing the project to GIT, making it kind of optional "addon", which is actually very nice).
In the general case, I would much more like if you could say "I want only commits to the depth of 5" or even CVS-like "I want only the HE...
I agree that would be nice. If you are only interested in tags that refer to commits that anchor points in published branches, maybe we should have something along the lines of info/refs to help the downloaders? Perhaps info/refs showing the SHA1 id of the non-tag object each tag dereferences to in addition to the current output? This is a bit hard and needs some thinking to do cleanly, because what is in info/refs is what is sent from the publisher side over git-native protocol at the beginning of the handshake, and it is not easy to add that to git-native protocol cleanly and backward-compatibly (I think I know how without breaking existing clients, but it is not clean). -
Argh.
"git-upload-pack" very much on purpose never sends partial object stores:
it really doesn't want to send a tag-object for you to even _look_ at
unless it also sends all the objects that you are missing that the tag
refers to.
I'd really be much happier with the tag fetching being separate.
For example, making
git fetch --tags <dest>
fetch all tags _and_ the objects that they depend on would seem a _lot_
more appropriate.
The thing is, tags really may be totally private. For example, it makes
sense to fetch tags when you pull an official tree (ie my kernel tree, or
your git tree), but it does NOT make sense for me to fetch tags
(automatically or not) when I pull from a developers tree.
That's why git fetch doesn't get the tags by default. It's WRONG.
But we could certainly make it _easier_ to get tags when you want them.
"git-ls-remote" already helps you, and
git-ls-remote ... | cut -f2 | grep '^refs/tags/'
completes the picture. No protocol changes necessary, just some added
magic to git-fetch.sh.
Actually, here's a simple and stupid patch.
Untested as usual, but hey, how hard can it be?
Linus
----
diff --git a/git-fetch.sh b/git-fetch.sh
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -5,6 +5,7 @@
_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
+tags=
append=
force=
update_head_ok=
@@ -17,6 +18,9 @@ do
-f|--f|--fo|--for|--forc|--force)
force=t
;;
+ --tags)
+ tags=t
+ ;;
-u|--u|--up|--upd|--upda|--updat|--update|--update-|--update-h|\
--update-he|--update-hea|--update-head|--update-head-|\
--update-head-o|--update-head-ok)
@@ -151,7 +155,12 @@ case "$update_head_ok" in
;;
esac
-for ref in $(get_remote_refs_for_fetch "$@")
+taglist=
+if [ "$tags" ]; then
+ taglist=$(git-ls-remote "$remote" | awk '/refs\/tags/ { print $2":"$2 }')
+fi
+
+for ref in $(get_remote_refs_for_fetch "$@" $taglist)
do
refs="$refs $ref"
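
The list that the patch builds for --tags can be tried out separately from git-fetch.sh. The sketch below is a variation on the awk line in the patch, not the patch itself: tag_refspecs is a made-up name, the match is anchored at the start of the ref name, and entries ending in "^{}" (peeled tag entries, which some versions of git-ls-remote print) are skipped on the assumption that the caller only wants real ref names.

```shell
# Hypothetical helper: read "<sha1><TAB><refname>" lines, as printed by
# git-ls-remote, on stdin and print one "<ref>:<ref>" refspec per tag.
tag_refspecs() {
	awk '$2 ~ /^refs\/tags\// && index($2, "^{}") == 0 { print $2 ":" $2 }'
}

# Possible use (comment only, needs a reachable remote):
#   git ls-remote "$remote" | tag_refspecs
```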
-
The problem here is that currently there are no global, public branches. And you should not mix private heads in refs/heads with global tags.

Perhaps interpret tag objects as global branch names, similar to the "mixture" in .git/refs ?

Josef
-
Dear diary, on Tue, Sep 27, 2005 at 12:14:31PM CEST, I got a letter

But we don't need any global tags or heads. You just have some heads in your refs/heads (it doesn't matter if they are public or remote, that's a "social" issue what you tell people to fetch). And based on the heads you have in your refs/heads, there would be directories in your refs/tags/ corresponding to those. If you fetch a remote head, its local subdirectory in refs/tags/ is populated with the new tags, and if you merge two heads, the public tags are copied around.

Then if you are resolving a tag, we should first look at refs/tags/$(readlink HEAD)/tagname, and if it doesn't exist, we would look at refs/tags/tagname (so if you wanted to reference a tag not in your head, you'd have to use a "head/tag" form). Optionally, you could also look for refs/tags/*/tagname and if it gives you a unique match, use that - but I'm not sure how good an idea this is since it already makes

I don't understand.

--
Petr "Pasky" Baudis
-
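The lookup order proposed here could look roughly like this. It is a sketch under assumptions: resolve_tag is a made-up name, the current head is passed in as an argument instead of being read from HEAD, and refs are assumed to be plain files under refs/tags as in the layout being discussed. The optional refs/tags/*/tagname fallback is left out.

```shell
# Hypothetical helper: resolve tag $2 as seen from head $1.
# The per-head directory is tried first, then the shared refs/tags.
resolve_tag() {
	git_dir=${GIT_DIR:-.git} head=$1 tag=$2
	for ref in "$git_dir/refs/tags/$head/$tag" "$git_dir/refs/tags/$tag"; do
		if [ -f "$ref" ]; then
			cat "$ref"
			return 0
		fi
	done
	return 1
}
```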
Ah, ok. Let me see if I understand:

1) These tags are bound to a head, and they have the invariant that they appear in the commit history of the head.
2) They are updated automatically.
3) When someone rebases a head, the bound tags should be synced to the rebased head's history.
4) Tags can appear multiple times, if they happen to be in the commit history of multiple heads?

Ok, this is the "automatically updated" feature I talked about above. So missing here is:

- If you want to get rid of a head, the tags should be removed
- If a head is rebased, this has to be detected and the tags recreated, possibly removing some

Probably there should be a "cg-tag --recover" to resync these volatile tags with tag objects appearing in the histories of heads?

As for lightweight tags of remote repositories, you probably need some space to recover them e.g. on a rebase or creation of a new head without

Tag objects in a repository could be interpreted as branch names for commits based on it. When creating a new branch point, I first would put a tag object on this branch, thus renaming it. I think this would be quite handy for navigation in histories.

Josef
-
It could point to a tree (ie. the kernel's v2.6.11 tag), which may end up being a large pull. I think it's best to not care what type of object the tag references. -- Brian Gerst -
Dear diary, on Mon, Sep 26, 2005 at 11:55:34PM CEST, I got a letter

Yes, but the object may not be reachable in any other way. Simple question - if you have a tagged blob containing a GPG public key (let's call it.. hmm.. e.g. junio-gpg-pub ;), would you expect Cogito to ignore it or pick it up?

--
Petr "Pasky" Baudis
-
I think you could run git-peek-remote to find all the refs and then run git-fetch-pack to slurp all the tags (and heads for that matter) at once. Is there a particular reason you would prefer the commit walker? -
Dear diary, on Sat, Sep 24, 2005 at 04:00:04AM CEST, I got a letter

Actually, probably not, except consistency with rsync and http handling - but that's obviously not too good a reason. I did it this way since I'm going to be a bit busy again from now on. I will probably rewrite the tags fetching to use git-peek-remote (info/refs for http) the next weekend.

One problem with this is that in many repositories, git-update-server-info does not ever get run and things would break "mysteriously". I don't want the policy that the user has to take care of this on his own for Cogito, so I will probably add something that will automagically append git-update-server-info at least to the post-update hook (like

    uphook="$_git/hooks/post-update"
    if ! [ -x "$uphook" ]; then
        if ! [ -e "$uphook" ]; then
            echo '#!/bin/sh' >>"$uphook"
            echo 'exec git-update-server-info' >>"$uphook"
        fi
        # If the user added something custom and left the hook
        # disabled, he knew what he was doing. Also don't
        # reenable the hook if we already did that once.
        if [[ "$(grep -v '^#\($\|[^#]\)\|^$' "$uphook")" == "exec git-update-server-info" ]]; then
            chmod a+x "$uphook"
            echo "## Enabled by Cogito. It won't try to enable it again as long as this comment is here." >>"$uphook"
        fi
    fi

or something).

Actually, I might also add something like

    [ -e "$_git/git-dummy-support" ] && git-update-server-info

at all the places in Cogito where I update the refs. Then the default post-update hook could change to

    [ -e "$_git/git-dummy-support" ] && exec git-update-server-info

and be enabled by default?

--
Petr "Pasky" Baudis
-
It wouldn't actually be very hard to rewrite git-*-fetch programs to fetch with a bunch of starting points. The main reason I haven't is actually that I don't have any ideas for a way to extend the command line argument format to include it. -Daniel *This .sig left intentionally blank* -
Perhaps git-ssh-fetch should be fixed? :) -hpa -
