Re: Cogito: cg-clone doesn't like packed tag objects

Previous thread: How to make Cogito use git-fetch-pack? by H. Peter Anvin on Friday, September 23, 2005 - 6:20 pm. (5 messages)

Next thread: [PATCH] rsh.c unterminated string by H. Peter Anvin on Friday, September 23, 2005 - 7:30 pm. (1 message)
To: Git Mailing List <git@...>, Petr Baudis <pasky@...>
Date: Friday, September 23, 2005 - 6:24 pm

Packed tag objects break Cogito when using git+ssh:// transport.

Example:

cg-clone -s git+ssh://master.kernel.org/pub/scm/libs/klibc/klibc.git

	-hpa
-
To: H. Peter Anvin <hpa@...>
Cc: Git Mailing List <git@...>
Date: Friday, September 23, 2005 - 9:18 pm

Dear diary, on Sat, Sep 24, 2005 at 12:24:06AM CEST, I got a letter

I changed the code to use the git-*-fetch tools to fetch the objects
referenced by tags, so this works properly now. Thanks for the report.

It takes loooong time, unfortunately - scp -r takes its time itself on
many small files, and then we have to make a separate call to
git-ssh-fetch for each tag. Isn't that braindamaged... :/

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: H. Peter Anvin <hpa@...>
Cc: Git Mailing List <git@...>
Date: Monday, September 26, 2005 - 5:25 pm

Dear diary, on Sat, Sep 24, 2005 at 03:18:33AM CEST, I got a letter

And now thanks to "walt" I realized that this is a completely wrong way
to go. The problem is that the tags don't have to tag anything on your
branch, and if you are fetching a given branch, you want only commits
from that branch. But fetching the tags will cause all the commits
connected to the tags to get slurped too, and we didn't want that.

So the strategy I'm thinking of now is to manually (I think no GIT tool
can do that for me) dereference the possible tag chain until I end up at
some non-tag object. Now, if it is a commit and I don't have it yet, it
means that it is not interesting to me because it does not belong to a
branch I'm following, so I will just ignore the tag (won't download
anything else and won't record it in the refs/tags directory).

If it's NOT a commit, well, that's a question.  On the assumption that
it won't be a great deal of data and it's likely to be assumed that we
have it, I would be inclined to fetch it, but I don't feel strongly
about it.
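
A minimal sketch of the manual dereferencing described above, not what Cogito actually does. It assumes the tag object itself is already in the local object database and uses the "git cat-file" spelling. The name peel_tag and its "present"/"missing" output format are hypothetical. When the final object is absent, the type is taken from the "type" header of the last tag object, so the caller can still tell a missing commit from a missing blob or tree.

```shell
# Follow a chain of tag objects until a non-tag object is reached.
# Prints "present <type> <sha1>" and returns 0 if the final object exists,
# or "missing <declared-type> <sha1>" and returns 1 if it does not.
peel_tag() {
    obj=$1
    declared=unknown
    while :; do
        # "git cat-file -t" prints the object type and fails if the object is absent.
        if ! type=$(git cat-file -t "$obj" 2>/dev/null); then
            echo "missing $declared $obj"
            return 1
        fi
        if [ "$type" != tag ]; then
            echo "present $type $obj"
            return 0
        fi
        # A tag object starts with "object <sha1>" and "type <type>" header
        # lines; stop reading at the blank line that ends the headers.
        header=$(git cat-file tag "$obj")
        declared=$(printf '%s\n' "$header" | sed -n -e 's/^type //p' -e '/^$/q')
        obj=$(printf '%s\n' "$header" | sed -n -e 's/^object //p' -e '/^$/q')
    done
}
```

A caller following the strategy above would skip the tag on "missing commit ..." and decide separately what to do about a missing blob or tree.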

The ideal and the least expensive solution for this, obviously, would be
having this logic in git-fetch-pack. :-)

Opinions?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: Petr Baudis <pasky@...>
Cc: <git@...>
Date: Tuesday, September 27, 2005 - 3:46 am

v2.6.11 tag is not a commit but presumably it would slurp in a
lot of data.

-
To: Petr Baudis <pasky@...>
Cc: H. Peter Anvin <hpa@...>, Git Mailing List <git@...>
Date: Tuesday, September 27, 2005 - 3:25 am

git-rev-parse $tagname^0
-
To: Ryan Anderson <ryan@...>
Cc: Petr Baudis <pasky@...>, H. Peter Anvin <hpa@...>, Git Mailing List <git@...>
Date: Tuesday, September 27, 2005 - 11:34 am

You need "--verify". Otherwise git-rev-parse will think that you just have 
a strange filename or other random thing:

	prompt$ git-rev-parse 000^0
	000^0

	prompt$ git-rev-parse --verify 000^0
	fatal: Needed a single revision

Now, if the tag doesn't point to a commit, then the "^0" thing will fail. 
What you could use instead is

	git-rev-list --max-count=1 "$tag"

since git-rev-list will actually follow the tag. Of course, whether it 
does so correctly or not if the tagged object doesn't exist, I dunno. 
Testing needed.

Finally, you might just do it by hand

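	# Find out what kind of object we were handed; bail out if it is missing.
	type=$(git-cat-file -t "$obj") || exit
	# Only tag objects need dereferencing.
	if [ "$type" = "tag" ]; then
		# The first line of a tag object is "object <sha1>".
		tagged=$(git-cat-file tag "$obj" |
			sed 's/object // ; q')
		git-rev-parse --verify "$tagged"
	fi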

untested, of course.

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Petr Baudis <pasky@...>, H. Peter Anvin <hpa@...>, Git Mailing List <git@...>
Date: Tuesday, September 27, 2005 - 1:34 pm

Hmm:

$ cat .git/refs/tags/v2.6.13-rc4
7eab951de91d95875ba34ec4c599f37e1208db93
$ git-rev-parse v2.6.13-rc4
7eab951de91d95875ba34ec4c599f37e1208db93
$ git-rev-parse v2.6.13-rc4^0
63953523341bcafe5928bf6e99bffd7db94b471e
$ git-rev-parse 63953523341bcafe5928bf6e99bffd7db94b471e^0
63953523341bcafe5928bf6e99bffd7db94b471e

# The typo that demonstrates what you did:
$ git-rev-parse 7eab951de91d95875ba34ec4c599f37e1208db93^-
7eab951de91d95875ba34ec4c599f37e1208db93^-

$ git-rev-parse 7eab951de91d95875ba34ec4c599f37e1208db93^0
63953523341bcafe5928bf6e99bffd7db94b471e

So I think --verify is beneficial if you want errors returned, but if
you know you have real tags or commits, git-rev-parse without the
--verify seems to do the right thing.

Or, at the very least, in the case where I used this
(linux/scripts/setlocalversion), this behavior is fine.

-- 

Ryan Anderson
  sometimes Pug Majere
-
To: Ryan Anderson <ryan@...>
Cc: Petr Baudis <pasky@...>, H. Peter Anvin <hpa@...>, Git Mailing List <git@...>
Date: Tuesday, September 27, 2005 - 2:04 pm

The point being that if you want to test whether you have the thing the 
tag _points_ to, you should verify it.

And that's where the "--verify" flag comes in:

	[torvalds@g5 linux]$ git-rev-parse v2.6.11^0 ; echo $?
	error: Object 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c is a tree, not a commit
	v2.6.11^0
	0

and if the object the tag points to didn't exist at _all_ in your object
store, you'd have silently gotten

	[torvalds@g5 linux]$ git-rev-parse v2.6.11^0 ; echo $?
	v2.6.11^0
	0

but if you used "--verify", you'd have at least gotten

	[torvalds@g5 linux]$ git-rev-parse --verify v2.6.11^0 ; echo $?
	fatal: Needed a single revision
	1

which is what you want, I thought.
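
As a rough sketch, that exit status could drive the keep-or-ignore decision for automatically fetched tags discussed earlier in the thread. The names have_tagged_commit and filter_tags are hypothetical, and the "git rev-parse" spelling is used instead of the dashed one. Output is discarded, so only the exit status of --verify matters.

```shell
# Return 0 only when "<name>^0" resolves to a commit that is present
# in the local object store; --verify makes rev-parse fail otherwise.
have_tagged_commit() {
    git rev-parse --verify "$1^0" >/dev/null 2>&1
}

# Read tag names (or tag object ids) from stdin, one per line, and
# report which ones point at a commit we already have.
filter_tags() {
    while read -r tag; do
        if have_tagged_commit "$tag"; then
            echo "keep $tag"
        else
            echo "skip $tag"
        fi
    done
}
```

A tag such as v2.6.11, which points at a tree, would end up in the "skip" group here, so a real implementation would need a separate rule for non-commit targets.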

		Linus
-
To: Petr Baudis <pasky@...>
Cc: H. Peter Anvin <hpa@...>, Git Mailing List <git@...>
Date: Tuesday, September 27, 2005 - 2:54 am

If it _is_ a commit, you could use 
git-rev-list --max-count=1 $tag

It won't help you though if it isn't.

skimo
-
To: Petr Baudis <pasky@...>
Cc: <git@...>
Date: Monday, September 26, 2005 - 6:23 pm

What is the objective here?  If you fetch a tag without the
object being tagged (or commit without its tree), you will end
up with smaller object database but you would get yelled at by
git-fsck-objects.

-
To: Petr Baudis <pasky@...>
Cc: <git@...>
Date: Monday, September 26, 2005 - 6:37 pm

Having said that, I am sympathetic to what you are trying to do
here; if my understanding of what you are trying to do matches
what you are actually trying to do, that is.

I think there should be a way to say "I do not care if this
repository does not have all the history back to root -- as long
as I can operate on reasonably recent commits, do not complain
about missing objects" to fsck-objects and various fetch
engines.  We can cauterize commit history chain using the grafts
file so that 'git log', 'git whatchanged', and 'gitk' would stop
somewhere.  Commit walkers can help you, albeit somewhat
differently, if you do not give -a flag to them.


-
To: Junio C Hamano <junkio@...>
Cc: <git@...>
Date: Monday, September 26, 2005 - 6:29 pm

Dear diary, on Tue, Sep 27, 2005 at 12:23:41AM CEST, I got a letter

Yes - so you can't save the tag objects either, but then you'll re-slurp
them again and again, which is kind of silly. Alternatively, you could
actually make git-fsck-objects silent about the case when an unreachable
(not referenced in refs/) tag object references a non-existing object -
perhaps unless --strict is passed to it. If you think the rest of my
logic is ok, I think this change to facilitate this "tags caching" is
not unreasonable.

The alternative solution would be to have the tags cache with the tag
objects separate from the main object database, but that'd be very dirty.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: Petr Baudis <pasky@...>
Cc: <git@...>
Date: Tuesday, September 27, 2005 - 12:46 am

Now you completely lost me.  I really do not understand what you
mean by tags caching and re-slurping.

If your user _is_ interested in the tag, say v0.99.7d, wouldn't
it make sense to make sure that, after the user fetches the tag,
the user can build v0.99.7d point release as well?  What do you
think the reason is when your user says he is interested in
another tag, junio-gpg-pub?  Wouldn't it be the most natural
interpretation that he wants to get the blob the tag refers to,
so that he can use it with git-verify-tag?  What good does it do
for the user if you get only the tag object and do not get the
blob the tag refers to?  Yes, he can say "git cat-file tag
junio-gpg-pub", but that by itself is not that interesting if it
cannot be used to validate the other tags (or itself).

If the users ask for a tag, I think it is easier for them to
understand if you made sure you give them the complete set of
objects that need to support that tag, at least by default.
Giving the user an option to override it to make a sparse,
incomplete, fsck-unclean repository is fine as a spacesaver
option, but I think that should be left for "more advanced
users" who understand the ramification of using the option.

I happen to publish maint branch, but I could have done without.
I can make a temporary branch out of v0.99.7c tag, add fixes to
extend that branch, tag the branch head as v0.99.7d, and delete
the temporary branch without publishing it at all.

The tree needed to build v0.99.7d point release would be only
reachable by fetching that tag (and here, "fetching the tag"
really means "making sure the receiving repository has the tag
object, and all the objects that are reachable from that tag
object"), so "fetching only the tag object and not the object it
refers to" in that case does not make much sense for the end
user.  Yes, he can say "git cat-file tag v0.99.7d", but that by
itself is not that interesting if he cannot use it to build that
release.

-
To: Junio C Hamano <junkio@...>
Cc: Petr Baudis <pasky@...>, <git@...>
Date: Tuesday, September 27, 2005 - 1:02 am

I think Petr is interested in the case where the user hasn't asked for a
particular tag. He wants to automatically grab all the tags in a repository,
or at least those that refer to a branch being downloaded.

Of course, if somebody asks for a specific tag, then everything necessary
should be downloaded. But if somebody is fetching your maint branch, Petr wants to
automatically download all the tags v0.99.7[a-d], without the user specifying
them explicitly. Or, a more complex case: somebody is tracking your master but NOT
maint. Then Petr wants to download tags v0.99.[0-9] but not v0.99.7[a-d].

  Tom
-
To: Tom Prince <tom.prince@...>
Cc: <git@...>, Petr Baudis <pasky@...>
Date: Tuesday, September 27, 2005 - 1:28 am

Ah, _automatically_ was the key.

If all you had were tags and there were no branches (the "I
could have done without maint branch"), that kind of automatic
grabbing would not work well anyway.  I personally feel that is
a lost cause.  The user can run 'git ls-remote' himself to find
out if there are new tags on the remote side and ask for them if
needed.

Also, I feel names under refs/ are local to the repository, but
if the tags are automatically grabbed, I presume they are stored
directly under the same name in refs/tags as the remote side has
them?

-
To: Junio C Hamano <junkio@...>
Cc: Tom Prince <tom.prince@...>, <git@...>
Date: Tuesday, September 27, 2005 - 5:40 am

Dear diary, on Tue, Sep 27, 2005 at 07:28:16AM CEST, I got a letter

I don't think that's a realistic situation. IMHO it is a reasonable
requirement for Cogito fetch that you are primarily fetching a _head_.
Then, you also grab tags which are meaningful for that head - that's
what I want to do. If you want to also specifically grab some extra
tags, you should be able to tell cg-fetch about that too (cg-fetch -t
tagname) or something. Being able to do this, I'm inclined to agree that

Yes, that's perhaps a fine solution for the core GIT plumbing, but in

Yes. And I certainly don't say that what Cogito does now is perfect, not
even that it's very good. But we (well, rather the users) certainly _do_
need some kind of automatic tags fetching - that's something that has to
Just Work (tm).

As I already said in the past (without much feedback, unfortunately), we
certainly need to distinguish between private tags (specific for given
repository) and public tags (should be propagated by fetching).

Another thing I proposed back then (I think it was in June) was having
the refs/tags directory further divided based on heads, so all tags for
head A would be in refs/tags/A/, etc. I didn't pursue this idea now
because it seemed that there would be way too much duplicate stuff in
refs/tags/ since most tags are likely to be shared across heads, but
perhaps it is the best and cleanest solution after all.

Dear diary, on Tue, Sep 27, 2005 at 12:37:48AM CEST, I got a letter

Well, this wasn't something I had on my mind in this thread, but it is
actually what I want to do too (I have such a loooong TODO list). Sure,
you can work around the problem with grafts, but I think that this hack
should really be used only in specific cases (like grafting a big history
pack after importing the project to GIT, making it a kind of optional
"addon", which is actually very nice). In the general case, I would much
prefer it if you could say "I want only commits to the depth of 5" or
even CVS-like "I want only the HE...
-
To: Petr Baudis <pasky@...>
Cc: Tom Prince <tom.prince@...>, <git@...>
Date: Tuesday, September 27, 2005 - 1:07 pm

I agree that would be nice.  If you are only interested in tags
that refer to commits that anchor points in published branches,
maybe we should have something along the lines of info/refs to
help the downloaders?  Perhaps info/refs showing the SHA1 id of
the non-tag object each tag dereferences to in addition to the
current output?

This is a bit hard and needs some thinking to do cleanly,
because what is in info/refs is what is sent from the publisher
side over git-native protocol at the beginning of the handshake,
and it is not easy to add that to git-native protocol cleanly
and backward-compatibly (I think I know how without breaking
existing clients, but it is not clean).

-
To: Junio C Hamano <junkio@...>
Cc: Petr Baudis <pasky@...>, Tom Prince <tom.prince@...>, <git@...>
Date: Tuesday, September 27, 2005 - 1:56 pm

Argh.

"git-upload-pack" very much on purpose never sends partial object stores: 
it really doesn't want to send a tag-object for you to even _look_ at 
unless it also sends all the objects that you are missing that the tag 
refers to.

I'd really be much happier with the tag fetching being separate.

For example, making

	git fetch --tags <dest>

fetch all tags _and_ the objects that they depend on would seem a _lot_ 
more appropriate.

The thing is, tags really may be totally private. For example, it makes 
sense to fetch tags when you pull an official tree (ie my kernel tree, or 
your git tree), but it does NOT make sense for me to fetch tags 
(automatically or not) when I pull from a developer's tree.

That's why git fetch doesn't get the tags by default. It's WRONG. 

But we could certainly make it _easier_ to get tags when you want them. 
"git-ls-remote" already helps you, and

	git-ls-remote ... | cut -f2 | grep '^refs/tags/'

completes the picture. No protocol changes necessary, just some added 
magic to git-fetch.sh.
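
A small sketch of how the output of that pipeline could be turned into refspecs for a fetch. It reads "git ls-remote" output on stdin; the function name tag_refspecs is hypothetical. Two choices here are assumptions, not part of the one-liner above: the match is anchored at the start of the ref name, and the peeled "^{}" entries that current versions of git list for annotated tags are dropped.

```shell
# Turn "<sha1><TAB><refname>" lines into "refname:refname" refspecs,
# keeping only refs under refs/tags/ and skipping peeled "^{}" entries.
tag_refspecs() {
    awk '
        index($2, "refs/tags/") == 1 &&
        substr($2, length($2) - 2) != "^{}" {
            print $2 ":" $2
        }
    '
}
```

Usage would be along the lines of: git ls-remote "$remote" | tag_refspecs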

Actually, here's a simple and stupid patch.

Untested as usual, but hey, how hard can it be?

		Linus

----
diff --git a/git-fetch.sh b/git-fetch.sh
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -5,6 +5,7 @@
 _x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
 _x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
 
+tags=
 append=
 force=
 update_head_ok=
@@ -17,6 +18,9 @@ do
 	-f|--f|--fo|--for|--forc|--force)
 		force=t
 		;;
+	--tags)
+		tags=t
+		;;
 	-u|--u|--up|--upd|--upda|--updat|--update|--update-|--update-h|\
 	--update-he|--update-hea|--update-head|--update-head-|\
 	--update-head-o|--update-head-ok)
@@ -151,7 +155,12 @@ case "$update_head_ok" in
 	;;
 esac
 
-for ref in $(get_remote_refs_for_fetch "$@")
+taglist=
+if [ "$tags" ]; then
+	taglist=$(git-ls-remote "$remote" | awk '/refs\/tags/ { print $2":"$2 }')
+fi
+
+for ref in $(get_remote_refs_for_fetch "$@" $taglist)
 do
     refs="$refs $ref"
 
-
To: <git@...>
Date: Tuesday, September 27, 2005 - 6:14 am

The problem here is that currently there are no global, public branches.
And you should not mix private heads in refs/heads with global tags.
Perhaps interpret tag objects as global branch names, similar to
the "mixture" in .git/refs ?

Josef
-
To: Josef Weidendorfer <Josef.Weidendorfer@...>
Cc: <git@...>
Date: Tuesday, September 27, 2005 - 8:34 am

Dear diary, on Tue, Sep 27, 2005 at 12:14:31PM CEST, I got a letter

But we don't need any global tags or heads. You just have some heads in
your refs/heads (it doesn't matter if they are public or remote, that's
a "social" issue what you tell people to fetch). And based on your heads
you have in your refs/heads, there would be directories in your
refs/tags/ corresponding to those.

If you fetch remote head, its local subdirectory in refs/tags/ is
populated with the new tags, and if you merge two heads, the public tags
are copied around. Then if you are resolving a tag, we should first look
at refs/tags/$(readlink HEAD)/tagname, and if it doesn't exist, we would
look at refs/tags/tagname (so if you wanted to reference a tag not in
your head, you'd have to use a "head/tag" form). Optionally, you could
also look for refs/tags/*/tagname and if it gives you a unique match,
use that - but I'm not sure how good an idea this is since it already makes
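
The first two steps of that lookup order could be sketched as follows. The per-head refs/tags/<head>/ layout is only the proposal from this message, not something GIT implements, and resolve_tag and its arguments are hypothetical. The head name is passed in explicitly rather than derived from HEAD.

```shell
# usage: resolve_tag <git-dir> <head-name> <tag-name>
# Look for the tag under the current head first, then fall back to the
# flat refs/tags/ namespace. Prints the ref's content on success.
resolve_tag() {
    gitdir=$1
    head=$2
    tag=$3
    for candidate in "$gitdir/refs/tags/$head/$tag" "$gitdir/refs/tags/$tag"; do
        if [ -f "$candidate" ]; then
            cat "$candidate"
            return 0
        fi
    done
    return 1
}
```

The optional refs/tags/*/tagname unique-match step is left out of the sketch.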

I don't understand.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: Petr Baudis <pasky@...>
Cc: <git@...>
Date: Tuesday, September 27, 2005 - 9:27 am

Ah, ok.

Let me see if I understand:
1) These tags are bound to a head, and they have the invariant that they 
appear in the commit history of the head.
2) They are updated automatically.
3) When someone rebases a head, the bound tags should be synced to the
rebased head's history.
4) Tags can appear multiple times, if they happen to be in the commit
history of multiple heads?


Ok, this is the "automatically updated" feature I talked about above.
So missing here is:
- If you want to get rid of a head, the tags should be removed
- If a head is rebased, this has to be detected and the tags recreated,
possibly removing some

Probably there should be a "cg-tag --recover" to resync these volatile
tags with tag objects appearing in the histories of heads?

As for lightweight tags of remote repositories, you probably need some
space to recover them e.g. on a rebase or creation of a new head without


Tag objects in a repository could be interpreted as branch names
for commits based on it. When creating a new branch point, I first
would put a tag object on this branch, thus renaming it.
I think this would be quite handy for navigation in histories.

Josef
-
To: Petr Baudis <pasky@...>
Cc: H. Peter Anvin <hpa@...>, Git Mailing List <git@...>
Date: Monday, September 26, 2005 - 5:55 pm

It could point to a tree (ie. the kernel's v2.6.11 tag), which may end 
up being a large pull.  I think it's best to not care what type of 
object the tag references.

--
				Brian Gerst
-
To: Brian Gerst <bgerst@...>
Cc: H. Peter Anvin <hpa@...>, Git Mailing List <git@...>
Date: Monday, September 26, 2005 - 5:56 pm

Dear diary, on Mon, Sep 26, 2005 at 11:55:34PM CEST, I got a letter

Yes, but the object may not be reachable in any other way.

Simple question - if you have a tagged blob containing a GPG public key
(let's call it.. hmm.. e.g. junio-gpg-pub ;), would you expect Cogito to
ignore it or pick it up?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: Petr Baudis <pasky@...>
Cc: <git@...>
Date: Friday, September 23, 2005 - 10:00 pm

I think you could run git-peek-remote to find all the refs and
then run git-fetch-pack to slurp all the tags (and heads for
that matter) at once.  Is there a particular reason you would
prefer the commit walker?

-
To: Junio C Hamano <junkio@...>
Cc: <git@...>
Date: Saturday, September 24, 2005 - 8:50 am

Dear diary, on Sat, Sep 24, 2005 at 04:00:04AM CEST, I got a letter

Actually, probably not, except consistency with rsync and http handling
- but that's obviously not too good a reason. I did it this way since I'm
going to be a bit busy again from now on.

I will probably rewrite the tags fetching to use git-peek-remote
(info/refs for http) the next weekend. One problem with this is that in
many repositories, git-update-server-info never gets run and
things would break "mysteriously". I don't want the policy that the user
has to take care of this on his own for Cogito, so I will probably add
something that will automagically append git-update-server-info at least
to the post-update hook (like

	uphook="$_git/hooks/post-update"
	if ! [ -x "$uphook" ]; then
		if ! [ -e "$uphook" ]; then
			echo '#!/bin/sh' >>"$uphook"
			echo 'exec git-update-server-info' >>"$uphook"
		fi
		# If the user added something custom and left the hook
		# disabled, he knew what he was doing. Also don't
		# reenable the hook if we already did that once.
		# (The grep drops blank lines and single-# comments, so only
		# an untouched hook compares equal to the bare exec line.)
		if [[ "$(grep -v '^#\($\|[^#]\)\|^$' "$uphook")" == "exec git-update-server-info" ]]; then
			chmod a+x "$uphook"
			echo "## Enabled by Cogito. It won't try to enable it again as long as this comment is here." >>"$uphook"
		fi
	fi

or something).

Actually, I might also add something like

	[ -e "$_git/git-dummy-support" ] && git-update-server-info

at all the places in Cogito where I update the refs. Then the
default post-update hook could change to

	[ -e "$_git/git-dummy-support" ] && exec git-update-server-info

and be enabled by default?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.
-
To: Petr Baudis <pasky@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Saturday, September 24, 2005 - 1:13 pm

It wouldn't actually be very hard to rewrite git-*-fetch programs to fetch 
with a bunch of starting points. The main reason I haven't is actually 
that I don't have any ideas for a way to extend the command line argument 
format to include it.

	-Daniel
*This .sig left intentionally blank*
-
To: Petr Baudis <pasky@...>
Cc: Git Mailing List <git@...>
Date: Friday, September 23, 2005 - 9:52 pm

Perhaps git-ssh-fetch should be fixed?  :)

	-hpa
-