Re: [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry)

Previous thread: [StGIT PATCH not for master] Use gitk --argscmd in contrib/stg-gitk. by Yann Dirson on Sunday, June 3, 2007 - 3:52 pm. (1 message)

Next thread: git submodule init by Aneesh Kumar on Sunday, June 3, 2007 - 9:47 pm. (1 message)
From: Johan Herland
Date: Sunday, June 3, 2007 - 5:51 pm

(In response to the earlier thread on the git 'notes' feature/idea: "[PATCH 
00/15] git-note: A mechanism for providing free-form after-the-fact 
annotations on commits")

Ok, this is getting too big for me to ship all in one patch series, so I'm 
gonna have to split it up. Right now I'm thinking three patch series, but 
as I'm not near finished yet, that may change.

The three patch series I'm talking about are:

1. Refactoring the git tag object. This lays down part of the groundwork 
needed to support the second incarnation of git 'notes'. This is all ready 
to go and should follow shortly after this mail.

2. Introducing soft references (softrefs). Softrefs is the general mechanism 
behind the "reverse mapping" needed to support 'notes', as discussed in the 
previous thread. I'm still working on this patch series, and it will 
hopefully be ready soon-ish.

3. Reimplementing git 'notes' on top of (1) and (2). This should be fairly 
easy (once (2) is done), as it's mostly a matter of building a simple 
porcelain on top of (1) and (2).


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Johan Herland
Date: Sunday, June 3, 2007 - 5:51 pm

This patch series implements part of the ground work for the 'notes'
feature discussed earlier in the thread "[PATCH 00/15] git-note: A
mechanism for providing free-form after-the-fact annotations on commits".

The following patches refactors the tag object by:
1. Making the "tag" header optional
2. Introducing a new optional "keywords" header
3. Making the "tagger" header mandatory as far as possible
4. Do better and more thorough verification of tag objects

Unfortunately the first patch in the series is bigger than I would have
liked, but I couldn't find an easy way to split it up.

Here's the shortlog:

Johan Herland (6):
      Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header
      git-show: When showing tag objects with no tag name, show tag object's SHA1 instead of an empty string
      git-fsck: Do thorough verification of tag objects.
      Documentation/git-mktag: Document the changes in tag object structure
      git-mktag tests: Fix and expand the mktag tests according to the new tag object structure
      Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object

 Documentation/git-mktag.txt |   42 +++++--
 builtin-fsck.c              |   35 ++++++
 builtin-log.c               |    2 +-
 mktag.c                     |  148 +++++-------------------
 t/t3800-mktag.sh            |  204 ++++++++++++++++++++++++++++++---
 tag.c                       |  266 +++++++++++++++++++++++++++++++++++--------
 tag.h                       |    4 +-
 7 files changed, 507 insertions(+), 194 deletions(-)


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-


In order to support ref-less tag objects (aka. 'notes'), we want to do
some changes to the tag object structure. The new structure implemented
in this patch is backward-compatible in the way that all existing tag
objects (valid in the old implementation) will remain valid in the new
implementation. The following changes are done:

1. Make the "tag" header optional. The "tag" header contains the tag name,
   which is optional for 'notes'. The new semantics for the "tag" header
   are as follows: The tag header _must_ be given for signed tags (this
   is already enforced by git-tag.sh). When the tag header is not given,
   its value defaults to the empty string.

2. Introduce a new optional "keywords" header. The "keywords" header is a
   comma-separated list of free-form values. However, two certain value
   have special meaning: "tag" and "note": When the "keywords" header is
   missing, its default value is set to "tag" if a "tag" header is
   present; else the default "keywords" value is set to "note". The
   "keywords" header is meant to be used by porcelains for classifying
   different types of tag objects. This classification may then be used to
   filter tag objects in the presentation layer (e.g. by implementing
   extra filter options to --decorate, etc.)

3. Make the "tagger" header mandatory. This header has already for a long
   time been "de facto" mandatory, in that it is automatically generated
   by git-tag.sh. This patch verifies the existence of the header when
   creating tag objects. However, since there exists old tags without
   the "tagger" header, the verification should not be done when parsing
   tag objects.

4. Consolidate the parsing and verification of tag objects. Currently
   parsing is done in tag.c:parse_tag_buffer(), and verification is done
   with a very similar piece of code in mktag.c:verify_tag(). This patch
   unifies the parsing and verification of tag objects in a new function
   parse_and_verify_tag_buffer() which is then called ...
From: Matthias Lederhofer
Date: Sunday, June 3, 2007 - 11:08 pm

Why must signed tags have a tag header?  Will notes optionally have a
tag header?
-


The purpose of signing a tag is to cryptographically verify the thing 
pointed at by the tag. But you also want to protect the tag itself. In 
order to make it harder for someone to rename a signed tag (thereby opening 
the door to replacing it with a different - possibly signed - malicious 
tag), you want to include the tag name in the signed data. This allows us 
to verify that the tag ref (as stored in '.git/refs') is identical to the
tag name stored inside the signed object.


Yes, 'notes' will optionally have a "tag" header. When I originally designed  
notes, I didn't think anybody would want to name their notes, but Linus 
requested it, and there's no technical argument against it. Note that if 
you name your note, and put a ref to it (under '.git/refs'), there's 
technically no distinction between a tag object and a note object, except 
what you choose to put in the "keywords" header, of course.


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-


This is a consequence of making the "tag" header in tag objects optional.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-log.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 3744712..1a0f111 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -181,7 +181,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			printf("%stag %s%s\n\n",
 					diff_get_color(rev.diffopt.color_diff,
 						DIFF_COMMIT),
-					t->tag,
+					*(t->tag) ? t->tag : name,
 					diff_get_color(rev.diffopt.color_diff,
 						DIFF_RESET));
 			ret = show_object(o->sha1, 1);
-- 
1.5.2


-

From: Johan Herland
Date: Sunday, June 3, 2007 - 5:53 pm

Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-fsck.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index cbbcaf0..a8914ae 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -344,6 +344,20 @@ static int fsck_commit(struct commit *commit)
 static int fsck_tag(struct tag *tag)
 {
 	struct object *tagged = tag->tagged;
+	enum object_type type;
+	unsigned long size;
+	char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
+	if (!data)
+		return error("Could not read tag %s", sha1_to_hex(tag->object.sha1));
+	if (type != OBJ_TAG) {
+		free(data);
+		return error("Internal error: Tag %s not a tag", sha1_to_hex(tag->object.sha1));
+	}
+	if (parse_and_verify_tag_buffer(0, data, size, 1)) { /* Thoroughly verify tag object */
+		free(data);
+		return error("Tag %s failed thorough tag object verification", sha1_to_hex(tag->object.sha1));
+	}
+	free(data);
 
 	if (!tagged) {
 		return objerror(&tag->object, "could not load tagged object");
-- 
1.5.2


-

From: Matthias Lederhofer
Date: Sunday, June 3, 2007 - 10:56 pm

The objerror() function prints the sha1 and object type, I think this
one should be used instead of error() here.
-

From: Johan Herland
Date: Monday, June 4, 2007 - 12:51 am

Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.

Signed-off-by: Johan Herland <johan@herland.net>
---


Of course, you're right. Like this, I hope:

 builtin-fsck.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index cbbcaf0..71a5fd5 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -344,6 +344,20 @@ static int fsck_commit(struct commit *commit)
 static int fsck_tag(struct tag *tag)
 {
 	struct object *tagged = tag->tagged;
+	enum object_type type;
+	unsigned long size;
+	char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
+	if (!data)
+		return objerror(&tag->object, "could not read tag");
+	if (type != OBJ_TAG) {
+		free(data);
+		return objerror(&tag->object, "not a tag (internal error)");
+	}
+	if (parse_and_verify_tag_buffer(0, data, size, 1)) { /* thoroughly verify tag object */
+		free(data);
+		return objerror(&tag->object, "failed thorough tag object verification");
+	}
+	free(data);
 
 	if (!tagged) {
 		return objerror(&tag->object, "could not load tagged object");
-- 
1.5.2

-

From: Junio C Hamano
Date: Wednesday, June 6, 2007 - 12:18 am

The tagger field was introduced mid July 2005; any repository
with a tag object older than that would now get non-zero exit
from fsck.

This won't practically be problem in newer repositories, but it
is somewhat annoying.  Perhaps do this only under the new -v
option to git-fsck, say "warning" not "error", and not exit with
non-zero because of this?


-

From: Johan Herland
Date: Wednesday, June 6, 2007 - 1:06 am

Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.

Signed-off-by: Johan Herland <johan@herland.net>
---


Like this?

Or would you rather switch around the "verbose" and the
"parse_and_verify_tag_buffer()" (i.e. not even attempt the thorough
verification unless in verbose mode)?


Have fun!

...Johan

 builtin-fsck.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index bacae5d..fb9a8bb 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -359,11 +359,24 @@ static int fsck_commit(struct commit *commit)
 static int fsck_tag(struct tag *tag)
 {
 	struct object *tagged = tag->tagged;
+	enum object_type type;
+	unsigned long size;
+	char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
 
 	if (verbose)
 		fprintf(stderr, "Checking tag %s\n",
 			sha1_to_hex(tag->object.sha1));
 
+	if (!data)
+		return objerror(&tag->object, "could not read tag");
+	if (type != OBJ_TAG) {
+		free(data);
+		return objerror(&tag->object, "not a tag (internal error)");
+	}
+	if (parse_and_verify_tag_buffer(0, data, size, 1) && verbose)
+		objwarning(&tag->object, "failed thorough tag object verification");
+	free(data);
+
 	if (!tagged) {
 		return objerror(&tag->object, "could not load tagged object");
 	}
-- 
1.5.2



-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Junio C Hamano
Date: Wednesday, June 6, 2007 - 2:03 am

Actually I was thinking about doing something like this.

-	if (parse_and_verify_tag_buffer(0, data, size, 1) && verbose)
+	if (parse_and_verify_tag_buffer(0, data, size, verbose))


-

From: Junio C Hamano
Date: Wednesday, June 6, 2007 - 2:21 am

Well, after running fsck with --verbose, I take the whole
suggestion back.  I think it is a good idea to do the "thorough"
tag validation in general, and it should not be buried under the
verbose output, which is almost useless unless in a very narrow
special case that you are really trying to see which exact
object is corrupt.

So I think your original patch to signal error on thorough tag
validation failure is probably a good approach in general.
People need to know that in git.git fsck would return non-zero
because of v0.99 tag, but the people who get hit/annoyed by this
ought to be minority.  It may be the case that a major portion
of git users currently are the ones who futz with the git.git
repository, but there would be a serious problem if it continues
to be the case ;-)





-

From: Johan Herland
Date: Wednesday, June 6, 2007 - 3:26 am

I also noticed that a number of the early tags in the kernel repo use the 
ancient format, and would thus fail fsck.

<stroke-of-madness>
Could we replace the v0.99 tag (and other ancient tags) with "correct" 
versions, and then encourage users who have already cloned to delete their 
v0.99 tag and re-pull? New clones would of course never see the old tag at 
all. This sure as hell sounds similar to inserting foot into mouth before 
shooting oneself in said foot, but it might still be worth considering...
</stroke-of-madness>


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Junio C Hamano
Date: Wednesday, June 6, 2007 - 3:35 am

I actually think that is not too bad.  In the course of git
development, the kernel folks had to do the wholesale repository
conversion twice (once when the order of hashing and compression
was swapped, another when the flat tree was made hierarchical),
I think.  Compared to that, tags are not referred to by other
entities, so it's much easier to "convert" (iow, re-sign).

-

From: Johan Herland
Date: Sunday, June 3, 2007 - 5:54 pm

The new structure of tag objects is documented.

Also some much-needed cleanup is done. E.g. remove the paragraph on the
8kB limit, since this limit was removed ages ago.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-mktag.txt |   42 ++++++++++++++++++++++++++++++------------
 1 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt
index 2860a3d..411105d 100644
--- a/Documentation/git-mktag.txt
+++ b/Documentation/git-mktag.txt
@@ -8,40 +8,58 @@ git-mktag - Creates a tag object
 
 SYNOPSIS
 --------
-'git-mktag' < signature_file
+[verse]
+'git-mktag' < tag_data_file
+
 
 DESCRIPTION
 -----------
-Reads a tag contents on standard input and creates a tag object
+Reads tag object data on standard input and creates a tag object
 that can also be used to sign other objects.
 
 The output is the new tag's <object> identifier.
 
-Tag Format
+
+DISCUSSION
 ----------
-A tag signature file has a very simple fixed format: three lines of
+Tag object data has the following format
 
+[verse]
   object <sha1>
   type <typename>
-  tag <tagname>
+  tag <tagname>               (optional)
+  keywords <keywords>         (optional)
+  tagger <committer>
+
+followed by a blank line and a free-form message and an optional signature
+that git itself doesn't care about, but that may be verified with gpg or
+similar.
 
-followed by some 'optional' free-form signature that git itself
-doesn't care about, but that can be verified with gpg or similar.
+In the above listing, `<sha1>` represents the object pointed to by this tag,
+`<typename>` is the type of the object pointed to ("tag", "blob", "tree" or
+"commit"), `<tagname>` is the name of this tag object (and must correspond
+to the name of the corresponding ref (if any) in `.git/refs/`). `<keywords>`
+is a comma-separated list of keywords associated with this tag object, and
+`<committer>` holds the "`name <email>`" of the tag ...
From: Johan Herland
Date: Sunday, June 3, 2007 - 5:54 pm

The existing tests are updated to reflect the changes in the tag object.

Additionally some more tests are added to test the new "keywords" header,
and to test the more thorough verification routine.

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3800-mktag.sh |  204 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 190 insertions(+), 14 deletions(-)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 7c7e433..f6e3d10 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -34,7 +34,7 @@ too short for a tag
 EOF
 
 cat >expect.pat <<EOF
-^error: .*size wrong.*$
+^error: .* size .*$
 EOF
 
 check_verify_failure 'Tag object length check'
@@ -46,6 +46,8 @@ cat >tag.sig <<EOF
 xxxxxx 139e9b33986b1c2670fff52c5067603117b3e895
 type tag
 tag mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -61,6 +63,8 @@ cat >tag.sig <<EOF
 object zz9e9b33986b1c2670fff52c5067603117b3e895
 type tag
 tag mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -76,6 +80,8 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 xxxx tag
 tag mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -91,7 +97,7 @@ echo "object 779e9b33986b1c2670fff52c5067603117b3e895" >tag.sig
 printf "type tagsssssssssssssssssssssssssssssss" >>tag.sig
 
 cat >expect.pat <<EOF
-^error: char48: .*"[\]n"$
+^error: char48: .*"[\]n" after "type"$
 EOF
 
 check_verify_failure '"type" line eol check'
@@ -103,10 +109,12 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type tag
 xxx mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
-^error: char57: no "tag " found$
+^error: char57: .*$
 EOF
 
 check_verify_failure '"tag" line label check #1'
@@ -118,21 +126,27 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type taggggggggggggggggggggggggggggggg
 tag
+keywords foo
+tagger bar@baz.com
+
 EOF
 
 cat >expect.pat <<EOF
-^error: char87: no "tag " found$
+^error: char87: .*$
 EOF
 
 ...

This patch adds the check described by Junio.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-fsck.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index a8914ae..379317e 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -515,6 +515,25 @@ static int fsck_handle_reflog(const char *logname, const unsigned char *sha1, in
 	return 0;
 }
 
+static void fsck_verify_ref_to_tag_object(const char *refname, struct object *obj)
+{
+	/* Verify that refname matches the name stored in obj's "tag" header */
+	struct tag *tagobj = (struct tag *) parse_object(obj->sha1);
+	size_t tagname_len = strlen(tagobj->tag);
+	size_t refname_len = strlen(refname);
+
+	if (!tagname_len) return; /* No tag name stored in tagobj. Nothing to do. */
+
+	if (tagname_len < refname_len &&
+	    !memcmp(tagobj->tag, refname + (refname_len - tagname_len), tagname_len) &&
+	    refname[(refname_len - tagname_len) - 1] == '/') {
+		/* OK: tag name is "$name", and refname ends with "/$name" */
+		return;
+	}
+	else
+		error("%s: Mismatch between tag ref and tag object's name %s", refname, tagobj->tag);
+}
+
 static int fsck_handle_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
 {
 	struct object *obj;
@@ -529,6 +548,8 @@ static int fsck_handle_ref(const char *refname, const unsigned char *sha1, int f
 		/* We'll continue with the rest despite the error.. */
 		return 0;
 	}
+	if (obj->type == OBJ_TAG) /* ref to tag object */
+		fsck_verify_ref_to_tag_object(refname, obj);
 	default_refs++;
 	obj->used = 1;
 	mark_reachable(obj, REACHABLE);
-- 
1.5.2

-

From: Junio C Hamano
Date: Monday, June 4, 2007 - 1:32 pm

Johan, I gave only a cursory look at the series so far (that is,
I looked at your log message and had a very quick pass over the
code to see what the basic idea is and if it is sound, without
really reading/reviewing the code).  It all looks good.


-

From: Johan Herland
Date: Thursday, June 7, 2007 - 3:13 pm

The code that was enabled by passing a non-zero 'thorough_verify' argument
to parse_and_verify_tag_buffer() moved the 'tag_line' and 'keywords_line'
pointer variables forward in memory while checking for illegal chars.
These pointers were later used when setting the respective members on
the parsed tag object.

The fix refactors the verification loop so as to use offsets to the
'tag_line' and 'keywords_line' pointers, instead of moving the pointers
directly.

The patch also includes cleanup of the code associated with moving the
various '*_line' pointers past their initial header field identifier.
These operations are now done along with the calculation of their
corresponding '*_len' variables.

The patch also includes minor changes to expected output in associated
testcases.

The bug was discovered by inspection. Currently none of the callers of
parse_and_verify_tag_buffer() that use thorough_verify != 0, also use
the 'tag' and 'keywords' members of the parsed tag object.

Signed-off-by: Johan Herland <johan@herland.net>
---

This goes on top of the existing "Refactor the tag object" patch series.


Have fun!

...Johan

 t/t3800-mktag.sh |    8 ++++----
 tag.c            |   49 ++++++++++++++++++++++++++-----------------------
 2 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index f6e3d10..ac9008a 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -186,7 +186,7 @@ tagger bar@baz.com
 EOF
 
 cat >expect.pat <<EOF
-^error: char67: could not verify tag name$
+^error: char66: could not verify tag name$
 EOF
 
 check_verify_failure 'verify tag-name check'
@@ -240,7 +240,7 @@ tagger bar@baz.com
 EOF
 
 cat >expect.pat <<EOF
-^error: char83: .*$
+^error: char82: .*$
 EOF
 
 check_verify_failure '"keywords" line check #1'
@@ -258,7 +258,7 @@ tagger bar@baz.com
 EOF
 
 cat >expect.pat <<EOF
-^error: char87: .*$
+^error: char86: .*$
 EOF
 
 check_verify_failure '"keywords" line check #2'
@@ ...
From: Johan Herland
Date: Saturday, June 9, 2007 - 11:19 am

This patch series introduces soft references (softrefs); a mechanism for
declaring reachability between arbitrary (but existing) git objects.
Softrefs are meant to provide the mechanism for "reverse mapping" that
we determined was needed for tag objects (especially 'notes'). The patch
series also teaches git-mktag to create softrefs for all tag objects.

See the Discussion section in the git-softref manual page (patch #4/7) or
the comments in the header file (patch #1/7) for more details on the
design of softrefs.

I've added some informal performance data at the bottom of this mail [1].

Note that this patch series is incomplete in that the following things
have yet to be implemented:

1. Clone/fetch/push of softrefs

2. Packing of softrefs

3. General integration of softrefs into parts of git where they might be
   useful

4. Find appropriate value for MAX_UNSORTED_ENTRIES


There are also some questions connected to the above list of todos:

1. Just how should softrefs affect reachability? Should softrefs be
   used/followed in _all_ reachability computations? If not, which?

2. How should softrefs propagate. I suggest they are pretty much always
   propagated under clone/fetch/push. (Note that the softrefs merge
   algorithm in softrefs.c removes duplicates and softrefs between
   non-existing objects, so pre-filtering of the softrefs to be
   clones/fetched/pushed may not be necessary)

3. Where can softrefs be used to improve performance by replacing existing
   techniques?

4. How to best pack softrefs? Keeping them in the same pack as the objects
   they refer to seems to be a good idea, but more thought needs to be put
   into this before we can make an implementation

5. How to find _all_ (even unreachable) tag objects in repo for
   'git-softref --rebuild-tags'?

6. Optimization. Pretty much nothing has been done so far. Performance
   seems to be acceptable for now. Probably needs more testing to
   determine bottlenecks


NOTE: After the 7 ...
From: Johan Herland
Date: Saturday, June 9, 2007 - 11:21 am

See patch for documentation.

Signed-off-by: Johan Herland <johan@herland.net>
---
 softrefs.h |  188 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 188 insertions(+), 0 deletions(-)
 create mode 100644 softrefs.h

diff --git a/softrefs.h b/softrefs.h
new file mode 100644
index 0000000..db0f8b9
--- /dev/null
+++ b/softrefs.h
@@ -0,0 +1,188 @@
+#ifndef SOFTREFS_H
+#define SOFTREFS_H
+
+/*
+ * Softrefs is a general mechanism for declaring a relationship between two
+ * existing arbitrary objects in the repo. Softrefs differ from the existing
+ * reachability relationship in that a softref may be created after _both_ of
+ * the involved objects have been added to the repo. In contrast, the regular
+ * reachability relationship depends on the reachable object's name being
+ * stored _inside_ the other object. A reachability relationship can therefore
+ * not be created at a later time without violating the immutability of git
+ * objects.
+ *
+ * Softrefs are defined as going _from_ one object _to_ another object. Once
+ * a softref between two objects has been created, the "to" object is
+ * considered reachable from the "from" object.
+ *
+ * Also, softrefs are stored in a way that makes it easy and quick to find all
+ * the "to" objects reachable from a given "from" object.
+ *
+ * The softrefs db consists of two files: .git/softrefs.unsorted and
+ * .git/softrefs.sorted. Both files use the same format; one softref per line
+ * of the form "<from-sha1> <to-sha1>\n". Each sha1 sum is 40 bytes long; this
+ * makes each entry exactly 82 bytes long (including the space between the sha1
+ * sums and the terminating linefeed).
+ *
+ * The entries in .git/softrefs.sorted are sorted on <from-sha1>, in order to
+ * make lookup fast.
+ *
+ * The entries in .git/softrefs.unsorted are _not_ sorted. This is to make
+ * insertion fast.
+ *
+ * When softrefs are created (by calling add_softref()/add_softrefs()), they
+ * are appended to ...
From: Johannes Schindelin
Date: Saturday, June 9, 2007 - 11:58 pm

Hi,


This is preposterous. Either you substitute the patch for a documentation, 
or you document it in the commit message. I consider commit messages like 
"See patch for documentation" as reasonable as all those CVS "** no 

This sure sounds like you buy the disadvantages of both. Last time I 
checked, it was recommended to look at your needs and pick _one_ 
appropriate data structure fitting _all_ your needs.

Besides, your lines are way too long. Yes, it is not in 
Documentation/SubmittingPatches, but even just a cursory look into the 
existing source shows you that it is mostly 80-chars-per-line. I think it 
goes without saying that you should try to imitate the existing practices 
in any project, and since you have to read the source to get acquainted 
with it _anyway_, it would only be a duplication to have it in 
SubmittingPatches, too.

Ciao,
Dscho

-

From: Junio C Hamano
Date: Sunday, June 10, 2007 - 12:43 am

Well, maybe we should do this.

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 01354c2..4bdfdfe 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -5,6 +5,7 @@ Checklist (and a short version for the impatient):
 	- make commits of logical units
 	- check for unnecessary whitespace with "git diff --check"
 	  before committing
+	- tab width is 8, the terminal is 80-columns wide.
 	- do not check in commented out code or unneeded files
 	- provide a meaningful commit message
 	- the first line of the commit message should be a short
@@ -82,6 +83,14 @@ option).
 Another thing: NULL pointers shall be written as NULL, not as 0.
 
 
+(1b) Tab width is 8, Terminal is 80-column wide.
+
+We generally follow the same coding style guidelines as the
+Linux kernel project.  Lines are indented with Tabs, each of
+which are 8 columns wide.  Lines should fit on 80-column wide
+terminals.
+
+
 (2) Generate your patch using git tools out of your commits.
 
 git based diff tools (git, Cogito, and StGIT included) generate

-

From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 12:54 am

Hi,


But where to stop?

Many people want to put an opening curly bracket in its own line. Other 
indenting is subject for discussion, too. White space after operators, but 
not after function names should be included, too.

I know you mean good, but I think it is not a bad idea to let people get 
familiar with the code (and the formatting rules) first. This way we can 
even tell who did, and who did not do that, before submitting a patch.

Ciao,
Dscho

-

From: Johan Herland
Date: Sunday, June 10, 2007 - 7:00 am

Well, I could have copied documentation from the header file into the commit

First, the unsorted file is bounded in size to make sure it never gets
large enough to really impact performance
Second, I'd ask you to look at the performance numbers (in patch #0)

I have indeed tried to follow the 80-chars-per-line rule. In softrefs.h
I fail to see a _single_ line exceeding 80 chars per line. In the other
files I believe the number of long lines is comparable to other files
in the git repo.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Johan Herland
Date: Saturday, June 9, 2007 - 11:22 am

This code tries to implement the softrefs API as straightforwardly as
possible. Virtually no optimization has been done, although I do have
a feeling the code has ok performance as is.

All functions that do not appear in the API docs have some comments
attached to them.

There are also a couple of things to be considered before inclusion:
- File locking. Currently no locking is performed on softrefs files
  before reading or writing entries.
- Packing. We need a plan for how softrefs should be included in packs,
  at which supporting code must be added to this implementation.

Signed-off-by: Johan Herland <johan@herland.net>
---
 softrefs.c |  712 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 712 insertions(+), 0 deletions(-)
 create mode 100644 softrefs.c

diff --git a/softrefs.c b/softrefs.c
new file mode 100644
index 0000000..c7308c8
--- /dev/null
+++ b/softrefs.c
@@ -0,0 +1,712 @@
+#include "cache.h"
+#include "softrefs.h"
+
+/* constants */
+static const char *       UNSORTED_FILENAME    = "softrefs.unsorted";
+static const char *       SORTED_FILENAME      = "softrefs.sorted";
+static const unsigned int MAX_UNSORTED_ENTRIES = 1000;
+
+
+/* softref entry as it appears in a softrefs file */
+struct softrefs_entry {
+	char from_sha1_hex[40];
+	char space;
+	char to_sha1_hex[40];
+	char lf;
+};
+
+/* simple encapsulation of a softrefs file */
+struct softrefs_file {
+	char *filename;
+	int fd;
+	struct softrefs_entry *data; /* mmap()ed softrefs_entry objects */
+	unsigned long data_len;      /* # of softrefs_entry objects in data */
+};
+
+/* Internal file opened/closed by (de)init_softrefs_access() */
+static struct softrefs_file *internal_file = 0;
+
+/*
+ * Open and mmap() the given filename, Assign the file descriptior, data
+ * pointer and data length to the given softrefs_file object.
+ * Return 0 on success, -1 on failure.
+ *
+ * Note that a non-existing file is not a failure per se, but is rather treated
+ ...
From: Johan Herland
Date: Saturday, June 9, 2007 - 11:22 am

git-softref is meant to be used from shell scripts that need to interact
with the softrefs database. The builtin command provides most of the
functionality present in the softrefs C API.

Documentation to follow in a subsequent patch.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-softref.c |  167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 167 insertions(+), 0 deletions(-)
 create mode 100644 builtin-softref.c

diff --git a/builtin-softref.c b/builtin-softref.c
new file mode 100644
index 0000000..f95db4e
--- /dev/null
+++ b/builtin-softref.c
@@ -0,0 +1,167 @@
+/*
+ * git softref builtin command
+ *
+ * Add, list and administer soft references (softrefs)
+ *
+ * Copyright (c) 2007 Johan Herland
+ */
+
+#include "cache.h"
+#include "tag.h"
+#include "refs.h"
+#include "softrefs.h"
+
+static const char builtin_softref_usage[] =
+	"git-softref [ --list [<from-object>]"
+		" | --has <from-object> <to-object>"
+		" | --add <from-object> <to-object>"
+		" | --rebuild-tags"
+		" | --merge-unsorted [<softrefs-file>]"
+		" | --merge-sorted <softrefs-file> ]";
+
+static int list_helper(
+		const unsigned char *from_sha1,
+		const unsigned char *to_sha1,
+		void *cb_data)
+{
+	printf("%s %s\n", sha1_to_hex(from_sha1), sha1_to_hex(to_sha1));
+	return 0;
+}
+
+int rebuild_tags_helper(
+		const char *refname,
+		const unsigned char *sha1,
+		int flags,
+		void *cb_data)
+{
+	struct object *o = parse_object(sha1);
+	if (o && o->type == OBJ_TAG) {
+		struct tag *t = (struct tag *) o;
+		struct softref_list **prev = (struct softref_list **) cb_data;
+		struct softref_list *current = xmalloc(sizeof(struct softref_list));
+		current->next = *prev;
+		hashcpy(current->from_sha1, t->tagged->sha1);
+		hashcpy(current->to_sha1,   t->object.sha1);
+		*prev = current;
+	}
+	return 0;
+}
+
+int cmd_softref(int argc, const char **argv, const char *prefix)
+{
+	int i;
+	int show_usage = 0, list = 0, has = 0, add = 0, ...
From: Johan Herland
Date: Saturday, June 9, 2007 - 11:23 am

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-softref.txt |  119 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 119 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-softref.txt

diff --git a/Documentation/git-softref.txt b/Documentation/git-softref.txt
new file mode 100644
index 0000000..6a3e13b
--- /dev/null
+++ b/Documentation/git-softref.txt
@@ -0,0 +1,119 @@
+git-softref(1)
+==============
+
+NAME
+----
+git-softref - Create, list and administer soft references
+
+
+SYNOPSIS
+--------
+[verse]
+'git-softref' --list [<from-object>]
+'git-softref' --has <from-object> <to-object>
+'git-softref' --add <from-object> <to-object>
+'git-softref' --rebuild-tags
+'git-softref' --merge-unsorted [<softrefs-file>]
+'git-softref' --merge-sorted <softrefs-file>
+
+
+DESCRIPTION
+-----------
+Query and administer soft references in a git repository.
+
+Soft references are used to declare reachability between already existing
+objects. An object (called the 'to-object') may be declared reachable from
+another object (the 'from-object') without affecting the immutability of either
+object.
+
+The `--list` option will list existing softrefs in the database. If given the
+optional <from-object>, the list is limited to softrefs from the given object.
+The `--has` option is used to check for the existence of a softref between two
+given objects, similarly the `--add` option is used to add such a softref.
+
+The `--rebuild-tags` option is used to generate softrefs for all tag objects in
+the repository reachable from tag refs. Tag objects use softrefs to declare
+reachability 'from' the tagged object, 'to' the tag object. This allows for tag
+objects to the cloned/fetched/pushed along with their associated objects.
+
+Finally, the `--merge-unsorted` and `--merge-sorted` options are used to merge
+softrefs files into the sorted softrefs db. The filename argument must point
+to an existing file in unsorted/sorted softrefs ...
From: Johan Herland
Date: Saturday, June 9, 2007 - 11:23 am

Adds testing of the basic options available to the git-softref command.

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3050-softrefs.sh |  314 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 314 insertions(+), 0 deletions(-)
 create mode 100755 t/t3050-softrefs.sh

diff --git a/t/t3050-softrefs.sh b/t/t3050-softrefs.sh
new file mode 100755
index 0000000..a925178
--- /dev/null
+++ b/t/t3050-softrefs.sh
@@ -0,0 +1,314 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johan Herland
+#
+
+test_description='Basic functionality of soft references'
+. ./test-lib.sh
+
+
+# Prepare repo and create some notes
+
+test_expect_success 'Populating repo with test data' '
+	echo "foo" > foo &&
+	git-add foo &&
+	test_tick &&
+	git-commit -m "Initial commit" &&
+	git-tag -a -m "Tagging initial commit" footag &&
+	echo "bar" >> foo &&
+	test_tick &&
+	git-commit -m "Second commit" foo &&
+	git-tag -a -m "Tagging second commit" bartag
+'
+
+# At this point we should have:
+# - commit @ 301711b66fe71164f646b798706a2c1f7024da8d ("Initial commit")
+#    - tag @ ad60bc179c6874af6d97f181c67f11adcca5122b ("footag")
+# - commit @ 9671cbee7ad26528645b2665c8f74d39a6288864 ("Second commit")
+#    - tag @ a927fc832d42f1f64d8318e8acec43545d9562de ("bartag")
+# - The tag creation should also have created softrefs:
+#   - From "Initial commit" to "footag"
+#   - From  "Second commit" to "bartag"
+
+# Testing git-softref --list
+
+test_expect_success 'Testing git-softref --list on initial test data (1)' '
+	cat > expected_output << EOF &&
+301711b66fe71164f646b798706a2c1f7024da8d ad60bc179c6874af6d97f181c67f11adcca5122b
+9671cbee7ad26528645b2665c8f74d39a6288864 a927fc832d42f1f64d8318e8acec43545d9562de
+EOF
+	git-softref > actual_output 2>&1 &&
+	cmp actual_output expected_output &&
+	git-softref --list > actual_output 2>&1 &&
+	cmp actual_output expected_output
+'
+
+test_expect_success 'Testing git-softref --list on initial test data (2)' '
+	cat > ...
From: Johan Herland
Date: Saturday, June 9, 2007 - 11:24 am

Also cleans up sorting and whitespace in Documentation/cmd-list.perl

Signed-off-by: Johan Herland <johan@herland.net>
---
 .gitignore                  |    1 +
 Documentation/cmd-list.perl |    7 ++++---
 Makefile                    |    6 ++++--
 builtin.h                   |    1 +
 git.c                       |    1 +
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/.gitignore b/.gitignore
index 27e5aeb..7fd6904 100644
--- a/.gitignore
+++ b/.gitignore
@@ -119,6 +119,7 @@ git-show
 git-show-branch
 git-show-index
 git-show-ref
+git-softref
 git-ssh-fetch
 git-ssh-pull
 git-ssh-push
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index a181f75..4e1f45b 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -90,6 +90,7 @@ git-clean                               mainporcelain
 git-clone                               mainporcelain
 git-commit                              mainporcelain
 git-commit-tree                         plumbingmanipulators
+git-config                              ancillarymanipulators
 git-convert-objects                     ancillarymanipulators
 git-count-objects                       ancillaryinterrogators
 git-cvsexportcommit                     foreignscminterface
@@ -101,13 +102,13 @@ git-diff-files                          plumbinginterrogators
 git-diff-index                          plumbinginterrogators
 git-diff                                mainporcelain
 git-diff-tree                           plumbinginterrogators
-git-fast-import				ancillarymanipulators
+git-fast-import                         ancillarymanipulators
 git-fetch                               mainporcelain
 git-fetch-pack                          synchingrepositories
 git-fmt-merge-msg                       purehelpers
 git-for-each-ref                        plumbinginterrogators
 git-format-patch                        mainporcelain
-git-fsck	                        ancillaryinterrogators
+git-fsck ...
From: Johan Herland
Date: Saturday, June 9, 2007 - 11:24 am

For each tag object created, we create a corresponding softref from the
tagged object to the tag object itself. This is needed to enable efficient
lookup of which tag objects that point to a given commit/object.

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/mktag.c b/mktag.c
index af0cfa6..db8a6b8 100644
--- a/mktag.c
+++ b/mktag.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "tag.h"
+#include "softrefs.h"
 
 /*
  * Tag object data has the following format: two mandatory lines of
@@ -32,6 +33,7 @@ int main(int argc, char **argv)
 {
 	unsigned long size = 4096;
 	char *buffer = xmalloc(size);
+	struct tag result_tag;
 	unsigned char result_sha1[20];
 
 	if (argc != 1)
@@ -46,7 +48,7 @@ int main(int argc, char **argv)
 	buffer[size] = 0;
 
 	/* Verify tag object data */
-	if (parse_and_verify_tag_buffer(0, buffer, size, 1)) {
+	if (parse_and_verify_tag_buffer(&result_tag, buffer, size, 1)) {
 		free(buffer);
 		die("invalid tag data file");
 	}
@@ -57,6 +59,13 @@ int main(int argc, char **argv)
 	}
 
 	free(buffer);
+
+	/* Create reverse mapping softref */
+	if (add_softref(result_tag.tagged->sha1, result_sha1) < 0) {
+		die("unable to create softref for resulting tag object %s",
+			sha1_to_hex(result_sha1));
+	}
+
 	printf("%s\n", sha1_to_hex(result_sha1));
 	return 0;
 }
-- 
1.5.2.1.144.gabc40

-

From: Johan Herland
Date: Saturday, June 9, 2007 - 11:25 am

The text-based softrefs file format uses 82 bytes per entry (40 bytes
from_sha1 in hex, 1 byte SP, 40 bytes to_sha1 in hex, 1 byte LF).

The binary softrefs file format uses 40 bytes per entry (20 bytes
from_sha1, 20 bytes to_sha1).

Moving to a binary format increases performance slightly, but sacrifices
easy readability of the softrefs files.

Signed-off-by: Johan Herland <johan@herland.net>
---

To illustrate the change in performance from changing the softrefs file
format, I prepared a linux repo (holding 57274 commits) with 10 tag
objects, and created softrefs from every commit to every tag object
(572740 softrefs in total). The resulting softrefs db was 46964680 bytes
when using the text format, and 22909600 bytes when using the binary
format. The experiment was done on a 32-bit Intel Pentium 4
(3 GHz w/HyperThreading) with 1 GB RAM:


========
Operations on unsorted softrefs:
(572740 (10 per commit) entries in random/unsorted order)
========

Listing all softrefs
(sequential reading of unsorted softrefs file)
--------
[text format]
$ /usr/bin/time git softref --list > /dev/null
0.44user 0.02system 0:00.47elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps
[binary format]
$ /usr/bin/time git softref --list > /dev/null
0.35user 0.01system 0:00.38elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5913minor)pagefaults 0swaps

Listing HEAD's softrefs
(sequential reading of unsorted softrefs file)
--------
[text format]
$ /usr/bin/time git softref --list HEAD > /dev/null
0.11user 0.01system 0:00.14elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11790minor)pagefaults 0swaps
[binary format]
$ /usr/bin/time git softref --list HEAD > /dev/null
0.02user 0.01system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5918minor)pagefaults 0swaps

Sorting softrefs
--------
[text format]
$ /usr/bin/time git softref ...
From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 1:02 am

Hi,


It is bad style to introduce one type, and then change it to another in a 
backwards-incompatible way. Either you make it backwards compatible, or 
you start with the second format, never even mentioning that you had 
another format.

Ciao,
Dscho

-

From: Junio C Hamano
Date: Sunday, June 10, 2007 - 1:30 am

While I agree with that in principle, I think you are being a
bit too harsh to a set of patches that shows possible
alternatives for an idea that is not even in any unreleased
version of git.

Got out of the wrong side of bed this morning?


-

From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 2:46 am

Hi,


Possibly. Except it was not a bed, but an airplane passenger seat.

And it did not help that I totally disagree with the approach: "sorted 
list does not do this well, unsorted not that... let's take both!"

So, please take my words with a grain of salt. But I still think that 
_what_ I was saying is correct.

Yes, the patch series shows an alternative, but I would have wished for 
either a smaller quick-n-dirty proof-of-concept implementation ([RFC]), or 
a better thought-through one ([PATCH]).

Ciao,
Dscho

-

From: Johan Herland
Date: Sunday, June 10, 2007 - 7:03 am

As Junio correctly pointed out, this patch is only here to demonstrate an
alternative solution. Whether or not this patch should be used should be
determined _long_ before (even thinking about) putting this into a release.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Junio C Hamano
Date: Saturday, June 9, 2007 - 4:55 pm

The patch series to look-up and maintain "softref" relationship
is trivially clean.  Although I probably would have many nits to
pick, I do not think it is _wrong_ in a major way per-se.  I
would not even mind saying that I liked the basic concept, until
I thought things a bit deeper.

Here are some initial notes I took while reading your patches.


Semantics
---------

Not all "softref" relationship is equal.  "This object is
referred to by these tags" is one obvious application, and only
because we already try to follow tags when git-fetch happens
anyway, it looks natural to make everybody follow such a softref
relationship.

However, as Pierre Habouzit wants to, we may want to make a bug
tracking sheet (the details of the implementation of such a bug
tracking sheet does not matter in this discussion -- it could be
a blob, a commit, or a tag) refer to commits using this
mechanism, by pointing at the blob from commits after the fact
(i.e. "later it was verified that this commit fixes the bug
described in this bug entry").

	Side note: I am assuming a simplest implementation where
	one blob would always capture the latest status of the
	bug.  refs/bugs/12127 would point at the latest version
	of bug 12127's tracking sheet.  An alternative
	implementation would be to represent each entry of the
	tracking sheet for a single bug as a blob, and have
	multiple blobs associated to a commit on the main
	project, or the other way around, but then you would
	need a way to give order between referers to a single
	referent, which I do not find in your proposed
	"softref".

Most users who want to download and compile the main project do
not care about bug tracker objects.  You would need to have a
way to describe what kind of relationship a particular softlink
represents, and adjust the definition of reachability for the
purposes of traversal of objects.

It gets worse when you actually start using softrefs.  I do not
think you would have a limited set of softrefs, such ...
From: Johan Herland
Date: Saturday, June 9, 2007 - 6:25 pm

Yes, I'm starting to see that it's not a good idea to put _all_ softrefs in 

We're getting a little ahead of ourselves, aren't we? IMHO, it would be up 
to the bug system to determine which (and how many) connections to make 
between the bug reports and the commits (or even if softrefs would be the 
correct mechanism for these connections at all). We shouldn't necessarily 
base the softrefs design on how we imagine a hypothetical bug system to 
work. But Pierre might have something to say on how he would want to use 
softrefs, and his system is hopefully _less_ hypothetical. :)

But I can see the use of letting the user/porcelain/bugtracker define 

Yes, I see that different classes of softrefs would have different semantics 
for propagation. we could probably try to set up some sane defaults, and 
then let users put rules in their configs for how they would want to 



Isn't there a .used flag on objects we could easily check to see if we've 
seen an object before, thus preventing us from following the circular 

Hmm. First of all, I'm not sure it would be useful to add a _direct_ link 
between E and A, but even so...
I'm thinking we can do the regular/current reachability calculation first, 
and after it's done, we analyze the softrefs to see if there are more 
objects to be fetched. In that scenario, we wouldn't need to send A again, 

As above, if the receving side gets the list of involved softrefs, it can 
make the decision on which objects it needs to fetch.

Hmm. Thinking about it, this process would of course need to be recursive, 

Yeah, there are still may open questions. But I'm glad to see that most 
people seem to find the basic concept useful, at least.


Thanks for taking the time and effort to comment.


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Johannes Schindelin
Date: Saturday, June 9, 2007 - 11:33 pm

Hi,


Has my lightweight annotation patch reached you?

I like my approach better than yours, because it is

1) a way, way smaller patch, and
2) it automatically includes the versionability.

After thinking about it a little more (my plane was slow, and as a result 
I am allowed to spend 8 more hours in Paris), I think that a small but 
crucial change would make this thing even more useful:

Instead of having "core.showAnnotations" be a boolean config, it might be 
better to have "core.annotationsRef" instead, overrideable by the 
environment variable GIT_ANNOTATION_REF.

With this, you can have different refs for different kinds of annotations.

For example, some people might add bugtracker comments (even comments like 
"this commit was bad: introduced bug #798, solved by commit 9899fdadc.."). 
Those comments could live in refs/annotations/bugs. To see them, just say 

	GIT_ANNOTATION_REF=refs/annotations/bugs gitk

Voila.

I am quite certain that treating annotations as branches, containing 
fan-out directories for the reverse lookup. I am even quite certain that 
in most cases, a working-directory-less merging is possible for such 
annotations.

Ciao,
Dscho

-

From: Johan Herland
Date: Sunday, June 10, 2007 - 6:41 am

I see your point, but your lightweight annotations are solving a different 
problem, aren't they? They do provide the after-the-fact annotations that 
sort of sparked of these discussions, but I can't see how your patch is a 
replacement of the general "relationships between arbitrary objects" 
concept that softrefs try to solve.

Of course, it might be that the lightweight annotations are "good enough" 
for the use cases we currently see, and that softrefs are a bit overkill. 


I'm not convinced about the working-directory-less merging. AFAICS the 
lightweight annotations will behave pretty much like the "regular" version 
controlled filesystem, and you'll have the same kind of conflicts when you 
merge stuff between repos. I'd be glad to be proven wrong, of course.


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Pierre Habouzit
Date: Sunday, June 10, 2007 - 7:09 am

Sorry for the noise, but I'm really pissed with git@vger recently.
That mail I'm answering to, never made it to the list. Neither did my
quite long answer to Martin, who kindly forwared it back to me so that I
can send it again, sadly git@vger just does not wants mail, whereas its
SMTP seems to accept mail:

  Jun 10 16:02:09 pan postfix/smtp[20560]: DE84FCAD9: to=3D<git@vger.kernel=
=2Eorg>, relay=3Dvger.kernel.org[209.132.176.167]:25, delay=3D4, delays=3D0=
=2E31/0.02/0.5/3.1, dsn=3D2.7.0, status=3Dsent (250 2.7.0 nothing apparentl=
y wrong in the message. BF:<H 0.0800319>; S1754569AbXFJOCJ)

  I've sent a mail to postmaster@vger.kernel.org a week ago, but it
seems it remained a dead letter. Is anyone able to tell what's going
wrong ? it's _really_ irritating, and let me want to give up
discussions, as I _hate_ losing mail (I usually don't keep a copy of
mails I send to a mail list as I expect it to send it back to me, and
well, what would be a copy worth if nobody can read the mail anyway ?).

  So if anyone knows what can be done ....



--=20
=C2=B7O=C2=B7  Pierre Habouzit
=C2=B7=C2=B7O                                                madcoder@debia=
n.org
OOO                                                http://www.madism.org
From: Pierre Habouzit
Date: Sunday, June 10, 2007 - 7:25 am

el.org>, relay=3Dvger.kernel.org[209.132.176.167]:25, delay=3D4, delays=3D0=
=2E31/0.02/0.5/3.1, dsn=3D2.7.0, status=3Dsent (250 2.7.0 nothing apparentl=

  SOrry, *I* screwed up. I did not checked first if the archived had the
mails, and it had. So as I don't see the mails come, it's definitely a
problem in between, and is not necessarilly vger.kernel.org's fault, and
I apologies for the noise.

  So if anyone sent a mail to the list (without me in Cc) hoping that I
would get the mail and never answered, it's time to send it again :/

  That is also the reason why some of my mails have been sent many
times: because I thought they were eaten at some point. That should not
happen again, sorry about that.

Cheers,
--=20
=C2=B7O=C2=B7  Pierre Habouzit
=C2=B7=C2=B7O                                                madcoder@debia=
n.org
OOO                                                http://www.madism.org
From: Pierre Habouzit
Date: Sunday, June 10, 2007 - 2:03 am

To be fair, I'm still struggling with the storage backend yet, trying
to make things fast enough (My current import rate of mails is 10 per
second, wich is not that brilliant I guess), and also to design some
simple things like "answering" to a bug.

  For now, my design is the following, I've a 'bts' branch where the
bugs reports (plain mailboxes) go. Grit is able to manage as many branch
the user wants, bts is just the default name for it. Then, for a bts
branch, you have $GIT_DIR/grit/<branch>.index and
$GIT_DIR/grit/<branch>/. The former is the index of the tip of the bts
branch, and the latter contains some bits of the tip of the branch
checkouted (can be seen as some kind of cache, useful to run mutt -f on
a mbox e.g.).

  Bugs have the sha id of the hash of the first imported mail, and are
put in sha[:2]/sha[2:] files, =C3=A0 la .git/objects/. I also should have a
second file with annotations about the bug, format not really clear for
now, as "one file per bug" could be quite inefficient. OTOH if I mix too
many bugs in the same file, the merge risk is bigger (but I suppose I
could use a specific merge strategy on this).

  Here is the sole non hypothetical thing yet. My plans then was to use
"links" (softlinks or not, I'm speaking generically, I hope softrefs
will match my needs, I don't know yet) between specific commits, and
bugs. Links would somehow carry information on wether this is an
"opening" tag (like: this bug is present starting at that commit), an
informationnal tag (like: this commit helps fixing that bug, but is not
enough), or a closing/fixing tag (like: this commit fixes it). A fourth
kind may be also used aka a not-found tag (like: well this commit does
not fixes the bug, but for sure it's not there anymore at that commit).

  Though, softlinks do not need to "carry" the information for real,
they just need to be linked somehow to the bug, bug that would have the
annotations for those softlinks in them.

  What is somehow flawed for me, is that ...
From: Steven Grimm
Date: Saturday, June 9, 2007 - 3:57 pm

(Resending this in plaintext; sorry to those who got it twice.)

Being able to specify relationships between commits after the fact seems 
like a very useful facility.

Does it make sense to have type information to record what the 
relationship between two objects means? Without that, it seems like 
it'll be hard to build much of a tool set on top of this feature, since 
no two tools that made use of it could unambiguously query just their 
own softrefs.

A few use cases for relationships-after-the-fact come to mind in 
addition to the one the patch itself mentions:

A facility like this could replace the info/grafts file, or at least 
provide another way to turn a regular commit into a merge commit. Just 
put a "manually specified merge parent" ref between the target revision 
and the one you want git to think you've merged from. That would scale a 
lot better than info/grafts does, I suspect, if only by virtue of being 
O(log n) searchable thanks to the sorting.

One could easily imagine recording a "cherry picked" softref, which 
could, e.g., be the rebase machinery to skip over an already-applied 
revision. IMO the lack of any tool-readable history about cherry picking 
-- which is, after all, a sort of merge, at least conceptually -- is a 
shortcoming in present-day git. (Not a huge one, but if nothing else 
it'd be great to see cherry picking represented in, e.g., the gitk 
history display.)

It might be possible to annotate rebases to make pulling from rebased 
branches less troublesome. If you have

A--B--C--D
    \
     -E--F--G

and you rebase E onto D, a "rebased from" softref could be recorded 
between E and E':

A--B--C--D
    \     \
     -E....E'--F'--G'

Then a pulling client could potentially use that information to cleanly 
replay the rebase operation to keep its history straight. Perhaps if you 
could record historical rebases like that, you could do away with the 
current gotchas involving rebasing shared repositories. One negative 
side ...
From: Johan Herland
Date: Saturday, June 9, 2007 - 4:16 pm

Actually MadCoder/Pierre had a similar idea on IRC. He wanted to separate 
softrefs into namespaces, so that softrefs for tags could live in a 
different place than softrefs associated with his "gits" bug tracker.

I haven't thought very much about this, but it's certainly possible to do 

Yes, I _knew_ this was similar to grafts in some way :) While working on 
this, I tried to see if I could leverage grafts somewhere in my design, but 
I found them to be too commit-bound and specific. But when you look at it 

Whoa. I hadn't even imagined this, but I guess you're right. I actually 
thought about solving the same problem (using a much worse method) way back 

Thanks a lot for sharing your ideas. :)


Have fun!

...Johan

[1] http://article.gmane.org/gmane.comp.version-control.git/46137

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Pierre Habouzit
Date: Sunday, June 10, 2007 - 1:29 am

Well, if we're two with the same idea, it's a good one, no ? :)

  In fact, the namespace idea like I told you on IRC isn't _that_
brilliant. But I'm sure recording a softref with:

  <from_sha> <to_sha> <token>

  token would help classify the softref. And I'm sure we'll end up with:

  <from_sha> <to_sha> <token> <flags>

  with the flags to say what behaviour (e.g.) the reachability resolver
should have wrt that link ?

--=20
=C2=B7O=C2=B7  Pierre Habouzit
=C2=B7=C2=B7O                                                madcoder@debia=
n.org
OOO                                                http://www.madism.org
From: Johan Herland
Date: Sunday, June 10, 2007 - 7:31 am

Interesting. But I'm not sure I want to give up the fixed-length softref
records as I imagine it makes the lookup and processing _much_ faster.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Steven Grimm
Date: Sunday, June 10, 2007 - 12:42 pm

The token (really a namespace identifier) could be defined as a string 
with a fixed, probably small, maximum size. Or, better IMO, it could be 
an integer with a bunch of enumerated values for internal git uses and a 
range reserved for unofficial / experimental use. I agree that 
fixed-length records seem like a win here, assuming this is the general 
storage layout we end up with when all is said and done.

-Steve
-

Previous thread: [StGIT PATCH not for master] Use gitk --argscmd in contrib/stg-gitk. by Yann Dirson on Sunday, June 3, 2007 - 3:52 pm. (1 message)

Next thread: git submodule init by Aneesh Kumar on Sunday, June 3, 2007 - 9:47 pm. (1 message)