[PATCH 10/21] Free mktag's buffer before dying

Previous thread: interaction between cvsimport and cvsexportcommit by picca on Friday, June 8, 2007 - 6:42 am. (2 messages)

Next thread: git-p4 fails when cloning a p4 depo. by Benjamin Sergeant on Friday, June 8, 2007 - 9:41 am. (14 messages)

I am not sure if there is any practical difference between the
two ;-).  But in either case, it appears that we should first
revert d9fa4a8 from 'next' and start from clean slate.  It
really seems that the patch series did upset too many people;
personally I found the first patch still was follow-able, but I
do agree that it should have been much smaller and not mixing
too many things into one).

So, let's do 1.


-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:10 pm

This patch series implements part of the ground work for the 'notes'
feature discussed earlier in the thread "[PATCH 00/15] git-note: A
mechanism for providing free-form after-the-fact annotations on commits".

The following patches refactors the tag object by:
1. Unifying parsing and verification of tag objects (patches 1-9)
2. Do better and more thorough verification of tag objects (patches 10-13)
3. Making the "tagger" header mandatory as far as possible (patch 14)
4. Making the "tag" header optional (patch 15)
5. Introducing a new optional "keywords" header (patch 16)
6. Auxiliary changes supporting the above (patches 17-21)

This patch series replaces the earlier patch series of the same name
(plus the current bugfixes on top of that series). It's also much easier
on the eyes for those with 80 chars wide displays. Also, the selftest
suite should run successfully at any point in this patch series.

Here's the shortlog:

Johan Herland (21):
      Remove unnecessary code and comments on non-existing 8kB tag object restriction
      Return error messages when parsing fails.
      Refactoring to make verify_tag() and parse_tag_buffer() more similar
      Refactor verification of "tagger" line to be more similar to verification of "type" and "tagger" lines
      Make parse_tag_buffer_internal() handle item == NULL
      Refactor tag name verification loop to use index 'i' instead of incrementing pointer 'tag_line'
      Copy the remaining differences from verify_tag() to parse_tag_buffer_internal()
      Switch from verify_tag() to parse_and_verify_tag_buffer() for verifying tag objects in git-mktag
      Remove unneeded code from mktag.c
      Free mktag's buffer before dying
      Rewrite error messages; fix up line lengths
      Use prefixcmp() instead of memcmp() for cleaner code with less magic numbers
      Collect skipping of header field names and calculation of line lengths in one place
      Add proper parsing of "tagger" line, but only when thorough_verify is set
   ...
From: Johan Herland
Date: Friday, June 8, 2007 - 5:12 pm

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/mktag.c b/mktag.c
index 070bc96..b82e377 100644
--- a/mktag.c
+++ b/mktag.c
@@ -12,15 +12,8 @@
  * "object <sha1>\n" is 48 bytes, "type tag\n" at 9 bytes is the
  * shortest possible type-line, and "tag .\n" at 6 bytes is the
  * shortest single-character-tag line.
- *
- * We also artificially limit the size of the full object to 8kB.
- * Just because I'm a lazy bastard, and if you can't fit a signature
- * in that size, you're doing something wrong.
  */
 
-/* Some random size */
-#define MAXSIZE (8192)
-
 /*
  * We refuse to tag something we can't verify. Just because.
  */
-- 
1.5.2


-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:13 pm

This patch brings the already similar tag.c:parse_tag_buffer() and
mktag.c:verify_tag() a little bit closer to eachother.

Signed-off-by: Johan Herland <johan@herland.net>
---
 tag.c |   39 ++++++++++++++++++++++++++++++---------
 1 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/tag.c b/tag.c
index bbacd59..954ed8a 100644
--- a/tag.c
+++ b/tag.c
@@ -35,6 +35,12 @@ struct tag *lookup_tag(const unsigned char *sha1)
 
 int parse_tag_buffer(struct tag *item, void *data, unsigned long size)
 {
+#ifdef NO_C99_FORMAT
+#define PD_FMT "%d"
+#else
+#define PD_FMT "%td"
+#endif
+
 	int typelen, taglen;
 	unsigned char sha1[20];
 	const char *type_line, *tag_line, *sig_line;
@@ -45,28 +51,41 @@ int parse_tag_buffer(struct tag *item, void *data, unsigned long size)
         item->object.parsed = 1;
 
 	if (size < 64)
-		return -1;
-	if (memcmp("object ", data, 7) || get_sha1_hex((char *) data + 7, sha1))
-		return -1;
+		return error("failed preliminary size check");
 
+	/* Verify object line */
+	if (memcmp(data, "object ", 7))
+		return error("char%d: does not start with \"object \"", 0);
+
+	if (get_sha1_hex((char *) data + 7, sha1))
+		return error("char%d: could not get SHA1 hash", 7);
+
+	/* Verify type line */
 	type_line = (char *) data + 48;
-	if (memcmp("\ntype ", type_line-1, 6))
-		return -1;
+	if (memcmp(type_line - 1, "\ntype ", 6))
+		return error("char%d: could not find \"\\ntype \"", 47);
 
+	/* Verify tag-line */
 	tag_line = strchr(type_line, '\n');
-	if (!tag_line || memcmp("tag ", ++tag_line, 4))
-		return -1;
+	if (!tag_line)
+		return error("char" PD_FMT ": could not find next \"\\n\"", type_line - (char *) data);
+	tag_line++;
+	if (memcmp(tag_line, "tag ", 4) || tag_line[4] == '\n')
+		return error("char" PD_FMT ": no \"tag \" found", tag_line - (char *) data);
 
 	sig_line = strchr(tag_line, '\n');
 	if (!sig_line)
 		return -1;
 	sig_line++;
 
+	/* Get the actual type */
 	typelen = tag_line - type_line - strlen("type ...
From: Johan Herland
Date: Friday, June 8, 2007 - 5:13 pm

A fair amount of refactoring is included in this patch, none of which
should affect the actual behaviour of the code in any way.

Here are the changes done:

- Refactor out all the (char *) casting in parse_tag_buffer(). Create a
  wrapper function (parse_tag_buffer()) that casts _once_ and then calls
  parse_tag_buffer_internal() which does the real work

- Variable renaming in parse_tag_buffer_internal():
  - sig_line -> tagger_line
  - typelen  -> type_len
  - taglen   -> tag_len

- Variable renaming in verify_tag():
  - buffer  -> data
  - typelen -> type_len

- Remove unnecessary variable 'object' from verify_tag(). It has always
  the same value as 'data', so use 'data' directly instead

- Fix type of length variables (int -> unsigned long)

- Move null-termination of tag buffer out of verify_tag() and into main().

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c |   41 +++++++++++++++++++----------------------
 tag.c   |   50 +++++++++++++++++++++++++++-----------------------
 2 files changed, 46 insertions(+), 45 deletions(-)

diff --git a/mktag.c b/mktag.c
index b82e377..0bc20c8 100644
--- a/mktag.c
+++ b/mktag.c
@@ -32,52 +32,48 @@ static int verify_object(unsigned char *sha1, const char *expected_type)
 	return ret;
 }
 
+static int verify_tag(char *data, unsigned long size)
+{
 #ifdef NO_C99_FORMAT
 #define PD_FMT "%d"
 #else
 #define PD_FMT "%td"
 #endif
 
-static int verify_tag(char *buffer, unsigned long size)
-{
-	int typelen;
-	char type[20];
 	unsigned char sha1[20];
-	const char *object, *type_line, *tag_line, *tagger_line;
+	char type[20];
+	const char *type_line, *tag_line, *tagger_line;
+	unsigned long type_len;
 
 	if (size < 64)
 		return error("wanna fool me ? you obviously got the size wrong !");
 
-	buffer[size] = 0;
-
 	/* Verify object line */
-	object = buffer;
-	if (memcmp(object, "object ", 7))
+	if (memcmp(data, "object ", 7))
 		return error("char%d: does not start with \"object \"", 0);
 
-	if ...
From: Johannes Schindelin
Date: Friday, June 8, 2007 - 7:54 pm

Hi,





Quite a lot of changes seem to do this object->data. The patch would have 
been much more compact if you just had renamed buffer to object instead of 

This renaming variables has nothing to do with refactoring. In fact, I 
have a hard time to find code changes (which your subject suggests, as you 


Ah, so you terminate the buffer here. From the patch, it is relatively 
hard to see if this line is always hit _before_ the function is called 
that evidently relies on NUL termination. By moving it here, I think it is 
much more likely to overlook the fact that the function, which made sure 
that its assumption was met, needs this line. Whereas if you left it where 


Is this really necessary? Even if (which I doubt), it has nothing to do 
with refactoring.

If you _want_ to _explicitely_ do arithmetic on a char* instead of void*, 

I am really reluctant with renamings like these. IMHO they don't buy you 
much, except for possible confusion. It is evident that sig means the 
signer (and it is obvious in the case of an unsigned tag, who is meant, 

This cast (and indeed, this function, if you ask me) is unnecessary.

I reviewed only this patch out of your long series, mostly because I found 
the subject line interesting. But IMHO the patch does not what the subject 
line suggests.

Unfortunately, it's unlikely that I will have time until Monday night to 
continue with this patch series.

Ciao,
Dscho

-

From: Johan Herland
Date: Saturday, June 9, 2007 - 3:49 am

Hi. Thanks for taking the time to look at (some of) my patch. Most of your
questions below can be answered with a single answer:

The main purpose of the patch is (as the subject line says) to bring the
two functions more in line with eachother. At the time I made the patch,
I had made the observation that these function were trying to do much the
same thing, albeit in a slightly different form. This patch is therefore
about applying a series of (mostly non-functional) refactorings to make
their diff as small as possible. This involves "stupid" changes such as
renaming variables, tweaking whitespace, reordering the declaration of
variables, etc. It's all to make the functions similar to the point where
I can diff them, get a small and meaningful result, see the remaining
_real_ differences, and in the end, _merge_ them (see patches 7-9).
If this whole exercise didn't end up with merging the two functions into
one, I would _totally_ agree with you that all this refactoring is more

First, (and you'll see this in the commit message) I'm _moving_ (not
removing) the NUL termination out of verify_tag() and into main() (which I
can be sure is the only caller of verify_tag(), since verify_tag is
declared static, and there is no other call in that file). Two reasons for
doing this:

1. Make verify_tag more similar to parse_tag_buffer() (because
parse_tag_buffer() does not NUL terminate)

2. Do the NUL termination as close to the code that actually populated the
buffer with data (the read_pipe() in main())


So now you can ask: Why doesn't parse_tag_buffer() NUL terminate its
input? It _that_ safe? And I ran around checking all the callers of
parse_tag_buffer, and found that all of them use data (most of which
originates from read_sha1_file()) that's already NUL terminated.

In the end, I also put in a comment on the resulting function
(parse_and_verify_tag_buffer()), explicitly saying the given data _must_
be NUL terminated.



Side note: At first I actually thought the ...

Also update selftests to reflect that verification of "tagger" now
happens _before_ verification of type name, object sha1 and tag name.

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c          |   16 ++++++++--------
 t/t3800-mktag.sh |    3 +++
 tag.c            |    6 +++---
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/mktag.c b/mktag.c
index 0bc20c8..4dbefab 100644
--- a/mktag.c
+++ b/mktag.c
@@ -62,12 +62,18 @@ static int verify_tag(char *data, unsigned long size)
 
 	/* Verify tag-line */
 	tag_line = strchr(type_line, '\n');
-	if (!tag_line)
+	if (!tag_line++)
 		return error("char" PD_FMT ": could not find next \"\\n\"", type_line - data);
-	tag_line++;
 	if (memcmp(tag_line, "tag ", 4) || tag_line[4] == '\n')
 		return error("char" PD_FMT ": no \"tag \" found", tag_line - data);
 
+	/* Verify the tagger line */
+	tagger_line = strchr(tag_line, '\n');
+	if (!tagger_line++)
+		return error("char" PD_FMT ": could not find next \"\\n\"", tag_line - data);
+	if (memcmp(tagger_line, "tagger ", 7) || (tagger_line[7] == '\n'))
+		return error("char" PD_FMT ": could not find \"tagger\"", tagger_line - data);
+
 	/* Get the actual type */
 	type_len = tag_line - type_line - strlen("type \n");
 	if (type_len >= sizeof(type))
@@ -90,12 +96,6 @@ static int verify_tag(char *data, unsigned long size)
 		return error("char" PD_FMT ": could not verify tag name", tag_line - data);
 	}
 
-	/* Verify the tagger line */
-	tagger_line = tag_line;
-
-	if (memcmp(tagger_line, "tagger", 6) || (tagger_line[6] == '\n'))
-		return error("char" PD_FMT ": could not find \"tagger\"", tagger_line - data);
-
 	/* TODO: check for committer info + blank line? */
 	/* Also, the minimum length is probably + "tagger .", or 63+8=71 */
 
diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 7c7e433..b4edb4d 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -133,6 +133,7 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 ...
From: Johan Herland
Date: Friday, June 8, 2007 - 5:14 pm

This is in preparation for unifying verify_tag() and
parse_tag_buffer_internal().

Signed-off-by: Johan Herland <johan@herland.net>
---
 tag.c |   54 +++++++++++++++++++++++++++++-------------------------
 1 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/tag.c b/tag.c
index 19c66cd..b134967 100644
--- a/tag.c
+++ b/tag.c
@@ -46,9 +46,11 @@ static int parse_tag_buffer_internal(struct tag *item, const char *data, const u
 	const char *type_line, *tag_line, *tagger_line;
 	unsigned long type_len, tag_len;
 
-	if (item->object.parsed)
-		return 0;
-	item->object.parsed = 1;
+	if (item) {
+		if (item->object.parsed)
+			return 0;
+		item->object.parsed = 1;
+	}
 
 	if (size < 64)
 		return error("failed preliminary size check");
@@ -85,28 +87,30 @@ static int parse_tag_buffer_internal(struct tag *item, const char *data, const u
 	memcpy(type, type_line + 5, type_len);
 	type[type_len] = '\0';
 
-	tag_len = tagger_line - tag_line - strlen("tag \n");
-	item->tag = xmalloc(tag_len + 1);
-	memcpy(item->tag, tag_line + 4, tag_len);
-	item->tag[tag_len] = '\0';
-
-	if (!strcmp(type, blob_type)) {
-		item->tagged = &lookup_blob(sha1)->object;
-	} else if (!strcmp(type, tree_type)) {
-		item->tagged = &lookup_tree(sha1)->object;
-	} else if (!strcmp(type, commit_type)) {
-		item->tagged = &lookup_commit(sha1)->object;
-	} else if (!strcmp(type, tag_type)) {
-		item->tagged = &lookup_tag(sha1)->object;
-	} else {
-		error("Unknown type %s", type);
-		item->tagged = NULL;
-	}
-
-	if (item->tagged && track_object_refs) {
-		struct object_refs *refs = alloc_object_refs(1);
-		refs->ref[0] = item->tagged;
-		set_object_refs(&item->object, refs);
+	if (item) {
+		tag_len = tagger_line - tag_line - strlen("tag \n");
+		item->tag = xmalloc(tag_len + 1);
+		memcpy(item->tag, tag_line + 4, tag_len);
+		item->tag[tag_len] = '\0';
+
+		if (!strcmp(type, blob_type)) {
+			item->tagged = &lookup_blob(sha1)->object;
+		} else if (!strcmp(type, tree_type)) ...
From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 1:06 am

Hi,


The patch would have been so much simpler, and so much more descriptive of 
what you're doing, had you just said

	if (!item)
		return 0;

Ciao,
Dscho

-


Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c |   29 ++++++++++++++++-------------
 1 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/mktag.c b/mktag.c
index 4dbefab..2e70504 100644
--- a/mktag.c
+++ b/mktag.c
@@ -81,19 +81,22 @@ static int verify_tag(char *data, unsigned long size)
 	memcpy(type, type_line + 5, type_len);
 	type[type_len] = '\0';
 
-	/* Verify that the object matches */
-	if (verify_object(sha1, type))
-		return error("char%d: could not verify object %s", 7, sha1_to_hex(sha1));
-
-	/* Verify the tag-name: we don't allow control characters or spaces in it */
-	tag_line += 4;
-	for (;;) {
-		unsigned char c = *tag_line++;
-		if (c == '\n')
-			break;
-		if (c > ' ')
-			continue;
-		return error("char" PD_FMT ": could not verify tag name", tag_line - data);
+	{
+		unsigned long i;
+
+		/* Verify that the object matches */
+		if (verify_object(sha1, type))
+			return error("char%d: could not verify object %s", 7, sha1_to_hex(sha1));
+
+		/* Verify the tag-name: we don't allow control characters or spaces in it */
+		for (i = 4;;) {
+			unsigned char c = tag_line[i++];
+			if (c == '\n')
+				break;
+			if (c > ' ')
+				continue;
+			return error("char" PD_FMT ": could not verify tag name", tag_line + i - data);
+		}
 	}
 
 	/* TODO: check for committer info + blank line? */
-- 
1.5.2


-


What is this change good for?
How did you justify the type selection for your
loop index variable?

IOW,  the patch looks very useless.
-


I agree. By itself, the patch is useless.

However, if you look at the next patch, you'll see that this exact piece of 
code is moved from verify_tag() to parse_and_verify_tag_buffer(), and in 
the new context, we can't increment tag_line, since the code that follows 
depends on tag_line not being moved.

In other words this patch is here so that the next patch will be easier to 
follow. because it's _literally_ moving copying code from verify_tag() and 
pasting it in parse_and_verify_tag_buffer().

I'm sorry if this is not clear from the patches.


...Johan


-- 
Johan Herland, <johan@herland.net>
www.herland.net
-


Hi,


Then it shouldn't be there.

It seems that you do not place the cuts between patches at the 
_conceptual_ layer. Therefore, they seem intrusive and often the meaning 
evades me.

So, if I understood the purpose of this patch series correctly, namely to 
use the same verification routines both for creation as for validation of 
tags, you could have

	- moved one function into the library (the stricter one), saying 
	  "move this_function() into libgit.a to make it usable from 
	   git-bla" in the commit body,

	- used that from the other program, removing the now-unused 
	  function,

	- and then changed the behaviour to be more chatty or some such.

As it is, you have a mix of conceptually different changes in almost every 
patch, and some changes that conceptually belong into the same patch, are 
not.

Be that as may, I think it is not a good change to reuse the same function 
like you did, exactly because one version _should_ be more forgiving than 
the other.

Ciao,
Dscho

-


Hi,


Do you realize that half of your diff consists of reindenting, just 
because you introduced this ugly construct, instead of being a good boy 
and put the declarations where they belong -- at the beginning of the 

Yes, you can write this construct. That does not change the fact that it 
gives me eye cancer.

Ciao,
Dscho

-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:15 pm

Rename parse_tag_buffer_internal() to parse_and_verify_tag_buffer() since
it now does tag object verification as well.

Add a new parameter 'thorough_verify' for turning on/off the extra code
to be run when verifying tag objects (as opposed to general parsing).

verify_tag() and parse_and_verify_tag_buffer() are now functionally
equivalent, provided that parse_and_verify_tag_buffer() is called with
item == NULL and thorough_verification != 0.

Signed-off-by: Johan Herland <johan@herland.net>
---
 tag.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/tag.c b/tag.c
index b134967..3896e45 100644
--- a/tag.c
+++ b/tag.c
@@ -33,7 +33,26 @@ struct tag *lookup_tag(const unsigned char *sha1)
         return (struct tag *) obj;
 }
 
-static int parse_tag_buffer_internal(struct tag *item, const char *data, const unsigned long size)
+/*
+ * We refuse to tag something we can't verify. Just because.
+ */
+static int verify_object(unsigned char *sha1, const char *expected_type)
+{
+	int ret = -1;
+	enum object_type type;
+	unsigned long size;
+	void *buffer = read_sha1_file(sha1, &type, &size);
+
+	if (buffer) {
+		if (type == type_from_string(expected_type))
+			ret = check_sha1_signature(sha1, buffer, size, expected_type);
+		free(buffer);
+	}
+	return ret;
+}
+
+static int parse_and_verify_tag_buffer(struct tag *item,
+		const char *data, const unsigned long size, int thorough_verify)
 {
 #ifdef NO_C99_FORMAT
 #define PD_FMT "%d"
@@ -79,6 +98,10 @@ static int parse_tag_buffer_internal(struct tag *item, const char *data, const u
 	tagger_line = strchr(tag_line, '\n');
 	if (!tagger_line++)
 		return error("char" PD_FMT ": could not find next \"\\n\"", tag_line - data);
+	if (thorough_verify) {
+		if (memcmp(tagger_line, "tagger ", 7) || (tagger_line[7] == '\n'))
+			return error("char" PD_FMT ": could not find \"tagger\"", tagger_line - data);
+	}
 
 	/* Get the actual type */
 	type_len = ...
From: Alex Riesen
Date: Saturday, June 9, 2007 - 2:31 pm

This looks very familiar. Haven't you just made a very useless patch
which had this very same code? How about putting it in its own
function and just call it from these two places? And what problem
do you have with pointers?!
-

From: Johan Herland
Date: Saturday, June 9, 2007 - 2:39 pm

I just answered your comment on the previous patch, and that answer should
apply here as well.

I'm probably splitting this up into too small pieces, since I keep getting
comments that fail to see the overall picture of what I'm trying to do,
namely taking two similar pieces of code and slowly unifying them to the
point where I can replace one of them by a call to the other (see the two
next patches).

Hope this helps,


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 1:22 am

Hi,


Maybe I said that your patch was too large. But then, I said something 
much more important: hard-to-review.

These small patches, split in a manner making it even more difficult to 
understand what you want to accomplish, do not help.

Yes, you should make small patches. Even small patch series. But in such a 
fashion that a reviewer can see that it is a good patch[*1*]. Just lean 
back, look at your patches, and ask yourself how you would have reacted if 
you had reviewed them.

Ciao,
Dscho

*1* A good patch follows the immortal words of Saint Exupery: A designer 
knows he has achieved perfection not when there is nothing left to add, 
but when there is nothing left to take away.
-


This involves exposing parse_and_verify_tag_buffer() in the tag API
(tag.h).

Also synchronize selftest with change in error message.

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c          |    5 ++---
 t/t3800-mktag.sh |    2 +-
 tag.c            |    2 +-
 tag.h            |    2 ++
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/mktag.c b/mktag.c
index 2e70504..5780f33 100644
--- a/mktag.c
+++ b/mktag.c
@@ -125,9 +125,8 @@ int main(int argc, char **argv)
 	}
 	buffer[size] = 0;
 
-	/* Verify it for some basic sanity: it needs to start with
-	   "object <sha1>\ntype\ntagger " */
-	if (verify_tag(buffer, size) < 0)
+	/* Verify tag object data */
+	if (parse_and_verify_tag_buffer(0, buffer, size, 1))
 		die("invalid tag signature file");
 
 	if (write_sha1_file(buffer, size, tag_type, result_sha1) < 0)
diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index b4edb4d..0e87d2a 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -34,7 +34,7 @@ too short for a tag
 EOF
 
 cat >expect.pat <<EOF
-^error: .*size wrong.*$
+^error: .*size.*$
 EOF
 
 check_verify_failure 'Tag object length check'
diff --git a/tag.c b/tag.c
index 3896e45..e12b9fc 100644
--- a/tag.c
+++ b/tag.c
@@ -51,7 +51,7 @@ static int verify_object(unsigned char *sha1, const char *expected_type)
 	return ret;
 }
 
-static int parse_and_verify_tag_buffer(struct tag *item,
+int parse_and_verify_tag_buffer(struct tag *item,
 		const char *data, const unsigned long size, int thorough_verify)
 {
 #ifdef NO_C99_FORMAT
diff --git a/tag.h b/tag.h
index 7a0cb00..f341b7f 100644
--- a/tag.h
+++ b/tag.h
@@ -13,6 +13,8 @@ struct tag {
 };
 
 extern struct tag *lookup_tag(const unsigned char *sha1);
+extern int parse_and_verify_tag_buffer(struct tag *item,
+	const char *data, const unsigned long size, int thorough_verify);
 extern int parse_tag_buffer(struct tag *item, void *data, unsigned long size);
 extern int parse_tag(struct tag *item);
 extern struct object ...
From: Johan Herland
Date: Friday, June 8, 2007 - 5:16 pm

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c |   94 ---------------------------------------------------------------
 1 files changed, 0 insertions(+), 94 deletions(-)

diff --git a/mktag.c b/mktag.c
index 5780f33..4f226ab 100644
--- a/mktag.c
+++ b/mktag.c
@@ -14,100 +14,6 @@
  * shortest single-character-tag line.
  */
 
-/*
- * We refuse to tag something we can't verify. Just because.
- */
-static int verify_object(unsigned char *sha1, const char *expected_type)
-{
-	int ret = -1;
-	enum object_type type;
-	unsigned long size;
-	void *buffer = read_sha1_file(sha1, &type, &size);
-
-	if (buffer) {
-		if (type == type_from_string(expected_type))
-			ret = check_sha1_signature(sha1, buffer, size, expected_type);
-		free(buffer);
-	}
-	return ret;
-}
-
-static int verify_tag(char *data, unsigned long size)
-{
-#ifdef NO_C99_FORMAT
-#define PD_FMT "%d"
-#else
-#define PD_FMT "%td"
-#endif
-
-	unsigned char sha1[20];
-	char type[20];
-	const char *type_line, *tag_line, *tagger_line;
-	unsigned long type_len;
-
-	if (size < 64)
-		return error("wanna fool me ? you obviously got the size wrong !");
-
-	/* Verify object line */
-	if (memcmp(data, "object ", 7))
-		return error("char%d: does not start with \"object \"", 0);
-
-	if (get_sha1_hex(data + 7, sha1))
-		return error("char%d: could not get SHA1 hash", 7);
-
-	/* Verify type line */
-	type_line = data + 48;
-	if (memcmp(type_line - 1, "\ntype ", 6))
-		return error("char%d: could not find \"\\ntype \"", 47);
-
-	/* Verify tag-line */
-	tag_line = strchr(type_line, '\n');
-	if (!tag_line++)
-		return error("char" PD_FMT ": could not find next \"\\n\"", type_line - data);
-	if (memcmp(tag_line, "tag ", 4) || tag_line[4] == '\n')
-		return error("char" PD_FMT ": no \"tag \" found", tag_line - data);
-
-	/* Verify the tagger line */
-	tagger_line = strchr(tag_line, '\n');
-	if (!tagger_line++)
-		return error("char" PD_FMT ": could not find next \"\\n\"", tag_line - ...
From: Alex Riesen
Date: Saturday, June 9, 2007 - 2:39 pm

So, you do some useless changes just to remove the
function completely afterwards?
-

From: Johan Herland
Date: Saturday, June 9, 2007 - 2:42 pm

Yes. Basically so that people can follow my process. If you don't want the
intermediary/useless states, just look at my first patch series that was
replaced by this series because it was too bulky and disruptive to follow.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:16 pm

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mktag.c b/mktag.c
index 4f226ab..31eadd8 100644
--- a/mktag.c
+++ b/mktag.c
@@ -21,7 +21,7 @@ int main(int argc, char **argv)
 	unsigned char result_sha1[20];
 
 	if (argc != 1)
-		usage("git-mktag < signaturefile");
+		usage("git-mktag < tag_data_file");
 
 	setup_git_directory();
 
@@ -32,14 +32,17 @@ int main(int argc, char **argv)
 	buffer[size] = 0;
 
 	/* Verify tag object data */
-	if (parse_and_verify_tag_buffer(0, buffer, size, 1))
-		die("invalid tag signature file");
+	if (parse_and_verify_tag_buffer(0, buffer, size, 1)) {
+		free(buffer);
+		die("invalid tag data file");
+	}
 
-	if (write_sha1_file(buffer, size, tag_type, result_sha1) < 0)
+	if (write_sha1_file(buffer, size, tag_type, result_sha1) < 0) {
+		free(buffer);
 		die("unable to write tag file");
+	}
 
 	free(buffer);
-
 	printf("%s\n", sha1_to_hex(result_sha1));
 	return 0;
 }
-- 
1.5.2


-

From: Alex Riesen
Date: Saturday, June 9, 2007 - 2:37 pm

This, and the similar one below are useless. You're destroying the
process, what do you free that buffer for? Either handle the error
case or do not needlessly complicate your change, which really
also absolutely unneeded.
-

From: Johan Herland
Date: Saturday, June 9, 2007 - 2:46 pm

Well, I was taught to treat my memory with care.

Right now it doesn't make any difference in practice (except that
Valgrind might be a bit happier with it), but in the future -- with
the libifaction effort and whatnot -- you never know what might happen
to this piece of code, and I'd like to stay on the safe side.

Feel free to drop this patch from the series if I'm the only one thinking
like this.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Alex Riesen
Date: Saturday, June 9, 2007 - 3:00 pm

How do you treat your performance?
Besides, was that systems with common address space

So that people have to check your free as well (they will have to,
they come looking for die-calls). You just made more work for them.
-

From: Johan Herland
Date: Saturday, June 9, 2007 - 3:05 pm

Hopefully with care, as well. However, I tend to look at performance _after_

Nope. Never programmed on either. I thought care with memory was generally
considered a good principle. If I'm wrong, please point me at the relevant

Ok. Drop it. This isn't particularily important to me. I just try to
follow good principles when I can.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 1:38 am

Hi,

On Sat, 9 Jun 2007, Johan Herland wrote:



relate with your commit message?

You might understand that people _could_ get the impression that the 
patches were not very carefully crafted. Or even worse, the impression, 
that it was tried to slip some changes by, with a totally unrelated 
"official" purpose. (Again, reminds me of politics: it's like 
slipping an intrusive privacy law into an agriculture related law at the 
11th hour, just before Christmas, when people do not really pay 
attention.)

Ciao,
Dscho

-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:17 pm

Also update selftests to reflect new error messages.

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3800-mktag.sh |   22 +++++++++++-----------
 tag.c            |   44 +++++++++++++++++++++++++++++---------------
 2 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 0e87d2a..3bce5e0 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -49,7 +49,7 @@ tag mytag
 EOF
 
 cat >expect.pat <<EOF
-^error: char0: .*"object "$
+^error: .*char 0.*"object ".*$
 EOF
 
 check_verify_failure '"object" line label check'
@@ -64,7 +64,7 @@ tag mytag
 EOF
 
 cat >expect.pat <<EOF
-^error: char7: .*SHA1 hash$
+^error: .*char 7.*SHA1 hash.*$
 EOF
 
 check_verify_failure '"object" line SHA1 check'
@@ -79,7 +79,7 @@ tag mytag
 EOF
 
 cat >expect.pat <<EOF
-^error: char47: .*"[\]ntype "$
+^error: .*char 47.*"[\]ntype ".*$
 EOF
 
 check_verify_failure '"type" line label check'
@@ -91,7 +91,7 @@ echo "object 779e9b33986b1c2670fff52c5067603117b3e895" >tag.sig
 printf "type tagsssssssssssssssssssssssssssssss" >>tag.sig
 
 cat >expect.pat <<EOF
-^error: char48: .*"[\]n"$
+^error: .*char 48.*"[\]n".*$
 EOF
 
 check_verify_failure '"type" line eol check'
@@ -106,7 +106,7 @@ xxx mytag
 EOF
 
 cat >expect.pat <<EOF
-^error: char57: no "tag " found$
+^error: .*char 57.*"tag ".*$
 EOF
 
 check_verify_failure '"tag" line label check #1'
@@ -121,7 +121,7 @@ tag
 EOF
 
 cat >expect.pat <<EOF
-^error: char87: no "tag " found$
+^error: .*char 87.*"tag ".*$
 EOF
 
 check_verify_failure '"tag" line label check #2'
@@ -137,7 +137,7 @@ tagger a
 EOF
 
 cat >expect.pat <<EOF
-^error: char53: type too long$
+^error: .*char 53.*Type too long.*$
 EOF
 
 check_verify_failure '"type" line type-name length check'
@@ -153,7 +153,7 @@ tagger a
 EOF
 
 cat >expect.pat <<EOF
-^error: char7: could not verify object.*$
+^error: .*char 7.*Could not verify tagged object.*$
 EOF
 
 check_verify_failure 'verify object ...
From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 1:38 am

Hi,


Had you split the patch conceptually, the changes to the tests would have 
been included where appropriate.

I consider a patch _breaking_ a test case, with a follow up patch to the 
test case, a _bug_.

And I consider a change in error messages not good, _unless_ the changes 
have a real value in practice. Which I maintain they do not, in this 
patch-series' case.

Ciao,
Dscho

-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:17 pm

Signed-off-by: Johan Herland <johan@herland.net>
---
 tag.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/tag.c b/tag.c
index c7c75b3..ac76ec0 100644
--- a/tag.c
+++ b/tag.c
@@ -51,6 +51,13 @@ static int verify_object(unsigned char *sha1, const char *expected_type)
 	return ret;
 }
 
+/*
+ * Perform parsing and verification of tag object data.
+ *
+ * The 'item' parameter may be set to NULL if only verification is desired.
+ *
+ * The given data _must_ be null-terminated.
+ */
 int parse_and_verify_tag_buffer(struct tag *item,
 		const char *data, const unsigned long size, int thorough_verify)
 {
@@ -75,7 +82,7 @@ int parse_and_verify_tag_buffer(struct tag *item,
 		return error("Tag object failed preliminary size check");
 
 	/* Verify object line */
-	if (memcmp(data, "object ", 7))
+	if (prefixcmp(data, "object "))
 		return error("Tag object (@ char 0): "
 			"Does not start with \"object \"");
 
@@ -84,7 +91,7 @@ int parse_and_verify_tag_buffer(struct tag *item,
 
 	/* Verify type line */
 	type_line = data + 48;
-	if (memcmp(type_line - 1, "\ntype ", 6))
+	if (prefixcmp(type_line - 1, "\ntype "))
 		return error("Tag object (@ char 47): "
 			"Could not find \"\\ntype \"");
 
@@ -94,7 +101,7 @@ int parse_and_verify_tag_buffer(struct tag *item,
 		return error("Tag object (@ char " PD_FMT "): "
 			"Could not find \"\\n\" after \"type\"",
 			type_line - data);
-	if (memcmp(tag_line, "tag ", 4) || tag_line[4] == '\n')
+	if (prefixcmp(tag_line, "tag ") || tag_line[4] == '\n')
 		return error("Tag object (@ char " PD_FMT "): "
 			"Could not find \"tag \"", tag_line - data);
 
@@ -105,7 +112,7 @@ int parse_and_verify_tag_buffer(struct tag *item,
 			"Could not find \"\\n\" after \"tag\"",
 			tag_line - data);
 	if (thorough_verify) {
-		if (memcmp(tagger_line, "tagger ", 7) || (tagger_line[7] == '\n'))
+		if (prefixcmp(tagger_line, "tagger ") || (tagger_line[7] == '\n'))
 			return error("Tag object (@ char " ...
From: Alex Riesen
Date: Saturday, June 9, 2007 - 2:42 pm

This hunk really belongs into commit which introduced the function
parse_and_verify_tag_buffer.
-

From: Johan Herland
Date: Saturday, June 9, 2007 - 2:47 pm

Yes. I'm sorry it slipped out of that patch and into this one.


...Johan


-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 1:41 am

Hi,


FWIW I think that _these_ changes are actually somewhat worth it. And they 
should have come _instead_ of moving the order of the memcmp() around 
(you know which patch that was).

Ciao,
Dscho

-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:18 pm

For each of the parsed lines we at some point skip past its initial
identifier ("type ", "tag ", etc.). We also at some point calculate the
length of the remaining line. This patch moves these calculations into
one place. This provides _one_ place for all header lines where their
respective pointers start pointing at the header value (instead of the
start of the line), and their lengths are calculated.

Signed-off-by: Johan Herland <johan@herland.net>
---
 tag.c |   21 ++++++++++++++-------
 1 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/tag.c b/tag.c
index ac76ec0..9a6924f 100644
--- a/tag.c
+++ b/tag.c
@@ -118,12 +118,20 @@ int parse_and_verify_tag_buffer(struct tag *item,
 				tagger_line - data);
 	}
 
+	/*
+	 * Advance header field pointers past their initial identifier.
+	 * Calculate lengths of header fields.
+	 */
+	type_line += strlen("type ");
+	type_len   = tag_line - type_line - 1;
+	tag_line  += strlen("tag ");
+	tag_len    = tagger_line - tag_line - 1;
+
 	/* Get the actual type */
-	type_len = tag_line - type_line - strlen("type \n");
 	if (type_len >= sizeof(type))
 		return error("Tag object (@ char " PD_FMT "): "
-			"Type too long", type_line + 5 - data);
-	memcpy(type, type_line + 5, type_len);
+			"Type too long", type_line - data);
+	memcpy(type, type_line, type_len);
 	type[type_len] = '\0';
 
 	if (thorough_verify) {
@@ -136,7 +144,7 @@ int parse_and_verify_tag_buffer(struct tag *item,
 				sha1_to_hex(sha1));
 
 		/* Verify tag name: disallow control characters or spaces */
-		for (i = 4;;) {
+		for (i = 0;;) {
 			unsigned char c = tag_line[i++];
 			if (c == '\n')
 				break;
@@ -154,9 +162,8 @@ int parse_and_verify_tag_buffer(struct tag *item,
 	}
 
 	if (item) {
-		tag_len = tagger_line - tag_line - strlen("tag \n");
 		item->tag = xmalloc(tag_len + 1);
-		memcpy(item->tag, tag_line + 4, tag_len);
+		memcpy(item->tag, tag_line, tag_len);
 		item->tag[tag_len] = '\0';
 
 		if (!strcmp(type, blob_type)) ...
From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 1:45 am

Hi,


This change does not clarify anything. It is exactly as confusing as 

I know you introduced this in another patch. This is an ugly construct. If 
you want people to review your patches, you might as well put in the 
effort to make the reviewing more pleasant.

Ciao,
Dscho

-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:18 pm

Be explicit about the fact that the "tagger" line is now considered a
mandatory part of the tag object. There are however old tags (from before
July 2005) that don't have a "tagger" line. We therefore consider the
"tagger" line _optional_ when parsing tags without thorough_verify set.

In practice this means that verification of a missing "tagger" line will
only fail when adding or fsck-ing tags.

Signed-off-by: Johan Herland <johan@herland.net>
---
 tag.c |   48 ++++++++++++++++++++++++++++++++++--------------
 1 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/tag.c b/tag.c
index 9a6924f..c373c86 100644
--- a/tag.c
+++ b/tag.c
@@ -69,8 +69,9 @@ int parse_and_verify_tag_buffer(struct tag *item,
 
 	unsigned char sha1[20];
 	char type[20];
-	const char *type_line, *tag_line, *tagger_line;
-	unsigned long type_len, tag_len;
+	const char   *type_line, *tag_line, *tagger_line;
+	unsigned long type_len,   tag_len,   tagger_len;
+	const char *header_end;
 
 	if (item) {
 		if (item->object.parsed)
@@ -81,7 +82,7 @@ int parse_and_verify_tag_buffer(struct tag *item,
 	if (size < 64)
 		return error("Tag object failed preliminary size check");
 
-	/* Verify object line */
+	/* Verify mandatory object line */
 	if (prefixcmp(data, "object "))
 		return error("Tag object (@ char 0): "
 			"Does not start with \"object \"");
@@ -89,13 +90,13 @@ int parse_and_verify_tag_buffer(struct tag *item,
 	if (get_sha1_hex(data + 7, sha1))
 		return error("Tag object (@ char 7): Could not get SHA1 hash");
 
-	/* Verify type line */
+	/* Verify mandatory type line */
 	type_line = data + 48;
 	if (prefixcmp(type_line - 1, "\ntype "))
 		return error("Tag object (@ char 47): "
 			"Could not find \"\\ntype \"");
 
-	/* Verify tag-line */
+	/* Verify mandatory tag line */
 	tag_line = strchr(type_line, '\n');
 	if (!tag_line++)
 		return error("Tag object (@ char " PD_FMT "): "
@@ -105,27 +106,46 @@ int parse_and_verify_tag_buffer(struct tag *item,
 		return ...
From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 1:52 am

Hi,


No. The "before July 2005" part is _not_ the reason that we consider this 
line optional.

The fact that it is bad to fail on a fetch, just because you happen to 
have an invalid tag in your repository, is a good reason not to.

The fact that it is bad to fail on a git branch, just because you happen 
to have an invalid tag in your repository, is a good reason not to.

The fact that it is bad to fail on an fsck, just because you happen to 
have an invalid tag in your repository, is a good reason not to.

And yes, if I remember correctly, your original patch did exactly that.

The paradigm to follow is: fail gracefully. I could have an invalid 
_commit_ in my repository, and would still want _every_ Git operation to 
succeed, _as long_ as it does not touch that bad object.

And I damned well want git-fsck to not crash, just because some 
assumptions are made.

Since this is a fundamental critique on your patch series, I will do the 
detailed review on _this_ patch in another mail.

Ciao,
Dscho

-

From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 1:58 am

Hi,



Hmph. I think everybody reading C code understands that this is mandatory. 
I even think that the comment is useless. It is sort of a 

I maintain that even with thorough checking, it is _wrong_ to error on a 
missing tagger. Since we have to deal with tagger-less tags _anyway_, and 
since it is not like you could really do something about it (the tag is 

Does that really make sense? You effectively treat a missing field as if 


Hmpf.

Ciao,
Dscho

-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:19 pm

The tag line is now optional. If not given in the tag object data, it
defaults to the empty string ("") in the parsed tag object.

The patch also adds a change to git-show; when asked to display a tag
object with no name (missing "tag" header), we will show the tag's sha1
instead of an empty string.

Finally the patch includes some tweaks to the selftests to make them work
with optional tag names.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-log.c    |    2 +-
 t/t3800-mktag.sh |    6 +++---
 tag.c            |   51 +++++++++++++++++++++++++++++----------------------
 tag.h            |    2 +-
 4 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 212cdfc..8a238c7 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -181,7 +181,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			printf("%stag %s%s\n\n",
 					diff_get_color(rev.diffopt.color_diff,
 						DIFF_COMMIT),
-					t->tag,
+					*(t->tag) ? t->tag : name,
 					diff_get_color(rev.diffopt.color_diff,
 						DIFF_RESET));
 			ret = show_object(o->sha1, 1);
diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 3bce5e0..3381239 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -106,7 +106,7 @@ xxx mytag
 EOF
 
 cat >expect.pat <<EOF
-^error: .*char 57.*"tag ".*$
+^error: .*char 57.*$
 EOF
 
 check_verify_failure '"tag" line label check #1'
@@ -121,7 +121,7 @@ tag
 EOF
 
 cat >expect.pat <<EOF
-^error: .*char 87.*"tag ".*$
+^error: .*char 87.*$
 EOF
 
 check_verify_failure '"tag" line label check #2'
@@ -169,7 +169,7 @@ tagger a
 EOF
 
 cat >expect.pat <<EOF
-^error: .*char 67.*Could not verify tag name.*$
+^error: .*char 66.*Could not verify tag name.*$
 EOF
 
 check_verify_failure 'verify tag-name check'
diff --git a/tag.c b/tag.c
index c373c86..fb678d7 100644
--- a/tag.c
+++ b/tag.c
@@ -96,15 +96,21 @@ int parse_and_verify_tag_buffer(struct tag *item,
 		return error("Tag object (@ char 47): "
 ...
From: Johannes Schindelin
Date: Sunday, June 10, 2007 - 2:07 am

Hi,


If you don't actually _test_ missing tag names, you might just as well 

This is misleading. What you wanted to say is t->tag[0] == '\0', or 
*(t->tag) == '\0'.

As you wrote it, you have to think a couple of times why it is okay to 
dereference t->tag, to check if you say t->tag.

Besides, it breaks if you _do_ have an empty tag. In that case, I _want_ 
to see that it is actually empty, and _not_ the SHA1 substituted for it.

Ciao,
Dscho

-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:19 pm

This patch introduces a new optional header line to the tag object, called
"keywords". The "keywords" line may contain a comma-separated list of
custom keywords associated with the tag object. There are two "special"
keywords, however: "tag" and "note": When the "keywords" header is
missing, its default value is set to "tag" if a "tag" header is
present; else the default "keywords" value is set to "note". The
"keywords" header is meant to be used by porcelains for classifying
different types of tag objects. This classification may then be used to
filter tag objects in the presentation layer (e.g. by implementing
extra filter options to --decorate, etc.)

Signed-off-by: Johan Herland <johan@herland.net>
---
 tag.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 tag.h |    1 +
 2 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/tag.c b/tag.c
index fb678d7..1caec19 100644
--- a/tag.c
+++ b/tag.c
@@ -69,8 +69,8 @@ int parse_and_verify_tag_buffer(struct tag *item,
 
 	unsigned char sha1[20];
 	char type[20];
-	const char   *type_line, *tag_line, *tagger_line;
-	unsigned long type_len,   tag_len,   tagger_len;
+	const char   *type_line, *tag_line, *keywords_line, *tagger_line;
+	unsigned long type_len,   tag_len,   keywords_len,   tagger_len;
 	const char *header_end;
 
 	if (item) {
@@ -103,15 +103,26 @@ int parse_and_verify_tag_buffer(struct tag *item,
 			"Could not find \"\\n\" after \"type\"",
 			type_line - data);
 	if (prefixcmp(tag_line, "tag ")) /* no tag name given */
-		tagger_line = tag_line;
+		keywords_line = tag_line;
 	else {                           /* tag name given */
-		tagger_line = strchr(tag_line, '\n');
-		if (!tagger_line++)
+		keywords_line = strchr(tag_line, '\n');
+		if (!keywords_line++)
 			return error("Tag object (@ char " PD_FMT "): "
 				"Could not find \"\\n\" after \"tag\"",
 				tag_line - data);
 	}
 
+	/* Verify optional keywords line */
+	if (prefixcmp(keywords_line, "keywords ")) /* no ...
From: Alex Riesen
Date: Saturday, June 9, 2007 - 2:52 pm

Who frees the keywords and what's wrong with strndup?
-

From: Johan Herland
Date: Saturday, June 9, 2007 - 3:00 pm

Hmm. Same as for the "tag" line, and the rest of the tag object, I guess.
I agree this should probably be specified. What are the rules for other


The code that should free the tag name should also free the keywords.
No code appear to do this at the moment (regardless of whether you use my patch
series or not), which is a shame. We should make sure all objects are properly
deallocated. This is currently a general problem in git, isn't it?


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-


Using xstrndup() yields more compact and readable code than using
xmalloc(), memcpy() and manual NUL termination.
Thanks to Alex Riesen <raa.lkml@gmail.com> for suggesting this.

Also fixes a buglet where item->keywords would always be set to "tag",
even if item->tag was empty.

Signed-off-by: Johan Herland <johan@herland.net>
---


Nothing. 


...Johan

 tag.c |   27 ++++++++++-----------------
 1 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/tag.c b/tag.c
index c3a2855..2307ec9 100644
--- a/tag.c
+++ b/tag.c
@@ -219,26 +219,19 @@ int parse_and_verify_tag_buffer(struct tag *item,
 	}
 
 	if (item) { /* Store parsed information into item */
-		if (tag_len) { /* optional tag name was given */
-			item->tag = xmalloc(tag_len + 1);
-			memcpy(item->tag, tag_line, tag_len);
-			item->tag[tag_len] = '\0';
-		}
-		else { /* optional tag name not given */
-			item->tag = xmalloc(1);
-			item->tag[0] = '\0';
-		}
+		if (tag_len) /* optional tag name was given */
+			item->tag = xstrndup(tag_line, tag_len);
+		else /* optional tag name not given */
+			item->tag = xstrndup("", 0);
 
-		if (keywords_len) { /* optional keywords string was given */
-			item->keywords = xmalloc(keywords_len + 1);
-			memcpy(item->keywords, keywords_line, keywords_len);
-			item->keywords[keywords_len] = '\0';
-		}
+		if (keywords_len) /* optional keywords string was given */
+			item->keywords = xstrndup(keywords_line, keywords_len);
 		else { /* optional keywords string not given. Set default */
 			/* if tag name is set, use "tag"; else use "note" */
-			const char *default_kw = item->tag ? "tag" : "note";
-			item->keywords = xmalloc(strlen(default_kw) + 1);
-			memcpy(item->keywords, default_kw, strlen(default_kw) + 1);
+			if (*(item->tag))
+				item->keywords = xstrndup("tag", 3);
+			else
+				item->keywords = xstrndup("note", 4);
 		}
 
 		if (!strcmp(type, blob_type)) {
-- 
1.5.2.1.144.gabc40

-

From: Junio C Hamano
Date: Saturday, June 9, 2007 - 5:05 pm

It is DEL, but as the code uses uchar, it probably also error on
0x80 or higher, if the intent is "printable ASCII".


-

From: Johan Herland
Date: Saturday, June 9, 2007 - 5:35 pm

Signed-off-by: Johan Herland <johan@herland.net>
---


Is this better?


...Johan

 tag.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tag.c b/tag.c
index 2307ec9..2b92465 100644
--- a/tag.c
+++ b/tag.c
@@ -183,8 +183,8 @@ int parse_and_verify_tag_buffer(struct tag *item,
 		/* Verify tag name: disallow control characters or spaces */
 		if (tag_len) { /* tag name was given */
 			for (i = 0; i < tag_len; ++i) {
-				unsigned char c = tag_line[i];
-				if (c > ' ' && c != 0x7f)
+				char c = tag_line[i];
+				if (c > ' ' && c < 0x7f)
 					continue;
 				return FAIL("Tag object (@ char " PD_FMT "): "
 					"Could not verify tag name",
@@ -198,13 +198,13 @@ int parse_and_verify_tag_buffer(struct tag *item,
 		 */
 		if (keywords_len) { /* keywords line was given */
 			for (i = 0; i < keywords_len; ++i) {
-				unsigned char c = keywords_line[i];
+				char c = keywords_line[i];
 				if (c == ',' && keywords_line[i + 1] == ',')
 					/* consecutive commas */
 					return FAIL("Tag object (@ char "
 						PD_FMT "): Found empty keyword",
 						keywords_line + i - data);
-				if (c > ' ' && c != 0x7f)
+				if (c > ' ' && c < 0x7f)
 					continue;
 				return FAIL("Tag object (@ char " PD_FMT "): "
 					"Could not verify keywords",
-- 
1.5.2.1.144.gabc40

	
-

From: Junio C Hamano
Date: Saturday, June 9, 2007 - 6:33 pm

I said "*if* the intent is to limit to printable ASCII".

It is still unclear what use cases the "keywords" need to
support (e.g. do we want to have "pick tags that have keyword
that matches this pattern"?  if so, what kind of pattern
language do we want to use?  Glob?  Regexp?  Would it be more
convenient if the keywords are treated case insensitively?  Is
there a useful use case if you are allowed to use people's names
as keywords?  Is it reasonable to assume that the keywords can
be split with a comma?  Do we want to allow a comma as part of
keywords?  If so, how would we quote a comma that is inside a
keyword?  etc.).  It depends on the answers to these questions
if "printable ASCII" is a good set.

As a general rule, if you make the initially allowed set of
values too wide, it gets _extremely_ hard to tighten it later.
So if we are to stick to printable ASCII (iow, "no people's
names or names of location as keywords"), I would even suggest
to limit the valid set further, say "^[-A-Za-z0-9_+.]+$", at
least initially.


-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:20 pm

Also update minimum tag object length to the new minimum length after refactoring.

Signed-off-by: Johan Herland <johan@herland.net>
---
 mktag.c |   30 ++++++++++++++++++++++--------
 tag.c   |    2 +-
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/mktag.c b/mktag.c
index 31eadd8..af0cfa6 100644
--- a/mktag.c
+++ b/mktag.c
@@ -2,16 +2,30 @@
 #include "tag.h"
 
 /*
- * A signature file has a very simple fixed format: four lines
- * of "object <sha1>" + "type <typename>" + "tag <tagname>" +
+ * Tag object data has the following format: two mandatory lines of
+ * "object <sha1>" + "type <typename>", plus two optional lines of
+ * "tag <tagname>" + "keywords <keywords>", plus a mandatory line of
  * "tagger <committer>", followed by a blank line, a free-form tag
- * message and a signature block that git itself doesn't care about,
- * but that can be verified with gpg or similar.
+ * message and an optional signature block that git itself doesn't
+ * care about, but that can be verified with gpg or similar.
  *
- * The first three lines are guaranteed to be at least 63 bytes:
- * "object <sha1>\n" is 48 bytes, "type tag\n" at 9 bytes is the
- * shortest possible type-line, and "tag .\n" at 6 bytes is the
- * shortest single-character-tag line.
+ * <sha1> represents the object pointed to by this tag, <typename> is
+ * the type of the object pointed to ("tag", "blob", "tree" or "commit"),
+ * <tagname> is the name of this tag object (and must correspond to the
+ * name of the corresponding ref (if any) in '.git/refs/'). <keywords> is
+ * a comma-separated list of keywords associated with this tag object, and
+ * <committer> holds the "name <email>" of the tag creator and timestamp
+ * of when the tag object was created (analogous to "committer" in commit
+ * objects).
+ *
+ * The first two lines are guaranteed to be at least 57 bytes:
+ * "object <sha1>\n" is 48 bytes, and "type tag\n" at 9 bytes is
+ * the shortest possible "type" line. The tagger line ...
From: Johan Herland
Date: Friday, June 8, 2007 - 5:20 pm

Teach git-fsck to do the same kind of verification on tag objects that is
already done by git-mktag.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-fsck.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index 944a496..7b4c36b 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -359,11 +359,26 @@ static int fsck_commit(struct commit *commit)
 static int fsck_tag(struct tag *tag)
 {
 	struct object *tagged = tag->tagged;
+	enum object_type type;
+	unsigned long size;
+	char *data = (char *) read_sha1_file(tag->object.sha1, &type, &size);
 
 	if (verbose)
 		fprintf(stderr, "Checking tag %s\n",
 			sha1_to_hex(tag->object.sha1));
 
+	if (!data)
+		return objerror(&tag->object, "could not read tag");
+	if (type != OBJ_TAG) {
+		free(data);
+		return objerror(&tag->object, "not a tag (internal error)");
+	}
+	if (parse_and_verify_tag_buffer(0, data, size, 1)) { /* thoroughly verify tag object */
+		free(data);
+		return objerror(&tag->object, "failed thorough tag object verification");
+	}
+	free(data);
+
 	if (!tagged) {
 		return objerror(&tag->object, "could not load tagged object");
 	}
-- 
1.5.2


-

From: Johan Herland
Date: Friday, June 8, 2007 - 5:20 pm

The new structure of tag objects is documented.

Also some much-needed cleanup is done. E.g. remove the paragraph on the
8kB limit, since this limit was removed ages ago.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-mktag.txt |   41 ++++++++++++++++++++++++++++++-----------
 1 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt
index 0ac3be1..411105d 100644
--- a/Documentation/git-mktag.txt
+++ b/Documentation/git-mktag.txt
@@ -8,38 +8,57 @@ git-mktag - Creates a tag object
 
 SYNOPSIS
 --------
-'git-mktag' < signature_file
+[verse]
+'git-mktag' < tag_data_file
+
 
 DESCRIPTION
 -----------
-Reads a tag contents on standard input and creates a tag object
+Reads tag object data on standard input and creates a tag object
 that can also be used to sign other objects.
 
 The output is the new tag's <object> identifier.
 
-Tag Format
+
+DISCUSSION
 ----------
-A tag signature file has a very simple fixed format: three lines of
+Tag object data has the following format
 
+[verse]
   object <sha1>
   type <typename>
-  tag <tagname>
+  tag <tagname>               (optional)
+  keywords <keywords>         (optional)
+  tagger <committer>
+
+followed by a blank line and a free-form message and an optional signature
+that git itself doesn't care about, but that may be verified with gpg or
+similar.
 
-followed by some 'optional' free-form signature that git itself
-doesn't care about, but that can be verified with gpg or similar.
+In the above listing, `<sha1>` represents the object pointed to by this tag,
+`<typename>` is the type of the object pointed to ("tag", "blob", "tree" or
+"commit"), `<tagname>` is the name of this tag object (and must correspond
+to the name of the corresponding ref (if any) in `.git/refs/`). `<keywords>`
+is a comma-separated list of keywords associated with this tag object, and
+`<committer>` holds the "`name <email>`" of the tag creator ...
From: Johan Herland
Date: Friday, June 8, 2007 - 5:21 pm

Some more tests are added to test the new "keywords" header, and to test
the more thorough verification routine.

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3800-mktag.sh |  212 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 200 insertions(+), 12 deletions(-)

diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index 3381239..b3b3121 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -46,6 +46,8 @@ cat >tag.sig <<EOF
 xxxxxx 139e9b33986b1c2670fff52c5067603117b3e895
 type tag
 tag mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -61,6 +63,8 @@ cat >tag.sig <<EOF
 object zz9e9b33986b1c2670fff52c5067603117b3e895
 type tag
 tag mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -76,6 +80,8 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 xxxx tag
 tag mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -103,6 +109,8 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type tag
 xxx mytag
+tagger foo
+
 EOF
 
 cat >expect.pat <<EOF
@@ -118,6 +126,9 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type taggggggggggggggggggggggggggggggg
 tag
+keywords foo
+tagger bar@baz.com
+
 EOF
 
 cat >expect.pat <<EOF
@@ -127,13 +138,15 @@ EOF
 check_verify_failure '"tag" line label check #2'
 
 ############################################################
-#  8. type line type-name length check
+#  8. type line type name length check
 
 cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type taggggggggggggggggggggggggggggggg
 tag mytag
-tagger a
+keywords foo
+tagger bar@baz.com
+
 EOF
 
 cat >expect.pat <<EOF
@@ -149,7 +162,9 @@ cat >tag.sig <<EOF
 object 779e9b33986b1c2670fff52c5067603117b3e895
 type tagggg
 tag mytag
-tagger a
+keywords foo
+tagger bar@baz.com
+
 EOF
 
 cat >expect.pat <<EOF
@@ -159,13 +174,15 @@ EOF
 check_verify_failure 'verify object (SHA1/type) check'
 
 ...

This patch adds the check described by Junio.

It also includes a bugfix when tag object parsing fails, from
Johannes Schindelin <Johannes.Schindelin@gmx.de>.

Signed-off-by: Johan Herland <johan@herland.net>
---
 builtin-fsck.c |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index 7b4c36b..a47976c 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -542,6 +542,30 @@ static int fsck_handle_reflog(const char *logname, const unsigned char *sha1, in
 	return 0;
 }
 
+static void fsck_verify_ref_to_tag_object(const char *refname, struct object *obj)
+{
+	/* Verify that refname matches the name stored in obj's "tag" header */
+	struct tag *tagobj = (struct tag *) parse_object(obj->sha1);
+	size_t tagname_len;
+	size_t refname_len = strlen(refname);
+
+	if (!tagobj->tag) {
+		error("%s: Failed to parse tag object %s", refname, sha1_to_hex(obj->sha1));
+		return;
+	}
+	tagname_len = strlen(tagobj->tag);
+	if (!tagname_len) return; /* No tag name stored in tagobj. Nothing to do. */
+
+	if (tagname_len < refname_len &&
+		!memcmp(tagobj->tag, refname + (refname_len - tagname_len), tagname_len) &&
+		refname[(refname_len - tagname_len) - 1] == '/') {
+		/* OK: tag name is "$name", and refname ends with "/$name" */
+		return;
+	}
+	else
+		error("%s: Mismatch between tag ref and tag object's name: '%s'", refname, tagobj->tag);
+}
+
 static int fsck_handle_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
 {
 	struct object *obj;
@@ -556,6 +580,8 @@ static int fsck_handle_ref(const char *refname, const unsigned char *sha1, int f
 		/* We'll continue with the rest despite the error.. */
 		return 0;
 	}
+	if (obj->type == OBJ_TAG) /* ref to tag object */
+		fsck_verify_ref_to_tag_object(refname, obj);
 	default_refs++;
 	obj->used = 1;
 	mark_reachable(obj, REACHABLE);
-- 
1.5.2

-

Previous thread: interaction between cvsimport and cvsexportcommit by picca on Friday, June 8, 2007 - 6:42 am. (2 messages)

Next thread: git-p4 fails when cloning a p4 depo. by Benjamin Sergeant on Friday, June 8, 2007 - 9:41 am. (14 messages)