Re: [PATCH 2/5] add the ability to select more email header fields to output

Previous thread: git protocol over port-forwarding by Bill Lear on Wednesday, March 14, 2007 - 11:37 am. (6 messages)

Next thread: [PATCH] Teach git-remote to list pushed branches. by Johannes Sixt on Wednesday, March 14, 2007 - 1:56 pm. (1 message)
From: Don Zickus
Date: Wednesday, March 14, 2007 - 1:12 pm

Another round of cleanups as noticed by Junio.  
Only the the first two patches were touched.

-coding style cleanups
-better boundary checking

Cheers,
Don


-

From: Don Zickus
Date: Wednesday, March 14, 2007 - 1:12 pm

They handle cases where there is no attached patch.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 t/t5100-mailinfo.sh |    2 +-
 t/t5100/info0007    |    5 +++++
 t/t5100/info0008    |    5 +++++
 t/t5100/msg0007     |    2 ++
 t/t5100/msg0008     |    4 ++++
 t/t5100/sample.mbox |   18 ++++++++++++++++++
 6 files changed, 35 insertions(+), 1 deletions(-)
 create mode 100644 t/t5100/info0007
 create mode 100644 t/t5100/info0008
 create mode 100644 t/t5100/msg0007
 create mode 100644 t/t5100/msg0008
 create mode 100644 t/t5100/patch0007
 create mode 100644 t/t5100/patch0008

diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 4d2b781..ca96918 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -11,7 +11,7 @@ test_expect_success 'split sample box' \
 	'git-mailsplit -o. ../t5100/sample.mbox >last &&
 	last=`cat last` &&
 	echo total is $last &&
-	test `cat last` = 6'
+	test `cat last` = 8'
 
 for mail in `echo 00*`
 do
diff --git a/t/t5100/info0007 b/t/t5100/info0007
new file mode 100644
index 0000000..49bb0fe
--- /dev/null
+++ b/t/t5100/info0007
@@ -0,0 +1,5 @@
+Author: A U Thor
+Email: a.u.thor@example.com
+Subject: another patch
+Date: Fri, 9 Jun 2006 00:44:16 -0700
+
diff --git a/t/t5100/info0008 b/t/t5100/info0008
new file mode 100644
index 0000000..e8a2951
--- /dev/null
+++ b/t/t5100/info0008
@@ -0,0 +1,5 @@
+Author: Junio C Hamano
+Email: junio@kernel.org
+Subject: another patch
+Date: Fri, 9 Jun 2006 00:44:16 -0700
+
diff --git a/t/t5100/msg0007 b/t/t5100/msg0007
new file mode 100644
index 0000000..71b23c0
--- /dev/null
+++ b/t/t5100/msg0007
@@ -0,0 +1,2 @@
+Here is an empty patch from A U Thor.
+
diff --git a/t/t5100/msg0008 b/t/t5100/msg0008
new file mode 100644
index 0000000..a80ecb9
--- /dev/null
+++ b/t/t5100/msg0008
@@ -0,0 +1,4 @@
+>Here is an empty patch from A U Thor.
+
+Hey you forgot the patch!
+
diff --git a/t/t5100/patch0007 b/t/t5100/patch0007
new file mode 100644
index 0000000..e69de29
diff ...
From: Don Zickus
Date: Wednesday, March 14, 2007 - 1:12 pm

I have come across many emails that use long strings of '-'s as separators
for ideas.  This patch below limits the separator to only 3 '-', with the
intent that long string of '-'s will stay in the commit msg and not in the
patch file.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

---
I purposedly separated this patch out because I wasn't sure if anyone would
have objections to it.  I tested it on numerous emails with and with patches
and didn't see any issues.
---
 builtin-mailinfo.c |   37 ++++++++++++++++++++++++++++++++++---
 1 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index dd0f563..a8d5b60 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -652,6 +652,39 @@ again:
 	return (fgets(line, sizeof(line), fin) != NULL);
 }
 
+static inline int patchbreak(const char *line)
+{
+	/* Beginning of a "diff -" header? */
+	if (!memcmp("diff -", line, 6))
+		return 1;
+
+	/* CVS "Index: " line? */
+	if (!memcmp("Index: ", line, 7))
+		return 1;
+
+	/*
+	 * "--- <filename>" starts patches without headers
+	 * "---<sp>*" is a manual separator
+	 */
+	if (!memcmp("---", line, 3)) {
+		line += 3;
+		/* space followed by a filename? */
+		if (line[0] == ' ' && !isspace(line[1]))
+			return 1;
+		/* Just whitespace? */
+		for (;;) {
+			unsigned char c = *line++;
+			if (c == '\n')
+				return 1;
+			if (!isspace(c))
+				break;
+		}
+		return 0;
+	}
+	return 0;
+}
+
+
 static int handle_commit_msg(char *line)
 {
 	static int still_looking=1;
@@ -673,9 +706,7 @@ static int handle_commit_msg(char *line)
 			return 0;
 	}
 
-	if (!memcmp("diff -", line, 6) ||
-	    !memcmp("---", line, 3) ||
-	    !memcmp("Index: ", line, 7)) {
+	if (patchbreak(line)) {
 		fclose(cmitmsg);
 		cmitmsg = NULL;
 		return 1;
-- 
1.5.0.2.211.g2ca9-dirty

-

From: Don Zickus
Date: Wednesday, March 14, 2007 - 1:12 pm

I am working on a project that required parsing through regular mboxes that
didn't necessarily have patches embedded in them.  I started by creating my
own modified copy of git-am and working from there.  Very quickly, I noticed
git-mailinfo wasn't able to handle a big chunk of my email.

After hacking up numerous solutions and running into more limitations, I
decided it was just easier to rewrite a big chunk of it.  The following
patch has a bunch of fixes and features that I needed in order for me do
what I wanted.

Note: I'm didn't follow any email rfc papers but I don't think any of the
changes I did required much knowledge (besides the boundary stuff).

List of major changes/fixes:
- can't create empty patch files fix
- empty patch files don't fail, this failure will come inside git-am
- multipart boundaries are now handled
- only output inbody headers if a patch exists otherwise assume those
headers are part of the reply and instead output the original headers
- decode and filter base64 patches correctly
- various other accidental fixes

I believe I didn't break any existing functionality or compatibility (other
than what I describe above, which is really only the empty patch file).

I tested this through various mailing list archives and everything seemed to
parse correctly (a couple thousand emails).

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 builtin-mailinfo.c |  520 +++++++++++++++++++++++++++------------------------
 git-am.sh          |    4 +
 git-applymbox.sh   |    4 +
 git-quiltimport.sh |    4 +
 4 files changed, 287 insertions(+), 245 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index 766a37e..dacdf77 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -11,19 +11,22 @@ static FILE *cmitmsg, *patchfile, *fin, *fout;
 static int keep_subject;
 static const char *metainfo_charset;
 static char line[1000];
-static char date[1000];
 static char name[1000];
 static char email[1000];
-static char subject[1000];
 ...
From: Don Zickus
Date: Thursday, March 15, 2007 - 7:35 am

I am working on a project that required parsing through regular mboxes that
didn't necessarily have patches embedded in them.  I started by creating my
own modified copy of git-am and working from there.  Very quickly, I noticed
git-mailinfo wasn't able to handle a big chunk of my email.

After hacking up numerous solutions and running into more limitations, I
decided it was just easier to rewrite a big chunk of it.  The following
patch has a bunch of fixes and features that I needed in order for me do
what I wanted.

Note: I'm didn't follow any email rfc papers but I don't think any of the
changes I did required much knowledge (besides the boundary stuff).

List of major changes/fixes:
- can't create empty patch files fix
- empty patch files don't fail, this failure will come inside git-am
- multipart boundaries are now handled
- only output inbody headers if a patch exists otherwise assume those
headers are part of the reply and instead output the original headers
- decode and filter base64 patches correctly
- various other accidental fixes

I believe I didn't break any existing functionality or compatibility (other
than what I describe above, which is really only the empty patch file).

I tested this through various mailing list archives and everything seemed to
parse correctly (a couple thousand emails).

Signed-off-by: Don Zickus <dzickus@redhat.com>
---

Accidentally sent out the wrong patch yesterday..

---
 builtin-mailinfo.c |  520 +++++++++++++++++++++++++++------------------------
 git-am.sh          |    4 +
 git-applymbox.sh   |    4 +
 git-quiltimport.sh |    4 +
 4 files changed, 287 insertions(+), 245 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index 766a37e..a5eea82 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -11,19 +11,22 @@ static FILE *cmitmsg, *patchfile, *fin, *fout;
 static int keep_subject;
 static const char *metainfo_charset;
 static char line[1000];
-static char date[1000];
 static char ...
From: Don Zickus
Date: Wednesday, March 14, 2007 - 1:12 pm

This is useful when scripts need more than just the basic email headers to
parse.  By specifying the "-x=" option, one can search and output any header
field they want.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 builtin-mailinfo.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index dacdf77..dd0f563 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -856,11 +856,12 @@ int mailinfo(FILE *in, FILE *out, int ks, const char *encoding,
 }
 
 static const char mailinfo_usage[] =
-	"git-mailinfo [-k] [-u | --encoding=<encoding>] msg patch <mail >info";
+	"git-mailinfo [-k] [-u | --encoding=<encoding>] [-x=<field>] msg patch <mail >info";
 
 int cmd_mailinfo(int argc, const char **argv, const char *prefix)
 {
 	const char *def_charset;
+	int top;
 
 	/* NEEDSWORK: might want to do the optional .git/ directory
 	 * discovery
@@ -870,6 +871,8 @@ int cmd_mailinfo(int argc, const char **argv, const char *prefix)
 	def_charset = (git_commit_encoding ? git_commit_encoding : "utf-8");
 	metainfo_charset = def_charset;
 
+	for (top=0; header[top]; top++){ ; }
+
 	while (1 < argc && argv[1][0] == '-') {
 		if (!strcmp(argv[1], "-k"))
 			keep_subject = 1;
@@ -879,7 +882,10 @@ int cmd_mailinfo(int argc, const char **argv, const char *prefix)
 			metainfo_charset = NULL;
 		else if (!prefixcmp(argv[1], "--encoding="))
 			metainfo_charset = argv[1] + 11;
-		else
+		else if (!prefixcmp(argv[1], "-x=")) {
+			header[top] = xmalloc(256*sizeof(char));
+			strncpy(header[top++], argv[1]+3, 256);
+		} else
 			usage(mailinfo_usage);
 		argc--; argv++;
 	}
-- 
1.5.0.2.211.g2ca9-dirty

-

From: Don Zickus
Date: Thursday, March 15, 2007 - 7:36 am

This is useful when scripts need more than just the basic email headers to
parse.  By specifying the "-x=" option, one can search and output any header
field they want.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---

 Accidentally sent out the wrong patch yesterday.

---
 builtin-mailinfo.c |   22 ++++++++++++++++------
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index a5eea82..8ac6ef4 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -302,7 +302,7 @@ static int check_header(char *line, char **hdr_data)
 	int i;
 
 	/* search for the interesting parts */
-	for (i = 0; header[i]; i++) {
+	for (i = 0; header[i] && i < MAX_HDR_PARSED; i++) {
 		int len = strlen(header[i]);
 		if (!hdr_data[i] &&
 		    !strncasecmp(line, header[i], len) &&
@@ -338,8 +338,8 @@ static int check_header(char *line, char **hdr_data)
 	if (!memcmp(">From", line, 5) && isspace(line[5]))
 		return 1;
 	if (!memcmp("[PATCH]", line, 7) && isspace(line[7])) {
-		for (i = 0; header[i]; i++) {
-			if (!memcmp("Subject: ", header[i], 9)) {
+		for (i = 0; header[i] && i < MAX_HDR_PARSED; i++) {
+			if (!memcmp("Subject", header[i], 7)) {
 				if (! handle_header(line, hdr_data[i], 0)) {
 					return 1;
 				}
@@ -796,7 +796,7 @@ static void handle_info(void)
 	char *hdr;
 	int i;
 
-	for (i = 0; header[i]; i++) {
+	for (i = 0; header[i] && i < MAX_HDR_PARSED; i++) {
 
 		/* only print inbody headers if we output a patch file */
 		if (patch_lines && s_hdr_data[i])
@@ -856,11 +856,12 @@ int mailinfo(FILE *in, FILE *out, int ks, const char *encoding,
 }
 
 static const char mailinfo_usage[] =
-	"git-mailinfo [-k] [-u | --encoding=<encoding>] msg patch <mail >info";
+	"git-mailinfo [-k] [-u | --encoding=<encoding>] [-x=<field>] msg patch <mail >info";
 
 int cmd_mailinfo(int argc, const char **argv, const char *prefix)
 {
 	const char *def_charset;
+	int top;
 
 	/* NEEDSWORK: might want to do the optional .git/ ...
From: Don Zickus
Date: Wednesday, March 14, 2007 - 1:12 pm

This issue popped up when testing my changes.  I believe the patch is the
intended output that git-mailinfo should provide.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 t/t5100/patch0005 |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/t/t5100/patch0005 b/t/t5100/patch0005
index 7d24b24..e7d6f66 100644
--- a/t/t5100/patch0005
+++ b/t/t5100/patch0005
@@ -61,7 +61,7 @@ diff --git a/git-cvsimport-script b/git-cvsimport-script
  		push(@old,$fn);
 
 -- 
-David K
Previous thread: git protocol over port-forwarding by Bill Lear on Wednesday, March 14, 2007 - 11:37 am. (6 messages)

Next thread: [PATCH] Teach git-remote to list pushed branches. by Johannes Sixt on Wednesday, March 14, 2007 - 1:56 pm. (1 message)