Re: [PATCH 2/2] send-email: rfc2047-quote subject lines with non-ascii characters

Previous thread: [PATCH 1/2] send-email: specify content-type of --compose body by Jeff King on Tuesday, March 25, 2008 - 4:02 pm. (1 message)

Next thread: [PATCH] Fix branches file configuration by Daniel Barkalow on Tuesday, March 25, 2008 - 4:35 pm. (7 messages)
From: Jeff King
Date: Tuesday, March 25, 2008 - 4:03 pm

We always use 'utf-8' as the encoding, since we currently
have no way of getting the information from the user.

This also refactors the quoting of recipient names, since
both processes can share the rfc2047 quoting code.

Signed-off-by: Jeff King <peff@peff.net>
---
 git-send-email.perl   |   18 +++++++++++++++---
 t/t9001-send-email.sh |   15 +++++++++++++++
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 7c4f06c..075cd0b 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -501,7 +501,12 @@ if ($compose) {
 	open(C,">",$compose_filename)
 		or die "Failed to open for writing $compose_filename: $!";
 	print C "From $sender # This line is ignored.\n";
-	printf C "Subject: %s\n\n", $initial_subject;
+	print C "Subject: ",
+		($initial_subject =~ /[^[:ascii:]]/ ?
+		quote_rfc2047($initial_subject) :
+		$initial_subject),
+		"\n";
+	print C "\n";
 	printf C <<EOT;
 GIT: Please enter your email below.
 GIT: Lines beginning in "GIT: " will be removed.
@@ -626,6 +631,14 @@ sub unquote_rfc2047 {
 	return wantarray ? ($_, $encoding) : $_;
 }
 
+sub quote_rfc2047 {
+	local $_ = shift;
+	my $encoding = shift || 'utf-8';
+	s/([^-a-zA-Z0-9!*+\/])/sprintf("=%02X", ord($1))/eg;
+	s/(.*)/=\?$encoding\?q\?$1\?=/;
+	return $_;
+}
+
 # use the simplest quoting being able to handle the recipient
 sub sanitize_address
 {
@@ -643,8 +656,7 @@ sub sanitize_address
 
 	# rfc2047 is needed if a non-ascii char is included
 	if ($recipient_name =~ /[^[:ascii:]]/) {
-		$recipient_name =~ s/([^-a-zA-Z0-9!*+\/])/sprintf("=%02X", ord($1))/eg;
-		$recipient_name =~ s/(.*)/=\?utf-8\?q\?$1\?=/;
+		$recipient_name = quote_rfc2047($recipient_name);
 	}
 
 	# double quotes are needed if specials or CTLs are included
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index e222c49..a4bcd28 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -210,4 +210,19 @@ test_expect_success '--compose ...
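If you want to try the quoting outside of send-email, here is a minimal standalone sketch of the same RFC 2047 "Q" encoding that quote_rfc2047 performs. It makes one assumption the patch does not: the hypothetical quote_rfc2047_text below takes a decoded Perl character string and encodes it to octets itself with the core Encode module. The patch instead works on whatever bytes send-email read, which are assumed to already be UTF-8. The escaping rule is the patch's own: letters, digits and !*+-/ (the restricted set RFC 2047 allows inside a phrase) pass through, and every other octet, including space, becomes =XX.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of git-send-email): Q-encode a decoded
# Perl character string as a single RFC 2047 encoded-word.
sub quote_rfc2047_text {
	my ($text, $charset) = @_;
	$charset ||= 'utf-8';

	# Turn characters into octets in the target charset. With the default
	# CHECK, characters the charset cannot represent are replaced by a
	# substitution character instead of raising an error.
	my $octets = encode($charset, $text);

	# Same escaping rule as the patch: keep letters, digits and !*+-/,
	# write every other octet as =XX (uppercase hex).
	$octets =~ s/([^-a-zA-Z0-9!*+\/])/sprintf("=%02X", ord($1))/eg;

	return "=?$charset?q?$octets?=";
}

# prints =?utf-8?q?caf=C3=A9=20cr=C3=A8me?=
print quote_rfc2047_text("caf\x{e9} cr\x{e8}me"), "\n";
```

Like the patch, this sketch does not split encoded-words that exceed the 75-character limit RFC 2047 places on each one.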
From: Teemu Likonen
Date: Tuesday, March 25, 2008 - 10:59 pm

These patches seem to work, except that the Subject field is quoted only
if the user types non-ASCII text at the "What subject should the initial
email start with?" prompt. If she changes the subject in the editor, it
won't be rfc2047-quoted.

Thank you anyway; I think we're going in the right direction. I think 'git
send-email --compose' is a nice way to produce an introductory message for
a patch series. If --compose doesn't support MIME encoding in a reasonable
way, the user may have to write and send the intro message with a real MUA
and then find its Message-Id to set the correct In-Reply-To field for the
actual patch series.

The e-mail agents KMail and Mutt have a setting for preferred encodings
for outgoing mail. It's a list of encodings, like
"us-ascii,iso-8859-1,utf-8". The first one that fits the whole message
(From, To, Cc, Subject, the body, ...?) is used, so there is some kind of
content detection after the message has been composed.
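The "first encoding that fits" rule described above can be sketched in Perl with the core Encode module. This is only an illustration under stated assumptions: pick_charset is a hypothetical name, the message text is assumed to be an already-decoded Perl character string, and nothing here reflects how KMail or Mutt actually implement their setting.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the first charset in @prefs that can encode
# $text (a decoded Perl character string) without losing any character,
# or undef if none of them can.
sub pick_charset {
	my ($text, @prefs) = @_;
	for my $charset (@prefs) {
		# FB_CROAK makes encode() die on an unmappable character, and
		# LEAVE_SRC stops it from consuming $text in place. An unknown
		# charset name also dies, so it is simply skipped.
		my $fits = eval {
			Encode::encode($charset, $text,
				Encode::FB_CROAK() | Encode::LEAVE_SRC());
			1;
		};
		return $charset if $fits;
	}
	return;
}

my @prefs = qw(us-ascii iso-8859-1 utf-8);
my $chosen = pick_charset("caf\x{e9}", @prefs);
print "$chosen\n";    # prints iso-8859-1
```

Trying the whole message at once (headers and body together) keeps the choice consistent; a per-header choice would also be possible but is not what the setting described above does.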

If portable content encoding detection is difficult or considered 
unnecessary, then I think a documented configurable option is fine 
(UTF-8 by default).
--

From: Jeff King
Date: Tuesday, March 25, 2008 - 11:20 pm

Ah, yes, I hadn't considered that. We should definitely do the quoting
after all of the user's input. Replace 2/2 from my series with the patch
below, which handles this case correctly (and as a bonus, the user sees

git-format-patch recently got a --cover-letter option which does the
same thing. I actually use a real MUA (mutt) instead of send-email, and
this way you can avoid the message-id cutting and pasting that is
required. It automatically does the right thing with encodings because I

Yes, the git-send-email code is a real mess for this sort of thing. I
think it started very small and specific, and has gotten hack upon hack
piled on it. It would be much nicer rewritten from scratch around one of
the many abstracted perl mail objects (though that does introduce a new

I think that is sensible. Want to try adding it on top of my patches?

Below is the revised subject-munging patch.

-- >8 --
send-email: rfc2047-quote subject lines with non-ascii characters

We always use 'utf-8' as the encoding, since we currently
have no way of getting the information from the user.

This also refactors the quoting of recipient names, since
both processes can share the rfc2047 quoting code.

Signed-off-by: Jeff King <peff@peff.net>
---
 git-send-email.perl   |   20 ++++++++++++++++++--
 t/t9001-send-email.sh |   15 +++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 7c4f06c..3694f81 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -536,6 +536,15 @@ EOT
 		if (!$in_body && /^MIME-Version:/i) {
 			$need_8bit_cte = 0;
 		}
+		if (!$in_body && /^Subject: ?(.*)/i) {
+			my $subject = $1;
+			$_ = "Subject: " .
+				($subject =~ /[^[:ascii:]]/ ?
+				 quote_rfc2047($subject) :
+				 $subject) .
+				"\n";
+			}
+		}
 		print C2 $_;
 	}
 	close(C);
@@ -626,6 +635,14 @@ sub unquote_rfc2047 {
 	return wantarray ? ($_, $encoding) : $_;
 }
 
+sub quote_rfc2047 {
+	local $_ = ...
From: Teemu Likonen
Date: Wednesday, March 26, 2008 - 1:30 am

I had missed the --cover-letter option completely. It may be useful too.
I'm still trying to find the best way to send patches. If I send the intro
message with a real MUA, I need to either wait for the message to show up
on the mailing list or check my sent-mail folder to find the Message-Id.
Once I know the Message-Id I can send the actual patch series with 'git
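Pulling the Message-Id out of a saved copy of the intro message (for example from a sent-mail folder) can be scripted. A minimal sketch, assuming the message has been read into a single string and that its Message-Id header is not folded across lines; find_message_id is a hypothetical helper, not part of git.

```perl
use strict;
use warnings;

# Hypothetical helper: return the Message-Id value (with its angle
# brackets) from the header block of a raw message, or undef if absent.
# Assumes the header sits on one line (no folding).
sub find_message_id {
	my ($message) = @_;
	for my $line (split /\r?\n/, $message) {
		last if $line eq '';    # an empty line ends the header block
		return $1 if $line =~ /^Message-Id:\s*(<[^>]+>)/i;
	}
	return;
}
```

The returned value is what would then be given to git send-email's --in-reply-to option for the rest of the series.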

I'd like to, but I can only do sh/bash stuff and possibly some
copy-and-paste programming with other scripting languages. You'd end up
fixing my code anyway, sorry.

As you noticed, I accidentally sent you a couple of test emails because
send-email CCed the mails to the patches' author (I think). Now I have set
"suppresscc = all" and "suppressfrom = true", which should prevent such
accidents. Shouldn't these be the defaults? In my opinion it's generally
best practice to always explicitly define which parties emails are sent
to.

--

From: Jeff King
Date: Wednesday, March 26, 2008 - 1:39 am

That is how I used to do it; now I use --cover-letter (which you

OK, I will add it to the end of my long todo. Out of curiosity, do you
actually want something besides utf-8, or is this just to make us feel

I think this is probably a good change. But it is a behavior change,
which means it is definitely out during the -rc freeze. And it may or

Argh, yes. I _thought_ I ran it successfully through the test script,
but obviously I failed to 'make' and just tested the previous version.
It works fine with the bracket removed.

For reference, the fixed-up patch is below.

-- >8 --
send-email: rfc2047-quote subject lines with non-ascii characters

We always use 'utf-8' as the encoding, since we currently
have no way of getting the information from the user.

This also refactors the quoting of recipient names, since
both processes can share the rfc2047 quoting code.

Signed-off-by: Jeff King <peff@peff.net>
---
 git-send-email.perl   |   19 +++++++++++++++++--
 t/t9001-send-email.sh |   15 +++++++++++++++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 7c4f06c..d0f9d4a 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -536,6 +536,14 @@ EOT
 		if (!$in_body && /^MIME-Version:/i) {
 			$need_8bit_cte = 0;
 		}
+		if (!$in_body && /^Subject: ?(.*)/i) {
+			my $subject = $1;
+			$_ = "Subject: " .
+				($subject =~ /[^[:ascii:]]/ ?
+				 quote_rfc2047($subject) :
+				 $subject) .
+				"\n";
+		}
 		print C2 $_;
 	}
 	close(C);
@@ -626,6 +634,14 @@ sub unquote_rfc2047 {
 	return wantarray ? ($_, $encoding) : $_;
 }
 
+sub quote_rfc2047 {
+	local $_ = shift;
+	my $encoding = shift || 'utf-8';
+	s/([^-a-zA-Z0-9!*+\/])/sprintf("=%02X", ord($1))/eg;
+	s/(.*)/=\?$encoding\?q\?$1\?=/;
+	return $_;
+}
+
 # use the simplest quoting being able to handle the recipient
 sub sanitize_address
 {
@@ -643,8 +659,7 @@ sub sanitize_address
 
 	# rfc2047 is needed if a non-ascii char is ...
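The Subject: pass that the fixed patch adds can be exercised on its own. Below is a minimal sketch: quote_rfc2047 is a lexical-variable rewrite of the patch's function (byte-wise, so input is assumed to already be UTF-8 octets, as in send-email), and quote_subject_header is a hypothetical driver that applies the same Subject: rewrite to a list of lines. Treating the first empty line as the end of the headers is an assumption standing in for send-email's own $in_body tracking, which the excerpt does not show.

```perl
use strict;
use warnings;

# Lexical-variable rewrite of the patch's quote_rfc2047: Q-encode a string
# of octets that are assumed to already be in $encoding (utf-8 by default).
sub quote_rfc2047 {
	my ($octets, $encoding) = @_;
	$encoding ||= 'utf-8';
	$octets =~ s/([^-a-zA-Z0-9!*+\/])/sprintf("=%02X", ord($1))/eg;
	return "=?$encoding?q?$octets?=";
}

# Hypothetical driver: apply the patch's Subject: rewrite to each line of
# a composed message. Assumption: the first empty line ends the headers.
sub quote_subject_header {
	my @out;
	my $in_body = 0;
	for my $orig (@_) {
		my $line = $orig;    # copy, so the caller's list is not modified
		if (!$in_body && $line =~ /^Subject: ?(.*)/i) {
			my $subject = $1;
			# Only subjects containing non-ASCII bytes get encoded.
			$line = "Subject: " .
				($subject =~ /[^[:ascii:]]/ ?
				 quote_rfc2047($subject) :
				 $subject) .
				"\n";
		}
		$in_body = 1 if $line =~ /^$/;
		push @out, $line;
	}
	return @out;
}
```

Because the rewrite runs over the file after the editor exits, a subject the user changes by hand gets quoted too, which is the case the first version of the patch missed.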
From: Teemu Likonen
Date: Wednesday, March 26, 2008 - 2:23 am

I'm using the current 'master' branch so --cover-letter is there.
Managed to miss it anyway. :)

Hmm, do you send the 0000-cover-letter.patch with 'git send-email'? It
seems that the cover letter doesn't get MIME headers when sent that way.
Sending it through 'mutt -H' works fine, but then the Message-Id needs to
be copy-pasted manually into send-email for the rest of the series (to have
I mostly use (and promote) UTF-8, and now that I'm beginning to understand
how send-email works, I can live with the current behaviour just fine.
Don't take my feedback as complaining. :)

In general my interests are in human languages, and I have done quite
a lot of work in different areas to make computers interact nicely with
human languages. This is my interest at a general level, and I tend to
report/fix problems when I notice them. From Git's point of view, at the
present moment we can probably say just what you did: "make us feel
feature complete."

Thanks for your work on this. Really.
--

From: Teemu Likonen
Date: Wednesday, March 26, 2008 - 2:32 am

My English is somewhat broken. I meant to thank you for your work.
--

From: Jeff King
Date: Wednesday, March 26, 2008 - 2:35 am

Maybe it is the late hour, but I am a native English speaker, and it
parsed just fine to me.

-Peff
--

From: Jeff King
Date: Wednesday, March 26, 2008 - 2:33 am

No, I have format-patch do the threading. So something like:

  git format-patch --cover-letter --thread --stdout upstream >mbox
  mutt -f mbox

and then in mutt I bind a key to <resend-message>. For each message, I
do the 'resend', set the recipient headers, look it over one last time,
and then send. The most annoying part is entering the recipients;
usually it isn't too bad because I have short aliases for Junio and the
list, but I had to, e.g., cut and paste your address twice for the other
series.

Probably munging the 'to:' and 'cc:' before running mutt would make the
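Munging the To: and Cc: headers before running mutt could look something like the following minimal Perl sketch. add_recipients is a hypothetical helper; it assumes the mbox has been read into one string, that each message starts with the "From <40-hex sha1> <date>" separator line git format-patch --stdout writes, and that the messages do not already carry To:/Cc: headers.

```perl
use strict;
use warnings;

# Hypothetical helper: add To: and Cc: headers to every message in an mbox
# produced by "git format-patch --stdout". Assumes each message begins with
# the "From <40-hex sha1> <date>" separator line and has no To:/Cc: yet.
sub add_recipients {
	my ($mbox, $to, $cc) = @_;
	# Header order is not significant, so inserting the new headers
	# directly after each separator line is enough.
	$mbox =~ s/^(From [0-9a-f]{40} .*\n)/${1}To: $to\nCc: $cc\n/mg;
	return $mbox;
}
```

Writing the result to a new mbox and opening that with mutt -f would leave only the final review step for each <resend-message>; the Message-Id problem with resending is a separate issue.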

OK, I am inclined to leave the patches as-is, then, and wait for
somebody to complain about their pet encoding. My reasoning is that:

  - in most cases throughout git, we assume things are happening in
    utf-8, so I don't think it will come as a great surprise
  - I think doing it right might be more complex than just send-email; I
    am thinking there might need to be a "stuff the user inputs is in

No problem at all. Thank you for helping make git better with bug
reports!

-Peff
--

From: Jeff King
Date: Thursday, March 27, 2008 - 12:38 am

Since it looks like you are using mutt also, I will warn you that there
is a problem with this workflow: when mutt does the resend, it generates
a new message-id. Thus the patches are all connected in a thread because
they all in-reply-to the cover letter, but the cover letter is not
connected, since it has a new message-id.

I'm not sure if there is a way to fix this short of patching mutt. :(

-Peff
--

From: Todd Zullinger
Date: Thursday, March 27, 2008 - 12:44 pm

instead of opening the mbox using -f, and then recall the messages to
send.  That *might* prevent mutt from rewriting the message-id, but I
haven't tested it at all.

-- 
Todd        OpenPGP -> KeyID: 0xBEAF0CE3 | URL: www.pobox.com/~tmz/pgp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Between two evils, I always pick the one I never tried before.
    -- Mae West
