[PATCH (GITK)] gitk: Fix commit encoding support.

To: <git@...>
Cc: Paul Mackerras <paulus@...>
Date: Sunday, November 9, 2008 - 11:06 am

This commit fixes two problems with commit encodings:

1) git-log actually uses i18n.logoutputencoding to generate
its output, and falls back to i18n.commitencoding only
when that option is not set. Thus, gitk should use its
value to read the results, if available.

2) The readcommit function did not process encodings at all.
This led to randomly appearing misconverted commits if
the commit encoding differed from the current locale.

Now commit messages should be displayed correctly, except
when logoutputencoding is set to an encoding that cannot
represent characters in the message. For example, it is
impossible to convert Japanese characters from Shift-JIS
to CP-1251 (although the reverse conversion works).
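
The limitation described above can be checked directly in Tcl. The
following is a minimal sketch, assuming a Tcl installation that ships
the cp1251 encoding table. The helper name is_representable is
hypothetical and is not part of gitk. It round-trips a string through
the target encoding and reports whether anything was lost:

```tcl
# Hypothetical helper (not part of gitk): returns 1 if every character
# of $text survives a round trip through encoding $enc, 0 otherwise.
proc is_representable {enc text} {
    # Depending on the Tcl version, unmappable characters either raise
    # an error or are replaced by a fallback character such as "?".
    if {[catch {encoding convertto $enc $text} bytes]} {
        return 0
    }
    return [expr {[encoding convertfrom $enc $bytes] eq $text}]
}

# "\u65e5\u672c\u8a9e" is Japanese text;
# "\u041f\u0440\u0438\u0432\u0435\u0442" is Cyrillic text.
puts [is_representable cp1251 "\u65e5\u672c\u8a9e"]
puts [is_representable cp1251 "\u041f\u0440\u0438\u0432\u0435\u0442"]
```

The Japanese string should be reported as not representable in
cp1251, while the Cyrillic one should pass, since cp1251 is a
Cyrillic code page.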

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
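
The lookup order from point 1 of the description can be sketched as
follows. This is an illustration under stated assumptions, not the
code from the patch: the proc names pick_log_encoding and
git_config_get are hypothetical, the {*} syntax needs Tcl 8.5 or
later, and the utf-8 default reflects git's documented default for
i18n.commitencoding. Mapping git's encoding names onto Tcl's own
encoding names is a separate step and is left out here.

```tcl
# Hypothetical helper: $lookup is a command prefix that takes a config
# key and returns its value, or an empty string when the key is unset.
proc pick_log_encoding {lookup} {
    # Prefer the log output encoding, then the commit encoding.
    foreach key {i18n.logoutputencoding i18n.commitencoding} {
        set value [{*}$lookup $key]
        if {$value ne {}} {
            return $value
        }
    }
    # Assumption: git falls back to UTF-8 when neither key is set.
    return utf-8
}

# One possible lookup that shells out to git. An unset key makes
# "git config --get" exit non-zero, which catch turns into {}.
proc git_config_get {key} {
    if {[catch {exec git config --get $key} value]} {
        return {}
    }
    return $value
}

puts [pick_log_encoding git_config_get]
```

Passing the lookup as a command prefix keeps the ordering logic
separate from the call to git, so it can be exercised without a
repository.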
gitk | 25 +++++++++++++++++++++++--
1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gitk b/gitk
index ae775b1..3834fc0 100755
--- a/gitk
+++ b/gitk
@@ -1555,9 +1555,27 @@ proc chewcommits {} {
     return 0
 }
 
+proc do_readcommit {id} {
+    global tclencoding
+
+    # Invoke git-log to handle automatic encoding conversion
+    set fd [open [concat | git log --no-color --pretty=raw -1 $id] r]
+    # Read the results using i18n.logoutputencoding
+    fconfigure $fd -translation lf -eofchar {}
+    if {$tclencoding != {}} {
+        fconfigure $fd -encoding $tclencoding
+    }
+    set contents [read $fd]
+    close $fd
+    # Remove the heading line
+    regsub {^commit [0-9a-f]+\n} $contents {} contents
+
+    return $contents
+}
+
 proc readcommit {id} {
-    if {[catch {set contents [exec git cat-file commit $id]}]} return
-    parsecommit $id $contents 0
+    if {[catch {set contents [do_readcommit $id]}]} return
+    parsecommit $id $contents 1
 }
 
 proc parsecommit {id contents listed} {
@@ -10565,6 +10583,9 @@ set gitencoding {}
 catch {
     set gitencoding [exec git config --get i18n.commitencoding]
 }
+catch {
+    set gitencoding [exec git co...

To: Alexander Gavrilov <angavrilov@...>
Cc: <git@...>
Date: Monday, November 10, 2008 - 7:46 am

Does this mean there are two conversions going on, one inside git log
and another inside Tcl? Is there a reason why it's better to do two
conversions than one, or is it just more convenient that way?

Would an alternative approach have been to read the output of git
cat-file with -translation binary, look for an encoding header, and do
an encoding convertfrom based on the encoding header? What would be
the disadvantage of such an approach?

Paul.
--
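
The alternative raised in the message above (read git cat-file output
as binary, look for an encoding header, then call encoding
convertfrom) could look roughly like this. It is a sketch only and
not code from gitk: the proc names decode_raw_commit and
read_raw_commit are hypothetical, and it assumes the name in the
header already matches one of Tcl's encoding names, which real git
encoding names often do not without a mapping step.

```tcl
# Hypothetical helper: decode the raw bytes of a commit object using
# its "encoding" header, or $default when no usable header is present.
proc decode_raw_commit {raw {default utf-8}} {
    # The header block ends at the first blank line.
    set hdrend [string first "\n\n" $raw]
    if {$hdrend < 0} {
        set hdrend [string length $raw]
    }
    set header [string range $raw 0 [expr {$hdrend - 1}]]

    set enc $default
    if {[regexp -line {^encoding (.+)$} $header -> name]} {
        set enc [string tolower $name]
    }
    # Assumption: names Tcl does not know fall back to the default.
    if {[lsearch -exact [encoding names] $enc] < 0} {
        set enc $default
    }
    return [encoding convertfrom $enc $raw]
}

# Hypothetical wrapper: read the commit object without any channel
# translation, then decode it in a single step.
proc read_raw_commit {id} {
    set fd [open [list | git cat-file commit $id] r]
    fconfigure $fd -translation binary
    set raw [read $fd]
    close $fd
    return [decode_raw_commit $raw]
}
```

With this approach there is a single conversion, done inside Tcl, at
the cost of reimplementing the header lookup that git log already
performs.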

To: Paul Mackerras <paulus@...>
Cc: <git@...>
Date: Monday, November 10, 2008 - 8:06 am

If all commits were loaded through cat-file, that would be the way to
go. Otherwise, one code path would use one method of conversion while
another path, used rarely and semi-randomly, would use a different
method, which may lead to confusing results if something goes slightly
wrong.

Alexander
--

To: Alexander Gavrilov <angavrilov@...>
Cc: <git@...>
Date: Thursday, November 13, 2008 - 7:42 am

OK, that makes sense. I applied the patch with a paragraph added to
the description that explains that.

Thanks,
Paul.
--
