Re: [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display

Previous thread: stgit: config option for diff-opts by Jon Smirl on Tuesday, January 22, 2008 - 11:04 pm. (10 messages)

Next thread: Why does git track directory listed in .gitignore/".git/info/exclude"? by pradeep singh rautela on Wednesday, January 23, 2008 - 9:54 am. (20 messages)
To: <git@...>
Cc: Jakub Narebski <jnareb@...>, Johannes Schindelin <Johannes.Schindelin@...>, Michele Ballabio <barra_cuda@...>, Junio C Hamano <gitster@...>, Finn Arne Gangstad <finnag@...>
Date: Wednesday, January 23, 2008 - 1:47 am

I've got the following change in my "pu" right now and am considering
adding it to git-gui 0.9.2, which would be in git 1.5.4.

I've CC'd a number of people who have emailed me in the past
about git-gui's diff or blame viewer failing to display non-US-ASCII
file content correctly, and I am interested to hear whether this
resolves the issue for you. It's configurable on a per-path basis
through an "encoding" attribute in .gitattributes (see git-gui's own
example below).

If we go this route we'll also want to have core Git document in
its gitattributes manpage what this "encoding" attribute is for...
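For readers following along outside Tcl, here is a minimal Python sketch of the lookup the commit message below describes: ask `git check-attr` for the "encoding" attribute of a path, and fall back to UTF-8 when the attribute is unspecified or the command fails. It assumes check-attr's `<path>: <attr>: <info>` output format; the function names are hypothetical, and this is not the Tcl code from the patch.

```python
import subprocess
from typing import Optional

DEFAULT_ENCODING = "UTF-8"  # fallback named in the commit message


def parse_check_attr(output: str, attr: str = "encoding") -> Optional[str]:
    """Extract the value from one `git check-attr` output line.

    Lines look like "<path>: <attr>: <info>", where <info> is
    "unspecified", "unset", "set", or the attribute's value.
    Returns None when there is no usable value.
    """
    line = output.strip()
    marker = f": {attr}: "
    idx = line.rfind(marker)
    if idx < 0:
        return None
    info = line[idx + len(marker):]
    if info in ("", "unspecified", "unset", "set"):
        return None
    return info


def encoding_for_path(path: str) -> str:
    """Return the path's "encoding" attribute, falling back to UTF-8."""
    try:
        result = subprocess.run(
            ["git", "check-attr", "encoding", "--", path],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing or check-attr failed ("git-check-attr is broken")
        return DEFAULT_ENCODING
    return parse_check_attr(result.stdout) or DEFAULT_ENCODING
```

With the .gitattributes from this patch, where the later `git-gui.sh encoding=UTF-8` line overrides `* encoding=US-ASCII`, `encoding_for_path("git-gui.sh")` would be expected to return `UTF-8`.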

--8>--
git-gui: Use gitattribute "encoding" for file content display

Most folks using git-gui on internationalized files have complained
that it doesn't recognize UTF-8 correctly. In the past we have just
ignored the problem and showed the file contents as binary/US-ASCII,
which is wrong no matter how you look at it.

This really should be a per-file attribute, managed by .gitattributes,
so we now pull the "encoding" attribute for the given path from
.gitattributes (if available) and use it, falling back to UTF-8
if the attributes are unavailable, git-check-attr is broken, or no
encoding is specified for this path.

We apply the encoding any time we show file content, which is
currently limited to the diff viewer and the blame viewer.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
.gitattributes | 3 +++
git-gui.sh | 13 +++++++++++++
lib/blame.tcl | 5 ++++-
lib/diff.tcl | 9 ++++++---
4 files changed, 26 insertions(+), 4 deletions(-)
create mode 100644 .gitattributes

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..f96112d
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,3 @@
+* encoding=US-ASCII
+git-gui.sh encoding=UTF-8
+/po/*.po encoding=UTF-8
diff --git a/git-gui.sh b/git-gui.sh
index f42e461..adc25d0 100755
--- a/git-gui.sh
+++ b/git-gui.sh
@@ -466,6 +466,19 @@ proc ...

To: Shawn O. Pearce <spearce@...>
Cc: Jakub Narebski <jnareb@...>, Johannes Schindelin <Johannes.Schindelin@...>, Michele Ballabio <barra_cuda@...>, Junio C Hamano <gitster@...>, Finn Arne Gangstad <finnag@...>, <git@...>
Date: Wednesday, January 23, 2008 - 3:02 am

Hi,

This solves the problem for me.

The diff view now displays UTF-8 characters correctly.

Best regards,
--
Pedro Melo
Blog: http://www.simplicidade.org/notes/
XMPP ID: melo@simplicidade.org
Use XMPP!

To: Shawn O. Pearce <spearce@...>
Cc: Jakub Narebski <jnareb@...>, Johannes Schindelin <Johannes.Schindelin@...>, Michele Ballabio <barra_cuda@...>, Finn Arne Gangstad <finnag@...>, <git@...>
Date: Wednesday, January 23, 2008 - 1:55 am

Hmmm.

At least for now, in 1.5.4, I'd prefer the way gitk does it: it
shows UTF-8 contents (and, if I recall correctly, latin-1 or other
legacy encodings as well, as long as LANG/LC_* is set
appropriately) without any per-path configuration and without
introducing new attributes.
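As a rough illustration of relying on LANG/LC_* instead of per-path attributes (not gitk's actual code, which is Tcl), this Python sketch derives a codeset from those variables using the usual POSIX precedence and decodes with it. The function names and the UTF-8 default are assumptions.

```python
import os
from typing import Mapping, Optional


def locale_codeset(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the codeset of the effective character-type locale, if named.

    Precedence follows POSIX: LC_ALL, then LC_CTYPE, then LANG.
    Locale names look like language[_territory][.codeset][@modifier].
    """
    name = ""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        name = env.get(var, "")
        if name:
            break
    if "." not in name:
        return None  # e.g. "C", "POSIX", "en_US", or nothing set
    codeset = name.split(".", 1)[1].split("@", 1)[0]
    return codeset or None


def decode_with_locale(data: bytes, env: Mapping[str, str] = os.environ) -> str:
    """Decode file content using the locale's codeset (UTF-8 if none)."""
    codeset = locale_codeset(env) or "utf-8"
    try:
        return data.decode(codeset, errors="replace")
    except LookupError:
        # Python does not know this codeset name; fall back to UTF-8
        return data.decode("utf-8", errors="replace")
```

The trade-off Junio points at is visible here: one setting covers every file, but a repository mixing encodings cannot be displayed correctly under a single locale.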

To: Junio C Hamano <gitster@...>
Cc: Jakub Narebski <jnareb@...>, Johannes Schindelin <Johannes.Schindelin@...>, Michele Ballabio <barra_cuda@...>, Finn Arne Gangstad <finnag@...>, <git@...>
Date: Wednesday, January 23, 2008 - 11:36 pm

Hmm. I'll try to rework something along those lines for 1.5.4 then.

--
Shawn.

To: Shawn O. Pearce <spearce@...>
Cc: Jakub Narebski <jnareb@...>, Johannes Schindelin <Johannes.Schindelin@...>, Michele Ballabio <barra_cuda@...>, Junio C Hamano <gitster@...>, Finn Arne Gangstad <finnag@...>, Git Mailing List <git@...>
Date: Wednesday, January 23, 2008 - 4:41 am

Shouldn't we first try harder to get things right without adding
an attribute? Maybe we could continue a good tradition and look
at the content first: we could start by looking for hints in the
file about its encoding. XML and many text files already contain
such hints to help editors. For example, Python source can
explicitly declare its encoding [1], and I guess there are many
other examples. If we don't find a direct hint, we could have
some magic auto-detection similar to what we do for autocrlf. As
a fallback, the user could specify a default encoding. Only as a
last resort would I use explicit attributes.

[1] http://www.python.org/dev/peps/pep-0263/
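The PEP 263 declaration is easy to recognize. A simplified Python sketch using the regular expression given in that PEP, checking only the first two lines (the PEP's finer rules about which lines count are not modeled); the function name is hypothetical.

```python
import re
from typing import Optional

# Regular expression from PEP 263 for the encoding declaration.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def pep263_encoding(data: bytes) -> Optional[str]:
    """Return the encoding declared on line 1 or 2, or None (simplified)."""
    for line in data.split(b"\n", 2)[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None
```

Because the pattern only needs "coding:" or "coding=" inside a comment, it also accepts Emacs-style (`# -*- coding: latin-1 -*-`) and Vim-style (`# vim: set fileencoding=utf-8 :`) comments.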

Steffen

To: Steffen Prohaska <prohaska@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Michele Ballabio <barra_cuda@...>, Junio C Hamano <gitster@...>, Finn Arne Gangstad <finnag@...>, Shawn O. Pearce <spearce@...>, Git Mailing List <git@...>
Date: Wednesday, January 23, 2008 - 6:28 am

For example, LaTeX files either use the inputenc package to set the
encoding (e.g. \usepackage[latin2]{inputenc}) or use a magic first line
to specify a TCX (TeX character translation) file
(e.g. %& -translate-file=il2-t1).

Emacs encourages the use of file variables, either in the form of a
magic first line or as file variables at the end of the file; I think
the same is true for Vim.
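A sketch of scanning for the in-file hints listed above: an Emacs `-*- coding: ... -*-` first line, a LaTeX `\usepackage[...]{inputenc}` line in the preamble, an Emacs local variables list near the end of the file, and a Vim modeline. The patterns are deliberately simplified assumptions, not either editor's exact grammar, and the function name is hypothetical. TCX `%&` lines are skipped because they name a translation file rather than an encoding.

```python
import re
from typing import Optional

# Simplified, illustrative patterns; real editors accept more variants.
# Emacs first-line form: "-*- mode: latex; coding: latin-2 -*-"
EMACS_FIRST_LINE = re.compile(rb"-\*-.*?\bcoding:\s*([-\w.]+)")
# Emacs local variables list entry: "% coding: latin-2"
EMACS_LOCAL_VAR = re.compile(rb"^\W*coding:\s*([-\w.]+)", re.MULTILINE)
# Vim modeline: "vim: set fileencoding=latin2 :" or "vi: fenc=latin2"
VIM_MODELINE = re.compile(rb"\bvim?:.*?\b(?:fileencoding|fenc)=([-\w.]+)")
# LaTeX preamble: "\usepackage[latin2]{inputenc}"
LATEX_INPUTENC = re.compile(rb"\\usepackage\[([-\w.]+)\]\{inputenc\}")


def find_encoding_hint(data: bytes) -> Optional[str]:
    """Return an encoding name hinted at inside the file, or None."""
    lines = data.split(b"\n")

    # Emacs allows the -*- line to be line 2 when line 1 is "#!..."
    for line in lines[:2]:
        m = EMACS_FIRST_LINE.search(line)
        if m:
            return m.group(1).decode("ascii")

    # LaTeX: only look in the preamble, before \begin{document}
    preamble = data.split(b"\\begin{document}", 1)[0]
    m = LATEX_INPUTENC.search(preamble)
    if m:
        return m.group(1).decode("ascii")

    # Emacs expects the local variables list roughly within the
    # last 3000 characters of the file
    tail = data[-3000:]
    start = tail.find(b"Local Variables:")
    if start >= 0:
        m = EMACS_LOCAL_VAR.search(tail, start)
        if m:
            return m.group(1).decode("ascii")

    # Vim checks the first and last 'modelines' lines (default 5)
    for line in lines[:5] + lines[-5:]:
        m = VIM_MODELINE.search(line)
        if m:
            return m.group(1).decode("ascii")
    return None
```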

I'd then like it to be at least as configurable as diff.*.funcname.

We can at least try to check for the UTF-16 magic first two bytes
(the byte order mark), and detect whether we have a character which is
invalid in UTF-8 (for performance

...and then fall back to a fallback encoding, like gitweb does.
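The detection order described above (UTF-16 byte order mark, then UTF-8 validity, then a fallback encoding) might look like this in Python. The `latin-1` default is an assumption, chosen because it can decode any byte sequence; the function name is hypothetical, and a UTF-32-LE BOM (which begins with the UTF-16-LE BOM) is not distinguished.

```python
import codecs
from typing import Tuple

FALLBACK_ENCODING = "latin-1"  # assumed default; decodes any byte sequence


def guess_decode(data: bytes, fallback: str = FALLBACK_ENCODING) -> Tuple[str, str]:
    """Decode file content, returning (text, encoding_used)."""
    # 1. UTF-16 "magic first two bytes": the byte order mark FF FE or FE FF.
    if data[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to pick byte order and strips it.
        return data.decode("utf-16", errors="replace"), "utf-16"
    # 2. Valid UTF-8 (pure ASCII is a subset, so it lands here too).
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Fall back to the configured legacy encoding.
    return data.decode(fallback, errors="replace"), fallback
```

Trying strict UTF-8 before the fallback works because legacy 8-bit text rarely happens to form valid multi-byte UTF-8 sequences, so a successful decode is a strong signal.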

--
Jakub Narebski
Poland