Re: [PATCH] WIP: begin to translate git with gettext

Previous thread: [GSoC update] git-remote-svn: Week 3 by Ramkumar Ramachandra on Monday, May 17, 2010 - 6:28 am. (1 message)

Next thread: stupid error - is there a way to fix? by Eugene Sajine on Monday, May 17, 2010 - 11:32 am. (3 messages)
From: Jeff Epler
Date: Monday, May 17, 2010 - 9:05 am

Signed-off-by: Jeff Epler <jepler@unpythonic.net>
---
[resent with Cc to list and thread participants]

While I'm certain that there are a lot of things to object to in this
patch, it shows 90% of what is needed to use gettext to translate
the portions of git written in C, without involving undesired GNU
infrastructure such as automake.

Makefile adds necessary rules for generating git.pot and for building
and installing compiled message catalogs (.mo) from text message
catalogs (.po).  It also adds a gettext support header and source file.

Minimal changes are made to git to use the requested LC_CTYPE and
LC_MESSAGES, and some messages for 'git status' are marked for
translation.
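
As a rough illustration of the kind of start-up code the paragraph above
describes (this is not the content of the patch's gettext.c, which is not
shown here), a minimal sketch might look like the following. The names
example_setup_gettext and example_status_header, the "git" text domain, and
the /usr/share/locale directory are assumptions made for the example.

```c
#include <libintl.h>
#include <locale.h>

/* Conventional shorthand for marking and translating a string. */
#define _(s) gettext(s)

/* Hypothetical values: the patch's real domain and directory are not shown. */
#define EXAMPLE_TEXTDOMAIN "git"
#define EXAMPLE_LOCALEDIR "/usr/share/locale"

/*
 * Adopt only the user's LC_CTYPE and LC_MESSAGES, leaving the other
 * categories (such as LC_NUMERIC) in the "C" locale, then point gettext
 * at the directory holding the compiled .mo catalogs.
 */
void example_setup_gettext(void)
{
	setlocale(LC_CTYPE, "");
	setlocale(LC_MESSAGES, "");
	bindtextdomain(EXAMPLE_TEXTDOMAIN, EXAMPLE_LOCALEDIR);
	textdomain(EXAMPLE_TEXTDOMAIN);
}

/* Returns the translation, or the English text if no catalog matches. */
const char *example_status_header(void)
{
	return _("# Changed but not updated:");
}
```

Calling example_setup_gettext() once, early in main(), would be enough for
later _() lookups; when no catalog is found, gettext() simply returns the
original string.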

When I provided a gibberish translation of a message:
#: wt-status.c:87
msgid "# Changed but not updated:"
msgstr "# Changes not blah blah blah"

running 'git status' used the translation:
$ git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
# Changes not blah blah blah
...

I ran with 'make install' and prefix set in config.mak.  It didn't seem
to work when running from the source directory, and it may or may not
work with runtime prefix.


 Makefile    |   26 +++++++++++++
 gettext.c   |   17 +++++++++
 gettext.h   |   15 ++++++++
 git.c       |    3 ++
 wt-status.c |  117 ++++++++++++++++++++++++++++++-----------------------------
 5 files changed, 120 insertions(+), 58 deletions(-)
 create mode 100644 gettext.c
 create mode 100644 gettext.h

diff --git a/Makefile b/Makefile
index 4f7224a..c02ca18 100644
--- a/Makefile
+++ b/Makefile
@@ -294,6 +294,8 @@ RPMBUILD = rpmbuild
 TCL_PATH = tclsh
 TCLTK_PATH = wish
 PTHREAD_LIBS = -lpthread
+XGETTEXT = xgettext
+MSGFMT = msgfmt
 
 export TCL_PATH TCLTK_PATH
 
@@ -518,6 +520,7 @@ LIB_H += userdiff.h
 LIB_H += utf8.h
 LIB_H += xdiff-interface.h
 LIB_H += xdiff/xdiff.h
+LIB_H += gettext.h
 
 LIB_OBJS += abspath.o
 LIB_OBJS += advice.o
@@ -559,6 +562,7 @@ LIB_OBJS += entry.o
 LIB_OBJS += ...
--

From: Robert Buck
Date: Monday, May 17, 2010 - 4:29 pm

Is gettext portable? Or is it only POSIX? If it's not portable, have
you considered using ICU instead as it is the best-in-class solution
for I18N/L10N?

That's my 0.02.

-Bob
--

From: Ævar Arnfjörð Bjarmason
Date: Monday, May 17, 2010 - 9:23 pm

Yes, it's portable. libintl, or a library providing a compatible
interface, is available on every platform Git runs on.

ICU is not a gettext replacement; it's a Unicode processing library, and
I don't think it has any gettext-like features. In any case it would be
a huge dependency for the service it would provide.
--

From: Michael J Gruber
Date: Tuesday, May 18, 2010 - 12:40 am

I have no experience whatsoever with gettext, but it looks quite
dangerous to me to have printf format specifiers as part of the
localized text. It means that our programs can crash depending on the
LANG setting at run time if localisers mess up. We'll never catch this
unless we run all tests in all languages!

Also, the basic structure of the output should probably be independent
of the language, preferring consistent structure across languages over
linguistically consistent structure within a language.

That means we'll have to do a lot of strcat's (the _() things are not
compile time constants, are they?) rather than those mechanical



--

From: Ævar Arnfjörð Bjarmason
Date: Tuesday, May 18, 2010 - 1:11 am

On Tue, May 18, 2010 at 07:40, Michael J Gruber

I don't have much experience with gettext either (except through
Launchpad), maybe it has some internal facilities to avoid errors in
these cases.

You can test if the translated messages contain the same format
specifiers as the originals, and in any case much larger projects than
Git manage dozens of translations with gettext while avoiding

Generally you don't do strcat's since you don't want to enforce word
order, doing so will make the messages sound like Yoda in some of the
target languages.


No, as Jeff said it's just a proof of concept. That patch as-is
doesn't reflect good translation practices, it just bootstraps
gettext.

Which is very useful by the way, thanks Jeff.
--

From: Jeff Epler
Date: Tuesday, May 18, 2010 - 9:40 am

This is exactly how gettext works.  Yes, you can get crashes if the
translated string does not have the right arguments--and I would not be
at all surprised to hear of at least one privilege escalation bug
due to a bad message catalog, since printf format errors can be used in
such interesting ways.

Anyway, for printf-style formats, 'msgfmt' can be directed to check for
this situation:
    $ cat bad.po
    msgid ""
    msgstr "Content-Type: text/plain; charset=UTF-8\n"

    #,c-format
    msgid "foo %s %d"
    msgstr "föö %d %d"

    $ msgfmt --check-format bad.po
    bad.po:6: format specifications in 'msgid' and 'msgstr' for argument 1 are not the same
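
The kind of comparison msgfmt reports above can be approximated in a few
lines of C. The following is only a simplified sketch, not msgfmt's actual
algorithm: the names scan_format and formats_compatible are hypothetical,
length modifiers such as "l" are skipped rather than compared, and "*"
widths are not supported.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ARGS 16

/*
 * Record the conversion character used for each argument of a
 * printf-style format. Handles "%%", flags, width, precision, length
 * modifiers and positional "%N$" arguments. Returns the number of
 * arguments, or -1 on a malformed or unsupported format.
 */
int scan_format(const char *fmt, char types[MAX_ARGS])
{
	int next = 0, count = 0;

	memset(types, 0, MAX_ARGS);
	for (const char *p = fmt; *p; p++) {
		if (*p != '%')
			continue;
		p++;
		if (*p == '%')
			continue;

		/* Optional positional prefix: digits followed by '$'. */
		int index = next, n = 0;
		const char *q = p;
		while (isdigit((unsigned char)*q) && n < 1000)
			n = n * 10 + (*q++ - '0');
		if (*q == '$' && n > 0) {
			index = n - 1;
			p = q + 1;
		}

		/* Skip flags, width and precision, then length modifiers. */
		while (*p && strchr("-+ #0123456789.", *p))
			p++;
		while (*p && strchr("hlzjtL", *p))
			p++;

		if (!*p || !isalpha((unsigned char)*p) || index >= MAX_ARGS)
			return -1;
		types[index] = *p;
		if (index + 1 > count)
			count = index + 1;
		next = index + 1;
	}
	return count;
}

/* Nonzero if both formats consume the same argument types. */
int formats_compatible(const char *msgid, const char *msgstr)
{
	char want[MAX_ARGS], got[MAX_ARGS];
	int n = scan_format(msgid, want);

	return n >= 0 && n == scan_format(msgstr, got) &&
	       memcmp(want, got, MAX_ARGS) == 0;
}
```

With this sketch, formats_compatible("foo %s %d", "föö %d %d") is zero,
matching the complaint about argument 1 in the msgfmt output above.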

No, the ability of gettext+printf to use the right structure of the
user's language is a strength.  For instance, consider the translation
into Yoda's locale of the following sentence:

    printf("The %s is %s.\n", "Future", "Clouded");

The proper localized message is

    Clouded the Future is.

Anything else will range from confusing to unintelligible to the
native speaker.  You get that with gettext by writing

    printf(_("The %s is %s.\n"), _("Future"), _("Clouded"));

together with the message catalog entry
    msgid "The %s is %s.\n"

No, that one's a mistake.  I did not take care when choosing which
strings to mark, because I was mostly interested in showing a
proof-of-concept for using gettext to translate core parts of git.

The amount of work to mark all the source files and then to keep the
marks up to date should not be underestimated--and that's just the work
to enable translators to localize the software.  It is important to
gauge the interest in the git community in actually doing this work.

As my own primary language is English, I have only a theoretical
interest in this feature.  However, the existence of translations for
gitk and git-gui indicates to me that the community probably does desire
this.

Jeff
--

From: Ævar Arnfjörð Bjarmason
Date: Tuesday, May 18, 2010 - 10:02 am

It's also something you shouldn't overestimate. I've been involved in
internationalizing several projects that were previously English-only.

The work of making things translatable can be done incrementally. You
also don't have to get everything right the first time. The current
proof-of-concept translation of `git status`, for instance, suffers from
numerous problems, but it's still better than nothing.

It can be used as-is and then incrementally improved by arranging the
strings more intelligently in the future.
--

From: Ævar Arnfjörð Bjarmason
Date: Thursday, May 20, 2010 - 9:02 am

I did some work on this in my branch:
http://github.com/avar/git/compare/master...topic/git-gettext

Fixed up the Makefile rules a bit, added appropriate gitignores, and
added a work in progress po/is.po.

I still have to go through the gettext manual to figure out how to
integrate this with our shell scripts.
--

From: Dévai Tamás
Date: Friday, May 21, 2010 - 11:02 am

On Thursday, 2010-05-20 at 16:02, Ævar Arnfjörð Bjarmason

info '(gettext)sh'

or the online help[1][2] might be of some help.

[1]: http://www.gnu.org/software/gettext/manual/gettext.html#sh_002dformat
[2]: http://www.gnu.org/software/gettext/manual/gettext.html#sh


--
