Re: Stable ab/i18n branch

Previous thread: Manual hunk edit mode + emacs + ^G == garbage by Kevin Ballard on Wednesday, October 13, 2010 - 2:37 pm. (9 messages)

Next thread: Urgent matters for your attention! by PRIVATE MAIL on Wednesday, October 13, 2010 - 6:20 pm. (1 message)
From: Junio C Hamano
Date: Wednesday, October 13, 2010 - 9:46 pm

Here are the topics that have been cooking.  Commits prefixed with '-' are
only in 'pu' while commits prefixed with '+' are in 'next'.  The ones
marked with '.' do not appear in any of the integration branches, but I am
still holding onto them.

--------------------------------------------------
[Graduated to "master"]

* ab/makefile-track-cc (2010-09-12) 1 commit
  (merged to 'next' on 2010-09-27 at 51daee0)
 + Makefile: add CC to TRACK_CFLAGS

* bc/fix-cherry-pick-root (2010-09-27) 1 commit
  (merged to 'next' on 2010-09-27 at e27f4c9)
 + builtin/revert.c: don't dereference a NULL pointer

* cw/gitweb-hilite-config (2010-09-21) 1 commit
  (merged to 'next' on 2010-09-27 at dd234ba)
 + Enable highlight executable path as a configuration option

* jk/repack-reuse-object (2010-09-27) 2 commits
  (merged to 'next' on 2010-09-27 at 5719f72)
 + Documentation: pack.compression: explain how to recompress
 + repack: add -F flag to let user choose between --no-reuse-delta/object

* mg/reset-doc (2010-09-15) 6 commits
  (merged to 'next' on 2010-09-22 at 2a10b71)
 + git-reset.txt: make modes description more consistent
 + git-reset.txt: point to git-checkout
 + git-reset.txt: use "working tree" consistently
 + git-reset.txt: reset --soft is not a no-op
 + git-reset.txt: reset does not change files in target
 + git-reset.txt: clarify branch vs. branch head

* uk/fix-author-ident-sed-script (2010-09-23) 1 commit
  (merged to 'next' on 2010-09-27 at 5ad7d90)
 + get_author_ident_from_commit(): remove useless quoting

--------------------------------------------------
[New Topics]

* ab/send-email-perl (2010-09-30) 16 commits
  (merged to 'next' on 2010-09-30 at cf8e58e)
 + send-email: extract_valid_address use qr// regexes
 + send-email: is_rfc2047_quoted use qr// regexes
 + send-email: use Perl idioms in while loop
 + send-email: make_message_id use "require" instead of "use"
 + send-email: send_message die on $!, not $?
 + send-email: use (?:) instead of () if no match ...
From: Nazri Ramliy
Date: Wednesday, October 13, 2010 - 10:51 pm

A few of the Signed-off-by and Reviewed-by stanzas in this series are messed up.

Have a look at

    $ git log 9855b0..41ae8f1

Copy and paste error?

nazri
--

From: Ævar Arnfjörð Bjarmason
Date: Thursday, October 14, 2010 - 2:23 am

Why did you amend this to use this sed trick:

    +sed -e 's/Z$//' >expect <<\EOF &&
    +not ok - 1 tests clean up even after a failure
    +#      Z
    +#          touch clean-after-failure &&
    +#          test_when_finished rm clean-after-failure &&
    +#          (exit 1)
    +#      Z
    +not ok - 2 failure to clean up causes the test to fail
    +#      Z
    +#          test_when_finished \"(exit 2)\"
    +#      Z

Is it just to keep it diff --check clean?

Anyway if we munge the output like this the output of test_cmp will be
more confusing when it fails, because it'll be diff(1)-ing something

Could you please pick up the 160 commit version of this at:

    git://github.com/avar/git.git ab/i18n

It has the new "gettext.c: use libcharset.h instead of langinfo.h when
available" patch by Erik Faye-Lund and me, which would be nice to have
tested in pu.
--

From: Jonathan Nieder
Date: Thursday, October 14, 2010 - 1:00 pm

This is a "give an inch and they'll ask for a mile" sort of thing, but
would it be possible to maintain a stable branch with the i18n
infrastructure that only gets rebased when there is reorganization
going on?

The gettextization and translations are rebased for other reasons (to
avoid going crazy), I know.  But with the infrastructure it is starting
to be hard to track what changes over time.
--

From: Ævar Arnfjörð Bjarmason
Date: Thursday, October 14, 2010 - 1:44 pm

I could do that, but I've been hoping that it just gets picked up for
the `next -> master` process of git.git itself and *that* becomes the
stable target.

But I have no idea what's going on at the other end, i.e. there's no
comments about it in the "What's cooking in git.git" posts or
elsewhere. So it's hard to know whether something like this is needed.

It's been about as ready as it's ever going to get for about a month
now. The libcharset.h change that was added to it a week or so ago
could have come on top of the series. But since it hadn't been merged
anywhere I rebased and inserted that earlier in the series for the
relatively obscure case of helping bisectability on Windows.

I'm starting to get the feeling that there isn't much interest in i18n
support at all. It didn't show up highly in the wanted features in the
Git survey, and most of the translations have been contributed by
people I've poked directly.

So maybe I should just get the hint and drop it *shrug*. I don't have
much time to spend on this these days since I've been moving /
starting a new job. So if it's not going to get merged into core I
might as well do something else.

(I'm not being bitter, really. Just practical)
--

From: Jonathan Nieder
Date: Thursday, October 14, 2010 - 1:54 pm

Probably it is a difference in culture between e.g. the Linux kernel
and other projects.  In the world I'll stereotype as the Linux kernel
world, forks are considered good!  Developments everyone agrees is
good in the long run (like the Linux realtime tree) are not
necessarily merged, for years even, the justification being that
until the _immediate_ benefits for Linus outweigh the risks for Linus,
it just doesn't make sense to merge.

This avoids bloat and bugs from code that is not being used by anyone,
while allowing development to continue to happen.

Now git.git is not the Linux kernel.  In particular, Junio provides
the extra service of a working "proposed updates" branch, including
changes that are not necessarily to be part of the next release.  But
the underlying principle is the same: until there is an _immediate_
benefit to including a feature in releases that does not outweigh
the downsides, it just does not happen.

What that means: interested parties need to start testing the l10n
code.  There should be a reliable upstream for users of this
feature and ideally that should not be Junio unless he wants to (and
Ævar, I think you have been doing a good job of that, just saying it's

I hope not!  e.g. the LC_CTYPE problems have not been resolved (and yes,

I'm interested in it, at least.
--

From: Ævar Arnfjörð Bjarmason
Date: Thursday, October 14, 2010 - 2:18 pm

There's definitely a chicken & egg problem here. Unlike what you'd get
in the case of Linux this isn't an unstable device driver that
somebody needs. It's just something that optionally makes life easier
for people that pretty much by definition don't follow this mailing
list.

So it's hard to build an upstream of users. With all other free
software programs that happens by the canonical upstream version of
the program being internationalized. The distros then picking up that
version and give it to end users.

So what I've tried to do to make this acceptable for inclusion in core
is to make this whole thing a no-op unless it's explicitly
requested.

You can skip it altogether with NO_GETTEXT=YesPlease, and even with
NO_GETTEXT= it won't get used unless you've explicitly set LANGUAGE=,
or LC_*= variables accordingly.

Maybe I could do something to make it even less intrusive, e.g. have a
core.i18n=yes config variable. But I haven't gotten any feedback on
that so I've kept the current scheme of making it behave like pretty

The LC_CTYPE problems I'm aware of were worked around in this patch:

    gettext.c: work around us not using setlocale(LC_CTYPE, "")

While that's not a perfect solution I think it's the best we're going
to get for the time being. I'm pretty convinced that if I tried to
make git itself LC_CTYPE-safe as part of this already huge series it'd
never get merged. Having messages with question marks from strerror()
on certain platforms is an OK compromise in my opinion.


That's good to hear.
--

From: Sverre Rabbelier
Date: Thursday, October 14, 2010 - 2:26 pm

Heya,


Ditto, especially since i18n, once the groundwork has been done,
requires very little work to support (as long as someone keeps an eye
out for people forgetting to _("") a string).

-- 
Cheers,

Sverre Rabbelier
--

From: Jon Seymour
Date: Thursday, October 14, 2010 - 2:50 pm

On Fri, Oct 15, 2010 at 8:18 AM, Ævar Arnfjörð Bjarmason

I agree with Jonathan that there might be some value to clearly
delineating the i18n infrastructure from the application of it to the
rest of the code base.The i18n infrastructure should be, relatively
speaking, less invasive than the application of it throughout the code
base.

It would be good for Junio (I presume) to have the option to integrate
the infrastructure in one hit, but allow the application of it to be
deferred, perhaps on a subject-area by subject-area basis.

jon.
--

From: Ævar Arnfjörð Bjarmason
Date: Thursday, October 14, 2010 - 9:54 pm

It already is pretty much separate internally. To get only the
infrastructure you pull the series and omit the "gettextize"
patches. There is some slight tangling up e.g. here:

    6da9243 gettext tests: skip lib-gettext.sh tests under GETTEXT_POISON
    234ddee gettextize: git-init basic messages
    b161357 gettextize: git-init "Initialized [...] repository" message
    3ad3f9f gettext tests: test if $VERSION exists before using it
    1f56190 gettext tests: add detection for is_IS.ISO-8859-1 locale
    5f24f25 gettext tests: test message re-encoding under Shell
    ec31cc6 gettext tests: test re-encoding with a UTF-8 msgid under Shell
    45e8a56 gettext tests: mark a test message as not needing translation
    c9db9aa po/is.po: add Icelandic translation
    83beb97 gettext tests: test message re-encoding under C

Where the tests I'm adding depend on an earlier "gettextize"
patch. That can easily be split up if there's demand for it. But so
far nobody has asked so I haven't done it.
--

From: Jonathan Nieder
Date: Thursday, October 14, 2010 - 5:07 pm

Hi again,


The question marks[1] are what I was referring to.  Consider this from
the point of view of someone working on the Debian package: would
users consider that an appropriate or positive change for squeeze+1?
(Hint: not if it doesn't come with some benefit in their locale, no.)

I suspect making git work with other LC_CTYPE would not actually be
too hard[2].  Such fixes are useful for futureproofing and increased
sanity anyway --- they do not have to be part of the l10n branch imho.

[1] fatal: unable to stat 'foo.c': Op?ration non permise
[2] except that annoying printf bug.
--

From: Ævar Arnfjörð Bjarmason
Date: Thursday, October 14, 2010 - 10:16 pm

No benefit? The benefit is that the program they previously either
didn't understand or understood poorly is now talking to them in their
native language. That's a pretty big benefit.

The tradeoff is that a small subset of the messages which include
strerror() output will show non-ASCII characters as question marks on
GNU systems.

Don't get me wrong. I think it's a bug that we need to fix. I just
think it's a relatively minor annoyance, not a showstopper. With it
the feature *mostly* just works, and the things that don't can be

It's something we want yes, but not something I have time for these
days. So unless someone else is interested in helping audit all that
code, providing a printf() fallback on glibc etc. it'll block the i18n
series.

For something I at least think is a relatively minor issue that
doesn't warrant throwing the baby out with the bathwater.
--

From: Jonathan Nieder
Date: Thursday, October 14, 2010 - 10:28 pm

And for the languages that are not translated yet?

Don't get me wrong --- I'm only trying to give a sense of what it is
like for a user to experience a regression.  It is generally little

Oh, I never meant to say that this should be a blocker.  Only that
there really are costs and benefits to weigh.

Much more important than the known bugs are the unknown bugs ---
you've heard this before, I think.  The way to get rid of unknown bugs
(aside from inspecting code) is to get users.

For example, if Gerrit doesn't mind, I would like to apply your
patches to experimental once the version being staged for squeeze
clears from there.

Other interested people can attract users in other ways --- by
providing documentation, tarballs, ...
--

From: Ævar Arnfjörð Bjarmason
Date: Thursday, October 14, 2010 - 10:35 pm

That would be great. Let me know if I can help with that in some way.
--

From: Junio C Hamano
Date: Saturday, October 16, 2010 - 9:44 pm

People might have noticed that I've refrained to take other topics that
may add new messages to 'next'.  I would wanted to merge ab/i18n early in
the cycle soon after dust has settled after 1.7.3 release.  And I still
do.

Having said that, there are different classes of risks associated with
i18n effort.

(1) Regressions that even hit a NO_I18N build.
(2) Regressions that hit LC_ALL=C execution in a !NO_I18N build.
(3) Regressions that hit plumbing run in a non-C locale.

 . i18n needs not just marking strings with _("string") but also needs to
   fix code that manually formulates messages by series of strcat().  It
   may need to start using allocations on the heap, with potential risk of
   usual bugs (leaks, use-after-free, etc.) and performance degradation.

 . Messages left unmarked with _("string"), or messages that are marked
   with _("string") that shouldn't have, won't be serious issues for the
   first two classes.  The latter is a serious regression for the
   plumbing.

We are all human, and misconversion during this process is possible, even
though the above classes of regressions are unacceptable.  On the other
hand, as long as the above three classes of regressions are minimum and
quickly fixed/fixable, issues in non-C locale Porcelains are tolerable
during the initial cut.

I've looked at the patches in the series, and plan to take another look.
I'm sure others on the list have checked the series, some with fine combs,
too, and hopefully Ævar has fixed any such regression that has been
reported and plans to do so for the ones discovered in the future.  As
long as we are sure that we have done a reasonable effort to eyeball the
patches, the logical next step would be to merge the series to 'next' for
further testing.

(4) Incomplete *.po file, and languages without *.po file.

Once we are sure that the series does not have the first two classes of
issues, we can ask everybody to mark new strings in their series, iow,
merge the i18n part to 'master'.  ...
From: Ævar Arnfjörð Bjarmason
Date: Sunday, October 17, 2010 - 5:33 am

Good to hear. I didn't know that before, and that's really all I

Right. I'll have time to deal with any bugs that crop up, and I'm

Do you mean to re-arrange it so that there's a patch at the front of
the series that introduces gettext.h with only the fallbacks:

    #define _(s) (s)
    #define N_(s) (s)

And then merge the ~120 gettextize patches first and do the
infrastructure later?

That could be done, but just merging the whole 160 patch series and
turning it off by default would have pretty much the same effect. And
since I thought this was going to get merged soon-ish anyway I didn't

I think we're basically at a point where merging it down to next ->

Thanks.
--

From: Jonathan Nieder
Date: Sunday, October 17, 2010 - 8:59 am

I don't think that's what he was saying, but I think that _would_ be
helpful.  So the history could look like:

                     infrastructure
                    /              \
 introduce gettext.h                ab/i18n
                    \              /
                     gettextization
                          /
                          merges from master where appropriate,
                          perhaps just at the beginning for simplicity.


That way, each side could have patches rearranged or squashed when
needed without disrupting the other.

What do you think?
--

From: Junio C Hamano
Date: Monday, October 18, 2010 - 4:39 pm

Not really.

Two pieces that would be nice to have in 'master' (or even 'maint', if we
consider avoiding merge conflicts and mismerges when fixes are queued
there) are:

 1. preparatory fixes to code that builds message string by concatenating
    parts of speech in English word ordering into buffer or emitting to
    output stream piece by piece; they should convert them to some form of
    sprintf-like format strings plus arguments.  This does not necessarily
    have to mark the format strings with _(s).

 2. the empty definitions for _(s) and N_(s).

I would consider the first one part of general clean-up job, which we know
will help i18n, but which we would want to do regardless of i18n.  And it
is probably the most error prone part in the whole process.  The sooner
the result of these two steps hit 'master', the less pain for everybody.

And then:

 3. actual marking of strings with _(s) and N_(s).

which can be merged to 'next' after vetting for regression (the first two
classes).

 4. Adding and polishing of *.po files for actual messages and languages,
    i.e. l10n.

This can happen pretty much independently from 3.  Honestly I would be
happier if I do not have to keep track of the actual l10n part.

I think the current series to some degree conflates steps 1. and 3.  As
the list of risks I outlined in the previous message show, mistakes in 1.
is much more grave than mistakes in 3. (iow, no big deal for having a few
untranslated messages during early rounds of i18n support); I would have
preferred these two steps were clearly separated, so that we can push the
first two steps out to the 'master' sooner.
--

From: Michael J Gruber
Date: Monday, October 18, 2010 - 11:05 pm

I'd just like to second (or third or..) Junio's points here since I had
suggested a split like that earlier already, and I think the current
state of affairs simply makes many potential reviewers (at least one
that I know of) go away.

1.,2. and (maybe to a lesser degree) also 3. should be able to find many
reviewers, thus making the potentially problematic parts as solid as
possible. (I'm still waiting for a conceptual approach to 4., i.e.
glossary first, but that is a different issue.)

Michael
--

From: Junio C Hamano
Date: Saturday, October 16, 2010 - 9:43 pm

Yes and files with trailing whitespaces are hard to understand in general

Hmm, why?  "expect" has trailing whitespaces on lines that end with Z,
i.e. what the original patch wanted to place in.
--

From: Johan Herland
Date: Wednesday, October 20, 2010 - 7:14 pm

Fixed in the next iteration (v4, just posted). Thanks for noticing.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
--

Previous thread: Manual hunk edit mode + emacs + ^G == garbage by Kevin Ballard on Wednesday, October 13, 2010 - 2:37 pm. (9 messages)

Next thread: Urgent matters for your attention! by PRIVATE MAIL on Wednesday, October 13, 2010 - 6:20 pm. (1 message)