Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' while commits prefixed with '+' are in 'next'. The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them. -------------------------------------------------- [Graduated to "master"] * ab/makefile-track-cc (2010-09-12) 1 commit (merged to 'next' on 2010-09-27 at 51daee0) + Makefile: add CC to TRACK_CFLAGS * bc/fix-cherry-pick-root (2010-09-27) 1 commit (merged to 'next' on 2010-09-27 at e27f4c9) + builtin/revert.c: don't dereference a NULL pointer * cw/gitweb-hilite-config (2010-09-21) 1 commit (merged to 'next' on 2010-09-27 at dd234ba) + Enable highlight executable path as a configuration option * jk/repack-reuse-object (2010-09-27) 2 commits (merged to 'next' on 2010-09-27 at 5719f72) + Documentation: pack.compression: explain how to recompress + repack: add -F flag to let user choose between --no-reuse-delta/object * mg/reset-doc (2010-09-15) 6 commits (merged to 'next' on 2010-09-22 at 2a10b71) + git-reset.txt: make modes description more consistent + git-reset.txt: point to git-checkout + git-reset.txt: use "working tree" consistently + git-reset.txt: reset --soft is not a no-op + git-reset.txt: reset does not change files in target + git-reset.txt: clarify branch vs. branch head * uk/fix-author-ident-sed-script (2010-09-23) 1 commit (merged to 'next' on 2010-09-27 at 5ad7d90) + get_author_ident_from_commit(): remove useless quoting -------------------------------------------------- [New Topics] * ab/send-email-perl (2010-09-30) 16 commits (merged to 'next' on 2010-09-30 at cf8e58e) + send-email: extract_valid_address use qr// regexes + send-email: is_rfc2047_quoted use qr// regexes + send-email: use Perl idioms in while loop + send-email: make_message_id use "require" instead of "use" + send-email: send_message die on $!, not $? + send-email: use (?:) instead of () if no match ...
A few of the Signed-off-by and Reviewed-by stanzas in this series are messed up.
Have a look at
$ git log 9855b0..41ae8f1
Copy and paste error?
nazri
--
Why did you amend this to use this sed trick:
+sed -e 's/Z$//' >expect <<\EOF &&
+not ok - 1 tests clean up even after a failure
+# Z
+# touch clean-after-failure &&
+# test_when_finished rm clean-after-failure &&
+# (exit 1)
+# Z
+not ok - 2 failure to clean up causes the test to fail
+# Z
+# test_when_finished \"(exit 2)\"
+# Z
Is it just to keep it diff --check clean?
Anyway if we munge the output like this the output of test_cmp will be
more confusing when it fails, because it'll be diff(1)-ing something
Could you please pick up the 160 commit version of this at:
git://github.com/avar/git.git ab/i18n
It has the new "gettext.c: use libcharset.h instead of langinfo.h when
available" patch by Erik Faye-Lund and me, which would be nice to have
tested in pu.
--
This is a "give an inch and they'll ask for a mile" sort of thing, but would it be possible to maintain a stable branch with the i18n infrastructure that only gets rebased when there is reorganization going on? The gettextization and translations are rebased for other reasons (to avoid going crazy), I know. But with the infrastructure it is starting to be hard to track what changes over time. --
I could do that, but I've been hoping that it just gets picked up for the `next -> master` process of git.git itself and *that* becomes the stable target. But I have no idea what's going on at the other end, i.e. there's no comments about it in the "What's cooking in git.git" posts or elsewhere. So it's hard to know whether something like this is needed. It's been about as ready as it's ever going to get for about a month now. The libcharset.h change that was added to it a week or so ago could have come on top of the series. But since it hadn't been merged anywhere I rebased and inserted that earlier in the series for the relatively obscure case of helping bisectability on Windows. I'm starting to get the feeling that there isn't much interest in i18n support at all. It didn't show up highly in the wanted features in the Git survey, and most of the translations have been contributed by people I've poked directly. So maybe I should just get the hint and drop it *shrug*. I don't have much time to spend on this these days since I've been moving / starting a new job. So if it's not going to get merged into core I might as well do something else. (I'm not being bitter, really. Just practical) --
Probably it is a difference in culture between e.g. the Linux kernel and other projects. In the world I'll stereotype as the Linux kernel world, forks are considered good! Developments everyone agrees is good in the long run (like the Linux realtime tree) are not necessarily merged, for years even, the justification being that until the _immediate_ benefits for Linus outweigh the risks for Linus, it just doesn't make sense to merge. This avoids bloat and bugs from code that is not being used by anyone, while allowing development to continue to happen. Now git.git is not the Linux kernel. In particular, Junio provides the extra service of a working "proposed updates" branch, including changes that are not necessarily to be part of the next release. But the underlying principle is the same: until there is an _immediate_ benefit to including a feature in releases that does not outweigh the downsides, it just does not happen. What that means: interested parties need to start testing the l10n code. There should be a reliable upstream for users of this feature and ideally that should not be Junio unless he wants to (and Ævar, I think you have been doing a good job of that, just saying it's I hope not! e.g. the LC_CTYPE problems have not been resolved (and yes, I'm interested in it, at least. --
There's definitely a chicken & egg problem here. Unlike what you'd get
in the case of Linux this isn't an unstable device driver that
somebody needs. It's just something that optionally makes life easier
for people that pretty much by definition don't follow this mailing
list.
So it's hard to build an upstream of users. With all other free
software programs that happens by the canonical upstream version of
the program being internationalized. The distros then picking up that
version and give it to end users.
So what I've tried to do to make this acceptable for inclusion in core
is to make this whole thing a no-op unless it's explicitly
requested.
You can skip it altogether with NO_GETTEXT=YesPlease, and even with
NO_GETTEXT= it won't get used unless you've explicitly set LANGUAGE=,
or LC_*= variables accordingly.
Maybe I could do something to make it even less intrusive, e.g. have a
core.i18n=yes config variable. But I haven't gotten any feedback on
that so I've kept the current scheme of making it behave like pretty
The LC_CTYPE problems I'm aware of were worked around in this patch:
gettext.c: work around us not using setlocale(LC_CTYPE, "")
While that's not a perfect solution I think it's the best we're going
to get for the time being. I'm pretty convinced that if I tried to
make git itself LC_CTYPE-safe as part of this already huge series it'd
never get merged. Having messages with question marks from strerror()
on certain platforms is an OK compromise in my opinion.
That's good to hear.
--
Heya, Ditto, especially since i18n, once the groundwork has been done, requires very little work to support (as long as someone keeps an eye out for people forgetting to _("") a string). -- Cheers, Sverre Rabbelier --
On Fri, Oct 15, 2010 at 8:18 AM, Ævar Arnfjörð Bjarmason I agree with Jonathan that there might be some value to clearly delineating the i18n infrastructure from the application of it to the rest of the code base.The i18n infrastructure should be, relatively speaking, less invasive than the application of it throughout the code base. It would be good for Junio (I presume) to have the option to integrate the infrastructure in one hit, but allow the application of it to be deferred, perhaps on a subject-area by subject-area basis. jon. --
It already is pretty much separate internally. To get only the
infrastructure you pull the series and omit the "gettextize"
patches. There is some slight tangling up e.g. here:
6da9243 gettext tests: skip lib-gettext.sh tests under GETTEXT_POISON
234ddee gettextize: git-init basic messages
b161357 gettextize: git-init "Initialized [...] repository" message
3ad3f9f gettext tests: test if $VERSION exists before using it
1f56190 gettext tests: add detection for is_IS.ISO-8859-1 locale
5f24f25 gettext tests: test message re-encoding under Shell
ec31cc6 gettext tests: test re-encoding with a UTF-8 msgid under Shell
45e8a56 gettext tests: mark a test message as not needing translation
c9db9aa po/is.po: add Icelandic translation
83beb97 gettext tests: test message re-encoding under C
Where the tests I'm adding depend on an earlier "gettextize"
patch. That can easily be split up if there's demand for it. But so
far nobody has asked so I haven't done it.
--
Hi again, The question marks[1] are what I was referring to. Consider this from the point of view of someone working on the Debian package: would users consider that an appropriate or positive change for squeeze+1? (Hint: not if it doesn't come with some benefit in their locale, no.) I suspect making git work with other LC_CTYPE would not actually be too hard[2]. Such fixes are useful for futureproofing and increased sanity anyway --- they do not have to be part of the l10n branch imho. [1] fatal: unable to stat 'foo.c': Op?ration non permise [2] except that annoying printf bug. --
No benefit? The benefit is that the program they previously either didn't understand or understood poorly is now talking to them in their native language. That's a pretty big benefit. The tradeoff is that a small subset of the messages which include strerror() output will show non-ASCII characters as question marks on GNU systems. Don't get me wrong. I think it's a bug that we need to fix. I just think it's a relatively minor annoyance, not a showstopper. With it the feature *mostly* just works, and the things that don't can be It's something we want yes, but not something I have time for these days. So unless someone else is interested in helping audit all that code, providing a printf() fallback on glibc etc. it'll block the i18n series. For something I at least think is a relatively minor issue that doesn't warrant throwing the baby out with the bathwater. --
And for the languages that are not translated yet? Don't get me wrong --- I'm only trying to give a sense of what it is like for a user to experience a regression. It is generally little Oh, I never meant to say that this should be a blocker. Only that there really are costs and benefits to weigh. Much more important than the known bugs are the unknown bugs --- you've heard this before, I think. The way to get rid of unknown bugs (aside from inspecting code) is to get users. For example, if Gerrit doesn't mind, I would like to apply your patches to experimental once the version being staged for squeeze clears from there. Other interested people can attract users in other ways --- by providing documentation, tarballs, ... --
That would be great. Let me know if I can help with that in some way. --
People might have noticed that I've refrained to take other topics that may add new messages to 'next'. I would wanted to merge ab/i18n early in the cycle soon after dust has settled after 1.7.3 release. And I still do. Having said that, there are different classes of risks associated with i18n effort. (1) Regressions that even hit a NO_I18N build. (2) Regressions that hit LC_ALL=C execution in a !NO_I18N build. (3) Regressions that hit plumbing run in a non-C locale. . i18n needs not just marking strings with _("string") but also needs to fix code that manually formulates messages by series of strcat(). It may need to start using allocations on the heap, with potential risk of usual bugs (leaks, use-after-free, etc.) and performance degradation. . Messages left unmarked with _("string"), or messages that are marked with _("string") that shouldn't have, won't be serious issues for the first two classes. The latter is a serious regression for the plumbing. We are all human, and misconversion during this process is possible, even though the above classes of regressions are unacceptable. On the other hand, as long as the above three classes of regressions are minimum and quickly fixed/fixable, issues in non-C locale Porcelains are tolerable during the initial cut. I've looked at the patches in the series, and plan to take another look. I'm sure others on the list have checked the series, some with fine combs, too, and hopefully Ævar has fixed any such regression that has been reported and plans to do so for the ones discovered in the future. As long as we are sure that we have done a reasonable effort to eyeball the patches, the logical next step would be to merge the series to 'next' for further testing. (4) Incomplete *.po file, and languages without *.po file. Once we are sure that the series does not have the first two classes of issues, we can ask everybody to mark new strings in their series, iow, merge the i18n part to 'master'. ...
Good to hear. I didn't know that before, and that's really all I
Right. I'll have time to deal with any bugs that crop up, and I'm
Do you mean to re-arrange it so that there's a patch at the front of
the series that introduces gettext.h with only the fallbacks:
#define _(s) (s)
#define N_(s) (s)
And then merge the ~120 gettextize patches first and do the
infrastructure later?
That could be done, but just merging the whole 160 patch series and
turning it off by default would have pretty much the same effect. And
since I thought this was going to get merged soon-ish anyway I didn't
I think we're basically at a point where merging it down to next ->
Thanks.
--
I don't think that's what he was saying, but I think that _would_ be
helpful. So the history could look like:
infrastructure
/ \
introduce gettext.h ab/i18n
\ /
gettextization
/
merges from master where appropriate,
perhaps just at the beginning for simplicity.
That way, each side could have patches rearranged or squashed when
needed without disrupting the other.
What do you think?
--
Not really.
Two pieces that would be nice to have in 'master' (or even 'maint', if we
consider avoiding merge conflicts and mismerges when fixes are queued
there) are:
1. preparatory fixes to code that builds message string by concatenating
parts of speech in English word ordering into buffer or emitting to
output stream piece by piece; they should convert them to some form of
sprintf-like format strings plus arguments. This does not necessarily
have to mark the format strings with _(s).
2. the empty definitions for _(s) and N_(s).
I would consider the first one part of general clean-up job, which we know
will help i18n, but which we would want to do regardless of i18n. And it
is probably the most error prone part in the whole process. The sooner
the result of these two steps hit 'master', the less pain for everybody.
And then:
3. actual marking of strings with _(s) and N_(s).
which can be merged to 'next' after vetting for regression (the first two
classes).
4. Adding and polishing of *.po files for actual messages and languages,
i.e. l10n.
This can happen pretty much independently from 3. Honestly I would be
happier if I do not have to keep track of the actual l10n part.
I think the current series to some degree conflates steps 1. and 3. As
the list of risks I outlined in the previous message show, mistakes in 1.
is much more grave than mistakes in 3. (iow, no big deal for having a few
untranslated messages during early rounds of i18n support); I would have
preferred these two steps were clearly separated, so that we can push the
first two steps out to the 'master' sooner.
--
I'd just like to second (or third or..) Junio's points here since I had suggested a split like that earlier already, and I think the current state of affairs simply makes many potential reviewers (at least one that I know of) go away. 1.,2. and (maybe to a lesser degree) also 3. should be able to find many reviewers, thus making the potentially problematic parts as solid as possible. (I'm still waiting for a conceptual approach to 4., i.e. glossary first, but that is a different issue.) Michael --
Yes and files with trailing whitespaces are hard to understand in general Hmm, why? "expect" has trailing whitespaces on lines that end with Z, i.e. what the original patch wanted to place in. --
Fixed in the next iteration (v4, just posted). Thanks for noticing. ...Johan -- Johan Herland, <johan@herland.net> www.herland.net --
