Re: [2.6 patch] UTF-8 fixes in comments

From: Adrian Bunk
Date: Monday, April 28, 2008 - 8:40 am

This patch converts some non-UTF-8 encoded text in comments to UTF-8.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---

This patch is attached compressed to prevent my MUA from mangling it.

 Documentation/PCI/pcieaer-howto.txt |    2 -
 arch/arm/mach-omap2/io.c            |    2 -
 arch/s390/kernel/ebcdic.c           |   36 ++++++++++++++--------------
 drivers/hid/hid-input.c             |    2 -
 drivers/isdn/hisax/enternow_pci.c   |    2 -
 drivers/media/video/saa5249.c       |    2 -
 drivers/misc/ibmasm/command.c       |    2 -
 drivers/misc/ibmasm/dot_command.c   |    2 -
 drivers/misc/ibmasm/dot_command.h   |    2 -
 drivers/misc/ibmasm/event.c         |    2 -
 drivers/misc/ibmasm/heartbeat.c     |    2 -
 drivers/misc/ibmasm/i2o.h           |    2 -
 drivers/misc/ibmasm/ibmasm.h        |    2 -
 drivers/misc/ibmasm/ibmasmfs.c      |    2 -
 drivers/misc/ibmasm/lowlevel.c      |    2 -
 drivers/misc/ibmasm/lowlevel.h      |    2 -
 drivers/misc/ibmasm/module.c        |    2 -
 drivers/misc/ibmasm/r_heartbeat.c   |    2 -
 drivers/misc/ibmasm/remote.h        |    2 -
 drivers/misc/ibmasm/uart.c          |    2 -
 drivers/s390/ebcdic.c               |   36 ++++++++++++++--------------
 drivers/scsi/jazz_esp.c             |    2 -
 drivers/spi/omap2_mcspi.c           |    2 -
 drivers/usb/storage/cypress_atacb.c |    2 -
 drivers/video/omap/rfbi.c           |    2 -
 drivers/video/omap/sossi.c          |    2 -
 26 files changed, 60 insertions(+), 60 deletions(-)

From: Willy Tarreau
Date: Monday, April 28, 2008 - 4:05 pm

Is this really needed Adrian ? I mean, everyone reads iso-8859-1, not
everyone reads UTF-8. Now I get random crappy chars which cripple my
xterms when reading such comments, and I have to do a full-reset once
I've read them. It's not as if it was *that* important, and to be
honest, if you had not sent this patch, I would not even have known
that non-ASCII characters were here. However, it will quickly get
annoying if a recursive grep returns those pesky codes on non-compatible
consoles...

Quite frankly, it does not bring anything beyond trouble. I'm not adding
a NAK here because I find this rude, but I don't like the orientation
we're taking with the sources. We should not force people to install
version X or Y of a particular system just to read sources.

In fact, I would rather have converted accented chars to their ASCII
equivalents, to be more friendly to people who only read 7-bit.

Regards,
Willy

--

From: H. Peter Anvin
Date: Monday, April 28, 2008 - 6:29 pm

"Everyone" who speaks a Western European language, perhaps; and even
then, mostly because a lot of tools still have a "oh, it's not valid
UTF-8, guess iso-8859-1" mode.  The most common instance of non-ASCII
characters in Linux kernel code are people's names, and there are plenty 
of names which aren't representable in either ASCII or iso-8859-1.

The debate on this was years ago, and the consensus was to migrate to 
UTF-8; however, the salient information should be expressed in the ASCII 
character set unless impossible.

	-hpa

--

From: Willy Tarreau
Date: Monday, April 28, 2008 - 10:06 pm

Or simply because people have not migrated all their install, or have
explicitly disabled UTF-8 a few hours after starting to use it once
they discovered the mess it caused and the poor support from the

And do we really consider that people's names in *comments* cannot
be converted to pure ASCII ? I'm western european and have always
been against accents in comments (another reason to write comments
in english BTW). Unix and internet have lived without accents for
almost 30 years without anyone really bothering. And now we try to
put them everywhere (even in domain names, implying big security
issues) and it causes real annoyances. People's names have not
changed in 30 years, so I guess that the rules used during this

Willy

--

From: H. Peter Anvin
Date: Monday, April 28, 2008 - 11:04 pm

For some languages, it's considered acceptable, for others it's 
considered major corruption.

	-hpa

--

From: Adrian Bunk
Date: Tuesday, April 29, 2008 - 12:29 am

Non-ancient distributions default to UTF-8 and have tools that handle it 
fine.


Accents are very rare in names in the kernel.

Most non-ASCII characters are umlauts and there's no sane way to 
express them in ASCII (and the vowels without umlaut are pronounced 
quite differently and might even make names look very strange).

And that's only within European languages, outside it becomes even 

The comments in the kernel have been converted to UTF-8 quite some time 
ago, what I'm fixing with my patch is just some recent non-UTF-8 stuff 
that crept in.

And names in comments in the kernel were not pure ASCII since very 
early, they were in other charsets.

Mostly iso-8859-1, but not all of them.

I remember that for one name we first guessed which character it was and 
then tried to figure out which charset it was in (no, it was not one 
of iso-8859-*).

So it was not "ASCII -> UTF-8", it was

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Willy Tarreau
Date: Tuesday, April 29, 2008 - 1:14 am

Well, I accidentally used a freshly installed laptop running Mandriva 2008.
I was typing in a terminal inside KDE (I don't know the program name, sort
of an xterm, but with huge borders all around). I made a typo in a word and
typed in a "é" (e acute). Pressing backspace to fix it showed me that I
removed more chars than typed. I tried again. Pressing this letter 5 times,
then 10 times backspace. I removed 5 chars from the prompt. I suspect that
if I had used some chars with wider encoding (eg 4 bytes), I could have
removed as many... Clearly those tools are not ready.

Also, I recently upgraded one machine from 2.6.22 to 2.6.25. Same crappy
behaviour on the console (with bash). I quickly set the vt.defaults on
the kernel command line to fix the problem.

At this stage, I'm not even trying to "fix" the problem, as it's
a philosophical debate and I do not want to enter it. Some people
consider it normal that we break user-space applications and that
it's obvious that all userland code has to be replaced to remain
compatible with "evolutions", and I simply do not support this
principle. I just care about having the ability to disable the
broken behaviour. Most of the problem comes from the variable
length characters causing wrapping lines and misplaced tabs when
read in non UTF-8 aware editors and/or terminals. The rest of
the problem with the terminal going mad could have been caused by

Agreed, but it's been done for *years*. I received mails from people
spelled "jorn" or "jurgen" and they had no trouble using that spelling


I would have loved to see "several different charsets -> ASCII".

Willy

--

From: Helge Hafting
Date: Tuesday, April 29, 2008 - 2:06 am

So don't use that particular tool, and/or file a bug with the 
maintainer. :-)
I have used utf-8 for years - the fact that some editors and some terminal
emulators fail is not a problem for me. There are so many that work
just fine. There is unicode xterm, and rxvt if you consider xterm too heavy.
Both vi and emacs have versions that handle utf-8 competently. You may
have to put in a one-off effort in finding a suitable font for your xterm,
if you actually want to see proper umlauts in all cases. If you don't care
about looks, then xterm will display blanks/squares and backspace etc. will

Outside the english-speaking world, userland _was_ completely
broken in the day of ascii. And supporting the multiple
iso8859-xx encodings was completely broken too, if you ever needed
more than one of them.

Unicode gives userland an opportunity to actually work decently
for the first time. Now, ascii may be fine if C development is all
you ever use the machine for. You can mangle a few names in
comments - some people won't like that at all, some won't care.

But try using the same machine for writing a business letter without
a proper character set. You won't be taken seriously. Or even a non-english
gui app with ascii-only menus.

If you want to know what it is like, knock three vowels or so out of the
english alphabet. Consider them not supported. Invent "transcriptions"
if you like.
Try writing a letter that way! Or even kernel code with informative
comments.

Consider the alternative - disable the broken behavior by using a
tool that handles UTF-8. There are certainly enough aware apps/tools for

It has been done for years because there was no other choice. If you
wanted to work in unix, just forget your own name! Now there is a choice.
Some people still don't care and are fine with "jorn" and such. Some are
pissed off, take offense, stick to windows, or simply put unicode
into kernel comments.

If your mailer doesn't support utf-8, chances are you get some mail
Lots of ...
--

From: Alan Cox
Date: Tuesday, April 29, 2008 - 2:33 am

> Outside the english-speaking world, userland _was_ completely

(American)

Formal UK English uses accented characters for some foreign imports (eg
café), ï for words like naïve, and if you are really pretentious you need
the æ symbol for words like mediæval although for modern writing this is
considered silly.

The bash problem btw should have been fixed (if it is bash causing it) as
of 2.05b and readline 4.3. If it's being caused by the KDE terminal that
would surprise me but might be worth filing a bug.

Alan
--

From: Willy Tarreau
Date: Tuesday, April 29, 2008 - 3:09 am

It was not my machine, and had you been there, you would have heard me call

It's too easy to impose crappy designs on end-users and tell them that if
that does not work they have to file a bug. There is a minimal set of
things that must be tested before shipping. Seeing that the default
terminal emulator in KDE on Mandriva 2008 is configured in UTF-8 and does
not properly render it simply makes me sick. This is broken by design and
even distros trying to get it working for years still can't cope with it.

I don't care about the *look*. Mutt shows me a question mark when it does
not know. I care about the *behaviour*. Having backspace go back farther
than the prompt is not acceptable. Having 80-col lines span over two lines

yes but you just had unexpected characters. Just like MS-DOS when

Unicode yes, UTF-8 no. UTF-8 is a compressed encoding of unicode.
That's as silly as if you had to replace your terminals to read
native gzip, and expect them as well as all the tools to work


Well, booting 2.6.25 with "init=/bin/bash" results in backspace
eating the prompt after pressing accented letters. Even the
control chars have been correctly handled on many UNIXes for
decades! The real problem with this crap is that it is viral:
"replace all userland applications or die alone on your island".
Then "ah, your applications behave in a funny manner, well that
may be because of UTF-8, but that is not important, just wait
for the update". I'm not even speaking about the security
implications it has on a lot of tools, starting with regex

Funny that you mention Windows. Windows has been using 16-bit unicode
for a long time without problems. It's a clean encoding. Like it or not.
Since they have started using UTF-8, bare windows users have started
telling me that there are often bizarre characters in texts instead of
accents. That most often happens in forwarded mails. so they get hit

Once again, I don't care about the strange looking, just about the

You know why we got this ...
--

From: Alan Cox
Date: Tuesday, April 29, 2008 - 3:10 am

I would describe the UCS-2 situation as a disaster area - embedded nuls
causing breakage, inability to represent the full unicode space and

Actually it was primarily designed to make moving encoding painless so
that ascii still worked and C properties like \0 plus traditional

screen supports the needed transliteration for you.

Alan
--
"Having worked in a university for more than twenty years after leaving
 industry, I had become unused to seeing management skill routinely
 exercised, universities being administered rather than managed"
                -- Peter Checkland

--

From: Willy Tarreau
Date: Tuesday, April 29, 2008 - 3:33 am

The console yes (by default until I disabled it to restore correct
behaviour). The shell no, it was the one present on my machine and
has never been compiled with UTF-8 support, and should not have to.

If we say that starting with 2.6.24, we're explicitly breaking
compatibility with old userland, fine. But that was not explicitly
stated.

In my opinion, the problem is that when I press "é", the system sends
two chars to the bash, which itself sends two chars to the terminal,
which only displays one and moves the cursor one step ahead. Then,
pressing backspace once sends one backspace all along, resulting in
the terminal blanking one displayed char, but the shell not being
aware that only half of it was removed. But if you look at how
control chars are handled, if you display ^H then press backspace,
you remove all of it. It's the terminal which adjusts the position
depending on the character length.

So in my opinion, when we send one backspace to the terminal to
remove one character, since there are two in the buffer, we
should not get back one full char. Ideally, the console driver
should send as many backspaces as needed to fix the multiple
characters that were emitted. It's not logical at all that if
we send 3 chars to a process with one key, sending a cancellation
of those chars only sends one backspace.

You see, that's really what I hate with this encoding. Every
stage relies on the next one to do the fixup. And of course, a

But at least, there is no feeling of having it working. You immediately

I cannot imagine how one can believe that something which transcodes one
char as a series of 1-to-4 chars will be a painless move. A lot of code


Willy

--

From: Alan Cox
Date: Tuesday, April 29, 2008 - 3:34 am

Bizarre, so you are using deliberately misconfigured ancient userspace to

The shell puts the terminal in character by character mode and readline
does this. If you have your shell/readline deliberately set up not to be

The console driver isn't involved - readline took over for the shell, and
readline most definitely supports this in a utf8 locale.

Alan
--

From: Willy Tarreau
Date: Tuesday, April 29, 2008 - 3:12 pm

Hi Alan,


No I'm not using anything deliberately misconfigured. I'm trying to explain
that on the contrary, any tool which has not been explicitly adapted to those

Please, I'm not "deliberately" setting my tools *not* to support unicode.
I have tools which have worked for years and which are now asked to behave

OK I could reproduce the case without ever involving either a shell or
readline or anything. Using "cat" as the init program exhibited the
anomaly, though it was not very easy to analyze. Then I switched to
"init=od -An -tx1 -".

1) if I enter "A" then press backspace, I get nothing. Pressing enter 16
   times flushes the line buffer and "od" prints 16 times "0a", indicating
   nothing was remaining in the buffer.

2) if I enter Ctrl-V Ctrl-A, my display prints "^A", and when I press
   backspace, I correctly get the cursor back two chars. Once again,
   flushing the buffer with enter shows it was empty.

3) if I enter Alt-196, I get a "Ä". Flushing the buffer shows that od
   got two bytes: c3 84.

4) now if I enter Alt-196 and press backspace, my "Ä" is removed by the
   backspace, but only the second byte is flushed from the line buffer.
   Then, if I press enter 15 times, I get a line with c3 0a 0a 0a ...
   And there is no user-land involved here.

I'm really hoping you better understand the problem now. Pressing backspace
to fix input does not correct the input with multi-byte chars, it leaves
incomplete start sequences. If I press Alt-1111111, then backspace, I get
f4 8f 91 0a 0a 0a 0a because it is f4 8f 91 87 minus one byte.

Of course, pressing Backspace multiple times removes them all, but it also
removes previous characters on the display.
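
The byte sequences reported above can be reproduced with a small sketch. It is only a model of a line buffer, not kernel code: `erase_bytewise` and `erase_utf8` are hypothetical helper names, and the assumption is that a byte-oriented erase drops exactly one byte while a UTF-8 aware erase drops the whole sequence.

```python
def is_continuation(byte):
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def erase_bytewise(buf):
    """Model of a byte-oriented erase: drop exactly one byte."""
    if buf:
        buf.pop()


def erase_utf8(buf):
    """Model of a UTF-8 aware erase: drop trailing continuation bytes,
    then the lead byte that started the sequence."""
    while buf and is_continuation(buf[-1]):
        buf.pop()
    if buf:
        buf.pop()


def hexdump(buf):
    """Format bytes the way 'od -An -tx1' shows them."""
    return " ".join("%02x" % b for b in buf)


a_umlaut = chr(196).encode("utf-8")    # Alt-196 in the report
big = chr(1111111).encode("utf-8")     # Alt-1111111 in the report

for sample in (a_umlaut, big):
    buf = bytearray(sample)
    erase_bytewise(buf)
    print(hexdump(sample), "-> after one byte-wise erase:", hexdump(buf))

buf = bytearray(b"01234" + a_umlaut)
erase_utf8(buf)
print("after one UTF-8 aware erase:", buf.decode("utf-8"))
```

With the byte-wise model a dangling lead byte stays in the buffer, which matches the stray c3 seen by od; with the UTF-8 aware model the preceding "01234" is left untouched.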

Another experiment:

I press 01234, then Alt-255, Backspace, then 56789. On the display, I have
0123456789. od gets 30 31 32 33 34 c3 35 36 37 38 39.

Now if I want to correctly fix the input, I have to press backspace twice,
but then I have to make the '4' disappear from my display, while knowing ...
--

From: Alan Cox
Date: Tuesday, April 29, 2008 - 3:15 pm

Did you put the console into utf-8 mode before the cat ?
--

From: Willy Tarreau
Date: Tuesday, April 29, 2008 - 4:05 pm

I had not *explicitly* disabled it, since as the doc suggests :

        vt.default_utf8=
                        [VT]
                        Format=<0|1>
                        Set system-wide default UTF-8 mode for all tty's.
                        Default is 1, i.e. UTF-8 mode is enabled for all
                        newly opened terminals.

And I know that I can fix the behaviour by explicitly setting it to zero.
Also, the fact that "od" shows me multi-byte characters on the input
indicates to me that everything is set to UTF-8. So unless I'm missing
something, my console is set by default to UTF-8 (I test this on 2.6.25).

Regards,
Willy

--

From: H. Peter Anvin
Date: Thursday, May 1, 2008 - 1:18 pm

Yes, there is apparently a real bug here: this vt setting doesn't 
propagate to the tty layer iutf8 flag.
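
Whether a given terminal has that flag set can be checked from userspace. The following is a minimal sketch, assuming Linux and Python's `termios` module; the fallback value 0o40000 for IUTF8 is an assumption taken from the common Linux headers, and the helper names are made up for illustration. From a shell, `stty -a` shows the same flag and `stty iutf8` sets it.

```python
import sys
import termios

# Use the module constant when available; the octal fallback is an
# assumption based on the usual Linux definition of IUTF8.
IUTF8 = getattr(termios, "IUTF8", 0o40000)


def flag_is_set(iflag):
    """True if the IUTF8 bit is present in an input-flags word."""
    return bool(iflag & IUTF8)


def tty_has_iutf8(fd):
    """Read the input flags of a tty and test the IUTF8 bit."""
    iflag = termios.tcgetattr(fd)[0]
    return flag_is_set(iflag)


def tty_set_iutf8(fd):
    """Turn IUTF8 on for a tty, leaving all other settings alone."""
    attrs = termios.tcgetattr(fd)
    attrs[0] |= IUTF8
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


if sys.stdin.isatty():
    print("iutf8 set on stdin:", tty_has_iutf8(sys.stdin.fileno()))
```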

	-hpa
--

From: Alexander E. Patrakov
Date: Thursday, May 1, 2008 - 2:46 am

export LANG=en_US.UTF-8 (i.e., inform the userspace that you are using UTF-8), 
unset LC_CTYPE and unset LC_ALL (so that they don't override $LANG), and problem 
solved.
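
The reason for unsetting LC_ALL and LC_CTYPE is the usual precedence among the locale variables: LC_ALL wins over LC_CTYPE, which wins over LANG. A small sketch of that lookup, where `effective_ctype` and `uses_utf8` are hypothetical helper names and the codeset parsing is deliberately simplistic:

```python
def effective_ctype(env):
    """Return the locale name that decides the character set, using the
    precedence LC_ALL > LC_CTYPE > LANG. Empty values count as unset."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


def uses_utf8(env):
    """True if the effective locale names a UTF-8 codeset."""
    locale_name = effective_ctype(env)
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


# LANG alone selects UTF-8 ...
print(uses_utf8({"LANG": "en_US.UTF-8"}))
# ... but a leftover LC_ALL silently overrides it.
print(uses_utf8({"LANG": "en_US.UTF-8", "LC_ALL": "C"}))
```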

-- 
Alexander E. Patrakov
--

From: H. Peter Anvin
Date: Tuesday, April 29, 2008 - 12:33 pm

Not to mention the fact that UCS-2 ran out of code points almost as soon 
as they said "no more codepoints."  The result was UTF-16, a hideous 
abortion which took all the problems with wide encodings, combined it 
with all the problems of multibyte encodings, and added a few new ones 
for good measure.
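
Both complaints are easy to see in a short sketch: a 16-bit encoding puts NUL bytes into plain ASCII text, and anything above U+FFFF needs two 16-bit units. `utf16_units` is a hypothetical helper written for illustration, and it does not reject invalid input such as lone surrogates.

```python
def utf16_units(code_point):
    """Split a code point into UTF-16 code units: one unit up to U+FFFF,
    a high/low surrogate pair above that."""
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


# ASCII text picks up embedded NUL bytes in a 16-bit encoding.
print("A".encode("utf-16-le"))

# A code point outside the 16-bit range becomes two units, so the
# encoding is no longer fixed-width.
print([hex(unit) for unit in utf16_units(0x10F447)])
```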

	-hpa
--

From: Adrian Bunk
Date: Tuesday, April 29, 2008 - 3:42 am

I can reproduce your problem in a plain xterm when setting LANG=en_US
(most likely the same problem can occur with other non UTF-8 settings).

In this case I'm actually more surprised that the character is displayed 
correctly than that you have to type backspace twice.

Any kind of charset mixing is highly problematic (which is also why my 
patch was attached compressed), so if you disable UTF-8 anywhere in a 
modern distribution problems are somehow expected (it could also be a 

It's not a compressed encoding, it's a variable-length encoding.

Besides the size advantages one main advantage of UTF-8 is that ASCII is 
valid UTF-8. This means that for the ASCII source code in the kernel it 
doesn't matter whether it's treated as ASCII or UTF-8, and no conversion 
was needed.


cu
Adrian

--

From: Willy Tarreau
Date: Tuesday, April 29, 2008 - 4:06 am

It's not that I *had* to type it twice. But I *could* type it twice, and

No, it was not disabled at all. I had to type in a command for a
co-worker who just did a default install the day before, and typed a

I don't agree. If you refuse character-set mixing, there's no problem.
Bit 7 of first char == 1 ? => full text is 32 bit.

Willy

--

From: Adrian Bunk
Date: Tuesday, April 29, 2008 - 4:27 am

You miss my point.

The point is:
A conversion "ASCII -> UTF-8" is a nop.

This means when changing the kernel from half a dozen charsets used in 
comments to UTF-8 we only had to change the few characters actually 
containing non UTF-8.
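
A rough sketch of that property, not the script actually used for the patch: `to_utf8` is a hypothetical helper, and iso-8859-1 as the legacy charset is an assumption that, as noted earlier in the thread, did not hold for every file.

```python
def to_utf8(raw, legacy="iso-8859-1"):
    """Return raw unchanged if it is already valid UTF-8 (which includes
    all pure-ASCII input); otherwise reinterpret it as the assumed legacy
    charset and re-encode it as UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(legacy).encode("utf-8")


ascii_comment = b"/* plain ASCII comment */"
latin1_name = b"/* J\xf6rn */"          # o-umlaut as a single latin-1 byte

print(to_utf8(ascii_comment) == ascii_comment)   # nothing to convert
print(to_utf8(latin1_name))                      # only the one byte changes
```

The weak point is the guess at the legacy charset: a byte sequence that fails UTF-8 validation says nothing about which charset it was written in.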

Going to something like UTF-32 as you suggest would have involved 

cu
Adrian

--

From: Adrian Bunk
Date: Tuesday, April 29, 2008 - 4:32 am

cu
Adrian

--

From: Jeremy Fitzhardinge
Date: Tuesday, April 29, 2008 - 1:18 pm

Same thing ;)

    J
--

From: Helge Hafting
Date: Wednesday, April 30, 2008 - 2:15 am

Yeah, ascii-only is a crappy design. :-/ 
I don't know if mandriva is broken by design - I only use debian.
It would not surprise me if some distros botch utf-8 through negligence.
They are based in english-speaking countries and have their biggest
user bases there - the majority of their customers aren't going to use
more than ascii so why should they bother.

Someone made a "cool" terminal emulator? Transparency and effects?
Distribute it, despite the fact that it won't work in all cases.
Distro contains xterm anyway for those that need a fallback.
Machine owner thinks one terminal emulator is enough and

I don't see how wrong characters are better than backspace eating
the prompt or 80-col overflowing when it shouldn't. It is all breakage
either way.
Stuff breaks if TERM is set wrong for the terminal in use too, or if the
app in use doesn't _use_ the TERM variable. This happens too, and you only
notice if the app runs on a terminal incompatible with TERM=linux.

Amusing and accurate. I use Norwegian which has 3 non-ascii vowels. As well
as some accented characters, but they don't crop up in _every other

It had to be done in an ascii-compatible way. That way, a userland
containing a mix of ascii-only apps, fully utf-8 supporting apps, and apps
with partial utf-8 support will work flawlessly for ascii-only stuff. Like
C source and english language tools. Of course utf-8 only works in the apps
supporting it, but utf-8 users keep fixing this in the apps they need.

Breaking ascii compatibility was not an option, because that means
replacing the entire userland in one operation.  That cannot be done
unless a single authority controls everything, and the open source world
isn't like that.

Variable length encoding is necessary, given that:
* Ascii should work as before, i.e. one "char" per ascii character
* One single encoding so a plain text file can contain the symbols of
   any writing system in use. There are way more than 256 symbols.
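
How a variable-length scheme meets both requirements can be sketched with a hand-written encoder. `encode_utf8` is an illustrative helper, not a replacement for the built-in codec; it assumes a valid code point and does not reject surrogates or values above U+10FFFF.

```python
def encode_utf8(cp):
    """Encode one code point as UTF-8: 1 byte for ASCII, 2 to 4 bytes
    for everything else. Only the lead byte says how long the sequence is;
    every following byte has the form 10xxxxxx."""
    if cp < 0x80:
        return bytes([cp])                       # 0xxxxxxx, same as ASCII
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ("A", "\u00e5", "\u20ac"):
    encoded = encode_utf8(ord(ch))
    print(ch, len(encoded), "byte(s)")
```

Bytes below 0x80 never appear inside a multi-byte sequence, so ASCII-only tools keep seeing ASCII bytes exactly where they expect them.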

No, I don't have a utf-8 ...
--

From: Adrian Bunk
Date: Wednesday, April 30, 2008 - 12:22 pm

Mandriva is a French company.

And what Willy describes really sounds like someone fiddling with some 
settings (or something like accidentally selecting some non UTF-8 
locale).

Bad things can happen when you somehow get charsets mixed, but 
distributions have defaulted to UTF-8 for quite some time, and problems

cu
Adrian

--

From: H. Peter Anvin
Date: Wednesday, April 30, 2008 - 12:42 pm

Well, we were talking about Mandriva, which is a Brazilian-French 
company, their main languages are Portuguese and French; you'd think 
they'd notice themselves.  Most likely there was something in Willy's 
configuration that buggered it up.

	-hpa

--

From: Adrian Bunk
Date: Tuesday, April 29, 2008 - 2:43 am

This sounds as if you had UTF-8 characters in a non UTF-8 environment.


Email addresses are a different topic.

But it's not right in names, and if someone then pronounces their name 

cu
Adrian

--

From: H. Peter Anvin
Date: Tuesday, April 29, 2008 - 12:31 pm

Presumably, this was konsole.  konsole works fine with UTF-8 (I use it 
that way every day); the most common cause of this kind of problem is 
people explicitly clobbering the locale or charset class defaults in 
their login scripts.

	-hpa

--

From: Willy Tarreau
Date: Tuesday, April 29, 2008 - 1:05 pm

Possible. It was the one you get by clicking on a terminal icon.
Huuhhh what a horror, I'm discussing icons and GUIs on LKML. I must

I really doubt the miss would have done this. Or someone would have done
it for her which I really doubt in such a small time frame after a fresh
install from the day before. I will investigate though.

Willy

--

From: H. Peter Anvin
Date: Tuesday, April 29, 2008 - 1:09 pm

From one of Alan's posts it sounds like there was a bug with multibyte 
characters in readline at some point that got fixed relatively quickly, 
but still made it out.

	-hpa
--

From: David Kågedal
Date: Friday, May 9, 2008 - 5:48 am

That's a ridiculous statement.  Just because you didn't bother, you
can't assume that the people who were actually affected didn't bother.

I went through large parts of the 1990's under the name "David
K}gedal". And I bothered.

And no, the second character in my last name is not an accented a,
they have been separate letters for hundreds of years in Sweden.  So I
can live without using accented letters, as long as I can write
Kågedal including the å. :-)

Not that my name appears anywhere in the Linux source, but I still
felt the urge to reply...

-- 
David Kågedal
--

From: Alan Cox
Date: Tuesday, April 29, 2008 - 2:01 am

Perhaps we should put them in Latin as well just in case any Roman is
struggling with this new language 8) Distributions have been shipping UTF
enabled by default for years and years.

Alan
--

From: Jan Engelhardt
Date: Tuesday, April 29, 2008 - 2:19 am

With some being overly late.
--

From: Willy Tarreau
Date: Tuesday, April 29, 2008 - 2:34 am

"enabled" does not mean "working" Alan. I know one distro which I will
not name in order not to offend you which shipped with it enabled by
default, but which would not properly display the characters on the
console, resulting in mangled messages during boot. I particularly
remember the "[ECHEC]" ("[FAILED]") with random garbage instead of the

Willy

--

From: Alan Cox
Date: Tuesday, April 29, 2008 - 2:41 am

No offence taken. In fact I seem to remember filing similar bugs at the
time about rpm/popt getting its help formatting wrong in some locales (eg
Welsh) for similar reasons - but that was some time ago.

All the mainstream tools handle utf-8 just fine, joe is quite happy
editing utf-8 these days (as are the legacy vim and emacs editing
tools ;)). There really are no good reasons left not to use UTF-8.

Alan
--
        > you are confusing me even more.
        Of course.  "I'm from IBM.  I'm here to help."  ;-)
                                -- Alan Altmark
--

From: KOSAKI Motohiro
Date: Tuesday, April 29, 2008 - 5:18 am

Good Job!

   Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>


AFAIK some files are already written in utf-8.
Frankly, speaking from a non-European standpoint:

all files are written in ascii:      no problem
all files are written in iso8859-1:  need editor customization
all files are written in utf-8:      no problem
some files are written in iso8859-1,
but other files are written in utf-8: Ouch! Noooooo!!



--
