Re: [2.6 patch] UTF-8 fixes in comments

To: Alan Cox <alan@...>
Cc: Helge Hafting <helge.hafting@...>, Adrian Bunk <bunk@...>, H. Peter Anvin <hpa@...>, <linux-kernel@...>, <trivial@...>
Date: Tuesday, April 29, 2008 - 6:12 pm

Hi Alan,

On Tue, Apr 29, 2008 at 11:34:10AM +0100, Alan Cox wrote:

No, I'm not using anything deliberately misconfigured. I'm trying to explain
that, on the contrary, any tool which has not been explicitly adapted to those
new usages is impacted.


Please, I'm not "deliberately" setting my tools *not* to support unicode.
I have tools which have worked for years and which are now asked to behave
strangely.


OK, I could reproduce the case without involving a shell, readline or
anything else. Using "cat" as the init program exhibited the anomaly,
though it was not very easy to analyze. Then I switched to
"init=od -An -tx1 -".

1) if I enter "A" then press backspace, I get nothing. Pressing enter 16
   times flushes the line buffer and "od" prints "0a" 16 times, indicating
   that nothing remained in the buffer.

2) if I enter Ctrl-V Ctrl-A, my display prints "^A", and when I press
   backspace, I correctly get the cursor back two chars. Once again,
   flushing the buffer with enter shows it was empty.

3) if I enter Alt-196, I get a "Ä". Flushing the buffer shows that od
   got two bytes: c3 84.

4) now if I enter Alt-196 and press backspace, my "Ä" is removed by the
   backspace, but only the second byte is flushed from the line buffer.
   Then, if I press enter 15 times, I get a line with c3 0a 0a 0a ...
   And there is no user-land involved here.
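
The difference between cases 1) and 4) can be reproduced with a toy model.
This is only a sketch of the behaviour described above, not the kernel's tty
code; the two function names are made up for the illustration. One function
erases a single byte per backspace, the other skips back over UTF-8
continuation bytes (10xxxxxx) so that the whole character goes away.

```python
def erase_bytewise(buf: bytearray) -> None:
    """Drop exactly one byte per backspace (the behaviour observed in case 4)."""
    if buf:
        buf.pop()


def erase_utf8_aware(buf: bytearray) -> None:
    """Drop one whole UTF-8 character: trailing continuation bytes, then the lead byte."""
    while buf and (buf[-1] & 0xC0) == 0x80:
        buf.pop()
    if buf:
        buf.pop()


def hexdump(buf: bytes) -> str:
    """Format bytes the way 'od -An -tx1' shows them, minus the padding."""
    return " ".join(f"{b:02x}" for b in buf)


line = bytearray("Ä".encode("utf-8"))    # c3 84, as in case 3
erase_bytewise(line)
print(hexdump(line + b"\n"))             # the lead byte c3 is still there

line = bytearray("Ä".encode("utf-8"))
erase_utf8_aware(line)
print(hexdump(line + b"\n"))             # only the newline is left
```

For a plain ASCII "A" both functions behave the same, which is why case 1)
shows no anomaly.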

I really hope you understand the problem better now. Pressing backspace
to fix input does not correct the input with multi-byte chars; it leaves
incomplete start sequences. If I press Alt-1111111, then backspace, I get
f4 8f 91 0a 0a 0a 0a because it is f4 8f 91 87 minus one byte.
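
The byte values quoted above can be checked with a few lines of Python. This
is just a verification sketch; it assumes that Alt-1111111 yields the code
point with decimal value 1111111 (U+10F447), which is what the bytes in the
report imply.

```python
codepoint = 1111111                       # decimal value entered with Alt
encoded = chr(codepoint).encode("utf-8")  # full 4-byte sequence
print(" ".join(f"{b:02x}" for b in encoded))

truncated = encoded[:-1]                  # what is left after one byte is erased
print(" ".join(f"{b:02x}" for b in truncated))

# The remaining three bytes are an incomplete sequence, so a strict
# decoder refuses them.
try:
    truncated.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
print("valid UTF-8 after backspace:", still_valid)
```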

Of course, pressing Backspace multiple times removes them all, but it also
removes previous characters on the display.

Another experiment:

I press 01234, then Alt-255, Backspace, then 56789. On the display, I have
0123456789. od gets 30 31 32 33 34 c3 35 36 37 38 39.

Now if I want to correctly fix the input, I have to press backspace twice,
but then I have to make the '4' disappear from my display, while knowing it
still remains in the buffer. And indeed, my display shows "012356789" but
od sees 30 31 32 33 34 35 36 37 38 39.
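
Both variants of this experiment can be replayed with a small simulation. It
is a sketch under two assumptions taken from the observations above: the
display removes one whole glyph per backspace, while the line buffer removes
one byte. The names replay and BACKSPACE are invented for the example.

```python
BACKSPACE = "\b"


def replay(keys):
    """Return (text on display, hex dump of the line buffer) after the keystrokes."""
    display = []           # one entry per glyph shown on screen
    buffer = bytearray()   # bytes a reader such as od would receive
    for key in keys:
        if key == BACKSPACE:
            if display:
                display.pop()   # screen: one glyph disappears
            if buffer:
                buffer.pop()    # buffer: only one byte disappears
        else:
            display.append(key)
            buffer += key.encode("utf-8")
    return "".join(display), " ".join(f"{b:02x}" for b in buffer)


# 01234, Alt-255 (ÿ, encoded as c3 bf), one backspace, 56789
once = replay(list("01234") + ["ÿ", BACKSPACE] + list("56789"))
print(once)

# Same, but with two backspaces to really get rid of the c3
twice = replay(list("01234") + ["ÿ", BACKSPACE, BACKSPACE] + list("56789"))
print(twice)
```

With one backspace the display looks right but a stray c3 stays in the
buffer; with two the buffer is right but the '4' is gone from the display.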

And this is without anything in user-land (except 'od'), just a plain
stupid text console booted with "init=..."

So obviously there is something broken as the data fed into stdin does not
match what is displayed for multi-byte characters.

Hoping this clarifies the situation,
Willy
