Re: UTF-8 and Alt key in the console

To: H. Peter Anvin <hpa@...>
Cc: David Newall <davidn@...>, John T. <j.thomast@...>, <linux-kernel@...>
Date: Tuesday, April 1, 2008 - 4:13 pm

On Saturday 2008-03-29 18:05, H. Peter Anvin wrote:

No backwards searching, just forwards.

In UTF-8 this is simple. You know you are in the middle of a character
when the highest two bits of the current byte are 10, and you can skip
bytes until the start of the next character, whose highest bit is 0 or
whose highest two bits are 11.
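
As a rough sketch of that forward skip (the helper names are mine and
purely illustrative, not taken from any existing code):

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int utf8_is_continuation(unsigned char c)
{
	return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: skip any continuation bytes at the start of
 * buf, so the result points at the first byte of a character (or at
 * the terminating NUL, which is not a continuation byte).
 */
static const char *utf8_resync(const char *buf)
{
	while (utf8_is_continuation((unsigned char)*buf))
		++buf;
	return buf;
}
```

For a euro sign (bytes E2 82 AC) whose first byte got lost, this skips
82 and AC and resumes at whatever follows. Only the current byte is
ever inspected.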

With the VTxxx escape codes, this is hardly possible. Given a code of
^[[43m that gets broken in transit,

 	echo -e '\x1B[43m wonderful \x1B[0m' | cosmicrays | cat

 	3m wonderful ^[[0m

There is no way to check whether you are inside an escape code, and
there is no way to find its end. If a heuristic were to be used (which
is certainly a possibility), you would end up killing text up until the
next ^[.
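
To make the cost of such a heuristic concrete, here is a minimal sketch
of one (skip_to_next_esc is a made-up name, and "discard everything up
to the next ESC" is just one conceivable heuristic):

```c
#include <string.h>

/*
 * Hypothetical heuristic: assume buf may start inside a damaged
 * escape code and discard everything before the next ESC (0x1B).
 * If there is no ESC at all, the whole buffer is discarded.
 */
static const char *skip_to_next_esc(const char *buf)
{
	const char *esc = strchr(buf, 0x1B);

	return esc != NULL ? esc : buf + strlen(buf);
}
```

Applied to the damaged output above, this throws away "3m wonderful "
and resumes at ^[[0m, so the text is lost along with the remains of
the escape code.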

Hence the proposal of using definite start and end markers:

 	echo -e '\x1B43m\x1D wonderful \x1B0m\x1D' | cosmicrays | cat

 	3m^] wonderful ^[0m^]

OK, finding out whether we are in an escape code is not as easy as with
UTF-8 (which only needs to look at the current byte), but it is still
very viable.
The prerequisite for this simple model is that the user does not use an
overly long dumb escape sequence like ^[[43;43;43;43;43;43m, i.e.
that the end marker is in the buffer if we really are in an escape
sequence:

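 	#include <stdbool.h>
 	#include <string.h>

 	static bool in_an_escape_seq(const char *buf)
 	{
 		const char *e = strchr(buf, 0x1D);	/* first end marker */
 		const char *s = strchr(buf, 0x1B);	/* first start marker */
 		/* Inside a sequence if an end marker precedes any start marker. */
 		return e != NULL && (s == NULL || e < s);
 	}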

If so, skipping parts of a faulty write() is easy:

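 	static const char *get_out_of_esc(const char *buf)
 	{
 		/* Resume right after the end marker that closes the
 		 * truncated sequence; otherwise buf is already clean. */
 		if (in_an_escape_seq(buf))
 			return strchr(buf, 0x1D) + 1;
 		return buf;
 	}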
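
For illustration, here is the same idea folded into one self-contained
function and applied to the damaged output from the example. The name
resync_after_damage is made up, and treating "no start marker left in
the buffer" as still being inside a sequence is my assumption about the
intended behaviour:

```c
#include <string.h>

/*
 * Hypothetical sketch: if an end marker (0x1D) appears before any
 * start marker (0x1B), buf begins inside a truncated escape sequence,
 * so resume right after that end marker. Otherwise return buf as is.
 */
static const char *resync_after_damage(const char *buf)
{
	const char *end = strchr(buf, 0x1D);
	const char *start = strchr(buf, 0x1B);

	if (end != NULL && (start == NULL || end < start))
		return end + 1;
	return buf;
}
```

Given "3m^] wonderful ^[0m^]", this returns a pointer to
" wonderful ^[0m^]": only the two leftover bytes and the end marker
are dropped. An undamaged buffer such as "^[43m^] hi" comes back
unchanged.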


-- 
make boldconfig -- to boldly select what no one has selected before
