On Fri, 18 Jan 2008, JM Ibanez wrote:

But if you want to make it clear, you can use "encoded character" or yes, "code point".

But the thing is, even the unicode standard tends to just say "character", and a unicode string (for example) is defined to be a sequence of "code units" which in turn is about those *encoded* characters, which is all about the code points.

So you'll find that they are very careful in some technical definition parts to talk about "code points", but then in other sequences they talk about "character" even though they are referring to the actual code point (i.e. the figure literally has the unicode number in it!)

In fact, they sometimes even talk about "characters" in the totally non-encoding meaning of "glyph".

So yes, "character" is often ambiguous. It would be good to never use the word at all, and only talk about "code point" and "glyph" and one of the well-defined special terms like "combining character" or "replacement character".

But to take a representative example from The Unicode Standard, Chapter 2: "Unicode Design Principles":

    Characters are represented by code points that reside only in a
    memory representation, as strings in memory, on disk, or in data
    transmission.

    The Unicode Standard deals only with character codes.

(any speling mistakes mine).

In other words, from the very beginning of the standard, very basic design principles chapter, it starts talking about characters being represented by code points and explicitly says that it really only deals with CHARACTER CODES.

Yes, I'm sure you can argue ad infinitum that all the "equivalences" and other crap means that a "character" can sometimes mean just about anything, but I'd say that it's pretty damn reasonable to equate "unicode character" with "code point" or "character code".

		Linus
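To make the "code point" versus "glyph" versus "code unit" distinction concrete, here is a minimal Python sketch. It is not from the original mail: the example string ("é" spelled two ways) and the helper names `code_points`, `utf8_units` and `utf16_units` are assumptions chosen for illustration. It only uses the standard `unicodedata` module.

```python
import unicodedata

# The same glyph, "e with acute accent", spelled two different ways.
composed = "\u00e9"      # one code point: U+00E9
decomposed = "e\u0301"   # two code points: U+0065 + U+0301 (a combining character)

def code_points(s):
    """Return the code points of s in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]

def utf8_units(s):
    """Number of 8-bit code units when s is encoded as UTF-8."""
    return len(s.encode("utf-8"))

def utf16_units(s):
    """Number of 16-bit code units when s is encoded as UTF-16."""
    return len(s.encode("utf-16-le")) // 2

for name, s in (("composed", composed), ("decomposed", decomposed)):
    print(name, code_points(s), utf8_units(s), utf16_units(s))

# Different code point sequences, so the strings compare unequal ...
print(composed == decomposed)
# ... until both are brought to the same normalization form.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Both strings would normally render as a single glyph, yet they differ in code points and in code units, which is why "character" on its own does not say which of the three is meant.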
