On Tue, Jan 22, 2008 at 07:38:04PM -0500, Kevin Ballard wrote:

Except there *are* problems, because this promise doesn't apply to Unicode 2.1 (Mac OS 10.2 and before) and Unicode 3.2 (Mac OS 10.3 and above). And there were changes to the normalization algorithm between Unicode 3.2 and Unicode 4.1. So taking a hard drive between Mac OS X 10.2 and 10.3 *will* cause problems. The guarantees of Unicode stability didn't come until well past Unicode 2.1.

Also, I know of no guarantee that there will be no more new compositions. According to Unicode Standard Annex #15 (http://unicode.org/reports/tr15/), new characters that can be decomposed are strongly discouraged, but "It would be possible to add more compositions in a future version of Unicode". Got a reference to back up your claim that there will never be any more?

NFC is better if you care about compatibility with existing legacy character sets, where you want round-trip conversions to be idempotent. On the other hand, given that Mac OS has historically never cared about being compatible with the rest of the world, it makes sense that it would choose NFD.

Yep. No problem doing that. You seem to think that supporting Unicode requires imposing this constraint, but that's simply not true, except maybe in some kind of religious sense.

Nope. They were just not clever enough. If they had used a hashed key for their b-tree, with a hash that had the property that two strings that are equivalent in the Unicode sense have the same hash value, it would have been quite possible to do Unicode-equivalence lookups quickly. Yeah, calculating the hash takes a bit of time, but it gets called no more often than the normalization routine, and its performance overhead is no worse than normalizing a string. I know how to do it in a Linux filesystem; it's just an insane thing to do, and so I choose not to do it.
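As a rough illustration of the hashed-key idea, here is a minimal Python sketch. It is not the HFS+ algorithm or any real filesystem's code: the names `equivalence_hash` and `Directory` are made up for this example, a dict stands in for the b-tree, and Python's `unicodedata` tables (whatever Unicode version the interpreter ships) stand in for the fixed Unicode version a real filesystem would have to pin. For brevity the hash calls a normalization routine internally; the point is only that the stored name is never rewritten.

```python
import unicodedata
import zlib


def equivalence_hash(name: str) -> int:
    """Return the same value for any two canonically equivalent names."""
    # Assumption: decompose only to feed the hash; the result is thrown away.
    decomposed = unicodedata.normalize("NFD", name)
    return zlib.crc32(decomposed.encode("utf-8"))


class Directory:
    """Toy directory keyed by equivalence hash (hypothetical, not a b-tree)."""

    def __init__(self):
        # hash value -> list of (name exactly as created, value)
        self._buckets = {}

    def lookup(self, name):
        wanted = unicodedata.normalize("NFD", name)
        for stored, value in self._buckets.get(equivalence_hash(name), []):
            # Hash collisions are resolved by an equivalence comparison.
            if unicodedata.normalize("NFD", stored) == wanted:
                return stored, value
        return None

    def create(self, name, value):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        # The name is stored byte-for-byte as the caller supplied it.
        self._buckets.setdefault(equivalence_hash(name), []).append((name, value))


d = Directory()
d.create("caf\u00e9", 1)           # precomposed e-acute
found = d.lookup("cafe\u0301")     # e followed by combining acute accent
```

Here `found` holds the name in its original precomposed form, so a lookup under either spelling reaches the same entry while the name on disk stays exactly what the application wrote.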
But it is doable; if you must pursue the course of filesystem insanity, it's possible to do it in a performant way, without normalization; it's the same way that you can do b-tree lookups in a case-insensitive way.

No, I did the research to try to find the HFS-specific filename mangling algorithm. And given that it's based on a back-level, old version of Unicode, you can't just use the NFD algorithm from the latest Unicode spec. As I did that research, I came across evidence that the claim you had made (i.e., that HFS had never changed the Unicode version for its normalization algorithm) was directly contradicted by the Apple TechNote.

- Ted
