On Jan 23, 2008, at 11:16 AM, Linus Torvalds wrote:

Well yes, any context in which a string is treated as Unicode instead of an opaque sequence of bytes will probably lead to normalization at some point. For example, when searching text I'm going to want Märchen (precomposed) and Märchen (decomposed) to be treated as the same string. The Mac OS X APIs use NFD and everybody else uses NFC, but either way it's still normalization.

Why would the globbing libraries have to do anything special to understand NFD? In fact, I prefer that they don't. It's very handy to be able to type Ma* and have that match Märchen, because the globbing library sees the bytes Ma??rchen and is happy to match ??rchen against *. Were the filename in NFC, I couldn't do that. Similarly, Ma<tab> autocompletes the name Märchen for me. But the convenience is beside the point. What I'm trying to show here is that if the globbing library were NFD-aware, it would probably decide that Ma* shouldn't match Märchen, right?

I assume globbing libraries et al. don't do UTF-8 hackery on Linux, right? And yet NFC-encoded filenames are fairly common there? So why should it be any different on OS X, especially since HFS+ isn't the only filesystem option (and thus doing NFD conversion in the library would mess up other filesystems)?

In fact, probably the biggest reason the NFD encoding was done at the HFS+ level is that they simply couldn't trust user-level libraries to always do the NFD conversion for pathnames. And I quote:

"I would prefer that case sensitivity and unicode normalization were not the responsibility of the file system -- but I realize that we cannot just ignore the problem and let the other layers sort it all out."

I don't get why you're still calling it corruption when, on an HFS+ system, NFD encoding is correct. It would be corruption for HFS+ to write anything other than NFD.
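To make the globbing argument concrete, here is a minimal Python sketch. Python and its standard `unicodedata` and `fnmatch` modules are chosen here purely for illustration; nothing in the thread uses them. It assumes UTF-8 filenames and treats `fnmatch.fnmatchcase` on byte strings as a stand-in for a byte-oriented shell glob. It does not reproduce what OS X or any particular shell does internally.

```python
import fnmatch
import unicodedata

# "Märchen" written with escapes so the source file's own encoding doesn't matter.
nfc = unicodedata.normalize("NFC", "M\u00e4rchen")  # ä as one code point, U+00E4
nfd = unicodedata.normalize("NFD", "M\u00e4rchen")  # a + combining diaeresis U+0308

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")
print(nfc_bytes)  # b'M\xc3\xa4rchen'
print(nfd_bytes)  # b'Ma\xcc\x88rchen'

# Byte-oriented globbing: the NFD name literally begins with b"Ma".
print(fnmatch.fnmatchcase(nfd_bytes, b"Ma*"))        # True
print(fnmatch.fnmatchcase(nfc_bytes, b"Ma*"))        # False
# The combining mark is two UTF-8 bytes, so each "?" consumes one of them.
print(fnmatch.fnmatchcase(nfd_bytes, b"Ma??rchen"))  # True

# Compared as raw strings, the two spellings differ...
print(nfc == nfd)                                    # False
# ...but they agree once both are normalized to the same form.
print(unicodedata.normalize("NFC", nfd) == nfc)      # True
```

The last two lines show why text search wants normalization, while the `fnmatch` lines show why a glob that knows nothing about Unicode happens to treat `Ma*` as a prefix match on the decomposed name.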
There's no reason to assume that OS X actually stores NFD on every volume. In fact, it quite explicitly doesn't:

"As far as storing exactly what was passed in, its not just HFS that's involved her. In Mac OS X, SMB, MSDOS, UDF, ISO 9660 (Joliet), NTFS and ZFS file systems all store in one form -- NFC. We store in NFC since that what is expected for these files systems. If we were to allow KFD to pass through, it would cause problems when these names were accessed outside of Mac OS X. So this is not just an HFS issue but an interchange issue for Mac OS X. We have the legacy NFD use/expectation in our applications and we chose not to ignore the problem but make a conscience effort to have the appropriate form used (NFD in Mac OS X APIs, NFC elsewhere). Its not perfect but neither is the agnostic approach where both forms can be used and you can have duplicate filenames in your file system."

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com
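The quoted note contrasts Apple's approach with an "agnostic" one in which both forms can be stored and a directory can end up with duplicate filenames. The toy Python sketch below illustrates that trade-off. `create_entry` and the set-based directory are hypothetical and stand in for no real filesystem API. The sketch also ignores case folding and the "NFD in Mac OS X APIs" half of the policy.

```python
import unicodedata

# The same visible name, encoded as UTF-8 in each normalization form.
nfc = unicodedata.normalize("NFC", "M\u00e4rchen").encode("utf-8")
nfd = unicodedata.normalize("NFD", "M\u00e4rchen").encode("utf-8")

def create_entry(directory, name, store_form=None):
    """Hypothetical toy filesystem layer: add `name` (UTF-8 bytes) to a
    directory modeled as a set. If store_form is "NFC" or "NFD", the name
    is normalized to that form first; if None, the bytes are stored as-is."""
    if store_form is not None:
        name = unicodedata.normalize(store_form, name.decode("utf-8")).encode("utf-8")
    directory.add(name)

# Byte-agnostic layer: both spellings become separate directory entries.
agnostic = set()
create_entry(agnostic, nfc)
create_entry(agnostic, nfd)
print(len(agnostic))        # 2

# Layer that always stores NFC (as the quote describes for non-HFS volumes):
# both spellings collapse into a single entry.
nfc_volume = set()
create_entry(nfc_volume, nfc, store_form="NFC")
create_entry(nfc_volume, nfd, store_form="NFC")
print(len(nfc_volume))      # 1
print(nfc_volume == {nfc})  # True
```

Normalizing on the way in gives up byte-for-byte round-tripping of whatever name was passed in. In exchange, two visually identical names can never coexist in one directory.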
