On Jan 23, 2008, at 11:16 AM, Linus Torvalds wrote:

Well yes, any context in which a string is treated as Unicode instead of an opaque sequence of bytes will probably lead to normalization at some point (e.g. when searching text, I'm going to want Märchen and Märchen to be treated as the same string). The Mac OS X APIs use NFD, and everybody else uses NFC, but either way it's still normalization.

Why would the globbing libraries have to do anything special to understand NFD? In fact, I prefer that they don't - it's very handy to be able to type Ma* and have that match Märchen, as the globbing library sees Ma??rchen and is happy to match the ??rchen against *. Were the filename in NFC, I couldn't do that. Similarly, Ma<tab> autocompletes the name Märchen for me. But the convenience is beside the point - what I'm trying to show here is that if the globbing library were NFD-aware, it probably would decide Ma* shouldn't match Märchen, right?

I assume globbing libraries et al don't do UTF-8 hackery in Linux, right? And yet using NFC-encoded filenames is fairly common? So why should it be any different on OS X, especially since HFS+ isn't the only option here (and thus doing NFD conversion in the library would mess up other filesystems)?

In fact, probably the biggest reason the NFD-encoding was done at the HFS+ level is because they simply couldn't trust user-level libraries to always do the NFD conversion for pathnames. And I quote:

"I would prefer that case sensitivity and unicode normalization were not the responsibility of the file system -- but I realize that we cannot just ignore the problem and let the other layers sort it all out."

I don't get why you're still calling it corruption when, on an HFS+ system, NFD-encoding is correct. It would be corruption for HFS+ to write anything else but NFD.
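To make the Ma* globbing point above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: it uses Python's standard `fnmatch` and `unicodedata` modules as a stand-in for a normalization-unaware globbing library, and it says nothing about how any particular shell or HFS+ itself is implemented. The variable names are made up for the example.

```python
import fnmatch
import unicodedata

# "Märchen" written with an escape so the source file's own encoding
# cannot silently change the normalization form.
name = "M\u00e4rchen"

nfc = unicodedata.normalize("NFC", name)  # ä is one code point (U+00E4)
nfd = unicodedata.normalize("NFD", name)  # ä is 'a' + combining U+0308

# A matcher that compares code points and knows nothing about
# normalization (assumption: this models a "dumb" globbing library).
matches_nfd = fnmatch.fnmatchcase(nfd, "Ma*")
matches_nfc = fnmatch.fnmatchcase(nfc, "Ma*")

# The two forms differ as raw strings but agree once normalized.
same_raw = nfc == nfd
same_normalized = unicodedata.normalize("NFC", nfd) == nfc

print(len(nfc), len(nfd))
print(matches_nfd, matches_nfc)
print(same_raw, same_normalized)
```

In this sketch the NFD name matches Ma* because it literally begins with "M", "a", while the NFC name does not, since its second code point is the precomposed ä.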
There's no reason to assume that OS X is actually storing the NFD on the volume. In fact, it's quite explicitly not:

"As far as storing exactly what was passed in, its not just HFS that's involved her. In Mac OS X, SMB, MSDOS, UDF, ISO 9660 (Joliet), NTFS and ZFS file systems all store in one form -- NFC. We store in NFC since that what is expected for these files systems. If we were to allow KFD to pass through, it would cause problems when these names were accessed outside of Mac OS X. So this is not just an HFS issue but an interchange issue for Mac OS X. We have the legacy NFD use/expectation in our applications and we chose not to ignore the problem but make a conscience effort to have the appropriate form used (NFD in Mac OS X APIs, NFC elsewhere). Its not perfect but neither is the agnostic approach where both forms can be used and you can have duplicate filenames in your file system."

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com
