On Jan 22, 2008, at 7:08 PM, Theodore Tso wrote:

http://developer.apple.com/technotes/tn/tn1150.html#UnicodeSubtleties
http://lists.limewire.org/pipermail/gui-dev/2003-January/001110.html
http://osdir.com/ml/network.gnutella.limewire.core.devel/2003-01/msg00000.html

I just finished talking to one of the HFS+ developers, so I suspect I know a lot more on this subject now than you do. Here's some of the relevant information:

* Any new characters added to Unicode will only have one form (decomposed), so HFS+ will always accept new characters, as they will already be NFD. The only exception is case-sensitivity: the case-folding tables in HFS+ are static, so new characters with case variants will be treated in a case-sensitive manner. However, as they are already decomposed, the NFD algorithm will not change their encoding. This means that no, there are zero problems moving HFS+ drives between versions of OS X.

* At the time HFS+ was developed, there was no one common standard for normalization. The HFS+ developers picked NFD because they thought it was "a more flexible, future-looking form", but Microsoft ended up picking the opposite (NFC) just a short time later. Interestingly, NFC is a weird hybrid form which only has composed forms for pre-existing characters, and decomposed forms for all new characters (as they only have one form). So in a sense NFD is more sane than NFC.

* The core issue here, which is why you think HFS+ is so stupid, is that you guys see no problem with having two files "Märchen" (NFC) and "Märchen" (NFD), whereas the HFS+ developers don't consider it acceptable to have two visually identical names as independent files. Unfortunately, the only way to do this matching is to store the normalized form in the filesystem, because it would be a performance nightmare to try and do this matching any other way.
The HFS+ developers considered it an acceptable trade-off, and as an application developer I tend to agree with them.

As I have stated in the past, this isn't a case of HFS+ being stupid and causing problems; it's a case of HFS+ being *different* and causing problems. But this difference is just as much your fault as it is HFS+'s fault.

* For detecting case-sensitive filesystems you can use pathconf(2) with _PC_CASE_SENSITIVE (if it is unsupported, you can assume the filesystem is case-sensitive). There is also the getattrlist(2) volume capability VOL_CAP_FMT_CASE_SENSITIVE.

There appears to be no API for determining whether normalization will be applied. However, any filesystem that explicitly uses UTF-8 as its storage encoding (unlike the Linux filesystems, which you claim use UTF-8 but which obviously use nothing at all) is pretty much guaranteed to have to normalize, or it will have abysmal performance.

I must say it is shocking that someone as smart as you is still more interested in finding ways to prove me wrong than in actually addressing the problem. It's obvious that the only research you did was intended to find ways to call me stupid.

-Kevin Ballard

--
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com
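The NFC/NFD distinction behind the "Märchen" example can be seen with Python's standard `unicodedata` module. The sketch below shows that the two spellings are different byte sequences but compare equal once normalized, and then models the approach described above: store the normalized form and reject a second name that normalizes to an existing one. `NormalizingDirectory` is a hypothetical toy for illustration only; real HFS+ uses its own variant of NFD and its own static case-folding tables, which this does not reproduce.

```python
import unicodedata

# The two spellings of "Märchen" from the discussion above.
nfc = "M\u00e4rchen"   # precomposed U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
nfd = "Ma\u0308rchen"  # "a" followed by U+0308 COMBINING DIAERESIS

# Byte for byte these are different names, which is how a filesystem that
# treats names as opaque bytes sees them.
print(nfc.encode("utf-8"))  # b'M\xc3\xa4rchen'
print(nfd.encode("utf-8"))  # b'Ma\xcc\x88rchen'
print(nfc == nfd)           # False

# After normalization to the same form they compare equal.
assert unicodedata.normalize("NFD", nfc) == nfd
assert unicodedata.normalize("NFC", nfd) == nfc


class NormalizingDirectory:
    """Hypothetical toy directory that stores names in NFD, so two visually
    identical names cannot exist as independent entries."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}

    def create(self, name: str, data: bytes = b"") -> str:
        # Normalize once on the way in; lookups are then plain dict hits.
        key = unicodedata.normalize("NFD", name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = data
        return key  # the stored (normalized) form

    def names(self) -> list[str]:
        # Listings return the stored form, not the spelling the caller used.
        return list(self._entries)


d = NormalizingDirectory()
d.create(nfc)
try:
    d.create(nfd)  # same name once normalized
except FileExistsError:
    print("rejected duplicate")
```

Normalizing once at creation time keeps lookups cheap, which is the performance argument made above; the cost is that a name created in NFC comes back from `names()` in NFD.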
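A best-effort way to probe for the two behaviours discussed above from a script is sketched below in Python. `is_case_sensitive` tries pathconf(2) with `_PC_CASE_SENSITIVE` first, but only if the running Python exposes that name in `os.pathconf_names` (an assumption; many builds do not), and otherwise falls back to an empirical probe with a temporary file. Since there appears to be no API for normalization, `normalizes_names` is purely empirical: it creates a file with an NFC name in a scratch directory and checks whether the listing reports the NFD form. Both function names are hypothetical helpers, not part of any library.

```python
import os
import tempfile
import unicodedata


def is_case_sensitive(directory: str) -> bool:
    """Best-effort check whether names in `directory` are case-sensitive."""
    # pathconf(2) with _PC_CASE_SENSITIVE; whether this Python build
    # exposes the name is an assumption, so check before using it.
    names = getattr(os, "pathconf_names", {})
    if "PC_CASE_SENSITIVE" in names:
        try:
            return bool(os.pathconf(directory, "PC_CASE_SENSITIVE"))
        except OSError:
            # Unsupported on this filesystem: assume case-sensitive.
            return True
    # Fallback probe (not pathconf): tempfile names use lower-case letters,
    # digits and "_", so check whether the upper-case spelling resolves.
    with tempfile.NamedTemporaryFile(prefix="casetest_", dir=directory) as f:
        base = os.path.basename(f.name)
        return not os.path.exists(os.path.join(directory, base.upper()))


def normalizes_names(directory: str) -> bool:
    """Best-effort check whether the filesystem rewrites names to NFD."""
    nfc_name = "M\u00e4rchen"
    # Work in a private scratch directory so the listing holds one entry.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        open(os.path.join(scratch, nfc_name), "wb").close()
        listed = os.listdir(scratch)
    # True only if the stored name came back decomposed.
    return listed == [unicodedata.normalize("NFD", nfc_name)]
```

Probing needs write access and is slower than a single pathconf call, but it reports what the filesystem actually does. A filesystem that compares names normalization-insensitively while preserving the original spelling would make `normalizes_names` return False.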
