http://developer.apple.com/technotes/tn/tn1150.html#UnicodeSubtleties
http://lists.limewire.org/pipermail/gui-dev/2003-January/001110.html
http://osdir.com/ml/network.gnutella.limewire.core.devel/2003-01/msg00000.html
I just finished talking to one of the HFS+ developers, so I suspect I
know a lot more on this subject now than you do. Here is some of the
relevant information:
* Any new characters added to Unicode will only have one form
(decomposed), so HFS+ will always accept new characters, as they will
already be NFD. The only exception is case-sensitivity: the case-folding
tables in HFS+ are static, so new characters with case variants will
be treated in a case-sensitive manner. However, as they are already
decomposed, the NFD algorithm will not change their encoding. This
means that no, there are zero problems moving HFS+ drives between
versions of OS X.
* At the time HFS+ was developed, there was no one common standard for
normalization. The HFS+ developers picked NFD because they thought it
was "a more flexible, future-looking form", but Microsoft ended up
picking the opposite just a short time later. Interestingly, NFC is a
weird hybrid form which only has composed forms for pre-existing
characters, and decomposed forms for all new characters (as they only
have one form). So in a sense NFD is more sane than NFC.
* The core issue here, which is why you think HFS+ is so stupid, is
that you guys see no problem with having two files "Märchen" (NFC) and
"Märchen" (NFD), whereas the HFS+ developers don't consider it
acceptable to have two visually identical names as independent files.
Unfortunately, the only practical way to do this matching is to store
the normalized form in the filesystem, because it would be a performance
nightmare to try to do this matching any other way. The HFS+
developers considered it an acceptable trade-off, and as an
application developer I tend to agree with them.
As I have stated in the past, this isn't a case of HFS+ being stupid
and causing problems, it's a case of HFS+ being *different* and
causing problems. But this difference is just as much your fault as it
is HFS+'s fault.
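
To make the "Märchen" point concrete, here is a minimal Python sketch of
the two forms. It assumes the standard library's NFD is close enough to
what HFS+ does for this particular name; same_name() is a hypothetical
helper for illustration, not anything HFS+ or LimeWire provides.

```python
import unicodedata

# "ä" as a single code point (U+00E4) versus "a" + U+0308 (combining diaeresis).
nfc = unicodedata.normalize("NFC", "M\u00e4rchen")
nfd = unicodedata.normalize("NFD", "M\u00e4rchen")


def same_name(a: str, b: str) -> bool:
    """Illustrative helper: compare two names after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(len(nfc), len(nfd))      # 7 8: the code point counts differ
print(nfc == nfd)              # False: a raw comparison sees two names
print(same_name(nfc, nfd))     # True: after normalization they match
print(nfc.encode("utf-8"))     # the UTF-8 bytes differ as well
print(nfd.encode("utf-8"))
```

A filesystem that compares raw bytes would treat these as two separate
files; one that stores a single normalized form cannot.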
* For detecting case-sensitive filesystems you can use pathconf(2):
_PC_CASE_SENSITIVE (if unsupported, you can assume the filesystem is
case-sensitive). There is also the getattrlist(2) attribute:
VOL_CAP_FMT_CASE_SENSITIVE.
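
A rough Python sketch of the pathconf(2) check follows. Two assumptions
to flag: the numeric value 11 for _PC_CASE_SENSITIVE is taken from the
macOS <sys/unistd.h> header and is hard-coded because Python's
os.pathconf_names may not list it, and is_case_sensitive() is a
hypothetical helper name. Only the pathconf route is sketched here, not
getattrlist(2).

```python
import os
import sys

# Assumption: value of _PC_CASE_SENSITIVE in macOS <sys/unistd.h>.
PC_CASE_SENSITIVE_DARWIN = 11


def is_case_sensitive(path: str) -> bool:
    """Return True if the filesystem holding `path` appears case-sensitive."""
    if sys.platform != "darwin":
        # The selector is macOS-specific; elsewhere, apply the rule from the
        # text and assume case-sensitive.
        return True
    try:
        # pathconf reports 0 for a case-insensitive volume.
        return os.pathconf(path, PC_CASE_SENSITIVE_DARWIN) != 0
    except (OSError, ValueError):
        # Query unsupported on this volume: assume case-sensitive.
        return True


print(is_case_sensitive("."))
```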
There appears to be no API for determining if normalization will be
applied. However, any filesystem that explicitly uses UTF-8 as storage
(unlike the Linux filesystems, which you claim use UTF-8 but which
obviously use no particular encoding at all) is pretty much guaranteed
to have to normalize or it will have abysmal performance.
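
Lacking an API, one workaround is to probe empirically: create a file
with an NFC name and see what name the directory listing hands back.
The sketch below is only that, a sketch; probe_normalization() and the
probe filename are made up for illustration, and a single probe name
only tells you how that one name was treated.

```python
import os
import tempfile
import unicodedata


def probe_normalization(directory: str) -> str:
    """Create a file with an NFC name and report how the name comes back."""
    nfc_name = unicodedata.normalize("NFC", "probe-M\u00e4rchen.tmp")
    nfd_name = unicodedata.normalize("NFD", nfc_name)
    path = os.path.join(directory, nfc_name)
    with open(path, "w"):
        pass  # an empty file is enough; only the stored name matters
    try:
        stored = [n for n in os.listdir(directory) if n.startswith("probe-M")]
    finally:
        os.remove(path)
    if nfc_name in stored:
        return "preserved"  # name came back exactly as written
    if nfd_name in stored:
        return "NFD"        # filesystem decomposed the name
    return "other"


# The temporary directory may live on a different volume than the one you
# care about; pass a directory on the target volume to test that one.
with tempfile.TemporaryDirectory() as d:
    print(probe_normalization(d))
```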
I must say it is shocking that someone as smart as you is still more
interested in finding ways to prove me wrong than in actually addressing
the problem. It's obvious that the only research you did was intended
to find ways to call me stupid.
-Kevin Ballard
--
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com