Re: git on MacOSX and files with decomposed utf-8 file names

To: Mike Hommey <mh@...>
Cc: Kevin Ballard <kevin@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Wednesday, January 23, 2008 - 9:38 am

Here's a reliable test case for checking filename normalization on Mac OS X.

------ cut here -------
cat > test.pl << 'EOF'
#!/usr/bin/perl -CO
print "M".pack("U",0x00E4)."rchen\n";
print "Ma".pack("U",0x0308)."rchen\n";
EOF
chmod +x test.pl
./test.pl | xargs touch
echo M* | xxd -g1
------ cut here -------

On an NFS mounted filesystem, what you will get is this:

0000000: 4d 61 cc 88 72 63 68 65 6e 20 4d c3 a4 72 63 68  Ma..rchen M..rch
0000010: 65 6e 0a                                         en.

and on an HFS+ mounted filesystem, what you will get is this:

0000000: 4d 61 cc 88 72 63 68 65 6e 0a                    Ma..rchen.

So this demonstrates that on my Mac OS X 10.4.11 system, NFS does no
normalization at all: both names are kept, so two files get created.
On HFS+, Mac OS X maps both filenames to the same decomposed name, so
only one file exists.
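
To make the two hex dumps easier to read, here is a small Python sketch
(not part of the original test case) that prints the UTF-8 bytes of the
precomposed and decomposed spellings and checks how they relate under
Unicode normalization.  The helper name hexbytes is made up for this
illustration, and Python's NFD is only assumed to agree with what HFS+
does for this particular name.

```python
import unicodedata

# The same two spellings that test.pl prints.
precomposed = "M\u00e4rchen"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "Ma\u0308rchen"   # 'a' followed by U+0308 COMBINING DIAERESIS


def hexbytes(name: str) -> str:
    """Return the UTF-8 encoding of name as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in name.encode("utf-8"))


print(hexbytes(precomposed))   # 4d c3 a4 72 63 68 65 6e
print(hexbytes(decomposed))    # 4d 61 cc 88 72 63 68 65 6e

# Decomposing the first spelling yields the second, and composing the
# second yields the first.
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```

The second byte sequence is the only one that shows up in the HFS+
listing, while the NFS listing contains both.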

More (or less) surprisingly, FAT does not behave the way Kevin Ballard's
"reliable source" claims:

  "In Mac OS X,  SMB, MSDOS, UDF, ISO 9660 (Joliet), NTFS and ZFS file
  systems all store in one form -- NFC.  We store in NFC since that what is
  expected for these files systems."

Using a Sony Reader (which uses an internal FAT filesystem) hooked up
to a MacOS 10.4.11 system:

% /fs/u1/tmp/test.pl  | xargs touch
% echo M* | xxd -g1
0000000: 4d 61 cc 88 72 63 68 65 6e 0a                    Ma..rchen.

.. which is the decomposed form.  So it looks like on FAT/MSDOS
filesystems Mac OS X 10.4.11 normalizes filenames to NFD, which will
*not* do the right thing as far as Windows compatibility is concerned
on USB sticks, et al.  Mac OS users would be well advised not to use
non-ASCII names in their filesystems if they care about interoperating
with other systems.  :-P
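
For repeating the check on other volumes (a USB stick, an SMB share),
the same probe can be sketched in Python.  This is only a rough port of
the shell test above, not something from the original thread: the
function name probe_normalization and the "nfprobe-" prefix are made up,
and it works on raw bytes so that the result does not depend on the
locale.

```python
import os
import tempfile

# UTF-8 bytes of the two spellings written by test.pl.
NAMES = [
    b"M\xc3\xa4rchen",    # precomposed (NFC): U+00E4
    b"Ma\xcc\x88rchen",   # decomposed (NFD): 'a' + U+0308
]


def probe_normalization(directory):
    """Create both spellings in a scratch directory under `directory`
    and return the names the filesystem reports, as hex strings."""
    probe_dir = os.fsencode(tempfile.mkdtemp(prefix="nfprobe-", dir=directory))
    try:
        for name in NAMES:
            # Equivalent of `touch`: create the file if it is missing.
            with open(os.path.join(probe_dir, name), "ab"):
                pass
        found = sorted(os.listdir(probe_dir))
    finally:
        # Clean up whatever the filesystem actually created.
        for entry in os.listdir(probe_dir):
            os.remove(os.path.join(probe_dir, entry))
        os.rmdir(probe_dir)
    return [" ".join(f"{b:02x}" for b in name) for name in found]


if __name__ == "__main__":
    for line in probe_normalization("."):
        print(line)
```

Two lines of output mean the filesystem kept both spellings, as NFS did
above; a single line shows which form the filesystem reports for the
one file that was created.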

							- Ted

Messages in current thread:
Re: git on MacOSX and files with decomposed utf-8 file names, Theodore Tso, (Wed Jan 23, 9:38 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Wed Jan 23, 12:16 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Wed Jan 23, 7:37 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Jonathan del Strother, (Wed Jan 23, 5:02 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Tue Jan 22, 9:27 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Tue Jan 22, 9:14 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Tue Jan 22, 9:47 pm)