Re: git on MacOSX and files with decomposed utf-8 file names

To: Martin Langhoff <martin.langhoff@...>
Cc: Linus Torvalds <torvalds@...>, Jakub Narebski <jnareb@...>, Johannes Schindelin <Johannes.Schindelin@...>, Mark Junker <mjscod@...>, git@vger.kernel.org <git@...>
Date: Thursday, January 17, 2008 - 1:23 am

On Jan 16, 2008, at 11:51 PM, Martin Langhoff wrote:



I can imagine. However, I've never been hit by such a situation. This
doesn't mean a case-insensitive filesystem is a problem per se; it
means interactions between a case-insensitive and a case-sensitive
filesystem can be a problem. That doesn't mean either way is "correct";
it just means the two don't work well together.

I like ice cream, and I like steak, but I sure don't think a mixture
of steak and ice cream would go well together. Do you?


Both of which would be replicating the directory contents, not a
listing of files specified by the user. If, as a user, I were to say
"please replicate file FOO" and the file was really called "foo", I
wouldn't be in the least surprised to see the tool take me at my word
and produce a file called "FOO" with the contents of "foo". But in
general, things like this operate on the filesystem, not on the user
args.


If I say "track FOO", I probably mean it. So go ahead and track "FOO",
even if you end up tracking the contents of file "foo". I certainly
won't blame the tool for doing what I told it.


Sure I do. I find it very convenient, for example, to say "cd
documents/school" when I really want to go to "Documents/School".
Similarly, if I'm trying to reference gitweb/tests/Märchen, I'm quite
happy to not have to figure out what normalization the filename is
using and attempt to replicate that (especially as I have no idea
which normalization my input mechanism uses - unlike Linus, I don't
have a key dedicated to ä, and even if I did I wouldn't necessarily
expect it to use precomposed vs decomposed). I can't think of a single
reason why I'd want to be able to have 2 different files named
"Märchen" on my disk. On the other hand, treating unicode
normalization as significant can pose security risks - how am I to
know that the file that is named "foo.txt" is really the same file
"foo.txt" that I last saw? Someone I know on IRC sent me this
image[1], which shows 6 files all apparently named "foo.txt" on a disk
image. This is possible because on a case-sensitive HFS+ volume, the
file system doesn't ignore ignorables when comparing filenames (it
does on a case-insensitive HFS+ system), and so all of those filenames
look identical up until you actually pipe their names through xxd and
look at the byte sequence. When this sort of tomfoolery is possible, I
simply cannot trust the names of any of my files anymore.

[1]: http://sailor=E6=9C=88.com/imgs/ignorable.png
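
To make the precomposed vs decomposed point concrete, here is a minimal Python sketch. It is only an illustration: the sample strings are made up, U+200C (ZERO WIDTH NON-JOINER) is a hypothetical stand-in for whatever ignorable characters appear in the image, and standard NFD is used as an approximation of what HFS+ actually stores.

```python
import unicodedata

# Two spellings of the same visible name (illustrative values).
precomposed = "M\u00e4rchen"   # U+00E4: 'ä' as a single code point
decomposed = "Ma\u0308rchen"   # 'a' followed by U+0308 COMBINING DIAERESIS

# A name that renders like "foo.txt" but hides an extra code point.
# U+200C is a hypothetical choice; the names in the image are not known.
plain = "foo.txt"
sneaky = "foo\u200c.txt"


def same_after_normalization(a, b):
    """Compare two names after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def code_points(name):
    """List the code points of a name, similar in spirit to inspecting it with xxd."""
    return " ".join(f"U+{ord(ch):04X}" for ch in name)


if __name__ == "__main__":
    # Raw comparison vs. comparison after normalization.
    print(precomposed == decomposed)
    print(same_after_normalization(precomposed, decomposed))
    # The UTF-8 byte sequences differ even though the names look alike.
    print(precomposed.encode("utf-8").hex(" "))
    print(decomposed.encode("utf-8").hex(" "))
    # The hidden code point only shows up when the name is dumped.
    print(code_points(plain))
    print(code_points(sneaky))
```

Note that normalization alone does not make the two "foo.txt" names equal: the ignorable code point survives NFD, so a comparison that wants to treat them as the same name would also have to skip ignorables.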


Extra code? I don't think so. The only reason I'd need extra code is
if I were attempting to explicitly detect the "real" filename for a
user-supplied argument, by scanning the directory contents until I
found a file that was equivalent to the given argument. But there's no
reason to do that. None of the code I've ever written, or any of the
code I've ever seen, has had to do any extra work because it was on a
case-insensitive filesystem. I contribute to a packaging system for
the Mac called MacPorts, and I've never seen any patches on any of the
4000+ ports to handle case insensitivity (granted, I haven't looked at
every port, but I've looked at a significant fraction). It's a
complete non-issue.
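
For reference, the directory scan described above (the "extra code" that most programs never need) could look roughly like the following Python sketch. The helper names `fold` and `find_real_name` are hypothetical, and the equivalence rule (NFD plus case folding) is an assumption chosen for illustration, not the exact comparison HFS+ performs.

```python
import os
import unicodedata


def fold(name):
    """Key under which two names count as equivalent.

    Assumption: normalize to NFD, case-fold, then normalize again,
    since case folding can produce non-normalized text.
    """
    folded = unicodedata.normalize("NFD", name).casefold()
    return unicodedata.normalize("NFD", folded)


def find_real_name(directory, requested):
    """Scan `directory` and return the on-disk entry equivalent to `requested`.

    Returns None when no entry matches.
    """
    wanted = fold(requested)
    for entry in os.listdir(directory):
        if fold(entry) == wanted:
            return entry
    return None


if __name__ == "__main__":
    # Look up a user-supplied spelling in the current directory.
    print(find_real_name(".", "README"))
```

A tool that simply opens the path it was given never needs this; the scan only matters when the tool wants to record the name as stored on disk rather than the name the user typed.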

The content of files is sacred. The filename is only there to provide
a handle to locate the contents. I don't see any problem with
expanding the equivalency scope of the filename to accept multiple
encodings and cases. The only arguments I can see that have any
validity at all are the ones that sound like "we use case-sensitive
filesystems, and your case-insensitivity and normalization are causing
problems with our tools! Conform to our world!". As I said above, this
isn't a problem of case-insensitivity or normalization, it's a problem
of interaction between two incompatible viewpoints. All I want to do
is make git play nicer in an HFS+ world, and this would be far easier
if you guys were willing to admit this is a problem that should be
solved in the tool rather than a problem with the system.

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com
