Re: git on MacOSX and files with decomposed utf-8 file names

To: Linus Torvalds <torvalds@...>
Cc: Peter Karlsson <peter@...>, Mark Junker <mjscod@...>, Pedro Melo <melo@...>, git@vger.kernel.org <git@...>
Date: Monday, January 21, 2008 - 3:05 pm

On Jan 21, 2008, at 1:12 PM, Linus Torvalds wrote:



I could say the same thing about you.




I'm not saying it's forced on you, I'm saying when you treat filenames
as text, it DOESN'T MATTER if the string gets normalized. As long as
the string remains equivalent, YOU DON'T CARE about the underlying
byte stream.



Alright, fine. I'm not saying HFS+ is right in storing the normalized
version, but I do believe the authors of HFS+ must have had a reason
to do that, and I also believe that it shouldn't make any difference
to me since it remains equivalent.


Sure it does. Normalizing a string produces an equivalent string, and
so unless I look at the octets the two strings are, for all intents
and purposes, the same.



You're right, but it doesn't have to treat it as a binary stream at
the level I care about. I mean, no matter what you do, at some level
the string is evaluated as a binary stream. For our purposes, just
redefine the hashing algorithm to hash all equivalent strings the
same, and you can implement that by using SHA1 on a particular
encoding of the string.
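
A minimal sketch of that idea in Python, not anything git actually
does: pick one normalization form and one encoding, then run SHA1 over
the result. The function name equivalence_hash and the choice of NFC
plus UTF-8 are assumptions made for illustration only.

```python
import hashlib
import unicodedata


def equivalence_hash(name: str) -> str:
    """Hash a filename so canonically equivalent strings hash the same.

    Hypothetical helper: NFC + UTF-8 is an arbitrary but fixed choice.
    """
    canonical = unicodedata.normalize("NFC", name)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


composed = "\u00e4"      # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS

# The raw byte streams differ...
assert composed.encode("utf-8") != decomposed.encode("utf-8")
# ...but the normalized hash treats the two names as the same string.
assert equivalence_hash(composed) == equivalence_hash(decomposed)
```

Any fixed form would do here, as long as every name goes through the
same one before hashing.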



Decomposing and recomposing shouldn't lose any information we care
about - when treating filenames as text, a<COMBINING DIAERESIS> and
<A WITH DIAERESIS> are equivalent, and thus no distinction is made
between them. I'm not sure what other information you might be
considering lost in this case.
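
As a small illustration, assuming Python's unicodedata module as a
stand-in for whatever the filesystem does internally, the round trip
for this particular pair looks like this:

```python
import unicodedata

composed = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

# Canonical decomposition splits it into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", composed)
assert decomposed == "a\u0308"

# Canonical composition puts the two code points back together.
recomposed = unicodedata.normalize("NFC", decomposed)
assert recomposed == composed
```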



I don't believe you. See below.


When have I ever said that Unicode meant Forced normalization?




Wrong.





Wrong. '\x61\xa8' in Latin1, when converted to UTF-8 (NFD), is still
'\x61\xc2\xa8'. You're mixing up DIAERESIS (U+00A8) and COMBINING
DIAERESIS (U+0308).
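
This can be checked with a short Python sketch, assuming the Latin1
byte for DIAERESIS is 0xA8 and using the standard unicodedata module.
U+00A8 has only a compatibility decomposition, so canonical
decomposition (NFD) leaves it alone:

```python
import unicodedata

latin1_bytes = b"\x61\xa8"               # 'a' followed by DIAERESIS in Latin1
text = latin1_bytes.decode("latin-1")    # 'a' + U+00A8

# NFD applies canonical decompositions only, and U+00A8 has none.
nfd = unicodedata.normalize("NFD", text)
assert nfd == text
assert "\u0308" not in nfd               # no COMBINING DIAERESIS appears

# The UTF-8 encoding of the normalized string is 61 C2 A8.
assert nfd.encode("utf-8") == b"\x61\xc2\xa8"
```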

I suspect this is why you've been yelling so much - you have a
fundamental misunderstanding about what normalization is actually doing.





See above as to why you're not losing the information you so fervently
believe you are.


People who insult others run the risk of looking like fools when
shown to be wrong.


Sure, it all depends on what level you need to evaluate text. If we're
talking about English paragraphs, then whitespace can be messed with.
When we're talking about Unicode strings, then the specific encoding
can be messed with. When talking about byte sequences, nothing can be
messed with.

In our case, when working on an HFS+ filesystem all you have to care
about is the Unicode string level. The specific encoding can be messed
with, and the client shouldn't care. Problems only arise when
attempting to interoperate with filesystems that work at the byte
sequence level.

The only information you lose when doing canonical normalization is
what the original byte sequence was. Sure, this is a problem when
working on a filesystem that cares about byte sequence, but it's not a
problem when working on a filesystem that cares about the Unicode
string.
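
A rough sketch of exactly what is lost, using Python's NFD as an
approximation (HFS+ uses its own variant of decomposition, so this is
illustrative only; nfd_bytes is a hypothetical helper):

```python
import unicodedata


def nfd_bytes(raw: bytes) -> bytes:
    """Decode UTF-8, apply canonical decomposition, re-encode as UTF-8."""
    return unicodedata.normalize("NFD", raw.decode("utf-8")).encode("utf-8")


name_a = "\u00e4".encode("utf-8")    # precomposed: b'\xc3\xa4'
name_b = "a\u0308".encode("utf-8")   # decomposed:  b'a\xcc\x88'

# Two distinct byte sequences go in...
assert name_a != name_b
# ...and one comes out, so the original bytes cannot be told apart.
assert nfd_bytes(name_a) == nfd_bytes(name_b) == b"a\xcc\x88"
```

A byte-oriented filesystem would see name_a and name_b as two names;
at the Unicode string level they are one.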

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com
