On Jan 21, 2008, at 2:41 PM, Linus Torvalds wrote:

I believe I already responded to the issue of hashing. In summary, just re-define your hash function to convert the string to a specific encoding. Sure, you'll lose some speed, but we're already assuming that it's worth taking a speed hit in order to treat filenames as strings (please don't argue this point, it's an opinion, not a factual statement, and I'm not necessarily saying I agree with it, I'm just saying it's valid).

Perhaps that is the reason, I don't know (neither do you, you're just guessing). However, my point still stands - as long as the string stays canonically equivalent, it doesn't matter to me if the filesystem changes the encoding, since I'm working at the string level. Someone has to look at the octets, but it doesn't have to be me. As long as I use unicode-aware libraries and such, I can let the underlying system care about the byte order and my code will be clean.

It does? Why on earth should it do that? Filename doesn't contribute to the listed filesize on OS X.

kevin@KBLAPTOP:~> echo foo > foo; echo foo > foobar
kevin@KBLAPTOP:~> ls -l foo*
-rw-r--r-- 1 kevin kevin 4 Jan 21 14:50 foo
-rw-r--r-- 1 kevin kevin 4 Jan 21 14:50 foobar

It would be singularly stupid for the filesize to reflect the filename, especially since this means you would report different filesizes for hardlinks.

Visible at some level, sure, but not visible at the level my code works on. And thus, I don't have to care about it.

I'm not sure what you mean. The byte sequence is different from Latin1 to UTF-8 even if you use NFC, so I don't think, in this case, it makes any difference whether you use NFC or NFD. Yes, the codepoints are the same in Latin1 and UTF-8 if you use NFC, but that's hardly relevant. Please correct me if I'm wrong, but I believe Latin1->UTF-8->Latin1 conversion will always produce the same Latin1 text whether you use NFC or NFD.

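The round-trip claim can be checked with a short Python sketch. The helper name latin1_roundtrip is made up for illustration, and it rests on one assumption: the conversion back to Latin1 recomposes the text to NFC first, since the combining marks that NFD produces have no Latin1 encoding of their own.

```python
import unicodedata

def latin1_roundtrip(data: bytes, form: str) -> bytes:
    """Latin1 -> UTF-8 (normalized to `form`) -> Latin1. Hypothetical helper."""
    text = data.decode("latin-1")
    # What a filesystem might store: UTF-8 in the requested normalization form.
    stored = unicodedata.normalize(form, text).encode("utf-8")
    # Reading back: decode, recompose (the assumption), re-encode as Latin1.
    read_back = stored.decode("utf-8")
    return unicodedata.normalize("NFC", read_back).encode("latin-1")

# Try every byte value Latin1 can represent, under both forms.
every_latin1_byte = bytes(range(256))
for form in ("NFC", "NFD"):
    assert latin1_roundtrip(every_latin1_byte, form) == every_latin1_byte
```

The stored UTF-8 octets differ between the two forms (and from the Latin1 octets), but the text that comes back out is the same either way, which is the point about working at the string level.
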
The only reason it's particularly inconvenient is because it's different from what most other systems picked. And if you want to blame someone for that, blame Unicode for having so many different normalization forms.

-Kevin Ballard

--
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com
