On Jan 22, 2008 10:42 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:

*thanks* for these notes. Very useful, and...

...

I find the above amusing -- different worlds we live in. Programming webapps means that 90% of the code deals with a bit of metaprogramming (with lots of string manipulation) to talk SQL to a backend, and then doing lots of string manipulation on the data the DB returns, which ends up in humongous strings of goop otherwise known as HTML+CSS+JS.

After waiting for the DB to return data, over 50% of CPU time is spent in regexes, concatenations, counting words, array ops, etc. So it is pretty significant.

So now I have to worry about the cost and correctness of stuff that I took for granted in the pre-Unicode days - strtolower() can be quite expensive and... buggy! But that's mainly due to Unicode, not UTF-8. I think the only slowdown I can pin on UTF-8 is in counting chars, and probably slower regexes.

Not that I deal with the C implementation of any of this stuff -- and so happy about it! ;-)

</offtopic>

(...)

I had a few issues with Perl v5.6's utf-8 handling, which wasn't binary safe (fread() to a fixed-length buffer would break the input if a unicode char landed across the boundary - ouch!) -- made me think that you couldn't do this in binary safe ways.

So I tend to tell Perl to treat files as binary, and switch to utf-8 in specially chosen spots. I suspect that 5.8 is a bit saner about this, but I'm not taking chances.

cheers,

martin
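
To illustrate the buffer-boundary problem described above, here is a minimal sketch in Python rather than Perl. The function names and the sample string are made up for illustration, and this is not a reconstruction of what Perl 5.6 did internally. It reads UTF-8 bytes in fixed-size chunks and compares decoding each chunk on its own with keeping the data binary and decoding through a stateful incremental decoder.

```python
import codecs
import io


def decode_chunks_naive(stream, chunk_size):
    """Decode every fixed-size chunk independently (hypothetical helper).

    A multi-byte character that straddles two chunks is damaged.
    """
    pieces = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # errors="replace" makes the damage visible as U+FFFD instead of raising.
        pieces.append(chunk.decode("utf-8", errors="replace"))
    return "".join(pieces)


def decode_chunks_incremental(stream, chunk_size):
    """Read raw bytes, but decode with a decoder that remembers partial characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Incomplete trailing bytes are held back until the next chunk arrives.
        pieces.append(decoder.decode(chunk))
    # final=True raises if the input ended in the middle of a character.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


text = "naïve café"          # sample input, chosen so that "ï" straddles a boundary
data = text.encode("utf-8")  # "ï" and "é" take two bytes each

naive = decode_chunks_naive(io.BytesIO(data), 3)
safe = decode_chunks_incremental(io.BytesIO(data), 3)

print(len(text), len(data))  # characters vs. bytes
print(naive == text, safe == text)
```

With a chunk size of 3, the first chunk ends after the first byte of "ï", so the naive version produces replacement characters while the incremental version returns the original string. The same data is also longer in bytes than in characters, which is why counting chars in UTF-8 means walking the string rather than reading off a byte length.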
