On Wed, 3 Oct 2007, Linus Torvalds wrote:Btw, this definitely doesn't happen for me, either on x86-64 or plain x86. The x86 thing I tested was Fedora 8 testing (ie not even some stable setup), so I wonder what experimental compiler you have. Your compiler generates movl -16(%ebp),%edx movl (%edx),%edi /* this is _totally_ bogus! */ incl %edx movl %edx,-16(%ebp) movl %edi,%ecx testb %cl,%cl je ... while I get (gcc version 4.1.2 20070925 (Red Hat 4.1.2-28)): movl -16(%ebp), %eax # p, movzbl (%eax), %edi #, c /* not bogus! */ movl %edi, %edx # c, testb %dl, %dl # je .L64 #, incl %eax # movsbl %dl,%ebx #, D.12414 movl %eax, -16(%ebp) #, p where the difference (apart from doing the increment differently and different register allocation) is that I have a "movzbl" (correct), while you have a "movl" (pure and utter crap). I *suspect* that the compiler bug is along the lines of: (a) start off with movzbl (b) notice that the higher bits don't matter, because nobody subsequently uses them (c) turn the thing into just a byte move. (d) make the totally incorrect optimization of using a full 32-bit move in order to avoid a partial register access stall and the thing is, that final optimization can actually speed things up (although it can also slow things down for any access that crosses a cache sector boundary - 8/16 bytes), but it's seriously bogus, exactly because it can cause an invalid access to the three next bytes that may not even exist. Linus -
| Ingo Molnar | Re: x86: 4kstacks default |
| Stephen Rothwell | Re: Announce: Linux-next (Or Andrew's dream :-)) |
| Trent Piepho | [PATCH] [POWERPC] Improve (in|out)_beXX() asm code |
| Rafael J. Wysocki | [Bug #10919] [regression] display dimming is slow and laggy - Acer Travelmate 661lci |
git: | |
| Linus Torvalds | Re: iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49 |
| Andrew Morton | Re: [BUG] New Kernel Bugs |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
