Hi all,
when trying to understand how Bogomips are implemented I have found
bug in arch/i386/lib/delay.c file, delay_loop function
/* simple loop based delay: */
static void delay_loop(unsigned long loops)
{
int d0;
__asm__ __volatile__(
"\tjmp 1f\n"
".align 16\n"
"1:\tjmp 2f\n"
".align 16\n"
"2:\tdecl %0\n\tjns 2b"
:"=&a" (d0)
:"0" (loops));
}
The function fails for loops > 2^31+1. It because SF is set when dec
returns numbers > 2^31
The fix is to use jnz instruction instead of jns (and add one decl
instruction to the end to have exactly the same number of loops as in
original version):
__asm__ __volatile__(
"\tjmp 1f\n"
".align 16\n"
"1:\tjmp 2f\n"
".align 16\n"
"2:\tdecl %0\n\tjnz 2b\n"
"decl %0"
:"=&a" (d0)
:"0" (loops));
IMHO, d0 is not needed at all so that we can further simplify the code:
static void delay_loop(unsigned long loops)
__asm__ __volatile__(
"\tjmp 1f\n"
".align 16\n"
"1:\tjmp 2f\n"
".align 16\n"
"2:\tdecl %0\n\tjnz 2b\n"
"decl %0"
:/*we don't need output */
:"a" (loops));
}
I will attach three small C-program to test it
delay-orig.c - original loop from kernel source code
delay-fixed.c - fixed loop
delay-fixed1.c - fixed loop without d0 variable
Outputs:
============== delay-orig.c ==================
time delay-orig 2147483649
loops 2147483649
loops 2147483649
do -2147483648
real 0m0.002s
user 0m0.000s
sys 0m0.000s
================== delay-fixed.c =============
time delay-fixed 2147483649
loops 2147483649
loops 2147483649
do -1
real 0m1.025s
user 0m1.024s
sys 0m0.000s
========== delay-fixed1.c =====================
time delay-fixed1 2147483649
loops 2147483649
loops 2147483649
real 0m1.073s
user 0m1.060s
sys 0m0.004s
and update kernel source file arch/i386/lib/delay.c. Please let me
know if these modifications make sense.
Thanks a lot
Jiri
very interesting! Could you send a kernel patch for it please: diff -up arch/x86/lib/delay_32.c.orig arch/x86/lib/delay_32.c Ingo --
It is a long time since I have hacked that file, but you should definitely make sure that the function is never called with a zero argument. In such case, the original version made just a single pass, but your version makes 2^32 of them. Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth Homo homini lupus, frater fratri lupior, bohemus bohemo lupissimus. --
Hi Ingo, hi Martin! As Martin correctly pointed out, my code will fail when loops=0. So I have added a fix for this. The patch file is attached: diff -up arch/x86/lib/delay_32.c-orig arch/x86/lib/delay_32.c > delay_32.c_patch I'm attaching also both kernel source codes (delay_32.c-orig and delay_32.c) for reference. Thanks Jiri
applied to tip/x86/delay, thanks Jiri! If it passes testing it will show up in the v2.6.27 kernel. [ A few small patch format nits: the first hunk of the patch didnt apply, it was whitespace-damaged - i hand-merged that. The assembly had non-standard formatting - nowadays we try to format the assembly so that it looks nice in the .c file, not i the .s file. I fixed that too. Also, the best way is to send fixes in the format below, with a nice commit description, followed by a signed-off-by line. I fixed that as well. ] please double-check the end result. You can pick up tip/master via: http://people.redhat.com/mingo/tip.git/README do "git-log -1 e01b70ef" or "git-log arch/x86/lib/delay_32.c" to see your commit. Ingo -------------> commit e01b70ef3eb3080fecc35e15f68cd274c0a48163 Author: Jiri Hladky <hladky.jiri@googlemail.com> Date: Mon Jun 2 12:00:19 2008 +0200 x86: fix bug in arch/i386/lib/delay.c file, delay_loop function when trying to understand how Bogomips are implemented I have found a bug in arch/i386/lib/delay.c file, delay_loop function. The function fails for loops > 2^31+1. It because SF is set when dec returns numbers > 2^31. The fix is to use jnz instruction instead of jns (and add one decl instruction to the end to have exactly the same number of loops as in original version). Martin Mares observed: > It is a long time since I have hacked that file, but you should definitely > make sure that the function is never called with a zero argument. In such > case, the original version made just a single pass, but your version > makes 2^32 of them. fixed that. Signed-off-by: Ingo Molnar <mingo@elte.hu> diff --git a/arch/x86/lib/delay_32.c b/arch/x86/lib/delay_32.c index d710f2d..ef69131 100644 --- a/arch/x86/lib/delay_32.c +++ b/arch/x86/lib/delay_32.c @@ -3,6 +3,7 @@ * * Copyright (C) 1993 Linus Torvalds ...
