While reading through the HPET code I realized that the
computation of .mult variables could be done with less
lines of code, resulting in a 1.6% text size saving
for hpet.o
So I propose the following patch, which applies against
today's Linus -git tree.
From 0c6507e400e9ca5f7f14331e18f8c12baf75a9d3 Mon Sep 17 00:00:00 2001
From: Carlos R. Mafra <crmafra@ift.unesp.br>
Date: Mon, 5 May 2008 19:38:53 -0300
Subject: [PATCH] x86: Clean up computation of HPET .mult variables
The computation of clocksource_hpet.mult
tmp = (u64)hpet_period << HPET_SHIFT;
do_div(tmp, FSEC_PER_NSEC);
clocksource_hpet.mult = (u32)tmp;
can be streamlined if we note that it is equal to
clocksource_hpet.mult = div_sc(hpet_period, FSEC_PER_NSEC, HPET_SHIFT);
Furthermore, the computation of hpet_clockevent.mult
uint64_t hpet_freq;
hpet_freq = 1000000000000000ULL;
do_div(hpet_freq, hpet_period);
hpet_clockevent.mult = div_sc((unsigned long) hpet_freq,
NSEC_PER_SEC, hpet_clockevent.shift);
can also be streamlined with the observation that hpet_period and hpet_freq are
inverse to each other (in proper units).
So instead of computing hpet_freq and using (schematically)
div_sc(hpet_freq, 10^9, shift) we use the trick of calling with the
arguments in reverse order, div_sc(10^6, hpet_period, shift).
The different power of ten is due to frequency being in Hertz (1/sec)
and the period being in units of femtosecond. Explicitly,
mult = (hpet_freq * 2^shift)/10^9 (before)
mult = (10^6 * 2^shift)/hpet_period (after)
because hpet_freq = 10^15/hpet_period.
The comments in the code are also updated to reflect the changes.
As a result,
text data bss dec hex filename
2957 425 92 3474 d92 arch/x86/kernel/hpet.o
3006 425 92 3523 dc3 arch/x86/kernel/hpet.o.old
a 1.6% reduction in text size.
Signed-off-by: Carlos R. Mafra <...applied to x86.git for more testing, thanks Carlos. since you seem to be interested in HPET topics, what do you think about the patch below from akpm? It had a build failure with this config: http://redhat.com/~mingo/misc/config-Sun_May__4_09_41_21_CEST_2008.bad but the general cleanliness point Andrew raises is valid i think. Ingo -------------> From: Andrew Morton <akpm@linux-foundation.org> Should already be available via the hpet.h inclusion. Could go further, by defining the do-nothing stub in hpet.h as well, perhaps. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- drivers/char/rtc.c | 2 -- drivers/rtc/rtc-cmos.c | 1 - 2 files changed, 3 deletions(-) diff -puN drivers/char/rtc.c~rtc-remove-unneeded-declarations-of-hpet_rtc_interrupt drivers/char/rtc.c --- a/drivers/char/rtc.c~rtc-remove-unneeded-declarations-of-hpet_rtc_interrupt +++ a/drivers/char/rtc.c @@ -119,8 +119,6 @@ static irqreturn_t hpet_rtc_interrupt(in return 0; } #endif -#else -extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id); #endif /* diff -puN drivers/rtc/rtc-cmos.c~rtc-remove-unneeded-declarations-of-hpet_rtc_interrupt drivers/rtc/rtc-cmos.c --- a/drivers/rtc/rtc-cmos.c~rtc-remove-unneeded-declarations-of-hpet_rtc_interrupt +++ a/drivers/rtc/rtc-cmos.c @@ -52,7 +52,6 @@ #define hpet_rtc_timer_init() do { } while (0) #define hpet_register_irq_handler(h) 0 #define hpet_unregister_irq_handler(h) do { } while (0) -extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id); #endif struct cmos_rtc { _ --
I think I've found the problem. After fixing the rtc-cmos.o build your
.config produced yet another failure later on:
LD .tmp_vmlinux1
drivers/built-in.o: In function `v4l2_i2c_drv_attach_legacy':
tuner-core.c:(.text+0xc5a31): undefined reference to `v4l2_i2c_attach'
drivers/built-in.o: In function `tuner_command':
tuner-core.c:(.text+0xc6dc2): undefined reference to `v4l_printk_ioctl'
make: *** [.tmp_vmlinux1] Error 1
With the inclusion of hpet.h we don't have the "do-nothing stubs" for
the case !HPET_EMULATE_RTC as they are defined inside rtc.c
and rtc-cmos.c. However hpet_rtc_interrupt() is missing in those
stubs because it was removed with akpm's patch. So I added it.
My explanation for the build failure is as follows.
Your .config did not have HPET_EMULATE_RTC enabled but the code
in rtc-cmos.c wanted to use hpet_rtc_interrupt() anyways. But
looking in arch/x86/kernel/hpet.c this function is inside a
#ifdef CONFIG_HPET_EMULATE_RTC <-- line 465
hpet_rtc_interrupt() <--- line 679
#endif <--- line 716
so it should be used if and only if HPET_EMULATE_RTC is defined.
And the code in question which makes the build fail is inside a
if(is_hpet_enabled()) {
...
rtc_cmos_int_handler = hpet_rtc_interrupt;
...
}
and this would never be executed because with !HPET_EMULATE_RTC (as
in your .config) is_hpet_enabled() is defined to return 0.
So I think Andrew's patch should be the one below (I've folded
my correction into his patch, so I am Cc:ing him also), where my
modification is the adition inside the #ifndef CONFIG_HPET_EMULATE_RTC
I think one could do that too as a cleanup (to remove the do-nothing stubs
from rtc.c and rtc-cmos.c), but I am afraid to make a mistake which would
cause other build failures afterwards. So now I just wanted to understand
and fix the issue you suggested to me (for which I thank you).
Furthermore, the modification in rtc.c is ok because it was used only
if CONFIG_HPET_EMULATE_RTC is defined, but that can on...On Tue, 6 May 2008 15:51:25 -0300 I've lost the plot on this one. Could we please have a description of the problem which is being fixed? ie, the compiler (or linker?) output, and a description of why it is occurring? Thanks. --
dont have the build failure log around anymore, but the config is: http://redhat.com/~mingo/misc/config-Sun_May__4_09_41_21_CEST_2008.bad Ingo --
The build failure was this one
CC drivers/rtc/rtc-cmos.o
drivers/rtc/rtc-cmos.c: In function cmos_do_probe:
drivers/rtc/rtc-cmos.c:661: error: hpet_rtc_interrupt undeclared (first use in this function)
drivers/rtc/rtc-cmos.c:661: error: (Each undeclared identifier is reported only once
drivers/rtc/rtc-cmos.c:661: error: for each function it appears in.)
make[2]: *** [drivers/rtc/rtc-cmos.o] Error 1
make[1]: *** [drivers/rtc] Error 2
make: *** [drivers] Error 2
And that happened with Ingo's config above and your patch (copy & pasted) below,
------------->
From: Andrew Morton <akpm@linux-foundation.org>
Should already be available via the hpet.h inclusion.
Could go further, by defining the do-nothing stub in hpet.h as well, perhaps.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/char/rtc.c | 2 --
drivers/rtc/rtc-cmos.c | 1 -
2 files changed, 3 deletions(-)
diff -puN drivers/char/rtc.c~rtc-remove-unneeded-declarations-of-hpet_rtc_interrupt \
drivers/char/rtc.c
--- a/drivers/char/rtc.c~rtc-remove-unneeded-declarations-of-hpet_rtc_interrupt
+++ a/drivers/char/rtc.c
@@ -119,8 +119,6 @@ static irqreturn_t hpet_rtc_interrupt(in
return 0;
}
#endif
-#else
-extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id);
#endif
/*
diff -puN drivers/rtc/rtc-cmos.c~rtc-remove-unneeded-declarations-of-hpet_rtc_interrup \
t drivers/rtc/rtc-cmos.c
--- a/drivers/rtc/rtc-cmos.c~rtc-remove-unneeded-declarations-of-hpet_rtc_interrupt
+++ a/drivers/rtc/rtc-cmos.c
@@ -52,7 +52,6 @@
#define hpet_rtc_timer_init() do { } while (0)
#define hpet_register_irq_handler(h) 0
#define hpet_unregister_irq_handler(h) do { } while (0)
-extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id); <-----
#endif |
...yeah, that's a different, known problem with Kconfig structure of the media drivers. Ingo --
Heh, thanks! I will try hard to understand what is going on with --
There's helper functions that should be used called clocksource_hz2mult and one called clocksource_khz2mult. I think they're more accurate than using div_sc. Daniel --
Ok, but they take the frequency as the input while the "natural" variable we have is the period, because that's what we get from the hardware (at least for the HPET, if I understood it correctly). If I want to use clocksource_hz2mult then I have to do one more operation (to find the frequency) before calling it (and that's what the other part of the patch which you didn't quote is actually doing). So the savings in my patch is due to using the period directly, and not the frequency. That's what my idea was, so if you object then my attempt was a failure and should be forgotten :-) Or maybe I should create a clocksource_period2mult to replace clocksource_hz2mult and save the extra operation in more places too? --
The one concern I have is the rounding that is done in the clocksource_hz2mult(). The div_sc doesn't include it .. You could add a clocksource_period2mult(), that would help out any one later that has a period instead of hz .. Daniel --
Hmm, clocksource_period2mult() would be just a rename of div_sc(), see for example how clocksource_hpet.mult is computed with my patch: clocksource_hpet.mult = div_sc(hpet_period, FSEC_PER_NSEC, HPET_SHIFT); However, hpet_clockevent.mult would also require the exchange of the first two arguments, due to the different definition of 'mult' in clockchips.h and clocksource.h So I would like to ask if this different definition of mult variables in clockevent versus clocksource is intentional or not. And do you agree that your first suggestion of using clocksource_hz2mult makes the code a bit bigger due to the extra computation of the frequency? My patch saves 49 bytes, and I thought that being careful in the code comments would make this change a safe thing (because everyone will understand how the computation is done and that there is a difference in the definitions). --
clockevents convert nanoseconds to cycles, clocksources convert cycles I agree, but the size in this case takes a back seat to accuracy. Daniel --
So I wanted to check the accuracy of div_sc versus clocksource_hz2mult
for the computations in arch/x86/kernel/hpet.c, as you made me curious
about it.
First of all, note that hpet_clockevent.mult can not be computed
using clocksource_hz2mult (as you suggested in the first reply) due
to the different definitions for .mult in clockevents.h and
clocksource.h, so I will skip this one and discuss
clocksource_hpet.mult instead.
I applied the test patch below to implement the different ways of
computing it and this is what I got in the dmesg:
clocksource_hpet.mult orig = 292935555
clocksource_hpet.mult walker = 292935555
clocksource_hpet.mult my patch = 292935555 hpet_period = 69841279
so there is no difference at all at this precision level!
Out of curiosity I computed the "exact" value by hand and
got 292935555.9 so we can't even talk about "error" because
the difference is in the extra digit not considered with
the precision I used to print the results.
And if you look at the patch to do computation using
clocksource_hz2mult() you will see that it is uglier
than the simple call to div_sc(), as you have to
consider an extra variable hpet_freq and do an extra
do_div() together with an extra rounding.
So I think you will be ok with the patch now, right? :-)
Thanks,
Carlos
PS: I also computed the "exact" value for hpet_clockevent.mult
by hand and got
hpet_clockevent.mult = 61496114.58 (exact)
hpet_clockevent.mult = 61496114 (my patch)
hpet_clockevent.mult = 61496110 (original)
so my patch in fact improves the accuracy (due to the fact that
it removes the extra do_div() to compute the frequency, which
originaly did not have the rounding trick.)
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 9b5cfcd..ee07694 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -326,6 +326,7 @@ static int hpet_clocksource_register(void)
{
u64 tmp, start, now;
cycle_t t1;
+ uint64_t hpet_freq;
/* Start the counter */...that's reassuring, thanks Carlos - patch is still queued up in x86.git. (with a v2.6.27-ish merge target, provided it all goes well.) Ingo --
Not yet ..
I used you patch with some formatting changes and I get,
clocksource_hpet.mult walker = 292935555
clocksource_hpet.mult my patch = 292935555
Same as yours .. just checking ;)
Now I modify the shift value from 22 to 25 and I get this,
clocksource_hpet.mult walker = 2343484437
clocksource_hpet.mult my patch = 2343484446
I don't think this is due to rounding .. I'm not really sure why they're
different .. I would assume the clocksource_hz2mult is more correct, but
feel free to check into it if you want..
The reason I'm messing with the shift is cause 22 is actually too low..
For your period it can be 25, which would give better accuracy.
If you want to check the shift calculation I used it looks like this,
static inline u32 clocksource_hz2shift(u32 bits, u32 hz)
{
u64 temp;
for (; bits > 0; bits--) {
temp = (u64) NSEC_PER_SEC << bits;
do_div(temp, hz);
if ((temp >> 32) == 0)
break;
}
return bits;
}
Daniel
--Heh, I trust those numbers because they imply that my patch is good :-)
Note that my hpet_period = 69841279 * 10^{-15} secs, then
mult/2^shift = ns/cyc = 69.841279 <-- period in nanoseconds
so, using Kcalc I get for shift 22 (the "exact" value, no rounding)
mult = 2^22 * 69.841279 = 292,935,555.875
while for shift = 25 I get 8 times that, or
mult = 2,343,484,447
So comparing with the numbers you quoted from your experiments
you will realize that the value you got with my patch is actually
10 times better, so your assumption about clocksource_hz2mult
Is there some formula to decide which shift value is better to
a given period? IOW, how did you arrive at 25? Note that I
--If that's the case then we really need to correct clocksource_hz2mult() .. Since that function is what we use to calculate the mult in most cases. Daniel --
