From: Randy Dunlap <randy.dunlap@oracle.com> Use a round_jiffies() variant to reduce overhead of timer wakeups. This causes the ipmi timers to occur at the same time as other timers (per CPU). Typical powertop for /ipmi/ (2.6.31, before patch): 11.4% (247.4) kipmi0 : __mod_timer (process_timeout) 0.6% ( 13.1) <interrupt> : ipmi_si 0.5% ( 10.0) <kernel core> : __mod_timer (ipmi_timeout) powertop for /ipmi/, 2.6.31, after patch: 10.8% (247.6) kipmi0 : __mod_timer (process_timeout) 0.3% ( 6.9) <interrupt> : ipmi_si 0.0% ( 1.0) <kernel core> : __mod_timer (ipmi_timeout) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Corey Minyard <minyard@acm.org> Cc: openipmi-developer@lists.sourceforge.net --- drivers/char/ipmi/ipmi_msghandler.c | 7 +++++-- drivers/char/ipmi/ipmi_si_intf.c | 3 ++- 2 files changed, 7 insertions(+), 3 deletions(-) --- lnx-2632-rc5.orig/drivers/char/ipmi/ipmi_msghandler.c +++ lnx-2632-rc5/drivers/char/ipmi/ipmi_msghandler.c @@ -39,6 +39,7 @@ #include <linux/spinlock.h> #include <linux/mutex.h> #include <linux/slab.h> +#include <linux/timer.h> #include <linux/ipmi.h> #include <linux/ipmi_smi.h> #include <linux/notifier.h> @@ -4057,7 +4058,8 @@ static void ipmi_timeout(unsigned long d ipmi_timeout_handler(IPMI_TIMEOUT_TIME); - mod_timer(&ipmi_timer, jiffies + IPMI_TIMEOUT_JIFFIES); + mod_timer(&ipmi_timer, + round_jiffies_up(jiffies + IPMI_TIMEOUT_JIFFIES)); } @@ -4424,7 +4426,8 @@ static int ipmi_init_msghandler(void) #endif /* CONFIG_PROC_FS */ setup_timer(&ipmi_timer, ipmi_timeout, 0); - mod_timer(&ipmi_timer, jiffies + IPMI_TIMEOUT_JIFFIES); + mod_timer(&ipmi_timer, + round_jiffies_up(jiffies + IPMI_TIMEOUT_JIFFIES)); atomic_notifier_chain_register(&panic_notifier_list, &panic_block); --- lnx-2632-rc5.orig/drivers/char/ipmi/ipmi_si_intf.c +++ lnx-2632-rc5/drivers/char/ipmi/ipmi_si_intf.c @@ -1068,7 +1068,8 @@ static ...
On Wed, 21 Oct 2009 10:28:22 -0700 while it is nice that ipmi_si ande the timer wake up less now.... it's still rather sad that the 247.6 from kipmi0 are still there..... those are a much much bigger issue -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
obviously :) Randy, any idea where those are coming from ? Auke --
obviously from kipmi thread :(
drivers/char/ipmi/ipmi_si_intf.c::ipmi_thread():
static int ipmi_thread(void *data)
{
struct smi_info *smi_info = data;
unsigned long flags;
enum si_sm_result smi_result;
set_user_nice(current, 19);
while (!kthread_should_stop()) {
spin_lock_irqsave(&(smi_info->si_lock), flags);
smi_result = smi_event_handler(smi_info, 0);
spin_unlock_irqrestore(&(smi_info->si_lock), flags);
if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
; /* do nothing */
else if (smi_result == SI_SM_CALL_WITH_DELAY)
schedule();
else
schedule_timeout_interruptible(1); <-----
}
return 0;
}
calls setup_timer_on_stack(), which calls process_timeout().
From what I recall (probably 2 years ago), [older] ipmi hardware does not
generate event interrupts, so it has to be polled.
Corey, can you elaborate on this?
---
~Randy
--
Certainly. Yes, some (probably most) IPMI hardware does not use interrupts, and unfortunately, it's not just older machines. The driver used to poll more slowly, but in many cases the performance was unacceptable. kipmid is only started if the hardware doesn't support interrupts, so only users with sub-standard hardware have to suffer with this problem. Thanks for the patch, Randy. -corey --
... but now it burns quite a bit of power (I'd not be surprised if it is 10 Watts extra on a 70W server) is there any way to poll slowly until there is active ipmi traffic, during which we can then poll a bit faster. ... and then go back to slow polling when there is an ipmi idle period ? --
I believe that's what the thread does already. Depending on what userspace apps are generating IPMI requests though, there may not be a whole lot of time between requests. Dell OpenManage software does a poll of the IPMI sensors, SEL logs, etc. at regular intervals, on the order of minutes between runs, but during each run there's almost always an outstanding IPMI command. The difference was several minutes during startup, and 15 minutes -> 1.5 minutes during a firmware flash, with the kipmi0 thread present. I've heard requests to have the some userspace control over the start/stop of that thread, so it could get started only when there are more time-critical userspace requests to be made (such as during firmware update), but I've not worked on that part. We'd need an interface (sysfs file?) to start/stop the thread, and could adjust userspace to use that (Dell OpenManage, ipmitool, sblim-cmpi-ipmi, ...). Though I'm really curious that HP has a KCS+interrupt controller available. That gives me hope that the industry-wide problems which prevented Dell from doing likewise a couple years ago are now resolved. I'll have my team look into it again. -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux --
In addition to userland work, the upper layer of the driver polls for events once a second. This is another unfortunate IPMI design flaw. There is a flag (that will thus cause an interrupt) that tells you if the event queue is full, but it doesn't tell you if there is an event in the queue, only if its full. So you have to poll for events. -corey --
Can you expand on "industry-wide problems"? (Forced to share interrupts with a high rate device? Design your gizmo to support MSI/MSI-x. Add MSI support to ipmi_si if necessary...) As far as I can tell, HP has never shipped an interrupt-less BMC. Their current iLO2 BMC is KCS + interrupt. Their older design was SMIC + interrupt. Why does everyone use KCS when BT is obviously better? Can you have your team look into that as well? (Among the various goals here, I assume that BT -- with a single interrupt and a DMA transfer instead of shuffling bytes over I/O ports -- would cost less power. Not that the members of that list will receive this message: it bounces nonmembers.) It appears that BT designs usually include both BT & KCS programming interfaces to the same BMC. So perhaps there is some increased silicon complexity -- but c'mon, we're talking about a couple of silicon library macros here. Also, I see evidence of some Sun BMCs that have BT without KCS, so apparently it isn't required to pair them. Pairing is probably done for the benefit of certain dumb client software that assumes all BMCs are KCS. I say to heck with that SW. Any app running under an OS that provides an adequate driver [which includes at least Linux, Solaris, and -- I assume -- Win] shouldn't be thinking about the BMC programming
This is an industry where pennies matter, apparently. My personal preference would be to use the I2C based standard interface. That actually doesn't perform too badly, it's probably cheaper than KCS since you already have I2C, anyway, and the I2C interfaces are generally tied to an interrupt. The trouble is that the only hardware implementation of this I know of seems to be poorly done, but that mostly affects trying to use the ethernet interface and the local interface at the same time. Of course, the driver for I2C is not yet in the standard kernel as it requires a fairly massive I2C driver rewrite to allow the IPMI driver to do it's panic-time operations. BT would be better for performance, I guess, but it's yet another interface to tie in, and hanging this off an existing bus seems like a sensible thing to do. And performance is really not an issue for IPMI. -corey --
Well yeah. Also an industry where a small leg up on some competitor can That seems like a fairly compelling argument against. It's not in the "standard kernel" (i.e. of Linux); it's probably not in other OSes. So now you're asking the software side of the industry to spend another BT is already supported by existing code (e.g. OpenIPMI driver). It can almost certainly be presented on a PCI card (at PCI-mediated addresses not some legacy ISA port range). So what do you mean "yet another interface to tie in"? It's just a blob of behavior to be situated on a PCI card (or mboard implementation in PCI space). I2C? Volunteering to put it on I2C is like volunteering to put the new branch office on a 1200 baud modem because you're suspicious of all that newfangled fiber stuff. Huh? Regarding performance, we've had real struggles with IPMI performance aspects. There were situations where kipmid failed to start or didn't do its job correctly, resulting in very slow IPMI operations. Seemingly harmless -- except that some OEM management agent daemon was making decisions based on feedback from IPMI. The delays caused it to get into some watchdog code that decided the system was hung and issued a reboot. On other systems, the OEM management agents poll IPMI so persistently that kipmid ends up taking a significant fraction of a CPU (nearly 100% of a CPU in bursts, long term average of 40%+). Since the Linux "Console OS" in COS-bearing versions of ESX is by design limited to a single CPU, and since it does have other tasks to do, this causes some bad bottlenecks. These things I mention are our bugs, should be fixed, are not necessarily the "fault" of IPMI. Yet they would have no noticable impact if the BMC hardware was speaking BT. The slow operations which (when magnified by 1000s of calls) took up most of a CPU would instead take up a barely measurable portion of a CPU. I don't suppose my viewpoint on this is going to be popular on lkml, but it is ...
Corey Minyard & Randy Dunlap wrote: Randy>> From what I recall (probably 2 years ago), [older] ipmi hardware Randy>> does not generate event interrupts, so it has to be polled. Randy>> Randy>> Corey, can you elaborate on this? Corey> Certainly. Yes, some (probably most) IPMI hardware does not use Corey> interrupts, and unfortunately, it's not just older machines. Corey> The driver used to poll more slowly, but in many cases the Corey> performance was unacceptable. Corey> Corey> kipmid is only started if the hardware doesn't support Corey> interrupts, so only users with sub-standard hardware have to Corey> suffer with this problem. Regrettably, of the "big three" in the "PC Server" world, only HP's iLO2 BMC supports interrupts. Dell's DRAC4 & 5 don't, IBM's ASM, RSA, etc. don't. Also (at least out of a sample of one) SuperMicro also doesn't have an interrupt. They also have all settled on the KCS interface, which dribbles one character through per non-interrupt. So sad. Dell's DRAC3 had a BT BMC which transferred whole IPMI packets via DMA _and_ had an interrupt. HP's ancient SMIC equipment also had an interrupt (but that's also char-by-char, and their current KCS has an interrupt, so at least they haven't regressed). I've guessed that some chip vendor must have come out with a Really Cheep KCS implementation and drove every other implementation out of the market. :-(
