RE: [Openipmi-developer] [Discuss] [PATCH] ipmi: use round_jiffies on timers to reduce timer overhead/wakeups

Previous thread: [PATCH 1/1] Stating driver comedi adl_pci8164 coding style fix by BRAGA, Bruno on Wednesday, October 21, 2009 - 9:50 am. (1 message)

Next thread: [PATCH] MAINTAINERS: openipmi list is moderated by Randy Dunlap on Wednesday, October 21, 2009 - 10:40 am. (2 messages)
From: Randy Dunlap
Date: Wednesday, October 21, 2009 - 10:28 am

From: Randy Dunlap <randy.dunlap@oracle.com>

Use a round_jiffies() variant to reduce overhead of timer
wakeups.  This causes the ipmi timers to occur at the same
time as other timers (per CPU).

Typical powertop for /ipmi/ (2.6.31, before patch):
  11.4% (247.4)            kipmi0 : __mod_timer (process_timeout) 
   0.6% ( 13.1)       <interrupt> : ipmi_si 
   0.5% ( 10.0)     <kernel core> : __mod_timer (ipmi_timeout) 

powertop for /ipmi/, 2.6.31, after patch:
  10.8% (247.6)            kipmi0 : __mod_timer (process_timeout) 
   0.3% (  6.9)       <interrupt> : ipmi_si 
   0.0% (  1.0)     <kernel core> : __mod_timer (ipmi_timeout)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc:	Corey Minyard <minyard@acm.org>
Cc:	openipmi-developer@lists.sourceforge.net
---
 drivers/char/ipmi/ipmi_msghandler.c |    7 +++++--
 drivers/char/ipmi/ipmi_si_intf.c    |    3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

--- lnx-2632-rc5.orig/drivers/char/ipmi/ipmi_msghandler.c
+++ lnx-2632-rc5/drivers/char/ipmi/ipmi_msghandler.c
@@ -39,6 +39,7 @@
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
+#include <linux/timer.h>
 #include <linux/ipmi.h>
 #include <linux/ipmi_smi.h>
 #include <linux/notifier.h>
@@ -4057,7 +4058,8 @@ static void ipmi_timeout(unsigned long d
 
 	ipmi_timeout_handler(IPMI_TIMEOUT_TIME);
 
-	mod_timer(&ipmi_timer, jiffies + IPMI_TIMEOUT_JIFFIES);
+	mod_timer(&ipmi_timer,
+		round_jiffies_up(jiffies + IPMI_TIMEOUT_JIFFIES));
 }
 
 
@@ -4424,7 +4426,8 @@ static int ipmi_init_msghandler(void)
 #endif /* CONFIG_PROC_FS */
 
 	setup_timer(&ipmi_timer, ipmi_timeout, 0);
-	mod_timer(&ipmi_timer, jiffies + IPMI_TIMEOUT_JIFFIES);
+	mod_timer(&ipmi_timer,
+		round_jiffies_up(jiffies + IPMI_TIMEOUT_JIFFIES));
 
 	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
 
--- lnx-2632-rc5.orig/drivers/char/ipmi/ipmi_si_intf.c
+++ lnx-2632-rc5/drivers/char/ipmi/ipmi_si_intf.c
@@ -1068,7 +1068,8 @@ static ...
From: Arjan van de Ven
Date: Wednesday, October 21, 2009 - 11:42 am

On Wed, 21 Oct 2009 10:28:22 -0700

while it is nice that ipmi_si ande the timer wake up less now.... it's
still rather sad that the 247.6 from kipmi0 are still there..... those
are a much much bigger issue

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Kok, Auke
Date: Wednesday, October 21, 2009 - 11:49 am

obviously :)

Randy, any idea where those are coming from ?

Auke


--

From: Randy Dunlap
Date: Wednesday, October 21, 2009 - 1:03 pm

obviously from kipmi thread :(

drivers/char/ipmi/ipmi_si_intf.c::ipmi_thread():

static int ipmi_thread(void *data)
{
	struct smi_info *smi_info = data;
	unsigned long flags;
	enum si_sm_result smi_result;

	set_user_nice(current, 19);
	while (!kthread_should_stop()) {
		spin_lock_irqsave(&(smi_info->si_lock), flags);
		smi_result = smi_event_handler(smi_info, 0);
		spin_unlock_irqrestore(&(smi_info->si_lock), flags);
		if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
			; /* do nothing */
		else if (smi_result == SI_SM_CALL_WITH_DELAY)
			schedule();
		else
			schedule_timeout_interruptible(1); <-----
	}
	return 0;
}


calls setup_timer_on_stack(), which calls process_timeout().

From what I recall (probably 2 years ago), [older] ipmi hardware does not
generate event interrupts, so it has to be polled.

Corey, can you elaborate on this?


---
~Randy
--

From: Corey Minyard
Date: Wednesday, October 21, 2009 - 1:22 pm

Certainly.  Yes, some (probably most) IPMI hardware does not use
interrupts, and unfortunately, it's not just older machines.  The driver
used to poll more slowly, but in many cases the performance was
unacceptable.

kipmid is only started if the hardware doesn't support interrupts, so
only users with sub-standard hardware have to suffer with this problem.

Thanks for the patch, Randy.

-corey
--

From: Arjan van de Ven
Date: Wednesday, October 21, 2009 - 1:57 pm

... but now it burns quite a bit of power (I'd not be surprised if it is 10 Watts extra on a 70W server)

is there any way to poll slowly until there is active ipmi traffic, during which we can then poll a bit faster.
... and then go back to slow polling when there is an ipmi idle period ?
--

From: Matt Domsch
Date: Wednesday, October 21, 2009 - 7:50 pm

I believe that's what the thread does already.  Depending on what
userspace apps are generating IPMI requests though, there may not be a
whole lot of time between requests.  Dell OpenManage software does a
poll of the IPMI sensors, SEL logs, etc. at regular intervals, on the
order of minutes between runs, but during each run there's almost
always an outstanding IPMI command.

The difference was several minutes during startup, and 15 minutes ->
1.5 minutes during a firmware flash, with the kipmi0 thread present.
I've heard requests to have the some userspace control over the
start/stop of that thread, so it could get started only when there are
more time-critical userspace requests to be made (such as during firmware
update), but I've not worked on that part.  We'd need an interface
(sysfs file?) to start/stop the thread, and could adjust userspace to
use that (Dell OpenManage, ipmitool, sblim-cmpi-ipmi, ...).

Though I'm really curious that HP has a KCS+interrupt controller
available.  That gives me hope that the industry-wide problems which
prevented Dell from doing likewise a couple years ago are now
resolved.  I'll have my team look into it again.


-- 
Matt Domsch
Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
--

From: Corey Minyard
Date: Thursday, October 22, 2009 - 9:45 am

In addition to userland work, the upper layer of the driver polls for 
events once a second.  This is another unfortunate IPMI design flaw.  
There is a flag (that will thus cause an interrupt) that tells you if 
the event queue is full, but it doesn't tell you if there is an event in 
the queue, only if its full.  So you have to poll for events.

-corey
--

From: Bela Lubkin
Date: Thursday, October 22, 2009 - 12:12 pm

Can you expand on "industry-wide problems"?  (Forced to share
interrupts with a high rate device?  Design your gizmo to
support MSI/MSI-x.  Add MSI support to ipmi_si if necessary...)

As far as I can tell, HP has never shipped an interrupt-less
BMC.  Their current iLO2 BMC is KCS + interrupt.  Their older
design was SMIC + interrupt.

Why does everyone use KCS when BT is obviously better?  Can
you have your team look into that as well?  (Among the various
goals here, I assume that BT -- with a single interrupt and a
DMA transfer instead of shuffling bytes over I/O ports -- would
cost less power.  Not that the members of that list will
receive this message: it bounces nonmembers.)

It appears that BT designs usually include both BT & KCS
programming interfaces to the same BMC.  So perhaps there is
some increased silicon complexity -- but c'mon, we're talking
about a couple of silicon library macros here.

Also, I see evidence of some Sun BMCs that have BT without KCS,
so apparently it isn't required to pair them.  Pairing is
probably done for the benefit of certain dumb client software
that assumes all BMCs are KCS.  I say to heck with that SW.
Any app running under an OS that provides an adequate driver
[which includes at least Linux, Solaris, and -- I assume --
Win] shouldn't be thinking about the BMC programming

From: Corey Minyard
Date: Thursday, October 22, 2009 - 1:02 pm

This is an industry where pennies matter, apparently.

My personal preference would be to use the I2C based standard 
interface.  That actually doesn't perform too badly, it's probably 
cheaper than KCS since you already have I2C, anyway, and the I2C 
interfaces are generally tied to an interrupt.  The trouble is that the 
only hardware implementation of this I know of seems to be poorly done, 
but that mostly affects trying to use the ethernet interface and the 
local interface at the same time.

Of course, the driver for I2C is not yet in the standard kernel as it 
requires a fairly massive I2C driver rewrite to allow the IPMI driver to 
do it's panic-time operations.

BT would be better for performance, I guess, but it's yet another 
interface to tie in, and hanging this off an existing bus seems like a 
sensible thing to do.  And performance is really not an issue for IPMI.

-corey
--


Well yeah.  Also an industry where a small leg up on some competitor can

That seems like a fairly compelling argument against.  It's not in the
"standard kernel" (i.e. of Linux); it's probably not in other OSes.  So
now you're asking the software side of the industry to spend another

BT is already supported by existing code (e.g. OpenIPMI driver).  It can
almost certainly be presented on a PCI card (at PCI-mediated addresses
not some legacy ISA port range).  So what do you mean "yet another
interface to tie in"?  It's just a blob of behavior to be situated on a
PCI card (or mboard implementation in PCI space).

I2C?  Volunteering to put it on I2C is like volunteering to put the new
branch office on a 1200 baud modem because you're suspicious of all that
newfangled fiber stuff.  Huh?

Regarding performance, we've had real struggles with IPMI performance
aspects.  There were situations where kipmid failed to start or didn't
do its job correctly, resulting in very slow IPMI operations.  Seemingly
harmless -- except that some OEM management agent daemon was making
decisions based on feedback from IPMI.  The delays caused it to get into
some watchdog code that decided the system was hung and issued a reboot.   

On other systems, the OEM management agents poll IPMI so persistently
that kipmid ends up taking a significant fraction of a CPU (nearly
100% of a CPU in bursts, long term average of 40%+).  Since the Linux 
"Console OS" in COS-bearing versions of ESX is by design limited to a
single CPU, and since it does have other tasks to do, this causes some
bad bottlenecks.             

These things I mention are our bugs, should be fixed, are not
necessarily the "fault" of IPMI.  Yet they would have no noticable
impact if the BMC hardware was speaking BT.  The slow operations which
(when magnified by 1000s of calls) took up most of a CPU would instead
take up a barely measurable portion of a CPU.

I don't suppose my viewpoint on this is going to be popular on lkml,
but it is ...
From: Bela Lubkin
Date: Wednesday, October 21, 2009 - 4:46 pm

Corey Minyard & Randy Dunlap wrote:

Randy>> From what I recall (probably 2 years ago), [older] ipmi hardware
Randy>> does not generate event interrupts, so it has to be polled.
Randy>>
Randy>> Corey, can you elaborate on this?

Corey> Certainly.  Yes, some (probably most) IPMI hardware does not use
Corey> interrupts, and unfortunately, it's not just older machines.
Corey> The driver used to poll more slowly, but in many cases the
Corey> performance was unacceptable.
Corey>
Corey> kipmid is only started if the hardware doesn't support
Corey> interrupts, so only users with sub-standard hardware have to
Corey> suffer with this problem.

Regrettably, of the "big three" in the "PC Server" world, only HP's iLO2
BMC supports interrupts.  Dell's DRAC4 & 5 don't, IBM's ASM, RSA, etc.
don't.  Also (at least out of a sample of one) SuperMicro also doesn't
have an interrupt.

They also have all settled on the KCS interface, which dribbles one
character through per non-interrupt.  So sad.  Dell's DRAC3 had a BT BMC
which transferred whole IPMI packets via DMA _and_ had an interrupt.
HP's ancient SMIC equipment also had an interrupt (but that's also
char-by-char, and their current KCS has an interrupt, so at least they
haven't regressed).  I've guessed that some chip vendor must have
come out with a Really Cheep KCS implementation and drove every other
implementation out of the market.  :-(


Previous thread: [PATCH 1/1] Stating driver comedi adl_pci8164 coding style fix by BRAGA, Bruno on Wednesday, October 21, 2009 - 9:50 am. (1 message)

Next thread: [PATCH] MAINTAINERS: openipmi list is moderated by Randy Dunlap on Wednesday, October 21, 2009 - 10:40 am. (2 messages)