Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()

Previous thread: [PATCH] Input: evdev - Fix printk() format warning by Roland Dreier on Tuesday, August 19, 2008 - 12:27 pm. (2 messages)

Next thread: [PATCH 0 of 3] define and use phys_addr_t by Jeremy Fitzhardinge on Tuesday, August 19, 2008 - 1:02 pm. (10 messages)
From: Vegard Nossum
Date: Tuesday, August 19, 2008 - 12:51 pm

Hi,

With latest -git (1fca25427482387689fa27594c992a961d98768f), I got
this on reading from /dev/cpu/*/* while hot-unplugging cpu1.

------------[ cut here ]------------
WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/arch/x86/kernel/ipi.c:123
send_IPI_mask_bitmask+0xc3/0xe0()
Pid: 3881, comm: cat Not tainted 2.6.27-rc3-00464-g1fca254 #12
 [<c013591f>] warn_on_slowpath+0x4f/0x80
 [<c010a300>] ? native_sched_clock+0x80/0x110
 [<c010a335>] ? native_sched_clock+0xb5/0x110
 [<c015ae5a>] ? __lock_acquire+0x27a/0xa00
 [<c015635b>] ? trace_hardirqs_off+0xb/0x10
 [<c010a335>] ? native_sched_clock+0xb5/0x110
 [<c01563bd>] ? put_lock_stats+0xd/0x30
 [<c0118a43>] send_IPI_mask_bitmask+0xc3/0xe0
 [<c01017c8>] send_IPI_mask+0x8/0x10
 [<c0118307>] native_send_call_func_single_ipi+0x27/0x30
 [<c0160a2b>] generic_exec_single+0x7b/0x80
 [<c0160adf>] smp_call_function_single+0x5f/0x110
 [<c037a440>] ? __rdmsr_safe_on_cpu+0x0/0x60
 [<c037a440>] ? __rdmsr_safe_on_cpu+0x0/0x60
 [<c037a597>] _rdmsr_on_cpu+0x27/0x60
 [<c037a5ea>] rdmsr_safe_on_cpu+0x1a/0x20
 [<c011733e>] msr_read+0x6e/0xa0
 [<c01a87b4>] vfs_read+0x94/0x130
 [<c01172d0>] ? msr_read+0x0/0xa0
 [<c01a8b5d>] sys_read+0x3d/0x70
 [<c01040db>] sysenter_do_call+0x12/0x3f
 =======================
---[ end trace fe4338948cb73be2 ]---
BUG: soft lockup - CPU#0 stuck for 61s! [cat:3881]
irq event stamp: 14632440
hardirqs last  enabled at (14632439): [<c015968b>] trace_hardirqs_on+0xb/0x10
hardirqs last disabled at (14632440): [<c015635b>] trace_hardirqs_off+0xb/0x10
softirqs last  enabled at (14632434): [<c013a4d1>] __do_softirq+0xe1/0x100
softirqs last disabled at (14632427): [<c013a595>] do_softirq+0xa5/0xb0
Pid: 3881, comm: cat Tainted: G        W (2.6.27-rc3-00464-g1fca254 #12)
EIP: 0060:[<c0160952>] EFLAGS: 00200202 CPU: 0
EIP is at csd_flag_wait+0x12/0x20
EAX: f5f31ef0 EBX: c215dc60 ECX: ffffb300 EDX: 000008fa
ESI: 00200292 EDI: c215dc68 EBP: f5f31ec0 ESP: f5f31ec0
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: ...
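
For reference, a minimal user-space sketch of the kind of read that ends up in msr_read() in the trace above. It assumes the msr driver is loaded and that the file offset selects the MSR register number; the helper names msr_path() and read_msr() are made up for illustration and are not from the report. On a 32-bit build without large-file support, off_t cannot hold MSR numbers above 0x7fffffff.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Build "/dev/cpu/<cpu>/msr"; returns 0 on success, -1 if buf is too small. */
static int msr_path(char *buf, size_t len, int cpu)
{
    int n = snprintf(buf, len, "/dev/cpu/%d/msr", cpu);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read one 64-bit MSR; returns 0 on success or a negative errno value. */
static int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    ssize_t n;
    int fd, ret = 0;

    if (msr_path(path, sizeof(path), cpu) < 0)
        return -ENAMETOOLONG;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    /* Assumption: the file offset is interpreted as the MSR number. */
    if (lseek(fd, (off_t)reg, SEEK_SET) == (off_t)-1) {
        ret = -errno;
    } else {
        n = read(fd, value, sizeof(*value));
        if (n < 0)
            ret = -errno;
        else if (n != (ssize_t)sizeof(*value))
            ret = -EIO;
    }

    close(fd);
    return ret;
}
```

Calling read_msr(1, reg, &value) in a loop while cpu1 is taken offline through sysfs would exercise the same path as the "cat" in the report; it needs root.
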
From: Andi Kleen
Date: Tuesday, August 19, 2008 - 6:39 pm

It's generally known that oprofile doesn't support CPU hotplug well.
Someone needs to make a project out of fixing it properly. Right now
it's just a "don't do that when it hurts".

-Andi

--

From: Vegard Nossum
Date: Tuesday, August 19, 2008 - 11:26 pm

Hm. What you say is true, but this one in particular has nothing to do
with oprofile! It has something to do with reading /dev/cpu/*/msr
while hot-unplugging cpu1:

 [<c011733e>] msr_read+0x6e/0xa0
 [<c01a87b4>] vfs_read+0x94/0x130

I wasn't using oprofile when this happened. So I think it should also
be considered a separate issue. Though yes -- CPU hotplug in general
tends to break a lot of things.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Dave Jones
Date: Thursday, August 21, 2008 - 5:36 pm

On Wed, Aug 20, 2008 at 08:26:19AM +0200, Vegard Nossum wrote:
 > On Wed, Aug 20, 2008 at 3:39 AM, Andi Kleen <andi@firstfloor.org> wrote:
 > > On Tue, Aug 19, 2008 at 09:51:44PM +0200, Vegard Nossum wrote:
 > >> Hi,
 > >>
 > >> With latest -git (1fca25427482387689fa27594c992a961d98768f), I got
 > >> this on reading from /dev/cpu/*/* while hot-unplugging cpu1.
 > >
 > > It's generally known that oprofile doesn't support CPU hotplug well.
 > > Someone needs to make a project out of fixing it properly. Right now
 > > it's just a "don't do that when it hurts".
 > 
 > Hm. What you say is true, but this one in particular has nothing to do
 > with oprofile! It has something to do with reading /dev/cpu/*/msr
 > while hot-unplugging cpu1:
 > 
 >  [<c011733e>] msr_read+0x6e/0xa0
 >  [<c01a87b4>] vfs_read+0x94/0x130
 > 
 > I wasn't using oprofile when this happened. So I think it should also
 > be considered a separate issue. Though yes -- CPU hotplug in general
 > tends to break a lot of things.

From my reading of the msr code, we check that the cpu is online in ->open,
but we never check it again, and also, we make no guarantees that it
won't go away before we ->read or even ->close it.

Would adding a get_cpu/put_cpu across the open/close solve this?
Peter?

	Dave

-- 
http://www.codemonkey.org.uk
--
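
The gap Dave describes is a check-at-open, use-at-read race. Below is a small user-space model of it, not the msr driver itself: the model_* names, the fixed array of online flags and the -ENXIO return value are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8   /* hypothetical size for this model */

/* Stand-in for the kernel's online map: CPUs 0 and 1 start online. */
static bool model_online[MODEL_NR_CPUS] = { true, true };

struct model_file {
    int cpu;              /* CPU number captured at open time */
};

/* Like ->open as described: the CPU is validated once, here. */
static int model_open(struct model_file *f, int cpu)
{
    if (cpu < 0 || cpu >= MODEL_NR_CPUS || !model_online[cpu])
        return -ENXIO;
    f->cpu = cpu;
    return 0;
}

/*
 * Like ->read: returns the CPU the request would be sent to, or a
 * negative errno. With recheck == false it trusts the open-time check,
 * so a CPU that went offline in between is still targeted.
 */
static int model_read(const struct model_file *f, bool recheck)
{
    if (recheck && !model_online[f->cpu])
        return -ENXIO;
    return f->cpu;
}
```

If model_online[1] is cleared between model_open() and model_read(), the unchecked read still returns 1, which is the situation behind the warning in the original report.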

From: H. Peter Anvin
Date: Thursday, August 21, 2008 - 7:13 pm

A get_cpu/put_cpu across the whole open..close sequence would seem to 
be, ahem, rude, since userspace could hold it for an arbitrary amount of 
time (plus, there is no guarantee that they are invoked on the same CPU.)

The cpuid driver has the same problem, obviously.

get_online_cpus() and put_online_cpus() around the call to 
{rd,wr}msr_safe_on_cpu() should work; and the CPU hotplug documentation 
seems to claim that we can just disable preemption around those calls, 
which is exactly what get_cpu()..put_cpu() does, so I guess 
get_cpu()..put_cpu() here is fine.  Now, the big question is: should 
this really be done in the MSR/CPUID drivers, or should it be done in
smp_call_function_single(), which is the generic code invoked by this?

It seems to me that doing it in smp_call_function_single() would be more 
correct as it's already protected by get_cpu()..put_cpu() and a 
cpu_online() test in there should not be expensive in comparison to the 
whole rest of the code.

You may want to see if this patch fixes the problem; it does *NOT* have 
the correct error behaviour (some of the intervening layers don't 
propagate errors), but it should make the fault go away.

	-hpa
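
The patch itself is not reproduced in this archive. As a rough user-space sketch of the idea only (test the online mask inside the low-level call and report an error instead of sending to a dead CPU), assuming a bitmask model and an -ENXIO return value that are not taken from the patch:

```c
#include <errno.h>

#define MODEL_NR_CPUS 8   /* hypothetical size for this model */

/* Stand-in for the online mask: CPUs 0 and 1 online. */
static unsigned long model_online_mask = 0x3UL;

static int model_cpu_online(int cpu)
{
    return cpu >= 0 && cpu < MODEL_NR_CPUS &&
           ((model_online_mask >> cpu) & 1UL);
}

/*
 * Model of a "call func on one CPU" primitive that refuses offline
 * targets. In the kernel this test would sit inside the region already
 * covered by get_cpu()..put_cpu(); here the call simply runs inline.
 */
static int model_call_function_single(int cpu, void (*func)(void *), void *info)
{
    if (!model_cpu_online(cpu))
        return -ENXIO;    /* callers must propagate this upward */
    func(info);           /* stands in for the IPI plus wait */
    return 0;
}

/* Example callback: counts how many times it ran. */
static void model_count_call(void *info)
{
    ++*(int *)info;
}
```

The point of returning an error rather than silently skipping is the one made above: the intervening layers then have to pass the failure back to the caller of read().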

From: Andi Kleen
Date: Thursday, August 21, 2008 - 7:28 pm

The alternative would be to just take out those msr_on_cpu() 
interfaces again. Right now they are useless in the kernel,
but still cause problems.

They were only added for OpenVZ's vCPUs which they back then
promised me would hit mainline soon. But that was some time
ago and there wasn't much progress on this.

-Andi

--

From: H. Peter Anvin
Date: Thursday, August 21, 2008 - 11:24 pm

We still need the equivalent functionality, though.  The midlayer 
(msr_on_cpu) may be pointless, but that doesn't change the fact that 
putting this functionality in the lower layer (smp_call_function_single) 
makes more sense.

	-hpa
--

From: Andi Kleen
Date: Friday, August 22, 2008 - 2:35 am

Assuming you can actually have interrupts enabled at this point
and be otherwise ready to do call_function_simple (e.g. cpu hotplug
locking etc.) 

For a lot of MSR accesses in more complicated subsystems like cpufreq 
that requires complications.  I would think for many circumstances it's 
better to simply set affinity of the thread before at a higher level.

In hindsight I think it was my mistake to ever merge that.
I admit I never liked it, but just merged it because I wasn't able
to come up with a strong enough counter argument back then.

-Andi
--

From: H. Peter Anvin
Date: Friday, August 22, 2008 - 9:41 am

Well, smp_call_function_single already does all necessary locking; it 
makes more sense for it to check that what it's about to call still 
exists while inside the lock, instead of requiring the higher layers to 
guarantee that cannot happen on it.  This is simply a matter of the cost 
of checking at this point being quite low.

	-hpa
--

From: Jeremy Fitzhardinge
Date: Friday, August 22, 2008 - 11:42 pm

It does already, doesn't it?  Hm, smp_call_function_mask() ands the
provided mask with the online mask, but it doesn't look like
smp_call_function_single() does the equivalent.

    J
--
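
The masking Jeremy mentions can be modelled in a few lines. This is a user-space sketch with made-up model_* names, assuming plain unsigned long bitmasks rather than the kernel's cpumask type:

```c
#define MODEL_NR_CPUS 8   /* hypothetical size for this model */

/*
 * Run func for every CPU that is both requested and online.
 * Offline CPUs in the requested mask are silently dropped.
 * Returns the number of CPUs actually reached.
 */
static int model_call_function_mask(unsigned long requested,
                                    unsigned long online,
                                    void (*func)(int cpu, void *info),
                                    void *info)
{
    unsigned long mask = requested & online;
    int cpu, reached = 0;

    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (mask & (1UL << cpu)) {
            func(cpu, info);
            reached++;
        }
    }
    return reached;
}

/* Example callback: records which CPUs were reached in a bitmask. */
static void model_record_cpu(int cpu, void *info)
{
    *(unsigned long *)info |= 1UL << cpu;
}
```

With requested = 0x6 (CPUs 1 and 2) and online = 0x5 (CPUs 0 and 2), only CPU 2 is called. The single-CPU variant has no mask to filter, so it needs an explicit test instead.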

From: H. Peter Anvin
Date: Friday, August 22, 2008 - 11:44 pm

It doesn't, and that's how this bug was introduced.  It's a trivial add 
(see test patch already posted) and should hardly matter in terms of 
execution time.

I'll write up a clean patch with all the error propagation tomorrow or 
Sunday.

	-hpa
--

From: Vegard Nossum
Date: Sunday, August 24, 2008 - 2:20 am

Hm.

Kernel fails to detect cpu1 at all.

I am currently unsure of whether it's your patch or not. But it's the
same config that I've been booting for ages (and I copy it over for
each new kernel version I check out).

Processor #0 (Bootup-CPU)
I/O APIC #2 Version 32 at 0xFEC00000.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Processors: 1
SMP: Allowing 1 CPUs, 0 hotplug CPUs
mapped APIC to ffffb000 (fee00000)
mapped IOAPIC to ffffa000 (fec00000)
Allocating PCI resources starting at 50000000 (gap: 40000000:bee00000)
PERCPU: Allocating 1221764 bytes of per cpu data
NR_CPUS: 7, nr_cpu_ids: 1, nr_node_ids 1

I really don't get it. Is this something that can be caused by your
patch _at all_ ?


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: H. Peter Anvin
Date: Sunday, August 24, 2008 - 9:43 am

Well, if smp_call_function_single() is called during the CPU up 
sequence, without the CPU having been added to the online mask, then 
yes, it could.  The most likely place would be from a notifier.

That makes it ugly.  Need to track down the reason.

	-hpa
--

From: H. Peter Anvin
Date: Sunday, August 24, 2008 - 10:17 am

Could you try this patch?  It should (hopefully) tell us if there are any 
such invocations and what the call trace looks like.

	-hpa
From: Vegard Nossum
Date: Sunday, August 24, 2008 - 10:22 am

I'm sorry, I _just_ reverted your patch and tested the bare kernel...
but it still only detects cpu0 :-(

Apart from that, it's also incredibly slow and I get some
"end_request: I/O error, dev fd0, sector 0" messages. Start-up (init 3
on a F7) takes closer to 10 minutes. Will now take a closer look at my
config.

Oh. I _just_ noticed a completely different change -- I added acpi=off
to my boot line *blush*

Will now remove it and retry your original patch.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Vegard Nossum
Date: Sunday, August 24, 2008 - 10:45 am

Removing acpi=off helps with the CPU detection problem. The kernel is
still really slow, though. From /proc/cpuinfo:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 6
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 5
cpu MHz         : 375.000
cache size      : 2048 KB

Why is MHz on 375!? I tried cpufreq-selector, but nothing changed. Maybe

calling  acpi_cpufreq_init+0x0/0x90
initcall acpi_cpufreq_init+0x0/0x90 returned -19 after 0 msecs

There's also this:

SMP: Allowing 2 CPUs, 0 hotplug CPUs

(but CPU hotplug still works; is the line above about something
different, like physical hotplug?)

Apart from that, with your patch applied, hotplug seems to work OK (no
warnings).

Okay, now I used cpufreq-selector to change to "ondemand" governor,
and MHz goes back to 3000. Weird. Why would "performance" governor put
my machine to a constant 375?

Thanks,


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: H. Peter Anvin
Date: Sunday, August 24, 2008 - 10:59 am

That would be a problem... I presume this problem is independent of the 
patch, though?

	-hpa
--

From: Dave Jones
Date: Sunday, August 24, 2008 - 11:13 am

On Sun, Aug 24, 2008 at 07:45:48PM +0200, Vegard Nossum wrote:
 > Removing acpi=off helps with the CPU detection problem. The kernel is
 > still really slow, though. From /proc/cpuinfo:
 > 
 > processor       : 1
 > vendor_id       : GenuineIntel
 > cpu family      : 15
 > model           : 6
 > model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
 > stepping        : 5
 > cpu MHz         : 375.000
 > cache size      : 2048 KB
 > 
 > Why is MHz on 375!? I tried cpufreq-selector, but nothing changed. Maybe
 > 
 > calling  acpi_cpufreq_init+0x0/0x90
 > initcall acpi_cpufreq_init+0x0/0x90 returned -19 after 0 msecs

-ENODEV, because you don't have a frequency-scaling-capable CPU.

 > Okay, now I used cpufreq-selector to change to "ondemand" governor,
 > and MHz goes back to 3000. Weird. Why would "performance" governor put
 > my machine to a constant 375?
 
Probably because you're using p4-clockmod, and it's crap.

	Dave

-- 
http://www.codemonkey.org.uk
--
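
The -19 in the initcall line decodes as shown in this short sketch. It assumes Linux errno numbering, where ENODEV is 19; the helper name model_errstr() is made up for illustration.

```c
#include <errno.h>
#include <string.h>

/*
 * Kernel-style functions return negative errno values, so an initcall
 * result of -19 corresponds to errno 19. Returns the C library's
 * message for a negative return value.
 */
static const char *model_errstr(int ret)
{
    return strerror(-ret);
}

/* Nonzero if a kernel-style return value means "no such device". */
static int model_is_enodev(int ret)
{
    return ret == -ENODEV;
}
```
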

From: Vegard Nossum
Date: Monday, August 25, 2008 - 11:31 am

I sorted it -- thanks! It turned out to be pretty obscure; my tty
setting for the receiving end of the serial console was set to echo.
So when the machine booted, it was echoing lots of characters into the
Fedora 7 init, which would prompt for the starting of cpuspeed
initscript. Turning off echo for the tty was what triggered the
slowness; removing cpuspeed from the runlevel entirely solved the
problem.

Don't know why cpuspeed would select a governor which runs the CPU at
a constant 300 MHz, though.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Dave Jones
Date: Monday, August 25, 2008 - 11:38 am

On Mon, Aug 25, 2008 at 08:31:04PM +0200, Vegard Nossum wrote:
 
 > Fedora 7 init, which would prompt for the starting of cpuspeed
 > initscript. Turning off echo for the tty was what triggered the
 > slowness; removing cpuspeed from the runlevel entirely solved the
 > problem.
 > 
 > Don't know why cpuspeed would select a governor which runs the CPU at
 > a constant 300 MHz, though.

p4-clockmod is the only cpufreq driver that can run on your hardware.
There's nothing better.   A while back, Fedora stopped loading
(and even building) p4-clockmod, because it sucks so bad.
I can't remember when we made that change, but it sounds like it must
have been a post F7 thing.

	Dave

-- 
http://www.codemonkey.org.uk
--

From: Andi Kleen
Date: Monday, August 25, 2008 - 11:36 am

> Probably because you're using p4-clockmod, and it's crap.

We really should bite the bullet and just remove it. People 
run into this all the time and I bet you can count the people who
actually use it consciously and usefully on one hand.

Or at least only make it run when the user sets an "I_REALLY_KNOW_WHAT_I_AM_DOING"
option explicitly.

-Andi
--

From: Dave Jones
Date: Monday, August 25, 2008 - 11:54 am

On Mon, Aug 25, 2008 at 08:36:11PM +0200, Andi Kleen wrote:
 > > Probably because you're using p4-clockmod, and it's crap.
 > 
 > We really should bite the bullet and just remove it. People 
 > run into this all the time and I bet you can count the people who
 > actually use it consciously and usefully on one hand.
 > 
 > Or at least only make it run when the user sets an "I_REALLY_KNOW_WHAT_I_AM_DOING"
 > option explicitly.

We can't really remove it until the ACPI processor driver has a better
response than 'thermal event, argh!, shut down'.

When that happens, I'll be glad to see it go.

	Dave

-- 
http://www.codemonkey.org.uk
--

From: Andi Kleen
Date: Monday, August 25, 2008 - 12:39 pm

It only does that when the critical trip point is reached (which
basically means that the BIOS tells it -- "I'm on fire"). What else should 
it do in your opinion when this happens?

-Andi
--

From: Dave Jones
Date: Monday, August 25, 2008 - 12:50 pm

On Mon, Aug 25, 2008 at 09:39:26PM +0200, Andi Kleen wrote:
 > On Mon, Aug 25, 2008 at 02:54:51PM -0400, Dave Jones wrote:
 > > On Mon, Aug 25, 2008 at 08:36:11PM +0200, Andi Kleen wrote:
 > >  > > Probably because you're using p4-clockmod, and it's crap.
 > >  > 
 > >  > We really should bite the bullet and just remove it. People 
 > >  > run into this all the time and I bet you can count the people who
 > >  > actually use it consciously and usefully on one hand.
 > >  > 
 > >  > Or at least only make it run when the user sets an "I_REALLY_KNOW_WHAT_I_AM_DOING"
 > >  > option explicitly.
 > > 
 > > We can't really remove it until the ACPI processor driver has a better
 > > response than 'thermal event, argh!, shut down'.
 > 
 > It only does that when the critical trip point is reached (which
 > basically means that the BIOS tells it -- "I'm on fire"). What else should 
 > it do in your opinion when this happens?

On some systems (for which there aren't BIOS updates) the trip points are
set too low.  If we get a thermal event that was caused by temporary
increased workload, temperature will drop off again when that workload
is complete.

For sustained workloads we'd get additional thermal events, at which
time we make a decision "ok, we've throttled as far as we can, and
things are still going badly, power off".

In the event of a failed fan or similar, shutting down is obviously
the right thing to do, and we'd get further thermal events after
throttling which would allow us to do so.

	Dave

-- 
http://www.codemonkey.org.uk
--
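
The policy Dave outlines (throttle on a thermal event, and power off only if events keep arriving once throttling is exhausted) can be sketched as a tiny state machine. This is a user-space model with made-up names and an arbitrary number of throttle steps; it is not the ACPI thermal driver's behaviour.

```c
enum model_action {
    MODEL_THROTTLE,    /* step the CPU down one more level */
    MODEL_POWER_OFF    /* nothing left to try: shut down */
};

struct model_thermal {
    int throttle_level;   /* current step, 0 = unthrottled */
    int max_level;        /* deepest throttle step available */
};

/* Decide what to do when a thermal event arrives. */
static enum model_action model_thermal_event(struct model_thermal *t)
{
    if (t->throttle_level < t->max_level) {
        t->throttle_level++;
        return MODEL_THROTTLE;
    }
    /* Already throttled as far as possible and still too hot. */
    return MODEL_POWER_OFF;
}

/* The workload finished and the temperature dropped: undo throttling. */
static void model_thermal_clear(struct model_thermal *t)
{
    t->throttle_level = 0;
}
```

With max_level = 2, the first two events throttle and the third powers off. A failed fan would keep generating events and reach the power-off branch; a short burst of load would not.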

From: Andi Kleen
Date: Monday, August 25, 2008 - 1:36 pm

There were patches floating to make this configurable. I was always

But none of the cpufreq governors do this. They only care about
load, not about temperature.


So you're saying processor_thermal should let the system cook
for some time first before really taking action?

-Andi
--

From: Dave Jones
Date: Monday, August 25, 2008 - 1:47 pm

On Mon, Aug 25, 2008 at 10:36:49PM +0200, Andi Kleen wrote:
 
 > > If we get a thermal event that was caused by temporary
 > > increased workload, temperature will drop off again when that workload
 > > is complete.
 > 
 > But none of the cpufreq governors do this. They only care about
 > load, not about temperature.

Which is good enough to stop p4 laptops from shutting down as
soon as they've finished booting up.

 > > For sustained workloads we'd get additional thermal events, at which
 > > time we make a decision "ok, we've throttled as far as we can, and
 > > things are still going badly, power off".
 > 
 > That is what the ACPI driver does when the trip point is reached.

yes, except for that "we've throttled" part.
 
	Dave

-- 
http://www.codemonkey.org.uk
--

From: Arjan van de Ven
Date: Monday, August 25, 2008 - 2:24 pm

On Mon, 25 Aug 2008 16:47:02 -0400

that's such an enormous gamble it's not funny.


really; if your BIOS has broken trip points we should use the kernel
command line to disable them (and a DMI blacklist if the number of
BIOSes that have it wrong is low... maybe combined with a date-based
threshold).

Just praying that p4clockmod keeps it kinda low enough is not the
answer.



-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
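
Arjan's alternative (a command-line override, plus a DMI blacklist limited by BIOS date) might look roughly like the following user-space model. The table entry, the field names and the year-based cutoff are all hypothetical; this is not the kernel's DMI matching interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_dmi_entry {
    const char *vendor;
    const char *product;
};

/* Hypothetical list of machines known to ship bad trip points. */
static const struct model_dmi_entry model_bad_trip_points[] = {
    { "ExampleVendor", "ExampleBook 1000" },
};

/*
 * Decide whether to ignore the BIOS-provided trip points.
 * - cmdline_override models a kernel command-line switch.
 * - BIOSes newer than cutoff_year are assumed to be fixed.
 */
static bool model_ignore_trip_points(const char *vendor, const char *product,
                                     int bios_year, int cutoff_year,
                                     bool cmdline_override)
{
    size_t i;
    size_t n = sizeof(model_bad_trip_points) / sizeof(model_bad_trip_points[0]);

    if (cmdline_override)
        return true;
    if (bios_year > cutoff_year)
        return false;

    for (i = 0; i < n; i++) {
        if (strcmp(vendor, model_bad_trip_points[i].vendor) == 0 &&
            strcmp(product, model_bad_trip_points[i].product) == 0)
            return true;
    }
    return false;
}
```
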

From: H. Peter Anvin
Date: Monday, August 25, 2008 - 12:08 pm

CONFIG_BROKEN?

	-hpa
--

From: Dave Jones
Date: Monday, August 25, 2008 - 12:13 pm

On Mon, Aug 25, 2008 at 12:08:23PM -0700, H. Peter Anvin wrote:
 > Andi Kleen wrote:
 > >> Probably because you're using p4-clockmod, and it's crap.
 > > 
 > > We really should bite the bullet and just remove it. People 
 > > run into this all the time and I bet you can count the people who
 > > actually use it consciously and usefully on one hand.
 > > 
 > > Or at least only make it run when the user sets an "I_REALLY_KNOW_WHAT_I_AM_DOING"
 > > option explicitly.
 > 
 > CONFIG_BROKEN?

It's not really broken (at least in the CONFIG_BROKEN sense), it just sucks
when used in the wrong situations (which is 99% of the use cases people
try to use it in).

	Dave

-- 
http://www.codemonkey.org.uk
--
