Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25

From: Alexey Dobriyan
Date: Thursday, July 10, 2008 - 4:02 am

Bastard!



--

From: Linus Torvalds
Date: Thursday, July 10, 2008 - 10:21 am

I'm marking this "tested-by" by you too, on the strength of that 
rcutorture thing. I think Nick nailed this one.

Good jorb,

		Linus
--

From: Ingo Molnar
Date: Thursday, July 10, 2008 - 10:34 am

cool! :)

(hm, could anyone please resend Nick's original mail? The original one 
is not in my lkml folder nor on lkml.org - only the quoted one.)

	Ingo
--

From: Ingo Molnar
Date: Thursday, July 10, 2008 - 11:06 am

ok, got the mail now:

| | Annoyed this wasn't a crazy obscure error in the algorithm I could 
| | fix :) [...]

Paul recently ran a formal proof against all sorts of RCU details (and 
found and fixed a few obscure races that way that no-one ever 
triggered), so i'd be quite surprised if we found anything in the core 
algorithm :-)

| | [...] I spent all day debugging it and had to make a special test 
| | case (rcutorture didn't seem to trigger it), and a big RCU state 
| | logging infrastructure to log millions of RCU state transitions and 
| | events. Oh well.

nice debugging!

Acked-by: Ingo Molnar <mingo@elte.hu>

i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG + 
RCU_PREEMPT kernels and never saw this. Nor did Paul. That aspect is 
weird.

	Ingo
--

From: Nick Piggin
Date: Thursday, July 10, 2008 - 9:11 pm

It basically requires an active rcu reader to be preempted (preferably
by something doing a lot of call_rcu or other activity, i.e. the writer,
so that it can tick through the different states quickly).

I found that just 2 threads (reader and writer) bound to the same CPU
would trigger it fastest; my reader has quite a long rcu read section.

I'm not sure why rcutorture doesn't trigger for everyone. I'm surprised
it does not have much longer maximum read delays -- several ms I would
have thought should be useful to have a critical section open while the
rcu engine can run through a number of states...
--

From: Paul E. McKenney
Date: Friday, August 1, 2008 - 2:09 pm

Hit it in 10 seconds once I actually got HOTPLUG_CPU disabled.

The theory behind the default settings for rcutorture is as follows:

o	Having two reader threads for each CPU helps ensure interactions
	between those threads.

o	The writer is normally going to have to share a CPU with a
	reader or two, maybe three.  This should force reader-writer
	interactions.

o	The read-hold time needs to be long enough to ensure interactions
	with the writer, but if it is too long, there are too few
	rcu_read_lock() and rcu_read_unlock() events to really stress
	the read-side processing.

o	The four fakewriters ensure interaction between multiple
	writers.

To Nick's point, I did use a hacked-up rcutorture with millisecond
read-side delays when debugging preemptable RCU, but I also used stock
rcutorture.

I will give this some thought and see if the defaults should change or
if more knobs are needed.

							Thanx, Paul
--

From: Paul E. McKenney
Date: Friday, August 1, 2008 - 2:09 pm

Turns out that my environment was silently re-enabling HOTPLUG_CPU, so I
only -thought- I was testing !HOTPLUG_CPU.  Once I forced it to really
disable HOTPLUG_CPU (by manually also specifying CONFIG_SUSPEND=n and
CONFIG_HIBERNATION=n), then rcutorture complained within 10 seconds.
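
For anyone repeating the test, the relevant part of the .config might look
roughly like the fragment below. This is only a sketch built from the options
named above. CONFIG_PREEMPT_RCU and CONFIG_RCU_TORTURE_TEST are assumptions
about how preemptible RCU and rcutorture are enabled in this tree, and other
options may also pull HOTPLUG_CPU back in depending on the architecture.

```
# Options named in the message above, written the way .config records
# a disabled option (equivalent to the =n notation used in the text).
# CONFIG_HOTPLUG_CPU is not set
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set

# Assumed, not stated in the message: preemptible RCU plus rcutorture
# built as a module.
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TORTURE_TEST=m
```

After regenerating the configuration (for example with make oldconfig), it is
worth checking that HOTPLUG_CPU is still unset, since any remaining option
that selects it will turn it back on without a warning.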

Sigh!!!

						Thanx, Paul
--
