I'm marking this "tested-by" by you too, on the strength of that rcutorture thing. I think Nick nailed this one. Good jorb, Linus --
cool! :) (hm, could anyone please resend Nick's original mail? The original one is not in my lkml folder nor on lkml.org - only the quoted one.) Ingo --
ok, got the mail now:

| | Annoyed this wasn't a crazy obscure error in the algorithm I could
| | fix :) [...]

Paul recently ran a formal proof against all sorts of RCU details (and found and fixed a few obscure races that way that no-one ever triggered), so i'd be quite surprised if we found anything in the core algorithm :-)

| | [...] I spent all day debugging it and had to make a special test
| | case (rcutorture didn't seem to trigger it), and a big RCU state
| | logging infrastructure to log millions of RCU state transitions and
| | events. Oh well.

nice debugging!

Acked-by: Ingo Molnar <mingo@elte.hu>

i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG + RCU_PREEMPT kernels and never saw this. Nor did Paul. That aspect is weird.

Ingo --
It basically requires an active rcu reader to be preempted (preferably by something doing a lot of call_rcu or other activity, i.e. the writer, so it can tick along the different states quickly). I found just 2 threads (reader and writer) bound to the same CPU would trigger it fastest; my reader has quite a long rcu read section.

I'm not sure why rcutorture doesn't trigger for everyone. I'm surprised it does not have much longer maximum read delays -- several ms I would have thought should be useful to have a critical section open while the rcu engine can run through a number of states... --
Hit it in 10 seconds once I actually got HOTPLUG_CPU disabled.

The theory behind the default settings for rcutorture is as follows:

o	Having two reader threads for each CPU helps ensure interactions between those threads.

o	The writer is normally going to have to share a CPU with a reader or two, maybe three. This should force reader-writer interactions.

o	The read-hold time needs to be long enough to ensure interactions with the writer, but if it is too long, there are too few rcu_read_lock() and rcu_read_unlock() events to really stress the read-side processing.

o	The four fakewriters ensure interaction between multiple writers.

To Nick's point, I did use a hacked-up rcutorture with millisecond read-side delays when debugging preemptable RCU, but I also used stock rcutorture. I will give this some thought and see if the defaults should change or if more knobs are needed.

Thanx, Paul --
Turns out that my environment was silently re-enabling HOTPLUG_CPU, so I only -thought- I was testing !HOTPLUG_CPU. Once I forced it to really disable HOTPLUG_CPU (by manually also specifying CONFIG_SUSPEND=n and CONFIG_HIBERNATION=n), then rcutorture complained within 10 seconds. Sigh!!!

Thanx, Paul --
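
For anyone trying to reproduce this, the configuration Paul describes might look roughly like the .config fragment below. This is only a sketch. The three disabled options are the ones named in his message. CONFIG_RCU_TORTURE_TEST is the Kconfig symbol for rcutorture; including it here, and building it as a module, are assumptions about the test setup and are not stated in the thread. The preemptible-RCU option is left out on purpose, because the thread only calls it "RCU_PREEMPT" and the exact symbol depends on the kernel version.

```
# Named in the message: all three had to be off for HOTPLUG_CPU to stay disabled.
# CONFIG_HOTPLUG_CPU is not set
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set

# Assumption: rcutorture built as a module so it can be loaded on demand.
CONFIG_RCU_TORTURE_TEST=m
```

After regenerating the configuration, it is worth checking the final .config to confirm that HOTPLUG_CPU really stayed off, since the point of the message is that it was being silently re-enabled.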
