Re: [RFC PATCH] Add CRC checksum for RCU lists

Previous thread: [GIT PULL] [x86 setup] Fix VESA mode decoding in ACPI wakeup by H. Peter Anvin on Thursday, September 20, 2007 - 2:16 pm. (1 message)

Next thread: Re: [Bugme-new] [Bug 9043] New: tty not printed to screen by Andrew Morton on Thursday, September 20, 2007 - 3:04 pm. (2 messages)
To: Paul E. McKenney <paulmck@...>
Cc: LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Peter Zijlstra <a.p.zijlstra@...>, Andrew Morton <akpm@...>
Date: Thursday, September 20, 2007 - 2:34 pm

In recent development of the RT kernel, some of our experimental code
corrupted the rcu header. But the side effect (crashing) didn't rear its
ugly head until way after the fact. Discussing this with Paul, he
suggested that RCU should have a "self checking" mechanism to detect
these kind of issues. He also suggested putting in a CRC into the
rcu_head structure.

This patch does so.

Since there is a bit of overhead with adding this checking, I made it a
config option that would best be turned on for any development that uses
RCU callbacks.

The way this works is when CRC is configured, two elements are added to
the rcu_head. A crc (long to be consistent, although it only uses a
32bit value) and a caller. The caller is location of who called the
call_rcu, which is very useful when a corruption does occur. Although,
another item may have corrupted the rcu_head, from my experience (the
crash we had), there are several of the same items that do the call_rcu
together.

On call_rcu, the checksum is created of the next pointer and the
function to call. The checksum algorithm is simply a XOR of some magic
number with each 4 bytes of the next and func pointer of the rcu_head.
This is then placed into the crc of the rcu_head as well as the return
address.

Various stages of RCU handling does a checksum of the RCU lists to make
sure that everything is what it expects it to be. Again, this does have
a slight performance impact (although I haven't noticed it with various
tasks, but I'm sure any benchmark will), so should be turned off on
production systems. This is focused on development only.

This patch also takes care to update the crc when rcu_head items are
moved from list to list and whenever the next pointer is modified due to
addition.

This is against the lastest git (as of this morning) so it does not
include Paul's recent patches for RCU preempt and RCU boost. For this
reason, this is a RFC patch since it would be better to apply this after
those other patches make it upstre...

To: Steven Rostedt <rostedt@...>
Cc: LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Peter Zijlstra <a.p.zijlstra@...>, Andrew Morton <akpm@...>
Date: Friday, September 21, 2007 - 1:12 pm

Good point, having the caller could give a good clue as to which piece

We can only be thankful that it is not possible to cancel outstanding

Looks good -- a few suggestions for better fault coverage interspersed
below. But would be useful as is. (And it would be good to apply after
the preempt/boost patches, which are currently undergoing integration
with the CPU hotplug patches -- the good news is that this integration
eliminates the need for patch #4!)

Looks good, but one question -- why not include the caller in the CRC?
Not a big deal either way, but would catch a few more cases of corruption.
Also, as things stand, the caller pointer can be silently corrupted,
which might causes confusion if someone had to examine the RCU callback
lists from a crash dump.

Interchanging the order of "crc" and "caller" would change the strategy,

Very magic indeed -- Google doesn't find it, other than in your

Why initialize "p" twice? (Once in the declaration, and once in the

Do we want exactly one (give or take concurrent checks), or do we want
the first ten or so? Sometimes having a modest sample rather than
exactly one is helpful.

And I know that it doesn't matter in this case, but I cannot stop myself
from pointing out the possibility of making "once" be an atomic_t
that is initialized to (say) -10, then making the !once check be an
atomic_add_return(). (Whew! Good to get that off my chest!!!)

Now back to real problems. ;-)

(Note to self...) The way this is coded could possibly result in false
positives. Suppose that the last element in a given callback list has
its CRC correctly calculated. Now suppose that a new callback is being
added to the end of the list. This addition is non-atomic, so the of
the old last element will be momentarily incorrect. So, need to check
that all list checks are protected by some lock. (And all the cases

I suggest also printing head->crc^rcu_crc_calc(head) to make cases

If you do decide to move the caller into the CR...

To: Paul E. McKenney <paulmck@...>
Cc: LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Peter Zijlstra <a.p.zijlstra@...>, Andrew Morton <akpm@...>
Date: Friday, September 21, 2007 - 5:32 pm

--

Thanks, but this is still going to go through changes, from your comments

One reason was that the caller was an addition. I originally didn't have
it, but during my tests, the output was basically useless. It didn't give
any hint to where things went wrong, so I added it. The CRC is to check
the rcu is working, not really the checker itself.

Note, it helped us out lately with Peter's latest file_table patches in
-rt. With this patch applied, it triggered corruption. That was due to the
union in the fs.h between the llist and rcu_head there was a dependency
in the llist where the rcu_head would not grow. The llist can still do a
spin lock on the lock _after_ the item was added to the call_rcu. Because
of that union, the locking of the lock corrupted the crc. Set it to one.

Paul, I'm disappointed in you. That number doesn't ring a bell at all??

Why? probably because I was half asleep when writing it ;-)

I added that because during testing, it would flood the serial console. I
can modify this to whatever deems fit. Perhaps more hits will give a
better clue to what went wrong. I could also just add a print_ratelimit to

Would you prefer the above or the print_ratelimit? Maybe 10 would be best.

I believe that I tried to keep all the checks done in the same locations
that could possible modify the lists. So they should be protect by the
same mechanism. On the rcupreempt side, I made the checks within the

Yeah, I can scrap those initialization macros. That came about my first
attempt where I forgot to add the assignment to the call_rcu and it
obviously failed. So I added these macros, and it stilled failed. Then I
saw the mistake I made, fixed it, and it worked. But I never removed these
macros. I think we can just keep the crc as zero. That would also fail
the test. Hmm, maybe we should add a BAD_CRC number so that it will give

Paul, thanks for all the comments. I'll put out a new round of patches
after yours becomes offical (no "not for incl...

To: Steven Rostedt <rostedt@...>
Cc: LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Peter Zijlstra <a.p.zijlstra@...>, Andrew Morton <akpm@...>
Date: Saturday, September 22, 2007 - 1:59 pm

Well, once Kyle spilled the beans, it was pretty obvious. ;-)

Agreed -- using the already-defined print_ratelimit() sounds much better

Definitely print_ratelimit()!!!

Or maybe a wrapper around the rcu_assign_crc() function that ensures

Sounds good!!!

Thanx, Paul
-

To: Steven Rostedt <rostedt@...>
Cc: Paul E. McKenney <paulmck@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Peter Zijlstra <a.p.zijlstra@...>, Andrew Morton <akpm@...>
Date: Saturday, September 22, 2007 - 3:04 am

LOL! The RCU patent number; very clever indeed!

Cheers,
Kyle Moffett

-

To: Kyle Moffett <mrmacman_g4@...>
Cc: Steven Rostedt <rostedt@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Peter Zijlstra <a.p.zijlstra@...>, Andrew Morton <akpm@...>
Date: Saturday, September 22, 2007 - 1:20 pm

Y'know, I newer did figure this one out!

I like it!!! ;-)

Thanx, Paul
-

To: Steven Rostedt <rostedt@...>
Cc: Paul E. McKenney <paulmck@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Andrew Morton <akpm@...>
Date: Friday, September 21, 2007 - 8:02 am

On Thu, 20 Sep 2007 14:34:11 -0400 Steven Rostedt <rostedt@goodmis.org>

char sym[KSYM_SYMBOL_LEN];
sprint_symbol(sym, head->caller);
-

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Steven Rostedt <rostedt@...>, Paul E. McKenney <paulmck@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Andrew Morton <akpm@...>
Date: Friday, September 21, 2007 - 9:16 am

This has been wondering me some time. Kernel oopses also use [<%p>],
but what really for are two sort of braces needed?

-

To: Jan Engelhardt <jengelh@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Paul E. McKenney <paulmck@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Andrew Morton <akpm@...>
Date: Friday, September 21, 2007 - 9:27 am

I believe the notation of [<some-hex-number>] has always been a
representation of instruction pointer. Seems that's what's used for all
outputs of instruction pointers that I've seen (well most really).

-- Steve

-

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Paul E. McKenney <paulmck@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Andrew Morton <akpm@...>
Date: Friday, September 21, 2007 - 9:01 am

--

Yes, good point!

Although I may do it this way:

printk(" (caller: [<%p>]", head->caller);
print_symbol(" %s)\n", (unsigned long)head->caller);

and eliminate the temp variable.

-

Previous thread: [GIT PULL] [x86 setup] Fix VESA mode decoding in ACPI wakeup by H. Peter Anvin on Thursday, September 20, 2007 - 2:16 pm. (1 message)

Next thread: Re: [Bugme-new] [Bug 9043] New: tty not printed to screen by Andrew Morton on Thursday, September 20, 2007 - 3:04 pm. (2 messages)