Dave Hansen posted a patch to lkml this morning, prefacing it with the comment, "For those of you who heard the talk at OLS Friday morning, this patch won't be too much of a surprise. But for the rest of you..." He went on to explain how the patch is meant to help characterize the ways the big kernel lock (BKL) is used:
"The BKL's "magical" properties of allowing recursive holds on a single cpu and its release-on-sleep semantics make it really hard to replace with new locking schemes. Before we can really remove it, we must first characterize the places where it is used in these crazy ways."
His full email follows, further describing what he's found so far using the supplied patch. He also includes sample patches for the tty and ext3 code saying, "Don't take too much stock in these, they're just a demonstration and not nearly complete."
Anyone unclear on what the BKL is might refer to our earlier interview with Peter Chubb, during which he explains:
Peter Chubb: When Linux was first ported to an SMP machine, all accesses to the kernel were protected by a single lock, the "Big Kernel Lock". Access to the BKL is a bottleneck, so recent work has aimed to replace the BKL with finer-grained locking to reduce contention. However, there are still some places where it's used. Many people are working on removing it, either by introducing algorithms that don't need locking or by using locks that protect access to much smaller pieces of data.
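For readers who want the two "magical" properties from Hansen's description spelled out, here is a minimal single-threaded C model of them. It is a sketch under simplifying assumptions, not kernel code: all the names (model_lock_kernel(), model_unlock_kernel(), model_sleep(), big_lock_held, lock_depth) are hypothetical, a boolean stands in for the real global lock, and the kernel's actual scheduler handling of the BKL differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-task model of the BKL. big_lock_held stands in
 * for the one global kernel lock; lock_depth stands in for the count of
 * how many times the current task has taken it. */
static bool big_lock_held = false;
static int lock_depth = 0;

static void take_global_lock(void)
{
    assert(!big_lock_held);   /* a plain non-recursive lock would deadlock here */
    big_lock_held = true;
}

static void drop_global_lock(void)
{
    assert(big_lock_held);
    big_lock_held = false;
}

/* Recursive hold: only the outermost call takes the global lock;
 * nested calls just bump the depth. */
void model_lock_kernel(void)
{
    if (lock_depth++ == 0)
        take_global_lock();
}

/* Only releasing the outermost hold actually drops the global lock. */
void model_unlock_kernel(void)
{
    assert(lock_depth > 0);
    if (--lock_depth == 0)
        drop_global_lock();
}

/* Release-on-sleep: a task that blocks while holding the BKL gives it
 * up completely, at any depth, and gets it back at the same depth on
 * wakeup. Data "protected" by the BKL can change across the sleep. */
void model_sleep(void (*block)(void))
{
    int saved = lock_depth;
    if (saved > 0) {
        lock_depth = 0;
        drop_global_lock();
    }
    if (block != NULL)
        block();              /* other tasks could take the BKL here */
    if (saved > 0) {
        take_global_lock();
        lock_depth = saved;
    }
}

/* Example blocking callback: while "asleep", the BKL is not held. */
void check_lock_released(void)
{
    assert(!big_lock_held);
}
```

A plain mutex behaves differently on both counts: it would deadlock on the nested acquire and would stay held across the sleep. Any code that silently depends on either behavior has to be found before the BKL can be swapped for something else, which is the point of Hansen's patch.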
From: Dave Hansen
To: linux-kernel mailing list
Subject: [PATCH] debugging for BKL
Date: Sat, 29 Jun 2002 06:38:21 -0700

For those of you who heard the talk at OLS Friday morning, this patch
won't be too much of a surprise. But for the rest of you...

The BKL's "magical" properties of allowing recursive holds on a single
cpu and its release-on-sleep semantics make it really hard to replace
with new locking schemes. Before we can really remove it, we must
first characterize the places where it is used in these crazy ways.

This patch centralizes the declaration of (un)lock_kernel() and makes
all the architectures define __(un)lock_kernel() instead. The #define
is necessary so that __LINE__ (and later, if I want to, __FUNCTION__)
will work.

Several macros have been introduced in order to spit out a message
whenever a recursive hold of the BKL is released. By default, each
message (from a single unlock_kernel() instance) will only be printed
once, but this can be overridden on an individual basis. This limit is
helpful to indicate if the particular condition is very rare, or
relatively common. There are plenty of ways to do this, so I
implemented the second-laziest one. The first is to do nothing :)

There is also an er_lock_kernel(); this call expects the BKL to
already be held (er == expect recursive). If the BKL isn't held, a
message is printed saying so.

For instance, I saw a lot of these on my ext3 filesystem:

release of recursive BKL hold, depth: 1 inode:1108
I went to ext3/inode.c and replaced inode.c:2607's lock_kernel() with
er_lock_kernel(). I'll be notified if the BKL is ever _not_ held
here. I'm using ext3 as an example because it was making the largest
footprint in the logs the first time I booted. Actually, it
never did boot; it was too busy printing messages to the serial port
:) (that's when I implemented the print limiter)
unlock_kernel_quiet() will make sure that none of these messages get
printed.
This is all activated with the config option CONFIG_DEBUG_BKL, which
is accessible in the kernel debugging section. If the config option
is off, the object code should be exactly the same as a kernel without
this patch. The kernel is slower, but is quite usable with the patch
applied and turned on.
I also attached a few patches which add some checking to the tty and
ext3 code. Don't take too much stock in these, they're just a
demonstration and not nearly complete.
P.S. Thanks to Ted Ts'o for suggesting that I print messages instead
of simply bugging on these conditions. Be careful what you ask for.
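The mechanism described in the email can be sketched in userspace C. This is a minimal illustration under stated assumptions, not Hansen's patch: lock_kernel(), unlock_kernel(), __lock_kernel(), __unlock_kernel(), er_lock_kernel(), unlock_kernel_quiet() and CONFIG_DEBUG_BKL are names from the email, but the stand-in primitives, the bkl_depth and bkl_messages counters, the message text, rate-limiting er_lock_kernel() the same way, and unlock_kernel_quiet() simply skipping the check are all assumptions; the per-site override of the limit is left out. The wrappers are macros so that __FILE__ and __LINE__ name the caller, a static flag declared inside each macro expansion gives one message per call site, and with CONFIG_DEBUG_BKL undefined every wrapper collapses to the bare primitive.

```c
#include <assert.h>
#include <stdio.h>

/* Turned on for this demo; delete this line to compile the checks out. */
#define CONFIG_DEBUG_BKL 1

/* Hypothetical userspace stand-ins for the per-architecture
 * __lock_kernel()/__unlock_kernel() primitives: just a depth counter. */
static int bkl_depth = 0;
static inline void __lock_kernel(void)   { bkl_depth++; }
static inline void __unlock_kernel(void) { assert(bkl_depth > 0); bkl_depth--; }

#define lock_kernel() __lock_kernel()

#ifdef CONFIG_DEBUG_BKL

/* Number of warnings actually printed, so the behavior can be checked. */
static int bkl_messages = 0;

/* Warn when a release leaves the BKL still held (a recursive hold).
 * The static flag lives inside the macro body, so each expansion,
 * i.e. each unlock_kernel() call site, gets its own print-once flag. */
#define unlock_kernel()                                               \
    do {                                                              \
        static int bkl_warned_;                                       \
        if (bkl_depth > 1 && !bkl_warned_) {                          \
            bkl_warned_ = 1;                                          \
            bkl_messages++;                                           \
            fprintf(stderr, "recursive BKL release at %s:%d\n",       \
                    __FILE__, __LINE__);                              \
        }                                                             \
        __unlock_kernel();                                            \
    } while (0)

/* "Expect recursive": complain (once per site) if the BKL is NOT
 * already held when this is called. */
#define er_lock_kernel()                                              \
    do {                                                              \
        static int bkl_er_warned_;                                    \
        if (bkl_depth == 0 && !bkl_er_warned_) {                      \
            bkl_er_warned_ = 1;                                       \
            bkl_messages++;                                           \
            fprintf(stderr, "BKL not held at %s:%d\n",                \
                    __FILE__, __LINE__);                              \
        }                                                             \
        __lock_kernel();                                              \
    } while (0)

/* Release without any check, for sites known to nest legitimately. */
#define unlock_kernel_quiet() __unlock_kernel()

#else /* !CONFIG_DEBUG_BKL */

/* Option off: every wrapper expands to the bare primitive, so the
 * compiler sees the same calls an unpatched tree would make. */
#define unlock_kernel()       __unlock_kernel()
#define er_lock_kernel()      __lock_kernel()
#define unlock_kernel_quiet() __unlock_kernel()

#endif /* CONFIG_DEBUG_BKL */
```

With the checks on, a nested unlock_kernel() inside a loop reports its call site once no matter how many iterations run, which is the kind of limiter Hansen says he added after his serial console was flooded.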