I've spent some time continuing the work of the people on Cc and many others to remove the big kernel lock from Linux and I now have bkl-removal branch in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git that lets me run a kernel on my quad-core machine with the only users of the BKL being mostly obscure device driver modules. The oldest patch in this series is roughly eight years old and is Willy's patch to remove the BKL from fs/locks.c, and I took a series of patches from Jan that removes it from most of the VFS. The other non-obvious changes are: - all file operations that either have an .ioctl method or do not have their own .llseek method used to implicitly require the BKL. I've changed that so they need to explicitly set .llseek = default_llseek, .unlocked_ioctl = default_ioctl, and changed all the code that either has supplied a .ioctl method or looks like it needs the BKL somewhere else, meaning the default_llseek function might actually do something. - The block layer now has a global bkldev_mutex that is used in all block drivers in place of the BKL. The only recursive instance of the BKL was __blkdev_get(), which is now called with the blkdev_mutex held instead of grabbing the BKL. This has some possible performance implications that need to be looked into. - The init/main.c code no longer take the BKL. I figured that this was completely unnecessary because there is no other code running at the same time that takes the BKL. - The most invasive change is in the TTY layer, which has a new global mutex (sorry!). I know that Alan has plans of his own to remove the BKL from this subsystem, so my patches may not go anywhere, but they seem to work fine for me. I've called the new lock the 'Big TTY Mutex' (BTM), a name that probably makes more sense if you happen to speak German. The basic idea here is to make recursive locking and the release-on-sleep explicit, so every mutex_lock, wait_event, ...
Hi Arnd, Interesting work. For the drivers/infiniband part, it seems maybe all these drivers should be using no_llseek instead of default_llseek? (or is it better style to use nonseekable_open()?) Certainly as far as I can tell, nothing in drivers/infiniband pays any attention to f_pos. Also, is there a reason why you add "#include <linux/smp_lock.h>" to all the files where you also do ".llseek = default_llseek"? In any case I can at least take care of the llseek stuff for 2.6.35. - R. -- Roland Dreier <rolandd@cisco.com> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html --
no_llseek makes it clear that you don't want the default_llseek semantics, while nonseekable_open also prevents pread/pwrite. Ideally, I'd just use both. There is a small chance that a random user space application actually tries to seek on the device (e.g. SEEK_END) and expects a zero return value, so when in doubt, I converted everything to default_llseek instead of The last patch in the series moves the default_llseek and default_ioctl function into the same loadable module that contains the BKL itself. Moving the declarations into the respective header seemed appropriate, Ok, thanks! Arnd --
OK, I added the following to my tree, currently queued in my for-next
branch for 2.6.35:
IB: Explicitly rule out llseek to avoid BKL in default_llseek()
Several RDMA user-access drivers have file_operations structures with
no .llseek method set. None of the drivers actually do anything with
f_pos, so this means llseek is essentially a NOP, instead of returning
an error as leaving other file_operations methods unimplemented would
do. This is mostly harmless, except that a NULL .llseek means that
default_llseek() is used, and this function grabs the BKL, which we
would like to avoid.
Since llseek does nothing useful on these files, we would like it to
return an error to userspace instead of silently grabbing the BKL and
succeeding. For nearly all of the file types, we take the
belt-and-suspenders approach of setting the .llseek method to
no_llseek and also calling nonseekable_open(); the exception is the
uverbs_event files, which are created with anon_inode_getfile(), which
already sets f_mode the same way as nonseekable_open() would.
This work is motivated by Arnd Bergmann's bkl-removal tree.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
drivers/infiniband/core/ucm.c | 3 ++-
drivers/infiniband/core/ucma.c | 4 +++-
drivers/infiniband/core/user_mad.c | 12 ++++++++----
drivers/infiniband/core/uverbs_main.c | 11 +++++++----
4 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 017d6e2..7ef3954 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -1180,7 +1180,7 @@ static int ib_ucm_open(struct inode *inode, struct file *filp)
file->filp = filp;
file->device = container_of(inode->i_cdev, struct ib_ucm_device, cdev);
- return 0;
+ return nonseekable_open(inode, filp);
}
static int ib_ucm_close(struct inode *inode, struct file *filp)
@@ -1228,6 +1228,7 @@ static const struct file_operations ucm_fops = {
...I'm not sure if that is actually the path of sanity (yours at least), nor the right way to whack the other BKL users whose use is horrible but essentially private. It would be nice to get the other bits in first removing BKL from most of the kernel and building kernels which are non BKL except for the tty layer. That (after Ingo's box from hell has run it a bit) would reasonably test the assertion that the tty layer has no BKL requirements that are driven by external to tty layer code. That to me would test the biggest question of all and be a reasonably good base from which to then either apply the tty BTM patches or attack the problem properly with the BKL localised to one subtree. Alan --
Yes, we can do that by applying all patches except 'tty: implement BTM as mutex instead of BKL', which is the only one in the tty section of my series that should really change the behaviour. Building a kernel with all other BKL users gone currently implies disabling usbcore, videodev, soundcore, i4l and capi, as well as a large number of obsolete device drivers. The only ones that I can imagine still interacting with the tty code We could also make the 'tty: implement BTM as mutex instead of BKL' patch a config option that makes it possible to test it out some more while conservative users just continue to get the BKL semantics. Arnd --
Very nice work! How about going one step further: - remove CONFIG_BKL altogether - remove all the remains of the BKL code in lib/kernel_lock.c and kernel/sched.c - turn lock_kernel() into a WARN_ONCE() and unlock_kernel() into a NOP. - ... - Celebrate! :-) Ingo --
<looks inside ptrace.c> Seems that there might be a few tricksy bits in here. Please do push at least the non-obvious parts out to the relevant people. It'd be interesting to see the overall diffstat? --
Sure, that is certainly the plan. Regarding the ptrace bits, this is one of a handful of places where the BKL was put in by someone a really long time ago but with the rest of the series applied, it becomes evident that there is nothing whatsoever that it serializes with, so removing the BKL here does not make the situation worse. It could still be a bug that needs to be fixed by adding a new Let me give you three separate diffstats. One is for the trivial pushdown of the BKL into all the ioctl and llseek functions as well as marking all remaining users as 'depends on BKL' in Kconfig, the second one is for the TTY layer conversion to the Big TTY Mutex and the third one is all the rest. Arnd --- Big TTY Mutex conversion: drivers/char/Makefile | 2 + drivers/char/amiserial.c | 16 ++-- drivers/char/cyclades.c | 20 ++-- drivers/char/epca.c | 4 +- drivers/char/isicom.c | 10 +- drivers/char/istallion.c | 20 ++-- drivers/char/mxser.c | 10 +- drivers/char/n_hdlc.c | 16 ++-- drivers/char/n_r3964.c | 10 +- drivers/char/pty.c | 8 +- drivers/char/riscom8.c | 8 +- drivers/char/rocket.c | 8 +- drivers/char/serial167.c | 4 +- drivers/char/specialix.c | 10 +- drivers/char/stallion.c | 12 ++-- drivers/char/sx.c | 12 ++-- drivers/char/synclink.c | 10 ++- drivers/char/synclink_gt.c | 8 +- drivers/char/synclinkmp.c | 12 ++-- drivers/char/tty_buffer.c | 2 +- drivers/char/tty_io.c | 123 +++++++++++++++------------ drivers/char/tty_ioctl.c | 36 ++++---- drivers/char/tty_ldisc.c | 53 +++++++----- drivers/char/tty_lock.c | 100 ++++++++++++++++++++++ drivers/char/tty_port.c | 6 +- drivers/char/vc_screen.c | 4 +- drivers/char/vt.c | 4 +- drivers/char/vt_ioctl.c ...
Yeah, the comment gives this: /* * This lock_kernel fixes a subtle race with suid exec */ But there is no lock_kernel() in the exec path, may be I missed it... so this may be an old lock_kernel() that doesn't exist anymore, and the bkl in the ptrace path is not going to help in any way. What remain to check are the possible unintended racy places that this bkl might protect. I'm going to check a first pass and if it looks fine, I'll just submit to Oleg and Roland. --
config USB tristate "Support for Host-side USB" depends on USB_ARCH_HAS_HCD && BKL Well, that's very interesting definition of "obscure" :) -- Jiri Kosina SUSE Labs, Novell Inc. --
That's why I said /mostly/ obscure modules. There are soundcore, usb-core, drm, vfat and a few other very common ones, along with many obscure ones. Arnd --
FWIW, this is the full list of 148 modules that require the BKL in an x86 allmodconfig, which is probably the configuration with the largest code ...
firewire-core and raw1394 do not actually require the BKL, they only miss to declare their files as not seekable. I will post patches which change these accordingly. My guess is that there is nothing to seek in dv1394, video1394, and firedtv either. Needless to say, there may be other character device file interfaces which cannot be seeked but don't admit it yet. -- Stefan Richter -=====-==-=- --== ==-== http://arcgraph.de/sr/ --
The <linux/firewire-cdev.h> character device file ABI is based on - ioctl() to initiate actions, - read() to consume events, - mmap() for isochronous I/O DMA buffers. lseek(), pread(), pwrite() (or any kind of write() at all) on the other hand are not applicable to /dev/fw* device files. Alas, whereas for example file_operations.write == NULL causes write() to be failed with an appropriate error, file_operations.llseek == NULL causes fs/read_write.c::default_llseek to be called on lseek() per default. This looks like not doing any harm, but it grabs the Big Kernel Lock. We don't want that, and we should return an error on lseek() and friends. This is provided by fs/read_write.c::no_llseek which we get if we clear the FMODE_LSEEK (and FMODE_PREAD, FMODE_PWRITE) flag by means of nonseekable_open(). Side note: The firewire-cdev interface has always been free of any BKL usage apart from this oversight regarding default_llseek (and from involuntary BKL usage by open() in older kernels). Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> --- Somebody correct me if I got anything wrong in my patch description. This patch is motivated by Arnd's "bkl removal: make unlocked_ioctl mandatory" http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/fir... "BKL removal: mark remaining users as 'depends on BKL'" http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/fir... drivers/firewire/core-cdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: ...
The raw1394 character device file ABI is based on
- write() or ioctl() to initiate actions,
with write being used exactly like _IOW ioctls,
- read() to consume events,
- mmap() for isochronous I/O DMA buffers.
The video1394 character device file ABI is based on
- ioctl() to initiate actions,
- mmap() for isochronous I/O DMA buffers.
The dv1394 character device file ABI is based on
- ioctl() to initiate actions,
- read() and write() for simple serial reception and transmission of
DV streams with a copy_to/from_user,
- mmap() for isochronous I/O DMA buffers for I/O without user copy.
lseek(), pread(), pwrite() on the other hand are not applicable to
/dev/raw1394, /dev/video1394/*, and /dev/dv1394/* device files.
Alas, file_operations.llseek == NULL causes lseek() to call
fs/read_write.c::default_llseek per default. This looks like not doing
any harm in either of the three drivers, but it grabs the Big Kernel
Lock. We don't want that, and we should return an error on lseek() and
friends. This is provided by fs/read_write.c::no_llseek which we get if
we clear the FMODE_LSEEK (and FMODE_PREAD, FMODE_PWRITE) flag by means
of nonseekable_open().
Side note: Apart from this oversight regarding default_llseek, the
raw1394, video1394, and dv1394 interfaces have been converted away from
BKL usage some time ago. Although all this is legacy code which should
be left in peace until it is eventually removed (as it is superseded by
firewire-core's <linux/firewire-cdev.h> ABI), this change seems still
worth doing to further minimize the presence of BKL usage in the kernel.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Somebody correct me if I misunderstood anything of these ABIs.
This patch is motivated by Arnd's
"bkl removal: make unlocked_ioctl ...In order to remove Big Kernel Lock usages from the DVB subsystem,
we need to
- provide .llseek file operations that do not grab the BKL (unlike
fs/read_write.c::default_llseek) or mark files as not seekable,
- provide .unlocked_ioctl file operations.
Add two dvb_generic_ file operations for file interfaces which are not
seekable and, respectively, do not require the BKL in their ioctl
handlers.
Use them in one driver of which I am sure of that these are applicable.
(Affected code paths were not runtime-tested since I don't have a CAM.)
Notes:
- In order to maximize code reuse and minimize API churn, I pass a
dummy NULL struct inode * through dvb_usercopy() to
dvbdev->kernel_ioctl(). Very ugly; it may be better to convert
everything from .ioctl to .unlocked_ioctl in one go and remove the
inode pointer argument everywhere.
- I suspect that actually all dvb_generic_open() users really want
nonseekable_open --- then we should simply change dvb_generic_open()
instead of adding dvb_generic_nonseekable_open() --- but I haven't
checked other users of dvb_generic_open whether they require
.llssek mehods other than fs/read_write.c::no_llseek.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
This patch is motivated by Arnd's
"bkl removal: make unlocked_ioctl mandatory"
http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/med...
"BKL removal: mark remaining users as 'depends on BKL'"
http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/med...
...In order to remove Big Kernel Lock usages from the DVB subsystem,
we need to
- provide .llseek file operations that do not grab the BKL (unlike
fs/read_write.c::default_llseek) or mark files as not seekable,
- provide .unlocked_ioctl file operations.
Add two dvb_generic_ file operations for file interfaces which are not
seekable and, respectively, do not require the BKL in their ioctl
handlers.
Use them in one driver of which I am sure of that these are applicable.
(Affected code paths in firedtv-ci were not runtime-tested since I don't
have a CAM, but the frontend ioctls were of course runtime-tested.)
Notes:
- The dvb-core internal dvb_usercopy() API is changed to match
.unlocked_ioctl() prototypes.
- I suspect that all dvb_generic_open() users really want
nonseekable_open --- then we should simply change dvb_generic_open()
instead of adding dvb_generic_nonseekable_open() --- but I haven't
checked other users of dvb_generic_open whether they require
.llssek mehods other than fs/read_write.c::no_llseek.
Applies to:
drivers/media/dvb/ttpci/av7110.c
drivers/media/dvb/ttpci/av7110_av.c
drivers/media/dvb/ttpci/av7110_ca.c
drivers/media/dvb/dvb-core/dvb_net.c
drivers/media/dvb/dvb-core/dvb_frontend.c
drivers/media/dvb/dvb-core/dvb_ca_en50221.c
- To be done by those who know the code: Check all users of
struct dvb_device.kernel_ioctl whether they really need the BKL.
Convert to .unlocked_ioctl and remove .kernel_ioctl and the
temporarily introduced dvbdev.c::legacy_usercopy().
Applies to:
drivers/media/dvb/ttpci/av7110.c
drivers/media/dvb/ttpci/av7110_av.c
drivers/media/dvb/ttpci/av7110_ca.c
drivers/media/dvb/dvb-core/dvb_frontend.c
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Update: Split dvb_usercopy into one that follows the .unlocked_ioctl
prototype form and another one that preserves the old form.
drivers/media/dvb/dvb-core/dmxdev.c ...Your patches look good, but it would be helpful to also set .llseek = no_llseek in the file operations, because that is much easier to grep for than only the nonseekable_open. While it's technically a NOP on the presence of nonseekable_open, it will help that I don't accidentally apply my patch on top of yours. Arnd --
Sounds like a plan, but (a) if my .llseek = no_llseek and your .llseek =
default_llseek are not within diff context range, you (or whoever else
merges mine and yours) only get a compiler warning (Initializer entry
defined twice) rather than a merge conflict which couldn't be missed,
(b) there won't be a merge conflict in "BKL removal: mark remaining
users as 'depends on BKL'". (c) While I don't mind adding more visual
clutter to ieee1394/*, I prefer terse coding in firewire/*.
How about I put my nonseekable_open additions into a release branch and
send you a pull request after a few days exposure in linux-next? If you
do not plan to respin your patch queue soon or at all, I could even let
you pull a for-arnd branch with a semantically correct merge of yours
and mine.
General thoughts:
".llseek = NULL," so far meant "do the Right Thing on lseek() and
friends, as far as the fs core can tell". Shouldn't we keep it that
way? It's as close to other ".method = NULL," as it can get, which
either mean "silently skip this method if it doesn't matter" (e.g.
.flush) or "fail attempts to use this method with a fitting errno" (e.g.
.write).
Of course, as we have already seen with infiniband, firewire, ieee1394,
.llseek = NULL is ambiguous in practice. Does the driver really want to
use default_llseek, or should it rather use no_llseek and/or
nonseekable_open, or should it even implement a dummy_llseek() { return
0; } which avoids the BKL but preserves ABI behaviour? This needs to be
resolved for each and every case eventually, regardless of whether or
when your addition of .llseek = default_llseek enters mainline.
--
Stefan Richter
-=====-==-=- --== ===--
http://arcgraph.de/sr/
--
I can probably remember this specific one now, but for other people doing the same on their subsystems, adding no_llseek may help reduce My series changes the default from 'default_llseek' to 'generic_file_llseek', which is almost identical, except for taking the inode mutex instead of the BKL. Another option that has been discussed before is to make no_llseek the default, but that might cause more serious problems wiht drivers that really require seeking. Since using default_llseek can only ever make a difference if the driver actually uses the BKL in any other function, I could go through the Yes, that also sounds like a good idea. I believe that Jan actually posted a patch to do that at some point. Arnd --
What if another file operation changes the file pointer while holding the bkl? You're not protected anymore in this case. --
Exactly, that's why I changed all the drivers to set default_llseek explicitly. Even this is very likely not needed in more than a handful of drivers (if any), for a number of reasons: - sys_read/sys_write *never* hold any locks while calling file_pos_write(), which is the only place they get updated for regular files. - concurrent llseek plus other file operations on the same file descriptor usually already have an undefined outcome. - when I started inspecting drivers that look at file->f_pos themselves (not the read/write operation arguments), I found that practically all of them are doing this in a totally broken way! - The only think we'd probably ever want to lock against in llseek is readdir, which is not used in any drivers, but only in file systems. Arnd --
Yeah sure. But the pushdown (or step by step replacement with generic_file_llseek) is still necessary to ensure every Right. --
That is not that easy. generic_file_llseek() is testing against 'offset < inode->i_sb->s_maxbytes'. This is not necessarily true when you think about directories with random offset cookies. I know that seeking on directories is Yes, it is in http://git.infradead.org/users/jblunck/linux-2.6.git bkl/default-lseek There are some other patches in that branch that are not upstream yet. Mind to take them for your bkl-removal branch? Cheers, Jan --
Oh, I see. Would it work if we extend generic_file_llseek to only check s_maxbytes Frederic is now collecting the new patches. Your default-lseek series looks good to me, except for the obvious one that says 'FIXME' in the subject. Maybe Frederic can add your series except for that one as another branch to get pulled into his kill-the-bkl master branch. Arnd --
Ok, will have a look at this soon (I will also put a branch for the procfs series). Thanks. --
Yes and maybe rename generic_file_llseek to generic_llseek. --
I pushed modified versions of these patches out to linux1394-2.6.git now (master and for-next branch, on top of unrelated firewire updates). They contain the explicit .llseek = no_llseek now. http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff;h=3a... http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff;h=7c... -- Stefan Richter -=====-==-=- -=-- -=-=- http://arcgraph.de/sr/ --
From a quick grep at least EHCI doesn't seem to need it?
Except for those two guys in core/*.c:
/* keep API that guarantees BKL */
lock_kernel();
retval = driver->ioctl(intf, ctl->ioctl_code, buf);
unlock_kernel();
if (retval == -ENOIOCTLCMD)
retval = -ENOTTY;
I guess that could be just moved into the low level modules with unlocked_ioctl
And one use in the usbfs which seems quite bogus and can be probably removed.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
--
As a followup:
I killed some time by going through the various BKL uses in USB and
came up with this git tree to address them . Feel free to integrate
into your tree.
With this only some obscure USB low level drivers still need to
depend on BKL, the majority is clean. Gadgetfs also still needs
it for now.
I ended up also fixing some minor races in usb serial registration/
unregistration.
Opens:
- The usb serial ioctl entry needs to become a unlocked_ioctl,
but I think that needs your tree first. The code below it doesn't
need it anymore.
- The seek function in uhci-debug.c probably is still racy.
Only lightly tested. Some more reviewing would be appreciated
-Andi
The following changes since commit b72c40949b0f04728f2993a1434598d3bad094ea:
Linus Torvalds (1):
Merge branch 'for-linus' of git://git.kernel.org/.../jbarnes/pci-2.6
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6.git usb-bkl
Andi Kleen (12):
USB-BKL: Remove lock_kernel in usbfs update_sb()
USB-BKL: Remove BKL from usb serial drivers ioctl handlers
USB-BKL: Convert usb_driver ioctl to unlocked_ioctl
USB-BKL: Remove BKL use for usb serial driver probing
USB-BKL: Make usb monitor depend on BKL
USB-BKL: Make usb lcd driver depend on BKL
USB-BKL: Make usb sisvga driver depend on BKL
USB-BKL: Make usb rio500 driver depend on BKL
USB-BKL: Make usb iowarrior driver depend on BKL
USB-BKL: Remove BKL use in uhci-debug
USB-BKL: Make usb gadget fs depend on BKL
USB-BKL: Make usb gadget printer depend on BKL
drivers/usb/core/devio.c | 7 +----
drivers/usb/core/hub.c | 3 +-
drivers/usb/core/inode.c | 4 ---
drivers/usb/gadget/Kconfig | 3 +-
drivers/usb/host/uhci-debug.c | 17 +++++----------
drivers/usb/misc/Kconfig | 6 ++--
drivers/usb/misc/sisusbvga/Kconfig | 2 ...Great, I was hoping someone picks up the missing pieces. Maybe someone The usb-serial driver has a few instances where it takes the BKL in the mainline code, but this gets converted to the Big TTY Mutex That function could be removed in favor of using generic_file_ioctl and setting i_size to up->size. Also, the race is only between concurrent calls of llseek on the same file descriptor, which is undefined anyway. The current code also doesn't protect you against partial updates of f_pos during ->read() on 32 bit systems (nothing ever does), and it even fails to protect against the concurrent llseek race The patch looks correct, but I probably wouldn't bother with the rename, The serial_ioctl function is already called without the BKL, depite the name. tty_operations->ioctl was converted a long time ago, so I guess this patch can be dropped from your series. Arnd --
I wasn't sure it would protect against parallel reads. I think a rename is better, I take compile errors over subtle Ok. -Andi -- ak@linux.intel.com -- Speaking for myself only. --
There is no way for any driver or file system right now to protect against that, nor has there been for a long time[1]. The sys_read and sys_write functions use file_pos_write() to update the file->f_pos without taking any lock, and they pass a local variable into the *ppos argument of the ->read/->write file operations, which means that the file operation itself cannot add locking to the update either. We never do in-place updates of file->f_pos, but on architectures where a 64 bit load can see incorrect data from a 64 bit store, any concurrent read/write/llseek combinations may cause problems, except for two concurrent lseek. Also, llseek is usually serialized ok, fine with me. Arnd [1] http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=55f09ec0087c... was merged in linux-2.6.8, which opened this race. --
I updated the git tree removing the unneeded serial change. Feel free to pull. git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6.git usb-bkl -Andi -- ak@linux.intel.com -- Speaking for myself only. --
There is a typo in this one:
commit 12c8fcce56c0de4fdcacf048fe723c8778af940d
Author: Arnd Bergmann <arnd@relay.de.ibm.com>
Date: Wed Mar 24 20:08:55 2010 +0100
block: replace BKL with global mutex
It doesn't seem to interact with much else, so give
this a try.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index d9d6206..9c1277a 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
[ snip ]
@@ -1641,7 +1643,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev,
ret = -ENXIO;
- lock_kernel();
+ mutex_unlock(&blkdev_mutex);
p = dev_to_part(dev);
bdev = bdget(part_devt(p));
if (bdev == NULL)
--
Fixed now, thanks for reporting! Arnd --
I just queued the perf_event one. It looks pretty good. I'm also looking at some of the most trivials (ehm..less hards) in the list and see which we can submit right away. Thanks. --
About this one, there is a "sensible" part:
@@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
}
static const struct file_operations proc_fdinfo_file_operations = {
- .open = nonseekable_open,
+ .llseek = generic_file_llseek,
.read = proc_fdinfo_read,
};
Replacing default_llseek() by generic_file_llseek() as you
did for most of the other parts is fine.
But the above changes the semantics as it makes it seekable.
Why not just keeping it as is? It just ends up in no_llseek().
--
There is also the ioctl part that takes the bkl in procfs. I'll just check nothing weird happens there wrt file pos. We probably first need to pushdown the bkl in the procfs ioctl handlers. --
The default is default_llseek, which uses the BKL and cannot be The BKL in procfs is only for proc files that have registered their own .ioctl instead of .unlocked_ioctl method. Converting every file_operations instance to provide an unlocked_ioctl (as one of the other patches does) makes sure that this path is never taken. BTW, there are less than a handful of procfs files that provide an ioctl operation, and those probably should never have been merged. Arnd --
Yeah, but you removed the nonseekable_open and made generic_file_llseek in llseek on this one. This makes it seekable while it wasn't, changing its ABI. It wasn't taking the bkl before that as it was calling no_llseek(). May be its non seekable property is irrelevant, I don't know, but if this behaviour must be changed, it should be in a There are three of them. I'm going to make them .unlocked_ioctl and push the bkl inside, and warn on further uses of .ioctl, without applying the bkl there anymore. That plus your bkl removal in proc seek, should totally remove the bkl from procfs. --
Ah, I see what you mean. That change was certainly not intentional Ok Arnd --
Great, Arnd, I like this. I also have a private but stale tree where I have collected some remove bkl patches (which I will review against your tree.) I think that it is important that we keep chipping away at it though, and that we all keep sending stuff upstream when it is ready. Thanks John --
Hi Arnd,
Looking at your tree, I see you have commit 753dd249 ("perf_event: use
nonseekable_open") that does:
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -2515,6 +2515,8 @@ static int perf_fasync(int fd, struct file *filp, int on)
> }
>
> static const struct file_operations perf_fops = {
> + .open = nonseekable_open,
> + .llseek = no_llseek,
> .release = perf_release,
> .read = perf_read,
> .poll = perf_poll,
But if I understand this correctly, the assignment to .open is at best
useless -- these file_operations are only used via anon_inode_getfd()
and so there is no possible path that can call the .open method. Or am
I missing something?
(The same applies to the kvm_main.c changes too)
--
Roland Dreier <rolandd@cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
Good point, I'll update that. Thanks. --
You're right. I did not consider this. Arnd --
