Re: [GIT, RFC] Killing the Big Kernel Lock

Previous thread: [PATCH 1/1] bluetooth: BT driver using ST for TI combo devices by pavan_savoy on Wednesday, March 24, 2010 - 2:37 pm. (1 message)

Next thread: RE: [PATCH 4/6] drivers:misc: sources for Init manager module by Pavan Savoy on Wednesday, March 24, 2010 - 2:46 pm. (1 message)
From: Arnd Bergmann
Date: Wednesday, March 24, 2010 - 2:40 pm

I've spent some time continuing the work of the people on Cc and many others
to remove the big kernel lock from Linux and I now have bkl-removal branch
in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git
that lets me run a kernel on my quad-core machine with the only users of the BKL
being mostly obscure device driver modules.

The oldest patch in this series is roughly eight years old and is Willy's patch
to remove the BKL from fs/locks.c, and I took a series of patches from Jan that
removes it from most of the VFS.

The other non-obvious changes are:

- all file operations that either have an .ioctl method or do not have their
  own .llseek method used to implicitly require the BKL. I've changed that
  so they need to explicitly set .llseek = default_llseek, .unlocked_ioctl =
  default_ioctl, and changed all the code that either has supplied a .ioctl
  method or looks like it needs the BKL somewhere else, meaning the
  default_llseek function might actually do something.

- The block layer now has a global bkldev_mutex that is used in all block
  drivers in place of the BKL. The only recursive instance of the BKL was
  __blkdev_get(), which is now called with the blkdev_mutex held instead of
  grabbing the BKL. This has some possible performance implications that
  need to be looked into.

- The init/main.c code no longer take the BKL. I figured that this was
  completely unnecessary because there is no other code running at the
  same time that takes the BKL.

- The most invasive change is in the TTY layer, which has a new global
  mutex (sorry!). I know that Alan has plans of his own to remove the BKL
  from this subsystem, so my patches may not go anywhere, but they seem
  to work fine for me.
  I've called the new lock the 'Big TTY Mutex' (BTM), a name that probably
  makes more sense if you happen to speak German.
  The basic idea here is to make recursive locking and the release-on-sleep
  explicit, so every mutex_lock, wait_event, ...
From: Roland Dreier
Date: Wednesday, March 24, 2010 - 2:53 pm

Hi Arnd,

Interesting work.  For the drivers/infiniband part, it seems maybe all
these drivers should be using no_llseek instead of default_llseek?  (or
is it better style to use nonseekable_open()?)  Certainly as far as I
can tell, nothing in drivers/infiniband pays any attention to f_pos.

Also, is there a reason why you add "#include <linux/smp_lock.h>" to all
the files where you also do ".llseek  = default_llseek"?

In any case I can at least take care of the llseek stuff for 2.6.35.

 - R.
-- 
Roland Dreier  <rolandd@cisco.com>
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--

From: Arnd Bergmann
Date: Wednesday, March 24, 2010 - 2:59 pm

no_llseek makes it clear that you don't want the default_llseek semantics,
while nonseekable_open also prevents pread/pwrite. Ideally, I'd just
use both.

There is a small chance that a random user space application actually tries
to seek on the device (e.g. SEEK_END) and expects a zero return value,
so when in doubt, I converted everything to default_llseek instead of

The last patch in the series moves the default_llseek and default_ioctl
function into the same loadable module that contains the BKL itself.
Moving the declarations into the respective header seemed appropriate,

Ok, thanks!

	Arnd
--

From: Roland Dreier
Date: Tuesday, March 30, 2010 - 10:22 pm

OK, I added the following to my tree, currently queued in my for-next
branch for 2.6.35:


IB: Explicitly rule out llseek to avoid BKL in default_llseek()

Several RDMA user-access drivers have file_operations structures with
no .llseek method set.  None of the drivers actually do anything with
f_pos, so this means llseek is essentially a NOP, instead of returning
an error as leaving other file_operations methods unimplemented would
do.  This is mostly harmless, except that a NULL .llseek means that
default_llseek() is used, and this function grabs the BKL, which we
would like to avoid.

Since llseek does nothing useful on these files, we would like it to
return an error to userspace instead of silently grabbing the BKL and
succeeding.  For nearly all of the file types, we take the
belt-and-suspenders approach of setting the .llseek method to
no_llseek and also calling nonseekable_open(); the exception is the
uverbs_event files, which are created with anon_inode_getfile(), which
already sets f_mode the same way as nonseekable_open() would.

This work is motivated by Arnd Bergmann's bkl-removal tree.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
 drivers/infiniband/core/ucm.c         |    3 ++-
 drivers/infiniband/core/ucma.c        |    4 +++-
 drivers/infiniband/core/user_mad.c    |   12 ++++++++----
 drivers/infiniband/core/uverbs_main.c |   11 +++++++----
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 017d6e2..7ef3954 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -1180,7 +1180,7 @@ static int ib_ucm_open(struct inode *inode, struct file *filp)
 	file->filp = filp;
 	file->device = container_of(inode->i_cdev, struct ib_ucm_device, cdev);
 
-	return 0;
+	return nonseekable_open(inode, filp);
 }
 
 static int ib_ucm_close(struct inode *inode, struct file *filp)
@@ -1228,6 +1228,7 @@ static const struct file_operations ucm_fops = {
 ...
From: Alan Cox
Date: Wednesday, March 24, 2010 - 3:10 pm

I'm not sure if that is actually the path of sanity (yours at least), nor
the right way to whack the other BKL users whose use is horrible but
essentially private.

It would be nice to get the other bits in first removing BKL from most of
the kernel and building kernels which are non BKL except for the tty
layer. That (after Ingo's box from hell has run it a bit) would
reasonably test the assertion that the tty layer has no BKL requirements
that are driven by external to tty layer code.

That to me would test the biggest question of all and be a reasonably
good base from which to then either apply the tty BTM patches or attack
the problem properly with the BKL localised to one subtree.

Alan
--

From: Arnd Bergmann
Date: Wednesday, March 24, 2010 - 3:25 pm

Yes, we can do that by applying all patches except 'tty: implement BTM
as mutex instead of BKL', which is the only one in the tty section of
my series that should really change the behaviour. Building a kernel
with all other BKL users gone currently implies disabling usbcore,
videodev, soundcore, i4l and capi, as well as a large number of obsolete
device drivers.

The only ones that I can imagine still interacting with the tty code

We could also make the 'tty: implement BTM as mutex instead of BKL'
patch a config option that makes it possible to test it out some more
while conservative users just continue to get the BKL semantics.

	Arnd
--

From: Ingo Molnar
Date: Wednesday, March 24, 2010 - 3:23 pm

Very nice work!

How about going one step further:

 - remove CONFIG_BKL altogether
 - remove all the remains of the BKL code in lib/kernel_lock.c and kernel/sched.c
 - turn lock_kernel() into a WARN_ONCE() and unlock_kernel() into a NOP.
 - ...
 - Celebrate! :-)

	Ingo
--

From: Andrew Morton
Date: Wednesday, March 24, 2010 - 2:07 pm

<looks inside ptrace.c>

Seems that there might be a few tricksy bits in here.  Please do push
at least the non-obvious parts out to the relevant people.

It'd be interesting to see the overall diffstat?
--

From: Arnd Bergmann
Date: Thursday, March 25, 2010 - 3:26 am

Sure, that is certainly the plan.

Regarding the ptrace bits, this is one of a handful of places where the BKL
was put in by someone a really long time ago but with the rest of the
series applied, it becomes evident that there is nothing whatsoever that
it serializes with, so removing the BKL here does not make the situation
worse. It could still be a bug that needs to be fixed by adding a new

Let me give you three separate diffstats. One is for the trivial pushdown of
the BKL into all the ioctl and llseek functions as well as marking all
remaining users as 'depends on BKL' in Kconfig, the second one is for the
TTY layer conversion to the Big TTY Mutex and the third one is all the rest.

	Arnd

---
Big TTY Mutex conversion:
 drivers/char/Makefile           |    2 +
 drivers/char/amiserial.c        |   16 ++--
 drivers/char/cyclades.c         |   20 ++--
 drivers/char/epca.c             |    4 +-
 drivers/char/isicom.c           |   10 +-
 drivers/char/istallion.c        |   20 ++--
 drivers/char/mxser.c            |   10 +-
 drivers/char/n_hdlc.c           |   16 ++--
 drivers/char/n_r3964.c          |   10 +-
 drivers/char/pty.c              |    8 +-
 drivers/char/riscom8.c          |    8 +-
 drivers/char/rocket.c           |    8 +-
 drivers/char/serial167.c        |    4 +-
 drivers/char/specialix.c        |   10 +-
 drivers/char/stallion.c         |   12 ++--
 drivers/char/sx.c               |   12 ++--
 drivers/char/synclink.c         |   10 ++-
 drivers/char/synclink_gt.c      |    8 +-
 drivers/char/synclinkmp.c       |   12 ++--
 drivers/char/tty_buffer.c       |    2 +-
 drivers/char/tty_io.c           |  123 +++++++++++++++------------
 drivers/char/tty_ioctl.c        |   36 ++++----
 drivers/char/tty_ldisc.c        |   53 +++++++-----
 drivers/char/tty_lock.c         |  100 ++++++++++++++++++++++
 drivers/char/tty_port.c         |    6 +-
 drivers/char/vc_screen.c        |    4 +-
 drivers/char/vt.c               |    4 +-
 drivers/char/vt_ioctl.c    ...
From: Frederic Weisbecker
Date: Sunday, March 28, 2010 - 1:33 pm

Yeah, the comment gives this:

	/*
	 * This lock_kernel fixes a subtle race with suid exec
	 */

But there is no lock_kernel() in the exec path, may be I missed it...
so this may be an old lock_kernel() that doesn't exist anymore, and
the bkl in the ptrace path is not going to help in any way.

What remain to check are the possible unintended racy places that
this bkl might protect.

I'm going to check a first pass and if it looks fine, I'll just submit
to Oleg and Roland.

--

From: Jiri Kosina
Date: Thursday, March 25, 2010 - 5:55 am

config USB
	        tristate "Support for Host-side USB"
	        depends on USB_ARCH_HAS_HCD && BKL

Well, that's very interesting definition of "obscure" :)

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

--

From: Arnd Bergmann
Date: Thursday, March 25, 2010 - 6:06 am

That's why I said /mostly/ obscure modules. There are soundcore, usb-core, drm,
vfat and a few other very common ones, along with many obscure ones.

	Arnd
--

From: Arnd Bergmann
Date: Thursday, March 25, 2010 - 6:38 am

FWIW, this is the full list of 148 modules that require the BKL in an x86 allmodconfig,
which is probably the configuration with the largest code ...
From: Stefan Richter
Date: Friday, March 26, 2010 - 4:47 pm

firewire-core and raw1394 do not actually require the BKL, they only
miss to declare their files as not seekable.  I will post patches which
change these accordingly.

My guess is that there is nothing to seek in dv1394, video1394, and
firedtv either.

Needless to say, there may be other character device file interfaces
which cannot be seeked but don't admit it yet.
-- 
Stefan Richter
-=====-==-=- --== ==-==
http://arcgraph.de/sr/
--

From: Stefan Richter
Date: Saturday, March 27, 2010 - 2:16 am

The <linux/firewire-cdev.h> character device file ABI is based on
  - ioctl() to initiate actions,
  - read() to consume events,
  - mmap() for isochronous I/O DMA buffers.
lseek(), pread(), pwrite() (or any kind of write() at all) on the other
hand are not applicable to /dev/fw* device files.

Alas, whereas for example file_operations.write == NULL causes write()
to be failed with an appropriate error, file_operations.llseek == NULL
causes fs/read_write.c::default_llseek to be called on lseek() per
default.

This looks like not doing any harm, but it grabs the Big Kernel Lock.
We don't want that, and we should return an error on lseek() and
friends.  This is provided by fs/read_write.c::no_llseek which we get if
we clear the FMODE_LSEEK (and FMODE_PREAD, FMODE_PWRITE) flag by means
of nonseekable_open().

Side note:  The firewire-cdev interface has always been free of any BKL
usage apart from this oversight regarding default_llseek (and from
involuntary BKL usage by open() in older kernels).

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---

Somebody correct me if I got anything wrong in my patch description.

This patch is motivated by Arnd's
"bkl removal: make unlocked_ioctl mandatory"
http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/fir...
"BKL removal: mark remaining users as 'depends on BKL'"
http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/fir...


 drivers/firewire/core-cdev.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: ...
From: Stefan Richter
Date: Saturday, March 27, 2010 - 2:20 am

The raw1394 character device file ABI is based on
  - write() or ioctl() to initiate actions,
    with write being used exactly like _IOW ioctls,
  - read() to consume events,
  - mmap() for isochronous I/O DMA buffers.

The video1394 character device file ABI is based on
  - ioctl() to initiate actions,
  - mmap() for isochronous I/O DMA buffers.

The dv1394 character device file ABI is based on
  - ioctl() to initiate actions,
  - read() and write() for simple serial reception and transmission of
    DV streams with a copy_to/from_user,
  - mmap() for isochronous I/O DMA buffers for I/O without user copy.

lseek(), pread(), pwrite() on the other hand are not applicable to
/dev/raw1394, /dev/video1394/*, and /dev/dv1394/* device files.

Alas, file_operations.llseek == NULL causes lseek() to call
fs/read_write.c::default_llseek per default.  This looks like not doing
any harm in either of the three drivers, but it grabs the Big Kernel
Lock.  We don't want that, and we should return an error on lseek() and
friends.  This is provided by fs/read_write.c::no_llseek which we get if
we clear the FMODE_LSEEK (and FMODE_PREAD, FMODE_PWRITE) flag by means
of nonseekable_open().

Side note:  Apart from this oversight regarding default_llseek, the
raw1394, video1394, and dv1394 interfaces have been converted away from
BKL usage some time ago.  Although all this is legacy code which should
be left in peace until it is eventually removed (as it is superseded by
firewire-core's <linux/firewire-cdev.h> ABI), this change seems still
worth doing to further minimize the presence of BKL usage in the kernel.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---

Somebody correct me if I misunderstood anything of these ABIs.

This patch is motivated by Arnd's
"bkl removal: make unlocked_ioctl ...
From: Stefan Richter
Date: Saturday, March 27, 2010 - 3:40 am

In order to remove Big Kernel Lock usages from the DVB subsystem,
we need to
  - provide .llseek file operations that do not grab the BKL (unlike
    fs/read_write.c::default_llseek) or mark files as not seekable,
  - provide .unlocked_ioctl file operations.

Add two dvb_generic_ file operations for file interfaces which are not
seekable and, respectively, do not require the BKL in their ioctl
handlers.

Use them in one driver of which I am sure of that these are applicable.
(Affected code paths were not runtime-tested since I don't have a CAM.)

Notes:

  - In order to maximize code reuse and minimize API churn, I pass a
    dummy NULL struct inode * through dvb_usercopy() to
    dvbdev->kernel_ioctl().  Very ugly; it may be better to convert
    everything from .ioctl to .unlocked_ioctl in one go and remove the
    inode pointer argument everywhere.

  - I suspect that actually all dvb_generic_open() users really want
    nonseekable_open --- then we should simply change dvb_generic_open()
    instead of adding dvb_generic_nonseekable_open() --- but I haven't
    checked other users of dvb_generic_open whether they require
    .llssek mehods other than fs/read_write.c::no_llseek.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---

This patch is motivated by Arnd's
"bkl removal: make unlocked_ioctl mandatory"
http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/med...
"BKL removal: mark remaining users as 'depends on BKL'"
http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/med...


 ...
From: Stefan Richter
Date: Sunday, March 28, 2010 - 7:47 am

In order to remove Big Kernel Lock usages from the DVB subsystem,
we need to
  - provide .llseek file operations that do not grab the BKL (unlike
    fs/read_write.c::default_llseek) or mark files as not seekable,
  - provide .unlocked_ioctl file operations.

Add two dvb_generic_ file operations for file interfaces which are not
seekable and, respectively, do not require the BKL in their ioctl
handlers.

Use them in one driver of which I am sure of that these are applicable.
(Affected code paths in firedtv-ci were not runtime-tested since I don't
have a CAM, but the frontend ioctls were of course runtime-tested.)

Notes:

  - The dvb-core internal dvb_usercopy() API is changed to match
    .unlocked_ioctl() prototypes.

  - I suspect that all dvb_generic_open() users really want
    nonseekable_open --- then we should simply change dvb_generic_open()
    instead of adding dvb_generic_nonseekable_open() --- but I haven't
    checked other users of dvb_generic_open whether they require
    .llssek mehods other than fs/read_write.c::no_llseek.
    Applies to:
    drivers/media/dvb/ttpci/av7110.c
    drivers/media/dvb/ttpci/av7110_av.c
    drivers/media/dvb/ttpci/av7110_ca.c
    drivers/media/dvb/dvb-core/dvb_net.c
    drivers/media/dvb/dvb-core/dvb_frontend.c
    drivers/media/dvb/dvb-core/dvb_ca_en50221.c

  - To be done by those who know the code:  Check all users of
    struct dvb_device.kernel_ioctl whether they really need the BKL.
    Convert to .unlocked_ioctl and remove .kernel_ioctl and the
    temporarily introduced dvbdev.c::legacy_usercopy().
    Applies to:
    drivers/media/dvb/ttpci/av7110.c
    drivers/media/dvb/ttpci/av7110_av.c
    drivers/media/dvb/ttpci/av7110_ca.c
    drivers/media/dvb/dvb-core/dvb_frontend.c

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---

Update:  Split dvb_usercopy into one that follows the .unlocked_ioctl
prototype form and another one that preserves the old form.

 drivers/media/dvb/dvb-core/dmxdev.c         ...
From: Arnd Bergmann
Date: Saturday, March 27, 2010 - 7:37 am

Your patches look good, but it would be helpful to also set .llseek = no_llseek
in the file operations, because that is much easier to grep for than
only the nonseekable_open. While it's technically a NOP on the presence of
nonseekable_open, it will help that I don't accidentally apply my patch on
top of yours.

	Arnd
--

From: Stefan Richter
Date: Sunday, March 28, 2010 - 5:27 am

Sounds like a plan, but (a) if my .llseek = no_llseek and your .llseek =
default_llseek are not within diff context range, you (or whoever else
merges mine and yours) only get a compiler warning (Initializer entry
defined twice) rather than a merge conflict which couldn't be missed,
(b) there won't be a merge conflict in "BKL removal: mark remaining
users as 'depends on BKL'".  (c) While I don't mind adding more visual
clutter to ieee1394/*, I prefer terse coding in firewire/*.

How about I put my nonseekable_open additions into a release branch and
send you a pull request after a few days exposure in linux-next?  If you
do not plan to respin your patch queue soon or at all, I could even let
you pull a for-arnd branch with a semantically correct merge of yours
and mine.

General thoughts:

".llseek = NULL," so far meant "do the Right Thing on lseek() and
friends, as far as the fs core can tell".  Shouldn't we keep it that
way?  It's as close to other ".method = NULL," as it can get, which
either mean "silently skip this method if it doesn't matter" (e.g.
.flush) or "fail attempts to use this method with a fitting errno" (e.g.
.write).

Of course, as we have already seen with infiniband, firewire, ieee1394,
.llseek = NULL is ambiguous in practice.  Does the driver really want to
use default_llseek, or should it rather use no_llseek and/or
nonseekable_open, or should it even implement a dummy_llseek() { return
0; } which avoids the BKL but preserves ABI behaviour?  This needs to be
resolved for each and every case eventually, regardless of whether or
when your addition of .llseek = default_llseek enters mainline.
-- 
Stefan Richter
-=====-==-=- --== ===--
http://arcgraph.de/sr/
--

From: Arnd Bergmann
Date: Sunday, March 28, 2010 - 1:05 pm

I can probably remember this specific one now, but for other people
doing the same on their subsystems, adding no_llseek may help reduce

My series changes the default from 'default_llseek' to 'generic_file_llseek',
which is almost identical, except for taking the inode mutex instead of the
BKL. Another option that has been discussed before is to make no_llseek
the default, but that might cause more serious problems wiht drivers that
really require seeking.

Since using default_llseek can only ever make a difference if the driver
actually uses the BKL in any other function, I could go through the

Yes, that also sounds like a good idea. I believe that Jan actually posted
a patch to do that at some point.

	Arnd
--

From: Frederic Weisbecker
Date: Sunday, March 28, 2010 - 1:15 pm

What if another file operation changes the file pointer while holding the bkl?
You're not protected anymore in this case.

--

From: Arnd Bergmann
Date: Sunday, March 28, 2010 - 2:34 pm

Exactly, that's why I changed all the drivers to set default_llseek explicitly.
Even this is very likely not needed in more than a handful of drivers (if any),
for a number of reasons:

- sys_read/sys_write *never* hold any locks while calling file_pos_write(),
  which is the only place they get updated for regular files.
- concurrent llseek plus other file operations on the same file descriptor
  usually already have an undefined outcome.
- when I started inspecting drivers that look at file->f_pos themselves (not
  the read/write operation arguments), I found that practically all of them
  are doing this in a totally broken way!
- The only think we'd probably ever want to lock against in llseek
  is readdir, which is not used in any drivers, but only in file systems.

	Arnd
--

From: Frederic Weisbecker
Date: Sunday, March 28, 2010 - 4:24 pm

Yeah sure. But the pushdown (or step by step replacement
with generic_file_llseek) is still necessary to ensure every






Right.

--

From: Jan Blunck
Date: Thursday, April 8, 2010 - 1:45 pm

That is not that easy. generic_file_llseek() is testing against 'offset <
inode->i_sb->s_maxbytes'. This is not necessarily true when you think about
directories with random offset cookies. I know that seeking on directories is

Yes, it is in

http://git.infradead.org/users/jblunck/linux-2.6.git bkl/default-lseek

There are some other patches in that branch that are not upstream yet. Mind to
take them for your bkl-removal branch?

Cheers,
Jan
--

From: Arnd Bergmann
Date: Thursday, April 8, 2010 - 2:27 pm

Oh, I see. Would it work if we extend generic_file_llseek to only check s_maxbytes

Frederic is now collecting the new patches. Your default-lseek series looks
good to me, except for the obvious one that says 'FIXME' in the subject.

Maybe Frederic can add your series except for that one as another branch to
get pulled into his kill-the-bkl master branch.

	Arnd
--

From: Frederic Weisbecker
Date: Thursday, April 8, 2010 - 2:30 pm

Ok, will have a look at this soon (I will also put a branch for the procfs
series).

Thanks.

--

From: Jan Blunck
Date: Friday, April 9, 2010 - 4:02 am

Yes and maybe rename generic_file_llseek to generic_llseek.

--

From: Stefan Richter
Date: Saturday, April 10, 2010 - 8:13 am

I pushed modified versions of these patches out to linux1394-2.6.git now
(master and for-next branch, on top of unrelated firewire updates). They
contain the explicit .llseek = no_llseek now.

http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff;h=3a...
http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff;h=7c...
-- 
Stefan Richter
-=====-==-=- -=-- -=-=-
http://arcgraph.de/sr/

--

From: Andi Kleen
Date: Sunday, March 28, 2010 - 2:58 pm

From a quick grep at least EHCI doesn't seem to need it?

Except for those two guys in core/*.c:

                        /* keep API that guarantees BKL */
                        lock_kernel();
                        retval = driver->ioctl(intf, ctl->ioctl_code, buf);
                        unlock_kernel();
                        if (retval == -ENOIOCTLCMD)
                                retval = -ENOTTY;

I guess that could be just moved into the low level modules with unlocked_ioctl

And one use in the usbfs which seems quite bogus and can be probably removed.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
--

From: Andi Kleen
Date: Sunday, March 28, 2010 - 6:07 pm

As a followup:

I killed some time by going through the various BKL uses in USB and
came up with this git tree to address them . Feel free to integrate
into your tree.

With this only some obscure USB low level drivers still need to 
depend on BKL, the majority is clean. Gadgetfs also still needs
it for now.

I ended up also fixing some minor races in usb serial registration/
unregistration.

Opens:
- The usb serial ioctl entry needs to become a unlocked_ioctl,
but I think that needs your tree first. The code below it doesn't
need it anymore.
- The seek function in uhci-debug.c probably is still racy.

Only lightly tested. Some more reviewing would be appreciated

-Andi

The following changes since commit b72c40949b0f04728f2993a1434598d3bad094ea:
  Linus Torvalds (1):
        Merge branch 'for-linus' of git://git.kernel.org/.../jbarnes/pci-2.6

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6.git usb-bkl

Andi Kleen (12):
      USB-BKL: Remove lock_kernel in usbfs update_sb()
      USB-BKL: Remove BKL from usb serial drivers ioctl handlers
      USB-BKL: Convert usb_driver ioctl to unlocked_ioctl
      USB-BKL: Remove BKL use for usb serial driver probing
      USB-BKL: Make usb monitor depend on BKL
      USB-BKL: Make usb lcd driver depend on BKL
      USB-BKL: Make usb sisvga driver depend on BKL
      USB-BKL: Make usb rio500 driver depend on BKL
      USB-BKL: Make usb iowarrior driver depend on BKL
      USB-BKL: Remove BKL use in uhci-debug
      USB-BKL: Make usb gadget fs depend on BKL
      USB-BKL: Make usb gadget printer depend on BKL

 drivers/usb/core/devio.c              |    7 +----
 drivers/usb/core/hub.c                |    3 +-
 drivers/usb/core/inode.c              |    4 ---
 drivers/usb/gadget/Kconfig            |    3 +-
 drivers/usb/host/uhci-debug.c         |   17 +++++----------
 drivers/usb/misc/Kconfig              |    6 ++--
 drivers/usb/misc/sisusbvga/Kconfig    |    2 ...
From: Arnd Bergmann
Date: Monday, March 29, 2010 - 4:48 am

Great, I was hoping someone picks up the missing pieces. Maybe someone


The usb-serial driver has a few instances where it takes the BKL
in the mainline code, but this gets converted to the Big TTY Mutex

That function could be removed in favor of using generic_file_ioctl
and setting i_size to up->size.

Also, the race is only between concurrent calls of llseek on
the same file descriptor, which is undefined anyway.
The current code also doesn't protect you against partial updates
of f_pos during ->read() on 32 bit systems (nothing ever does),
and it even fails to protect against the concurrent llseek race

The patch looks correct, but I probably wouldn't bother with the rename,

The serial_ioctl function is already called without the BKL, depite the
name. tty_operations->ioctl was converted a long time ago, so I guess this
patch can be dropped from your series.

	Arnd
--

From: Andi Kleen
Date: Monday, March 29, 2010 - 5:30 am

I wasn't sure it would protect against parallel reads.


I think a rename is better, I take compile errors over subtle

Ok.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
--

From: Arnd Bergmann
Date: Monday, March 29, 2010 - 7:43 am

There is no way for any driver or file system right now to protect
against that, nor has there been for a long time[1]. The sys_read and
sys_write functions use file_pos_write() to update the file->f_pos
without taking any lock, and they pass a local variable into the
*ppos argument of the ->read/->write file operations, which means
that the file operation itself cannot add locking to the update
either.

We never do in-place updates of file->f_pos, but on architectures
where a 64 bit load can see incorrect data from a 64 bit store,
any concurrent read/write/llseek combinations may cause problems,
except for two concurrent lseek. Also, llseek is usually serialized

ok, fine with me.

	Arnd

[1] http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=55f09ec0087c...
was merged in linux-2.6.8, which opened this race. 
--

From: Andi Kleen
Date: Monday, March 29, 2010 - 1:11 pm

I updated the git tree removing the unneeded serial change. Feel free to pull.

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6.git usb-bkl

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
--

From: Dan Carpenter
Date: Thursday, March 25, 2010 - 6:40 am

There is a typo in this one:

commit 12c8fcce56c0de4fdcacf048fe723c8778af940d
Author: Arnd Bergmann <arnd@relay.de.ibm.com>
Date:   Wed Mar 24 20:08:55 2010 +0100

    block: replace BKL with global mutex
    
    It doesn't seem to interact with much else, so give
    this a try.

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index d9d6206..9c1277a 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c

	[ snip ]

@@ -1641,7 +1643,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev,
 
        ret = -ENXIO;
 
-       lock_kernel();
+       mutex_unlock(&blkdev_mutex);
        p = dev_to_part(dev);
        bdev = bdget(part_devt(p));
        if (bdev == NULL)


--

From: Arnd Bergmann
Date: Thursday, March 25, 2010 - 7:14 am

Fixed now, thanks for reporting!

	Arnd
--

From: Frederic Weisbecker
Date: Sunday, March 28, 2010 - 1:04 pm

I just queued the perf_event one. It looks pretty good. I'm also
looking at some of the most trivials (ehm..less hards) in the list
and see which we can submit right away.

Thanks.

--

From: Frederic Weisbecker
Date: Sunday, March 28, 2010 - 1:11 pm

This one is upstream now.

--

From: Frederic Weisbecker
Date: Sunday, March 28, 2010 - 4:18 pm

About this one, there is a "sensible" part:


@@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
 }
 
 static const struct file_operations proc_fdinfo_file_operations = {
-	.open		= nonseekable_open,
+	.llseek		= generic_file_llseek,
 	.read		= proc_fdinfo_read,
 };
 

Replacing default_llseek() by generic_file_llseek() as you
did for most of the other parts is fine.

But the above changes the semantics as it makes it seekable.
Why not just keeping it as is? It just ends up in no_llseek().

--

From: Frederic Weisbecker
Date: Sunday, March 28, 2010 - 4:38 pm

There is also the ioctl part that takes the bkl in procfs.
I'll just check nothing weird happens there wrt file pos.
We probably first need to pushdown the bkl in the procfs
ioctl handlers.

--

From: Arnd Bergmann
Date: Monday, March 29, 2010 - 4:04 am

The default is default_llseek, which uses the BKL and cannot be

The BKL in procfs is only for proc files that have registered
their own .ioctl instead of .unlocked_ioctl method. Converting
every file_operations instance to provide an unlocked_ioctl
(as one of the other patches does) makes sure that this path
is never taken. BTW, there are less than a handful of procfs files
that provide an ioctl operation, and those probably should never
have been merged.

	Arnd
--

From: Frederic Weisbecker
Date: Monday, March 29, 2010 - 10:59 am

Yeah, but you removed the nonseekable_open and made generic_file_llseek
in llseek on this one.
This makes it seekable while it wasn't, changing its ABI.
It wasn't taking the bkl before that as it was calling
no_llseek().

May be its non seekable property is irrelevant, I don't know,
but if this behaviour must be changed, it should be in a


There are three of them. I'm going to make them .unlocked_ioctl
and push the bkl inside, and warn on further uses of .ioctl,
without applying the bkl there anymore.

That plus your bkl removal in proc seek, should totally remove the
bkl from procfs.

--

From: Arnd Bergmann
Date: Monday, March 29, 2010 - 2:18 pm

Ah, I see what you mean. That change was certainly not intentional

Ok

	Arnd
--

From: John Kacur
Date: Monday, March 29, 2010 - 5:45 am

Great, Arnd, I like this.

I also have a private but stale tree where I have collected some
remove bkl patches (which I will review against your tree.)
I think that it is important that we keep chipping away at it though,
and that we all keep sending stuff upstream when it is ready.

Thanks
John
--

From: Roland Dreier
Date: Wednesday, March 31, 2010 - 3:11 pm

Hi Arnd,

Looking at your tree, I see you have commit 753dd249 ("perf_event: use
nonseekable_open") that does:

 > --- a/kernel/perf_event.c
 > +++ b/kernel/perf_event.c
 > @@ -2515,6 +2515,8 @@ static int perf_fasync(int fd, struct file *filp, int on)
 >  }
 >  
 >  static const struct file_operations perf_fops = {
 > +       .open                   = nonseekable_open,
 > +       .llseek                 = no_llseek,
 >         .release                = perf_release,
 >         .read                   = perf_read,
 >         .poll                   = perf_poll,

But if I understand this correctly, the assignment to .open is at best
useless -- these file_operations are only used via anon_inode_getfd()
and so there is no possible path that can call the .open method.  Or am
I missing something?

(The same applies to the kvm_main.c changes too)
-- 
Roland Dreier <rolandd@cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--

From: Frederic Weisbecker
Date: Wednesday, March 31, 2010 - 3:20 pm

Good point, I'll update that.

Thanks.

--

From: Arnd Bergmann
Date: Thursday, April 1, 2010 - 1:50 am

You're right. I did not consider this.

	Arnd
--

Previous thread: [PATCH 1/1] bluetooth: BT driver using ST for TI combo devices by pavan_savoy on Wednesday, March 24, 2010 - 2:37 pm. (1 message)

Next thread: RE: [PATCH 4/6] drivers:misc: sources for Init manager module by Pavan Savoy on Wednesday, March 24, 2010 - 2:46 pm. (1 message)