Re: [PATCH 4/6] staging: push down BKL into ioctl functions

Previous thread: [GIT PULL] Preparation for BKL'ed ioctl removal by Frederic Weisbecker on Thursday, April 15, 2010 - 8:55 pm. (5 messages)

Next thread: [PATCH] blkio: Initialize blkg->stats_lock for the root cfqg too by Divyesh Shah on Thursday, April 15, 2010 - 9:02 pm. (4 messages)
From: Frederic Weisbecker
Date: Thursday, April 15, 2010 - 8:56 pm

Linus,

In order to stop the bkl contagion in ioctl and push the
bkl from Vfs to the drivers that don't implement a proper
unlock_ioctl, this patch proposes to add a new "locked_ioctl"
field in struct file_operation.

This field is never called from vfs and requires to assign a
new "deprecated_ioctl" helper to the unlocked_ioctl field.
This deprecated_ioctl() helper then calls the locked_ioctl()
callback with the bkl held.

The point of doing this is to change every users of fops::ioctl
to the new scheme. Once there is no more users of the ioctl field,
we'll then be able to remove it.

Also this change brings a new config BKL which is always
enabled for now. Every drivers that use the bkl through
explicit lock_kernel() calls or by using deprecated_ioctl()
will have to select CONFIG_BKL.

Once we get no more uses of the bkl from the core features but
only on individual drivers, config BKL can be turned off
by default and selected by those drivers if needed, relegating
the bkl as an obsolete library for poor old drivers.

And given the work happening currently (tty mutex, vfs pushdowns,
procfs bkl removal, llseek janitorials, etc...), this may happen
sooner than was expected.

Now the point of having this patch in 2.6.34 is to turn
the old ioctl implementations to the new scheme in a set
of patches that relevant maintainers can apply to their
tree (or apply by ourself for unmaintained areas) for .35
Otherwise we would need to queue everything in a single
tree that may bring conflicts in the next merge window.

This single patch has very few chances to bring any regressions.

Please pull the bkl/ioctl branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	bkl/ioctl

Thanks,
	Frederic
---

Arnd Bergmann (1):
      vfs: Introduce CONFIG_BKL and deprecated_ioctl


 fs/ioctl.c               |   22 ++++++++++++++++++++++
 include/linux/fs.h       |    3 +++
 include/linux/smp_lock.h |    4 ++++
 ...
From: Frederic Weisbecker
Date: Wednesday, April 21, 2010 - 5:48 pm

Linus,

In this v2, I've removed the declaration of default_llseek
from smp_lock.h, as this export can be made later (we
want to make any use of default_llseek() depend on
CONFIG_BKL as well, but that can wait).

Please pull the bkl/ioctl-v2 branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	bkl/ioctl-v2

Thanks,
	Frederic
---

Arnd Bergmann (1):
      vfs: Introduce CONFIG_BKL and deprecated_ioctl


 fs/ioctl.c               |   22 ++++++++++++++++++++++
 include/linux/fs.h       |    3 +++
 include/linux/smp_lock.h |    3 +++
 kernel/Kconfig.locks     |   10 ++++++++++
 4 files changed, 38 insertions(+), 0 deletions(-)

---
commit 333f5fb46d15d057f0d69da0dd8b0a3db89bf34f
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Apr 1 14:42:38 2010 +0200

    vfs: Introduce CONFIG_BKL and deprecated_ioctl
    
    This is a preparation for the removal of the big kernel lock that
    introduces new interfaces for device drivers still using it.
    
    We can start marking those device drivers as 'depends on CONFIG_BKL'
    now, and make that symbol optional later, when the point has come
    at which we are able to build a kernel without the BKL.
    
    Similarly, device drivers that currently make use of the implicit
    BKL locking around the ioctl function can now get annotated by
    changing
    
    	.ioctl = foo_ioctl,
    
    to
    
    	.locked_ioctl = foo_ioctl,
    	.unlocked_ioctl = deprecated_ioctl,
    
    As soon as no driver remains using the old ioctl callback, it can
    get removed.
    
    [fweisbec: move config bkl from Kconfig.debug to Kconfig.locks,
    rename default_ioctl to deprecated_ioctl]
    
    v2: Remove default_llseek() declaration from smp_lock.h, we
    don't need to prepare it to be modularized right now.
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

diff --git a/fs/ioctl.c ...
From: Frederic Weisbecker
Date: Saturday, April 24, 2010 - 8:25 am

Hi Linus,

This is the second version of this pull request and you don't seem to be
pulling it either.

Could you please tell us what kind of hesitation or problems you may have
with this?

If you think it's too late to merge this, we can still cook it for 2.6.35
and apply every dependent ioctl conversion in the same tree, this will have
the drawback that we can't push dependent patches to the relevant maintainers
trees (that said I guess that a good part of the drivers that still implement ioctl
are about unmaintained areas).

Or may be you don't like the core idea of this patch.

Either way please tell us so that we can go ahead with this and choose a
direction that looks more appropriate for you.

Thanks,
	Frederic.

--

From: Linus Torvalds
Date: Saturday, April 24, 2010 - 11:36 am

I don't see the point, frankly. Especially not outside the merge window, 
but quite frankly, I don't see it in general. The whole thing seems to be 
designed to be inconvenient, and to have a config option that I 
fundamentally don't believe in (CONFIG_BKL).

		Linus
--

From: Linus Torvalds
Date: Saturday, April 24, 2010 - 11:47 am

More detail: it still leaves that old "ioctl" function pointer that needs 
the BKL and is ungreppable. So the whole and only point of the patch is to 
make our current mess even _more_ complex, with three different cases. No, 
thank you.

Quite frankly, if you want to get rid of the BKL in ioctl's and make them 
easily greppable, then I would suggest a simple renaming: rename the 
current '->ioctl()' thing to '->bkl_ioctl()', and mark it deprecated. No 
new config options, no new games. Just rename it. No need to mark things 
with CONFIG_BKL, when you can just see it by virtue of them using 
'bkl_ioctl'.

			Linus
--

From: Arnd Bergmann
Date: Saturday, April 24, 2010 - 12:54 pm

We want to do the rename in the next merge window and remove the old
->ioctl(), this patch is just a preparation for this so we can start
queuing the patches for the rename in maintainer trees.
The addition of the deprecated_ioctl() helper is not essential but
let's us move all BKL users into loadable modules next.

The CONFIG_BKL stuff is not a requirement for doing this, but something
we /also/ want to do in the next merge window, i.e. mark all BKL users
as CONFIG_BKL, not just the ones that use the locked_ioctl (or bkl_ioctl).

I agree that it's now a bit late for this, but when I initially suggested
this (before -rc3, IIRC), that would have given us the chance to queue up
all the patches with a dependency on this for the next merge window.
Now the current outlook is probably that we do a lot of the preparation
work like this patch in 2.6.35-rc1, which moves the merge of the interesting
parts out to the 2.6.36 timeframe.

	Arnd
--

From: Linus Torvalds
Date: Saturday, April 24, 2010 - 1:01 pm

.. and I think that's simply fundamentally wrong. What does it buy us, 
except for another really annoying config option? It sure as hell doesn't 
buy us any code-size (what, a couple of bytes).

Quite frankly, if you want to just prepare to rename things one by one, 
then you might as well just have a single line

	#define bkl_ioctl ioctl

and then you can do

	.bkl_ioctl = driver_ioctl

but the thing is - what does that _buy_ us without the ability to grep for 
and cause compile errors for drivers that haven't done this? Nothing.

So seriously - I'd _much_ rather just get one single large patch that just 
renames everything. None of this crap that makes no sense.

			Linus
--

From: Arnd Bergmann
Date: Saturday, April 24, 2010 - 1:40 pm

With CONFIG_BKL disabled, we gain a few cyles in the scheduler,
since we no longer need to check if the BKL was held and needs to
be released. Not much, but better than just a few bytes of object size.

With CONFIG_BKL=m, what we gain is visibility: Doing an lsmod will
show you if bkl.ko is loaded and what other modules are using it.
My hope is that this will motivate people to remove it from the
remaining modules.
One of the crazier ideas was to automatically taint the kernel when
bkl.ko gets loaded. This does not gain us anything besides making a

Being able to grep is not the reason for this patch. The reason is that
we're trying to eliminate the BKL from all non-obscure code in the kernel
and one of the nasty ones is fs/ioctl.c, which always gets compiled in.
The trick with bkl.ko requires that any code calling lock_kernel() needs
to be a module, so the deprecated_ioctl() helper will have to be part

Ok. We'll go back to my intial approach for ioctl then, doing it all at
once. The reason for the staged approach was to avoid bisect breakage
and conflicts with maintainer updates, but since most of the remaining
users at this point are really unmaintained (or staging drivers), there's
probably not much to worry about here.

There will be one patch that does three things:
- add a deprecated_ioctl() function
- rename fops->ioctl to fops->bkl_ioctl
- set fops->unlocked_ioctl=deprecated_ioctl for any file_operations instance
  that uses bkl_ioctl
Is this ok for you?

Regarding the CONFIG_BKL option, should that also be done as another patch
changing all the Kconfig files? My preference would be to have one patch
adding the Kconfig option first, giving the maintainers the choice to either
mark their code 'depends on BKL' or to remove the dependency by changing
their code.

	Arnd
--

From: Linus Torvalds
Date: Saturday, April 24, 2010 - 3:15 pm

That has _nothing_ to do with the ioctl's though.

Stop mixing things up.

There are two totally independent issues:

 - making the BKL ioctl's be explicit and findable

 - eventually getting rid of the BKL entirely

and I think you guys are totally mixing things up, and making things WORSE 
in the process.

The notion of having _three_ different "ioctl()" function pointers just 
makes me want to gag. And there is absolutely _zero_ reason for it. Tjhere 
is no way in hell that we want to have every subsystem maintainer try to 
independently do their own ioctl's. Most of the drivers that have those 
things are basically unmaintained or on the back burner anyway.

So don't make the current ugly ioctl situation worse. Not even as a 
stop-gap, because there is absolutely _zero_ upside to making yet another 
new crazy temporary ioctl interface.

And don't try to conflate the issue of ioctl and BKL. There are still 
code-paths that do lock_kernel() without the ioctl's, so the whole ioctl 
renaming has _zero_ to do with CONFIG_BKL.

		Linus
--

From: Frederic Weisbecker
Date: Sunday, April 25, 2010 - 10:39 am

Our final goal was not to have three different ioctl interfaces. This state was
only deemed to be temporary. This was the only way to make the change
smoother and don't conflict with other trees with a single monolithic patch.

But if you are ok with a single one, then we are going this way and we'll send it for


It's true, but once it gets pushed down/dropped from every core parts (which
is what we are working on currently in parallel), lock_kernel() and .bkl_ioctl
is only going to be used by unmaintained drivers. This is the time where having
a CONFIG_BKL is going to make sense. And it won't be a question of saving some
bytes but improve efficiency of schedule() for those who don't need such old or
unmaintained drivers.

May be we should only start to focus on this new config once this state is reached.

And to prepare for that, are you ok with this scheme of:

-	.ioctl = foo,
+	.unlocked_ioctl = bkl_ioctl,
+	.bkl_ioctl	= foo,

...done at the same time as the big rename patch.
This will prepare to remove the bkl from vfs and build it conditionally
from the bkl lib, once the bkl is out the core?

--

From: Linus Torvalds
Date: Sunday, April 25, 2010 - 10:49 am

Seriously, why not just

	-	.ioctl = foo,
	+	.bkl_ioctl = foo

because that line of

	+	.unlocked_ioctl = bkl_ioctl,

is just total and utter _garbage_.  There is zero reason for it.

In the long run (this is a year from now, when we rename "unlocked_ioctl" 
back to just "ioctl"), the vfs_ioctl code will just do

		struct file_operations *fops = filp->f_op;

		if (!fops)
			return -ENOTTY;

		if (fops->ioctl) {
			int error = fops->ioctl(...)
			if (error == -ENOIOCTLCMD)
				error = -EINVAL;
			return error;
		}
	#ifdef CONFIG_BKL
		if (fops->bkl_ioctl) {
			int error;
			lock_kernel();
			error = fops->bkl_ioctl(...)
			unlock_kernel();
			return error;
		}
	#endif
		return -ENOTTY;

and we're all done.

At NO point is there any advantage to that "bkl_ioctl" crap. It doesn't 
help the legacy drivers (which won't even _compile_ unless CONFIG_BKL is 
set anyway), it doesn't help the core code, it doesn't help _anybody_.

Not today, not tomorrow, not with CONFIG_BKL, and not without.

			Linus
--

From: Frederic Weisbecker
Date: Sunday, April 25, 2010 - 11:05 am

Well, we won't be able to get this bkl.ko but the desired effect of
having it was rather psychological than practical, as Arnd explained:
to make the dependency more visible and pull concerned people
into dropping the bkl from the drivers they are using.

But other than that, the final effect remains pretty the same so
it's not a big deal.

I'm ok with that.

Thanks for clarifying the situation!

--

From: Arnd Bergmann
Date: Monday, April 26, 2010 - 1:30 am

We could also stop playing games with with this and just kill the
locked variant of ioctl right away. No rename to bkl_ioctl, no
helper functions.

It's served it's purpose and we now have the list of 157 files
that still use fops->ioctl, so if we just push the BKL into those
files and make them use unlocked_ioctl, we will be able to remove
->ioctl for good. We can do the rename of ->unlocked_ioctl to
->ioctl right after that if you like, or in a year from now, I
don't care.

	Arnd
--

From: Linus Torvalds
Date: Monday, April 26, 2010 - 11:08 am

I wouldn't mind that. But we've not found people to do it so far, so...

Anyway, I spent some time automating the ioctl -> bkl_ioctl conversion. 
The following is the end result. Comments?

Regardless of _what_ we do, renaming 'ioctl' to 'bkl_ioctl' is a good 
thing. It means that we can reliably grep for these usages, whether it is 
then to later introduce a CONFIG_BKL or if it is to get rid of the 
bkl_ioctl entirely.

NOTE! This has gone through a "allmodconfig" (with staging drivers), but 
only on x86-64. There are quite probably missing architecture conversions
and/or drivers. But this should be the bulk of them.

It doesn't look that bad. Just about 100 files affected. 

		Linus
---
commit 5312f12ed68dd43acca09ce646fdce3896a292bb
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Apr 26 10:55:23 2010 -0700

    Rename 'struct file_operations' 'ioctl' fn pointer to 'bkl_ioctl'
    
    This makes it much easier to grep for the use of implicit BKL usage in
    the traditional 'ioctl' function pointer.
    
    The patch was mostly auto-generated with the following shell scripting:
    
    	# Create an approximate file list of ioctl initializers
    	git grep -w -20 file_operations |
    		grep '\.ioctl[ 	]*=' |
    		sed 's/-[        ]*\.ioctl.*$//' |
    		sort -u > f
    
    	# Run a sed script to replace 'ioctl' with 'bkl_ioctl'
    	cat f | while read i; do
    		j="$i.sed"
    		sed 's/\.ioctl([      ]*)=/.bkl_ioctl\1=/' < $i > $j
    		mv $j $i
    	done
    
    followed by some similarly trivial scripting to then mostly fix up
    places that tried to align all the initializers with tabs or spaces.
    
    Finally, some manual fixup, and testing of the end result with a
    allmodconfig make (with 'staging' drivers explicitly enabled).
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/cris/arch-v10/drivers/ds1302.c         |    2 +-
 arch/cris/arch-v10/drivers/gpio.c           |    2 +-
 ...
From: Arnd Bergmann
Date: Monday, April 26, 2010 - 12:12 pm

I'll have another go at this tonight, see how far I get doing it

Yes, looks good. I've compared it to a previous series of mine that
changed all the files in a different way. A number of those changes
have found their way into your tree by now, and there were only
two left that you didn't catch (see below), and one false positive

The v4l2_file_operations is actually another can of worms, I count
77 more files with a locked ioctl function in there. I'll put that on
the list of things to do before the BKL is gone.

	Arnd

---
Subject: Missing bits from ->ioctl to ->bkl_ioctl conversion

Feel free to fold this into your patch.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 3de2f32..4890683 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -722,7 +722,7 @@ struct file_operations {
 	ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
 	int (*readdir) (struct file *, void *, filldir_t);
 	unsigned int (*poll) (struct file *, struct poll_table_struct *);
-	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
+	int (*bkl_ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
diff --git a/sound/oss/au1550_ac97.c b/sound/oss/au1550_ac97.c
index c1070e3..7295781 100644
--- a/sound/oss/au1550_ac97.c
+++ b/sound/oss/au1550_ac97.c
@@ -835,11 +835,11 @@ au1550_ioctl_mixdev(struct inode *inode, struct file *file,
 }
 
 static /*const */ struct file_operations au1550_mixer_fops = {
-	owner:THIS_MODULE,
-	llseek:au1550_llseek,
-	ioctl:au1550_ioctl_mixdev,
-	open:au1550_open_mixdev,
-	release:au1550_release_mixdev,
+	.owner = THIS_MODULE,
+	.llseek = au1550_llseek,
+	.bkl_ioctl = ...
From: Linus Torvalds
Date: Monday, April 26, 2010 - 1:36 pm

Yeah, those v4l2_file_operations were annoying. I didn't notice at first, 
so my first automated script renamed all of them too (because I had just 

Yes, but that - along with the sound drivers, that also have their own 
ioctl interface that gets driven under the BKL - are independent patches.

Your patch did one of the au1550_ac97 'struct file_operations', but missed 
another one. I had missed both, because they had that crazy old-style 
initializer, and didn't trigger a compile on x86-64. But I fixed the other 
one too.

So now the thing looks like the appended.

		Linus
---
commit 6a0ac48e0ac12430a72189c06af02a16f4c4b3f2
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Apr 26 10:55:23 2010 -0700

    Rename 'struct file_operations' 'ioctl' fn pointer to 'bkl_ioctl'
    
    This makes it much easier to grep for the use of implicit BKL usage in
    the traditional 'ioctl' function pointer.
    
    The patch was mostly auto-generated with the following shell scripting:
    
    	# Create an approximate file list of ioctl initializers
    	git grep -w -20 file_operations |
    		grep '\.ioctl[ 	]*=' |
    		sed 's/-[        ]*\.ioctl.*$//' |
    		sort -u > f
    
    	# Run a sed script to replace 'ioctl' with 'bkl_ioctl'
    	cat f | while read i; do
    		j="$i.sed"
    		sed 's/\.ioctl([      ]*)=/.bkl_ioctl\1=/' < $i > $j
    		mv $j $i
    	done
    
    followed by some similarly trivial scripting to then mostly fix up
    places that tried to align all the initializers with tabs or spaces.
    
    Finally, some manual fixup, and testing of the end result with a
    allmodconfig make (with 'staging' drivers explicitly enabled).
    
    [ Two missing conversions added by Arnd Bergmann ]
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ vfs.txt and au1550_ac97.c ]
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/filesystems/vfs.txt           |    2 +-
 ...
From: Arnd Bergmann
Date: Monday, April 26, 2010 - 3:24 pm

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/staging/crystalhd/crystalhd_lnx.c |   13 +++++++++----
 drivers/staging/dt3155/dt3155_drv.c       |   24 ++++++++++++++++++------
 drivers/staging/poch/poch.c               |   17 +++++++++++++++--
 drivers/staging/vme/devices/vme_user.c    |   18 +++++++++++++++---
 4 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/crystalhd/crystalhd_lnx.c b/drivers/staging/crystalhd/crystalhd_lnx.c
index 54bad65..3291e14 100644
--- a/drivers/staging/crystalhd/crystalhd_lnx.c
+++ b/drivers/staging/crystalhd/crystalhd_lnx.c
@@ -16,6 +16,7 @@
 ***************************************************************************/
 
 #include <linux/version.h>
+#include <linux/smp_lock.h>
 #include <linux/slab.h>
 
 #include "crystalhd_lnx.h"
@@ -261,12 +262,12 @@ static int chd_dec_api_cmd(struct crystalhd_adp *adp, unsigned long ua,
 }
 
 /* API interfaces */
-static int chd_dec_ioctl(struct inode *in, struct file *fd,
-			 unsigned int cmd, unsigned long ua)
+static long chd_dec_ioctl(struct file *fd, unsigned int cmd, unsigned long ua)
 {
 	struct crystalhd_adp *adp = chd_get_adp();
 	crystalhd_cmd_proc cproc;
 	struct crystalhd_user *uc;
+	int ret;
 
 	if (!adp || !fd) {
 		BCMLOG_ERR("Invalid adp\n");
@@ -279,13 +280,17 @@ static int chd_dec_ioctl(struct inode *in, struct file *fd,
 		return -ENODATA;
 	}
 
+	lock_kernel();
 	cproc = crystalhd_get_cmd_proc(&adp->cmds, cmd, uc);
 	if (!cproc) {
 		BCMLOG_ERR("Unhandled command: %d\n", cmd);
+		unlock_kernel();
 		return -EINVAL;
 	}
 
-	return chd_dec_api_cmd(adp, ua, uc->uid, cmd, cproc);
+	ret = chd_dec_api_cmd(adp, ua, uc->uid, cmd, cproc);
+	unlock_kernel();
+	return ret;
 }
 
 static int chd_dec_open(struct inode *in, struct file *fd)
@@ -345,7 +350,7 @@ static int chd_dec_close(struct inode *in, struct file *fd)
 
 static const struct file_operations chd_dec_fops = {
 	.owner   = THIS_MODULE,
-	.ioctl   = ...
From: Frederic Weisbecker
Date: Tuesday, April 27, 2010 - 11:15 am

Hi Greg,




If you are fine with this patch, I think it would be better if
you get it in your tree.


--

From: Greg KH
Date: Tuesday, April 27, 2010 - 11:33 am

Ok, I'll queue it up (note, the poch driver is now gone in the staging
tree.)

thanks,

greg k-h
--

From: Arnd Bergmann
Date: Monday, April 26, 2010 - 3:24 pm

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/scsi/3w-9xxx.c              |   10 +++++++---
 drivers/scsi/3w-sas.c               |    7 +++++--
 drivers/scsi/3w-xxxx.c              |   10 +++++++---
 drivers/scsi/aacraid/linit.c        |   11 ++++++++---
 drivers/scsi/dpt_i2o.c              |   20 +++++++++++++++++---
 drivers/scsi/gdth.c                 |   20 +++++++++++++++-----
 drivers/scsi/megaraid.c             |   20 +++++++++++++++++---
 drivers/scsi/megaraid/megaraid_mm.c |   22 +++++++++++++++++-----
 drivers/scsi/osst.c                 |   14 ++++++++++----
 drivers/scsi/sg.c                   |   17 ++++++++++++++---
 10 files changed, 117 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c
index e9788f5..4f74850 100644
--- a/drivers/scsi/3w-9xxx.c
+++ b/drivers/scsi/3w-9xxx.c
@@ -123,7 +123,7 @@ static void twa_aen_queue_event(TW_Device_Extension *tw_dev, TW_Command_Apache_H
 static int twa_aen_read_queue(TW_Device_Extension *tw_dev, int request_id);
 static char *twa_aen_severity_lookup(unsigned char severity_code);
 static void twa_aen_sync_time(TW_Device_Extension *tw_dev, int request_id);
-static int twa_chrdev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
+static long twa_chrdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 static int twa_chrdev_open(struct inode *inode, struct file *file);
 static int twa_fill_sense(TW_Device_Extension *tw_dev, int request_id, int copy_sense, int print_host);
 static void twa_free_request_id(TW_Device_Extension *tw_dev,int request_id);
@@ -218,7 +218,7 @@ static struct device_attribute *twa_host_attrs[] = {
 /* File operations struct for character device */
 static const struct file_operations twa_fops = {
 	.owner		= THIS_MODULE,
-	.ioctl		= twa_chrdev_ioctl,
+	.unlocked_ioctl	= twa_chrdev_ioctl,
 	.open		= twa_chrdev_open,
 	.release	= NULL
 };
@@ -635,8 +635,9 @@ out:
 } /* End twa_check_srl() */
 
 /* This ...
From: Arnd Bergmann
Date: Monday, April 26, 2010 - 3:24 pm

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/isdn/capi/capi.c            |   17 ++++++++++++++---
 drivers/isdn/divert/divert_procfs.c |   19 ++++++++++++++++---
 drivers/isdn/i4l/isdn_common.c      |   18 +++++++++++++++---
 drivers/isdn/mISDN/timerdev.c       |   10 ++++++----
 4 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/drivers/isdn/capi/capi.c b/drivers/isdn/capi/capi.c
index ee58375..0cabe31 100644
--- a/drivers/isdn/capi/capi.c
+++ b/drivers/isdn/capi/capi.c
@@ -787,8 +787,7 @@ capi_poll(struct file *file, poll_table * wait)
 }
 
 static int
-capi_ioctl(struct inode *inode, struct file *file,
-	   unsigned int cmd, unsigned long arg)
+capi_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct capidev *cdev = file->private_data;
 	capi_ioctl_struct data;
@@ -981,6 +980,18 @@ register_out:
 	}
 }
 
+static long
+capi_unlocked_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	int ret;
+
+	lock_kernel();
+	ret = capi_ioctl(file, cmd, arg);
+	unlock_kernel();
+
+	return ret;
+}
+
 static int capi_open(struct inode *inode, struct file *file)
 {
 	struct capidev *cdev;
@@ -1026,7 +1037,7 @@ static const struct file_operations capi_fops =
 	.read		= capi_read,
 	.write		= capi_write,
 	.poll		= capi_poll,
-	.ioctl		= capi_ioctl,
+	.unlocked_ioctl	= capi_unlocked_ioctl,
 	.open		= capi_open,
 	.release	= capi_release,
 };
diff --git a/drivers/isdn/divert/divert_procfs.c b/drivers/isdn/divert/divert_procfs.c
index 9f49d90..8084557 100644
--- a/drivers/isdn/divert/divert_procfs.c
+++ b/drivers/isdn/divert/divert_procfs.c
@@ -11,6 +11,7 @@
 
 #include <linux/module.h>
 #include <linux/poll.h>
+#include <linux/smp_lock.h>
 #include <linux/slab.h>
 #ifdef CONFIG_PROC_FS
 #include <linux/proc_fs.h>
@@ -178,8 +179,7 @@ isdn_divert_close(struct inode *ino, struct file *filep)
 /* IOCTL */
 /*********/
 static int
-isdn_divert_ioctl(struct inode *inode, struct file *file,
-		  uint ...
From: Arnd Bergmann
Date: Monday, April 26, 2010 - 3:23 pm

This is half the work of getting rid of the BKL in
the ioctl file operation, the rest in arch/ and fs/
still needs to be done, maybe two more hours of
work (for someone else than me ;-)).

Pushdown is straightforward. In many cases, it's
rather obvious that the BKL is not needed at all,
but let's not mix the removal with the pushdown.

Arnd Bergmann (6):
  dvb: push down BKL into ioctl functions
  scsi: push down BKL into ioctl functions
  isdn: push down BKL into ioctl functions
  staging: push down BKL into ioctl functions
  v4l: always use unlocked_ioctl
  drivers: push down BKL into various drivers

 drivers/block/pktcdvd.c                     |   13 ++++++--
 drivers/char/apm-emulation.c                |    8 +++--
 drivers/char/applicom.c                     |   13 +++++---
 drivers/char/ds1620.c                       |   16 ++++++++-
 drivers/char/dtlk.c                         |   15 +++++----
 drivers/char/generic_nvram.c                |   17 ++++++++--
 drivers/char/genrtc.c                       |   16 ++++++++-
 drivers/char/hpet.c                         |   14 +++++---
 drivers/char/i8k.c                          |   21 ++++++++++--
 drivers/char/ipmi/ipmi_devintf.c            |   26 +++++++++++++---
 drivers/char/ipmi/ipmi_watchdog.c           |   17 +++++++++-
 drivers/char/nvram.c                        |   10 ++++--
 drivers/char/nwflash.c                      |    7 +++-
 drivers/char/raw.c                          |   42 ++++++++++++++-----------
 drivers/hwmon/fschmd.c                      |    9 +++--
 drivers/hwmon/w83793.c                      |   10 ++++--
 drivers/input/misc/hp_sdc_rtc.c             |   34 ++++++++++++++------
 drivers/isdn/capi/capi.c                    |   17 ++++++++--
 drivers/isdn/divert/divert_procfs.c         |   19 ++++++++++--
 drivers/isdn/i4l/isdn_common.c              |   18 +++++++++--
 drivers/isdn/mISDN/timerdev.c               |   10 ++++--
 drivers/macintosh/nvram.c                   |    2 +-
 ...
From: John Kacur
Date: Tuesday, April 27, 2010 - 2:14 am

Hi Arnd. These pushdowns conflicted with Linus's renaming.
In addition I found a few errors due to the change to
unlocked_ioctl.

I will resubmit fixes to these problems in a new mail.

--

From: Arnd Bergmann
Date: Monday, April 26, 2010 - 3:24 pm

This requires changing all users of dvb_usercopy to
omit the inode argument.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/media/dvb/dvb-core/dmxdev.c         |   31 +++++++++++++++++++-------
 drivers/media/dvb/dvb-core/dvb_ca_en50221.c |   17 ++++++++++----
 drivers/media/dvb/dvb-core/dvb_frontend.c   |   30 +++++++++++++-------------
 drivers/media/dvb/dvb-core/dvb_net.c        |   15 +++++++++---
 drivers/media/dvb/dvb-core/dvbdev.c         |   17 +++++++++-----
 drivers/media/dvb/dvb-core/dvbdev.h         |   11 +++------
 drivers/media/dvb/firewire/firedtv-ci.c     |    5 +--
 drivers/media/dvb/ttpci/av7110.c            |    4 +-
 drivers/media/dvb/ttpci/av7110_av.c         |    8 +++---
 drivers/media/dvb/ttpci/av7110_ca.c         |    5 +--
 10 files changed, 85 insertions(+), 58 deletions(-)

diff --git a/drivers/media/dvb/dvb-core/dmxdev.c b/drivers/media/dvb/dvb-core/dmxdev.c
index 9ddc579..425862f 100644
--- a/drivers/media/dvb/dvb-core/dmxdev.c
+++ b/drivers/media/dvb/dvb-core/dmxdev.c
@@ -25,6 +25,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/poll.h>
 #include <linux/ioctl.h>
 #include <linux/wait.h>
@@ -963,7 +964,7 @@ dvb_demux_read(struct file *file, char __user *buf, size_t count,
 	return ret;
 }
 
-static int dvb_demux_do_ioctl(struct inode *inode, struct file *file,
+static int dvb_demux_do_ioctl(struct file *file,
 			      unsigned int cmd, void *parg)
 {
 	struct dmxdev_filter *dmxdevfilter = file->private_data;
@@ -1084,10 +1085,16 @@ static int dvb_demux_do_ioctl(struct inode *inode, struct file *file,
 	return ret;
 }
 
-static int dvb_demux_ioctl(struct inode *inode, struct file *file,
-			   unsigned int cmd, unsigned long arg)
+static long dvb_demux_ioctl(struct file *file, unsigned int cmd,
+			    unsigned long arg)
 {
-	return dvb_usercopy(inode, file, cmd, arg, dvb_demux_do_ioctl);
+	int ret;
+
+	lock_kernel();
+	ret = ...
From: Arnd Bergmann
Date: Monday, April 26, 2010 - 3:24 pm

v4l drivers may still use a locked ioctl method,
but we now always export an unlocked_ioctl and
lock ourselves, so that the ->ioctl file operation
can get removed.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/media/video/v4l2-dev.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/media/video/v4l2-dev.c b/drivers/media/video/v4l2-dev.c
index 7090699..3606694 100644
--- a/drivers/media/video/v4l2-dev.c
+++ b/drivers/media/video/v4l2-dev.c
@@ -22,6 +22,7 @@
 #include <linux/mm.h>
 #include <linux/string.h>
 #include <linux/errno.h>
+#include <linux/smp_lock.h>
 #include <linux/init.h>
 #include <linux/kmod.h>
 #include <linux/slab.h>
@@ -215,16 +216,21 @@ static unsigned int v4l2_poll(struct file *filp, struct poll_table_struct *poll)
 	return vdev->fops->poll(filp, poll);
 }
 
-static int v4l2_ioctl(struct inode *inode, struct file *filp,
+static long v4l2_ioctl(struct file *filp,
 		unsigned int cmd, unsigned long arg)
 {
 	struct video_device *vdev = video_devdata(filp);
+	int ret;
 
 	if (!vdev->fops->ioctl)
 		return -ENOTTY;
 	/* Allow ioctl to continue even if the device was unregistered.
 	   Things like dequeueing buffers might still be useful. */
-	return vdev->fops->ioctl(filp, cmd, arg);
+	lock_kernel();
+	ret = vdev->fops->ioctl(filp->f_path.dentry->d_inode, filp, cmd, arg);
+	unlock_kernel();
+
+	return ret;
 }
 
 static long v4l2_unlocked_ioctl(struct file *filp,
@@ -323,6 +329,11 @@ static const struct file_operations v4l2_unlocked_fops = {
 	.llseek = no_llseek,
 };
 
+/*
+ * Note: this should not be needed, just check
+ *       both pointers in v4l2_ioctl, or kill
+ *       fops->ioctl. -arnd
+ */
 static const struct file_operations v4l2_fops = {
 	.owner = THIS_MODULE,
 	.read = v4l2_read,
@@ -330,7 +341,7 @@ static const struct file_operations v4l2_fops = {
 	.open = v4l2_open,
 	.get_unmapped_area = v4l2_get_unmapped_area,
 	.mmap = v4l2_mmap,
-	.ioctl = ...
From: Arnd Bergmann
Date: Monday, April 26, 2010 - 3:24 pm

These are the last remaining device drivers using
the ->ioctl file operation in the drivers directory.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/block/pktcdvd.c           |   13 ++++++++--
 drivers/char/apm-emulation.c      |    8 ++++--
 drivers/char/applicom.c           |   13 ++++++----
 drivers/char/ds1620.c             |   16 +++++++++++-
 drivers/char/dtlk.c               |   15 ++++++------
 drivers/char/generic_nvram.c      |   17 +++++++++++--
 drivers/char/genrtc.c             |   16 +++++++++++-
 drivers/char/hpet.c               |   14 +++++++----
 drivers/char/i8k.c                |   21 ++++++++++++++---
 drivers/char/ipmi/ipmi_devintf.c  |   26 +++++++++++++++++----
 drivers/char/ipmi/ipmi_watchdog.c |   17 ++++++++++++-
 drivers/char/nvram.c              |   10 ++++++--
 drivers/char/nwflash.c            |    7 ++++-
 drivers/char/raw.c                |   42 ++++++++++++++++++++---------------
 drivers/hwmon/fschmd.c            |    9 ++++---
 drivers/hwmon/w83793.c            |   10 +++++---
 drivers/input/misc/hp_sdc_rtc.c   |   34 ++++++++++++++++++++--------
 drivers/macintosh/nvram.c         |    2 +-
 drivers/macintosh/via-pmu.c       |   17 +++++++++++--
 drivers/mtd/mtdchar.c             |   19 +++++++++++----
 drivers/pcmcia/pcmcia_ioctl.c     |   17 +++++++++++--
 drivers/rtc/rtc-m41t80.c          |   16 +++++++++++-
 drivers/sbus/char/openprom.c      |   44 +++++++++++++++++++++----------------
 drivers/usb/mon/mon_bin.c         |   23 ++++++++++++++-----
 drivers/usb/mon/mon_stat.c        |    3 +-
 25 files changed, 307 insertions(+), 122 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index ddf1942..84a5658 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -55,6 +55,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/freezer.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
@@ ...
From: David Miller
Date: Monday, April 26, 2010 - 1:42 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

"allmodconfig" on sparc64 passes with this patch too, just FYI...
--

From: Frederic Weisbecker
Date: Monday, April 26, 2010 - 3:09 pm

Thanks!

I've queued it for the next merge window in

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	bkl/ioctl

--

From: Linus Torvalds
Date: Monday, April 26, 2010 - 3:32 pm

Btw, I hope you took the second version, that had the two additional fixes 
from Arnd (and my expansion of his fix to au1550_ac97.c).

Looking at the arch code, I doubt there are any big architecture-specific 
things. But there could easily be some other drivers like au1550_ac97.c 
that only get enabled on certain architectures and missed the grepping for 
some reason.

That said, because of Arnd's fix, I did end up grepping for 
'file_operations' and old-style gcc initializers (ie "ioctl: xyz" rather 
than the proper ".ioctl = xyz"), and the grep came up empty.

But it's possible that there's something hiding: with all of serial, 
bluetooth, block drivers, sound, socket proto's and v4l2 each having their 
own 'ioctl' pointers, it's not entirely trivial to grep for it all and be 
sure..

So there might be one or two cases still hiding, but it looks unlikely. 
And I'm almost certain that it definitely isn't more than just one or two.

		Linus
--

From: Frederic Weisbecker
Date: Monday, April 26, 2010 - 4:04 pm

Ok, thanks.

--

From: Ingo Molnar
Date: Monday, April 26, 2010 - 12:25 am

The scheduler will be helped most by getting rid of the BKL altogether. We are 
in reaching distance of that now ...


Once that state is achived we can just get rid of the BKL and mass-push 
per-driver mutexes into those remaining drivers - in a possibly scripted way. 
Something like:

  foo-driver.c

  DEFINE_MUTEX(foo_mutex);

  foo_ioctl()
  {
	mutex_lock(&foo_mutex);
	...
	mutex_unlock(&foo_mutex);
  }

  foo_open()
  {
	mutex_lock(&foo_mutex);
	...
	mutex_unlock(&foo_mutex);
  }

This could be done all automated for a hundred old drivers if need to be. 
There would be no bkl_ioctl's left.

That, even if it looks somewhat coarse is still better than having _yet 
another_ 'temporary transition'. The Big Kernel Lock was supposed to be 
transitionary to begin with. It's been 10+ years and counting :-)

	Ingo
--

From: Arnd Bergmann
Date: Monday, April 26, 2010 - 4:29 am

I don't think it can be fully automated. For the majority of the modules,
your approach would work fine, but there are still the well-known
pitfalls in corner cases:

- recursive uses in functions outside of ioctl (possibly none left
  after the TTY layer is done, but who knows)
- lock-order problems with other mutexes (see DRM)
- code that depends on autorelease to allow one ioctl while another
  is sleeping. (a small number of drivers)

Semi-automated should be fine though. These rules are relatively
easy to check, so we can mass-convert all the trivial cases.

Examples for nontrivial modules are mostly file systems, see ncpfs,

I think the immediate goal should be to get the BKL out of everthing
that's either used by real people or that's reasonably easy to do.
We have patches for almost all of these now [1], and I've been running
a kernel with CONFIG_BKL=n for a few weeks now. As we progress
through the remaining modules, an increasing number of systems can
run without this.

I see the next steps as:
1. make it possible to build a kernel without BKL, by removing the BKL
   from all core components for good, and making all users depend on
   CONFIG_BKL. Make it default y and put it under CONFIG_DEBUG.

2. Remove the BKL from all modules that are either easy to convert
   to a private mutex or that are relevant to real users (e.g.
   definitely DRM, but maybe not hpfs).

3. Change CONFIG_BKL to "default n, depends on DEPRECATED"

4. Remove the remaining modules that nobody knows how to fix
   or cares about. Possibly the number of modules here is close
   to zero.

5. Remove the implementation of the BKL, since no users are left.

Getting stage 1 done for 2.6.36 or even 2.6.35 should be possible but
still needs a lot of work. I think we should focus on that for now
and then see how much is left to do for the other stages.

This is still a temporary transition, but since we can't do all at
once, I don't see better way.

	Arnd

[1] ...
From: Ingo Molnar
Date: Tuesday, April 27, 2010 - 2:25 am

Corner cases are not a problem as long as the risk of them going unnoticed is 

Not a problem even if there's any such usage left: lockdep will sort those out 


This is a real issue, and in fact it's an unknown: there may be an unknown 
number of random sleep points within BKL codepaths that is being relied on in 
creative ways.

Note that by introducing a mutex we (in most cases) make the locking 
_stricter_, so the biggest risk from that is a lockup - which will be 
debuggable via lockdep.

	Ingo
--

From: Frederic Weisbecker
Date: Wednesday, April 28, 2010 - 6:21 am

And the hung task detector too which is the last resort to detect
uncovered resource dependencies (was really useful for reiserfs).

But the problem is among those people who may use such ancient drivers,
I guess few of them will have those debug config enabled.

And because there are almost no testers of these drivers, nobody/few will ever


So, as explained above, lockdep won't even help here.

I mean, for callsites that are obvious, say when it is clear that
the bkl is leaf lock or doesn't introduce uncovered resource dependencies due
to non-release on sleep, we should do such conversion. And I guess most
drivers that use the bkl follow this scheme.

But for the others (rares I think), the operation looks unsafe to me.
If we don't have the hardware to test the driver, then lockdep and hung
task detectors are going to be useless.

That said, once we reach that point with 4 users of bkl remaining, may
be that will be time to buy such hardware for a symbolic $1 in obscure
places and do the tests. Or just git-rm if we are too lazy.

--

From: Ingo Molnar
Date: Wednesday, April 28, 2010 - 6:37 am

Precisely. Note that the actual value of BKL-covered code shrinks every time 
we push the BKL out of commonly used code.

So even if there's still usage sites around, there will be a point where it 
makes sense to just desupport it or do a final, crude conversion. If there are 
no active users who are willing to help us debug potential bugs then there's 
frankly no value in us being just as careful with those drivers as we are with 
other, more commonly used code.

For the rest, lockdep & softlockup detector will be plenty enough to debug any 
bugs that may slip through - just like you did it with ReiserFS.

	Ingo
--

From: Arnd Bergmann
Date: Wednesday, April 28, 2010 - 7:05 am

I actually support your idea of automatic conversion of modules in the
way you lined out, but would not want to apply it to the entire kernel
at once, but only to modules that were manually inspected to be safe for
doing this.

This minimizes both the risk from corner cases and the risk from manual
conversion.

	Arnd
--

Previous thread: [GIT PULL] Preparation for BKL'ed ioctl removal by Frederic Weisbecker on Thursday, April 15, 2010 - 8:55 pm. (5 messages)

Next thread: [PATCH] blkio: Initialize blkg->stats_lock for the root cfqg too by Divyesh Shah on Thursday, April 15, 2010 - 9:02 pm. (4 messages)