Re: [patch 1/1] Drop CAP_SYS_RAWIO requirement for FIBMAP

Previous thread: [PATCH] x86: fix pci-gart failure handling by FUJITA Tomonori on Thursday, October 25, 2007 - 4:08 pm. (2 messages)

Next thread: Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree by Robert Hancock on Thursday, October 25, 2007 - 4:20 pm. (35 messages)
From: Mike Waychison
Date: Thursday, October 25, 2007 - 4:06 pm

Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an open file descriptor.

It would be nice to allow users to have permission to see where their data is landing on disk, and there really isn't a good reason to keep them from getting at this information.
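For reference, here is a minimal userspace sketch of the call this patch affects. It assumes Linux's <linux/fs.h> for the FIBMAP definition, and the helper name fibmap_block() is hypothetical. FIBMAP takes a pointer to an int that holds the logical block number and overwrites it with the physical block number. Without this patch, an unprivileged caller gets EPERM.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FIBMAP */

/*
 * Hypothetical helper: map logical block `logical` of `path` to its
 * physical block number with FIBMAP. Block numbers are in units of the
 * filesystem block size. Returns 0 on success or a negative errno value:
 * -EPERM without CAP_SYS_RAWIO (before this patch), -EINVAL if the
 * filesystem has no ->bmap.
 */
static int fibmap_block(const char *path, int logical, int *physical)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    int block = logical;        /* in: logical block, out: physical block */
    int ret = 0;
    if (ioctl(fd, FIBMAP, &block) < 0)
        ret = -errno;           /* save errno before close() can change it */
    else
        *physical = block;      /* 0 means a hole (no block allocated) */

    close(fd);
    return ret;
}
```

With this patch applied, a normal user can call fibmap_block(path, 0, &phys) on their own file to learn where its first block landed.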

Signed-off-by: Mike Waychison <mikew@google.com>
 fs/ioctl.c |    2 --
 1 file changed, 2 deletions(-)

Index: linux-2.6.23/fs/ioctl.c
===================================================================
--- linux-2.6.23.orig/fs/ioctl.c	2007-10-09 13:31:38.000000000 -0700
+++ linux-2.6.23/fs/ioctl.c	2007-10-25 15:48:24.000000000 -0700
@@ -56,8 +56,6 @@ static int file_ioctl(struct file *filp,
 			/* do we support this mess? */
 			if (!mapping->a_ops->bmap)
 				return -EINVAL;
-			if (!capable(CAP_SYS_RAWIO))
-				return -EPERM;
 			if ((error = get_user(block, p)) != 0)
 				return error;
 

--

-

From: Alan Cox
Date: Thursday, October 25, 2007 - 5:22 pm

On Thu, 25 Oct 2007 16:06:40 -0700

Historically this was done because people felt it was more secure. It
also allows you to make some deductions about other activities on the
disk, but that's probably only a concern for very, very security-crazed
compartmentalised boxes.

Also, historically at least, FIBMAP could be abused to crash the system.
Now, if you can verify that has been fixed, I have no problem, but given
that I can find no record of it being fixed, it would be wise to audit
it first and review Chris Evans' and other reports about what occurs when
FIBMAP is passed random block numbers.

FIBMAP has another problem for this general use as well: it takes an int,
but the block number can now be too big for an int for very large files on 32-bit.
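Here is a worked example of the int limit. FIBMAP carries the block number in an int in both directions, so no block number above INT_MAX (2^31 - 1) can be expressed. With 1 KiB blocks, that limit falls 2 TiB into a file. With 4 KiB blocks, it falls at 8 TiB. The helper names below are illustrative and are not kernel API.

```c
#include <limits.h>
#include <stdint.h>

/* Logical block containing byte `offset`, computed in 64 bits. */
static uint64_t logical_block(uint64_t offset, unsigned int blocksize)
{
    return offset / blocksize;
}

/* Nonzero if `block` survives the round trip through FIBMAP's int. */
static int fits_in_fibmap(uint64_t block)
{
    return block <= (uint64_t)INT_MAX;
}
```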

Alan
-

From: Jason Uhlenkott
Date: Friday, October 26, 2007 - 2:55 pm

Additionally, ext3_bmap() has this to say about it:

        if (EXT3_I(inode)->i_state & EXT3_STATE_JDATA) {
                /*
                 * This is a REALLY heavyweight approach, but the use of
                 * bmap on dirty files is expected to be extremely rare:
                 * only if we run lilo or swapon on a freshly made file
                 * do we expect this to happen.
                 *
                 * (bmap requires CAP_SYS_RAWIO so this does not
                 * represent an unprivileged user DOS attack --- we'd be
                 * in trouble if mortal users could trigger this path at
                 * will.)
-

From: Mike Waychison
Date: Friday, October 26, 2007 - 2:59 pm

Hmm.  I don't know what the right approach to this is.  This seems to be 
the same situation as the delayed allocation problem, no?

What if we just returned 0?  Tools like lilo already call sync(); 
would that cause the journal to get flushed explicitly anyway?

Mike Waychison
-

From: Jason Uhlenkott
Date: Friday, October 26, 2007 - 3:40 pm

Not sure, but I'd be pretty nervous about breaking any existing users
which aren't explicitly syncing.

Are you envisioning users who want to see where their data is landing
for performance reasons?  It seems like such users are going to have
sufficiently different desires from existing FIBMAP users (who need to
know where everything is because they intend to fiddle with the raw
device) that a different interface might be warranted.
-

From: Mike Waychison
Date: Friday, October 26, 2007 - 3:53 pm

True.   We can probably get away with an implicit flush when 

A little of both ;)

We could introduce a new API, though either way, the same fundamental 
problems apply wrt auditing.

I see three reasons that new APIs are warranted:

a) to deal with block numbers >= 2^31 --> FIBMAP64
b) to have a path where no syncing is required due to worries about user 
DoS (delayed allocation / data in journal).
c) possibly some way to FIBMAP a range so that userspace doesn't need to 
syscall for each block, something like how mincore() does it?
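To make item (c) concrete, here is a sketch of what userspace has to do today to map a range: one FIBMAP ioctl per block. The helper names fibmap_range() and count_extents() are hypothetical. count_extents() shows one typical use of the result, which is counting runs of physically contiguous blocks.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIBMAP */

/*
 * Fill phys[0..n) with the physical block of logical blocks
 * first .. first+n-1 of the open file `fd`. This costs one syscall per
 * block, which is the overhead a range interface would avoid.
 * Returns 0 on success or a negative errno value.
 */
static int fibmap_range(int fd, int first, size_t n, int *phys)
{
    for (size_t i = 0; i < n; i++) {
        int block = first + (int)i;
        if (ioctl(fd, FIBMAP, &block) < 0)
            return -errno;
        phys[i] = block;
    }
    return 0;
}

/* Count runs of physically contiguous, allocated blocks (0 = hole). */
static size_t count_extents(const int *phys, size_t n)
{
    size_t runs = 0;
    for (size_t i = 0; i < n; i++) {
        if (phys[i] == 0)
            continue;
        if (i == 0 || phys[i - 1] == 0 || phys[i] != phys[i - 1] + 1)
            runs++;
    }
    return runs;
}
```

A file of N blocks costs N ioctls this way, however few extents it actually has.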

I have a patchset ready that I'll send out shortly that introduces 
FIBMAP64.  The last patch in that set drops the CAP_SYS_RAWIO check, but it's 
probably not what we want given the DoS case.  I'd like to send it out 
anyway to get some comments on some of the sanity checks and locking I'm 
adding.

Handling (c) above is just extra sugar and isn't something I'm too 
worried about implementing.

Mike Waychison
-

From: Mike Waychison
Date: Thursday, October 25, 2007 - 5:35 pm

I found Chris's comment about negative block numbers; I'll send a patch 
out for that.
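The following is a sketch of the kind of check such a patch might add. It is not the actual patch. It assumes that the int read from user space is widened to the kernel's unsigned sector_t before it reaches ->bmap(), so a negative value becomes an enormous block number. sector64_t is a stand-in for sector_t, and fibmap_validate() is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for sector_t, assumed here to be an unsigned 64-bit type. */
typedef uint64_t sector64_t;

/* Without a check, a negative int silently becomes a huge sector number. */
static sector64_t widen_unchecked(int block)
{
    return (sector64_t)block;
}

/* Hypothetical validation: reject negative block numbers up front. */
static int fibmap_validate(int block, sector64_t *out)
{
    if (block < 0)
        return -EINVAL;
    *out = (sector64_t)block;
    return 0;
}
```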

Back in '99 you mentioned a race with ftruncate.  Is it sufficient 
to take mutex_lock(i_mutex) and down_read(i_alloc_sem)?

Mike Waychison
-

From: Alan Cox
Date: Thursday, October 25, 2007 - 5:43 pm

One for the fs guys. That code has changed far beyond anything I
understand any more 8)

-

From: Pavel Machek
Date: Monday, October 29, 2007 - 12:08 pm

I believe it is to prevent users from intentionally creating extremely
fragmented files...

You can read 60MB in a second, but a fragmented 60MB file could take
10msec * 60MB/4KB = 150 seconds. That's a factor of 150 slowdown...
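Here is Pavel's estimate written out. A file of 60,000 KiB in 4 KiB blocks is 15,000 blocks. At roughly 10 ms per seek, with no two blocks adjacent, that comes to 150,000 ms (150 s), compared with about 1 s for a sequential read. The function name and the one-seek-per-block model are assumptions of this sketch.

```c
#include <stdint.h>

/*
 * Worst-case read time in milliseconds for a file whose blocks are all
 * non-adjacent on disk: one seek per block, transfer time ignored.
 */
static uint64_t fragmented_read_ms(uint64_t file_bytes, uint64_t block_bytes,
                                   uint64_t seek_ms)
{
    uint64_t blocks = (file_bytes + block_bytes - 1) / block_bytes; /* round up */
    return blocks * seek_ms;
}
```

fragmented_read_ms(60000ULL * 1024, 4096, 10) gives 150000 ms. Dividing by the 1 s sequential read gives the factor of 150 quoted above.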

...but I agree that SYS_RAWIO may be the wrong capability to cover this.

							Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Ric Wheeler
Date: Thursday, November 1, 2007 - 7:51 am

[Empty message]