Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an open file descriptor.

It would be nice to allow users to have permission to see where their data is landing on disk, and there really isn't a good reason to keep them from getting at this information.

Signed-off-by: Mike Waychison <mikew@google.com>

 fs/ioctl.c | 2 --
 1 file changed, 2 deletions(-)

Index: linux-2.6.23/fs/ioctl.c
===================================================================
--- linux-2.6.23.orig/fs/ioctl.c	2007-10-09 13:31:38.000000000 -0700
+++ linux-2.6.23/fs/ioctl.c	2007-10-25 15:48:24.000000000 -0700
@@ -56,8 +56,6 @@ static int file_ioctl(struct file *filp,
 		/* do we support this mess? */
 		if (!mapping->a_ops->bmap)
 			return -EINVAL;
-		if (!capable(CAP_SYS_RAWIO))
-			return -EPERM;
 		if ((error = get_user(block, p)) != 0)
 			return error;
--
-
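For readers who have not used the ioctl from userspace, here is a minimal sketch of the call the patch affects. The helper name fibmap_block() is made up for illustration; FIBMAP itself comes from <linux/fs.h> and takes a pointer to an int that holds the logical block index on entry and the on-disk block number on return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIBMAP */

/*
 * Hypothetical helper: map one logical block of an open file to its
 * on-disk block number. Returns 0 and stores the result in *physical,
 * or a negative errno value on failure (for example -EPERM when the
 * kernel still insists on CAP_SYS_RAWIO).
 */
int fibmap_block(int fd, int logical, int *physical)
{
    int block = logical;   /* in: logical index, out: physical block */

    assert(physical != NULL);
    if (ioctl(fd, FIBMAP, &block) < 0)
        return -errno;
    *physical = block;
    return 0;
}
```

With the capability check in place, an unprivileged caller would be expected to see -EPERM from this helper; with the check removed, the call would succeed on filesystems that implement bmap.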
On Thu, 25 Oct 2007 16:06:40 -0700

Historically this was done because people felt it was more secure. It also allows you to make some deductions about other activities on the disk, but that's probably only a concern for very, very security-crazed compartmentalised boxes.

Also, historically at least, FIBMAP could be abused to crash the system. Now if you can verify that has been fixed I have no problem, but given that I can find no record of that being fixed it would be wise to audit it first and review Chris Evans' and other reports about what occurs when FIBMAP is passed random block numbers.

FIBMAP has another problem for this general use as well: it takes an int, but the block number can now be bigger for very large files on 32-bit.

Alan
-
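To put a number on the int limitation, here is a rough sketch of the arithmetic, assuming a 32-bit int (so INT_MAX is 2^31 - 1). The function names are invented for this illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* 1 if a block number can be passed through FIBMAP's int argument. */
int block_fits_fibmap(uint64_t block)
{
    return block <= (uint64_t)INT_MAX;
}

/*
 * Bytes covered by block numbers 0..INT_MAX for a given block size.
 * Anything beyond this cannot be named by a non-negative int.
 */
uint64_t max_fibmap_bytes(uint32_t block_size)
{
    assert(block_size > 0);
    return ((uint64_t)INT_MAX + 1) * block_size;
}
```

Under that assumption, 4 KiB blocks give a ceiling of 8 TiB and 1 KiB blocks give 2 TiB. A caller or kernel-side check would also want to reject negative values, which is the "random block numbers" case mentioned above.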
Additionally, ext3_bmap() has this to say about it:
	if (EXT3_I(inode)->i_state & EXT3_STATE_JDATA) {
		/*
		 * This is a REALLY heavyweight approach, but the use of
		 * bmap on dirty files is expected to be extremely rare:
		 * only if we run lilo or swapon on a freshly made file
		 * do we expect this to happen.
		 *
		 * (bmap requires CAP_SYS_RAWIO so this does not
		 * represent an unprivileged user DOS attack --- we'd be
		 * in trouble if mortal users could trigger this path at
		 * will.)
-
Hmm. I don't know what the right approach to this is. This seems to be the same situation as the delayed allocation problem, no?

What if we just returned 0? Tools like lilo are already doing sync(); would that cause the journal to get flushed explicitly anyway?

Mike Waychison
-
Not sure, but I'd be pretty nervous about breaking any existing users which aren't explicitly syncing.

Are you envisioning users who want to see where their data is landing for performance reasons? It seems like such users are going to have sufficiently different desires from existing FIBMAP users (who need to know where everything is because they intend to fiddle with the raw device) that a different interface might be warranted.
-
True. We can probably get away with an implicit flush when

A little of both ;)

We could introduce a new API, though either way the same fundamental problems apply wrt auditing. I see three reasons that new APIs are warranted:

a) to deal with block numbers > 2^31 --> FIBMAP64
b) to have a path where no syncing is required due to worries about user DoS (delayed allocation / data in journal).
c) possibly some way to FIBMAP a range so that userspace doesn't need to syscall for each block, something like how mincore() does it?

I have a patchset ready that I'll send out shortly that introduces FIBMAP64. The last patch in that set drops the CAP_SYS_RAWIO check, but it's probably not what we want given the DoS case. I'd like to send it out anyway to get some comments on some of the sanity checks and locking I'm adding.

Handling (c) above is just extra sugar and isn't something I'm too worried about implementing.

Mike Waychison
-
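To illustrate what (c) would save, here is a sketch of the per-block loop userspace has to run today to learn how a file is laid out. Everything named here is hypothetical: map_block_fn stands in for one FIBMAP call per block, and table_map() is a stand-in mapper that reads from an array so the loop can be exercised without privileges. A result of 0 is treated as a hole, which is what FIBMAP reports for unmapped blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mapper: physical block for a logical block, 0 for a hole. */
typedef long (*map_block_fn)(void *ctx, long logical);

/* Stand-in mapper for illustration: looks the block up in an array. */
long table_map(void *ctx, long logical)
{
    const long *table = ctx;
    return table[logical];
}

/* Count runs of physically contiguous blocks, one mapper call per block. */
size_t count_extents(map_block_fn map, void *ctx, long nblocks)
{
    size_t extents = 0;
    long prev = 0;

    assert(map != NULL);
    for (long i = 0; i < nblocks; i++) {
        long phys = map(ctx, i);   /* with FIBMAP: one syscall here */

        /* A new extent starts after a hole or a discontinuity. */
        if (phys != 0 && (prev == 0 || phys != prev + 1))
            extents++;
        prev = phys;
    }
    return extents;
}
```

A real mapper would wrap ioctl(fd, FIBMAP, ...), so a file of N blocks costs N syscalls; a ranged interface would let the kernel fill in the whole table in one call.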
I found Chris's comment about negative block numbers; I'll send a patch out for that.

You mentioned back in '99 a race with ftruncate. Is it sufficient to mutex_lock(i_mutex) and down_read(i_alloc_sem)?

Mike Waychison
-
One for the fs guys. That code has changed far beyond anything I understand any more 8) -
I believe it is to prevent users from intentionally creating extremely fragmented files... You can read 60MB in a second, but a fragmented 60MB file could take 10msec * 60MB/4KB = 150 seconds. That's a factor of 150 slowdown...

...but I agree that CAP_SYS_RAWIO may be the wrong capability to cover this.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
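The estimate above can be checked with a few lines of C. This is only a sketch of the arithmetic, assuming one full seek per block and taking MB and KB as binary units; the function name is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case read time in seconds when every block of the file
 * costs one seek of seek_ms milliseconds.
 */
double worst_case_read_seconds(uint64_t file_bytes, uint32_t block_bytes,
                               double seek_ms)
{
    assert(block_bytes > 0);
    /* Round up so a partial last block still counts as one seek. */
    uint64_t blocks = (file_bytes + block_bytes - 1) / block_bytes;
    return (double)blocks * seek_ms / 1000.0;
}
```

For 60 MiB in 4 KiB blocks that is 15360 seeks, or about 153.6 seconds at 10 ms each, which matches the rough figure of 150 seconds.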
