linux-fsdevel mailing list

From | Subject | Date
Chris Mason
[ANNOUNCE] Btrfs v0.10 (online growing/shrinking, ext3 conve...
Hello everyone, Btrfs v0.10 is now available for download from: http://oss.oracle.com/projects/btrfs/ Btrfs is still in an early alpha state, and the disk format is not finalized. v0.10 introduces a new disk format, and is not compatible with v0.9. The core of this release is explicit back references for all metadata blocks, data extents, and directory items. These are a crucial building block for future features such as online fsck and migration between devices. The back references are ...
Jan 15, 11:52 am 2008
Kyle McMartin
Re: [ANNOUNCE] Btrfs v0.10 (online growing/shrinking, ext3 c...
Looks like fun. btrfsck fails to check if it actually received a dev argument though, so if you don't pass a device, we get a nice segfault. Signed-off-by: Kyle McMartin <kmcmartin@redhat.com> --- diff -Nur btrfs-progs-0.10/btrfsck.c btrfs-progs-0.10-kyle/btrfsck.c --- btrfs-progs-0.10/btrfsck.c 2008-01-15 10:33:32.000000000 -0500 +++ btrfs-progs-0.10-kyle/btrfsck.c 2008-01-15 11:49:24.000000000 -0500 @@ -709,6 +709,11 @@ return err; } +void print_usage(void) { + fprintf(stderr...
Jan 15, 12:55 pm 2008
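The failure Kyle describes, a tool using a command-line argument it never received, can be sketched as below. This is a minimal illustration and not the btrfs-progs patch itself: `device_from_args` is a hypothetical helper, and the usage text is made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from btrfs-progs): return the device path
 * taken from argv, or NULL after printing a usage message when no
 * device argument was supplied.
 */
static const char *device_from_args(int argc, char **argv)
{
    assert(argv != NULL);

    if (argc < 2 || argv[1] == NULL || argv[1][0] == '\0') {
        /* Fall back to a fixed name if argv[0] is unavailable. */
        fprintf(stderr, "usage: %s <device>\n",
                (argc > 0 && argv[0]) ? argv[0] : "btrfsck");
        return NULL;
    }
    return argv[1];
}
```

A caller would test the result for NULL and exit with an error before touching the device, rather than passing a missing argument on to code that dereferences it.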
Fengguang Wu
[PATCH 00/13] writeback bug fixes and simplifications take 2
Andrew, This patchset mainly polishes the writeback queuing policies. The main goals are: (1) small files should not be starved by big dirty files (2) sync as fast as possible for not-blocked inodes/pages - don't leave them out; no congestion_wait() in between them (3) avoid busy iowait for blocked inodes - retry them in the next go of s_io(maybe at the next wakeup of pdflush) The role of the queues: s_dirty: park for dirtied_when expiration s_io: park for io submission s_m...
Jan 15, 8:36 am 2008
Michael Rubin
Re: [PATCH 00/13] writeback bug fixes and simplifications ta...
Fengguang do you have any specific tests for any of these cases? As I have posted earlier I am putting together a writeback test suite for test.kernel.org and if you have one (even if it's an ugly shell script) that would save me some time. Also if you want any of mine let me know. :-) -
Jan 15, 2:33 pm 2008
Fengguang Wu
[PATCH 08/13] writeback: defer writeback on locked buffers
Convert to requeue_io_wait() for case: pages skipped due to locked buffers. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) --- linux-mm.orig/fs/fs-writeback.c +++ linux-mm/fs/fs-writeback.c @@ -456,7 +456,7 @@ int generic_sync_sb_inodes(struct super_ * writeback is not making progress due to locked ...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 10/13] writeback: introduce queue_dirty()
Introduce queue_dirty() to enqueue a newly dirtied inode. It helps remove duplicate code. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 21 +++++++++++++-------- 1 files changed, 13 insertions(+), 8 deletions(-) --- linux-mm.orig/fs/fs-writeback.c +++ linux-mm/fs/fs-writeback.c @@ -25,6 +25,15 @@ #include <linux/buffer_head.h> #include "internal.h" +/*...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 13/13] writeback: cleanup __sync_single_inode()
Make the if-else straight in __sync_single_inode(). No behavior change. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 15 +++++++-------- 1 files changed, 7 insertions(+), 8 deletions(-) --- linux-mm.orig/fs/fs-writeback.c +++ linux-mm/fs/fs-writeback.c @@ -254,8 +254,13 @@ __sync_single_inode(struct inode *inode, spin_lock(&inode_lock); inode->i_state ...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 12/13] writeback: remove redirty_tail()
Remove redirty_tail(). It's no longer used. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 24 ------------------------ 1 files changed, 24 deletions(-) --- linux-mm.orig/fs/fs-writeback.c +++ linux-mm/fs/fs-writeback.c @@ -148,30 +148,6 @@ static int write_inode(struct inode *ino } /* - * Redirty an inode: set its when-it-was dirtied timestamp and move it to ...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 03/13] writeback: introduce writeback_control.more_io
Introduce writeback_control.more_io to indicate that more I/O is scheduled for this wakeup of pdflush. Note that more_io is only updated on the _visited_ superblocks, which prevents pdflush daemons from interfering with one another. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: ...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 07/13] writeback: defer writeback on locked inode
Convert to requeue_io_wait() for case: inode is locked. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 7 ++----- 1 files changed, 2 insertions(+), 5 deletions(-) --- linux-mm.orig/fs/fs-writeback.c +++ linux-mm/fs/fs-writeback.c @@ -329,12 +329,9 @@ __writeback_single_inode(struct inode *i if ((wbc->sync_mode != WB_SYNC_ALL) && (inode->i_state &...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 04/13] writeback: introduce super_block.s_more_io_wait
Introduce super_block.s_more_io_wait to park inodes that for some reason cannot be synced immediately. They will be revisited in the next s_io enqueue time (<=5s). The new data flow after this patchset:
s_dirty --> s_io --> s_more_io/s_more_io_wait --+
             ^                                  |
             |                                  |
             +----------------------------------+
- to fill s_io: s_more_io + s_dirty(expired) + s_more_io_wait ---> s_io
- to drain s...
Jan 15, 8:36 am 2008
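The "to fill s_io" rule in the excerpt above can be sketched with toy queues. Everything here is an assumption made for illustration: `toy_queue` and the `toy_*` functions are hypothetical userspace stand-ins rather than kernel list code, expiry is simplified by passing in a queue that holds only already-expired dirty inodes, and reading the "+" order as the service order is this sketch's interpretation.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Toy stand-in for a per-superblock inode list; it stores inode numbers. */
struct toy_queue {
    unsigned long ino[QUEUE_CAP];
    size_t len;
};

static void toy_push(struct toy_queue *q, unsigned long ino)
{
    assert(q->len < QUEUE_CAP);
    q->ino[q->len++] = ino;
}

/* Move every entry of src to the tail of dst, preserving order. */
static void toy_splice(struct toy_queue *dst, struct toy_queue *src)
{
    for (size_t i = 0; i < src->len; i++)
        toy_push(dst, src->ino[i]);
    src->len = 0;
}

/*
 * Refill s_io in the order the excerpt lists:
 * s_more_io, then expired s_dirty, then s_more_io_wait.
 * Parked inodes therefore queue up behind the others.
 */
static void toy_fill_s_io(struct toy_queue *s_io,
                          struct toy_queue *s_more_io,
                          struct toy_queue *s_dirty_expired,
                          struct toy_queue *s_more_io_wait)
{
    toy_splice(s_io, s_more_io);
    toy_splice(s_io, s_dirty_expired);
    toy_splice(s_io, s_more_io_wait);
}
```

With this ordering, an inode parked in `s_more_io_wait` is retried on the next fill but does not stand in front of inodes that are ready to be written.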
Fengguang Wu
[PATCH 09/13] writeback: requeue_io() on redirtied inode
Redirtied inodes could be seen in really fast writes. They should really be synced as soon as possible. redirty_tail() could delay the inode for up to 30s. Kill the delay by using requeue_io() instead. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) --- linux-mm.orig/fs/fs-writeback.c +++ linux-mm/fs/fs-writebac...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 11/13] writeback: queue_dirty() on memory-backed bdi
Replace redirty_tail() with queue_dirty() on memory backed bdi. It makes no difference - only simpler. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) --- linux-mm.orig/fs/fs-writeback.c +++ linux-mm/fs/fs-writeback.c @@ -407,7 +407,7 @@ int generic_sync_sb_inodes(struct super_ int err; if (!bdi_cap_writ...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 01/13] writeback: revert 2e6883bdf49abd0e7f0d9b6297fc...
Revert 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 2 -- include/linux/writeback.h | 1 - mm/page-writeback.c | 9 +++------ 3 files changed, 3 insertions(+), 9 deletions(-) Index: linux-mm/include/linux/writeback.h =================================================================== --- linux-mm.orig/include/linux/writeback.h +++ linux-mm/include/linux/writeback.h @@ -62,7 +62,6 @@ struct wri...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 05/13] writeback: merge duplicate code into writeback...
Merge duplicate code from background_writeout() and wb_kupdate() into writeback_some_pages(). The pages_skipped in background_writeout() is ignored. An inode that cannot be written now will be retried in the next run of pdflush, typically in 5s. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- mm/page-writeback.c | 43 +++++++++++++++++++++--------------------- 1 files changed, 22 insertions(...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 02/13] writeback: clear PAGECACHE_TAG_DIRTY for trunc...
The `truncated' page in block_write_full_page() may stick for a long time. E.g. ext2_rmdir() will set i_size to 0, and then the dir inode may hang around because of being referenced by someone. So clear PAGECACHE_TAG_DIRTY to prevent pdflush from retrying and iowaiting on it. Tested-by: Joerg Platte <jplatte@naasa.net> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/buffer.c | 2 ++ 1 files changed, 2 insertions(+) Index: linux/fs/buffer.c ==========================...
Jan 15, 8:36 am 2008
Fengguang Wu
[PATCH 06/13] writeback: defer writeback on not-all-pages-wr...
Convert to requeue_io_wait() for case: - kupdate cannot write all pages due to some blocking condition; - during sync, a file is being written to too fast, starving other files. In the case of sync, requeue_io_wait() can break the starvation because the inode requeued into s_more_io_wait will be served _after_ normal inodes, hence won't stand in the way of other inodes in the next run. Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-...
Jan 15, 8:36 am 2008
Jens Axboe
Re: [PATCH][RFC] fast file mapping for loop
I split and merged the patch into five bits (added ext3 support), so perhaps that would be easier for people to read/review. Attached and also exist in the loop-extent_map branch here: http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=loop-extent_map -- Jens Axboe
Jan 15, 5:25 am 2008
Jens Axboe
Re: [PATCH][RFC] fast file mapping for loop
Seems my ext3 version doesn't work, it craps out in ext3_get_blocks_handle() triggering this bug: J_ASSERT(handle != NULL || create == 0); I'll see if I can fix that, being fairly fs ignorant... -- Jens Axboe -
Jan 15, 5:36 am 2008
Jens Axboe
Re: [PATCH][RFC] fast file mapping for loop
This works, but probably pretty suboptimal (should end the new journal in map_io_complete()?). And yes I know the >> 9 isn't correct, since the fs block size is larger. Just making sure that we always have enough blocks. Punting to Chris! diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 55e677d..e97181a 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -1002,11 +1002,25 @@ static struct extent_map *ext3_map_extent(struct address_space *mapping, gfp_t gfp_mask) { st...
Jan 15, 6:07 am 2008
Chris Mason
Re: [PATCH][RFC] fast file mapping for loop
On Tue, 15 Jan 2008 11:07:40 +0100 You can use DIO_CREDITS instead of len >> 9, just like the ext3 O_DIRECT code does. Your current patch is fine, except it breaks data=ordered rules. My plan to work within data=ordered: 1) Inside ext3_map_extent (while the transaction was running), increment a counter in the ext3 journal for number of pending IOs. Then end the transaction handle. 2) Drop this counter inside the IO completion call 3) Change the ext3 commit code to wait for the IO ...
Jan 15, 10:04 am 2008
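Steps 1 and 2 of the plan above amount to a pending-IO counter kept in the journal. The sketch below is a single-threaded toy under stated assumptions: `toy_journal`, `toy_map_extent`, and `toy_io_complete` are hypothetical names, the `handle_open` flag merely stands in for a transaction handle, and real kernel code would need atomic operations or locking. Step 3 is truncated in the excerpt, so the sketch stops at exposing the counter.

```c
#include <assert.h>

/* Toy journal: only the state needed to illustrate the counter. */
struct toy_journal {
    int handle_open;          /* 1 while a transaction handle is held */
    unsigned int pending_io;  /* IOs mapped but not yet completed */
};

/*
 * Step 1: account for the IO while the transaction handle is still
 * open, then end the handle.
 */
static void toy_map_extent(struct toy_journal *j)
{
    j->handle_open = 1;   /* stands in for starting a handle */
    j->pending_io++;      /* counted while the handle is open */
    j->handle_open = 0;   /* handle ended; the IO is still in flight */
}

/* Step 2: drop the counter from the IO completion path. */
static void toy_io_complete(struct toy_journal *j)
{
    assert(j->pending_io > 0);
    j->pending_io--;
}
```

The point of the design is that the counter outlives the transaction handle, so something other than the handle records that IO is still outstanding.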
Ric Wheeler
Re: [RFD] Incremental fsck
I think that you have to keep in mind the way disk (and other media) fail. You can get media failures after a successful write or errors that pop up as the media ages. Not to mention the way most people run with write cache enabled and no write barriers enabled - a sure recipe for corruption. Of course, there are always software errors to introduce corruption even when we get everything else right ;-) From what I see, media errors are the number one cause of corruption in file systems....
Jan 14, 9:04 pm 2008
Pavel Machek
[Patch] document ext3 requirements (was Re: [RFD] Incrementa...
Ok, should something like this be added to the documentation? It would be cool to be able to include a few examples (modern SATA disks support barriers so are safe, any IDE from 1989 is unsafe), but I do not know enough about hw... Signed-off-by: Pavel Machek <pavel@suse.cz> diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt index b45f3c1..adfcc9d 100644 --- a/Documentation/filesystems/ext3.txt +++ b/Documentation/filesystems/ext3.txt @@ -183,6 +183,18 @...
Jan 15, 4:16 pm 2008
David Chinner
Re: [Patch] document ext3 requirements (was Re: [RFD] Increm...
ext3 is not the only filesystem that will have trouble due to volatile write caches. We see problems often enough with XFS due to volatile write caches that it's in our FAQ: http://oss.sgi.com/projects/xfs/faq.html#wcache Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group -
Jan 15, 5:43 pm 2008
Pavel Machek
Re: [Patch] document ext3 requirements (was Re: [RFD] Increm...
Nice FAQ, yep. Perhaps you should move parts of it to Documentation/ , and I could then make ext3 FAQ point to it? I had write cache enabled on my main computer. Oops. I guess that means we do need better documentation. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Jan 15, 7:07 pm 2008
Daniel Phillips Jan 15, 7:44 pm 2008
Miklos Szeredi
Re: [patch 4/9] unprivileged mounts: propagate error values ...
Ah yes, this is indeed confusing. Last time dup_mnt_ns() returned a namespace pointer or NULL. But now I see it returns an ERR_PTR(error) instead, which means it's cleaner to just propagate the error value. I'll fix this. Thanks, Miklos -
Jan 15, 6:15 am 2008
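The change Miklos describes, propagating the error carried in an ERR_PTR-style return instead of assuming NULL means failure, can be sketched in userspace. The `toy_*` helpers below re-implement the idea of the kernel's ERR_PTR(), IS_ERR(), and PTR_ERR() for illustration only; `toy_dup_ns` and `toy_copy_ns` are hypothetical and are not the functions from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static inline long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int toy_is_err(const void *ptr)
{
    /* Error values occupy the top TOY_MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_ns {
    int id;
};

/* Hypothetical duplicator: returns a new object or an error pointer. */
static struct toy_ns *toy_dup_ns(const struct toy_ns *src, int simulate_oom)
{
    struct toy_ns *copy;

    if (simulate_oom)
        return toy_err_ptr(-ENOMEM);
    copy = malloc(sizeof(*copy));
    if (!copy)
        return toy_err_ptr(-ENOMEM);
    *copy = *src;
    return copy;
}

/* Caller propagates whatever error the callee encoded. */
static long toy_copy_ns(const struct toy_ns *src, int simulate_oom,
                        struct toy_ns **out)
{
    struct toy_ns *ns = toy_dup_ns(src, simulate_oom);

    if (toy_is_err(ns))
        return toy_ptr_err(ns);  /* pass the value through unchanged */
    *out = ns;
    return 0;
}
```

Because the callee's error value is passed through unchanged, the caller does not have to guess which errno a failed duplication should map to.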
Miklos Szeredi
Re: [patch 8/9] unprivileged mounts: propagation: inherit ow...
Hmm, I think the nosuid thing is meant to prevent suid mounts being introduced into a "suidless" namespace. This doesn't apply to dev mounts, which are quite safe in a suidless environment, as long as the user is not able to create devices. But that should be taken care of by capability tests. I'll update the description. Thanks, Miklos -
Jan 15, 6:39 am 2008
Serge E. Hallyn
Re: [patch 8/9] unprivileged mounts: propagation: inherit ow...
Hmm, Part of me wants to say the safest thing for now would be to refuse mounts propagation from non-user mounts to user mounts. I assume you're thinking about a fully user-mounted chroot, where the user would still want to be able to stick in a cdrom and have it automounted under /mnt/cdrom, propagated from the root mounts ns? But then are there no devices which the user could create on a floppy while inserted into his own laptop, owned by his own uid, then insert into this machine, and use...
Jan 15, 10:21 am 2008
Miklos Szeredi
Re: [patch 8/9] unprivileged mounts: propagation: inherit ow...
I assume that the floppy and cdrom are already mounted with nosuid,nodev. The problem case, I think, is if a sysadmin does some mounting in the initial namespace, and this is propagated into the fully user-mounted namespace (or chroot), so that a mount with suid binaries slips in. Which is bad, because the user may be able to rearrange the namespace to trick the suid program into doing something it should not do. OTOH, a mount with devices can't be abused this way, since it is not possible to gain pr...
Jan 15, 10:37 am 2008
Serge E. Hallyn
Re: [patch 8/9] unprivileged mounts: propagation: inherit ow...
Yeah, of course, what I'm saying is no different whether the upper mount And really this shouldn't be an issue at all - the usermount chroot would be set up under something like /share/hallyn/root, so the admin would have to purposely set up propagation into that tree, so this Thanks for humoring me, -serge -
Jan 15, 10:59 am 2008
Miklos Szeredi
Re: [patch 7/9] unprivileged mounts: allow unprivileged fuse...
I think the most generic approach, is to be able to set "safeness" for any fs type, not just fuse (Karel's suggestion). E.g: echo 1 > /proc/sys/fs/types/cifs/safe This would also provide a way to query the FS_SAFE flag. Miklos -
Jan 15, 6:29 am 2008
Serge E. Hallyn Jan 15, 9:35 am 2008
Miklos Szeredi
Re: [patch 9/9] unprivileged mounts: add "no submounts" flag
Me neither. Thanks for the review, Serge! Miklos -
Jan 15, 6:41 am 2008
A. C. Censi
Re: [patch 9/9] unprivileged mounts: add "no submounts" flag
Why not "nosubmnt"? -- A. C. Censi accensi [em] gmail [ponto] com accensi [em] montreal [ponto] com [ponto] br -
Jan 15, 6:53 am 2008
Miklos Szeredi
Re: [patch 9/9] unprivileged mounts: add "no submounts" flag
> Why not "nosubmnt"? Why not indeed. Maybe I should try to use my brain sometime. Thanks, Miklos -
Jan 15, 6:58 am 2008
Serge E. Hallyn
Re: [patch 9/9] unprivileged mounts: add "no submounts" flag
Well it really should have 'user' or 'unpriv' in the name somewhere. 'nosubmnt' is more confusing than 'nomnt' because no submounts really sounds like a reasonable thing in itself... But I never win naming arguments, so I accept that I have poor naming judgement :) -serge -
Jan 15, 9:47 am 2008
Peter Zijlstra
Re: [RFC/PATCH 4/8] revoke: core code V7
Humm, we were trying to get rid of file_list_lock(), this puts up another user of the sb file list. Also, that loop looks horribly expensive: n*(1+m); where n is the list size, and m the number of matching fds. Granted, I see no other options either. -
Jan 15, 11:14 am 2008
Christoph Hellwig
Re: [RFC/PATCH 4/8] revoke: core code V7
Something like the loop above is not going to go in for sure. Once we get rid of the sb->s_files we can put the list_head in struct file to new use eventually if we don't want to get rid of it. E.g. a per-inode list would be much better than the per-superblock one and would regularize what the tty driver is doing. But I'm not too interested in hashing out these details currently, my primary concern is to get the per-mount r/o plus fallout like the correct remount r/o and file_list_lock re...
Jan 15, 1:27 pm 2008
Matthew Wilcox
Re: Leak in nlmsvc_testlock for async GETFL case
Hi Bruce, I haven't had as much time to play with de-BKL-ising fs/locks.c as I would like, so fixing that for 2.6.25 is probably out of the question, but here are two janitorial patches that hopefully can be applied and will make the next steps easier. They make sense all by themselves, even if I don't get back to this project for a few months. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating ...
Jan 15, 12:26 am 2008
J. Bruce Fields
Re: Leak in nlmsvc_testlock for async GETFL case
OK, thanks. Hopefully we will get back to this--it'll be nice to finally make progress on it. --b. -
Jan 15, 10:42 am 2008
Matthew Wilcox
file locks: Split flock_find_conflict out of flock_lock_file
Reduce the spaghetti-like nature of flock_lock_file by making the chunk of code labelled find_conflict into its own function. Also allocate memory before taking the kernel lock in preparation for switching to a normal spinlock. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> diff --git a/fs/locks.c b/fs/locks.c index b681459..bc691e5 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -699,6 +699,33 @@ next_task: return 0; } +/* + * A helper function for flock_lock_file(). It ...
Jan 15, 12:29 am 2008
J. Bruce Fields
Re: file locks: Split flock_find_conflict out of flock_lock_...
If we did those two steps separately, would the result be two simpler The return value is the reverse of what I'd naively expect--I don't expect something named flock_find_conflict to return something exactly when conflict is *not* found, but I don't see a better way to do it. Perhaps there's a better name? -
Jan 15, 2:50 pm 2008
Matthew Wilcox
file locks: Use wait_event_interruptible_timeout()
interruptible_sleep_on_locked() is just an open-coded wait_event_interruptible_timeout() with a few assumptions since we know we hold the BKL. locks_block_on_timeout() is only used in one place, so it's actually simpler to inline it into its caller. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> locks.c | 33 ++++----------------------------- 1 file changed, 4 insertions(+), 29 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 8b8388e..b681459 100644 --- a/fs/locks.c ...
Jan 15, 12:28 am 2008
J. Bruce Fields
Re: file locks: Use wait_event_interruptible_timeout()
Makes sense, thanks. So the assumption we were depending on the BKL for was that we could count on the wake-up not coming till after we block, so we could skip a check ->fl_next that's normally needed to resolve the usual sleeping-on-some-condition race? -
Jan 15, 10:48 am 2008
Matthew Wilcox
Re: file locks: Use wait_event_interruptible_timeout()
That's right. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -
Jan 15, 11:04 am 2008
J. Bruce Fields
Re: file locks: Use wait_event_interruptible_timeout()
OK, thanks, applied just with the "few assumptions" replaced by a description of that particular problem: "interruptible_sleep_on_locked() is just an open-coded wait_event_interruptible_timeout(), with the one difference that interruptible_sleep_on_locked() doesn't bother to check the condition on which it waits, depending instead on the BKL to avoid the case where it blocks after the wakeup has already been called. locks_block_on_timeout() is only used in one place, so it's actually ...
Jan 15, 2:54 pm 2008