The following commit may be a problem for emergency_remount() [Alt+SysRq+U]:
commit fa4b9074cd8428958c2adf9dc0c831f46e27c193
Author: Tejun Heo <tj@kernel.org>
Date: Sat May 15 20:09:27 2010 +0200
buffer: make invalidate_bdev() drain all percpu LRU add caches
invalidate_bdev() should release all page cache pages which are clean
and not being used; however, if some pages are still in the percpu LRU
add caches on other cpus, those pages are considered in used and don't
get released. Fix it by calling lru_add_drain_all() before trying to
invalidate pages.
This problem was discovered while testing block automatic native
capacity unlocking. Null pages which were read before automatic
unlocking didn't get released by invalidate_bdev() and ended up
interfering with partition scan after unlocking.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The symptoms are a lockdep warning:
SysRq : Emergency Remount R/O
=============================================
[ INFO: possible recursive locking detected ]
2.6.34-cachefs #101
---------------------------------------------
events/0/9 is trying to acquire lock:
(events){+.+.+.}, at: [<ffffffff81042cf0>] flush_work+0x34/0xec
but task is already holding lock:
(events){+.+.+.}, at: [<ffffffff81042264>] worker_thread+0x19a/0x2e2
other info that might help us debug this:
3 locks held by events/0/9:
#0: (events){+.+.+.}, at: [<ffffffff81042264>] worker_thread+0x19a/0x2e2
#1: ((work)#3){+.+...}, at: [<ffffffff81042264>] worker_thread+0x19a/0x2e2
#2: (&type->s_umount_key#30){++++..}, at: [<ffffffff810b3fc8>] do_emergency_remount+0x54/0xda
stack backtrace:
Pid: 9, comm: events/0 Not tainted 2.6.34-cachefs #101
Call Trace:
[<ffffffff81054e80>] validate_chain+0x584/0xd23
[<ffffffff81052ad4>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff8101c264>] ? ...Commit fa4b9074cd8428958c2adf9dc0c831f46e27c193 made s_umount depend
on keventd; however, emergency remount schedules works to keventd
which grabs s_umount creating a circular dependency. Run emergency
remount on a separate workqueue to break it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Unless someone objects, Andrew, can you please take this patch?
Thanks.
fs/super.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index 69688b1..1ada607 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -575,6 +575,11 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
return 0;
}
+/*
+ * For emergency remount
+ */
+static struct workqueue_struct *emergency_remount_wq;
+
static void do_emergency_remount(struct work_struct *work)
{
struct super_block *sb, *n;
@@ -605,13 +610,25 @@ void emergency_remount(void)
{
struct work_struct *work;
+ if (!emergency_remount_wq)
+ return;
+
work = kmalloc(sizeof(*work), GFP_ATOMIC);
if (work) {
INIT_WORK(work, do_emergency_remount);
- schedule_work(work);
+ queue_work(emergency_remount_wq, work);
}
}
+static int __init emergency_remount_init(void)
+{
+ emergency_remount_wq = create_singlethread_workqueue("emerg-remount");
+ if (!emergency_remount_wq)
+ pr_warn("failed to create emergency remount workqueue\n");
+ return 0;
+}
+subsys_initcall(emergency_remount_init);
+
/*
* Unnamed block devices are dummy devices used by virtual
* filesystems which don't use real block-devices. -- jrs
--
I have a stupid question, why using workqueue instead of calling do_remount_sb() directly in emergency_remount()? Avoid blocking emergency_remount()? Thanks. -- Live like a child, think like the god. --
Umm... because it's called from interrupt handler? Right? -- tejun --
Ah, this is true, sysrq can be both triggered by keyboard and /proc/sysrq-trigger. Thanks! --
On Thu, 27 May 2010 11:57:23 +0200
For a while I thought you had the wrong commit ID, but I worked it out!
Please, always quote the patch title rather than a bare commit ID. The
usual form is
fa4b9074cd8428958c2adf9dc0c831f46e27c193 ("buffer: make
invalidate_bdev() drain all percpu LRU add caches:)
The main reason for this is so that people can more reliably and simply
gaah. Do we really want to add Yet Another Kernel Thread just for that
dopey sysrq-U thing?
I assume (coz you didn't tell us) that it generates a lockdep spew?
Perhaps it'd be better to just suppress that somehow rather than this...
And if we _do_ end up adding a new kernel thread for this, maybe it
would be better to use that thread for lru_add_drain_all() rather than
within the dopey do_emergency_remount(), so as to reduce the likelihood
that we'll need to add even more kernel threads to solve the same
problem elsewhere? But this would require a new kernel thread on each
CPU, grr.
Another possibility might be to change lru_add_drain_all() to use IPI
interrupts rather than schedule_on_each_cpu(). That would greatly
speed up lru_add_drain_all(). I don't recall why we did it that way
and I don't immediately see a reason not to. A few things in core mm
would need to be changed from spin_lock_irq() to spin_lock_irqsave().
But I do have vague memories that there was a reason for it.
<It's a huge PITA locating the commit which initially added
lru_add_drain_all()>
It didn't tell us.
<looks in the linux-mm archives>
Nope, no rationale is provided there either.
--
Absolutely. Also, I think it's usually more readable to quote just the first 12 hex digits of the SHA1 - that's still going to be perfectly unique in any practical situation, and makes it way easier to flow the I do have to agree that it's disgusting. Can't we use an existing thread (slow-work?) or something like that? Linus --
Hello, The dedicated workqueue can go away with cmwq. As it's a temporary measure until then, I wanted to keep it simple. Would it be okay if I note that the dedicated workqueue will go away soonish in the patch description and comment? Thanks. -- tejun --
Close. There's some talk there of using smp_call_function() (actually on_each_cpu()) within lru_add_drain_all(), but nobody seems to have tried it. --
