Hi Linus,
As to the regression reported by Marcin; what happens is that we invoke
printk() while holding the xtime lock for writing. printk() will call
wake_up_klogd() which tries to enqueue klogd on some rq.The known deadlock here is calling printk() while holding rq->lock, which
would then try to recusively lock the rq again when trying to wake klogd.The new deadlock is due to task enqueues setting an hrtimer, which requires
reading the time, which will result in a live-lock when the printk() call-
site is holding the xtime lock for writing.Thomas would like to preserve the printk() information if possible, hence my
proposal of printk_nowakeup(). It will do everything printk() normally does,
except ommit to wake up of klogd. The call is explicitly not EXPORTed so that
its use is confined to core kernel code.Marcin, could you please test these two patches to confirm they do indeed
solve your issue as well?Peter
--
Ok.
Right now, however, I think that for 2.6.25 I'll just remove the printk.
And for the long haul, I really don't think the solution is
"printk_nowakup()", because this is going to happen again when somebody
doesn't realize the code is called with the rq lock held, and it's going
to be a bitch to debug.I just don't think this is maintainable.
Linus
--
So I just committed that patch, but I was also looking at just making
'printk()' itself more robust against this kind of thing.Quite frankly, the printk() console semaphore etc handling is very
confusing, and it has obviously evolved over time. Here's a suggested
cleanup patch that doesn't actually *fix* anything, but it makes the
resulting code flow a heck of a lot more readable.In particular, it would now easily and readably allow us to add other
rules inside the new "can_use_console()" that disable the console flushing
if (for example) the request queue lock is held, or xlock is held for
writing.[ The patch adds more lines than it removes, but that is largely due to
the comments and the fact that it moves some of the code into new helper
functions, which invariably adds a few lines for the function definition
etc.There are actually fewer lines of actual code, because it ends up using
the already-existing "try_acquire_console_sem()", and it avoids one ugly
special case, and removes some trivial code duplication about releasing
the logbuf_lock by reorganizing things a bit ]Comments?
Linus
---
kernel/printk.c | 83 +++++++++++++++++++++++++++++++-----------------------
1 files changed, 48 insertions(+), 35 deletions(-)diff --git a/kernel/printk.c b/kernel/printk.c
index 9adc2a4..c46a20a 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -616,6 +616,40 @@ asmlinkage int printk(const char *fmt, ...)
/* cpu currently holding logbuf_lock */
static volatile unsigned int printk_cpu = UINT_MAX;+/*
+ * Can we actually use the console at this time on this cpu?
+ *
+ * Console drivers may assume that per-cpu resources have
+ * been allocated. So unless they're explicitly marked as
+ * being able to cope (CON_ANYTIME) don't call them until
+ * this CPU is officially up.
+ */
+static inline int can_use_console(unsigned int cpu)
+{
+ return cpu_online(cpu) || have_callable_console();
+}
+
+/*
+ * Try to get console ...
Yeah, we get the printk vs rq->lock thing on regular basis, the xtime
lock is new.I'm afraid I'll have to agree.
How about I use the lockdep infrastructure to check if printk() is
invoked whole holding either xtime or rq lock, and then avoid calling
wake_up_klogd(). That way, we at least get sane debug output when the
lock debugging infrastructure is enabled?As for removing the printk(), Thomas, do you see any other sane way to
relay that information?--
On Mon, 24 Mar 2008 19:15:47 +0100
The core problem seems to be that printk shouldn't be calling wake_up().
Can we fix that?I expect it would be acceptable to do it from the timer interrupt instead.
For NOHZ kernels a poll when we enter the idle loop would also be needed.But does that cover everything? Is it possible for a CPU to run 100% busy
while not receiving timer interrupts? I guess so. To receive no
interrupts at all? Also possible.--
local_irq_disable(); while (1);
But I guess you have more pressing issues when that happens..
---
Subject: printk: robustify printk wakeup behaviourThe klogd wakeup in the printk patch can cause deadlocks when holding the
rq->lock and or xtime_lock for writing.Avoid doing the wakeup under certain conditions and delay it to the next jiffy
tick.Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/kernel.h | 4 ++++
include/linux/seqlock.h | 5 +++++
kernel/printk.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 1 +
5 files changed, 57 insertions(+), 2 deletions(-)Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endifextern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/include/linux/seqlock.h
===================================================================
--- linux-2.6.orig/include/linux/seqlock.h
+++ linux-2.6/include/linux/seqlock.h
@@ -71,6 +71,11 @@ static inline void write_sequnlock(seqlo
spin_unlock(&sl->lock);
}+static inline int seq_is_writelocked(seqlock_t *sl)
+{
+ return spin_is_l...
Could we not literally just make this a RCU event? Unconditionally too?
Linus
--
Sure, but the RCU callback period is at least 3 jiffies and much longer
when busy - I'm not sure how long before we force a grace period, we do
that to avoid DoS, right Paul?So this version would have a much higher risk of overflowing the console
buffer and making klogd miss bits. Then again, I really don't care about
klogd at _all_, I've been running with the wakeup patched out for ages.Gah, the below doesn't boot - because I guess we start using rcu before
its properly set up.. should I poke at it more?---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..3d80e30 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/rcupdate.h>#include <asm/uaccess.h>
@@ -982,10 +983,38 @@ int is_console_locked(void)
return console_locked;
}-void wake_up_klogd(void)
+void __wake_up_klogd(struct rcu_head *head);
+
+static struct {
+ struct rcu_head head;
+ spinlock_t lock;
+ int pending;
+} klogd_wakeup_state = {
+ .lock = __SPIN_LOCK_UNLOCKED(klogd_wakeup_state.lock),
+};
+
+void __wake_up_klogd(struct rcu_head *head)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&klogd_wakeup_state.lock, flags);
+ BUG_ON(!klogd_wakeup_state.pending);
if (!oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);
+ klogd_wakeup_state.pending = 0;
+ spin_unlock_irqrestore(&klogd_wakeup_state.lock, flags);
+}
+
+void wake_up_klogd(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&klogd_wakeup_state.lock, flags);
+ if (!klogd_wakeup_state.pending) {
+ call_rcu(&klogd_wakeup_state.head, __wake_up_klogd);
+ klogd_wakeup_state.pending = 1;
+ }
+ spin_unlock_irqrestore(&klogd_wakeup_state.lock, flags);
}/**
--
I really don't think it matters. klogd is going to write the thing to
_disk_ (or network), and three jiffies really don't matter. If we can fillWell, I'd care a _bit_ about klogd, but not enough to worry about a couple
I'd certainly prefer this kind of approach. However, may I suggest:
- doing the "waitqueue_active(&log_wait)" before even bothering to do the
RCU call. That, btw, will automatically mean that we wouldn't ever call
the RCU code before anything is initialized.- get rid of the "oops_in_progress" thing, since I think the whole point
of that was to avoid getting the lock recursively in the first place.- I'd worry about the "spin_lock_irqsave(&klogd_wakeup_state.lock)". What
if the printk happens from call_rcu()? This is exactly what we're
trying to get away from - having some parts of the kernel not able to
printk() because of subtle locking issues.For that last thing, maybe we can just make it a percpu thing and just
disable irq's?Linus
--
You're _so_ right! :-)
This is much prettier and boots to boot.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..10830d8 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/rcupdate.h>#include <asm/uaccess.h>
@@ -982,10 +983,43 @@ int is_console_locked(void)
return console_locked;
}+void __wake_up_klogd(struct rcu_head *head);
+
+struct klogd_wakeup_state {
+ struct rcu_head head;
+ int pending;
+};
+
+DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
+
+void __wake_up_klogd(struct rcu_head *head)
+{
+ unsigned long flags;
+ struct klogd_wakeup_state *kws =
+ container_of(head, struct klogd_wakeup_state, head);
+
+ local_irq_save(flags);
+ BUG_ON(!kws->pending);
+ wake_up_interruptible(&log_wait);
+ kws->pending = 0;
+ local_irq_restore(flags);
+}
+
void wake_up_klogd(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
- wake_up_interruptible(&log_wait);
+ unsigned long flags;
+ struct klogd_wakeup_state *kws;
+
+ if (!waitqueue_active(&log_wait))
+ return;
+
+ local_irq_save(flags);
+ kws = &__get_cpu_var(kws);
+ if (!kws->pending) {
+ call_rcu(&kws->head, __wake_up_klogd);
+ kws->pending = 1;
+ }
+ local_irq_restore(flags);
}/**
--
If this makes any difference:
Acked-by: Steven Rostedt <srostedt@redhat.com>
-- Steve
--
In case we have a recursive oops in the RCU paths, I'd think we'd be
*much* better off withif (!kws->pending) {
kws->pending = 1;
call_rcu(&kws->head, __wake_up_klogd);
}in order to not re-enter the RCU paths again and causing endless oopses
along with list corruption.Hmm?
Oh, and I don't see the point of the irqsave/restore in __wake_up_klogd().
Or the BUG_ON(), for that matter (even if it ever were to trigger, we'd be
better off _not_ triggering it - I agree with it in a "verifying the new
implementation works at all" sense, but not in a "this is actually for
merging" kind of patch).Linus
--
Avoid deadlocks against rq->lock and xtime_lock by using RCU to defer
the klogd wakeup.Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <srostedt@redhat.com>
---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..8870c2e 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/rcupdate.h>#include <asm/uaccess.h>
@@ -982,10 +983,37 @@ int is_console_locked(void)
return console_locked;
}+struct klogd_wakeup_state {
+ struct rcu_head head;
+ int pending;
+};
+
+DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
+
+void __wake_up_klogd(struct rcu_head *head)
+{
+ struct klogd_wakeup_state *kws =
+ container_of(head, struct klogd_wakeup_state, head);
+
+ wake_up_interruptible(&log_wait);
+ kws->pending = 0;
+}
+
void wake_up_klogd(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
- wake_up_interruptible(&log_wait);
+ unsigned long flags;
+ struct klogd_wakeup_state *kws;
+
+ if (!waitqueue_active(&log_wait))
+ return;
+
+ local_irq_save(flags);
+ kws = &__get_cpu_var(kws);
+ if (!kws->pending) {
+ kws->pending = 1;
+ call_rcu(&kws->head, __wake_up_klogd);
+ }
+ local_irq_restore(flags);
}/**
--
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Since there is not RCU_HEAD_INIT() made on these per-cpu objects, this code
is depending upon the fact that currently the RCU head is just initialized
to all zeros.
--
RCU_HEAD_INIT() is only needed for debugging purposes, when you want
to be sure that any garbage in the rcu_head is due to corruption or
some such. The call_rcu() function always completely initializes the
rcu_head structure.Thanx, Paul
--
Peter Zijlstra <a.p.zijlstra@chello.nl> writes:
Hopefully nobody puts a printk into call_rcu now...
Also I'm not sure if there wouldn't be livelocks if someone puts
printk into some other parts of RCU (e.g. each RCU adding more
RCU events -- haven't checked that in detail so I might be wrong)How about a big comment describing why you do this?
And you should add comments to all the new hands-off parts
where printk is not allowed now.-Andi
--
Longer term, I will have one of the people here (who just happens to have
made the mistake of expressing an interest in hacking on RCU) look into
permitting invocation of RCU earlier during boot, as in before rcu_init()
has been called. Should be possible, in fact should be possible without
common-case performance penalty. ;-)In the meantime, looks good, so:
--
On Fri, 08 Aug 2008 20:14:28 +0200
Note that kernel/rcupreempt.c's flavour of call_rcu() takes
RCU_DATA_ME().lock, so there are still code sites from which a printk
can deadlock. Only now, it is config-dependent.From a quick look it appears that large amounts of kernel/rcupreempt.c
are now a printk-free zone.--
The RCU callbacks are (and must be) invoked without holding any RCU
locks, so printk()s in RCU callbacks are perfectly permissible. It would
not be -that- hard to eliminate the lock in call_rcu(), but yeccch...
It would be far easier to debug RCU without the benefit printk()s than
to deal with a preemptable RCU implementation that was based solely onThe only printk() is in preemptable RCU's rcu_init() function, whose
only purpose is to mark the dmesg appropriately, and which should not
be a problem.And that would be because I do a fair amount of prototyping and debugging
in user space, where one instead uses printf(). ;-)Thanx, Paul
--
Drad, missed that bit, I did look at the calling end, but forgot the
call_rcu() end :-/The initial printk_tick() based implementation didn't suffer this
problem, should we revert to that scheme?--
Just in case people care..
---
Subject: printk: robustify printkAvoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/kernel.h | 4 ++++
kernel/printk.c | 19 +++++++++++++++++--
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 1 +
4 files changed, 23 insertions(+), 3 deletions(-)Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endifextern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/kernel/printk.c
===================================================================
--- linux-2.6.orig/kernel/printk.c
+++ linux-2.6/kernel/printk.c
@@ -982,10 +982,25 @@ int is_console_locked(void)
return console_locked;
}-void wake_up_klogd(void)
+static DEFINE_PER_CPU(int, printk_pending);
+
+void printk_tick(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
+ if (__get_cpu_var(printk_pending)) {
+ __get_cpu_var(printk_pending) = 0;
wake_up_interruptible(&log_wait);
+ }
+}
+
+int printk_needs_cpu(int cpu)
+{
+ return per_cpu(printk_p...
i missed most of the discussion, but this seems like the simplest (and
hence ultimately the best) approach to me.Coupling printk with RCU, albeit elegant, does not seem like the right
choice to me in this specific case: printk as an essential debug
mechanism should be as decoupled as possible.Also, once we accept the possibility of async klogd completion, we might
as well do it all the time.this change made a previous design quirks even more visible: these are
items that are not purely event driven but need some polling component.
RCU is one, and now printk is another.We could clean this up further by integrating the rcu_needs_cpu() and
printk_needs_cpu() into a softirq mechanism. We already check for
pending softirqs in tick-sched.c, so the above complication would go
away completely.( But that's for a separate cleanup patch i think. )
No strong feelings though. Peter, which one do you prefer?
Ingo
--
I am missing something here. Are you suggesting that RCU call out
when a given CPU has nothing to do, rather than the current behavior
where rcu_needs_cpu() is invoked when a CPU is being considered for
dynticks idle mode? My concern with this approach would be races that
are currently avoided by the fact that calls to rcu_needs_cpu() are
performed with hardirqs disabled.--
RCU depends on the polling to advance the state machine, if you want an
event driven state machine, you'd have to drive it from
rcu_read_unlock() adding overhead there - and I'm pretty sure you don't
want to do that.So while its a tad ugly to poll for these states, I'm not too worried in
these two cases - of course every additional poll needs goodI personally prefer this printk_tick() driven one over the RCU driven
one because it doesn't trade deadlocks.--
One way to break the deadlock within the RCU subsystem would be something
similar to the following untested (known not to compile) patch. The
idea is that RCU detects that call_rcu() is being called from printk(), and
simply enqueues the callback in this case. For this to really work, RCU
needs something exported from printk() to allow it to make this
decision. I chose the static variable printk_cpu below just to present
the general idea.Thoughts?
Buggy, but otherwise:
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---rcupreempt.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)diff -urpNa -X dontdiff linux-2.6.27-rc1/kernel/rcupreempt.c linux-2.6.27-rc1-printk/kernel/rcupreempt.c
--- linux-2.6.27-rc1/kernel/rcupreempt.c 2008-07-30 08:48:17.000000000 -0700
+++ linux-2.6.27-rc1-printk/kernel/rcupreempt.c 2008-08-11 09:02:03.000000000 -0700
@@ -1118,17 +1118,22 @@ void call_rcu(struct rcu_head *head, voi
{
unsigned long flags;
struct rcu_data *rdp;
+ int this_cpu = smp_processor_id();head->func = func;
head->next = NULL;
local_irq_save(flags);
rdp = RCU_DATA_ME();
- spin_lock(&rdp->lock);
- __rcu_advance_callbacks(rdp);
+ if (this_cpu == printk_cpu) {
+ spin_lock(&rdp->lock);
+ __rcu_advance_callbacks(rdp);
+ }
*rdp->nexttail = head;
rdp->nexttail = &head->next;
RCU_TRACE_RDP(rcupreempt_trace_next_add, rdp);
- spin_unlock_irqrestore(&rdp->lock, flags);
+ if (this_cpu == printk_cpu) {
+ spin_unlock_irqrestore(&rdp->lock, flags);
+ }
}
EXPORT_SYMBOL_GPL(call_rcu);--
ok - i guess we can declare stop_tick a slowpath as well (we are on the
i've started testing it in tip/core/printk to give it some track record
- thanks Peter.Linus, any strong opinion against Peter's patch below (or in favor of
the RCU patch)? Tentatively for v2.6.28 i'd say, to give people time to
object to the async behavior.Ingo
----------------->
From b845b517b5e3706a3729f6ea83b88ab85f0725b0 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri, 8 Aug 2008 21:47:09 +0200
Subject: [PATCH] printk: robustify printkAvoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/kernel.h | 4 ++++
kernel/printk.c | 19 +++++++++++++++++--
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 1 +
4 files changed, 23 insertions(+), 3 deletions(-)diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index aaa998f..113ac8d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_ratelimit_state;
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void) { return 0; }
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endifextern void asmlinkage __attribute__((format(printf, 1, 2)))
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..65...
Then I think we also want the patch below on top of that, right?
From: Jiri Kosina <jkosina@suse.cz>
printk: robustify printk, fix #3Remove the comment describing the possibility of printk() deadlocking on
runqueue lock.Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
kernel/printk.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..9fbf391 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -577,9 +577,6 @@ static int have_callable_console(void)
* @fmt: format string
*
* This is printk(). It can be called from any context. We want it to work.
- * Be aware of the fact that if oops_in_progress is not set, we might try to
- * wake klogd up which could deadlock on runqueue lock if printk() is called
- * from scheduler code.
*
* We try to grab the console_sem. If we succeed, it's easy - we log the output and
* call the console drivers. If we fail to get the semaphore we place the output
--
--
applied to tip/core/printk, thanks.
Ingo
--
small build fixlet below.
Ingo
-------------->
From 99fc8ce41d639ce46f32c4e8618e8a571858e5aa Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Mon, 11 Aug 2008 14:38:12 +0200
Subject: [PATCH] printk: robustify printk, fix
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bitfix:
include/linux/kernel.h: In function ‘printk_needs_cpu':
include/linux/kernel.h:217: error: parameter name omittedSigned-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/kernel.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index c9ed24c..a012579 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -201,7 +201,7 @@ extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
extern void printk_tick(void);
-extern int printk_needs_cpu(int);
+extern int printk_needs_cpu(int cpu);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -214,7 +214,7 @@ static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
static inline void printk_tick(void) { }
-static inline int printk_needs_cpu(int) { return 0; }
+static inline int printk_needs_cpu(int cpu) { return 0; }
#endifextern void asmlinkage __attribute__((format(printf, 1, 2)))
--
The problem is that it means any printk data output that is more
than DMESG-BUFFER-SIZE bytes during one clock tick is going to lose data.
It loses the natural adaption to higher printk rates that you
got previously.Now we could say that for debugging etc. people should switch
to other mechanisms like relayfs, but I would still worry about
some corner cases where losing printk data that wasn't lost before
could be a severe regression (e.g. consider firewall log messages
or similar)Essentially it makes printk (much?) less reliable than it was before
in the general case. Not sure that's a good thing. So the patch
title is definitely misleading.As Linus pointed out earlier we've survived with this restriction
(not doing printk in the scheduler) for a long time, so is thereRCU coupling has actually the same problem.
-Andi
--
You only loose the msgs with klogd, console still gets everything. If
firewalls are generating that much data, perhaps its time to think aboutDepends, I don't give a rats arse about klogd - I get everything through
Why not fix it if its acceptable - the deadlock is just ugly.
--
That's a rather misleading argument, because klogd as it stands today is
already lossy: you can overflow it with enough printk data. (It's rather
easy to trigger it as well, so nobody sane relies on it as a reliable
channel.)Using relayfs and dynamic buffers would be a much worse approach because
if we are out of buffer space it would be a robustness disaster to start
allocating more RAM. (we want less coupling of printk to other kernel
subsystems, not more coupling) Preallocating wont help either because ityeah, and note that klogd as it stands today is a lossy channel no
matter what - there's nothing that keeps a really verbose kernel from
flooding the buffer.The issue Andi raises is largely independent of this change and there's
three clean options to deal with it:- increase buffering (already possible, see CONFIG_LOG_BUF_SHIFT)
- create a more reliable klogd channel with a dynamically
shrinking/growing buffer (i very much doubt the complexity is
worth it, and this one wont be 100% loss-less either)- use a reliable console
#1 is what distros use to maximize the yield of user bugreports, and #3
is what everyone who ultimately cares about massive printk output and
reliable logging uses.Ingo
--
That's true, but with at least the timer tick version it would become much
worse I think, because there would be a rather low upper limit on
the maximum frequency of klogd wakeups.Not fully sure about the RCU version, i guess it would depend on
how many CPUs you have (but that one had the problem of
"cannot put printks into RCU anymore")BTW if you only care about the console you can always use early_printk()
at least for serial/vga/usb debug on x86 -- it will never lose anything (at
least as long as you can read the output fast enough) and also
works in nearly any context. But of course since it also doesn't
log anything to klogd it's probably not a good idea outside debugging.-Andi
--
Yes, and netfilter has them in fact, but it's clearly still a regression for
people who rely on klogd for this today.Also firewall is just an example. Other cases might be people relying
The question of interest is not how you personally configure your systems,
Well you fix one thing and you break another thing (high rate
printk). It's not clear to me that the trade off is a good
one in this case. I suppose far more people care about
high rate printk than the number of people who put
printk into the scheduler.-Andi
--
Similar to "netfilter has other ways", the SELinux messages have another
defined way as well. If your site is high-security and *depends* on the
messages, the *right* answer is to run auditd to catch and log the messages,
and configure auditd to bring the system to a screeching halt if the
log space fills up or there's other problems (yes, some sites would rather
have the machine drop dead than possibly miss an SELinux message).
$INSERT-ANY-OTHER-PRINTK-USE$
The point was that printk may have been perfectly adequate for
them with its existing buffer sizes and dynamic wake up frequency,
and might not anymore with the timer wakeup change. Essentially
it is making a widely used kernel facility more fragile.-Andi
--
I've measured several latencies in the kernel that would cause klogd to
take serveral jiffies to wake up. I highly doubt that adding one jiffy
will break anything. And if it did, then it would show a bug in their
system. If adding one jiffy causes lost data, then the system
administrators are relying on a utility (klogd) that can easily fail them
without these patches.If we are making a highly fragile setup a bit more fragile, maybe it will
wake the IT guys up to change their setup.-- Steve
--
On Fri, 08 Aug 2008 21:21:08 +0200
Dunno. Perhaps we could convert RCU_DATA_ME's spinlock_t into an
rwlock and do read_lock() in call_rcu()? Then we can should be able to
call printk from inside that read_lock(), but not inside write_lock(),
which, with suitable warning comments might be acceptable.afacit everything in call_rcu()'s *rdp is cpu-local and is protected by
local_irq_save(). rcu_ctrlblk.completed and rcu_flipped need some
protection, but a) rdp->lock isn't sufficient anyway and b)
read_lock protection would suffice. Maybe other CPUs can alter *rdp
while __rcu_advance_callbacks() is running.Anyway, that's all handwaving. My point is that making rcupreempt.c
more robust and more concurrent might be an alternative fix, and might
be beneficial in its own right. Working out the details is what we
have Pauls for ;)--
How about if I instead add comments warning people not to put printk()
in the relevant RCU-implementation code? That way I can be not only lazy,
but cowardly as well! ;-)Thanx, Paul
--
Well, it does spoil the party a little, we are now trading one deadlock
for another :-/--
On Fri, 08 Aug 2008 22:37:39 +0200
Yup. We could just as easily put those comments into the scheduler
wakeup code.Why are we fixing this, btw? The problem has been there forever and
people who hack the wakeup code could/should know about it anyway. All
they need to do is to disable klogd during development. Did the
problem recently become worse for some reason?--
It hasn't beemn there forever at all.
Yes, there used to be reliance on the actual _scheduler_ locks. Doign a
wake_up() would cause runqueue locks etc to be taken.But the xtime deadlock is fairly recent, and only happened with CFQ, I
think.And _that_ is the irritating one. I personally wouldn't mind at all if
there is some printk() dependency on the core runqueue rq->lock or on the
RCU locking thing. But look at xtime_lock. THAT is a disaster.Just grep for it.
So I personally actually like the RCU version best. Yes, it still depends
on really core locking. But it's really core and low-level and _confined_
locking, where a comment in a single place would probably suffice. Compare
that to all the places where we take the xtime_lock for writing!Linus
--
On Fri, 8 Aug 2008 13:57:26 -0700 (PDT)
<actually reads stuff>
Yes, not being able to do printk inside xtime_lock would be a disaster.
We decided that about 1.5 years ago last time we added then fixed this
bug (at the time I think I identified multiple already-present printks
inside xtime_lock, on error paths). Did we go and re-add this bug recentlySure, the cant-printk-in-rcupreempt.c limitation should be quite
acceptable.
--
Yeah, but only Paul and his clones would need printks in preempt RCU.
The rest of us mortals now have the rest of the kernel to play with.
-- Steve
--
Hehe :-)
Just in case Paul shows he's human and cannot work his way around it, I
just posted a new version of the printk_tick() stuff..--
Ok, I can't see anything wrong with this from the patch. I do hate that
deadlock, so I think I'll apply it. But I'll try to hold off for a while
in case somebody has any last-minute comments..Linus
--
Hehe, we could do the boring conventional thing and stick it in -mm and
-next and such :-)--
/me tatoos on forehead: 'quilt refresh' before posting!!
Subject: printk: robustify printk wakeup behaviour
The klogd wakeup in the printk patch can cause deadlocks when holding the
rq->lock and or xtime_lock for writing.Avoid doing the wakeup under certain conditions and delay it to the next jiffy
tick.Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/kernel.h | 4 +++
include/linux/seqlock.h | 5 ++++
kernel/printk.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++-
kernel/time/tick-sched.c | 2 -
kernel/timer.c | 1
5 files changed, 58 insertions(+), 2 deletions(-)Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endifextern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/include/linux/seqlock.h
===================================================================
--- linux-2.6.orig/include/linux/seqlock.h
+++ linux-2.6/include/linux/seqlock.h
@@ -71,6 +71,11 @@ static inline void write_sequnlock(seqlo
spin_unlock(&sl->lock);
}+static inline int seq_is_writelocked(seqlock_t *sl)
+{
+ return spin_is_locked(&sl->lock);
+}
+
static inline int w...
I've successfully tested these patches. Thanks.
Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com>Marcin
--
| Bart Van Assche | Re: Integration of SCST in the mainstream Linux kernel |
| Greg Kroah-Hartman | [PATCH 005/196] Chinese: add translation of SubmittingDrivers |
| Pavel Roskin | ndiswrapper and GPL-only symbols redux |
| Linus Torvalds | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
git: | |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Paweł Staszewski | rib_trie / Fix inflate_threshold_root. Now=15 size=11 bits |
| David Miller | Re: [PATCH 3/3] Convert the UDP hash lock to RCU |
