Re: [PATCH] printk: robustify printk

Previous thread: [RFC PATCH] freeze feature ver 1.0 by Takashi Sato on Monday, March 24, 2008 - 7:11 am. (2 messages)

Next thread: [PATCH 2/2] time: xtime lock vs printk by Peter Zijlstra on Monday, March 24, 2008 - 8:24 am. (2 messages)
To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Marcin Slusarz <marcin.slusarz@...>
Cc: LKML <linux-kernel@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, March 24, 2008 - 8:24 am

Hi Linus,

As to the regression reported by Marcin; what happens is that we invoke
printk() while holding the xtime lock for writing. printk() will call
wake_up_klogd() which tries to enqueue klogd on some rq.

The known deadlock here is calling printk() while holding rq->lock, which
would then try to recusively lock the rq again when trying to wake klogd.

The new deadlock is due to task enqueues setting an hrtimer, which requires
reading the time, which will result in a live-lock when the printk() call-
site is holding the xtime lock for writing.

Thomas would like to preserve the printk() information if possible, hence my
proposal of printk_nowakeup(). It will do everything printk() normally does,
except ommit to wake up of klogd. The call is explicitly not EXPORTed so that
its use is confined to core kernel code.

Marcin, could you please test these two patches to confirm they do indeed
solve your issue as well?

Peter

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Marcin Slusarz <marcin.slusarz@...>, LKML <linux-kernel@...>
Date: Monday, March 24, 2008 - 1:58 pm

Ok.

Right now, however, I think that for 2.6.25 I'll just remove the printk.

And for the long haul, I really don't think the solution is
"printk_nowakup()", because this is going to happen again when somebody
doesn't realize the code is called with the rq lock held, and it's going
to be a bitch to debug.

I just don't think this is maintainable.

Linus
--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Marcin Slusarz <marcin.slusarz@...>, LKML <linux-kernel@...>
Date: Monday, March 24, 2008 - 2:16 pm

So I just committed that patch, but I was also looking at just making
'printk()' itself more robust against this kind of thing.

Quite frankly, the printk() console semaphore etc handling is very
confusing, and it has obviously evolved over time. Here's a suggested
cleanup patch that doesn't actually *fix* anything, but it makes the
resulting code flow a heck of a lot more readable.

In particular, it would now easily and readably allow us to add other
rules inside the new "can_use_console()" that disable the console flushing
if (for example) the request queue lock is held, or xlock is held for
writing.

[ The patch adds more lines than it removes, but that is largely due to
the comments and the fact that it moves some of the code into new helper
functions, which invariably adds a few lines for the function definition
etc.

There are actually fewer lines of actual code, because it ends up using
the already-existing "try_acquire_console_sem()", and it avoids one ugly
special case, and removes some trivial code duplication about releasing
the logbuf_lock by reorganizing things a bit ]

Comments?

Linus
---
kernel/printk.c | 83 +++++++++++++++++++++++++++++++-----------------------
1 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 9adc2a4..c46a20a 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -616,6 +616,40 @@ asmlinkage int printk(const char *fmt, ...)
/* cpu currently holding logbuf_lock */
static volatile unsigned int printk_cpu = UINT_MAX;

+/*
+ * Can we actually use the console at this time on this cpu?
+ *
+ * Console drivers may assume that per-cpu resources have
+ * been allocated. So unless they're explicitly marked as
+ * being able to cope (CON_ANYTIME) don't call them until
+ * this CPU is officially up.
+ */
+static inline int can_use_console(unsigned int cpu)
+{
+ return cpu_online(cpu) || have_callable_console();
+}
+
+/*
+ * Try to get console ...

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Marcin Slusarz <marcin.slusarz@...>, LKML <linux-kernel@...>
Date: Monday, March 24, 2008 - 2:15 pm

Yeah, we get the printk vs rq->lock thing on regular basis, the xtime
lock is new.

I'm afraid I'll have to agree.

How about I use the lockdep infrastructure to check if printk() is
invoked whole holding either xtime or rq lock, and then avoid calling
wake_up_klogd(). That way, we at least get sane debug output when the
lock debugging infrastructure is enabled?

As for removing the printk(), Thomas, do you see any other sane way to
relay that information?

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>
Date: Monday, March 24, 2008 - 2:57 pm

On Mon, 24 Mar 2008 19:15:47 +0100

The core problem seems to be that printk shouldn't be calling wake_up().
Can we fix that?

I expect it would be acceptable to do it from the timer interrupt instead.
For NOHZ kernels a poll when we enter the idle loop would also be needed.

But does that cover everything? Is it possible for a CPU to run 100% busy
while not receiving timer interrupts? I guess so. To receive no
interrupts at all? Also possible.

--

To: Andrew Morton <akpm@...>
Cc: <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>
Date: Friday, August 8, 2008 - 9:30 am

local_irq_disable(); while (1);

But I guess you have more pressing issues when that happens..

---
Subject: printk: robustify printk wakeup behaviour

The klogd wakeup in the printk patch can cause deadlocks when holding the
rq->lock and or xtime_lock for writing.

Avoid doing the wakeup under certain conditions and delay it to the next jiffy
tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/kernel.h | 4 ++++
include/linux/seqlock.h | 5 +++++
kernel/printk.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 1 +
5 files changed, 57 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/include/linux/seqlock.h
===================================================================
--- linux-2.6.orig/include/linux/seqlock.h
+++ linux-2.6/include/linux/seqlock.h
@@ -71,6 +71,11 @@ static inline void write_sequnlock(seqlo
spin_unlock(&sl->lock);
}

+static inline int seq_is_writelocked(seqlock_t *sl)
+{
+ return spin_is_l...

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>
Date: Friday, August 8, 2008 - 12:41 pm

Could we not literally just make this a RCU event? Unconditionally too?

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>, Paul E McKenney <paulmck@...>
Date: Friday, August 8, 2008 - 1:10 pm

Sure, but the RCU callback period is at least 3 jiffies and much longer
when busy - I'm not sure how long before we force a grace period, we do
that to avoid DoS, right Paul?

So this version would have a much higher risk of overflowing the console
buffer and making klogd miss bits. Then again, I really don't care about
klogd at _all_, I've been running with the wakeup patched out for ages.

Gah, the below doesn't boot - because I guess we start using rcu before
its properly set up.. should I poke at it more?

---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..3d80e30 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/rcupdate.h>

#include <asm/uaccess.h>

@@ -982,10 +983,38 @@ int is_console_locked(void)
return console_locked;
}

-void wake_up_klogd(void)
+void __wake_up_klogd(struct rcu_head *head);
+
+static struct {
+ struct rcu_head head;
+ spinlock_t lock;
+ int pending;
+} klogd_wakeup_state = {
+ .lock = __SPIN_LOCK_UNLOCKED(klogd_wakeup_state.lock),
+};
+
+void __wake_up_klogd(struct rcu_head *head)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&klogd_wakeup_state.lock, flags);
+ BUG_ON(!klogd_wakeup_state.pending);
if (!oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);
+ klogd_wakeup_state.pending = 0;
+ spin_unlock_irqrestore(&klogd_wakeup_state.lock, flags);
+}
+
+void wake_up_klogd(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&klogd_wakeup_state.lock, flags);
+ if (!klogd_wakeup_state.pending) {
+ call_rcu(&klogd_wakeup_state.head, __wake_up_klogd);
+ klogd_wakeup_state.pending = 1;
+ }
+ spin_unlock_irqrestore(&klogd_wakeup_state.lock, flags);
}

/**

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>, Paul E McKenney <paulmck@...>
Date: Friday, August 8, 2008 - 1:25 pm

I really don't think it matters. klogd is going to write the thing to
_disk_ (or network), and three jiffies really don't matter. If we can fill

Well, I'd care a _bit_ about klogd, but not enough to worry about a couple

I'd certainly prefer this kind of approach. However, may I suggest:

- doing the "waitqueue_active(&log_wait)" before even bothering to do the
RCU call. That, btw, will automatically mean that we wouldn't ever call
the RCU code before anything is initialized.

- get rid of the "oops_in_progress" thing, since I think the whole point
of that was to avoid getting the lock recursively in the first place.

- I'd worry about the "spin_lock_irqsave(&klogd_wakeup_state.lock)". What
if the printk happens from call_rcu()? This is exactly what we're
trying to get away from - having some parts of the kernel not able to
printk() because of subtle locking issues.

For that last thing, maybe we can just make it a percpu thing and just
disable irq's?

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>, Paul E McKenney <paulmck@...>
Date: Friday, August 8, 2008 - 1:40 pm

You're _so_ right! :-)

This is much prettier and boots to boot.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..10830d8 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/rcupdate.h>

#include <asm/uaccess.h>

@@ -982,10 +983,43 @@ int is_console_locked(void)
return console_locked;
}

+void __wake_up_klogd(struct rcu_head *head);
+
+struct klogd_wakeup_state {
+ struct rcu_head head;
+ int pending;
+};
+
+DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
+
+void __wake_up_klogd(struct rcu_head *head)
+{
+ unsigned long flags;
+ struct klogd_wakeup_state *kws =
+ container_of(head, struct klogd_wakeup_state, head);
+
+ local_irq_save(flags);
+ BUG_ON(!kws->pending);
+ wake_up_interruptible(&log_wait);
+ kws->pending = 0;
+ local_irq_restore(flags);
+}
+
void wake_up_klogd(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
- wake_up_interruptible(&log_wait);
+ unsigned long flags;
+ struct klogd_wakeup_state *kws;
+
+ if (!waitqueue_active(&log_wait))
+ return;
+
+ local_irq_save(flags);
+ kws = &__get_cpu_var(kws);
+ if (!kws->pending) {
+ call_rcu(&kws->head, __wake_up_klogd);
+ kws->pending = 1;
+ }
+ local_irq_restore(flags);
}

/**

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Paul E McKenney <paulmck@...>
Date: Friday, August 8, 2008 - 1:52 pm

If this makes any difference:

Acked-by: Steven Rostedt <srostedt@redhat.com>

-- Steve
--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>, Paul E McKenney <paulmck@...>
Date: Friday, August 8, 2008 - 1:48 pm

In case we have a recursive oops in the RCU paths, I'd think we'd be
*much* better off with

if (!kws->pending) {
kws->pending = 1;
call_rcu(&kws->head, __wake_up_klogd);
}

in order to not re-enter the RCU paths again and causing endless oopses
along with list corruption.

Hmm?

Oh, and I don't see the point of the irqsave/restore in __wake_up_klogd().
Or the BUG_ON(), for that matter (even if it ever were to trigger, we'd be
better off _not_ triggering it - I agree with it in a "verifying the new
implementation works at all" sense, but not in a "this is actually for
merging" kind of patch).

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>, Paul E McKenney <paulmck@...>
Date: Friday, August 8, 2008 - 2:14 pm

Avoid deadlocks against rq->lock and xtime_lock by using RCU to defer
the klogd wakeup.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <srostedt@redhat.com>
---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..8870c2e 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/rcupdate.h>

#include <asm/uaccess.h>

@@ -982,10 +983,37 @@ int is_console_locked(void)
return console_locked;
}

+struct klogd_wakeup_state {
+ struct rcu_head head;
+ int pending;
+};
+
+DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
+
+void __wake_up_klogd(struct rcu_head *head)
+{
+ struct klogd_wakeup_state *kws =
+ container_of(head, struct klogd_wakeup_state, head);
+
+ wake_up_interruptible(&log_wait);
+ kws->pending = 0;
+}
+
void wake_up_klogd(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
- wake_up_interruptible(&log_wait);
+ unsigned long flags;
+ struct klogd_wakeup_state *kws;
+
+ if (!waitqueue_active(&log_wait))
+ return;
+
+ local_irq_save(flags);
+ kws = &__get_cpu_var(kws);
+ if (!kws->pending) {
+ kws->pending = 1;
+ call_rcu(&kws->head, __wake_up_klogd);
+ }
+ local_irq_restore(flags);
}

/**

--

To: <a.p.zijlstra@...>
Cc: <torvalds@...>, <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <rostedt@...>, <paulmck@...>
Date: Friday, August 8, 2008 - 7:02 pm

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Since there is not RCU_HEAD_INIT() made on these per-cpu objects, this code
is depending upon the fact that currently the RCU head is just initialized
to all zeros.
--

To: David Miller <davem@...>
Cc: <a.p.zijlstra@...>, <torvalds@...>, <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <rostedt@...>
Date: Friday, August 8, 2008 - 8:18 pm

RCU_HEAD_INIT() is only needed for debugging purposes, when you want
to be sure that any garbage in the rcu_head is due to corruption or
some such. The call_rcu() function always completely initializes the
rcu_head structure.

Thanx, Paul
--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>, Paul E McKenney <paulmck@...>
Date: Friday, August 8, 2008 - 5:35 pm

Peter Zijlstra <a.p.zijlstra@chello.nl> writes:

Hopefully nobody puts a printk into call_rcu now...
Also I'm not sure if there wouldn't be livelocks if someone puts
printk into some other parts of RCU (e.g. each RCU adding more
RCU events -- haven't checked that in detail so I might be wrong)

How about a big comment describing why you do this?
And you should add comments to all the new hands-off parts
where printk is not allowed now.

-Andi
--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>, <josht@...>
Date: Friday, August 8, 2008 - 4:20 pm

Longer term, I will have one of the people here (who just happens to have
made the mistake of expressing an interest in hacking on RCU) look into
permitting invocation of RCU earlier during boot, as in before rcu_init()
has been called. Should be possible, in fact should be possible without
common-case performance penalty. ;-)

In the meantime, looks good, so:

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Friday, August 8, 2008 - 3:14 pm

On Fri, 08 Aug 2008 20:14:28 +0200

Note that kernel/rcupreempt.c's flavour of call_rcu() takes
RCU_DATA_ME().lock, so there are still code sites from which a printk
can deadlock. Only now, it is config-dependent.

From a quick look it appears that large amounts of kernel/rcupreempt.c
are now a printk-free zone.

--

To: Andrew Morton <akpm@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>
Date: Friday, August 8, 2008 - 4:30 pm

The RCU callbacks are (and must be) invoked without holding any RCU
locks, so printk()s in RCU callbacks are perfectly permissible. It would
not be -that- hard to eliminate the lock in call_rcu(), but yeccch...
It would be far easier to debug RCU without the benefit printk()s than
to deal with a preemptable RCU implementation that was based solely on

The only printk() is in preemptable RCU's rcu_init() function, whose
only purpose is to mark the dmesg appropriately, and which should not
be a problem.

And that would be because I do a fair amount of prototyping and debugging
in user space, where one instead uses printf(). ;-)

Thanx, Paul
--

To: Andrew Morton <akpm@...>
Cc: <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Friday, August 8, 2008 - 3:21 pm

Drad, missed that bit, I did look at the calling end, but forgot the
call_rcu() end :-/

The initial printk_tick() based implementation didn't suffer this
problem, should we revert to that scheme?

--

To: Andrew Morton <akpm@...>
Cc: <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Friday, August 8, 2008 - 3:47 pm

Just in case people care..

---
Subject: printk: robustify printk

Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/kernel.h | 4 ++++
kernel/printk.c | 19 +++++++++++++++++--
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 1 +
4 files changed, 23 insertions(+), 3 deletions(-)

Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/kernel/printk.c
===================================================================
--- linux-2.6.orig/kernel/printk.c
+++ linux-2.6/kernel/printk.c
@@ -982,10 +982,25 @@ int is_console_locked(void)
return console_locked;
}

-void wake_up_klogd(void)
+static DEFINE_PER_CPU(int, printk_pending);
+
+void printk_tick(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
+ if (__get_cpu_var(printk_pending)) {
+ __get_cpu_var(printk_pending) = 0;
wake_up_interruptible(&log_wait);
+ }
+}
+
+int printk_needs_cpu(int cpu)
+{
+ return per_cpu(printk_p...

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 6:45 am

i missed most of the discussion, but this seems like the simplest (and
hence ultimately the best) approach to me.

Coupling printk with RCU, albeit elegant, does not seem like the right
choice to me in this specific case: printk as an essential debug
mechanism should be as decoupled as possible.

Also, once we accept the possibility of async klogd completion, we might
as well do it all the time.

this change made a previous design quirks even more visible: these are
items that are not purely event driven but need some polling component.
RCU is one, and now printk is another.

We could clean this up further by integrating the rcu_needs_cpu() and
printk_needs_cpu() into a softirq mechanism. We already check for
pending softirqs in tick-sched.c, so the above complication would go
away completely.

( But that's for a separate cleanup patch i think. )

No strong feelings though. Peter, which one do you prefer?

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>
Date: Monday, August 11, 2008 - 9:22 am

I am missing something here. Are you suggesting that RCU call out
when a given CPU has nothing to do, rather than the current behavior
where rcu_needs_cpu() is invoked when a CPU is being considered for
dynticks idle mode? My concern with this approach would be races that
are currently avoided by the fact that calls to rcu_needs_cpu() are
performed with hardirqs disabled.

--

To: Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 7:04 am

RCU depends on the polling to advance the state machine, if you want an
event driven state machine, you'd have to drive it from
rcu_read_unlock() adding overhead there - and I'm pretty sure you don't
want to do that.

So while its a tad ugly to poll for these states, I'm not too worried in
these two cases - of course every additional poll needs good

I personally prefer this printk_tick() driven one over the RCU driven
one because it doesn't trade deadlocks.

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>
Date: Monday, August 11, 2008 - 12:09 pm

One way to break the deadlock within the RCU subsystem would be something
similar to the following untested (known not to compile) patch. The
idea is that RCU detects that call_rcu() is being called from printk(), and
simply enqueues the callback in this case. For this to really work, RCU
needs something exported from printk() to allow it to make this
decision. I chose the static variable printk_cpu below just to present
the general idea.

Thoughts?

Buggy, but otherwise:

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

rcupreempt.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff -urpNa -X dontdiff linux-2.6.27-rc1/kernel/rcupreempt.c linux-2.6.27-rc1-printk/kernel/rcupreempt.c
--- linux-2.6.27-rc1/kernel/rcupreempt.c 2008-07-30 08:48:17.000000000 -0700
+++ linux-2.6.27-rc1-printk/kernel/rcupreempt.c 2008-08-11 09:02:03.000000000 -0700
@@ -1118,17 +1118,22 @@ void call_rcu(struct rcu_head *head, voi
{
unsigned long flags;
struct rcu_data *rdp;
+ int this_cpu = smp_processor_id();

head->func = func;
head->next = NULL;
local_irq_save(flags);
rdp = RCU_DATA_ME();
- spin_lock(&rdp->lock);
- __rcu_advance_callbacks(rdp);
+ if (this_cpu == printk_cpu) {
+ spin_lock(&rdp->lock);
+ __rcu_advance_callbacks(rdp);
+ }
*rdp->nexttail = head;
rdp->nexttail = &head->next;
RCU_TRACE_RDP(rcupreempt_trace_next_add, rdp);
- spin_unlock_irqrestore(&rdp->lock, flags);
+ if (this_cpu == printk_cpu) {
+ spin_unlock_irqrestore(&rdp->lock, flags);
+ }
}
EXPORT_SYMBOL_GPL(call_rcu);

--

To: Peter Zijlstra <a.p.zijlstra@...>, Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 7:51 am

ok - i guess we can declare stop_tick a slowpath as well (we are on the

i've started testing it in tip/core/printk to give it some track record
- thanks Peter.

Linus, any strong opinion against Peter's patch below (or in favor of
the RCU patch)? Tentatively for v2.6.28 i'd say, to give people time to
object to the async behavior.

Ingo

----------------->
From b845b517b5e3706a3729f6ea83b88ab85f0725b0 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri, 8 Aug 2008 21:47:09 +0200
Subject: [PATCH] printk: robustify printk

Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/kernel.h | 4 ++++
kernel/printk.c | 19 +++++++++++++++++--
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 1 +
4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index aaa998f..113ac8d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_ratelimit_state;
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void) { return 0; }
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..65...

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Wednesday, August 20, 2008 - 8:40 am

Then I think we also want the patch below on top of that, right?

From: Jiri Kosina <jkosina@suse.cz>
printk: robustify printk, fix #3

Remove the comment describing the possibility of printk() deadlocking on
runqueue lock.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>

---

kernel/printk.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..9fbf391 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -577,9 +577,6 @@ static int have_callable_console(void)
* @fmt: format string
*
* This is printk(). It can be called from any context. We want it to work.
- * Be aware of the fact that if oops_in_progress is not set, we might try to
- * wake klogd up which could deadlock on runqueue lock if printk() is called
- * from scheduler code.
*
* We try to grab the console_sem. If we succeed, it's easy - we log the output and
* call the console drivers. If we fail to get the semaphore we place the output
--

To: Jiri Kosina <jkosina@...>
Cc: Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Wednesday, August 20, 2008 - 8:43 am

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Jiri Kosina <jkosina@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Wednesday, August 20, 2008 - 9:40 am

applied to tip/core/printk, thanks.

Ingo
--

To: Peter Zijlstra <a.p.zijlstra@...>, Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 8:36 am

small build fixlet below.

Ingo

-------------->
From 99fc8ce41d639ce46f32c4e8618e8a571858e5aa Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Mon, 11 Aug 2008 14:38:12 +0200
Subject: [PATCH] printk: robustify printk, fix
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

fix:

include/linux/kernel.h: In function ‘printk_needs_cpu':
include/linux/kernel.h:217: error: parameter name omitted

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/kernel.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index c9ed24c..a012579 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -201,7 +201,7 @@ extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
extern void printk_tick(void);
-extern int printk_needs_cpu(int);
+extern int printk_needs_cpu(int cpu);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -214,7 +214,7 @@ static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
static inline void printk_tick(void) { }
-static inline int printk_needs_cpu(int) { return 0; }
+static inline int printk_needs_cpu(int cpu) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))
--

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 7:03 am

The problem is that it means any printk data output that is more
than DMESG-BUFFER-SIZE bytes during one clock tick is going to lose data.
It loses the natural adaption to higher printk rates that you
got previously.

Now we could say that for debugging etc. people should switch
to other mechanisms like relayfs, but I would still worry about
some corner cases where losing printk data that wasn't lost before
could be a severe regression (e.g. consider firewall log messages
or similar)

Essentially it makes printk (much?) less reliable than it was before
in the general case. Not sure that's a good thing. So the patch
title is definitely misleading.

As Linus pointed out earlier we've survived with this restriction
(not doing printk in the scheduler) for a long time, so is there

RCU coupling has actually the same problem.

-Andi
--

To: Andi Kleen <andi@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 7:22 am

You only loose the msgs with klogd, console still gets everything. If
firewalls are generating that much data, perhaps its time to think about

Depends, I don't give a rats arse about klogd - I get everything through

Why not fix it if its acceptable - the deadlock is just ugly.

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Andi Kleen <andi@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 8:02 am

That's a rather misleading argument, because klogd as it stands today is
already lossy: you can overflow it with enough printk data. (It's rather
easy to trigger it as well, so nobody sane relies on it as a reliable
channel.)

Using relayfs and dynamic buffers would be a much worse approach because
if we are out of buffer space it would be a robustness disaster to start
allocating more RAM. (we want less coupling of printk to other kernel
subsystems, not more coupling) Preallocating wont help either because it

yeah, and note that klogd as it stands today is a lossy channel no
matter what - there's nothing that keeps a really verbose kernel from
flooding the buffer.

The issue Andi raises is largely independent of this change and there's
three clean options to deal with it:

- increase buffering (already possible, see CONFIG_LOG_BUF_SHIFT)

- create a more reliable klogd channel with a dynamically
shrinking/growing buffer (i very much doubt the complexity is
worth it, and this one wont be 100% loss-less either)

- use a reliable console

#1 is what distros use to maximize the yield of user bugreports, and #3
is what everyone who ultimately cares about massive printk output and
reliable logging uses.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Andi Kleen <andi@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 8:14 am

That's true, but with at least the timer tick version it would become much
worse I think, because there would be a rather low upper limit on
the maximum frequency of klogd wakeups.

Not fully sure about the RCU version, i guess it would depend on
how many CPUs you have (but that one had the problem of
"cannot put printks into RCU anymore")

BTW if you only care about the console you can always use early_printk()
at least for serial/vga/usb debug on x86 -- it will never lose anything (at
least as long as you can read the output fast enough) and also
works in nearly any context. But of course since it also doesn't
log anything to klogd it's probably not a good idea outside debugging.

-Andi
--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Andi Kleen <andi@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 7:42 am

Yes, and netfilter has them in fact, but it's clearly still a regression for
people who rely on klogd for this today.

Also firewall is just an example. Other cases might be people relying

The question of interest is not how you personally configure your systems,

Well you fix one thing and you break another thing (high rate
printk). It's not clear to me that the trade off is a good
one in this case. I suppose far more people care about
high rate printk than the number of people who put
printk into the scheduler.

-Andi

--

To: Andi Kleen <andi@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 10:15 am

Similar to "netfilter has other ways", the SELinux messages have another
defined way as well. If your site is high-security and *depends* on the
messages, the *right* answer is to run auditd to catch and log the messages,
and configure auditd to bring the system to a screeching halt if the
log space fills up or there's other problems (yes, some sites would rather
have the machine drop dead than possibly miss an SELinux message).

To: <Valdis.Kletnieks@...>
Cc: Andi Kleen <andi@...>, Peter Zijlstra <a.p.zijlstra@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 10:29 am

$INSERT-ANY-OTHER-PRINTK-USE$

The point was that printk may have been perfectly adequate for
them with its existing buffer sizes and dynamic wake up frequency,
and might not anymore with the timer wakeup change. Essentially
it is making a widely used kernel facility more fragile.

-Andi
--

To: Andi Kleen <andi@...>
Cc: <Valdis.Kletnieks@...>, Peter Zijlstra <a.p.zijlstra@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <torvalds@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <paulmck@...>
Date: Monday, August 11, 2008 - 10:55 am

I've measured several latencies in the kernel that would cause klogd to
take serveral jiffies to wake up. I highly doubt that adding one jiffy
will break anything. And if it did, then it would show a bug in their
system. If adding one jiffy causes lost data, then the system
administrators are relying on a utility (klogd) that can easily fail them
without these patches.

If we are making a highly fragile setup a bit more fragile, maybe it will
wake the IT guys up to change their setup.

-- Steve

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Friday, August 8, 2008 - 3:37 pm

On Fri, 08 Aug 2008 21:21:08 +0200

Dunno. Perhaps we could convert RCU_DATA_ME's spinlock_t into an
rwlock and do read_lock() in call_rcu()? Then we can should be able to
call printk from inside that read_lock(), but not inside write_lock(),
which, with suitable warning comments might be acceptable.

afacit everything in call_rcu()'s *rdp is cpu-local and is protected by
local_irq_save(). rcu_ctrlblk.completed and rcu_flipped need some
protection, but a) rdp->lock isn't sufficient anyway and b)
read_lock protection would suffice. Maybe other CPUs can alter *rdp
while __rcu_advance_callbacks() is running.

Anyway, that's all handwaving. My point is that making rcupreempt.c
more robust and more concurrent might be an alternative fix, and might
be beneficial in its own right. Working out the details is what we
have Pauls for ;)

--

To: Andrew Morton <akpm@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>
Date: Friday, August 8, 2008 - 4:32 pm

How about if I instead add comments warning people not to put printk()
in the relevant RCU-implementation code? That way I can be not only lazy,
but cowardly as well! ;-)

Thanx, Paul
--

To: <paulmck@...>
Cc: Andrew Morton <akpm@...>, <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>
Date: Friday, August 8, 2008 - 4:37 pm

Well, it does spoil the party a little, we are now trading one deadlock
for another :-/

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <paulmck@...>, <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>
Date: Friday, August 8, 2008 - 4:46 pm

On Fri, 08 Aug 2008 22:37:39 +0200

Yup. We could just as easily put those comments into the scheduler
wakeup code.

Why are we fixing this, btw? The problem has been there forever and
people who hack the wakeup code could/should know about it anyway. All
they need to do is to disable klogd during development. Did the
problem recently become worse for some reason?

--

To: Andrew Morton <akpm@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <paulmck@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>
Date: Friday, August 8, 2008 - 4:57 pm

It hasn't beemn there forever at all.

Yes, there used to be reliance on the actual _scheduler_ locks. Doign a
wake_up() would cause runqueue locks etc to be taken.

But the xtime deadlock is fairly recent, and only happened with CFQ, I
think.

And _that_ is the irritating one. I personally wouldn't mind at all if
there is some printk() dependency on the core runqueue rq->lock or on the
RCU locking thing. But look at xtime_lock. THAT is a disaster.

Just grep for it.

So I personally actually like the RCU version best. Yes, it still depends
on really core locking. But it's really core and low-level and _confined_
locking, where a comment in a single place would probably suffice. Compare
that to all the places where we take the xtime_lock for writing!

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: <a.p.zijlstra@...>, <paulmck@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>
Date: Friday, August 8, 2008 - 5:13 pm

On Fri, 8 Aug 2008 13:57:26 -0700 (PDT)

<actually reads stuff>

Yes, not being able to do printk inside xtime_lock would be a disaster.
We decided that about 1.5 years ago last time we added then fixed this
bug (at the time I think I identified multiple already-present printks
inside xtime_lock, on error paths). Did we go and re-add this bug recently

Sure, the cant-printk-in-rcupreempt.c limitation should be quite
acceptable.
--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <paulmck@...>, Andrew Morton <akpm@...>, <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>
Date: Friday, August 8, 2008 - 4:50 pm

Yeah, but only Paul and his clones would need printks in preempt RCU.

The rest of us mortals now have the rest of the kernel to play with.

-- Steve

--

To: Andrew Morton <akpm@...>
Cc: <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, <davem@...>, <rostedt@...>, <paulmck@...>
Date: Friday, August 8, 2008 - 3:49 pm

Hehe :-)

Just in case Paul shows he's human and cannot work his way around it, I
just posted a new version of the printk_tick() stuff..

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>, Paul E McKenney <paulmck@...>
Date: Friday, August 8, 2008 - 2:30 pm

Ok, I can't see anything wrong with this from the patch. I do hate that
deadlock, so I think I'll apply it. But I'll try to hold off for a while
in case somebody has any last-minute comments..

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>, Paul E McKenney <paulmck@...>
Date: Friday, August 8, 2008 - 2:33 pm

Hehe, we could do the boring conventional thing and stick it in -mm and
-next and such :-)

--

To: Andrew Morton <akpm@...>
Cc: <torvalds@...>, <mingo@...>, <tglx@...>, <marcin.slusarz@...>, <linux-kernel@...>, David Miller <davem@...>, Steven Rostedt <rostedt@...>
Date: Friday, August 8, 2008 - 9:46 am

/me tatoos on forehead: 'quilt refresh' before posting!!

Subject: printk: robustify printk wakeup behaviour

The klogd wakeup in the printk patch can cause deadlocks when holding the
rq->lock and or xtime_lock for writing.

Avoid doing the wakeup under certain conditions and delay it to the next jiffy
tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/kernel.h | 4 +++
include/linux/seqlock.h | 5 ++++
kernel/printk.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++-
kernel/time/tick-sched.c | 2 -
kernel/timer.c | 1
5 files changed, 58 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/include/linux/seqlock.h
===================================================================
--- linux-2.6.orig/include/linux/seqlock.h
+++ linux-2.6/include/linux/seqlock.h
@@ -71,6 +71,11 @@ static inline void write_sequnlock(seqlo
spin_unlock(&sl->lock);
}

+static inline int seq_is_writelocked(seqlock_t *sl)
+{
+ return spin_is_locked(&sl->lock);
+}
+
static inline int w...

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, LKML <linux-kernel@...>
Date: Monday, March 24, 2008 - 10:31 am

I've successfully tested these patches. Thanks.
Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com>

Marcin
--

Previous thread: [RFC PATCH] freeze feature ver 1.0 by Takashi Sato on Monday, March 24, 2008 - 7:11 am. (2 messages)

Next thread: [PATCH 2/2] time: xtime lock vs printk by Peter Zijlstra on Monday, March 24, 2008 - 8:24 am. (2 messages)