Re: [PATCH] printk: robustify printk

Previous thread: [RFC PATCH] freeze feature ver 1.0 by Takashi Sato on Monday, March 24, 2008 - 4:11 am. (2 messages)

Next thread: [PATCH 2/2] time: xtime lock vs printk by Peter Zijlstra on Monday, March 24, 2008 - 5:24 am. (2 messages)
From: Peter Zijlstra
Date: Monday, March 24, 2008 - 5:24 am

Hi Linus,

As to the regression reported by Marcin; what happens is that we invoke
printk() while holding the xtime lock for writing. printk() will call
wake_up_klogd() which tries to enqueue klogd on some rq.

The known deadlock here is calling printk() while holding rq->lock, which
would then try to recusively lock the rq again when trying to wake klogd.

The new deadlock is due to task enqueues setting an hrtimer, which requires
reading the time, which will result in a live-lock when the printk() call-
site is holding the xtime lock for writing.

Thomas would like to preserve the printk() information if possible, hence my
proposal of printk_nowakeup(). It will do everything printk() normally does,
except ommit to wake up of klogd. The call is explicitly not EXPORTed so that
its use is confined to core kernel code.

Marcin, could you please test these two patches to confirm they do indeed
solve your issue as well?

Peter



--

From: Marcin Slusarz
Date: Monday, March 24, 2008 - 7:31 am

I've successfully tested these patches. Thanks.
Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com>

Marcin
--

From: Linus Torvalds
Date: Monday, March 24, 2008 - 10:58 am

Ok.

Right now, however, I think that for 2.6.25 I'll just remove the printk. 

And for the long haul, I really don't think the solution is 
"printk_nowakup()", because this is going to happen again when somebody 
doesn't realize the code is called with the rq lock held, and it's going 
to be a bitch to debug.

I just don't think this is maintainable. 

		Linus
--

From: Peter Zijlstra
Date: Monday, March 24, 2008 - 11:15 am

Yeah, we get the printk vs rq->lock thing on regular basis, the xtime
lock is new.


I'm afraid I'll have to agree.

How about I use the lockdep infrastructure to check if printk() is
invoked whole holding either xtime or rq lock, and then avoid calling
wake_up_klogd(). That way, we at least get sane debug output when the
lock debugging infrastructure is enabled?

As for removing the printk(), Thomas, do you see any other sane way to
relay that information?



--

From: Andrew Morton
Date: Monday, March 24, 2008 - 11:57 am

On Mon, 24 Mar 2008 19:15:47 +0100

The core problem seems to be that printk shouldn't be calling wake_up(). 
Can we fix that?

I expect it would be acceptable to do it from the timer interrupt instead. 
For NOHZ kernels a poll when we enter the idle loop would also be needed. 

But does that cover everything?  Is it possible for a CPU to run 100% busy
while not receiving timer interrupts?  I guess so.  To receive no
interrupts at all?  Also possible.

--

From: Peter Zijlstra
Date: Friday, August 8, 2008 - 6:30 am

local_irq_disable(); while (1);

But I guess you have more pressing issues when that happens..

---
Subject: printk: robustify printk wakeup behaviour

The klogd wakeup in the printk patch can cause deadlocks when holding the
rq->lock and or xtime_lock for writing.

Avoid doing the wakeup under certain conditions and delay it to the next jiffy
tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/kernel.h   |    4 ++++
 include/linux/seqlock.h  |    5 +++++
 kernel/printk.c          |   47 ++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/time/tick-sched.c |    2 +-
 kernel/timer.c           |    1 +
 5 files changed, 57 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
 extern int printk_ratelimit(void);
 extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 				   unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
 #else
 static inline int vprintk(const char *s, va_list args)
 	__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
 static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
 					  unsigned int interval_msec)	\
 		{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
 #endif
 
 extern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/include/linux/seqlock.h
===================================================================
--- linux-2.6.orig/include/linux/seqlock.h
+++ linux-2.6/include/linux/seqlock.h
@@ -71,6 +71,11 @@ static inline void write_sequnlock(seqlo
 	spin_unlock(&sl->lock);
 }
 
+static inline int seq_is_writelocked(seqlock_t *sl)
+{
+	return ...
From: Peter Zijlstra
Date: Friday, August 8, 2008 - 6:46 am

/me tatoos on forehead: 'quilt refresh' before posting!!

Subject: printk: robustify printk wakeup behaviour

The klogd wakeup in the printk patch can cause deadlocks when holding the
rq->lock and or xtime_lock for writing.

Avoid doing the wakeup under certain conditions and delay it to the next jiffy
tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/kernel.h   |    4 +++
 include/linux/seqlock.h  |    5 ++++
 kernel/printk.c          |   48 ++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/time/tick-sched.c |    2 -
 kernel/timer.c           |    1 
 5 files changed, 58 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
 extern int printk_ratelimit(void);
 extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 				   unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
 #else
 static inline int vprintk(const char *s, va_list args)
 	__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
 static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
 					  unsigned int interval_msec)	\
 		{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
 #endif
 
 extern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/include/linux/seqlock.h
===================================================================
--- linux-2.6.orig/include/linux/seqlock.h
+++ linux-2.6/include/linux/seqlock.h
@@ -71,6 +71,11 @@ static inline void write_sequnlock(seqlo
 	spin_unlock(&sl->lock);
 }
 
+static inline int seq_is_writelocked(seqlock_t *sl)
+{
+	return spin_is_locked(&sl->lock);
+}
+
 static inline int ...
From: Linus Torvalds
Date: Friday, August 8, 2008 - 9:41 am

Could we not literally just make this a RCU event? Unconditionally too?

			Linus
--

From: Peter Zijlstra
Date: Friday, August 8, 2008 - 10:10 am

Sure, but the RCU callback period is at least 3 jiffies and much longer
when busy - I'm not sure how long before we force a grace period, we do
that to avoid DoS, right Paul?

So this version would have a much higher risk of overflowing the console
buffer and making klogd miss bits. Then again, I really don't care about
klogd at _all_, I've been running with the wakeup patched out for ages.


Gah, the below doesn't boot - because I guess we start using rcu before
its properly set up.. should I poke at it more?

---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..3d80e30 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
 #include <linux/security.h>
 #include <linux/bootmem.h>
 #include <linux/syscalls.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 
@@ -982,10 +983,38 @@ int is_console_locked(void)
 	return console_locked;
 }
 
-void wake_up_klogd(void)
+void __wake_up_klogd(struct rcu_head *head);
+
+static struct {
+	struct rcu_head head;
+	spinlock_t	lock;
+	int		pending;
+} klogd_wakeup_state = {
+	.lock = __SPIN_LOCK_UNLOCKED(klogd_wakeup_state.lock),
+};
+
+void __wake_up_klogd(struct rcu_head *head)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&klogd_wakeup_state.lock, flags);
+	BUG_ON(!klogd_wakeup_state.pending);
 	if (!oops_in_progress && waitqueue_active(&log_wait))
 		wake_up_interruptible(&log_wait);
+	klogd_wakeup_state.pending = 0;
+	spin_unlock_irqrestore(&klogd_wakeup_state.lock, flags);
+}
+
+void wake_up_klogd(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&klogd_wakeup_state.lock, flags);
+	if (!klogd_wakeup_state.pending) {
+		call_rcu(&klogd_wakeup_state.head, __wake_up_klogd);
+		klogd_wakeup_state.pending = 1;
+	}
+	spin_unlock_irqrestore(&klogd_wakeup_state.lock, flags);
 }
 
 /**


--

From: Linus Torvalds
Date: Friday, August 8, 2008 - 10:25 am

I really don't think it matters. klogd is going to write the thing to 
_disk_ (or network), and three jiffies really don't matter. If we can fill 

Well, I'd care a _bit_ about klogd, but not enough to worry about a couple 

I'd certainly prefer this kind of approach. However, may I suggest:

 - doing the "waitqueue_active(&log_wait)" before even bothering to do the 
   RCU call. That, btw, will automatically mean that we wouldn't ever call 
   the RCU code before anything is initialized.

 - get rid of the "oops_in_progress" thing, since I think the whole point 
   of that was to avoid getting the lock recursively in the first place.

 - I'd worry about the "spin_lock_irqsave(&klogd_wakeup_state.lock)". What 
   if the printk happens from call_rcu()? This is exactly what we're 
   trying to get away from - having some parts of the kernel not able to 
   printk() because of subtle locking issues.

For that last thing, maybe we can just make it a percpu thing and just 
disable irq's?

			Linus
--

From: Peter Zijlstra
Date: Friday, August 8, 2008 - 10:40 am

You're _so_ right! :-)

This is much prettier and boots to boot.


Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..10830d8 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
 #include <linux/security.h>
 #include <linux/bootmem.h>
 #include <linux/syscalls.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 
@@ -982,10 +983,43 @@ int is_console_locked(void)
 	return console_locked;
 }
 
+void __wake_up_klogd(struct rcu_head *head);
+
+struct klogd_wakeup_state {
+	struct rcu_head head;
+	int		pending;
+};
+
+DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
+
+void __wake_up_klogd(struct rcu_head *head)
+{
+	unsigned long flags;
+	struct klogd_wakeup_state *kws = 
+		container_of(head, struct klogd_wakeup_state, head);
+
+	local_irq_save(flags);
+	BUG_ON(!kws->pending);
+	wake_up_interruptible(&log_wait);
+	kws->pending = 0;
+	local_irq_restore(flags);
+}
+
 void wake_up_klogd(void)
 {
-	if (!oops_in_progress && waitqueue_active(&log_wait))
-		wake_up_interruptible(&log_wait);
+	unsigned long flags;
+	struct klogd_wakeup_state *kws;
+
+	if (!waitqueue_active(&log_wait))
+		return;
+
+	local_irq_save(flags);
+	kws = &__get_cpu_var(kws);
+	if (!kws->pending) {
+		call_rcu(&kws->head, __wake_up_klogd);
+		kws->pending = 1;
+	}
+	local_irq_restore(flags);
 }
 
 /**


--

From: Linus Torvalds
Date: Friday, August 8, 2008 - 10:48 am

In case we have a recursive oops in the RCU paths, I'd think we'd be 
*much* better off with

	if (!kws->pending) {  
		kws->pending = 1;
		call_rcu(&kws->head, __wake_up_klogd);
	}

in order to not re-enter the RCU paths again and causing endless oopses 
along with list corruption.

Hmm?

Oh, and I don't see the point of the irqsave/restore in __wake_up_klogd(). 
Or the BUG_ON(), for that matter (even if it ever were to trigger, we'd be 
better off _not_ triggering it - I agree with it in a "verifying the new 
implementation works at all" sense, but not in a "this is actually for 
merging" kind of patch).

			Linus
--

From: Peter Zijlstra
Date: Friday, August 8, 2008 - 11:14 am

Avoid deadlocks against rq->lock and xtime_lock by using RCU to defer
the klogd wakeup.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <srostedt@redhat.com>
---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..8870c2e 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
 #include <linux/security.h>
 #include <linux/bootmem.h>
 #include <linux/syscalls.h>
+#include <linux/rcupdate.h>
 
 #include <asm/uaccess.h>
 
@@ -982,10 +983,37 @@ int is_console_locked(void)
 	return console_locked;
 }
 
+struct klogd_wakeup_state {
+	struct rcu_head head;
+	int pending;
+};
+
+DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
+
+void __wake_up_klogd(struct rcu_head *head)
+{
+	struct klogd_wakeup_state *kws = 
+		container_of(head, struct klogd_wakeup_state, head);
+
+	wake_up_interruptible(&log_wait);
+	kws->pending = 0;
+}
+
 void wake_up_klogd(void)
 {
-	if (!oops_in_progress && waitqueue_active(&log_wait))
-		wake_up_interruptible(&log_wait);
+	unsigned long flags;
+	struct klogd_wakeup_state *kws;
+
+	if (!waitqueue_active(&log_wait))
+		return;
+
+	local_irq_save(flags);
+	kws = &__get_cpu_var(kws);
+	if (!kws->pending) {
+		kws->pending = 1;
+		call_rcu(&kws->head, __wake_up_klogd);
+	}
+	local_irq_restore(flags);
 }
 
 /**


--

From: Linus Torvalds
Date: Friday, August 8, 2008 - 11:30 am

Ok, I can't see anything wrong with this from the patch. I do hate that 
deadlock, so I think I'll apply it. But I'll try to hold off for a while 
in case somebody has any last-minute comments..

		Linus
--

From: Peter Zijlstra
Date: Friday, August 8, 2008 - 11:33 am

Hehe, we could do the boring conventional thing and stick it in -mm and
-next and such :-)

--

From: Andrew Morton
Date: Friday, August 8, 2008 - 12:14 pm

On Fri, 08 Aug 2008 20:14:28 +0200

Note that kernel/rcupreempt.c's flavour of call_rcu() takes
RCU_DATA_ME().lock, so there are still code sites from which a printk
can deadlock.  Only now, it is config-dependent.

From a quick look it appears that large amounts of kernel/rcupreempt.c
are now a printk-free zone.

--

From: Peter Zijlstra
Date: Friday, August 8, 2008 - 12:21 pm

Drad, missed that bit, I did look at the calling end, but forgot the
call_rcu() end :-/

The initial printk_tick() based implementation didn't suffer this
problem, should we revert to that scheme?

--

From: Andrew Morton
Date: Friday, August 8, 2008 - 12:37 pm

On Fri, 08 Aug 2008 21:21:08 +0200

Dunno.  Perhaps we could convert RCU_DATA_ME's spinlock_t into an
rwlock and do read_lock() in call_rcu()?  Then we can should be able to
call printk from inside that read_lock(), but not inside write_lock(),
which, with suitable warning comments might be acceptable.

afacit everything in call_rcu()'s *rdp is cpu-local and is protected by
local_irq_save().  rcu_ctrlblk.completed and rcu_flipped need some
protection, but a) rdp->lock isn't sufficient anyway and b)
read_lock protection would suffice.  Maybe other CPUs can alter *rdp
while __rcu_advance_callbacks() is running.

Anyway, that's all handwaving.  My point is that making rcupreempt.c
more robust and more concurrent might be an alternative fix, and might
be beneficial in its own right.  Working out the details is what we
have Pauls for ;)


--

From: Peter Zijlstra
Date: Friday, August 8, 2008 - 12:49 pm

Hehe :-)

Just in case Paul shows he's human and cannot work his way around it, I
just posted a new version of the printk_tick() stuff..

--

From: Paul E. McKenney
Date: Friday, August 8, 2008 - 1:32 pm

How about if I instead add comments warning people not to put printk()
in the relevant RCU-implementation code?  That way I can be not only lazy,
but cowardly as well!  ;-)

							Thanx, Paul
--

From: Peter Zijlstra
Date: Friday, August 8, 2008 - 1:37 pm

Well, it does spoil the party a little, we are now trading one deadlock
for another :-/



--

From: Steven Rostedt
Date: Friday, August 8, 2008 - 1:50 pm

Yeah, but only Paul and his clones would need printks in preempt RCU.

The rest of us mortals now have the rest of the kernel to play with.

-- Steve

--

From: Andrew Morton
Date: Friday, August 8, 2008 - 1:46 pm

On Fri, 08 Aug 2008 22:37:39 +0200

Yup.  We could just as easily put those comments into the scheduler
wakeup code.

Why are we fixing this, btw?  The problem has been there forever and
people who hack the wakeup code could/should know about it anyway.  All
they need to do is to disable klogd during development.  Did the
problem recently become worse for some reason?

--

From: Linus Torvalds
Date: Friday, August 8, 2008 - 1:57 pm

It hasn't beemn there forever at all.

Yes, there used to be reliance on the actual _scheduler_ locks. Doign a 
wake_up() would cause runqueue locks etc to be taken.

But the xtime deadlock is fairly recent, and only happened with CFQ, I 
think.

And _that_ is the irritating one. I personally wouldn't mind at all if 
there is some printk() dependency on the core runqueue rq->lock or on the 
RCU locking thing. But look at xtime_lock. THAT is a disaster.

Just grep for it. 

So I personally actually like the RCU version best. Yes, it still depends 
on really core locking. But it's really core and low-level and _confined_ 
locking, where a comment in a single place would probably suffice. Compare 
that to all the places where we take the xtime_lock for writing!

		Linus
--

From: Andrew Morton
Date: Friday, August 8, 2008 - 2:13 pm

On Fri, 8 Aug 2008 13:57:26 -0700 (PDT)

<actually reads stuff>

Yes, not being able to do printk inside xtime_lock would be a disaster.
We decided that about 1.5 years ago last time we added then fixed this
bug (at the time I think I identified multiple already-present printks
inside xtime_lock, on error paths).  Did we go and re-add this bug recently

Sure, the cant-printk-in-rcupreempt.c limitation should be quite
acceptable.
--

From: Peter Zijlstra
Date: Friday, August 8, 2008 - 12:47 pm

Just in case people care..

---
Subject: printk: robustify printk

Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/kernel.h   |    4 ++++
 kernel/printk.c          |   19 +++++++++++++++++--
 kernel/time/tick-sched.c |    2 +-
 kernel/timer.c           |    1 +
 4 files changed, 23 insertions(+), 3 deletions(-)

Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
 extern int printk_ratelimit(void);
 extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 				   unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
 #else
 static inline int vprintk(const char *s, va_list args)
 	__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
 static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
 					  unsigned int interval_msec)	\
 		{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
 #endif
 
 extern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/kernel/printk.c
===================================================================
--- linux-2.6.orig/kernel/printk.c
+++ linux-2.6/kernel/printk.c
@@ -982,10 +982,25 @@ int is_console_locked(void)
 	return console_locked;
 }
 
-void wake_up_klogd(void)
+static DEFINE_PER_CPU(int, printk_pending);
+
+void printk_tick(void)
 {
-	if (!oops_in_progress && waitqueue_active(&log_wait))
+	if (__get_cpu_var(printk_pending)) {
+		__get_cpu_var(printk_pending) = 0;
 		wake_up_interruptible(&log_wait);
+	}
+}
+
+int printk_needs_cpu(int cpu)
+{
+	return per_cpu(printk_pending, ...
From: Ingo Molnar
Date: Monday, August 11, 2008 - 3:45 am

i missed most of the discussion, but this seems like the simplest (and 
hence ultimately the best) approach to me.

Coupling printk with RCU, albeit elegant, does not seem like the right 
choice to me in this specific case: printk as an essential debug 
mechanism should be as decoupled as possible.

Also, once we accept the possibility of async klogd completion, we might 
as well do it all the time.


this change made a previous design quirks even more visible: these are 
items that are not purely event driven but need some polling component. 
RCU is one, and now printk is another.

We could clean this up further by integrating the rcu_needs_cpu() and 
printk_needs_cpu() into a softirq mechanism. We already check for 
pending softirqs in tick-sched.c, so the above complication would go 
away completely.

( But that's for a separate cleanup patch i think. )

No strong feelings though. Peter, which one do you prefer?

	Ingo
--

From: Andi Kleen
Date: Monday, August 11, 2008 - 4:03 am

The problem is that it means any printk data output that is more
than DMESG-BUFFER-SIZE bytes during one clock tick is going to lose data.
It loses the natural adaption to higher printk rates that you
got previously.

Now we could say that for debugging etc. people should switch
to other mechanisms like relayfs, but I would still worry about
some corner cases where losing printk data that wasn't lost before
could be a severe regression (e.g. consider firewall log messages
or similar)

Essentially it makes printk (much?) less reliable than it was before
in the general case. Not sure that's a good thing. So the patch
title is definitely misleading.

As Linus pointed out earlier we've survived with this restriction
(not doing printk in the scheduler) for a long time, so is there

RCU coupling has actually the same problem.

-Andi
--

From: Peter Zijlstra
Date: Monday, August 11, 2008 - 4:22 am

You only loose the msgs with klogd, console still gets everything. If
firewalls are generating that much data, perhaps its time to think about

Depends, I don't give a rats arse about klogd - I get everything through

Why not fix it if its acceptable - the deadlock is just ugly.

--

From: Andi Kleen
Date: Monday, August 11, 2008 - 4:42 am

Yes, and netfilter has them in fact, but it's clearly still a regression for 
people who rely on klogd for this today.

Also firewall is just an example. Other cases might be people relying

The question of interest is not how you personally configure your systems, 

Well you fix one thing and you break another thing (high rate
printk). It's not clear to me that the trade off is a good
one in this case. I suppose far more people care about
high rate printk than the number of people who put
printk into the scheduler.

-Andi

--

From: Valdis.Kletnieks
Date: Monday, August 11, 2008 - 7:15 am

Similar to "netfilter has other ways", the SELinux messages have another
defined way as well.  If your site is high-security and *depends* on the
messages, the *right* answer is to run auditd to catch and log the messages,
and configure auditd to bring the system to a screeching halt if the
log space fills up or there's other problems (yes, some sites would rather
have the machine drop dead than possibly miss an SELinux message).
From: Andi Kleen
Date: Monday, August 11, 2008 - 7:29 am

$INSERT-ANY-OTHER-PRINTK-USE$

The point was that printk may have been perfectly adequate for 
them with its existing buffer sizes and dynamic wake up frequency,
and might not anymore with the timer wakeup change. Essentially
it is making a widely used kernel facility more fragile.

-Andi
--

From: Steven Rostedt
Date: Monday, August 11, 2008 - 7:55 am

I've measured several latencies in the kernel that would cause klogd to 
take serveral jiffies to wake up. I highly doubt that adding one jiffy 
will break anything. And if it did, then it would show a bug in their 
system. If adding one jiffy causes lost data, then the system 
administrators are relying on a utility (klogd) that can easily fail them 
without these patches.


If we are making a highly fragile setup a bit more fragile, maybe it will 
wake the IT guys up to change their setup.

-- Steve

--

From: Ingo Molnar
Date: Monday, August 11, 2008 - 5:02 am

That's a rather misleading argument, because klogd as it stands today is 
already lossy: you can overflow it with enough printk data. (It's rather 
easy to trigger it as well, so nobody sane relies on it as a reliable 
channel.)

Using relayfs and dynamic buffers would be a much worse approach because 
if we are out of buffer space it would be a robustness disaster to start 
allocating more RAM. (we want less coupling of printk to other kernel 
subsystems, not more coupling) Preallocating wont help either because it 

yeah, and note that klogd as it stands today is a lossy channel no 
matter what - there's nothing that keeps a really verbose kernel from 
flooding the buffer.

The issue Andi raises is largely independent of this change and there's 
three clean options to deal with it:

 - increase buffering (already possible, see CONFIG_LOG_BUF_SHIFT)

 - create a more reliable klogd channel with a dynamically
   shrinking/growing buffer (i very much doubt the complexity is
   worth it, and this one wont be 100% loss-less either)

 - use a reliable console

#1 is what distros use to maximize the yield of user bugreports, and #3
is what everyone who ultimately cares about massive printk output and 
reliable logging uses.

	Ingo
--

From: Andi Kleen
Date: Monday, August 11, 2008 - 5:14 am

That's true, but with at least the timer tick version it would become much 
worse I think, because there would be a rather low upper limit on
the maximum frequency of klogd wakeups.

Not fully sure about the RCU version, i guess it would depend on 
how many CPUs you have (but that one had the problem of
"cannot put printks into RCU anymore")

BTW if you only care about the console you can always use early_printk()
at least for serial/vga/usb debug on x86 -- it will never lose anything (at 
least as long as you can read the output fast enough) and also
works in nearly any context. But of course since it also doesn't
log anything to klogd it's probably not a good idea outside debugging.

-Andi
--

From: Peter Zijlstra
Date: Monday, August 11, 2008 - 4:04 am

RCU depends on the polling to advance the state machine, if you want an
event driven state machine, you'd have to drive it from
rcu_read_unlock() adding overhead there - and I'm pretty sure you don't
want to do that.

So while its a tad ugly to poll for these states, I'm not too worried in
these two cases - of course every additional poll needs good

I personally prefer this printk_tick() driven one over the RCU driven
one because it doesn't trade deadlocks.


--

From: Ingo Molnar
Date: Monday, August 11, 2008 - 4:51 am

ok - i guess we can declare stop_tick a slowpath as well (we are on the 

i've started testing it in tip/core/printk to give it some track record 
- thanks Peter.

Linus, any strong opinion against Peter's patch below (or in favor of 
the RCU patch)? Tentatively for v2.6.28 i'd say, to give people time to 
object to the async behavior.

	Ingo

----------------->
From b845b517b5e3706a3729f6ea83b88ab85f0725b0 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri, 8 Aug 2008 21:47:09 +0200
Subject: [PATCH] printk: robustify printk

Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/kernel.h   |    4 ++++
 kernel/printk.c          |   19 +++++++++++++++++--
 kernel/time/tick-sched.c |    2 +-
 kernel/timer.c           |    1 +
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index aaa998f..113ac8d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_ratelimit_state;
 extern int printk_ratelimit(void);
 extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 				   unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
 #else
 static inline int vprintk(const char *s, va_list args)
 	__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void) { return 0; }
 static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
 					  unsigned int interval_msec)	\
 		{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
 #endif
 
 extern void asmlinkage __attribute__((format(printf, 1, 2)))
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..655cc2c 100644
--- ...
From: Ingo Molnar
Date: Monday, August 11, 2008 - 5:36 am

small build fixlet below.

	Ingo

-------------->
From 99fc8ce41d639ce46f32c4e8618e8a571858e5aa Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Mon, 11 Aug 2008 14:38:12 +0200
Subject: [PATCH] printk: robustify printk, fix
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

fix:

 include/linux/kernel.h: In function ‘printk_needs_cpu':
 include/linux/kernel.h:217: error: parameter name omitted

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/kernel.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index c9ed24c..a012579 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -201,7 +201,7 @@ extern int printk_ratelimit(void);
 extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 				   unsigned int interval_msec);
 extern void printk_tick(void);
-extern int printk_needs_cpu(int);
+extern int printk_needs_cpu(int cpu);
 #else
 static inline int vprintk(const char *s, va_list args)
 	__attribute__ ((format (printf, 1, 0)));
@@ -214,7 +214,7 @@ static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
 					  unsigned int interval_msec)	\
 		{ return false; }
 static inline void printk_tick(void) { }
-static inline int printk_needs_cpu(int) { return 0; }
+static inline int printk_needs_cpu(int cpu) { return 0; }
 #endif
 
 extern void asmlinkage __attribute__((format(printf, 1, 2)))
--

From: Jiri Kosina
Date: Wednesday, August 20, 2008 - 5:40 am

Then I think we also want the patch below on top of that, right?


From: Jiri Kosina <jkosina@suse.cz>
printk: robustify printk, fix #3

Remove the comment describing the possibility of printk() deadlocking on 
runqueue lock.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>

--- 

 kernel/printk.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..9fbf391 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -577,9 +577,6 @@ static int have_callable_console(void)
  * @fmt: format string
  *
  * This is printk().  It can be called from any context.  We want it to work.
- * Be aware of the fact that if oops_in_progress is not set, we might try to
- * wake klogd up which could deadlock on runqueue lock if printk() is called
- * from scheduler code.
  *
  * We try to grab the console_sem.  If we succeed, it's easy - we log the output and
  * call the console drivers.  If we fail to get the semaphore we place the output
--

From: Peter Zijlstra
Date: Wednesday, August 20, 2008 - 5:43 am

From: Ingo Molnar
Date: Wednesday, August 20, 2008 - 6:40 am

applied to tip/core/printk, thanks.

	Ingo
--

From: Paul E. McKenney
Date: Monday, August 11, 2008 - 9:09 am

One way to break the deadlock within the RCU subsystem would be something
similar to the following untested (known not to compile) patch.  The
idea is that RCU detects that call_rcu() is being called from printk(), and
simply enqueues the callback in this case.  For this to really work, RCU
needs something exported from printk() to allow it to make this
decision.  I chose the static variable printk_cpu below just to present
the general idea.

Thoughts?

Buggy, but otherwise:

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 rcupreempt.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff -urpNa -X dontdiff linux-2.6.27-rc1/kernel/rcupreempt.c linux-2.6.27-rc1-printk/kernel/rcupreempt.c
--- linux-2.6.27-rc1/kernel/rcupreempt.c	2008-07-30 08:48:17.000000000 -0700
+++ linux-2.6.27-rc1-printk/kernel/rcupreempt.c	2008-08-11 09:02:03.000000000 -0700
@@ -1118,17 +1118,22 @@ void call_rcu(struct rcu_head *head, voi
 {
 	unsigned long flags;
 	struct rcu_data *rdp;
+	int this_cpu = smp_processor_id();
 
 	head->func = func;
 	head->next = NULL;
 	local_irq_save(flags);
 	rdp = RCU_DATA_ME();
-	spin_lock(&rdp->lock);
-	__rcu_advance_callbacks(rdp);
+	if (this_cpu == printk_cpu) {
+		spin_lock(&rdp->lock);
+		__rcu_advance_callbacks(rdp);
+	}
 	*rdp->nexttail = head;
 	rdp->nexttail = &head->next;
 	RCU_TRACE_RDP(rcupreempt_trace_next_add, rdp);
-	spin_unlock_irqrestore(&rdp->lock, flags);
+	if (this_cpu == printk_cpu) {
+		spin_unlock_irqrestore(&rdp->lock, flags);
+	}
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
--

From: Paul E. McKenney
Date: Monday, August 11, 2008 - 6:22 am

I am missing something here.  Are you suggesting that RCU call out
when a given CPU has nothing to do, rather than the current behavior
where rcu_needs_cpu() is invoked when a CPU is being considered for
dynticks idle mode?  My concern with this approach would be races that
are currently avoided by the fact that calls to rcu_needs_cpu() are
performed with hardirqs disabled.

--

From: Paul E. McKenney
Date: Friday, August 8, 2008 - 1:30 pm

The RCU callbacks are (and must be) invoked without holding any RCU
locks, so printk()s in RCU callbacks are perfectly permissible.  It would
not be -that- hard to eliminate the lock in call_rcu(), but yeccch...
It would be far easier to debug RCU without the benefit printk()s than
to deal with a preemptable RCU implementation that was based solely on

The only printk() is in preemptable RCU's rcu_init() function, whose
only purpose is to mark the dmesg appropriately, and which should not
be a problem.

And that would be because I do a fair amount of prototyping and debugging
in user space, where one instead uses printf().  ;-)

							Thanx, Paul
--

From: Paul E. McKenney
Date: Friday, August 8, 2008 - 1:20 pm

Longer term, I will have one of the people here (who just happens to have
made the mistake of expressing an interest in hacking on RCU) look into
permitting invocation of RCU earlier during boot, as in before rcu_init()
has been called.  Should be possible, in fact should be possible without
common-case performance penalty.  ;-)

In the meantime, looks good, so:

--

From: Andi Kleen
Date: Friday, August 8, 2008 - 2:35 pm

Peter Zijlstra <a.p.zijlstra@chello.nl> writes:

Hopefully nobody puts a printk into call_rcu now...
Also I'm not sure if there wouldn't be livelocks if someone puts
printk into some other parts of RCU (e.g. each RCU adding more
RCU events -- haven't checked that in detail so I might be wrong)



How about a big comment describing why you do this?
And you should add comments to all the new hands-off parts
where printk is not allowed now.

-Andi
--

From: David Miller
Date: Friday, August 8, 2008 - 4:02 pm

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Since there is not RCU_HEAD_INIT() made on these per-cpu objects, this code
is depending upon the fact that currently the RCU head is just initialized
to all zeros.
--

From: Paul E. McKenney
Date: Friday, August 8, 2008 - 5:18 pm

RCU_HEAD_INIT() is only needed for debugging purposes, when you want
to be sure that any garbage in the rcu_head is due to corruption or
some such.  The call_rcu() function always completely initializes the
rcu_head structure.

							Thanx, Paul
--

From: Steven Rostedt
Date: Friday, August 8, 2008 - 10:52 am

If this makes any difference:

Acked-by: Steven Rostedt <srostedt@redhat.com>

-- Steve
--

From: Linus Torvalds
Date: Monday, March 24, 2008 - 11:16 am

So I just committed that patch, but I was also looking at just making 
'printk()' itself more robust against this kind of thing.

Quite frankly, the printk() console semaphore etc handling is very 
confusing, and it has obviously evolved over time. Here's a suggested 
cleanup patch that doesn't actually *fix* anything, but it makes the 
resulting code flow a heck of a lot more readable.

In particular, it would now easily and readably allow us to add other 
rules inside the new "can_use_console()" that disable the console flushing 
if (for example) the request queue lock is held, or xlock is held for 
writing.

[ The patch adds more lines than it removes, but that is largely due to 
  the comments and the fact that it moves some of the code into new helper 
  functions, which invariably adds a few lines for the function definition 
  etc.

  There are actually fewer lines of actual code, because it ends up using
  the already-existing "try_acquire_console_sem()", and it avoids one ugly 
  special case, and removes some trivial code duplication about releasing 
  the logbuf_lock by reorganizing things a bit ]

Comments? 

			Linus
---
 kernel/printk.c |   83 +++++++++++++++++++++++++++++++-----------------------
 1 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 9adc2a4..c46a20a 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -616,6 +616,40 @@ asmlinkage int printk(const char *fmt, ...)
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int printk_cpu = UINT_MAX;
 
+/*
+ * Can we actually use the console at this time on this cpu?
+ *
+ * Console drivers may assume that per-cpu resources have
+ * been allocated. So unless they're explicitly marked as
+ * being able to cope (CON_ANYTIME) don't call them until
+ * this CPU is officially up.
+ */
+static inline int can_use_console(unsigned int cpu)
+{
+	return cpu_online(cpu) || have_callable_console();
+}
+
+/*
+ * Try to get ...
Previous thread: [RFC PATCH] freeze feature ver 1.0 by Takashi Sato on Monday, March 24, 2008 - 4:11 am. (2 messages)

Next thread: [PATCH 2/2] time: xtime lock vs printk by Peter Zijlstra on Monday, March 24, 2008 - 5:24 am. (2 messages)