[PATCH 1/3 v4] Add tracing_off_event() to stop tracing when a bug or warning occur

Previous thread: [PATCH 2/3] Add tracing_off_event() calls to BUG() and WARN() paths by Chase Douglas on Wednesday, April 14, 2010 - 9:20 am. (5 messages)

Next thread: [PATCH v2] x86, UV: BAU performance and error recovery by Cliff Wickman on Wednesday, April 14, 2010 - 9:35 am. (3 messages)
From: Chase Douglas
Date: Wednesday, April 14, 2010 - 9:20 am

The tracing_off_event() function calls tracing_off() to stop tracing
when an event occurs. By default, only BUG-type events stop tracing,
while WARNING type events do not. This is controlled through the
tracing_off={none,warn,bug} commandline parameter.

Call this function from bug and warning event handlers to enable a user
to debug their kernel by starting a trace, hitting an event, and then
retrieving trace info knowing that the trace was stopped right after the
event was hit.

Signed-off-by: Chase Douglas <chase.douglas@canonical.com>
---
 include/linux/kernel.h     |    7 +++++++
 kernel/trace/ring_buffer.c |   34 ++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 9365227..80c67ad 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -478,16 +478,23 @@ static inline char *pack_hex_byte(char *buf, u8 byte)
  *
  * Most likely, you want to use tracing_on/tracing_off.
  */
+enum trace_off_event {
+	TRACE_EVENT_BUG = 0,
+	TRACE_EVENT_WARN,
+};
+
 #ifdef CONFIG_RING_BUFFER
 void tracing_on(void);
 void tracing_off(void);
 /* trace_off_permanent stops recording with no way to bring it back */
 void tracing_off_permanent(void);
+void tracing_off_event(enum trace_off_event event);
 int tracing_is_on(void);
 #else
 static inline void tracing_on(void) { }
 static inline void tracing_off(void) { }
 static inline void tracing_off_permanent(void) { }
+static inline void tracing_off_event(enum trace_off_event event) { }
 static inline int tracing_is_on(void) { return 0; }
 #endif
 #ifdef CONFIG_TRACING
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 41ca394..f6be195 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -194,6 +194,40 @@ void tracing_off_permanent(void)
 	set_bit(RB_BUFFERS_DISABLED_BIT, &ring_buffer_flags);
 }
 
+static enum trace_off_event tracing_off_event_ctrl = ...
From: Chase Douglas
Date: Wednesday, April 14, 2010 - 9:20 am

This change adds a tracing_off_event() call to stop tracing on schedule
bugs unless tracing_off=none was specified on the commandline.

Signed-off-by: Chase Douglas <chase.douglas@canonical.com>
---
 kernel/sched.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 6af210a..439f036 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3590,6 +3590,8 @@ static noinline void __schedule_bug(struct task_struct *prev)
 {
 	struct pt_regs *regs = get_irq_regs();
 
+	tracing_off_event(TRACE_EVENT_BUG);
+
 	printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
 		prev->comm, prev->pid, preempt_count());
 
-- 
1.7.0

--

From: Frederic Weisbecker
Date: Thursday, April 15, 2010 - 2:03 pm

I would rather call that a TRACE_EVENT_WARN as this is what happens: we
warn but we continue.

--

From: Chase Douglas
Date: Thursday, April 15, 2010 - 2:45 pm

I tend to think of the TRACE_EVENT_* as an indication of severity and
whether we want to stop the trace by default. From a distro
standpoint, the likelihood that we want to continue tracing after a
__schedule_bug is pretty low. It's easiest if we don't have to tell
our users to add a kernel command line, especially since grub in
Ubuntu 10.04 LTS is difficult to interact with for end users.

Now I'm not much of an upstream developer (at least not yet), so if it
makes much more sense to continue by default from a development
standpoint, then perhaps it's better to make it continue by default. I
just think that the frequency of hitting this should be low enough
that it won't trip people up. And when it does, maybe that gives more
incentive to fix the schedule bug :).

-- Chase
--

From: Thomas Gleixner
Date: Thursday, April 15, 2010 - 4:01 pm

Well, scheduling while atomic is a BUG, but one of the category which
allows the kernel to continue. So in fact it's treated like a WARN_ON.
So the tracing_off_event() qualifier should be *_WARN.

That's independent of the question whether you want to stop tracing in

That's a serious PITA caused by the "let's mimic the other OS" crowd
and no excuse for creating a mess in the kernel.
 
Thanks,

	tglx
From: Frederic Weisbecker
Date: Thursday, April 15, 2010 - 4:10 pm

Exactly.

--

From: Chase Douglas
Date: Thursday, April 15, 2010 - 4:27 pm

We seem to be agreeing on the functionality. The disagreement seems to
be in the macro name/functionality mapping. However, the name of the
function itself is *_bug. I don't see how things are clearer or more
useful by inserting a *_WARN level macro in a *_bug named function.

Essentially, it makes more sense to me for the macro to represent the
severity of the case, and not be coupled somehow to what the kernel
decides to do outside of the tracing.

Is it the case that you really feel it should be *_WARN, or that the
macro name/functionality should be different?

-- Chase
--

From: Thomas Gleixner
Date: Thursday, April 15, 2010 - 4:50 pm

It does not matter at all whether the function name has "bug" in it or
not. What matters is the semantics of the function. It does _NOT_
raise a BUG. It merily warns and tries to continue. So it follows the
WARN() semantics.

If you feel strong about that send a patch to

Essentially you are wrong. The semantic of schedule_bug() is clearly
WARN() and not BUG(). So the tracing off qualifier needs to be
WARN. And it does not matter what you consider as the severity. The
severity is given by the semantics of schedule_bug().

If your extra stupid grub hiding logic prevents an user to change the
trace off level then you need to fix that instead of anything else.

BTW, if interacting with grub is that hard: how does an user start the
tracer at all ?

Thanks,

	tglx
From: Thomas Gleixner
Date: Thursday, April 15, 2010 - 5:21 pm

Just looked through the other patches and noticed that the patch which
provides the tracing_off(level) stuff is incomplete as it provides
only a command line option to change that tracing off level.

The command line option is merily for tracing which happens to be
started on the command line i.e. _BEFORE_ we have usable user space.

So your grub argument is just crap. If the user cannot change this
setting w/o fiddling with the obscured grub then he can not start the
tracer on the command line either.

But somehow he can start the tracer later when user space is up and
running, but there is no way to change that setting anymore. Therefor
you go through the kernel and impose settings at will.

1) Your patch simply lacks an interface to change that setting via
   debugfs/tracing/wtf

   WTF should I reboot my machine to change that setting from the
   default BUG to WARN or NONE ? There is no reason at all.

2) tracing off can be done via filters on functions and/or events
   already - so I doubt that the tracing_off_event(level) is necessary
   at all.

   schedule_bug() definitely deserves a separate trace_schedule_bug()
   event which can be used to stop the tracer by already existing
   functionality.

Thanks,

	tglx
--

From: Frederic Weisbecker
Date: Thursday, April 15, 2010 - 6:49 pm

Yeah it works for the function tracer, but not for events (or other tracers),
or I missed this feature somehow.

--

From: Steven Rostedt
Date: Friday, April 16, 2010 - 9:41 am

It is currently only for the function tracer. It's been on my todo list
for events for quite a while. Perhaps I'll push that up to the top of
the list next week.

-- Steve



--

From: Chase Douglas
Date: Thursday, April 15, 2010 - 9:01 pm

You're right, this should be something that may be changed at run
time. I spoke with Steven at LCS and he agrees that there should be an
option in (debugfs)/tracing/options/ for this. I will fix up the first

Steven said he would be fine with a separate TRACE_EVENT_<blah> macro
for the schedule bug if needed, but I'm not sure we need to go that
far. If it's configurable through debugfs at run time then it serves
my purpose. Unless you feel we should have finer grained control
specifically for scheduling while atomic bugs, I'll just leave it as
TRACE_EVENT_WARN.

-- Chase
--

From: Steven Rostedt
Date: Friday, April 16, 2010 - 9:46 am

I actually like Thomas's idea better. I need to add the "stop trace on
event" functionality, and we can insert trace events for bugs, and not
have this whole "stop tracing here" functions. Instead we could just add
tracepoints and have a way to pick and choose where to stop tracing.

add a:

include/trace/events/errors.h

#define TRACE_SYSTEM errors

TRACE_EVENT(sched_bug, ....)

etc,


When I get back home, I'll add this functionality to stop tracing on
events. Perhaps I'll even add "TRACE_SUB_SYSTEM" so in the events
directory, we can have sub layers:

events/errors/BUG/...
events/errors/WARNING/...

etc

-- Steve


--

From: Chase Douglas
Date: Friday, April 16, 2010 - 10:14 am

Ok, I see where this is going now. I agree this sounds like a much
better approach. I'll wait for the implementation of the stop tracing
on event functionality so I know how it will work, and then I can
submit patches for the BUG, WARN, panic, and schedule_bug paths.

-- Chase
--

From: Frederic Weisbecker
Date: Friday, April 16, 2010 - 11:14 am

I agree this could be nice. I'm thinking about the printk all around
the kernel that can be represented as trace events with sub-sub-systems
that could be file paths of the kernel.

I just hope we can find a way to do this that won't break tools.

--

From: Thomas Gleixner
Date: Friday, April 16, 2010 - 12:58 pm

Steven,


I just assumed that it would work for events already. The stop trace on

I like that idea. That's solving the problem in a very elegant way.

Thanks,

	tglx
--

From: Chase Douglas
Date: Monday, April 19, 2010 - 3:30 pm

I just found what you were meaning here, since there's no
documentation yet about the traceon/traceoff filter command. This
would do exactly what we want for helping us fix bugs.

I'll write some info about the command in ftrace.txt and send a patch
for review so no one else is left out of the party :).

-- Chase
--

From: Chase Douglas
Date: Thursday, April 15, 2010 - 8:52 pm

I misunderstood what you wrote. I thought you were agreeing that the
tracing should stop by default in __schedule_bug. I now realize you
meant that you agree it should be possible to stop it, but it should
not be the default behavior. Sorry for the confusion.

I will change the event to be a *_WARN type.

-- Chase
--

From: Frederic Weisbecker
Date: Thursday, April 15, 2010 - 1:13 pm

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

--

Previous thread: [PATCH 2/3] Add tracing_off_event() calls to BUG() and WARN() paths by Chase Douglas on Wednesday, April 14, 2010 - 9:20 am. (5 messages)

Next thread: [PATCH v2] x86, UV: BAU performance and error recovery by Cliff Wickman on Wednesday, April 14, 2010 - 9:35 am. (3 messages)