Re: [patch] add kdump_after_notifier

To: <vgoyal@...>
Cc: Eric W. Biederman <ebiederm@...>, Takenori Nagano <t-nagano@...>, <k-miyoshi@...>, Bernhard Walle <bwalle@...>, <kexec@...>, <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Friday, August 3, 2007 - 12:05 am

Vivek Goyal (on Thu, 2 Aug 2007 16:58:52 +0530) wrote:

Do not concentrate on kdb alone.  The problem above applies to all the
RAS tools, not just kdb.

My stance is that _all_ the RAS tools (kdb, kgdb, nlkd, netdump, lkcd,
crash, kdump etc.) should be using a common interface that safely puts
the entire system in a stopped state and saves the state of each cpu.
Then each tool can do what it likes, instead of every RAS tool doing
its own thing and conflicting with the others, which is why this
thread started.

It is not the kernel's job to decide which RAS tool runs first, second
etc.; setting that policy is the user's decision.  Different sites
will want different orders: some will say "go straight to kdump",
others will want to invoke a debugger first.  Sites must be able to
define that policy, yet today we hard code it into the kernel.

I proposed and wrote most of this common interface against 2.6.19-rc5.
See http://marc.info/?l=linux-arch&w=2&r=1&s=crash_stop&q=b, look for
crash_stop.  The crash_stop interface stops all the cpus, saves the
system state in a common format then runs an ordered list of RAS tools.
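
To make the three phases concrete, here is a rough userspace sketch.  It
is not the crash_stop code itself; struct cpu_state, stop_and_save(),
run_tools() and example_tool() are names invented for illustration, and
the "saved state" is a fake value standing in for real registers.  The
point is only that the state is captured once and every tool reads the
same snapshot.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4

/* Hypothetical common per-cpu record; a real one would hold registers. */
struct cpu_state {
    int cpu;
    int stopped;        /* 1 once this cpu has been halted */
    unsigned long pc;   /* stand-in for the saved register state */
};

/* Every RAS tool in this sketch receives the same read-only snapshot. */
typedef void (*ras_tool_fn)(const struct cpu_state *cpus, size_t ncpus);

/* Phases 1 and 2: halt every cpu and record its state exactly once. */
static void stop_and_save(struct cpu_state *cpus, size_t ncpus)
{
    assert(cpus != NULL);
    for (size_t i = 0; i < ncpus; i++) {
        cpus[i].cpu = (int)i;
        cpus[i].stopped = 1;
        cpus[i].pc = 0x1000UL + i;  /* fake value, demo only */
    }
}

/* Phase 3: run the tools in list order over the shared snapshot. */
static void run_tools(const ras_tool_fn *tools, size_t ntools,
                      const struct cpu_state *cpus, size_t ncpus)
{
    for (size_t i = 0; i < ntools; i++)
        tools[i](cpus, ncpus);
}

/* Example tool: counts how many stopped cpus it was shown. */
static size_t stopped_seen;

static void example_tool(const struct cpu_state *cpus, size_t ncpus)
{
    for (size_t i = 0; i < ncpus; i++)
        if (cpus[i].stopped)
            stopped_seen++;
}
```

A caller would fill an array of struct cpu_state with stop_and_save()
and then pass it, with its list of tools, to run_tools().  No tool has
to stop cpus or save state on its own.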

The order that the RAS tools are run depends on the priority value that
each tool passes to register_die_notifier.  Currently each RAS tool
hard codes its priority but it is trivial to change the tools to make
that priority a parameter, passing the policy decision back to the
user, not the kernel.
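
As a minimal illustration of priority-ordered registration, the
userspace sketch below keeps a list sorted so that the highest priority
runs first, which is how kernel notifier chains order their callbacks.
struct ras_tool, ras_register() and the chain array are hypothetical
names, not the real register_die_notifier interface, and the priority
is a plain function argument here; in a real tool it would presumably
come from something like a module parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOOLS 8

/* Hypothetical registration record for one RAS tool. */
struct ras_tool {
    const char *name;
    int priority;       /* higher value runs earlier */
};

static struct ras_tool chain[MAX_TOOLS];
static size_t chain_len;

/*
 * Insert a tool so the chain stays in descending priority order.
 * Tools with equal priority keep their registration order.
 * Returns 0 on success, -1 if the chain is full.
 */
static int ras_register(const char *name, int priority)
{
    size_t pos = 0;

    assert(name != NULL);
    if (chain_len == MAX_TOOLS)
        return -1;

    /* Skip every entry that must run before the new one. */
    while (pos < chain_len && chain[pos].priority >= priority)
        pos++;

    /* Shift the tail down one slot and drop the new entry in. */
    memmove(&chain[pos + 1], &chain[pos],
            (chain_len - pos) * sizeof(chain[0]));
    chain[pos].name = name;
    chain[pos].priority = priority;
    chain_len++;
    return 0;
}
```

With this shape, a site that wants the debugger first registers kdb
with a larger number than kdump, and a site that wants to go straight
to kdump does the opposite.  Nothing in the chain code changes.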

Despite having written the code and put it up for comments, the only
feedback I got was from Vivek saying "So I think crash dump will be a
little special case".  kdump is a special case whose priority is hard
wired into the kernel, so of course people are going to argue about the
coexistence of kdump with the other RAS tools.  Unless the kdump
developers agree to some flexibility, this thread will not be resolved
to anybody's satisfaction.  Use a common interface with no special
cases and let the user decide which tools to run and in which order.

The main objection raised against crash_stop is that it will not work
if the kernel stack has overflowed.  That problem is also solvable: I
raised an RFC inside SGI that would detect stack overflow and still let
the cpu continue.  Again, no interest.  I will copy that proposal to
the list as a separate thread.

I have pretty well given up on RAS code in the Linux kernel.  Everybody
has different ideas, there is no overall plan and little interest from
Linus in getting RAS tools into the kernel.  We are just thrashing.


Messages in current thread:
[patch] add kdump_after_notifier, Takenori Nagano, (Thu Jul 19, 8:15 am)
Re: [patch] add kdump_after_notifier, Bernhard Walle, (Thu Jul 26, 10:07 am)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Thu Jul 26, 11:32 am)
Re: [patch] add kdump_after_notifier, Bernhard Walle, (Thu Jul 26, 11:34 am)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Thu Jul 26, 11:44 am)
Re: [patch] add kdump_after_notifier, Bernhard Walle, (Thu Jul 26, 11:47 am)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Thu Jul 26, 11:54 am)
Re: [patch] add kdump_after_notifier, Takenori Nagano, (Thu Jul 26, 7:28 pm)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Mon Jul 30, 5:16 am)
Re: [patch] add kdump_after_notifier, Eric W. Biederman, (Mon Jul 30, 9:42 am)
Re: [patch] add kdump_after_notifier, Takenori Nagano, (Tue Jul 31, 1:55 am)
Re: [patch] add kdump_after_notifier, Eric W. Biederman, (Tue Jul 31, 2:53 am)
Re: [patch] add kdump_after_notifier, Takenori Nagano, (Wed Aug 1, 5:26 am)
Re: [patch] add kdump_after_notifier, Eric W. Biederman, (Wed Aug 1, 6:00 am)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Thu Aug 2, 7:28 am)
Re: [patch] add kdump_after_notifier, Keith Owens, (Fri Aug 3, 12:05 am)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Sun Aug 5, 7:07 am)
Re: [patch] add kdump_after_notifier, Takenori Nagano, (Tue Aug 14, 4:34 am)
Re: [patch] add kdump_after_notifier, Bernhard Walle, (Tue Aug 14, 4:37 am)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Tue Aug 14, 9:24 am)
Re: [patch] add kdump_after_notifier, Takenori Nagano, (Thu Aug 16, 5:26 am)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Fri Aug 17, 6:56 am)
Re: [patch] add kdump_after_notifier, Jay Lan, (Tue Aug 21, 9:18 am)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Wed Aug 22, 11:56 pm)
Re: [patch] add kdump_after_notifier, Jay Lan, (Thu Aug 23, 1:34 pm)
Re: [patch] add kdump_after_notifier, Bernhard Walle, (Tue Aug 21, 9:21 am)
Re: [patch] add kdump_after_notifier, Takenori Nagano, (Tue Aug 21, 3:45 am)
Re: [patch] add kdump_after_notifier, Vivek Goyal, (Wed Aug 22, 11:52 pm)
Re: [patch] add kdump_after_notifier, Bernhard Walle, (Thu Aug 16, 5:45 am)
Re: [patch] add kdump_after_notifier, Takenori Nagano, (Tue Aug 14, 4:48 am)
Re: [patch] add kdump_after_notifier, Bernhard Walle, (Tue Aug 14, 4:53 am)
Re: [patch] add kdump_after_notifier, Andrew Morton, (Fri Aug 3, 2:25 am)
Re: [patch] add kdump_after_notifier, Eric W. Biederman, (Fri Aug 3, 3:10 am)
Re: [patch] add kdump_after_notifier, Keith Owens, (Fri Aug 3, 2:34 am)
Re: [patch] add kdump_after_notifier, Andrew Morton, (Fri Aug 3, 3:37 am)
Re: [patch] add kdump_after_notifier, Takenori Nagano, (Thu Aug 2, 4:11 am)
Re: [patch] add kdump_after_notifier, Bernhard Walle, (Thu Jul 26, 12:14 pm)
Re: [patch] add kdump_after_notifier, Bernhard Walle, (Thu Jul 26, 12:21 pm)