Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support


* Andi Kleen <andi@firstfloor.org> wrote:


That's my whole point, _why_ do they have different interfaces?

EDAC is the upstream mechanism to organize hardware error reporting and to get 
hardware errors to user-space. It is already successful in handling a wide range of 
hardware in a similar fashion.

Furthermore, there is work ongoing to do the reporting via perf event channels, some 
of that work is upstream already. Boris is working on persistent events, on RAS 
tooling (tools/ras/) and on event injection. Here's a past submission of his work:

  http://lwn.net/Articles/394522/

You are now doing a completely separate thing here, detaching a big CPU vendor from 
the main body of Linux code that deals with this stuff.

IMHO that's not helpful _at all_.


It's never a good thing to have separate, vendor-dependent interfaces for what to 
the user is basically the same conceptual thing!


And that kind of variance is in your opinion a good reason to introduce separate 
user ABIs for it?

( And I don't care that there might be no 'end user' for hardware error injection per 
  se right now. There is certainly an 'end user' for hardware error events and even 
  _there_ you are introducing and pushing for separate, incompatible interfaces. )

We have really good historic data here: we got the _biggest_ practical advantage 
from event enumeration (/debug/tracing/events/) when we extended it in a generic, 
unified way to the rich topology that the hardware and the kernel gives us.

That way we got new, useful tools like powertop, timechart or pytimechart or the 
edac tool, which can concentrate on a single, well-defined event topology and event 
ABI.

Why do these tools like the kind of unified event enumeration and reporting 
facilities that you are fighting against so hard? Because of the big technological 
advantage of having to deal with one enumeration and reporting facility alone. They 
can get power events, scheduling events, timer events, kmalloc events all from the 
same source - even though these subsystems have barely anything in common! Tools can 
then combine these seemingly unrelated events into something new and useful.
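
As a rough illustration of what "one enumeration facility" means for a tool, here is a minimal Python sketch that lists every event under a tracing events directory. It assumes the usual layout of one directory per subsystem, one directory per event, and a 'format' file inside each event directory. The function name enumerate_events is hypothetical, and the mount point of the events directory varies between systems, so it is passed in as a parameter.

```python
import os


def enumerate_events(events_root):
    """Return 'subsystem:event' names found under events_root.

    Assumed layout: <events_root>/<subsystem>/<event>/format
    """
    names = []
    for subsystem in sorted(os.listdir(events_root)):
        sub_path = os.path.join(events_root, subsystem)
        if not os.path.isdir(sub_path):
            continue  # skip top-level control files such as 'enable'
        for event in sorted(os.listdir(sub_path)):
            # Only directories that carry a 'format' description count as events.
            if os.path.isfile(os.path.join(sub_path, event, "format")):
                names.append(f"{subsystem}:{event}")
    return names
```

A tool would call this once, for example enumerate_events("/sys/kernel/debug/tracing/events") on a system where debugfs is mounted there, and get power, scheduler, timer and kmalloc events from the same walk, without any per-subsystem code.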

It's a very extensible model, and with every new event type added, the tool space 
gets richer _together_.

Error event injection to simulate/trigger various error conditions in those events 
is a natural extension to the whole events framework - not something that should be 
done in a randomly different way.

What you are doing here is to fragment the whole landscape into small, incompatible, 
vendor-specific bits. Some of it is in /dev, some of it is in debugfs, some things 
report via signals, etc. etc.

It's inconsistent, messy and doesn't integrate well with the events framework we are 
building.

That was the main basis of my prior NAK, and you have said _nothing_ in the past 
that invalidates the fundamental points of that NAK.

Instead you started, by stealth and by duplicity, looking for ways to get around 
that conceptual NAK.


That's insane!

	Ingo
--

Messages in current thread:
Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware ..., Ingo Molnar, (Mon Oct 25, 4:15 am)
Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware ..., Mauro Carvalho Chehab, (Mon Oct 25, 5:04 am)
Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware ..., Mauro Carvalho Chehab, (Mon Oct 25, 10:19 am)