Re: Kernel panic on shutdown -p -- ACPI problem?

Previous thread: Dear friend, by Hou Ruoyu on Monday, May 3, 2010 - 5:42 am. (1 message)

Next thread: OpenBSD release party Amsterdam by Floor Terra on Monday, May 3, 2010 - 5:12 pm. (1 message)
From: Stefan Unterweger
Date: Monday, May 3, 2010 - 3:43 pm

Hello!

I've recently "rediscovered" a computer that I'd been using as a
Linux fileserver a few years ago. Since its hardware is
considerably better than the even older machine I'm using now as
an OpenBSD fileserver, I tried to see whether I could make it run.

In principle, everything works fine, to some extent much smoother
than on Linux (especially getting the sensors to work back then
was a true nightmare, and I eventually gave up in defeat -- on
OpenBSD, they just work).

However, if I do `shutdown -h -p` (thus power off), I get a
kernel panic; specifically, "AML PARSE ERROR" (see below). This
only happens when '-p' is involved; rebooting works, and just
'-h' without '-p' does, too.

I've done some research, and it turns out that the motherboard
seems to have particularly buggy ACPI tables. And sure enough, if
I disable ACPI, the kernel panic vanishes. However, the machine
doesn't get turned off either, so it's not really a victory.
All this was done using 4.6 release, as this was a few months
ago.

Before I do any further research or experiments with that
machine, I just wanted to ask whether I have any chance of
working around this problem. As far as I understood from some
ancient NetBSD mailing-list threads, it should in theory be
possible to make the kernel load patched ACPI tables in which
those particular bugs are corrected. So, if this is possible on
OpenBSD, I would know that spending more time on this would not
be wasted.

The motherboard in question is a Tyan Tiger S2466 dual-Athlon
multiprocessor board, with both processor sockets filled. As
already said, not the most recent mainboard imaginable, so I
don't think that trying 4.7 would make much difference, especially
as it seems that the bug is in the BIOS, not in OpenBSD.

If anyone has a pointer---a "no, it won't work" would be more
than helpful, too---, I'd be grateful. If I could get that thing
to work again, my poor student's budget would be saved yet
another ...
From: Aaron Mason
Date: Tuesday, May 4, 2010 - 3:48 am

On Tue, May 4, 2010 at 8:43 AM, Stefan Unterweger

Hi,

When you get it out again, we'll also need to see an acpidump output.

Thanks

-- 
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse

From: Stefan Unterweger
Date: Tuesday, May 4, 2010 - 3:07 pm

Here is the output of both acpidump(8) and dmesg(8).


s//un

-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----

/*
RSD PTR: Checksum=20, OEMID=PTLTD, RsdtAddress=0x3fefcf28
 */
/*
RSDT: Length=44, Revision=1, Checksum=13,
	OEMID=PTLTD, OEM Table ID=  RSDT, OEM Revision=0x6040000,
	Creator ID= LTP, Creator Revision=0x0
 */
/*
	Entries={ 0x3fefef2e, 0x3fefefa2 }
 */
/*
	DSDT=0x3fefcf54
	INT_MODEL=PIC
	SCI_INT=9
	SMI_CMD=0x802f, ACPI_ENABLE=0xf0, ACPI_DISABLE=0xf1, S4BIOS_REQ=0x0
	PM1a_EVT_BLK=0x8000-0x8003
	PM1a_CNT_BLK=0x8004-0x8005
	PM2_TMR_BLK=0x8008-0x800b
	PM2_GPE0_BLK=0x8020-0x8023
	P_LVL2_LAT=101ms, P_LVL3_LAT=1001ms
	FLUSH_SIZE=0, FLUSH_STRIDE=0
	DUTY_OFFSET=1, DUTY_WIDTH=0
	DAY_ALRM=13, MON_ALRM=0, CENTURY=50
	Flags={WBINVD,PROC_C1}
 */
/*
DSDT: Length=8154, Revision=1, Checksum=247,
	OEMID=AMD, OEM Table ID=AMDACPI, OEM Revision=0x6040000,
	Creator ID=MSFT, Creator Revision=0x100000d
 */
DefinitionBlock (
"acpi_dsdt.aml",	//Output filename
"DSDT",			//Signature
0x1,			//DSDT Revision
"AMD",			//OEMID
"AMDACPI",		//TABLE ID
0x6040000			//OEM Revision
)

{
Scope(\_PR_) {
    Processor(CPU0, 0, 0x8010, 0x6) {
    }
    Processor(CPU1, 1, 0x0, 0x0) {
    }
}
Name(\_S0_, Package(0x4) {
    0x0,
    0x0,
    0x0,
    0x0,
})
Name(\_S1_, Package(0x4) {
    0x1,
    0x1,
    0x1,
    0x1,
})
Name(\_S4_, Package(0x4) {
    0x6,
    0x6,
    0x6,
    0x6,
})
Name(\_S5_, Package(0x4) {
    0x7,
    0x7,
    0x7,
    0x7,
})
Name(OSFL, 0x0)
Method(STRC, 2) {
    If(LNot(LEqual(SizeOf(Arg0), SizeOf(Arg1)))) {
        Return(0x0)
    }
    Add(SizeOf(Arg0), 0x1, Local0)
    Name(BUF0, Buffer(Local0) { })
    Name(BUF1, Buffer(Local0) { })
    Store(Arg0, BUF0)
    Store(Arg1, BUF1)
    While(Local0) {
        Decrement(Local0)
        If(LNot(LEqual(DerefOf(Index(BUF0, Local0)), DerefOf(Index(BUF1, Local0))))) {
            Return(Zero)
        }
    }
    Return(One)
}
OperationRegion(\DEBG, SystemIO, ...
From: Stefan Unterweger
Date: Tuesday, May 4, 2010 - 3:20 pm

The download went surprisingly fast (and the install even more so;
big thanks at this point to the folks who did the installer, it
seems rare that one can install an operating system in five
minutes).

Running the April 28 snapshot which I just grabbed from FTP this
instant doesn't change a thing---as expected, I get the same
"AML PARSE ERROR" kernel panic.

Here's a pseudo-diff from the previous dmesg to the 4.7-current one;
other than the vscsi stuff, nothing changes:

-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----
-OpenBSD 4.6 (GENERIC.MP) #89: Thu Jul  9 21:32:39 MDT 2009
+OpenBSD 4.7-current (GENERIC.MP) #560: Wed Apr 28 11:55:01 MDT 2010

-avail mem = 1027940352 (980MB)
+avail mem = 1028833280 (981MB)

+vscsi0 at root
+scsibus1 at vscsi0: 256 targets
----->8----->8----->8----->8----->8----->8----->8----->8-----

acpidump(8) gives exactly the same result
(well, as expected; the ACPI tables didn't change, after all...).


  s//un

From: Stefan Unterweger
Date: Thursday, May 6, 2010 - 6:06 am

/*
RSD PTR: Checksum=20, OEMID=PTLTD, RsdtAddress=0x3fefcf28
 */
/*
RSDT: Length=44, Revision=1, Checksum=13,
	OEMID=PTLTD, OEM Table ID=  RSDT, OEM Revision=0x6040000,
	Creator ID= LTP, Creator Revision=0x0
 */
/*
	Entries={ 0x3fefef2e, 0x3fefefa2 }
 */
/*
	DSDT=0x3fefcf54
	INT_MODEL=PIC
	SCI_INT=9
	SMI_CMD=0x802f, ACPI_ENABLE=0xf0, ACPI_DISABLE=0xf1, S4BIOS_REQ=0x0
	PM1a_EVT_BLK=0x8000-0x8003
	PM1a_CNT_BLK=0x8004-0x8005
	PM2_TMR_BLK=0x8008-0x800b
	PM2_GPE0_BLK=0x8020-0x8023
	P_LVL2_LAT=101ms, P_LVL3_LAT=1001ms
	FLUSH_SIZE=0, FLUSH_STRIDE=0
	DUTY_OFFSET=1, DUTY_WIDTH=0
	DAY_ALRM=13, MON_ALRM=0, CENTURY=50
	Flags={WBINVD,PROC_C1}
 */
/*
DSDT: Length=8154, Revision=1, Checksum=247,
	OEMID=AMD, OEM Table ID=AMDACPI, OEM Revision=0x6040000,
	Creator ID=MSFT, Creator Revision=0x100000d
 */
DefinitionBlock (
"acpi_dsdt.aml",	//Output filename
"DSDT",			//Signature
0x1,			//DSDT Revision
"AMD",			//OEMID
"AMDACPI",		//TABLE ID
0x6040000			//OEM Revision
)

{
Scope(\_PR_) {
    Processor(CPU0, 0, 0x8010, 0x6) {
    }
    Processor(CPU1, 1, 0x0, 0x0) {
    }
}
Name(\_S0_, Package(0x4) {
    0x0,
    0x0,
    0x0,
    0x0,
})
Name(\_S1_, Package(0x4) {
    0x1,
    0x1,
    0x1,
    0x1,
})
Name(\_S4_, Package(0x4) {
    0x6,
    0x6,
    0x6,
    0x6,
})
Name(\_S5_, Package(0x4) {
    0x7,
    0x7,
    0x7,
    0x7,
})
Name(OSFL, 0x0)
Method(STRC, 2) {
    If(LNot(LEqual(SizeOf(Arg0), SizeOf(Arg1)))) {
        Return(0x0)
    }
    Add(SizeOf(Arg0), 0x1, Local0)
    Name(BUF0, Buffer(Local0) { })
    Name(BUF1, Buffer(Local0) { })
    Store(Arg0, BUF0)
    Store(Arg1, BUF1)
    While(Local0) {
        Decrement(Local0)
        If(LNot(LEqual(DerefOf(Index(BUF0, Local0)), DerefOf(Index(BUF1, Local0))))) {
            Return(Zero)
        }
    }
    Return(One)
}
OperationRegion(\DEBG, SystemIO, 0x80, 0x1)
Field(\DEBG, ByteAcc, NoLock, Preserve) {
    DBG1,	8
}
OperationRegion(KBC_, SystemIO, 0x64, 0x1)
Field(KBC_, ByteAcc, ...
From: Stefan T. Unterweger
Date: Monday, May 10, 2010 - 5:46 pm

I've done some additional tests (since I remembered that this
particular mainboard _did_ power off correctly a few years ago,
albeit it was running an ancient 2.4.something Linux at that
time, and maybe not even ACPI, so this does not really count).

I just installed NetBSD on the machine---I suppose it is close
enough to OpenBSD to make for a meaningful comparison. Here, the
poweroff works. Well, I still see some ACPI error messages from
the kernel fly by, but they're gone much too fast, and then the
machine powers off.

I'll see whether I can find the responsible piece of NetBSD that
works around this mainboard quirk. Maybe there's hope that I'll
get it to work in OpenBSD at last; I don't really want to run
NetBSD on it. :o)

Since I don't really have any experience at all with kernel
hacking, especially not with such black magic as ACPI, does
anyone have a pointer to where I should start looking?


s//un

From: Stefan Unterweger
Date: Tuesday, May 11, 2010 - 9:27 am

Finally I've found that particular post again, and have been able
to fix the broken DSDT to some extent. With some dirty patchwork
acpi_load_dsdt now loads my custom table, and `shutdown -p -h`
succeeds in turning off the machine, without any more warnings.

A few questions remain, though:

- I don't suppose that there would be some "official" point in
  the ACPI driver where such workarounds would "belong"? The code
  looks clear enough to me, but I "speak" neither enough C nor
  ACPI to be sure...

- The patch seems almost too easy to me, but I haven't yet made
  that much progress in learning C. With all that memcpy going
  on, I have the uneasy feeling that I might be introducing
  some nasty memory holes...

The patch is against 4.6-release, since that's the version I was
planning to put on the machine.


Regards,
  s//un



--- acpi.c.orig	Tue May 11 18:07:10 2010
+++ acpi.c	Tue May 11 17:59:56 2010
@@ -48,6 +48,8 @@
 #define APMDEV_NORMAL	0
 #define APMDEV_CTL	8
 
+#include "custom_dsdt.h"
+
 #ifdef ACPI_DEBUG
 int acpi_debug = 16;
 #endif
@@ -889,6 +891,11 @@
 		}
 		memcpy((*dsdt)->q_data, handle.va, len);
 		(*dsdt)->q_table = (*dsdt)->q_data;
+
+		/* 5AEb+sk: Override the Tyan Tiger S2466's corrupt DSDT */
+		printf("Trying to override broken DSDT table...\n");
+		(*dsdt)->q_table = (struct acpi_table_header *)AmlCode;
+
 		acpi_unmap(&handle);
 	}
 }
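
[Editor's sketch, not part of the posted patch.] The diff above points
q_table at the compiled-in AmlCode array without checking it first. One
way such an override could be sanity-checked beforehand is to verify the
three properties every ACPI table header guarantees: the 4-byte
signature, the little-endian Length field at offset 4, and the rule that
all bytes of the table sum to zero modulo 256 (checksum byte at offset
9). The helper names below (dsdt_override_ok, dsdt_fix_checksum) are
hypothetical and do not exist in OpenBSD's acpi.c; this is a userland
illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; not taken from OpenBSD. */
#define ACPI_HDR_LEN 36	/* size of the standard ACPI table header */

/* Read the little-endian 32-bit Length field at byte offset 4. */
static uint32_t
table_length(const unsigned char *t)
{
	return (uint32_t)t[4] | ((uint32_t)t[5] << 8) |
	    ((uint32_t)t[6] << 16) | ((uint32_t)t[7] << 24);
}

/* Return 1 if buf (bufsize bytes) looks like a well-formed DSDT, else 0. */
static int
dsdt_override_ok(const unsigned char *buf, size_t bufsize)
{
	uint32_t len;
	uint8_t sum = 0;
	size_t i;

	if (bufsize < ACPI_HDR_LEN)
		return 0;		/* too small to hold a header */
	if (memcmp(buf, "DSDT", 4) != 0)
		return 0;		/* wrong table signature */
	len = table_length(buf);
	if (len < ACPI_HDR_LEN || len > bufsize)
		return 0;		/* Length would run past the array */
	for (i = 0; i < len; i++)
		sum += buf[i];
	return sum == 0;		/* all bytes must sum to 0 mod 256 */
}

/* Recompute the checksum byte (offset 9) so the table sums to zero. */
static void
dsdt_fix_checksum(unsigned char *buf, uint32_t len)
{
	uint8_t sum = 0;
	uint32_t i;

	buf[9] = 0;
	for (i = 0; i < len; i++)
		sum += buf[i];
	buf[9] = (uint8_t)(0u - sum);
}
```

Under these assumptions, a call such as
dsdt_override_ok(AmlCode, sizeof(AmlCode)) would be the guard placed in
front of the pointer assignment; whether the real driver wants such a
check in that spot is a question for the ACPI maintainers.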

From: Owain Ainsworth
Date: Tuesday, May 11, 2010 - 10:10 am

I assume you forgot to cvs add the custom_dsdt.h header there.

-0-
-- 
Celebrate Hannibal Day this year.  Take an elephant to lunch.

From: Stefan Unterweger
Date: Tuesday, May 11, 2010 - 11:30 am

The diff was made by hand, that's probably why I forgot to add it
to the post. :o)

custom_dsdt.h basically contains nothing more than the
compiled replacement DSDT as it comes out of iasl:

The modified kernel still reports the same broken DSDT when I do
acpidump, but maybe the "real" one is still in memory, and the
patched one is only in-kernel.


s//un

Here's custom_dsdt.h, and following it the source file from which
it was assembled:
-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----
/*
 * 
 * Intel ACPI Component Architecture
 * ASL Optimizing Compiler version 20080701 [Jul  2 2009]
 * Copyright (C) 2000 - 2008 Intel Corporation
 * Supports ACPI Specification Revision 3.0a
 * 
 * Compilation of "Sorpigal.dsl" - Tue May 11 12:04:09 2010
 * 
 * C source code output
 *
 */
unsigned char AmlCode[] =
{
    0x44,0x53,0x44,0x54,0x41,0x1D,0x00,0x00,  /* 00000000    "DSDTA..." */
    0x01,0x85,0x41,0x4D,0x44,0x00,0x00,0x00,  /* 00000008    "..AMD..." */
    0x41,0x4D,0x44,0x41,0x43,0x50,0x49,0x00,  /* 00000010    "AMDACPI." */
    0x00,0x00,0x04,0x06,0x49,0x4E,0x54,0x4C,  /* 00000018    "....INTL" */
    0x01,0x07,0x08,0x20,0x10,0x1F,0x5F,0x50,  /* 00000020    "... .._P" */
    0x52,0x5F,0x5B,0x83,0x0B,0x43,0x50,0x55,  /* 00000028    "R_[..CPU" */
    0x30,0x00,0x10,0x80,0x00,0x00,0x06,0x5B,  /* 00000030    "0......[" */
    0x83,0x0B,0x43,0x50,0x55,0x31,0x01,0x00,  /* 00000038    "..CPU1.." */
    0x00,0x00,0x00,0x00,0x08,0x5F,0x53,0x30,  /* 00000040    "....._S0" */
    0x5F,0x12,0x06,0x04,0x00,0x00,0x00,0x00,  /* 00000048    "_......." */
    0x08,0x5F,0x53,0x31,0x5F,0x12,0x06,0x04,  /* 00000050    "._S1_..." */
    0x01,0x01,0x01,0x01,0x08,0x5F,0x53,0x34,  /* 00000058    "....._S4" */
    0x5F,0x12,0x0A,0x04,0x0A,0x06,0x0A,0x06,  /* 00000060    "_......." */
    0x0A,0x06,0x0A,0x06,0x08,0x5F,0x53,0x35,  /* 00000068    "....._S5" */
    0x5F,0x12,0x0A,0x04,0x0A,0x07,0x0A,0x07,  /* 00000070    "_......." */
    0x0A,0x07,0x0A,0x07,0x08,0x4F,0x53,0x46,  /* ...
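
[Editor's sketch.] The first 36 bytes of the AmlCode array above are the
standard ACPI table header, so they can be decoded by hand to confirm
what iasl produced. The short program fragment below does that for the
bytes quoted in this message; the struct and function names are made up
for illustration and are not OpenBSD's own definitions.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative decoded view of an ACPI table header (hypothetical type). */
struct dsdt_header_info {
	char	 signature[5];		/* 4 bytes + NUL */
	uint32_t length;		/* total table length in bytes */
	uint8_t	 revision;
	uint8_t	 checksum;
	char	 oem_id[7];		/* 6 bytes + NUL */
	char	 oem_table_id[9];	/* 8 bytes + NUL */
	uint32_t oem_revision;
	char	 creator_id[5];		/* 4 bytes + NUL */
	uint32_t creator_revision;
};

/* Assemble a little-endian 32-bit value from four bytes. */
static uint32_t
le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the fixed 36-byte header at the start of table t. */
static void
decode_header(const unsigned char *t, struct dsdt_header_info *h)
{
	memset(h, 0, sizeof(*h));
	memcpy(h->signature, t, 4);
	h->length = le32(t + 4);
	h->revision = t[8];
	h->checksum = t[9];
	memcpy(h->oem_id, t + 10, 6);
	memcpy(h->oem_table_id, t + 16, 8);
	h->oem_revision = le32(t + 24);
	memcpy(h->creator_id, t + 28, 4);
	h->creator_revision = le32(t + 32);
}

/* Header bytes copied from the first rows of AmlCode in this message. */
static const unsigned char aml_header[36] = {
	0x44,0x53,0x44,0x54,0x41,0x1D,0x00,0x00,
	0x01,0x85,0x41,0x4D,0x44,0x00,0x00,0x00,
	0x41,0x4D,0x44,0x41,0x43,0x50,0x49,0x00,
	0x00,0x00,0x04,0x06,0x49,0x4E,0x54,0x4C,
	0x01,0x07,0x08,0x20
};
```

Decoding aml_header this way gives signature "DSDT", length 0x1D41
(7489 bytes, versus 8154 for the firmware's original DSDT), revision 1,
OEM ID "AMD", table ID "AMDACPI", OEM revision 0x6040000, and creator
"INTL" with creator revision 0x20080701, which matches the iasl version
20080701 named in the file's comment header.
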
From: Marco Peereboom
Date: Saturday, June 19, 2010 - 12:55 pm

Jordan and I are going to take this up with theo during the hackathon.
I am not sure I like having a gaping hole in the kernel but maybe we can
do something before securelevel.  I like it from a hacking perspective
though.

