From: Matthew Garrett <mjg59@srcf.ucam.org> Some HP laptops have a problem with their DSDT reporting as HP/SB400/10000, which includes some code which overrides all temperature trip points to 16C if the INTIN2 input of the I/O APIC is enabled. This input is incorrectly designated the ISA IRQ 0 via an interrupt source override even though it is wired to the output of the master 8259A and INTIN0 is not connected at all. So far two models have been identified, namely nx6125 and nx6325. Use a knob provided by the I/O APIC interrupt registration code to abandon any attempts to route IRQ 0 through the I/O APIC for these systems. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> --- Hello, Matthew, you have not added your sign-off with the original patch, please do so now. Rafael, please try this change together with patch-2.6.26-rc1-20080505-mpparse-acpi-noirq0-2 and see if this fixes your system. I have tried the patches with a wildcard entry added to acpi_dmi_table[] so that it trips for my system. The result is as follows: ENABLING IO-APIC IRQs ..TIMER: vector=0x31 apic1=-1 pin1=-1 apic2=-1 pin2=-1 ...trying to set up timer as Virtual Wire IRQ... works. (note the absence of any timer pin), which means the changes work as expected. Here the timer has been set up as a native local APIC interrupt routed through the 8259A set up as a "virtual wire". Ingo, please apply the two patches where appropriate. Maciej patch-2.6.26-rc1-20080505-mjg59-acpi-noirq0-0 diff -up --recursive --new-file linux-2.6.26-rc1-20080505.macro/arch/x86/kernel/acpi/boot.c linux-2.6.26-rc1-20080505/arch/x86/kernel/acpi/boot.c --- linux-2.6.26-rc1-20080505.macro/arch/x86/kernel/acpi/boot.c 2008-05-05 02:55:24.000000000 +0000 +++ linux-2.6.26-rc1-20080505/arch/x86/kernel/acpi/boot.c 2008-06-30 23:31:48.000000000 +0000 @@ -1049,6 +1049,17 @@ static int __init force_acpi_ht(const st } /* + * Don't register any I/O APIC entries for the 8254 timer IRQ. + */ +static ...
Assuming it works for Rafael (still haven't had time to pull my nx6125 out of storage): Signed-off-by: mjg@redhat.com -- Matthew Garrett | mjg59@srcf.ucam.org --
Rafael, could you please try the latest tip/master that i've just pushed out: http://people.redhat.com/mingo/tip.git/README it has the final form of these changes integrated - it should in theory work out of box on your system, with no boot parameters or other explicit quirks needed anywhere. Ingo --
I tested patches [1/2] (your version) and [2/2] on top of today's linux-next and they work just fine. Thanks, Rafael --
Can you please push them to linux-next, btw? They still apply cleanly to linux-next for me. Thanks, Rafael --
Well, I'm afraid that my information wasn't correct. The patches actually don't work, but I had my own additional patch that fixed the problem applied, which I've only just discovered. Sorry for the confusion and the issue is still unfixed. Thanks, Rafael --
Appended is a patch that actually works for me (it's missing the i386 part,
but I can't test that anyway).
Please note two things:
(1) The whole acpi_dmi_table[] thing originally depended on __i386__, so it
wouldn't work on x86-64 no matter what. I removed that dependecy, but
I have no idea why it was there and so I'm not sure if that's correct.
(2) The clear_IO_APIC_pin(apic1, pin1) done if disable_irq0_through_ioapic is
true is absolutely essential. The symptoms are 100% reproducible without it.
Thanks,
Rafael
---
arch/x86/kernel/acpi/boot.c | 43 +++++++++++++++++++++++++++++++++++++------
arch/x86/kernel/io_apic_64.c | 19 ++++++++++++++-----
2 files changed, 51 insertions(+), 11 deletions(-)
Index: linux-next/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-next.orig/arch/x86/kernel/acpi/boot.c
+++ linux-next/arch/x86/kernel/acpi/boot.c
@@ -1363,8 +1363,6 @@ static void __init acpi_process_madt(voi
return;
}
-#ifdef __i386__
-
static int __init disable_acpi_irq(const struct dmi_system_id *d)
{
if (!acpi_force) {
@@ -1415,6 +1413,17 @@ static int __init force_acpi_ht(const st
}
/*
+ * Don't register any I/O APIC entries for the 8254 timer IRQ.
+ */
+static int __init
+dmi_disable_irq0_through_ioapic(const struct dmi_system_id *d)
+{
+ pr_notice("%s detected: disabling IRQ 0 through I/O APIC\n", d->ident);
+ force_disable_irq0_through_ioapic();
+ return 0;
+}
+
+/*
* If your system is blacklisted here, but you find that acpi=force
* works for you, please contact acpi-devel@sourceforge.net
*/
@@ -1581,11 +1590,35 @@ static struct dmi_system_id __initdata a
DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate 360"),
},
},
+ /*
+ * HP laptops which use a DSDT reporting as HP/SB400/10000,
+ * which includes some code which overrides all temperature
+ * trip points to 16C if the INTIN2 input of the I/O APIC
+ * is enabled. This input is ...thanks, applied to tip/x86, to give this some more testing. the clear_IO_APIC_pin() is the most worrisome aspect of the change - but since we are already in limited quirk mode, does it hurt? Maciej, any preferences? Ingo --
i know you havent submitted this for inclusion yet, but FYI it wont build on 32-bit with a config like: http://redhat.com/~mingo/misc/config-Mon_Jul__7_09_22_45_CEST_2008.bad Ingo --
Well, I wrote it was missing the i386 part. ;-) Thanks, Rafael --
It makes absolutely no sense and should be harmful to call clear_IO_APIC_pin(apic1, pin1) here, because both apic1 and pin1 should be equal to -1 here. If it has to be called, then I suppose the DMI matching did not work and the workaround has not been enabled. Rafael, can you please provide a full bootstrap log obtained with all the changes, but *without* this part? Also, can you please verify DMI IDs of your system (dmidecode or /sys/class/dmi, I am told)? I do think we should record DMI vendor and name information in the bootstrap log. It won't take a lot of space and will be more useful than some other IDs we never use for anything else, which we obtain from some other tables. Maciej --
Do you realize that the clear_IO_APIC_pin(apic1, pin1) thing is _only_ called I assume you want the boot log with the ACPI debugging messages enabled? See above. Thanks, Rafael --
Well, it is the very intent of the DMI quirk to set apic1 and pin1 both to -1, as a result of IRQ0 being absent from our I/O APIC interrupt routing table. Therefore if the quirk did indeed work, a call to clear_IO_APIC_pin() is useless and likely harmful as its callees don't do range checking (my understanding of code is it results in random poking at the local APIC through the FIX_APIC_BASE fixmap). There should be nothing to clear too, as interrupt redirection entries for all the I/O APIC inputs are cleared (the mask is set to 1 and the remaining fields zeroed) when clear_IO_APIC() is called from enable_IO_APIC() upon initialization and all the unused ones (not referred to from anywhere in the interrupt routing table) are never touched afterwards. Maciej --
Sorry, the patch I posted was _instead_ of your previous patch with the quirk, because that patch didn't work. I don't know why it didn't work, however, I can only say it didn't work after removing the __i386__ dependency of acpi_dmi_table[]. My patch is on top of the linux-next tree that didn't contain your patch. So, my patch adds a quirk that sets disable_irq0_through_ioapic to 1 (this variable is defined differently in my patch) and uses it to skip the part of check_timer() that breaks my box. I hope that makes things clear. Thanks, Rafael --
I don't recall seeing a note that my patch was reverted. I had a careful look at yours now and it applies on top of some diagnostic code which should never have been applied to the tree in the first place as it was not meant to and would also break the majority of systems out there. I am fairly sure this piece of code is the reason my implementation of the It does, thanks. However I insist on getting this issue dealt with the way I proposed -- check_timer() is too complicated and too fragile to mess with as our history has already shown. If IRQ0 is not to be routed through the I/O APIC, then it should never be registered as an I/O APIC interrupt in the first place. I am fairly with the offending change removed my fix will work for you as expected as I went through the effort of checking at the run time that once activated it makes the kernel correctly avoid the sequence in check_timer() that hits your system, so it is only the DMI ID matching that could have gone wrong, but your change indicates this is actually not the case. Maciej --
So I'm going to apply the appended combined patch on top of linux-next and
retest.
Thanks,
Rafael
---
arch/x86/kernel/acpi/boot.c | 53 ++++++++++++++++++++++++++++++++++++++-----
arch/x86/kernel/io_apic_64.c | 5 ----
2 files changed, 47 insertions(+), 11 deletions(-)
Index: linux-next/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-next.orig/arch/x86/kernel/acpi/boot.c
+++ linux-next/arch/x86/kernel/acpi/boot.c
@@ -83,6 +83,8 @@ int acpi_lapic;
int acpi_ioapic;
int acpi_strict;
+static int disable_irq0_through_ioapic __initdata;
+
u8 acpi_sci_flags __initdata;
int acpi_sci_override_gsi __initdata;
int acpi_skip_timer_override __initdata;
@@ -990,6 +992,10 @@ void __init mp_override_legacy_irq(u8 bu
int pin;
struct mp_config_intsrc mp_irq;
+ /* Skip the 8254 timer interrupt (IRQ 0) if requested. */
+ if (bus_irq == 0 && disable_irq0_through_ioapic)
+ return;
+
/*
* Convert 'gsi' to 'ioapic.pin'.
*/
@@ -1056,6 +1062,10 @@ void __init mp_config_acpi_legacy_irqs(v
for (i = 0; i < 16; i++) {
int idx;
+ /* Skip the 8254 timer interrupt (IRQ 0) if requested. */
+ if (i == 0 && disable_irq0_through_ioapic)
+ continue;
+
for (idx = 0; idx < mp_irq_entries; idx++) {
struct mp_config_intsrc *irq = mp_irqs + idx;
@@ -1363,8 +1373,6 @@ static void __init acpi_process_madt(voi
return;
}
-#ifdef __i386__
-
static int __init disable_acpi_irq(const struct dmi_system_id *d)
{
if (!acpi_force) {
@@ -1415,6 +1423,17 @@ static int __init force_acpi_ht(const st
}
/*
+ * Don't register any I/O APIC entries for the 8254 timer IRQ.
+ */
+static int __init
+dmi_disable_irq0_through_ioapic(const struct dmi_system_id *d)
+{
+ pr_notice("%s detected: disabling IRQ 0 through I/O APIC\n", d->ident);
+ disable_irq0_through_ioapic = 1;
+ return 0;
+}
+
+/*
* If your system is blacklisted here, but you find that acpi=force
* works for you, ...Wonderful! Thanks for your patience and cooperation. Maciej --
Unfortunately, it hangs solid very early on boot, before earlyprintk=vga kicks in. Thanks, Rafael --
BTW, did you even to look at the code _as_ _is_ in linux-next?
In fact, it is _impossible_ that either apic1 or pin1 are equal to -1 at this
point, because of this part:
/*
* Some BIOS writers are clueless and report the ExtINTA
* I/O APIC input from the cascaded 8259A as the timer
* interrupt input. So just in case, if only one pin
* was found above, try it both directly and through the
* 8259A.
*/
if (pin1 == -1) {
pin1 = pin2;
apic1 = apic2;
no_pin1 = 1;
} else if (pin2 == -1) {
pin2 = pin1;
apic2 = apic1;
}
that originates from your patch.
End even without this part apic1 and pin1 are _not_ equal to -1 on this box
(apic2 and pin2 are, but that's a different matter).
Thanks,
Rafael
--
Well, I hope it has not changed too much since my recent work in this And the conclusion is? If we leave MP-table setups aside as irrelevant for this configuration, we can be almost certain apic2 and pin2 are both equal to -1 at this point. This is because (unlike the MP table) ACPI has no provisions in its tables for ExtINTA APIC interrupt inputs. Therefore the only case apic2 and pin2 may not be equal -1 here is when the firmware had set up one of the inputs as such in the hardware before initiating bootstrap which has been subsequently noted by a piece of code in enable_IO_APIC() which examines the I/O APIC for such a condition. I have taken these circumstances very much into account when preparing the workaround, based on the assumption that if the firmware has set up an I/O APIC line as an ExtINTA interrupt, then it means it considers it suitable to use in such a manner. This furthermore implies the line should be safe to be used in any valid 8259A mode of operation, such as one we use to forward IRQ0 transparently through the 8259A (we double-check it just in case though, as workarounds for hardware bugs in the past made it not always true). The workaround therefore applies to genuine IRQ0 routing as reported by ACPI only and not any possible legacy ExtINTA fallback that we may attempt to use. Of course, as determined previously, the ExtINTA line is not safe to be used on your box, but it has not been set up by the firmware as an ExtINTA interrupt either, so the assumption mentioned above remains valid and has no negative impact on your system. At this point all of apic1, apic2, pin1 and pin2 should be equal -1, which means the reassignments you quoted Which means the workaround has not triggered and the rest of cosideration is therefore irrelevant. Please get us these DMI IDs, so that we can see what's wrong with the quirk. Maciej --
I have the nx6125 version of the laptop, though it's not long for this world. Regardless, dmi data below. Earlier, I looked at your DMI match strings up-thread and they seemed good, so you may be barking up the wrong tree here. ray@phoenix:~$ sudo grep . /sys/class/dmi/id/* /sys/class/dmi/id/bios_date:03/10/2006 /sys/class/dmi/id/bios_vendor:Hewlett-Packard /sys/class/dmi/id/bios_version:68DTT Ver. F.0E /sys/class/dmi/id/board_name:308B /sys/class/dmi/id/board_vendor:Hewlett-Packard /sys/class/dmi/id/board_version:KBC Version 45.26 /sys/class/dmi/id/chassis_asset_tag: /sys/class/dmi/id/chassis_serial:CND5500R4K /sys/class/dmi/id/chassis_type:10 /sys/class/dmi/id/chassis_vendor:Hewlett-Packard /sys/class/dmi/id/modalias:dmi:bvnHewlett-Packard:bvr68DTTVer.F.0E:bd03/10/2006:svnHewlett-Packard:pnHPCompaqnx6125(PZ880UA#ABA):pvrF.0E:rvnHewlett-Packard:rn308B:rvrKBCVersion45.26:cvnHewlett-Packard:ct10:cvr: /sys/class/dmi/id/product_name:HP Compaq nx6125 (PZ880UA#ABA) /sys/class/dmi/id/product_serial:CND5500R4K /sys/class/dmi/id/product_uuid:4D0AF52B-ED6E-DA11-1880-66990A524D29 /sys/class/dmi/id/product_version:F.0E /sys/class/dmi/id/sys_vendor:Hewlett-Packard /sys/class/dmi/id/uevent:MODALIAS=dmi:bvnHewlett-Packard:bvr68DTTVer.F.0E:bd03/10/2006:svnHewlett-Packard:pnHPCompaqnx6125(PZ880UA#ABA):pvrF.0E:rvnHewlett-Packard:rn308B:rvrKBCVersion45.26:cvnHewlett-Packard:ct10:cvr: --
Which workaround? There are no any workarounds related to this problem in the 20080704 linux-next I used those DMI IDs and your quirk didn't work, even after removing the dependency of acpi_dmi_table[] from __i386__. For this reason, I prepared an alternative patch that did work and posted it for your information. Now, you're saying that my patch couldn't work, while in fact it did. Thanks, Rafael --
This is completely broken -- you cannot blindly assume IRQ0 is wired to the pin #0 of the I/O APIC #0. You have to respect routing information provided by the system. Ingo, from the sequence above, I gather this code is currently in the tree: - replace_pin_at_irq(0, 0, 0, apic1, pin1); - apic1 = 0; - pin1 = 0; Please revert the change which introduced it. While I recall posting a patch which added code like this, I clearly stated it was solely for diagnostics of Rafael's system and not to apply to any tree. Maciej --
Shouldn't the setup_timer_IRQ0_pin(apic1, pin1, cfg->vector) be removed as well? Rafael --
Yes. The whole commit should be reverted as mentioned in the other e-mail (it didn't even bear my sign-off mark). This would include the line you are referring to as well. I have submitted a patch to add a 64-bit variation of replace_pin_at_irq() separately. Maciej --
I have had a look at the tree and there are two commits containing diagnostic code that should have never been applied to the tree. Their IDs in the linux-next tree are as follows: 90221a61a71b7ad659d8741cf1e404506b174982 a74a1cc3df0be89658bc735c8aed80c8392e2c15 Please revert them both. Thanks. Maciej --
Since we add a new entry to acpi_dmi_table[], I think it's necessary for this new entry to preserve the semantics of this table, that is "acpi=force overrules DMI blacklist", am I right? Thanks, Forrest --
Can you please elaborate? What exactly are we supposed to do apart from adding the new entries to the table? Thanks, Rafael --
Out of sheer curiosity. What makes you think that IRQ0 is not
connected to INT0 of IO APIC? The IRQ0 pin2 override?
Why would we trust that BIOS information if the broken BIOS does
weird things if INT2 of IO APIC is _not_ masked?
IMHO on those HP systems we should skip the (bogus?) timer override,
leave IRQ0 -> IOAPIC/INT0 and mask IOAPIC/INT2. Or at least we should
test that configuration.
I admit that I have lost track of who has provided which patches,
which patch does exactly what ((IO|L)A)PIC configuration, and who has
tested what combinations, and what the results were.
So I've just done the following (based on x86/master as of yesterday):
Booting an HP nx6325
(1) with "acpi_skip_timer_override" to avoid configuration
of IOAPIC/INT2 and to avoid masking of IOAPIC/INT0.
plus
(2) adding (hardcoded in check_timer()) some code to explicitly mask
IOAPIC/INT2 to quiesce the fan (work around the DSDT issue).
With that setup my test box boots w/o problems (no fan noise, no
throttling, no stablity problems so far). Here the relevant dmesg
parts:
Command line: root=/dev/sda2 video=radeonfb:noaccel debug apic=debug acpi_skip_timer_override
...
IOAPIC[0]: apic_id 2, version 0, address 0xfec00000, GSI
...
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
...
using C1E aware idle routine
...
init IO_APIC IRQs
IOAPIC[0]: Set routing entry (2-0 -> 0x30 -> IRQ 0 Mode:0 Active:0)
IOAPIC[0]: Set routing entry (2-1 -> 0x31 -> IRQ 1 Mode:0 Active:0)
IOAPIC[0]: Set routing entry (2-2 -> 0x32 -> IRQ 2 Mode:0 Active:0)
...
IO-APIC (apicid-pin) 2-16, 2-17, 2-18, 2-19, 2-20 not connected.
IOAPIC[0]: Set routing entry (2-21 -> 0x49 -> IRQ 21 Mode:1 Active:1)
IO-APIC (apicid-pin) 2-22, 2-23 not connected.
..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
CPU0: AMD Turion(tm) 64 X2 Mobile Technology TL-60 stepping 02
Using local APIC timer interrupts.
...
IO APIC #2......
...
NR Dst Mask Trig ...The test I prepared a couple of weeks ago which Rafael ran for me. The test forcibly rerouted IRQ0 to INTIN0, but the timer did not work with This is why I prepared the workaround to ignore BIOS IRQ0 configuration Unfortunately it did not work for Rafael's box and diagnostic messages have been quiesced in the 64-bit variation of code so bootstrap logs I can be blamed for the vast majority. If you are unsure what does what, Now that you mentioned (2) a sudden insight made me realise in the absence of an overlapping override we register ISA IRQ2 as an edge-triggered I/O APIC interrupt, which means its input will be implicitly unmasked. But despite the ACPI spec does not make it explicitly clear, there is no such thing as ISA IRQ2 -- what used to be IRQ2 up to PC/XT got remapped to IRQ9 when the IR2 line of the 8259A was reused as a cascade to the slave and no device since has ever been able to make use of IRQ2. Therefore I think we should actually not register this I am assuming by "Virtual Wire IRQ" you mean a setup we are referring to as "8259A Virtual Wire" (without this additional qualifier the term is synonymous to ExtINTA). Given (C) did not work a workaround has been prepared to implement (A). As it does DMI ID matching and "knows" for this system (C) would not work I decided there is no point in trying such a configuration. Now you say your machine actually supports (C), which makes me wonder why Well, this is a piece of information I could not know. Documentation for the SB400 is not publicly available. I asked whether anybody could answer how this piece of circuitry has been designed in the chip and you are the One or more of the following: 1. Because the 8254 is not in fact routed to INTIN0 (as seen with the test). 2. Because almost everything else requires such an override and it was left in place in the code base by mistake/misunderstanding/etc. OK, to recap: given what you have written, I think we should ...
FWIW, it's my preferred one too. So far, I haven't seen any problems with this approach (ie. "acpi_skip_timer_override" plus masking IOAPIC/INT2 in check_timer()), even with the C1E aware idle routine. Thanks, Rafael --
There is some confusion here apparently -- you certainly have not tested the timer as an ExtINTA interrupt during the course of this discussion, because the "8259A Virtual Wire" through the local APIC has always worked for you. ExtINTA would be the next, fourth and final attempt before a panic(). For a reference, here is the list of configurations of the 8254 timer interrupt in the order they are tried: 1. Native I/O APIC interrupt. 2. "8259A Virtual Wire" through the I/O APIC. 3. "8259A Virtual Wire" through the local APIC of the bootstrap processor. 4. ExtINTA ("Virtual Wire") through the local APIC of the BSP. The configurations #1 and #2 are only tried if the firmware has supplied information about how the IRQ0 or 8259A have been wired. The #4 is only reached in very rare cases where there is some special glue logic between the 8259A and the APIC (seen before as a workaround for broken hardware; unlock_ExtINT_logic() is needed for that) or a given implementation of the 8259A is not a full one (hypothetical). Maciej --
Yes, you're right, sorry. So in fact I didn't test B. Thanks, Rafael --
