Do it, instead of keeping in io_apic_32.c. This is the way x86_64 already does.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
---
arch/x86/kernel/apic_32.c | 6 +-----
arch/x86/kernel/io_apic_32.c | 2 +-
arch/x86/kernel/smpboot.c | 3 ++-
include/asm-x86/hw_irq.h | 3 ---
4 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
index fa8cf79..a06442a 100644
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -1277,11 +1277,7 @@ int __init APIC_init_uniprocessor(void)
#endif
localise_nmi_watchdog();
end_local_APIC_setup();
-#ifdef CONFIG_X86_IO_APIC
- if (smp_found_config)
- if (!skip_ioapic_setup && nr_ioapics)
- setup_IO_APIC();
-#endif
+
setup_boot_clock();
return 0;
diff --git a/arch/x86/kernel/io_apic_32.c b/arch/x86/kernel/io_apic_32.c
index 7fc071f..cb79ce3 100644
--- a/arch/x86/kernel/io_apic_32.c
+++ b/arch/x86/kernel/io_apic_32.c
@@ -1606,7 +1606,7 @@ void /*__init*/ print_PIC(void)
#endif /* 0 */
-static void __init enable_IO_APIC(void)
+void __init enable_IO_APIC(void)
{
union IO_APIC_reg_01 reg_01;
int i8259_apic, i8259_pin;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c4bb997..49d2900 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1179,13 +1179,14 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
*/
setup_local_APIC();
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_IO_APIC
/*
* Enable IO APIC before setting up error vector
*/
if (!skip_ioapic_setup && nr_ioapics)
enable_IO_APIC();
#endif
+
end_local_APIC_setup();
map_cpu_to_logical_apicid();
diff --git a/include/asm-x86/hw_irq.h b/include/asm-x86/hw_irq.h
index 18f067c..31b56cd 100644
--- a/include/asm-x86/hw_irq.h
+++ b/include/asm-x86/hw_irq.h
@@ -66,10 +66,7 @@ extern void disable_IO_APIC(void);
extern void print_IO_APIC(void);
extern int ...Change unmap_cpu_to_logical_apicid to numa_remove_cpu.
Besides being shorter, it is the same name x86_64 uses. We
can save an ifdef in the code this way.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
arch/x86/kernel/smpboot.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 49d2900..6316d30 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -181,13 +181,12 @@ static void map_cpu_to_logical_apicid(void)
map_cpu_to_node(cpu, node);
}
-static void unmap_cpu_to_logical_apicid(int cpu)
+static void numa_remove_cpu(int cpu)
{
cpu_2_logical_apicid[cpu] = BAD_APICID;
unmap_cpu_to_node(cpu);
}
#else
-#define unmap_cpu_to_logical_apicid(cpu) do {} while (0)
#define map_cpu_to_logical_apicid() do {} while (0)
#endif
@@ -945,10 +944,7 @@ restore_state:
if (boot_error) {
/* Try to put things back the way they were before ... */
- unmap_cpu_to_logical_apicid(cpu);
-#ifdef CONFIG_X86_64
numa_remove_cpu(cpu); /* was set by numa_add_cpu */
-#endif
cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpu_clear(cpu, cpu_possible_map);
@@ -1246,7 +1242,7 @@ void cpu_exit_clear(void)
cpu_clear(cpu, cpu_callout_map);
cpu_clear(cpu, cpu_callin_map);
- unmap_cpu_to_logical_apicid(cpu);
+ numa_remove_cpu(cpu);
}
# endif /* CONFIG_X86_32 */
--
1.5.4.5
--
This change looks wrong -- is native_smp_prepare_cpus() at all run on !CONFIG_SMP? I don't think so. You have to do this differently. Maciej --
How about this one instead? This one does enable_IO_APIC in the init_uniprocessor too, and should account for the !smp case.
Hmm, it looks a little bit better, but why do you want to call enable_IO_APIC() separately in the first place? There is a comment stating: "Enable IO APIC before setting up error vector," but why is it needed on 64-bit systems? Especially as the very same system may run a 32-bit kernel and then it suddenly would not have to do this anymore? Strange... Also since you are cleaning up this code -- why don't you actually take the opportunity and get rid of the horrible #ifdefs interspersed throughout? Maciej --
that enable_IO_APIC() only call record old ioapic/pin for timer and clear_IO_APIC. we need clear_IO_APIC before enable setup error vector, in case there is wrong setting in ioapic by BIOS... YH --
Fair enough, although it is still interesting why it would only trigger in the 64-bit mode and why shouldn't the BIOS be fixed instead. Maciej --
Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not? Maciej --
config X86_LOCAL_APIC
def_bool y
depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS ||
SMP) && !X86_VOYAGER) || X86_GENERICARCH))
config X86_IO_APIC
def_bool y
depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP &&
!(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH))
for 64bit, those are all set.
for 32bit, may need to null stub if X86_IO_APIC is not set
YH
--
That does not answer the question what to do with a 32-bit kernel run on a system that requires the fixup in the 64-bit mode. Papering over such firmware bugs in the kernel encourages hardware vendors to maintain the breakage. Note that the I/O APIC comes out of reset with all the redirection entries masked out, so if any error conditions are triggered before Linux configures the chip, they are a result of a deliberate misprogramming done to the chip by the firmware. Does anyone have a reference to the original problem report which lead to this workaround having been put in place? Maciej --
kernel should not assume io apic register is set right by firmware. and kernel actually doesn't trust them, and clear the io apic registers. that patch just move that early before enable error vector. YH --
What is the basis of this assumption? Linux generally assumes the chipset components have been placed by the firmware into a consistent state. That does not necessarily mean suitable for Linux, hence the need to reconfigure a bit here or there, but there should be no need to touch components that are not going to be used by Linux directly. The I/O APIC The I/O APIC registers are cleared, because this is the only way you can assure inconsistent configuration does not happen during reconfiguration. For example two inputs using the same vector or set up into the ExtINTA mode. The original intent of the code was not to paper over breakage in the firmware. You are trying to change it and it can be done, but it has What I am saying repeatedly is clearing of the I/O APIC is not guaranteed to happen for all the possible cases of Linux configuration. Which means this is a partial solution only -- please try to propose a better one or provide the original problem report so that someone else can have a look at it. What's triggering the error interrupt for example? Is it recoverable? Maciej --
On Tue, Jun 10, 2008 at 12:36 PM, Maciej W. Rozycki ExtINT is routed to ioapic pin0. but the dst is set to 0. and the systems has multi sockets with quadcore cpu, so the apic id of boot cpu is set to 4 instead of 0 YH --
Thanks for the info. Let me understand the situation better: local APIC IDs are preassigned by the firmware based on their "socket address" and the socket where the lowest numbered quad would be is empty. Nevertheless the firmware sets the destination ID of the ExtINTA interrupt in the I/O APIC to 0 rather than the ID of the bootstrap CPU. Is that correct? But it would mean the Virtual Wire interrupt delivery would not work, or is the I/O APIC setup redundant and the local APIC of the bootstrap CPU is set up for ExtINTA delivery as well? Maciej --
Yes after I asked bios engineer to change the dest apic id to 4, the error is gone. before clear_IO_APIC() number of MP IRQ sources: 15. number of IO-APIC #0 registers: 24. number of IO-APIC #1 registers: 7. number of IO-APIC #2 registers: 7. number of IO-APIC #3 registers: 24. testing the IO APIC....................... IO APIC #0...... .... register #00: 00000000 ....... : physical APIC id: 00 .... register #01: 00170011 ....... : max redirection entries: 0017 ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 00000000 ....... : arbitration: 00 .... IRQ redirection table: NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect: 00 004 0 0 0 0 0 0 7 00 01 000 1 0 0 0 0 0 0 00 02 000 1 0 0 0 0 0 0 00 03 000 1 0 0 0 0 0 0 00 04 000 1 0 0 0 0 0 0 00 05 000 1 0 0 0 0 0 0 00 06 000 1 0 0 0 0 0 0 00 07 000 1 0 0 0 0 0 0 00 08 000 1 0 0 0 0 0 0 00 09 000 1 0 0 0 0 0 0 00 0a 000 1 0 0 0 0 0 0 00 0b 000 1 0 0 0 0 0 0 00 0c 000 1 0 0 0 0 0 0 00 0d 000 1 0 0 0 0 0 0 00 0e 000 1 0 0 0 0 0 0 00 0f 000 1 0 0 0 0 0 0 00 10 000 1 0 0 0 0 0 0 00 11 000 1 0 0 0 0 0 0 00 12 000 1 0 0 0 0 0 0 00 13 000 1 0 0 0 0 0 0 00 14 000 1 0 0 0 0 0 0 00 15 000 1 0 0 0 0 0 0 00 16 000 1 0 0 0 0 0 0 00 it doesn't need to virtual wire for timer (ExtInt), timer is hpet and routed to ioapic pin2. Just know at first BIOS engineer refused to change that to 4, because other os is not happy. YH --
That's not what I asked about -- the timer does not matter here. The Virtual Wire mode is a configuration, where one input of one APIC in the system is set up for the ExtINTA mode and acts transparently with the system software having no need to know about it. Instead a pair of legacy 8259A chips is used to deliver interrupts, including claiming the INTA cycles, providing vectors and prioritising sources, as defined in the PC/AT architecture. Many pieces of software rely on the 8259A PICs, either because they predate the APIC or because they have no means to make use of multiprocessor features anyway. They include various versions of DOS together with software run in that environment (as DOS programs quite frequently drive hardware at the low level), many versions of the Microsoft Windows system as well as other software. For these a legacy mode, either the Virtual Wire mode, or a mode where 8259A interrupts are delivered directly to one processor's INT line has to be implemented as mandated both by the Multiprocessor Specification and the Advanced Configuration and Power Interface Specification. Coming back to my question -- how is such a mode implemented in the affected system? Clearly the route through the I/O APIC is not going to work and moreover, the chip clutters the system with broken interrupt messages each time the 8259A signals an interrupt. Please note Linux can use the Virtual Wire mode in the APIC/SMP mode too, if requested by specifying the "noapic" command-line option -- have you Well, this is just a confirmation my attitude is correct -- such problems should not be papered over, because vendors will deny their existence then. At least a complaint message should be printed so that users have an opportunity to see it and ask their hardware supplier for an explanation. In this case, a workaround for the 64-bit mode happens to be quite cheap, but it should be extended to cover the 32-bit mode as well. The ...
throughout where? They're all over the place ;) My next target would be per-cpu data. But that's because there's _a lot_ of code in the tree that got ifdefs between 32 and 64-bit because of differences in that, specially irq statistics. A macro would do, but if we're gonna do it, let's do it right. --
