Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

Previous thread: [PATCH 02/15] x86: don't use gdt_page openly. by Glauber Costa on Monday, June 9, 2008 - 7:16 am. (16 messages)

Next thread: [PATCH 13/15] x86: remove cpu from maps by Glauber Costa on Monday, June 9, 2008 - 7:16 am. (2 messages)
From: Glauber Costa
Date: Monday, June 9, 2008 - 7:16 am

Do it, instead of keeping in io_apic_32.c. This is the way x86_64 already does.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
---
 arch/x86/kernel/apic_32.c    |    6 +-----
 arch/x86/kernel/io_apic_32.c |    2 +-
 arch/x86/kernel/smpboot.c    |    3 ++-
 include/asm-x86/hw_irq.h     |    3 ---
 4 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
index fa8cf79..a06442a 100644
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -1277,11 +1277,7 @@ int __init APIC_init_uniprocessor(void)
 #endif
 		localise_nmi_watchdog();
 	end_local_APIC_setup();
-#ifdef CONFIG_X86_IO_APIC
-	if (smp_found_config)
-		if (!skip_ioapic_setup && nr_ioapics)
-			setup_IO_APIC();
-#endif
+
 	setup_boot_clock();
 
 	return 0;
diff --git a/arch/x86/kernel/io_apic_32.c b/arch/x86/kernel/io_apic_32.c
index 7fc071f..cb79ce3 100644
--- a/arch/x86/kernel/io_apic_32.c
+++ b/arch/x86/kernel/io_apic_32.c
@@ -1606,7 +1606,7 @@ void /*__init*/ print_PIC(void)
 
 #endif  /*  0  */
 
-static void __init enable_IO_APIC(void)
+void __init enable_IO_APIC(void)
 {
 	union IO_APIC_reg_01 reg_01;
 	int i8259_apic, i8259_pin;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c4bb997..49d2900 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1179,13 +1179,14 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	 */
 	setup_local_APIC();
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_IO_APIC
 	/*
 	 * Enable IO APIC before setting up error vector
 	 */
 	if (!skip_ioapic_setup && nr_ioapics)
 		enable_IO_APIC();
 #endif
+
 	end_local_APIC_setup();
 
 	map_cpu_to_logical_apicid();
diff --git a/include/asm-x86/hw_irq.h b/include/asm-x86/hw_irq.h
index 18f067c..31b56cd 100644
--- a/include/asm-x86/hw_irq.h
+++ b/include/asm-x86/hw_irq.h
@@ -66,10 +66,7 @@ extern void disable_IO_APIC(void);
 extern void print_IO_APIC(void);
 extern int ...
From: Glauber Costa
Date: Monday, June 9, 2008 - 7:16 am

Change unmap_cpu_to_logical_apicid to numa_remove_cpu.
Besides being shorter, it is the same name x86_64 uses. We
can save an ifdef in the code this way.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
 arch/x86/kernel/smpboot.c |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 49d2900..6316d30 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -181,13 +181,12 @@ static void map_cpu_to_logical_apicid(void)
 	map_cpu_to_node(cpu, node);
 }
 
-static void unmap_cpu_to_logical_apicid(int cpu)
+static void numa_remove_cpu(int cpu)
 {
 	cpu_2_logical_apicid[cpu] = BAD_APICID;
 	unmap_cpu_to_node(cpu);
 }
 #else
-#define unmap_cpu_to_logical_apicid(cpu) do {} while (0)
 #define map_cpu_to_logical_apicid()  do {} while (0)
 #endif
 
@@ -945,10 +944,7 @@ restore_state:
 
 	if (boot_error) {
 		/* Try to put things back the way they were before ... */
-		unmap_cpu_to_logical_apicid(cpu);
-#ifdef CONFIG_X86_64
 		numa_remove_cpu(cpu); /* was set by numa_add_cpu */
-#endif
 		cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
 		cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
 		cpu_clear(cpu, cpu_possible_map);
@@ -1246,7 +1242,7 @@ void cpu_exit_clear(void)
 	cpu_clear(cpu, cpu_callout_map);
 	cpu_clear(cpu, cpu_callin_map);
 
-	unmap_cpu_to_logical_apicid(cpu);
+	numa_remove_cpu(cpu);
 }
 #  endif /* CONFIG_X86_32 */
 
-- 
1.5.4.5

--

From: Maciej W. Rozycki
Date: Monday, June 9, 2008 - 8:23 am

This change looks wrong -- is native_smp_prepare_cpus() at all run on
!CONFIG_SMP?  I don't think so.  You have to do this differently.

  Maciej
--

From: Glauber Costa
Date: Monday, June 9, 2008 - 8:52 am

You're right.

I'll update this one.
--

From: Glauber Costa
Date: Monday, June 9, 2008 - 12:44 pm

How about this one instead?

This one does enable_IO_APIC in the init_uniprocessor too, and should 
account for the !smp case.
From: Maciej W. Rozycki
Date: Monday, June 9, 2008 - 1:12 pm

Hmm, it looks a little bit better, but why do you want to call
enable_IO_APIC() separately in the first place?  There is a comment
stating: "Enable IO APIC before setting up error vector," but why is it
needed on 64-bit systems?  Especially as the very same system may run a
32-bit kernel and then it suddenly would not have to do this anymore?  
Strange...

 Also since you are cleaning up this code -- why don't you actually take
the opportunity and get rid of the horrible #ifdefs interspersed
throughout?

  Maciej
--

From: Yinghai Lu
Date: Monday, June 9, 2008 - 1:53 pm

that enable_IO_APIC() only call record old ioapic/pin for timer and
clear_IO_APIC.

we need clear_IO_APIC before enable setup error vector, in case there
is wrong setting in ioapic by BIOS...

YH
--

From: Maciej W. Rozycki
Date: Monday, June 9, 2008 - 2:00 pm

Fair enough, although it is still interesting why it would only trigger 
in the 64-bit mode and why shouldn't the BIOS be fixed instead.

  Maciej
--

From: Maciej W. Rozycki
Date: Monday, June 9, 2008 - 7:46 pm

Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not?

  Maciej
--

From: Yinghai Lu
Date: Monday, June 9, 2008 - 10:08 pm

config X86_LOCAL_APIC
        def_bool y
        depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS ||
SMP) && !X86_VOYAGER) || X86_GENERICARCH))

config X86_IO_APIC
        def_bool y
        depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP &&
!(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH))

for 64bit, those are all set.

for 32bit, may need to null stub if X86_IO_APIC is not set

YH
--

From: Glauber Costa
Date: Tuesday, June 10, 2008 - 6:00 am

Fair Enough.
--

From: Maciej W. Rozycki
Date: Tuesday, June 10, 2008 - 6:30 am

That does not answer the question what to do with a 32-bit kernel run on
a system that requires the fixup in the 64-bit mode.  Papering over such
firmware bugs in the kernel encourages hardware vendors to maintain the
breakage.

 Note that the I/O APIC comes out of reset with all the redirection
entries masked out, so if any error conditions are triggered before Linux
configures the chip, they are a result of a deliberate misprogramming done
to the chip by the firmware.

 Does anyone have a reference to the original problem report which lead to
this workaround having been put in place?

  Maciej
--

From: Yinghai Lu
Date: Tuesday, June 10, 2008 - 12:09 pm

kernel should not assume io apic register is set right by firmware.
and kernel actually doesn't trust them, and clear the io apic registers.
that patch just move that early before enable error vector.

YH
--

From: Maciej W. Rozycki
Date: Tuesday, June 10, 2008 - 12:36 pm

What is the basis of this assumption?  Linux generally assumes the
chipset components have been placed by the firmware into a consistent
state.  That does not necessarily mean suitable for Linux, hence the need
to reconfigure a bit here or there, but there should be no need to touch
components that are not going to be used by Linux directly.  The I/O APIC

 The I/O APIC registers are cleared, because this is the only way you can
assure inconsistent configuration does not happen during reconfiguration.  
For example two inputs using the same vector or set up into the ExtINTA
mode.  The original intent of the code was not to paper over breakage in
the firmware.  You are trying to change it and it can be done, but it has

 What I am saying repeatedly is clearing of the I/O APIC is not guaranteed
to happen for all the possible cases of Linux configuration.  Which means
this is a partial solution only -- please try to propose a better one or
provide the original problem report so that someone else can have a look
at it.  What's triggering the error interrupt for example?  Is it
recoverable?

  Maciej
--

From: Yinghai Lu
Date: Tuesday, June 10, 2008 - 12:49 pm

On Tue, Jun 10, 2008 at 12:36 PM, Maciej W. Rozycki

ExtINT is routed to ioapic pin0. but the dst is set to 0.
and the systems has multi sockets with quadcore cpu, so the apic id of boot cpu
is set to 4 instead of 0

YH
--

From: Maciej W. Rozycki
Date: Tuesday, June 10, 2008 - 5:29 pm

Thanks for the info.  Let me understand the situation better: local APIC
IDs are preassigned by the firmware based on their "socket address" and
the socket where the lowest numbered quad would be is empty.  
Nevertheless the firmware sets the destination ID of the ExtINTA interrupt
in the I/O APIC to 0 rather than the ID of the bootstrap CPU.  Is that
correct?

 But it would mean the Virtual Wire interrupt delivery would not work, or
is the I/O APIC setup redundant and the local APIC of the bootstrap CPU is
set up for ExtINTA delivery as well?

  Maciej
--

From: Yinghai Lu
Date: Tuesday, June 10, 2008 - 7:32 pm

Yes

after I asked bios engineer to change the dest apic id to 4, the error is gone.

before clear_IO_APIC()
number of MP IRQ sources: 15.
number of IO-APIC #0 registers: 24.
number of IO-APIC #1 registers: 7.
number of IO-APIC #2 registers: 7.
number of IO-APIC #3 registers: 24.
testing the IO APIC.......................

IO APIC #0......
.... register #00: 00000000
.......    : physical APIC id: 00
.... register #01: 00170011
.......     : max redirection entries: 0017
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
 NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 004 0    0    0   0   0    0    7    00
 01 000 1    0    0   0   0    0    0    00
 02 000 1    0    0   0   0    0    0    00
 03 000 1    0    0   0   0    0    0    00
 04 000 1    0    0   0   0    0    0    00
 05 000 1    0    0   0   0    0    0    00
 06 000 1    0    0   0   0    0    0    00
 07 000 1    0    0   0   0    0    0    00
 08 000 1    0    0   0   0    0    0    00
 09 000 1    0    0   0   0    0    0    00
 0a 000 1    0    0   0   0    0    0    00
 0b 000 1    0    0   0   0    0    0    00
 0c 000 1    0    0   0   0    0    0    00
 0d 000 1    0    0   0   0    0    0    00
 0e 000 1    0    0   0   0    0    0    00
 0f 000 1    0    0   0   0    0    0    00
 10 000 1    0    0   0   0    0    0    00
 11 000 1    0    0   0   0    0    0    00
 12 000 1    0    0   0   0    0    0    00
 13 000 1    0    0   0   0    0    0    00
 14 000 1    0    0   0   0    0    0    00
 15 000 1    0    0   0   0    0    0    00
 16 000 1    0    0   0   0    0    0    00

it doesn't need to virtual wire for timer (ExtInt), timer is hpet and
routed to ioapic pin2.

Just know at first BIOS engineer refused to change that to 4, because
other os is not happy.

YH
--

From: Maciej W. Rozycki
Date: Wednesday, June 11, 2008 - 5:57 am

That's not what I asked about -- the timer does not matter here.  The 
Virtual Wire mode is a configuration, where one input of one APIC in the 
system is set up for the ExtINTA mode and acts transparently with the 
system software having no need to know about it.  Instead a pair of 
legacy 8259A chips is used to deliver interrupts, including claiming the 
INTA cycles, providing vectors and prioritising sources, as defined in the 
PC/AT architecture.

 Many pieces of software rely on the 8259A PICs, either because they 
predate the APIC or because they have no means to make use of 
multiprocessor features anyway.  They include various versions of DOS 
together with software run in that environment (as DOS programs quite 
frequently drive hardware at the low level), many versions of the 
Microsoft Windows system as well as other software.  For these a legacy 
mode, either the Virtual Wire mode, or a mode where 8259A interrupts are 
delivered directly to one processor's INT line has to be implemented as 
mandated both by the Multiprocessor Specification and the Advanced 
Configuration and Power Interface Specification.

 Coming back to my question -- how is such a mode implemented in the 
affected system?  Clearly the route through the I/O APIC is not going to 
work and moreover, the chip clutters the system with broken interrupt 
messages each time the 8259A signals an interrupt.

 Please note Linux can use the Virtual Wire mode in the APIC/SMP mode too,
if requested by specifying the "noapic" command-line option -- have you 

 Well, this is just a confirmation my attitude is correct -- such problems 
should not be papered over, because vendors will deny their existence 
then.  At least a complaint message should be printed so that users have 
an opportunity to see it and ask their hardware supplier for an 
explanation.

 In this case, a workaround for the 64-bit mode happens to be quite cheap,
but it should be extended to cover the 32-bit mode as well.  The ...
From: Glauber Costa
Date: Monday, June 9, 2008 - 2:02 pm

throughout where?
They're all over the place ;)

My next target would be per-cpu data. But that's because there's _a lot_ 
of code in the tree that got ifdefs between 32 and 64-bit because of 
differences in that, specially irq statistics. A macro would do, but if 
we're gonna do it, let's do it right.

--

Previous thread: [PATCH 02/15] x86: don't use gdt_page openly. by Glauber Costa on Monday, June 9, 2008 - 7:16 am. (16 messages)

Next thread: [PATCH 13/15] x86: remove cpu from maps by Glauber Costa on Monday, June 9, 2008 - 7:16 am. (2 messages)