Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

Previous thread: NOHZ fixes for stable? by Elias Oltmanns on Friday, October 24, 2008 - 5:32 am. (2 messages)

Next thread: Does rt_down_timeout have a bug ? by Thibaut BOYER on Friday, October 24, 2008 - 5:48 am. (1 message)
From: Kumar Gala
Date: Friday, October 24, 2008 - 5:45 am

It appears the default IRQ affinity changes from being just cpu 0 to  
all cpu's.  This breaks several PPC SMP systems in which only a single  
processor is allowed to be selected as the destination of the IRQ.

What is the right answer in fixing this?  Should we:

	cpumask_t irq_default_affinity = 1;

instead of

	cpumask_t irq_default_affinity = CPU_MASK_ALL?

- k
--

From: Chris Snook
Date: Friday, October 24, 2008 - 8:17 am

On those systems, perhaps, but not universally.  There's plenty of hardware 
where the physical topology of the machine is abstracted away from the OS, and 
you need to leave the mask wide open and let the APIC figure out where to map 
the IRQs.  Ideally, we should probably make this decision based on the APIC, but 
if there's no PPC hardware that uses this technique, then it would suffice to 
make this arch-specific.

-- Chris
--

From: Kumar Gala
Date: Friday, October 24, 2008 - 8:39 am

What did those systems do before this patch?  Its one thing to expose  
a mask in the ability to change the default mask in /proc/irq/ 
default_smp_affinity.  Its another (and a regression in my opinion) to  
change the mask value itself.

As for making it ARCH specific, that doesn't really help since not all  
PPC hw has the limitation I spoke of.  Not even all MPIC (in our  
cases) have the limitation.

- k
--

From: Chris Snook
Date: Friday, October 24, 2008 - 9:09 am

Before the patch they took an extremely long time to boot if they had storage 
attached to each node of a multi-chassis system, performed poorly unless special 
irqbalance hackery or manual assignment was used, and imposed artificial 
restrictions on the granularity of hardware partitioning to ensure that CPU 0 

What did those systems do before this patch? :)

Making it arch-specific is an extremely simple way to solve your problem without 
making trouble for the people who wanted this patch in the first place.  If PPC 
needs further refinement to handle particular *PICs, you can implement that 
without touching any arch-generic code.

-- Chris
--

From: Kumar Gala
Date: Friday, October 24, 2008 - 9:36 am

So why not just have x86 startup code set irq_default_affinity =  
CPU_MASK_ALL than?

- k
--

From: Scott Wood
Date: Friday, October 24, 2008 - 10:39 am

That doesn't really solve the problem, as a user could still manually 
set an invalid affinity.  The MPIC driver should reduce the affinity 
itself to what the hardware can handle.

-Scott
--

From: Chris Snook
Date: Friday, October 24, 2008 - 11:18 am

Does the MPIC code actually allow that to happen?  I can't quite tell, but I 
noticed this:

[csnook@bernoulli sysdev]$ fgrep '#ifdef CONFIG_' mpic.c | sort -u
#ifdef CONFIG_IRQ_ALL_CPUS
#ifdef CONFIG_MPIC_BROKEN_REGREAD
#ifdef CONFIG_MPIC_U3_HT_IRQS
#ifdef CONFIG_MPIC_WEIRD
#ifdef CONFIG_PCI_MSI
#ifdef CONFIG_PM
#ifdef CONFIG_PPC32     /* XXX for now */
#ifdef CONFIG_PPC_DCR
#ifdef CONFIG_SMP

Do any of those config options (or combinations thereof) imply an MPIC that 
can't handle an IRQ masked to multiple CPUs?  If so, this can be fixed rather 
easily at build time, without having to muck around with arch-specific 
initialization code.

-- Chris
--

From: Scott Wood
Date: Friday, October 24, 2008 - 11:26 am

I don't think so, and in any case it should be detected at runtime from 
the device tree.

-Scott

--

From: Chris Snook
Date: Friday, October 24, 2008 - 10:51 am

It's an issue on Itanium as well, and potentially any SMP architecture with a 
non-trivial interconnect.

-- Chris
--

From: David Miller
Date: Friday, October 24, 2008 - 4:18 pm

From: Kumar Gala <galak@kernel.crashing.org>

Since the PPC code knows exactly which MPICs have the problem the
PPC code is where the constraining can occur.

I agree completely with the suggestion that the arch code has to
interpret the cpumask as appropriate for the hardware, since the
user can stick "illegal" values there anyways.
--

From: Max Krasnyansky
Date: Tuesday, November 18, 2008 - 11:43 pm

Sorry for delay in replying to this. And sorry for causing regression on some
ppc platforms.
I totally agree with what Dave said above. ALL_CPUS is a sane default,
platform code has to sanity check masks passed via set_affinity() calls
anyway. So I beleive it should be fixed in the platform code.

Max

--

Previous thread: NOHZ fixes for stable? by Elias Oltmanns on Friday, October 24, 2008 - 5:32 am. (2 messages)

Next thread: Does rt_down_timeout have a bug ? by Thibaut BOYER on Friday, October 24, 2008 - 5:48 am. (1 message)