[PATCH 0/2 V3] powerpc - Make the irq reverse mapping tree lockless

Previous thread: [PATCH 1/3] Cleanup hfc multiport driver by Karsten Keil on Wednesday, August 6, 2008 - 6:30 am. (1 message)

Next thread: [PATCH 2/3] Cosmetic fixes by Karsten Keil on Wednesday, August 6, 2008 - 6:30 am. (1 message)
From: Sebastien Dugue
Date: Wednesday, August 6, 2008 - 6:30 am

Hi ,

  here is V3 for the powerpc IRQ radix tree reverse mapping rework.

  V2 -> V3: from comments by Benjamin Herrenschmidt and Daniel Walker

  - Move the initialization of the radix tree back into irq_late_init() and
    insert pre-existing irqs into the tree at that time.

  - One whitespace cleanup.

  V1 -> V2: from comments by Michael Ellerman

  - Initialize the XICS radix tree in xics code and only for that irq_host
    rather than doing it for all the hosts in the powerpc irq generic code
    (although the hosts list only contain one entry at the moment).

  - Add a comment in irq_radix_revmap_lookup() stating why it is safe to
    perform a lookup even if the radix tree has not been initialized yet.


  The goal of this patchset is to simplify the locking constraints on the radix
tree used for IRQ reverse mapping on the pSeries machines and provide lockless
access to this tree.

  This also solves the following BUG under preempt-rt:

BUG: sleeping function called from invalid context swapper(1) at kernel/rtmutex.c:739
in_atomic():1 [00000002], irqs_disabled():1
Call Trace:
[c0000001e20f3340] [c000000000010370] .show_stack+0x70/0x1bc (unreliable)
[c0000001e20f33f0] [c000000000049380] .__might_sleep+0x11c/0x138
[c0000001e20f3470] [c0000000002a2f64] .__rt_spin_lock+0x3c/0x98
[c0000001e20f34f0] [c0000000000c3f20] .kmem_cache_alloc+0x68/0x184
[c0000001e20f3590] [c000000000193f3c] .radix_tree_node_alloc+0xf0/0x144
[c0000001e20f3630] [c000000000195190] .radix_tree_insert+0x18c/0x2fc
[c0000001e20f36f0] [c00000000000c710] .irq_radix_revmap+0x1a4/0x1e4
[c0000001e20f37b0] [c00000000003b3f0] .xics_startup+0x30/0x54
[c0000001e20f3840] [c00000000008b864] .setup_irq+0x26c/0x370
[c0000001e20f38f0] [c00000000008ba68] .request_irq+0x100/0x158
[c0000001e20f39a0] [c0000000001ee9c0] .hvc_open+0xb4/0x148
[c0000001e20f3a40] [c0000000001d72ec] .tty_open+0x200/0x368
[c0000001e20f3af0] [c0000000000ce928] .chrdev_open+0x1f4/0x25c
[c0000001e20f3ba0] [c0000000000c8bf0] ...
From: Sebastien Dugue
Date: Wednesday, August 6, 2008 - 6:30 am

irq_radix_revmap() currently serves 2 purposes, irq mapping lookup
and insertion which happen in interrupt and process context respectively.

  Separate the function into its 2 components, one for lookup only and one
for insertion only.

  Fix the only user of the revmap tree (XICS) to use the new functions.

  Also, move the insertion into the radix tree of those irqs that were
requested before it was initialized at said tree initialization.

  Mutual exclusion between the tree initialization and readers/writers is
handled via an atomic variable (revmap_trees_allocated) set when the tree
has been initialized and checked before any reader or writer access just
like we used to check for tree.gfp_mask != 0 before.


Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/include/asm/irq.h        |   18 ++++++-
 arch/powerpc/kernel/irq.c             |   76 ++++++++++++++++++++++++---------
 arch/powerpc/platforms/pseries/xics.c |   11 ++---
 3 files changed, 74 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index a372f76..0a51376 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -236,15 +236,27 @@ extern unsigned int irq_find_mapping(struct irq_host *host,
 extern unsigned int irq_create_direct_mapping(struct irq_host *host);
 
 /**
- * irq_radix_revmap - Find a linux virq from a hw irq number.
+ * irq_radix_revmap_insert - Insert a hw irq to linux virq number mapping.
+ * @host: host owning this hardware interrupt
+ * @virq: linux irq number
+ * @hwirq: hardware irq number in that host space
+ *
+ * This is for use by irq controllers that use a radix tree reverse
+ * mapping for fast lookup.
+ */
+extern void irq_radix_revmap_insert(struct irq_host *host, unsigned int virq,
+				    irq_hw_number_t ...
From: Benjamin Herrenschmidt
Date: Tuesday, August 19, 2008 - 10:21 pm

The atomic doesn't need to be such. Could just be a global. In fact, I
don't like your synchronization too much between the init and _insert. 

What I'd do is, turn your atomic into a simple int

 - do smp_wmb() and set it to 1 after the tree is initialized, then
   smb_wmb() again and set it to 2 after the tree has been filled
   by the init code 
 - in the revmap_lookup path just test that it's >1, no need for a
barrier. At worst you'll use the slow path instead of the fast path some
time during boot, no big deal.
 - in the insert path, do an smp_rb() and if it's >0 do the insert with
the lock
 - in the init pre-insert path, use the lock inside the loop for each
insertion.

that means you may get concurrent attempt to insert but the lock will
help there and turn them into 2 insertions of the same translation. Is
that a big deal ? If it is, make it be a lookup+insert.


--

From: Sebastien Dugue
Date: Wednesday, September 3, 2008 - 6:34 am

Hi Benjamin,

  sorry for the (long) delay, just came back from vacation.



  Yep, it mainly relies on some (hidden) assumptions about stuff ordering


  No problem with that, radix_tree_insert() will return EEXIST.

  Thanks,

--

From: Sebastien Dugue
Date: Wednesday, August 6, 2008 - 6:30 am

The radix trees used by interrupt controllers for their irq reverse mapping
(currently only the XICS found on pSeries) have a complex locking scheme
dating back to before the advent of the lockless radix tree.

  Take advantage of this and of the fact that the items of the tree are
pointers to a static array (irq_map) elements which can never go under us
to simplify the locking.

  Concurrency between readers and writers is handled by the intrinsic
properties of the lockless radix tree. Concurrency between writers is handled
with a spinlock added to the irq_host structure.


Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/include/asm/irq.h |    1 +
 arch/powerpc/kernel/irq.c      |   74 ++++++----------------------------------
 2 files changed, 12 insertions(+), 63 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 0a51376..72fd036 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -119,6 +119,7 @@ struct irq_host {
 		} linear;
 		struct radix_tree_root tree;
 	} revmap_data;
+	spinlock_t		tree_lock;
 	struct irq_host_ops	*ops;
 	void			*host_data;
 	irq_hw_number_t		inval_irq;
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index dc8663a..7a19103 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -439,8 +439,6 @@ void do_softirq(void)
 
 static LIST_HEAD(irq_hosts);
 static DEFINE_SPINLOCK(irq_big_lock);
-static DEFINE_PER_CPU(unsigned int, irq_radix_reader);
-static unsigned int irq_radix_writer;
 static atomic_t revmap_trees_allocated = ATOMIC_INIT(0);
 struct irq_map_entry irq_map[NR_IRQS];
 static unsigned int irq_virq_count = NR_IRQS;
@@ -584,57 +582,6 @@ void irq_set_virq_count(unsigned int count)
 		irq_virq_count = count;
 }
 
-/* radix tree not lockless safe ! we ...
From: Benjamin Herrenschmidt
Date: Tuesday, August 19, 2008 - 10:22 pm

No need for a spinlock in the irq_host. Make it one global lock, it's
not like scalability of irq_create_mapping() was a big deal and there's

--

From: Sebastien Dugue
Date: Wednesday, September 3, 2008 - 6:34 am

Right, done.

  Thanks,

--

From: Benjamin Herrenschmidt
Date: Tuesday, August 19, 2008 - 10:23 pm

BTW. It would be good to try to turn the GFP_ATOMIC into GFP_KERNEL,
maybe using a semaphore instead of a lock to protect insertion vs.
initialisation. The old scheme was fine because if the atomic allocation
failed, it could fallback to the linear search and try again on the next
interrupt. Not anymore.

Ben.


--

From: Sebastien Dugue
Date: Wednesday, September 3, 2008 - 6:41 am

a semaphore? are you meaning a mutex? If not, I fail to understand what you're

  Right, that's the problem with this new scheme and I'm still trying
to find a way to handle memory allocation failures be it for GFP_ATOMIC or
GFP_KERNEL.

  I could not think of anything simple so far and I'm open for suggestions.

  Thanks,

  Sebastien.

--

From: Benjamin Herrenschmidt
Date: Wednesday, September 3, 2008 - 7:52 pm

GFP_KERNEL should not fail, it will just block no ? If it fails, it's
probably catastrophic enough not to care. You can always fallback to
linear lookup. I don't know if it's worth trying to fire off a new
allocation attempt later, probably not.

Ben.


--

From: Sebastien Dugue
Date: Thursday, September 4, 2008 - 12:22 am

I've been pondering with this lately, but I think that adding a linear
lookup fallback should be OK.

  Thanks,

--

From: Benjamin Herrenschmidt
Date: Thursday, September 4, 2008 - 12:34 am

hrm... it used to never fail.. that may have changed. But it will
definitely block and try very hard to push things out to make space,

Well, the must be one in the case the tree isn't initialized yet, so if
there's an allocation failure, you may "de-initialize" it or
something... Or you can fallback if you don't find, as easy, probably

Yup.

Cheers,
Ben.


--

From: Sebastien Dugue
Date: Thursday, September 4, 2008 - 12:55 am

Right, but it still can fail - and you're right, we're in trouble already.

  There's nothing to 'de-initialize' here, or am I missing something?


  Thanks Ben,

  Sebastien.
--

From: Benjamin Herrenschmidt
Date: Thursday, September 4, 2008 - 12:58 am

Thanks for doing that work !

Cheers,
Ben.


--

From: Sebastien Dugue
Date: Thursday, September 4, 2008 - 1:04 am

Will do that way.

  Thanks,

  Sebastien.
--

Previous thread: [PATCH 1/3] Cleanup hfc multiport driver by Karsten Keil on Wednesday, August 6, 2008 - 6:30 am. (1 message)

Next thread: [PATCH 2/3] Cosmetic fixes by Karsten Keil on Wednesday, August 6, 2008 - 6:30 am. (1 message)