Hi ,
here is V3 for the powerpc IRQ radix tree reverse mapping rework.
V2 -> V3: from comments by Benjamin Herrenschmidt and Daniel Walker
- Move the initialization of the radix tree back into irq_late_init() and
insert pre-existing irqs into the tree at that time.
- One whitespace cleanup.
V1 -> V2: from comments by Michael Ellerman
- Initialize the XICS radix tree in xics code and only for that irq_host
rather than doing it for all the hosts in the powerpc irq generic code
(although the hosts list only contain one entry at the moment).
- Add a comment in irq_radix_revmap_lookup() stating why it is safe to
perform a lookup even if the radix tree has not been initialized yet.
The goal of this patchset is to simplify the locking constraints on the radix
tree used for IRQ reverse mapping on the pSeries machines and provide lockless
access to this tree.
This also solves the following BUG under preempt-rt:
BUG: sleeping function called from invalid context swapper(1) at kernel/rtmutex.c:739
in_atomic():1 [00000002], irqs_disabled():1
Call Trace:
[c0000001e20f3340] [c000000000010370] .show_stack+0x70/0x1bc (unreliable)
[c0000001e20f33f0] [c000000000049380] .__might_sleep+0x11c/0x138
[c0000001e20f3470] [c0000000002a2f64] .__rt_spin_lock+0x3c/0x98
[c0000001e20f34f0] [c0000000000c3f20] .kmem_cache_alloc+0x68/0x184
[c0000001e20f3590] [c000000000193f3c] .radix_tree_node_alloc+0xf0/0x144
[c0000001e20f3630] [c000000000195190] .radix_tree_insert+0x18c/0x2fc
[c0000001e20f36f0] [c00000000000c710] .irq_radix_revmap+0x1a4/0x1e4
[c0000001e20f37b0] [c00000000003b3f0] .xics_startup+0x30/0x54
[c0000001e20f3840] [c00000000008b864] .setup_irq+0x26c/0x370
[c0000001e20f38f0] [c00000000008ba68] .request_irq+0x100/0x158
[c0000001e20f39a0] [c0000000001ee9c0] .hvc_open+0xb4/0x148
[c0000001e20f3a40] [c0000000001d72ec] .tty_open+0x200/0x368
[c0000001e20f3af0] [c0000000000ce928] .chrdev_open+0x1f4/0x25c
[c0000001e20f3ba0] [c0000000000c8bf0] ...irq_radix_revmap() currently serves 2 purposes, irq mapping lookup and insertion which happen in interrupt and process context respectively. Separate the function into its 2 components, one for lookup only and one for insertion only. Fix the only user of the revmap tree (XICS) to use the new functions. Also, move the insertion into the radix tree of those irqs that were requested before it was initialized at said tree initialization. Mutual exclusion between the tree initialization and readers/writers is handled via an atomic variable (revmap_trees_allocated) set when the tree has been initialized and checked before any reader or writer access just like we used to check for tree.gfp_mask != 0 before. Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <michael@ellerman.id.au> --- arch/powerpc/include/asm/irq.h | 18 ++++++- arch/powerpc/kernel/irq.c | 76 ++++++++++++++++++++++++--------- arch/powerpc/platforms/pseries/xics.c | 11 ++--- 3 files changed, 74 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index a372f76..0a51376 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -236,15 +236,27 @@ extern unsigned int irq_find_mapping(struct irq_host *host, extern unsigned int irq_create_direct_mapping(struct irq_host *host); /** - * irq_radix_revmap - Find a linux virq from a hw irq number. + * irq_radix_revmap_insert - Insert a hw irq to linux virq number mapping. + * @host: host owning this hardware interrupt + * @virq: linux irq number + * @hwirq: hardware irq number in that host space + * + * This is for use by irq controllers that use a radix tree reverse + * mapping for fast lookup. + */ +extern void irq_radix_revmap_insert(struct irq_host *host, unsigned int virq, + irq_hw_number_t ...
The atomic doesn't need to be such. Could just be a global. In fact, I don't like your synchronization too much between the init and _insert. What I'd do is, turn your atomic into a simple int - do smp_wmb() and set it to 1 after the tree is initialized, then smb_wmb() again and set it to 2 after the tree has been filled by the init code - in the revmap_lookup path just test that it's >1, no need for a barrier. At worst you'll use the slow path instead of the fast path some time during boot, no big deal. - in the insert path, do an smp_rb() and if it's >0 do the insert with the lock - in the init pre-insert path, use the lock inside the loop for each insertion. that means you may get concurrent attempt to insert but the lock will help there and turn them into 2 insertions of the same translation. Is that a big deal ? If it is, make it be a lookup+insert. --
Hi Benjamin, sorry for the (long) delay, just came back from vacation. Yep, it mainly relies on some (hidden) assumptions about stuff ordering No problem with that, radix_tree_insert() will return EEXIST. Thanks, --
The radix trees used by interrupt controllers for their irq reverse mapping
(currently only the XICS found on pSeries) have a complex locking scheme
dating back to before the advent of the lockless radix tree.
Take advantage of this and of the fact that the items of the tree are
pointers to a static array (irq_map) elements which can never go under us
to simplify the locking.
Concurrency between readers and writers is handled by the intrinsic
properties of the lockless radix tree. Concurrency between writers is handled
with a spinlock added to the irq_host structure.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
---
arch/powerpc/include/asm/irq.h | 1 +
arch/powerpc/kernel/irq.c | 74 ++++++----------------------------------
2 files changed, 12 insertions(+), 63 deletions(-)
diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 0a51376..72fd036 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -119,6 +119,7 @@ struct irq_host {
} linear;
struct radix_tree_root tree;
} revmap_data;
+ spinlock_t tree_lock;
struct irq_host_ops *ops;
void *host_data;
irq_hw_number_t inval_irq;
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index dc8663a..7a19103 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -439,8 +439,6 @@ void do_softirq(void)
static LIST_HEAD(irq_hosts);
static DEFINE_SPINLOCK(irq_big_lock);
-static DEFINE_PER_CPU(unsigned int, irq_radix_reader);
-static unsigned int irq_radix_writer;
static atomic_t revmap_trees_allocated = ATOMIC_INIT(0);
struct irq_map_entry irq_map[NR_IRQS];
static unsigned int irq_virq_count = NR_IRQS;
@@ -584,57 +582,6 @@ void irq_set_virq_count(unsigned int count)
irq_virq_count = count;
}
-/* radix tree not lockless safe ! we ...No need for a spinlock in the irq_host. Make it one global lock, it's not like scalability of irq_create_mapping() was a big deal and there's --
BTW. It would be good to try to turn the GFP_ATOMIC into GFP_KERNEL, maybe using a semaphore instead of a lock to protect insertion vs. initialisation. The old scheme was fine because if the atomic allocation failed, it could fallback to the linear search and try again on the next interrupt. Not anymore. Ben. --
a semaphore? are you meaning a mutex? If not, I fail to understand what you're Right, that's the problem with this new scheme and I'm still trying to find a way to handle memory allocation failures be it for GFP_ATOMIC or GFP_KERNEL. I could not think of anything simple so far and I'm open for suggestions. Thanks, Sebastien. --
GFP_KERNEL should not fail, it will just block no ? If it fails, it's probably catastrophic enough not to care. You can always fallback to linear lookup. I don't know if it's worth trying to fire off a new allocation attempt later, probably not. Ben. --
I've been pondering with this lately, but I think that adding a linear lookup fallback should be OK. Thanks, --
hrm... it used to never fail.. that may have changed. But it will definitely block and try very hard to push things out to make space, Well, the must be one in the case the tree isn't initialized yet, so if there's an allocation failure, you may "de-initialize" it or something... Or you can fallback if you don't find, as easy, probably Yup. Cheers, Ben. --
Right, but it still can fail - and you're right, we're in trouble already. There's nothing to 'de-initialize' here, or am I missing something? Thanks Ben, Sebastien. --
Thanks for doing that work ! Cheers, Ben. --
Will do that way. Thanks, Sebastien. --
