Subject: [PATCH] x86_64: Dynamically allocate arch specific system vectors
From: Alan Mayer <ajm@sgi.com>
On some systems (e. g., UV) it is necessary to use an interrupt vector
as a "system" vector, that is, it is generated by system hardware, not an
IO device. This patch dynamically allocates them from the pool of interrupt
vectors below the fixed system vectors. This may include stealing some from
the device interrupt vector pool, so they are allocated dynamically so that
other archs don't have to pay the price. In UV, examples of these hardware
and software systems that need dynamically allocated vectors are the GRU,
the BAU, and XPM/XPC.
Signed-off-by: Alan Mayer <ajm@sgi.com>
Reviewed by: Robin Holt <holt@sgi.com> Dean Nelson <dcn@sgi.com> Cliff
Wickman <cpw@sgi.com>
---
Index: temp/arch/x86/kernel/io_apic_64.c
===================================================================
--- temp.orig/arch/x86/kernel/io_apic_64.c 2008-07-31 11:53:58.000000000
-0500
+++ temp/arch/x86/kernel/io_apic_64.c 2008-07-31 11:54:27.000000000 -0500
@@ -85,10 +85,6 @@
static int assign_irq_vector(int irq, cpumask_t mask);
-int first_system_vector = 0xfe;
-
-char system_vectors[NR_VECTORS] = { [0 ... NR_VECTORS-1] =
SYS_VECTOR_FREE};
-
#define __apicdebuginit __init
int sis_apic_bug; /* not actually supported, dummy for compile */
@@ -783,7 +779,8 @@
* Also, we've got to be careful not to trash gate
* 0x80, because int 0x80 is hm, kind of importantish. ;)
*/
- static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0;
+ static int current_vector = FIRST_DYNAMIC_VECTOR;
+ static int current_offset = 0;
unsigned int old_vector;
int cpu;
struct irq_cfg *cfg;
@@ -817,10 +814,10 @@
offset = current_offset;
next:
vector += 8;
- if (vector >= first_system_vector) {
+ if (vector > last_dynamic_device_vector) {
/* If we run out of vectors on large boxen, must share them. */
offset = (offset + 1) % ...Could you please explain words why the vector allocator does not work for you? This code at first glance looks a lot like duplicating the vector allocator. Eric --
I assume you mean create_irq() and destroy_irq(). They are close to what we need. However for any given system use: - we need to request a high priority vector for some irq's, rather than one randomly allocated as per __assign_irq_vector(). - we want the irq/vector to be targeted to all cpu's (as specified in a mask, and can include currently offline cpu's) rather than a single cpu. I suppose those abilities could be added to create_irq(), but we didn't want to intrude into that interface. A smaller consideration is simplicity of use. We want any such user to use the generic do_IRQ() flow (not alloc_intr_gate()). But make it easy to set up the irq/vector, irq_chip and irq_desc without getting intimate with the details. I suppose some other wrapper for an enhanced create_irq() could be done. We are going to need such irq/vector pairs for a couple of UV drivers (drivers/misc/sgi-gru/ and sgi-xp/). And would prefer it for the UV TLB shootdown (x86/kernel/tlb.uv.c) rather than using alloc_intr_gate(). -Cliff -- Cliff Wickman Silicon Graphics, Inc. cpw@sgi.com (651) 683-3824 --
Well I was actually thinking mostly about reusing assign_irq_vector. The high priority vector seems to be the place where we would need My primary concern is that you are generalizing an interface that is designed for alloc_intr_gate consumers. In particular for people who wish to allocate vectors that can be triggered by software to do this. If this is normal irq handling I would really prefer to use the normal data structures that we use for allocating vectors to irqs, I also want to ask why do you need an irq that fires on every cpu? That concept seems to be responsible for one of the nastiest data structures in the kernel. Where we track how many times any irq has occurred on any cpu. Look at the percpu variable kernel_stat. It seems to be one of the nastiest variables I know of. Of size: NR_CPUS*NR_CPUS*32 Assuming you have a machine with a reasonable number of irqs per cpu. Eric --
Okay, I think we have it now. assign_irq_vector *almost* does what we need. One minor thing is that assign_irq_vector ANDs against cpu_online_map. We would need cpu_possible_map, so we get the vector on offline cpus that may come online. The other thing is that assign_irq_vector doesn't allow the specification of interrupt priorities. It would need to be modified to handle returning either a high priority vector or a low priority vector. Would modifying the api for assign_irq_vector be the proper approach? The interrupts don't necessarily fire on all cpus, it's just that they *can* fire on any cpu. For example, the GRU triggers an interrupt (it is very IPI'ish) to a particular cpu in the event of a GRU TLB fault. That cpu handles the fault and returns. But the fault can happen on any cpu, so all cpus need to be registered for the same vector and irq. This is probably splitting hairs; it is certainly no different in principal from timer interrupts or processor TLB faults. As far as kernel_stat is concerned. I see you're point. NR_CPUS on our machines is going to be big (4K? 8K? something like that). NR_IRQS is also going to big because of that. It's unfortunate since the actual number of interrupt sources is going to be an order of magnitude smaller, at least. --ajm -- Make me an angel That flies from Montgomery. -- Alan J. Mayer SGI ajm@sgi.com WORK: 651-683-3131 HOME: 651-407-0134 -- --
I don't know if it makes sense to modify assign_irq_vector or to have a companion function that uses the same data structures. I think I would work on the companion function and if the code Reasonable. As long as you don't need to read a status register to figure out what to do that sounds reasonable. This does sound very much like splitting hairs on a very platform specific capability. If we can generalize the mechanism to things like per cpu timer interrupts and such so that we reduced the total amount of code we The number of interrupts sources is going to be smaller only because SGI machines have or at least appear to have poor I/O compared to most of the rest of machines in existence. NR_CPUS*16 is a fairly reasonable estimate on most machines in existence. In the short term it is going to get worse in the presence of MSI-X. I was talking to a developer at Intel last week about 256 irqs for one card. I keep having dreams about finding a way to just keep stats for a few cpus but alas I don't think that is going to happen. Silly us. Eric --
Okay, If I understand you, here's what we can do. We currently have this function that does pretty much what the combination of create_irq() and __assign_irq_vector() do. We can accomplish the same thing that our routine does using create_irq() and __assign_irq_vector() do if we make the following changes: __assign_irq_vector(int irq, cpumask_t mask) ==> __assign_irq_vector(int irq, cpumask_t mask, int priority); priority has three values: priority_none, priority_low, priority_high priority_none means do everything the way it is done now. priority_low means do everything the way its is done now, except use cpu_possible_map rather than cpu_online_map. priority_high means search the interrupt vectors from the top down, rather than from the bottom up and use cpu_possible_map rather than cpu_online_map. create_irq(void) ==> create_irq(int priority, cpumask_t *mask) priority_none, means do everything the way it is done now, passing in TARGET_CPUS as the mask, but also sending the priority arg. into __assign_irq_vector(). priority_low and priority_high means use create_irq()'s mask arg. as the mask passed to __assign_irq_vector). We would add an additional small routine on top of create_irq() to do any massaging of the irq_desc, etc. that we need for these system vectors. Is that what you were thinking about? -- It's getting to the point Where I'm no fun anymore. -- Alan J. Mayer SGI ajm@sgi.com WORK: 651-683-3131 HOME: 651-407-0134 -- --
Checking to insure that at least one of the cpus is online seems to be prudent, as well as what happens when the last online cpu in the group goes offline? Thanks, --
