Please check dyn_array support for x86
Thanks
Yinghai Lu
--
YH, you have not addressed any of my core concerns and this exceeds my review limit.
Unfortunately I don't feel like this is a productive process.

My core concerns are:
- You have not separated out and separately pushed the regression patch, so that we can
  fix the current rc release. Simply tuning NR_IRQS is all I feel comfortable with for
  fixing things in the post merge window period.
- The generic code has no business dealing with NR_IRQS sized arrays.
  Since we don't have a generic problem I don't see why we should have a generic dyn_array solution.
- The dyn_array infrastructure does not provide for per NUMA node allocation of
  irq_desc structures, limiting NUMA scalability.
- You appear to be papering over problems instead of digging in and actually fixing them.
YH, here is what I was suggesting when the topic of killing NR_IRQS came up a week or so
ago.
http://lkml.org/lkml/2008/7/10/439
http://lkml.org/lkml/2008/7/10/532

Which essentially boils down to:
- Removing NR_IRQS from the non-irq infrastructure code.
- Adding a config option for architectures that are not going to use an array.
- In the genirq code, having a lookup function that goes from irq number to irq_desc *.

The rest we should be able to handle in an arch dependent fashion.
When we are done we should be able to create a stable irq number for msi interrupts
that is something like: bus:dev:fun:vector_no which is 8+5+3+12=28 bits long.

Eric
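As a rough illustration of the 28-bit layout proposed above, the following C sketch packs bus, device, function and MSI-X source vector number into one value. Only the field widths (8+5+3+12) come from the message; the field order within the word and all macro and function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths taken from the message: bus 8, dev 5, fun 3, vector_no 12. */
#define MSI_VEC_BITS 12
#define MSI_FUN_BITS 3
#define MSI_DEV_BITS 5
#define MSI_BUS_BITS 8

/* Assumed order bus:dev:fun:vector_no, with vector_no in the low bits. */
#define MSI_FUN_SHIFT MSI_VEC_BITS
#define MSI_DEV_SHIFT (MSI_FUN_SHIFT + MSI_FUN_BITS)
#define MSI_BUS_SHIFT (MSI_DEV_SHIFT + MSI_DEV_BITS)

/* Pack the four fields into a 28-bit number (hypothetical helper). */
static inline uint32_t msi_irq_pack(uint32_t bus, uint32_t dev,
				    uint32_t fun, uint32_t vec)
{
	assert(bus < (1u << MSI_BUS_BITS));
	assert(dev < (1u << MSI_DEV_BITS));
	assert(fun < (1u << MSI_FUN_BITS));
	assert(vec < (1u << MSI_VEC_BITS));
	return (bus << MSI_BUS_SHIFT) | (dev << MSI_DEV_SHIFT) |
	       (fun << MSI_FUN_SHIFT) | vec;
}

/* Extract individual fields again. */
static inline uint32_t msi_irq_bus(uint32_t irq)
{
	return (irq >> MSI_BUS_SHIFT) & ((1u << MSI_BUS_BITS) - 1);
}

static inline uint32_t msi_irq_dev(uint32_t irq)
{
	return (irq >> MSI_DEV_SHIFT) & ((1u << MSI_DEV_BITS) - 1);
}

static inline uint32_t msi_irq_fun(uint32_t irq)
{
	return (irq >> MSI_FUN_SHIFT) & ((1u << MSI_FUN_BITS) - 1);
}

static inline uint32_t msi_irq_vec(uint32_t irq)
{
	return irq & ((1u << MSI_VEC_BITS) - 1);
}
```

Under this assumed layout, msi_irq_pack(1, 2, 3, 4) yields 0x113004, and the largest value of every field gives 0x0fffffff, which is exactly 28 bits.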
--
Are there some NR_IRQS arrays left over that are not put into PER_CPU?
I wonder if we could use dyn_array with some of them.
YH
--
Hi Eric,
Small nit: domain:bus:dev:fun:vector_no ... an SGI UV system can have potentially
512 domains (NODES), each having some # of busses.

Thanks,
Mike
--
besides
arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct
irq_pin_list), pin_map_size, 16, NULL);
arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(balance_irq_affinity,
sizeof(struct balance_irq_affinity), nr_irqs, PAGE_SIZE,
irq_affinity_init_work);
arch/x86/kernel/io_apic_32.c:DEFINE_DYN_ARRAY(irq_vector, sizeof(u8),
nr_irqs, PAGE_SIZE, irq_vector_init_work);
arch/x86/kernel/io_apic_64.c:DEFINE_DYN_ARRAY(irq_cfg, sizeof(struct
irq_cfg), nr_irqs, PAGE_SIZE, init_work);
arch/x86/kernel/io_apic_64.c:DEFINE_DYN_ARRAY(irq_2_pin, sizeof(struct
irq_pin_list), pin_map_size, sizeof(struct irq_pin_list), NULL);

kernel/sched.c:DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs,
per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned
long), NULL);

Do you plan to move irq_desc when irq_affinity is set to cpus on another node?
Using dyn_array is less intrusive at this point, and the dyn_array related
code is not big.
Just changing NR_IRQS to nr_irqs makes the patches bigger. Actually it is simple.

With acpi_madt probing, nr_irqs is much smaller, like 48 or 98. and
So we need one pointer array with that lookup function? What is the
pointer array index size?
Or use a list in that lookup function?

How about irq migration from one cpu to another with a different vector_no?
YH
--
Still not based on UART_NR. Although Alan said he would take a look at it
x86_32 has it set to 1024 so 512 is too small. I think your patch
which essentially restores the old behavior is the right way to go for
this merge window. I just want to carefully look at it and ensure we
are restoring the old heuristics. On a lot of large machines we wind

You have noticed how many of those arrays I have collapsed into irq_cfg
on x86_64. We can ultimately do the same on x86_32. The
tricky one is irq_2_pin. I believe the proper solution is to just
dynamically allocate entries and place a pointer in irq_cfg. Although

Not when irq_affinity is set. But rather allocate it on the
node where the device that generates the irq and the node where the
irq controller the irq goes through is located. Which is where we

I agree with your sentiment: if we can actually allocate the irqs on
demand instead of preallocating them based on worst case usage we
should use much less memory.

I figure that by keeping any type of nr_irqs around you are requiring
us to estimate the worst case number of irqs we need to deal with.

The challenge is that we have hot plug devices with MSI-X capabilities
on them. Just one of those could add 4K irqs (worst case). 256 or
so I have actually heard hardware guys talking about.

But even one msi vector on a pci card that doesn't have normal irqs could
Please read the articles I mentioned. My first approximation would
Again in the referenced articles is my old patch that turns kstat.irqs
inside out. Allowing us to handle that case with a normal percpu

Sorry, I was referring to the MSI-X source vector number which is a 12
bit index into an array of MSI-X vectors on the pci device, not the
vector we receive the irq at on the pci card.

Eric
--
So there will be irq_desc and irq_cfg lists?
I wonder if the helper to get irq_desc and irq_cfg for one irq_no could be a bottleneck?
PS: the cpumask_t domain in irq_cfg needs to be updated... it wastes 512 bytes
when NR_CPUS=4096.
We could change it to unsigned int. logical mode (flat, x2apic logical) it as mask

We need to compromise between flexibility and performance..., or say waste some
Good to know. So one cpu handles one card? Or do 16 cpus need to serve one
card? Or did they get a new cpu with NR_VECTORS at 32 bits?

so it is
The cpu is going to check those vectors in addition to the vectors in the IDT?
YH
--
Yes. Which is 1024 irq sources/gsis only 1/4 used so it will fit into 256 irqs.
On x86_64 we have removed the confusing and brittle irq compression
code. So to handle that many irqs we would need 1024 irqs.

Nah. We look up whatever we need in the 256 entry vector_irq table.
I expect we can do the container_of trick beyond that.

If the helper, which we should only see on the slow path, is a bottleneck
we can easily organize irq_desc into a tree structure. Ultimately
I think we want drivers to have a struct irq *irq pointer but we need

The thing is there is no good upper bound of how many irqs we can see
Yes. Currently for the current worst case it requires 16 cpus.
The biggest I have heard a card using at this point is 256 irqs.
A lot of the goal in those cards is so they can have 2 irqs per cpu.

Yes. But we can put all the arch specific code in irq_cfg, and put
No. The destination cpu and destination vector number are encoded in
the MSI message. Each MSI-X source ``vector'' has a different MSI message.

So on my wish list is to stably encode the MSI interrupt numbers. And
using a sparse irq address space I can. As it only takes 28 bits to hold
the complete bus + device + function + msi source [ 0-4095 ]

Eric
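One possible reading of the "256 entry table plus container_of trick" mentioned above, as a userspace C sketch. The table name vector_cfg, the wrapper struct irq_entry and both helper functions are hypothetical and exist only for this example; the real vector_irq table is per cpu and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the usual container_of idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define NR_VECTORS 256

/* Arch-specific part of an interrupt (fields are illustrative only). */
struct irq_cfg {
	unsigned char vector;
};

/* Hypothetical enclosing object that embeds the arch-specific part. */
struct irq_entry {
	unsigned int irq;
	struct irq_cfg cfg;
};

/* Hypothetical 256-entry table: vector -> embedded irq_cfg. */
static struct irq_cfg *vector_cfg[NR_VECTORS];

/* Record which irq_cfg is served by a given vector. */
static inline void bind_vector(struct irq_entry *e, unsigned char vector)
{
	e->cfg.vector = vector;
	vector_cfg[vector] = &e->cfg;
}

/* Fast path: one table lookup, then container_of to reach the outer object. */
static inline struct irq_entry *entry_from_vector(unsigned char vector)
{
	struct irq_cfg *cfg = vector_cfg[vector];

	return cfg ? container_of(cfg, struct irq_entry, cfg) : NULL;
}
```

The point of the idiom is that the interrupt path never searches by irq number: it indexes a small fixed table and recovers the enclosing structure by pointer arithmetic.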
--
Don't you need "domain" (node) in the bus:device:function:vector combination?
(Or [hack] use a lot bigger field for bus with the node encoded into it.)

Thanks,
Mike
--
You definitely need domain, and that blows the 32-bit limit quite out of
the water.

-hpa
--
Yes. Although when I dreamed it up domain wasn't more than a twinkle in
someone's eye on x86. I'm not certain it is much more than that now.

The interesting implication of this is that if you have the right hardware
and are absolutely loopy you can have more interrupt sources than can
be described in a 32bit unsigned int, and certainly more than any sane person
would allocate in a statically sized array.

Eric
--
Yes, I'm quite convinced that the statically sized array is a bad idea.
-hpa
--
How about ioapic interrupt numbers...? Should they stay with the same
numbering as gsi?

And how about pci segments: that will need another 4 bits for AMD
systems, aka 16 segments. You will run out of 32 bits...
BTW:
kstat_irqs patch is there.
How is the progress with the irq_cfg/irq_desc dynamic allocation patch?

YH
--
I also see little value in stably encoding IRQ numbers using
geographical identifiers. It seems that the only case where you care
that an interrupt number is stable is when it is *not* tied to a
geographically addressed entity, so why does it matter?

-hpa
--
In the case of msi it is minor. In the case of GSIs from ACPI it dramatically
simplified the code, and improved its reliability. Because then everyone including
ACPI was always using the same.

So in general principle I think we should have stable irq numbers if we can. Which
allows someone to say I have a problem with irq X. And it will always be irq X on
their box. An extra level of indirection makes debugging more difficult.

Having a human readable name like: eth0irq22 or hbairq5 is likely just
as good in the case of msi. Still all of the user interfaces today take numbers.
So we are stuck with dealing with numbers for a long time to come.

Eric
--
Long sparse numbers are messy, too, though. It might be interesting to
have a routine somewhere like "irq_name()" to output a human-readable
IRQ name, which in case of MSI-X could contain the PCI device name.

-hpa
--
Yes. I want the option of using those bits. It might not be smart to
use them to encode a physical location and the irq number but just
having the option would be nice.

Making /proc/interrupts useful without breaking user space is going to be
an interesting challenge one of these days.

Eric
--
Urk! First of all, there isn't enough space as we have already proven
(on the machines where it actually matters there just aren't enough
bits), but doing this kind of stuff *optionally* is going to hurt even
worse.

Furthermore, this crap will break anyway the *next* time someone comes
up with a new clever way to do interrupts -- and to truly get stable
identifiers, we can't treat HyperTransport MSI as APICs anymore, yadda,

If changing to non-numbers in /proc/interrupts will break userspace,
then userspace will have to deal with a numeric token in
/proc/interrupts which will have to be looked up elsewhere (perhaps in a
sysfs directory) to get a more meaningful index.

-hpa
--
With respect to space we have shown: We create many more irq_desc
entries than we use in practice. Which hurts us when it comes to
space. Especially when compiling a single kernel for a wide range
of machines.

Which is why I ultimately want a list or a tree data structure holding
irq_desc entries instead of an array. Arrays must be statically
oversized, wasting space and reducing our flexibility in
dealing with irqs at run time.

Which says to me the low level architecture code that actually knows
at run time how many irqs there are should do the allocation of
irq_desc entries, allocating them on the appropriate NUMA node.

All of which should yield no fixed cap short of 32 bits for the irq
number at run time. Not having an arbitrarily low cap is what I mean
by having the option of a sparsely allocated irq number. If we have a
nice data structure that is a side effect that comes essentially for
free.

Except for upgrading the genirq code to pass things internally and to
the arch code in terms of irq_desc * entries. This should be very little

Yes. There are those kinds of issues. I don't think I have yet come up with
a usable stable mapping for msi interrupts. Just something close.

I expect what is most likely to work is after allocating the fixed irqs, to scan the
pci busses and for each pci device, if msi is supported, reserve 1 irq number.
If msi-X is supported reserve 4096 irq numbers. If ht-irqs are supported reserve
1 irq for each irq number. Hot plug slots that can ultimately have pci busses
plugged into them are going to be interesting. But I think if we make an
effort msi irq numbers will stop flapping in the breeze and are likely to
remain the same, and fit in the number of bits we have. While still not
requiring us to allocate storage for them. Potentially we can even treat
GSIs the same way. If we know that an ioapic line is simply not connected
we can reserve an irq number for it at boot but never allocate an irq_desc
structure for...
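The idea described above, reserving irq numbers without allocating storage and creating descriptors only on demand, can be sketched in userspace C roughly as follows. This is a minimal sketch under stated assumptions: the struct and all function names are hypothetical, a plain linked list stands in for the "list or tree", and there is no locking or NUMA placement.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for irq_desc; only what the sketch needs. */
struct sparse_irq_desc {
	unsigned int irq;
	unsigned int depth;
	struct sparse_irq_desc *next;
};

static struct sparse_irq_desc *sparse_irq_list;	/* allocated descriptors */
static unsigned int sparse_irq_next;		/* first unreserved number */

/*
 * Reserve 'count' consecutive irq numbers and return the first one.
 * No descriptor storage is allocated here.
 */
static inline unsigned int sparse_irq_reserve(unsigned int count)
{
	unsigned int first = sparse_irq_next;

	sparse_irq_next += count;
	return first;
}

/* Slow-path lookup: irq number -> descriptor, or NULL if never allocated. */
static inline struct sparse_irq_desc *sparse_irq_lookup(unsigned int irq)
{
	struct sparse_irq_desc *d;

	for (d = sparse_irq_list; d; d = d->next)
		if (d->irq == irq)
			return d;
	return NULL;
}

/* Return the descriptor for 'irq', allocating it on first use. */
static inline struct sparse_irq_desc *sparse_irq_get(unsigned int irq)
{
	struct sparse_irq_desc *d = sparse_irq_lookup(irq);

	if (d)
		return d;
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->irq = irq;
	d->depth = 1;		/* starts disabled, as in irq_desc */
	d->next = sparse_irq_list;
	sparse_irq_list = d;
	return d;
}
```

With 16 fixed irqs, one MSI device and one MSI-X device, the reservations would start at 0, 16 and 17, and the next free number would be 4113, yet memory is only used for the descriptors actually requested through sparse_irq_get().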
--
No, but the mapping from MSI-X vectors to {CPU, IDT} is arbitrary, as
the MSI(-X) address and data registers contain the target CPU and
destination vector, respectively. However, we may have to manage the
mappings directly, to re-use IDT entries and provide interrupt balancing.

-hpa
--
so using rcu to get irq_desc.
YH
--
You moved kstat_irqs into irq_desc, and it will not be NUMA-aware if
irq_desc does not go with every cpu.

YH
--
That part is a limitation of the per cpu allocator that the sgi guys
are in the process of fixing. Which is one of the following goals
of folding the pda into a per cpu structure.

In practice it matters little as irqs only occur on one cpu at a time,
so we shouldn't have cache line contention.

I never got to the arch specific part of allocating irq_desc in a numa
aware fashion. But I have always figured that if we move the work to
arch code it won't be too difficult, to do things appropriately.

Eric
--
Dhaval Giani got:
kernel BUG at arch/x86/kernel/io_apic_64.c:357!
invalid opcode: 0000 [1] SMP
CPU 24
...

his system (x3950) has 8 ioapic, irq > 256

caused by
commit 9b7dc567d03d74a1fbae84e88949b6a60d922d82
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Fri May 2 20:10:09 2008 +0200

x86: unify interrupt vector defines

The interrupt vector defines are copied 4 times around with minimal
differences. Move them all into asm-x86/irq_vectors.h

because 64-bit allows the same vector on different cpus to serve different irqs.
We need to create that array dynamically later.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
---
include/asm-x86/irq_vectors.h | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
Index: linux-2.6/include/asm-x86/irq_vectors.h
===================================================================
--- linux-2.6.orig/include/asm-x86/irq_vectors.h
+++ linux-2.6/include/asm-x86/irq_vectors.h
@@ -113,28 +113,26 @@

# if defined(CONFIG_X86_IO_APIC) || defined(CONFIG_PARAVIRT) || defined(CONFIG_X86_VISWS)
+#ifdef CONFIG_X86_64
+# define NR_IRQS (32 * NR_CPUS + 224)
+#else
# define NR_IRQS 224
-
-# if (224 >= 32 * NR_CPUS)
-# define NR_IRQ_VECTORS NR_IRQS
-# else
-# define NR_IRQ_VECTORS (32 * NR_CPUS)
-# endif
+#endif

# else /* IO_APIC || PARAVIRT */
# define NR_IRQS 16
-# define NR_IRQ_VECTORS NR_IRQS

# endif
#else /* !VISWS && !VOYAGER */
# define NR_IRQS 224
-# define NR_IRQ_VECTORS NR_IRQS

#endif /* VISWS */
+#define NR_IRQ_VECTORS NR_IRQS
+
/* Voyager specific defines */
/* These define the CPIs we use in linux */
#define VIC_CPI_LEVEL0 0
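
To give a feel for what the x86_64 formula in the patch above means for statically sized arrays, here is a small C sketch of the arithmetic. The helper names are made up for this example; only the expression 32 * NR_CPUS + 224 comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* NR_IRQS as defined for CONFIG_X86_64 in the patch: 32 * NR_CPUS + 224. */
static inline unsigned long nr_irqs_x86_64(unsigned long nr_cpus)
{
	return 32 * nr_cpus + 224;
}

/*
 * Bytes taken by one statically sized, NR_IRQS-indexed array
 * (hypothetical helper; elem_size is the size of one array element).
 */
static inline unsigned long nr_irqs_array_bytes(unsigned long nr_cpus,
						size_t elem_size)
{
	return nr_irqs_x86_64(nr_cpus) * elem_size;
}
```

For NR_CPUS=8 this gives 480 irqs. For NR_CPUS=4096 it gives 131296 irqs, so an array of unsigned int counters indexed by irq (4 bytes per element assumed) would occupy 525184 bytes, and a per-cpu array of that kind costs that much for every cpu.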
--
add DEFINE_DYN_ARRAY for dynamic array support
v2: other platforms will have nr_irqs = NR_IRQS
for MAXSMP/UV: could set smaller nr_irqs in acpi_madt_oem_check in genx2_apic_uv_x
v3: separate DYN_ARRAY and enabling it for x86_64 into following patches

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
arch/x86/kernel/io_apic_32.c | 26 +++++++++++++------------
arch/x86/kernel/io_apic_64.c | 33 ++++++++++++++++----------------
arch/x86/kernel/irq_32.c | 8 +++----
arch/x86/kernel/irq_64.c | 8 +++----
arch/x86/kernel/irqinit_32.c | 2 -
arch/x86/kernel/irqinit_64.c | 2 -
drivers/char/hpet.c | 2 -
drivers/char/random.c | 4 +--
drivers/char/vr41xx_giu.c | 2 -
drivers/net/3c59x.c | 4 +--
drivers/net/hamradio/baycom_ser_fdx.c | 4 +--
drivers/net/hamradio/scc.c | 6 ++---
drivers/net/wan/sbni.c | 2 -
drivers/pci/intr_remapping.c | 16 +++++++--------
drivers/pcmcia/at91_cf.c | 2 -
drivers/pcmcia/vrc4171_card.c | 2 -
drivers/rtc/rtc-vr41xx.c | 4 +--
drivers/scsi/aha152x.c | 2 -
drivers/serial/8250.c | 4 +--
drivers/serial/amba-pl010.c | 2 -
drivers/serial/amba-pl011.c | 2 -
drivers/serial/cpm_uart/cpm_uart_core.c | 2 -
drivers/serial/m32r_sio.c | 4 +--
drivers/serial/serial_core.c | 2 -
drivers/serial/serial_lh7a40x.c | 2 -
drivers/serial/sh-sci.c | 2 -
drivers/serial/ucc_uart.c | 2 -
drivers/xen/events.c | 12 +++++------
fs/proc/proc_misc.c | 10 ++++-----
include/asm-x86/irq.h | 3 ++
include/linux/irq.h | 2 +
kernel/irq/autoprobe.c ...
--
so we can put some crazy big arrays in bootmem at init stage.
use CONFIG_HAVE_DYN_ARRAY to enable it or not

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
include/asm-generic/vmlinux.lds.h | 7 +++++++
include/linux/init.h | 23 +++++++++++++++++++++++
init/main.c | 25 +++++++++++++++++++++++++
3 files changed, 55 insertions(+)

Index: linux-2.6/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.orig/include/asm-generic/vmlinux.lds.h
+++ linux-2.6/include/asm-generic/vmlinux.lds.h
@@ -214,6 +214,13 @@
* All archs are supposed to use RO_DATA() */
#define RODATA RO_DATA(4096)

+#define DYN_ARRAY_INIT(align) \
+ . = ALIGN((align)); \
+ .dyn_array.init : AT(ADDR(.dyn_array.init) - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__dyn_array_start) = .; \
+ *(.dyn_array.init) \
+ VMLINUX_SYMBOL(__dyn_array_end) = .; \
+ }
#define SECURITY_INIT \
.security_initcall.init : AT(ADDR(.security_initcall.init) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__security_initcall_start) = .; \
Index: linux-2.6/include/linux/init.h
===================================================================
--- linux-2.6.orig/include/linux/init.h
+++ linux-2.6/include/linux/init.h
@@ -249,6 +249,29 @@ struct obs_kernel_param {

/* Relies on boot_command_line being set */
void __init parse_early_param(void);
+
+struct dyn_array {
+ void **name;
+ unsigned long size;
+ unsigned int *nr;
+ unsigned long align;
+ void (*init_work)(void *);
+};
+extern struct dyn_array *__dyn_array_start[], *__dyn_array_end[];
+
+#define DEFINE_DYN_ARRAY(nameX, sizeX, nrX, alignX, init_workX) \
+ static struct dyn_array __dyn_array_##nameX __initdata = \
+ { .name = (void **)&nameX,\
+ .size = sizeX,\
+ .nr = &nrX,\
+ .align = alignX,\
+ .init_work = init_workX,\
+ }; \
+ static struct dyn_array *__dyn_array_ptr_##...
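
The descriptor-table idea behind the patch above can be tried out in userspace with a sketch like the following. The struct dyn_array fields mirror the patch; everything else is an assumption for illustration: the table is an ordinary C array instead of a linker section, aligned_alloc() stands in for the bootmem allocation, and struct irq_stub, irq_stubs and dyn_array_alloc_all() are made-up names. The allocation loop in init/main.c is not shown in this thread, so the loop here is a guess at its general shape, not a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same fields as struct dyn_array in the patch. */
struct dyn_array {
	void **name;			/* address of the array pointer */
	size_t size;			/* size of one element */
	unsigned int *nr;		/* address of the run-time count */
	size_t align;			/* alignment, power of two */
	void (*init_work)(void *);	/* optional hook, gets the descriptor */
};

/* Hypothetical element type, sized by a run-time nr_irqs. */
struct irq_stub {
	unsigned int depth;
};

static unsigned int nr_irqs = 16;
static struct irq_stub *irq_stubs;

/* Init hook in the style of init_work() from the irq_desc patch. */
static void irq_stub_init(void *data)
{
	struct dyn_array *da = data;
	struct irq_stub *stub = *da->name;
	unsigned int i;

	for (i = 0; i < *da->nr; i++)
		stub[i].depth = 1;
}

static struct dyn_array irq_stub_da = {
	.name = (void **)&irq_stubs,	/* same cast as DEFINE_DYN_ARRAY */
	.size = sizeof(struct irq_stub),
	.nr = &nr_irqs,
	.align = 64,
	.init_work = irq_stub_init,
};

/* Stand-in for the pointers collected in the .dyn_array.init section. */
static struct dyn_array *dyn_array_table[] = { &irq_stub_da };

/* Allocate every registered array once the element counts are known. */
static int dyn_array_alloc_all(struct dyn_array **table, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		struct dyn_array *da = table[i];
		size_t bytes = da->size * *da->nr;
		/* aligned_alloc() wants a size that is a multiple of align */
		size_t rounded = (bytes + da->align - 1) & ~(da->align - 1);
		void *mem;

		assert(da->align && (da->align & (da->align - 1)) == 0);
		mem = aligned_alloc(da->align, rounded);
		if (!mem)
			return -1;
		memset(mem, 0, rounded);
		*da->name = mem;
		if (da->init_work)
			da->init_work(da);
	}
	return 0;
}
```

A caller would set nr_irqs from whatever it probed at run time and then call dyn_array_alloc_all(dyn_array_table, 1) once; after that irq_stubs can be indexed like the static array it replaces.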
--
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
drivers/char/random.c | 6 ++++++
1 file changed, 6 insertions(+)

Index: linux-2.6/drivers/char/random.c
===================================================================
--- linux-2.6.orig/drivers/char/random.c
+++ linux-2.6/drivers/char/random.c
@@ -558,7 +558,13 @@ struct timer_rand_state {
};

static struct timer_rand_state input_timer_state;
+
+#ifdef CONFIG_HAVE_DYN_ARRAY
+static struct timer_rand_state **irq_timer_state;
+DEFINE_DYN_ARRAY(irq_timer_state, sizeof(struct timer_rand_state *), nr_irqs, PAGE_SIZE, NULL);
+#else
static struct timer_rand_state *irq_timer_state[NR_IRQS];
+#endif

/*
* This function adds entropy to the entropy "pool" by using timing
--
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
drivers/pci/intr_remapping.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/intr_remapping.c
===================================================================
--- linux-2.6.orig/drivers/pci/intr_remapping.c
+++ linux-2.6/drivers/pci/intr_remapping.c
@@ -11,12 +11,14 @@ static struct ioapic_scope ir_ioapic[MAX
static int ir_ioapic_num;
int intr_remapping_enabled;

-static struct {
+static struct irq_2_iommu {
struct intel_iommu *iommu;
u16 irte_index;
u16 sub_handle;
u8 irte_mask;
-} irq_2_iommu[NR_IRQS];
+} *irq_2_iommu;
+
+DEFINE_DYN_ARRAY(irq_2_iommu, sizeof(struct irq_2_iommu), nr_irqs, PAGE_SIZE, NULL);

static DEFINE_SPINLOCK(irq_2_ir_lock);
--
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
include/linux/irq.h | 4 ++++
kernel/irq/handle.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)

Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -181,7 +181,11 @@ struct irq_desc {
const char *name;
} ____cacheline_internodealigned_in_smp;

+#ifdef CONFIG_HAVE_DYN_ARRAY
+extern struct irq_desc *irq_desc;
+#else
extern struct irq_desc irq_desc[NR_IRQS];
+#endif

/*
* Migration helpers for obsolete names, they will go away:
Index: linux-2.6/kernel/irq/handle.c
===================================================================
--- linux-2.6.orig/kernel/irq/handle.c
+++ linux-2.6/kernel/irq/handle.c
@@ -48,6 +48,36 @@ handle_bad_irq(unsigned int irq, struct
* Controller mappings for all interrupt sources:
*/
int nr_irqs = NR_IRQS;
+
+#ifdef CONFIG_HAVE_DYN_ARRAY
+static struct irq_desc irq_desc_init = {
+ .status = IRQ_DISABLED,
+ .chip = &no_irq_chip,
+ .handle_irq = handle_bad_irq,
+ .depth = 1,
+ .lock = __SPIN_LOCK_UNLOCKED(irq_desc->lock),
+#ifdef CONFIG_SMP
+ .affinity = CPU_MASK_ALL
+#endif
+};
+
+static void __init init_work(void *data)
+{
+ struct dyn_array *da = data;
+ int i;
+ struct irq_desc *desc;
+
+ desc = *da->name;
+
+ for (i = 0; i < *da->nr; i++)
+ memcpy(&desc[i], &irq_desc_init, sizeof(struct irq_desc));
+}
+
+struct irq_desc *irq_desc;
+DEFINE_DYN_ARRAY(irq_desc, sizeof(struct irq_desc), nr_irqs, PAGE_SIZE, init_work);
+
+#else
+
struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {
[0 ... NR_IRQS-1] = {
.status = IRQ_DISABLED,
@@ -60,6 +90,7 @@ struct irq_desc irq_desc[NR_IRQS] __cach
#endif
}
};
+#endif

/*
* What should we do if we get a hw irq event on an illegal vector?
--
set nr_irqs according to nr_cpu_ids, so we get a small footprint when using
a big kernel.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
arch/Kconfig | 2 ++
arch/x86/Kconfig | 1 +
arch/x86/kernel/io_apic_64.c | 28 +++++++++++++++++++++-------
arch/x86/kernel/setup.c | 6 ++++++
arch/x86/kernel/vmlinux_64.lds.S | 3 +++
5 files changed, 33 insertions(+), 7 deletions(-)

Index: linux-2.6/arch/Kconfig
===================================================================
--- linux-2.6.orig/arch/Kconfig
+++ linux-2.6/arch/Kconfig
@@ -103,3 +103,5 @@ config HAVE_CLK
The <linux/clk.h> calls support software clock gating and
thus are a key power management tool on many systems.

+config HAVE_DYN_ARRAY
+ def_bool n
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -33,6 +33,7 @@ config X86
select HAVE_ARCH_TRACEHOOK
select HAVE_GENERIC_DMA_COHERENT if X86_32
select HAVE_EFFICIENT_UNALIGNED_ACCESS
+ select HAVE_DYN_ARRAY if X86_64

config ARCH_DEFCONFIG
string
Index: linux-2.6/arch/x86/kernel/io_apic_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/io_apic_64.c
+++ linux-2.6/arch/x86/kernel/io_apic_64.c
@@ -66,7 +66,7 @@ struct irq_cfg {
};

/* irq_cfg is indexed by the sum of all RTEs in all I/O APICs. */
-static struct irq_cfg irq_cfg[NR_IRQS] __read_mostly = {
+static struct irq_cfg irq_cfg_legacy[] __initdata = {
[0] = { .domain = CPU_MASK_ALL, .vector = IRQ0_VECTOR, },
[1] = { .domain = CPU_MASK_ALL, .vector = IRQ1_VECTOR, },
[2] = { .domain = CPU_MASK_ALL, .vector = IRQ2_VECTOR, },
@@ -85,6 +85,17 @@ static struct irq_cfg irq_cfg[NR_IRQS] _
[15] = { .domain = CPU_MASK_ALL, .vector = IRQ15_VECTOR, },
};

+static struct irq_cfg *irq_cfg;
+
+static void __init init_work(...
--
replace
[PATCH] serial: change irq_lists to use dyn_array
use a small array with an index to handle irq locking for serial ports
hope 32 slots is enough

v2: according to Eric, move irq_no into irq_info, and do not clean irq_no
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
drivers/serial/8250.c | 45 +++++++++++++++++++++++++++++++++++++++++----
1 file changed, 41 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/serial/8250.c
===================================================================
--- linux-2.6.orig/drivers/serial/8250.c
+++ linux-2.6/drivers/serial/8250.c
@@ -147,9 +147,39 @@ struct uart_8250_port {
struct irq_info {
spinlock_t lock;
struct list_head *head;
+ int irq_no;
};

-static struct irq_info irq_lists[NR_IRQS];
+#define NR_IRQ_INFO 32
+
+static struct irq_info irq_lists[NR_IRQ_INFO] = {
+ [0 ... NR_IRQ_INFO-1] = {
+ .irq_no = -1,
+ }
+};
+
+static struct irq_info *get_irq_info(int irq, int with_free)
+{
+ int i, first_free = -1;
+
+ for (i = 0; i < NR_IRQ_INFO; i++) {
+ if (irq_lists[i].irq_no == irq)
+ return &irq_lists[i];
+ if (irq_lists[i].irq_no == -1 && first_free == -1)
+ first_free = i;
+ }
+ if (!with_free)
+ return NULL;
+
+ if (first_free != -1) {
+ irq_lists[first_free].irq_no = irq;
+ return &irq_lists[first_free];
+ }
+
+ WARN_ON("NR_IRQ_INFO too small");
+
+ return NULL;
+}

/*
* Here we define the default xmit fifo size used for each type of UART.
@@ -1541,9 +1571,12 @@ static void serial_do_unlink(struct irq_

static int serial_link_irq_chain(struct uart_8250_port *up)
{
- struct irq_info *i = irq_lists + up->port.irq;
+ struct irq_info *i = get_irq_info(up->port.irq, 1);
int ret, irq_flags = up->port.flags & UPF_SHARE_IRQ ? IRQF_SHARED : 0;

+ if (!i)
+ return -1;
+
spin_lock_irq(&i->lock);

if (i->head) {
@@ -1567,7 +1600,11 @@ static int serial_link_irq_chain(struct

static void serial_unlink_irq_chain(struc...
--
so arrays in per_cpu can be allocated dynamically too

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
arch/x86/kernel/setup_percpu.c | 7 +++-
include/asm-generic/vmlinux.lds.h | 6 ++++
include/linux/init.h | 27 ++++++++++++++++--
init/main.c | 57 ++++++++++++++++++++++++++++++++++++++
4 files changed, 92 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup_percpu.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup_percpu.c
+++ linux-2.6/arch/x86/kernel/setup_percpu.c
@@ -140,7 +140,7 @@ static void __init setup_cpu_pda_map(voi
*/
void __init setup_per_cpu_areas(void)
{
- ssize_t size = PERCPU_ENOUGH_ROOM;
+ ssize_t size, old_size;
char *ptr;
int cpu;

@@ -148,7 +148,8 @@ void __init setup_per_cpu_areas(void)
setup_cpu_pda_map();

/* Copy section for each CPU (we discard the original) */
- size = PERCPU_ENOUGH_ROOM;
+ old_size = PERCPU_ENOUGH_ROOM;
+ size = old_size + per_cpu_dyn_array_size();
printk(KERN_INFO "PERCPU: Allocating %zd bytes of per cpu data\n",
size);

@@ -176,6 +177,8 @@ void __init setup_per_cpu_areas(void)
per_cpu_offset(cpu) = ptr - __per_cpu_start;
memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);

+ per_cpu_alloc_dyn_array(cpu, ptr + old_size);
+
}

printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d, nr_node_ids %d\n",
Index: linux-2.6/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.orig/include/asm-generic/vmlinux.lds.h
+++ linux-2.6/include/asm-generic/vmlinux.lds.h
@@ -220,6 +220,12 @@
VMLINUX_SYMBOL(__dyn_array_start) = .; \
*(.dyn_array.init) \
VMLINUX_SYMBOL(__dyn_array_end) = .; \
+ } \
+ . = ALIGN((align)); \
+ .per_cpu_dyn_array.init : AT(ADDR(.per_cpu_dyn_array.init) - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__per_cpu_dyn_array_start) = .; \
+ *(....
--
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
include/linux/kernel_stat.h | 4 ++++
kernel/sched.c | 5 ++++-
2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/kernel_stat.h
===================================================================
--- linux-2.6.orig/include/linux/kernel_stat.h
+++ linux-2.6/include/linux/kernel_stat.h
@@ -28,7 +28,11 @@ struct cpu_usage_stat {

struct kernel_stat {
struct cpu_usage_stat cpustat;
+#ifdef CONFIG_HAVE_DYN_ARRAY
+ unsigned int *irqs;
+#else
unsigned int irqs[NR_IRQS];
+#endif
};

DECLARE_PER_CPU(struct kernel_stat, kstat);
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4021,9 +4021,12 @@ static inline void idle_balance(int cpu,
#endif

DEFINE_PER_CPU(struct kernel_stat, kstat);
-
EXPORT_PER_CPU_SYMBOL(kstat);

+#ifdef CONFIG_HAVE_DYN_ARRAY
+DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs, per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned long), NULL);
+#endif
+
/*
* Return p->sum_exec_runtime plus any more ns on the sched_clock
* that have not yet been banked in case the task is currently running.
--
