James Bottomley <James.Bottomley@HansenPartnership.com> writes:
Careful. There are two entities termed vector in this conversation.
There is the MSI-X vector, of which a device can have up to 4096.
There is the idt vector, of which there are 256 per cpu.
I believe assign_irq_vector on x86_64 and soon on x86_32 does this already.
The number that was being changed was the irq number for the
msi-x ``vectors'', from some random free irq number to roughly
bus(8 bits):device+function(8 bits):msix-vector(12 bits), so that we
could have a stable irq number for msi irqs.
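To make the layout concrete, here is a rough sketch of that packing in
plain C. The function and macro names are made up for illustration and
are not existing kernel interfaces; the field widths are the "roughly"
ones above, so treat this as one possible encoding, not the real one.

```c
#include <stdint.h>

/* Hypothetical field widths: bus(8) : device+function(8) : msix-vector(12) */
#define MSIX_VEC_BITS   12
#define DEVFN_BITS      8
#define MSIX_VEC_MASK   ((1u << MSIX_VEC_BITS) - 1)
#define DEVFN_MASK      ((1u << DEVFN_BITS) - 1)

/* Pack the three fields into one stable irq number (28 bits used). */
static inline uint32_t msi_irq_encode(uint8_t bus, uint8_t devfn, uint16_t vec)
{
	return ((uint32_t)bus << (DEVFN_BITS + MSIX_VEC_BITS)) |
	       ((uint32_t)devfn << MSIX_VEC_BITS) |
	       ((uint32_t)vec & MSIX_VEC_MASK);
}

/* Recover the individual fields from such an irq number. */
static inline uint8_t msi_irq_bus(uint32_t irq)
{
	return (uint8_t)(irq >> (DEVFN_BITS + MSIX_VEC_BITS));
}

static inline uint8_t msi_irq_devfn(uint32_t irq)
{
	return (uint8_t)((irq >> MSIX_VEC_BITS) & DEVFN_MASK);
}

static inline uint16_t msi_irq_vector(uint32_t irq)
{
	return (uint16_t)(irq & MSIX_VEC_MASK);
}
```

With this layout bus 0x12, devfn 0x34, vector 0x567 becomes irq 0x1234567,
and the three fields together already consume 28 bits.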
Once the pci domain is considered it is hard to claim we have enough bits.
I expect we need at least one pci domain per NUMA node, in
the general case.
The big motivation for killing NR_IRQS sized arrays comes from two directions:
msi-x, which allows up to 4096 irqs per device, with nic vendors starting
to produce cards with 256 queues; and large SGI systems that don't do
I/O and want to be supported with the same kernel build as smaller systems.
A kernel built to handle 4096*32 irqs, which is more or less reasonable if
the system is I/O heavy, carries a ridiculously sized array on smaller machines.
So a static irq_desc is out. And since with the combination of msi-x and hotplug
we cannot tell how many irq sources, and thus irq numbers, the machine is going
to have, we cannot reasonably even have a dynamic array sized at boot time. Further,
we also want to allocate the irq_desc entries in node-local memory on NUMA
machines for better performance. This means we need to dynamically allocate
irq_desc entries and have some lookup mechanism from irq# to irq_desc entry.
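As a rough user-space sketch of what "dynamically allocate and look up"
could mean: a small chained hash table keyed by irq number, with
descriptors allocated on first use. Everything here is hypothetical
(struct irq_desc_sketch, irq_to_desc_sketch, irq_desc_get_or_alloc), the
node field is only recorded, and calloc stands in for whatever node-local
allocator the kernel would use. A radix tree or similar would work as well;
the hash is just the shortest thing to show.

```c
#include <stdint.h>
#include <stdlib.h>

#define IRQ_HASH_BITS 8
#define IRQ_HASH_SIZE (1u << IRQ_HASH_BITS)

/* Hypothetical stand-in for the real irq_desc. */
struct irq_desc_sketch {
	uint32_t irq;                  /* sparse irq number */
	int node;                      /* NUMA node the entry was requested on */
	struct irq_desc_sketch *next;  /* hash chain */
};

static struct irq_desc_sketch *irq_hash[IRQ_HASH_SIZE];

/* Multiplicative hash; keep the top IRQ_HASH_BITS bits. */
static unsigned int irq_hash_fn(uint32_t irq)
{
	return (uint32_t)(irq * 2654435761u) >> (32 - IRQ_HASH_BITS);
}

/* Lookup from irq# to descriptor; NULL if none has been allocated. */
static struct irq_desc_sketch *irq_to_desc_sketch(uint32_t irq)
{
	struct irq_desc_sketch *d;

	for (d = irq_hash[irq_hash_fn(irq)]; d; d = d->next)
		if (d->irq == irq)
			return d;
	return NULL;
}

/* Return the existing descriptor, or allocate one on first use. */
static struct irq_desc_sketch *irq_desc_get_or_alloc(uint32_t irq, int node)
{
	struct irq_desc_sketch *d = irq_to_desc_sketch(irq);
	unsigned int h;

	if (d)
		return d;
	d = calloc(1, sizeof(*d));  /* assumption: node-local alloc goes here */
	if (!d)
		return NULL;
	d->irq = irq;
	d->node = node;
	h = irq_hash_fn(irq);
	d->next = irq_hash[h];
	irq_hash[h] = d;
	return d;
}
```

The point is only that memory use scales with the irqs actually present,
not with the largest irq number the kernel was built to handle.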
Once we have all of that, it becomes possible to look at assigning a static
irq number to each pci (bus:device:function:msi-x vector) tuple so the system
is more reproducible.
Eric