I think I have an idea how to do this. It's a bit x86_64 specific but here
it goes.
We define a virtual area of NR_CPUS * 2M, split into 2M areas that are each
mapped by a PMD. That means we have a fixed virtual address for each cpu's
per cpu area.
First cpu is at PER_CPU_START
Second cpu is at PER_CPU_START + 2M
So the per cpu area for cpu n is easily calculated using
PER_CPU_START + (n << 21)
without any lookups (2M is 1 << 21, and the shift needs the parentheses
because + binds tighter than << in C).
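
As a rough sketch of that calculation (not actual kernel code): the value
used for PER_CPU_START below is an arbitrary placeholder, and the names
PER_CPU_SHIFT, PER_CPU_AREA_SIZE and per_cpu_area_base() are made up for
illustration. The only point is that the base address is a shift and an
add.

```c
#include <stdint.h>

/* Placeholder base address, chosen arbitrarily for this sketch. */
#define PER_CPU_START     0xffffe20000000000ULL
/* 2M per cpu, i.e. one PMD-mapped page: 2M == 1 << 21. */
#define PER_CPU_SHIFT     21
#define PER_CPU_AREA_SIZE (1ULL << PER_CPU_SHIFT)

/* Hypothetical helper: fixed virtual base of the per cpu area of 'cpu'. */
static inline uint64_t per_cpu_area_base(unsigned int cpu)
{
	/* Widen before shifting so large cpu numbers do not overflow. */
	return PER_CPU_START + ((uint64_t)cpu << PER_CPU_SHIFT);
}
```

With these definitions cpu 0 lands at PER_CPU_START and cpu 1 at
PER_CPU_START + 2M, matching the layout above.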
On bootup we allocate the 2M pages.
After boot is complete we allow the reduction of the size of the per cpu
areas. Let's say we only need 128k per cpu. Then the remaining pages will
be returned to the page allocator.
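
To put numbers on that, here is a small sketch of the arithmetic, assuming
4k base pages. The macro and function names are hypothetical and this only
counts pages; it does not touch any mappings.

```c
#include <stddef.h>

#define PCPU_AREA_BYTES (2UL << 20)	/* one 2M area per cpu */
#define PCPU_BASE_PAGE  4096UL		/* assumed 4k base page size */

/* Base pages needed to hold 'needed' bytes, rounded up. */
static inline size_t pcpu_pages_used(size_t needed)
{
	return (needed + PCPU_BASE_PAGE - 1) / PCPU_BASE_PAGE;
}

/* Base pages per cpu that could go back to the page allocator. */
static inline size_t pcpu_pages_freeable(size_t needed)
{
	size_t total = PCPU_AREA_BYTES / PCPU_BASE_PAGE;	/* 512 */
	size_t used = pcpu_pages_used(needed);

	return used >= total ? 0 : total - used;
}
```

For the 128k example that is 32 pages kept and 480 pages (1920k) per cpu
that could be returned.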
We create some sysfs thingy where one can see the current reserves of per
cpu storage. If one wants to reduce memory use then one can write something
to that file to return the remainder of the memory.