Definitely, and it's very complex, so any more eyes on this are appreciated.
The problem is that offsets relative to %gs or %fs are limited by the
small memory model that is chosen: we cannot have an offset larger than
2GB. So we must have a linear address range and cannot use separate chunks
of memory. If we do not use the segment register, then we cannot do CPU
ops that are atomic with respect to interrupts.
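A minimal user-space sketch of that constraint, assuming the usual x86_64 encoding in which a %gs-relative access such as "movq %gs:disp, %rax" carries its offset as a sign-extended 32-bit displacement. The helper names fits_disp32 and percpu_linear_addr are hypothetical and only model the address arithmetic; the same offset is used on every CPU and only the %gs base differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can 'offset' be encoded as the sign-extended 32-bit
 * displacement of a %gs- or %fs-relative instruction? */
static bool fits_disp32(int64_t offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Linear address reached by a %gs-relative access with displacement
 * 'offset' on a CPU whose %gs base is 'gs_base'. The offset is the same
 * on every CPU; only gs_base differs per CPU. */
static uint64_t percpu_linear_addr(uint64_t gs_base, int64_t offset)
{
    assert(fits_disp32(offset));
    return gs_base + (uint64_t)offset; /* wraps modulo 2^64 */
}
```

Anything placed more than 2GB away from the %gs base cannot be reached with a single displacement, which is why the per-cpu data has to live in one linear range rather than in scattered chunks.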
Mike has done so, and then I had to tell him what I just told you.
Right, that is what cpu_alloc v2 did. It created a virtual mapping and
populated it on demand with 2MB PMD entries.
The relative-to-0 stuff comes in at the x86_64 level because we want to
unify pda and percpu accesses. pda accesses have been relative to 0, and in
particular the stack canary in glibc directly accesses the pda at a
certain offset. So we must be zero-based in order to preserve
compatibility with glibc.
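To make the fixed-offset requirement concrete: on x86_64 the stack-protector code that gcc emits reads the canary from %gs:40 in the kernel (user space uses %fs:0x28, where glibc keeps it). The struct below is a hypothetical layout, loosely modeled on the x86_64 pda and assuming an LP64 target; the only property it illustrates is that, with the area zero-based at the %gs base, the canary field must land at offset 40.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pda-like layout (LP64 assumed: 8-byte pointers and longs). */
struct pda_sketch {
    void *pcurrent;             /* offset 0  */
    unsigned long data_offset;  /* offset 8  */
    unsigned long kernelstack;  /* offset 16 */
    unsigned long oldrsp;       /* offset 24 */
    int irqcount;               /* offset 32 */
    unsigned int cpunumber;     /* offset 36 */
    unsigned long stack_canary; /* offset 40: read by generated code as %gs:40 */
};

/* If anything shifts the layout, compiler-generated canary loads would read
 * the wrong field, so check the offset at compile time. */
static_assert(offsetof(struct pda_sketch, stack_canary) == 40,
              "stack canary must live at %gs:40");
```

Because generated code hardcodes the displacement, the per-cpu area that replaces the pda has to start at the %gs base, which is what "zero-based" means here.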
Normal kernel memory is already mapped with 2MB TLB entries, so mapping the
percpu areas with 2MB TLB entries adds no overhead. So we do not need to
be that complicated.
What v2 did was allocate an area of n * MAX_VIRT_PER_CPU_SIZE (n being the
number of CPUs) in vmalloc space and then dynamically populate it in 2MB
segments as needed. The maximum size was 128MB or so.
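A rough user-space model of that scheme, assuming 2MB segments, a 128MB per-CPU reservation and a hypothetical 4-CPU system. The base address, NR_CPUS_SKETCH, and the cpu_area_* helpers are all illustrative, not the v2 patch's actual code; "populating" a segment here only sets a flag, where real code would allocate a 2MB page and install a PMD entry for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PMD_SIZE_SKETCH       (2UL << 20)   /* 2MB large page */
#define MAX_VIRT_PER_CPU_SIZE (128UL << 20) /* "128MB or so" */
#define NR_CPUS_SKETCH        4             /* hypothetical CPU count */
#define CHUNKS_PER_CPU        (MAX_VIRT_PER_CPU_SIZE / PMD_SIZE_SKETCH)

/* Hypothetical start of the reserved range in vmalloc space. */
static const uint64_t cpu_area_base = 0x100000000000ULL;

/* Which 2MB segments of each CPU's slot are backed by memory. */
static bool populated[NR_CPUS_SKETCH][CHUNKS_PER_CPU];

/* Each CPU owns one fixed-size slot of the linear range. */
static uint64_t cpu_area_start(unsigned int cpu)
{
    assert(cpu < NR_CPUS_SKETCH);
    return cpu_area_base + (uint64_t)cpu * MAX_VIRT_PER_CPU_SIZE;
}

/* Ensure the first 'size' bytes of every CPU's slot are backed.
 * Real code would allocate a 2MB page and set a PMD per new segment. */
static bool cpu_area_populate(size_t size)
{
    size_t chunks = (size + PMD_SIZE_SKETCH - 1) / PMD_SIZE_SKETCH;

    if (chunks > CHUNKS_PER_CPU)
        return false; /* exceeds the per-cpu virtual reservation */
    for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        for (size_t i = 0; i < chunks; i++)
            populated[cpu][i] = true;
    return true;
}

/* Number of 2MB segments currently backed for one CPU. */
static size_t cpu_area_backed_chunks(unsigned int cpu)
{
    size_t n = 0;

    assert(cpu < NR_CPUS_SKETCH);
    for (size_t i = 0; i < CHUNKS_PER_CPU; i++)
        n += populated[cpu][i];
    return n;
}
```

Since every slot has the same size and the same segments are populated on every CPU, an offset handed out once is valid in each CPU's slot.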
We could either do the same on i386 or use 4KB mappings (then we can
directly use the vmalloc functionality), but then there would be
additional TLB overhead.
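A back-of-the-envelope way to size that overhead: count how many page mappings, and therefore potential TLB entries, it takes to cover a per-cpu area with 4KB pages versus 2MB pages. SZ_4K, SZ_2M and mappings_needed are local definitions for this sketch.

```c
#include <assert.h>

#define SZ_4K (4UL << 10)
#define SZ_2M (2UL << 20)

/* Number of pages of size 'page_size' needed to map 'bytes' (rounded up).
 * Each one can occupy its own TLB entry when touched. */
static unsigned long mappings_needed(unsigned long bytes, unsigned long page_size)
{
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;
}
```

For example, a single 2MB segment needs 512 separate 4KB mappings but only one 2MB mapping, and a full 128MB slot needs 32768 versus 64.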
We have similar 2MB virtual mapping tricks for the virtual memmap.
Basically, we can copy those functions and customize them for the virtual
per-cpu areas (Mike is hopefully listening and reading the V2 patch...).