At least my tests so far show that it can be a full replacement, but
then I have only tested on x86_64 and IA64. It's likely much easier to go
for the full replacement rather than doing it in steps.
If we want dynamically sized, virtually mapped per cpu areas then we may
have issues on 32 bit platforms (limited virtual address space) and with
!MMU (no virtual mappings at all). So I would think that a fallback to a
statically sized version may be needed. On the other hand, !MMU and 32 bit
do not support a large number of processors, so we may be able to get away
on 32 bit with a small virtual memory area.
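
To put rough numbers on that last point, here is a back of the envelope
sketch. The helper names are hypothetical and do not come from any patch;
the idea is only that the virtual space to reserve is the number of
possible cpus times the maximum size a per cpu area is allowed to grow to.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: virtual address space to
 * reserve if each possible cpu gets an area that may grow up to
 * max_area_bytes.
 */
static uint64_t percpu_virt_reserve(unsigned int nr_cpus,
				    uint64_t max_area_bytes)
{
	/* Refuse inputs whose product would overflow 64 bits. */
	assert(nr_cpus == 0 || max_area_bytes <= UINT64_MAX / nr_cpus);
	return (uint64_t)nr_cpus * max_area_bytes;
}

/* Nonzero if the reservation fits into the given virtual space budget. */
static int percpu_reserve_fits(unsigned int nr_cpus,
			       uint64_t max_area_bytes,
			       uint64_t virt_budget_bytes)
{
	return percpu_virt_reserve(nr_cpus, max_area_bytes)
		<= virt_budget_bytes;
}
```

With assumed numbers (8 cpus, a 2MB cap per area) that is 16MB of virtual
space, which a 32 bit box could probably spare. The same cap with 4096 cpus
would need 8GB, which is why this only works out where the cpu count is
small.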