We do. The case has been made numerous times that we need at least several megabytes of per-CPU memory in case someone creates gazillions of IP tunnels, etc.
Yes, most arches provide specialized registers for local per-CPU variable access. There are cases, though, in which you have to access another processor's per-CPU space.
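
To illustrate the local vs. remote distinction, here is a rough userspace sketch, not the actual kernel code. All names in it (struct percpu_area, this_cpu_base, switch_to_cpu, and so on) are made up for illustration, and a plain pointer stands in for the arch register that would hold the running CPU's base address.

```c
#include <stddef.h>

#define NR_CPUS 4   /* assumption: small fixed CPU count for the sketch */

/* Hypothetical per-CPU data; every CPU owns one private copy. */
struct percpu_area {
    unsigned long packets;
};

/* One area per CPU. A real kernel allocates these at boot. */
static struct percpu_area areas[NR_CPUS];

/* Stand-in for the arch register holding the running CPU's base. */
static struct percpu_area *this_cpu_base = &areas[0];

/* Simulate running on a given CPU by reloading the "register". */
static void switch_to_cpu(int cpu)
{
    this_cpu_base = &areas[cpu];
}

/* Local access: only the base register is needed, no CPU number. */
static unsigned long *this_cpu_packets(void)
{
    return &this_cpu_base->packets;
}

/* Remote access: must look up the other CPU's area explicitly. */
static unsigned long *per_cpu_packets(int cpu)
{
    return &areas[cpu].packets;
}

/* A typical reason for remote access: summing a counter over all CPUs. */
static unsigned long total_packets(void)
{
    unsigned long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += *per_cpu_packets(cpu);
    return sum;
}
```

The local path never needs to know which CPU it is on, which is what the specialized registers buy you. The remote path needs some per-CPU lookup table that the register alone cannot replace.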
The patches are reasonably small. The problem that Mike seems to have is early boot debugging.
--