I understand that you use TLB mappings.
What I'm suggesting is to set ar.k3, very early on, to something which
makes accesses go through the __per_cpu copy in the main kernel image.
You could even set up a dummy TLB mapping during this early boot
period.
Otherwise it's just cleverness that is unique to IA64 and is going to
constantly run into issues like this. An alternative is to implement
your own sched_clock() et al. where you can adhere to whatever special
rules your platform may have.