Unfortunately that doesn't actually work, because you can't have a reloc
that refers to two symbols.
In something like:
    mov %gs:per_cpu__foo - 12345, %rax
    mov %gs:per_cpu__foo, %rax
    mov %gs:per_cpu__foo - 12345(%rip), %rax
    mov %gs:per_cpu__foo(%rip), %rax
    mov %gs:per_cpu__foo - __per_cpu_start, %rax
    mov %gs:per_cpu__foo - __per_cpu_start(%rip), %rax
the last two lines will not assemble:
    t.S:5: Error: can't resolve `per_cpu__foo' {*UND* section} - `__per_cpu_start' {*UND* section}
    t.S:6: Error: can't resolve `per_cpu__foo' {*UND* section} - `__per_cpu_start' {*UND* section}
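To make the constraint concrete, here is a toy model (hypothetical; it is not how gas works internally, and the names Reloc and make_reloc are made up). As with an ELF Elf64_Rela entry, a relocation here carries at most one symbol plus a constant addend. per_cpu__foo - 12345 folds into one reloc, but per_cpu__foo - __per_cpu_start, with both symbols undefined, cannot. The model ignores the case where both symbols are defined in the same section, where the assembler can reduce the difference to a constant itself.

```python
from dataclasses import dataclass
from typing import Optional


# Toy model (hypothetical, not gas internals): like an ELF Elf64_Rela entry,
# a relocation can name at most one symbol, plus a constant addend.
@dataclass(frozen=True)
class Reloc:
    symbol: Optional[str]
    addend: int


def make_reloc(terms):
    """Fold (sign, operand) terms into a single Reloc.

    Ints are constants; strings are symbols that are undefined in this
    object file, so their values are unknown until link time.
    """
    symbol = None
    addend = 0
    for sign, operand in terms:
        if isinstance(operand, int):
            addend += sign * operand  # constants fold into the addend
        elif sign < 0:
            raise ValueError(
                f"can't resolve - {operand!r}: this model has no negated-symbol reloc")
        elif symbol is not None:
            raise ValueError(f"can't resolve {symbol!r} and {operand!r} in one reloc")
        else:
            symbol = operand  # the one symbol this reloc may use
    return Reloc(symbol, addend)


# mov %gs:per_cpu__foo - 12345, %rax  -> one symbol plus a constant: fine
print(make_reloc([(1, "per_cpu__foo"), (-1, 12345)]))

# mov %gs:per_cpu__foo - __per_cpu_start, %rax  -> two symbols: rejected
try:
    make_reloc([(1, "per_cpu__foo"), (-1, "__per_cpu_start")])
except ValueError as err:
    print("error:", err)
```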
The only way I can think of to fix this is to load the variable's address
into a temp register and apply the offset as a displacement:
    lea per_cpu__foo(%rip), %rax
    mov %gs:__per_cpu_offset(%rax), %rax
(where __per_cpu_offset is defined in the linker script as
-__per_cpu_start, so the access ends up at the %gs base plus
per_cpu__foo - __per_cpu_start).
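A rough arithmetic check that the lea sequence lands on the same address the two-symbol form would have used. The addresses are made up: a __per_cpu_start of 0xffffffff80600000, per_cpu__foo at offset 0x40 and a %gs base of 0xffff880001000000 are hypothetical, as are the helper names. It also assumes the per-cpu template is linked in the top 2GB (the kernel code model), so that -__per_cpu_start fits in the sign-extended 32-bit displacement of the mov.

```python
MASK = (1 << 64) - 1  # x86-64 address arithmetic wraps modulo 2**64


def disp32(value):
    """Return value if it survives sign-extension from a 32-bit displacement."""
    signed = value - (1 << 64) if value >> 63 else value
    if not -(1 << 31) <= signed < (1 << 31):
        raise ValueError(f"{value:#x} does not fit in a disp32")
    return value


def link_symbols(per_cpu_start, foo_offset):
    """Hypothetical link-time values for the symbols in the example."""
    return {
        "__per_cpu_start": per_cpu_start,
        "per_cpu__foo": per_cpu_start + foo_offset,
        # Linker-script definition __per_cpu_offset = -__per_cpu_start:
        # a single symbol, so it needs only one relocation.
        "__per_cpu_offset": -per_cpu_start & MASK,
    }


def workaround_address(syms, gs_base):
    # lea per_cpu__foo(%rip), %rax: rax = link-time address of the variable
    rax = syms["per_cpu__foo"]
    # mov %gs:__per_cpu_offset(%rax), %rax: address = gs base + disp32 + rax
    disp = disp32(syms["__per_cpu_offset"])
    return (gs_base + disp + rax) & MASK


def desired_address(syms, gs_base):
    # What %gs:per_cpu__foo - __per_cpu_start would address, if it assembled
    return (gs_base + syms["per_cpu__foo"] - syms["__per_cpu_start"]) & MASK


syms = link_symbols(0xffffffff80600000, 0x40)
gs_base = 0xffff880001000000
print(hex(workaround_address(syms, gs_base)), hex(desired_address(syms, gs_base)))
```

Both print the %gs base plus 0x40. If __per_cpu_start were linked outside the top 2GB, -__per_cpu_start would no longer fit the displacement and disp32 raises, which is the encoding limit the sequence relies on.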
This seems to be a general problem with zero-offset per-cpu, and it's
unfortunate, because being able to access per-cpu variables without a
scratch register is nice to have.
The other alternative - and I have no idea whether this is practical or
possible - is to define a complete set of pre-offset per_cpu symbols,
i.e. symbols whose values already have __per_cpu_start subtracted.
J