Extend the this_cpu_ops infrastructure to support this_cpu_inc_return and
then use the new functionality in various subsystems. Provide an optimized x86
implementation for these operations.
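
As a rough userspace illustration of the intended semantics (not the kernel
implementation), the sketch below models per-CPU data as a plain array and
treats inc_return as add_return with a value of 1. The names
sketch_current_cpu, SKETCH_NR_CPUS and the sketch_* helpers are hypothetical
and exist only for this example; the real operations work on per-CPU
variables and must be safe against preemption and interrupts on the local CPU.

```c
/* Hypothetical stand-ins for this sketch only; not kernel definitions. */
#define SKETCH_NR_CPUS 4

/* Pretend "current CPU" index; the kernel derives this itself. */
static int sketch_current_cpu;

/*
 * Add val to the current CPU's slot and return the new value.
 * Assumption: in userspace this is a plain read-modify-write; it does not
 * model the preemption/interrupt safety the real operation provides.
 */
static long sketch_this_cpu_add_return(long *pcp, long val)
{
	long *slot = &pcp[sketch_current_cpu];

	*slot += val;
	return *slot;
}

/* inc_return expressed as add_return with 1. */
#define sketch_this_cpu_inc_return(pcp) sketch_this_cpu_add_return((pcp), 1)
```

The point of returning the new value from the same operation is that a caller
does not need a separate increment followed by a read, which could observe a
different CPU's counter if the task migrated in between. On x86 an
add-and-return of this kind can in principle be done with a single
segment-prefixed xadd instruction.
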
All of these patches depend on the initial patch being applied first.
Patches have been reviewed a couple of times and I would suggest that
they go through Tejun's percpu tree for merging.
--