I'd be glad, too, if you could get some numbers. I did some benchmarking
a few weeks ago on x86_64 and found only a very minimal performance drop
if the calculation was simplified.
Note also that a smaller structure means that more page structs can be
covered by a given number of cachelines. Aligning the structure makes it
larger, so it may cause more cacheline misses.
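
To put rough numbers on the density argument, here is a minimal sketch.
The helper names are made up for illustration, and the sizes used in the
note below (64-byte cachelines, a 56-byte unpadded struct versus a
64-byte padded one) are assumptions, not measurements from any tree.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of whole structs that fit into nlines cachelines when the
 * structs are packed back to back in an array.
 */
static size_t structs_covered(size_t struct_size, size_t line_size,
			      size_t nlines)
{
	assert(struct_size > 0);
	return (nlines * line_size) / struct_size;
}

/*
 * Number of cachelines touched when walking the first nstructs entries
 * of such an array, assuming the array starts on a cacheline boundary.
 */
static size_t lines_touched(size_t struct_size, size_t line_size,
			    size_t nstructs)
{
	assert(line_size > 0);
	return (nstructs * struct_size + line_size - 1) / line_size;
}
```

With those assumed sizes, structs_covered(56, 64, 7) gives 8 while
structs_covered(64, 64, 7) gives 7, and walking 64 entries touches 56
lines unpadded against 64 lines padded. This only models a linear walk
over the array; it says nothing about single-entry lookups.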