There is one other possibility. Typically the swap code is using
compatibility disk I/O functions instead of the best the kernel
can offer. I haven't looked recently, but it might be worth
making certain that there isn't some low-level optimization or
cleanup possible on that path, although I may just be thinking
of swapfiles.
I know there were tremendous gains some time ago when I removed the
functions that wrote pages synchronously to swapfiles.
Eric