Set your loopback MTU to some larger value if this result and
the locking overhead upset you.
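For reference, here is a minimal sketch of reading and raising an interface MTU from a program, assuming Linux. The ioctl numbers SIOCGIFMTU (0x8921) and SIOCSIFMTU (0x8922) come from <linux/sockios.h>, and the 40-byte struct ifreq layout assumes a 64-bit build. The helper names get_mtu and set_mtu are hypothetical. Raising the MTU needs CAP_NET_ADMIN, and `ip link set dev lo mtu <N>` does the same from a shell.

```python
import fcntl
import socket
import struct

# Linux-specific ioctl numbers from <linux/sockios.h> (assumption: Linux host).
SIOCGIFMTU = 0x8921
SIOCSIFMTU = 0x8922

# struct ifreq: 16-byte interface name, then a union whose first member
# we treat as int ifr_mtu; pad the whole thing out to 40 bytes.
IFREQ_FMT = "16si20x"


def get_mtu(ifname: str = "lo") -> int:
    """Hypothetical helper: return the current MTU of ifname."""
    req = struct.pack(IFREQ_FMT, ifname.encode(), 0)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        res = fcntl.ioctl(s.fileno(), SIOCGIFMTU, req)
    _, mtu = struct.unpack(IFREQ_FMT, res)
    return mtu


def set_mtu(ifname: str, mtu: int) -> None:
    """Hypothetical helper: set the MTU of ifname (needs CAP_NET_ADMIN)."""
    req = struct.pack(IFREQ_FMT, ifname.encode(), mtu)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        fcntl.ioctl(s.fileno(), SIOCSIFMTU, req)


if __name__ == "__main__":
    print("lo MTU:", get_mtu("lo"))
```

Only the read path is safe to run unprivileged; set_mtu raises PermissionError without the capability.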
Also, woe be to the application that wants fast local interprocess
communication and doesn't use IPC_SHM, MAP_SHARED, pipes, or AF_UNIX
sockets. (There's not just one better facility; there are _four_!)
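As an illustration of one of those four, here is a minimal sketch of two local processes exchanging data over an AF_UNIX socket pair instead of a TCP connection over loopback. It assumes a POSIX system with fork(), and the function name echo_over_socketpair is hypothetical.

```python
import os
import socket


def echo_over_socketpair(payload: bytes) -> bytes:
    """Send payload to a forked child over AF_UNIX and read back its echo."""
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: read until the parent shuts down its write side, echo, exit.
        try:
            parent_sock.close()
            chunks = []
            while True:
                chunk = child_sock.recv(65536)
                if not chunk:
                    break
                chunks.append(chunk)
            child_sock.sendall(b"".join(chunks))
            child_sock.close()
        finally:
            os._exit(0)

    # Parent: send everything, signal EOF, then collect the echo.
    child_sock.close()
    parent_sock.sendall(payload)
    parent_sock.shutdown(socket.SHUT_WR)
    chunks = []
    while True:
        chunk = parent_sock.recv(65536)
        if not chunk:
            break
        chunks.append(chunk)
    parent_sock.close()
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(echo_over_socketpair(b"hello"))
```

No IP stack, routing, or MTU is involved here; the data goes straight between the two socket buffers.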
From this perspective, people way overemphasize loopback performance,
and 999 times out of 1000 they prove their points using synthetic
benchmarks.
And don't give me this garbage about the application wanting to be
generic and therefore use IP sockets for everything. Either they want
to be generic, or they want the absolute best performance. Trying
to turn that "or" into an "and" and have both at the same time will
result in ludicrous hacks ending up in the kernel.