Actually, the case you mention is part of the incentive for
this change.
Branch prediction fares very poorly in such cases, so it is
better to risk mispredicting a single branch that covers all
the data items in the same cache line than to risk
mispredicting any one of several such branches. The new
sequence above gets emitted by the compiler as several
integer operations and one branch. As long as all the data
items are in the same cache line, this is optimal.
We made such a change for ethernet address comparisons a
few years ago. At the time Eric showed that it mattered
a lot for Athlon processors.
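
For illustration, here is a rough sketch of the idea applied to a
6-byte address compare. This is not the actual kernel code: the
function names are made up for this example, and the loads go
through memcpy() so that no alignment is assumed. The first version
has up to six data-dependent branches; the second folds the
differences together with XOR/OR and leaves a single test of the
combined result.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */

/* One conditional branch per byte: up to six chances to mispredict. */
static int addr_equal_branchy(const uint8_t *a, const uint8_t *b)
{
	int i;

	for (i = 0; i < 6; i++)
		if (a[i] != b[i])
			return 0;
	return 1;
}

/* Integer operations only, then a single test of the folded result. */
static int addr_equal_one_branch(const uint8_t *a, const uint8_t *b)
{
	uint32_t a32, b32;
	uint16_t a16, b16;

	/* memcpy() keeps the loads safe for unaligned input. */
	memcpy(&a32, a, sizeof(a32));
	memcpy(&b32, b, sizeof(b32));
	memcpy(&a16, a + 4, sizeof(a16));
	memcpy(&b16, b + 4, sizeof(b16));

	/* Any differing bit survives the XOR and the OR. */
	return ((a32 ^ b32) | (uint32_t)(a16 ^ b16)) == 0;
}
```

Both functions return 1 for equal addresses and 0 otherwise; the
difference is only in how many branches the caller's result depends
on.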