Sure. I was thinking out loud: it seems somewhat implicit that things
must be serial at that point, and I wasn't sure whether that was meant
to be required or enforced, and whether it should be clarified with
comments or by explicitly adding the rcu calls in these couple of
additional places.
I've had a bunch of machine issues, but I did manage to do some testing.
I'd started looking at the regression after Anton Blanchard mentioned
seeing it via this simple bit of code:
http://ozlabs.org/~anton/junkcode/lock_comparison.c
It essentially spawns a number of threads to match the cpu count, each
thread looping 10000 times, where each loop does some trivial semop()'s.
Each thread has its own semaphore it is semop()'ing so there's no
contention.
To get a little more detail I hacked Anton's code minimally to record
results for thread counts 1..n and also to optionally have just a single
semaphore on which all of these threads are contending. I ran this on
a 64cpu machine (128 given SMT), but didn't make any effort to force
clean thread/cpu affinity.
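
For anyone who doesn't want to pull Anton's file, the shape of the test
is roughly the sketch below. This is written from the description above
and is neither Anton's code nor my hacked version of it:
run_semop_test(), struct worker and new_sem() are names made up here,
the down/up pair is only an assumed stand-in for the "trivial semop()'s",
timing is left out, and it assumes Linux (where the caller has to define
union semun itself).

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: Linux/glibc, where the caller must define union semun. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical per-thread state; not taken from lock_comparison.c. */
struct worker {
    int semid;   /* semaphore set this thread operates on */
    int loops;   /* iterations requested */
    long done;   /* iterations completed */
};

/* Each loop does one decrement and one increment on semaphore 0. */
static void *worker_fn(void *arg)
{
    struct worker *w = arg;
    struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    struct sembuf up   = { .sem_num = 0, .sem_op =  1, .sem_flg = 0 };

    for (int i = 0; i < w->loops; i++) {
        if (semop(w->semid, &down, 1) < 0 || semop(w->semid, &up, 1) < 0)
            break;
        w->done++;
    }
    return NULL;
}

/* Create a private one-semaphore set initialised to 1; -1 on failure. */
static int new_sem(void)
{
    union semun arg = { .val = 1 };
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id >= 0 && semctl(id, 0, SETVAL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/*
 * Run nthreads workers for `loops` iterations each.
 * contended == 0: every thread gets its own semaphore (no contention).
 * contended != 0: all threads share a single semaphore.
 * Returns the total number of completed iterations, or -1 on setup error.
 */
long run_semop_test(int nthreads, int loops, int contended)
{
    int nsems = contended ? 1 : nthreads;
    pthread_t *tids = calloc(nthreads, sizeof(*tids));
    struct worker *w = calloc(nthreads, sizeof(*w));
    int *sems = calloc(nsems, sizeof(*sems));
    int created = 0, started = 0;
    long total = -1;

    if (!tids || !w || !sems)
        goto out;

    for (; created < nsems; created++) {
        sems[created] = new_sem();
        if (sems[created] < 0)
            goto out;
    }

    for (; started < nthreads; started++) {
        w[started].semid = sems[contended ? 0 : started];
        w[started].loops = loops;
        if (pthread_create(&tids[started], NULL, worker_fn, &w[started]))
            break;
    }

    total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += w[i].done;
    }
out:
    /* SysV semaphores outlive the process, so remove them explicitly. */
    for (int i = 0; i < created; i++)
        semctl(sems[i], 0, IPC_RMID);
    free(sems);
    free(w);
    free(tids);
    return total;
}
```

Build with -pthread. Calling run_semop_test(ncpus, 10000, 0) corresponds
to the original one-semaphore-per-thread case, and calling it with the
last argument set to 1 for each thread count 1..n corresponds to the
contended sweep; wrap the call in clock_gettime() or similar if you want
timings out of it.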
The contended numbers aren't as consistent as they could be at the
high end, but I think that's more down to the quick/dirty test than the
patch. Nevertheless the patch clearly outperforms mainline and, despite
the noise, actually looks to perform more consistently than mainline
(see graphs).
Summary numbers from a run (with a bit more detail at the high
thread-count end, as the numbers had more variation there and showing
just the power-of-two thread counts for this run incorrectly implies a
knee; I can do more runs with averages/stats if people need more
convincing):

threads       uncontended                    contended
              semop loops               semop loops
         2.6.26-rc2   +patchset    2.6.26-rc2   +patchset
      1     2243.94     4436.25       2063.18     4386.78
      2     2954.11     5445.12      67885.16     5906.72
      4     4367.45     8125.67      72820.32    44263.86
      8     7440.00     9842.66      60184.17    95677.58
     16    12959.44    20323.97     136482.42   248143.80
     32    35252.71    56334.28     363884.09   599773.31
     48    62811.15   102684.67     515886.12  1714530.12
    ...
     57    81064.99   141434.33     564874.69  2518078.75
     58    79486.08   145685.84     693038.06  1868813.12
     59    83634.19   153087.80    1237576.25  2828301.25
     60    91581.07   158207.08     797796.94  2970977.25
     61    89209.40   160529.38    1202135.38  2538114.50
     62    89008.45   167843.78    1305666.75  2274845.00
     63    97753.17   177470.12     733957.31   363952.62
     64   102556.05   175923.56    1356988.88   199527.83

(detail plots from this same run attached...)
Nadia, you're welcome to add either or both of these to the series if
you'd like:
Reviewed-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Tested-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
--
Tim Pepper <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center