There is also the imbalance across CPUs. I think I figured out what's
going on there as well. The allocation happens on one CPU (via
page_alloc), but the teardown happens on the other CPU, which
accumulates the freed pages in its quicklist. So the quicklist of the
busy CPU stays empty, while the idle CPU's quicklist grows up to the
limit. When I pin the loop to one CPU, the quicklists are stable.
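A toy counting model of that effect follows. It is not the kernel's quicklist code. The function name `simulate`, the two-CPU setup, and the limit of 16 cached pages are all made up for illustration. It only tracks page counts. An allocation first tries the local CPU's cache and falls back to the page allocator. A free caches the page on the freeing CPU until the limit is reached.

```python
def simulate(alloc_cpu, free_cpu, iterations, limit=16, nr_cpus=2):
    """Hypothetical per-CPU page-cache model: counts only, no real pages."""
    quicklist = [0] * nr_cpus  # cached pages per CPU
    page_alloc_calls = 0       # allocations that missed the local cache
    page_free_calls = 0        # frees that overflowed the local cache

    for _ in range(iterations):
        # Allocation side: reuse a locally cached page, else hit page_alloc.
        if quicklist[alloc_cpu] > 0:
            quicklist[alloc_cpu] -= 1
        else:
            page_alloc_calls += 1
        # Teardown side: cache the page on the freeing CPU up to the limit,
        # otherwise hand it back to the page allocator.
        if quicklist[free_cpu] < limit:
            quicklist[free_cpu] += 1
        else:
            page_free_calls += 1

    return quicklist, page_alloc_calls, page_free_calls


if __name__ == "__main__":
    print(simulate(0, 1, 1000))  # alloc on CPU0, teardown on CPU1
    print(simulate(0, 0, 1000))  # loop pinned to CPU0
```

When allocation and teardown run on different CPUs, the model gives `([0, 16], 1000, 984)`. CPU0's cache never refills, so every allocation falls back to page_alloc. CPU1's cache sits at the limit and overflows on almost every free. When the loop is pinned to CPU0, the result is `([1, 0], 1, 0)`. The single cached page is reused on every iteration, which matches the stable quicklists seen with pinning.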
Thanks,
tglx