On Tue, Dec 07, 2010 at 10:32:45AM +0900, Minchan Kim wrote:
Ok.
When high-order node balancing is finished (aging zones as before), a
check is made to ensure that all zones are balanced for order-0. If not,
kswapd stays awake, continuing to age zones as before. Zones will not be
aged as aggressively now that high-order balancing can finish, but since
part of the bug report is that kswapd frees too many pages, this is a good
thing.
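A rough Python sketch of the order-0 recheck described above. The Zone
class, its fields and the helper names are hypothetical simplifications,
not kernel structures, and "balanced" is approximated as free pages at or
above the high watermark.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    # Hypothetical, simplified zone: only the fields this sketch needs.
    name: str
    free_pages: int
    high_wmark: int


def zone_balanced_order0(zone: Zone) -> bool:
    # Assumption: "balanced for order-0" is approximated as free pages at or
    # above the high watermark; lowmem reserves and fragmentation are ignored.
    return zone.free_pages >= zone.high_wmark


def kswapd_stays_awake(zones: list) -> bool:
    # Checked once high-order balancing has finished: kswapd keeps aging
    # zones if any zone is still unbalanced for order-0.
    return not all(zone_balanced_order0(z) for z in zones)


zones = [Zone("DMA", free_pages=3000, high_wmark=200),
         Zone("Normal", free_pages=1500, high_wmark=4000)]
print(kswapd_stays_awake(zones))  # Normal is below its high watermark
```

Here kswapd would stay awake because ZONE_NORMAL is still short of its
high watermark, even though high-order balancing is done.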
I'm guessing it's bad luck, because the results are not very stable
between runs. There is a large memory consumer running in the background,
and if the timing of when it got swapped out changed, it could look like a
regression. Sometimes the files-deleted figure is not affected at all, but
in every run the reads/writes per second are higher and the total time to
completion is lower.
It'll still balance the zones for order-0, the size we care most about.
See the rest of the series. The main consequence of any_zone being a low
zone is that balancing can stop because ZONE_DMA is balanced even though it
is unusable for allocations. Patch 3 takes the classzone_idx into account
when deciding if kswapd should go to sleep. The final patch in the series
replaces "any zone" with "at least 25% of the pages making up the node
must be balanced". The situation could be forced artificially by
preventing pages from ever being allocated from ZONE_DMA, but we wouldn't
be able to draw any sensible conclusion from it, as patch 5 in the series
handles it.
This is why I'm depending on Simon's reports to see if his corner case is fixed
while running other stress tests to see if anything else is noticeably worse.
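The difference between the "any zone" check and the 25% rule can be
sketched in Python. Everything here is hypothetical: the Zone class, the
balanced() test, and the choice to count only zones up to classzone_idx
are simplifying assumptions, not the code in the patches.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    # Hypothetical, simplified zone; zones are listed lowest first (DMA, ...).
    name: str
    present_pages: int
    free_pages: int
    high_wmark: int

    def balanced(self) -> bool:
        # Assumption: balanced means free pages at or above the high watermark.
        return self.free_pages >= self.high_wmark


def any_zone_balanced(zones, classzone_idx):
    # "Any zone" style check: one balanced zone is enough, even a tiny ZONE_DMA.
    return any(z.balanced() for z in zones[: classzone_idx + 1])


def node_balanced(zones, classzone_idx):
    # "At least 25% of the pages making up the node must be balanced",
    # counting only zones up to classzone_idx (an assumption of this sketch).
    considered = zones[: classzone_idx + 1]
    present = sum(z.present_pages for z in considered)
    balanced = sum(z.present_pages for z in considered if z.balanced())
    return 4 * balanced >= present  # integer form of balanced >= 25% of present


zones = [Zone("DMA", present_pages=4000, free_pages=1000, high_wmark=500),
         Zone("Normal", present_pages=200000, free_pages=100, high_wmark=5000)]
```

With a balanced 4000-page ZONE_DMA and an unbalanced 200000-page
ZONE_NORMAL, any_zone_balanced() would let kswapd stop while
node_balanced() would not, since 4000 pages is well under 25% of the node.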
"Time kswapd awake" is the time between when trace event
mm_vmscan_kswapd_wake is recorded while kswapd is asleep and when trace
event mm_vmscan_kswapd_sleep is recorded just before kswapd calls
schedule() to properly go to sleep.
It's possible to receive mm_vmscan_kswapd_wake multiple times while kswapd
is asleep, but the repeated events are ignored.
If kswapd schedules out normally or is stalled on direct writeback, this
time is included in the above value. Maybe a better name for this is
"kswapd active".
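As a rough illustration of how "Time kswapd awake" is accumulated, the
Python sketch below walks a time-ordered list of trace events. The
(timestamp, name) input format and the function name are assumptions for
illustration; this is not the actual post-processing script.

```python
WAKE = "mm_vmscan_kswapd_wake"
SLEEP = "mm_vmscan_kswapd_sleep"


def kswapd_awake_seconds(events):
    """Sum the time between a wake event seen while kswapd is asleep and
    the next sleep event. `events` is a time-ordered iterable of
    (timestamp_in_seconds, event_name) pairs."""
    total = 0.0
    woke_at = None  # None means kswapd is currently considered asleep
    for ts, name in events:
        if name == WAKE:
            if woke_at is None:
                woke_at = ts  # first wake while asleep starts the interval
            # further wake events before the matching sleep are ignored
        elif name == SLEEP and woke_at is not None:
            # Time spent scheduled out or stalled in between is included.
            total += ts - woke_at
            woke_at = None
    return total


events = [(0.0, WAKE), (0.5, WAKE), (2.0, SLEEP), (5.0, WAKE), (6.5, SLEEP)]
print(kswapd_awake_seconds(events))
```

The duplicate wake at 0.5s does not restart the interval, so the result
covers 0.0-2.0 and 5.0-6.5.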
Possibly, but I don't think so. I'm more inclined to blame the
effectively random interaction between postmark and the memory consumer
running in the background.
The only test I ran that would be affected is a streaming IO test, but
it's only one aspect of memory reclaim behaviour (albeit one that people
tend to complain about when it's broken).
About all I can report on are the streaming IO benchmark results, which
look like this:
MICRO
                                               traceonly        kanyzone
MMTests Statistics: duration
User/Sys Time Running Test (seconds)               24.23           23.93
Total Elapsed Time (seconds)                      916.18          916.69

FTrace Reclaim Statistics: vmscan
                                               traceonly        kanyzone
Direct reclaims                                     2437            2565
Direct reclaim pages scanned                     1688201         1801142
Direct reclaim write file async I/O                    0               0
Direct reclaim write anon async I/O                   14               0
Direct reclaim write file sync I/O                     0               0
Direct reclaim write anon sync I/O                     0               0
Wake kswapd requests                             1333358         1417622
Kswapd wakeups                                       107             116
Kswapd pages scanned                            15801484        15706394
Kswapd reclaim write file async I/O                   44              24
Kswapd reclaim write anon async I/O                   25               0
Kswapd reclaim write file sync I/O                     0               0
Kswapd reclaim write anon sync I/O                     0               0
Time stalled direct reclaim (seconds)               1.79            0.98
Time kswapd awake (seconds)                       387.60          410.26
Total pages scanned                             17489685        17507536
%age total pages scanned/reclaimed                 0.00%           0.00%
%age total pages scanned/written                   0.00%           0.00%
%age file pages scanned/written                    0.00%           0.00%
Percentage Time Spent Direct Reclaim               6.88%           3.93%
Percentage Time kswapd Awake                      42.31%          44.75%

proc vmstat: Faults
micro-traceonly-v3r1-micro micro-kanyzone-v3r1-micro
                                          traceonly-v3r1   kanyzone-v3r1
Major Faults                                        1943            1808
Minor Faults                                    55488625        55441993
Page ins                                          134044          126640
Page outs                                          73884           69248
Swap ins                                            2322            1972
Swap outs                                           7291            6521
Total pages scanned differs by 0.1%, which is not much. Time to completion
is more or less the same. Faults, paging activity and swap activity are
all slightly reduced.
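The 0.1% figure and the kswapd-awake percentages can be checked with a few
lines of arithmetic. This assumes "Percentage Time kswapd Awake" is the
awake time divided by total elapsed time, an inference that happens to
reproduce both reported values; the helper names are illustrative.

```python
def pct_change(old, new):
    # Relative change from old to new, in percent.
    return 100.0 * (new - old) / old


def pct_of(part, whole):
    # part as a percentage of whole.
    return 100.0 * part / whole


# Values copied from the table (traceonly, kanyzone).
scanned_change = pct_change(17489685, 17507536)
awake_traceonly = pct_of(387.60, 916.18)
awake_kanyzone = pct_of(410.26, 916.69)
swapouts_change = pct_change(7291, 6521)

print(f"Total pages scanned: {scanned_change:+.2f}%")
print(f"kswapd awake: {awake_traceonly:.2f}% vs {awake_kanyzone:.2f}%")
print(f"Swap outs: {swapouts_change:+.1f}%")
```

The scanned-pages change rounds to 0.1%, and the awake percentages match
the 42.31% and 44.75% in the table.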
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab