"We have seen ramdisk based install systems, where some pages of mapped libraries and programs were suddendly zeroed under memory pressure. This should not happen, as the ramdisk avoids freeing its pages by keeping them dirty all the time," Christian Borntraeger began, explaining the need for his small patch to the ramdisk driver. He continued, "it turns out that there is a case, where the VM makes a ramdisk page clean, without telling the ramdisk driver. On memory pressure shrink_zone runs and it starts to run shrink_active_list. There is a check for buffer_heads_over_limit, and if true, pagevec_strip is called. pagevec_strip calls try_to_release_page. If the mapping has no releasepage callback, try_to_free_buffers is called. try_to_free_buffers has now a special logic for some file systems to make a dirty page clean, if all buffers are clean. Thats what happened in our test case."
He provided two methods for reproducing the reported problem: "you have to make buffer_heads_over_limit true," which can be done either by lowering max_buffer_heads or by using a system with lots of high memory. He then described the fix: "The solution is to provide a noop-releasepage callback for the ramdisk driver. This avoids try_to_free_buffers for ramdisk pages."
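The dispatch Christian describes can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code or the actual patch: the `toy_` structures and functions are hypothetical stand-ins, and the real kernel types and signatures differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures (hypothetical, heavily simplified). */
struct toy_page;

struct toy_mapping {
    /* Optional callback; NULL means "fall back to the generic buffer path". */
    bool (*releasepage)(struct toy_page *page);
};

struct toy_page {
    struct toy_mapping *mapping;
    bool dirty;          /* page-level dirty flag */
    bool buffers_dirty;  /* true if any attached buffer is dirty */
};

/* Models the generic path: when all buffers are clean, the page itself
 * is marked clean as a side effect. */
bool toy_try_to_free_buffers(struct toy_page *page)
{
    if (page->buffers_dirty)
        return false;
    page->dirty = false;  /* the side effect that hurt the ramdisk */
    return true;
}

/* Models the dispatch: use the mapping's callback if one exists,
 * otherwise fall through to the generic buffer path. */
bool toy_try_to_release_page(struct toy_page *page)
{
    if (page->mapping && page->mapping->releasepage)
        return page->mapping->releasepage(page);
    return toy_try_to_free_buffers(page);
}

/* Models a no-op callback: refuse the release and leave the page
 * untouched, so its dirty flag survives. */
bool toy_ramdisk_releasepage(struct toy_page *page)
{
    (void)page;
    return false;
}
```

In this model, a dirty page whose mapping has no callback comes back from `toy_try_to_release_page()` marked clean, while a page whose mapping uses `toy_ramdisk_releasepage` stays dirty and is therefore never treated as freeable.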
"The current VM can get itself into trouble fairly easily on systems with a small ZONE_HIGHMEM, which is common on i686 computers with 1GB of memory," Rik van Riel said explaining a small patch to cmscan.c. He continued, "on one side, page_alloc() will allocate down to zone->pages_low, while on the other side, kswapd() and balance_pgdat() will try to free memory from every zone, until every zone has more free pages than zone->pages_high." He noted that highmem could be filled up with "page tables, ramfs, vmalloc allocations and other unswappable things quite easily and without many bad side effects, since we still have a huge ZONE_NORMAL to do future allocations from. However, as long as the number of free pages in the highmem zone is below zone->pages_high, kswapd will continue swapping things out from ZONE_NORMAL, too! Sami Farin managed to get his system into a stage where kswapd had freed about 700MB of low memory and was still 'going strong'." He described his patch:
"The attached patch will make kswapd stop paging out data from zones when there is more than enough memory free. We do go above zone->pages_high in order to keep pressure between zones equal in normal circumstances, but the patch should prevent the kind of excesses that made Sami's computer totally unusable."
Andrew Morton posted an overview of patches in -mm, discussing what is destined for inclusion in the upcoming 2.6.18 Linux kernel. He noted, "there is an unusually large amount of difficult material here." Patch sets that were discussed include a cleanup of kernel headers, klibc, various subsystem cleanups, the ACX1xx wireless driver, swsusp cleanups, per-task statistic metrics, a clocksource management infrastructure, smpnice, swap prefetching, priority-inheriting futexes, a revamp of /proc/pid, ecryptfs, utsname virtualization, readahead, reiser4 improvements, a statistics infrastructure, and lock validation code.
Following up on a couple of features covered earlier on KernelTrap, both swap prefetching and utsname virtualization were briefly discussed. Regarding swap prefetching, Andrew noted, "I remain skeptical, but I have a lot of RAM. Multiple people have sung its praises. I guess I'll re-review and tentatively plan on sending them along for 2.6.18. Opinions are sought." As for utsname virtualization, "this doesn't seem very pointful as a standalone thing. That's a general problem with infrastructural work for a very large new feature. So probably I'll continue to babysit these patches, unless someone can identify a decent reason why mainline needs this work. I don't want to carry an ever-growing stream of OS-virtualisation groundwork patches for ever and ever so if we're going to do this thing... faster, please."
As RAM increasingly becomes a commodity, prices drop and computer users are able to buy more of it. 32-bit architectures face certain limitations in accessing these growing amounts of RAM. We begin with an overview of Linux memory management; with an understanding of how basic memory management works, we are better able to define the problem and, finally, to review the various solutions.
This article was written by examining the Linux 2.6 kernel source code for the x86 architecture types.