Andrea Arcangeli announced the release of 2.6.5-rc2-aa3 saying, "Ok, this seems feature complete. Both nonlinear swapping and prio_tree are available now. I believe objrmap-core+anon-vma+prio_tree can be merged into mainline after a bit more of testing, certainly they looks good enough for -mm." This is part of a continued effort [forum] to replace the existing reverse-mapping in the 2.6 VM [story] with something that scales better on high-end systems.
The priority-based radix search tree, referred to as the prio_tree, has two indices known as the radix_index and the heap_index. The search tree itself is a simple binary radix tree using the radix_index, however "with an additional heap tree like property that a parent node's heap_index is always greater than or equal to the heap_indices of its children." Rajesh Venkatasubramanian further explains:
"An interesting property of the prio_tree is that they are useful to store and query intervals, for example, a file-mapped vm_area_struct can be considered as an interval of file pages. If we store all vmas that map a file in a prio_tree, then we can execute a stabbing query, i.e., choosing a set of vmas that a map a single file page or a set of contiguous file pages, in O(log n + m) time where "log n" indicates the height of the tree (maximum 64 in a 32 bit machine) and "m" represents the number of vmas that map the page(s)."
From: Andrea Arcangeli [email blocked]
To: linux-kernel
Subject: 2.6.5-rc2-aa3
Date: Fri, 26 Mar 2004 00:41:06 +0100
Ok, this seems feature complete. Both nonlinear swapping and prio_tree
are available now. I believe objrmap-core+anon-vma+prio_tree can be
merged into mainline after a bit more of testing, certainly they looks
good enough for -mm.
As usual this work wouldn't been possible without the efforts of Hugh
Dickins, Dave McCracken, and last but not the least Rajesh
Venkatasubramanian.
As usual I'm writing this on my desktop while running this kernel and it
swaps fine (uptime still 10 minutes though ;).
after some misc swapping (desktop like load) this is the profiling:
724627 total 0.2810
613729 default_idle 9589.5156
47565 mmx_clear_page 424.6875
9710 do_page_fault 7.2898
5965 buffered_rmqueue 9.8109
4589 delay_tsc 143.4062
3986 do_no_page 2.1853
2679 unmap_page_range 3.4171
2487 free_hot_cold_page 9.1434
2363 page_add_rmap 6.4212
1897 page_address 10.7784
1600 handle_mm_fault 0.6993
1594 release_pages 4.5284
1289 page_remove_rmap 5.7545
1229 lru_cache_add_active 15.3625
1041 __alloc_pages 1.3555
1002 find_pte 5.2188
952 __copy_to_user_ll 3.9667
858 __pagevec_lru_add_active 4.4688
720 probe_irq_on 2.2500
the prio_tree funcs are at the end, this is expected since I have normal
load:
14 prio_tree_insert 0.0203
8 prio_tree_first 0.0185
6 prio_tree_remove 0.0179
4 __vma_prio_tree_insert 0.0139
4 prio_tree_next 0.0081
2 __vma_prio_tree_remove 0.0054
the db simulator is running fine on the test box too.
URL:
http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.6/2.6.5-rc2-aa3.gz
Changelog diff between 2.6.5-rc2-aa2 and 2.6.5-rc2-aa3:
Only in 2.6.5-rc2-aa2: anon_vma.gz
Only in 2.6.5-rc2-aa3: anon-vma.gz
Rediffed to fixup a reject after objrmap-core update.
Files 2.6.5-rc2-aa2/extraversion and 2.6.5-rc2-aa3/extraversion differ
Files 2.6.5-rc2-aa2/objrmap-core.gz and 2.6.5-rc2-aa3/objrmap-core.gz differ
Fixup locking in swapoff, and microscalability optimization
in do_munmap. Both from the original objrmap patch from Dave McCracken.
Only in 2.6.5-rc2-aa3: prio-tree.gz
Drop computational complexity of swapping using objrmap, using
a prio tree for efficient search of the vma-ranges mapping
any given page in an inode. From Rajesh Venkatasubramanian.
More details can be found at his URL:
http://www-personal.engin.umich.edu/~vrajesh/linux/prio_tree/