Andrea Arcangeli announced the release of 2.6.5-rc2-aa3 saying, "Ok, this seems feature complete. Both nonlinear swapping and prio_tree are available now. I believe objrmap-core+anon-vma+prio_tree can be merged into mainline after a bit more of testing, certainly they looks good enough for -mm." This is part of a continued effort [forum] to replace the existing reverse-mapping in the 2.6 VM [story] with something that scales better on high-end systems.
The priority-based radix search tree, referred to as the prio_tree, has two indices known as the radix_index and the heap_index. The search tree itself is a simple binary radix tree using the radix_index, however "with an additional heap tree like property that a parent node's heap_index is always greater than or equal to the heap_indices of its children." Rajesh Venkatasubramanian further explains:
"An interesting property of the prio_tree is that they are useful to store and query intervals, for example, a file-mapped vm_area_struct can be considered as an interval of file pages. If we store all vmas that map a file in a prio_tree, then we can execute a stabbing query, i.e., choosing a set of vmas that a map a single file page or a set of contiguous file pages, in O(log n + m) time where "log n" indicates the height of the tree (maximum 64 in a 32 bit machine) and "m" represents the number of vmas that map the page(s)."
From: Andrea Arcangeli [email blocked] To: linux-kernel Subject: 2.6.5-rc2-aa3 Date: Fri, 26 Mar 2004 00:41:06 +0100 Ok, this seems feature complete. Both nonlinear swapping and prio_tree are available now. I believe objrmap-core+anon-vma+prio_tree can be merged into mainline after a bit more of testing, certainly they looks good enough for -mm. As usual this work wouldn't been possible without the efforts of Hugh Dickins, Dave McCracken, and last but not the least Rajesh Venkatasubramanian. As usual I'm writing this on my desktop while running this kernel and it swaps fine (uptime still 10 minutes though ;). after some misc swapping (desktop like load) this is the profiling: 724627 total 0.2810 613729 default_idle 9589.5156 47565 mmx_clear_page 424.6875 9710 do_page_fault 7.2898 5965 buffered_rmqueue 9.8109 4589 delay_tsc 143.4062 3986 do_no_page 2.1853 2679 unmap_page_range 3.4171 2487 free_hot_cold_page 9.1434 2363 page_add_rmap 6.4212 1897 page_address 10.7784 1600 handle_mm_fault 0.6993 1594 release_pages 4.5284 1289 page_remove_rmap 5.7545 1229 lru_cache_add_active 15.3625 1041 __alloc_pages 1.3555 1002 find_pte 5.2188 952 __copy_to_user_ll 3.9667 858 __pagevec_lru_add_active 4.4688 720 probe_irq_on 2.2500 the prio_tree funcs are at the end, this is expected since I have normal load: 14 prio_tree_insert 0.0203 8 prio_tree_first 0.0185 6 prio_tree_remove 0.0179 4 __vma_prio_tree_insert 0.0139 4 prio_tree_next 0.0081 2 __vma_prio_tree_remove 0.0054 the db simulator is running fine on the test box too. URL: http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.6/2.6.5-rc2-aa3.gz Changelog diff between 2.6.5-rc2-aa2 and 2.6.5-rc2-aa3: Only in 2.6.5-rc2-aa2: anon_vma.gz Only in 2.6.5-rc2-aa3: anon-vma.gz Rediffed to fixup a reject after objrmap-core update. Files 2.6.5-rc2-aa2/extraversion and 2.6.5-rc2-aa3/extraversion differ Files 2.6.5-rc2-aa2/objrmap-core.gz and 2.6.5-rc2-aa3/objrmap-core.gz differ Fixup locking in swapoff, and microscalability optimization in do_munmap. Both from the original objrmap patch from Dave McCracken. Only in 2.6.5-rc2-aa3: prio-tree.gz Drop computational complexity of swapping using objrmap, using a prio tree for efficient search of the vma-ranges mapping any given page in an inode. From Rajesh Venkatasubramanian. More details can be found at his URL: http://www-personal.engin.umich.edu/~vrajesh/linux/prio_tree/