It would, but it'd obscure the code to take advantage of that.
They have to fiddle with the size of the unmovable partition if their
workload uses more unmovable kernel allocations than expected. This was
always going to be the restriction with using zones for partitioning
memory. Resizing zones on the fly is not really an option because the
resizing would only work reliably in one direction.
The anti-fragmentation code could potentially be used to have subzone
groups that kept movable and unmovable allocations as far apart as
possible and at opposite ends of a zone. That approach has been kicked a
few times because of complexity.
Subtle difference. The amount of unmovable memory is calculated per node.
As evenly as possible.
I know, it's why find_zone_movable_pfns_for_nodes() is as complex as it
is. The mechanism spreads the unmovable memory evenly throughout all
nodes. In the event some nodes are too small to hold their share, the
remaining unmovable memory is divided between the nodes that are larger.
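
As a rough user-space sketch of the spreading described above (this is
not the code in find_zone_movable_pfns_for_nodes(), which works on PFN
ranges and is considerably more involved), something like the following
shows the idea. spread_unmovable() and its parameters are hypothetical
names made up for illustration, and sizes are in plain page counts.

```c
#include <stddef.h>

/*
 * Illustrative only: spread `required` unmovable pages as evenly as
 * possible over `nr_nodes` nodes.
 *
 * node_pages[i] : pages present on node i (assumed input)
 * unmovable[i]  : output, pages set aside for unmovable use on node i
 *
 * Returns the number of pages that could not be placed anywhere.
 */
unsigned long spread_unmovable(const unsigned long *node_pages,
                               unsigned long *unmovable,
                               size_t nr_nodes,
                               unsigned long required)
{
    for (size_t i = 0; i < nr_nodes; i++)
        unmovable[i] = 0;

    while (required > 0) {
        /* Count the nodes that still have room left. */
        size_t usable = 0;
        for (size_t i = 0; i < nr_nodes; i++)
            if (unmovable[i] < node_pages[i])
                usable++;
        if (usable == 0)
            break;              /* every node is already full */

        /* Even share for this pass; at least one page so we progress. */
        unsigned long share = required / usable;
        if (share == 0)
            share = 1;

        for (size_t i = 0; i < nr_nodes && required > 0; i++) {
            unsigned long room = node_pages[i] - unmovable[i];
            unsigned long take = share < room ? share : room;

            if (take > required)
                take = required;
            unmovable[i] += take;
            required -= take;
        }
        /* Whatever small nodes could not hold is retried next pass
         * and lands on the nodes that are larger. */
    }
    return required;
}
```

With nodes of 100, 1000 and 1000 pages and 900 pages required, the first
pass gives the small node all it can hold and the second pass divides
what is left between the two larger nodes.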
Not in all cases. Some systems will not know how many huge pages they need
in advance because they are used as batch systems running jobs as requested.
The zone allows an amount of memory to be set aside that can be
*optionally* used for hugepages if desired or base pages if not. Between
jobs, the hugepage pool can be resized up to the size of ZONE_MOVABLE.
The other case is if memory hot-remove is ever to be supported. Any memory
within ZONE_MOVABLE can potentially be off-lined and removed by migrating
its pages elsewhere.
I didn't say they were the largest supported contiguous block, I said they
were the largest contiguous block we *care* about. Right now, it is
assumed that variable huge page sizes are not supported at runtime. If they were,
some smarts would be needed to keep huge pages of the same size together
to control external fragmentation but that's about it.
The size doesn't really make much difference to the mechanism.
--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab