Change the interface to use kilobytes instead of pages. Page sizes can vary
across platforms and configurations. A new strategy routine has been added
to the resource counters infrastructure to format the data as desired.

Suggested by David Rientjes, Andrew Morton and Herbert Poetzl
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 Documentation/controllers/memory.txt |    7 +++--
 include/linux/res_counter.h          |    6 ++--
 kernel/res_counter.c                 |   24 +++++++++++++----
 mm/memcontrol.c                      |   47 +++++++++++++++++++++++++++--------
 4 files changed, 64 insertions(+), 20 deletions(-)

diff -puN mm/memcontrol.c~mem-control-make-ui-use-kilobytes mm/memcontrol.c
--- linux-2.6.23-rc3/mm/memcontrol.c~mem-control-make-ui-use-kilobytes 2007-08-28 13:20:44.000000000 +0530
+++ linux-2.6.23-rc3-balbir/mm/memcontrol.c 2007-08-29 14:36:07.000000000 +0530
@@ -32,6 +32,7 @@

struct container_subsys mem_container_subsys;
static const int MEM_CONTAINER_RECLAIM_RETRIES = 5;
+static const int MEM_CONTAINER_CHARGE_KB = (PAGE_SIZE >> 10);

/*
* The memory controller data structure. The memory controller controls both
@@ -312,7 +313,7 @@ int mem_container_charge(struct page *pa
* If we created the page_container, we should free it on exceeding
* the container limit.
*/
- while (res_counter_charge(&mem->res, 1)) {
+ while (res_counter_charge(&mem->res, MEM_CONTAINER_CHARGE_KB)) {
if (try_to_free_mem_container_pages(mem))
continue;

@@ -352,7 +353,7 @@ int mem_container_charge(struct page *pa
kfree(pc);
pc = race_pc;
atomic_inc(&pc->ref_cnt);
- res_counter_uncharge(&mem->res, 1);
+ res_counter_uncharge(&mem->res, MEM_CONTAINER_CHARGE_KB);
css_put(&mem->css);
goto done;
}
@@ -417,7 +418,7 @@ void mem_container_uncharge(struct page_
css_put(&mem->css);
page_assign_page_container(page, NULL);
unlock_page_container(page);
- ...
Do these changes really need to happen anywhere besides the
user<->kernel boundary? Why can't internal tracking be in pages?

-- Dave
-
I've thought about this before. The problem is that a user could
set his limit to 10000 bytes, but would then see the usage and
limit round to the closest page boundary. This can be confusing
to a user.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
True, but we're lying if we allow a user to set their limit there,
because we can't actually enforce a limit at 8,192 bytes vs 10,000.
They're the same limit as far as the kernel is concerned.

Why not just -EINVAL if the value isn't page-aligned? There are plenty
of interfaces in the kernel that require userspace to know the page
size, so this shouldn't be too difficult.

-- Dave
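
For illustration, the -EINVAL approach suggested above could look roughly like the sketch below. The helper name check_limit_aligned is hypothetical and not part of the patch, and the page size is passed in as a parameter so that the example does not depend on any particular platform.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: accept a byte limit
 * only if it is a whole number of pages. page_size must be a power of two.
 */
static inline int check_limit_aligned(uint64_t limit_bytes, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	/* Any bit set below the page size means the value is not aligned. */
	if (limit_bytes & (page_size - 1))
		return -EINVAL;
	return 0;
}
```

With 4096-byte pages, a limit of 8192 bytes would be accepted and a limit of 10000 bytes would be rejected, which is the distinction drawn in the message above.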
-
True, mmap() is a good example of such an interface for developers, I
am not sure about system admins though.

To quote Andrew
<quote>
Reporting tools could run getpagesize() and do the arithmetic, but we
generally try to avoid exposing PAGE_SIZE, HZ, etc to userspace in this
manner.
</quote>

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
Well, rounding to PAGE_SIZE exposes PAGE_SIZE as well, just in a
non-intuitive fashion. :)

If we're going to modify what the user specifies, we should probably at
least mandate that writes are only a "suggestion" and users must read
back the value to ensure what actually got committed.

If we're going to round in any direction, shouldn't we round up? If a
user specifies 4097 bytes and uses two pages, we don't want to complain
when they hit that second page.

-- Dave
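
The difference between rounding up and truncating can be shown with a small sketch. Both function names are made up for this example, and the page size is a parameter rather than the kernel's PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: number of pages needed to cover 'bytes', rounding up. */
static inline uint64_t limit_pages_round_up(uint64_t bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return bytes / page_size + (bytes % page_size != 0);
}

/* Hypothetical: number of whole pages in 'bytes', dropping any remainder. */
static inline uint64_t limit_pages_truncate(uint64_t bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return bytes / page_size;
}
```

Assuming 4096-byte pages, a 4097-byte limit becomes two pages when rounded up but only one page when truncated, so a user touching the second page would hit the limit only in the truncating case.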
-
Absolutely, I used rounding to mean round up, truncation for rounding down.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
I'd argue that having the user's specified limit be truncated to the
page size is less confusing than giving an EINVAL if it's not page
aligned.

Paul
-
Do we truncate mmap() values to the nearest page so as not to confuse the
user? ;)

Imagine a careful application setting and accounting for limits on a
long-running system. Might its internal accounting get sufficiently
misaligned from the kernel's after a while to cause a problem?
Truncating values like that would appear to reserve significantly less
memory than desired over a long period of time.

-- Dave
-
I think rounding to the closest page size is a better option, but
again it can be a bit confusing. I am all for using memparse() to
parse the user input as a specification of the memory limit.

The second question of how to store it internally without truncation/
rounding is something we need to agree upon. We also need to see
how to display the data back to the user.

I chose kilobytes for two reasons:
1. Several people recommended it
2. Herbert mentioned that they've moved to that interface and it
was working fine for them.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

PS: I am going off to the web to search for some CUI/CLI guidelines.
-
On Thu, 30 Aug 2007 04:07:11 +0530
I have no strong opinion. But how about megabytes? (too big?)
There will be no rounding up/down problem.

-Kame.
-
Here is what I am thinking, allow the user to input bytes/kilobytes/
megabytes or gigabytes. Store the data internally in kilobytes or
PFN. I prefer kilobytes (no rounding issues), but while implementing
limits we round up to the closest PFN.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
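
A userspace sketch of the scheme described above: parse input with an optional K/M/G suffix, store it in kilobytes, and round up to whole pages only when enforcing the limit. This is not the kernel's memparse(). The names parse_limit_bytes, bytes_to_kb and kb_to_pages are assumptions made for this example, as is the choice to round partial kilobytes up.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: "4M" -> 4194304, "10000" -> 10000 (plain bytes). */
static inline uint64_t parse_limit_bytes(const char *s)
{
	char *end;
	uint64_t val = strtoull(s, &end, 0);

	switch (toupper((unsigned char)*end)) {
	case 'G':
		val <<= 10;
		/* fall through */
	case 'M':
		val <<= 10;
		/* fall through */
	case 'K':
		val <<= 10;
		break;
	default:
		break;	/* no suffix: the value is already in bytes */
	}
	return val;
}

/* Assumed storage rule: round a byte count up to whole kilobytes. */
static inline uint64_t bytes_to_kb(uint64_t bytes)
{
	return (bytes + 1023) >> 10;
}

/* Round a kilobyte count up to whole pages when enforcing the limit. */
static inline uint64_t kb_to_pages(uint64_t kb, uint64_t page_size)
{
	uint64_t kb_per_page = page_size >> 10;

	assert(kb_per_page != 0);
	return (kb + kb_per_page - 1) / kb_per_page;
}
```

Under these assumptions, writing "4M" is stored as 4096 kilobytes, and a limit of 10000 bytes is stored as 10 kilobytes and enforced as three 4096-byte pages.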
-
This seems a bit inconsistent - if you write a value to a limit file,
then the value that you read back is reduced by a factor of 1024?
Having the "(kB)" suffix isn't really a big help to automated
middleware.

I'd still be in favour of just reading/writing 64-bit values
representing bytes - simple, and unambiguous for programmatic use, and
not really any less user-friendly than kilobytes for manual use
(since the numbers involved are going to be unwieldy for manual use
whether they're in bytes or kB).

Paul
-
Why is that? Is it because you could write 4M and see it show up
as 4096 kilobytes? Well, that can be fixed with another variant

64 bit might be an overkill for 32 bit machines. 32 bit machines with
PAE cannot use 32 bit values, they need 64 bits. I think KiloBytes

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
I was thinking the other way around - you can write 1048576 (i.e. 1MB)
to the file and read back 1024. It just seems to me that it's clearer

How is using a 64-bit value for consistency overkill?
As someone pointed out, 4TB machines probably aren't that far around
the corner (if they're not here already) so even if you use KB rather
than bytes, userspace needs to be using an int64 for this value in
case it ends up running as a 32-bit-compiled app on a 64-bit kernel
with lots of memory.

Paul
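
The arithmetic behind the 4TB remark can be checked with a short sketch. The helper names are invented for this example. 4TB is 2^42 bytes, which is 2^32 kilobytes, one more than the largest value a 32-bit unsigned integer can hold.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a byte count to whole kilobytes. */
static inline uint64_t kb_from_bytes(uint64_t bytes)
{
	return bytes >> 10;
}

/* Hypothetical helper: does a kilobyte count fit in a 32-bit unsigned int? */
static inline int kb_fits_in_u32(uint64_t kb)
{
	return kb <= UINT32_MAX;
}

/* What a 32-bit reader would see if it stored the value without checking. */
static inline uint32_t kb_truncated_to_u32(uint64_t kb)
{
	return (uint32_t)kb;
}
```

A 2TB limit expressed in kilobytes still fits in 32 bits, but a 4TB limit does not, and silently truncating it yields zero.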
-
