From: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>

Initialize msgmnb value to
min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE)
to increase the default value for larger machines.

MSG_CPU_SCALE scaling factor is defined to be 4, as 16384 x 4 = 65536
is an already used and recommended value.

The msgmni value is made dependent on msgmnb to keep the memory
dedicated to message queues within the 1/MSG_MEM_SCALE of lowmem
bound.

Unlike msgmni, the value is not scaled (down) with respect to the
number of ipc namespaces for simplicity.

To disable recomputation when the user explicitly sets a value, we reuse
the callback defined for msgmni. As msgmni and msgmnb are correlated,
a user setting of either of the two disables recomputation of both, for
now. This is refined in a later patch.

When a negative value is put in /proc/sys/kernel/msgmnb
automatic recomputing is re-enabled.

Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
---
 Documentation/sysctl/kernel.txt |   28 ++++++++++++++++++++++++++++
 include/linux/msg.h             |    6 ++++++
 ipc/ipc_sysctl.c                |    5 +++--
 ipc/msg.c                       |   17 +++++++++++++----
 4 files changed, 50 insertions(+), 6 deletions(-)

Index: b/ipc/msg.c
===================================================================
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -38,6 +38,7 @@
 #include <linux/rwsem.h>
 #include <linux/nsproxy.h>
 #include <linux/ipc_namespace.h>
+#include <linux/cpumask.h>
 #include <asm/current.h>
 #include <asm/uaccess.h>
@@ -92,7 +93,7 @@ void recompute_msgmni(struct ipc_namespa
 	si_meminfo(&i);
 	allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
-		/ MSGMNB;
+		/ ns->msg_ctlmnb;
 	nb_ns = atomic_read(&nr_ipc_ns);
 	allowed /= nb_ns;
@@ -108,11 +109,19 @@ void recompute_msgmni(struct ipc_namespa
 	ns->msg_ctlmni = allowed;
 }
+/*
+ * Scale msgmnb with the number of online cpus, up to 4x MSGMNB.
+ */
+void ...
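The arithmetic in the patch description can be illustrated with a small user-space sketch. This is not the kernel code: the function names scaled_msgmnb() and allowed_msgmni() are hypothetical, MSG_MEM_SCALE = 32 is an assumed value (the mail does not state it), and any clamping that recompute_msgmni() applies to the result is left out because it is not visible in the quoted hunk.

```c
/* User-space sketch of the scaling arithmetic described in the patch. */

#define MSGMNB        16384UL /* per-queue default, from "16384 x 4 = 65536" */
#define MSG_CPU_SCALE 4UL     /* scaling cap stated in the patch description */
#define MSG_MEM_SCALE 32UL    /* ASSUMPTION: value not given in the mail */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* msgmnb = min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE) */
unsigned long scaled_msgmnb(unsigned long online_cpus)
{
	return min_ul(MSGMNB * online_cpus, MSGMNB * MSG_CPU_SCALE);
}

/*
 * Mirrors the visible part of recompute_msgmni(): the lowmem share
 * reserved for message queues, divided by the per-queue size and then
 * by the number of ipc namespaces.
 */
unsigned long allowed_msgmni(unsigned long lowmem_pages,
			     unsigned long mem_unit,
			     unsigned long msgmnb,
			     unsigned long nb_ns)
{
	unsigned long allowed;

	allowed = ((lowmem_pages / MSG_MEM_SCALE) * mem_unit) / msgmnb;
	return allowed / nb_ns;
}
```

Under these assumptions, a larger msgmnb lowers the number of queues allowed for the same amount of lowmem, which is why the patch divides by ns->msg_ctlmnb instead of the constant MSGMNB.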
On Tue, 24 Jun 2008 11:34:53 +0200

The magical positive-versus-negative number trick is a bit obscure, and
I don't think there's any precedent for it in the kernel ABI (which is
what this is).

Is there anything we can do to reduce the unusualness of this interface?
Say, add a new /proc/sys/kernel/automatic-msgmnb which contains the
automatic scaling and leave /proc/sys/kernel/msgmnb containing the
manual scaling? Or something like that?
--
Well, I plead guilty ;-)
I made this proposal when sending the msgmni scaling patches
(unfortunately my network is down, so I can't look up the reference
thread). From what I have in my folders, here's the complete story:

. January 08: sent the patches

. 02/05/2008: got an answer from Yasunori Goto:
Yasunori Goto wrote:
> Hmmm. I suppose this may be side effect which user does not wish.
>
> I would like to recommend there should be a switch which can turn
> on/off automatic recomputing.
> If user would like to change this value, it should be turned off.
> Otherwise, his request will be rejected with some messages.
>
> Probably, user can understand easier than this side effect.

. 02/11/2008: resent the patches after fixing the issues:
Nadia Derbey wrote:
> Resending the set of patches after Yasunori's remark about being able
> to turn on/off automatic recomputing.
> (see message at http://lkml.org/lkml/2008/2/5/149).
> I actually introduced an intermediate solution: when msgmni is set by
> hand, it is unregistered from the ipcns notifier chain (i.e.
> automatic recomputing is disabled). This corresponds to an implicit
> turn off. Setting it to a negative value makes it registered back in
> the notifier chain (which corresponds to the turn on proposed by
> Yasunori).

And I don't remember anybody complaining about that :-(

Sorry for introducing this "magical positive-vs-negative # trick".
Will think a bit more about your suggestion.

Regards,
Nadia
--
Well, I don't know if I understood your proposal correctly: is it one
value in automatic-msgmnb and another one in msgmnb?
I don't clearly see how this could work.
IMHO, we should keep /proc/sys/kernel/msgmnb as a way to externalize the
current tunable value (whether it is automatically recomputed or not).
Also keep the current strategy: as soon as a value is written into that
file, give up automatic recomputing.
And use the file you propose as a way to go back and forth between
automatic recomputing and manual setting.
So the process would be the following:
1) kernel boots in "automatic recomputing mode"
   /proc/sys/kernel/msgmnb contains whatever value has been computed
   /proc/sys/kernel/automatic-msgmnb contains "ON"
2) echo <val> > /proc/sys/kernel/msgmnb
   . sets msg_ctlmnb to <val>
   . de-activates automatic recomputing (i.e. if, say, a cpu disappears
     it won't be recomputed anymore)
   . /proc/sys/kernel/automatic-msgmnb now contains "OFF"
   Echoing "OFF" into /proc/sys/kernel/automatic-msgmnb would have the
   same effect (except that msg_ctlmnb's value would stay blocked at its
   current value)
3) echo "ON" > /proc/sys/kernel/automatic-msgmnb
   . recomputes msgmnb's value based on the current available resources
   . re-activates automatic recomputing for msgmnb.
Of course, all this should be applied to msgmni too.
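
The three steps above can be sketched as a tiny state machine. This is only a user-space illustration of the proposed semantics, not kernel code: struct msgmnb_tunable and every function name below are hypothetical, and the automatic value simply reuses the min(MSGMNB * cpus, MSGMNB * 4) rule from the patch description.

```c
#include <stdbool.h>

#define MSGMNB        16384UL
#define MSG_CPU_SCALE 4UL

/* Hypothetical model of msgmnb plus its proposed automatic-msgmnb switch. */
struct msgmnb_tunable {
	unsigned long value; /* what reading msgmnb would return */
	bool automatic;      /* what automatic-msgmnb would show: ON/OFF */
};

static unsigned long auto_value(unsigned long cpus)
{
	unsigned long n = cpus < MSG_CPU_SCALE ? cpus : MSG_CPU_SCALE;

	return MSGMNB * n;
}

/* Step 1: boot in automatic recomputing mode. */
void tunable_boot(struct msgmnb_tunable *t, unsigned long cpus)
{
	t->value = auto_value(cpus);
	t->automatic = true;
}

/* Step 2: writing a value to msgmnb sets it and turns recomputing off. */
void tunable_write_value(struct msgmnb_tunable *t, unsigned long val)
{
	t->value = val;
	t->automatic = false;
}

/* Step 3 (and the "OFF" variant of step 2): write to automatic-msgmnb. */
void tunable_write_auto(struct msgmnb_tunable *t, bool on, unsigned long cpus)
{
	t->automatic = on;
	if (on)
		t->value = auto_value(cpus); /* recompute right away */
	/* on "OFF" the value stays blocked at its current value */
}

/* A cpu appears or disappears: recompute only in automatic mode. */
void tunable_cpu_event(struct msgmnb_tunable *t, unsigned long cpus)
{
	if (t->automatic)
		t->value = auto_value(cpus);
}
```

In this model a manual write never needs a special negative value: going back to automatic mode is an explicit write to the second file.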
And maybe this automatic-xxx file should be located under sysfs?
--> create a /sys/kernel/automatic directory and have one file per
tunable to be scaled (who knows, maybe we will add other ones in the
future?)
Now, maybe this is what you actually proposed and I completely
misunderstood it?
Regards,
Nadia
--
I don't know what I proposed, sorry ;) I didn't think about it very hard. But the positive-values-mean-one-thing/negative-values-mean-another-thing trick is unusual and rather unpleasing. I was hoping you guys could come up with a cleaner interface. --
