Re: [PATCH -mm 1/3] sysv ipc: increase msgmnb default value wrt. the number of cpus

Previous thread: [PATCH -mm 0/3] sysv ipc: increase msgmnb with the number of cpus by Solofo.Ramangalahy on Tuesday, June 24, 2008 - 2:34 am. (4 messages)

Next thread: [PATCH -mm 2/3] sysv ipc: recompute msgmnb (and msgmni) on cpu hotplug addition and removal by Solofo.Ramangalahy on Tuesday, June 24, 2008 - 2:34 am. (1 message)
From: Solofo.Ramangalahy
Date: Tuesday, June 24, 2008 - 2:34 am

From: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>

Initialize msgmnb value to
min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE)
to increase the default value for larger machines.

The MSG_CPU_SCALE scaling factor is defined as 4, because
16384 x 4 = 65536 is a value that is already used and recommended.
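
As a rough user-space illustration of the formula above (not the kernel
code from this patch), the scaling can be sketched as follows. The helper
name scaled_msgmnb() is hypothetical; MSGMNB = 16384 and MSG_CPU_SCALE = 4
are the values given in this description. Clamping the CPU count first is
equivalent to taking the min of the two products and avoids overflow for
very large counts.

```c
#include <assert.h>

#define MSGMNB        16384 /* default max bytes per queue (from the description) */
#define MSG_CPU_SCALE 4     /* upper bound on the scaling factor (from the description) */

/*
 * Hypothetical helper: min(MSGMNB * cpus, MSGMNB * MSG_CPU_SCALE),
 * written as MSGMNB * min(cpus, MSG_CPU_SCALE).
 */
static unsigned int scaled_msgmnb(unsigned int online_cpus)
{
	unsigned int factor;

	assert(online_cpus >= 1); /* at least one cpu is online */
	factor = online_cpus < MSG_CPU_SCALE ? online_cpus : MSG_CPU_SCALE;
	return MSGMNB * factor;
}
```

With these values, one cpu keeps the historical 16384 and any machine
with four or more cpus ends up at 65536.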

The msgmni value is made dependent on msgmnb to keep the memory
dedicated to message queues within the 1/MSG_MEM_SCALE of lowmem
bound.
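
A minimal user-space sketch of that dependency, modelled on the
recompute_msgmni() hunk in this patch. The function name msgmni_budget()
and its parameters are hypothetical, MSG_MEM_SCALE = 32 is an assumed
value used only for illustration, and any clamping done by the real
function is left out because it is not visible in the hunk.

```c
#include <assert.h>

#define MSG_MEM_SCALE 32 /* assumed value, for illustration only */

/*
 * Hypothetical helper: number of queues that fit in 1/MSG_MEM_SCALE of
 * lowmem when each queue may hold up to msgmnb bytes, split evenly
 * between nb_ns ipc namespaces.
 */
static unsigned long msgmni_budget(unsigned long lowmem_pages,
				   unsigned long mem_unit,
				   unsigned long msgmnb,
				   unsigned long nb_ns)
{
	unsigned long allowed;

	assert(msgmnb > 0 && nb_ns > 0);
	allowed = ((lowmem_pages / MSG_MEM_SCALE) * mem_unit) / msgmnb;
	return allowed / nb_ns;
}
```

Under these assumptions, 262144 lowmem pages of 4096 bytes give 2048
queues with msgmnb = 16384 but only 512 with msgmnb = 65536: raising
msgmnb lowers msgmni so that the product stays within the same bound.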

Unlike msgmni, the value is not scaled (down) with respect to the
number of ipc namespaces for simplicity.

To disable recomputation when the user explicitly sets a value,
we reuse the callback defined for msgmni.

As msgmni and msgmnb are correlated, a user setting of either one
disables recomputation of both, for now. This is refined in a later
patch.

When a negative value is written to /proc/sys/kernel/msgmnb,
automatic recomputing is re-enabled.
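
The write semantics described above can be modelled in user space
roughly as follows. This is only a sketch: struct msg_limits,
write_tunable() and on_resource_change() are hypothetical names, not
kernel interfaces, and leaving the values untouched until the next
recomputation event after a negative write is a simplification of this
model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two tunables and their shared auto flag. */
struct msg_limits {
	int msgmnb;
	int msgmni;
	bool auto_recompute; /* one flag: setting either value disables both */
};

/* Model a write of val to the sysctl file backing *field. */
static void write_tunable(struct msg_limits *l, int *field, int val)
{
	assert(l && field);
	if (val < 0) {
		/* negative value: re-enable automatic recomputing */
		l->auto_recompute = true;
		return;
	}
	/* explicit value: store it and stop recomputing both tunables */
	*field = val;
	l->auto_recompute = false;
}

/* Model a recomputation event (e.g. a cpu or memory change). */
static void on_resource_change(struct msg_limits *l, int new_msgmnb,
			       int new_msgmni)
{
	assert(l);
	if (!l->auto_recompute)
		return; /* user-set values are left alone */
	l->msgmnb = new_msgmnb;
	l->msgmni = new_msgmni;
}
```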


Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>

---
 Documentation/sysctl/kernel.txt |   28 ++++++++++++++++++++++++++++
 include/linux/msg.h             |    6 ++++++
 ipc/ipc_sysctl.c                |    5 +++--
 ipc/msg.c                       |   17 +++++++++++++----
 4 files changed, 50 insertions(+), 6 deletions(-)

Index: b/ipc/msg.c
===================================================================
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -38,6 +38,7 @@
 #include <linux/rwsem.h>
 #include <linux/nsproxy.h>
 #include <linux/ipc_namespace.h>
+#include <linux/cpumask.h>
 
 #include <asm/current.h>
 #include <asm/uaccess.h>
@@ -92,7 +93,7 @@ void recompute_msgmni(struct ipc_namespa
 
 	si_meminfo(&i);
 	allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
-		/ MSGMNB;
+		/ ns->msg_ctlmnb;
 	nb_ns = atomic_read(&nr_ipc_ns);
 	allowed /= nb_ns;
 
@@ -108,11 +109,19 @@ void recompute_msgmni(struct ipc_namespa
 
 	ns->msg_ctlmni = allowed;
 }
+/*
+ * Scale msgmnb with the number of online cpus, up to 4x MSGMNB.
+ */
+void ...
From: Andrew Morton
Date: Tuesday, June 24, 2008 - 2:31 pm

On Tue, 24 Jun 2008 11:34:53 +0200


The magical positive-versus-negative number trick is a bit obscure, and
I don't think there's any precedent for it in the kernel ABI (which is
what this is).

Is there anything we can do to reduce the unusualness of this
interface?  Say, add a new /proc/sys/kernel/automatic-msgmnb which
contains the automatic scaling and leave /proc/sys/kernel/msgmnb
containing the manual scaling?  Or something like that?
--

From: Nadia Derbey
Date: Wednesday, June 25, 2008 - 3:34 am

Well, I plead guilty ;-)
I made this proposal when sending the msgmni scaling patches
(unfortunately my network is down, so I can't look up the reference thread).
From what I have in my folders, here's the complete story:

. January 08: sent the patches
. 02/05/2008: got an answer from Yasunori Goto:

Yasunori Goto wrote:
 > Hmmm. I suppose this may be side effect which user does not wish.
 >
 > I would like to recommend there should be a switch which can turn
 > on/off automatic recomputing.
 > If user would like to change this value, it should be turned off.
 > Otherwise, his request will be rejected with some messages.
 >
 > Probably, user can understand easier than this side effect.

. 02/11/2008: resent the patches after fixing the issues:

Nadia Derbey wrote:
 > Resending the set of patches after Yasunori's remark about being able
 > to turn on/off automatic recomputing.
 > (see message at http://lkml.org/lkml/2008/2/5/149).
 > I actually introduced an intermediate solution: when msgmni is set by
 > hand, it is unregistered from the ipcns notifier chain (i.e.
 > automatic recomputing is disabled). This corresponds to an implicit
 > turn off. Setting it to a negative value makes it registered back in
 > the notifier chain (which corresponds to the turn on proposed by
 > Yasunori).

And I don't remember anybody complaining about that :-(

Sorry for introducing this "magical positive-vs-negative # trick".

Will think a bit more about your suggestion.

Regards,
Nadia

--

From: Nadia Derbey
Date: Thursday, June 26, 2008 - 7:49 am

Well, I don't know if I understood your proposal correctly: is it one
value in automatic-msgmnb and another one in msgmnb?
I don't clearly see how this could work.

IMHO, we should keep /proc/sys/kernel/msgmnb as a way to externalize the 
current tunable value (whether it is automatically recomputed or not).

Also keep the current strategy: as soon as a value is written into that
file, give up automatic recomputing.

And use the file you propose as a way to go back and forth between 
automatic recomputing and manual setting.

So the process would be the following:
1) kernel boots in "automatic recomputing mode"
    /proc/sys/kernel/msgmnb contains whatever value has been computed
    /proc/sys/kernel/automatic-msgmnb contains "ON"

2) echo <val> > /proc/sys/kernel/msgmnb
    . sets msg_ctlmnb to <val>
    . de-activates automatic recomputing (i.e. if, say, a cpu disappears
      it won't be recomputed anymore)
    . /proc/sys/kernel/automatic-msgmnb now contains "OFF"

Echoing "OFF" into /proc/sys/kernel/automatic-msgmnb would have the same
effect (except that msg_ctlmnb's value would stay blocked at its current
value)

3) echo "ON" > /proc/sys/kernel/automatic-msgmnb
    . recomputes msgmnb's value based on the current available resources
    . re-activates automatic recomputing for msgmnb.
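
The three steps above can be sketched as a small user-space state
machine. Nothing here is an existing kernel interface: struct
msgmnb_state and the write_*/read_* helpers are hypothetical, and the
freshly recomputed value is passed in by the caller rather than derived
from real resources.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of msgmnb plus the proposed automatic-msgmnb file. */
struct msgmnb_state {
	int msg_ctlmnb;
	bool automatic;
};

/* Step 2: writing a value sets it and switches automatic mode off. */
static void write_msgmnb(struct msgmnb_state *s, int val)
{
	assert(s);
	s->msg_ctlmnb = val;
	s->automatic = false;
}

/*
 * Writing "OFF" freezes the current value; step 3, writing "ON",
 * installs the recomputed value and re-activates automatic mode.
 * Returns 0 on success, -1 for an unrecognized string.
 */
static int write_automatic(struct msgmnb_state *s, const char *buf,
			   int recomputed)
{
	assert(s && buf);
	if (strcmp(buf, "ON") == 0) {
		s->msg_ctlmnb = recomputed;
		s->automatic = true;
		return 0;
	}
	if (strcmp(buf, "OFF") == 0) {
		s->automatic = false; /* value stays where it is */
		return 0;
	}
	return -1;
}

/* What reading the automatic-msgmnb file would show. */
static const char *read_automatic(const struct msgmnb_state *s)
{
	assert(s);
	return s->automatic ? "ON" : "OFF";
}
```

In this model the value file never carries a second meaning: a write to
it is always a plain number, and the mode is visible and changeable only
through the separate file.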

Of course, all this should be applied to msgmni too.
And maybe this automatic-xxx file should be located under sysfs?
   --> create a /sys/kernel/automatic directory and have one file per
tunable to be scaled (who knows, maybe we will add other ones in the
future?)

Now, maybe this is what you actually proposed and I completely
misunderstood it?

Regards,
Nadia
--

From: Andrew Morton
Date: Thursday, June 26, 2008 - 9:12 am

I don't know what I proposed, sorry ;)  I didn't think about it very hard.

But the positive-values-mean-one-thing/negative-values-mean-another-thing
trick is unusual and rather unpleasing.  I was hoping you guys could come up
with a cleaner interface.

--
