Re: [PATCH] Infiniband: Randomize local port allocation.

Previous thread: Help Needed ( working on kernel ) by Sameer Rahmani on Monday, April 12, 2010 - 2:59 am. (2 messages)

Next thread: make xconfig needs qt-mt according to error msg by Gene Heskett on Monday, April 12, 2010 - 4:55 am. (2 messages)
From: Amerigo Wang
Date: Monday, April 12, 2010 - 3:03 am

Changes from the previous version:
- Rename proc_{get,put}_ulong to proc_{get,put}_long();
- Fix potential dead loop problems in cma code.

------------->

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.

There are still some miss behaviors with regard to proc parsing in odd
invalid cases (for "40000\0-40001" all is acknowledged but only 40000
is accepted) but they are not easy to fix without changing the current
"acknowledge how much we accepted" behavior.

Because of that and because the same issues are present in the
existing proc_dointvec code as well I don't think its worth holding
the actual feature (port reservation) after such petty error recovery
issues.


--

From: Amerigo Wang
Date: Monday, April 12, 2010 - 3:04 am

From: Octavian Purdila <opurdila@ixiacom.com>

As we are about to add another integer handling proc function a little
bit of cleanup is in order: add a few helper functions to improve code
readability and decrease code duplication.

In the process a bug is also fixed: if the user specifies a number
with more then 20 digits it will be interpreted as two integers
(e.g. 10000...13 will be interpreted as 100.... and 13).

Behavior for EFAULT handling was changed as well. Previous to this
patch, when an EFAULT error occurred in the middle of a write
operation, although some of the elements were set, that was not
acknowledged to the user (by shorting the write and returning the
number of bytes accepted). EFAULT is now treated just like any other
errors by acknowledging the amount of bytes accepted.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---

Index: linux-2.6/kernel/sysctl.c
===================================================================
--- linux-2.6.orig/kernel/sysctl.c
+++ linux-2.6/kernel/sysctl.c
@@ -2040,8 +2040,148 @@ int proc_dostring(struct ctl_table *tabl
 			       buffer, lenp, ppos);
 }
 
+static int proc_skip_wspace(char __user **buf, size_t *size)
+{
+	char c;
+
+	while (*size) {
+		if (get_user(c, *buf))
+			return -EFAULT;
+		if (!isspace(c))
+			break;
+		(*size)--;
+		(*buf)++;
+	}
+
+	return 0;
+}
+
+static bool isanyof(char c, const char *v, unsigned len)
+{
+	int i;
+
+	if (!len)
+		return false;
+
+	for (i = 0; i < len; i++)
+		if (c == v[i])
+			break;
+	if (i == len)
+		return false;
+
+	return true;
+}
+
+#define TMPBUFLEN 22
+/**
+ * proc_get_long - reads an ASCII formated integer from a user buffer
+ *
+ * @buf - user buffer
+ * @size - size of the user buffer
+ * @val - this is where the number will be stored
+ * @neg - set to %TRUE if number is negative
+ * @perm_tr - a vector which contains the allowed ...
From: Alexey Dobriyan
Date: Tuesday, April 13, 2010 - 4:18 am

ULONG_MAX is not 22 digits always.

The fix is to not rely on simple_strtoul()

I guess it's time to finally remove it. :-(

Also, it's better to copy_from user stuff once.


A what?
--

From: Cong Wang
Date: Tuesday, April 13, 2010 - 12:35 am

It seems that all proc code assumes that the input buffer will


Hmm, right, it should be memchr(v, c, len).

Thanks!

--

From: Amerigo Wang
Date: Monday, April 12, 2010 - 3:04 am

From: Octavian Purdila <opurdila@ixiacom.com>

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---

Index: linux-2.6/Documentation/networking/ip-sysctl.txt
===================================================================
--- linux-2.6.orig/Documentation/networking/ip-sysctl.txt
+++ linux-2.6/Documentation/networking/ip-sysctl.txt
@@ -588,6 +588,37 @@ ip_local_port_range - 2 INTEGERS
 	(i.e. by default) range 1024-4999 is enough to issue up to
 	2000 connections per second to systems supporting timestamps.
 
+ip_local_reserved_ports - list of comma separated ranges
+	Specify the ports which are reserved for known third-party
+	applications. These ports will not be used by automatic port
+	assignments (e.g. when calling connect() or bind() with port
+	number 0). Explicit port allocation behavior is unchanged.
+
+	The format used for both input and output is a comma separated
+	list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
+	10). Writing to the file will clear all previously reserved
+	ports and update the current list with the one given in the
+	input.
+
+	Note that ip_local_port_range and ip_local_reserved_ports
+	settings are independent and both are considered by the kernel
+	when determining which ports are available for automatic port
+	assignments.
+
+	You can reserve ports which are not in the current
+	ip_local_port_range, e.g.:
+
+	$ cat /proc/sys/net/ipv4/ip_local_port_range
+	32000	61000
+	$ cat ...
From: Tetsuo Handa
Date: Monday, April 12, 2010 - 6:21 pm

I think above part is wrong. Below program
--------------------
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/idr.h>

static DEFINE_IDR(idr);
static int idr_demo_init(void)
{
	int next_port = 65530;
	int port = 0;
	int ret = -EINTR;
	while (!signal_pending(current)) {
		msleep(1000);
		ret = idr_get_new_above(&idr, NULL, next_port, &port);
		printk(KERN_INFO "idr_get_new_above() = %d\n", ret);
		if (!ret) {
			/* Emulate inet_is_reserved_local_port(port) = true */
			printk(KERN_INFO "Port %u is reserved.\n", port);
			ret = -EAGAIN;
		}
		if (ret == -EAGAIN) {
			if (idr_pre_get(&idr, GFP_KERNEL)) {
				printk(KERN_INFO "idr_pre_get() succeeded.\n");
				continue;
			}
			printk(KERN_INFO "idr_pre_get() failed.\n");
			break;
		} else {
			printk(KERN_INFO "next_port=%u port=%u\n",
			       next_port, port);
			break;
		}
	}
	if (!ret)
		idr_remove(&idr, port);
	idr_destroy(&idr);
	return -EINVAL;
}
module_init(idr_demo_init);
MODULE_LICENSE("GPL");
--------------------
generated below output.

idr_get_new_above() = -11
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65530 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65531 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65532 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65533 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65534 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65535 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65536 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65537 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
(...snipped...)

This result suggests that above loop will continue until idr_pre_get() fails
due to out of memory if all ports were reserved.

Also, if idr_get_new_above() returned 0, bind_list (which is a kmalloc()ed
pointer) is already installed into a free slot (see ...
From: Cong Wang
Date: Tuesday, April 13, 2010 - 12:13 am

Thanks for testing!

How about:

+		if (!ret && inet_is_reserved_local_port(port))
+			ret = -EBUSY;

? So that it will break the loop and return error.
--

From: Cong Wang
Date: Tuesday, April 13, 2010 - 1:48 am

Or use the similar trick:

 int tries = 10;
...

 if(!ret && inet_is_reserved_local_port(port)) {
   if (tries--)
     ret = -EAGAIN;
   else
     ret = -EBUSY;
 }

Any comments?
--

From: Tetsuo Handa
Date: Tuesday, April 13, 2010 - 6:07 am

Hello.

Adding Sean Hefty and Roland Dreier as drivers/infiniband/core/cma.c maintainer.

I don't like above change. Above change makes local port assignment from
"likely-succeed" (succeeds if one port is available from thousands of ports) to
"unlikely-succeed" (fail if randomly chosen port is already in use).
We should repeat for all ranges specified in /proc/sys/net/ipv4/ip_local_port_range .

cma_alloc_any_port() and cma_alloc_port() are almost identical.
Thus, I think we can call cma_alloc_port() from cma_alloc_any_port().

Sean and Roland, is below patch correct?
inet_is_reserved_local_port() is the new function proposed in this patchset.

---
 drivers/infiniband/core/cma.c |   68 ++++++++++++++----------------------------
 1 file changed, 23 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,31 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+	do {
+		rover++;
+		if ((rover < low) || (rover > ...
From: Sean Hefty
Date: Tuesday, April 13, 2010 - 9:32 am

It looks correct to me.  I didn't test the patch series, but if I comment out
the call to inet_is_reserved_local_port() in the provided below, the changes
worked fine for me.

Acked-by: Sean Hefty <sean.hefty@intel.com>

--

From: penguin-kernel
Date: Tuesday, April 13, 2010 - 7:01 pm

Thank you for testing.

I think it is better to split this patch into

Part 1: Make cma_alloc_any_port() to use cma_alloc_port().

Part 2: Insert "!inet_is_reserved_local_port(rover) &&" line.

for future "git bisect".

Roland, will you review below patch for part 1?
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in a way sctp_get_port_local() does.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   69 ++++++++++++++----------------------------
 1 file changed, 24 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,32 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+	do {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+		if (last_used_port != rover &&
+		    !idr_find(ps, (unsigned short) rover)) {
+			int ret = cma_alloc_port(ps, id_priv, rover);
+			/*
+			 * ...
From: Cong Wang
Date: Tuesday, April 13, 2010 - 9:38 pm

Right, thanks a lot for your work!

So, I will rebase my patch 3/3 on top of this patch. I hope someone
could take this one asap.

--

From: Sean Hefty
Date: Wednesday, April 14, 2010 - 5:01 pm

Thanks for fixing this long outstanding issue.  :)  The latest patch looks
correct and passed some simple tests that I ran against it.  One comment below,

Assuming that we're likely to pick a valid port on the first try, it would be


--

From: Tetsuo Handa
Date: Wednesday, April 14, 2010 - 7:29 pm

Indeed. I moved these lines to "if (--remaining) { ... }" block.
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in a way sctp_get_port_local() does.
Update rover at the end of loop since we're likely to pick a valid port
on the first try.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   70 +++++++++++++++---------------------------
 1 file changed, 25 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,33 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
-		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+retry:
+	if (last_used_port != rover &&
+	    !idr_find(ps, (unsigned short) rover)) {
+		int ret = cma_alloc_port(ps, id_priv, rover);
+		/*
+		 * Remember previously used port number in order to avoid
+		 * re-using same port immediately after it is closed.
+		 */
+		if ...
From: Sean Hefty
Date: Thursday, April 15, 2010 - 12:55 pm

I like this version, thanks!  I'm not sure which tree to merge it through.

--

From: Cong Wang
Date: Thursday, April 15, 2010 - 7:22 pm

As soon as possible, so 2.6.34. :)
--

From: Tetsuo Handa
Date: Friday, April 16, 2010 - 6:54 am

Cong, merge window for 2.6.34 was already closed.
You need to make your patchset towards 2.6.35 (using net-next-2.6 tree)
rather than 2.6.34 (using linux-2.6 tree). Therefore, this patch being
queued for 2.6.35 (through net-next-2.6 tree) should be okay for you.
--

From: David Miller
Date: Friday, April 16, 2010 - 1:30 pm

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

I don't take RDMA patches into net-next-2.6, the less I touch this
stack avoiding stuff the better and Roland has been taking this stuff
into his own tree for some time now.
--

From: Cong Wang
Date: Monday, April 19, 2010 - 9:34 pm

I left for a few days.

Ok, so I will wait for this to be merged.

Thanks, David and Tetsuo!

--

From: Roland Dreier
Date: Wednesday, April 21, 2010 - 4:19 pm

Thanks, applied this part of the patch -- I preferred this one since the
goto into the middle of a loop seemed worse than a goto out of the loop...
-- 
Roland Dreier <rolandd@cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--

From: Roland Dreier
Date: Wednesday, April 21, 2010 - 4:22 pm

> Thanks, applied this part of the patch -- I preferred this one since the

err, not "part of the patch" -- I meant "this version of the patch".
-- 
Roland Dreier <rolandd@cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--

Previous thread: Help Needed ( working on kernel ) by Sameer Rahmani on Monday, April 12, 2010 - 2:59 am. (2 messages)

Next thread: make xconfig needs qt-mt according to error msg by Gene Heskett on Monday, April 12, 2010 - 4:55 am. (2 messages)