Re: speed regression in udp_lib_lport_inuse()

From: Eric Dumazet
Date: Thursday, January 22, 2009 - 3:06 pm

Vitaly Mayatskikh wrote:

Bug here, if bind() returns -1 (all ports are in use)


Hello Vitaly, thanks for this excellent report.

Yes, the current code performs badly when all ports are in use:

A failing bind(0) now has to scan long chains of about 220 sockets, 28232 [1] times.
That's very long (but at least the thread is preemptible).

In the past (before the patches), only one thread was allowed to run in the kernel while scanning
the udp port table (we had only one global lock, udp_hash_lock, protecting the whole udp table).
That thread was faster because it was not slowed down by other threads.
(But the rwlock we used caused writer starvation if many UDP frames
were received.)



One way to solve the problem could be the following:

1) Raise UDP_HTABLE_SIZE from 128 to 1024 to reduce average chain lengths.

2) In the bind(0) algorithm, use RCU locking to find a candidate port. All cpus can run in parallel, without
dirtying locks. Then lock the chain that was found and recheck that the port is still available before using it.

[1] Replace 28232 with the value derived from your actual /proc/sys/net/ipv4/ip_local_port_range:
61000 - 32768 = 28232

I will try to code a patch before this weekend.

Thanks

Note: I tried using a mutex to force only one thread into the bind(0) code but got no real speedup.
But it should help if you have an SMP machine, since only one cpu will be busy in bind(0).


diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cf5ab05..a572407 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -155,6 +155,8 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 	struct udp_hslot *hslot;
 	struct udp_table *udptable = sk->sk_prot->h.udp_table;
 	int    error = 1;
+	static DEFINE_MUTEX(bind0_mutex);
+	int mutex_acquired = 0;
 	struct net *net = sock_net(sk);
 
 	if (!snum) {
@@ -162,6 +164,8 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		unsigned rand;
 		unsigned short first;
 
+		mutex_lock(&bind0_mutex);
+		mutex_acquired = 1;
 		inet_get_local_port_range(&low, &high);
 		remaining = (high - low) + 1;
 
@@ -196,6 +200,8 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 fail_unlock:
 	spin_unlock_bh(&hslot->lock);
 fail:
+	if (mutex_acquired)
+		mutex_unlock(&bind0_mutex);
 	return error;
 }
 

