[PATCH 0/2] udp: Convert the UDP hash lock to RCU

From: Eric Dumazet
Date: Tuesday, October 28, 2008 - 1:37 pm

UDP sockets are hashed in a 128-slot hash table.

This hash table is protected by *one* rwlock.

This rwlock is read-locked each time an incoming UDP message is handled.

This rwlock is write-locked each time a socket must be inserted in
the hash table (bind time), or deleted from this table (unbind time).

This is not scalable on SMP machines :

1) Even in read mode, lock() and unlock() are atomic operations and
must dirty a contended cache line, shared by all cpus.

2) A writer might be starved if many readers are 'in flight'. This can
happen on a machine with some NIC receiving many UDP messages. A user
process can be delayed for a long time at socket creation/dismantle time.
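
To make both points concrete, here is a minimal userspace sketch of a
reader/writer lock built on C11 atomics. It is not the kernel's rwlock
implementation; the names (rwlock_sketch, WRITER_BIT, the *_sketch helpers)
are made up for illustration. It only shows that a reader has to perform an
atomic write on the shared lock word, and that a writer cannot get in while
any reader holds the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WRITER_BIT 0x40000000	/* hypothetical flag: a writer holds the lock */

/* One shared word: reader count in the low bits, WRITER_BIT on top. */
struct rwlock_sketch {
	atomic_int word;
};

/* A reader must atomically bump the shared word: a write to a contended line. */
static bool read_trylock_sketch(struct rwlock_sketch *l)
{
	int old = atomic_load(&l->word);

	while (!(old & WRITER_BIT)) {
		/* On failure 'old' is refreshed and the writer bit is re-checked. */
		if (atomic_compare_exchange_weak(&l->word, &old, old + 1))
			return true;
	}
	return false;
}

/* Dropping the read lock is a second atomic write to the same word. */
static void read_unlock_sketch(struct rwlock_sketch *l)
{
	atomic_fetch_sub(&l->word, 1);
}

/* A writer only succeeds when no reader (and no writer) is present. */
static bool write_trylock_sketch(struct rwlock_sketch *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->word, &expected, WRITER_BIT);
}

static void write_unlock_sketch(struct rwlock_sketch *l)
{
	atomic_store(&l->word, 0);
}
```

With a steady stream of readers the word rarely drops to zero, which is the
writer starvation described in point 2.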


What Corey and I propose is to use RCU to protect this hash table.

Goals are :

1) Optimizing handling of incoming unicast UDP frames, so that no memory
writes happen in the fast path. Using an array of rwlocks (one per
slot, for example) is not an option in this regard, since even a read
lock writes to memory.

Note: Multicasts and broadcasts will still need to take a lock,
because doing a full lockless lookup in this case is difficult.

2) No expensive operations in the socket bind/unhash phases :
  - No expensive synchronize_rcu() calls.

  - No added rcu_head in the socket structure, which would increase
  memory needs but, more importantly, force us to use call_rcu() calls,
  which have the bad property of making socket structures cold
  (the RCU grace period between a socket being freed and its potential
   reuse leaves this socket cold in the CPU cache).
  David did a previous patch using call_rcu() and noticed a 20%
  impact on TCP connection rates.

  Quoting Christoph Lameter :
  "Right. That results in cacheline cooldown. You'd want to recycle
   the object as they are cache hot on a per cpu basis. That is screwed
   up by the delayed regular rcu processing. We have seen multiple
   regressions due to cacheline cooldown.
   The only choice in cacheline hot sensitive areas is to deal with the
   complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."

  - Because UDP sockets are allocated from a dedicated kmem_cache,
  use of SLAB_DESTROY_BY_RCU can help here.

Theory of operation :
---------------------

As the lookup is lock-free (using rcu_read_lock()/rcu_read_unlock()),
special care must be taken by readers and writers.

Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
reused, and inserted in a different chain (or, in the worst case, in the
same chain) while readers are doing lookups at the same time.
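
One way a reader can cope with such reuse is sketched below in userspace C11.
This is an assumption about the general pattern usually paired with
SLAB_DESTROY_BY_RCU, not code from these patches: the reader pins the
candidate only if its reference count is non-zero, then re-checks the key,
since the memory may already belong to a different socket. All names
(reuse_sock, reuse_pin, ...) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket living in type-stable memory. */
struct reuse_sock {
	atomic_int refcnt;	/* 0 means the object is currently free */
	unsigned short port;	/* lookup key, may change when reused */
};

/* Take a reference only if the object is still alive. */
static bool reuse_get_unless_zero(struct reuse_sock *sk)
{
	int old = atomic_load(&sk->refcnt);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&sk->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void reuse_put(struct reuse_sock *sk)
{
	atomic_fetch_sub(&sk->refcnt, 1);
}

/*
 * Pin a candidate found by a lockless lookup, then verify it is still
 * the socket we were looking for. NULL means "lost the race".
 */
static struct reuse_sock *reuse_pin(struct reuse_sock *sk, unsigned short port)
{
	if (!reuse_get_unless_zero(sk))
		return NULL;		/* freed while we were looking */
	if (sk->port != port) {
		reuse_put(sk);		/* recycled for another port */
		return NULL;
	}
	return sk;
}
```

In this sketch a NULL return would simply make the caller redo the lookup.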

In order to avoid loops, a reader must check that each socket found in a
chain really belongs to the chain the reader was traversing. If it finds a
mismatch, the lookup must start again at the beginning. This *restart* loop
is the reason we had to use rdlock for the multicast case, because
we don't want to send the same message several times to the same socket.
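
The chain check and restart can be sketched in plain userspace C as follows.
This is only an illustration of the idea, not the code from PATCH 2/2: the
types, the hash (low bits of the port) and every sketch_* name are
assumptions, and RCU, memory barriers and reference counting are left out.
The 'start' argument lets a caller model a reader that holds a stale pointer
to a socket that has since moved to another chain.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HTABLE_SIZE 128	/* slot count quoted in the text */

/* Hypothetical stand-in for a UDP socket; not the kernel's struct sock. */
struct sketch_sock {
	struct sketch_sock *next;	/* next socket in the chain */
	unsigned int slot;		/* chain this socket currently lives in */
	unsigned short port;		/* lookup key */
};

struct sketch_table {
	struct sketch_sock *head[SKETCH_HTABLE_SIZE];
};

/* Assumed hash function: low bits of the port. */
static unsigned int sketch_slot(unsigned short port)
{
	return port & (SKETCH_HTABLE_SIZE - 1);
}

/*
 * Walk the chain for 'port' starting at 'start'. Every socket met is
 * checked against the chain being traversed; on a mismatch the walk
 * starts again from the head of the right chain. In a real concurrent
 * setting the restart terminates because the writer has already
 * unlinked the moved socket from this chain.
 */
static struct sketch_sock *sketch_lookup_from(struct sketch_table *t,
					      struct sketch_sock *start,
					      unsigned short port,
					      unsigned int *restarts)
{
	unsigned int slot = sketch_slot(port);
	struct sketch_sock *sk = start;

	while (sk != NULL) {
		if (sk->slot != slot) {
			/* We wandered into a foreign chain: restart. */
			(*restarts)++;
			sk = t->head[slot];
			continue;
		}
		if (sk->port == port)
			return sk;
		sk = sk->next;
	}
	return NULL;
}

/* Normal entry point: begin at the head of the chain for 'port'. */
static struct sketch_sock *sketch_lookup(struct sketch_table *t,
					 unsigned short port,
					 unsigned int *restarts)
{
	return sketch_lookup_from(t, t->head[sketch_slot(port)], port, restarts);
}
```

A multicast delivery loop written this way could hand the same message to a
socket it had already visited before the restart, which is why that path
keeps a lock.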

We use RCU only for the fast path. Thus, /proc/net/udp still takes rdlocks.


The work is split into two patches.

[PATCH 1/2] udp: introduce struct udp_table and multiple rwlocks

Introduces 'struct udp_table' and 'struct udp_hslot',
with one rwlock per chain, instead of a global one.
Some cleanups were done to ease review of the next patch.
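
As a rough picture of that layout, here is a userspace sketch with one lock
per chain. The field names, the atomic_int used as a stand-in for the rwlock
and the port-based slot selection are assumptions for illustration; the
actual definitions are in the patch itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define LAYOUT_HTABLE_SIZE 128	/* slot count quoted in the text */

struct layout_sock;		/* opaque here: the sockets hanging off a chain */

/* One hash chain with its own lock (stand-in type, not rwlock_t). */
struct layout_hslot {
	struct layout_sock *head;
	atomic_int lock;
};

/* The whole table: one slot, and therefore one lock, per chain. */
struct layout_table {
	struct layout_hslot hash[LAYOUT_HTABLE_SIZE];
};

/* Assumed slot selection: low bits of the port. */
static struct layout_hslot *layout_slot(struct layout_table *t,
					unsigned short port)
{
	return &t->hash[port & (LAYOUT_HTABLE_SIZE - 1)];
}
```

Two binds that land in different slots then contend on different locks
instead of on one global lock.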

[PATCH 2/2] udp: RCU handling for Unicast packets.


Tests done on a dual quad-core machine (8 CPUs) with IPv4 only were
pretty good: some microbenchmarks ran ten times faster.

Many thanks to all contributors (David Miller, Christoph Lameter,
Peter Zijlstra, Stephen Hemminger, Paul E. McKenney, Evgeniy Polyakov)
for their review/comments on Corey's initial work.