[34-longterm 114/260] tcp: Combat per-cpu skew in orphan tests.

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Paul Gortmaker
Date: Sunday, January 2, 2011 - 12:16 am

From: David S. Miller <davem@davemloft.net>

commit ad1af0fedba14f82b240a03fe20eb9b2fdbd0357 upstream.

As reported by Anton Blanchard when we use
percpu_counter_read_positive() to make our orphan socket limit checks,
the check can be off by up to num_cpus_online() * batch (which is 32
by default) which on a 128 cpu machine can be as large as the default
orphan limit itself.

Fix this by doing the full expensive sum check if the optimized check
triggers.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/net/tcp.h    |   18 ++++++++++++++----
 net/ipv4/tcp.c       |    5 +----
 net/ipv4/tcp_timer.c |    8 ++++----
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index aa04b9a..cae8c39 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -268,11 +268,21 @@ static inline int between(__u32 seq1, __u32 seq2, __u32 seq3)
 	return seq3 - seq2 >= seq1 - seq2;
 }
 
-static inline int tcp_too_many_orphans(struct sock *sk, int num)
+static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
 {
-	return (num > sysctl_tcp_max_orphans) ||
-		(sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
-		 atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2]);
+	struct percpu_counter *ocp = sk->sk_prot->orphan_count;
+	int orphans = percpu_counter_read_positive(ocp);
+
+	if (orphans << shift > sysctl_tcp_max_orphans) {
+		orphans = percpu_counter_sum_positive(ocp);
+		if (orphans << shift > sysctl_tcp_max_orphans)
+			return true;
+	}
+
+	if (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
+	    atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])
+		return true;
+	return false;
 }
 
 /* syncookies: remember time of last synqueue overflow */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 205ea31..692f424 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2002,11 +2002,8 @@ adjudge_to_death:
 		}
 	}
 	if (sk->sk_state != TCP_CLOSE) {
-		int orphan_count = percpu_counter_read_positive(
-						sk->sk_prot->orphan_count);
-
 		sk_mem_reclaim(sk);
-		if (tcp_too_many_orphans(sk, orphan_count)) {
+		if (tcp_too_many_orphans(sk, 0)) {
 			if (net_ratelimit())
 				printk(KERN_INFO "TCP: too many of orphaned "
 				       "sockets\n");
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 8a0ab29..d252af7 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -67,18 +67,18 @@ static void tcp_write_err(struct sock *sk)
 static int tcp_out_of_resources(struct sock *sk, int do_reset)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	int orphans = percpu_counter_read_positive(&tcp_orphan_count);
+	int shift = 0;
 
 	/* If peer does not open window for long time, or did not transmit
 	 * anything for long time, penalize it. */
 	if ((s32)(tcp_time_stamp - tp->lsndtime) > 2*TCP_RTO_MAX || !do_reset)
-		orphans <<= 1;
+		shift++;
 
 	/* If some dubious ICMP arrived, penalize even more. */
 	if (sk->sk_err_soft)
-		orphans <<= 1;
+		shift++;
 
-	if (tcp_too_many_orphans(sk, orphans)) {
+	if (tcp_too_many_orphans(sk, shift)) {
 		if (net_ratelimit())
 			printk(KERN_INFO "Out of socket memory\n");
 
-- 
1.7.3.3

--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[34-longterm 000/260] v2.6.34.8 longterm review, Paul Gortmaker, (Sun Jan 2, 12:14 am)
[34-longterm 003/260] ath5k: drop warning on jumbo frames, Paul Gortmaker, (Sun Jan 2, 12:14 am)
[34-longterm 019/260] ext4: Show journal_checksum option, Paul Gortmaker, (Sun Jan 2, 12:15 am)
[34-longterm 025/260] ext4: Fix compat EXT4_IOC_ADD_GROUP, Paul Gortmaker, (Sun Jan 2, 12:15 am)
[34-longterm 028/260] ext4: fix freeze deadlock under IO, Paul Gortmaker, (Sun Jan 2, 12:15 am)
[34-longterm 030/260] xen: handle events as edge-triggered, Paul Gortmaker, (Sun Jan 2, 12:15 am)
[34-longterm 045/260] USB: ehci-ppc-of: problems in unwind, Paul Gortmaker, (Sun Jan 2, 12:15 am)
[34-longterm 047/260] USB: CP210x Add new device ID, Paul Gortmaker, (Sun Jan 2, 12:15 am)
[34-longterm 065/260] irda: off by one, Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 089/260] sched: Optimize task_rq_lock(), Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 090/260] sched: Fix nr_uninterruptible count, Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 093/260] sched: Fix select_idle_sibling(), Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 098/260] arm: fix really nasty sigreturn bug, Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 106/260] drm/i915: Prevent double dpms on, Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 110/260] gro: fix different skb headrooms, Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 111/260] gro: Re-fix different skb headrooms, Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 114/260] tcp: Combat per-cpu skew in orphan t ..., Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 115/260] tcp: fix three tcp sysctls tuning, Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 118/260] rds: fix a leak of kernel memory, Paul Gortmaker, (Sun Jan 2, 12:16 am)
[34-longterm 126/260] Staging: vt6655: fix buffer overflow, Paul Gortmaker, (Sun Jan 2, 12:17 am)
[34-longterm 134/260] percpu: fix pcpu_last_unit_cpu, Paul Gortmaker, (Sun Jan 2, 12:17 am)
[34-longterm 136/260] inotify: send IN_UNMOUNT events, Paul Gortmaker, (Sun Jan 2, 12:17 am)
[34-longterm 139/260] fix siglock, Paul Gortmaker, (Sun Jan 2, 12:17 am)
[34-longterm 145/260] AT91: change dma resource index, Paul Gortmaker, (Sun Jan 2, 12:17 am)
[34-longterm 154/260] inotify: fix inotify oneshot support, Paul Gortmaker, (Sun Jan 2, 12:17 am)
[34-longterm 158/260] alpha: Fix printk format errors, Paul Gortmaker, (Sun Jan 2, 12:17 am)
[34-longterm 188/260] atl1: fix resume, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 190/260] De-pessimize rds_page_copy_user, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 192/260] xfrm4: strip ECN bits from tos field, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 193/260] tcp: Fix &gt;4GB writes on 64-bit., Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 199/260] tcp: Fix race in tcp_poll, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 200/260] netxen: dont set skb-&gt;truesize, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 203/260] skge: add quirk to limit DMA, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 208/260] b44: fix carrier detection on bind, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 224/260] bluetooth: Fix missing NULL check, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 236/260] KVM: x86: Fix SVM VMCB reset, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 240/260] p54usb: fix off-by-one on !CONFIG_PM, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 241/260] p54usb: add five more USBIDs, Paul Gortmaker, (Sun Jan 2, 12:18 am)
[34-longterm 256/260] libsas: fix NCQ mixing with non-NCQ, Paul Gortmaker, (Sun Jan 2, 12:19 am)
[34-longterm 257/260] gdth: integer overflow in ioctl, Paul Gortmaker, (Sun Jan 2, 12:19 am)
[34-longterm 258/260] Fix race when removing SCSI devices, Paul Gortmaker, (Sun Jan 2, 12:19 am)
Re: [34-longterm 000/260] v2.6.34.8 longterm review, Paul Gortmaker, (Sun Jan 2, 3:46 am)
Re: [34-longterm 000/260] v2.6.34.8 longterm review, Jiri Slaby, (Mon Jan 3, 3:41 am)
Re: [34-longterm 000/260] v2.6.34.8 longterm review, Paul Gortmaker, (Tue Jan 4, 12:11 pm)