Add IPv6 support to TCP SYN cookies. This is written and tested against 2.6.24, and applies cleanly to Linus' current HEAD (d2fc0b). Unfortunately Linus' HEAD breaks my sky2 card at the moment, so I'm unable to test against that. I see no reason why it would be affected though. Comments/suggestions are welcome.

Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
---
 include/net/tcp.h     |    4 +
 net/ipv4/syncookies.c |  203 ++++++++++++++++++++++++++++++++++++++++++++++++-
 net/ipv6/tcp_ipv6.c   |   77 +++++++++++++-----
 3 files changed, 260 insertions(+), 24 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cb5b033..02dc6dd 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -435,6 +435,9 @@ extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
                                     struct ip_options *opt);
 extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb,
                                      __u16 *mss);
+extern struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
+extern __u32 cookie_v6_init_sequence(struct sock *sk, struct sk_buff *skb,
+                                     __u16 *mss);
 /* tcp_output.c */
@@ -1337,6 +1340,7 @@ extern int tcp_proc_register(struct tcp_seq_afinfo *afinfo);
 extern void tcp_proc_unregister(struct tcp_seq_afinfo *afinfo);
 extern struct request_sock_ops tcp_request_sock_ops;
+extern struct request_sock_ops tcp6_request_sock_ops;
 extern int tcp_v4_destroy_sock(struct sock *sk);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 2da1be0..b342bae 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -3,6 +3,7 @@
  *
  * Copyright (C) 1997 Andi Kleen
  * Based on ideas by D.J.Bernstein and Eric Schenk.
+ * IPv6 Support Added by Glenn Griffin (2008)
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -10,8 +11,6 @@
  * 2 of the License, or (at your option) any later version.
  *
...
Syncookies are discouraged these days. They disable too many valuable TCP features (window scaling, SACK), and even without them the kernel is usually strong enough to defend against SYN floods, and systems have much more memory than they used to. So I don't think it makes much sense to add more code to it, sorry.

Besides, you should really move it to the ipv6 module; right now the code would always be compiled in, even for ipv4-only kernels.

-Andi --
Somewhat untrue. Network speeds have risen dramatically, and the number of appliances running Linux that are not PC class means memory has fallen.

I think it makes a lot of sense - provided it defaults to off for the PC world where, as you say, its use is limited.

Alan --
By strong I meant that Linux has much better algorithms to handle the standard SYN queue (syncookies were originally added when it had only dumb head drop), and there are minisocks, which also require significantly less overhead to manage than full sockets (less memory etc.)

I have my doubts. It would probably be better to recheck everything and then remove syncookies.

Also, your sub-PC-class appliances rarely run LISTEN servers that are open to the world anyway.

-Andi --
Really. The ones that first come to mind often have exposed ports including PDA devices and phones. (Ditto low end PC boxes - portscan an EEPC some day ;)) Is the other stuff enough - good question, and can be measured easily enough on a little dlink router or similar. Alan --
What kind of LISTEN ports? And does it matter if they're DoS'ed? The only one I can think of right now would be ident, and frankly nobody will really care if that one works or not. If it's just the management interface etc. (which should really be firewalled)

My guess would be that it is. If it's not, it would probably be better to look at improving the standard queue management again; e.g. re-add RED.

-Andi --
> What kind of LISTEN ports? And does it matter if they're DoS'ed?

I guess that depends on the opinion of the owner

- Push based mobile services
- Email delivery
- VoIP
- Management ports
- Peer to peer data transfer
- Instant messaging direct user/user connections

Alan --
Hi Andi, Alan,

I've run extensive tests with and without SYN cookies recently. In my tests, I discovered that SYN cookies in fact benefit high-end machines more than low-end ones. Let me explain.

I noticed that computing the cookie consumes a lot of CPU, which is a real problem on low-end machines. But on the other hand, it helps the system continue to respond when otherwise it would not.

My tests on an AMD LX800 with max_syn_backlog at 63000 on an HTTP reverse proxy consisted of injecting 250 hits/s of legitimate traffic with 8000 SYN/s of noise. Without SYN cookies, the average response time was about 1.5 seconds and unstable (due to retransmits), and CPU usage was at 60%. With SYN cookies enabled, the response time dropped to only 12-15 ms, but CPU usage jumped to 70%.

The difference appears at a higher legitimate traffic rate. At 500 hits/s + 7800 SYN/s, the CPU is just saturated with correct response time (SYN backlog almost full but never full), and performance goes down slightly with SYN cookies enabled, with the hit rate dropping because of the increased CPU consumption.

Up to this point, one would conclude that SYN cookies are bad. BUT! This was with tcp_synack_retries = 1, which is the optimal setting without SYN cookies under an attack and which is pretty bad for normal usage.

The real problem without SYN cookies is that you are forced to support a huge SYN backlog (e.g. 2 million entries to sustain 100 Mbps of SYN). And what happens with a large backlog? You send a lot of retries for each SYN: 5 by default, meaning 6 SYN-ACKs for 1 SYN. Thus, you become a SYN amplifier, and the guy in front of you just has to send you 20 Mbps of traffic for you to saturate your 100 Mbps uplink.

Also, sending all those SYN-ACKs takes a huge amount of CPU time. With tcp_synack_retries at 0, my machine received 26600 SYN/s and returned 26600 SYN-ACK/s at 100% CPU. With tcp_synack_retries set to 4, it could only accept 12900 SYN/s, replying with 51200 SYN-ACK/s. So the ...
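The amplification arithmetic in the message above can be written out as a minimal sketch. It assumes every SYN is spoofed and never answered, that all retries are actually sent, and that a SYN and a SYN-ACK have the same size on the wire. The function names are made up for illustration and are not kernel symbols.

```c
#include <assert.h>

/* SYN-ACKs sent for one SYN that is never answered:
 * the initial SYN-ACK plus one per retry. */
static unsigned synacks_per_syn(unsigned synack_retries)
{
    return synack_retries + 1u;
}

/* Outbound SYN-ACK bandwidth for a given inbound SYN bandwidth,
 * assuming SYN and SYN-ACK have the same on-wire size. */
static double synack_mbps(double syn_mbps, unsigned synack_retries)
{
    assert(syn_mbps >= 0.0);
    return syn_mbps * (double)synacks_per_syn(synack_retries);
}

/* Inbound SYN bandwidth at which the replies alone fill the uplink. */
static double saturating_syn_mbps(double uplink_mbps, unsigned synack_retries)
{
    assert(uplink_mbps >= 0.0);
    return uplink_mbps / (double)synacks_per_syn(synack_retries);
}
```

With 5 retries, 20 Mbps of inbound SYNs asks for 120 Mbps of replies under these assumptions, which is more than a 100 Mbps uplink can carry. The break-even point is a little under 17 Mbps.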
As you say, the kernel is usually strong enough to defend against SYN flood attacks, but what about the situations where it isn't? As valuable as the TCP features are, I would give them up if it means I'm able to connect to my sshd port when I otherwise would be unable to. While increased synq sizes, better dropping algorithms, and minisocks are a great way to mitigate the attacks and in most cases are enough, there are situations where syncookies are nice.

Regardless, I would say that as long as ipv4 has syncookie support, it will accurately be viewed as a deficiency of ipv6 if it lacks support. So perhaps the discussion should be whether all the other defenses are enough to warrant the removal of syncookie support from ipv4. That topic may bring in

That is correct. I will gladly move it into its own section within net/ipv6/. Do you have any problem using the same CONFIG and sysctl variables as the ipv4 implementation?

Thanks
--Glenn --
Yes, syncookies, while presenting some tradeoffs, are a necessary tool to have. The problem is that any reasonably recent PC can generate enough forged SYN packets to overwhelm reasonable SYN queues on a much more powerful server.

Imagine a server with a few hundred Apache virtual hosts. One website pisses off the wrong person and it impacts service for everyone. While syncookies isn't always enough, enabling it often helps make the server more resilient during attacks.

And for web service, most of the connections are short-lived connections for small pieces of data - so I'm not really convinced that window scaling and selective ACK are all that important.

-- Ross Vandegrift ross@kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 --
Have you actually seen this with a recent kernel in the wild or are you just talking theoretically? Linux uses some heuristics to manage the syn queue that should still ensure reasonable service even without cookies under attack. Also SYN-RECV sockets are stored in a special data structure optimized to use minimal resources. It is far from the classical head drop method that was so vulnerable to syn flooding. -Andi --
I work at a hosting company and we see these kinds of issues in the real world fairly frequently. I would guess maybe on a monthly basis. The servers where we have seen this are typically running RHEL 4 or 5 kernels, so I can't really speak to how recent the kernel is in those terms.

If I can find a box that we could temporarily get a kernel.org kernel on, I'll see if I can get a real comparison together. We have collected a few of the more effective attack tools that have been left on compromised systems, so it wouldn't be too difficult to get some numbers.

-- Ross Vandegrift ross@kallisti.us --
That would be useful yes -- for different bandwidths. If the young/old heuristics do not work well enough anymore, most likely we should try re-adding RED to the syn queue. That used to be pretty effective in the early days. I don't quite remember why Linux didn't end up using it in fact. -Andi --
My initial test is end-to-end 1000Mbps, but I've got a few different

I'm running juno-z with 2, 4, & 8 threads of syn flood to port 80. wireshark measures 2 threads at 350pps, 4 threads at 750pps, and 8 threads at 1200pps. Under no SYN flood, the server handles 750 HTTP requests per second, measured via httping in flood mode.

With a default tcp_max_syn_backlog of 1024, I can trivially prevent any inbound client connections with 2 threads of syn flood. Enabling tcp_syncookies brings the connection handling back up to 725 fetches per second.

If I raise the backlog to 16384, 4 threads gives me about 650 legit requests per sec. Going to 8 threads makes connections very unreliable - a handful will get through every 15 to 20 seconds. Again, tcp_syncookies returns performance almost totally back to normal.

Cranking juno-z to the max generates about 16kpps. Any syn backlog is easily overwhelmed and nothing gets through. tcp_syncookies gets me back to 650 requests per second.

At these levels the CPU impact of tcp_syncookies is nothing. I can't measure a difference. In the real world, a 16kpps syn flood is small. People with a distributed botnet can easily get to the hundreds of thousands, and I have seen over a million packets per second of SYN flood.

BTW, I can trigger a soft lockup BUG when I restart apache to change the backlog during the 16kpps test-case:

BUG: soft lockup detected on CPU#1!
 [<c044d1ec>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196f7>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c045007b>] taskstats_exit_send+0x152/0x371
 [<c05c007b>] netlink_kernel_create+0x5/0x11c
 [<c05a7415>] reqsk_queue_alloc+0x32/0x81
 [<c05a5aca>] lock_sock+0x8e/0x96
 [<c05ce8c4>] inet_csk_listen_start+0x17/0x106
 [<c05e720f>] inet_listen+0x3c/0x5f
 [<c05a3e55>] sys_listen+0x4a/0x66
 [<c05a4f4d>] sys_socketcall+0x98/0x19e
 [<c0407ef7>] do_syscall_trace+0xab/0xb1
 [<c0404eff>] ...
Thanks for the tests. Could you please do an additional experiment? Use sch_netem or similar to add a jittering longer latency in the connection (as would be realistic in a real distributed DoS). Does it make a

Yes, the defaults are probably too low. That's something that should

CPU impact of syncookies was never a concern. The problems are rather

I think the softirqs are starving user context through the socket lock. That should probably be fixed too. Something like: the softirq should detect when there is a user and it has been looping too long, and should then give up the lock for some time.

-Andi --
Sorry for the delay in getting back to you on this, Andi. Here's a breakdown for each attack of the pps and bandwidth:

packets/s    Mb/s
      380   0.182
      715   0.343
     1193   0.572
    16460   7.896

The first three tests were done with some fixed delay in between syn

Jitter on both ends makes it worse. Jitter only on the syn flooder end behaves approximately the same. If I add jitter to both the flooder and the target with:

tc qdisc add dev eth0 root netem delay 50ms 100ms distribution normal

I can kill off the host with even a single thread of syn flooding,

Yea, with a longer queue, the server is somewhat more resilient. But it's still pretty easy to overwhelm it. syncookies is a night and day

Oh definitely - Willy raised the CPU issue in another mail, I was just including my findings with a bigger CPU.

In general I agree with you for the TCP features, but in the cases where we're enabling syncookies, it's the difference between bad connectivity and no connectivity.

-- Ross Vandegrift ross@kallisti.us --
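The packet-rate and bandwidth figures above can be related with a small conversion. This is a sketch only: the packet size is not stated in the message, and the 60 bytes used below is an inference from the first three rows of the table, so treat it as an assumption. The function name is hypothetical.

```c
#include <assert.h>

/* Convert a packet rate into megabits per second for a fixed packet size.
 * bytes_per_packet is an assumed on-wire size, not a measured value. */
static double pps_to_mbps(double pps, unsigned bytes_per_packet)
{
    assert(pps >= 0.0);
    return pps * (double)bytes_per_packet * 8.0 / 1e6;
}
```

With 60-byte packets, 380 pps comes to about 0.182 Mb/s and 1193 pps to about 0.573 Mb/s, in line with the table. The 16460 pps row comes out slightly higher than the reported 7.896 Mb/s, which could simply be rounding in the measured rate. Either way, the flood needs very little bandwidth.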
Have you seen such a case in practice with a modern kernel?

They also cause problems unfortunately; e.g. there is no real flow control for connections

No.

-Andi --
How do syncookies prevent windows from growing? Most (if not all) distributions have them enabled and window growing works just fine. Actually I do not see any reason why the connection establishment handshake should prevent any run-time operations at all, even if it was set up during the handshake. -- Evgeniy Polyakov --
TCP only uses options negotiated during the handshake, and syncookies are incapable of doing this. -Andi --
Then you meant not a window change, but the fact that the option is ignored

What about fixing the implementation, so that it could take into account

-- Evgeniy Polyakov --
> How does syncookies prevent windows from growing? Syncookies are only triggered if the system is under a load where it would begin to lose connections otherwise. So they merely turn a DoS into a working if slightly slower setup (and > 64K windows don't matter for most normal users, especially on mobile devices). --
Hi Alan. SACK is actually a good idea for mobile devices, so having syncookies not take some options into account (btw, does it work with timestamps and PAWS?) is not a solution. -- Evgeniy Polyakov --
All TCP options negotiated during session setup are lost. In fact, some bits (3) are still reserved for the best known value of the MSS, but that's all. The principle of SYN cookies is that the server does not create any session upon the SYN, but builds a sequence number constituted from a hash and the values it absolutely needs to know when the client validates the session with an ACK.

I've seen some firewalls acting as SYN gateways which send the options from the server to the client in the first ACK packet from the server. This is normally not allowed, but it seems to work with some TCP stacks (at least for the MSS). One solution would be to extend TCP to officially support this behaviour and to optionally use it along with SYN cookies, but there will always be old clients not compatible with the extension.

Regards, Willy --
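The principle described above (no per-connection state, a sequence number built from a hash plus 3 bits of MSS) can be illustrated with a small sketch. It follows the classic textbook layout of a 5-bit time counter, a 3-bit MSS index, and a 24-bit hash, not the code in the patch. The MSS table values, the `struct flow` type, and all function names are hypothetical, and the FNV-style mix is a stand-in for the keyed cryptographic hash a real implementation needs. The client's own initial sequence number and the +1 in the returned ACK are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSS table: 8 entries, so an index fits in 3 bits. */
static const uint16_t mss_table[8] = {
    64, 256, 512, 536, 1024, 1440, 1460, 4312
};

struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Toy FNV-1a style mix standing in for a keyed cryptographic hash.
 * Not secure; illustration only. */
static uint32_t toy_hash(const struct flow *f, uint32_t count, uint32_t secret)
{
    const uint32_t words[5] = {
        f->saddr, f->daddr,
        ((uint32_t)f->sport << 16) | f->dport,
        count, secret
    };
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < 5; i++) {
        for (unsigned b = 0; b < 4; b++) {
            h ^= (words[i] >> (8u * b)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

/* Largest table entry not exceeding the client's MSS (index 0 as a floor). */
static unsigned mss_to_index(uint16_t mss)
{
    unsigned idx = 0;

    for (unsigned i = 0; i < 8; i++)
        if (mss_table[i] <= mss)
            idx = i;
    return idx;
}

/* Layout: bits 31..27 counter mod 32, bits 26..24 MSS index, bits 23..0 hash. */
static uint32_t cookie_make(const struct flow *f, uint32_t count,
                            uint32_t secret, uint16_t mss)
{
    unsigned idx = mss_to_index(mss);

    assert(idx < 8);
    return ((count & 0x1fu) << 27) | ((uint32_t)idx << 24) |
           (toy_hash(f, count, secret) & 0x00ffffffu);
}

/* Returns the recovered MSS, or 0 if the cookie is stale or does not verify. */
static uint16_t cookie_check(uint32_t cookie, const struct flow *f,
                             uint32_t now, uint32_t secret, uint32_t max_age)
{
    uint32_t age = (now - (cookie >> 27)) & 0x1fu;
    uint32_t count = now - age;

    if (age > max_age)
        return 0;
    if ((toy_hash(f, count, secret) & 0x00ffffffu) != (cookie & 0x00ffffffu))
        return 0;
    return mss_table[(cookie >> 24) & 0x7u];
}
```

Everything the server later needs is recomputed from the returning ACK, which is why nothing is stored on the SYN, and also why any option that does not fit in those few spare bits is lost.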
Syncookies only get used at the point where the alternative is failure. No SACK beats a DoS situation most days --
I realized an earlier email I sent had an incorrect timestamp and wasn't associated with the thread, so I thought it would be better to resend. I apologize if this is duplicated for anyone.

Here is a reworked patch that moves the IPv6 syncookie support out of the ipv4/syncookies.c file and into its own ipv6/syncookies.c. It uses the same CONFIG options and sysctl variables as ipv4, but this way the code is isolated to the ipv6 module.

Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
---
 include/net/tcp.h     |    6 +
 net/ipv6/Makefile     |    1 +
 net/ipv6/syncookies.c |  273 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/tcp_ipv6.c   |   77 ++++++++++----
 4 files changed, 335 insertions(+), 22 deletions(-)
 create mode 100644 net/ipv6/syncookies.c

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cb5b033..d7f620c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -436,6 +436,11 @@ extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb,
                                      __u16 *mss);
+/* From net/ipv6/syncookies.c */
+extern struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
+extern __u32 cookie_v6_init_sequence(struct sock *sk, struct sk_buff *skb,
+                                     __u16 *mss);
+
 /* tcp_output.c */
 extern void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
@@ -1337,6 +1342,7 @@ extern int tcp_proc_register(struct tcp_seq_afinfo *afinfo);
 extern void tcp_proc_unregister(struct tcp_seq_afinfo *afinfo);
 extern struct request_sock_ops tcp_request_sock_ops;
+extern struct request_sock_ops tcp6_request_sock_ops;
 extern int tcp_v4_destroy_sock(struct sock *sk);
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 87c23a7..d1a1056 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -15,6 +15,7 @@ ipv6-$(CONFIG_XFRM) += xfrm6_policy.o xfrm6_state.o xfrm6_input.o \
 ipv6-$(CONFIG_NETFILTER) += netfilter.o
...
I didn't think a module could have multiple module_inits. Are you sure that works? -Andi --
Indeed. That will fail whenever ipv6 is compiled as a module. It's been removed. It snuck in from the v4 implementation, where I'm still having trouble understanding why it's needed there. --Glenn --
s/needed/used/ ipv4 is never modular so it works. Arguably it would be cleaner if it was __initcall() -Andi --
Okay. Round 3. Took into account that it was horribly broken when ipv6 was compiled as a module. The fixes export a few more symbols, and now the syncookie_secret is shared between the v4 and v6 code. That should be fine as it will be initialized when the v4 code starts, and it's not currently possible to have v6 cookie support without v4.

At this point I have not taken Evgeniy's feedback on the hash buffer being too large to keep on the stack. I was hoping to hear some other opinions on that. Feedback is appreciated. Thanks.

Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
---
 include/net/tcp.h        |    8 ++
 net/ipv4/syncookies.c    |    7 +-
 net/ipv4/tcp_input.c     |    1 +
 net/ipv4/tcp_minisocks.c |    2 +
 net/ipv4/tcp_output.c    |    1 +
 net/ipv6/Makefile        |    1 +
 net/ipv6/syncookies.c    |  265 ++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/tcp_ipv6.c      |   77 ++++++++++----
 8 files changed, 336 insertions(+), 26 deletions(-)
 create mode 100644 net/ipv6/syncookies.c

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cb5b033..58a2dda 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -29,6 +29,7 @@
 #include <linux/skbuff.h>
 #include <linux/dmaengine.h>
 #include <linux/crypto.h>
+#include <linux/cryptohash.h>
 #include <net/inet_connection_sock.h>
 #include <net/inet_timewait_sock.h>
@@ -431,11 +432,17 @@ extern int tcp_disconnect(struct sock *sk, int flags);
 extern void tcp_unhash(struct sock *sk);
 /* From syncookies.c */
+extern __u32 syncookie_secret[2][16-3+SHA_DIGEST_WORDS];
 extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
                                     struct ip_options *opt);
 extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb,
                                      __u16 *mss);
+/* From net/ipv6/syncookies.c */
+extern struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
+extern __u32 cookie_v6_init_sequence(struct sock *sk, struct sk_buff *skb,
+                                     __u16 ...
This huge buffer should not be allocated on stack. -- Evgeniy Polyakov --
I can replace it with a kmalloc, but for my benefit, what's the practical size we try to limit the stack to? It seemed at first glance to me that 404 bytes plus the arguments, etc. was not such a large buffer for a non-recursive function. Plus the alternative with a kmalloc requires propagating the possible error status back up to tcp_ipv6.c in the event we are unable to allocate enough memory, so it can simply drop the connection. Not an impossible task by any means, but it does significantly complicate things and I would like to know it's worth the effort.

Also, would it be worth it to provide a supplemental patch for the ipv4 implementation, as it allocates the same buffer?

--Glenn --
Well, maybe for the connection establishment path it is not, but it is absolutely the case in the sending and sometimes receiving paths with 4k stacks. The main problem is that bugs caused by stack overflow are so obscure that it is virtually impossible to detect where the overflow happened. 'Debug stack overflow' somehow does not help to detect it. Usually there is about 1-1.5 kb of free stack for each process, so this change will cut one third of the free stack, taking into account that

One can reorganize syncookie support to work with request hash tables too, so that we could allocate per hash-bucket space and use it as a

-- Evgeniy Polyakov --
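To put the numbers in this exchange side by side, here is a minimal sketch of the buffer-size arithmetic. Only the 404-byte total and the 1-1.5 kb free-stack estimate come from the thread. The breakdown into 16 input words, 5 digest words and an 80-word SHA-1 workspace is an assumption about how the scratch array is dimensioned, and the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    SHA_DIGEST_WORDS    = 5,  /* assumed: 160-bit SHA-1 digest in 32-bit words */
    SHA_WORKSPACE_WORDS = 80  /* assumed: SHA-1 message schedule size */
};

/* Assumed layout of the scratch buffer: 16 words of input block,
 * the digest, and the SHA-1 workspace. */
typedef uint32_t cookie_scratch_t[16 + SHA_DIGEST_WORDS + SHA_WORKSPACE_WORDS];

/* Size of the scratch buffer in bytes. */
static size_t cookie_scratch_bytes(void)
{
    return sizeof(cookie_scratch_t);
}

/* Fraction of the remaining stack that the scratch buffer would consume. */
static double scratch_fraction(size_t free_stack_bytes)
{
    assert(free_stack_bytes > 0);
    return (double)cookie_scratch_bytes() / (double)free_stack_bytes;
}
```

Under these assumptions the buffer is 404 bytes, which is roughly 26% to 39% of a free stack of 1.5 kb down to 1 kb, consistent with the "one third" estimate above.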
Or maybe use percpu storage for that...

I am not sure if cookie_hash() is always called with preemption disabled. (If not, we have to use get_cpu_var()/put_cpu_var())

[NET] IPV4: lower stack usage in cookie_hash() function

400 bytes allocated on stack might be a little bit too much. Using a per_cpu var is more friendly.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> Or maybe use percpu storage for that... That seems like a good approach. I'll incorporate it into my v6 patch, cookie_hash is always called within NET_RX_SOFTIRQ context so I believe preemption will always be disabled by __do_softirq(). So there shouldn't be a need to use get_cpu_var/put_cpu_var, somebody correct me if I'm wrong. --Glenn --
Updated to incorporate Eric's suggestion of using a per cpu buffer rather than allocating on the stack. Just a two line change, but will resend in its entirety.

Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
---
 include/net/tcp.h        |    8 ++
 net/ipv4/syncookies.c    |    7 +-
 net/ipv4/tcp_input.c     |    1 +
 net/ipv4/tcp_minisocks.c |    2 +
 net/ipv4/tcp_output.c    |    1 +
 net/ipv6/Makefile        |    1 +
 net/ipv6/syncookies.c    |  267 ++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/tcp_ipv6.c      |   77 ++++++++++----
 8 files changed, 338 insertions(+), 26 deletions(-)
 create mode 100644 net/ipv6/syncookies.c

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de4ea3..c428ec7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -29,6 +29,7 @@
 #include <linux/skbuff.h>
 #include <linux/dmaengine.h>
 #include <linux/crypto.h>
+#include <linux/cryptohash.h>
 #include <net/inet_connection_sock.h>
 #include <net/inet_timewait_sock.h>
@@ -434,11 +435,17 @@ extern int tcp_disconnect(struct sock *sk, int flags);
 extern void tcp_unhash(struct sock *sk);
 /* From syncookies.c */
+extern __u32 syncookie_secret[2][16-3+SHA_DIGEST_WORDS];
 extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
                                     struct ip_options *opt);
 extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb,
                                      __u16 *mss);
+/* From net/ipv6/syncookies.c */
+extern struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
+extern __u32 cookie_v6_init_sequence(struct sock *sk, struct sk_buff *skb,
+                                     __u16 *mss);
+
 /* tcp_output.c */
 extern void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
@@ -1332,6 +1339,7 @@ extern int tcp_proc_register(struct tcp_seq_afinfo *afinfo);
 extern void tcp_proc_unregister(struct tcp_seq_afinfo *afinfo);
 extern struct request_sock_ops tcp_request_sock_ops;
+extern struct request_sock_ops tcp6_request_sock_ops;
...
Applied in my linux-2.6-dev tree. Thanks. --yoshfuji --
I've posted a series of patches that I believe address Andi's concerns about syncookies not supporting valuable TCP options (primarily SACK and window scaling). The premise is that if the client supports TCP timestamps, we can encode the additional TCP options in the initial timestamp we send back to the client, and they will be echoed back to us in the ACK. Anyone interested, have a look and provide any suggestions you may have. The new patches are a superset of this patch, so if they are accepted this one is obsolete.

Support arbitrary initial TCP timestamps
http://lkml.org/lkml/2008/2/15/244

Enable the use of TCP options with syncookies
http://lkml.org/lkml/2008/2/15/245

Add IPv6 Support to TCP SYN cookies
http://lkml.org/lkml/2008/2/15/246

--Glenn --
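The timestamp idea can be sketched as follows. This is an illustration of the premise only, not the code from the linked patches: the bit layout (4 bits of window scale, 1 bit for SACK), the `struct ts_opts` type, and the function names are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_OPT_BITS    5u                         /* low bits borrowed for options */
#define TS_OPT_MASK    ((1u << TS_OPT_BITS) - 1u)
#define TS_WSCALE_NONE 0xfu                       /* marker: no window scaling */
#define TS_SACK_BIT    (1u << 4)

struct ts_opts {
    bool wscale_ok;
    uint8_t wscale;   /* valid shift counts are 0..14 */
    bool sack_ok;
};

/* Build the initial timestamp: current clock with the low bits replaced by
 * the options, stepped back one unit if that would land in the future.
 * Unsigned arithmetic wraps, as TCP timestamps do. */
static uint32_t ts_encode(uint32_t now, struct ts_opts o)
{
    uint32_t bits = o.wscale_ok ? o.wscale : TS_WSCALE_NONE;
    uint32_t ts;

    assert(!o.wscale_ok || o.wscale <= 14);
    if (o.sack_ok)
        bits |= TS_SACK_BIT;
    ts = (now & ~TS_OPT_MASK) | bits;
    if (ts > now)
        ts -= 1u << TS_OPT_BITS;
    return ts;
}

/* Recover the options from the timestamp the client echoes in its ACK. */
static struct ts_opts ts_decode(uint32_t tsecr)
{
    struct ts_opts o;
    uint32_t w = tsecr & 0xfu;

    o.wscale_ok = (w != TS_WSCALE_NONE);
    o.wscale = o.wscale_ok ? (uint8_t)w : 0;
    o.sack_ok = (tsecr & TS_SACK_BIT) != 0;
    return o;
}
```

The trade-off in this layout is a coarser initial timestamp: the value sent can lag the real clock by up to 31 ticks. In exchange, the options survive the handshake without any state being kept on the server.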
Applied to my inet6-2.6.26 tree. Thanks. --yoshfuji --
Even though I work for the loyal opposition to SuSE, I'd say SuSE 10.3 is correct in having it enabled in the build. Alan --
