Hi All; We are noticing a kernel OOPS on 2.6.26 kernel when we issue the command "clear ip bgp <bgp-peer-ip>" on Quagga BGP routing software. Also we are noticing a very similar crash when we set and reset the local TCP MD5 password. I am attaching the backtrace. I wonder if this has been seen by any other person and whether it is a known issue? We are running our kernel on the mips hardware. Thanks, Ani # [23:10:35.108808] Kernel bug detected[#1]: [23:10:35.112527] Cpu 0 [23:10:35.114676] $ 0 : 0000000000000000 0000000014001fe0 0000000000000066 0000000000000004 [23:10:35.122845] $ 4 : ffffffff80516c10 0000000014001fe0 ffffffff8050c010 0000000000000004 [23:10:35.131015] $ 8 : 0000000000000000 0000000000000041 ffffffff805142e8 0000000000000001 [23:10:35.139184] $12 : ffffffff80600000 ffffffff805f0000 0000000000000064 0000000000000190 [23:10:35.147354] $16 : 0000000000000102 ffffffff803afdf0 ffffffff80539040 ffffffff80600780 [23:10:35.155526] $20 : ffffffff80540000 0000000000200200 ffffffff804c0000 000000000000000a [23:10:35.163695] $24 : a3d70a3d70a3d70b 8000000000000003 [23:10:35.171865] $28 : ffffffff8050c000 ffffffff8050fd90 9000000010030000 ffffffff801487a8 [23:10:35.180035] Hi : 0000000000000000 [23:10:35.183819] Lo : 0000000000000000 [23:10:35.187603] epc : ffffffff801487a8 run_timer_softirq+0x198/0x258 Tainted: P [23:10:35.196032] ra : ffffffff801487a8 run_timer_softirq+0x198/0x258 [23:10:35.202395] Status: 14001fe3 KX SX UX KERNEL EXL IE [23:10:35.207814] Cause : 00808024 [23:10:35.210911] PrId : 01041100 (SiByte SB1A) [23:10:35.215209] Modules linked in: xt_state ipt_REJECT iptable_filter nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 ip_tables ebtable_filter ebtables bridge llc zeug_ipmcdrv(P) irqdisp(P) zvirt(P) zeugmod(P) softdog [23:10:35.236024] Process swapper (pid: 0, threadinfo=ffffffff8050c000, task=ffffffff805142e8, tls=0000000000000000) [23:10:35.246169] Stack : ffffffff8050fd90 ...
From: Anirban Sinha <asinha@zeugmasystems.com> You will need to update your kernel, there have been many TCP MD5 bug fixes since 2.6.26 --
Hi: Sigh ... wish that were that easy! Anyway, as far as I could, I have tried to apply the upstream patches that seemed relevant to TCP MD5SUM. Am I missing some other patches? It will be great if someone can point me to any patch that I might be missing related to the TCP MD5SUM support. I applied the following patches: (a) author Adam Langley <agl@imperialviolet.org> Sat, 19 Jul 2008 07:01:42 +0000 (00:01 -0700) committer David S. Miller <davem@davemloft.net> Sat, 19 Jul 2008 07:01:42 +0000 (00:01 -0700) commit 49a72dfb8814c2d65bd9f8c9c6daf6395a1ec58d tree 38804d609f21503573bbdd8bb9af38df99275ff5 tree | snapshot parent 845525a642c1c9e1335c33a274d4273906ee58eb commit | diff tcp: Fix MD5 signatures for non-linear skbs Currently, the MD5 code assumes that the SKBs are linear and, in the case that they aren't, happily goes off and hashes off the end of the SKB and into random memory. Reported by Stephen Hemminger in [1]. Advice thanks to Stephen and Evgeniy Polyakov. Also includes a couple of missed route_caps from Stephen's patch in [2]. [1] http://marc.info/?l=linux-netdev&m=121445989106145&w=2 [2] http://marc.info/?l=linux-netdev&m=121459157816964&w=2 Signed-off-by: Adam Langley <agl@imperialviolet.org> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> (b) author YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Fri, 18 Apr 2008 03:45:16 +0000 (12:45 +0900) committer YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Wed, 11 Jun 2008 18:46:30 +0000 (03:46 +0900) commit 9501f9722922f2e80e1f9dc6682311d65c2b5690 tree ca8195e04ea63e8273801030ce26527fe5a8a7c7 tree | snapshot parent 8d26d76dd4a4c87ef037a44a42a0608ffc730199 commit | diff tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash(). As we do for other socket/timewait-socket specific parameters, let the callers pass appropriate arguments to tcp_v{4,6}_do_calc_md5_hash(). Signed-off-by: YOSHIFUJI Hideaki ...
From: Anirban Sinha <asinha@zeugmasystems.com> Contact your vendor for support :-) --
but we *are* the vendors for our distribution and even though I may not be a networking guru, I have little bit knowledge in working my way through the kernel code. I have traced down the cause of the BUG() and when I did a git pull against Linus' tree, I see the same issue in teh git tip as well. The BUG() is triggered from kernel/timer.c, line 1037 within the function __run_timers(). I am reporting these line numbers from the latest git tip. What happens is that before and after the callback, the code grabs the preempt count and catches unbalanced preempt_enable() and preempt_disable() calls from within the callback function. In this case, the callback function is inet_twdr_hangman() as can be seen from this instrumented log: [02:15:15.941981] Kernel panic - not syncing: <3>huh, entered ffffffff803fbd60 (inet_twdr_hangman+0x0/0xe0)with preempt_count 00000102, exited with 00000101? Clearly there is an extra unbalanced preempt_enable() somewhere within the callback function. When I looked at the function, I see that in net/ipv4/inet_timewait_sock.c line 215, the function calls schedule_work(). schedule_work() calls queue_work() which in turn calls put_cpu() that ultimately does a preempt_enable(). It is this unbalanced preempt_enable() that decriments the preempt_count by one as can be seen from the above trace. I suspect that workqueue related operatios are illegal from a timer callback function. In that case, the above mentioned callback function needs to be fixed. Yes, I can't explain why others are also not seeing the same bug crash. We don't have the luxury to pull in the latest and greatest kernel from the git tree everytime an update is made and try it out. So I am unable to repo the issue with the latest kernel. But if that means that this issue should be ignored, then that is fine by me. We will fix our private kenrel with an appropriate patch as we continue to investigate more. Cheers, Ani --
and btw, this is the crash (on mips) we are talking about: # [23:10:35.108808] Kernel bug detected[#1]: [23:10:35.112527] Cpu 0 [23:10:35.114676] $ 0 : 0000000000000000 0000000014001fe0 0000000000000066 0000000000000004 [23:10:35.122845] $ 4 : ffffffff80516c10 0000000014001fe0 ffffffff8050c010 0000000000000004 [23:10:35.131015] $ 8 : 0000000000000000 0000000000000041 ffffffff805142e8 0000000000000001 [23:10:35.139184] $12 : ffffffff80600000 ffffffff805f0000 0000000000000064 0000000000000190 [23:10:35.147354] $16 : 0000000000000102 ffffffff803afdf0 ffffffff80539040 ffffffff80600780 [23:10:35.155526] $20 : ffffffff80540000 0000000000200200 ffffffff804c0000 000000000000000a [23:10:35.163695] $24 : a3d70a3d70a3d70b 8000000000000003 [23:10:35.171865] $28 : ffffffff8050c000 ffffffff8050fd90 9000000010030000 ffffffff801487a8 [23:10:35.180035] Hi : 0000000000000000 [23:10:35.183819] Lo : 0000000000000000 [23:10:35.187603] epc : ffffffff801487a8 run_timer_softirq+0x198/0x258 Tainted: P [23:10:35.196032] ra : ffffffff801487a8 run_timer_softirq+0x198/0x258 [23:10:35.202395] Status: 14001fe3 KX SX UX KERNEL EXL IE [23:10:35.207814] Cause : 00808024 [23:10:35.210911] PrId : 01041100 (SiByte SB1A) [23:10:35.215209] Modules linked in: xt_state ipt_REJECT iptable_filter nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 ip_tables ebtable_filter ebtables bridge llc zeug_ipmcdrv(P) irqdisp(P) zvirt(P) zeugmod(P) softdog [23:10:35.236024] Process swapper (pid: 0, threadinfo=ffffffff8050c000, task=ffffffff805142e8, tls=0000000000000000) [23:10:35.246169] Stack : ffffffff8050fd90 ffffffff8050fd90 0000000014001fe0 ffffffff805ff3e0 [23:10:35.254166] ffffffff806003c4 0000000000000001 ffffffff8053f650 ffffffff805706d0 [23:10:35.262337] ffffffff80572020 ffffffff80142280 ffffffff806003c0 0000000000000000 [23:10:35.270507] 0000000014001fe0 000000000000c5b0 ffffffff8fefc520 ffffffff8feea52c [23:10:35.278676] ...
Hi Oleg: I have a question for you. The queue_work() routine which is called from schedule_work() does a put_cpu() which in turn does a enable_preempt(). Is this an attempt to trigger the scheduler? One of the side affects of this enable_preempt() is the crash that we see below. What is happening is that a timer callback routine, in this case inet_twdr_hangman(), tries a bunch of cleanup until a threshold is reached. If further cleanups needs to be done beyond the threshold, it queues a work function. Now when the timer callback is run in __run_timers(), the routine grabs the value of preempt_count before and after the callback function call. If the two counts do not match, it calls BUG() (line 1037 in kernel/timer.c). Is is it illegal to schedule a work function from within a timer callback? What would be a good solution? I have already posted in netdev but since workqueues and timers are general kernel infrastructure, I thought I might as well post the question in the main linux m ailing list and to you. Here's the output from my instrumented BUG() call: [02:15:15.941981] Kernel panic - not syncing: <3>huh, entered ffffffff803fbd60 (inet_twdr_hangman+0x0/0xe0)with preempt_count 00000102, exited with 00000101? I was thinking of a hacky solution, to replace schedule_work() with schedule_delayed_work() just to get around the issue. But I am sure this is just too hacky and probably not the ideal solution ... Cheers, Ani --
Hi Anirban, No. please note that queue_work() does get_cpu() + put_cpu() to protect against cpu_down() in between. Yes, but I can't see how queue_work() can be involved, it doesn't change Yes sure. I'd suppose that this unbalance comes from inet_twdr_hangman() pathes. Could you verify this? Oleg. --
I'll keep looking. Thanks for the help Oleg. Ani --
OOPS!!!! I misread your question, didn't notice "il" above... I meant: yes sure it _is legal_ to schedule a work from within a timer callback (in fact it is legal from any context). Sorry for confusion. Oleg. --
ok, then that part of the function looks fine. Ani --
Yes, I have now verified this. There is indeed an issue with one of the functions called by inet_twdr_hangman(). The call sequence is: inet_twdr_hangman() -> inet_twdr_do_twkill_work() -> inet_twsk_put() -> twsk_destructor(). In this case, the destructor callback is tcp_twsk_destructor() (installed from line 1208 in net/ipv4/tcp_ipv4.c and line 906 in net/ipv6/tcp_ipv6.c) . Without the TCP_MD5SUM compiled in, the function is a no-op. However, with the MD5SUM compiled in, it calls tcp_put_md5_sig_pool() (when keylen is non zero) which does an unbalanced put_cpu(). I did a grep across the entire tree. tcp_put_md5_sig_pool() is a matching function for tcp_get_md5_sig_pool() and in all other TCP IPV4 cases, it is called from net/ipv4/tcp_ipv4.c from functions tcp_v4_md5_hash_hdr() and --
Yes, I have now verified this. There is indeed an issue with one of the
functions called by inet_twdr_hangman(). The call sequence is:
inet_twdr_hangman() -> inet_twdr_do_twkill_work() -> inet_twsk_put() ->
twsk_destructor().
In this case, the destructor callback is tcp_twsk_destructor() (installed
from line 1208 in net/ipv4/tcp_ipv4.c and line 906 in net/ipv6/tcp_ipv6.c) .
Without the TCP_MD5SUM compiled in, the function is a no-op. However, with the MD5SUM
compiled in, it calls tcp_put_md5_sig_pool() (when keylen is non zero) which
does an unbalanced put_cpu(). I did a grep across the entire tree.
tcp_put_md5_sig_pool() is a matching function for tcp_get_md5_sig_pool() and
in all other TCP IPV4 cases, it is called from net/ipv4/tcp_ipv4.c from
functions tcp_v4_md5_hash_hdr() and tcp_v4_hash_skb() along with the matching
get()
function. So I would think that in tcp_twsk_destructor(), the call should be
replaced by tcp_free_md5_sig_pool() instead.
Signed-of-by: Anirban Sinha <asinha@zeugmasystems.com>
---
net/ipv4/tcp_minisocks.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index e48c37d..dccc01e 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -363,7 +363,7 @@ void tcp_twsk_destructor(struct sock *sk)
#ifdef CONFIG_TCP_MD5SIG
struct tcp_timewait_sock *twsk = tcp_twsk(sk);
if (twsk->tw_md5_keylen)
- tcp_put_md5sig_pool();
+ tcp_free_md5sig_pool();
#endif
}
--
From: Anirban Sinha <asinha@zeugmasystems.com>
This has been fixed in the tree for a month of so:
commit 657e9649e745b06675aa5063c84430986cdc3afa
Author: Robert Varga <nite@hq.alert.sk>
Date: Tue Sep 15 23:49:21 2009 -0700
tcp: fix CONFIG_TCP_MD5SIG + CONFIG_PREEMPT timer BUG()
I have recently came across a preemption imbalance detected by:
<4>huh, entered ffffffff80644630 with preempt_count 00000102, exited with 00000101?
<0>------------[ cut here ]------------
<2>kernel BUG at /usr/src/linux/kernel/timer.c:664!
<0>invalid opcode: 0000 [1] PREEMPT SMP
with ffffffff80644630 being inet_twdr_hangman().
This appeared after I enabled CONFIG_TCP_MD5SIG and played with it a
bit, so I looked at what might have caused it.
One thing that struck me as strange is tcp_twsk_destructor(), as it
calls tcp_put_md5sig_pool() -- which entails a put_cpu(), causing the
detected imbalance. Found on 2.6.23.9, but 2.6.31 is affected as well,
as far as I can tell.
Signed-off-by: Robert Varga <nite@hq.alert.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 045bcfd..624c3c9 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -363,7 +363,7 @@ void tcp_twsk_destructor(struct sock *sk)
#ifdef CONFIG_TCP_MD5SIG
struct tcp_timewait_sock *twsk = tcp_twsk(sk);
if (twsk->tw_md5_keylen)
- tcp_put_md5sig_pool();
+ tcp_free_md5sig_pool();
#endif
}
--
Grrrr! Time for me to do a git pull again. The kernel source tree in my work machine must be out of date by about the same time. --
