Hi,
I used netconsole to document a kernel panic related to AX25 on one of
my systems.
This occurs systematically as soon as I close an AX25 session
established through a level 3 ROSE network (FPAC) via AXUDP link (ax25ipd).
It is not triggered when connecting locally or via AX25 call through LAN.This is observed with kernel-2.6.24.2 and -3 with most recent AX25
patches applied.Bernard Pidoux
BUG: unable to handle kernel paging request at virtual address 6b6b6c2b
printing eip: c0139585 *pde = 00000000
Oops: 0002 [#1]
Modules linked in: netconsole configfs rose netrom mkiss crc16 ax25 mga
drm nfsd exportfs nfs lockd nfs_acl sunrpc af_packet ipv6 snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss
binfmt_misc loop floppy usbhid snd_ens1371 snd_rawmidi snd_seq_device
snd_ac97_codec snd_pcm uhci_hcd snd_timer snd soundcore snd_page_alloc
usbcore ne2k_pci intel_agp ac97_bus ide_cd 8390 agpgart ext3 jbd
ide_disk piix ide_corePid: 3527, comm: ax25ipd Not tainted (2.6.24.2 #2)
EIP: 0060:[<c0139585>] EFLAGS: 00010012 CPU: 0
EIP is at __lock_acquire+0x65/0x11d0
EAX: 00000012 EBX: 00000001 ECX: c7670068 EDX: 00000002
ESI: 00000000 EDI: 6b6b6b6b EBP: c6bf5bc0 ESP: c6bf5b38
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process ax25ipd (pid: 3527, ti=c6bf4000 task=c60251d0 task.ti=c6bf4000)
Stack: 0000580b 00000000 02c059e8 00000000 c04f73d8 02c059e8 00000000
c6bf5be4
c01397d8 00000001 c017ecc0 00000000 00000002 00000000 c7670068
c60251d0
0000580b 00000000 02c059d1 00000000 c04f3ce0 00000002 00000000
c75290f8
Call Trace:
[<c010513a>] show_trace_log_lvl+0x1a/0x30
[<c01051f9>] show_stack_log_lvl+0xa9/0xd0
[<c01052e6>] show_registers+0xc6/0x1b0
[<c01054b5>] die+0xe5/0x220
[<c011244f>] do_page_fault+0x15f/0x610
[<c02c2cfa>] error_code+0x6a/0x70
[<c013a773>] lock_acquire+0x83/0xa0
[<c02c2abe>] _spin_lock_irqsave+0x3e/0x70
[<c011458b>] __wake_up+...
Bernard Pidoux F6BVP <f6bvp@free.fr> :
May be replacing netif_rx() in drivers/net/hamradio/mkiss.c::ax_bump()
by netif_receive_skb() ?--
Ueimor
--
Sorry, this does not made it.
I replaced netif_rx() by netif_receive_skb(skb)
but this gives a worst situation : system is frozen.
Thank you for the suggestion.=======================
BUG: soft lockup - CPU#0 stuck for 11s! [ax25ipd:3584]Pid: 3584, comm: ax25ipd Not tainted (2.6.24.3 #7)
EIP: 0060:[<c01ed892>] EFLAGS: 00000246 CPU: 0
EIP is at delay_tsc+0x12/0x20
EAX: 7752234b EBX: 00000001 ECX: 7752234b EDX: 00000016
ESI: 087022c8 EDI: 00000000 EBP: c6bb5bc0 ESP: c6bb5bbc
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: b7dd86f8 CR3: 07960000 CR4: 000000d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c010535a>] show_trace_log_lvl+0x1a/0x30
[<c0105d82>] show_trace+0x12/0x20
[<c01024cc>] show_regs+0x1c/0x20
[<c015555b>] softlockup_tick+0x11b/0x150
[<c0126c12>] run_local_timers+0x12/0x20
[<c012708f>] update_process_times+0x2f/0x60
[<c013a355>] tick_periodic+0x25/0x80
[<c013a3c9>] tick_handle_periodic+0x19/0x80
[<c0107aca>] timer_interrupt+0x4a/0x60
[<c0155928>] handle_IRQ_event+0x28/0x60
[<c015725f>] handle_level_irq+0x7f/0xf0
[<c0106e68>] do_IRQ+0x48/0xa0
[<c0104d16>] common_interrupt+0x2e/0x34
[<c01ed839>] __delay+0x9/0x10
[<c01f15f2>] _raw_spin_lock+0xb2/0x140
[<c02cd350>] _spin_lock_bh+0x50/0x60
[<c8c59e72>] ax_xmit+0x112/0x370 [mkiss]
[<c025e947>] dev_hard_start_xmit+0x207/0x270
[<c026f454>] __qdisc_run+0x54/0x1b0
[<c0261226>] dev_queue_xmit+0x266/0x340
[<c8c27a87>] ax25_queue_xmit+0x47/0x70 [ax25]
[<c8c27b77>] ax25_transmit_buffer+0xc7/0x110 [ax25]
[<c8c27c3d>] ax25_send_iframe+0x7d/0x110 [ax25]
[<c8c27d80>] ax25_kick+0xb0/0x1b0 [ax25]
[<c8c28e49>] ax25_std_frame_in+0x69/0x890 [ax25]
[<c8c26dbb>] ax25_kiss_rcv+0x2bb/0x7c0 [ax25]
[<c025e24a>] netif_receive_skb+0x38a/0x480
[<c8c...
On Wed, Mar 19, 2008 at 09:57:18PM +0100, Bernard Pidoux wrote:
Well, the rest looks very similarly and not trivial... Probably a
socket is destructed too soon (or something...). Could you try to
reproduce this with some debugging turned off, e.g.:
CONFIG_DEBUG_LOCK_ALLOC, CONFIG_DEBUG_PAGEALLOC, (CONFIG_SLUB_DEBUG?).Regards,
Jarek P.
--
I recompiled the kernel after turning off the above options.
However there is still a kernel panic documented below.
Does it help ?Bernard Pidoux
BUG: spinlock bad magic on CPU#0, swapper/0
BUG: unable to handle kernel paging request at virtual address 6b6b6c2f
printing eip: c01e730b *pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: rose netrom mkiss crc16 ax25 mga drm netconsole
configfs nfsd exportfs nfs lockd nfs_acl sunrpc af_packet ipv6
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
snd_mixer_oss binfmt_misc loop usb_storage scsi_mod floppy usbhid
snd_ens1371 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer
snd uhci_hcd intel_agp ne2k_pci ide_cd usbcore soundcore snd_page_alloc
ac97_bus 8390 agpgart genrtc ext3 jbd ide_disk piix ide_corePid: 0, comm: swapper Not tainted (2.6.24.3 #8)
EIP: 0060:[<c01e730b>] EFLAGS: 00010002 CPU: 0
EIP is at spin_bug+0xdb/0xf0
EAX: 0000002f EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: 00000082
ESI: c07e5a38 EDI: c0333633 EBP: c0379c78 ESP: c0379c54
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c0378000 task=c0353340 task.ti=c0378000)
Stack: c0333724 c0333633 00000000 c0353509 00000000 00000010 c07e5a38
00000001
c7b8abb8 c0379cb0 c01e7409 c11448c8 c1147740 00000000 c11448c0
c0379cb8
c114775c c11448c8 c1147740 00000000 00000282 00000001 c7b8abb8
c0379cbc
Call Trace:
[<c010527a>] show_trace_log_lvl+0x1a/0x30
[<c0105339>] show_stack_log_lvl+0xa9/0xd0
[<c0105427>] show_registers+0xc7/0x1b0
[<c010561a>] die+0x10a/0x230
[<c0114855>] do_page_fault+0x2f5/0x600
[<c02c039a>] error_code+0x72/0x78
[<c01e7409>] _raw_spin_lock+0x69/0x140
[<c02c01cc>] _spin_lock_irqsave+0xc/0x20
[<c0117c9b>] __wake_up+0x1b/0x50
[<c024badc>] sock_def_write_space+0x9c/0xc0
[<c024bdcc>] sock_wfree+0x3c/0x50
[<c024dfc7>] skb_release_all+0x57/0x130
[<c024d75b>] __kfree_skb+0x...
On Thu, Mar 20, 2008 at 11:58:35PM +0100, Bernard Pidoux wrote:
...Hi,
Bernard, could you try this debugging patch?
Thanks,
Jarek P.---
diff -Nurp 2.6.24.3-/net/core/sock.c 2.6.24.3+/net/core/sock.c
--- 2.6.24.3-/net/core/sock.c 2008-01-24 23:58:37.000000000 +0100
+++ 2.6.24.3+/net/core/sock.c 2008-03-22 12:10:10.000000000 +0100
@@ -1517,14 +1517,21 @@ static void sock_def_write_space(struct
* progress. --DaveM
*/
if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
- if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
+ if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) {
+ if (!sk->sk_socket || sk->sk_socket->sk != sk) {
+ WARN_ON_ONCE(1);
+ sk->sk_socket = NULL;
+ sk->sk_sleep = NULL;
+ goto out;
+ }
wake_up_interruptible(sk->sk_sleep);
+ }/* Should agree with poll, otherwise some programs break */
if (sock_writeable(sk))
sk_wake_async(sk, 2, POLL_OUT);
}
-
+out:
read_unlock(&sk->sk_callback_lock);
}--
Hi Jarek,
Here is the result of your patch :
WARNING: at net/core/sock.c:1522 sock_def_write_space()
Pid: 3452, comm: mkiss Not tainted 2.6.24.3 #2
[<c010513a>] show_trace_log_lvl+0x1a/0x30
[<c0105b52>] show_trace+0x12/0x20
[<c01064ac>] dump_stack+0x6c/0x80
[<c024d54b>] sock_def_write_space+0x10b/0x120
[<c024d93a>] sock_wfree+0x3a/0x40
[<c024fafd>] skb_release_all+0x4d/0xf0
[<c024f2ab>] __kfree_skb+0xb/0x90
[<c024f349>] kfree_skb+0x19/0x30
[<c8cb5e37>] ax_xmit+0xf7/0x360 [mkiss]
[<c0255367>] dev_hard_start_xmit+0x207/0x270
[<c02657f4>] __qdisc_run+0x54/0x1a0
[<c0257bac>] dev_queue_xmit+0x25c/0x330
[<c8cc8a77>] ax25_queue_xmit+0x47/0x70 [ax25]
[<c8cc8b65>] ax25_transmit_buffer+0xc5/0x110 [ax25]
[<c8cc8c2d>] ax25_send_iframe+0x7d/0x110 [ax25]
[<c8cc8d70>] ax25_kick+0xb0/0x1b0 [ax25]
[<c8cc9e29>] ax25_std_frame_in+0x69/0x890 [ax25]
[<c8cc7dba>] ax25_kiss_rcv+0x2ba/0x7b0 [ax25]
[<c0254cc4>] netif_receive_skb+0x244/0x2e0
[<c0257835>] process_backlog+0x75/0xe0
[<c025734b>] net_rx_action+0x12b/0x1d0
[<c011d595>] __do_softirq+0x55/0xc0
[<c011d647>] do_softirq+0x47/0x50
[<c011d9f5>] local_bh_enable_ip+0x95/0xe0
[<c02babe5>] _spin_unlock_bh+0x25/0x30
[<c8cb56b0>] mkiss_receive_buf+0x360/0x420 [mkiss]
[<c020dfe7>] pty_write+0x47/0x60
[<c020b5b2>] write_chan+0x222/0x330
[<c0208c76>] tty_write+0x126/0x1b0
[<c01728df>] vfs_write+0xaf/0x120
[<c0172e2d>] sys_write+0x3d/0x70
[<c010412a>] sysenter_past_esp+0x5f/0xa5
=======================Happy Easter,
Bernard Pidoux
--
On Sun, Mar 23, 2008 at 06:00:17PM +0100, Bernard Pidoux wrote:
...Thanks. Here is one more (to apply after the previous one).
Happy Holidays...
Jarek P.---
diff -Nurp 2.6.24.3-with1/include/net/sock.h 2.6.24.3-with2/include/net/sock.h
--- 2.6.24.3-with1/include/net/sock.h 2008-01-24 23:58:37.000000000 +0100
+++ 2.6.24.3-with2/include/net/sock.h 2008-03-24 21:14:43.000000000 +0100
@@ -263,6 +263,7 @@ struct sock {
int (*sk_backlog_rcv)(struct sock *sk,
struct sk_buff *skb);
void (*sk_destruct)(struct sock *sk);
+ int sk_ax25_debug;
};/*
diff -Nurp 2.6.24.3-with1/net/ax25/af_ax25.c 2.6.24.3-with2/net/ax25/af_ax25.c
--- 2.6.24.3-with1/net/ax25/af_ax25.c 2008-01-24 23:58:37.000000000 +0100
+++ 2.6.24.3-with2/net/ax25/af_ax25.c 2008-03-23 00:53:51.000000000 +0100
@@ -936,6 +936,7 @@ struct sock *ax25_make_new(struct sock *
}sk->sk_protinfo = ax25;
+ sk->sk_ax25_debug = 1;
ax25->sk = sk;return sk;
diff -Nurp 2.6.24.3-with1/net/core/sock.c 2.6.24.3-with2/net/core/sock.c
--- 2.6.24.3-with1/net/core/sock.c 2008-03-22 12:10:10.000000000 +0100
+++ 2.6.24.3-with2/net/core/sock.c 2008-03-24 21:43:46.000000000 +0100
@@ -1519,6 +1519,9 @@ static void sock_def_write_space(struct
if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) {
if (!sk->sk_socket || sk->sk_socket->sk != sk) {
+ if (net_ratelimit())
+ printk(KERN_INFO "sk_ax25_debug = %d, %p\n",
+ sk->sk_ax25_debug, sk->sk_socket);
WARN_ON_ONCE(1);
sk->sk_socket = NULL;
sk->sk_sleep = NULL;
--
After applying this patch I did not see any sk_ax25_debug = 1 into
/var/log/messages. It was always = 0.Mar 25 12:03:07 f6bvp-9 kernel: sk_ax25_debug = 0, c08de2e0
Mar 25 12:11:55 f6bvp-9 kernel: sk_ax25_debug = 0, c08de2e0
Mar 25 12:13:07 f6bvp-9 kernel: sk_ax25_debug = 0, c08de2e0
Mar 25 12:16:39 f6bvp-9 kernel: sk_ax25_debug = 0, c08de040
Mar 25 12:18:57 f6bvp-9 kernel: sk_ax25_debug = 0, c08de040
Mar 25 12:23:12 f6bvp-9 kernel: sk_ax25_debug = 0, c08de580
Mar 25 12:26:50 f6bvp-9 kernel: sk_ax25_debug = 0, c08de580
Mar 25 12:32:34 f6bvp-9 kernel: sk_ax25_debug = 0, c08de040
Mar 25 12:33:14 f6bvp-9 kernel: sk_ax25_debug = 0, c08de2e0
Mar 25 12:33:57 f6bvp-9 kernel: sk_ax25_debug = 0, c08de040
Mar 25 12:36:56 f6bvp-9 kernel: sk_ax25_debug = 0, c08de580
Mar 25 12:43:16 f6bvp-9 kernel: sk_ax25_debug = 0, c08de580
Mar 25 12:48:34 f6bvp-9 kernel: sk_ax25_debug = 0, c08de580
Mar 25 12:48:57 f6bvp-9 kernel: sk_ax25_debug = 0, c08de040
Mar 25 12:53:10 f6bvp-9 kernel: sk_ax25_debug = 0, c08de580
Mar 25 13:00:31 f6bvp-9 kernel: sk_ax25_debug = 0, c08de2e0
Mar 25 13:03:24 f6bvp-9 kernel: sk_ax25_debug = 0, c08de2e0
Mar 25 13:03:56 f6bvp-9 kernel: sk_ax25_debug = 0, c08de040
Mar 25 13:10:52 f6bvp-9 kernel: sk_ax25_debug = 0, c08de2e0
Mar 25 13:13:22 f6bvp-9 kernel: sk_ax25_debug = 0, c08de040Here is an extract of /var/log/kernel/warnings.log
Mar 25 11:54:04 f6bvp-9 kernel: NET: Registered protocol family 3
Mar 25 11:54:04 f6bvp-9 kernel: mkiss: AX.25 Multikiss, Hans Albas PE1AYX
Mar 25 11:54:04 f6bvp-9 mkiss[3443]: starting
Mar 25 11:54:04 f6bvp-9 kernel: mkiss: ax0: crc mode is auto.
Mar 25 11:54:04 f6bvp-9 kernel: ADDRCONF(NETDEV_CHANGE): ax0: link
becomes ready
Mar 25 11:54:04 f6bvp-9 kernel: mkiss: ax1: crc mode is auto.
Mar 25 11:54:04 f6bvp-9 kernel: ADDRCONF(NETDEV_CHANGE): ax1: link
becomes ready
Mar 25 11:54:05 f6bvp-9 mkiss[3455]: starting
Mar 25 11:54:05 f6bvp-9 kernel: mkiss: ax2: crc mode is auto.
Mar...
On Tue, Mar 25, 2008 at 02:23:46PM +0100, Bernard Pidoux wrote:
...OK, here is the next try. (Apply after all previous patches to any 2.6.24.x;
there would be more noise, so revert it after catching some data.)Thanks,
Jarek P.(debugging patch #3)
---diff -Nurp 2.6.24.3-with2/include/net/sock.h 2.6.24.3-with3/include/net/sock.h
--- 2.6.24.3-with2/include/net/sock.h 2008-03-24 21:14:43.000000000 +0100
+++ 2.6.24.3-with3/include/net/sock.h 2008-03-26 19:10:33.000000000 +0100
@@ -1414,4 +1414,20 @@ extern int sysctl_optmem_max;
extern __u32 sysctl_wmem_default;
extern __u32 sysctl_rmem_default;+static inline int sock_debug_ax25(struct sk_buff *skb, struct sock *ax25_sk)
+{
+ struct sock *sk = skb->sk;
+
+ if (sk && sk->sk_socket && sk->sk_socket->sk != sk) {
+ if (net_ratelimit())
+ printk(KERN_INFO "AX25: %s, %d, %p, %p, %p, %p, %p\n",
+ __FUNCTION__, __LINE__, sk, sk->sk_socket,
+ sk->sk_socket->sk, ax25_sk,
+ ax25_sk == NULL ? NULL : ax25_sk->sk_socket);
+ skb->destructor = NULL;
+ return 1;
+ }
+ return 0;
+}
+
#endif /* _SOCK_H */
diff -Nurp 2.6.24.3-with2/net/ax25/af_ax25.c 2.6.24.3-with3/net/ax25/af_ax25.c
--- 2.6.24.3-with2/net/ax25/af_ax25.c 2008-03-23 00:53:51.000000000 +0100
+++ 2.6.24.3-with3/net/ax25/af_ax25.c 2008-03-26 18:52:54.000000000 +0100
@@ -952,6 +952,8 @@ static int ax25_release(struct socket *ssock_hold(sk);
sock_orphan(sk);
+ if (net_ratelimit())
+ printk(KERN_INFO "AX25- %s, %d, %p, %p\n", __FUNCTION__, __LINE__, sk, sock);
lock_sock(sk);
ax25 = ax25_sk(sk);@@ -1372,12 +1374,16 @@ static int ax25_accept(struct socket *so
newsk = skb->sk;
newsk->sk_socket = newsock;
newsk->sk_sleep = &newsock->wait;
+ newsk->sk_ax25_debug = 2;/* Now attach up the new socket */
kfree_skb(skb);
sk->sk_ack_backlog--;
newsock->sk = newsk;
newsock->state = SS_CONNECTED;
+ if (net_ratelimit())
+ printk(KERN_INFO "AX25=...
On Wed, Mar 26, 2008 at 07:35:38PM +0100, Jarek Poplawski wrote:
Bernard, I fucked it up with this #3: (changed macro to function at
the last minute). I'm currently reading this, and maybe I'll figure
out something, but if it's not a big problem for you to repeat this
after adding this #4 (after all previous) it should be 'a bit' easier.
(But no need to hurry.)Very sorry,
Jarek P.(debugging patch #4)
---diff -Nurp 2.6.24.3-with3/include/net/sock.h 2.6.24.3-with4/include/net/sock.h
--- 2.6.24.3-with3/include/net/sock.h 2008-03-26 19:10:33.000000000 +0100
+++ 2.6.24.3-with4/include/net/sock.h 2008-03-28 12:42:19.000000000 +0100
@@ -1414,20 +1414,18 @@ extern int sysctl_optmem_max;
extern __u32 sysctl_wmem_default;
extern __u32 sysctl_rmem_default;-static inline int sock_debug_ax25(struct sk_buff *skb, struct sock *ax25_sk)
-{
- struct sock *sk = skb->sk;
-
- if (sk && sk->sk_socket && sk->sk_socket->sk != sk) {
- if (net_ratelimit())
- printk(KERN_INFO "AX25: %s, %d, %p, %p, %p, %p, %p\n",
- __FUNCTION__, __LINE__, sk, sk->sk_socket,
- sk->sk_socket->sk, ax25_sk,
- ax25_sk == NULL ? NULL : ax25_sk->sk_socket);
- skb->destructor = NULL;
- return 1;
- }
- return 0;
-}
+#define sock_debug_ax25(__skb, __ax25_sk) \
+({ \
+ struct sock *__sk = __skb->sk; \
+ \
+ if (__sk && __sk->sk_socket && __sk->sk_socket->sk != __sk) { \
+ if (net_ratelimit()) \
+ printk(KERN_INFO "AX25: %s, %d, %p, %p, %p, %p, %p\n", \
+ __FUNCTION__, __LINE__, __sk, __sk->sk_socket, \
+ __sk->sk_socket->sk, __ax25_sk, \
+ __ax25_sk == NULL ? NULL : __ax25_sk->sk_socket); \
+ __skb->destructor = NULL; \
+ } \
+})#endif /* _SOCK_H */
diff -Nurp 2.6.24.3-with3/net/ax25/af_ax25.c 2.6.24.3-with4/net/ax25/af_ax25.c
--- 2.6.24.3-with3/net/ax25/af_ax25.c 2008-03-26 18:52:54.000000000 +0100
+++ 2.6.24.3-with4/net/ax25/af_ax25.c 200...
Please find below an extract of /var/log/messages
The time of interest is around 12:03Mar 29 11:52:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f0a820
Mar 29 11:52:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f0a820
Mar 29 11:52:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f0a820
Mar 29 11:52:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f0a820
Mar 29 11:52:59 f6bvp-9 kernel: AX25: ax25_frames_acked, 59, c6ae2bc8,
c0f0a040, 6b6b6b6b, 00000000, 00000000
Mar 29 11:53:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f0a580
Mar 29 11:53:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f0a580
Mar 29 11:53:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f0a580
Mar 29 11:53:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f0a580
Mar 29 11:54:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f0a580
Mar 29 11:54:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f0a580
Mar 29 11:54:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f0a580
Mar 29 11:54:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f0a580
Mar 29 11:55:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f60820
Mar 29 11:55:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f60820
Mar 29 11:55:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f60820
Mar 29 11:55:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f60820
Mar 29 11:56:52 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f60820
Mar 29 11:56:52 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f60820
Mar 29 11:56:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f60820
Mar 29 11:56:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f60820
Mar 29 11:56:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f60820
Mar 29 11:56:53 f6bvp-9 kernel: AX25- ax25_release, 959, c15602e0,
c0f60820
Mar 29 11:56:53 f6bvp-9 kernel: AX25+ ax25_create, 866, c15602e0, c0f60820
Mar 29 11...
On Sat, Mar 29, 2008 at 01:04:16PM +0100, Bernard Pidoux wrote:
...Thanks Bernard! ...On the other hand, if you could hold these patches
a while yet and maybe find more of these "AX25:" it should be very
helpful (of course with other AX25* lines).Regards,
Jarek P.
--
Jarek,
I enlarged grep filter in order to grab AX.25 data that might be
interesting too.And this is the result :
# cat /var/log/kernel/info.log | grep AX...
Mar 30 10:19:00 f6bvp-9 kernel: AX25+ ax25_create, 866, c14e6e28, c74322e0
Mar 30 10:19:00 f6bvp-9 kernel: AX25- ax25_release, 959, c14e6e28, c74322e0
Mar 30 10:19:13 f6bvp-9 kernel: AX25: ax25_frames_acked, 59, c2aa4130,
c74322e0, 6b6b6b6b, 00000000, 00000000
Mar 30 10:19:59 f6bvp-9 kernel: AX25+ ax25_create, 866, c14e7a70, c74322e0
Mar 30 10:19:59 f6bvp-9 kernel: AX25- ax25_release, 959, c14e7a70, c74322e0
Mar 30 10:20:00 f6bvp-9 kernel: AX25+ ax25_create, 866, c14e7a70, c74322e0
Mar 30 10:20:00 f6bvp-9 kernel: AX25- ax25_release, 959, c14e7a70, c74322e0
Mar 30 10:20:59 f6bvp-9 kernel: AX25+ ax25_create, 866, c14e6e28, c74322e0
Mar 30 10:20:59 f6bvp-9 kernel: AX25- ax25_release, 959, c14e6e28, c74322e0
Mar 30 10:21:00 f6bvp-9 kernel: AX25+ ax25_create, 866, c14e6e28, c74322e0
Mar 30 10:21:00 f6bvp-9 kernel: AX25- ax25_release, 959, c14e6e28, c74322e0
Mar 30 10:21:59 f6bvp-9 kernel: AX25+ ax25_create, 866, c14e6e28, c74322e0
Mar 30 10:21:59 f6bvp-9 kernel: AX25- ax25_release, 959, c14e6e28, c74322e0
Mar 30 10:22:00 f6bvp-9 kernel: AX25+ ax25_create, 866, c14e6e28, c74322e0
Mar 30 10:22:00 f6bvp-9 kernel: AX25- ax25_release, 959, c14e6e28, c74322e0
Mar 30 10:22:37 f6bvp-9 kernel: AX25; ax25_make_new, 943, c14e7a70, c51ad820
Mar 30 10:22:37 f6bvp-9 kernel: AX25= ax25_accept, 1386, c14e7a70,
c51ad820, c74322e0
Mar 30 10:22:38 f6bvp-9 kernel: AX.25: sendto: Addresses built.
Mar 30 10:22:38 f6bvp-9 kernel: AX.25: sendto: building packet.
Mar 30 10:22:38 f6bvp-9 kernel: AX.25: Appending user data
Mar 30 10:22:38 f6bvp-9 kernel: AX.25: Transmitting buffer
Mar 30 10:22:39 f6bvp-9 kernel: AX.25: sendto: Addresses built.
Mar 30 10:22:39 f6bvp-9 kernel: AX.25: sendto: building packet.
Mar 30 10:22:39 f6bvp-9 kernel: AX.25: Appending user data
Mar 30 10:22:39 f6bvp-9 kernel: AX.25: Transmitting buffer
Mar 30 10:22:39 f6bvp-9 ker...
Hi Jarek,
please find attached another collection of data from
/var/log/kernel/info.log that I bziped for it is rather large.Bernard
Thanks again. It looks like ROSE sockets aren't released properly.
Here is a try to fix this. Could you test it with all previous patches
present?Regards,
Jarek P.(patch #5)
---diff -Nurp 2.6.24.4-/net/rose/af_rose.c 2.6.24.4+/net/rose/af_rose.c
--- 2.6.24.4-/net/rose/af_rose.c 2008-01-24 23:58:37.000000000 +0100
+++ 2.6.24.4+/net/rose/af_rose.c 2008-03-30 10:53:19.000000000 +0200
@@ -599,17 +599,24 @@ static int rose_release(struct socket *sif (sk == NULL) return 0;
+ sock_hold(sk);
+ sock_orphan(sk);
+ lock_sock(sk);
rose = rose_sk(sk);switch (rose->state) {
case ROSE_STATE_0:
+ release_sock(sk);
rose_disconnect(sk, 0, -1, -1);
+ lock_sock(sk);
rose_destroy_socket(sk);
break;case ROSE_STATE_2:
rose->neighbour->use--;
+ release_sock(sk);
rose_disconnect(sk, 0, -1, -1);
+ lock_sock(sk);
rose_destroy_socket(sk);
break;@@ -634,6 +641,8 @@ static int rose_release(struct socket *s
}sock->sk = NULL;
+ release_sock(sk);
+ sock_put(sk);return 0;
}
--
Jarek,
It looks like you hit the right target !
There is no more warnings and no more ax25_frames_acked or
sk_ax25_debug.We only get ax25_create and ax25_release data, plus ax25_make_new when
there is an AX25 connexion.cat /var/log/messages | grep 'Mar 30 17' | grep AX...
Mar 30 17:00:58 f6bvp-9 kernel: AX25+ ax25_create, 866, c73a6738, c3300040
Mar 30 17:00:58 f6bvp-9 kernel: AX25- ax25_release, 959, c73a6738, c3300040
Mar 30 17:00:58 f6bvp-9 kernel: AX25+ ax25_create, 866, c73a6738, c3300040
Mar 30 17:00:58 f6bvp-9 kernel: AX25- ax25_release, 959, c73a6738, c3300040
Mar 30 17:01:23 f6bvp-9 kernel: AX25; ax25_make_new, 943, c73a6738, c762a820
Mar 30 17:01:23 f6bvp-9 kernel: AX25= ax25_accept, 1386, c73a6738,
c762a820, c3300040
Mar 30 17:01:58 f6bvp-9 kernel: AX25+ ax25_create, 866, c73a6f68, c7673ac0
Mar 30 17:01:58 f6bvp-9 kernel: AX25- ax25_release, 959, c73a6f68, c7673ac0
Mar 30 17:01:58 f6bvp-9 kernel: AX25+ ax25_create, 866, c73a6f68, c7673ac0Please excuse the following questions if you find they are too naive.
Could you tell us how did you suspect ROSE socket from the data
collected ? and why improperly released ROSE sockets could interfere
with AX25 ?Regards,
Bernard P.
On Sun, Mar 30, 2008 at 07:38:51PM +0200, Bernard Pidoux wrote:
No problem! Patch #3 added to "sk_ax25_debug =" sk->sk_family as the
2nd number, and 11 according to include/linux/socket.h means AF_ROSE.sock_def_write() function in net/core/sock.c where the OOPs triggered
runs some cleaning on sockets when their skbs are kfreed, but this
shouldn't be done after socket had been closed (sock_orphan() removes
some pointers). When ROSE sends it's skbs with ax25_send_frame(), they
are added to AX25 queues, and kfreed after transmitting. If it's after
ROSE socket had been released, but not properly updated, the cleaning
is done on kfreed memory not used by the socket anymore.Cheers,
Jarek P.
--
Hi Jarek,
Here is the result of cat /var/log/messages | grep -B1 -A50 core/sock
No event was detected before 16:33.--
Mar 27 16:33:05 f6bvp-9 kernel: sk_ax25_debug = 0, 11, 0, 5, c08922e0,
6b6b6b6b, c7790190
Mar 27 16:33:05 f6bvp-9 kernel: WARNING: at net/core/sock.c:1561
sock_def_write_space()
Mar 27 16:33:05 f6bvp-9 kernel: Pid: 3454, comm: mkiss Not tainted
2.6.24.4 #4
Mar 27 16:33:05 f6bvp-9 kernel: [<c010513a>] show_trace_log_lvl+0x1a/0x30
Mar 27 16:33:05 f6bvp-9 kernel: [<c0105b52>] show_trace+0x12/0x20
Mar 27 16:33:05 f6bvp-9 kernel: [<c01064ac>] dump_stack+0x6c/0x80
Mar 27 16:33:05 f6bvp-9 kernel: [<c024d68b>]
sock_def_write_space+0x17b/0x190
Mar 27 16:33:05 f6bvp-9 kernel: [<c024da7a>] sock_wfree+0x3a/0x40
Mar 27 16:33:05 f6bvp-9 kernel: [<c024fc3d>] skb_release_all+0x4d/0xf0
Mar 27 16:33:05 f6bvp-9 kernel: [<c024f3eb>] __kfree_skb+0xb/0x90
Mar 27 16:33:05 f6bvp-9 kernel: [<c024f489>] kfree_skb+0x19/0x30
Mar 27 16:33:05 f6bvp-9 kernel: [<c8cbfe37>] ax_xmit+0xf7/0x360 [mkiss]
Mar 27 16:33:05 f6bvp-9 kernel: [<c02554a7>]
dev_hard_start_xmit+0x207/0x270
Mar 27 16:33:05 f6bvp-9 kernel: [<c0265934>] __qdisc_run+0x54/0x1a0
Mar 27 16:33:05 f6bvp-9 kernel: [<c0257cec>] dev_queue_xmit+0x25c/0x330
Mar 27 16:33:05 f6bvp-9 kernel: [<c8befa77>] ax25_queue_xmit+0x47/0x70
[ax25]
Mar 27 16:33:05 f6bvp-9 kernel: [<c8befb65>]
ax25_transmit_buffer+0xc5/0x110 [ax25]
Mar 27 16:33:05 f6bvp-9 kernel: [<c8befc2d>]
ax25_send_iframe+0x7d/0x110 [ax25]
Mar 27 16:33:06 f6bvp-9 kernel: [<c8befd70>] ax25_kick+0xb0/0x2b0 [ax25]
Mar 27 16:33:06 f6bvp-9 kernel: [<c8bf10c9>]
ax25_std_frame_in+0x69/0x890 [ax25]
Mar 27 16:33:06 f6bvp-9 kernel: [<c8beedba>] ax25_kiss_rcv+0x2ba/0x7b0
[ax25]
Mar 27 16:33:06 f6bvp-9 kernel: [<c0254e04>] netif_receive_skb+0x244/0x2e0
Mar 27 16:33:06 f6bvp-9 kernel: [<c0257975>] process_backlog+0x75/0xe0
Mar 27 16:33:06 f6bvp-9 ke...
Hi Jarek,
Thank you for the new debugging patch.
I think I have collected enough data for you to make some interesting
deductions.Here are the data collected and filtered with
cat /var/log/kernel/info.log | grep AX25
I have noticed that there is no AX25= result, although the system
ran a few hours.Mar 27 15:50:49 f6bvp-9 kernel: AX25- ax25_release, 952, c340fad0, c513a820
Mar 27 15:50:54 f6bvp-9 kernel: AX25- ax25_release, 952, c0f07b10, c0cc0820
Mar 27 15:50:55 f6bvp-9 kernel: AX25- ax25_release, 952, c34676d8, c0cc0d60
Mar 27 15:50:55 f6bvp-9 kernel: AX25- ax25_release, 952, c340e240, c0cc0820
Mar 27 15:50:55 f6bvp-9 kernel: AX25- ax25_release, 952, c0f07b10, c0cc02e0
Mar 27 15:50:55 f6bvp-9 kernel: AX25- ax25_release, 952, c0f07b10, c0cc0d60
Mar 27 15:50:55 f6bvp-9 kernel: AX25- ax25_release, 952, c0f07b10, c0cc02e0
Mar 27 15:51:17 f6bvp-9 kernel: AX25- ax25_release, 952, c0fc4ad0, c148aac0
Mar 27 15:51:22 f6bvp-9 kernel: AX25- ax25_release, 952, c0d482c0, c148a2e0
Mar 27 15:51:22 f6bvp-9 kernel: AX25- ax25_release, 952, c0805758, c148a580
Mar 27 15:51:22 f6bvp-9 kernel: AX25- ax25_release, 952, c0804f28, c148a820
Mar 27 15:51:22 f6bvp-9 kernel: AX25- ax25_release, 952, c08046f8, c0cfdd60
Mar 27 15:51:27 f6bvp-9 kernel: AX25- ax25_release, 952, c0804f28, c148a820
Mar 27 15:51:27 f6bvp-9 kernel: AX25- ax25_release, 952, c0d48af0, c148a820
Mar 27 15:51:27 f6bvp-9 kernel: AX25- ax25_release, 952, c0fc4ad0, c148aac0
Mar 27 15:51:27 f6bvp-9 kernel: AX25- ax25_release, 952, c0804f28, c148a820
Mar 27 15:51:27 f6bvp-9 kernel: AX25- ax25_release, 952, c0804f28, c148a040
Mar 27 15:51:37 f6bvp-9 kernel: AX25- ax25_release, 952, c0fc4ad0, c148aac0
Mar 27 15:51:47 f6bvp-9 kernel: AX25- ax25_release, 952, c0fc4ad0, c148aac0
Mar 27 15:51:47 f6bvp-9 kernel: AX25- ax25_release, 952, c0d486d8, c0cfdd60
Mar 27 15:52:27 f6bvp-9 kernel: AX25- ax25_release, 952, c0d486d8, c0cfdd60
Mar 27 15:52:27 f6bvp-9 kernel: AX25- ax25_release, 952, c0d486d8, c0cfdd60
Mar 27 15:53:27 f6bvp-9 kernel: AX...
On Tue, Mar 25, 2008 at 02:23:46PM +0100, Bernard Pidoux wrote:
I need some time yet; it looks like 2.6.24.4 doesn't collide with
these patches, but upgrade could be reasonable for other reasons.Best regards,
Jarek P.
--
Hi Jarek,
Please ignore the spinlock report I sent in this previous message.
I simply forgot to apply back your mkiss ax_bump() patch when doing some
trials.After applying again your patch and turning off the debug you asked for,
I observe exactly the same spinlock I first reported i.e. involving ax25ipd.Sorry for the confusion.
Regards,
Bernard
--
On Thu, Mar 20, 2008 at 11:58:35PM +0100, Bernard Pidoux wrote:
I wanted to see something more on the stack than lockdep's code only.
Will see... It seems some memory debugging is still on and doing this
poisoning with '6b', but I can't check now what it is. (Sometimes this
could omit some warnings).Thanks,
Jarek P.
--
The only difference I can see compared to the original spinlock report
is this line :[<c01e7409>] _raw_spin_lock+0x69/0x140
Regards,
Bernard P.
BUG: spinlock bad magic on CPU#0, ax25ipd/3648
BUG: unable to handle kernel paging request at virtual address 6b6b6c2f
printing eip: c01e730b *pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: rose netrom mkiss crc16 ax25 mga drm netconsole
configfs nfsd exportfs nfs lockd nfs_acl sunrpc af_packet ipv6
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
snd_mixer_oss binfmt_misc loop usb_storage scsi_mod floppy usbhid
snd_ens1371 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer
snd uhci_hcd soundcore snd_page_alloc usbcore ne2k_pci ide_cd 8390
ac97_bus intel_agp agpgart genrtc ext3 jbd ide_disk piix ide_corePid: 3648, comm: ax25ipd Not tainted (2.6.24.3 #8)
EIP: 0060:[<c01e730b>] EFLAGS: 00010002 CPU: 0
EIP is at spin_bug+0xdb/0xf0
EAX: 00000032 EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: 00000096
ESI: c0e4bc30 EDI: c0333633 EBP: c5c7bbc4 ESP: c5c7bba0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process ax25ipd (pid: 3648, ti=c5c7a000 task=c6abec90 task.ti=c5c7a000)
Stack: c0333724 c0333633 00000000 c6abee59 00000e40 00000007 c0e4bc30
00000001
c697ebf8 c5c7bbfc c01e7409 00000000 00000001 c0180640 00000000
00000000
00000003 c0a320b8 00000000 c6abec90 00000282 00000001 c697ebf8
c5c7bc08
Call Trace:
[<c010527a>] show_trace_log_lvl+0x1a/0x30
[<c0105339>] show_stack_log_lvl+0xa9/0xd0
[<c0105427>] show_registers+0xc7/0x1b0
[<c010561a>] die+0x10a/0x230
[<c0114855>] do_page_fault+0x2f5/0x600
[<c02c039a>] error_code+0x72/0x78
[<c01e7409>] _raw_spin_lock+0x69/0x140
[<c02c01cc>] _spin_lock_irqsave+0xc/0x20
[<c0117c9b>] __wake_up+0x1b/0x50
[<c024badc>] sock_def_write_space+0x9c/0xc0
[<c024bdcc>] sock_wfree+0x3c/0x50
[<c024dfc7>] skb_release_all+0x57/0x130
[<c024d75b&...
On Fri, Mar 21, 2008 at 10:52:27AM +0100, Bernard Pidoux wrote:
Bernard, thanks for these reports. I simply need more time to find
various places in ax25 which could matter here. Of course any
additional information could help, but these reports mostly repeat
the same. BTW, I wonder if it's possible to run this command for
closing the session which causes the bug with strace?:
strace -o strace.log a_commandRegards,
Jarek P.
--
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Greg Kroah-Hartman | [PATCH 007/196] Chinese: add translation of stable_kernel_rules.txt |
| david | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Jan Engelhardt | intel iommu (Re: -mm merge plans for 2.6.23) |
git: | |
| Alexey Dobriyan | Re: [GIT]: Networking |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | Re: [BUG] New Kernel Bugs |
