Hello Alan and Marcel, this morning i got a NULL pointer dereference in tty_register_device() that looked similar to the one, i posted here on 03-jun-2009: http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2 As the boot sequence stopped after this point, i was only able to take a picture of it - see attached jpg. I was able to shut down the system gracefully by pressing the power button. The problem emerges really seldom. Any idea? Regards, Oliver
On Sat, 25 Jul 2009 12:01:43 +0200 tty_register_device appears to have been called with a NULL pointer. Not sure why however. --
if that is the pointer for the struct device, then that used to be fine in the past. Not all RFCOMM device have a parent when they are created. Regards Marcel --
The tty layer doesn't care about the struct device really. Nothing there has changed. The NULL passed appears to be the driver argument. --
Agree with you, because in rfcomm_init, rfcomm thread run before tty initilized, the following patch may fix the problem. oliver, could you verify it it fix your problem? --- rfcomm tty may be used before rfcomm_tty_driver initilized, reporting in: http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2 make 3 changes: 1. remove #ifdef in rfcomm/core.c, make it blank function when rfcomm tty not selected in rfcomm.h 2. tune the rfcomm_init error patch to ensure tty driver initilized before any usage. 3. remove __exit for rfcomm_cleanup_sockets because above change need call it in a __init function. CC: Alan Cox <alan@lxorguk.ukuu.org.uk> Reported-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Dave Young <hidave.darkstar@gmail.com> -- include/net/bluetooth/rfcomm.h | 13 ++++++++++++- net/bluetooth/rfcomm/core.c | 31 +++++++++++++++++++++---------- net/bluetooth/rfcomm/sock.c | 2 +- 3 files changed, 34 insertions(+), 12 deletions(-) --- linux-2.6.orig/include/net/bluetooth/rfcomm.h 2009-04-09 16:23:03.000000000 +0800 +++ linux-2.6/include/net/bluetooth/rfcomm.h 2009-07-27 17:14:43.000000000 +0800 @@ -355,7 +355,18 @@ struct rfcomm_dev_list_req { }; int rfcomm_dev_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); + +#ifdef CONFIG_BT_RFCOMM_TTY int rfcomm_init_ttys(void); void rfcomm_cleanup_ttys(void); - +#else +static inline int rfcomm_init_ttys(void) +{ + return 0; +} +static inline int rfcomm_cleanup_ttys(void) +{ + return 0; +} +#endif #endif /* __RFCOMM_H */ --- linux-2.6.orig/net/bluetooth/rfcomm/core.c 2009-06-16 17:39:32.000000000 +0800 +++ linux-2.6/net/bluetooth/rfcomm/core.c 2009-07-27 17:24:27.000000000 +0800 @@ -2080,28 +2080,41 @@ static CLASS_ATTR(rfcomm_dlc, S_IRUGO, r /* ---- Initialization ---- */ static int __init rfcomm_init(void) { + int ret; + l2cap_load(); + ret = rfcomm_init_sockets(); + if (ret) + goto out_sock; + + ret = rfcomm_init_ttys(); + if (ret) + goto ...
Hi Dave, i get this problem really seldom on my Laptop and i did not manage to get a reproducible Oops of that problem. Anyway the code you are pointing to seems to have a problem and your added error handling looks good to me - even if i don't know if the initializations can be reordered in that way. I'll try your patch, but it could take a *long* time to prove it right ;-) Maybe Marcel and Alan can better prove it to be correct by code review. Many thanks, --
Just FYI: Your patch compiled, the system booted without problems and nothing is broken so far. I checked the BT, WLAN and BT dial-up with success. So it looks good to me. Regards, Oliver --
Hi Dave, i got it again - even with your patch (that's why it's 2.6.31-rc4-dirty in the attached screenshot). Sorry ;-) Best regards, Oliver
Weird, the oops occurs between sock init and tty init routines. Could you tell your bluez version and your configuration? -- Regards dave --
No problem: Hardware here is a Dell 830. And the Oops is still rare ;-) Distro is a Debian $> dpkg -l | grep blue ii bluez-audio 3.36-3 Bluetooth audio support ii bluez-gnome 0.27-1 Bluetooth utilities for GNOME ii bluez-utils 3.36-3 Bluetooth tools and daemons ii libbluetooth2 3.36-1 Library to use the BlueZ Linux Bluetooth stack config-2.6.31-rc4-dirty: # # Automatically generated make config: don't edit # Linux kernel version: 2.6.31-rc4 # Thu Jul 23 08:14:21 2009 # # CONFIG_64BIT is not set CONFIG_X86_32=y # CONFIG_X86_64 is not set CONFIG_X86=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_FAST_CMPXCHG_LOCAL=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y # CONFIG_RWSEM_GENERIC_SPINLOCK is not set CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y CONFIG_GENERIC_CALIBRATE_DELAY=y # CONFIG_GENERIC_TIME_VSYSCALL is not set CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_DEFAULT_IDLE=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y # CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y # CONFIG_ZONE_DMA32 is not set CONFIG_ARCH_POPULATES_NODE_MAP=y # CONFIG_AUDIT_ARCH is not ...
Thanks.
It's still reasonable, after rfcomm sock layer initialized, userspace do sock ioctl callback but tty layer was not initilized yet at this time.
Could you confirm it by applying following debug patch on top of my previous patch? if you get more oops with it then above reason will be right.
--- linux-2.6.orig/net/bluetooth/rfcomm/core.c 2009-07-31 17:14:07.000000000 +0800
+++ linux-2.6/net/bluetooth/rfcomm/core.c 2009-07-31 17:30:39.000000000 +0800
@@ -36,6 +36,7 @@
#include <linux/net.h>
#include <linux/mutex.h>
#include <linux/kthread.h>
+#include <linux/nmi.h>
#include <net/sock.h>
#include <asm/uaccess.h>
@@ -2080,7 +2081,7 @@ static CLASS_ATTR(rfcomm_dlc, S_IRUGO, r
/* ---- Initialization ---- */
static int __init rfcomm_init(void)
{
- int ret;
+ int ret, i;
l2cap_load();
@@ -2088,6 +2089,12 @@ static int __init rfcomm_init(void)
if (ret)
goto out_sock;
+ /* delay 5 seconds to trigger the tty bug */
+ for (i = 0; i < 50; i++) {
+ touch_nmi_watchdog();
+ mdelay(100);
+ }
+
ret = rfcomm_init_ttys();
if (ret)
goto out_tty;
--
Hi, for this case, msleep is better, you can just replace the above -- Regards dave --
Hi Dave, applied this patch and replaced mdelay(100) with msleep(100). I got two crashes and three proper boots. The crashes look like the formerly posted screenshots. When it boots properly i can see the delay in the boot process. Does this help? Regards, Oliver --
Yes, I think so. Please unapply the before two patch, try the following v2 patch instead. Changes from v1: fixes 'goto' path again, make tty init before sock init. Thanks. --- rfcomm tty may be used before rfcomm_tty_driver initilized, The problem is that now socket layer init before tty layer, if userspace program do socket callback right here then oops will happen. reporting in: http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2 make 3 changes: 1. remove #ifdef in rfcomm/core.c, make it blank function when rfcomm tty not selected in rfcomm.h 2. tune the rfcomm_init error patch to ensure tty driver initilized before rfcomm socket usage. 3. remove __exit for rfcomm_cleanup_sockets because above change need call it in a __init function. CC: Alan Cox <alan@lxorguk.ukuu.org.uk> Reported-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Dave Young <hidave.darkstar@gmail.com> -- include/net/bluetooth/rfcomm.h | 13 ++++++++++++- net/bluetooth/rfcomm/core.c | 29 ++++++++++++++++++++--------- net/bluetooth/rfcomm/sock.c | 2 +- 3 files changed, 33 insertions(+), 11 deletions(-) --- linux-2.6.orig/include/net/bluetooth/rfcomm.h 2009-08-01 10:53:18.000000000 +0800 +++ linux-2.6/include/net/bluetooth/rfcomm.h 2009-08-01 10:55:29.000000000 +0800 @@ -355,7 +355,18 @@ struct rfcomm_dev_list_req { }; int rfcomm_dev_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); + +#ifdef CONFIG_BT_RFCOMM_TTY int rfcomm_init_ttys(void); void rfcomm_cleanup_ttys(void); - +#else +static inline int rfcomm_init_ttys(void) +{ + return 0; +} +static inline int rfcomm_cleanup_ttys(void) +{ + return 0; +} +#endif #endif /* __RFCOMM_H */ --- linux-2.6.orig/net/bluetooth/rfcomm/core.c 2009-08-01 10:53:18.000000000 +0800 +++ linux-2.6/net/bluetooth/rfcomm/core.c 2009-08-01 11:03:24.000000000 +0800 @@ -2080,28 +2080,41 @@ static CLASS_ATTR(rfcomm_dlc, S_IRUGO, r /* ---- Initialization ---- */ static int __init rfcomm_init(void) ...
Hi Dave, i did a 'git reset --hard' to get back to vanilla 2.6.31-rc4 . Applied the v2 patch and booted five times without any problems. I'll keep you informed if anything changes ;-) Thanks, --
Just a minor thing: Does rfcomm_cleanup_ttys() return 'int' or is it 'void' ? Regards, Oliver --
Yes it should be void. Thanks for pointing out, here update the patch: --- rfcomm tty may be used before rfcomm_tty_driver initilized, The problem is that now socket layer init before tty layer, if userspace program do socket callback right here then oops will happen. reporting in: http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2 make 3 changes: 1. remove #ifdef in rfcomm/core.c, make it blank function when rfcomm tty not selected in rfcomm.h 2. tune the rfcomm_init error patch to ensure tty driver initilized before rfcomm socket usage. 3. remove __exit for rfcomm_cleanup_sockets because above change need call it in a __init function. CC: Alan Cox <alan@lxorguk.ukuu.org.uk> Reported-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Dave Young <hidave.darkstar@gmail.com> -- include/net/bluetooth/rfcomm.h | 12 +++++++++++- net/bluetooth/rfcomm/core.c | 29 ++++++++++++++++++++--------- net/bluetooth/rfcomm/sock.c | 2 +- 3 files changed, 32 insertions(+), 11 deletions(-) --- linux-2.6.orig/include/net/bluetooth/rfcomm.h 2009-08-01 13:56:53.000000000 +0800 +++ linux-2.6/include/net/bluetooth/rfcomm.h 2009-08-01 17:24:59.000000000 +0800 @@ -355,7 +355,17 @@ struct rfcomm_dev_list_req { }; int rfcomm_dev_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); + +#ifdef CONFIG_BT_RFCOMM_TTY int rfcomm_init_ttys(void); void rfcomm_cleanup_ttys(void); - +#else +static inline int rfcomm_init_ttys(void) +{ + return 0; +} +static inline void rfcomm_cleanup_ttys(void) +{ +} +#endif #endif /* __RFCOMM_H */ --- linux-2.6.orig/net/bluetooth/rfcomm/core.c 2009-08-01 13:56:53.000000000 +0800 +++ linux-2.6/net/bluetooth/rfcomm/core.c 2009-08-01 13:57:18.000000000 +0800 @@ -2080,28 +2080,41 @@ static CLASS_ATTR(rfcomm_dlc, S_IRUGO, r /* ---- Initialization ---- */ static int __init rfcomm_init(void) { + int ret; + l2cap_load(); hci_register_cb(&rfcomm_cb); rfcomm_thread = kthread_run(rfcomm_run, NULL, ...
Hi Dave, the latest approach seems to fit. I booted about 25+ times without any problems. Even the upgrade from Debian Lenny to Squeeze did not cause any probs with bluetooth. Tested-by: Oliver Hartkopp <oliver@hartkopp.net> --
