Re: tty_register_device NULL pointer dereference in 2.6.31-rc4

Previous thread: Re: [Bugme-new] [Bug 13760] New: 2.6.30 kernel locks up with pppoe in back trace (regression) by Herbert Xu on Friday, July 24, 2009 - 8:33 pm. (2 messages)

Next thread: 1,000,000.00 GBP by michael smith on Saturday, July 25, 2009 - 5:59 am. (1 message)
From: Oliver Hartkopp
Date: Saturday, July 25, 2009 - 3:01 am

Hello Alan and Marcel,

this morning i got a NULL pointer dereference in tty_register_device() that
looked similar to the one, i posted here on 03-jun-2009:

http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2

As the boot sequence stopped after this point, i was only able to take a
picture of it - see attached jpg.

I was able to shut down the system gracefully by pressing the power button.

The problem emerges really seldom.

Any idea?

Regards,
Oliver



From: Alan Cox
Date: Saturday, July 25, 2009 - 3:50 am

On Sat, 25 Jul 2009 12:01:43 +0200

tty_register_device appears to have been called with a NULL pointer. Not
sure why however.
--

From: Marcel Holtmann
Date: Saturday, July 25, 2009 - 4:07 am

if that is the pointer for the struct device, then that used to be fine
in the past. Not all RFCOMM device have a parent when they are created.

Regards

Marcel


--

From: Alan Cox
Date: Saturday, July 25, 2009 - 5:10 am

The tty layer doesn't care about the struct device really. Nothing there
has changed. The NULL passed appears to be the driver argument.
--

From: Dave Young
Date: Monday, July 27, 2009 - 2:59 am

Agree with you, because in rfcomm_init, rfcomm thread run before tty initilized, the following patch may fix the problem. oliver, could you verify it it fix your problem?
---

rfcomm tty may be used before rfcomm_tty_driver initilized,

reporting in:
http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2

make 3 changes:
1. remove #ifdef in rfcomm/core.c,
make it blank function when rfcomm tty not selected in rfcomm.h

2. tune the rfcomm_init error patch to ensure
tty driver initilized before any usage.

3. remove __exit for rfcomm_cleanup_sockets
because above change need call it in a __init function. 


CC: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reported-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
--
include/net/bluetooth/rfcomm.h |   13 ++++++++++++-
net/bluetooth/rfcomm/core.c    |   31 +++++++++++++++++++++----------
net/bluetooth/rfcomm/sock.c    |    2 +-
3 files changed, 34 insertions(+), 12 deletions(-)

--- linux-2.6.orig/include/net/bluetooth/rfcomm.h	2009-04-09 16:23:03.000000000 +0800
+++ linux-2.6/include/net/bluetooth/rfcomm.h	2009-07-27 17:14:43.000000000 +0800
@@ -355,7 +355,18 @@ struct rfcomm_dev_list_req {
 };
 
 int  rfcomm_dev_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
+
+#ifdef CONFIG_BT_RFCOMM_TTY
 int  rfcomm_init_ttys(void);
 void rfcomm_cleanup_ttys(void);
-
+#else
+static inline int rfcomm_init_ttys(void)
+{
+	return 0;
+}
+static inline int rfcomm_cleanup_ttys(void)
+{
+	return 0;
+}
+#endif
 #endif /* __RFCOMM_H */
--- linux-2.6.orig/net/bluetooth/rfcomm/core.c	2009-06-16 17:39:32.000000000 +0800
+++ linux-2.6/net/bluetooth/rfcomm/core.c	2009-07-27 17:24:27.000000000 +0800
@@ -2080,28 +2080,41 @@ static CLASS_ATTR(rfcomm_dlc, S_IRUGO, r
 /* ---- Initialization ---- */
 static int __init rfcomm_init(void)
 {
+	int ret;
+
 	l2cap_load();
 
+	ret = rfcomm_init_sockets();
+	if (ret)
+		goto out_sock;
+
+	ret = rfcomm_init_ttys();
+	if (ret)
+		goto ...
From: Oliver Hartkopp
Date: Monday, July 27, 2009 - 4:12 am

Hi Dave,

i get this problem really seldom on my Laptop and i did not manage to get a
reproducible Oops of that problem.

Anyway the code you are pointing to seems to have a problem and your added
error handling looks good to me - even if i don't know if the initializations
can be reordered in that way.

I'll try your patch, but it could take a *long* time to prove it right ;-)

Maybe Marcel and Alan can better prove it to be correct by code review.

Many thanks,

--

From: Oliver Hartkopp
Date: Monday, July 27, 2009 - 4:39 am

Just FYI:

Your patch compiled, the system booted without problems and nothing is broken
so far. I checked the BT, WLAN and BT dial-up with success. So it looks good
to me.

Regards,
Oliver
--

From: Dave Young
Date: Monday, July 27, 2009 - 7:07 am

From: Oliver Hartkopp
Date: Wednesday, July 29, 2009 - 7:00 am

Hi Dave,

i got it again - even with your patch (that's why it's 2.6.31-rc4-dirty in the
attached screenshot).

Sorry ;-)

Best regards,
Oliver

From: Dave Young
Date: Thursday, July 30, 2009 - 2:15 am

Weird, the oops occurs between sock init and tty init routines. Could
you tell your bluez version and your configuration?

-- 
Regards
dave
--

From: Oliver Hartkopp
Date: Thursday, July 30, 2009 - 3:05 am

No problem:

Hardware here is a Dell 830.

And the Oops is still rare ;-)

Distro is a Debian

$> dpkg -l | grep blue

ii  bluez-audio                          3.36-3                     Bluetooth
audio support
ii  bluez-gnome                          0.27-1                     Bluetooth
utilities for GNOME
ii  bluez-utils                          3.36-3                     Bluetooth
tools and daemons
ii  libbluetooth2                        3.36-1                     Library to
use the BlueZ Linux Bluetooth stack


config-2.6.31-rc4-dirty:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.31-rc4
# Thu Jul 23 08:14:21 2009
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not ...
From: Dave Young
Date: Friday, July 31, 2009 - 2:39 am

Thanks.

It's still reasonable, after rfcomm sock layer initialized, userspace do sock ioctl callback but tty layer was not initilized yet at this time.

Could you confirm it by applying following debug patch on top of my previous patch? if you get more oops with it then above reason will be right.

--- linux-2.6.orig/net/bluetooth/rfcomm/core.c	2009-07-31 17:14:07.000000000 +0800
+++ linux-2.6/net/bluetooth/rfcomm/core.c	2009-07-31 17:30:39.000000000 +0800
@@ -36,6 +36,7 @@
 #include <linux/net.h>
 #include <linux/mutex.h>
 #include <linux/kthread.h>
+#include <linux/nmi.h>
 
 #include <net/sock.h>
 #include <asm/uaccess.h>
@@ -2080,7 +2081,7 @@ static CLASS_ATTR(rfcomm_dlc, S_IRUGO, r
 /* ---- Initialization ---- */
 static int __init rfcomm_init(void)
 {
-	int ret;
+	int ret, i;
 
 	l2cap_load();
 
@@ -2088,6 +2089,12 @@ static int __init rfcomm_init(void)
 	if (ret)
 		goto out_sock;
 
+	/* delay 5 seconds to trigger the tty bug */
+	for (i = 0; i < 50; i++) {
+		touch_nmi_watchdog();
+		mdelay(100);
+	}
+
 	ret = rfcomm_init_ttys();
 	if (ret)
 		goto out_tty;
--

From: Dave Young
Date: Friday, July 31, 2009 - 3:10 am

Hi, for this case, msleep is better, you can just replace the above



-- 
Regards
dave
--

From: Oliver Hartkopp
Date: Friday, July 31, 2009 - 4:20 am

Hi Dave,

applied this patch and replaced mdelay(100) with msleep(100).

I got two crashes and three proper boots.

The crashes look like the formerly posted screenshots.
When it boots properly i can see the delay in the boot process.

Does this help?

Regards,
Oliver

--

From: Dave Young
Date: Friday, July 31, 2009 - 8:17 pm

Yes, I think so.

Please unapply the before two patch, try the following v2 patch instead.
Changes from v1: fixes 'goto' path again, make tty init before sock init.

Thanks.
---

rfcomm tty may be used before rfcomm_tty_driver initilized,
The problem is that now socket layer init before tty layer, if userspace
program do socket callback right here then oops will happen.

reporting in:
http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2

make 3 changes:
1. remove #ifdef in rfcomm/core.c,
make it blank function when rfcomm tty not selected in rfcomm.h

2. tune the rfcomm_init error patch to ensure
tty driver initilized before rfcomm socket usage.

3. remove __exit for rfcomm_cleanup_sockets
because above change need call it in a __init function. 


CC: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reported-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
--
include/net/bluetooth/rfcomm.h |   13 ++++++++++++-
net/bluetooth/rfcomm/core.c    |   29 ++++++++++++++++++++---------
net/bluetooth/rfcomm/sock.c    |    2 +-
3 files changed, 33 insertions(+), 11 deletions(-)

--- linux-2.6.orig/include/net/bluetooth/rfcomm.h	2009-08-01 10:53:18.000000000 +0800
+++ linux-2.6/include/net/bluetooth/rfcomm.h	2009-08-01 10:55:29.000000000 +0800
@@ -355,7 +355,18 @@ struct rfcomm_dev_list_req {
 };
 
 int  rfcomm_dev_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
+
+#ifdef CONFIG_BT_RFCOMM_TTY
 int  rfcomm_init_ttys(void);
 void rfcomm_cleanup_ttys(void);
-
+#else
+static inline int rfcomm_init_ttys(void)
+{
+	return 0;
+}
+static inline int rfcomm_cleanup_ttys(void)
+{
+	return 0;
+}
+#endif
 #endif /* __RFCOMM_H */
--- linux-2.6.orig/net/bluetooth/rfcomm/core.c	2009-08-01 10:53:18.000000000 +0800
+++ linux-2.6/net/bluetooth/rfcomm/core.c	2009-08-01 11:03:24.000000000 +0800
@@ -2080,28 +2080,41 @@ static CLASS_ATTR(rfcomm_dlc, S_IRUGO, r
 /* ---- Initialization ---- */
 static int __init rfcomm_init(void)
 ...
From: Oliver Hartkopp
Date: Saturday, August 1, 2009 - 2:15 am

Hi Dave,

i did a 'git reset --hard' to get back to vanilla 2.6.31-rc4 .

Applied the v2 patch and booted five times without any problems.

I'll keep you informed if anything changes ;-)

Thanks,

--

From: Oliver Hartkopp
Date: Saturday, August 1, 2009 - 2:21 am

Just a minor thing:

Does rfcomm_cleanup_ttys() return 'int' or is it 'void' ?

Regards,
Oliver
--

From: Dave Young
Date: Saturday, August 1, 2009 - 2:32 am

Yes it should be void. Thanks for pointing out, here update the patch:
---

rfcomm tty may be used before rfcomm_tty_driver initilized,
The problem is that now socket layer init before tty layer, if userspace
program do socket callback right here then oops will happen.

reporting in:
http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2

make 3 changes:
1. remove #ifdef in rfcomm/core.c,
make it blank function when rfcomm tty not selected in rfcomm.h

2. tune the rfcomm_init error patch to ensure
tty driver initilized before rfcomm socket usage.

3. remove __exit for rfcomm_cleanup_sockets
because above change need call it in a __init function. 


CC: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reported-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
--
include/net/bluetooth/rfcomm.h |   12 +++++++++++-
net/bluetooth/rfcomm/core.c    |   29 ++++++++++++++++++++---------
net/bluetooth/rfcomm/sock.c    |    2 +-
3 files changed, 32 insertions(+), 11 deletions(-)

--- linux-2.6.orig/include/net/bluetooth/rfcomm.h	2009-08-01 13:56:53.000000000 +0800
+++ linux-2.6/include/net/bluetooth/rfcomm.h	2009-08-01 17:24:59.000000000 +0800
@@ -355,7 +355,17 @@ struct rfcomm_dev_list_req {
 };
 
 int  rfcomm_dev_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
+
+#ifdef CONFIG_BT_RFCOMM_TTY
 int  rfcomm_init_ttys(void);
 void rfcomm_cleanup_ttys(void);
-
+#else
+static inline int rfcomm_init_ttys(void)
+{
+	return 0;
+}
+static inline void rfcomm_cleanup_ttys(void)
+{
+}
+#endif
 #endif /* __RFCOMM_H */
--- linux-2.6.orig/net/bluetooth/rfcomm/core.c	2009-08-01 13:56:53.000000000 +0800
+++ linux-2.6/net/bluetooth/rfcomm/core.c	2009-08-01 13:57:18.000000000 +0800
@@ -2080,28 +2080,41 @@ static CLASS_ATTR(rfcomm_dlc, S_IRUGO, r
 /* ---- Initialization ---- */
 static int __init rfcomm_init(void)
 {
+	int ret;
+
 	l2cap_load();
 
 	hci_register_cb(&rfcomm_cb);
 
 	rfcomm_thread = kthread_run(rfcomm_run, NULL, ...
From: Oliver Hartkopp
Date: Monday, August 3, 2009 - 5:03 am

Hi Dave,

the latest approach seems to fit.

I booted about 25+ times without any problems.
Even the upgrade from Debian Lenny to Squeeze did not cause any probs with
bluetooth.

Tested-by: Oliver Hartkopp <oliver@hartkopp.net>


--

Previous thread: Re: [Bugme-new] [Bug 13760] New: 2.6.30 kernel locks up with pppoe in back trace (regression) by Herbert Xu on Friday, July 24, 2009 - 8:33 pm. (2 messages)

Next thread: 1,000,000.00 GBP by michael smith on Saturday, July 25, 2009 - 5:59 am. (1 message)