Re: 2.6.25.11-97.fc9 (P): idr_remove called for id=236 which is not allocated

Previous thread: [PATCH RT] fix flags in usb code by Steven Rostedt on Friday, August 15, 2008 - 8:04 am. (1 message)

Next thread: [patch] fix setpriority(PRIO_PGRP) thread iterator breakage. by Ken Chen on Friday, August 15, 2008 - 8:25 am. (1 message)
From: Vegard Nossum
Date: Friday, August 15, 2008 - 8:11 am

Hi,

First the confessions: It's fedora-patched kernel, 2.6.25.11-97.fc9,
tainted by NVIDIA driver. So feel free to drop to /dev/null until I
can reproduce on a clean and recent kernel. (But -rc3 didn't boot for
me, and I don't have anything else handy atm.)

By bashing on random device nodes (with mknod + mmap as root), I got a
few messages of this kind:

idr_remove called for id=236 which is not allocated.
Pid: 3019, comm: a.out Tainted: P         2.6.25.11-97.fc9.i686 #1
 [<c04f030c>] idr_remove+0xd5/0x138
 [<c0540bfc>] release_dev+0x5bd/0x5cf
 [<c04909fd>] ? d_free+0x3b/0x4d
 [<c04955d0>] ? mntput_no_expire+0x16/0x69
 [<c048309d>] ? __fput+0x149/0x151
 [<c0540d8c>] tty_release+0x12/0x1c
 [<c0483001>] __fput+0xad/0x151
 [<c04830bc>] fput+0x17/0x19
 [<c04807df>] filp_close+0x50/0x5a
 [<c0480852>] sys_close+0x69/0xa1
 [<c0405bf2>] syscall_call+0x7/0xb
 [<c0620000>] ? agp_amd64_probe+0x134/0x3ee
 =======================

idr_remove called for id=127 which is not allocated.
Pid: 3019, comm: a.out Tainted: P         2.6.25.11-97.fc9.i686 #1
 [<c04f030c>] idr_remove+0xd5/0x138
 [<c0540bfc>] release_dev+0x5bd/0x5cf
 [<c04909fd>] ? d_free+0x3b/0x4d
 [<c04955d0>] ? mntput_no_expire+0x16/0x69
 [<c048309d>] ? __fput+0x149/0x151
 [<c0540d8c>] tty_release+0x12/0x1c
 [<c0483001>] __fput+0xad/0x151
 [<c04830bc>] fput+0x17/0x19
 [<c04807df>] filp_close+0x50/0x5a
 [<c0480852>] sys_close+0x69/0xa1
 [<c0405bf2>] syscall_call+0x7/0xb
 =======================

It looks scary. Or is it purely my own fault?

Thanks,


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Vegard Nossum
Date: Friday, August 15, 2008 - 8:27 am

Now it looks more like this (with very frequent messages):

ida_remove called for id=112 which is not allocated.
ida_remove called for id=67 which is not allocated.
ida_remove called for id=191 which is not allocated.
ida_remove called for id=23 which is not allocated.

..and with no backtrace, so I guess it means "not harmful". Sorry for the noise.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Alan Cox
Date: Friday, August 15, 2008 - 8:28 am

Thats definitely not good and wants digging into further.
--

From: Vegard Nossum
Date: Friday, August 15, 2008 - 8:50 am

Okay!

I've attached my program. On 2.6.27-rc3 the message appears at least a
couple of times per second, so you should have no trouble in
reproducing it. On the 2.6.25 I get it maybe once a minute.

Compile with: gcc -std=gnu99


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
From: Vegard Nossum
Date: Friday, August 15, 2008 - 2:26 pm

Hi,

I've now been digging. This reproduces it accurately:

# mknod fubar c 128 42
# cat fubar
<ctrl-c>

idr_remove called for id=42 which is not allocated.

Major nr. 128 is UNIX98_PTY_MASTER_MAJOR. The
Documentation/devices.txt tells us to access these through /dev/ptmx
only. So when we don't follow that rule, tty_open() is called instead
of ptmx_open() when the device is opened. ptmx_open() would allocate a
new id to use. But since we call tty_open(), it will use tty->index
which is set from get_tty_driver() -- calculated using the minor
number that we provided!

The only thing I don't understand is why we don't get _two_ errors on
close() -- I would expect to get one for the slave too. But maybe the
slave is never created. What do you think of this theory?


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Alan Cox
Date: Monday, August 18, 2008 - 3:15 pm

On Fri, 15 Aug 2008 23:26:28 +0200

pty: If the administrator creates a device not for a ptmx slave don't error

From: Alan Cox <alan@redhat.com>


The open path for ptmx slaves is via the ptmx device. Opening them any
other way is not allowed. Vegard Nossum found that previously this was not
the case and mknod foo c 128 42; cat foo would produce nasty diagnostics
---

 drivers/char/tty_io.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)


diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 53b62c4..430c266 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -1217,7 +1217,8 @@ static void tty_line_name(struct tty_driver *driver, int index, char *p)
  *	init_dev		-	initialise a tty device
  *	@driver: tty driver we are opening a device on
  *	@idx: device index
- *	@tty: returned tty structure
+ *	@ret_tty: returned tty structure
+ *	@first_ok: ok to open a new device (used by ptmx)
  *
  *	Prepare a tty device. This may not be a "new" clean device but
  *	could also be an active device. The pty drivers require special
@@ -1238,7 +1239,7 @@ static void tty_line_name(struct tty_driver *driver, int index, char *p)
  */
 
 static int init_dev(struct tty_driver *driver, int idx,
-	struct tty_struct **ret_tty)
+	struct tty_struct **ret_tty, int first_ok)
 {
 	struct tty_struct *tty, *o_tty;
 	struct ktermios *tp, **tp_loc, *o_tp, **o_tp_loc;
@@ -1269,6 +1270,12 @@ static int init_dev(struct tty_driver *driver, int idx,
 	}
 	if (tty) goto fast_track;
 
+	if (driver->subtype == PTY_TYPE_MASTER &&
+		(driver->flags & TTY_DRIVER_DEVPTS_MEM) && !first_ok) {
+		printk("SLAP\n");
+		retval = -EIO;
+		goto end_init;
+	}
 	/*
 	 * First time open is complex, especially for PTY devices.
 	 * This code guarantees that either everything succeeds and the
@@ -1403,7 +1410,7 @@ static int init_dev(struct tty_driver *driver, int idx,
 
 	if (retval)
 		goto release_mem_out;
-	 goto success;
+	goto success;
 
 	/*
 ...
From: Vegard Nossum
Date: Monday, August 18, 2008 - 11:53 pm

Thanks, this seems to work for me. I do get a "SLAP" in dmesg each
time, however :-)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Alan Cox
Date: Tuesday, August 19, 2008 - 1:29 am

Thanks - pushed to -next (without the printk)
--

From: Alan Cox
Date: Friday, August 15, 2008 - 8:26 am

There were (and probably are) tty races and there might be some more so
I'd actually be interested to know if its repeatable and if you can
provide the test code that triggers it.

Could be nVidiot stuff but quite frankly it looks like a quite believable
real pty layer race in areas of code that were touched recently.

Alan
--

Previous thread: [PATCH RT] fix flags in usb code by Steven Rostedt on Friday, August 15, 2008 - 8:04 am. (1 message)

Next thread: [patch] fix setpriority(PRIO_PGRP) thread iterator breakage. by Ken Chen on Friday, August 15, 2008 - 8:25 am. (1 message)