Hi, First the confessions: It's fedora-patched kernel, 2.6.25.11-97.fc9, tainted by NVIDIA driver. So feel free to drop to /dev/null until I can reproduce on a clean and recent kernel. (But -rc3 didn't boot for me, and I don't have anything else handy atm.) By bashing on random device nodes (with mknod + mmap as root), I got a few messages of this kind: idr_remove called for id=236 which is not allocated. Pid: 3019, comm: a.out Tainted: P 2.6.25.11-97.fc9.i686 #1 [<c04f030c>] idr_remove+0xd5/0x138 [<c0540bfc>] release_dev+0x5bd/0x5cf [<c04909fd>] ? d_free+0x3b/0x4d [<c04955d0>] ? mntput_no_expire+0x16/0x69 [<c048309d>] ? __fput+0x149/0x151 [<c0540d8c>] tty_release+0x12/0x1c [<c0483001>] __fput+0xad/0x151 [<c04830bc>] fput+0x17/0x19 [<c04807df>] filp_close+0x50/0x5a [<c0480852>] sys_close+0x69/0xa1 [<c0405bf2>] syscall_call+0x7/0xb [<c0620000>] ? agp_amd64_probe+0x134/0x3ee ======================= idr_remove called for id=127 which is not allocated. Pid: 3019, comm: a.out Tainted: P 2.6.25.11-97.fc9.i686 #1 [<c04f030c>] idr_remove+0xd5/0x138 [<c0540bfc>] release_dev+0x5bd/0x5cf [<c04909fd>] ? d_free+0x3b/0x4d [<c04955d0>] ? mntput_no_expire+0x16/0x69 [<c048309d>] ? __fput+0x149/0x151 [<c0540d8c>] tty_release+0x12/0x1c [<c0483001>] __fput+0xad/0x151 [<c04830bc>] fput+0x17/0x19 [<c04807df>] filp_close+0x50/0x5a [<c0480852>] sys_close+0x69/0xa1 [<c0405bf2>] syscall_call+0x7/0xb ======================= It looks scary. Or is it purely my own fault? Thanks, Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 --
Now it looks more like this (with very frequent messages): ida_remove called for id=112 which is not allocated. ida_remove called for id=67 which is not allocated. ida_remove called for id=191 which is not allocated. ida_remove called for id=23 which is not allocated. ..and with no backtrace, so I guess it means "not harmful". Sorry for the noise. Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 --
Thats definitely not good and wants digging into further. --
Okay! I've attached my program. On 2.6.27-rc3 the message appears at least a couple of times per second, so you should have no trouble in reproducing it. On the 2.6.25 I get it maybe once a minute. Compile with: gcc -std=gnu99 Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036
Hi, I've now been digging. This reproduces it accurately: # mknod fubar c 128 42 # cat fubar <ctrl-c> idr_remove called for id=42 which is not allocated. Major nr. 128 is UNIX98_PTY_MASTER_MAJOR. The Documentation/devices.txt tells us to access these through /dev/ptmx only. So when we don't follow that rule, tty_open() is called instead of ptmx_open() when the device is opened. ptmx_open() would allocate a new id to use. But since we call tty_open(), it will use tty->index which is set from get_tty_driver() -- calculated using the minor number that we provided! The only thing I don't understand is why we don't get _two_ errors on close() -- I would expect to get one for the slave too. But maybe the slave is never created. What do you think of this theory? Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 --
On Fri, 15 Aug 2008 23:26:28 +0200
pty: If the administrator creates a device not for a ptmx slave don't error
From: Alan Cox <alan@redhat.com>
The open path for ptmx slaves is via the ptmx device. Opening them any
other way is not allowed. Vegard Nossum found that previously this was not
the case and mknod foo c 128 42; cat foo would produce nasty diagnostics
---
drivers/char/tty_io.c | 19 +++++++++++++------
1 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 53b62c4..430c266 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -1217,7 +1217,8 @@ static void tty_line_name(struct tty_driver *driver, int index, char *p)
* init_dev - initialise a tty device
* @driver: tty driver we are opening a device on
* @idx: device index
- * @tty: returned tty structure
+ * @ret_tty: returned tty structure
+ * @first_ok: ok to open a new device (used by ptmx)
*
* Prepare a tty device. This may not be a "new" clean device but
* could also be an active device. The pty drivers require special
@@ -1238,7 +1239,7 @@ static void tty_line_name(struct tty_driver *driver, int index, char *p)
*/
static int init_dev(struct tty_driver *driver, int idx,
- struct tty_struct **ret_tty)
+ struct tty_struct **ret_tty, int first_ok)
{
struct tty_struct *tty, *o_tty;
struct ktermios *tp, **tp_loc, *o_tp, **o_tp_loc;
@@ -1269,6 +1270,12 @@ static int init_dev(struct tty_driver *driver, int idx,
}
if (tty) goto fast_track;
+ if (driver->subtype == PTY_TYPE_MASTER &&
+ (driver->flags & TTY_DRIVER_DEVPTS_MEM) && !first_ok) {
+ printk("SLAP\n");
+ retval = -EIO;
+ goto end_init;
+ }
/*
* First time open is complex, especially for PTY devices.
* This code guarantees that either everything succeeds and the
@@ -1403,7 +1410,7 @@ static int init_dev(struct tty_driver *driver, int idx,
if (retval)
goto release_mem_out;
- goto success;
+ goto success;
/*
...Thanks, this seems to work for me. I do get a "SLAP" in dmesg each time, however :-) Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 --
Thanks - pushed to -next (without the printk) --
There were (and probably are) tty races and there might be some more so I'd actually be interested to know if its repeatable and if you can provide the test code that triggers it. Could be nVidiot stuff but quite frankly it looks like a quite believable real pty layer race in areas of code that were touched recently. Alan --
