Re: Oops when using growisofs

Previous thread: [PATCH 0/5 v2] x86 boot: various E820 & EFI related fixes - what changed in v2 by Paul Jackson on Sunday, June 22, 2008 - 10:21 am. (34 messages)

Next thread: [2.6.26 patch] fix the request_firmware() dummy by Adrian Bunk on Sunday, June 22, 2008 - 12:29 pm. (1 message)
To: linux-kernel <linux-kernel@...>
Date: Sunday, June 22, 2008 - 12:18 pm

The following oops happend when trying to append data to a DVD
with growisofs -M.

Burning DVD...
Executing 'genisoimage -C 16,737776 -M /dev/fd/3 -R -J foobar | builtin_dd of=/dev/dvd obs=32k seek=46111'
I: -input-charset not specified, using utf-8 (detected in locale settings)
Rock Ridge signatures found

This is a wireless-testing.git kernel, which corresponds to 2.6.26-rc6

[28375.893176] Unable to handle kernel paging request for data at address 0x00000008
[28375.893181] Faulting instruction address: 0xc00000000012df84
[28375.893186] Oops: Kernel access of bad area, sig: 11 [#1]
[28375.893189] PREEMPT SMP NR_CPUS=4 NUMA PowerMac
[28375.893200] Modules linked in: nls_cp437 isofs udf rfkill_input hci_usb nfsd exportfs auth_rpcgss deadline_iosched nbd ppdev lp parport nfs lockd sunrpc fuse sr_mod sbp2 arc4 ecb b43 evdev af_packet mac80211 cfg80211 rfkill appledisplay usbhid ssb ide_pci_generic ohci1394 ieee1394 uninorth_agp agpgart
[28375.893287] NIP: c00000000012df84 LR: c00000000012df70 CTR: c000000000137b00
[28375.893293] REGS: c000000116aeb300 TRAP: 0300 Not tainted (2.6.26-rc6-wl)
[28375.893296] MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28004842 XER: 000fffff
[28375.893316] DAR: 0000000000000008, DSISR: 0000000040000000
[28375.893320] TASK = c00000011636db00[4667] 'kded' THREAD: c000000116ae8000 CPU: 2
[28375.893327] GPR00: c00000000012df70 c000000116aeb580 c00000000090ff20 0000000000000000
[28375.893340] GPR04: 0000000000010000 0000000000000001 c00000011bfe37a0 0000000000000010
[28375.893352] GPR08: f00000000694d280 0000000000000000 c0000000008c0be0 0000000000000000
[28375.893364] GPR12: 0000000028004842 c000000000941700 0000000000000004 c000000116aeb840
[28375.893377] GPR16: c0000001195d8f78 c0000000008c0cb8 c0000000000bd064 0000000000000003
[28375.893389] GPR20: 0000000000000000 c0000001195d8d68 0000000000000004 c0000001195d8f80
[28375.893402] GPR24: c00000000082c700 0000000000010000 f00000000694d280 0000000000000000
[28375.893415] GPR28: 0000000000000000 f0000...

To: Michael Buesch <mb@...>
Cc: linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>
Date: Sunday, June 22, 2008 - 5:22 pm

Ok, important information: ppc64 architecture. It would be nice to mention

task was kded, i.e. not growisofs itself, thouh growisofs is probably the one

Note: r9 and r3 are both NULL pointers. r3 is the value returned from alloc_page_buffers.

4800000c is the branch to alloc_page_buffers
7d695b78 copies the return value of that to r9
e9690008 dereferences r9

Evidently, alloc_page_buffers got an out of memory condition, which was not caught
by create_empty_buffers. No idea how it should be handled, but the fact that it's
not looks like a bug to me ;-).

Arnd <><
--

To: Arnd Bergmann <arnd@...>
Cc: linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>
Date: Sunday, June 22, 2008 - 6:05 pm

I don't think it exausted memory. oom-killer messages would have been in the logs.
And this machine has 2.5GiB memory. It continued to run fine after restarting kded.
I sent this bugreport on the machine that oopsed without a reboot.

Is it possible that this was a kernel race between kded and growisofs?

Hm, yeah. I looked at that code already, but I can't see how it could return

alloc_page_buffers should never return a NULL pointer here, as far as I can see.
It clearly is a bug. An oops always is a bug.

--
Greetings Michael.
--

To: Arnd Bergmann <arnd@...>
Cc: linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>
Date: Sunday, June 22, 2008 - 6:28 pm

Well, actually, it can return a NULL pointer.

928 head = NULL;
929 offset = PAGE_SIZE;
930 while ((offset -= size) >= 0) {
...
949 }
950 return head;

So if size, which is a passed in as parameter, is > PAGE_SIZE it will return NULL.

The size parameter is calculated by doing
blocksize = 1 << inode->i_blkbits;
in an earlier function in the callchain.

So, well. I dunno what i_blkbits is. There's no docs in struct inode.

--
Greetings Michael.
--

To: Michael Buesch <mb@...>
Cc: Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>, Jan Kara <jack@...>
Date: Monday, June 23, 2008 - 2:34 am

Yes, that's a more likely scenario. isofs has a history of passing

It's log2 of the filesystem blocksize. It'd be interesting to work out
what value isofs is setting it to, and why.

--

To: Andrew Morton <akpm@...>
Cc: Michael Buesch <mb@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>, Jan Kara <jack@...>
Date: Tuesday, June 24, 2008 - 1:28 pm

Well, yes, that looks as a reason at the first sight. But what I don't
get is, how can isofs possibly set such a blocksize when it uses
sb_set_blocksize() which checks whether blocksize isn't larger than page
size... So it must be something less obvious.
bd_set_size() can possibly set blocksize larger than PAGE_SIZE and
it's called from do_open() but it uses bdev_hardsect_size() and that
shouldn't be larger than PAGE_SIZE either (at least drivers seem to take
care of this).
I have seen one more report of this Oops for SLES10 kernel and also in that
case an IO error happened so probably that is a trigger... But so far I
don't get the details.

Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
--

To: Jan Kara <jack@...>
Cc: Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>, Jan Kara <jack@...>
Date: Tuesday, June 24, 2008 - 2:39 pm

Yeah the IO error is the trigger.
I noticed that it had obvious troubles accessing the DVD that was in the drive.
It sweeped over it for several seconds, then hung the system for 2 or 3 seconds
and then oopsed. But after that everything continued to work as usual.
(Except kded of course)

--
Greetings Michael.
--

To: Michael Buesch <mb@...>
Cc: Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>
Date: Wednesday, June 25, 2008 - 5:37 am

Hmm, by "accessing" do you mean that you've mounted the burned DVD and when
browsing it the IO error and the oops occured or that IO error happened
when burning? It is important because in the first case i_blkbits would be
taken from some ISOFS inode desribing some file while in the second case
i_blkbits are from the inode of the device...

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--

To: Jan Kara <jack@...>
Cc: Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>
Date: Wednesday, June 25, 2008 - 5:46 am

I don't know. kded, which caused the oops, is always running. It is a KDE daemon
that polls device state and so on. So yeah, it might have accessed the drive
while growisofs was writing to it.

However with "accessing" I mean the DVD drive motor was spinning up and down
and the laser lens was moving like crazy. The sound that happens, if you put
a completely scratched DVD into the drive and it is unable to make sense of it.
However, this was not scratched. It was a new DVD with one session on it that
I just burnt 5 minutes before that. So I wanted to append another session to it
and it crashed and resulted in IO errors in growisofs.

--
Greetings Michael.
--

To: Jens Axboe <axboe@...>
Cc: Michael Buesch <mb@...>, Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>
Date: Thursday, June 26, 2008 - 1:05 pm

I've been looking into this problem for some time. The only way how I see
blocksize can be set so big is in cdrom_read_capacity() in
drivers/ide/ide-cd.c. That basically blindly fills in queue->hardsect_size with what
the drive returns and this can propagate in bd_set_size() to i_blkbits.
Jens, do you think that is possible? Shouldn't ide_cd_read_toc() do some
sanity checks of the blocksize returned?

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--

To: Jan Kara <jack@...>
Cc: Michael Buesch <mb@...>, Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>
Date: Thursday, June 26, 2008 - 2:11 pm

It can't hurt, the value should be >= 512b and <= 4kb. Normally it would
be 2kb, but some devices have a 512b switch so that is also seen. Not
sure that 1kb and 4kb are valid, but at least it would still point to
the drive possibly returning valid data and not garbage. So accept all
those, reject (and complain) if it isn't one of those and default to 2kb.

--
Jens Axboe

--

To: Jens Axboe <jens.axboe@...>
Cc: Jan Kara <jack@...>, Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>
Date: Thursday, June 26, 2008 - 2:21 pm

I agree with the need for a hardware sanity check and I would happily test
any RFC patch :)

--
Greetings Michael.
--

To: Michael Buesch <mb@...>
Cc: Jan Kara <jack@...>, Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>
Date: Thursday, June 26, 2008 - 2:36 pm

Something like this, totally untested...

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 68e7f19..5c1e663 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1308,13 +1308,29 @@ static int cdrom_read_capacity(ide_drive_t *drive, unsigned long *capacity,
req.cmd_flags |= REQ_QUIET;

stat = ide_cd_queue_pc(drive, &req);
- if (stat == 0) {
- *capacity = 1 + be32_to_cpu(capbuf.lba);
- *sectors_per_frame =
- be32_to_cpu(capbuf.blocklen) >> SECTOR_BITS;
+ if (stat)
+ return stat;
+
+ /*
+ * Sanity check the given block size
+ */
+ switch (capbuf.blocklen) {
+ case 512:
+ case 1024:
+ case 2048:
+ case 4096:
+ break;
+ default:
+ printk(KERN_ERR "ide-cd: weird block size %u\n",
+ capbuf.blocklen);
+ printk(KERN_ERR "ide-cd: default to 2kb block size\n");
+ capbuf.blocklen = 2048;
+ break;
}

- return stat;
+ *capacity = 1 + be32_to_cpu(capbuf.lba);
+ *sectors_per_frame = be32_to_cpu(capbuf.blocklen) >> SECTOR_BITS;
+ return 0;
}

static int cdrom_read_tocentry(ide_drive_t *drive, int trackno, int msf_flag,

--
Jens Axboe

--

To: Jens Axboe <jens.axboe@...>
Cc: Michael Buesch <mb@...>, Jan Kara <jack@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>, Bartlomiej Zolnierkiewicz <bzolnier@...>, <linux-ide@...>, <stable@...>
Date: Tuesday, July 22, 2008 - 5:25 am

This was all nearly a month ago and we missed 2.6.25.x and 2.6.26.

Jan has confirmed that the patch did fix the oops. So I have put
toegether the below patch, which I will send in Bart's direction. I
believe that Jens is offline at present.

Unfortunately the surrounding code has changed a bit in the current
post-2.6.26 mainline, but it is a small syntactic thing and the patch
can easily be backported.

I believe that the fix is needed in both 2.6.25.x and 2.6.26.x.

From: Jens Axboe <jens.axboe@oracle.com>

cdrom_read_capacity() will blindly return the capacity from the device
without sanity-checking it. This later causes code in fs/buffer.c to
oops.

Fix this by checking that the device is telling us sensible things.

Cc: Michael Buesch <mb@bu3sch.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

drivers/ide/ide-cd.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)

diff -puN drivers/ide/ide-cd.c~oops-when-using-growisofs drivers/ide/ide-cd.c
--- a/drivers/ide/ide-cd.c~oops-when-using-growisofs
+++ a/drivers/ide/ide-cd.c
@@ -1309,13 +1309,29 @@ static int cdrom_read_capacity(ide_drive

stat = ide_cd_queue_pc(drive, cmd, 0, &capbuf, &len, sense, 0,
REQ_QUIET);
- if (stat == 0) {
- *capacity = 1 + be32_to_cpu(capbuf.lba);
- *sectors_per_frame =
- be32_to_cpu(capbuf.blocklen) >> SECTOR_BITS;
+ if (stat)
+ return stat;
+
+ /*
+ * Sanity check the given block size
+ */
+ switch (capbuf.blocklen) {
+ case 512:
+ case 1024:
+ case 2048:
+ case 4096:
+ break;
+ default:
+ printk(KERN_ERR "ide-cd: weird block size %u\n",
+ capbuf.blocklen);
+ printk(KERN_ERR "ide-cd: default to 2kb block size\n");
+ capbuf.blocklen = 2048;
+ break;
}

- return stat;
+ *capacity = 1 + be32_to_cpu(capbuf.lba);
+ *sectors_pe...

To: Jens Axboe <jens.axboe@...>
Cc: Michael Buesch <mb@...>, Jan Kara <jack@...>, Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>
Date: Wednesday, July 9, 2008 - 2:46 pm

BTW, just to confirm some positive result: On one of my machines I was
able to trigger the warning message in this patch. So it definitely fixes
some problem.

--

To: Jens Axboe <jens.axboe@...>
Cc: Jan Kara <jack@...>, Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>
Date: Sunday, June 29, 2008 - 3:39 pm

So I applied this patch and it works fine.
However I cannot say if this fixed it. The warning does not appear, however,
it's probably possible that the drive does only return a wrong blocksize
under weird conditions (firmware bug).

So in any case, I think this patch should be applied. Checking device sanity
is always a good thing.

--
Greetings Michael.
--

To: Jens Axboe <jens.axboe@...>
Cc: Jan Kara <jack@...>, Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>
Date: Thursday, June 26, 2008 - 2:39 pm

I'll test this. Thanks a lot. :)
However I cannot reproduce the bug. So I cannot tell whether it fixes it.

--
Greetings Michael.
--

To: Michael Buesch <mb@...>
Cc: Jan Kara <jack@...>, Andrew Morton <akpm@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>
Date: Thursday, June 26, 2008 - 2:41 pm

The patch wont hurt, probably we should just add it just in case.

--
Jens Axboe

--

To: Michael Buesch <mb@...>
Cc: Jan Kara <jack@...>, Andrew Morton <akpm@...>, linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>, Jan Kara <jack@...>
Date: Tuesday, June 24, 2008 - 9:42 pm

So maybe a use-after-free bug in the I/O error handling caused an inode to
be overwritten and then passed to an I/O function that used an invalid
inode->i_blkbits?

Arnd <><
--

To: Andrew Morton <akpm@...>
Cc: Michael Buesch <mb@...>, Arnd Bergmann <arnd@...>, linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>, Jan Kara <jack@...>
Date: Monday, June 23, 2008 - 2:59 am

Yes isofs will pass in a too-big page here (IIRC 32K or something).
And trigger this oops.
--

To: Michael Buesch <mb@...>
Cc: linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>
Date: Sunday, June 22, 2008 - 5:31 pm

Oh, btw: http://www.kerneloops.org/search.php?search=create_empty_buffers&btnG=Fu...
finds 15 more instances of the same bug (and one other bug), across lots of
kernel versions and architectures. The bug has been there since we started
using git.

Arnd <><
--

To: Arnd Bergmann <arnd@...>
Cc: linux-kernel <linux-kernel@...>, Jens Axboe <axboe@...>
Date: Sunday, June 22, 2008 - 6:09 pm

So, ehm. Maybe some debugging code should be added??
We could at least add code that prevents a crash and adds WARN_ONs.

--
Greetings Michael.
--

Previous thread: [PATCH 0/5 v2] x86 boot: various E820 & EFI related fixes - what changed in v2 by Paul Jackson on Sunday, June 22, 2008 - 10:21 am. (34 messages)

Next thread: [2.6.26 patch] fix the request_firmware() dummy by Adrian Bunk on Sunday, June 22, 2008 - 12:29 pm. (1 message)