Disabling CONFIG_IDE made my machine boot, as it was using libata anyway.
Pavel
--
(English) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
Hi,

This comes from ide-generic.

Kamalesh/Pavel: Could you try the latest git and see if the OOPS is still there? [ Yeah, I'm unable to reproduce it. :( ]

Thanks,
Bart
--
Hi Bart,

The panic is reproducible with the 2.6.24-git16 kernel; the call trace is similar to the previous one:

BUG: unable to handle kernel paging request at ffffffffffffffa0
IP: [<ffffffff80415673>] init_irq+0x188/0x444
PGD 203067 PUD 204067 PMD 0
Oops: 0000 [1] SMP
CPU 3
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-git16 #1
RIP: 0010:[<ffffffff80415673>] [<ffffffff80415673>] init_irq+0x188/0x444
RSP: 0000:ffff81022f093e00 EFLAGS: 00010282
RAX: ffffffffffffff80 RBX: ffffffff808ad200 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: ffff81022fc039c0 RDI: ffffffff807512c0
RBP: ffff81022f093e30 R08: ffff81022f093d70 R09: 0000000000000002
R10: 0000000000000001 R11: ffff81022f093c00 R12: ffffffff808b4500
R13: ffffffff808b4510 R14: 0000000000000000 R15: ffffffffffffffff
FS: 0000000000000000(0000) GS:ffff81022f0e7ac0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffffffffffa0 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff81022f092000, task ffff81022f0797e0)
Stack: ffff81022f093e30 0000000000000000 ffffffff808ad200 ffffffff808ad220
ffffffff808add80 0000000000000000 ffff81022f093eb0 ffffffff8041648f
ffff81022f093ec0 0000000000000000 0000000080751ee0 0000000000000246
Call Trace:
[<ffffffff8041648f>] ide_device_add_all+0xb60/0xe54
[<ffffffff807d6d48>] ide_generic_init+0x46/0x4a
[<ffffffff807b873b>] kernel_init+0x175/0x2e7
[<ffffffff8020bff8>] child_rip+0xa/0x12
[<ffffffff8037476c>] acpi_ds_init_one_object+0x0/0x88
[<ffffffff807b85c6>] kernel_init+0x0/0x2e7
[<ffffffff8020bfee>] child_rip+0x0/0x12

Code: 89 03 49 8b 45 18 48 89 18 48 39 1b 75 04 0f 0b eb fe fe 05 20 71 38 00 fb eb 5b 48 8b 83 20 07 00 00 83 ca ff 48 83 c0 80 74 0e <48> 8b 40 20 48 8b 80 88 00 00 00 8b 50 04 48 8b 3d 48 11 30 00
RIP [<ffffffff80415673>] init_irq+0x188/0x444
...
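The faulting address can be cross-checked by hand against the `Code:` bytes in the dump. A minimal sketch, assuming the usual x86-64 decoding of the bytes just before and at the `<48>` marker (`48 83 c0 80` as `add $-0x80,%rax`, `74 0e` as `je`, `48 8b 40 20` as `mov 0x20(%rax),%rax`); `replay_fault()` is a hypothetical helper that replays that arithmetic on plain integers. Starting from RAX = 0, i.e. a NULL pointer, it lands on exactly the RAX (ffffffffffffff80) and CR2 (ffffffffffffffa0) values above, which suggests a NULL pointer was shifted by -0x80 before the NULL test, so the test never fired.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: replays, on integers, the instructions around the
 * faulting IP as decoded (by assumption) from the Code: line:
 *   48 83 c0 80   add  $-0x80,%rax
 *   74 0e         je   <skip>            (taken only if the sum is zero)
 *   48 8b 40 20   mov  0x20(%rax),%rax   <- the faulting load
 * Stores RAX after the add in *rax_after_add and returns the address the
 * load would touch, or 0 if the je skips the load.
 */
static uint64_t replay_fault(uint64_t rax_in, uint64_t *rax_after_add)
{
	uint64_t rax;

	assert(rax_after_add != NULL);
	rax = rax_in - 0x80;	/* unsigned wrap-around: 0 - 0x80 = 0xff...80 */
	*rax_after_add = rax;
	if (rax == 0)
		return 0;	/* je taken: the NULL test would have caught it */
	return rax + 0x20;	/* address dereferenced by the mov */
}
```

For example, `replay_fault(0, &r)` returns 0xffffffffffffffa0 and leaves `r` at 0xffffffffffffff80; only an input of 0x80 would make the sum zero and let the `je` skip the load.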
Thanks,

I reviewed the ide-probe.c changes again, but nothing seems wrong...

Please also try disassembling init_irq with gdb so we can see where it fails.
--
Kamalesh, were you able to bisect this down? I just got hit by the same panic on a 4-way x86_64, with 2.6.24-git22. Thanks, Nish --
Hi Nish,
I tried bisecting, and the guilty patch seems to be:
36501650ec45b1db308c3b51886044863be2d762 is first bad commit
commit 36501650ec45b1db308c3b51886044863be2d762
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Fri Feb 1 23:09:31 2008 +0100
ide: keep pointer to struct device instead of struct pci_dev in ide_hwif_t
The gdb output also points to the changes made by the guilty patch:
(gdb) p ide_device_add_all
$1 = {int (u8 *, const struct ide_port_info *)} 0xffffffff804176ac <ide_device_add_all>
(gdb) p/x 0xffffffff804176ac+0xb60
$2 = 0xffffffff8041820c
(gdb) l *0xffffffff8041820c
0xffffffff8041820c is in ide_device_add_all (drivers/ide/ide-probe.c:1249).
1244 goto out;
1245 }
1246
1247 sg_init_table(hwif->sg_table, hwif->sg_max_nents);
1248
1249 if (init_irq(hwif) == 0)
1250 goto done;
1251
1252 old_irq = hwif->irq;
1253 /*
(gdb)
(gdb) p init_irq
$1 = {int (ide_hwif_t *)} 0xffffffff8041721f <init_irq>
(gdb) p/x 0xffffffff8041721f+0x1a4
$2 = 0xffffffff804173c3
(gdb) l *0xffffffff804173c3
0xffffffff804173c3 is in init_irq (include/asm/pci.h:101).
96 /* Returns the node based on pci bus */
97 static inline int __pcibus_to_node(struct pci_bus *bus)
98 {
99 struct pci_sysdata *sd = bus->sysdata;
100
101 return sd->node;
102 }
103
104 static inline cpumask_t __pcibus_to_cpumask(struct pci_bus *bus)
105 {
(gdb)
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
Hi,
Thanks for the detailed analysis and sorry for the bug.
I think this may have just been fixed by Andi's recent hwif_to_node()
fix (patch below; it is already in Linus' tree). Could you please verify this?
commit 1f07e988290fc45932f5028c9e2a862c37a57336
Author: Andi Kleen <andi@firstfloor.org>
Date: Mon Feb 11 01:35:20 2008 +0100
Prevent IDE boot ops on NUMA system
Without this patch a Opteron test system here oopses at boot with
current git.
Calling to_pci_dev() on a NULL pointer gives a negative value so the
following NULL pointer check never triggers and then an illegal address
is referenced. Check the unadjusted original device pointer for NULL
instead.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 23fad89..a3b69c1 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -1295,7 +1295,7 @@ static inline void ide_dump_identify(u8 *id)
 static inline int hwif_to_node(ide_hwif_t *hwif)
 {
 	struct pci_dev *dev = to_pci_dev(hwif->dev);
-	return dev ? pcibus_to_node(dev->bus) : -1;
+	return hwif->dev ? pcibus_to_node(dev->bus) : -1;
 }
 
 static inline ide_drive_t *ide_get_paired_drive(ide_drive_t *drive)
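The failure mode described in the commit message can be reproduced outside the kernel. A minimal user-space sketch, assuming hypothetical stand-in structs `fake_device` and `fake_pci_dev` that only mimic the fact that the embedded device is not the first member; the 0x80 offset is an assumption inferred from the `48 83 c0 80` (`add $-0x80,%rax`) bytes in the oops, not taken from the real `struct pci_dev`. The address is computed on `uintptr_t` because subtracting from a NULL pointer is undefined in standard C; the kernel's `container_of()` does the equivalent arithmetic on pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the offset of 'dev' matters here. */
struct fake_device { int dummy; };
struct fake_pci_dev {
	char earlier_fields[0x80];	/* assumption: 'dev' sits at offset 0x80 */
	struct fake_device dev;
};
static_assert(offsetof(struct fake_pci_dev, dev) == 0x80, "layout assumption");

/* container_of()-style conversion: member address minus member offset. */
static uintptr_t fake_to_pci_dev(const struct fake_device *d)
{
	return (uintptr_t)d - offsetof(struct fake_pci_dev, dev);
}

/* Old hwif_to_node() logic: tests the converted pointer, never 0 for NULL. */
static int old_check_has_dev(const struct fake_device *d)
{
	return fake_to_pci_dev(d) != 0;
}

/* Fixed logic, as in the patch: tests the original struct device pointer. */
static int new_check_has_dev(const struct fake_device *d)
{
	return d != NULL;
}
```

On a 64-bit build, `fake_to_pci_dev(NULL)` evaluates to 0xffffffffffffff80, the same value as RAX in the oops, so the old `dev ? ... : -1` test takes the non-NULL branch and dereferences garbage. Testing `hwif->dev` itself, as the patch does, keeps the NULL case on the `-1` path while behaving identically for a real device.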
--
Hi Bart,

Thanks!! The patch solves the kernel panic, but after applying it the kernel is not able to mount the root filesystem and panics. I am not sure what is causing this panic:

Creating root device.
Mounting root filesystem.
mount: could not find filesystem
Kernel panic - not syncing: Attempted to kill init!

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
Hi,

Is either of these working:
- commit 36501650ec45b1db308c3b51886044863be2d762 with Andi's fix applied, or
- commit f6fb786d6dcdd7d730e4fba620b071796f487e1b (the one before commit 36501650ec45b1db308c3b51886044863be2d762)?

Is IDE actually used for the boot device?

[ Please send the dmesg output from the working system. ]

Thanks,
Bart
--
No, the commit before commit 36501650ec45b1db308c3b51886044863be2d762 did not work either; I ...

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
On Thu, Feb 14, 2008 at 1:46 AM, Kamalesh Babulal wrote:
> [...]

It seems you have an enclosure connected. Please check whether SES is enabled in your .config. If so, please try http://lkml.org/lkml/2008/2/13/673

YH
--
Hi,

Thanks for pointing out the patch. I do not have the SES config option enabled, but I tried your patch anyway and it does not solve the panic: the kernel panics with the same message as before. I have attached the .config file I am using; please let me know if I am missing an option or have one set wrong.

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
On Fri, Feb 15, 2008 at 3:15 AM, Kamalesh Babulal wrote:
> [...]

Can you try x86.git#testing? http://people.redhat.com/mingo/x86.git/README

YH
--
Hi,

Hmm, it is not (from dmesg):

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Probing IDE interface ide0...
hda: HL-DT-STCD-RW/DVD DRIVE GCC-4244N, ATAPI CD/DVD-ROM drive
Probing IDE interface ide1...
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache
Uniform CD-ROM driver Revision: 3.20
[...]
Adaptec aacraid driver 1.1-5[2449]-ms
ACPI: PCI Interrupt 0000:01:02.0[A] -> GSI 25 (level, low) -> IRQ 25
AAC0: kernel 5.2-0[11835] Jan 9 2007
AAC0: monitor 5.2-0[11835]
AAC0: bios 5.2-0[11835]
AAC0: serial 1625D1
AAC0: 64bit support enabled.
AAC0: 64 Bit DAC enabled
scsi0 : ServeRAID
scsi 0:0:0:0: Direct-Access IBM x366 V1.0 PQ: 0 ANSI: 2
scsi 0:1:0:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
scsi 0:1:1:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
scsi 0:1:2:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
scsi 0:3:0:0: Enclosure IBM SAS SES-2 DEVICE 0.09 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 429459456 512-byte hardware sectors (219883 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 0:0:0:0: [sda] 429459456 512-byte hardware sectors (219883 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:0:0:0: [sda] Attached SCSI removable disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
scsi 0:1:0:0: Attached scsi generic sg1 type 0
scsi 0:1:1:0: Attached scsi generic sg2 type 0
scsi 0:1:2:0: Attached scsi generic sg3 type 0
scsi 0:3:0:0: Attached scsi generic sg4 type 13
[...]
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data ...
Yinghai Lu noticed that it may actually be an SES problem: http://lkml.org/lkml/2008/2/14/88

[ I overlooked that mail, sorry ]
--
Only if SES is enabled. Is it (CONFIG_SCSI_ENCLOSURE)?

...

Is there actually a dmesg of the failing system somewhere? I couldn't find it in the (somewhat long) thread.

James
--
