Disabling CONFIG_IDE made my machine boot, as it was using libata
anyway.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
Hi,
this comes from ide-generic
Kamalesh/Pavel:
Could you try latest git and see if the OOPS is still there?
[ Yeah, I'm unable to reproduce it. :( ]
Thanks,
Bart
--
Hi Bart,
The panic is reproducible with the 2.6.24-git16 kernel, the call trace is
similar to the previous one:
BUG: unable to handle kernel paging request at ffffffffffffffa0
IP: [<ffffffff80415673>] init_irq+0x188/0x444
PGD 203067 PUD 204067 PMD 0
Oops: 0000 [1] SMP
CPU 3
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-git16 #1
RIP: 0010:[<ffffffff80415673>] [<ffffffff80415673>] init_irq+0x188/0x444
RSP: 0000:ffff81022f093e00 EFLAGS: 00010282
RAX: ffffffffffffff80 RBX: ffffffff808ad200 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: ffff81022fc039c0 RDI: ffffffff807512c0
RBP: ffff81022f093e30 R08: ffff81022f093d70 R09: 0000000000000002
R10: 0000000000000001 R11: ffff81022f093c00 R12: ffffffff808b4500
R13: ffffffff808b4510 R14: 0000000000000000 R15: ffffffffffffffff
FS: 0000000000000000(0000) GS:ffff81022f0e7ac0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffffffffffa0 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff81022f092000, task ffff81022f0797e0)
Stack: ffff81022f093e30 0000000000000000 ffffffff808ad200 ffffffff808ad220
ffffffff808add80 0000000000000000 ffff81022f093eb0 ffffffff8041648f
ffff81022f093ec0 0000000000000000 0000000080751ee0 0000000000000246
Call Trace:
[<ffffffff8041648f>] ide_device_add_all+0xb60/0xe54
[<ffffffff807d6d48>] ide_generic_init+0x46/0x4a
[<ffffffff807b873b>] kernel_init+0x175/0x2e7
[<ffffffff8020bff8>] child_rip+0xa/0x12
[<ffffffff8037476c>] acpi_ds_init_one_object+0x0/0x88
[<ffffffff807b85c6>] kernel_init+0x0/0x2e7
[<ffffffff8020bfee>] child_rip+0x0/0x12
Code: 89 03 49 8b 45 18 48 89 18 48 39 1b 75 04 0f 0b eb fe fe 05 20 71 38 00 fb eb 5b 48 8b 83 20 07 00 00 83 ca ff 48 83 c0 80 74 0e <48> 8b 40 20 48 8b 80 88 00 00 00 8b 50 04 48 8b 3d ...
--
Thanks. I reviewed the ide-probe.c changes again, but nothing seems wrong...
Please also try disassembling init_irq using gdb so we can see where it fails.
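A quick consistency check on the oops above, as a sketch only. The register values are copied from the dump. The two displacements are my own hand reading of the `Code:` bytes (`48 83 c0 80` as `add $-0x80,%rax`, and the marked `<48> 8b 40 20` as `mov 0x20(%rax),%rax`); nobody in the thread states them, so treat them as an assumption. All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the oops above. */
#define OOPS_RAX 0xffffffffffffff80ULL
#define OOPS_CR2 0xffffffffffffffa0ULL

/* Assumption: displacements read by hand from the Code: bytes. */
#define ADJUST_BEFORE_TEST 0x80ULL /* 48 83 c0 80 -> add $-0x80,%rax */
#define LOAD_DISPLACEMENT  0x20ULL /* 48 8b 40 20 -> mov 0x20(%rax),%rax */

/* Address the faulting load would touch for a given %rax. */
static uint64_t fault_address(uint64_t rax)
{
	return rax + LOAD_DISPLACEMENT;
}

/* Value %rax held before 0x80 was subtracted (wraps modulo 2^64). */
static uint64_t rax_before_adjust(uint64_t rax)
{
	return rax + ADJUST_BEFORE_TEST;
}

/* Aborts if the dumped registers disagree with the decoding above. */
void check_oops_consistency(void)
{
	assert(fault_address(OOPS_RAX) == OOPS_CR2);
	assert(rax_before_adjust(OOPS_RAX) == 0);
}
```

If that reading is right, the faulting address is simply a small field offset applied to a pointer that was zero before a fixed 0x80 was subtracted from it, which points at a NULL pointer being adjusted before it was tested.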
--
Kamalesh, were you able to bisect this down? I just got hit by the
same panic on a 4-way x86_64, with 2.6.24-git22.
Thanks,
Nish
--
Hi Nish,
I tried bisecting and the guilty patch seems to be:
36501650ec45b1db308c3b51886044863be2d762 is first bad commit
commit 36501650ec45b1db308c3b51886044863be2d762
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Fri Feb 1 23:09:31 2008 +0100
ide: keep pointer to struct device instead of struct pci_dev in ide_hwif_t
The gdb output also points to the changes made by the guilty patch:
(gdb) p ide_device_add_all
$1 = {int (u8 *, const struct ide_port_info *)} 0xffffffff804176ac <ide_device_add_all>
(gdb) p/x 0xffffffff804176ac+0xb60
$2 = 0xffffffff8041820c
(gdb) l *0xffffffff8041820c
0xffffffff8041820c is in ide_device_add_all (drivers/ide/ide-probe.c:1249).
1244 goto out;
1245 }
1246
1247 sg_init_table(hwif->sg_table, hwif->sg_max_nents);
1248
1249 if (init_irq(hwif) == 0)
1250 goto done;
1251
1252 old_irq = hwif->irq;
1253 /*
(gdb)
(gdb) p init_irq
$1 = {int (ide_hwif_t *)} 0xffffffff8041721f <init_irq>
(gdb) p/x 0xffffffff8041721f+0x1a4
$2 = 0xffffffff804173c3
(gdb) l *0xffffffff804173c3
0xffffffff804173c3 is in init_irq (include/asm/pci.h:101).
96 /* Returns the node based on pci bus */
97 static inline int __pcibus_to_node(struct pci_bus *bus)
98 {
99 struct pci_sysdata *sd = bus->sysdata;
100
101 return sd->node;
102 }
103
104 static inline cpumask_t __pcibus_to_cpumask(struct pci_bus *bus)
105 {
(gdb)
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
Hi,
Thanks for the detailed analysis and sorry for the bug.
I think that this may have just been fixed by Andi's recent hwif_to_node()
fix (patch below, it is already in Linus' tree). Could you please verify this?
commit 1f07e988290fc45932f5028c9e2a862c37a57336
Author: Andi Kleen <andi@firstfloor.org>
Date: Mon Feb 11 01:35:20 2008 +0100
Prevent IDE boot ops on NUMA system
Without this patch a Opteron test system here oopses at boot with
current git.
Calling to_pci_dev() on a NULL pointer gives a negative value so the
following NULL pointer check never triggers and then an illegal address
is referenced. Check the unadjusted original device pointer for NULL
instead.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 23fad89..a3b69c1 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -1295,7 +1295,7 @@ static inline void ide_dump_identify(u8 *id)
 static inline int hwif_to_node(ide_hwif_t *hwif)
 {
 	struct pci_dev *dev = to_pci_dev(hwif->dev);
-	return dev ? pcibus_to_node(dev->bus) : -1;
+	return hwif->dev ? pcibus_to_node(dev->bus) : -1;
 }

 static inline ide_drive_t *ide_get_paired_drive(ide_drive_t *drive)
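Why the old check never fired can be reproduced in user space. This is a sketch with mock structures, not the real kernel definitions: `mock_device`, `mock_pci_dev` and `mock_to_pci_dev()` are hypothetical stand-ins. The padding is an assumption chosen so that the embedded `dev` sits at offset 0x80, which is what `RAX: ffffffffffffff80` in the oops suggests for that build. The pointer adjustment is done on `uintptr_t` so the demonstration stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mock_device {
	int unused;
};

struct mock_pci_dev {
	char pad[0x80];          /* assumption: puts dev at offset 0x80 */
	struct mock_device dev;  /* embedded member, as in a container_of() user */
};

/* container_of-style adjustment: member address minus member offset. */
static uintptr_t mock_to_pci_dev(const struct mock_device *d)
{
	return (uintptr_t)d - offsetof(struct mock_pci_dev, dev);
}

/* Pre-fix logic: tests the adjusted pointer, which is non-zero for NULL. */
static int old_check_sees_device(const struct mock_device *d)
{
	return mock_to_pci_dev(d) != 0;
}

/* Post-fix logic: tests the original, unadjusted pointer. */
static int new_check_sees_device(const struct mock_device *d)
{
	return d != NULL;
}

/* Aborts unless the two checks behave as described in the commit message. */
void demo_null_device(void)
{
	struct mock_pci_dev pdev = { .dev = { 0 } };

	assert(old_check_sees_device(NULL));   /* bug: NULL looks like a device */
	assert(!new_check_sees_device(NULL));  /* fix: NULL is rejected */
	assert(old_check_sees_device(&pdev.dev));
	assert(new_check_sees_device(&pdev.dev));
	assert(mock_to_pci_dev(&pdev.dev) == (uintptr_t)&pdev);
}
```

With a NULL argument the old check reports a device and the new one does not; with a pointer to a real embedded `dev`, both agree and the adjusted value is the address of the containing structure.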
--
Hi Bart,
Thanks! The patch solves the kernel panic, but after applying it the kernel is not
able to mount the filesystem and panics. I am not sure what is causing this panic.
Creating root device.
Mounting root filesystem.
mount: could not find filesystem
Kernel panic - not syncing: Attempted to kill init!
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
Hi,
Is
- the commit 36501650ec45b1db308c3b51886044863be2d762 with Andi's fix applied
or
- the commit f6fb786d6dcdd7d730e4fba620b071796f487e1b
(the one before commit 36501650ec45b1db308c3b51886044863be2d762)
Is IDE actually used for the boot device?
[ Please send a dmesg output from the working system. ]
Thanks,
Bart
--
No, the commit before commit 36501650ec45b1db308c3b51886044863be2d762 did not work either, i
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
Hi,
Hmm, it is not (from dmesg):
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Probing IDE interface ide0...
hda: HL-DT-STCD-RW/DVD DRIVE GCC-4244N, ATAPI CD/DVD-ROM drive
Probing IDE interface ide1...
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache
Uniform CD-ROM driver Revision: 3.20
[...]
Adaptec aacraid driver 1.1-5[2449]-ms
ACPI: PCI Interrupt 0000:01:02.0[A] -> GSI 25 (level, low) -> IRQ 25
AAC0: kernel 5.2-0[11835] Jan 9 2007
AAC0: monitor 5.2-0[11835]
AAC0: bios 5.2-0[11835]
AAC0: serial 1625D1
AAC0: 64bit support enabled.
AAC0: 64 Bit DAC enabled
scsi0 : ServeRAID
scsi 0:0:0:0: Direct-Access IBM x366 V1.0 PQ: 0 ANSI: 2
scsi 0:1:0:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
scsi 0:1:1:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
scsi 0:1:2:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
scsi 0:3:0:0: Enclosure IBM SAS SES-2 DEVICE 0.09 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 429459456 512-byte hardware sectors (219883 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 0:0:0:0: [sda] 429459456 512-byte hardware sectors (219883 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:0:0:0: [sda] Attached SCSI removable disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
scsi 0:1:0:0: Attached scsi generic sg1 type 0
scsi 0:1:1:0: Attached scsi generic sg2 type 0
scsi 0:1:2:0: Attached scsi generic sg3 type 0
scsi 0:3:0:0: Attached scsi generic sg4 type 13
[...]
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
...
Yinghai Lu noticed that it may be actually a SES problem:
http://lkml.org/lkml/2008/2/14/88
[ I overlooked the above mail, sorry ]
--
Only if SES is enabled, is it (CONFIG_SCSI_ENCLOSURE)? ... is there
actually a dmesg of the failing system somewhere, I couldn't find it in
the (somewhat long) thread?
James
--
On Thu, Feb 14, 2008 at 1:46 AM, Kamalesh Babulal
It seems you have an enclosure connected.
Please check whether SES is enabled in your .config.
If so, please try
http://lkml.org/lkml/2008/2/13/673
YH
--
Hi,
Thanks for pointing out the patch. I do not have the SES config option enabled,
but I tried your patch anyway, and it does not solve the panic. The kernel
panics with the same message as before. I have attached the .config
file which I am using; please let me know if I am missing or have misconfigured
any option.
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
---
On Fri, Feb 15, 2008 at 3:15 AM, Kamalesh Babulal
can you try x86.git#testing?
http://people.redhat.com/mingo/x86.git/README
YH
--
and try attached patch.
YH
