2.6.20*: PATA DMA timeout, hangs (2)

Previous thread: [PATCH 1/2] mm: move common segment checks to separate helper function (v6) by Dmitriy Monakhov on Monday, March 12, 2007 - 3:57 am. (5 messages)

Next thread: Re: PCI failures during boot by Francis Moreau on Monday, March 12, 2007 - 5:07 am. (2 messages)
To: <linux-kernel@...>
Date: Monday, March 12, 2007 - 4:54 am

2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
-d 1 /dev/hda):

hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
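
The names inside the braces are simply the bits set in the ATA status register. A minimal sketch of that decoding, assuming the standard ATA status bit layout and the labels the IDE driver prints (the helper name `decode_ata_status` is made up for illustration):

```python
# ATA status register bits, labelled the way the old IDE driver prints them.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the names of all bits set in an ATA status byte (hypothetical helper)."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


print(decode_ata_status(0x58))
```

0x58 is 0x40 + 0x10 + 0x08: the drive reports ready and is still asserting DataRequest, while Busy and Error are both clear.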

Linux version 2.6.20.2-x152 (fvm@lokka) (gcc version 3.4.6 (Debian 3.4.6-4)) #1 SMP Sun Mar 11 21:21:07 CET 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start: 0000000000000000 size: 000000000009fc00 end: 000000000009fc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000000009fc00 size: 0000000000000400 end: 00000000000a0000 type: 2
copy_e820_map() start: 00000000000e0000 size: 0000000000020000 end: 0000000000100000 type: 2
copy_e820_map() start: 0000000000100000 size: 000000001fdd0000 end: 000000001fed0000 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000001fed0000 size: 0000000000020000 end: 000000001fef0000 type: 4
copy_e820_map() start: 000000001fef0000 size: 0000000000010000 end: 000000001ff00000 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 00000000feea0000 size: 0000000001160000 end: 0000000100000000 type: 2
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001fed0000 (usable)
BIOS-e820: 000000001fed0000 - 000000001fef0000 (ACPI NVS)
BIOS-e820: 000000001fef0000 - 000000001ff00000 (usable)
BIOS-e820: 00000000feea0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
511MB LOWMEM available.
found SMP MP-table at 000f8da0
Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 130816
HighMem 130816 -> 130816
early_node_map[1] active PFN ranges
0: 0 -> 130816
DMI 2.3 present.
ACPI: PM-Timer IO Port: 0xf808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 6:8 APIC version 17
ACPI: L...
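
As a cross-check on the boot log above, the "511MB LOWMEM" figure and the top PFN of 130816 follow from the end of the last usable e820 range. A rough sketch of the arithmetic, assuming 4 KiB pages; the helper name `parse_e820` is illustrative only:

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86

# BIOS-e820 lines copied from the log above.
E820_LINES = """\
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001fed0000 (usable)
BIOS-e820: 000000001fed0000 - 000000001fef0000 (ACPI NVS)
BIOS-e820: 000000001fef0000 - 000000001ff00000 (usable)
BIOS-e820: 00000000feea0000 - 0000000100000000 (reserved)
"""


def parse_e820(text):
    """Return (start, end, kind) tuples from BIOS-e820 log lines."""
    pattern = re.compile(r"BIOS-e820: ([0-9a-f]+) - ([0-9a-f]+) \((.+)\)")
    return [(int(s, 16), int(e, 16), kind) for s, e, kind in pattern.findall(text)]


ranges = parse_e820(E820_LINES)
# The highest usable address determines the last page frame number.
top_of_ram = max(end for _, end, kind in ranges if kind == "usable")
max_pfn = top_of_ram // PAGE_SIZE
print(max_pfn, top_of_ram // 2**20)
```

The two printed numbers are the last PFN and the top of RAM in MiB, which can be compared with the "Zone PFN ranges" and LOWMEM lines in the log.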

To: <linux-kernel@...>
Date: Monday, March 12, 2007 - 7:24 am

I have a totally different PATA-based system (P4 HT) with similar symptoms,
except that it seems to recover by switching DMA off during boot after
5 errors:

hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command

So in this case it doesn't hang but is not really usable either.
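
For reference, the "dma status == 0x20" value can be read bit by bit. The sketch below assumes the usual bus-master IDE status register layout (SFF-8038i); the bit labels and the helper name `decode_bm_status` are mine, not the driver's:

```python
# Bus-master IDE status register bits (SFF-8038i layout, assumed here).
BM_STATUS_BITS = [
    (0x01, "active"),
    (0x02, "error"),
    (0x04, "interrupt"),
    (0x20, "drive0_dma_capable"),
    (0x40, "drive1_dma_capable"),
    (0x80, "simplex_only"),
]


def decode_bm_status(value):
    """Return labels for the bits set in a bus-master status byte (hypothetical helper)."""
    return [name for mask, name in BM_STATUS_BITS if value & mask]


print(decode_bm_status(0x20))
```

Under that layout, 0x20 means the controller reports no transfer in progress, no error and no pending interrupt; only the "drive 0 is DMA capable" flag is set.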

lspci:
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (...

To: Frank van Maarseveen <frankvm@...>
Cc: <linux-kernel@...>
Date: Monday, March 12, 2007 - 8:21 am

Hi,

Could you check if this is the same problem as this one:

http://bugzilla.kernel.org/show_bug.cgi?id=8169

Thanks,
Bart

-

To: Bartlomiej Zolnierkiewicz <bzolnier@...>
Cc: <linux-kernel@...>
Date: Monday, March 12, 2007 - 8:40 am

Looks like it except that I don't see "lost interrupt" messages here. So,
it might be something different (I don't know).

--
Frank
-

To: Frank van Maarseveen <frankvm@...>
Cc: <linux-kernel@...>
Date: Monday, March 12, 2007 - 4:40 pm

Hi,

hda: max request size: 128KiB
hda: 40021632 sectors (20491 MB) w/2048KiB Cache, CHS=39704/16/63
hda: cache flushes not supported
 hda: hda1 hda2 hda4

It seems that DMA is not used by default (CONFIG_IDEDMA_PCI_AUTO=n),
so this is probably exactly the same issue.

Please try the patch attached to the bugzilla bug entry.

Thanks,
Bart
-

To: Bartlomiej Zolnierkiewicz <bzolnier@...>
Cc: <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 5:19 am

2.6.20.2 rejects this patch and I don't see a way to apply it by hand:
ide_set_dma() isn't there, nothing seems to match.

--
Frank
-

To: Frank van Maarseveen <frankvm@...>
Cc: <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 7:04 am

The patch is for 2.6.21-rc3, sorry for not making it clear.

Bart
-

To: Frank van Maarseveen <frankvm@...>
Cc: <linux-kernel@...>
Date: Monday, March 12, 2007 - 8:07 am

Not a solution, unfortunately, but try disabling CONFIG_IDE and using Alan's
new PATA drivers. For your Intel systems, this should mean you need only:

CONFIG_ATA_PIIX

for both SATA and PATA support. If you build it in (=y), the SCSI pieces it
depends on must be built in as well, i.e. SCSI disk and SCSI CDROM support.
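
A quick way to sanity-check a .config against the advice above is sketched below. Note the assumptions: CONFIG_BLK_DEV_SD and CONFIG_BLK_DEV_SR are my reading of "SCSI disk" and "SCSI CDROM", the sample config text is invented, and the helper names are hypothetical.

```python
def parse_kconfig(text):
    """Parse .config text into {symbol: value}; '# CONFIG_X is not set' maps to 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# Desired settings: legacy IDE off, libata PIIX driver and SCSI disk/CDROM built in.
WANTED = {
    "CONFIG_IDE": "n",
    "CONFIG_ATA_PIIX": "y",
    "CONFIG_BLK_DEV_SD": "y",  # assumed symbol for "SCSI disk"
    "CONFIG_BLK_DEV_SR": "y",  # assumed symbol for "SCSI CDROM"
}


def check_config(text):
    """Return {symbol: actual_value} for every symbol that differs from WANTED."""
    options = parse_kconfig(text)
    return {
        name: options.get(name, "n")
        for name, want in WANTED.items()
        if options.get(name, "n") != want
    }


# Invented sample: SCSI CDROM is only a module here, so it gets flagged.
SAMPLE = """\
# CONFIG_IDE is not set
CONFIG_ATA=y
CONFIG_ATA_PIIX=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=m
"""

print(check_config(SAMPLE))
```

An empty result means the config matches; symbols missing from the file are treated as "n".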

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-

To: Alistair John Strachan <s0348365@...>
Cc: <linux-kernel@...>
Date: Monday, March 12, 2007 - 9:25 am

Yes, that worked... after booting with root=/dev/sda2, applying s/hda/sda/
to /etc/fstab and /etc/lilo.conf, and rerunning lilo. I hadn't mounted a
/dev/sr0 in a long time.
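
The rename step can be sketched as a small text filter. This is only an illustration: the sample fstab lines, mount points and the function name are invented, and other devices (such as a CD-ROM on hdc, which shows up as an sr device under the new drivers) are deliberately left untouched because their new names depend on the system.

```python
import re


def rename_hda_to_sda(text):
    """Rewrite /dev/hda and /dev/hdaN references to /dev/sda and /dev/sdaN."""
    return re.sub(r"/dev/hda(\d*)\b", r"/dev/sda\1", text)


# Invented sample fstab; only the partition numbers come from the thread.
SAMPLE_FSTAB = """\
/dev/hda2  /       ext3     defaults   0 1
/dev/hda1  /boot   ext3     defaults   0 2
/dev/hda4  none    swap     sw         0 0
/dev/hdc   /cdrom  iso9660  ro,noauto  0 0
"""

print(rename_hda_to_sda(SAMPLE_FSTAB))
```

The same filter would apply to the boot= and root= lines in /etc/lilo.conf, after which lilo has to be rerun for the change to take effect.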

So, are /dev/hd* going to disappear in a few years? IOW, does it make
sense to _slowly_ start migrating to /dev/sd*?

The problem is that there's no plan B in case of trouble, except renaming
everything back again to boot an old kernel.

--
Frank
-

To: Frank van Maarseveen <frankvm@...>
Cc: <linux-kernel@...>
Date: Monday, March 12, 2007 - 9:52 am

On Monday 12 March 2007 13:25, Frank van Maarseveen wrote:

How would you propose doing this? I'm sure modern distros with an
initrd/initramfs probably already do some sort of root detection. Doesn't fix

I doubt this matters for distributors, as they'll simply switch over when you
upgrade the distro, and the earliest supported kernel will be the one that
shipped with the newer version.

I accept that it's a bit of a drag, but it's better to have a standard naming
convention for all disks, isn't it?

Glad this is working for you.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-

To: Alistair John Strachan <s0348365@...>
Cc: Frank van Maarseveen <frankvm@...>, <linux-kernel@...>
Date: Monday, March 19, 2007 - 4:22 am

The solution is quite simple. Use the LABEL= trick or other methods to
uniquely identify the partition regardless of how it's connected. Most
modern distributions are already doing this.
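
As a rough illustration of the LABEL= approach: the sketch below rewrites the device field of fstab lines to LABEL=... using a device-to-label map. It assumes the filesystems already carry labels and that udev maintains /dev/disk/by-label symlinks; the function names and the sample line are made up.

```python
import os


def labels_by_device(by_label_dir="/dev/disk/by-label"):
    """Map resolved device paths to labels using udev-style symlinks, if present."""
    mapping = {}
    if not os.path.isdir(by_label_dir):
        return mapping
    for label in sorted(os.listdir(by_label_dir)):
        target = os.path.realpath(os.path.join(by_label_dir, label))
        mapping[target] = label
    return mapping


def fstab_with_labels(fstab_text, mapping):
    """Replace the device field with LABEL=<label> wherever the device is in mapping."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0] in mapping:
            line = line.replace(fields[0], "LABEL=" + mapping[fields[0]], 1)
        out.append(line)
    return "\n".join(out) + "\n"


# Invented example: a root filesystem labelled "root" on /dev/hda2.
print(fstab_with_labels("/dev/hda2 / ext3 defaults 0 1\n", {"/dev/hda2": "root"}))
```

With entries written this way, the same fstab works whether the disk appears as /dev/hda or /dev/sda, so booting an older kernel needs no renaming. The root= setting in the boot loader is a separate matter and typically needs initrd/initramfs support to resolve a label.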

--
tejun
-
