Hello everyone,
I have trawled the depths of the Internet, scoured the innermost reaches
of the Usenet, and finally I arrive, beaten and bruised, at the steps to
the Linux kernel mailing list, to seek advice from the penguins
themselves. I humbly prostrate myself .. :)
My first request: Please CC me directly on replies as I am not
subscribed to the list.
Now to the meat of it: I have been experiencing a lot of trouble with
system freezes. These are not crippling freezes, in the sense that the
system comes back after a few seconds. They are always accompanied by
the following log in /var/log/messages:
----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<
Aug 3 21:44:34 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 3 21:44:34 localhost kernel: ata1.00: cmd ca/00:40:b8:c9:f3/00:00:00:00:00/e8 tag 0 dma 32768 out
Aug 3 21:44:34 localhost kernel: res 40/00:78:00:00:00/00:00:00:00:00/50 Emask 0x4 (timeout)
Aug 3 21:44:34 localhost kernel: ata1.00: status: { DRDY }
Aug 3 21:44:39 localhost kernel: ata1: port is slow to respond, please be patient (Status 0x80)
Aug 3 21:44:44 localhost kernel: ata1: device not ready (errno=-16), forcing hardreset
Aug 3 21:44:44 localhost kernel: ata1: soft resetting link
Aug 3 21:44:45 localhost kernel: ata1.00: configured for UDMA/100
Aug 3 21:44:45 localhost kernel: ata1.01: configured for UDMA/100
Aug 3 21:44:45 localhost kernel: ata1: EH complete
Aug 3 21:44:45 localhost kernel: sd 0:0:0:0: [sda] 160086528 512-byte hardware sectors (81964 MB)
Aug 3 21:44:45 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 3 21:44:45 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 3 21:44:45 localhost kernel: sd 0:0:1:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
Aug 3 21:44:45 localhost kernel: sd 0:0:1:0: [sdb] Write Protect is off
Aug 3 21:44:45 ...
To start with, can I have a dmesg after boot and a description of what
is plugged into where (disks and CD wise)?
Alan
--
Hi Alan,
Sure, no problem. First the description; I'll append the dmesg output at
the end.
I have an ASUS A8V motherboard, with a Maxtor Ultra100 IDE controller
card on one of the PCI ports.
On the mainboard:
- Primary IDE Master : 13GB Quantum Fireball HD (Where Windows resides, not that I use it).
- Primary IDE Slave : LG CD-ROM, 52X
- Secondary IDE : Nothing
On the IDE controller card:
- "IDE1 slot" Master : 80GB Maxtor
- "IDE1 slot" Slave : 250GB Western Digital
- "IDE2 slot" : Nothing
The output of dmesg (after booting) follows:
==================================
Initializing cgroup subsys cpuset
Linux version 2.6.25.11-60.fc8 (mockbuild@x86-7) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Mon Jul 21 01:40:51 EDT 2008
Command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003ffb0000 (usable)
BIOS-e820: 000000003ffb0000 - 000000003ffc0000 (ACPI data)
BIOS-e820: 000000003ffc0000 - 000000003fff0000 (ACPI NVS)
BIOS-e820: 000000003fff0000 - 0000000040000000 (reserved)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 262064) 1 entries of 3200 used
end_pfn_map = 1048576
DMI 2.3 present.
ACPI: RSDP 000FA810, 0021 (r2 ACPIAM)
ACPI: XSDT 3FFB0100, 003C (r1 A M I OEMXSDT 5000729 MSFT 97)
ACPI: FACP 3FFB0290, 00F4 (r3 A M I OEMFACP 5000729 MSFT 97)
ACPI: DSDT 3FFB03F0, 391B (r1 A0277 A0277001 1 MSFT 100000D)
ACPI: FACS 3FFC0000, 0040
ACPI: APIC 3FFB0390, 0052 (r1 A M I OEMAPIC 5000729 MSFT 97)
ACPI: OEMB 3FFC0040, 003F (r1 A M I OEMBIOS 5000729 MSFT 97)
Scanning NUMA topology in Northbridge 24
No NUMA configuration found
Faking a node at ...
Ok so the pdc202xx_old hardware flakes out when you have very high
network load (I'd guess in fact very high bus traffic).
The actual log is the disk I/O timing out, then the drive being busy
(probably due to the timeout and a DMA transfer getting stuck). We reset
it and carry on. Libata happens to log this a lot more visibly than old
kernels which is useful but does mean people sometimes don't notice.
The rest then fits - the freeze I'd expect as we block I/O while trying
to get the drive back.
Doubt the Nvidia module is involved as I'd then expect problems under
high graphical load but you can certainly test that.
I don't suppose you've got a spare PCI network card you could try
instead to see if it is the network card bits?
Alan
--
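To make the "disk I/O timing out" part concrete, the timed-out command in the original log can be decoded by hand or with a small script. The sketch below is illustrative only: the field order (command/feature:nsect:lbal:lbam:lbah/hob bytes/device) is an assumption based on how libata's error handler prints the taskfile, the opcode table lists just two well-known ATA opcodes, and the name decode_cmd is hypothetical.

```python
import re

# Assumed libata layout of the "cmd" line:
# command/feature:nsect:lbal:lbam:lbah/<five hob bytes>/device
TASKFILE_RE = re.compile(
    r"cmd\s+([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)

# Only the two opcodes needed here; not a complete ATA command table.
ATA_COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}

SECTOR_SIZE = 512  # the log reports 512-byte hardware sectors


def decode_cmd(line):
    """Decode the 28-bit taskfile from a libata 'cmd' log line."""
    match = TASKFILE_RE.search(line)
    if match is None:
        raise ValueError("no libata cmd taskfile found in line")
    command = int(match.group(1), 16)
    feature, nsect, lbal, lbam, lbah = (
        int(byte, 16) for byte in match.group(2).split(":")
    )
    device = int(match.group(4), 16)
    # LBA28: the low nibble of the device register carries LBA bits 24..27.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # For 28-bit commands a sector count of 0 means 256 sectors.
    sectors = nsect or 256
    return {
        "command": command,
        "name": ATA_COMMANDS.get(command, "unknown"),
        "feature": feature,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
    }


line = "ata1.00: cmd ca/00:40:b8:c9:f3/00:00:00:00:00/e8 tag 0 dma 32768 out"
print(decode_cmd(line))
```

Under these assumptions the command is a 64-sector WRITE DMA, and 64 x 512 bytes agrees with the "dma 32768 out" that the kernel printed on the same line.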
Hi Alan,
Actually I do not really have another PCI network card, but I could
switch the computer back to the other interface which is on the
motherboard (does that one function as a PCI device?). As I mentioned in
my first post, the current card I am using is an attempt to try to work
around the problem (originally I thought it was the on-board
controller), so I have my doubts as to whether switching back would help.
Nonetheless, I will give it a try again and let you know the result.
Thanks,
David
--
The only way to keep your health is to eat what you don't want, drink
what you don't like, and do what you'd rather not.
-- Mark Twain
That would be great - if it makes no difference, yet high network
traffic is the key factor, then it mostly eliminates bugs in the network
drivers from suspicion. At that point it's time to dig deeper into the
chip config.
--
Hi Alan,
So I switched back to my old on-board network card (removing the prior
card altogether from the case). I tried my test-case, which involves
downloading some big files while simultaneously running a find command.
I *was* able to reproduce the issue fairly quickly.
The key difference is that this time I was using the skge kernel module
for networking. So, I tend to agree that it is most likely not a network
driver problem.
Please let me know if there is anything further I can do to assist in
debugging.
Thanks,
David
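A test case of the kind described above (a large download running at the same time as a find over the filesystem) could be scripted roughly as follows. This is only a sketch of the idea, not the commands actually used: the function names are hypothetical, the URL and directory are whatever the tester supplies, and the downloaded data is discarded rather than written to disk.

```python
import os
import threading
import urllib.request


def walk_tree(root):
    """Stat every entry under root, roughly what `find root` does."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            count += 1
    return count


def download(url, chunk_size=64 * 1024):
    """Read the whole resource at url and return the byte count."""
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def run_load(root, url):
    """Run the directory walk and the download concurrently."""
    results = {}
    walker = threading.Thread(
        target=lambda: results.update(entries=walk_tree(root))
    )
    fetcher = threading.Thread(
        target=lambda: results.update(bytes=download(url))
    )
    walker.start()
    fetcher.start()
    walker.join()
    fetcher.join()
    return results
```

Calling run_load with a directory on one of the affected disks and the URL of a large file, while watching /var/log/messages for new ata1 exceptions, would approximate the reported scenario.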
--
Hello,
It occurs to me that you might be implying there is some "chip config"
that I can retrieve and give to you. Is there anything I can/should do
in order to give you more information, or is this pretty much out of my
hands at this point?
Should I be filing a bug report for this?
--
There are two things you can play with. One is the UDMA burst mode on
the chip which should be getting set, the other is PCI latencies (which
the
Probably a good idea
Next things to try are
1. edit drivers/ata/pata_pdc202xx_old.c
after the line
    iowrite8(burst | 0x01, bmdma + 0x1f);
add
    printk(KERN_ERR "BURST was %02X\n", burst);
build/install/boot that kernel and see what it says.
The second (sledgehammer) approach would be to use setpci to set the
LATENCY_TIMER value on the pdc202xx_old and network card differently.
Alan
--
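Before changing any LATENCY_TIMER value with setpci, it may help to record what each device currently uses. The sketch below reads that byte from the PCI configuration space Linux exposes under sysfs. It assumes the standard configuration header layout, where the latency timer is the byte at offset 0x0D, and the usual /sys/bus/pci/devices path; the function names are hypothetical.

```python
from pathlib import Path

# Offset of the Latency Timer register in the standard PCI config header.
PCI_LATENCY_TIMER_OFFSET = 0x0D


def latency_timer(config):
    """Return the latency timer byte from raw PCI config space bytes."""
    if len(config) <= PCI_LATENCY_TIMER_OFFSET:
        raise ValueError("config space too short to hold a latency timer")
    return config[PCI_LATENCY_TIMER_OFFSET]


def read_latency_timers(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI device address to its current latency timer value."""
    timers = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return timers  # not Linux, or sysfs is not mounted
    for device in sorted(root.iterdir()):
        try:
            timers[device.name] = latency_timer(
                (device / "config").read_bytes()
            )
        except (OSError, ValueError):
            continue  # unreadable or truncated config; skip this device
    return timers


for address, value in read_latency_timers().items():
    print(f"{address}: latency timer = 0x{value:02x}")
```

The addresses printed are the same bus:device.function identifiers that lspci shows, so the entries for the pdc202xx_old controller and the network card can be compared before and after a change.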
