Re: SATA Cold Boot problems on >2.6.25 with NV

Previous thread: MTD/block regression (was Re: Slub debugging NAND error in 2.6.25.10.atmel.2) by Haavard Skinnemoen on Friday, August 29, 2008 - 7:28 am. (5 messages)

Next thread: Re: [PATCH] x86: split e820 reserved entries record to late v2 by David Witbrodt on Friday, August 29, 2008 - 7:48 am. (2 messages)
From: Robert Hancock
Date: Friday, August 29, 2008 - 7:44 am

(ccing linux-ide)

Tejun, another one of these reset issues?


--

From: Tejun Heo
Date: Friday, August 29, 2008 - 7:52 am

Yeah, looks like it.  I just sent the patch for #upstream-fixes and will
forward it to -stable once it gets into #upstream-fixes.

  http://article.gmane.org/gmane.linux.ide/34077

Thanks.

-- 
tejun
--

From: Konstantin Kletschke
Date: Friday, August 29, 2008 - 2:21 pm

I actually have this patch applied, and afterwards I switched the
computer off completely for more than an hour on two occasions. It
booted both times (this is the patch you initially suggested to Many
Maxwell on this list earlier this month).

Everything seems to work fine, but my dmesg and /var/log/messages are
flooded with this now:

Aug 29 23:20:39 zappa ata1: EH complete
Aug 29 23:20:41 zappa ata1: EH complete
Aug 29 23:20:47 zappa ata1: EH complete
Aug 29 23:20:49 zappa ata1: EH complete
Aug 29 23:20:51 zappa ata1: EH complete
Aug 29 23:20:53 zappa ata1: EH complete
Aug 29 23:20:55 zappa ata1: EH complete
Aug 29 23:20:56 zappa ata1: EH complete
Aug 29 23:20:59 zappa ata1: EH complete
Aug 29 23:21:01 zappa ata1: EH complete

I am curious if it boots tomorrow after sleeping for one night :-P

Regards, Konsti

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Tejun Heo
Date: Saturday, August 30, 2008 - 2:14 am

Hmm... Can you post full dmesg output?  We used to see things like above
when ATAPI CHECK SENSE handling somehow failed to tell EH that it was an
exception not worth whining about.  Maybe EH action mask is not being

I somehow feel pretty optimistic about that part.  :-)

-- 
tejun
--

From: Konstantin Kletschke
Date: Saturday, August 30, 2008 - 1:51 pm

Of course :-)

dmesg is attached with patch applied. At the end of the patch it (of
course) continues, but only with:

ata1: EH complete
ata1: EH complete
ata1: EH complete

Hmn, to take my pants down entirely: What is this "EH"?

And how does the change of the .reset function affect this? May be, I

You are right, it started immediately without a hitch this morning after
sleeping entirely for a couple of hours.

Kind Regards, Konsti


-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
From: Tejun Heo
Date: Monday, September 1, 2008 - 4:18 am

Your controller is raising exceptions continuously for some reason.
I have no idea why yet.

Can you please apply the attached patch and post the resulting dmesg?

-- 
tejun
From: Konstantin Kletschke
Date: Monday, September 1, 2008 - 10:46 am

Of course. Patch applied, here is the dmesg output. It looks interesting
now, but everything works fine so far.

Regards, Konsti

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
From: Tejun Heo
Date: Monday, September 1, 2008 - 9:35 pm

(cc'ing linux-ide)


I think it's more circa 2.6.22 but in my memory terms, which sucks,

Hmm... someone is scheduling EH incessantly without any error or action
set.  Can you please try the attached patch?

-- 
tejun
From: Konstantin Kletschke
Date: Monday, September 1, 2008 - 10:43 pm

I did, and have put here everything dmesg gave me. Do you want the whole
log from the beginning? /var/log/messages has it, but it is 416k, so I
would put it on an FTP server somewhere. Or is something else
recommended?

Konsti



-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
From: Tejun Heo
Date: Monday, September 1, 2008 - 10:45 pm

The excerpt is fine but please turn on CONFIG_KALLSYMS.  The stack dump
is pretty much meaningless without it.
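
One way to confirm the option before rebuilding is to look it up in the
kernel configuration. The sketch below is only an illustration: the helper
names are made up here, and reading /proc/config.gz assumes the running
kernel was built with CONFIG_IKCONFIG_PROC (otherwise point it at the
.config in the build tree).

```python
import gzip
import re

def config_value(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...) or None if unset."""
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None

def read_running_config(path="/proc/config.gz"):
    # Assumption: this file only exists when CONFIG_IKCONFIG_PROC is enabled.
    with gzip.open(path, "rt") as fh:
        return fh.read()

# Small inline sample so the sketch runs anywhere.
sample = "CONFIG_KALLSYMS=y\n# CONFIG_KALLSYMS_ALL is not set\n"
print(config_value(sample, "CONFIG_KALLSYMS"))
print(config_value(sample, "CONFIG_KALLSYMS_ALL"))
```

On a real system, `config_value(read_running_config(), "CONFIG_KALLSYMS")`
should come back as `y` before the stack dump is worth posting.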

Thanks.

-- 
tejun
--

From: Konstantin Kletschke
Date: Monday, September 1, 2008 - 11:36 pm

Yea, of course... my bad. First I wondered to turn this on or deliver

;-)

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
From: Benny Halevy
Date: Tuesday, September 23, 2008 - 12:36 am

I'm seeing a similar problem after upgrading from 
2.6.25.14-108.fc9.x86_64 to 2.6.27-rc6.

From what I can tell the messages
Sep 23 10:27:20 pangw kernel: ata6: EH pending after 5 tries, giving up
Sep 23 10:27:20 pangw kernel: ata6: EH complete

are printed for a disconnected ATA port that's a neighbor
of one that's occupied.

these are the ports that are in use:
Sep 23 10:19:40 pangw kernel: ata1.00: ATA-6: ST3160023A, 3.01, max UDMA/100
Sep 23 10:19:40 pangw kernel: ata2.00: ATAPI: _NEC DVD_RW ND-3550A, 1.05, max UDMA/33
[PATA]
Sep 23 10:19:40 pangw kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

The WDC drive was previously on ata3 and the messages were then printed
for ata4.  This makes me think it might be related to the handling of
dual-ported sata_nv chips?
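
For reading the "SStatus 123" in the link-up line above: the value is
printed in hex, and per the Serial ATA specification SStatus packs three
4-bit fields (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). A small
sketch of the decoding, with the function name chosen here purely for
illustration:

```python
def decode_sstatus(value):
    """Split a SATA SStatus register value into its DET/SPD/IPM fields."""
    det = value & 0xF          # device detection
    spd = (value >> 4) & 0xF   # negotiated speed generation
    ipm = (value >> 8) & 0xF   # interface power management state
    speeds = {0: "no speed negotiated", 1: "1.5 Gbps", 2: "3.0 Gbps"}
    return {
        "det": det,
        "link_up": det == 3,   # 3 = device present and PHY communication established
        "speed": speeds.get(spd, "unknown (SPD=%d)" % spd),
        "ipm": ipm,            # 1 = interface active
    }

print(decode_sstatus(0x123))   # value from the ata5 line above
print(decode_sstatus(0x0))     # what an empty port would report
```

So 0x123 reads as an established link at 3.0 Gbps with the interface
active, which agrees with the "SATA link up 3.0 Gbps" text the kernel
prints next to it.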

before:
Sep 21 19:14:10 pangw kernel: sata_nv 0000:00:08.0: PCI INT A -> Link[APSJ] -> GSI 20 (level, low) -> IRQ 20
Sep 21 19:14:10 pangw kernel: scsi2 : sata_nv
Sep 21 19:14:10 pangw kernel: scsi3 : sata_nv
Sep 21 19:14:10 pangw kernel: ata3: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xcc00 irq 21
Sep 21 19:14:10 pangw kernel: ata4: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xcc08 irq 21
Sep 21 19:14:10 pangw kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 21 19:14:10 pangw kernel: ata3.00: ATA-7: WDC WD1600JS-60MHB1, 10.02E02, max UDMA/100
Sep 21 19:14:10 pangw kernel: ata3.00: 312581808 sectors, multi 16: LBA48 
Sep 21 19:14:10 pangw kernel: ata3.00: configured for UDMA/100
Sep 21 19:14:10 pangw kernel: scsi 2:0:0:0: Direct-Access     ATA      WDC WD1600JS-60M 10.0 PQ: 0 ANSI: 5
Sep 21 19:14:10 pangw kernel: sd 2:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
Sep 21 19:14:10 pangw kernel: sd 2:0:0:0: [sdb] Write Protect is off
Sep 21 19:14:10 pangw kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 21 19:14:10 pangw kernel: sd 2:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
Sep 21 19:14:10 pangw ...
From: Konstantin Kletschke
Date: Wednesday, September 24, 2008 - 1:48 am

Hey, I did not realize this yet!

My only SATA device seems to be connected to Port2:

Sep 24 07:12:47 zappa ata2: SATA max UDMA/133 cmd 0xe80 ctl 0xe00 bmdma 0xd808 irq 21
Sep 24 07:12:47 zappa ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 24 07:12:47 zappa ata2.00: ATA-7: SAMSUNG HD753LJ, 1AA01106, max UDMA7
Sep 24 07:12:47 zappa ata2.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Sep 24 07:12:47 zappa ata2.00: configured for UDMA/133

Whereas my message log is still flooded by ata1 stuff:

Sep 21 11:34:57 zappa ata1: EH complete
Sep 21 11:34:57 zappa ata1: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 21 11:34:58 zappa ata1: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen t4
Sep 21 11:34:58 zappa ata1: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen t3
Sep 21 11:34:58 zappa ata1: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen t2

Tijo, is there anything I should try out? I mean, if I got this right,
the boot problems themselves were fixed by removing nv_hardreset, but is
there a way to avoid the log being flooded by "EH complete" now?
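
To see at a glance which port is noisy and which one actually has a
drive, the syslog lines can be tallied per port. A rough sketch, assuming
the message formats shown in this mail; the function name and the sample
lines are only illustrative:

```python
import re
from collections import Counter

# Matches "ata1: ..." as well as "ata1.00: ..." and captures the port name.
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.*)$")

def tally_ports(lines):
    """Count EH-related messages and note link-up events per ATA port."""
    eh_noise = Counter()
    link_up = set()
    for line in lines:
        m = PORT_RE.search(line)
        if not m:
            continue
        port, msg = m.groups()
        if msg.startswith("SATA link up"):
            link_up.add(port)
        elif msg.startswith(("EH complete", "EH pending", "exception")):
            eh_noise[port] += 1
    return eh_noise, link_up

sample = [
    "Sep 24 07:12:47 zappa ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)",
    "Sep 21 11:34:57 zappa ata1: EH complete",
    "Sep 21 11:34:57 zappa ata1: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
]
noise, up = tally_ports(sample)
for port in sorted(noise):
    state = "link up" if port in up else "no link seen"
    print("%s: %d EH messages (%s)" % (port, noise[port], state))
```

Fed with the whole of /var/log/messages, this would show whether the EH
messages really come only from the port that has nothing attached.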


Kind regards, Konsti



-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Tejun Heo
Date: Wednesday, September 24, 2008 - 2:36 am

Please apply the attached patch and post the resulting log.  Please
don't forget to turn on KALLSYMS.

Thanks.

-- 
tejun
From: Konstantin Kletschke
Date: Wednesday, September 24, 2008 - 3:59 am

This is essentially the patch you posted in

http://marc.info/?l=linux-kernel&m=122033025206886&w=2

to which I replied with a useless reply containing no debug info.
After that I turned KALLSYMS on and posted again; did you miss this?

here:

http://marc.info/?l=linux-kernel&m=122033745615187&w=2

Kind Regards, Konsti

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Konstantin Kletschke
Date: Thursday, September 25, 2008 - 1:18 am

Tejun, sorry for misspelling your name as Tijo :-/

Am I hitting your spam filter with my KALLSYMS-enabled debug output or
something like that? If not, sorry for the inconvenience and take your time.

Regards, Konsti

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Tejun Heo
Date: Saturday, September 27, 2008 - 2:22 pm

Sorry about lack of response.  I was on vacation after a series of
conferences.  There's a bug entry for this problem and I just posted a
patch.

  http://bugzilla.kernel.org/show_bug.cgi?id=11615

Can you please try the patch attached there and post the result there?

Thanks.

-- 
tejun
--

From: Konstantin Kletschke
Date: Tuesday, September 30, 2008 - 1:12 am

Well, this patch simply reinstates the old weird behaviour: a cold boot
sometimes fails, ending in

ataX: link too slow to response ...
ataX: COMRESET failed (errno=-16)

After that a power cycle is required. I wrote this down from memory. If
a detailed log is required, I can create one tomorrow when I have my
hands on the machine again.
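
The errno value in the COMRESET line is a negated Linux error code; 16
is EBUSY, which here presumably means the device was still reporting busy
when the reset deadline expired. A quick way to look such codes up, using
only the Python standard library (the helper name is made up for this
example):

```python
import errno
import os

def describe_errno(code):
    """Map a kernel-style negative errno to its symbolic name and message."""
    number = abs(code)
    name = errno.errorcode.get(number, "unknown")
    return name, os.strerror(number)

print(describe_errno(-16))
```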

Kind Regards, Konsti

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Tejun Heo
Date: Tuesday, September 30, 2008 - 7:09 am

Ah.. okay, so generic is not the only one affected by the original
hardreset problem.  Can you please post the result of "lspci -nn"?

-- 
tejun
--

From: Konstantin Kletschke
Date: Tuesday, September 30, 2008 - 8:16 am

00:00.0 Host bridge [0600]: nVidia Corporation nForce3 250Gb Host Bridge [10de:00e1] (rev a1)
00:01.0 ISA bridge [0601]: nVidia Corporation nForce3 250Gb LPC Bridge [10de:00e0] (rev a2)
00:01.1 SMBus [0c05]: nVidia Corporation nForce 250Gb PCI System Management [10de:00e4] (rev a1)
00:02.0 USB Controller [0c03]: nVidia Corporation CK8S USB Controller [10de:00e7] (rev a1)
00:02.1 USB Controller [0c03]: nVidia Corporation CK8S USB Controller [10de:00e7] (rev a1)
00:02.2 USB Controller [0c03]: nVidia Corporation nForce3 EHCI USB 2.0 Controller [10de:00e8] (rev a2)
00:05.0 Bridge [0680]: nVidia Corporation CK8S Ethernet Controller [10de:00df] (rev a2)
00:06.0 Multimedia audio controller [0401]: nVidia Corporation nForce3 250Gb AC'97 Audio Controller [10de:00ea] (rev a1)
00:08.0 IDE interface [0101]: nVidia Corporation CK8S Parallel ATA Controller (v2.5) [10de:00e5] (rev a2)
00:0a.0 IDE interface [0101]: nVidia Corporation CK8S Serial ATA Controller (v2.5) [10de:00e3] (rev a2)
00:0b.0 PCI bridge [0604]: nVidia Corporation nForce3 250Gb AGP Host to PCI Bridge [10de:00e2] (rev a2)
00:0e.0 PCI bridge [0604]: nVidia Corporation nForce3 250Gb PCI-to-PCI Bridge [10de:00ed] (rev a2)
00:18.0 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:18.1 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map [1022:1101]
00:18.2 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller [1022:1102]
00:18.3 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control [1022:1103]
01:00.0 VGA compatible controller [0300]: nVidia Corporation NV34 [GeForce FX 5200] [10de:0322] (rev a1)
02:06.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ [10ec:8139] (rev 10)



-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Tejun Heo
Date: Tuesday, September 30, 2008 - 9:47 am

Please apply the attached patch and see whether the problem goes away.
Also, can you test whether hotplug works with the patch applied?
CK804 had problems with hotplug w/ HRST removed.  I wanna make sure
nf2/3 doesn't have the same problem.

Thanks.

-- 
tejun
From: Konstantin Kletschke
Date: Wednesday, October 1, 2008 - 12:38 am

Erm... I have never done hotplug on SATA. Should I unplug the disk from
the mainboard connector to see what happens? I suspect I need another

That's no problem, but one question:


go onto vanilla 2.6.27_rc7 WITH or withOUT
sata_nv-reinstate-nv_hardreset.patch?


Regards, Konsti


-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Tejun Heo
Date: Wednesday, October 1, 2008 - 12:53 am

Or you can boot into single mode, ro mount / with kernel messages
redirected to console and hot unplug/plug the root disk and see what

With.

Thanks.

-- 
tejun
--

From: Konstantin Kletschke
Date: Wednesday, October 1, 2008 - 12:30 pm

Well, this way the situation is as follows:

TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
SGI XFS Quota Management subsystem
msgmni has been set to 2008
io scheduler noop registered
io scheduler cfq registered (default)
pci 0000:01:00.0: Boot video device
Linux agpgart interface v0.103
forcedeth: Reverse Engineered nForce ethernet driver. Version 0.61.
ACPI: PCI Interrupt Link [LKLN] enabled at IRQ 22
forcedeth 0000:00:05.0: PCI INT A -> Link[LKLN] -> GSI 22 (level, low) -> IRQ 22
forcedeth 0000:00:05.0: setting latency timer to 64
nv_probe: set workaround bit for reversed mac addr
Switched to high resolution mode on CPU 0
forcedeth 0000:00:05.0: ifname eth0, PHY OUI 0x732 @ 1, addr 00:13:8f:fd:f9:26
forcedeth 0000:00:05.0: csum timirq lnktim desc-v2
netconsole: local port 6665
netconsole: local IP 10.10.0.1
netconsole: interface eth0
netconsole: remote port 6666
netconsole: remote IP 10.10.0.18
netconsole: remote ethernet address 00:22:15:68:2c:eb
netconsole: device eth0 not up yet, forcing it
eth0: no link during initialization.
eth0: link up.
console [netcon0] enabled
netconsole: network logging started
Driver 'sd' needs updating - please use bus_type methods
sata_nv 0000:00:0a.0: version 3.5
ACPI: PCI Interrupt Link [LTID] enabled at IRQ 21
sata_nv 0000:00:0a.0: PCI INT A -> Link[LTID] -> GSI 21 (level, low) -> IRQ 21
sata_nv 0000:00:0a.0: setting latency timer to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xf80 ctl 0xf00 bmdma 0xd800 irq 21
ata2: SATA max UDMA/133 cmd 0xe80 ctl 0xe00 bmdma 0xd808 irq 21
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: SAMSUNG HD753LJ, 1AA01106, max UDMA7
ata1.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
isa bounce pool size: 16 pages
scsi 0:0:0:0: Direct-Access     ATA      ...
From: Benny Halevy
Date: Sunday, October 5, 2008 - 3:02 am

With commit 4c1eb90a0908c0c60db2169dce08fb672e7582f1 (v2.6.27-rc8),
I see no spurious EH complete events as I saw with 2.6.27-rc <= 7.

Thanks,

Benny
--

From: Tejun Heo
Date: Sunday, October 5, 2008 - 3:18 am

You're on CK804, right?

-- 
tejun
--

From: Benny Halevy
Date: Sunday, October 5, 2008 - 3:34 am

No, MCP55 actually:

$ lspci | grep IDE
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
--

From: Tejun Heo
Date: Sunday, October 5, 2008 - 3:42 am

Right, the commit fixes generic and CK804 while breaking nf2/3.  Can you
also try the following patch?

  http://article.gmane.org/gmane.linux.ide/34942/raw

-- 
tejun
--

From: Benny Halevy
Date: Sunday, October 5, 2008 - 4:18 am

Log looks clean with this patch as well.

Benny
--

From: Konstantin Kletschke
Date: Monday, October 6, 2008 - 2:19 pm

Hm, sadly it doesn't look so good:

Linux version 2.6.27-rc8 (root@zappa) (gcc version 4.3.1 (Gentoo 4.3.1-r1 p
2 CEST 2008
Command line: auto BOOT_IMAGE=linux ro root=801 netconsole=6665@10.10.0.1/e
5:68:2c:eb loglevel=8 debug
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003ffb0000 (usable)
 BIOS-e820: 000000003ffb0000 - 000000003ffc0000 (ACPI data)
 BIOS-e820: 000000003ffc0000 - 000000003fff0000 (ACPI NVS)
 BIOS-e820: 000000003fff0000 - 0000000040000000 (reserved)
 BIOS-e820: 00000000ff7c0000 - 0000000100000000 (reserved)
last_pfn = 0x3ffb0 max_arch_pfn = 0x3ffffffff
init_memory_mapping
 0000000000 - 003fe00000 page 2M
 003fe00000 - 003ffb0000 page 4k
kernel direct mapping tables up to 3ffb0000 @ 8000-b000
last_map_addr: 3ffb0000 end: 3ffb0000
DMI 2.3 present.
ACPI: RSDP 000F8710, 0014 (r0 ACPIAM)
ACPI: RSDT 3FFB0000, 0030 (r1 A M I  OEMRSDT   8000607 MSFT       97)
ACPI: FACP 3FFB0200, 0084 (r2 A M I  OEMFACP   8000607 MSFT       97)
ACPI: DSDT 3FFB03F0, 3F26 (r1  K8UNF K8UNF201      201 INTL  2002026)
ACPI: FACS 3FFC0000, 0040
ACPI: APIC 3FFB0390, 005C (r1 A M I  OEMAPIC   8000607 MSFT       97)
ACPI: OEMB 3FFC0040, 0056 (r1 A M I  AMI_OEM   8000607 MSFT       97)
(4 early reservations) ==> bootmem [0000000000 - 003ffb0000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 000000100
  #1 [0000200000 - 000058a730]    TEXT DATA BSS ==> [0000200000 - 000058a73
  #2 [000009fc00 - 0000100000]    BIOS reserved ==> [000009fc00 - 000010000
  #3 [0000008000 - 0000009000]          PGTABLE ==> [0000008000 - 000000900
 [ffffe20000000000-ffffe20000dfffff] PMD -> [ffff880001200000-ffff880001fff
Zone PFN ranges:
  DMA      0x00000000 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  ...
From: Konstantin Kletschke
Date: Monday, October 6, 2008 - 2:23 pm

What I forgot to mention is that the

ata2: EH pending after 5 tries, giving up
ata2: EH complete
ata2: EH pending after 5 tries, giving up
ata2: EH complete
ata2: EH complete
ata2: EH complete
ata2: EH complete
ata2: EH complete
ata2: EH complete
ata2: EH complete
ata2: EH complete
ata2: EH complete
ata2: EH complete

is gone now, and tomorrow morning I will check whether it
manages a cold boot after being switched off overnight.

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Tejun Heo
Date: Monday, October 6, 2008 - 6:02 pm

Hmm... strange.  Can you please try the attached patch?  It's basically
the same with a bit more debug information.

Thanks.

-- 
tejun
From: Konstantin Kletschke
Date: Monday, October 6, 2008 - 11:04 pm

No Problem.

I had difficulty cold booting the machine today; I had to power cycle a
lot. Then I applied the patch and it booted immediately:

sata_nv 0000:00:0a.0: version 3.5
ACPI: PCI Interrupt Link [LTID] enabled at IRQ 21
sata_nv 0000:00:0a.0: PCI INT A -> Link[LTID] -> GSI 21 (level, low) -> IRQ 21
sata_nv 0000:00:0a.0: setting latency timer to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xf80 ctl 0xf00 bmdma 0xd800 irq 21
ata2: SATA max UDMA/133 cmd 0xe80 ctl 0xe00 bmdma 0xd808 irq 21
ata1: hard resetting link
XXX CLASSIFY 01:00:00
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: SAMSUNG HD753LJ, 1AA01106, max UDMA7
ata1.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
ata1: EH complete
ata2: hard resetting link
ata2: SATA link down (SStatus 0 SControl 300)
ata2: EH complete
isa bounce pool size: 16 pages
scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG HD753LJ  1AA0 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors (750156 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors (750156 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 >
sd 0:0:0:0: [sda] Attached SCSI disk
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
TCP cubic registered
NET: Registered protocol family 17
XFS mounting filesystem sda1
Ending clean XFS mount for filesystem: sda1
VFS: Mounted root (xfs filesystem) readonly.
Freeing unused kernel memory: 220k freed


Then I switched it off and smoked a cigarette; booting then took a bit longer:

ata1: link ...
From: Benny Halevy
Date: Tuesday, October 7, 2008 - 1:10 am

See, cigarettes are bad for you(r computer) ;-)


--

From: Konstantin Kletschke
Date: Wednesday, October 8, 2008 - 1:08 am

Yes... But how does it know? I still have no /dev/eyes ;-)

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Tejun Heo
Date: Monday, October 13, 2008 - 1:36 am

It has all the fans for a reason.  :-)

Eh... Joke aside.  I still don't know what's going on here.  Before
2.6.26, you always had clean boot, right?

-- 
tejun
--

From: Tejun Heo
Date: Monday, October 13, 2008 - 1:38 am

Also, can you please repeat the test several times and see whether there
are some patterns?  And please also try pre-2.6.26 kernel a few times
just to make sure it's not some bad coincidence.

Thanks.

-- 
tejun
--

From: Konstantin Kletschke
Date: Monday, October 13, 2008 - 7:29 am

I still have the last suggested patch running, and the machine manages a
cold boot every time (I am sure by now: it was turned off all Sunday and
booted this morning, and so on, every time).

What is consistent is a message about the device being misclassified:

sata_nv 0000:00:0a.0: version 3.5
ACPI: PCI Interrupt Link [LTID] enabled at IRQ 21
sata_nv 0000:00:0a.0: PCI INT A -> Link[LTID] -> GSI 21 (level, low) -> IRQ 21
sata_nv 0000:00:0a.0: setting latency timer to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xf80 ctl 0xf00 bmdma 0xd800 irq 21
ata2: SATA max UDMA/133 cmd 0xe80 ctl 0xe00 bmdma 0xd808 irq 21
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: SRST failed (errno=-16)
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1: link online but device misclassified, retrying
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: SRST failed (errno=-16)
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1: link online but device misclassified, retrying
ata1: hard resetting link
XXX CLASSIFY 01:00:00
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: SAMSUNG HD753LJ, 1AA01106, max UDMA7
ata1.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
ata1: EH complete
ata2: hard resetting link
ata2: SATA link down (SStatus 0 SControl 300)
ata2: EH complete
isa bounce pool size: 16 pages
scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG HD753LJ  1AA0 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors (750156 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors (750156 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't ...
From: Tejun Heo
Date: Tuesday, October 14, 2008 - 11:15 pm

Hmm... this is proving to be much more difficult than I expected. :-(

Can you please try the attached patch?

Thanks.

-- 
tejun
From: Konstantin Kletschke
Date: Friday, October 17, 2008 - 1:08 am

Hello!

With the previous patch, I told you it always boots, but over the last
two days I had a lot of difficulty booting. It was hard resetting and
waiting a couple of times before bailing out with no mountable root FS.

One time it had been switched off for three hours and the next time
overnight; I power cycled a couple of times. If I am the only one
experiencing these difficulties (am I the only one with this
chipset/revision reporting?), shouldn't we consider this machine...
broken? I change SATA cables from time to time, but they all seem to be
okay. I mean, if it is really _that_ strange...


I have now fetched 2.6.27 and tried this patch. A short power cycle and
a reboot weren't a problem yesterday, nor this morning, so it looks good
so far. This is how /var/log/messages looks now:

Oct 17 07:24:49 zappa sata_nv 0000:00:0a.0: version 3.5
Oct 17 07:24:49 zappa ACPI: PCI Interrupt Link [LTID] enabled at IRQ 21
Oct 17 07:24:49 zappa sata_nv 0000:00:0a.0: PCI INT A -> Link[LTID] -> GSI 21 (level, low) -> IRQ 21
Oct 17 07:24:49 zappa sata_nv 0000:00:0a.0: setting latency timer to 64
Oct 17 07:24:49 zappa scsi0 : sata_nv
Oct 17 07:24:49 zappa scsi1 : sata_nv
Oct 17 07:24:49 zappa ata1: SATA max UDMA/133 cmd 0xf80 ctl 0xf00 bmdma 0xd800 irq 21
Oct 17 07:24:49 zappa ata2: SATA max UDMA/133 cmd 0xe80 ctl 0xe00 bmdma 0xd808 irq 21
Oct 17 07:24:49 zappa ata1: hard resetting link
Oct 17 07:24:49 zappa ata1: SATA link down (SStatus 0 SControl 300)
Oct 17 07:24:49 zappa ata1: EH complete
Oct 17 07:24:49 zappa ata2: hard resetting link
Oct 17 07:24:49 zappa XXX CLASSIFY 01:00:00
Oct 17 07:24:49 zappa ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 17 07:24:49 zappa ata2.00: ATA-7: SAMSUNG HD753LJ, 1AA01106, max UDMA7
Oct 17 07:24:49 zappa ata2.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Oct 17 07:24:49 zappa ata2.00: configured for UDMA/133
Oct 17 07:24:49 zappa ata2: EH complete
Oct 17 07:24:49 zappa isa bounce pool size: 16 pages
Oct 17 07:24:49 zappa scsi 1:0:0:0: Direct-Access     ATA      ...
From: Tejun Heo
Date: Monday, October 20, 2008 - 11:08 pm

Eh... I just bought a used opteron system with nf2/3.  I will receive
the machine tomorrow.  Hopefully, I'll be able to find out what the heck
is going on here.

Thanks.

-- 
tejun
--

From: Konstantin Kletschke
Date: Monday, October 27, 2008 - 2:22 am

Hello Tejun!

After my short reply I have had 2.6.27 running with

[-- Attachment #2: sata_nv-nf2-hrst-debug-take2.patch --]

and it has been fine so far. It booted immediately every time I power
cycled the machine. Hot boot, cold boot and reboot seem to be no
problem, and there is no /var/log/messages flooding either.

I just wanted to let you know: whatever the investigation results in, on
_my_ machine this incarnation is just fine.

Regards, Konsti



-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Tejun Heo
Date: Sunday, November 2, 2008 - 8:04 pm

Great.  My test machine just confirmed the fix too (my first purchase
was borked so I had to get another one, hence the delay).  I'll forward
the fix upstream.

Thanks a lot.

-- 
tejun
--

From: Konstantin Kletschke
Date: Monday, November 3, 2008 - 1:32 am

No problem at all. If something gets borked - which is absolutely
allowed to happen - I enjoy sorting it out.

Regards, Konsti


-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Erich Mounce
Date: Saturday, December 13, 2008 - 5:49 pm

I'm experiencing this issue with CD-RWs only.  Power cycling allows me to eject
the CD-RW.  I'm using an ASUS G50V laptop with kernel 2.6.27-gentoo-r4.

lspci | grep ATA
00:1f.2 SATA controller: Intel Corporation Mobile SATA AHCI Controller (rev 03)

dmesg output:
[  189.184114] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  189.184160] ata2.00: cmd a0/01:00:00:00:10/00:00:00:00:00/a0 tag 0 dma 4096 in
[  189.184164]          cdb 28 00 00 05 70 74 00 00  02 00 00 00 00 00 00 00
[  189.184168]          res 40/00:03:00:fe:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[  189.184176] ata2.00: status: { DRDY }
[  189.184191] ata2: hard resetting link
[  194.538122] ata2: link is slow to respond, please be patient (ready=0)
[  199.185071] ata2: COMRESET failed (errno=-16)
[  199.185087] ata2: hard resetting link
[  204.539127] ata2: link is slow to respond, please be patient (ready=0)
[  209.231125] ata2: COMRESET failed (errno=-16)
[  209.231153] ata2: hard resetting link
[  214.585124] ata2: link is slow to respond, please be patient (ready=0)
[  244.267114] ata2: COMRESET failed (errno=-16)
[  244.267130] ata2: limiting SATA link speed to 1.5 Gbps
[  244.267136] ata2: hard resetting link
[  249.315112] ata2: COMRESET failed (errno=-16)
[  249.315124] ata2: reset failed, giving up
[  249.315130] ata2.00: disabled
[  249.315153] ata2: EH complete
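
The "cdb" line in the dump above can be read by hand: opcode 0x28 is
READ(10), bytes 2-5 are the big-endian logical block address and bytes
7-8 the transfer length in blocks. A sketch of that decoding follows; the
function name is illustrative, and the 2048-byte block size is an
assumption for CD/DVD data media, though it agrees with the "dma 4096 in"
shown in the log.

```python
def decode_read10(cdb_hex, block_size=2048):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB: opcode 0x%02x" % cdb[0])
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}

# CDB copied from the dmesg output above (padded to 16 bytes by the kernel).
cdb = "28 00 00 05 70 74 00 00  02 00 00 00 00 00 00 00"
print(decode_read10(cdb))
```

For the command in the log this gives LBA 356468 and 2 blocks, i.e. 4096
bytes, so the request that timed out was an ordinary two-sector read from
the disc.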


--

From: Tejun Heo
Date: Saturday, December 13, 2008 - 8:51 pm

It's a different failure on a different controller.  Can you please
file a bug report on bugzilla.kernel.org and...

1. Reproduce the problem with kernel-2.6.28-rc8.
2. Attach boot and the failure kernel log.
3. Attach the output of "lspci -nn".

Thanks.

-- 
tejun
--

From: Konstantin Kletschke
Date: Monday, October 13, 2008 - 7:25 am

Yes, with 2.6.25 the system always booted cleanly.

From time to time I do an update, and with some 2.6.26_rcX I had
problems with the system sometimes failing to cold boot. For a long time
I suspected a hardware issue, but at one point I went back to 2.6.25 and
the problem was gone.

Then I updated to 2.6.27_rcX because I was hunting down some nfsv4
error, which I considered either a bug or a problem sitting in front of
the screen. Then I noticed the cold boot problem again, which I had
almost forgotten in the meantime, or had assumed was fixed somewhere
between 2.6.26_rcX, 2.6.26 and 2.6.27_rcX.

Konsti

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--

From: Konstantin Kletschke
Date: Thursday, September 25, 2008 - 1:15 am

Replying to myself...

Indeed, my hard disk was plugged into SATA2 and SATA1 was left empty; I
have moved it to SATA1 now.


Now:

Sep 25 08:35:29 zappa ata1: SATA max UDMA/133 cmd 0xf80 ctl 0xf00 bmdma 0xd800 irq 21
Sep 25 08:35:29 zappa ata2: SATA max UDMA/133 cmd 0xe80 ctl 0xe00 bmdma 0xd808 irq 21
Sep 25 08:35:29 zappa ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 25 08:35:29 zappa ata1.00: ATA-7: SAMSUNG HD753LJ, 1AA01106, max UDMA7
Sep 25 08:35:29 zappa ata1.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Sep 25 08:35:29 zappa ata1.00: configured for UDMA/133


Sep 25 08:35:29 zappa ata2: EH pending after 5 tries, giving up
Sep 25 08:35:29 zappa ata2: EH complete
Sep 25 08:35:29 zappa ata2: EH pending after 5 tries, giving up
Sep 25 08:35:29 zappa ata2: EH complete

Kind Regards, Konsti

-- 
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E  A080 1E69 3FDA EF62 FCEF
--
