Re: Believed resolved: SATA kern-buffRd read slow: based on promise driver bug

From: Robert Hancock
Date: Sunday, December 30, 2007 - 11:16 am

Have you tried using a different block size to see how that affects the 

It's somewhat intentional that some of the hdparm commands (like for 
setting transfer modes, enable/disable DMA, etc.) don't work with 
libata. Most of them aren't necessary at all as correct DMA settings, 

It's the same libata code, so the same applies to some of the hdparm 

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

--

From: Linda Walsh
Date: Monday, December 31, 2007 - 5:19 pm

----
    There is some interaction with the large block size (but only on the SATA
disk).  Counts were adjusted to keep the read near 2G (~2x physical memory).
From 1k-16k block sizes, I got into the low-mid 40MB/s on buffered SATA
(compared to 50-60MB/s on ATA & SCSI).  Starting at 32k-64k, the read
rate began falling and at 128k block-reads-at-a-time or larger, it drops below
20MB/s (again, only on buffered SATA).   It's hard to imagine what would
slow down buffered SATA reads but not ATA and SCSI reads of the same
size.  I'm using the 'cfq' scheduler with everything running at default
priorities, but again, why only SATA slowness?  It seems that at the driver
level, using direct reads, the SATA disk has the highest read rate (near
80MB/s). 
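
The block-size comparison described above can be reproduced with a small script. This is only a sketch, not the tool used in the thread: the function names are made up for illustration, and the demo reads a small scratch file rather than a disk. A real measurement would read roughly 2x physical memory per run, as described above, so that the page cache does not hide the device.

```python
import os
import tempfile
import time


def read_throughput(path, block_size, total_bytes):
    """Read up to total_bytes from path in block_size chunks.

    Returns (bytes_read, megabytes_per_second).
    """
    done = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering, so each read() maps to
    # one read(2) of at most block_size bytes. The kernel page cache is
    # still used, i.e. this is the "buffered" case from the thread.
    with open(path, "rb", buffering=0) as f:
        while done < total_bytes:
            chunk = f.read(min(block_size, total_bytes - done))
            if not chunk:  # end of file or device
                break
            done += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return done, done / elapsed / 1e6


def sweep(path, total_bytes, block_sizes):
    """Return {block_size: MB/s} for each requested block size."""
    return {bs: read_throughput(path, bs, total_bytes)[1] for bs in block_sizes}


if __name__ == "__main__":
    size = 4 * 1024 * 1024
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size))
    try:
        for bs, rate in sweep(tmp.name, size, [1024, 16384, 131072]).items():
            print(f"{bs:>7} bytes/read: {rate:10.1f} MB/s")
    finally:
        os.unlink(tmp.name)
```

Pointing `path` at a block device (as root) and sweeping from 1k to 128k and beyond would show whether the rate drops at larger block sizes on one controller only.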

    It would certainly be perverse to have faster driver & device performance
---
    The only way I could tell before was using hdparm to read the parameters.
Since that doesn't work, it's hard to tell if they are set correctly, but given
the high performance at the device driver level, I'm guessing the params
---
    Hmm... might be nice as an "RFE" to at least have the 'read-status'
commands work to see what the params are set to. 

    More importantly, how does one set the acoustic and power-saving
parameters?  Some of my disks are used as 'backup' devices for my
other computers.  With the ATA disks, they were kept "spun down" when not
being used (only used, 'normally', in early AM hours).

    Another new "problem" (not as important) -- even though SATA disks are
called with "sdX", my ATA disks that *were* at hda-hdc are now at hde-hdg.
Devices hda-hdd are not populated in my dev directory on bootup.  Of course
this throws off boot-scripts that set diskparams by "hd<name>" and not
by label (using hdparm).  Seems like the SATA disks are suffering a partial
identity problem -- seeming to reserve hda-hdd, but using the "sd" disk names.
Is that a known problem?  If not, I'll add it to my queue for ...
From: Robert Hancock
Date: Monday, December 31, 2007 - 5:32 pm

Not too sure on that one. I suspect one might have to trace the actual 
requests being received at the driver level somehow with buffered reads 


I believe those hdparm commands for power-save and AAM are supposed to 
work (they just issue an ATA command to the disk). The ones that aren't 
implemented are the ones that actually commanded the IDE layer, like DMA 

Could be a udev problem, as it's what does the device naming..

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

--

From: Mark Lord
Date: Tuesday, January 1, 2008 - 9:06 am

...

Most hdparm flags work perfectly fine with libata,
unless perhaps you're using Fedora, which for some odd
reason was using a 2+ year old copy of hdparm until
very very recently.

As others noted, the only things not working are things that
libata itself chooses not to allow from userspace because libata
has better low-level drivers that can set those things automatically
in a more reliable fashion than we ever could with drivers/ide:
  DMA, 32-bit I/O, PIO/DMA xfer rates, hotplug stuff.

The rest, including acoustic and power-saving parameters,
work just fine with libata.
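
For the "keep the backup disks spun down" use case raised earlier in the thread, the relevant hdparm flag is -S (standby timeout). Its argument is an encoded value, not a number of seconds. The sketch below builds the command line without running it. The encoding is as I recall it from the hdparm man page (1-240 are multiples of 5 seconds, 241-251 are multiples of 30 minutes), so check the man page for your version; the function names are hypothetical.

```python
def standby_value(seconds):
    """Encode a standby timeout in seconds as an hdparm -S argument.

    Only the two regular ranges are handled; the special values
    (252, 253, 255) are deliberately left out of this sketch.
    """
    if seconds == 0:
        return 0  # 0 disables the standby timeout
    if seconds % 5 == 0 and 5 <= seconds <= 1200:
        return seconds // 5  # 1..240: multiples of 5 seconds
    if seconds % 1800 == 0 and 1800 <= seconds <= 19800:
        return 240 + seconds // 1800  # 241..251: multiples of 30 minutes
    raise ValueError(f"{seconds}s cannot be expressed as an hdparm -S value")


def hdparm_standby_command(device, seconds):
    """Return the argv list for setting the standby timeout on device."""
    return ["hdparm", "-S", str(standby_value(seconds)), device]


if __name__ == "__main__":
    # /dev/sdb is a placeholder device name.
    print(" ".join(hdparm_standby_command("/dev/sdb", 600)))
```

Returning an argv list rather than executing the command keeps the sketch safe to run anywhere; a boot script could pass the list to `subprocess.run` once the device name is known.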

Cheers
--

From: Alan Cox
Date: Monday, December 31, 2007 - 6:58 pm

Try disabling NCQ - see if you've got a drive with the 'NCQ = no

Beats me - something is wrong that your setup triggers - could be

hdparm supports identify to read modes on drives with libata. The one



NOTABUG - your BIOS has decided to move them from the legacy addresses so
they move from hda-d to e-g.
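
Since the hdX letters can move when a controller is added, boot scripts are more robust if they look disks up by a persistent name instead. A minimal sketch, assuming udev populates /dev/disk/by-id (or /dev/disk/by-label for filesystems) with symlinks to the current device nodes; the function name and the example id are hypothetical.

```python
import os


def resolve_stable_name(name, directory="/dev/disk/by-id"):
    """Return the device node that a persistent udev symlink points to."""
    link = os.path.join(directory, name)
    if not os.path.islink(link):
        raise FileNotFoundError(f"no persistent name {name!r} in {directory}")
    # realpath follows the symlink, e.g. to /dev/hde or /dev/sda.
    return os.path.realpath(link)


if __name__ == "__main__":
    # "ata-EXAMPLE_DISK_SERIAL" is a placeholder, not a real id.
    try:
        print(resolve_stable_name("ata-EXAMPLE_DISK_SERIAL"))
    except FileNotFoundError as exc:
        print(exc)
```

A script that sets disk parameters would then resolve the name first and pass the result to hdparm, so it keeps working whether the disk shows up as hda or hde.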

--

From: Linda Walsh
Date: Wednesday, January 2, 2008 - 1:09 pm

---
   I'm not aware, off hand, how to disable NCQ.  I haven't had any
----

  I have hdparm-v7.7. 
There are some areas where it shows information, but areas where it
does not work jump out and lead me to suspect whether or not areas
that don't give explicit "ERROR" messages are presenting valid info.

Problem areas (using hdparm, disk=Seagate Barracuda 16MB cache, model=
ST3750640AS):
1) The drive's current 'multicount' setting isn't readable or settable.
param "-i" shows "?16?" (with question marks around 16) and "-I" simply
shows "?" for the current setting.  Attempting to <read|set> it:
 "HDIO_<GET|SET>_MULTCOUNT failed: Inappropriate ioctl for device"
2) Drive Advanced Power Management setting("-B") (write-only):
 "HDIO_DRIVE_CMD failed: Input/output error"
3) Drive Acoustic ("-M"), read = " acoustic      = not supported",
write = " HDIO_DRIVE_CMD:ACOUSTIC failed: Input/output error"
   Note: drive detailed info from "-I" says:
        "Recommended acoustic management value: 254, current value: 0"
   (i.e. - there seems to be no way to set recommended value)
4) 32-bit IO setting ("-c") (don't know if this important given the disk's
raw-read speed, it may be meaningless for SATA)
 "IO_support    =  0 (default 16-bit)"*
*
---
    I don't follow. It is an internal drive.  Are there software "logically
unplug" commands that automatically re-"plug-in" the drive on access
and spin it back up like the spindown/standby timeout does?  Or were
you referring to SATA's general hot/warm plug ability (if my hardware
Sorry for my unclear usage -- by "problem" I meant that it was(is) an 
"unexpected behavior".  I'm sure the kernel is following the BIOS's
directions, I'm just not sure why a (supposedly), SATA-only card would
cause my BIOS to reserve 4 "[P]ATA-drives" after installing the
card.  It may be symptomatic of some "cost-cutting" measure by the
card manufacturer.  I just don't know why it's happening right now.

*however* -- it is "annoying" -- if the kernel reserves hda-hdd ...
From: Robert Hancock
Date: Wednesday, January 2, 2008 - 5:25 pm

See here:


I don't think you can get or set the multi count currently, it just uses 



I think they were referring to physically hotplugging the drive. This is 
more practical if you have a removable drive caddy, or if the drive is 
hooked up through eSATA.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

--

From: Linda Walsh
Date: Wednesday, January 2, 2008 - 9:25 pm

[Empty message]
From: Mikael Pettersson
Date: Thursday, January 3, 2008 - 1:37 am

Linda Walsh writes:
 > Robert Hancock wrote:
 > > Linda Walsh wrote:
 > >> Alan Cox wrote:
 > >>>> rate began falling; at 128k block-reads-at-a-time or larger, it 
 > >>>> drops below
 > >>>> 20MB/s (only on buffered SATA).
 > >>> Try disabling NCQ - see if you've got a drive with the 'NCQ = no
 > >>> readahead' flaw.
 > > http://linux-ata.org/faq.html#ncq
 > ---
 >     When drive initializes, dmesg says it has NCQ (depth 0/32)
 >     Reading the queue_depth under /sys, shows a queuedepth of "1".
 > 
 > But more importantly -- I notice a chronic error message associated
 > with this drive that may be causing some or all of the problem:
 > ---
 > Jan  2 20:06:10 Ishtar kernel: ata1.00: exception Emask 0x0 SAct 0x0 
 > SErr 0x0 action 0x2
 > Jan  2 20:06:10 Ishtar kernel: ata1.00: port_status 0x20080000
 > Jan  2 20:06:10 Ishtar kernel: ata1.00: cmd 
 > c8/00:10:30:06:03/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
 > Jan  2 20:06:10 Ishtar kernel:          res 
 > 50/00:00:3f:06:03/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
 > Jan  2 20:06:13 Ishtar kernel: ata1: limiting SATA link speed to 1.5 Gbps
 > Jan  2 20:06:13 Ishtar kernel: ata1.00: exception Emask 0x0 SAct 0x0 
 > SErr 0x0 action 0x6
 > Jan  2 20:06:13 Ishtar kernel: ata1.00: port_status 0x20080000
 > Jan  2 20:06:13 Ishtar kernel: ata1.00: cmd 
 > c8/00:10:00:8b:04/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
 > Jan  2 20:06:13 Ishtar kernel:          res 
 > 50/00:00:0f:8b:04/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
 > Jan  2 20:06:14 Ishtar kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 
 > 0x0 action 0x3
 > Jan  2 20:06:14 Ishtar kernel: ata1: hotplug_status 0x80
 > Jan  2 20:06:15 Ishtar kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 
 > 0x0 action 0x3
 > Jan  2 20:06:15 Ishtar kernel: ata1: hotplug_status 0x80
 > ---
 > What da heck?

Looks like the Promise ASIC SG bug. Apply
<http://user.it.uu.se/~mikpe/linux/patches/sata_promise/patch-sata_promise-1-asic-sg-bug-fix-v3-2.6.23>
and let us know ...
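
The "cmd" and "res" lines in the quoted log can be decoded by hand. The sketch below assumes the twelve hex bytes are printed in the order command (or status, on a "res" line) / feature : nsect : lbal : lbam : lbah / five HOB bytes / device, which is my understanding of libata's error-handler output; treat that layout as an assumption. The status bit names are the standard ATA ones. All function names are made up for illustration.

```python
import re

# Standard ATA status register bits.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF",
               0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}


def decode_taskfile(text):
    """Decode a 'c8/00:10:30:06:03/00:00:00:00:00/e0' style string."""
    text = text.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{2}(?:[/:][0-9a-f]{2}){11}", text):
        raise ValueError(f"not a 12-byte taskfile dump: {text!r}")
    f = [int(x, 16) for x in re.split(r"[/:]", text)]
    first, feature, nsect, lbal, lbam, lbah = f[:6]
    device = f[11]
    # LBA28: low nibble of the device register holds bits 24..27.
    lba28 = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {"first": first, "feature": feature, "nsect": nsect,
            "lba28": lba28, "device": device}


def transfer_bytes(tf, sector_size=512):
    """Size of a 28-bit read/write; a sector count of 0 means 256."""
    return (tf["nsect"] or 256) * sector_size


def status_flags(status):
    """Names of the bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    cmd = decode_taskfile("c8/00:10:30:06:03/00:00:00:00:00/e0")
    res = decode_taskfile("50/00:00:3f:06:03/00:00:00:00:00/e0")
    print(hex(cmd["first"]), cmd["lba28"], transfer_bytes(cmd))
    print(status_flags(res["first"]), res["lba28"])
```

For the first failing command this gives opcode 0xc8 (READ DMA) for 16 sectors, i.e. 8192 bytes, which matches the "data 8192 in" part of the log line. The result status 0x50 is DRDY plus DSC with no ERR bit set.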
From: Robert Hancock
Date: Thursday, January 3, 2008 - 7:49 pm

Have you tried 2.6.24-rc6? If the problem still occurs there, you should 

Queue depth 0/32 means the drive supports a queue depth of 32 but the 

ATA disks can have FUA support, but the support is disabled in libata by 
default. (There's a fua parameter on libata module to enable it I believe.)
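
The queue depth in use can be checked from sysfs. A minimal sketch, assuming the value quoted earlier in the thread came from /sys/block/<disk>/device/queue_depth (the usual SCSI device attribute); the function names and the `sysfs_root` parameter are hypothetical, the latter added so the code can be tried against a fake tree.

```python
import os


def queue_depth(disk, sysfs_root="/sys"):
    """Return the current queue depth for a disk such as 'sda'."""
    path = os.path.join(sysfs_root, "block", disk, "device", "queue_depth")
    with open(path) as f:
        return int(f.read().strip())


def ncq_in_use(disk, sysfs_root="/sys"):
    """A depth of 1 means commands are issued one at a time (no NCQ)."""
    return queue_depth(disk, sysfs_root) > 1


if __name__ == "__main__":
    # 'sda' is a placeholder disk name.
    try:
        print("queue depth:", queue_depth("sda"))
    except OSError as exc:
        print("could not read queue depth:", exc)
```

On drivers that do support NCQ, writing 1 to the same file is the usual way to turn it off for a test; this sketch only reads the value.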
--

From: Mikael Pettersson
Date: Friday, January 4, 2008 - 4:23 am

Linda Walsh writes:
 > Mikael Pettersson wrote:
 > > Linda Walsh writes:
 > >  > Robert Hancock wrote:
 > >  > > Linda Walsh wrote:
 > >  > >>>> read rate began falling; at 128k block-reads-at-a-time or larger, it 
 > >  > >>>> drops below 20MB/s (only on buffered SATA).
 > >  > 
 > >  > But more importantly -- I notice a chronic error message associated
 > >  > with this drive that may be causing some or all of the problem:
 > >  > ---
 > >  > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
 > >  > ata1.00: port_status 0x20080000
 > >  > ata1.00: cmd c8/00:10:30:06:03/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
 > >  >          res 50/00:00:3f:06:03/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
 > >  > ata1: limiting SATA link speed to 1.5 Gbps
 > >
 > >
 > > Looks like the Promise ASIC SG bug. Apply
 > > <http://user.it.uu.se/~mikpe/linux/patches/sata_promise/patch-sata_promise-1-asic-sg-bug-fix-v3-2.6.23>
 > > and let us know if things improve.
 > >
 > > /Mikael
 > >   
 > ---
 >     Yep!  Hope that's making it into a patch soon or, at least 2.6.24.
 >     Kernel buffered

Good to hear that it solved this problem.
The patch is in 2.6.24-rc2 and newer kernels, and will be sent
to -stable for the 2.6.23 and 2.6.22 series.

 >     I seem to remember reading about some problems with Promise SATA & ACPI.
 > Does this address that or is that a separate issue?  (Am using no-acpi for

sata_promise does nothing ACPI-related. It doesn't need to.
(Drives may be a different story.)

 >     Is the above bug mentioned/discussed in the linux-ide archives?

Yes.

 >  That
 > and I'd like to find out why TCQ/NCQ doesn't work with the Seagate drives --

The driver doesn't yet support NCQ.
--

From: Linda Walsh
Date: Sunday, January 6, 2008 - 1:21 pm

---
    Will 'likely' wait till -stable since I use the machine as a 'server'
----
    Is 'main' diff between NCQ/TCQ that TCQ can re-arrange 'write'
priority under driver control, whereas NCQ is mostly a FIFO queue?

    On a Journal'ed file system, isn't "write-order" control required
for integrity?  That would seem to imply TCQ could be used, but
NCQ doesn't seem to offer much benefit, since the higher level
kernel drivers usually have a "larger picture" of sectors that need
to be written.  The only advantage I can see for NCQ drives might
be that the kernel may not know the drive's internal physical
structure nor where the disk is in its current revolution.  That could
allow drive write re-ordering where based on the exact-current state
of the drive that the kernel might not have access to, but it seems
this would be a minor benefit -- and, depending on firmware,
possibly higher overhead in command processing?

Am trying to differentiate NCQ/TCQ and SAS v. SCSI benefits.
It seems both (SAS & SATA) support some type of port-multiplier/
multiplexer option to allow more disks/port.

However, (please correct?) SATA uses a hub type architecture while
SAS uses a switch architecture.  My experience with network hubs vs.
switches is that network hubs can be much slower if there is
communication contention.  Is the word 'hub' being used in the
"shared-communication media sense", or is someone using the term
'hub' as a [sic] replacement for a 'switch'?




--

From: Tejun Heo
Date: Tuesday, January 8, 2008 - 7:30 pm

No, NCQ can reorder, although I recently heard that Windows issues
overlapping NCQ commands and expects them to be processed in order (what
were they thinking?).

The biggest difference between TCQ and NCQ is that TCQ is for SCSI while
NCQ is for ATA.  Functional differences include a larger number of available
tags and ordered tags for TCQ.  The former doesn't matter for single

Port multiplier is a switch too.  It doesn't broadcast anything and
definitely has forwarding buffers inside.  An analogy which makes more
sense is expander to router and port multiplier to switch.  Unless you
wanna nest them, they aren't that different.

-- 
tejun
--