Re: [PATCH] ata: Add support for Long Logical Sectors and Long Physical Sectors

Previous thread: Simple UART driver help and TTY questions by Ira Snyder on Tuesday, August 5, 2008 - 9:33 am. (6 messages)

Next thread: Spinlock recursion in hvc_poll by Jeremy Fitzhardinge on Tuesday, August 5, 2008 - 11:08 am. (5 messages)
From: Matthew Wilcox
Date: Tuesday, August 5, 2008 - 10:26 am

ATA-8 defines two different methods for increasing the length of
sectors.  Long Physical Sectors increase the native sector size
of the disc while pretending to the OS that sectors are still 512 bytes.
Long Logical Sectors increase the length of the reported sector.  This
patch series adds support for both.  It does not add support for
non-power-of-2 sector sizes as they have not been fully specified yet.


--

From: Matthew Wilcox
Date: Tuesday, August 5, 2008 - 10:26 am

From: Matthew Wilcox <willy@linux.intel.com>

Later patches require knowledge of some commands which aren't currently
defined.  While I'm at it, add all the commands that are in ATA8 and
reorder the existing commands to be in the same order as ATA8.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 include/linux/ata.h |   97 ++++++++++++++++++++++++++++++++++----------------
 1 files changed, 66 insertions(+), 31 deletions(-)

diff --git a/include/linux/ata.h b/include/linux/ata.h
index 1c622e2..3815431 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -149,58 +149,93 @@ enum {
 	ATA_REG_IRQ		= ATA_REG_NSECT,
 
 	/* ATA device commands */
-	ATA_CMD_DEV_RESET	= 0x08, /* ATAPI device reset */
+	ATA_CMD_CFA_ERASE_SECTORS = 0xC0,
+	ATA_CMD_CFA_REQUEST_EXT_ERROR = 0x03,
+	ATA_CMD_CFA_TRANSLATE_SECTOR = 0x87,
+	ATA_CMD_CFA_WRITE_MULTI_WITHOUT_ERASE = 0xCD,
+	ATA_CMD_CFA_WRITE_SECTORS_WITHOUT_ERASE = 0x38,
+	ATA_CMD_CHK_MEDIA_CARD_TYPE = 0xD1,
 	ATA_CMD_CHK_POWER	= 0xE5, /* check power mode */
-	ATA_CMD_STANDBY		= 0xE2, /* place in standby power mode */
-	ATA_CMD_IDLE		= 0xE3, /* place in idle power mode */
+	ATA_CMD_CONFIG_STREAM	= 0x51,
+	ATA_CMD_CONF_OVERLAY	= 0xB1,
+	ATA_CMD_DEV_RESET	= 0x08, /* ATAPI device reset */
+	ATA_CMD_DLOAD_MCODE	= 0x92,
 	ATA_CMD_EDD		= 0x90,	/* execute device diagnostic */
 	ATA_CMD_FLUSH		= 0xE7,
 	ATA_CMD_FLUSH_EXT	= 0xEA,
 	ATA_CMD_ID_ATA		= 0xEC,
 	ATA_CMD_ID_ATAPI	= 0xA1,
-	ATA_CMD_READ		= 0xC8,
-	ATA_CMD_READ_EXT	= 0x25,
-	ATA_CMD_WRITE		= 0xCA,
-	ATA_CMD_WRITE_EXT	= 0x35,
-	ATA_CMD_WRITE_FUA_EXT	= 0x3D,
+	ATA_CMD_IDLE		= 0xE3, /* place in idle power mode */
+	ATA_CMD_IDLEIMMEDIATE	= 0xE1,
+	ATA_CMD_NVCACHE		= 0xB6,
+	ATA_CMD_NOP		= 0x00,
+	ATA_CMD_PACKET		= 0xA0,
+	ATA_CMD_PMP_READ	= 0xE4, /* aka READ BUFFER */
+	ATA_CMD_READ		= 0xC8, /* aka READ DMA */
+	ATA_CMD_READ_EXT	= 0x25, /* aka READ DMA EXT */
+	ATA_CMD_READ_QUEUED	= 0xC7,
+	ATA_CMD_READ_QUEUED_EXT	= 0x26,
 	ATA_CMD_FPDMA_READ	= ...
From: Greg Freemyer
Date: Tuesday, August 5, 2008 - 1:10 pm

Updated comment below.


Since these diagnostic commands could be confused with the new "long
sector" commands, should the comment say:

/* Obsolete LBA28 diagnostic commands, in LBA48 replaced by



-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
--

From: Matthew Wilcox
Date: Tuesday, August 5, 2008 - 10:26 am

From: Matthew Wilcox <willy@linux.intel.com>

ATA 8 adds support for devices that have sector sizes larger than 512
bytes.  We record the sector size in the ata_device and use it instead
of the ATA_SECT_SIZE when the data transfer is a multiple of sectors.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 drivers/ata/libata-core.c |  127 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata-scsi.c |   52 +++++++++++++-----
 include/linux/libata.h    |    4 +-
 3 files changed, 168 insertions(+), 15 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 5ba96c5..8f38d0c 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -446,6 +446,106 @@ void ata_tf_from_fis(const u8 *fis, struct ata_taskfile *tf)
 	tf->hob_nsect	= fis[13];
 }
 
+/**
+ * ata_sect_size - Returns the sector size to use for a command
+ * @command: The ATA command byte
+ * @dev_sect_size: The size of the drive's sectors
+ *
+ * Some commands are specified to transfer (a multiple of) 512 bytes of data
+ * while others transfer a multiple of the number of bytes in a sector.  This
+ * function knows which commands transfer how much data.
+ */
+unsigned ata_sect_size(u8 command, unsigned dev_sect_size)
+{
+	switch (command) {
+	case ATA_CMD_CFA_TRANSLATE_SECTOR:
+	case ATA_CMD_CFA_WRITE_MULTI_WITHOUT_ERASE:
+	case ATA_CMD_CFA_WRITE_SECTORS_WITHOUT_ERASE:
+	case ATA_CMD_READ:
+	case ATA_CMD_READ_EXT:
+	case ATA_CMD_READ_QUEUED:
+	case ATA_CMD_READ_QUEUED_EXT:
+	case ATA_CMD_FPDMA_READ:
+	case ATA_CMD_READ_MULTI:
+	case ATA_CMD_READ_MULTI_EXT:
+	case ATA_CMD_PIO_READ:
+	case ATA_CMD_PIO_READ_EXT:
+	case ATA_CMD_READ_STREAM_DMA_EXT:
+	case ATA_CMD_READ_STREAM_EXT:
+	case ATA_CMD_VERIFY:
+	case ATA_CMD_VERIFY_EXT: /* XXX: not listed in rev 5 */
+	case ATA_CMD_WRITE:
+	case ATA_CMD_WRITE_EXT:
+	case ATA_CMD_WRITE_FUA_EXT:
+	case ATA_CMD_WRITE_DMA_QUEUED:
+	case ATA_CMD_WRITE_DMA_QUEUED_EXT:
+	case ...
From: Alan Cox
Date: Tuesday, August 5, 2008 - 1:46 pm

static u32 ata_sector_or_block[]={...};

	if (test_bit(tf->cmd, &ata_sector_or_block))

looks so much more elegant than a giant switch statement and I suspect

word 106 is not defined in early ATA standards so it would be wise to
check that ATA8 is reported by the drive - and trust the relevant bits
for ATA7/8 as appropriate.

The drivers also need to know when a command is being issued which is a
funny shape as some hardware cannot do it because the internal state
machine knows the sector size and other stuff seems to need the internal
FIFO turning off or other things whacked repeatedly on the head to make
it work.

You also don't need to bother listing all the commands that don't have
data transfer phases as a sector size is meaningless there so we
shouldn't even bother doing a lookup for them. Indeed the more I look at
this the more it seems that keeping long complex ever out of date arrays
is the wrong way to do it.

We will also need a driver flag to indicate support in that HBA but that
is pretty trivial to add

Alan
--

From: Matthew Wilcox
Date: Tuesday, August 5, 2008 - 7:22 pm

Probably ... I did consider it, but I think I was too influenced by the

I'm not sure that's necessary.  The spec says to check whether words are

Yes, I would expect some driver work to be required.  It only gets worse
once we implement the DIX-equivalent.  How do you suggest would be a
good migration path?  We could have the driver set a flag, or call into
the driver from the midlayer to check whether it can cope with a

I didn't want to change behaviour.  We currently set sect_size to 512
for all commands (except READ LONG), and I didn't want to change that
for non-data commands.  I agree with you that it's not relevant.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Alan Cox
Date: Wednesday, August 6, 2008 - 2:06 am

On Tue, 5 Aug 2008 20:22:55 -0600

What early spec says what state word 106 is in ? Healthy paranoia is a
good idea in the IDE world because its all a bit murky in the early days
and you get some quite strange ident data from early devices - one reason
for 0xC000 = 0x4000 is that some early drives use 0xFFFF for unknown words

On the driver side I need to know so I can control the FIFO so I guess
knowing when you start/end planning to use large sector sizes. The driver
could do it per command but the cost is almost certainly not worth it as
I'd expect us to stick to a size. A driver method would do the trick
nicely if it could return -EOPNOTSUPP or similar.

Alan
--

From: Matthew Wilcox
Date: Wednesday, August 6, 2008 - 6:11 am

ATA-1 says that word 106 is reserved (and further that reserved means
the drive shall return 0).  I don't have a spec earlier than that.

I suspect the ATA spec writers are assuming that no drive puts random
bits there.  all-0 and all-1 make sense, but alternating 0 and 1 are
unlikely.  At least, it's good enough for us for checking words 48, 76, 78,

Obviously it is going to change per command -- because different
commands have different sizes.  I was thinking that we could call the
driver to see if it can handle a particular sector size right after we
get the IDENTIFY data.  

Something like this, perhaps:

diff --git a/drivers/ata/ata_ram.c b/drivers/ata/ata_ram.c
index 7479b69..7c30e97 100644
--- a/drivers/ata/ata_ram.c
+++ b/drivers/ata/ata_ram.c
@@ -595,11 +595,17 @@ static bool ata_ram_qc_fill_rtf(struct ata_queued_cmd *qc)
 	return true;
 }
 
+static bool ata_ram_sector_size_supported(struct ata_device *dev)
+{
+	return 1;
+}
+
 static struct ata_port_operations ata_ram_port_ops = {
 	.qc_fill_rtf = ata_ram_qc_fill_rtf,
 	.qc_prep = ata_ram_qc_prep,
 	.qc_issue = ata_ram_qc_issue,
 	.error_handler = ata_ram_error_handler,
+	.sector_size_supported = ata_ram_sector_size_supported,
 };
 
 static struct scsi_host_template ata_ram_template = {
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 8f38d0c..28f16fb 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2236,6 +2236,20 @@ static void ata_dev_config_ncq(struct ata_device *dev,
 		snprintf(desc, desc_sz, "NCQ (depth %d/%d)", hdepth, ddepth);
 }
 
+static int ata_check_sect_size(struct ata_device *dev)
+{
+	/* Every device can handle 512 byte sectors */
+	if (dev->sect_size == 512)
+		return 0;
+	/* Linux doesn't handle sectors larger than 4GB.  This may be a
+	 * problem around 2050 or so.  Deal with it then.  */
+	if (dev->sect_size > 0xffffffffULL)
+		return -EINVAL;
+	if (!dev->link->ap->ops->sector_size_supported)
+		return -EINVAL;
+	return ...
From: Alan Cox
Date: Wednesday, August 6, 2008 - 7:13 am

The drivers need to know if you are going to be using odd sizes regularly
so they can pick between

	-	I do this fine who cares (most chips)
	-	Er uh wtf its not 512 byts (some chip state machines)
	-	FIFO off (performance hit) for this disk
	-	FIFO managed for the odd command thats a funny size
	-	Various other levels of software managed controller
		thumping

It's not a passive thing and we'd want to do it post identify on the
drive pair as it'll often need per channel decisions (eg on FIFO)

Alan
--

From: Matthew Wilcox
Date: Wednesday, August 6, 2008 - 8:13 am

It's up to the drive to report the number of sectors it uses.  After
that all regular read/write commands will be doing that size.  Unless
we're in some very bizarre situation, that will be the majority of



Might want to print a message explaining why performance is going to

... but it's not the odd command, it's going to be the vast majority of

Why can't you just disable the (controller) FIFO whenever any drive
reports != 512 byte sectors?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Alan Cox
Date: Wednesday, August 6, 2008 - 8:06 am

Put a 512 byte and a 2K sector sized disk on the same channel and its

Because I'd like to do better than that - and FIFO is only one of the
cases. I think your proposal basically works - providing you identify
both drives before worrying about sector sizes. Most of our other logic
is done that way - the drives are all identified then the controller makes
decisions.

The setup time method then either vetos the disk or accepts it
The existing mode_filter can be used to implement stuff like "PIO only no
DMA" cases
And if we can do it but have problems the qc_issue or device select
methods can be hooked to do the work.

Alan
--

From: Jeff Garzik
Date: Sunday, August 17, 2008 - 4:19 pm

Was this ever updated regarding Alan's final comments?

I only have this patch (the one initiating the email sub-thread).

	Jeff



--

From: Matthew Wilcox
Date: Monday, August 18, 2008 - 11:15 am

I didn't update it fully; I moved onto something else.  I didn't change
the first of the two patches, so I'll just include the new version of
the second one in this mail.  I think it addresses all of Alan's comments.
I'll have a git tree up soon for you to pull from.

commit dfde7a888bae3b36113b19f37e9edb29be9ae803
Author: Matthew Wilcox <willy@linux.intel.com>
Date:   Tue Aug 5 02:25:57 2008 -0400

    ata: Add support for Long Logical Sectors and Long Physical Sectors
    
    ATA 8 permits devices that have sector sizes larger than 512 bytes.
    We support this by recording the sector size in the ata_device and use
    it instead of the ATA_SECT_SIZE when the data transfer is a multiple
    of sectors.  Drivers must indicate their support for sector sizes other
    than 512 by implementing the 'sector_size_supported' port operation.
    
    Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 5ba96c5..31e359e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -446,6 +446,113 @@ void ata_tf_from_fis(const u8 *fis, struct ata_taskfile *tf)
 	tf->hob_nsect	= fis[13];
 }
 
+/**
+ * ata_sect_size - Returns the sector size to use for a command
+ * @command: The ATA command byte
+ * @dev_sect_size: The size of the drive's sectors
+ *
+ * Some commands are specified to transfer (a multiple of) 512 bytes of data
+ * while others transfer a multiple of the number of bytes in a sector.  This
+ * function knows which commands transfer how much data.
+ */
+unsigned ata_sect_size(u8 command, unsigned dev_sect_size)
+{
+	enum {
+		UNKNOWN = 0,
+		FIVE_TWELVE = 1,
+		DEVICE = 2,
+	};
+
+	static const char command_sect_size[256] = {
+		[ ATA_CMD_CFA_TRANSLATE_SECTOR ] = DEVICE,
+		[ ATA_CMD_CFA_WRITE_MULTI_WITHOUT_ERASE ] = DEVICE,
+		[ ATA_CMD_CFA_WRITE_SECTORS_WITHOUT_ERASE ] = DEVICE,
+		[ ATA_CMD_READ ] = DEVICE,
+		[ ATA_CMD_READ_EXT ] = DEVICE,
+		[ ...
From: Alan Cox
Date: Monday, August 18, 2008 - 11:06 am

or two bits ;)


Rest looks ok. I'll probably send a follow up patch to make most pata
drivers return "no" to anything but 512 until we can test them with real
devices.

Alan
--

From: Matthew Wilcox
Date: Monday, August 18, 2008 - 11:42 am

;-)  How about two 256-bit arrays?

The first is set for "Use device sector size", the second is set for
"Yes, we know about this command, don't warn".  The most common commands

Why do we need to do that?  Leaving sector_size_supported() unset means
exactly "no to anything but 512".  I'll be quite surprised if we see
PATA drives supporting 4096 or 520 byte sectors anyway.  It seems like
a premium feature.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Alan Cox
Date: Monday, August 18, 2008 - 11:49 am

Good point on the former - yes it defaults sane now so that doesn't need
doing. For the sector size remember a lot of PATA chips appear with SATA
bridges at times - especially promise and highpoint.
--

From: Matthew Wilcox
Date: Monday, August 18, 2008 - 12:04 pm

... err ... any idea what the syntax might be for initialising a bit
array at compile time?  Do I have to use an initialiser function at
runtime?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

Previous thread: Simple UART driver help and TTY questions by Ira Snyder on Tuesday, August 5, 2008 - 9:33 am. (6 messages)

Next thread: Spinlock recursion in hvc_poll by Jeremy Fitzhardinge on Tuesday, August 5, 2008 - 11:08 am. (5 messages)