Re: [PATCH] Retry commands with UNIT_ATTENTION sense codes

Previous thread: SYNCHRONIZE_CACHE command is not retried by Hannes Reinecke on Tuesday, May 4, 2010 - 5:23 am. (4 messages)

Next thread: [PATCH] Enable retries for SYNCRONIZE_CACHE commands by Hannes Reinecke on Tuesday, May 4, 2010 - 7:49 am. (2 messages)
From: Hannes Reinecke
Date: Tuesday, May 4, 2010 - 7:48 am

We have to enable retries for UNIT_ATTENTION sense codes, as a
command might've been injected from the block layer (eg barrier
requests). And for those no error recovers takes place, so we
need to make sure that the SCSI midlayer corrects any transient
errors.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d45c69c..663a1ad 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -290,19 +290,13 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 			return NEEDS_RETRY;
 		}
 		/*
-		 * if the device is in the process of becoming ready, we 
-		 * should retry.
-		 */
-		if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01))
-			return NEEDS_RETRY;
-		/*
 		 * if the device is not started, we need to wake
 		 * the error handler to start the motor
 		 */
 		if (scmd->device->allow_restart &&
 		    (sshdr.asc == 0x04) && (sshdr.ascq == 0x02))
 			return FAILED;
-		return SUCCESS;
+		return NEEDS_RETRY;
 
 		/* these three are not supported */
 	case COPY_ABORTED:
-- 
1.6.0.2

--

From: James Bottomley
Date: Tuesday, May 4, 2010 - 9:26 am

The other patch is fine, but I don't think this is necessary.  The
reason is that even returning SUCCESS here, we go straight into
scsi_finish_command() (which passes it up to the driver handler) and
then scsi_io_completion().  There's a catch for UNIT_ATTENTION in
scsi_io_completion which will still cause a retry provided the driver
handler hasn't done any alterations, so I think we'd still get the retry
with only your other patch.

James


--

From: Mike Christie
Date: Tuesday, May 4, 2010 - 1:15 pm

The request is sent as a REQ_TYPE_BLOCK_PC (this flag is set for the 
request in sd_prepare_flush), and scsi_io_completion's blk_pc_request 
check() that returns the request upwards is before the UNIT_ATTENTION 
check one so we never hit the UNIT_ATTENTION check.
--

From: Mike Christie
Date: Tuesday, May 4, 2010 - 1:27 pm

I was looking at the wrong source.

scsi_finish_command checks for REQ_TYPE_BLOCK_PC and sets good_bytes to 
scsi_bufflen, so when scsi_io_completion calls scsi_end_request, it 

--

From: James Bottomley
Date: Tuesday, May 4, 2010 - 1:30 pm

Ah, yes, missed that.

However there are other problems then.  We can't just eat all unit
attentions on the BLOCK_PC path because some of the user programs want
to see them (CD burners for one).  I know the patch allows some to
proceed upwards, but it's risky making all except device not started do
a retry.

I'm a bit stumped on this one ... the intention of BLOCK_PC is that the
caller simply retries if they get a unit attention (which is what all
SCSI code does).  The block code doesn't want to look at the sense data
but at the error return.  I suppose we could make UNIT_ATTENTION
translate to -EAGAIN and have block do the right thing?

James


--

From: James Bottomley
Date: Tuesday, May 4, 2010 - 1:51 pm

I tried this, it gets very messy in block, so it looks to be a hack too
far.

What about this then ... it only retries UAs for barrier commands (thus
preserving standard semantics for SG_IO requests)?

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d45c69c..7ad53fa 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -302,7 +302,20 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 		if (scmd->device->allow_restart &&
 		    (sshdr.asc == 0x04) && (sshdr.ascq == 0x02))
 			return FAILED;
-		return SUCCESS;
+
+		if (blk_barrier_rq(scmd->request))
+			/*
+			 * barrier requests should always retry on UA
+			 * otherwise block will get a spurious error
+			 */
+			return NEEDS_RETRY;
+		else
+			/*
+			 * for normal (non barrier) commands, pass the
+			 * UA upwards for a determination in the
+			 * completion functions
+			 */
+			return SUCCESS;
 
 		/* these three are not supported */
 	case COPY_ABORTED:


--

From: Hannes Reinecke
Date: Tuesday, May 4, 2010 - 11:26 pm

The intention of BLOCK_PC is ok, but the point here is the barrier
request isn't really a BLOCK_PC request. The caller precisely
does _not_ want to look the sense code but rather have the SCSI
layer do all the necessary things.

Why can't we just mark the barrier request as something else than
BLOCK_PC, eg REQ_TYPE_SPECIAL?
Then we'd avoid these pitfalls and everything should work as expected, right?

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index de6c603..b63f84e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1038,7 +1038,7 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
 
 static void sd_prepare_flush(struct request_queue *q, struct request *rq)
 {
-       rq->cmd_type = REQ_TYPE_BLOCK_PC;
+       rq->cmd_type = REQ_TYPE_SPECIAL;
        rq->timeout = SD_TIMEOUT;
        rq->retries = SD_MAX_RETRIES;
        rq->cmd[0] = SYNCHRONIZE_CACHE;

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--

From: James Bottomley
Date: Wednesday, May 5, 2010 - 6:19 am

Unfortunately, not.  A long time ago we used to use SPECIAL for prepared
commands, but we stripped that path out.  If you look at the prep
functions today, they only accept either REQ_TYPE_FS or
REQ_TYPE_BLOCK_PC ... they kill everything else, so we'd have to do
surgery on all the input paths to get this to work.

Looking at all of this, I'm really unhappy with the way barriers are
working in SCSI ... using BLOCK_PC for them was a big mistake, since
block can't handle the errors and BLOCK_PC by its very nature requests
that all errors, however trivially fixable, be returned.

It looks like we have precisely the same problems with discard too,
except there at least we'll ignore the error.

Someone needs to take a good look at fixing this ... or at least just
making discard and barrier use the same submit paths, so we can fix it
up in the lower layers.

In the meantime, I think the SUCCESS/NEEDS_RETRY fiddle in check
condition is still the easiest hack.

James


--

From: Christoph Hellwig
Date: Wednesday, May 5, 2010 - 6:28 am

Both barriers and dicards are FS requests as far as the block layer is
concerned.  We turn them into BLOCK_PC requests very late in the game,
either in the prepare_flush function for barriers, or directly inside
prep_fn for discard.  I tried to handle discard requests as FS all
the way through, but the various different ways in which we account for
a request length didn't play nicely with that.

Handling cache flushes as FS requests all the way through should be
easier than that.

--

Previous thread: SYNCHRONIZE_CACHE command is not retried by Hannes Reinecke on Tuesday, May 4, 2010 - 5:23 am. (4 messages)

Next thread: [PATCH] Enable retries for SYNCRONIZE_CACHE commands by Hannes Reinecke on Tuesday, May 4, 2010 - 7:49 am. (2 messages)