Re: Block: Fix blk_start_queueing() so as not to process a stopped queue

Previous thread: FYI: e1000e: corruption, Lenovo/IBM are replacing my MB. by Lee.Mathers on Thursday, October 2, 2008 - 8:03 am. (1 message)

Next thread: Block: Fix handling of stopped queues and a plugging issue by Elias Oltmanns on Thursday, October 2, 2008 - 8:59 am. (3 messages)
From: Elias Oltmanns
Date: Thursday, October 2, 2008 - 8:55 am

[Adding linux-scsi since this may be of interest there.]


Well, I have come across this bug by applying the out-of-tree disk shock
protection patch. I realise that this is not a good incentive for
queueing this up for stable. However, a quick look through the tree
revealed that blk_stop_queue() is used in various places. I probably
have to admit that my vote for inclusion into stable was mainly founded
on the assumption that ``restoring the expected behaviour'' was the safe
option, whereas you are quite right in saying that it is safer to leave
code alone until it has actually proven to behave erroneously. So, here
are some facts and considerations for you to decide for yourself whether
this should go into stable or not:

As far as I can tell, the IDE subsystem is not affected by this bug in
any way. Even though I haven't really made an effort to understand all
the pieces of code in drivers/block/ that use blk_stop_queue(), I am now
under the impression that the bug doesn't cause serious problems there
either. SCSI, on the other hand, seems to be a different matter. Here,
the functions that stop the queue additionally change the sdev into the
SDEV_BLOCK state. Running the cfq scheduler, I can reliably freeze my
system, which I attribute to the following sequence of events:

1. After scsi_request_fn() has exhausted the queue depth of the LLDD, it
   prepares one more request by means of *_prep_fn() and sets the
   DONTPREP flag. Then it realises that the LLDD queue is full and
   leaves the request (fully prepared) on the block layer queue and
   exits eventually.
2. For some reason or other, scsi_internal_device_block() gets called
   which stops the queue and transitions sdev into the SDEV_BLOCK state.
3. When the next FS request arrives, cfq calls blk_start_queueing() from
   cfq_rq_enqueued().
4. Now, the first request scsi_request_fn() takes off the queue is the
   one marked as DONTPREP. Thus, the request is passed on to
   scsi_dispatch_cmd().
5. Here the ...
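
Steps 1 to 4 above can be illustrated with a small user-space toy model.
This is only a sketch and not kernel code: every name in it (toy_queue,
toy_request_fn(), toy_start_queueing() and so on) is hypothetical, and the
rule that an unprepared request is held back while the device is blocked
is an assumption of the model. Step 5 is not modelled. The check_stopped
switch stands for the behaviour the subject line asks for, namely that
starting the queue should leave a stopped queue alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's data structures. */
struct toy_request {
    bool dontprep;              /* models the DONTPREP flag */
};

struct toy_queue {
    bool stopped;               /* models the "queue stopped" state */
    bool dev_blocked;           /* models sdev being in SDEV_BLOCK */
    struct toy_request *head;   /* the single pending request, or NULL */
    int dispatched_while_blocked;
};

/* Models the request function: take the head request and dispatch it. */
static void toy_request_fn(struct toy_queue *q)
{
    struct toy_request *rq = q->head;

    if (rq == NULL)
        return;

    /*
     * Model assumption: a request that still needs preparing is held
     * back while the device is blocked. A request already marked
     * DONTPREP skips this check and goes straight to dispatch.
     */
    if (!rq->dontprep && q->dev_blocked)
        return;

    q->head = NULL;             /* request leaves the queue */
    if (q->dev_blocked)
        q->dispatched_while_blocked++;
}

/* Models starting the queue, with or without honouring "stopped". */
static void toy_start_queueing(struct toy_queue *q, bool check_stopped)
{
    if (check_stopped && q->stopped)
        return;
    toy_request_fn(q);
}

/* Replays steps 1 to 4; returns the number of dispatches while blocked. */
static int toy_run_scenario(bool check_stopped)
{
    /* Step 1: a fully prepared request is left on the queue. */
    struct toy_request rq = { .dontprep = true };
    struct toy_queue q = { .head = &rq };

    assert(q.head == &rq);

    /* Step 2: the queue is stopped and the device is blocked. */
    q.stopped = true;
    q.dev_blocked = true;

    /* Steps 3 and 4: a new request arrival triggers a queue start. */
    toy_start_queueing(&q, check_stopped);

    return q.dispatched_while_blocked;
}
```

In this model, toy_run_scenario(false) lets the prepared request reach
dispatch while the device is blocked, whereas toy_run_scenario(true)
leaves it on the queue.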