Re: [BUG] disk_free_ptbl_rcu_cb() crash

From: Eric Dumazet
Date: Saturday, October 23, 2010 - 2:10 pm

The current Linus tree makes my machine crash in disk_free_ptbl_rcu_cb()
while booting.

Commit 7681bfeeccff5ef seems to be the problem?

The following patch avoids the NULL dereference, but it is only meant to
show you where the problem is; it is not a real fix, of course.

Thanks

 block/genhd.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index a8adf96..b63d401 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -930,14 +930,16 @@ static void disk_free_ptbl_rcu_cb(struct rcu_head *head)
 	struct disk_part_tbl *ptbl =
 		container_of(head, struct disk_part_tbl, rcu_head);
 	struct gendisk *disk = ptbl->disk;
-	struct request_queue *q = disk->queue;
+	struct request_queue *q = disk ? disk->queue : NULL;
 	unsigned long flags;
 
 	kfree(ptbl);
 
-	spin_lock_irqsave(q->queue_lock, flags);
-	elv_quiesce_end(q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	if (q) {
+		spin_lock_irqsave(q->queue_lock, flags);
+		elv_quiesce_end(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
 }
 
 /**


--

From: Jens Axboe
Date: Saturday, October 23, 2010 - 11:04 pm

Darn. Your fix is on the right track, but you missed one. I think it's
cleaner to move this into the elevator helpers, so that the callers can
remain clean.

Can you verify that this works too?

diff --git a/block/elevator.c b/block/elevator.c
index 2569512..f08ae2d 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -590,11 +590,8 @@ void elv_drain_elevator(struct request_queue *q)
 /*
  * Call with queue lock held, interrupts disabled
  */
-void elv_quiesce_start(struct request_queue *q)
+void __elv_quiesce_start(struct request_queue *q)
 {
-	if (!q->elevator)
-		return;
-
 	queue_flag_set(QUEUE_FLAG_ELVSWITCH, q);
 
 	/*
@@ -610,11 +607,31 @@ void elv_quiesce_start(struct request_queue *q)
 	}
 }
 
-void elv_quiesce_end(struct request_queue *q)
+void elv_quiesce_start(struct request_queue *q)
+{
+	if (q->elevator) {
+		spin_lock_irq(q->queue_lock);
+		__elv_quiesce_start(q);
+		spin_unlock_irq(q->queue_lock);
+	}
+}
+
+void __elv_quiesce_end(struct request_queue *q)
 {
 	queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q);
 }
 
+void elv_quiesce_end(struct request_queue *q)
+{
+	if (q->elevator) {
+		unsigned long flags;
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		__elv_quiesce_end(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+}
+
 void elv_insert(struct request_queue *q, struct request *rq, int where)
 {
 	int unplug_it = 1;
@@ -969,7 +986,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	 * Turn on BYPASS and drain all requests w/ elevator private data
 	 */
 	spin_lock_irq(q->queue_lock);
-	elv_quiesce_start(q);
+	__elv_quiesce_start(q);
 
 	/*
 	 * Remember old elevator.
@@ -995,9 +1012,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	 * finally exit old elevator and turn off BYPASS.
 	 */
 	elevator_exit(old_elevator);
-	spin_lock_irq(q->queue_lock);
 	elv_quiesce_end(q);
-	spin_unlock_irq(q->queue_lock);
 
 	blk_add_trace_msg(q, "elv ...

--

From: Eric Dumazet
Date: Saturday, October 23, 2010 - 11:44 pm

Sure, I tested it just now and it works too, thanks!


--

From: Jens Axboe
Date: Saturday, October 23, 2010 - 11:45 pm

Thanks for the (very) quick turn-around, I'll get this one expedited
as well.

-- 
Jens Axboe

--

From: Vivek Goyal
Date: Saturday, October 23, 2010 - 11:52 pm

Hi Jens,

I am wondering if this fix is safe. Looking at the memstick backtrace in
the other mail thread, it looks like the request queue itself has been
freed. So we probably should check that the request queue is valid before
we check q->elevator.
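
The ordering concern can be sketched in userspace C. This is only an illustration under stated assumptions: struct queue, struct elevator, quiesce_end() and quiesce_end_locked() are hypothetical stand-ins, and a pthread mutex takes the place of q->queue_lock. It is not the kernel code and not a proposed fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct elevator { int unused; };

struct queue {
	pthread_mutex_t lock;      /* stands in for q->queue_lock */
	struct elevator *elevator; /* NULL when no elevator is attached */
	int elvswitch;             /* stands in for QUEUE_FLAG_ELVSWITCH */
};

/* Caller must hold q->lock, like the __elv_quiesce_end() convention. */
static void quiesce_end_locked(struct queue *q)
{
	q->elvswitch = 0;
}

/*
 * Locking wrapper. Returns 1 if the flag was cleared, 0 if it was skipped.
 * q is tested before q->elevator, so a NULL queue is never dereferenced.
 */
static int quiesce_end(struct queue *q)
{
	if (!q || !q->elevator)
		return 0;

	pthread_mutex_lock(&q->lock);
	quiesce_end_locked(q);
	pthread_mutex_unlock(&q->lock);
	return 1;
}
```

A NULL test like this only covers a pointer that was never set or was cleared. It cannot detect a queue that has already been freed, because the stale pointer is still non-NULL.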

P.S. I tried sending the same response from my gmail account but it
bounced. So if you get this mail twice, please ignore the duplicate.

--

From: Jens Axboe
Date: Sunday, October 24, 2010 - 12:00 am

Looking at that trace, it's not yet deleted. But if it's in the to-free
path, by the time we invoke the rcu callback and do the quiesce end it
could be gone.
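
One generic way to keep an object alive until a deferred callback has run is to have the callback hold its own reference. The sketch below shows that idea in userspace C. Every name in it (struct queue, queue_get(), queue_put(), struct deferred, deferred_schedule(), deferred_run()) is hypothetical, and it is not the fix this thread arrived at, which is left open here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted queue; not the kernel's struct request_queue. */
struct queue {
	atomic_int refs;
	int elvswitch;
};

static int queues_freed; /* illustration only: counts released queues */

static struct queue *queue_alloc(void)
{
	struct queue *q = calloc(1, sizeof(*q));

	if (q)
		atomic_init(&q->refs, 1); /* the creator's reference */
	return q;
}

static void queue_get(struct queue *q)
{
	atomic_fetch_add(&q->refs, 1);
}

/* Drop one reference; free the queue when the last one goes away. */
static void queue_put(struct queue *q)
{
	if (atomic_fetch_sub(&q->refs, 1) == 1) {
		free(q);
		queues_freed++;
	}
}

/* Hypothetical deferred work that pins the queue until it has run. */
struct deferred {
	struct queue *q;
};

static void deferred_schedule(struct deferred *d, struct queue *q)
{
	queue_get(q); /* pin q on behalf of the callback */
	d->q = q;
}

static void deferred_run(struct deferred *d)
{
	d->q->elvswitch = 0; /* safe: the callback's reference keeps q alive */
	queue_put(d->q);     /* frees q if the owner has already let go */
	d->q = NULL;
}
```

With this shape the owner can call queue_put() before the callback runs, and the memory is released only after deferred_run() drops the last reference.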

Needs a bit of thought, feel free to poke at it today if you have time
(because I really do not :-/)


Didn't get it twice.

-- 
Jens Axboe

--
