Re: [bug, 2.6.26-rc4/rc5] sporadic bootup crashes in blk_lookup_devt()/prepare_namespace()

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Ingo Molnar <mingo@...>
Cc: Cornelia Huck <cornelia.huck@...>, Vegard Nossum <vegard.nossum@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Jens Axboe <jens.axboe@...>, Greg Kroah-Hartman <gregkh@...>, Rafael J. Wysocki <rjw@...>, Kay Sievers <kay.sievers@...>, Neil Brown <neilb@...>, Mariusz Kozlowski <m.kozlowski@...>, Dave Young <hidave.darkstar@...>
Date: Monday, June 9, 2008 - 12:15 pm

On Mon, 9 Jun 2008, Ingo Molnar wrote:

Yes. It could cause two kinds of problems:

 - it might end up returning the wrong 'dev_t'. This is unlikely, since we 
   only have two cases: the working whole-disk case, and the case where we 
   find a partition.

   But if we find a partition, we'd still get the right dev_t *most* of 
   the time, because we'd first get called with "part=0", and then we have 

	if (part < disk->minors)
		devt = MKDEV(MAJOR(dev->devt),
			MINOR(dev->devt) + part);
	break;

   where we would only fail if that conditional statement would be untrue 
   (and then we'd incorrectly return MKDEV(0,0)). Otherwise, 'devt' ends 
   up being correct anyway.

   So one effect of this bug would be that it would use the random 
   "disk->minors" value to either return the right devt, or return one 
   that is all zeroes. But if we return the all-zeroes case, then 
   init/do_mounts.c will just try again, this time with the numbers 
   removed, and now it wouldn't hit the "strcmp()" on any partition, and 
   the next time around it would find a disk and work again.

   So this is a bug, but it's one that essentially is hidden by the 
   caller.

 - The other alternative is that the bogus "disk->minors" thing would 
   cause a page fault. This would only happen if the partition allocation 
   was the first thing in a page, and the previous page was unused, and 
   you had DEBUG_PAGEALLOC enabled.

   This is obviously the case you saw.

My trivial fix makes it ignore partitions entirely.

We *could* (and perhaps should) do something slightly more involved 
instead, which actually uses a partition if it's there). Like this. That 
would avoid my one nagging worry (that some clever usage makes partitions 
with a different numbering or without a base block device).

And this is all still ignoring the locking issue, of course. It would be 
trivial to just remove the block_class_lock, and change

	mutex_[un]lock(&block_class_lock);

into

	down|up(&block_class.sem);

except for _one_ case, which is

	bdev_map = kobj_map_init(base_probe, &block_class_lock);

which really wants a mutex, not a sempahore.

So to fix that, we'd need to make the class->sem be a mutex, and pass that 
in. Which is probably a good change too, but makes the whole thing much 
bigger.

		Linus

---
 block/genhd.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 129ad93..101530e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -661,11 +661,15 @@ dev_t blk_lookup_devt(const char *name, int part)
 	mutex_lock(&block_class_lock);
 	list_for_each_entry(dev, &block_class.devices, node) {
 		if (strcmp(dev->bus_id, name) == 0) {
-			struct gendisk *disk = dev_to_disk(dev);
-
-			if (part < disk->minors)
-				devt = MKDEV(MAJOR(dev->devt),
-					     MINOR(dev->devt) + part);
+			if (dev->type == &disk_type) {
+				struct gendisk *disk = dev_to_disk(dev);
+				if (part >= disk->minors)
+					continue;
+			} else {
+				if (part)
+					continue;
+			}
+			devt = dev->devt + part;
 			break;
 		}
 	}
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: [bug, 2.6.26-rc4/rc5] sporadic bootup crashes in blk_loo..., Linus Torvalds, (Mon Jun 9, 12:15 pm)