Re: Partition check considered as error is breaking mounting in 2.6.27

Previous thread: Partition check considered as error is breaking mounting in 2.6.27 by Herton Ronaldo Krzesinski on Friday, September 12, 2008 - 9:56 am. (4 messages)

Next thread: Re: Partition check considered as error is breaking mounting in 2.6.27 by Toralf on Friday, September 12, 2008 - 10:32 am. (1 message)
From: Herton Ronaldo Krzesinski
Date: Friday, September 12, 2008 - 10:01 am

[ Sent again now with correct CC to linux-usb ]

Hi,

Recently I found a problem with a buggy camera that doesn't mount anymore with 
2.6.27 (its memory is available via usb-storage), since commit 
04ebd4aee52b06a2c38127d9208546e5b96f3a19

The camera is an Olympus X-840. The original issue comes from the camera 
itself: its format program creates a partition with an off by one error, 
while the device reports that its memory has 42079 sectors, the partition 
table reports also that the only partition on the disk has the size of 42079, 
but it fails to account for the first sector in the memory that contains the 
partition table, so in the end the partition exceeds the limit of the device 
size (42080, first sector plus 42079 from the first partition).

In previous kernels (2.6.26 and before), I still could mount and access the 
device (/dev/sdb1), although with the following errors:

sd 6:0:0:0: [sdb] Assuming drive cache: write through
 sdb: sdb1
 sdb: p1 exceeds device capacity
sd 6:0:0:0: [sdb] Attached SCSI removable disk
sd 6:0:0:0: Attached scsi generic sg2 type 0
usb-storage: device scan complete
attempt to access beyond end of device
sdb: rw=0, want=42080, limit=42079
__ratelimit: 16 messages suppressed
Buffer I/O error on device sdb1, logical block 42078
attempt to access beyond end of device
sdb: rw=0, want=42080, limit=42079

If you note the log snippet above the first notable thing is "p1 exceeds 
device capacity", so looking at commit 
04ebd4aee52b06a2c38127d9208546e5b96f3a19 it is clear why sdb1 isn't created 
anymore.

After formatting the camera is this you get from fdisk (display units in 
sectors):

Disk /dev/sdb: 21 MB, 21544448 bytes
6 heads, 16 sectors/track, 438 cylinders, total 42079 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1       42079       21039+   1  FAT12
Partition 1 has different physical/logical endings:
     ...
From: Alan Stern
Date: Friday, September 12, 2008 - 10:36 am

I have to believe that it _is_ desired.  Why else would that commit 
have been merged?


Why clear the partition table?  Just replace the bogus entry with a 

This doesn't resemble the original partition table at all.  Did you 
create this strange table or did the camera's firmware change the data 

That is not a valid procedure.  You have to go into Expert mode and set 
the number of sectors, heads, and tracks first.  Most likely you'll 
want to set them to the same values used by the firmware; it looks like 
the firmware thinks there are 16 sectors/track, 8 heads, and 329 


Adding quirks to alter partition sizes sounds very dangerous.  Your 
best bet is simply to write a valid partition table.

Alan Stern

--

From: Bob Copeland
Date: Friday, September 12, 2008 - 10:59 am

Several people objected to similar patches when they were posted on
LKML in the past, usually for the reason "this breaks my forensics use
case."  I guess no one objected this time around.

http://marc.info/?l=linux-kernel&m=116069718613683&w=2

(I don't have an opinion, I'm just providing collective memory.)

-- 
Bob Copeland %% www.bobcopeland.com
--

From: Alan Stern
Date: Friday, September 12, 2008 - 11:21 am

Well, Herton can object and ask to have the offending commit reverted.

Alan Stern

--

From: Herton Ronaldo Krzesinski
Date: Friday, September 12, 2008 - 11:02 am

Clearing partition table seemed easier than fixing values with fdisk, to make 


8 heads? it reports 6 from the fdisk output:

Yes, but this is breaking other devices, not only this olympus camera. Bogdano 
reported what could be the same case with his cel. phone (Bogdano can you 
give more details?), also on IRC today Damien Lallement complained about what 
looks the same issue of "p* exceeds device capacity" (Damien can you give 
more info too?). But I'm afraid of how much devices this partition error 
fatal check now can affect, after all it's not "user friendly" to instruct 

-- 
[]'s
Herton
--

From: Alan Stern
Date: Friday, September 12, 2008 - 11:40 am

No.  phys=(328, 5, 16) doesn't mean that the device has 329 tracks, 6
heads, and 16 sectors.  It means that the last sector of the partition
(as recorded in the table) is track 328, head 5, sector 16.  
Note that partitions don't have to end on cylinder boundaries.

Guessing that the firmware intends there to be 16 sectors per track,
and also guessing that the 42079 value is too low by one, we get a
total of 42080 / 16 = 2630 as the product of heads and tracks.  Since
2630 = 8 * 328.75 this seems to say that the firmware thinks there are
approximately 329 tracks and 8 heads.  That explains the "328" above.

Continuing this reasoning, the physical end (328,5,16) corresponds to
sector (328*8*16) + (5*16) + 15 = 42079 (the 15 is because sector
numbers start at 1 instead of at 0).  Which would be valid if the first
sector of the partition was sector 1, or phys=(0,0,2), and if the
capacity was 42080.  However you didn't tell us what value was stored

You should complain to the people who wrote and accepted the
troublesome commit.  Tell them it caused a regression; that always gets
people's attention!

Alan Stern

--

From: Herton Ronaldo Krzesinski
Date: Friday, September 12, 2008 - 1:14 pm

Ok, thanks for the explanation, indeed you are right, it's about the physical 
end, I confused things, I'm not used to CHS... The start sector is 1, 
(0,0,2), so what you said is valid. I attach the dump of first sector of the 

Ok, here goes a patch that revert the relevant part of the commit and 
goes back to behaviour of previous kernels, and fixes the regression here.
If considered a good solution, can be applied.

---

fs/partition/check.c: revert part of commit 04ebd4aee52b06a2c38127d9208546e5b96f3a19

Fix regression introduced by commit 04ebd4aee52b06a2c38127d9208546e5b96f3a19,
where kernel changed behaviour making fatal the error when some partition
exceeds the limit of the device size. Some buggy devices become inacessible
because of errors in their partition table if the error is fatal.

This closes http://bugzilla.kernel.org/show_bug.cgi?id=11554

diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 7d6b34e..15c70df 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -499,9 +499,9 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 		if (!size)
 			continue;
 		if (from + size > get_capacity(disk)) {
-			printk(KERN_ERR " %s: p%d exceeds device capacity\n",
+			printk(KERN_WARNING
+				" %s: p%d exceeds device capacity\n",
 				disk->disk_name, p);
-			continue;
 		}
 		res = add_partition(disk, p, from, size, state->parts[p].flags);

-- 
[]'s
Herton
From: Herton Ronaldo Krzesinski
Date: Friday, September 12, 2008 - 1:17 pm

And I forgot my Signed-off....


--
[]'s
Herton
--

From: Bob Copeland
Date: Friday, September 12, 2008 - 1:27 pm

On Fri, Sep 12, 2008 at 4:14 PM, Herton Ronaldo Krzesinski

It might make sense to comment here that we intentionally want to add the
partition anyway.  That might keep someone from re-adding the continue (same
patch has already come up at least 3 times).

-- 
Bob Copeland %% www.bobcopeland.com
--

From: Herton Ronaldo Krzesinski
Date: Friday, September 12, 2008 - 2:07 pm

Yes, here goes a new version:

___

fs/partition/check.c: revert part of commit 04ebd4aee52b06a2c38127d9208546e5b96f3a19

Fix regression introduced by commit 04ebd4aee52b06a2c38127d9208546e5b96f3a19,
where kernel changed behaviour making fatal the error when some partition
exceeds the limit of the device size. Some buggy devices become inacessible
because of errors in their partition table if the error is fatal.

This closes http://bugzilla.kernel.org/show_bug.cgi?id=11554

Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>

diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 7d6b34e..2dd346d 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -499,9 +499,13 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 		if (!size)
 			continue;
 		if (from + size > get_capacity(disk)) {
-			printk(KERN_ERR " %s: p%d exceeds device capacity\n",
+			printk(KERN_WARNING
+				" %s: p%d exceeds device capacity\n",
 				disk->disk_name, p);
-			continue;
+			/* note: we don't want to break access to
+			 * devices with buggy partition tables, so we
+			 * don't want to fail here, just go on and
+			 * add partition */
 		}
 		res = add_partition(disk, p, from, size, state->parts[p].flags);
 		if (res) {



-- 
[]'s
Herton
--

From: Andrew Morton
Date: Friday, September 12, 2008 - 4:36 pm

On Fri, 12 Sep 2008 18:07:27 -0300

Well gee.  Given a choice, I went and replied to the wrong thread. 
Here's what I think:

From: Andrew Morton <akpm@linux-foundation.org>

Herton Krzesinski reports that the error-checking changes in
04ebd4aee52b06a2c38127d9208546e5b96f3a19 ("block/ioctl.c and
fs/partition/check.c: check value returned by add_partition") cause his
buggy USB camera to no longer mount.  "The camera is an Olympus X-840. 
The original issue comes from the camera itself: its format program
creates a partition with an off by one error".

Buggy devices happen.  It is better for the kernel to warn and to proceed
with the mount.

Reported-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Cc: Abdel Benamrouche <draconux@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/partitions/check.c |    1 -
 1 file changed, 1 deletion(-)

diff -puN fs/partitions/check.c~rescan_partitions-make-device-capacity-errors-non-fatal fs/partitions/check.c
--- a/fs/partitions/check.c~rescan_partitions-make-device-capacity-errors-non-fatal
+++ a/fs/partitions/check.c
@@ -540,7 +540,6 @@ int rescan_partitions(struct gendisk *di
 		if (from + size > get_capacity(disk)) {
 			printk(KERN_ERR " %s: p%d exceeds device capacity\n",
 				disk->disk_name, p);
-			continue;
 		}
 		res = add_partition(disk, p, from, size, state->parts[p].flags);
 		if (res) {
_

--

From: David Brownell
Date: Friday, September 12, 2008 - 4:46 pm

So that now deserves to be KERN_WARN not KERN_ERR, yes?

--

From: Andrew Morton
Date: Friday, September 12, 2008 - 4:52 pm

On Fri, 12 Sep 2008 16:46:08 -0700


spose so.

I'm fairly unenthused about the recent KERN_correctness fad since it
went and broke sysrq-T output (you have to set the loglevel beforehand
to avoid getting only partial output).


--

From: David Brownell
Date: Friday, September 12, 2008 - 4:59 pm

I have no idea where that came from.  Wasn't in the original or

On development systems I generally "echo 8 > /proc/sysrq*"
to make sure KERN_DEBUG isn't hidden.

In this case it's just that I saw flamage a few minutes
earlier from someone trying to keep a distro boot from
spewing scarey messages for things that were NOT errors.
Like ... this message.  :)

--

From: Andrew Morton
Date: Friday, September 12, 2008 - 5:13 pm

On Fri, 12 Sep 2008 16:59:20 -0700

Your email client added it.  What I sent:

0008A0: 68 65 63 6B 2E 63 0A 40 40 20 2D 35 34 30 2C 37      >heck.c@@@ -540,7<
0008B0: 20 2B 35 34 30 2C 36 20 40 40 20 69 6E 74 20 72      > +540,6 @@ int r<
0008C0: 65 73 63 61 6E 5F 70 61 72 74 69 74 69 6F 6E 73      >escan_partitions<
0008D0: 28 73 74 72 75 63 74 20 67 65 6E 64 69 73 6B 20      >(struct gendisk <
0008E0: 2A 64 69 0A 20 09 09 69 66 20 28 66 72 6F 6D 20      >*di@ @@if (from <
0008F0: 2B 20 73 69 7A 65 20 3E 20 67 65 74 5F 63 61 70      >+ size > get_cap<
000900: 61 63 69 74 79 28 64 69 73 6B 29 29 20 7B 0A 20      >acity(disk)) {@ <
000910: 09 09 09 70 72 69 6E 74 6B 28 4B 45 52 4E 5F 45      >@@@printk(KERN_E<
000920: 52 52 20 22 20 25 73 3A 20 70 25 64 20 65 78 63      >RR " %s: p%d exc<
000930: 65 65 64 73 20 64 65 76 69 63 65 20 63 61 70 61      >eeds device capa<
000940: 63 69 74 79 5C 6E 22 2C 0A 20 09 09 09 09 64 69      >city\n",@ @@@@di<
000950: 73 6B 2D 3E 64 69 73 6B 5F 6E 61 6D 65 2C 20 70      >sk->disk_name, p<
000960: 29 3B 0A 2D 09 09 09 63 6F 6E 74 69 6E 75 65 3B      >);@-@@@continue;<
000970: 0A 20 09 09 7D 0A 20 09 09 72 65 73 20 3D 20 61      >@ @@}@ @@res = a<

your reply:

000C50: A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0      >@@@@@@@@@@@@@@@@<
000C60: A0 70 72 69 6E 74 6B 28 4B 45 52 4E 5F 45 52 52      >@printk(KERN_ERR<
000C70: 20 22 20 25 73 3A 20 70 25 64 20 65 78 63 65 65      > " %s: p%d excee<
000C80: 64 73 20 64 65 76 69 63 65 20 63 61 70 61 63 69      >ds device capaci<
000C90: 74 79 5C 6E 22 2C 0A 3E 20 A0 A0 A0 A0 A0 A0 A0      >ty\n",@> @@@@@@@<
000CA0: A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0      >@@@@@@@@@@@@@@@@<
000CB0: A0 A0 A0 A0 A0 A0 A0 A0 A0 64 69 73 6B 2D 3E 64      >@@@@@@@@@disk->d<
000CC0: 69 73 6B 5F 6E 61 6D 65 2C 20 70 29 3B 0A 3E 20      >isk_name, p);@> <
000CD0: 2D A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0      >-@@@@@@@@@@@@@@@<
000CE0: A0 A0 A0 A0 A0 A0 A0 A0 63 6F 6E 74 69 6E 75 65      ...
From: Alan Stern
Date: Friday, September 12, 2008 - 7:22 pm

In addition it deserves to be commented so that somebody doesn't go and 
change it yet again.

Alan Stern

--

From: Kay Sievers
Date: Wednesday, October 8, 2008 - 9:01 am

On Sat, Sep 13, 2008 at 1:36 AM, Andrew Morton

I was happy to see the original fix, as if causes real problems for
userspace, that the kernel creates invalid block devices with a size
that exceeds the physical disk. So, if we can not make that partition
to skip, like original patch did, because of broken hardware we don't
want to break, can we make it at least do the obvious thing, and limit
the partition with the broken entry to the size of the underlying
hardware. So that the kernel does no longer pretend to have devices of
a size which the hardware does not have.

It breaks all sort of userspace tools which read the "size" file in
sysfs, or do BLKGETSIZE and we get, if we are lucky, only:
  attempt to access beyond end of device
  sda: rw=0, want=1953535936, limit=976773168
in other cases it might cause corruption, or lead mkfs to create
devices which will fail when they get used.

Thanks,
Kay
--

From: Kay Sievers
Date: Thursday, October 9, 2008 - 7:04 am

From: Kay Sievers <kay.sievers@vrfy.org>
Subject: block: sanitize invalid partition table entries

We currently follow blindly what the partition table lies about the
disk, and let the kernel create block devices which can not be accessed.
Trying to identify the device leads to kernel logs full of:
  sdb: rw=0, want=73392, limit=28800
  attempt to access beyond end of device

Here is an example of a broken partition table, where sda2 starts
behind the end of the disk, and sdb3 is larger than the entire disk:
  Disk /dev/sdb: 14 MB, 14745600 bytes
  1 heads, 29 sectors/track, 993 cylinders, total 28800 sectors
     Device Boot      Start         End      Blocks   Id  System
  /dev/sdb1              29        7800        3886   83  Linux
  /dev/sdb2           37801       45601        3900+  83  Linux
  /dev/sdb3           15602       73402       28900+  83  Linux
  /dev/sdb4           23403       28796        2697   83  Linux

The kernel creates these completely invalid devices, which can not be
accessed, or may lead to other unpredictable failures:
  grep . /sys/class/block/sdb*/{start,size}
  /sys/class/block/sdb/size:28800
  /sys/class/block/sdb1/start:29
  /sys/class/block/sdb1/size:7772
  /sys/class/block/sdb2/start:37801
  /sys/class/block/sdb2/size:7801
  /sys/class/block/sdb3/start:15602
  /sys/class/block/sdb3/size:57801
  /sys/class/block/sdb4/start:23403
  /sys/class/block/sdb4/size:5394

With this patch, we ignore partitions which start behind the end of the disk,
and limit partitions to the end of the disk if they pretend to be larger:
  grep . /sys/class/block/sdb*/{start,size}
  /sys/class/block/sdb/size:28800
  /sys/class/block/sdb1/start:29
  /sys/class/block/sdb1/size:7772
  /sys/class/block/sdb3/start:15602
  /sys/class/block/sdb3/size:13198
  /sys/class/block/sdb4/start:23403
  /sys/class/block/sdb4/size:5394

These warnings are printed to the kernel log:
  sdb: p2 ignored, start 37801 is behind the end of the disk
  sdb: p3 size 57801 limited ...
From: Jens Axboe
Date: Monday, October 13, 2008 - 2:01 am

Kay, I added this variant.

-- 
Jens Axboe

--

Previous thread: Partition check considered as error is breaking mounting in 2.6.27 by Herton Ronaldo Krzesinski on Friday, September 12, 2008 - 9:56 am. (4 messages)

Next thread: Re: Partition check considered as error is breaking mounting in 2.6.27 by Toralf on Friday, September 12, 2008 - 10:32 am. (1 message)