https://bugzilla.kernel.org/show_bug.cgi?id=25352

Summary: resizing ext4 will corrupt filesystem
Product: File System
Version: 2.5
Kernel Version: 2.6.37-rc6
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: ext4
AssignedTo: fs_ext4@kernel-bugs.osdl.org
ReportedBy: kees@outflux.net
Regression: Yes

Using resize2fs on an ext4 will result in a corrupted filesystem. This is a regression (obviously). I would expect "fsck" to be clean on a recently resized filesystem, but it is not:

Pass 5: Checking group summary information
Block bitmap differences: +(2621440--2621951) +(2654210--2655360) +(2686976--2687487) +(2719744--2720255) +(2752512--2753023) +(2785280--2785791) +(2818048--2818559) +(2850816--2851327) +(2883584--2884095) +(2916352--2916863) +(2949120--2949631) +(2981888--2982399) +(3014656--3015167) +(3047424--3047935) +(3080192--3080703) +(3112960--3113471) +(3145728--3146239) +(3178496--3179007) +(3211264--3211775) +(3244032--3244543) +(3276800--3277311) +(3309568--3310079) +(3342336--3342847) +(3375104--3375615) +(3407872--3408383) +(3440640--3441151) +(3473408--3473919) +(3506176--3506687) +(3538944--3539455) +(3571712--3572223) +(3604480--3604991) +(3637248--3637759) +(3670016--3670527) +(3702784--3703295) +(3735552--3736063) +(3768320--3768831) +(3801088--3801599) +(3833856--3834367) +(3866624--3867135) +(3899392--3899903) etc

Reproducer script attached...

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
--
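Each entry in the pass 5 output uses e2fsck's "+(start--end)" notation; a leading "+" marks blocks that e2fsck finds in use but that the on-disk block bitmap leaves unmarked. The sketch below parses such tokens and maps them to block groups. It assumes 32768 blocks per group (the usual mke2fs value for 4 KiB blocks); the report does not state the geometry, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* ASSUMPTION: 32768 blocks per group, the usual mke2fs value for 4 KiB
 * blocks. The bug report does not state the filesystem geometry. */
#define BLOCKS_PER_GROUP 32768UL

/* Parse one "+(start--end)" token from e2fsck's "Block bitmap differences"
 * line. Returns 1 on success, 0 if the token does not match. */
int parse_range(const char *tok, unsigned long *start, unsigned long *end)
{
    if (sscanf(tok, "+(%lu--%lu)", start, end) != 2)
        return 0;
    return *start <= *end;
}

/* Number of blocks in an inclusive range. */
unsigned long range_len(unsigned long start, unsigned long end)
{
    assert(start <= end);
    return end - start + 1;
}

/* Block group that contains a block (under the assumption above). */
unsigned long group_of(unsigned long block)
{
    return block / BLOCKS_PER_GROUP;
}

/* Offset of a block from the start of its group. */
unsigned long offset_in_group(unsigned long block)
{
    return block % BLOCKS_PER_GROUP;
}

/* Print one summary line per token; unparsable tokens go to stderr. */
void describe_ranges(const char *const *toks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long s, e;
        if (!parse_range(toks[i], &s, &e)) {
            fprintf(stderr, "cannot parse %s\n", toks[i]);
            continue;
        }
        printf("%s: %lu blocks, group %lu, offset %lu\n",
               toks[i], range_len(s, e), group_of(s), offset_in_group(s));
    }
}
```

Under that assumption, +(2621440--2621951) is 512 blocks starting at offset 0 of group 80, and +(2654210--2655360) is 1151 blocks starting at offset 2 of group 81.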
--- Comment #1 from Kees Cook <kees@outflux.net>  2010-12-20 20:57:53 ---
Created an attachment (id=41062)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=41062)
script that will demo a corrupted ext4 after resize

This has already been reported to Ubuntu, but was reproduced with an upstream kernel, so I've opened this report as well.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/692704
--- Comment #2 from Theodore Tso <tytso@mit.edu>  2010-12-21 03:33:48 ---
Created an attachment (id=41142)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=41142)
Proposed patch

Yes, this is a regression new to 2.6.37-rc1, which was introduced by commit a31437b85: ext4: use sb_issue_zeroout in setup_new_group_blocks.

When we replaced the loop zeroing the inode table blocks with sb_issue_zeroout, we accidentally also removed this little tidbit:

-                       ext4_set_bit(bit, bh->b_data);

... which was responsible for setting the block allocation bitmap to reserve the block descriptor blocks and inode table blocks. Oops...

I believe this patch should fix things.
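To make the failure mode concrete, here is a toy user-space model of the regression described above. It is not kernel code and not the proposed patch: every name is hypothetical, and toy_bulk_zeroout only plays the role that sb_issue_zeroout has in the description (zeroing a range of blocks without touching the allocation bitmap). The point is that once the per-block loop is gone, nothing marks the blocks as in use unless that is done separately.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Toy sizes, chosen only to keep the model small. */
#define TOY_NBLOCKS 64u
#define TOY_BLKSZ   16u

/* HYPOTHETICAL toy block group: an allocation bitmap plus block contents. */
struct toy_group {
    unsigned char bitmap[TOY_NBLOCKS / CHAR_BIT];
    unsigned char data[TOY_NBLOCKS * TOY_BLKSZ];
};

void toy_set_bit(unsigned char *map, unsigned bit)
{
    map[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));
}

int toy_test_bit(const unsigned char *map, unsigned bit)
{
    return (map[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1;
}

/* Pre-regression shape: one loop zeroes each block AND marks it in use. */
void toy_setup_old(struct toy_group *g, unsigned start, unsigned count)
{
    assert(start + count <= TOY_NBLOCKS);
    for (unsigned b = start; b < start + count; b++) {
        memset(&g->data[b * TOY_BLKSZ], 0, TOY_BLKSZ);
        toy_set_bit(g->bitmap, b);
    }
}

/* Stand-in for a bulk zeroing call: clears the blocks, ignores the bitmap. */
void toy_bulk_zeroout(struct toy_group *g, unsigned start, unsigned count)
{
    assert(start + count <= TOY_NBLOCKS);
    memset(&g->data[start * TOY_BLKSZ], 0, (size_t)count * TOY_BLKSZ);
}

/* Regressed shape: the loop became a bulk call, so the blocks are zeroed
 * but stay marked free, which is what fsck pass 5 then complains about. */
void toy_setup_regressed(struct toy_group *g, unsigned start, unsigned count)
{
    toy_bulk_zeroout(g, start, count);
}

/* One way to restore the reservation: bulk-zero, then set the bits. */
void toy_setup_fixed(struct toy_group *g, unsigned start, unsigned count)
{
    toy_bulk_zeroout(g, start, count);
    for (unsigned b = start; b < start + count; b++)
        toy_set_bit(g->bitmap, b);
}
```

Keeping the zeroing and the bitmap update as separate steps, as in toy_setup_fixed, lets the bulk call stay fast while making the reservation explicit.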
--- Comment #3 from Theodore Tso <tytso@mit.edu>  2010-12-21 04:05:20 ---
On Mon, Dec 20, 2010 at 08:56:46PM +0000, bugzilla-daemon@bugzilla.kernel.org

Yes, this is a regression new to 2.6.37-rc1, which was introduced by commit a31437b85: ext4: use sb_issue_zeroout in setup_new_group_blocks.

When we replaced the loop zeroing the inode table blocks with sb_issue_zeroout, we accidentally also removed this little tidbit:

-                       ext4_set_bit(bit, bh->b_data);

... which was responsible for setting the block allocation bitmap to reserve the block descriptor blocks and inode table blocks. Oops...

I believe this patch should fix things.

                        - Ted
--- Comment #4 from Kees Cook <kees@outflux.net>  2010-12-21 04:26:12 ---
Thanks for tracking it down!

After a fsck, I'm still seeing fs corruption, unfortunately:

[177266.375628] EXT4-fs error (device dm-1): htree_dirblock_to_tree:586: inode #12255304: block 88074025: comm rm: bad entry in directory: rec_len is smaller than minimal - offset=0(4096), inode=0, rec_len=0, name_len=0
[177266.375872] EXT4-fs error (device dm-1): htree_dirblock_to_tree:586: inode #12255304: block 88074026: comm rm: bad entry in directory: rec_len is smaller than minimal - offset=0(8192), inode=0, rec_len=0, name_len=0
[177266.376135] EXT4-fs error (device dm-1): empty_dir:1922: inode #12255304: block 88074025: comm rm: bad entry in directory: rec_len is smaller than minimal - offset=0(4096), inode=0, rec_len=0, name_len=0
[177266.376360] EXT4-fs error (device dm-1): empty_dir:1922: inode #12255304: block 88074026: comm rm: bad entry in directory: rec_len is smaller than minimal - offset=0(8192), inode=0, rec_len=0, name_len=0

fsck didn't notice this problem, but walking the tree seems to trigger it. I've been trying to clean it up by just removing the offending directory, but I figured I'd mention it since it seems to be a problem that fsck -f didn't see.
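The "rec_len is smaller than minimal" messages come from the kernel's sanity checks on directory entries. The sketch below approximates that kind of check so the log lines are easier to read: an entry has an 8-byte header (inode, rec_len, name_len, file_type) followed by the name, padded to a multiple of 4, so the smallest valid rec_len is 12 bytes and an entry with rec_len=0 fails immediately. The function names are hypothetical and the logic is a simplified model in the spirit of ext4_check_dir_entry(), not the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum record length for a name of name_len bytes: an 8-byte header
 * plus the name, rounded up to a multiple of 4. */
unsigned dir_rec_len(unsigned name_len)
{
    return (name_len + 8u + 3u) & ~3u;
}

/* Simplified directory-entry checks (HYPOTHETICAL helper). Returns NULL if
 * the entry looks sane, otherwise a message like those in the log above. */
const char *toy_check_dir_entry(unsigned rec_len, unsigned name_len,
                                unsigned offset_in_block, unsigned blocksize)
{
    assert(offset_in_block < blocksize);
    if (rec_len < dir_rec_len(1))
        return "rec_len is smaller than minimal";
    if (rec_len % 4 != 0)
        return "rec_len % 4 != 0";
    if (rec_len < dir_rec_len(name_len))
        return "rec_len is too small for name_len";
    if (offset_in_block + rec_len > blocksize)
        return "directory entry across blocks";
    return NULL;
}
```

With the values from the log (rec_len=0, name_len=0, offset 0 in a 4096-byte block), the first check already fails.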
Lukas Czerner <lczerner@redhat.com> changed:

What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |lczerner@redhat.com

--- Comment #5 from Lukas Czerner <lczerner@redhat.com>  2010-12-21 12:31:33 ---
Oops indeed. Ted, thanks for the patch, it seems to fix the problem completely.

-Lukas
--- Comment #6 from Lukas Czerner <lczerner@redhat.com>  2010-12-21 13:10:28 ---
Oops indeed. Ted, thanks for the patch, it seems to fix the problem completely.

-Lukas
Theodore Tso <tytso@mit.edu> changed:

What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |tytso@mit.edu

--- Comment #7 from Theodore Tso <tytso@mit.edu>  2010-12-21 14:19:17 ---
Kees, was this (comment #4) using your resize-corruption.sh script? After applying the patch I've enclosed, I've rerun your script, and it showed no problems. I then mounted the testfs file system and ran ls -lR on /mnt/test, and still no problems...
--- Comment #8 from Kees Cook <kees@outflux.net>  2010-12-21 18:03:21 ---
Ted, no, sorry; I didn't mean to confuse. That is just leftover corruption from my initial fs hit. I just thought I'd report the fact that fsck didn't notice it when cleaning up from the original corruption.

I.e. here's my timeline for this corruption:

- resize
- get errors in dmesg
- umount
- fsck -f (for half a day, cleans up tons)
- mount
- delete all of lost+found
- continue using fs
- more dmesg errors
- umount
- fsck -f (returns without error)
- mount
- continue using fs
- still dmesg errors
- rm offending directory completely
- no more errors

So it seemed like a flaw in fsck that it didn't find the bad directory, but since it was related to the corruption introduced by this kernel bug, I thought I'd bring it up in this thread.
--- Comment #9 from Theodore Tso <tytso@mit.edu>  2010-12-21 19:19:46 ---
Ah, thanks for the clarification. OK, I think I see what's going on. It's a difference in how e2fsck and the kernel treat a rec_len of 0 for block sizes smaller than 64k. It's an edge case, but it's one we should definitely fix. Thanks for pointing it out.
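Some background on why rec_len == 0 is special at all: the on-disk rec_len field is 16 bits wide, so it cannot hold 65536 directly. For 64 KiB blocks, a stored 0 or 65535 is taken to mean a record spanning the whole block; for smaller block sizes the stored value is used as-is, so 0 is simply an invalid length. The sketch below shows that decoding, modeled on e2fsprogs' ext2fs_get_rec_len() as best understood here; decode_rec_len is a hypothetical name, and the sketch does not reproduce the e2fsck fix discussed above.

```c
#include <assert.h>

#define MAX_REC_LEN 65535u /* (1 << 16) - 1 */

/* Decode a 16-bit on-disk rec_len for a given block size.
 * HYPOTHETICAL helper, modeled on the e2fsprogs encoding (an assumption). */
unsigned decode_rec_len(unsigned disk_len, unsigned blocksize)
{
    assert(disk_len <= 0xffffu);
    if (blocksize < 65536u)
        return disk_len;            /* 0 stays 0: an invalid, zero-length entry */
    if (disk_len == MAX_REC_LEN || disk_len == 0)
        return blocksize;           /* entry covers the whole 64 KiB block */
    /* Larger lengths keep their low two bits in bits 16..17. */
    return (disk_len & 65532u) | ((disk_len & 3u) << 16);
}
```

For a 4096-byte block, a stored 0 therefore decodes to 0, which any sanity check on minimum entry size must reject rather than interpret as "whole block".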
Rafael J. Wysocki <rjw@sisk.pl> changed:

What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |florian@mickler.org,
        |        |maciej.rutecki@gmail.com,
        |        |rjw@sisk.pl
Blocks  |        |21782

--- Comment #10 from Rafael J. Wysocki <rjw@sisk.pl>  2010-12-21 22:32:29 ---
Handled-By : Theodore Tso <tytso@mit.edu>
Rafael J. Wysocki <rjw@sisk.pl> changed:

What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |        |CODE_FIX

--- Comment #11 from Rafael J. Wysocki <rjw@sisk.pl>  2010-12-24 13:38:17 ---
Fixed by http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8a7411a243... .
Rafael J. Wysocki <rjw@sisk.pl> changed:

What    |Removed  |Added
----------------------------------------------------------------------------
Status  |RESOLVED |CLOSED
Martin Steigerwald <Martin@Lichtvoll.de> changed:

What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |Martin@Lichtvoll.de

--- Comment #12 from Martin Steigerwald <Martin@Lichtvoll.de>  2010-12-30 13:47:11 ---
I had a corrupted ext4 filesystem yesterday after doing a ThinkPad T42 BIOS update while the kernel was merely hibernated. After resuming from the BIOS update the kernel oopsed (whether as a consequence of the update or not, it did oops; I took a screenshot of it, some ACPI-related stuff AFAIR).

Now I wonder whether the issue was caused by my wanting to save a reboot and keep my uptime, or by the online resize a few days before, which I just didn't notice because I had not rebooted since then.

Could you have a short look at the following to see whether it might have been the same online resizing issue? I'd just like to know what might have caused that filesystem issue, because I doubt that my risk-taking approach to the BIOS update could have caused such a corruption. I will use the shutdown-and-reboot method for any subsequent BIOS updates anyway; that much I learned.

I have already recovered by rsync'ing changed files to my backup as far as possible, then recreating the Ext4 filesystem from scratch with mkfs.ext4 and restoring from backup. I no longer have the old state available, as I do not have a spare 220 GB to dd the filesystem to. So I'd just like to know whether the following hints at this online resizing issue or not. I have full output logs available on request.

This is with:

martin@shambhala:~> cat /proc/version
Linux version 2.6.37-rc7-tp42-ata-eh-dbg-dirty (martin@shambhala) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 PREEMPT Wed Dec 22 11:41:20 CET 2010

That is a plain 2.6.37-rc7 plus a libata debug patch to get to the cause of bug ...
--- Comment #13 from Martin Steigerwald <Martin@Lichtvoll.de>  2010-12-30 14:12:08 ---
Hmmm, the test script produces different fsck.ext4 output. But then my Ext4 filesystem had about two days to grow the initial corruption. And the syslog shows the first problems on the 27th of December, whereas I only did the BIOS update yesterday evening.
