Reiserfs mount time

Submitted by Enrico
on January 20, 2006 - 11:27am

After reading an article in Linux Gazette and citing his own experience, Jan Engelhardt asked on the LKML why reiserfs had the longest mount time of the filesystems tested. Jeffrey Mahoney explained:

ReiserFS caches bitmaps on mount and
for large file systems it can take a quite a long time. I've heard
reports of up to 15 minutes on multi-TB file systems

Patches that solve this issue have existed since kernel 2.6.12. According to Mahoney, "Hans was opposed to them initially, but emailed in September saying he decided that they should be accepted after all".

Jeffrey Mahoney sent the patches again to the LKML and to the reiserfs mailing list, where Hans Reiser, after some discussion, replied that they were OK.
They are now pending inclusion in -mm.


Subject:    reiserfs mount time
From:       Jan Engelhardt 
Date:       2006-01-08 22:24:02

Hi,


brought to attention on an irc channel, reiser seems to have the largest 
mount times for big partitions. I see this behavior on at least two 
machines (160G, 250G) and one specially-crafted virtual machine
(a 1.9TB disk / 1.9TB partition - took somewhere over 120 seconds).
Here's a dig http://linuxgazette.net/122/misc/piszcz/group001/image002.png 
from http://linuxgazette.net/122/TWDT.html#piszcz
So, any hint from the reiserfs developers how come reiserfs takes so long?
Standard mkreiserfs options (none extra passed).


Jan Engelhardt

From:       Jeffrey Mahoney
Date:       2006-01-14 20:40:50

Jan Kara wrote:
>   If I remember correctly, the problem is reiserfs loads bitmaps on mount
> and that takes most of the time. Jeff Mahoney has
> patches fixing this but I think Hans rejected them because he wants only
> bugfixes in reiser3...

Yeah, that's the right analysis. ReiserFS caches bitmaps on mount and
for large file systems it can take a quite a long time. I've heard
reports of up to 15 minutes on multi-TB file systems.

As far as these patches getting accepted, Hans was opposed to them
initially, but emailed in September saying he decided that they should
be accepted after all. These were against 2.6.12.1, and I remerged and
split the patches out yesterday evening. I'm going to do a bit of
testing and send them out on Monday for inclusion in -mm.

-Jeff

--
Jeff Mahoney
SUSE Labs


List:       reiserfs
Subject:    Re: [PATCH 4/4] reiserfs: on-demand bitmap loading
From:       Hans Reiser
Date:       2006-01-17 19:03:31

Jeff Mahoney wrote:

> Hans Reiser wrote:
>
> >Are you saying that you allow bitmaps to be unloaded? If yes, how about
> >making that a separate option, and not the default?
>
>
> They're released, yes. Whether or not they're unloaded is up to the rest
> of the system, vm pressure, etc to determine. This isn't any different
> than the patch I posted before, which you ultimately approved in
> September.
>
> If the bitmaps are to be pinned at all, I'd prefer to make *that* the
> option. ReiserFS's behavior with respect to bitmaps is inconsistent with
> every other Linux file system. I'd prefer to make the dynamic bitmaps
> the default, and if you really must, add an option to continue to pin
> them.
>
> The fact remains that the bitmap blocks are infrequently accessed in
> comparison with other bits of metadata that we don't pin. They're not
> accessed at all in a read-only environment, and barely accessed in a
> light-write workload. If the bitmaps are truly in demand for heavy
> writing, the caches should keep those blocks in memory, the same as they
> do on other file systems. If another file system, application, or kernel
> subsystem needs that memory more, it should be available for it to claim.

Ok.
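
To get a rough feel for why reading every bitmap at mount time gets expensive on large volumes, here is a minimal Python sketch that counts the bitmap blocks a file system of a given size would have. It assumes a 4096-byte block size and one bitmap block per (blocksize * 8) blocks, which is the spacing Jeff Mahoney describes for v3.5/v3.6. The function names and the 8 ms per-read cost are hypothetical, chosen only for illustration; the time figure is not a measurement.

```python
BLOCK_SIZE = 4096  # bytes; assumed block size, not stated in the thread


def bitmap_block_count(fs_bytes, block_size=BLOCK_SIZE):
    """Number of bitmap blocks needed to cover a file system of fs_bytes.

    Each bitmap block holds block_size * 8 bits, one bit per block.
    """
    total_blocks = fs_bytes // block_size
    blocks_per_bitmap = block_size * 8
    # Ceiling division: a partially covered tail still needs its own bitmap.
    return -(-total_blocks // blocks_per_bitmap)


def eager_load_seconds(bitmap_blocks, ms_per_read=8.0):
    """Illustrative cost of one scattered read per bitmap block.

    ms_per_read is an assumed figure for a disk seek plus read.
    """
    return bitmap_blocks * ms_per_read / 1000.0


if __name__ == "__main__":
    count = bitmap_block_count(2 * 2**40)  # a 2 TiB volume
    print(count)                           # 16384
    print(eager_load_seconds(count))       # seconds under the assumed 8 ms per read
```

Under these assumptions the bitmaps sit 128 MiB apart on disk, so loading them all at mount means thousands of scattered reads on a multi-TB volume, while on-demand loading only reads the bitmaps that are actually used.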

Brian Kroth (not verified) on May 12, 2006 - 6:45pm

I know this was a while ago, but any word on those patches? Do they work? Are they stable? Can they be submitted to the mainline?
Thanks,
Brian

on May 26, 2006 - 9:20am

It seems they need some debugging. Quoting Jeff Mahoney from http://www.mail-archive.com/reiserfs-list@namesys.com/msg20986.html:

"The patches are written, they just need some debugging. Unfortunately,
they're kind of on the back burner right now due to other work constraints."

"To elaborate, there are places where the bitmap allocation code isn't
allowed to sleep. The existing code doesn't sleep, since all the bitmaps
are cached in memory. Allowing the bitmap blocks to be dropped from the
cache means that sometimes the blocks will need to be re-read, causing a
schedule to occur while the data is read from disk.

It is the schedule itself that causes the problem, not the bitmap block
getting re-read from disk. By backing out the patch that actually loads
the bitmaps dynamically and adding an msleep(30) in the path where the
bitmap blocks are used, I can reproduce the same failure as if the
blocks were loaded dynamically. In both cases it can take anywhere from
a few hours to a few days to trigger on a heavily loaded machine. My
test case has been a rather unrealistic 50-process copy-delete loop from
/usr/include to a test file system sized to create bitmap contention on
a 4 CPU machine with the RAM limited to 128 MB.

At any rate, even with the relative infrequency of the failures, it's
still a regression and I wouldn't consider submitting to mainline
(again) until they've been fixed."

"I'm embarrassed to say that after staring at the code all afternoon, my
analysis was completely wrong. I enabled CONFIG_REISERFS_CHECK to try to
catch where it was failing earlier, and the code panicked immediately
when trying to write to the file system. The error was the same as I was
seeing in tests, just caught earlier.

It turns out that the third patch in the series was flawed and didn't
treat the v3.[56] bitmap 0 specially. Every other bitmap block on the
file system is (blocksize * 8 * bitmap number), except bitmap 0 which is
the block after the superblock. I was mistakenly loading block 0 as the
bitmap block, which is obviously wrong.

This explains why I couldn't seem to find any problems related to
scheduling as well as why the panics always involved block 8211. I'll be
running some overnight stress tests as well as performing a full audit
of the code which was essentially moved around to ensure more of these
stupid bugs didn't sneak in.

I'll post the updated patches tomorrow, but they still have the caveat
that the error handling infrastructure just isn't there yet. I guess
I'll resurrect those next. :)"
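
The bitmap 0 special case described in the last quote can be sketched in a few lines of Python. This is only an illustration of the rule as stated there, not the actual kernel code: the function name is hypothetical, and the 64 KiB superblock offset and 4096-byte block size are assumptions that do not appear in the quote.

```python
SUPERBLOCK_OFFSET = 64 * 1024  # bytes; assumed on-disk offset of the v3.5/v3.6 superblock


def bitmap_block_number(bitmap_index, block_size=4096):
    """Return the block number that holds the given bitmap.

    Bitmap 0 lives in the block right after the superblock; every other
    bitmap n lives at block (block_size * 8 * n).
    """
    if bitmap_index < 0:
        raise ValueError("bitmap_index must be non-negative")
    if bitmap_index == 0:
        # Special case: applying the general formula here would yield block 0.
        return SUPERBLOCK_OFFSET // block_size + 1
    return block_size * 8 * bitmap_index


if __name__ == "__main__":
    print(bitmap_block_number(0))  # 17 under the assumptions above
    print(bitmap_block_number(1))  # 32768
```

Without the special case the general formula gives block 0 for bitmap 0, which is the mistake the quote describes.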
