| From | Subject | Date |
|---|---|---|
| Eric Sandeen | [PATCH] fix oops in ext4_mb_release_group_pa tracing
Our QA reported an oops in the ext4_mb_release_group_pa tracing,
and Josef Bacik pointed out that it was because we may have a
non-null but uninitialized ac_inode in the allocation context.
I can reproduce it when running xfstests with ext4 tracepoints on,
on a CONFIG_SLAB_DEBUG kernel.
We call trace_ext4_mb_release_group_pa from 2 places,
ext4_mb_discard_group_preallocations and
ext4_mb_discard_lg_preallocations.
In both cases we allocate an ac as a container just for tracing (!)
and ...
| Aug 19, 10:59 am 2010 |
| Josef Bacik | Re: [PATCH] fix oops in ext4_mb_release_group_pa tracing
Reviewed-by: Josef Bacik <josef@redhat.com>
Thanks,
Josef
| Aug 19, 11:02 am 2010 |
| Ted Ts'o | Re: buggy EOFBLOCKS_FL handling
Maybe. I'd need to do some testing to see what percentage of the
"takes hours longer" is caused by needing to fix truly vast numbers of
inodes, versus the fact that writing the e2fsck log file was taking a
huge amount of time. I'm not sure that asking the user, "I've tried
fixing 100 of these inodes, and it looks like there are many more;
want to skip checking for the rest?" is all that great (i.e., a "go
into automatic 'no' mode for this question").
The other possibility is that I'd make it ...
| Aug 19, 10:11 am 2010 |
| Ted Ts'o | Re: buggy EOFBLOCKS_FL handling
My current thinking is to have an EOFBLOCKS_relaxed mode setting in
/etc/e2fsck.conf which controls whether we test for this case or not.
Technically it *is* an error, but if there are file systems with a
large number of files in this state, running e2fsck could take a
***very*** long time (potentially, hours longer than would otherwise
be expected). Hopefully once the bug fix gets pushed out, eventually
we'll be able to turn this feature off. (Where eventually might be a
year or two, given ...
| Aug 19, 7:44 am 2010 |
| Theodore Ts'o | buggy EOFBLOCKS_FL handling
It looks like how we handle the EOFBLOCKS_FL flag is buggy. This means
that when we fallocate a file to have 128k using the KEEP_SIZE flag, and
then write exactly 128k, the EOFBLOCKS_FL isn't getting cleared
correctly.
This is bad, because e2fsck will then complain about that inode. If you
have a large number of inodes that are written with fallocate using
KEEP_SIZE, and then fill them up to their expected size, e2fsck will
potentially complain about a _huge_ number of inodes.
A proposed ...
| Aug 18, 8:01 pm 2010 |
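To reproduce the scenario described in the report, a minimal userspace sketch might look like the following. It assumes Linux with glibc's `fallocate()` and `FALLOC_FL_KEEP_SIZE`; the function name `keep_size_then_fill` and the `FILL_SIZE` constant are illustrative and do not come from the thread. Running it on an affected ext4 file system, then unmounting and running `e2fsck -fn`, would be one way to check whether the inode gets flagged.

```c
#define _GNU_SOURCE          /* for fallocate() and FALLOC_FL_KEEP_SIZE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define FILL_SIZE (128 * 1024)  /* illustrative: the 128k from the report */

/* Hypothetical reproducer: preallocate FILL_SIZE bytes with
 * FALLOC_FL_KEEP_SIZE (i_size stays 0), then write exactly FILL_SIZE
 * bytes so i_size reaches the end of the preallocation.
 * Returns 0 on success, -1 on failure with errno preserved
 * (e.g. EOPNOTSUPP if the file system lacks fallocate support). */
int keep_size_then_fill(const char *path)
{
    static char buf[FILL_SIZE];
    assert(path != NULL);

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    memset(buf, 'x', sizeof(buf));
    /* Short-circuits on the first failing step, leaving its errno. */
    int ok = fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, FILL_SIZE) == 0 &&
             write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf) &&
             fsync(fd) == 0;

    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ok ? 0 : -1;
}
```

The sketch only sets up the on-disk state; whether the stale flag is left behind depends on the kernel under test.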
| Theodore Ts'o | [PATCH, RFC] ext4: Fix EOFBLOCKS_FL handling
It turns out we have several problems with how EOFBLOCKS_FL is
handled. First of all, there was a fencepost error where we were not
clearing the EOFBLOCKS_FL when we fill in the last uninitialized block,
but rather when we allocate the next block _after_ the uninitialized
block. Secondly, we were not testing to see if we needed to clear the
EOFBLOCKS_FL when writing to the file O_DIRECT or when we were converting
an uninitialized block (which is the most common case).
Google-Bug-Id: ...
| Aug 18, 8:04 pm 2010 |
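The fencepost error is easier to see with block numbers. This is a hypothetical sketch, not the ext4 code: `should_clear_eofblocks()` and its parameters are invented for illustration, and the example assumes 4 KiB blocks, so a 128k `fallocate` covers logical blocks 0 through 31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not the ext4 implementation): a write or an
 * uninitialized-extent conversion has just covered logical blocks
 * [lblk, lblk + len); last_alloc_lblk is the final block allocated
 * past EOF by fallocate(KEEP_SIZE). Returns true once EOFBLOCKS_FL
 * can be cleared. */
bool should_clear_eofblocks(uint32_t lblk, uint32_t len,
                            uint32_t last_alloc_lblk)
{
    assert(len > 0);  /* callers only ask about non-empty ranges */

    /* Last block touched by this range, inclusive. Computed in 64 bits
     * so that lblk + len cannot wrap. */
    uint64_t last_written = (uint64_t)lblk + len - 1;

    /* Filling in the last allocated block is enough. The fencepost
     * version, last_written > last_alloc_lblk, only fires once the
     * block *after* the preallocation is written. */
    return last_written >= last_alloc_lblk;
}
```

Writing exactly 128k corresponds to `should_clear_eofblocks(0, 32, 31)`: the last written block is 31, so the corrected comparison clears the flag, while the off-by-one `31 > 31` leaves it set. Per the patch description, the same check would also need to run on the O_DIRECT and uninitialized-block conversion paths.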
| Andreas Dilger | Re: buggy EOFBLOCKS_FL handling
To me this falls into the class of "silently fix our mistake" kind of problem, similar to what we did in the Lustre e2fsprogs with the extent "_hi" field not being initialized in early versions of the extent patch.
If the slowdown is due to actually updating thousands of such inodes (vs. just printing the error on the screen) you could cap the number of inodes fixed for this problem at some limit, and then every time a full e2fsck is run it would fix a bunch more.
Presumably an updated kernel ...
| Aug 19, 11:33 am 2010 |
| Andreas Dilger | Re: buggy EOFBLOCKS_FL handling
Probably e2fsck also shouldn't complain if EOFBLOCKS_FL is set, but the i_size is within the range implied by i_blocks.
Cheers, Andreas
| Aug 18, 10:13 pm 2010 |
| Eric Sandeen | Re: buggy EOFBLOCKS_FL handling
Oh it can get fixed during rhel6's lifetime, certainly. :)
Regarding a conf file setting, I'd really rather not have another knob
that is non-obvious to the user.
Maybe e2fsck could tally these and, after (I dunno) 10 or 20 or so, ask
whether it should keep flagging them or just go into "yes" mode for
the rest of the inodes with that problem?
-Eric
| Aug 19, 10:03 am 2010 |
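Eric's tally idea can be sketched as a small state machine per problem type. Everything here is hypothetical: `struct problem_tally`, `tally_next()` and `tally_answer()` are invented names, and e2fsck's actual problem-reporting machinery is not modeled. The caller prompts normally for the first `threshold` instances, then asks once whether to fix the rest without asking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-problem state; e2fsck's real problem handling
 * is not modeled here. */
struct problem_tally {
    unsigned int seen;       /* instances of this problem so far */
    unsigned int threshold;  /* prompt individually this many times */
    bool auto_yes;           /* user chose "yes" for the rest */
};

enum tally_action {
    TALLY_PROMPT,        /* ask about this inode as usual */
    TALLY_ASK_AUTO_YES,  /* threshold passed: ask once whether to fix
                            all remaining instances without asking */
    TALLY_FIX_SILENTLY   /* user already chose "yes" for the rest */
};

/* Called once per inode that shows the problem. */
enum tally_action tally_next(struct problem_tally *t)
{
    assert(t->threshold > 0);
    if (t->auto_yes)
        return TALLY_FIX_SILENTLY;
    t->seen++;
    /* Ask the "yes to all?" question exactly once, on the first
     * instance past the threshold. */
    if (t->seen == t->threshold + 1)
        return TALLY_ASK_AUTO_YES;
    return TALLY_PROMPT;
}

/* Record the answer to TALLY_ASK_AUTO_YES. With fix_rest == false,
 * the caller keeps prompting for every later instance. */
void tally_answer(struct problem_tally *t, bool fix_rest)
{
    t->auto_yes = fix_rest;
}
```

On the instance that returned `TALLY_ASK_AUTO_YES`, the caller would fix it silently after a "yes" and prompt for it after a "no". Flipping `auto_yes` to an automatic "no" would give the variant Ted mentions.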
| Ted Ts'o | Re: Memory allocation failed, e2fsck: aborted
Something which *might* help (but will take a long time) is to add to
your /etc/e2fsck.conf (if you have one; if not, create one with these
contents):
[scratch_files]
directory = /var/cache/fsck
(And then make sure /var/cache/fsck exists.)
Unfortunately, as it turns out, tdb (from Samba) doesn't scale as well
as I would have liked, so it's on my todo list to replace this with
something else. The problem with berk_db is that it has non-standard
interfaces and varies from version to version. So ...
| Aug 18, 5:54 pm 2010 |
| Ted Ts'o | Re: Memory allocation failed, e2fsck: aborted
As I recall, you're on a 32-bit machine, right? If so, a limitation
you may run into is simply running out of address space. If it's not
an address space issue, we don't need to mmap anything; you could just
try enabling swap, and use the existing e2fsck code.
(I had assumed you had tried that before suggesting you use the
scratch_files tdb approach....)
- Ted
| Aug 19, 10:16 am 2010 |
| Andre Noll | Re: Memory allocation failed, e2fsck: aborted
Thanks for the hint. It has been running for an hour now and I will report
back tomorrow. ATM, it's at 1% and the two files in /var/cache/fsck
Silly question: Would it be possible to simply mmap a large enough
file for the data and use, e.g., rbtrees for the lookups? If so,
osl [1] could probably be an option. It's very simple but likely too
Hey, I read this posting back then, and I agree with what you say.
However, we are quite happy with our hard link based backup and use
it to "snapshot" ...
| Aug 19, 6:10 am 2010 |
| Andre Noll | Re: Memory allocation failed, e2fsck: aborted
But e2fsck runs entirely in user space, so all memory should be
I'm not sure I can follow. Are you saying we currently allocate two
struct ext2_icount for a directory inode even if . and .. are the
only two references? So we could just omit this allocation in the
common icount == 2 case because we know it is a directory inode
I'm interested in having a look at the icount structure and seeing what
can be done to reduce memory usage. Here's a first question: There is
ext2fs_create_icount() ...
| Aug 19, 6:01 am 2010 |
| Andreas Dilger | Re: Memory allocation failed, e2fsck: aborted
One way this could be fixed fairly easily (which would presumably allow swap to be used) is to have a 2-level (or N-level) tree of allocations for the icount->list array, with the first level being just an array of pointers to individually-allocated arrays of ext2_icount_el. The sub-arrays can be some reasonable size (maybe 64kB), which would give us a fan-out of 64k / 8 = 8k, and if the top-level array is (re-)allocated in chunks of, say, 64 pointers, the number of times the ...
| Aug 19, 12:03 pm 2010 |
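A minimal sketch of the two-level layout Andreas describes, assuming an 8-byte element (inode number plus count) to match his 64k / 8 = 8k arithmetic. The struct and function names are hypothetical, not e2fsprogs code. The sketch covers only storage and index math (append, lookup by position, free), not inserting into the middle of the list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed 8-byte element: 64 KiB / 8 bytes = 8192 entries per sub-array. */
struct icount_el {
    uint32_t ino;
    uint32_t count;
};

#define SUB_BYTES   (64 * 1024)
#define SUB_ENTRIES (SUB_BYTES / sizeof(struct icount_el))  /* 8192 */
#define TOP_CHUNK   64  /* grow the pointer array 64 slots at a time */

struct icount_list {
    struct icount_el **sub;  /* first level: pointers to sub-arrays */
    size_t nsub;             /* sub-arrays allocated */
    size_t top_cap;          /* capacity of the pointer array */
    size_t count;            /* elements in use */
};

/* Element i lives in sub-array i / SUB_ENTRIES at offset i % SUB_ENTRIES. */
struct icount_el *icount_at(struct icount_list *l, size_t i)
{
    assert(i < l->count);
    return &l->sub[i / SUB_ENTRIES][i % SUB_ENTRIES];
}

/* Append one element. At most one 64 KiB sub-array and the small pointer
 * array are ever (re)allocated, never one huge contiguous block.
 * Returns 0 on success, -1 on allocation failure. */
int icount_append(struct icount_list *l, uint32_t ino, uint32_t count)
{
    if (l->count == l->nsub * SUB_ENTRIES) {
        if (l->nsub == l->top_cap) {
            size_t cap = l->top_cap + TOP_CHUNK;
            struct icount_el **p = realloc(l->sub, cap * sizeof(*p));
            if (!p)
                return -1;
            l->sub = p;
            l->top_cap = cap;
        }
        l->sub[l->nsub] = malloc(SUB_BYTES);
        if (!l->sub[l->nsub])
            return -1;
        l->nsub++;
    }
    struct icount_el *e = &l->sub[l->count / SUB_ENTRIES][l->count % SUB_ENTRIES];
    e->ino = ino;
    e->count = count;
    l->count++;
    return 0;
}

void icount_free(struct icount_list *l)
{
    for (size_t i = 0; i < l->nsub; i++)
        free(l->sub[i]);
    free(l->sub);
    l->sub = NULL;
    l->nsub = l->top_cap = l->count = 0;
}
```

With these sizes, one growth of the pointer array covers 64 × 8192 = 524,288 elements, so reallocations of the top level stay rare even for very large inode counts.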
| previous day | today | next day |
|---|---|---|
| August 18, 2010 | August 19, 2010 | August 20, 2010 |
