On Friday 28 September 2007 18:02, Ingo Molnar wrote:
I'll join you on your soapbox ;)
This is a problem with the testers. The barrier for *reporting* problems
is really low. And if they're regressions, then there is an almost 100%
chance of them being solved. If we're simply missing the bug reports, I'll
again ask for a linux-bugs list which is clear of other crap.
Or are people scared of reporting regressions from previous experience?
I haven't seen any reason for this and linux-mm is probably one of the
most polite lists I have read (when it comes to interacting with
non-developers, anyway ;)).
I don't see this as a big problem. The "better Linux MM" is implemented
by the patch one is proposing -- whether it be a one liner bugfix, or a
complete rewrite. I think it _would_ be really nice to run all interesting
workloads on various Linux 2.6 and 2.4 kernels, various patchsets, and
other open source OSes for the regression testing and cross-pollination
aspects... however this brings us to the main problem:
Devising useful tests and metrics, and running them.
The alternative is completely unscientific chaos. OK, for performance
heuristics, it is actually a much grayer "you show that your patch
improves something / doesn't harm others" -- this is pretty tough for
mm patches at the moment. Maybe it could be better, but at the end
of the day, if something is worth merging then we should be able to
actually prove (or have a pretty high confidence) that it is good. If not,
then we don't want to merge it, by definition.
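One rough way to put a number on "improves something / doesn't harm others" is to repeat a benchmark several times on the baseline and on the patched kernel, then compute Welch's t statistic over the two sets of timings. This is a minimal Python sketch, assuming the timings are already collected as lists of floats. The function names are hypothetical, and Welch's test is just one of many possible choices, not an established mm process:

```python
"""Hypothetical sketch: compare repeated benchmark runs of a baseline and a
patched kernel using Welch's t statistic (unequal-variance two-sample test)."""
import math
import statistics


def welch_t(baseline, patched):
    """Welch's t statistic for two independent samples.

    Positive values mean the patched mean is larger than the baseline mean.
    Each sample needs at least two runs (statistics.variance requires it),
    and if both samples have zero variance the division below fails.
    """
    m1, m2 = statistics.mean(baseline), statistics.mean(patched)
    v1, v2 = statistics.variance(baseline), statistics.variance(patched)
    se = math.sqrt(v1 / len(baseline) + v2 / len(patched))
    return (m2 - m1) / se


def welch_df(baseline, patched):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    a = statistics.variance(baseline) / len(baseline)
    b = statistics.variance(patched) / len(patched)
    return (a + b) ** 2 / (a * a / (len(baseline) - 1) + b * b / (len(patched) - 1))
```

The resulting t is judged against a t distribution with welch_df degrees of freedom. scipy.stats.ttest_ind(baseline, patched, equal_var=False) performs the same Welch test and also reports a p-value, with the sign reversed because it subtracts the second sample from the first.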
In the case of the VM, this comes right back to the difficulty of getting a
range of meaningful tests. We can definitely improve this situation a
great deal, but like the scheduler, the *real* "tests" simply have to be
done by users. And unfortunately, a lot of them don't actually test VM
changes for a long time after they're merged. This would actually be one
area where it might make a lot of sense to coordinate more significant
VM changes with distro release cycles (maybe?)
The recent swap prefetch debate showed exactly how the process _should_
work. We discovered a serious issue with metadata caching behaviour, and
also the active-inactive list balancing change that makes use-once much
less effective.
Both were discovered with the metrics that are already there and compiled
in, and basically required a cat /proc/vmstat ; run-workload ; cat /proc/vmstat.
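The cat / run / cat workflow above can be scripted so that only the counters that changed are printed. This is a minimal Python sketch, assuming a Linux system where /proc/vmstat holds one "name value" pair per line. The script name vmstat_delta.py and its functions are hypothetical, not an existing kernel tool:

```python
#!/usr/bin/env python3
"""Hypothetical helper: snapshot /proc/vmstat, run a workload, snapshot again,
and print the counters that changed."""
import subprocess
import sys


def parse_vmstat(text):
    """Parse /proc/vmstat content ("name value" per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip anything that isn't a "name value" pair
            counters[parts[0]] = int(parts[1])
    return counters


def vmstat_delta(before, after):
    """Return {name: after - before} for counters that changed.

    Counters present only in `before` are ignored.
    """
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value != before.get(name, 0)
    }


def read_vmstat(path="/proc/vmstat"):
    with open(path) as f:
        return parse_vmstat(f.read())


def main(command):
    before = read_vmstat()
    subprocess.run(command, check=False)  # run the workload to completion
    after = read_vmstat()
    for name, delta in sorted(vmstat_delta(before, after).items()):
        print(f"{name:30} {delta:+d}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

For example, python3 vmstat_delta.py ./run-workload prints the change in each counter (pgfault, pswpin, and so on) across one run of the workload. Gauge-style entries such as nr_free_pages show a net change in a current value rather than a count of events, so they need to be read differently from the event counters.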
[ You understand why I wanted to explore any possible underlying problems
independent of swap prefetch, right? If you'd just happened to be running on
a laptop, or your pagecache / buffercache / slab cache just happened to be
full of useless crap already (as in: something that updatedb might cause), or
your morning workload happened to fault in a lot of mmap()ed data, then
swap prefetch suddenly doesn't help you. ]
There are a lot of useful stats compiled in already. There are some
significant useful metrics which are not there, but could help. I don't know
if it is anything fundamental that we're doing wrong, though.
As you say, the VM state machine is a lot more complex than the scheduler.
I don't think it is reasonable to expect to be able to solve all problems
by looking at exported stats. But so far in 2.6 development, they have often
been enough to narrow things down pretty well. And where they haven't been,
I (and others) have added useful metrics where possible (witness the
evolution of oom killer output!)
But your soapbox is helpful to emphasise that we need to be thorough
with adding and using instrumentation when debugging a problem or
introducing some new behaviour / changing old behaviour.