On Friday 28 September 2007 18:02, Ingo Molnar wrote:

I'll join you on your soapbox ;)

This is a problem with the testers. The barrier for *reporting* problems is really low. And if they're regressions, then there is an almost 100% chance of them being solved. If we're simply missing the bug reports, I'll again ask for a linux-bugs list which is clear of other crap. Or are people scared of reporting regressions from previous experience? I haven't seen any reason for this, and linux-mm is probably one of the most polite lists I have read (when it comes to interacting with non-developers, anyway ;)).

I don't see this as a big problem. The "better Linux MM" is implemented by the patch one is proposing -- whether it be a one-liner bugfix or a complete rewrite.

I think it _would_ be really nice to run all interesting workloads on various Linux 2.6 and 2.4 kernels, various patchsets, and other open source OSes for the regression testing and cross-pollination aspects... however this brings us to the main problem: devising useful tests and metrics, and running them.

The alternative is completely unscientific chaos. OK, for performance heuristics, it is actually a much grayer "you show that your patch improves something / doesn't harm others" -- this is pretty tough for mm patches at the moment, maybe it could be better, but at the end of the day, if something is worth merging then we should be able to actually prove (or have pretty high confidence) that it is good. If not, then we don't want to merge it, by definition.

In the case of the VM, this comes right back to the difficulty of getting a range of meaningful tests. We can definitely improve this situation a great deal, but like the scheduler, the *real* "tests" simply have to be done by users. And unfortunately, a lot of them don't actually test VM changes for a long time after they're merged.
This would actually be one area where it might make a lot of sense to coordinate more significant VM changes with distro release cycles (maybe?)

The recent swap prefetch debate showed exactly how the process _should_ work. We discovered a serious issue with metadata caching behaviour, and also the active-inactive list balancing change that makes use-once much less effective. It was discovered by the metrics that are already there and compiled in, and basically required a

    cat /proc/vmstat ; run-workload ; cat /proc/vmstat

[ You understand why I wanted to explore any possible underlying problems independent of swap prefetch, right? If you'd just happened to be running on a laptop, or your pagecache / buffercache / slab cache just happened to be full of useless crap already (as in: something that updatedb might cause), or your morning workload happened to fault in a lot of mmap()ed data, then swap prefetch suddenly doesn't help you. ]

There are a lot of useful stats compiled in already. There are some significant useful metrics which are not there, but could help. I don't know if it is anything fundamental that we're doing wrong, though.

As you say, the VM state machine is a lot more complex than the scheduler. I don't think it is reasonable to expect to be able to solve all problems by looking at exported stats. But so far in 2.6 development, they have often been enough to narrow things down pretty well. And where they haven't been, I (and others) have added useful metrics where possible (witness the evolution of oom killer output!)

But your soapbox is helpful to emphasise that we need to be thorough with adding and using instrumentation when debugging a problem or introducing some new behaviour / changing old behaviour.
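
For anyone who wants to try the before/after /proc/vmstat comparison without diffing the two dumps by eye, here is a rough sketch in Python. It is only an illustration, not an existing tool: the function names (parse_vmstat, diff_vmstat, read_vmstat, measure) are made up for this example, and it assumes nothing beyond the usual "name value" line format of /proc/vmstat.

```python
import subprocess
import sys


def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat format) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def diff_vmstat(before, after):
    """Return {name: after - before} for every field whose value changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the current vmstat snapshot."""
    with open(path) as f:
        return parse_vmstat(f.read())


def measure(cmd):
    """Snapshot vmstat, run cmd (a list of argv strings), snapshot again."""
    before = read_vmstat()
    subprocess.call(cmd)
    after = read_vmstat()
    return diff_vmstat(before, after)


# Hypothetical usage: python vmstat_delta.py <workload> [args...]
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, delta in sorted(measure(sys.argv[1:]).items()):
        print("%-28s %+d" % (name, delta))
```

The counters are system-wide, so anything else running during the workload shows up in the deltas too; a quiet machine and a few repeated runs make the numbers easier to trust. Fields such as the nr_* entries are current values rather than running totals, so their delta is just the net change across the run.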
