On Fri, 21 Sep 2007, Andy Whitcroft wrote:

You may well be right. My initial reaction was to point out that my patch was to a 2.6.23-rc6-mm1 implementation detail, whereas this thread is about a problem seen since 2.6.20 or perhaps earlier.

But once I look harder at it, I wonder what would have kept 2.6.18 to 2.6.23 safe from the same issue: per-cpu deltas from the global vm stats too low to get synched back to global, yet adding up to something which misleads balance_dirty_pages into an indefinite loop, e.g. total nr_writeback actually 0, but appearing more than dirty_thresh in the global approximation.

Certainly my rc6-mm1 patch won't work here, and all it was doing was applying some safety already added by Peter to one further case.

Looking at the 2.6.18-2.6.23 code, I'm uncertain what to try instead. There is a refresh_vm_stats function which we could call (then retest the break condition) just before resorting to congestion_wait. But the big NUMA people might get very upset with me calling that too often: causing a thundering herd of bouncing cachelines, which is what all that was designed to avoid. And it's not obvious to me what condition to test for dirty_thresh "too low".

I believe Peter gave all this quite a lot of thought when he was making the rc6-mm1 changes, and I'd rather defer to him for a suggestion of what best to do in earlier releases. Or maybe he'll just point out how this couldn't have been a problem before.

Or there is Richard's patch, which I haven't considered, but Andrew was not quite satisfied with it, partly because he'd like to understand how the situation could come about first; perhaps we have now got an explanation.

(The original bug report was indeed on SMP, but I haven't seen anyone say that's a necessary condition for the hang: it would be if this is the issue. And Richard writes at one point of the system only responding to AltSysRq: that would be surprising for this issue, though it's possible that a task in balance_dirty_pages is holding an i_mutex that everybody else comes to need.)

Hugh
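
The per-cpu delta scenario described above can be modelled in user space. What follows is only a simplified sketch, not the kernel's vmstat code: NR_CPUS, STAT_THRESHOLD, mod_state, exact_nr_writeback, refresh_all and build_skew are all hypothetical names and values chosen for illustration. It shows increments being folded into the global counter while the matching decrements stay parked in per-cpu deltas that never cross the threshold, so the global approximation stays high although the true total is 0.

```c
#include <assert.h>

#define NR_CPUS        4
#define STAT_THRESHOLD 32   /* hypothetical per-cpu fold threshold */

long global_nr_writeback;   /* the approximation a throttling loop would read */
long cpu_delta[NR_CPUS];    /* per-cpu differences not yet folded back */

/* Apply v on one cpu; fold into the global only once the delta is large. */
void mod_state(int cpu, long v)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    long x = cpu_delta[cpu] + v;
    if (x > STAT_THRESHOLD || x < -STAT_THRESHOLD) {
        global_nr_writeback += x;
        x = 0;
    }
    cpu_delta[cpu] = x;
}

/* Exact value: the global plus every outstanding per-cpu delta. */
long exact_nr_writeback(void)
{
    long sum = global_nr_writeback;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += cpu_delta[cpu];
    return sum;
}

/* Model of a full refresh: fold every per-cpu delta into the global. */
void refresh_all(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        global_nr_writeback += cpu_delta[cpu];
        cpu_delta[cpu] = 0;
    }
}

/* Start 99 writebacks on cpu 0, then complete them spread across all cpus. */
void build_skew(void)
{
    global_nr_writeback = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_delta[cpu] = 0;

    for (int i = 0; i < 99; i++)
        mod_state(0, 1);            /* folded to global in three steps of 33 */
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        for (int i = 0; i < 32; i++)
            mod_state(cpu, -1);     /* stays at -32, never crosses threshold */
    for (int i = 0; i < 3; i++)
        mod_state(0, -1);           /* stays at -3 */
}
```

After build_skew(), the global reads 99 while exact_nr_writeback() is 0; a loop that compares only the global against a smaller dirty_thresh would never see the condition clear. Calling refresh_all() and retesting brings the global back to 0, at the cost of touching every cpu's counters, which is the cacheline traffic the per-cpu scheme was meant to avoid.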
