> On Mon, Apr 19, 2010 at 11:37:02AM +1000, Dave Chinner wrote:
> > On Thu, Apr 08, 2010 at 11:28:50AM +0200, Jan Kara wrote:
> > > > SUPERPROXY ~ # cat /proc/1753/stack
> > > > [<c019a93c>] bdi_sched_wait+0x8/0xc
> > > > [<c019a807>] wait_on_bit+0x20/0x2c
> > > > [<c019a9af>] sync_inodes_sb+0x6f/0x10a
> > > > [<c019dd53>] __sync_filesystem+0x28/0x49
> > > > [<c019ddf3>] sync_filesystems+0x7f/0xc0
> > > > [<c019de7a>] sys_sync+0x1b/0x2d
> > > > [<c02f7a25>] syscall_call+0x7/0xb
> > > > [<ffffffff>] 0xffffffff
> > > Hmm, I guess you are observing the problem reported in
> > >
https://bugzilla.kernel.org/show_bug.cgi?id=14830
> > > There seem to be several issues in the per-bdi writeback code that
> > > cause sync on a busy filesystem to last almost forever. To that bug are
> > > attached two patches that fix two issues but apparently it's not all.
> > > I'm still looking into it...
> >
> > So Jen's writeback tracing shows this for a normal cycle during a
> > large dd:
> >
> > <...>-6030 [005] 604446.696454: writeback_sched: work=38c0, task=task
> > flush-253:16-6029 [002] 604446.696492: writeback_exec: work=38c0 pages=9223372036854775807, sb=0, kupdate=0, range_cyclic=-1 for_background=-1
> > flush-253:16-6029 [002] 604446.696493: writeback_clear: work=ffff88011f1a38c0, refs=1
> > flush-253:16-6029 [003] 604446.784240: writeback_pages_written: 1024
> >
> > There were 100 of these traces (102400 pages (400MB) which is exactly 10% of
> > RAM) before this:
> >
> > <...>-6030 [000] 604462.346329: writeback_sched: work=6c0, task=task
> > flush-253:16-6029 [002] 604462.267098: writeback_exec: work=6c0 pages=9223372036854775807, sb=0, kupdate=0, range_cyclic=-1 for_background=-1
> > flush-253:16-6029 [002] 604462.267101: writeback_clear: work=ffff88011e8006c0, refs=1
> > flush-253:16-6029 [001] 604465.406331: writeback_pages_written: 160771
> >
> > which shows 160771 pages written in a single iteration (650MB). I suspect some
> > interaction between foreground and background writeout is occurring here.
> >
> > The flusher thread appears to be getting stuck on congestion - the wchan
> > it is sitting on indicates it is blocking in get_request(). I'm going to
> > extend this tracing further down into the writeback code so that what is
> > happening is clearer...
>
> Ok, I moved to a VM with less memory (1GB vs 4GB) and slower block
> devices (100MB/s vs 500MB/s) and now it's pretty clear what is
> happening. i'll go through the traces.
>
> To start with, sync() queues up several tasks to the bdi flush daemon:
>
> SYSCALL_DEFINE0(sync)
> {
> wakeup_flusher_threads(0);
> sync_filesystems(0);
> sync_filesystems(1);
> if (unlikely(laptop_mode))
> laptop_sync_completion();
> return 0;
> }
>
>
> First is via wakeup_flusher_threads() - an async flush:
>
> sync-2499 [000] 616072.710212: writeback_queue: 253:16: pages=13818, sb=0, kupdate=0, range_cyclic=0 for_background=0
> sync-2499 [000] 616072.710217: writeback_sched: work=13c0, task=task
> ^^^^
> second is a per-sb async flush via sync_filesystems(0):
>
> sync-2499 [000] 616072.717181: writeback_queue: 253:16: pages=15761, sb=1, kupdate=0, range_cyclic=-1 for_background=0
> sync-2499 [000] 616072.717191: writeback_sched: work=3840, task=task
> ^^^^
> And finally there is the sync flush via sync_filesystems(1):
>
> sync-2499 [000] 616072.737836: writeback_queue: 253:16: pages=9223372036854775807, sb=1, kupdate=0, range_cyclic=0 for_background=0
> sync-2499 [000] 616072.737840: writeback_sched: work=3e58, task=task
> ^^^^
>
> The first async flush does:
> vvvv
> flush-253:16-2497 [000] 616072.897747: writeback_exec: work=13c0 pages=13818, sb=0, kupdate=0, range_cyclic=0 for_background=0
> flush-253:16-2497 [000] 616072.897748: writeback_clear: work=ffff88003d8813c0, refs=1
> flush-253:16-2497 [000] 616072.897753: wbc_writeback_start: dev 253:16 wbc=9d20 towrt=1024 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
> flush-253:16-2497 [000] 616072.897768: wbc_writeback_written: dev 253:16 wbc=9d20 towrt=1024 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
>
> Nothing - it does not write any pages towrt (nr_to_write) is
> unchanged by the attempted flush.