Re: [Experimental][PATCH] putback_lru_page rework

To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Cc: Daisuke Nishimura <nishimura@...>, Andrew Morton <akpm@...>, Rik van Riel <riel@...>, Kosaki Motohiro <kosaki.motohiro@...>, Nick Piggin <npiggin@...>, <linux-mm@...>, <linux-kernel@...>, <kernel-testers@...>
Date: Thursday, June 19, 2008 - 10:45 am

On Thu, 2008-06-19 at 09:22 +0900, KAMEZAWA Hiroyuki wrote:

Update:

On x86_64 [32GB, 4x dual-core Opteron], my workload has now run for
~20 hours 40 minutes and is still running.

On ia64 [32GB, 16 cpus, 4 nodes], the system started going into softlockup
after ~7 hours.  The stack trace [below] points to the zone lru_lock in
__page_cache_release(), called from put_page():  either heavy contention
or a failure to unlock.  Note that in the previous run, with patches to
putback_lru_page() and unmap_and_move(), the same load ran for ~18 hours
before I shut it down to try these patches.

I'm going to try again with the collected patches posted by Kosaki-san
[for which, Thanks!].  If it occurs again, I'll deconfig the unevictable
lru feature and see if I can reproduce it there.  It may be unrelated to
the unevictable lru patches.


OK, so you just want to note that we're accessing the pte w/o locking
and that this is safe because the vma has been VM_LOCKED and all pages
should be mlocked?  

I'll note that the vma is NOT VM_LOCKED during the pte walk.
munlock_vma_pages_range() clears the flag so that try_to_unlock(), called
from munlock_vma_page(), won't try to re-mlock the page.  However, we hold
mmap_sem for write, so faults are held off:  no need to worry about a
COW fault occurring between the time VM_LOCKED is cleared and the time
the page is munlocked.  If that could occur, it would open a window
where a non-mlocked page is mapped in this vma, and page reclaim could
potentially unmap the page.  This shouldn't be an issue as long as we never
downgrade the semaphore to read during munlock.
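
To make the ordering argument concrete, here is a rough userspace model of
it.  This is only a sketch, not kernel code:  every name in it (toy_vma,
toy_fault(), toy_munlock_begin(), toy_munlock_walk(), TOY_VM_LOCKED) is
made up for illustration, the semaphore is reduced to a plain state
variable, and a fault that would block in the kernel simply returns false
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not kernel types. */
enum toy_sem_state { SEM_FREE, SEM_READ, SEM_WRITE };

#define TOY_VM_LOCKED 0x1u
#define TOY_NPAGES    4

struct toy_vma {
	enum toy_sem_state mmap_sem;  /* models how mmap_sem is held */
	unsigned int flags;           /* models vma->vm_flags */
	bool mapped[TOY_NPAGES];      /* page is mapped in this vma */
	bool mlocked[TOY_NPAGES];     /* models the page's mlocked state */
};

/*
 * Model of a (COW) fault.  A fault needs the semaphore for read, so it
 * cannot make progress while the semaphore is held for write.  Returns
 * false when the fault is held off.
 */
static bool toy_fault(struct toy_vma *vma, size_t idx)
{
	if (vma->mmap_sem == SEM_WRITE)
		return false;
	vma->mapped[idx] = true;
	/* The new page is mlocked only if the vma is still VM_LOCKED. */
	vma->mlocked[idx] = (vma->flags & TOY_VM_LOCKED) != 0;
	return true;
}

/* Step 1 of munlock: clear the flag, with the semaphore held for write. */
static void toy_munlock_begin(struct toy_vma *vma)
{
	assert(vma->mmap_sem == SEM_WRITE);
	vma->flags &= ~TOY_VM_LOCKED;
}

/* Step 2 of munlock: walk the pages and munlock each one still mlocked. */
static size_t toy_munlock_walk(struct toy_vma *vma)
{
	size_t munlocked = 0;

	assert(vma->mmap_sem == SEM_WRITE);
	for (size_t i = 0; i < TOY_NPAGES; i++) {
		if (vma->mapped[i] && vma->mlocked[i]) {
			vma->mlocked[i] = false;
			munlocked++;
		}
	}
	return munlocked;
}
```

In this model, a toy_fault() issued between toy_munlock_begin() and
toy_munlock_walk() is refused for as long as the state stays SEM_WRITE, so
the walk finds every page still mlocked.  If the state were changed to
SEM_READ between the two steps, the fault would go through and install a
page that is mapped but not mlocked, which is the window described above.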

Lee

----------
softlockup stack trace for "usex" workload on ia64:

BUG: soft lockup - CPU#13 stuck for 61s! [usex:124359]
Modules linked in: ipv6 sunrpc dm_mirror dm_log dm_multipath scsi_dh dm_mod pci_slot fan dock thermal sg sr_mod processor button container ehci_hcd ohci_hcd uhci_hcd usbcore

Pid: 124359, CPU 13, comm:                 usex
psr : 00001010085a6010 ifs : 8000000000000000 ip  : [<a00000010000a1a0>]    Tainted: G      D   (2.6.26-rc5-mm3-kame-rework+mcl_inherit)
ip is at ia64_spinlock_contention+0x20/0x60
unat: 0000000000000000 pfs : 0000000000000081 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : a65955959a96e969
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001001264a0 b6  : a0000001006f0350 b7  : a00000010000b940
f6  : 0ffff8000000000000000 f7  : 1003ecf3cf3cf3cf3cf3d
f8  : 1003e0000000000000001 f9  : 1003e0000000000000015
f10 : 1003e000003a82aaab1fb f11 : 1003e0000000000000000
r1  : a000000100c03650 r2  : 000000000000038a r3  : 0000000000000001
r8  : 00000010085a6010 r9  : 0000000000080028 r10 : 000000000000000b
r11 : 0000000000000a80 r12 : e0000741aaac7d50 r13 : e0000741aaac0000
r14 : 0000000000000000 r15 : a000400741329148 r16 : e000074000060100
r17 : e000076000078e98 r18 : 0000000000000015 r19 : 0000000000000018
r20 : 0000000000000003 r21 : 0000000000000002 r22 : e000076000078e88
r23 : e000076000078e80 r24 : 0000000000000001 r25 : 0240000000080028
r26 : ffffffffffff04d8 r27 : 00000010085a6010 r28 : 7fe3382473f8b380
r29 : 9c00000000000000 r30 : 0000000000000001 r31 : e000074000061400

Call Trace:
 [<a000000100015e00>] show_stack+0x80/0xa0
                                sp=e0000741aaac79b0 bsp=e0000741aaac1528
 [<a000000100016700>] show_regs+0x880/0x8c0
                                sp=e0000741aaac7b80 bsp=e0000741aaac14d0
 [<a0000001000fbbe0>] softlockup_tick+0x2e0/0x340
                                sp=e0000741aaac7b80 bsp=e0000741aaac1480
 [<a0000001000a9400>] run_local_timers+0x40/0x60
                                sp=e0000741aaac7b80 bsp=e0000741aaac1468
 [<a0000001000a9460>] update_process_times+0x40/0xc0
                                sp=e0000741aaac7b80 bsp=e0000741aaac1438
 [<a00000010003ded0>] timer_interrupt+0x1b0/0x4a0
                                sp=e0000741aaac7b80 bsp=e0000741aaac13d0
 [<a0000001000fc480>] handle_IRQ_event+0x80/0x120
                                sp=e0000741aaac7b80 bsp=e0000741aaac1398
 [<a0000001000fc660>] __do_IRQ+0x140/0x440
                                sp=e0000741aaac7b80 bsp=e0000741aaac1338
 [<a0000001000136d0>] ia64_handle_irq+0x3f0/0x420
                                sp=e0000741aaac7b80 bsp=e0000741aaac12c0
 [<a00000010000c120>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000741aaac7b80 bsp=e0000741aaac12c0
 [<a00000010000a1a0>] ia64_spinlock_contention+0x20/0x60
                                sp=e0000741aaac7d50 bsp=e0000741aaac12c0
 [<a0000001006f0350>] _spin_lock_irqsave+0x50/0x60
                                sp=e0000741aaac7d50 bsp=e0000741aaac12b8

Probably zone lru_lock in __page_cache_release().

 [<a0000001001264a0>] put_page+0x100/0x300
                                sp=e0000741aaac7d50 bsp=e0000741aaac1280
 [<a000000100157170>] free_page_and_swap_cache+0x70/0xe0
                                sp=e0000741aaac7d50 bsp=e0000741aaac1260
 [<a000000100145a10>] exit_mmap+0x3b0/0x580
                                sp=e0000741aaac7d50 bsp=e0000741aaac1210
 [<a00000010008b420>] mmput+0x80/0x1c0
                                sp=e0000741aaac7e10 bsp=e0000741aaac11d8

NOTE:  all cpus show similar stack traces above here.  Some, however, get
here from do_exit()/exit_mm(), rather than via execve().

 [<a00000010019c2c0>] flush_old_exec+0x5a0/0x1520
                                sp=e0000741aaac7e10 bsp=e0000741aaac10f0
 [<a000000100213080>] load_elf_binary+0x7e0/0x2600
                                sp=e0000741aaac7e20 bsp=e0000741aaac0fb8
 [<a00000010019b7a0>] search_binary_handler+0x1a0/0x520
                                sp=e0000741aaac7e20 bsp=e0000741aaac0f30
 [<a00000010019e4e0>] do_execve+0x320/0x3e0
                                sp=e0000741aaac7e20 bsp=e0000741aaac0ed0
 [<a000000100014d00>] sys_execve+0x60/0xc0
                                sp=e0000741aaac7e30 bsp=e0000741aaac0e98
 [<a00000010000b690>] ia64_execve+0x30/0x140
                                sp=e0000741aaac7e30 bsp=e0000741aaac0e48
 [<a00000010000bfa0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e0000741aaac7e30 bsp=e0000741aaac0e48
 [<a000000000010720>] __start_ivt_text+0xffffffff00010720/0x400
                                sp=e0000741aaac8000 bsp=e0000741aaac0e48


