Re: [BUGFIX][mm][PATCH] fix migration race in rmap_walk
From: KAMEZAWA Hiroyuki
Subject: Re: [BUGFIX][mm][PATCH] fix migration race in rmap_walk
Date: Friday, April 23, 2010 - 12:55 am
On Fri, 23 Apr 2010 16:53:44 +0900 Minchan Kim <minchan.kim@gmail.com> wrote:
> On Fri, Apr 23, 2010 at 4:17 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Fri, 23 Apr 2010 16:00:31 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> >> On Fri, Apr 23, 2010 at 2:27 PM, KAMEZAWA Hiroyuki
> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >> > On Fri, 23 Apr 2010 14:11:37 +0900
> >> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >> >
> >> >> On Fri, Apr 23, 2010 at 12:01 PM, KAMEZAWA Hiroyuki
> >> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >> >> >
> >> >> > This patch itself is for -mm ..but may need to go -stable tree for memory
> >> >> > hotplug. (but we've got no report to hit this race...)
> >> >> >
> >> >> > This one is the simplest, I think and works well on my test set.
> >> >> > ==
> >> >> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >> >> >
> >> >> > In rmap.c, at checking rmap in vma chain in page->mapping, anon_vma->lock
> >> >> > or mapping->i_mmap_lock is held and enter following loop.
> >> >> >
> >> >> >     for_each_vma_in_this_rmap_link(list from page->mapping) {
> >> >> >         unsigned long address = vma_address(page, vma);
> >> >> >         if (address == -EFAULT)
> >> >> >             continue;
> >> >> >         ....
> >> >> >     }
> >> >> >
> >> >> > vma_address is checking [start, end, pgoff] v.s. page->index.
> >> >> >
> >> >> > But vma's [start, end, pgoff] is updated without locks. vma_address()
> >> >> > can hit a race and may return wrong result.
> >> >> >
> >> >> > This behavior is no problem in usual routine as try_to_unmap() etc...
> >> >> > But for page migration, rmap_walk() has to find all migration_ptes
> >> >> > which migration code overwritten valid ptes. This race is critical and cause
> >> >> > BUG that a migration_pte is sometimes not removed.
> >> >> >
> >> >> > Apr 21 17:27:47 localhost kernel: ------------[ cut here ]------------
> >> >> > Apr 21 17:27:47 localhost kernel: kernel BUG at include/linux/swapops.h:105!
> >> >> > Apr 21 17:27:47 localhost kernel: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> >> >> > Apr 21 17:27:47 localhost kernel: last sysfs file: /sys/devices/virtual/net/br0/statistics/collisions
> >> >> > Apr 21 17:27:47 localhost kernel: CPU 3
> >> >> > Apr 21 17:27:47 localhost kernel: Modules linked in: fuse sit tunnel4 ipt_MASQUERADE iptable_nat nf_nat bridge stp llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf xt_physdev ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput ioatdma ppdev parport_pc i5000_edac bnx2 iTCO_wdt edac_core iTCO_vendor_support shpchp parport e1000e kvm_intel dca kvm i2c_i801 i2c_core i5k_amb pcspkr megaraid_sas [last unloaded: microcode]
> >> >> > Apr 21 17:27:47 localhost kernel:
> >> >> > Apr 21 17:27:47 localhost kernel: Pid: 27892, comm: cc1 Tainted: G W 2.6.34-rc4-mm1+ #4 D2519/PRIMERGY
> >> >> > Apr 21 17:27:47 localhost kernel: RIP: 0010:[<ffffffff8114e9cf>] [<ffffffff8114e9cf>] migration_entry_wait+0x16f/0x180
> >> >> > Apr 21 17:27:47 localhost kernel: RSP: 0000:ffff88008d9efe08 EFLAGS: 00010246
> >> >> > Apr 21 17:27:47 localhost kernel: RAX: ffffea0000000000 RBX: ffffea0000241100 RCX: 0000000000000001
> >> >> > Apr 21 17:27:47 localhost kernel: RDX: 000000000000a4e0 RSI: ffff880621a4ab00 RDI: 000000000149c03e
> >> >> > Apr 21 17:27:47 localhost kernel: RBP: ffff88008d9efe38 R08: 0000000000000000 R09: 0000000000000000
> >> >> > Apr 21 17:27:47 localhost kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff880621a4aae8
> >> >> > Apr 21 17:27:47 localhost kernel: R13: 00000000bf811000 R14: 000000000149c03e R15: 0000000000000000
> >> >> > Apr 21 17:27:47 localhost kernel: FS: 00007fe6abc90700(0000) GS:ffff880005a00000(0000) knlGS:0000000000000000
> >> >> > Apr 21 17:27:47 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> >> > Apr 21 17:27:47 localhost kernel: CR2: 00007fe6a37279a0 CR3: 000000008d942000 CR4: 00000000000006e0
> >> >> > Apr 21 17:27:47 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> >> > Apr 21 17:27:47 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> >> > Apr 21 17:27:47 localhost kernel: Process cc1 (pid: 27892, threadinfo ffff88008d9ee000, task ffff8800b23ec820)
> >> >> > Apr 21 17:27:47 localhost kernel: Stack:
> >> >> > Apr 21 17:27:47 localhost kernel: ffffea000101aee8 ffff880621a4aae8 ffff88008d9efe38 00007fe6a37279a0
> >> >> > Apr 21 17:27:47 localhost kernel: <0> ffff8805d9706d90 ffff880621a4aa00 ffff88008d9efef8 ffffffff81126d05
> >> >> > Apr 21 17:27:47 localhost kernel: <0> ffff88008d9efec8 0000000000000246 0000000000000000 ffffffff81586533
> >> >> > Apr 21 17:27:47 localhost kernel: Call Trace:
> >> >> > Apr 21 17:27:47 localhost kernel: [<ffffffff81126d05>] handle_mm_fault+0x995/0x9b0
> >> >> > Apr 21 17:27:47 localhost kernel: [<ffffffff81586533>] ? do_page_fault+0x103/0x330
> >> >> > Apr 21 17:27:47 localhost kernel: [<ffffffff8104bf40>] ? finish_task_switch+0x0/0xf0
> >> >> > Apr 21 17:27:47 localhost kernel: [<ffffffff8158659e>] do_page_fault+0x16e/0x330
> >> >> > Apr 21 17:27:47 localhost kernel: [<ffffffff81582f35>] page_fault+0x25/0x30
> >> >> > Apr 21 17:27:47 localhost kernel: Code: 53 08 85 c9 0f 84 32 ff ff ff 8d 41 01 89 4d d8 89 45 d4 8b 75 d4 8b 45 d8 f0 0f b1 32 89 45 dc 8b 45 dc 39 c8 74 aa 89 c1 eb d7 <0f> 0b eb fe 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5
> >> >> > Apr 21 17:27:47 localhost kernel: RIP [<ffffffff8114e9cf>] migration_entry_wait+0x16f/0x180
> >> >> > Apr 21 17:27:47 localhost kernel: RSP <ffff88008d9efe08>
> >> >> > Apr 21 17:27:47 localhost kernel: ---[ end trace 4860ab585c1fcddb ]---
> >> >> >
> >> >> >
> >> >> >
> >> >> > This patch adds vma_address_safe(). And update [start, end, pgoff]
> >> >> > under seq counter.
> >> >> >
> >> >> > Cc: Mel Gorman <mel@csn.ul.ie>
> >> >> > Cc: Minchan Kim <minchan.kim@gmail.com>
> >> >> > Cc: Christoph Lameter <cl@linux-foundation.org>
> >> >> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >> >>
> >> >> That's exactly same what I have in mind. :)
> >> >> But I am hesitating. That's because AFAIR, we try to remove seqlock. Right?
> >> >
> >> > Ah,..."don't use seqlock" is trend ?
> >> >
> >> >> But in this case, seqlock is good, I think. :)
> >> >>
> >> > BTW, this isn't seqlock but seq_counter :)
> >> >
> >> > I'm still testing. What I doubt other than vma_address() is fork().
> >> > at fork(), followings _may_ happen. (but I'm not sure).
> >> >
> >> >        chain vma.
> >> >        copy page table.
> >> >           -> migration entry is copied, too.
> >> >
> >> > At remap,
> >> >        for each vma
> >> >            look into page table and replace.
> >> >
> >> > Then,
> >> >        rmap_walk().
> >> >                                fork(parent, child)
> >> >        look into child's page table.
> >> >        => we found nothing.
> >> >                                spin_lock(child's pagetable);
> >> >                                spin_lock(parent's page table);
> >> >                                copy migration entry
> >> >                                spin_unlock(parent's page table)
> >> >                                spin_unlock(child's page table)
> >> >        update parent's page table
> >> >
> >> > If we always find parent's page table before child's , there is no race.
> >> > But I can't get prio_tree's list order as clear image. Hmm.
> >> >
> >> > Thanks,
> >> > -Kame
> >> >
> >>
> >> That's good point, Kame.
> >> I looked into prio_tree quickly.
> >> If I understand it right, list order is backward.
> >>
> >> dup_mmap calls vma_prio_tree_add.
> >>
> >> * prio_tree_root
> >> *      |
> >> *      A       vm_set.head
> >> *     / \      /
> >> *    L   R -> H-I-J-K-M-N-O-P-Q-S
> >> *    ^   ^    <-- vm_set.list -->
> >> *  tree nodes
> >> *
> >>
> >> Maybe, parent and childs's vma are H~S.
> >> Then, comment said.
> >>
> >> "vma->shared.vm_set.parent != NULL ==> a tree node"
> >> So vma_prio_tree_add call not list_add_tail but list_add.
> >>
> > Ah, thank you for explanation.
> >
> >> Anyway, I think order isn't mixed.
> >> So, could we traverse it by backward in rmap?
> >>
> > Doesn't it make prio-tree code dirty ?
> >
> > Here is another idea....but ..hmm. Does this make fork() slow in some cases ?
>
> Yes. I think this idea is good to me. :)
> Great, Kame.
>
> But as you said, migration is rare.
> so we wouldn't lost much performance in many case.
>
> Actually, If I understand prio_tree right, I think backward walking of
> prio_tree is not bad.
> I don't think it make code dirty. :)
> I admit it's different per people.
>
It's okay with me. My concern was the difficulty of maintenance.
> I like both ideas.
> I passes decision to others. :)
>
Me, too. One remaining problem may be that we need a test case that hits this race.

Thanks,
-Kame