On Tue, Jan 29, 2008 at 12:30:06PM -0800, Christoph Lameter wrote:

It's not ok because that function can very well overwrite existing and present ptes (it's actually the nonlinear common case fast path for db). With your code the sptes created between invalidate_range and populate_range will keep pointing forever to the old physical page instead of the newly populated one.

I'm also asking myself if it's an SMP race not to call mmu_notifier(invalidate_page) between ptep_clear_flush and set_pte_at in install_file_pte. Probably not, because the guest VM running in a different thread would need to serialize outside the install_file_pte code with the task running install_file_pte, if it wants to be sure to write either all its data to the old page or all of it to the new page. Doing the invalidate_page inside the PT lock was certainly safe, but I hope this is safe too and that it can accommodate your needs.

The problem is the missing invalidate_page/range _after_ ptep_clear_flush. If a spte is built between invalidate_range and pte_offset_map_lock, it will remain pointing to the old page forever. Nothing will be called to invalidate that stale spte built between invalidate_page/range and ptep_clear_flush. This is why for the last few days I kept saying the mmu notifiers have to be invoked _after_ ptep_clear_flush and never before (remember the export notifier?). No idea how you can deal with this in your code; for KVM sptes that's certainly a backwards and unworkable ordering of operations (exactly as backwards as doing the tlb flush before pte_clear in ptep_clear_flush: think of the spte as a tlb, you can't flush the tlb before clearing/updating the pte or it's SMP unsafe).
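
To make the ordering argument concrete, here is a minimal user-space sketch of it. It is not kernel code: struct toy_mm and every toy_* helper are hypothetical stand-ins for the pte, the spte, ptep_clear_flush, set_pte_at and the notifier callback, and the race is modelled by placing the secondary fault at a fixed point instead of running real threads.

```c
#include <stdbool.h>

/* Toy model (assumption): one primary pte and one secondary spte,
 * e.g. a KVM shadow pte. A page value of 0 means "not present". */
struct toy_mm {
    int pte_page;   /* page mapped by the primary pte */
    int spte_page;  /* page mapped by the secondary spte */
};

/* Secondary MMU fault: build the spte from whatever the pte maps now. */
static void toy_secondary_fault(struct toy_mm *mm)
{
    if (mm->spte_page == 0 && mm->pte_page != 0)
        mm->spte_page = mm->pte_page;
}

/* Stand-in for the mmu notifier callback: drop the spte. */
static void toy_invalidate(struct toy_mm *mm)
{
    mm->spte_page = 0;
}

/* Stand-in for ptep_clear_flush: the pte becomes not present. */
static void toy_clear_flush(struct toy_mm *mm)
{
    mm->pte_page = 0;
}

/* Stand-in for set_pte_at: install the new page. */
static void toy_set_pte(struct toy_mm *mm, int new_page)
{
    mm->pte_page = new_page;
}

/* Notifier BEFORE the pte is cleared. A secondary fault that lands in
 * the window still sees the old pte and rebuilds a stale spte that
 * nothing invalidates afterwards. Returns true if spte and pte agree. */
static bool toy_remap_invalidate_before(struct toy_mm *mm, int new_page)
{
    toy_invalidate(mm);
    toy_secondary_fault(mm);   /* race window: pte still maps the old page */
    toy_clear_flush(mm);
    toy_set_pte(mm, new_page);
    toy_secondary_fault(mm);   /* no-op: a (stale) spte is already present */
    return mm->spte_page == mm->pte_page;
}

/* Notifier AFTER the pte is cleared. Whatever spte existed is dropped
 * once the pte is already gone, so a later fault can only see the new page. */
static bool toy_remap_invalidate_after(struct toy_mm *mm, int new_page)
{
    toy_secondary_fault(mm);   /* same racing fault, before the clear */
    toy_clear_flush(mm);
    toy_invalidate(mm);
    toy_set_pte(mm, new_page);
    toy_secondary_fault(mm);   /* rebuilds the spte from the new pte */
    return mm->spte_page == mm->pte_page;
}
```

Starting from a pte and spte that both map page 1 and remapping to page 2, the first function leaves the spte on page 1 while the pte maps page 2, and the second leaves both on page 2.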

Yes, and the only reason this can be safe is the one explained at the top of the email: if the other cpu wants to be sure to write to the "new" page, it has to serialize with the page fault, and to serialize it has to wait for the page fault to return (example: we're not going to call futex code until the page fault returns).
