Re: [PATCH -v8 4/4] The design document for memory-mapped file times update
From: Anton Salikhmetov <salikhmetov@...>
To: Randy Dunlap <randy.dunlap@...>
Cc: <linux-mm@...>, <jakob@...>, <linux-kernel@...>, <valdis.kletnieks@...>, <riel@...>, <ksm@...>, <staubach@...>, <jesper.juhl@...>, <torvalds@...>, <a.p.zijlstra@...>, <akpm@...>, <protasnb@...>, <miklos@...>, <r.e.wolff@...>, <hidave.darkstar@...>, <hch@...>
Subject: Re: [PATCH -v8 4/4] The design document for memory-mapped file times update
Date: Friday, January 25, 2008 - 12:40 pm
2008/1/25, Randy Dunlap <randy.dunlap@oracle.com>:
> On Wed, 23 Jan 2008 02:21:20 +0300 Anton Salikhmetov wrote:
>
> > Add a document, which describes how the POSIX requirements on updating
> > memory-mapped file times are addressed in Linux.
>
> Hi Anton,
>
> Just a few small comments below...
>
> > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com>
> > ---
> >  Documentation/vm/00-INDEX  |    2 +
> >  Documentation/vm/msync.txt |  117 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 119 insertions(+), 0 deletions(-)
> >
> > diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
> > index 2131b00..2726c8d 100644
> > --- a/Documentation/vm/00-INDEX
> > +++ b/Documentation/vm/00-INDEX
> > @@ -6,6 +6,8 @@ hugetlbpage.txt
> >  	- a brief summary of hugetlbpage support in the Linux kernel.
> >  locking
> >  	- info on how locking and synchronization is done in the Linux vm code.
> > +msync.txt
> > +	- the design document for memory-mapped file times update
> >  numa
> >  	- information about NUMA specific code in the Linux vm.
> >  numa_memory_policy.txt
> > diff --git a/Documentation/vm/msync.txt b/Documentation/vm/msync.txt
> > new file mode 100644
> > index 0000000..571a766
> > --- /dev/null
> > +++ b/Documentation/vm/msync.txt
> > @@ -0,0 +1,117 @@
> > +
> > +	The msync() system call and memory-mapped file times
> > +
> > +	Copyright (C) 2008 Anton Salikhmetov
> > +
> > +The POSIX standard requires that any write reference to memory-mapped file
> > +data should result in updating the ctime and mtime for that file. Moreover,
> > +the standard mandates that updated file times should become visible to the
> > +world no later than at the next call to msync().
> > +
> > +Failure to meet this requirement creates difficulties for certain classes
> > +of important applications. For instance, database backup systems fail to
> > +pick up the files modified via the mmap() interface. Also, this is a
> > +security hole, which allows forging file data in such a manner that proving
> > +the fact that file data was modified is not possible.
> > +
> > +Briefly put, this requirement can be stated as follows:
> > +
> > +	once the file data has changed, the operating system
> > +	should acknowledge this fact by updating file metadata.
> > +
> > +This document describes how this POSIX requirement is addressed in Linux.
> > +
> > +1. Requirements
> > +
> > +1.1) the POSIX standard requires updating ctime and mtime not later
> > +than at the call to msync() with MS_SYNC or MS_ASYNC flags;
> > +
> > +1.2) in existing POSIX implementations, ctime and mtime
> > +get updated not later than at the call to fsync();
> > +
> > +1.3) in existing POSIX implementation, ctime and mtime
> > +get updated not later than at the call to sync(), the "auto-update" feature;
> > +
> > +1.4) the customers require and the common sense suggests that
> > +ctime and mtime should be updated not later than at the call to munmap()
> > +or exit(), the latter function implying an implicit call to munmap();
> > +
> > +1.5) the (1.1) item should be satisfied if the file is a block device
> > +special file;
> > +
> > +1.6) the (1.1) item should be satisfied for files residing on
> > +memory-backed filesystems such as tmpfs, too.
> > +
> > +The following operating systems were used as the reference platforms
> > +and are referred to as the "existing implementations" above:
> > +HP-UX B.11.31 and FreeBSD 6.2-RELEASE.
> > +
> > +2. Lazy update
> > +
> > +Many attempts before the current version implemented the "lazy update" approach
> > +to satisfying the requirements given above. Within the latter approach, ctime
> > +and mtime get updated at last moment allowable.
> > +
> > +Since we don't update the file times immediately, some Flag has to be
> > +used. When up, this Flag means that the file data was modified and
> > +the file times need to be updated as soon as possible.
>
> I would s/up/set/ above and below.
>
> > +Any existing "dirty" flag which, when up, mean that a page has been written to,
>
> s/mean/means/
>
> > +is not suitable for this purpose. Indeed, msync() called with MS_ASYNC
> > +would have to reset this "dirty" flag after updating ctime and mtime.
> > +The sys_msync() function itself is basically a no-op in the MS_ASYNC case.
> > +Thereby, the synchronization routines relying upon this "dirty" flag
> > +would lose data. Therefore, a new Flag has to be introduced.
> > +
> > +The (1.5) item coupled with (1.3) requirement leads to hard work with
> > +the block device inodes. Specifically, during writeback it is impossible to
> > +tell which block device file was originally mapped. Therefore, we need to
> > +traverse the list of "active" devices associated with the block device inode.
> > +This would lead to updating file times for block device files, which were not
> > +taking part in the data transfer.
> > +
> > +Also all versions prior to version 6 failed to correctly process ctime and
> > +mtime for files on the memory-backed filesystems such as tmpfs. So the (1.6)
> > +requirement was not satisfied.
> > +
> > +If a write reference has occurred between two consecutive calls to msync()
> > +with MS_ASYNC, the second call to the latter function should take into
> > +account the last write reference. The last write reference can not be caught
>
> s/can not/cannot/
>
> > +if no pagefault occurs. Hence a pagefault needs to be forced. This can be done
> > +using two different approaches. The first one is to synchronize data even when
> > +msync() was called with MS_ASYNC. This is not acceptable because the current
> > +design of the sys_msync() routine forbids starting I/O for the MS_ASYNC case.
> > +The second approach is to write protect the page for triggering a pagefault
>
> s/write protect/write-protect/
>
> > +at the next write reference. Note that the dirty flag for the page should not
> > +be cleared thereby.
> > +
> > +In the "lazy update" approach, the requirements (1.1), (1.2), (1.3), and (1.4)
> > +taken together result in adding code at least to the following kernel routines:
> > +sys_msync(), do_fsync(), some routine in the unmap() call path, some routine
> > +in the sync() call path.
> > +
> > +Finally, a file_update_time()-like function would have to be created for
> > +processing the inode objects, not file objects. This is due to the fact that
> > +during the sync() operation, the file object may not exist any more, only
> > +the inode is known.
> > +
> > +To sum up: this "lazy" approach leads to massive changes, incurs overhead in
> > +the block device case, and requires complicated design decisions.
> > +
> > +3. Immediate update
> > +
> > +OK, still reading? There's a better way.
> > +
> > +In a fashion analogous to what happens at write(2), react to the fact
> > +that the page gets dirtied by updating the file times immediately.
> > +Thereby any page writeback happens when the write reference has already
> > +been accounted for from the view point of file times.
>
> Probably s/view point/viewpoint/.
>
> > +
> > +The only problem which remains is to force refreshing file times at the write
> > +reference following a call to msync() with MS_ASYNC. As mentioned above, all
> > +that is needed here is to force a pagefault.
> > +
> > +The vma_wrprotect() routine introduced in this patch series is called
> > +from sys_msync() in the MS_ASYNC case. The former routine is essentially
> > +a version of existing page_mkclean_one() function from mm/rmap.c. Unlike
> > +the latter function, the vma_wrprotect() does not touch the dirty bit.
> > --
>
> Thanks for the design document.

Thank you for your feedback. I'll take your suggestions into account.
> ---
> ~Randy