Con Kolivas posted a patch for the 2.6.13 kernel that implements swap prefetching. The patch is based on earlier work by Thomas Schlichter. Con explains, "I have resuscitated and rewritten some early prefetch code Thomas Schlichter did in late 2.5 to create a configurable kernel thread that reads in swap from ram in reverse order it was written out. It does this once kswapd has been idle for a minute (implying no current vm stress)." He goes on to explain, "Note that swapped in pages are kept on backing store (swap), meaning no further I/O is required if the page needs to swap back out."
Con noted that the patch will be included in his next -ck patchset. He also posted it to the lkml to gauge interest in eventually merging the effort into the mainline kernel, and to encourage greater testing and feedback.
From: Con Kolivas [email blocked]
To: linux kernel mailing list [email blocked]
Subject: [PATCH][RFC] vm: swap prefetch
Date: Thu, 1 Sep 2005 23:46:32 +1000

Here is a working swap prefetching patch for 2.6.13. I have resuscitated and rewritten some early prefetch code Thomas Schlichter did in late 2.5 to create a configurable kernel thread that reads in swap from ram in reverse order it was written out. It does this once kswapd has been idle for a minute (implying no current vm stress). This patch attached below is a rollup of two patches the current versions of which are here:

http://ck.kolivas.org/patches/swap-prefetch/

These add an exclusive_timer function, and the patch that does the swap prefetching. I'm posting this rollup to lkml to see what the interest is in this feature, and for people to test it if they desire. I'm planning on including it in the next -ck but wanted to gauge general user opinion for mainline. Note that swapped in pages are kept on backing store (swap), meaning no further I/O is required if the page needs to swap back out.

Cheers,
Con Kolivas
From: Hans Kristian Rosbach [email blocked]
Subject: Re: [PATCH][RFC] vm: swap prefetch
Date: Thu, 01 Sep 2005 16:18:23 +0200

On Thu, 2005-09-01 at 23:46 +1000, Con Kolivas wrote:
> Here is a working swap prefetching patch for 2.6.13. I have resuscitated and
> rewritten some early prefetch code Thomas Schlichter did in late 2.5 to
> create a configurable kernel thread that reads in swap from ram in reverse
> order it was written out. It does this once kswapd has been idle for a minute
> (implying no current vm stress). This patch attached below is a rollup of two
> patches the current versions of which are here:
>
> http://ck.kolivas.org/patches/swap-prefetch/
>
> These add an exclusive_timer function, and the patch that does the swap
> prefetching. I'm posting this rollup to lkml to see what the interest is in
> this feature, and for people to test it if they desire. I'm planning on
> including it in the next -ck but wanted to gauge general user opinion for
> mainline. Note that swapped in pages are kept on backing store (swap),
> meaning no further I/O is required if the page needs to swap back out.

I would definitely use this if available.

That said, I have often thought it might be good to have something like pre-writing swap, ie reverse what your patch does.

In other words it'd keep as much of swappable data on disk as possible, but without removing it from memory. So when it comes time to free up some memory, the data is already on disk so no performance penalty from writing it out.

Hopefully something worth thinking about.

-HK
From: Con Kolivas [email blocked]
Subject: Re: [PATCH][RFC] vm: swap prefetch
Date: Fri, 2 Sep 2005 00:18:32 +1000

On Fri, 2 Sep 2005 00:18, Hans Kristian Rosbach wrote:
> On Thu, 2005-09-01 at 23:46 +1000, Con Kolivas wrote:
> >
> > These add an exclusive_timer function, and the patch that does the swap
> > prefetching. I'm posting this rollup to lkml to see what the interest is
> > in this feature, and for people to test it if they desire. I'm planning
> > on including it in the next -ck but wanted to gauge general user opinion
> > for mainline. Note that swapped in pages are kept on backing store
> > (swap), meaning no further I/O is required if the page needs to swap back
> > out.
>
> I would definitely use this if available.

Great.

> That said, I have often thought it might be good to have something like
> pre-writing swap, ie reverse what your patch does.
>
> In other words it'd keep as much of swappable data on disk as possible,
> but without removing it from memory. So when it comes time to free up
> some memory, the data is already on disk so no performance penalty from
> writing it out.
>
> Hopefully something worth thinking about.

Actually to some degree this patch does that, albeit only on things that are swapped out "naturally". Anything that is swapped out and is unnaturally swapped back in using prefetching is kept on swap, and you often find much more swap sitting around ready for freeing up ram whenever there is memory pressure again.

Cheers,
Con
From: Thomas Schlichter <thomas.schlichter@web.de>
Subject: Re: [PATCH][RFC] vm: swap prefetch
Date: Thu, 1 Sep 2005 17:15:36 +0200

Hi Con!

On Thursday, 1 September 2005 15:46, Con Kolivas wrote:
> Here is a working swap prefetching patch for 2.6.13. I have resuscitated
> and rewritten some early prefetch code Thomas Schlichter did in late 2.5 to
> create a configurable kernel thread that reads in swap from ram in reverse
> order it was written out. It does this once kswapd has been idle for a
> minute (implying no current vm stress). This patch attached below is a
> rollup of two patches the current versions of which are here:
>
> http://ck.kolivas.org/patches/swap-prefetch/
>
> These add an exclusive_timer function, and the patch that does the swap
> prefetching. I'm posting this rollup to lkml to see what the interest is in
> this feature, and for people to test it if they desire. I'm planning on
> including it in the next -ck but wanted to gauge general user opinion for
> mainline. Note that swapped in pages are kept on backing store (swap),
> meaning no further I/O is required if the page needs to swap back out.

I am (and some of my friends are) still interested in this functionality, so I'm definitely going to test your improved patch, of course.

By the way, I'm quite happy that you came up with this new version of swap-prefetching, because I didn't and still don't have the time to develop or maintain it more...

So thanks for your good work, and keep on helping Linux-Desktop-Users! :-)

Thomas
How about cache pre-writing?
Normally swap is not used unless it is needed. When a user has already filled their RAM to the brim and starts a large app like OpenOffice.org, two things happen:
- Some other stuff needs to get swapped out to disk to free ram
- A lot of stuff (the program and its files) needs to be loaded from the disk.
This results in very heavy disk-IO.
Instead the kernel could have a copy of the ram in swap. This copying would happen when there is no other disk IO. Then when the kernel would normally need to swap stuff to disk, the data would already be on the disk, and the data in ram could be freed directly.
By having a copy of the ram (or parts of it) in swap in advance, the io load would be cut in half in those situations.
Thoughts and comments are welcome.
Arghh
I should really have read the other posts instead of just the summary, since the same idea is suggested there. Feel free to moderate down (or remove) my previous comment and this one.
Oh, no you don't
The punishment for "post first, read later" is to have your comments swapped to disk and immediately backed up for posterior.
HAND
Post first, read later, favorite mode
Well, see, if you waited until you actually read and understood, there would be a lot less heated discussion, and that just wouldn't be as much fun or confusion.
Actually this sounds like a good fix. We will see how it works over time. It is certainly thinking in the right direction.
But there must be more people just waiting to comment.
Ah, but since there isn't muc
Ah, but since there isn't much posting going on, has the comment been pre-fetched back in? Does kerneltrap enable comment-swap pre-fetch? inquiring minds want to know...
Empirically ...
When there is little free memory, it swaps RAM pages out to swap.
When there is plenty of free memory, it prefetches pages from swap back into RAM until no more will fit.
Advantages?
A little bit more speeeeeeeeeed.
Disadvantages?
A silly one: the lifetime of the hard disk is a little bit shorter.
I need dynamic 800 GB swap, my memory is the real volatile-DataBase.
Not argh
Mikko, I read your idea as speculative writing of RAM to swap, and Con's work as speculative reading of swap to RAM. Hence they are complementary ideas.
I'm not sure how this sort of
I'm not sure how this sort of data is handled already: I was under the impression that stuff from the program binary was just thrown away in case of "swapping", since it's already on the disk. I'm not sure how the kernel squares this with patching the underlying binary on system upgrade, though...
I guess for stuff like program data, this could be quite a good thing to do. Equally, if programs mmap their data when possible rather than using read() and friends, the above scheme will also work for this case (changes to the mmap'd page will still be flushed out in this case). I know the mechanisms for Doing The Right Thing with mmap are present.
MMapping
The things you mention are actually identical. Program code is never "loaded" to memory, it is mmap'ed. Writing mmap'ed stuff to swap is useless (and harmful), since it already exists on disk. Swap is only used for runtime data structures and similar data.
Replacing the "underlying binary" works just like opening a file and then deleting it from another process. The program can use the file as normal, but other processes can't see it anymore, and it is purged when the process that owns it closes the file or is killed.
The parent has it right, but needs elaboration.
Allow me to elaborate. UNIX filesystems have a concept of "inodes" that store the body of the file, its permissions and its ownership. The inodes get linked into directories via names--aka. directory entries. The same inode can be linked into the filesystem in multiple places. (Hence the concept of a "hard link.") The filesystem keeps track of how many links an inode has, and the kernel keeps track of how many processes have opened a given inode. This concept is important, and I will come back to it.
When an executable runs, the executable's file as well as the files for all the libraries it depends on get opened. The pages for these files get mmap()'d into the process' address space as file-backed virtual memory. The memory gets marked copy-on-write, so that any changes to the mmap()'d code result in a fault, and break the file backing. In any case, the file-backed portions are backed by the contents of the inodes themselves.
Under virtual memory pressure, the kernel will have to deallocate physical pages of memory from some processes in order to allocate them to others. There are two strategies available here: Write dirty pages to swap, and discard clean pages. Clean pages are either pages which have an explicit file backing (such as program executable pages), or pages that were previously swapped out and brought back in, but still have an equivalent copy in the swap partition. (This is sometimes referred to as the "swap cache," though I don't know if that designation is accurate.)
So yes, under memory pressure, some pages of an executable might get discarded and will need to be brought in later from the original executable. The grandparent wonders how that works if a user upgrades a binary while the executable runs.
Recall that there's the separation between the file's contents (the inode) and the name given to it in the file system (hard link to the inode). File descriptors are bound to inodes, not directory entries. When you "rm" a file, you remove the link between the directory and the inode. When you replace a file by renaming a new file over it, the existing inode gets unlinked and a new inode gets linked in its place. (A plain "cp" over an existing file, by contrast, rewrites the existing inode in place.) When you "mv" a file, it gets linked in its new location, and unlinked from its old location.
The filesystem code does not reclaim the space allocated to the inode until all references to the inode drop. This includes all filesystem links and open file descriptors. Thus, when you replace a program's executable while it executes, the currently running program continues to see the old executable, even if the inode doesn't have a visible link in the filesystem. The inode will remain allocated until all of its open file descriptors get closed. Then and only then will the filesystem reclaim the storage associated with the inode.
In fact, it is this property of UNIX derived filesystems that leads to all the orphaned inodes you find in "lost+found/" after a fsck if your system gets shut down abruptly. Any inodes that were open at the point of the crash, but which did not have a hard directory link end up here.
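The open-but-unlinked behaviour described above can be observed from user space. What follows is only a minimal sketch, assuming a Linux or other POSIX system with a local filesystem; the function name `read_after_unlink` and the path used in the usage note are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, write `msg` into it, remove its only
 * name with unlink(), then read the data back through the descriptor that
 * is still open. Returns the number of bytes read, or -1 on error. */
ssize_t read_after_unlink(const char *path, const char *msg,
                          char *buf, size_t buflen)
{
    assert(path && msg && buf);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len || unlink(path) != 0) {
        close(fd);
        return -1;
    }

    /* The directory entry is gone, but the inode stays allocated because
     * this process still holds an open descriptor for it. */
    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, buf, buflen);

    /* Closing the last descriptor lets the filesystem reclaim the inode. */
    close(fd);
    return n;
}
```

Calling `read_after_unlink("/tmp/demo.txt", "hello", buf, sizeof buf)` should hand back the five bytes just written even though no file named /tmp/demo.txt exists any more by the time the read happens.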
My God.
"In fact, it is this property of UNIX derived filesystems that leads to all the orphaned inodes you find in "lost+found/" after a fsck if your system gets shut down abruptly. Any inodes that were open at the point of the crash, but which did not have a hard directory link end up here."
My God... This might sound dumb, but in 7 years of linux as my primary desktop I never figured that out. Thanks, makes sense, I suppose.
Thanks for the explanation -
Thanks for the explanation - very helpful. It's clarified things a lot for me.
Presumably non-truncating writes to a file which is in executable use are disallowed, right?
There's no mechanism that I know of.
There's no mechanism I know of (other than permissions, and chmod/chown can change those) that prevents someone from opening an executable as O_RDWR or O_WRONLY and either changing the contents of the file or truncating it. File locking (e.g. flock()) is usually advisory, meaning programs have to cooperate by taking the lock themselves for the lock to do any good.
Those classes of file modification will potentially interfere with a running program. (I say "potentially", since writes to areas the program never refers to obviously will have no effect.)
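To make "advisory" concrete, here is a small sketch, assuming Linux and its flock(2) call. The helper name `advisory_lock_demo` and its out-parameters are invented for this example. It takes an exclusive lock through one descriptor, then shows that a second descriptor is refused the lock but can still write to the file if it simply never asks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper. On success returns 0 and sets:
 *   *lock_refused = 1 if a second open of the file could not get the lock,
 *   *write_ok     = 1 if that second descriptor could write anyway. */
int advisory_lock_demo(const char *path, int *lock_refused, int *write_ok)
{
    assert(path && lock_refused && write_ok);

    int holder = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (holder < 0)
        return -1;
    if (flock(holder, LOCK_EX) != 0) {
        close(holder);
        return -1;
    }

    /* A separate open() yields a separate open file description, so its
     * flock() request conflicts with the lock held above. */
    int other = open(path, O_WRONLY);
    if (other < 0) {
        close(holder);
        return -1;
    }

    *lock_refused = (flock(other, LOCK_EX | LOCK_NB) == -1 &&
                     errno == EWOULDBLOCK);

    /* Nothing stops a writer that ignores the lock. */
    *write_ok = (write(other, "x", 1) == 1);

    close(other);
    close(holder);
    return 0;
}
```

On a typical local filesystem both flags should come back set: the lock request is refused, yet the write goes through.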
Anonümous
$ cat test.c
#include <sys/types.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main()
{
    int fd;

    /* Try to open our own running executable for writing. */
    fd = open("a.out", O_RDWR);
    if (fd < 0)
        printf("error opening file:%s\n", strerror(errno));
    else
        close(fd);
    return 0;
}
$ gcc test.c
$ ./a.out
error opening file:Text file busy

ETXTBUSY
One of the tales in The UNIX-HATERS Handbook covers older Unix systems which lacked this feature.
There is a mechanism that prevents an executable from being modified while it is being executed. Observe:
zblaxell@satsuki:~$ cp /bin/bash /tmp/sh-test
zblaxell@satsuki:~$ /tmp/sh-test
zblaxell@satsuki:~$ date > /tmp/sh-test
sh-test: /tmp/sh-test: Text file busy
zblaxell@satsuki:~$ exit
exit
zblaxell@satsuki:~$ date > /tmp/sh-test
zblaxell@satsuki:~$ cat /tmp/sh-test
Sat Sep 3 17:51:21 EDT 2005
This works even when the writer is root, and over some network filesystems, e.g. NFS (and sometimes in counterintuitive ways--due to caching, it may not be possible to modify a file that was _ever_ executed without killing the NFS server).
All robust package managers replace package files by creating new files and renaming them over old ones. This preserves the old files as long as they are in use, and makes writing the new files possible even in the ETXTBSY case.
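The create-then-rename approach can be sketched in a few lines of C. This is an illustration under assumptions, not what any particular package manager does: the function name `replace_file`, the ".new" suffix and the 0755 mode are all invented here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: install `len` bytes of `data` at `path` without ever
 * writing into the inode that `path` currently names. Returns 0 or -1. */
int replace_file(const char *path, const char *data, size_t len)
{
    assert(path && data);

    /* Build the temporary name next to the target so that rename()
     * stays within one filesystem. The ".new" suffix is arbitrary. */
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (fd < 0)
        return -1;

    /* Write the new contents into a brand-new inode and flush them. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Swap the name over to the new inode. The old inode is only
     * unlinked, so anything that has it open or mapped keeps seeing it. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A process that opened the old file before the call keeps reading the old bytes, while any later open() of the same path sees the new ones.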
Thanks! You plugged a hole in my knowledge.
Interesting. You still can unlink a busy text file or "mv" it to a new location, though, as you noted.
I don't know if ETXTBSY was present on the old AT&T SVR3 and SVR4 systems I learned on back in 1992.
Yes, because if you unlink it
Yes, because if you unlink it, the old inode still exists until all users close the file. That's why under UNIX you don't have to reboot your system or close applications when installing new software, as opposed to some other OSes.
proc
What about writing to the process's /proc/[pid]/mem? Surely that works. So indeed you can modify the process as it is being executed. But the mappings of the program's text sections aren't marked as shared, and so the changes are not written back to disk.
But modifying a running program is absolutely possible.
> File locking (e.g. flock())
> File locking (e.g. flock()) is usually advisory
Yup. There's some (slightly grim) hack you can do under Linux (involving setting weird permissions, then flock()ing) which lets you do mandatory locking - it's not POSIX though.
there's such a mechanism, loo
there's such a mechanism, look up MAP_DENYWRITE/VM_DENYWRITE. due to potential DoS, it cannot be used from userland, but the kernel itself does use it when it maps the main executable image. this is why you can't open for writing an executable that is running at the time, whereas you can do the same with a shared library (and cause all kinds of problems if you actually write into it, this is why updating libraries on a system uses some file rename tricks instead of simply writing new content into the existing file/inode).
lost+found
"In fact, it is this property of UNIX derived filesystems that leads to all the orphaned inodes you find in "lost+found/" after a fsck if your system gets shut down abruptly. Any inodes that were open at the point of the crash, but which did not have a hard directory link end up here."
Modern filesystems with journalling (ext3, reiserfs, xfs) tend to be immune to this problem--the filesystem maintains a list of inodes that were deleted but still in use, and in several common "safe" crash cases (e.g. power loss or kernel lockup with properly functioning hard disks) the filesystem quietly removes these just before the filesystem is next mounted read-write. Thus in normal operation of these filesystems no lost+found entries will appear.
lost+found entries do still appear if there is data corruption during (or leading to) a crash. In this case filesystem metadata is lost or corrupted, so files that were not deleted are nonetheless inaccessible. Inodes typically feature a field which indicates if the inode is still in use, so fsck can find these and make new directory entries for them under lost+found.
Very true
This is very true. As someone who grew up w/ UFS and Ext2, though, I've had my share of inodes in "lost+found/"...
Memory pools!?... or a swap cache
I'm not a kernel hacker... just a curious person trying to understand, so...
" This results in very heavy disk-IO. "
Perhaps I don't know the exact conventional names to describe kernel functionality, but I'll try to explain my ideas. Is it really stupid to advocate the creation of memory pools addressable by the kernel?
My idea (stupid or not) is that pages marked as "obvious candidates" for swap should not be immediately swapped but thrown, "defragmented", into a *reserved* portion of physical memory. Thus kswapd would delegate the moving of 'swappable pages' to a defragmenter that would put them, defragmented, in this *reserved memory swap cache*. Then Con Kolivas's mechanism could carry them further, from that 'swap cache' in physical memory to disk and vice versa, with the additional bonus of working with defragmented blocks.
That would stop disk IO!?... and be very useful, because I suspect that with the influx of multicore + simultaneous multithreading CPUs, OSes running multiple VMs for other OSes, and heavily multithreaded applications, the usual way of swapping doesn't work: a physical page that is swappable now could be absolutely required the next second amid heavy context switching of threads and processes... thus making kswapd lighter, more aggressive and minimal in functionality would stop it from wasting useful CPU cycles and IO bandwidth, better used by a proper mechanism!
Another idea is that the disk cache should always be created as two separate physical memory pools, program and data. Better, a *physical memory* 'program cache' pool could also be created, with a proper mechanism requiring that program bits 'should' enter this pool already in *contiguous order*, that is, defragmented (and this is possible because program bits only change when they are upgraded, i.e. almost never in CPU time!), and not thrown into the 'highly competitive' general physical memory pool, into any 4K page of physical memory whenever or wherever available.
This *program cache* memory pool is certainly not a hot requirement for server systems with minimal services and gigabyte data requirements, but could be a killer feature for workstations/desktops, because unlike a RAM disk it would be quicker and more versatile: its size could be changed dynamically, and it could hold defragmented program bits not only from any required runtime but also from other executables in /bin, /usr/bin or /usr/sbin, scheduled by a simple algorithm based on simple parameters such as how many times they are run and their usefulness.
That would stop disk IO also?!... And I believe none of this would degrade performance, because on a typical 4GB server or a 1GB desktop, reserving even 200MB or 300MB would hardly be noticed. Another big bonus is that the VM would probably get a much bigger pool of contiguously addressable 4K memory pages to work with.
Sorry to comment myself, but
Sorry to comment on myself, but I believe that some parts of my posting above need more clarification...
What I advocate above is not the creation by the kernel of some sort of protected ad hoc RAM disks in memory, but *mostly* the creation of a proper, much-requested defragmentation mechanism that could encompass swapping and prefetching and be in charge of *most* of the moving around of memory bits.
Here is one of my difficulties, because this mechanism should not clash with normal memory allocations and file system functionality. But I believe that if it (the defragmenter) were in charge, where fit, of moving the bits around in the general memory space in a "proper" order, no clashes would occur.
In the context of this thread, it should first be mentioned that "clever" prefetching should not be only from the disk 'swap area', as in Con Kolivas's patch, but from disk in general; that is, all disk caching could be done in a prefetching mode as extensively as possible... and more, this much-requested new kernel mechanism (the defragmenter) "could" be implemented in a fashion much "friendlier" to this prefetching activity than any other structure in the kernel... I will try to explain.
The idea is that *going into* physical memory would be "as usual", since the VM would see the whole memory space, only reserving or marking small "contiguous" portions of physical memory (the caches) as not directly addressable by applications or other normal processes, only by the defragmenter. From a general application's perspective a system would have not 4000MB or 1000MB of RAM but 4000-300MB or 1000-200MB, arbitrarily determined by configurable kernel parameter(s), effective at system boot, with possible minimum and maximum values, and changeable like any other kernel parameter.
The *going out* of memory is the one that will be done differently.
Swapping:
kswapd would have the function, perhaps in a slightly more aggressive and preemptive fashion, of determining which pages are ready for swapping or writeback, more or less as it does now, but it would only mark them as such in a 'list' that serves as a work order for the defragmenter. kswapd would not move a bit. I believe that since the VM has reverse mapping, such a list would not be hard to implement. The defragmenter *should and could* then be much more clever than kswapd in trying to move all marked pages within a determined time frame. That moving of pages could be done in a "proper" order, with enormous gains.
That is, moving from scattered physical memory into other contiguous locations in memory would be normal defragmentation. Moving from scattered physical memory to other contiguous locations, but now in this physical memory *swap cache area*, would be (one) of the ways to get many more contiguous 4K addressable pages when VM pressure rises, and a thermometer preventing prefetching from exploding into disk activity when VM pressure oscillates. Finally, moving from this *swap cache area* into disk virtual memory, but now in an obviously contiguous bit order, would be the normal swapping.
If done "always" in this "hierarchical" order, that is, writing to this *swap cache area* first and only then to disk swap, we would get a constant speculative cache area (always holding speculative data) and a perfect (IMO) companion to prefetching and, in consequence, I suspect, to future multicore multithreaded CPUs. Even if VM pressure explodes I fail (my fault) to see why this would prevent normal kernel operations as they are implemented now, or impact performance noticeably. Last but not least, there is the bonus of getting a permanently defragmented physical memory area AND defragmented disk virtual memory swap area(s), even if the rest of physical memory looks worse than spaghetti.
Data writeback:
The general disk cache would be exactly as it is implemented now, but writeback to disk would be conducted by the defragmenter, in a similar hierarchical order as above, through a comparatively very small reserved physical memory *writeback data cache*.
Here the bits are first put into this writeback data cache by the defragmenter, in contiguous pages, and only then leave this 'writeback cache' for disk, in contiguous bit order too.
If done always in this order, we not only get another small area of always-defragmented physical memory, perhaps also holding some valuable speculative data and restraining a possible general disk cache prefetching mechanism from exploding into disk activity under "oscillating" VM pressure, AND, best of all, we tend to get the disk automatically defragmented as data is used.
"In the middle of this memory activity there will always be many more contiguous 4K addressable pages than is possible with the present mechanisms"
Program cache:
Here my difficulty is explaining why I believe this "program cache" could be totally transparent to filesystems and normal program loading. Its mechanism should be different from those described above, with the prefetching mechanism having the bulk of the 'cleverness'. This prefetching should 'request' into this physical memory *program cache* actual runtimes, executables and perhaps some libraries, based on their determined usefulness, how many times they are used and/or other parameters. I can't really figure out how it would work without the assistance of a file system, but since there is already a 'RAM' filesystem (sysfs) implemented, why not use it?!
Here, if sysfs is implemented to always rest on this reserved physical memory *program cache* space, it is a perfect marriage, because it will automatically get out of the way of swap pressure... and even reliance on the defragmenter for moving the cached program bits would not harm; on the contrary, it would be good for keeping it defragmented!? Swapping sysfs is not an issue, and making it act somehow a little more like a normal RAM disk, where data writeback is not necessary, is a piece of cake, I believe!?
" If implemented properly, I believe this "hierarchical" caching will be a "perfect" speculative mechanism in which "things" to be thrown away are put aside so that they do not interfere with normal memory operations, yet are not really gone: if it happens that they are needed (which I suspect happens a lot), they are already there and no disk IO is wasted, and in the process many more "things" get defragmented. I believe this is a major boost that wastes only a relatively small portion of physical memory and a negligible number of CPU cycles in overhead... nothing compared to the gains. "
I've read a comment somewhere from an M$ engineer at WinHEC that this kind of cache, more in the sense of traditional "RAM disks" or "solid state" for disk caching, if loaded correctly in a dynamic fashion, could reduce disk activity to sporadic surges every 15 minutes with not much more than 64MB of SPACE required for *normal* mobile computing use!... a must for mobile computing power-saving needs.
Perhaps I'm missing something here, or am in over my head more than I'm willing to believe... so if I said something "really" stupid please let me know.
pre-caching
I have been wondering for a while if there exists a way for a program to ask the file caching system to cache a file in advance of it being needed. For example, when I open my home folder in a file browser, if all the subdirectories were preemptively read into RAM, then it would be quicker for me when I double click on my docs folder.
I suppose my file browser could read the directory and hold it in its memory, but if several apps are doing this then there will be waste. If the request was handled by the file caching system then the kernel could decide if the resources were available to do the caching, or if there were more important things in the disk cache.
I am a long way from being a kernel hacker, so I could not do this myself. It may be a silly thing to do, or it may have been done already. But if it is a new idea, and feasible, then I think it could make some programs a bit snappier.
ssam
Tree pre-fetching
One API I'd really kill to have on POSIX-like systems is something that allows me to iterate through filenames or filenames+stat() pairs in the most optimal order for the given filesystem. Usually I don't care what order this data comes in (breadth first, depth first, alphabetically sorted, none of the above) as long as it recursively hits an entire subtree; however, I do care about having to stop what I'm doing every millisecond to wait for a seek that will take up to 100ms. If all that data could appear in a chunk between 500K and 5M in size, read off the disk into memory without seeking, I'd do in milliseconds what currently takes minutes.
Some filesystems do have some limited support for making tree walks fast--if you know some implementation details of the filesystem you can use filesystem-specific optimizations (e.g. read filenames in the order they come back from readdir, or stat() files in inode number order); however, the kernel forces applications to choose an order of operations, so whatever your application does now will perform well on one filesystem and poorly on all others.
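One of the filesystem-specific tricks mentioned above, stat()ing files in inode number order, might look roughly like this for a single directory. It is a sketch only: `struct entry`, `list_by_inode` and `stat_in_inode_order` are names invented here, and whether the ordering actually helps depends on how the filesystem lays out its inodes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical record: one directory entry and its inode number. */
struct entry {
    ino_t ino;
    char name[256];
};

/* qsort() comparator: ascending inode number. */
static int by_inode(const void *a, const void *b)
{
    ino_t x = ((const struct entry *)a)->ino;
    ino_t y = ((const struct entry *)b)->ino;
    return (x > y) - (x < y);
}

/* Read up to `max` names from `dir` (skipping "." and "..") and sort them
 * by inode number. Returns the number of entries stored, or -1 on error. */
int list_by_inode(const char *dir, struct entry *out, int max)
{
    assert(dir && out && max >= 0);

    DIR *d = opendir(dir);
    if (!d)
        return -1;

    int n = 0;
    struct dirent *de;
    while (n < max && (de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        out[n].ino = de->d_ino;
        snprintf(out[n].name, sizeof out[n].name, "%s", de->d_name);
        n++;
    }
    closedir(d);

    qsort(out, (size_t)n, sizeof *out, by_inode);
    return n;
}

/* stat() each entry in the (already sorted) order given.
 * Returns how many calls succeeded. */
int stat_in_inode_order(const char *dir, const struct entry *e, int n)
{
    char path[4096];
    struct stat st;
    int ok = 0;

    for (int i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/%s", dir, e[i].name);
        if (stat(path, &st) == 0)
            ok++;
    }
    return ok;
}
```

The application still has to pick the order itself, which is exactly the limitation the comment complains about; the sketch only shows what one such choice looks like.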
Advance caching..
>i have been wondering for a while if there exists a way
>for a program to ask the file caching system to cache
>a file in advance of it being needed.
Sure. Just spawn a thread/child to open each file RDONLY, mmap(MAP_POPULATE) it, and close()/munmap() it. The kernel will keep the pages around in the page cache until it decides it has a better use for the memory, and will share the file's pages with anything else that is interested in them.
-ml
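The suggestion above could be sketched as follows, assuming Linux (MAP_POPULATE is Linux-specific). The function name `warm_page_cache` is made up, and the thread/child part is left out to keep the example short.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to read `path` into the page cache
 * by mapping it with MAP_POPULATE and unmapping it again. Returns the file
 * size in bytes, 0 for an empty file, or -1 on error. */
long warm_page_cache(const char *path)
{
    assert(path);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    /* MAP_POPULATE asks the kernel to read the file's pages in up front
     * instead of faulting them in one at a time. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                   MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);                      /* the mapping holds its own reference */
    if (p == MAP_FAILED)
        return -1;

    /* Dropping the mapping does not evict the pages; they stay cached
     * until the kernel wants the memory for something else. */
    munmap(p, (size_t)st.st_size);
    return (long)st.st_size;
}
```

posix_fadvise() with POSIX_FADV_WILLNEED and the Linux readahead() call are other ways to make a similar request without creating a mapping at all.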
Agreed.
I think that it would be very difficult for the kernel to know when it needs to preload the subdirectories and when it must not do it.
It is much simpler to do it in userspace.
New version
I just posted an improved version on my website and lkml:
http://ck.kolivas.org/patches/swap-prefetch/
Update
Thanks for the new update, Con Kolivas. Works like a charm.