Thomas Schlichter, following up to a Request-for-Comment on lkml, sent a patch implementing Swap Prefetch. More patches followed as he got more feedback about this.
The idea is to swap pages in when there is free memory and both CPU and I/O load are low; this can improve latency after the system has been idle for a while. Andrew Morton offered some improvements, including waiting until the disk holding the swap area is idle before starting the swap-in.
The big problem with this idea is that there is no way to "know" which pages will be the next to be used (that is, which pages to prefetch from swap), so the patch might actually slow things down if it swaps in the wrong pages.
From: Thomas Schlichter To: linux-kernel Subject: An idea for prefetching swapped memory... Date: 2003-04-07 8:26:43 Hello, some days ago some friends and me argued about a feature which seems not to be included in current OSs but could improve useability mainly for desktop computers. The idea was about prefetching swapped out pages when some memory is free, the CPU is idle and the I/O load is low. So this should not 'cost' much but behave better on following situation: (I think there are even more such situations, this one should just be an example) One is surfing the internet and having some browser windows opened. Now, without closing the browser windows, he is playing some game which needs pretty much memory so the browsers memory is getting swapped out. After finishing gaming he's going to make some coffee and then surfing the internet again. But even if the computer was IDLE for a time and, as the game was closed again, some memory is really FREE, the pages for the browser are swapped in just when they are needed and not in advance. With this feature there should be no performance decrease because only free resources would be used, and if pages were swapped in but not be used, they stay not dirty and so have not to be written to disk when they are swapped out again. But the improvements should be obvious if simply the last swaped out pages are swapped in again... If somebody could give me a hint how to implement this I would try it. I hope it will not be very difficult... ;-) Thank you for reading and perhaps thinking about it... Best regards Thomas Schlichter
From: Con Kolivas To: linux-kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 03:40:07 PST On Mon, 7 Apr 2003 18:26, Thomas Schlichter wrote: > Hello, > > some days ago some friends and me argued about a feature which seems not to > be included in current OSs but could improve useability mainly for desktop > computers. > > The idea was about prefetching swapped out pages when some memory is free, > the CPU is idle and the I/O load is low. > > So this should not 'cost' much but behave better on following situation: > (I think there are even more such situations, this one should just be an > example) > > One is surfing the internet and having some browser windows opened. Now, > without closing the browser windows, he is playing some game which needs > pretty much memory so the browsers memory is getting swapped out. After > finishing gaming he's going to make some coffee and then surfing the > internet again. > But even if the computer was IDLE for a time and, as the game was closed > again, some memory is really FREE, the pages for the browser are swapped in > just when they are needed and not in advance. > > With this feature there should be no performance decrease because only free > resources would be used, and if pages were swapped in but not be used, they > stay not dirty and so have not to be written to disk when they are swapped > out again. But the improvements should be obvious if simply the last swaped > out pages are swapped in again... This has been argued before. Why would the last swapped out pages be the best to swap in? The vm subsystem has (somehow) decided they're the least likely to be used again so why swap them in? Alternatively how would it know which to swap in instead? Con
From: Thomas Schlichter To: linux-kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 04:00:11 PST Quoting Con Kolivas: > On Mon, 7 Apr 2003 18:26, Thomas Schlichter wrote: > > Hello, > > > > some days ago some friends and me argued about a feature which seems not to > > be included in current OSs but could improve useability mainly for desktop > > computers. > > > > The idea was about prefetching swapped out pages when some memory is free, > > the CPU is idle and the I/O load is low. > > > > So this should not 'cost' much but behave better on following situation: > > (I think there are even more such situations, this one should just be an > > example) > > > > One is surfing the internet and having some browser windows opened. Now, > > without closing the browser windows, he is playing some game which needs > > pretty much memory so the browsers memory is getting swapped out. After > > finishing gaming he's going to make some coffee and then surfing the > > internet again. > > But even if the computer was IDLE for a time and, as the game was closed > > again, some memory is really FREE, the pages for the browser are swapped in > > just when they are needed and not in advance. > > > > With this feature there should be no performance decrease because only free > > resources would be used, and if pages were swapped in but not be used, they > > stay not dirty and so have not to be written to disk when they are swapped > > out again. But the improvements should be obvious if simply the last swaped > > out pages are swapped in again... > > This has been argued before. Why would the last swapped out pages be the best > > to swap in? The vm subsystem has (somehow) decided they're the least likely > > to be used again so why swap them in? Alternatively how would it know which > > to swap in instead? > > Con > What I wanted to say is that if there is free memory it should be filled with the pages that were in use before the memory got rare. 
And these are the pages swapped out last. The other swapped out pages are swapped out even longer and so will likely not be used in the near future... (That's what the LRU algorithm says...) Thomas
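Thomas's "last swapped out, first prefetched" ordering can be modelled in a few lines of user-space C. This is only a sketch of the idea, not code from any of the patches: the names swap_history, note_swap_out() and next_prefetch(), and the fixed capacity, are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 8	/* hypothetical capacity, chosen for the example */

/* History of swapped-out page ids, oldest first, newest last. */
struct swap_history {
	unsigned long entry[MAX_TRACKED];
	size_t count;
};

/* Record a swap-out; when the history is full, forget the oldest id. */
void note_swap_out(struct swap_history *hist, unsigned long id)
{
	if (hist->count == MAX_TRACKED) {
		for (size_t i = 1; i < MAX_TRACKED; i++)
			hist->entry[i - 1] = hist->entry[i];
		hist->count--;
	}
	hist->entry[hist->count++] = id;
}

/*
 * Pick the next prefetch candidate: the most recently swapped-out page.
 * Returns 0 and stores the id in *id, or -1 if nothing is recorded.
 */
int next_prefetch(struct swap_history *hist, unsigned long *id)
{
	assert(hist != NULL && id != NULL);
	if (hist->count == 0)
		return -1;
	*id = hist->entry[--hist->count];
	return 0;
}
```

Whether this order is actually the right one is exactly what the rest of the thread argues about; the sketch only shows that the bookkeeping itself is cheap.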
From: Måns Rullgård To: linux-kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 04:30:15 PST Thomas Schlichter writes: > > This has been argued before. Why would the last swapped out pages > > be the best to swap in? The vm subsystem has (somehow) decided > > they're the least likely to be used again so why swap them in? > > Alternatively how would it know which to swap in instead? Con > > What I wanted to say is that if there is free memory it should be > filled with the pages that were in use before the memory got > rare. And these are the pages swapped out last. The other swapped > out pages are swapped out even longer and so will likely not be used > in the near future... (That's what the LRU algorithm says...) Would it be possible to track the most recently used swapped out page? This would possibly be a good candidate for speculative loading. -- Måns Rullgård
From: Thomas Schlichter To: linux.kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 05:50:17 PST Quoting Måns Rullgård: > Thomas Schlichter writes: > > What I wanted to say is that if there is free memory it should be > > filled with the pages that were in use before the memory got > > rare. And these are the pages swapped out last. The other swapped > > out pages are swapped out even longer and so will likely not be used > > in the near future... (That's what the LRU algorithm says...) > > Would it be possible to track the most recently used swapped out page? > This would possibly be a good candidate for speculative loading. Well, I think the 'more recently used swapped out' order relation is equivalent to the 'later swapped out' order relation if the kswapd uses the LRU algorithm. (if it does not, it has its reasons and we should respect them by simply using the 'later swapped out' order...) But I am not familiar with the linux swapping management so I don't know if it tracks this order in any structure. Perhaps there is a kind of 'last used timestamp' for each page and so for the swapped pages, too, wich could be used for my purpose. But as I sayed, I don't know... I hope there is anybody out there who can help me with this question... Thanks Thomas
From: Mark Mielke To: linux-kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 11:30:26 PST On Mon, Apr 07, 2003 at 01:24:41PM +0200, Måns Rullgård wrote: > Thomas Schlichter writes: > > > This has been argued before. Why would the last swapped out pages > > > be the best to swap in? The vm subsystem has (somehow) decided > > > they're the least likely to be used again so why swap them in? > > > Alternatively how would it know which to swap in instead? Con > > What I wanted to say is that if there is free memory it should be > > filled with the pages that were in use before the memory got > > rare. And these are the pages swapped out last. The other swapped > > out pages are swapped out even longer and so will likely not be used > > in the near future... (That's what the LRU algorithm says...) > Would it be possible to track the most recently used swapped out page? > This would possibly be a good candidate for speculative loading. Personally, I'm not sure that this idea sounds very effective. I _like_ the fact that after pages get swapped out, my RAM gets filled up with file pages with use. It means that although bringing a window that I haven't used in a while takes some time to load, my apache server, or my xterm, can serve files or requests like 'ls' much faster. If swap was automatically pulled in to replace my file pages, I suspect I would be trading one evil for another. mark
From: Helge Hafting To: linux-kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 06:30:21 PST Thomas Schlichter wrote: > What I wanted to say is that if there is free memory it should be filled with > the pages that were in use before the memory got rare. And these are the pages > swapped out last. Not necessarily. Memory isn't merely used to hold swappable stuff, it also caches files. Consider a small but io-intensive program. The stuff you want isn't necessarily the last swap (perhaps there even isn't anything swapped out) , it might be the last thing dropped from cache instead. And we can often predict better than "the last thing swapped/flushed" A bunch of free memory appearing could usually be better used for extra read-ahead, wether it is read-ahead of files/directories/bitmaps being accessed, or executable code faulted in from executables or swap devices. > The other swapped out pages are swapped out even longer and so > will likely not be used in the near future... (That's what the LRU algorithm > says...) "What we're going to need soon" is the best. It isn't always predictable, but sometimes. "The block following the last we read from some file/fs-structure" is often a good one though. Helge Hafting
From: Chris Friesen To: linux-kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 07:30:17 PST Helge Hafting wrote: > Thomas Schlichter wrote: > >> What I wanted to say is that if there is free memory it should be >> filled with >> the pages that were in use before the memory got rare. And these are >> the pages >> swapped out last. > "What we're going to need soon" is the best. It isn't always predictable, > but sometimes. "The block following the last we read from some > file/fs-structure" > is often a good one though. With the current setup though, the memory is wasted. It makes sense that we should fill the memory up with *something* that is likely to be useful. If I have mozilla open, start a kernel compile, and then come back half an hour later, I would like to see the mozilla pages speculatively loaded back into memory. Since the system is otherwise idle, it doesn't cost anything to do this. I think its obvious that it is beneficial to swap in something, the only trick is getting a decent heuristic as to what it should be. Chris
From: Jörn Engel To: linux-kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 07:50:08 PST On Mon, 7 April 2003 10:19:25 -0400, Chris Friesen wrote: > > With the current setup though, the memory is wasted. It makes sense that > we should fill the memory up with *something* that is likely to be useful. > > If I have mozilla open, start a kernel compile, and then come back half an > hour later, I would like to see the mozilla pages speculatively loaded back > into memory. > > Since the system is otherwise idle, it doesn't cost anything to do this. In the scenario above, it costs you a lot. The memory is completely used, else mozilla wouldn't get swapped out. If you swap it back in and get rid of fs cache, the next kernel (compile|grep|whatever) will be slower. And even in the original scenario, it will be expensive, depending on your machine. On a notebook, it costs you battery power, which is a limited resource, *for sure*. You *may* save user time, which *may* be a limited resource, but not always. But sure, it is a fun project to hack on, just go ahead and show the numbers. :) > I think its obvious that it is beneficial to swap in something, the only > trick is getting a decent heuristic as to what it should be. And when it should be done. ;) Jörn
From: Mark Mielke To: linux-kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 11:40:11 PST On Mon, Apr 07, 2003 at 10:19:25AM -0400, Chris Friesen wrote: > Helge Hafting wrote: > >"What we're going to need soon" is the best. It isn't always predictable, > >but sometimes. "The block following the last we read from some > >file/fs-structure" > >is often a good one though. > With the current setup though, the memory is wasted. It makes sense that > we should fill the memory up with *something* that is likely to be useful. > > If I have mozilla open, start a kernel compile, and then come back half an > hour later, I would like to see the mozilla pages speculatively loaded back > into memory. > > Since the system is otherwise idle, it doesn't cost anything to do this. I > think its obvious that it is beneficial to swap in something, the only > trick is getting a decent heuristic as to what it should be. Chris: Based on your usage patterns, how would Linux know that you were going to be opening up Mozilla, and not that you were going to tweak the kernel source and compile it again? The only time memory is wasted is when you don't have enough of it, and it gets trampled for common operations that you perform. All other times, the memory is loaded, because it was used, which means it might be used again. mark
From: Chris Friesen To: linux-kernel Subject: Re: An idea for prefetching swapped memory... Date: 2003-04-07 12:00:19 PST Mark Mielke wrote: > On Mon, Apr 07, 2003 at 10:19:25AM -0400, Chris Friesen wrote: > Chris: Based on your usage patterns, how would Linux know that you were > going to be opening up Mozilla, and not that you were going to tweak the > kernel source and compile it again? Because it would read my mind and figure out what I wanted! ;-) Maybe it would be possible to have some way to tell the kernel, "I would prefer this process to be in memory, unless you're running short, at which point you can swap it out." This would be very similar to the niceness value, except it would control what memory gets swapped out. You could tie it in to what processes have been running, such that if the system goes idle you could start preferentially swapping back in the processes with the memory niceness set. If you left it at zero you get the current behaviour (not swapped in until needed) while positive (or negative, to align with niceness) values would swap that process in preferentially when the system goes idle. This would give similar benefits as mlock without actually robbing the kernel of the ability to swap out under memory pressure. Does this sound at all useful, or am I blowing smoke? Chris
From: Robert White To: linux-kernel Subject: RE: An idea for prefetching swapped memory... Date: 2003-04-07 12:50:12 PST DISCLAIMER: Without having actually looked at the code... I would say that being able to mark a process or executable as a "should be speculatively reloaded" is... wait for it... a "very bad" idea. It would become far too easy for someone to configure a hugely anti-optimal system by just flagging some giant pig-dog program as "favorable for residency" and then have the system end up aggressively reloading parts of that program's data set that aren't even being used. Consider: you flag Mozilla and the system starts aggressively loading the composer and mail client (etc.) code while all you are doing is looking at a help file. Degenerate cases abound. [These issues are, BTW, why the use of the "text sticky bit" pretty much deprecated itself.] On the other hand, presuming for the moment that the VM system works something vaguely like the one in a Sun SVR4 system (because, remember, I haven't read the code 8-). That is, let's say there is a pointer traversing along through memory that looks at each page and considers it for writing out to swap. And there is another pointer that cycles through memory behind it and, if it hasn't been modified since the first pointer passed, it does the write-to-swap and then puts the page on the reclaim-or-overwrite list. When a process accesses a page, if it is normal then it is normal, if it is on the reclaim-or-overwrite list it reclaims its page, if it isn't on the list, the system takes the first page off the list and fills it with the swapped-in contents. Now lets change that list from a list to a priority queue.... It would be interesting to have the system keep track of page faults for each process and then make a ratio of Page_Faults/Program_Size (or maybe RSS?). The smaller this number is the higher its pages are on the priority queue. 
Now, programs that are experiencing a large amount of paging (because they are large and they are actively getting hit) will tend to have their pages preserved on the reclaim/overwrite list. That is, they are more likely to be able to reclaim their pages instead of having to swap them in. The nice parts: - Small programs that are being intensely used tend to stay in memory because of that use. (e.g. actively grep(ing) a file, not a large data set but the continuous use keeps its pages off the queue naturally. - Large, inactive programs tend to leave memory quickly. - Small, moderately inactive programs tend to profile competitively with larger active more-active programs (so the large active programs don't completely trample over their smaller kin.) - New (just initiated) programs will tend to profile themselves quickly, which will tend to let initialization time code and data subside gracefully. - As system run state evolves (people and processes come and go) the heuristic can keep up because the processes are only judged against one another. [ASIDE: The tracking might actually be better by "memory image" instead of "process" so that multi-threaded code will compete based on the sum of their threads activities...?] Rob.
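Rob's Page_Faults/Program_Size heuristic can be sketched as a sort key. The following is a user-space illustration under stated assumptions, not anything from the kernel: struct proc_stat and both function names are hypothetical, and the counters are assumed small enough that the cross-multiplication does not overflow.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-program counters for the illustration. */
struct proc_stat {
	const char *name;
	unsigned long page_faults;	/* faults observed so far */
	unsigned long size_pages;	/* program size (or RSS) in pages, > 0 */
};

/*
 * Order by page_faults / size_pages, ascending, so programs that are
 * paging the least relative to their size come first. Cross-multiplying
 * compares the two ratios without floating point.
 */
int by_fault_ratio(const void *a, const void *b)
{
	const struct proc_stat *x = a, *y = b;
	unsigned long long lhs, rhs;

	assert(x->size_pages > 0 && y->size_pages > 0);
	lhs = (unsigned long long)x->page_faults * y->size_pages;
	rhs = (unsigned long long)y->page_faults * x->size_pages;
	return (lhs > rhs) - (lhs < rhs);
}

/* Front of the array: pages to overwrite first. Back: pages to preserve. */
void rank_for_overwrite(struct proc_stat *procs, size_t n)
{
	qsort(procs, n, sizeof(*procs), by_fault_ratio);
}
```

With a small, busy grep (900 faults over 100 pages), a large idle editor (10 over 5000) and a browser in between (2000 over 20000), the editor ends up first in line for overwriting and grep last, which matches the behaviour Rob describes.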
From: Thomas Schlichter To: linux-kernel Subject: [RFC] first try for swap prefetch Date: 2003-04-10 17:47:58 Hi, as mentioned a few days ago I was going to try to implement a swap prefetch to better utilize the free memory. Now here is my first try. This version works only as a module and tests for free pagecahe memory in a interval specified as a module parameter. As I tested this I saw that many of the page reloads do not come from the swap space but from buffers that got moved away. I could easily save which buffers have been removed but I don't know how to read them back to the pagecache... An other thing I saw was that anywhere in the kernel there must be some code which always tries to hold some memory pages free, even if there are cached pages that just can be freed as they are not modified... Perhaps that code should be changed... I hope someone may give me some hints or show me obvious mistakes I made... ;-) Thank you! Best regards Thomas
[get the patch here]
From: Andrew Morton To: linux-kernel Subject: Re: [RFC] first try for swap prefetch Date: 2003-04-10 16:20:09 PST Thomas Schlichter wrote: > > Hi, > > as mentioned a few days ago I was going to try to implement a swap prefetch to > better utilize the free memory. Now here is my first try. That's surprisingly cute. Does it actually do anything noticeable? + swapped_entry = kmalloc(sizeof(*swapped_entry), GFP_ATOMIC); These guys will need a slab cache (not SLAB_HW_CACHE_ALIGNED) to save space. + swapped_entry = radix_tree_lookup(&swapped_root.tree, entry.val); + if(swapped_entry) { + list_del(&swapped_entry->list); + radix_tree_delete(&swapped_root.tree, entry.val); you can just do if (radix_tree_delete(...) != -ENOENT) list_del(...) + read_swap_cache_async(entry); What you want here is a way of telling if the disk(s) which back the swap are idle. We used to have that, but Hugh deleted it. It can be put back, but it's probably better to put a `last_read_request_time' and `last_write_request_time' into struct backing_dev_info. If nobody has used the disk in the past N milliseconds, then start the speculative swapin. It might make sense to poke the speculative swapin code in the page-freeing path too. And to put the speculatively-swapped-in pages at the tail of the inactive list (perhaps). But first-up, some demonstrated goodness is needed...
From: Thomas Schlichter To: linux-kernel Subject: Re: [RFC] first try for swap prefetch Date: 2003-04-11 05:16:42 PST On April 11, Andrew Morton wrote: > Thomas Schlichter wrote: > > Hi, > > > > as mentioned a few days ago I was going to try to implement a swap > > prefetch to better utilize the free memory. Now here is my first try. > > That's surprisingly cute. Does it actually do anything noticeable? Well, it fills free pagecache memory with swapped pages... ;-) But at the moment I can not 'feel' any real improvement... :-( I think the problem is that R/O pages are not written to swap space and so not prefetched with my patch. But I will look after it... > + swapped_entry = kmalloc(sizeof(*swapped_entry), GFP_ATOMIC); > > These guys will need a slab cache (not SLAB_HW_CACHE_ALIGNED) to save > space. OK, I'll do it. > + swapped_entry = radix_tree_lookup(&swapped_root.tree, entry.val); > + if(swapped_entry) { > + list_del(&swapped_entry->list); > + radix_tree_delete(&swapped_root.tree, entry.val); > > you can just do > > if (radix_tree_delete(...) != -ENOENT) > list_del(...) > > + read_swap_cache_async(entry); Sorry, but I think I can not. The list_del() needs the value returned by radix_tree_lookup(), so I can not kick it... By the way, the only reason for the radix tree is to make this list_del() not O(n) for searching the list... Do you know how expensive the radix_tree_lookup() is? O(1) or O(log(n))?? For my shame I do not really know that data structure... :-( > What you want here is a way of telling if the disk(s) which back the swap > are idle. We used to have that, but Hugh deleted it. It can be put back, > but it's probably better to put a `last_read_request_time' and > `last_write_request_time' into struct backing_dev_info. If nobody has used > the disk in the past N milliseconds, then start the speculative swapin. That's good. I was looking for anything like that but didn't find anything fitting in the current sources... 
> It might make sense to poke the speculative swapin code in the page-freeing > path too. I wanted to do this but don't know which function is the correct one for this. But I will search harder... or can you give me a hint? > And to put the speculatively-swapped-in pages at the tail of the inactive > list (perhaps). This may be a good idea... > But first-up, some demonstrated goodness is needed... Yup, but currently it improves nothing very much, as stated above, I think first I should implement the R/O pages thing and investigete which part of the kernel works against my code and frees some pages after I just filled them... Thank you for helping me with your comments! Best regards Thomas
From: William Lee Irwin III To: linux-kernel Subject: Re: [RFC] first try for swap prefetch Date: 2003-04-11 05:20:13 PST On Fri, Apr 11, 2003 at 01:51:55PM +0200, Thomas Schlichter wrote: > Sorry, but I think I can not. The list_del() needs the value returned by > radix_tree_lookup(), so I can not kick it... By the way, the only reason for > the radix tree is to make this list_del() not O(n) for searching the list... > Do you know how expensive the radix_tree_lookup() is? O(1) or O(log(n))?? For > my shame I do not really know that data structure... :-( It's O(lg(keyspace)). This is regarded as constant by many. -- wli
From: Andrew Morton To: linux-kernel Subject: Re: [RFC] first try for swap prefetch Date: 2003-04-11 14:50:11 PST Thomas Schlichter wrote: > > > you can just do > > > > if (radix_tree_delete(...) != -ENOENT) > > list_del(...) > > > > + read_swap_cache_async(entry); > > Sorry, but I think I can not. The list_del() needs the value returned by > radix_tree_lookup(), so I can not kick it... OK, I'll change radix_tree_delete() to return the deleted object address if it was found, else NULL. That's a better API. > Do you know how expensive the radix_tree_lookup() is? O(1) or O(log(n))?? For > my shame I do not really know that data structure... :-( It is proportional to log_base_64(largest index which the tree has ever stored) log_base_64: because each node has 64 slots. Each time maxindex grows by a factor of 64 we need to introduce a new level. "largest index ever": because we do not (and cannot feasibly) reduce the height when items are removed. > > It might make sense to poke the speculative swapin code in the page-freeing > > path too. > > I wanted to do this but don't know which function is the correct one for this. > But I will search harder... or can you give me a hint? free_pages_bulk() would probably suit.

diff -puN fs/nfs/write.c~radix_tree_delete-api-cleanup fs/nfs/write.c
diff -puN lib/radix-tree.c~radix_tree_delete-api-cleanup lib/radix-tree.c
--- 25/lib/radix-tree.c~radix_tree_delete-api-cleanup Fri Apr 11 14:30:30 2003
+++ 25-akpm/lib/radix-tree.c Fri Apr 11 14:30:30 2003
@@ -349,15 +349,18 @@ EXPORT_SYMBOL(radix_tree_gang_lookup);
  * @index: index key
  *
  * Remove the item at @index from the radix tree rooted at @root.
+ *
+ * Returns the address of the deleted item, or NULL if it was not present.
  */
-int radix_tree_delete(struct radix_tree_root *root, unsigned long index)
+void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
 {
 	struct radix_tree_path path[RADIX_TREE_MAX_PATH], *pathp = path;
 	unsigned int height, shift;
+	void *ret = NULL;
 	height = root->height;
 	if (index > radix_tree_maxindex(height))
-		return -ENOENT;
+		goto out;
 	shift = (height-1) * RADIX_TREE_MAP_SHIFT;
 	pathp->node = NULL;
@@ -365,7 +368,7 @@ int radix_tree_delete(struct radix_tree_
 	while (height > 0) {
 		if (*pathp->slot == NULL)
-			return -ENOENT;
+			goto out;
 		pathp[1].node = *pathp[0].slot;
 		pathp[1].slot = (struct radix_tree_node **)
@@ -375,8 +378,9 @@ int radix_tree_delete(struct radix_tree_
 		height--;
 	}
-	if (*pathp[0].slot == NULL)
-		return -ENOENT;
+	ret = *pathp[0].slot;
+	if (ret == NULL)
+		goto out;
 	*pathp[0].slot = NULL;
 	while (pathp[0].node && --pathp[0].node->count == 0) {
@@ -387,8 +391,8 @@ int radix_tree_delete(struct radix_tree_
 	if (root->rnode == NULL)
 		root->height = 0; /* Empty tree, we can reset the height */
-
-	return 0;
+out:
+	return ret;
 }
 EXPORT_SYMBOL(radix_tree_delete);
diff -puN mm/filemap.c~radix_tree_delete-api-cleanup mm/filemap.c
diff -puN include/linux/radix-tree.h~radix_tree_delete-api-cleanup include/linux/radix-tree.h
--- 25/include/linux/radix-tree.h~radix_tree_delete-api-cleanup Fri Apr 11 14:30:30 2003
+++ 25-akpm/include/linux/radix-tree.h Fri Apr 11 14:30:30 2003
@@ -43,7 +43,7 @@ do {
 extern int radix_tree_insert(struct radix_tree_root *, unsigned long, void *);
 extern void *radix_tree_lookup(struct radix_tree_root *, unsigned long);
-extern int radix_tree_delete(struct radix_tree_root *, unsigned long);
+extern void *radix_tree_delete(struct radix_tree_root *, unsigned long);
 extern unsigned int
 radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
 	unsigned long first_index, unsigned int max_items);
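Andrew's log_base_64 remark can be turned into a small calculation. The function below is only an illustration of the arithmetic, assuming 64 slots (6 index bits) per node as stated in his mail; levels_for_index() and MAP_SHIFT are names invented here, and the kernel's own height bookkeeping may differ in detail.

```c
#include <assert.h>

#define MAP_SHIFT 6	/* 2^6 = 64 slots per node, as stated in the thread */

/*
 * Number of tree levels needed to address max_index when every level
 * consumes MAP_SHIFT bits of the index. A lookup walks one node per
 * level, so this is the cost Andrew describes.
 */
unsigned int levels_for_index(unsigned long max_index)
{
	unsigned int levels = 1;

	assert(MAP_SHIFT > 0);	/* a zero shift would never terminate */
	while ((max_index >>= MAP_SHIFT) != 0)
		levels++;
	return levels;
}
```

Indices up to 63 fit in one level, up to 4095 in two, and 4096 needs a third, which is why many people treat the lookup as effectively constant time.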
From: Thomas Schlichter To: linux-kernel Subject: Re: [RFC] first try for swap prefetch Date: 2003-04-11 22:10:09 PST On April 1, Andrew Morton wrote: > Thomas Schlichter wrote: > > > you can just do > > > > > > if (radix_tree_delete(...) != -ENOENT) > > > list_del(...) > > > > > > + read_swap_cache_async(entry); > > > > Sorry, but I think I can not. The list_del() needs the value returned by > > radix_tree_lookup(), so I can not kick it... > > OK, I'll change radix_tree_delete() to return the deleted object address if > it was found, else NULL. That's a better API. That's right, I like it better that way, too! Thank you for the patch! > > Do you know how expensive the radix_tree_lookup() is? O(1) or O(log(n))?? > > For my shame I do not really know that data structure... :-( > > It is proportional to > > log_base_64(largest index which the tree has ever stored) > > log_base_64: because each node has 64 slots. Each time maxindex grows by a > factor of 64 we need to introduce a new level. > > "largest index ever": because we do not (and cannot feasibly) reduce the > height when items are removed. Thanks for the detailed answer. > > > It might make sense to poke the speculative swapin code in the > > > page-freeing path too. > > > > I wanted to do this but don't know which function is the correct one for > > this. But I will search harder... or can you give me a hint? > > free_pages_bulk() would probably suit. I don't think so, as this is part of the buddy allocator which controls the usage of the physical memory. Now I've implemented following: 1. Add an entry when a page is removed by the kswapd. 2. Remove the entry when the page is added to the page cache. 3. Remove the entry when the page is removed from the page cache. So with point 3 I cover the freeing of the pages. But as the kswapd calls the function from 3, too, I do the 1st point after kswapd did do point3... To finish my second (and surely better) try I just need one more information... 
How can I get the file pointer for a buffered page with the information available in the kswapd (minly the page struct)?? This is very importand because, as described above, I extract the needed information for my prefetch daemon in the kswapd. My daemon needs the file pointer to be able to load the buffer pages with the page_cache_read() function from the mm/filemap.c file. I'm sorry if I bother you... Best regards Thomas
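The three bookkeeping rules Thomas lists, and the ordering problem he mentions for kswapd, can be modelled as follows. This is a toy user-space model under assumed names (prefetch_tracker and the three functions are hypothetical); his actual patch keeps the entries in a list plus radix tree rather than a flag array.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 16	/* hypothetical number of page ids in the model */

struct prefetch_tracker {
	bool wanted[NPAGES];	/* true: page is a prefetch candidate */
};

/* Rule 2: the page is back in the page cache, nothing left to prefetch. */
void page_added_to_cache(struct prefetch_tracker *t, unsigned int id)
{
	assert(id < NPAGES);
	t->wanted[id] = false;
}

/* Rule 3: any removal from the page cache drops the entry. */
void page_removed_from_cache(struct prefetch_tracker *t, unsigned int id)
{
	assert(id < NPAGES);
	t->wanted[id] = false;
}

/*
 * Rule 1: kswapd evicted the page, so remember it. kswapd itself goes
 * through the rule 3 path, so the entry has to be added afterwards or
 * it would be dropped again immediately.
 */
void kswapd_evicts_page(struct prefetch_tracker *t, unsigned int id)
{
	page_removed_from_cache(t, id);
	t->wanted[id] = true;
}
```

The only subtle part is the order inside kswapd_evicts_page(), which is the point Thomas makes about doing rule 1 after kswapd has already triggered rule 3.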
From: Andrew Morton To: linux-kernel Subject: Re: [RFC] first try for swap prefetch Date: 2003-04-11 22:40:16 PST Thomas Schlichter wrote: > > How can I get the file pointer for a buffered page with the information > available in the kswapd (minly the page struct)?? You can't, really. There can be any number of file*'s pointing at an inode. The pagefault handler will find it by find_vma(faulting_address)->vm_file. Other codepaths use syscalls, and the user passed the file* in. You can call page_cache_readahead() with a NULL file*. That'll mostly work except for the odd filesytem like NFS which will oops. But it's good enough for testing and development. Or you could cook up a local file struct along the lines of fs/nfsd/vfs.c:nfsd_read(), but I would not like to lead a young person that way ;)
From: Thomas Schlichter To: linux-kernel Subject: [RFC] second try for swap prefetch (does Oops!) Date: 2003-04-17 16:02:13 Hi, in the patch attached I improved the memory usage for my data structures by using a kmem_cache. Also I do not use a single radix-tree for the pointers to the list anymore but every mapping gets its own... So I should be able to prefetch not only from the swap space but from other disk-places, too. But exactly this does not work and I need some help with this... If I do add pages from the swaper_space mapping to the prefetch list everything works perfectly. But as soon as I add all pages with a mapping to the list I get the Oops attached... :-( It happens in the radix_tree_delete call from the swap_prefetch work handler. So I think I access an invalid (perhaps not initialized?) radix tree... But why I wonder is that this entry was properly inserted to the tree, because else it would never had been inserted to the list! So I am only very confused..! Thanks for your help! Thomas Schlichter On April 12, Andrew Morton wrote: > Thomas Schlichter wrote: > > How can I get the file pointer for a buffered page with the information > > available in the kswapd (minly the page struct)?? > > You can't, really. There can be any number of file*'s pointing at an > inode. OK, I understand... > The pagefault handler will find it by find_vma(faulting_address)->vm_file. > Other codepaths use syscalls, and the user passed the file* in. > > You can call page_cache_readahead() with a NULL file*. That'll mostly work > except for the odd filesytem like NFS which will oops. But it's good > enough for testing and development. That's the way I try it now... ;-) > Or you could cook up a local file struct along the lines of > fs/nfsd/vfs.c:nfsd_read(), but I would not like to lead a young person > that way ;) Thx... ;-)
[patch here]
From: Thomas Schlichter To: linux-kernel Subject: [RFC] Page prefetch ver. 0.1 for 2.5.68-mm2 Date: 2003-04-28 20:12:16 This is a working version of the idea posted on the LKML a few weeks ago... Currently it only works when loaded as a module. Then a kernel thread 'kprefetchd' is started which prefetches swap and buffer pages when there is free buffer memory. When the module is unloaded the kernel thread is stopped again. For me it works as expected and has no noticeable negative impact. But some benchmarks should be performed to ensure this... Besides from the problem that this works only as a module, yet, there are a few points which could be improved when I've got some spare time... For example I should take care of the disk I/O usage before prefetching. The list of swapped pages is growing too much, too. It can be seen be doing a grep swapped_entry /proc/slabinfo An other point is that it doesn't work on NFS file systems as I set the file pointer to NULL for do_page_chache_readahead(). And I'm open for any other further improvements. I hope someone else likes that patch as much as I do... ;-) Best regards Thomas Schlichter P.S.: For those who want it I've got a patch that applies on vanilla 2.5.68, too...
[patch here]
Useful?
Tough call to say whether this is really useful. Many of the pages allocated when starting, say, mozilla are never used again. These are the likely candidates to be swapped out under VM pressure, and there is no point swapping them back in and wasting good free physical RAM that could be used for something new or file cache. To me this would seem to be the case the majority of the time. However if you're under extreme VM pressure, then the useful pages will also be swapped out, and swapping these back in would seem to be helpful. On balance it seems they would be better left on swap, assuming the VM has made the right choice in the first place. I've been toying with adding this to my autoregulated 2.4-based VM for a while now and don't think it's going to be useful. I guess there's no harm in trying.
I think so
If you swap in pages in LIFO order, then you are leveraging the page replacement algorithm to choose the most likely non resident page to be used. This is a good thing.
You also add the pages to the end of the inactive list, so they should be the first to be evicted if needed.
Only performing prefetching when there is plenty of free memory and the disk is idle means it can be a very minor cost.
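The three points above amount to a small policy: take the most recently swapped-out page first, stop as soon as free memory or disk idleness runs out, and queue whatever was brought in where it will be evicted first. Here is a minimal user-space sketch in Python, not kernel code and not the patch itself; the class name, the min_free reserve and the disk_idle flag are all hypothetical names chosen for illustration.

```python
from collections import deque


class PrefetchModel:
    """Toy model of LIFO swap prefetch; every name here is illustrative."""

    def __init__(self, free_pages, min_free):
        self.free_pages = free_pages  # pages of RAM currently free
        self.min_free = min_free      # reserve that prefetch must not touch
        self.swapped = []             # stack: last page swapped out is on top
        self.inactive = deque()       # left end is evicted first

    def swap_out(self, page):
        self.swapped.append(page)

    def prefetch(self, disk_idle):
        """Swap pages back in, newest first, while resources are spare."""
        fetched = []
        while disk_idle and self.swapped and self.free_pages > self.min_free:
            page = self.swapped.pop()       # LIFO: most recently evicted page
            self.free_pages -= 1
            self.inactive.appendleft(page)  # first in line for eviction
            fetched.append(page)
        return fetched

    def reclaim(self):
        """Under memory pressure, drop a prefetched page without any disk write."""
        if not self.inactive:
            return None
        page = self.inactive.popleft()
        self.free_pages += 1
        self.swapped.append(page)           # its copy on swap is still valid
        return page
```

In this model a prefetched page that turns out to be unneeded costs one read while the disk was idle, and reclaiming it costs nothing beyond the decision itself.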
maybe not quite related...
...but it bothers me to see Linux, while being idle, and while having memory that's positively free, keeping those pages in swap. It happened to me many times. It clearly hurts the performance on desktops.
If there is free memory, swap pages should be sucked in.
Need ESP.
What you have, though, is the unsolved problem of predicting which pages to swap back in. If the number of swapped pages is greater than the number of available pages in RAM, pulling the wrong pages from disk is worse than leaving them there.
On the plus side, you don't need to reswap these pages. If memory pressure does arise, you can simply drop them. You do incur the CPU cost of deciding (again) whether to drop the pages and which ones to drop.
Personally, I'd prefer to see the background cycles spent on disk defragmentation and swap defragmentation. That way, when I do decide I need something, it comes in off the disk quickly. Ideally, any sort of "proactive swapin" should live as a userspace daemon that balances this cost against other factors such as system load and disk defrag. It can even build a model of application use (say, by time of day or recent application activity) in order to 'guess' what to bring in.
--Joe
Heuristics don't need ESP!
By reading in the last page swapped out, you make use of the VM's page replacement algorithm (which would also require ESP if it were perfect). Pulling the wrong pages from disk when the disk is idle and there is plenty of free memory shouldn't be much worse than leaving them there.
One issue I see is that discarding disk buffers (including file-backed executables) under memory pressure could cause the exact same stalling problem as swapping out anonymous memory.
The added bonus of your swap defragmentation idea is that you would also be prereading the swap (in a way).
How about no swap at all
Why not just do away with swap altogether? Memory is cheap now. Someone could have 256MB or more of RAM and no swap and their system would be fast since it's not trying to send memory to and from disk.
Re: How about no swap at all
Buying more RAM already works like that: the system won't bother swapping if it never runs out of physical RAM. To counter, though, applications often use more RAM than one might think. X is using 27MB of RAM on my box but has 109MB in swap; the idea is that while X may need some huge amount of RAM in general, it doesn't need all of that in physical RAM at once. Also, more than just programs use memory: reading through the posts above, you'll see mention of disk caching being in RAM as well. So I have 512MB and it rarely feels like I'm running out.
Re: How about no swap at all
While this is true for most operating systems, it is most certainly not true for Linux. Linux is extremely (and I think unnecessarily) aggressive about swapping pages out.
Here is some top output after a day of surfing the web and browsing usenet:
Mem: 515200K av, 483888K used, 31312K free, 0K shrd, 23100K buff
Swap: 530104K av, 97352K used, 432752K free 243880K cached
Why is 100MB of memory swapped out to disk, and 250MB used for caching? It just doesn't make sense to me. When I switch virtual desktops after browsing usenet for a while, mozilla will take several seconds to read its current state out of swap space. Why did Linux swap that memory out in the first place, when 2/3 of my memory is being used for caching and buffering?
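The figures in the top output above can be worked through directly. This is a small Python calculation, assuming that top's K means KiB and that "cached" plus "buff" is the memory being counted as caching and buffering; the variable names are made up for illustration.

```python
# Figures copied from the top output above, treated as KiB (an assumption).
mem_total = 515200
buffers = 23100
cached = 243880
swap_used = 97352

cache_share = (buffers + cached) / mem_total  # fraction of RAM in buffers + cache
swap_used_mib = swap_used / 1024              # swap in use, in MiB

print(f"buffers + cache: {cache_share:.1%} of RAM")
print(f"swap in use: {swap_used_mib:.0f} MiB")
```

On these numbers, buffers and cache together come to a little over half of RAM, with about 95 MiB sitting in swap.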
That's not actually true
Data which is mapped in from files on disk can be dropped and may need to be re-read if used. Executables (programs and shared libraries) are treated in this way. Providing swap will (given a reasonable VM implementation) allow unused data to stay on disk and executables which are in use to remain in memory and therefore allow the system to run faster.
Linux 2.5 has a VM "swappiness" variable to control the balance between using swap and throwing away executable data. If you set swappiness to zero, then the kernel is very unlikely to use swap, but will still have it available if you run out of memory. This is unlikely to be the fastest (highest throughput) configuration, but may reduce the time taken for your desktop to start working after a big compile job or whatever and therefore seem to be faster.
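On kernels that expose this setting, it is available as the file /proc/sys/vm/swappiness (also reachable as the vm.swappiness sysctl). Here is a minimal Python sketch for reading and setting it, assuming that path. The helper names are hypothetical, writing to the real file needs root, and the 0 to 100 range check reflects the classic range; newer kernels accept a wider one, so treat that check as an assumption.

```python
from pathlib import Path

SWAPPINESS = Path("/proc/sys/vm/swappiness")  # assumed location of the knob


def read_swappiness(path=SWAPPINESS):
    """Return the current swappiness value as an integer."""
    return int(Path(path).read_text().strip())


def write_swappiness(value, path=SWAPPINESS):
    """Set swappiness; on a real system this requires root."""
    if not 0 <= value <= 100:
        raise ValueError("expected a value between 0 and 100")
    Path(path).write_text(f"{value}\n")
```

Calling write_swappiness(0) corresponds to the "very unlikely to use swap" setting described above; the change does not survive a reboot unless it is also put in the system's sysctl configuration.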
Why are we discussing this AGAIN??
Why are we discussing this again? Aren't we supposed to be discussing what to do with "swapped pages"? I thought it was obvious that this point doesn't need to be discussed!
Anyway, one major point that a few mails on the lkml made is that when you have so-called "free" memory and swapped-out pages on disk, you have not one but two candidates for that memory:
1. The swapped out pages
2. The pages which are "currently" in memory at that position. This could be cached ls output, directory structures, whatever.
How does the system decide whether you are going to open your long-inactive Netscape Navigator, or perform an 'ls -R ' again? That's why many have advised that, although this is a good thought, it doesn't have a very good overall effect on the system.
It's a reasonable idea to consider.
The issue is hysteresis. After huge file buffer pressure, the balance is obviously tilted too far towards file cache afterwards. This is an attempt to rebalance this split during idle periods.
One heuristic I'd think would be useful is to have a target % of RAM for buffers that I'd 'protect' from idle swapin. Anything over that percentage is a candidate for purging via idle swapin. Thus, you can balance out between '1' and '2' in your list.
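The protected-percentage idea can be put as a few lines of arithmetic. This is a rough Python sketch, not anything from the patch; the function name and its parameters are hypothetical, and all quantities are counted in pages.

```python
def swapin_budget(total_pages, cache_pages, swapped_pages, protect_pct):
    """Pages of file cache that idle swap-in may reclaim (hypothetical helper).

    Cache up to protect_pct percent of RAM is left alone; only the excess
    is offered to swapped-out pages, and never more than are on swap.
    """
    protected = total_pages * protect_pct // 100
    excess = max(0, cache_pages - protected)
    return min(excess, swapped_pages)
```

With 1000 pages of RAM, 500 of them file cache, 200 pages on swap and a 30% protected share, the budget is 200 pages; once the cache is at or below 300 pages, idle swapin would stop.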