Hello all,
I am using Debian Sid on a Toshiba Satellite A100 laptop. After testing
2.6.35 for a while, I noticed that sometimes my hibernation attempts
would fail. I should say that I never had such a problem before 2.6.35.
The hibernation process hangs with 2.6.35 after printing the following:
=== 8< ===
...
Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
PM: Preallocating image memory...
=== >8 ===
After a short investigation, I found out that this only happens when my
tmpfs filesystem on /tmp had a lot of data in it. When my tmpfs is empty,
I have no problems.
So I wrote a short script which fills up the tmpfs on /tmp and tries to
hibernate, and I bisected the kernel using this script.
The end result is that the following commit causes this regression:
=== 8< ===
commit bb21c7ce18eff8e6e7877ca1d06c6db719376e3c
Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri Jun 4 14:15:05 2010 -0700
vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim failure
...
=== >8 ===
I have run 2.6.35-rc6, 2.6.35 and 2.6.35.1 with this commit reverted,
and I am happy to say that I haven't experienced any problems for at
least 17 days.
It looks like this change was included with 2.6.35-rc1. I am sorry
for not testing earlier.
I am willing to do testing in case anyone would like me to try patches.
Regards,
M. Vefa Bicakci
--
Wow. I'm very surprised by this report, because 1) the above commit changed the do_try_to_free_pages() return value, 2) but the current hibernation code ignores this return value. Hmm... I have to investigate this very interesting issue. Thanks for this report, and please give me a bit of time. --
Hmm... I've been testing the hibernation case for a while, but I've had no luck: I couldn't reproduce your issue. Very sorry. Can you please help us debug it? If possible, I'd like you to run the following three tests.
1. Please let me know your machine details and test script:
 % cat /proc/meminfo
 % cat /proc/vmstat
 % cat /proc/zoneinfo
 % df
 % cat your-fills-up-the-tmpfs-script
2. Call shrink_all_memory() forcibly and show the result:
 % cat /proc/meminfo
 % cat /proc/zoneinfo
 # echo 1 > /proc/sys/vm/shrink_all_memory
 # tail /var/log/messages
 % cat /proc/meminfo
 % cat /proc/zoneinfo
3. Reset zone_reclaim_stat and rerun shrink_all_memory:
 # echo 1 > /proc/sys/vm/reset_reclaim_stat
 % cat /proc/meminfo
 % cat /proc/zoneinfo
 # echo 1 > /proc/sys/vm/shrink_all_memory
 # tail /var/log/messages
 % cat /proc/meminfo
 % cat /proc/zoneinfo
First of all, thanks a lot for spending time on this regression I have been experiencing. I really appreciate it.
Sorry to hear that you weren't able to reproduce the issue. Well the good (or bad?) news is that I am able to reproduce it with 2.6.35.3 with your patches applied.
I should note that after applying your patches and trying a hibernation with a full tmpfs, a printk prints extra information on the screen just before the hibernation process hangs. The last time I ran it, it printed:
=== 8< ===
shrink_all_memory: req: 342067 reclaimed: 27062 free: 340221
=== >8 ===
A piece of information that may be relevant or irrelevant is that my swap space is on a dm-crypt volume.
Appended are the results of the tests you asked me to carry out. If you'd like, I can send in private a tarball containing this information in separate files.
Once again, thanks a lot for helping out.
Please note that I filled up the tmpfs filesystem between step 1
MemTotal: 3104484 kB
MemFree: 2817616 kB
Buffers: 31156 kB
Cached: 142124 kB
SwapCached: 0 kB
Active: 116464 kB
Inactive: 137424 kB
Active(anon): 80852 kB
Inactive(anon): 24820 kB
Active(file): 35612 kB
Inactive(file): 112604 kB
Unevictable: 32 kB
Mlocked: 32 kB
HighTotal: 2226632 kB
HighFree: 1994008 kB
LowTotal: 877852 kB
LowFree: 823608 kB
SwapTotal: 1999540 kB
SwapFree: 1999540 kB
Dirty: 116 kB
Writeback: 0 kB
AnonPages: 80636 kB
Mapped: 43768 kB
Shmem: 25068 kB
Slab: 15120 kB
SReclaimable: 7516 kB
SUnreclaim: 7604 kB
KernelStack: 1856 kB
PageTables: 2420 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3551780 kB
Committed_AS: 337784 kB
VmallocTotal: 122880 kB
VmallocUsed: 16308 ...
!! Your swap partition is smaller than physical memory. As far as I know, the swap partition needs to be twice the size of physical memory. Can you please try changing your swap configuration? Rafael, please correct me if I'm wrong. Thanks. --
[Please see my reply in the other thread.] --
No, we only save 50% of RAM (at most) during hibernation. Thanks, Rafael --
Hello,
No problem. I did the tests again according to your new instructions, and I am appending the results.
Regards,
MemTotal: 3104484 kB
MemFree: 1258780 kB
Buffers: 30820 kB
Cached: 1693836 kB
SwapCached: 0 kB
Active: 1670344 kB
Inactive: 137960 kB
Active(anon): 1633940 kB
Inactive(anon): 26224 kB
Active(file): 36404 kB
Inactive(file): 111736 kB
Unevictable: 32 kB
Mlocked: 32 kB
HighTotal: 2226632 kB
HighFree: 437684 kB
LowTotal: 877852 kB
LowFree: 821096 kB
SwapTotal: 1999540 kB
SwapFree: 1999540 kB
Dirty: 28 kB
Writeback: 0 kB
AnonPages: 83676 kB
Mapped: 44220 kB
Shmem: 1576520 kB
Slab: 17136 kB
SReclaimable: 9480 kB
SUnreclaim: 7656 kB
KernelStack: 1832 kB
PageTables: 2440 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3551780 kB
Committed_AS: 1892636 kB
VmallocTotal: 122880 kB
VmallocUsed: 16308 kB
VmallocChunk: 92764 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 4096 kB
DirectMap4k: 24568 kB

nr_free_pages 314850
nr_inactive_anon 6480
nr_active_anon 408485
nr_inactive_file 27935
nr_active_file 9101
nr_unevictable 8
nr_mlock 8
nr_anon_pages 20919
nr_mapped 11055
nr_file_pages 431089
nr_dirty 8
nr_writeback 0
nr_slab_reclaimable 2370
nr_slab_unreclaimable 1914
nr_page_table_pages 610
nr_kernel_stack 229
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 394054
pgpgin 150719
pgpgout 5312
pswpin 0
pswpout 0
pgalloc_dma 2
pgalloc_normal 116354
pgalloc_high 2637682
pgalloc_movable 0
pgfree 3069531
pgactivate 2339016
pgdeactivate 160
pgfault 699155
pgmajfault ...
tmpfs files are using about 1.5 GB of memory. My code doesn't cause the hang! Hmm... Hmm... To be honest, I have no idea why your hang happens.
1) The normal zone is not used; your system doesn't need any additional reclaim at all.
2) The reclaim logic doesn't seem to cause the hang.
Can you please try the following additional test?
# echo 8 > /proc/sysrq-trigger
# echo disk > /sys/power/state
--
Can you please try to avoid using /tmp? As I said:
mount -t tmpfs none /mnt/another_tmpfs
dd if=/dev/zero of=/mnt/another_tmpfs/tmp bs=1024k count=1600
shred -vn1 /mnt/another_tmpfs/tmp
That said, we need to know whether your issue is a lots-of-anon-pages issue or a filesystem-full issue. --
(Sorry, I am sending this again as I forgot to include LKML in the CC.)
Hello,
I have changed my configuration so that I have an 8-gigabyte swap partition, and I have also made the changes you suggested to my hibernation/fill-tmpfs script so that a tmpfs other than /tmp is used.
Unfortunately, nothing changed. I still get hangs after a few lines are printed to the console. The last two lines are from your patch. Here is an observation from an actual test which ended with a hang:
=== 8< ===
PM: Preallocating image memory
shrink_all_memory: start
shrink_all_memory: req: 375019 reclaimed: 48055 free: 326810
=== >8 ===
One thing I should note is that, before your commit, I never had any problems even though my swap size was not two times the size of my physical memory and even though I used tmpfs as /tmp.
If there is anything I can do to debug this problem please let me know.
Regards,
M. Vefa Bicakci --
Hello!
First of all, thanks a lot for your help - I really appreciate it.
I applied your new patches on top of your old patches. Hopefully that was okay. Unfortunately, it didn't work this time. Here's a sample output from the new patch.
=== 8< ===
[58.050208] PM: Preallocating image memory...
[58.159881] shrink_all_memory start
[58.232411] PM: shrink memory: pass=1, req:312373 reclaimed:15864 free:358420
[58.342041] PM: shrink memory: pass=2, req:296509 reclaimed:21837 free:362167
[60.690035] PM: shrink memory: pass=3, req:274672 reclaimed:25982 free:348006
[61.754931] PM: shrink memory: pass=4, req:248690 reclaimed:49623 free:371589
[64.361714] PM: shrink memory: pass=5, req:199067 reclaimed:74683 free:396695
[64.361769] shrink_all_memory: req:124384 reclaimed:74683 free:396695
=== >8 ===
The interesting thing is that even though there is a lot of free memory at the end, it still hangs. I also included the timestamps; note the one and two second delays between the passes.
Please let me know if there is anything I can do.
Regards,
M. Vefa Bicakci --
Grr. I'm surprised by this result ;-) shrink_all_memory() finishes shrinking memory successfully, but your system still hangs immediately after. I have no idea why this mysterious thing occurs.
I prepared the next debugging patch. It adds plenty of debug printks. I hope it sheds light on which path makes the system hang.
1. Apply my new patch.
2. Enable the following PM debug options in Kconfig:
 [*] Power Management support
 [*] Power Management Debug Support
 [*] Extra PM attributes in sysfs for low-level debugging/testing
 [*] Verbose Power Management debugging
3. Append the following kernel boot option to the grub configuration file:
 no_console_suspend=1
4. Build the kernel and reboot.
5. Some preparation:
 # echo 8 > /proc/sysrq-trigger
 # cd /sys/power
 # echo 1 > pm_trace
 # echo 0 > pm_async
This is the expected result, because shrinking tmpfs needs swap-out. Then please send me your .config and full dmesg.
Many thanks for helping us!
Hello,
I have followed your instructions, with one exception: I have also enabled CONFIG_PM_TRACE so that I would have /sys/power/pm_trace.
This time I had some more output, as expected. I double checked what I typed while looking at the screen-shot I took with my camera. Here's the output:
=== 8< ===
PM: Marking nosave pages: ...0009f000 - ...000100000
PM: basic memory bitmaps created
PM: Syncing filesystems ... done
Freezing user space processes ... (elapsed 0.01 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
PM: Preallocating image memory...
shrink_all_memory start
PM: shrink memory: pass=1, req:310171 reclaimed:15492 free:360936
PM: shrink memory: pass=2, req:294679 reclaimed:28864 free:373981
PM: shrink memory: pass=3, req:265815 reclaimed:60311 free:405374
PM: shrink memory: pass=4, req:205504 reclaimed:97870 free:443024
PM: shrink memory: pass=5, req:107634 reclaimed:146948 free:492141
shrink_all_memory: req:107634 reclaimed:146948 free:492141
PM: preallocate_image_highmem 556658 278329
PM: preallocate_image_memory 103139 103139
PM: preallocate_highmem_fraction 183908 556658 760831 -> 183908
=== >8 ===
According to your patch, the next output should have been "preallocate_image_memory ...", but it never gets printed, so the hang point should be that function.
I am attaching my dmesg output which I got after the failed hibernation attempt and my .config file. Please note that the attached .config file is a trimmed version of the .config I usually use on my computer. I trimmed it so that it compiles faster, but (mostly) has support for devices I might use.
Thanks a lot for your help, and please let me know if I can do anything else.
Regards,
M. Vefa Bicakci
Great! I've attached more verbose debug message patch and trial bug fixing patch.
Oops, please apply the attached patch instead of 0002-add-gfp_noretry.patch. Thanks.
Hello! I have applied the patches you mentioned, and rebuilt and tested the 2.6.35.4 kernel. I am really happy to say that your patches (cumulatively) fixed the issue! Unfortunately, because the hibernation is rather quick, I am having a hard time getting screen-shots with my camera. If you would like, I can try to put some sleeps around the code so that I can get the output for you. For the record, the attached patch is the cumulative version of all of your patches. It applies cleanly to 2.6.35.4, and most importantly, it fixes the issue. All in all, thanks a lot! Is there anything else I can do? Would you like me to try a trimmed version of your patch, maybe without the debugging parts and the 5-pass swap-out procedure, which I am not sure is essential or not? Thanks again, M. Vefa Bicakci
Rafael, this log means hibernate_preallocate_memory() has a bug. It allocates memory in the following order:
1. preallocate_image_highmem() (i.e. __GFP_HIGHMEM)
2. preallocate_image_memory() (i.e. GFP_KERNEL)
3. preallocate_highmem_fraction() (i.e. __GFP_HIGHMEM)
4. preallocate_image_memory() (i.e. GFP_KERNEL)
But please imagine the following scenario (as in Vefa's case):
- The system has 3 GB of memory: 1 GB is normal, 2 GB is highmem.
- All normal memory is free.
- 1.5 GB of highmem is used by tmpfs; the remaining 500 MB is free.
At that point, hibernate_preallocate_memory() works as follows:
1. call preallocate_image_highmem(1GB)
2. call preallocate_image_memory(500M); total 1.5 GB allocated
3. call preallocate_highmem_fraction(660M); total 2.2 GB allocated
Then all of the normal zone memory is exhausted. The next preallocate_image_memory() call causes OOM, and oom_killer_disabled turns that into an infinite loop. (Not handling oom_killer_disabled carefully is a vmscan bug; I'll fix it soon.)
The problem is that alloc_pages(__GFP_HIGHMEM) -> alloc_pages(GFP_KERNEL) is the wrong order. alloc_pages(__GFP_HIGHMEM) may allocate pages from a lower zone, so the next alloc_pages(GFP_KERNEL) leads to OOM. Please consider the alloc_pages(GFP_KERNEL) -> alloc_pages(__GFP_HIGHMEM) order instead.
Even though the vmscan fix can avoid the infinite loop, the OOM situation might cause a big slowdown on highmem machines. That doesn't seem good.
Thanks. --
So, it looks like the problem will go away if we check if there are any normal
pages to allocate from before calling the last preallocate_image_memory()?
The problem with changing the ordering is that it wouldn't be clear how many
pages to request from the normal zone in steps 1 and 3.
Thanks,
Rafael
---
kernel/power/snapshot.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1259,7 +1259,7 @@ int hibernate_preallocate_memory(void)
{
struct zone *zone;
unsigned long saveable, size, max_size, count, highmem, pages = 0;
- unsigned long alloc, save_highmem, pages_highmem;
+ unsigned long alloc, save_highmem, pages_highmem, size_normal;
struct timeval start, stop;
int error;
@@ -1296,6 +1296,7 @@ int hibernate_preallocate_memory(void)
else
count += zone_page_state(zone, NR_FREE_PAGES);
}
+ size_normal = count;
count += highmem;
count -= totalreserve_pages;
@@ -1344,7 +1345,13 @@ int hibernate_preallocate_memory(void)
size = preallocate_highmem_fraction(size, highmem, count);
pages_highmem += size;
alloc -= size;
- pages += preallocate_image_memory(alloc);
+ /* Check if there are any non-highmem pages to allocate from. */
+ if (alloc_normal < size_normal) {
+ size_normal -= alloc_normal;
+ if (alloc > size_normal)
+ alloc = size_normal;
+ pages += preallocate_image_memory(alloc);
+ }
pages += pages_highmem;
/*
--
Looks fine, but I have one question. hibernate_preallocate_memory() calls preallocate_image_memory() twice. Why do you only care about the latter one? OK, I see. Thanks for the good correction of my mistake. --
The first one is mandatory, ie. if we can't allocate the requested number of pages at this point, we fail the entire hibernation. In that case the performance hit doesn't matter. Thanks, Rafael --
IOW, your patch at http://lkml.org/lkml/2010/9/2/262 is still necessary to protect against the infinite loop in that case. Thanks, Rafael --
As far as I understand, we need to distinguish two kinds of allocation failure:
1) failure because there is not enough memory -> yes, hibernation should fail
2) failure because we have already allocated enough lower zone memory -> why should we fail?
If the system has a lot of memory, scenario (2) happens more frequently than (1). I think we need to check the alloc_highmem and alloc_normal variables and call preallocate_image_highmem() again instead of preallocate_image_memory() if we've already allocated a lot of normal memory. Nit? --
Actually I thought about that, but we don't really see hibernation fail for this reason. In all of the tests I carried out the requested 50% of highmem had been allocated before allocations from the normal zone started to be made, even if highmem was 100% full at that point. So this appears to be a theoretical issue and covering it would require us to change the algorithm entirely (eg. it doesn't make sense to call preallocate_highmem_fraction() down the road if that happens). Thanks, Rafael --
OK, thanks. I think I've caught your point. Please feel free to use my Reviewed-by for your fix. Thanks. --
Thanks.
In the meantime, though, I prepared a patch that should address the issue
entirely. The patch is appended and if it looks good to you, I'd rather use it
instead of the previous one (it is still untested).
Rafael
---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Hibernate: Avoid hitting OOM during preallocation of memory
There is a problem in hibernate_preallocate_memory() that it calls
preallocate_image_memory() with an argument that may be greater than
the number of available non-highmem memory pages. This may trigger
the OOM condition which in turn can cause significant slowdown to
occur.
To avoid that, modify preallocate_image_memory() so that it checks
if there is a sufficient number of non-highmem pages to allocate from
before calling preallocate_image_pages() and change
hibernate_preallocate_memory() to try to allocate from highmem if
the number of pages allocated by preallocate_image_memory() is too
low.
Adjust free_unnecessary_pages() to take all possible memory
allocation patterns into account.
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
kernel/power/snapshot.c | 66 +++++++++++++++++++++++++++++++++---------------
1 file changed, 46 insertions(+), 20 deletions(-)
Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1122,9 +1122,19 @@ static unsigned long preallocate_image_p
return nr_alloc;
}
-static unsigned long preallocate_image_memory(unsigned long nr_pages)
+static unsigned long preallocate_image_memory(unsigned long nr_pages,
+ unsigned long size_normal)
{
- return preallocate_image_pages(nr_pages, GFP_IMAGE);
+ unsigned long alloc;
+
+ if (size_normal <= alloc_normal)
+ return 0;
+
+ alloc = size_normal - alloc_normal;
+ if (nr_pages < alloc)
+ alloc = nr_pages;
+
+ return ...
Yeah, this one looks nicer to me :) Thanks, rafael! --
Dear Rafael Wysocki, Kosaki Motohiro and Minchan Kim, Upon Kosaki Motohiro's kind request via an off-list e-mail, I tested the following two patches separately with a vanilla 2.6.35.4 kernel: Patch 1: http://lkml.org/lkml/2010/9/5/86 Patch 2: http://kerneltrap.org/mailarchive/linux-kernel/2010/9/4/4615426 The first of these was prepared by Minchan Kim, and it fixes the issue; i.e. no hangs during hibernation with a full tmpfs. However, the second patch, prepared by Rafael Wysocki, does *not* fix the problem. I still experience hangs with a full tmpfs upon hibernation. As always, I am willing to test newer patches and help in debugging this issue. I really appreciate all of your help, M. Vefa Bicakci --
What happens if you apply them both at the same time? Thanks, Rafael --
Hello,
When I apply both of the patches, then I don't get any hangs with hibernation. However, I do get another problem, which I am not sure is related or not. I should note that I haven't experienced this with only the vmscan.c patch, but maybe I haven't repeated my test enough times.
One test consists of an automated run of 7 hibernate/thaw cycles. Here's what I got in dmesg in two of the iterations in one test. Sorry for the long e-mail and the long lines.
=== 8< ===
[ 166.512085] PM: Hibernation mode set to 'reboot'
[ 166.516503] PM: Marking nosave pages: 000000000009f000 - 0000000000100000
[ 166.517654] PM: Basic memory bitmaps created
[ 166.518781] PM: Syncing filesystems ... done.
[ 166.546308] Freezing user space processes ... (elapsed 0.01 seconds) done.
[ 166.559596] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[ 166.571649] PM: Preallocating image memory...
[ 185.712457] iwl3945: page allocation failure. order:0, mode:0xd0
[ 185.714564] Pid: 1225, comm: iwl3945 Not tainted 2.6.35.4-test-mm5v2-vmscan+snapshot-dirty #7
[ 185.715741] Call Trace:
[ 185.716853] [<c019aa67>] ? __alloc_pages_nodemask+0x577/0x630
[ 185.718126] [<f8a562c5>] ? iwl3945_rx_allocate+0x75/0x240 [iwl3945]
[ 185.719379] [<c03f0516>] ? schedule+0x356/0x730
[ 185.720556] [<f8a56d50>] ? iwl3945_rx_replenish+0x20/0x50 [iwl3945]
[ 185.721914] [<f8a56dbc>] ? iwl3945_bg_rx_replenish+0x3c/0x50 [iwl3945]
[ 185.723929] [<c014b167>] ? worker_thread+0x117/0x1f0
[ 185.725745] [<f8a56d80>] ? iwl3945_bg_rx_replenish+0x0/0x50 [iwl3945]
[ 185.727097] [<c014ebd0>] ? autoremove_wake_function+0x0/0x40
[ 185.728468] [<c014b050>] ? worker_thread+0x0/0x1f0
[ 185.730235] [<c014e854>] ? kthread+0x74/0x80
[ 185.731601] [<c014e7e0>] ? kthread+0x0/0x80
[ 185.732919] [<c0103cb6>] ? kernel_thread_helper+0x6/0x10
[ 185.734851] Mem-Info:
[ 185.736144] DMA per-cpu:
[ 185.737439] CPU 0: hi: 0, btch: 1 usd: 0
[ 185.738635] CPU 1: hi: 0, btch: ...
Hm, interesting. Rafael's patch seems to work as intended: it preallocates a great deal of memory and then releases the over-allocated memory. But on your system, iwl3945 allocates memory concurrently. If it tries to allocate before the hibernation code releases the extra memory, it may get an allocation failure. So I'm not sure which behavior is desired:
1) preallocate plenty of memory
 pros) hibernation is faster
 cons) risk of network card memory allocation failures
2) preallocate a small amount of memory
 pros) no network card memory allocation failures
 cons) hibernation is slower
But I wonder why this kernel thread is not frozen. AFAIK, hibernation doesn't need network capability. Is this really intentional? Rafael, could you please explain the design of hibernation and your intention?
Vefa, note: this allocation failure doesn't cause any problem. It means the network card can't receive one network packet. But during hibernation we can't receive network packets anyway, so it's no problem. --
It's a kernel thread, we don't freeze them by default, only the ones that directly request to be frozen.
BTW, please note that the card probably allocates from normal zone and that
The design of the preallocator is pretty straightforward. First, if there's already enough free memory to make a copy of all memory in use, we simply allocate as much memory as needed for that copy and return (the size >= saveable condition).
Next, we preallocate as much memory as to accommodate the largest possible image. A little more than 50% of RAM is preallocated in this step (this causes some pages that were in use before to be freed, so the resulting image size is a little below 50% of RAM).
Next, there is the sysfs file /sys/power/image_size that represents the user's desired size of the image. If this number is much less than 50% of RAM, we do our best to force the mm subsystem to free more pages so that the resulting image size is possibly close to the desired one.
So, I guess, if Vefa writes a greater number into /sys/power/image_size (this is in bytes), the problems should go away. :-)
Still, I see a way to improve things in my patch. Namely, I guess the number returned by minimum_image_size() may also be regarded as the number of non-highmem pages we can't free with good approximation. Thus the second argument of preallocate_image_memory() should be size_normal - "the number returned by minimum_image_size()".
[BTW, there seems to be a bug in minimum_image_size(), because if saveable < size, this means that the minimum image size is equal to saveable rather than 0. This shouldn't happen, though.]
Vefa, can you please test the patch below with and without the patch at http://lkml.org/lkml/2010/9/5/86 (please don't try to change /sys/power/image_size yet)?
Thanks,
Rafael
---
 kernel/power/snapshot.c | 75 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 55 insertions(+), 20 deletions(-)
Index: ...
Dear Rafael Wysocki,
I applied the patch below to a clean 2.6.35.4 tree and tested 6 hibernate/thaw cycles consecutively. I am happy to report that it works properly.
Then I applied the patch at http://lkml.org/lkml/2010/9/5/86 (the "vmscan.c patch") on top of the tree I used above, and I also ran 6 hibernate/thaw cycles. Again, I am happy to report that this combination of patches also works properly.
I should note a few things though:
1) I don't think I ever changed /sys/power/image_size, so we can rule out the possibility of that option changing the results.
2) With the patch below, for the *first* hibernation operation, the computer enters a "thoughtful" state without any disk activity for 6-8 (maybe 10) seconds after printing "Preallocating image memory". It works properly after the wait however.
3) For some reason, with the patch below by itself, or in combination with the above-mentioned vmscan.c patch, I haven't seen any page allocation errors regarding the iwl3945 driver. To be honest I am not sure why this change occurred, but I think you might know.
4) I made sure that I was not being impatient with the previous snapshot.c patch, so I tested that on its own once again, and I confirmed that hibernation hangs with the older version of the snapshot.c patch.
I am very happy that we are getting closer to a solution. Please let me know if there is anything I need to test further.
Regards, --
That probably is a result of spending time in the memory allocator trying to
I think we just keep enough free pages in the normal zone all the time for the
Below is the patch I'd like to apply. It should work just like the previous one (there are a few fixes that shouldn't affect the functionality in it), but please test it if you can.
I think the slowdown you saw in 2) may be eliminated by increasing the image_size value, so I'm going to prepare a patch that will compute the value automatically during boot so that it's approximately 50% of RAM.
Thanks,
Rafael
---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Hibernate: Avoid hitting OOM during preallocation of memory
There is a problem in hibernate_preallocate_memory() that it calls
preallocate_image_memory() with an argument that may be greater than
the total number of available non-highmem memory pages. If that's
the case, the OOM condition is guaranteed to trigger, which in turn
can cause significant slowdown to occur during hibernation.
To avoid that, make preallocate_image_memory() adjust its argument
before calling preallocate_image_pages(), so that the total number of
saveable non-highmem pages left is not less than the minimum size of
a hibernation image. Change hibernate_preallocate_memory() to try to
allocate from highmem if the number of pages allocated by
preallocate_image_memory() is too low. Modify free_unnecessary_pages()
to take all possible memory allocation patterns into account.
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/power/snapshot.c | 85 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 65 insertions(+), 20 deletions(-)
Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1122,9 +1122,19 @@ static unsigned long preallocate_image_p
...
Hello,
Sorry for the late reply. I have been busy the past few days.
It contains 524288000, so I think it is set to 500 MB. I believe that this is
I am not sure if this is a new thing with the new patch, but the behavior seems to continue with the later hibernation operations too, not just the first one. I haven't confirmed if I really didn't realize the problem in the previous version of the patch, but it is very possible that I didn't realize it since I used to automate my tests. (I didn't automate my tests this time.) However, considering that the kernel needs to worry about compacting 1500 MB of data when hibernating with my tmpfs-is-full system, I guess these wait
I am happy to report that it works properly by only itself when applied to a clean 2.6.35.4 tree. I haven't had any problems (aside from the "thoughtful
I would be glad to test that patch as well, to see if it brings speed-ups. Actually, I might test hibernation with a larger value written to
I really appreciate your help. Thanks a lot! --
I think that would improve things, as it probably is impossible to reduce the image size to 500 MB on your system. Anyway, I'll let you know when the patch is ready. Thanks, Rafael --
OK, please try the patch below on top of the previous one and see if it makes
hibernation run faster on your system.
Thanks,
Rafael
---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Hibernate: Make default image size depend on total RAM size
The default hibernation image size is currently hard coded and equal
to 500 MB, which is not a reasonable default on many contemporary
systems. Make it equal to 2/5 of the total RAM size (this is slightly
below the maximum, i.e. 1/2 of the total RAM size, and seems to be
generally suitable).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
Documentation/power/interface.txt | 2 +-
kernel/power/main.c | 1 +
kernel/power/power.h | 9 ++++++++-
kernel/power/snapshot.c | 7 ++++++-
4 files changed, 16 insertions(+), 3 deletions(-)
Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -46,7 +46,12 @@ static void swsusp_unset_page_forbidden(
* size will not exceed N bytes, but if that is impossible, it will
* try to create the smallest image possible.
*/
-unsigned long image_size = 500 * 1024 * 1024;
+unsigned long image_size;
+
+void __init hibernate_image_size_init(void)
+{
+ image_size = ((totalram_pages * 2) / 5) * PAGE_SIZE;
+}
/* List of PBEs needed for restoring the pages that were allocated before
* the suspend and included in the suspend image, but have also been
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -14,6 +14,9 @@ struct swsusp_info {
} __attribute__((aligned(PAGE_SIZE)));
#ifdef CONFIG_HIBERNATION
+/* kernel/power/snapshot.c */
+extern void __init hibernate_image_size_init(void);
+
#ifdef CONFIG_ARCH_HIBERNATION_HEADER
/* Maximum size of architecture ...
Dear Rafael Wysocki,
I think I have good news. I took a clean 2.6.35.4 tree, and first applied the latest version of your larger snapshot.c patch, and then the patch you appended to your final e-mail in this thread. Here is a comparison of the timings from a kernel without your patch, and one with it.
=== 8< ===
Sep 11 10:22:24 debian kernel: [ 499.968989] PM: Allocated 2531300 kbytes in 52.66 seconds (48.06 MB/s)
Sep 11 10:44:08 debian kernel: [ 764.379131] PM: Allocated 2531308 kbytes in 143.41 seconds (17.65 MB/s)
Sep 11 10:48:41 debian kernel: [ 920.626386] PM: Allocated 2531300 kbytes in 66.44 seconds (38.09 MB/s)
Sep 11 10:53:37 debian kernel: [ 1092.919140] PM: Allocated 2531316 kbytes in 81.28 seconds (31.14 MB/s)
...
Sep 13 01:26:09 debian kernel: [ 94.948054] PM: Allocated 1804008 kbytes in 28.72 seconds (62.81 MB/s)
Sep 13 01:29:58 debian kernel: [ 176.678880] PM: Allocated 1803992 kbytes in 34.44 seconds (52.38 MB/s)
Sep 13 01:33:48 debian kernel: [ 253.336405] PM: Allocated 1804000 kbytes in 27.35 seconds (65.95 MB/s)
=== >8 ===
I didn't have your latest patch applied on September 11, and it was applied last night. It looks like there is a good improvement. I think the data rates look faster on Sept. 13 because the kernel spent less time "thinking" while compacting the memory image. (I don't think I have changed anything in my configuration that could affect the data rates that much.)
Is it possible to have these patches applied to the 2.6.35 tree so that the regression I reported is fixed? Should I e-mail Greg Kroah-Hartman about this?
Once again, thanks a lot to you, Kosaki Motohiro and Minchan Kim! --
The "snapshot.c" patch has just been included in Linus' tree, and I've already told Greg that it should go into 2.6.35.y. The second patch, however, only changes the default value of image_size, so it is not -stable material. As a workaround, you can change the init scripts on your system to set /sys/power/image_size to the same value it has when the second patch is applied.
Thanks,
Rafael
--
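The workaround can be scripted. The C sketch below ASSUMES that the second patch sets the default image_size to two fifths of total RAM (our understanding of mainline hibernate_image_size_init(), whose declaration appears in the power.h hunk above; the value is not stated in this thread), and it approximates total RAM with the MemTotal line of /proc/meminfo. The helpers memtotal_kb(), suggested_image_size() and set_image_size() are hypothetical. Writing /sys/power/image_size requires root.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract MemTotal (in kB) from text in /proc/meminfo format.
 * Returns 0 if the field is missing or malformed.
 */
static unsigned long long memtotal_kb(const char *meminfo)
{
	const char *p = strstr(meminfo, "MemTotal:");
	unsigned long long kb = 0;

	if (p == NULL || sscanf(p, "MemTotal: %llu", &kb) != 1)
		return 0;
	return kb;
}

/*
 * ASSUMPTION: the default image_size set by the second patch is 2/5 of
 * total RAM.  Returns the corresponding value in bytes.
 */
static unsigned long long suggested_image_size(unsigned long long total_kb)
{
	return total_kb * 1024ULL * 2 / 5;
}

/*
 * Read meminfo_path, compute the suggested size and write it (in bytes)
 * to image_size_path.  Returns 0 on success, -1 on any failure.
 */
static int set_image_size(const char *meminfo_path, const char *image_size_path)
{
	char buf[8192];
	size_t n;
	unsigned long long kb;
	FILE *f = fopen(meminfo_path, "r");

	if (f == NULL)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	kb = memtotal_kb(buf);
	if (kb == 0)
		return -1;

	f = fopen(image_size_path, "w");
	if (f == NULL)
		return -1;
	fprintf(f, "%llu\n", suggested_image_size(kb));
	return fclose(f) == 0 ? 0 : -1;
}
```

An init script could run a program that calls set_image_size("/proc/meminfo", "/sys/power/image_size"). Since the 2/5 ratio is an assumption, reading /sys/power/image_size once on a kernel that has the second patch applied, and using that exact number, is the safer way to follow the advice above.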
OK, I'll put it into my linux-next branch, then. Probably, though, I should modify the changelog, because what it really does is check whether it makes sense to try to allocate from non-highmem pages; it doesn't really prevent the OOM from occurring.
Thanks,
Rafael
--
For completeness, below is the patch with the new changelog.
Thanks,
Rafael
---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Hibernate: Avoid hitting OOM during preallocation of memory
There is a problem in hibernate_preallocate_memory() that it calls
preallocate_image_memory() with an argument that may be greater than
the total number of non-highmem memory pages that haven't been
already preallocated. If that's the case, the OOM condition is
guaranteed to trigger, which in turn can cause significant slowdown
to occur.
To avoid that, make preallocate_image_memory() adjust its argument
before calling preallocate_image_pages(), so that it doesn't exceed
the number of non-highmem pages that weren't preallocated previously.
Change hibernate_preallocate_memory() to try to allocate from highmem
if the number of pages allocated by preallocate_image_memory() is too
low. Modify free_unnecessary_pages() to take all possible memory
allocation patterns into account.
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
kernel/power/snapshot.c | 66 +++++++++++++++++++++++++++++++++---------------
1 file changed, 46 insertions(+), 20 deletions(-)
Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1122,9 +1122,19 @@ static unsigned long preallocate_image_p
 	return nr_alloc;
 }
 
-static unsigned long preallocate_image_memory(unsigned long nr_pages)
+static unsigned long preallocate_image_memory(unsigned long nr_pages,
+					      unsigned long size_normal)
 {
-	return preallocate_image_pages(nr_pages, GFP_IMAGE);
+	unsigned long alloc;
+
+	if (size_normal <= alloc_normal)
+		return 0;
+
+	alloc = size_normal - alloc_normal;
+	if (nr_pages < alloc)
+		alloc = nr_pages;
+
+	return preallocate_image_pages(alloc, GFP_IMAGE);
 }
 
 #ifdef ...
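The changelog's two rules (clamp the non-highmem request to what is left of the non-highmem budget, then fall back to highmem when too few pages were obtained) can be illustrated with a toy model. This is a sketch, not kernel code: struct toy_mem, take() and toy_preallocate() are hypothetical, memory zones are plain counters, and nothing here reclaims memory or triggers the OOM killer. clamp_normal() follows the arithmetic of the patched preallocate_image_memory() in the hunk above.

```c
/* HYPOTHETICAL toy state: free and preallocated page counts per "zone". */
struct toy_mem {
	unsigned long free_normal;   /* non-highmem pages still free */
	unsigned long free_highmem;  /* highmem pages still free */
	unsigned long alloc_normal;  /* non-highmem pages preallocated so far */
	unsigned long alloc_highmem; /* highmem pages preallocated so far */
};

/* Same clamp as the patched preallocate_image_memory(). */
static unsigned long clamp_normal(unsigned long nr_pages,
				  unsigned long size_normal,
				  unsigned long alloc_normal)
{
	unsigned long alloc;

	if (size_normal <= alloc_normal)
		return 0;            /* budget used up: asking would only hit OOM */
	alloc = size_normal - alloc_normal;
	return nr_pages < alloc ? nr_pages : alloc;
}

/* Take up to n pages from a free counter; returns pages actually taken. */
static unsigned long take(unsigned long *free_pages, unsigned long n)
{
	if (n > *free_pages)
		n = *free_pages;
	*free_pages -= n;
	return n;
}

/*
 * Request nr_pages: first from non-highmem (clamped to the remaining
 * budget), then fall back to highmem for whatever is still missing.
 * Returns the total number of pages obtained.
 */
static unsigned long toy_preallocate(struct toy_mem *m, unsigned long nr_pages,
				     unsigned long size_normal)
{
	unsigned long want = clamp_normal(nr_pages, size_normal, m->alloc_normal);
	unsigned long got = take(&m->free_normal, want);

	m->alloc_normal += got;
	if (got < nr_pages) {
		unsigned long hi = take(&m->free_highmem, nr_pages - got);

		m->alloc_highmem += hi;
		got += hi;
	}
	return got;
}
```

With 100 free non-highmem pages, 50 free highmem pages and a non-highmem budget of 60, a request for 80 pages gets 60 from non-highmem and the remaining 20 from highmem. A second request for 10 skips non-highmem entirely, because its budget is used up, and takes all 10 from highmem.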
