"Ok, I think I'm getting close to releasing a real 2.6.23," began Linus Torvalds in his release announcement for the eighth release candidate of the upcoming 2.6.23 kernel. "Things seem to have calmed down, and I think Thomas Gleixner may have found the suspend/resume regression that has dogged us for a while, so I'm feeling happy about things." Linus continued:
"Of course, me feeling happy is usually immediately followed by some nasty person finding new problems, but I'll just ignore that and enjoy the feeling anyway, however fleeting it may be.
"The shortlog really is pretty short, and I'm appending the diffstat at the end too in case anybody cares, but basically it's just a number of fairly small but real fixes, with some support for a few new chips to the sky2 network driver.."
"It took me quite a while to realize the real root cause of the VAIO - and probably many other machines - suspend/resume regressions, which were unearthed by the dyntick / clockevents patches," Thomas Gleixner explained regarding two patches for fixing suspend issues that Andrew Morton experienced with his VAIO laptop. He continued, "we disable a lot of ACPI/BIOS functionality during suspend, but we keep the lower idle C-states functionality active across suspend/resume. It seems that this causes trouble with certain BIOSes, but I assume that the problem is more wide spread and just not surfacing due to the various scenarios in which a machine goes into suspend/resume." Thomas concluded, "I really hope that this two patches finally set an end to the 'jinxed VAIO heisenbug series', which started when we removed the periodic tick with the clockevents/dyntick patches."
Linus Torvalds expressed some concerns, "the patches look fine, but I somehow have this slight feeling that you gave up a bit too soon on the '*why* does this happen?' question." He agreed that at that point there was a problem with ACPI, but cautioned that this could be triggered by another bug, "in particular, I also suspect that this may not really fix the problem - maybe it just makes the window sufficiently small that it no longer triggers. Because we don't necessarily understand what the real background for the problem is, I'm not sure we can say that it is solved." Linus concluded, "but hey, I think I'll apply the patches as-is. I'd just feel even better if we actually understood *why* doing the CPU Cx states is not something we can do around the suspend code!"
Offering a potential alternative to the existing suspend and restore implementations in the Linux Kernel, Ying Huang posted a patch utilizing kexec, "kexec based hibernation has some potential advantages over uswsusp and suspend2. " He listed two such potential advantages, "the hibernation image size can exceed half of memory size easily," and, "the hibernation image can be written to and read from almost anywhere, such as a USB disk [or] NFS." He described the feature implemented by his patch as "jumping from a kexeced kernel to the original kernel", allowing someone to first boot from one kernel, then to kexec another crashdump kernel in reserved memory and run from it for a while, and finally to "jump back" to the original kernel.
Andrew Morton replied to the idea very positively, "this sounds awesome. Am I correct in expecting that ultimately the existing hibernation implementation just goes away and we reuse (and hence strengthen) the existing kexec (and kdump?) infrastructure? And that we get hibernation support almost for free on all kexec (and relocatable-kernel?) capable architectures? And that all the management of hibernation and resume happens in userspace?" He went on to ask, "how close do you think all this is to being a viable thing?" Ying replied, "the kexec jump is the first step, maybe the simplest step. There are many other issues to be resolved, at least the following ones," going on to list a series of steps that still have to be implemented before kexec based hibernation would be a viable option.
What started as the review of a bug report grew into an interesting debate as Linus Torvalds slammed the current suspend and resume [story] design in the Linux Kernel, "why the HELL cannot you realize that kernel threads are different? The right thing to do is AND HAS ALWAYS BEEN, to stop and start user threads only around the whole thing. Don't touch those kernel threads. Stop freezing them." Later in the discussion, Linus noted that he had no interest in Suspend to Disk (STD), and was only interested in a working Suspend to Ram (STR) implementation. He noted that complexity introduced by STD was infecting the STR logic, and that the two should be completely separated, "what irritates me is that STR really shouldn't have _had_ that bug at all. The only reason STR had the same bug as STD was exactly the fact that the two features are too closely inter-twined in the kernel. That irritates me hugely. We had a bug we should never had had! We had a bug because people are sharing code that shouldn't be shared! We had a bug because of code that makes no sense in the first place!" Linus noted that he doesn't use laptops much, but still likes STR on his desktop, "STR means they are quiet and don't waste energy when I don't use them, but they're instantly available when I care." He then went on to point to design flaws in the freezer:
"I actually don't think that processes should be frozen really at all. I agree that filesystems have to be frozen (and I think that checkpointing of the filesystem or block device is 'too clever'), but I just don't think that has anything to do with freezing processes. So I'd actually much prefer to freeze at the VFS (and socket layers, etc), and make sure that anybody who tries to write or do something else that we cannot do until resuming, will just be blocked (or perhaps just buffered)!"