On Wed, May 14, 2008 at 4:00 AM, Tetsuo Handa Hi Tetsuo, Can you try the patch attached to this mail? I made it on top of 2.6.24.7, but it should apply to any other 2.6.24-based distro kernel. If you still see the same problem with the patch applied, please send me your config file and the boot-time dmesg output. Thanks, Alok
Hello. I tried 2.6.24.5-85.fc8 with your patch, and I can no longer reproduce this problem. I think the problem has been solved. Also, until your patch is applied to distro kernels, a workaround seems to be to add "clocksource=jiffies" (or anything else that avoids using tsc as the clocksource) to the kernel command line. Fedora 9's 2.6.25-14.fc9 also hangs since the default clocksource is tsc, but with "clocksource=jiffies" added, I have encountered no hangs so far. As of today, I can compile kernels using 2 CPUs. Thank you very much. --
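To confirm that a workaround such as "clocksource=jiffies" actually reached the kernel, one can inspect the boot command line. The following is only a minimal sketch: the helper name clocksource_override is made up for illustration, and it assumes that when the option appears more than once the last occurrence is the one that takes effect.

```python
from typing import Optional

PREFIX = "clocksource="


def clocksource_override(cmdline: str) -> Optional[str]:
    """Return the clocksource= value found on a kernel command line.

    Hypothetical helper, not part of any kernel tooling. If the option
    is given several times, the last value is returned (assumption:
    a later option overrides an earlier one).
    """
    value = None
    for token in cmdline.split():
        if token.startswith(PREFIX):
            value = token[len(PREFIX):]
    return value


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel.
    try:
        with open("/proc/cmdline") as f:
            print(clocksource_override(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

Run inside the guest after rebooting with the extra parameter; a result of None means no clocksource= option was passed and the kernel picked its default.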
Hi Tetsuo, Thanks for testing. Actually, it would be better to use clocksource=acpi_pm on kernels which don't have this patch, as ACPI_PM comes with many more features like hrtimers, nohz, etc. On the other hand, for kernels which have this patch applied, it would be best to use TSC, for performance. HTH, Alok.
________________________________________
From: Tetsuo Handa [penguin-kernel@i-love.sakura.ne.jp]
Sent: Wednesday, May 14, 2008 11:11 PM
To: Alok Kataria
Cc: devzero@web.de; linux-kernel@vger.kernel.org; Daniel Hecht
Subject: Re: Kernel hangs in SMP + VMware environment.
--
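The advice above (acpi_pm on unpatched kernels, tsc on patched ones) can be checked against what a running guest offers. This is a rough sketch, not an official tool: it assumes the usual sysfs location /sys/devices/system/clocksource/clocksource0, and the function names read_clocksources and suggested_clocksource are invented for this example.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed sysfs location of the clocksource attributes.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = SYSFS_DIR) -> Tuple[str, List[str]]:
    """Return (current clocksource, list of available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def suggested_clocksource(available: List[str], patched: bool) -> Optional[str]:
    """Apply the rule from this thread: tsc if patched, else acpi_pm.

    Returns None when the preferred clocksource is not available.
    """
    preferred = "tsc" if patched else "acpi_pm"
    return preferred if preferred in available else None


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
    except OSError:
        print("clocksource sysfs files not readable")
    else:
        print("current:", current)
        print("available:", " ".join(available))
        print("suggested (unpatched kernel):",
              suggested_clocksource(available, patched=False))
```

Whether a given kernel carries the fix cannot be detected from sysfs, so the patched flag has to be supplied by whoever knows how the kernel was built.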
Hello. I see. Thanks. I'm not sure, but this problem might exist in all kernels since 2.6.18, for I can find the clocksource= parameter in Documentation/kernel-parameters.txt. I didn't experience this problem in earlier kernels (e.g. 2.6.20) since I was running only 1 VMware guest at a time. Today I ran 2 VMware guests simultaneously (one with kernel 2.6.22 and the other with kernel 2.6.18, both assigned 2 virtual CPUs) and I encountered what may have been a hang with the 2.6.18 one. Ctrl-C didn't work, Alt-F? didn't work, and Ctrl-Alt-Del didn't work. I had no time to reproduce it, so I'm not sure it had actually hung. Regards. --
________________________________________
From: Tetsuo Handa [penguin-kernel@I-love.SAKURA.ne.jp]
Sent: Friday, May 16, 2008 5:48 AM
To: Alok Kataria
Cc: devzero@web.de; linux-kernel@vger.kernel.org; Daniel Hecht
Subject: Re: Kernel hangs in SMP + VMware environment.

Hello. I see. Thanks. I'm not sure, but this problem might exist in all kernels since 2.6.18, for I can find the clocksource= parameter in Documentation/kernel-parameters.txt.

ANK> Sorry, but I don't understand what you are pointing to; the 2.6.18 kernel does
ANK> have the clocksource parameter.
ANK> And you are correct, this could be a problem with all the kernels which are
ANK> using clocksource.

I didn't experience this problem in earlier kernels (e.g. 2.6.20) since I was running only 1 VMware guest at a time. Today I ran 2 VMware guests simultaneously (one with kernel 2.6.22 and the other with kernel 2.6.18, both assigned 2 virtual CPUs) and I encountered what may have been a hang with the 2.6.18 one.

ANK> If you are able to reproduce this with any of the kernels with the patch applied,
ANK> please let me know.

Thanks,
Alok

Ctrl-C didn't work, Alt-F? didn't work, and Ctrl-Alt-Del didn't work. I had no time to reproduce it, so I'm not sure it had actually hung. Regards.
--
I'm not saying "I was able to reproduce this problem after applying your patch". I'm saying "All kernels since 2.6.18 might have this problem, and we need to apply Thanks. --
On Fri, May 16, 2008 at 6:34 PM, Tetsuo Handa That's correct, but I am not sure how the stable folks will pick up these patches. Thomas, if we need the patch in a previous kernel's stable tree (in this instance 2.6.18.x), do we need to backport it for each kernel that shows this problem and send those patches to the stable tree maintainers, or is there some other way? Sorry, but I am not aware of how the stable tree is maintained. Any information regarding that would be helpful. Thanks, --
See also Documentation/stable_kernel_rules.txt. Bart. --
Today, I found that CentOS 5.2's 2.6.18-92.1.6.el5 kernel has the patch applied. So, this problem existed in all kernels since 2.6.18, right? Does this problem also happen in a non-virtualized (i.e. native) environment? Regards. --
It does not happen for 2.6.23, and IIRC it did not with 2.6.24 either; 2.6.25-rcish is the first to show this behavior. Oh well, time to bisect. --
________________________________________
From: jengelh@sovereign.computergmbh.de [jengelh@sovereign.computergmbh.de] On Behalf Of Jan Engelhardt [jengelh@medozas.de]
Sent: Sunday, May 18, 2008 12:45 PM
To: Alok Kataria
Cc: penguin-kernel@i-love.sakura.ne.jp; devzero@web.de; linux-kernel@vger.kernel.org; Daniel Hecht
Subject: Re: Kernel hangs in SMP + VMware environment.

I too noticed it; clocksource=pit is my current workaround.

ANK> What kernel do you see this with? Did you get a chance to try the patch

It does not happen for 2.6.23, and IIRC it did not with 2.6.24 either; 2.6.25-rcish is the first to show this behavior.

ANK> Are you sure you didn't see it with 2.6.23/2.6.24?
ANK> I am assuming your test case is also similar to that of Tetsuo:
ANK> multiple guests running simultaneously.

Oh well, time to bisect.

ANK> Let me know if you get anything there.

Thanks,
Alok
--
