Re: Kernel hangs in SMP + VMware environment.

From: Alok Kataria
Date: Wednesday, May 14, 2008 - 11:30 am

On Wed, May 14, 2008 at 4:00 AM, Tetsuo Handa

Hi Tetsuo,

Can you try the patch attached to this mail? I made it on top of
2.6.24.7, but it should apply to any other 2.6.24-based distro kernel.

If you still see the same problem with the patch applied, please send me
your config file and the boot-time dmesg output.

Thanks,
Alok
From: Tetsuo Handa
Date: Wednesday, May 14, 2008 - 11:11 pm

Hello.


I tried 2.6.24.5-85.fc8 with your patch, and I can no longer reproduce
this problem. I think the problem has been solved.

Also, until your patch is applied to distro kernels, a workaround seems to be
adding "clocksource=jiffies" (or similar) to the kernel command line to avoid
using tsc as the clocksource.
Fedora 9's 2.6.25-14.fc9 also hangs since its default clocksource is tsc, but
with "clocksource=jiffies" added, I have encountered no hangs so far.

From today, I can compile kernels using 2 CPUs.
Thank you very much.
--
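
The workaround above amounts to putting a clocksource= argument on the kernel command line. A minimal Python sketch of that edit follows, assuming the command line is available as a plain string (for example copied out of the boot loader configuration). The helper name with_clocksource is hypothetical and is not part of any tool mentioned in this thread.

```python
def with_clocksource(cmdline: str, source: str) -> str:
    """Return cmdline with exactly one clocksource=<source> argument.

    Hypothetical helper for illustration; it only edits a string and
    does not touch the boot loader configuration itself.
    """
    # Drop any clocksource= argument that is already present.
    kept = [arg for arg in cmdline.split()
            if not arg.startswith("clocksource=")]
    # Append the requested clocksource as the last argument.
    kept.append(f"clocksource={source}")
    return " ".join(kept)


if __name__ == "__main__":
    # Example command line; the root device is made up for the demo.
    print(with_clocksource("ro root=/dev/sda1 quiet", "jiffies"))
```

The result still has to be written back to the boot loader entry by hand, and the new setting only takes effect after a reboot.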

From: Alok Kataria
Date: Thursday, May 15, 2008 - 12:44 pm

Hi Tetsuo,

Thanks for testing.

Actually, it would be better to use clocksource=acpi_pm on kernels which don't have this patch, as acpi_pm supports many more features than jiffies, such as hrtimers and nohz.

On the other hand, for kernels which have this patch applied, it would be best to use TSC, for performance.

HTH,
Alok.
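
The advice above can be sketched as a small Python helper: read the clocksources the running kernel offers from sysfs, then prefer tsc on a patched kernel and acpi_pm on an unpatched one. The function names and the fallback order (acpi_pm after tsc, jiffies after acpi_pm) are assumptions made for illustration, and whether a kernel carries the patch has to be supplied by the caller, since nothing in this thread describes a way to detect it.

```python
from pathlib import Path

# Standard sysfs location of the clocksource interface on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(sysfs_dir: Path = SYSFS_DIR):
    """Return (current, available) clocksources as reported by sysfs."""
    current = (sysfs_dir / "current_clocksource").read_text().strip()
    available = (sysfs_dir / "available_clocksource").read_text().split()
    return current, available


def recommend_clocksource(available, patched):
    """Pick a clocksource following the advice in this thread.

    patched=True  -> prefer tsc (performance), then acpi_pm.
    patched=False -> prefer acpi_pm (hrtimers, nohz), then jiffies.
    The fallback order is an assumption, not something stated here.
    """
    preferred = ["tsc", "acpi_pm"] if patched else ["acpi_pm", "jiffies"]
    for name in preferred:
        if name in available:
            return name
    return None  # nothing suitable is offered by this kernel


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:", current)
    print("suggested:", recommend_clocksource(available, patched=False))
```

Running it as a script only reports a suggestion; it does not change the active clocksource.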

--

From: Tetsuo Handa
Date: Friday, May 16, 2008 - 5:48 am

Hello.

I see. Thanks.



I'm not sure, but this problem might exist in all kernels since 2.6.18, because
I can find the clocksource= parameter in Documentation/kernel-parameters.txt.

I didn't experience this problem in earlier kernels (e.g. 2.6.20)
since I was running only 1 VMware guest at a time.
Today I ran 2 VMware guests simultaneously (one with kernel 2.6.22 and
the other with kernel 2.6.18, both assigned 2 virtual CPUs) and I encountered
what may have been a hang with the 2.6.18 one.
Ctrl-C didn't work, Alt-F? didn't work, and Ctrl-Alt-Del didn't work.
I had no time to reproduce it, so I'm not sure it had actually hung.

Regards.
--

From: Alok Kataria
Date: Friday, May 16, 2008 - 6:22 pm

________________________________________
From: Tetsuo Handa [penguin-kernel@I-love.SAKURA.ne.jp]
Sent: Friday, May 16, 2008 5:48 AM
To: Alok Kataria
Cc: devzero@web.de; linux-kernel@vger.kernel.org; Daniel Hecht
Subject: Re: Kernel hangs in SMP + VMware environment.

Hello.

I see. Thanks.



I'm not sure, but this problem might exist in all kernels since 2.6.18, because
I can find the clocksource= parameter in Documentation/kernel-parameters.txt.

ANK> Sorry, but I don't understand what you are pointing to; the 2.6.18 kernel does
ANK> have the clocksource parameter.
ANK> And you are correct, this could be a problem with all the kernels which are
ANK> using clocksource.

I didn't experience this problem in earlier kernels (e.g. 2.6.20)
since I was running only 1 VMware guest at a time.

Today I ran 2 VMware guests simultaneously (one with kernel 2.6.22 and
the other with kernel 2.6.18, both assigned 2 virtual CPUs) and I encountered
what may have been a hang with the 2.6.18 one.


ANK> If you are able to reproduce this with any of the kernels with the patch applied,
ANK> please let me know.

Thanks,
Alok

Ctrl-C didn't work, Alt-F? didn't work, and Ctrl-Alt-Del didn't work.
I had no time to reproduce it, so I'm not sure it had actually hung.

Regards.
--

From: Tetsuo Handa
Date: Friday, May 16, 2008 - 6:34 pm

I'm not saying "I was able to reproduce this problem after applying your patch".
I'm saying "All kernels since 2.6.18 might have this problem, and we need to apply

Thanks.
--

From: Alok Kataria
Date: Friday, May 16, 2008 - 10:59 pm

On Fri, May 16, 2008 at 6:34 PM, Tetsuo Handa

That's correct, but I am not sure how the stable folks will pick up these patches.
Thomas, if we need the patch in a previous kernel's stable tree
(in this instance 2.6.18.x), do we need to backport it for
each kernel which shows this problem and send those patches to the
stable tree maintainers, or is there some other way?
Sorry, but I am not aware of how the stable tree is maintained. Any
information regarding that would be helpful.

Thanks,
--

From: Bart Van Assche
Date: Saturday, May 17, 2008 - 12:16 am

See also Documentation/stable_kernel_rules.txt.

Bart.
--

From: Tetsuo Handa
Date: Friday, June 27, 2008 - 5:34 am

Today, I found that the CentOS 5.2's 2.6.18-92.1.6.el5 kernel has the patch applied.
So, this problem existed in all kernels since 2.6.18, right?

Does this problem happen in a non-virtualized (i.e. native) environment?

Regards.
--

From: Jan Engelhardt
Date: Sunday, May 18, 2008 - 12:45 pm

It does not happen for 2.6.23, and IIRC it did not with 2.6.24
either; 2.6.25-rcish is the first to show this behavior.
Oh well, time to bisect.
--

From: Alok Kataria
Date: Monday, May 19, 2008 - 6:08 pm

________________________________________
From: jengelh@sovereign.computergmbh.de [jengelh@sovereign.computergmbh.de] On Behalf Of Jan Engelhardt [jengelh@medozas.de]
Sent: Sunday, May 18, 2008 12:45 PM
To: Alok Kataria
Cc: penguin-kernel@i-love.sakura.ne.jp; devzero@web.de; linux-kernel@vger.kernel.org; Daniel Hecht
Subject: Re: Kernel hangs in SMP + VMware environment.


I too noticed it; clocksource=pit is my current workaround.

ANK> What kernel do you see this with? Did you get a chance to try the patch?

It does not happen for 2.6.23, and IIRC it did not with 2.6.24
either; 2.6.25-rcish is the first to show this behavior.

ANK> Are you sure you didn't see it with 2.6.23/2.6.24?
ANK> I am assuming your test case is similar to Tetsuo's:
ANK> multiple guests running simultaneously.

Oh well, time to bisect.

ANK> Let me know if you find anything there.


Thanks,
Alok
--
