Re: A bug (probably) in stop_all_threads

Previous thread: none

Next thread: [PATCH] x86: change early_ioremap to use slots instead of nesting by Yinghai Lu on Saturday, September 13, 2008 - 2:13 am. (1 message)
From: karthikeyan S
Date: Saturday, September 13, 2008 - 1:27 am

Hi,

Apologies if I am posting this message in an incorrect mailing list
and for bringing up an issue with older kernel version (2.4), and if
the issue had been brought up earlier and I missed it.

There seems to be a bug with stop_all_threads function in 2.4. The
function sends SIGSTOP to all the threads in the thread group and
waits until all the threads get their state changed accordingly.

While waiting, if it finds that the event has not occurred, it tries
to yield the processor to other processes by calling
schedule_timeout().

But before calling schedule_timeout() it does not set the task state
to either TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
So the task stays runnable and schedule_timeout() effectively does
nothing.

This causes a problem in our device, which uses kernel 2.4. When the
task that runs at the highest priority gets a SIGSEGV, control is
stuck waiting for all the threads in the thread group to change their
task state. But the other threads never get a chance to run, so the
SIGSTOP sent to them has no effect.

When I changed the stop_all_threads function to set the task state to
TASK_INTERRUPTIBLE, the problem disappeared.
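
The starvation described above can be illustrated with a toy userspace
model. This is only a sketch, not kernel code: the struct, pick_next()
and wait_for_stop() are hypothetical names invented for illustration,
the scheduler is assumed to be strictly priority based, and every
non-waiting thread is assumed to stop as soon as it gets to run. The
only point it makes is that a waiter which stays TASK_RUNNING is picked
again immediately, so nobody else ever runs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy task states; the names mirror the kernel's, but this is not kernel code. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_STOPPED };

struct task {
    int prio;               /* higher value = higher priority (assumption) */
    enum task_state state;
};

/* Return the runnable task with the highest priority, or NULL if none. */
static struct task *pick_next(struct task *tasks, size_t n)
{
    struct task *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].state != TASK_RUNNING)
            continue;
        if (best == NULL || tasks[i].prio > best->prio)
            best = &tasks[i];
    }
    return best;
}

/*
 * Model of the wait loop. tasks[waiter] polls until every other task is
 * stopped. Returns the number of polls that were needed, or -1 if the
 * others were still not stopped after max_polls attempts.
 */
int wait_for_stop(struct task *tasks, size_t n, size_t waiter,
                  bool set_state, int max_polls)
{
    for (int poll = 0; poll < max_polls; poll++) {
        bool all_stopped = true;
        for (size_t i = 0; i < n; i++) {
            if (i != waiter && tasks[i].state != TASK_STOPPED)
                all_stopped = false;
        }
        if (all_stopped)
            return poll;

        /* The step the thread reports as missing. */
        if (set_state)
            tasks[waiter].state = TASK_INTERRUPTIBLE;

        /*
         * Stand-in for schedule_timeout(): run whatever the scheduler
         * picks. Any task other than the waiter handles its pending
         * SIGSTOP and stops. If the waiter is still runnable and has the
         * highest priority, it is picked at once and nobody else runs.
         */
        struct task *next = pick_next(tasks, n);
        while (next != NULL && next != &tasks[waiter]) {
            next->state = TASK_STOPPED;
            next = pick_next(tasks, n);
        }

        /* Timeout expired: the waiter is runnable again. */
        tasks[waiter].state = TASK_RUNNING;
    }
    return -1;
}
```

With three tasks where tasks[0] is the waiter and has the highest
priority, wait_for_stop(tasks, 3, 0, false, 1000) gives up and returns
-1, while the same call with set_state set to true returns 1.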

So is this a real issue that stop_all_threads() does not set the
current task to TASK_INTERRUPTIBLE before calling schedule_timeout()?

Please provide your feedback. Thanks a lot.

-karthik
--

From: Grant Coady
Date: Saturday, September 13, 2008 - 3:07 am

On Sat, 13 Sep 2008 13:57:28 +0530, "karthikeyan S" <karthispeaks@gmail.com> wrote:


--

From: Willy Tarreau
Date: Monday, September 15, 2008 - 10:17 pm

Hi karthik,

Just a quick note to tell you that I have not missed your mail; I
just need some time to analyse your report and the code related
to it. Have you tried setting TASK_INTERRUPTIBLE as you suggest?
At first sight, it seems to make sense.

Regards,
Willy

--

From: karthikeyan S
Date: Monday, September 15, 2008 - 10:49 pm

Hi Willy,

Thanks for getting back. Yes, I tried to set the state to
TASK_INTERRUPTIBLE. It solves the issue. The other processes now get a
chance to handle the SIGSTOP sent to them.

-karthik

--

From: Willy Tarreau
Date: Monday, September 15, 2008 - 11:22 pm

OK, that will help me review the current code and compare it with 2.6.
If you could send me your patch, it will save me even more time. Based
on your report, it's very likely that it will get merged.

Thanks,
Willy

--

From: karthikeyan S
Date: Tuesday, September 16, 2008 - 1:28 am

Sure, I can definitely send the patch. I haven't sent a patch before,
and I am not fully aware of the process to follow. It might take a
little bit of time, but I will try to send it very soon.

--

From: Willy Tarreau
Date: Tuesday, September 16, 2008 - 2:28 am

In order not to waste your time, here is how to proceed:

Go to the directory that contains both your old and new kernel trees,
then run:

  $ diff -urN linux-2.4.36-bad linux-2.4.36-good > my-patch.diff

(Make sure the trees don't contain lots of old or temporary files.
You might have to run "make distclean" in each directory first.)

Then paste the result as inline text into your mail. As an added
bonus, other people will be able to comment on your work.

Regards,
Willy

--

From: karthikeyan S
Date: Tuesday, September 16, 2008 - 12:30 pm

Willy, Thank you for the info.

I downloaded a 2.4.36 version from kernel.org, and there is no
stop_all_threads() at all in that version.
The do_coredump() mechanism seems to have been changed. It does not
call stop_all_threads().

I am not sure which 2.4 version with stop_all_threads() we are using
for our device.
I am also not sure where our guys picked up the "dump core for all
threads" patch, which includes the stop_all_threads function. Was this
function ever in official 2.4? Thanks a lot.

So, it looks like there is no need to send the patch anymore? :-(

--

From: Willy Tarreau
Date: Tuesday, September 16, 2008 - 1:21 pm

No, I don't think so. But you should check the Red Hat and SuSE
kernels; they were heavily patched to support an early version of the
2.6 O(1) scheduler, NPTL threads and things like this. As a result,
there were a large number of changes in this area and your patches
might come from there. Also check Andrea Arcangeli's patches (2.4-aa);
they were approximately the ones that constituted the SuSE kernels at
that time. I'm pretty sure that you'll find what you're looking for from

No, but that does not matter. I prefer a false alarm once in a while
to no alarm with a big open hole ;-)

Good luck,
Willy

--
