Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startup and is unkillable [still broken in 2.6.25-rc1]

Only to add that 2.6.25-rc1 is still broken.

thanks,

--alessandro

"We act as though comfort and luxury were the chief requirements
of life, when all that we need to make us really happy is
something to be enthusiastic about."

(Charles Kingsley)
--

To: Alessandro Suardi <alessandro.suardi@...>
Cc: LKML <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, February 11, 2008 - 3:09 pm

Yes, I think this is the same problem. Please try to unset CONFIG_GROUP_SCHED

Yes, it is.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Alessandro Suardi <alessandro.suardi@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, February 11, 2008 - 3:38 pm

OK, those are the git IDs

b47711bfbcd4eb77ca61ef0162487b20e023ae55 2.6.24-git1
9b73e76f3cf63379dcf45fcd4f112f5812418d0a 2.6.24-git2

so if you get a git tree, you can do

gitk b47711b..9b73e76

to see what happened in there.

However, the obvious candidates are the scheduler or the ocfs2 merge, and
the latter is only relevant in case you use ocfs2, of course.

The rest of it tends to be the DVB and SCSI updates.

But it would be great if you could do a bisect and verify. Just do

git bisect start
git bisect good b47711bfbcd4eb77ca61ef0162487b20e023ae55
git bisect bad 9b73e76f3cf63379dcf45fcd4f112f5812418d0a

and off you go..
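
A minimal sketch of how that manual bisect loop might be wrapped in shell helpers, assuming the current directory is a kernel git checkout. The function names `bisect_begin` and `bisect_mark` are hypothetical; only the `git bisect` subcommands they call (start, good, bad, skip, log, reset) are real git. Each step still means building and booting the candidate kernel by hand and checking whether VKTM gets stuck.

```shell
#!/bin/sh
# Hypothetical helpers around "git bisect"; run them inside a kernel git tree.

# bisect_begin GOOD BAD: start bisecting between a known-good and a known-bad commit.
bisect_begin() {
    if [ "$#" -ne 2 ]; then
        echo "usage: bisect_begin GOOD BAD" >&2
        return 2
    fi
    git bisect start &&
    git bisect good "$1" &&
    git bisect bad "$2"
}

# bisect_mark VERDICT: record the result for the currently checked-out commit.
# After each verdict git checks out the next candidate to build and test.
bisect_mark() {
    case "$1" in
        good|bad|skip) git bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad|skip" >&2; return 2 ;;
    esac
}

# Typical session (commit IDs taken from the message above):
#   bisect_begin b47711bfbcd4eb77ca61ef0162487b20e023ae55 \
#                9b73e76f3cf63379dcf45fcd4f112f5812418d0a
#   ... build, boot, run the test case ...
#   bisect_mark bad        # or: good / skip
#   git bisect log         # review the verdicts recorded so far
#   git bisect reset       # return to the original HEAD when finished
```

Because every verdict here needs a reboot into the freshly built kernel, the sketch keeps the marking step manual rather than automating it.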

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Alessandro Suardi <alessandro.suardi@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, February 11, 2008 - 3:56 pm

Well, I've already bisected that down to commit
6f505b16425a51270058e4a93441fe64de3dd435 "sched: rt group scheduling" and
provided a simple test case. Moreover, there are patches from Peter that fix
the problem, but they are lost somewhere in the way from him to you (please see
http://lkml.org/lkml/2008/2/5/535 and http://lkml.org/lkml/2008/2/6/320).

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Linus Torvalds <torvalds@...>, Alessandro Suardi <alessandro.suardi@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, February 11, 2008 - 4:49 pm

no, they were not lost, they just didn't pass QA here (they crashed on a
particularly hard to debug 8-way box I have) and Peter worked on that
queue of fixes up until today to get it really correct. Could you check:

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git

The combo patch is below as well, whichever you prefer. The shortlog can be
found below as well, but I don't yet consider this pullable; I'd like to
see it pass a full night of randconfig tests on my test systems.

this all was complicated by the fact that people found interactivity of
the group scheduler to be not up to their expectations (the observed
latencies would go up linearly with the number of UIDs in the system).
And your positive test results depended on the presence of those (at
that time, still half-baked) changes. That issue turned out to be due to
a .24-era design bug in the group scheduler and it took Peter longer
to straighten it out. (He flattened the rbtree to maintain perfect
weights and to get latencies back within the target.)

So what fixed your thing was simply not pullable into -rc1, as -rc1 was
being cooled down at around last Friday already. Sorry about that! The
short-term emergency fix would be to turn off the group scheduler
altogether. (or, if we are lucky, tomorrow i can send the pull request
for the changes below.)

Ingo

------------------>
Peter Zijlstra (15):
hrtimer: more hrtimer_init_sleeper() fallout.
sched: fair-group: separate tg->shares from task_group_lock
sched: fix incorrect irq lock usage in normalize_rt_tasks()
sched: rt-group: deal with PI
sched: rt-group: interface
sched: rt-group: make rt groups scheduling configurable
sched: rt-group: clean up the ifdeffery
sched: rt-group: refure unrunnable tasks
sched: rt-group: synchonised bandwidth period
sched: rt-group: smp balancing
sched: cleanup old and rarely used 'debug' features.
sched: fair-gro...

To: Rafael J. Wysocki <rjw@...>
Cc: Linus Torvalds <torvalds@...>, Alessandro Suardi <alessandro.suardi@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, February 11, 2008 - 6:10 pm

ok, we just found the reason for the 8-way crash, the delta fix from
Peter is below if any of you have tried the previous combo patch.
Updated sched.git as well, new HEAD is
fec13e45305d69fd0bd23b30bd05a0a42cf341f8.

Ingo

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -219,6 +219,10 @@ static void start_rt_bandwidth(struct rt
 	if (rt_b->rt_runtime == RUNTIME_INF)
 		return;

+	if (hrtimer_active(&rt_b->rt_period_timer))
+		return;
+
+	spin_lock(&rt_b->rt_runtime_lock);
 	for (;;) {
 		if (hrtimer_active(&rt_b->rt_period_timer))
 			break;
@@ -229,6 +233,7 @@ static void start_rt_bandwidth(struct rt
 				rt_b->rt_period_timer.expires,
 				HRTIMER_MODE_ABS);
 	}
+	spin_unlock(&rt_b->rt_runtime_lock);
 }

 #ifdef CONFIG_RT_GROUP_SCHED
--

To: Ingo Molnar <mingo@...>
Cc: Linus Torvalds <torvalds@...>, Alessandro Suardi <alessandro.suardi@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, February 11, 2008 - 7:12 pm

With the previous patch and this patch applied, the issue is not reproducible
here.

Thanks,
--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, Alessandro Suardi <alessandro.suardi@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Tuesday, February 12, 2008 - 9:44 am

Did you enable CONFIG_RT_GROUP_SCHED (it defaults to n)?

If you didn't, could you try with it set to y?

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, Alessandro Suardi <alessandro.suardi@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Tuesday, February 12, 2008 - 3:28 pm

Tested with CONFIG_RT_GROUP_SCHED set and it also works as expected.

Thanks,
Rafael
--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Rafael J. Wysocki <rjw@...>, Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Tuesday, February 12, 2008 - 10:35 am

I just rebuilt 2.6.25-rc1-git2 with Ingo's patch and your patch on top,
and the Oracle VKTM issue is still gone even with

[asuardi@sandman ~]$ grep GROUP_SCHED /share/src/linux-2.6.25-rc1-git2-orafix/.config
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CGROUP_SCHED is not set

so it's good for me.

Or is it necessary to also enable CONFIG_CGROUP_SCHED and retest?
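
The check above can be sketched as a small shell helper that reports the state of each group scheduler option in a kernel .config file. This is an illustration only: the function name `group_sched_status` is hypothetical, and the option list simply mirrors the four names that appear in the grep output above.

```shell
#!/bin/sh
# Hypothetical helper: report the state of the group scheduler options
# found in a kernel .config file (path given as $1).
group_sched_status() {
    cfg=$1
    for opt in GROUP_SCHED FAIR_GROUP_SCHED RT_GROUP_SCHED CGROUP_SCHED; do
        # An enabled boolean option appears as "CONFIG_<name>=y" on its own line.
        if grep -q "^CONFIG_${opt}=y\$" "$cfg"; then
            echo "CONFIG_${opt}=y"
        else
            echo "CONFIG_${opt} is not set"
        fi
    done
}

# Example: group_sched_status /share/src/linux-2.6.25-rc1-git2-orafix/.config
```

Anchoring the pattern at both ends keeps CONFIG_GROUP_SCHED from also matching the FAIR, RT and CGROUP variants, which a plain substring grep would do.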

--alessandro
--

To: Alessandro Suardi <alessandro.suardi@...>
Cc: Rafael J. Wysocki <rjw@...>, Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Tuesday, February 12, 2008 - 10:53 am

No, that should be quite all right. Thanks for testing!

--

To: Rafael J. Wysocki <rjw@...>
Cc: Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Monday, February 11, 2008 - 7:31 pm

The problem is fixed for me as well with the previous patch + the patch
below, VKTM now enters S state and Oracle shuts down properly again.

--alessandro
--

To: Alessandro Suardi <alessandro.suardi@...>
Cc: Rafael J. Wysocki <rjw@...>, Linus Torvalds <torvalds@...>, LKML <linux-kernel@...>, Andrew Morton <akpm@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Wednesday, February 13, 2008 - 8:28 am

Thanks a lot for testing this. The scheduler queue is now looking good in
testing; I will probably send a pull request to Linus later today.

Ingo
--