Re: 2.6.32 cgroup regression

From: Josh Hunt
Date: Tuesday, August 24, 2010 - 1:10 pm

This commit makes the ltp cpuctl latency test #2 hang indefinitely:

commit b5d9d734a53e0204aab0089079cbde2a1285a38f
Author: Mike Galbraith <efault@gmx.de>
Date:   Tue Sep 8 11:12:28 2009 +0200

    sched: Ensure that a child can't gain time over it's parent after fork()

When I revert this commit the test progresses as it did in 2.6.31. I
have seen this issue on 2.6.32 and 2.6.32.19. The hang goes away in
2.6.33 starting with this commit:

commit 88ec22d3edb72b261f8628226cd543589a6d5e1b
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Wed Dec 16 18:04:41 2009 +0100

    sched: Remove the cfs_rq dependency from set_task_cpu()

Even though this appears to be resolved in 2.6.33, I am reporting it
because 2.6.32 is the "long-term stable release".

My test system is a single-socket, dual-core AMD machine:
model name	: Dual Core AMD Opteron(tm) Processor 180
with 4GB of RAM.
Kernel config file attached.

The issue is easily reproducible for me by downloading and building ltp,
then running
testcases/kernel/controllers/cpuctl/run_cpuctl_latency_test.sh 2
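Editor's note: because the failure mode here is an indefinite hang, it can help to run the test under a timeout so the reproducer reports the hang instead of blocking the shell. The following is only a minimal sketch in Python. The helper name run_with_timeout and the timeout values are hypothetical, not part of ltp, and a task that is stuck inside the kernel may not respond to the kill, so the helper gives up waiting after a short grace period.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, grace_s=5.0):
    """Run cmd; return (finished, returncode).

    finished is False and returncode is None if cmd was still
    running after timeout_s seconds (a suspected hang).
    """
    proc = subprocess.Popen(cmd)
    try:
        return True, proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Try to clean up, but do not block forever if the task
        # cannot be killed.
        proc.kill()
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            pass
        return False, None


# Command-line use (hypothetical): script.py <timeout_seconds> <cmd> [args...]
if __name__ == "__main__" and len(sys.argv) > 2:
    finished, rc = run_with_timeout(sys.argv[2:], float(sys.argv[1]))
    print("finished" if finished else "timed out (possible hang)", rc)
```

For example, passing a timeout of a few minutes followed by the run_cpuctl_latency_test.sh command above would print "timed out (possible hang)" on an affected kernel, assuming the test normally finishes well within that limit.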

Please let me know if you need any other information to help reproduce
this issue.

Thanks
Josh
From: Mike Galbraith
Date: Tuesday, August 24, 2010 - 10:56 pm

Ouch.  Yeah, that commit is buggy, and never got fixed up in stable.
Reverting it will restore a slightly less buggy, but not very good
situation.  Getting the fork problems all fixed up took a while.

Excellent timing you have.  I have a tree of backports, but I wasn't
counting this commit as a must have, merely highly desirable.  This


No, the testcase works well.  Thanks.

	-Mike


--

From: Josh Hunt
Date: Wednesday, August 25, 2010 - 1:19 pm

I'd be interested in looking at this tree when it's available.

Thanks
Josh
--

From: Mike Galbraith
Date: Thursday, August 26, 2010 - 12:51 am

(sent quilt stack offline, anybody else wants it, holler, and you'll
receive one [absolutely free!] 50k tarball)

--

From: Minoru Usui
Date: Tuesday, August 31, 2010 - 12:25 am

Hi, Mike

On Wed, 25 Aug 2010 07:56:01 +0200

I'm interested in this problem because I hit the same problem in RHEL6 beta2.
(It is based on 2.6.32.)

Are you writing a patch to solve this problem?
If you are, I can test it on RHEL6 beta2 (or the latest).

Appendix.
I could reproduce this problem without ltp; see case 1 below.
But if the CPUs are not completely busy, it does not occur (case 2).

[case1]
  1) Run busy-loop processes (one per CPU) in the same cpu cgroup.

  2) Attach a process to the cpu cgroup from 1)
       -> the attach never completes

  Ex)
     # mkdir /cgroup/cpu/test
     # echo $$ > /cgroup/cpu/test/tasks 
     # ./loop 8 &
     [1] 27202
 
     # mpstat -P ALL 1
     Linux 2.6.32-37.el6.x86_64 (StingerG.localdomain)       08/31/2010      _x86_64_        (8 CPU)
 
     03:08:45 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
     03:08:46 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
     03:08:46 PM    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
     03:08:46 PM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
     03:08:46 PM    2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
     03:08:46 PM    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
     03:08:46 PM    4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
     03:08:46 PM    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
     03:08:46 PM    6  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
     03:08:46 PM    7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

     # echo $$ > /cgroup/cpu/tasks
     # time echo $$ > /cgroup/cpu/test/tasks   <- this operation never finishes
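
Editor's note: the source of the ./loop helper used above was not posted in this thread. The following is a hypothetical stand-in, sketched in Python, that starts N busy-loop processes; the function names are made up for illustration. Child processes start in their parent's cgroup, so launching it from a shell that is already in the test group keeps the workers in that group, as with "./loop 8 &" above.

```python
import subprocess
import sys


def start_busy_loops(count):
    """Start `count` child processes that each spin on a CPU forever."""
    return [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(count)
    ]


def stop_busy_loops(workers):
    """Terminate the busy-loop workers and reap them."""
    for worker in workers:
        worker.terminate()
    for worker in workers:
        worker.wait()


# Command-line use (hypothetical): script.py <number_of_workers>
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    workers = start_busy_loops(int(sys.argv[1]))
    try:
        for worker in workers:
            worker.wait()
    except KeyboardInterrupt:
        stop_busy_loops(workers)
```

Run with 8 on the 8-CPU machine above to approximate case 1, or with 7 to approximate case 2.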

[case2]
     # echo $$ > /cgroup/cpu/test/tasks 
     # ./loop 7 &
     [1] 27259
 
     # mpstat -P ALL 1
     Linux 2.6.32-37.el6.x86_64 ...
From: Mike Galbraith
Date: Friday, September 3, 2010 - 7:09 am

No, the necessary patches were already written.  I just needed to

I just sent a 50-patch series, ever so lovingly git am applied, git
format-patch exported, then imported into evolution one darn patch at a
time, to stable to either apply or bin as maintainers see fit.  To test,
all you should need to do is test mainline.  If you'd like a quilt
tarball against 32.21 anyway, just holler.

The series has all the fork/exec/wakeup/hotplug yada yada fixes I think
are manna from heaven for our long-term stable kernel.  I may well get
"are you outta yer ever lovin' mind?" back, but _I_ think it's needed,
so...

	-Mike

--