Dimitri Sivanich, a colleague of mine, just reported to me an easily
reproduced BUG in Linus's current git tree, anytime one reads or writes
the new per-cpuset file "sched_relax_domain_level". The guilty task
gets a SEGV and the kernel prints (if the command was called 'cat'
and its pid was 16766 ;):
kernel BUG at kernel/cpuset.c:1448!
cat[16766]: bugcheck! 0 [3]
The BUG comes from cpuset code that wasn't expecting that read or write
request at that point in the code.
The basic problem is that Seto-san's "sched_relax_domain_level" and
Paul M's conversion to the new style *_u64 cpuset file handlers were
occurring at the same time, with the result that the handlers for
the per-cpuset file "sched_relax_domain_level" were only partially
converted to the new style *_u64 cpuset file handlers.
The following provides more details, and presents a couple of questions
for Andrew or Paul Menage, at the end.
===
On April 29, Paul Menage observed that the cpuset patch for
'sched_relax_domain' got mangled -- it ended up using the
old style common file read/write routines, but having the
cases to handle it added to Paul M's new style *_u64 handlers.
Paul M proposed the following untested patch:
Andrew replied:
I definitely agree with the above observations of Paul M. I suspect
that the patch might be missing the lines needed to -remove- the
FILE_SCHED_RELAX_DOMAIN_LEVEL cases from the old style
cpuset_common_file_read and cpuset_common_file_write switches.
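To illustrate the kind of mismatch described above, here is a minimal user-space sketch. It is not the actual kernel/cpuset.c code. Every name prefixed with model_ or MODEL_ is hypothetical, and the sketch assumes the old style switch falls through to a default case that calls BUG() for a file type it does not expect.

```c
/* Simplified model only; the real handlers live in kernel/cpuset.c. */
enum cpuset_filetype {
	FILE_CPULIST,
	FILE_MEMLIST,
	FILE_MEMORY_MIGRATE,
	FILE_SCHED_RELAX_DOMAIN_LEVEL,
};

#define MODEL_BUG (-1)	/* stands in for the kernel's BUG() */

/* Models the old style common read routine (assumed case list). */
static int model_common_file_read(enum cpuset_filetype type)
{
	switch (type) {
	case FILE_CPULIST:
	case FILE_MEMLIST:
		return 0;
	default:
		return MODEL_BUG;	/* request not expected here */
	}
}

/* Models the new style *_u64 read handler, which has the case. */
static int model_read_u64(enum cpuset_filetype type)
{
	switch (type) {
	case FILE_MEMORY_MIGRATE:
	case FILE_SCHED_RELAX_DOMAIN_LEVEL:
		return 0;
	default:
		return MODEL_BUG;
	}
}

/* Hypothetical stand-in for a per-file handler table entry. */
struct model_cftype {
	const char *name;
	enum cpuset_filetype type;
	int (*read)(enum cpuset_filetype type);
};

/* Mangled: file registered with the old routine, case lives in the new one. */
static const struct model_cftype model_mangled = {
	"sched_relax_domain_level", FILE_SCHED_RELAX_DOMAIN_LEVEL,
	model_common_file_read,
};

/* Consistent: file registered with the handler that has its case. */
static const struct model_cftype model_fixed = {
	"sched_relax_domain_level", FILE_SCHED_RELAX_DOMAIN_LEVEL,
	model_read_u64,
};

static int model_do_read(const struct model_cftype *cft)
{
	return cft->read(cft->type);
}
```

In this model, model_do_read(&model_mangled) returns MODEL_BUG on every read, while model_do_read(&model_fixed) returns 0.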
The kernel now at the top of Linus's git tree hits a BUG()
immediately, anytime you try to read or write these new
per-cpuset files "sched_relax_domain_level".
I tried looking in 2.6.25-rc1-mm1-mmotm (as of an hour ago),
and it -looks- like the fix is in the linux-next.patch there.
However:
1) I can't get 2.6.25-rc1-mm1-mmotm to apply even close to
either of 2.6.25 or 2.6.25-rc1. Blows up on the first
patch.
==> akpm - what does today's 2.6.25-rc1-mm1-mmotm
apply to?
2) I didn't see any replies from Paul M in response to
Andrew's request above to "send us any needed fixup later
in the week".
==> Paul M or akpm - Is this fixup in the pipeline?
I guess it is, from my reading of the linux-next.patch
in 2.6.25-rc1-mm1-mmotm, but I'm not confident I'm
reading that patch right.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
