I get the following OOPS from udevd during bootup on
sparc64:[ 0.982046] \|/ ____ \|/
[ 0.982054] "@'/ .. \`@"
[ 0.982058] /_| \__/ |_\
[ 0.982063] \__U_/
[ 0.982482] udevd(1305): Kernel illegal instruction [#1]
[ 0.982550] TSTATE: 0000004411001602 TPC: 00000000007ddb80 TNPC: 00000000007ddb84 Y: 00000000 Not tainted
[ 0.982647] TPC: <0x7ddb88>
[ 0.982684] g0: 0000000000000000 g1: 00000000007ddb80 g2: 0000000000000000 g3: 0000000000000000
[ 0.982760] g4: fffff803fb332a00 g5: fffff8000f8a6000 g6: fffff803fb3a8000 g7: fffff803fc893724
[ 0.982831] o0: fffff803fc8bc010 o1: 00000000007ddb58 o2: fffff803fbba0000 o3: fffff803fb3abd48
[ 0.982907] o4: 0000000000040001 o5: 0000000000000000 sp: fffff803fb3ab391 ret_pc: 00000000005b7524
[ 0.982984] RPC: <dev_attr_show+0x24/0x30>
[ 0.983016] l0: 0000000000000000 l1: 0000000000000000 l2: fffff803fc9a5308 l3: 000000000005a000
[ 0.983074] l4: 0000000000000004 l5: fffff803fb340b00 l6: 0000000000000168 l7: fffff803fbd32700
[ 0.983133] i0: fffffffffffffffb i1: 00000000007ddb58 i2: fffff803fbba0000 i3: 0000000000000000
[ 0.983197] i4: 00000000005002a4 i5: 0000000000000008 i6: fffff803fb3ab451 i7: 0000000000500620
[ 0.983269] I7: <sysfs_read_file+0x80/0x104>
[ 0.983299] Caller[0000000000500620]: sysfs_read_file+0x80/0x104
[ 0.983368] Caller[00000000004bedc0]: vfs_read+0x78/0x10c
[ 0.983556] Caller[00000000004bf114]: sys_read+0x34/0x60
[ 0.983622] Caller[0000000000406294]: linux_sparc_syscall32+0x3c/0x40
[ 0.983707] Caller[0000000000016d90]: 0x16d98
[ 0.983873] Instruction DUMP: 007ddb80 00000000 00000000 <00000000> 00000000 00000000 00000000 00000000 007ddb98This is dev_attr_show() calling attr->show() which points to
'part_attr_group' struct instead of a function. :-)I'm pretty sure the following changeset is to blame:
commit edfaa7c36574f1bf09c65ad602412db9da5f96bf
Author: Kay Si...
Just what is this creature that one sees on sparc dumps?
Looks like a smiley ball with two blinkenlights (symbolizing the sun?)
on each side...
--
From: Jan Engelhardt <jengelh@computergmbh.de>
It's ascii art I took it from someone's signature 12 years ago, it's
meant to be the guy on the cover of some of the editions of the
Hitchhikers Guide to the Galaxy by Douglas Adams."Don't Panic!" :-)
--
Ick, that's not good :)
How is it working for anyone else then? sparc64 isn't doing anything
So I'm guessing if you revert this it works?
Do you have CONFIG_SYSFS_DEPRECATED=Y or N?
thanks,
greg k-h
--
On Wed, 6 Feb 2008 15:31:17 -0800
Going offtopic here...
The patch was committed to mainline last week and it has a git timestamp
from eight months ago. When you received the original email from Kay.But the patch changed in that time period. This doesn't seem right?
--
The patch did change over time, but not that much, minor bugfixes for
it. I didn't think to update the original date in the quilt file,
sorry. It was in -mm for quite a while, so I thought it got a good
enough testing period.I'll try to remember to update the timestamp on patches that get
updated, it's a pretty rare thing for my patchflow, shouldn't be hard to
remember.thanks,
greg k-h
--
From: Greg KH <gregkh@suse.de>
Unfortunately that's not so simple to test because afterwards there
are changesets where all the files under block/ got renamed and asI have SYSFS_DEPRECATED=y
--
Understood :(
but since you've bisected it down to this, that's a pretty big
suggestion that this is the problem.What block drivers are you using for sparc? Scsi? Or something else?
What could make sparc64 different from x86 in regards to block device
structure, odd...thanks,
greg k-h
--
From: Greg KH <gregkh@suse.de>
Only Fusion SAS on this system, therefore scsi.
--
Can you send me the output of 'tree /sys/block/' on 2.6.24?
Are there a lot of partitions here? Anything different you can think of
from x86 that I can have a chance to try to narrow things down with? :)I don't know if you can boot without udev, but if you could, can you
send the 'tree' output of the offending kernel too?thanks,
greg k-h
--
Greg, I'm pretty sure I know what's happening.
For whatever reason we're invoking dev_attr_show() on attribute_group
objects.The reason it probably only crashes on sparc64 is because perhaps at
that dev_attr->show offset on x86 there are zero bytes there instead
of a pointer, so the NULL check here in dev_attr_show() masks the bug.The problem with all of this "container_of() this", "container_of()
that" is that we lose real type checking. So unless we add magic
cookies to verify or other hacks, functions never really know if the
container they are being passed really is a subset object of the type
they expect.Can you read the code instead of asking more information from me to
try and figure out why the attribute showing paths might be
misconfigured for these block device objects after the changeset in
question? I can do this, but you're more likely to find the problem
quickly than I am.I redid the bisect to make sure it absolutely was that specific
changeset, and it is.Thanks.
--
We are supposed to be careful about this, but bad things are known to
Yes, I'll look into it tonight if I get this -stable push out in time,
Thanks for doing that, I'll let you know when I have a patch to test.
greg k-h
--
From: Greg KH <gregkh@suse.de>
I found the problem, it's the "whole_disk" partition attribute.
Look in fs/partitions/check.c:add_partition() where it tests the
ADDPART_FLAG_WHOLEDISK flag.I think that code block needs to be updated to match the rest of the
changes in the offending changeset we are discussing.
--
So, if you just comment out that whole "if (flags &
ADDPART_FLAG_WHOLEDISK)" chunk, does the oops go away? I think that is
the real solution here as I don't see what this attribute is supposed to
be showing.thanks,
greg k-h
--
From: Greg KH <gregkh@suse.de>
See my other reply, it's supposed to just exist and be
a zero length file. udev looks for it to determine
which partition is the "whole disk" partition
--
I don't understand that code at all, on 2.6.24, what does reading that
file give you? At first glance, I don't see how that file would spit
out anything and not give you the same kind of oops.you are in a maze of kobject pointers, all alike...
--
From: Greg KH <gregkh@suse.de>
It's supposed to just exist, and be an empty zero length file.
That's why it's given no ->show method pointer.It's existence just means that the partition is a "whole disk"
partition type.
--
Ah, ok, that's a bit wierd, thanks for explaining it, I'll go make it
work...thanks,
greg k-h
--
Can you try this patch to see if it solves the oops, and that the file
is still there and works properly?thanks,
greg k-h
---
fs/partitions/check.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -353,11 +353,9 @@ void add_partition(struct gendisk *disk,
partition_sysfs_add_subdir(p);
p->dev.uevent_suppress = 0;
if (flags & ADDPART_FLAG_WHOLEDISK) {
- static struct attribute addpartattr = {
- .name = "whole_disk",
- .mode = S_IRUSR | S_IRGRP | S_IROTH,
- };
- err = sysfs_create_file(&p->dev.kobj, &addpartattr);
+ static DEVICE_ATTR(whole_disk, S_IRUSR | S_IRGRP | S_IROTH,
+ NULL, NULL);
+ err = device_create_file(&p->dev, &dev_attr_whole_disk);
}/* suppress uevent if the disk supresses it */
--
From: Greg KH <gregkh@suse.de>
It doesn't crash, but the file returns -EIO instead of zero when read.
--
How about this attempt?
thanks for your patience,
greg "Kay owes me a beer" k-h
---
fs/partitions/check.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -319,6 +319,14 @@ void delete_partition(struct gendisk *di
put_device(&p->dev);
}+static ssize_t whole_disk_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return 0;
+}
+static DEVICE_ATTR(whole_disk, S_IRUSR | S_IRGRP | S_IROTH,
+ whole_disk_show, NULL);
+
void add_partition(struct gendisk *disk, int part, sector_t start, sector_t len, int flags)
{
struct hd_struct *p;
@@ -352,13 +360,8 @@ void add_partition(struct gendisk *disk,
device_add(&p->dev);
partition_sysfs_add_subdir(p);
p->dev.uevent_suppress = 0;
- if (flags & ADDPART_FLAG_WHOLEDISK) {
- static struct attribute addpartattr = {
- .name = "whole_disk",
- .mode = S_IRUSR | S_IRGRP | S_IROTH,
- };
- err = sysfs_create_file(&p->dev.kobj, &addpartattr);
- }
+ if (flags & ADDPART_FLAG_WHOLEDISK)
+ err = device_create_file(&p->dev, &dev_attr_whole_disk);/* suppress uevent if the disk supresses it */
if (!disk->dev.uevent_suppress)
--
From: Greg KH <gregkh@suse.de>
That works, thanks. Please push to Linus :-)
Acked-by: David S. Miller <davem@davemloft.net>
--
Great, thanks for testing, I'll send it to him tomorrow.
greg k-h
--
Picky, picky, picky :)
It's odd that the original one didn't also do that, but I'll fix that up
properly, let me try again...thanks,
greg k-h
--
From: Greg KH <gregkh@suse.de>
In the old code it would vector through part_sysfs_ops (look at how it
works before the changeset), which returns 0 if the attribute lacked a
->show method.--
Ah, yeah, what a fun hack... Makes sense though. The updated patch I
send you should do the same thing, just with an additional function
needed.thanks,
greg k-h
--
Ah, great catch. That code just looks wrong.
I'm guessing that you have a partition that is the whole disk? That
would make sense why I and most others haven't seen this yet.I'll make up a patch...
thanks a lot for still digging into this,
greg k-h
--
From: Greg KH <gregkh@suse.de>
It's an attribute used by Sun disk labels, usually it's the
third partition.--
From: Greg KH <gregkh@suse.de>
It's a pretty simple partitioning scheme.
It actually is able to continue booting after udevd gets
killed off by the OOPS.I'm going to run a bisect again and get some other info
for you...Here is the tree output with current GIT:
/sys/block/
|-- loop0 -> ../devices/virtual/block/loop0
|-- loop1 -> ../devices/virtual/block/loop1
|-- loop2 -> ../devices/virtual/block/loop2
|-- loop3 -> ../devices/virtual/block/loop3
|-- loop4 -> ../devices/virtual/block/loop4
|-- loop5 -> ../devices/virtual/block/loop5
|-- loop6 -> ../devices/virtual/block/loop6
|-- loop7 -> ../devices/virtual/block/loop7
|-- ram0 -> ../devices/virtual/block/ram0
|-- ram1 -> ../devices/virtual/block/ram1
|-- ram10 -> ../devices/virtual/block/ram10
|-- ram11 -> ../devices/virtual/block/ram11
|-- ram12 -> ../devices/virtual/block/ram12
|-- ram13 -> ../devices/virtual/block/ram13
|-- ram14 -> ../devices/virtual/block/ram14
|-- ram15 -> ../devices/virtual/block/ram15
|-- ram2 -> ../devices/virtual/block/ram2
|-- ram3 -> ../devices/virtual/block/ram3
|-- ram4 -> ../devices/virtual/block/ram4
|-- ram5 -> ../devices/virtual/block/ram5
|-- ram6 -> ../devices/virtual/block/ram6
|-- ram7 -> ../devices/virtual/block/ram7
|-- ram8 -> ../devices/virtual/block/ram8
|-- ram9 -> ../devices/virtual/block/ram9
|-- sda -> ../devices/pci0000:02/0000:02:00.0/0000:03:02.0/0000:0a:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda
`-- sr0 -> ../devices/pci0000:02/0000:02:00.0/0000:03:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.0/0000:07:00.2/usb3/3-2/3-2:1.0/host1/target1:0:0/1:0:0:0/block/sr026 directories, 0 files
--
From: David Miller <davem@davemloft.net>
I tested with SYSFS_DEPRECATED disabled, it made no
difference.
--
| Greg Kroah-Hartman | [PATCH 006/196] Chinese: add translation of oops-tracing.txt |
| Andrew Morton | Re: -mm merge plans for 2.6.23 -- sys_fallocate |
| Eric W. Biederman | [PATCH] nfs lockd reclaimer: Convert to kthread API |
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
git: | |
| David Miller | [GIT]: Networking |
| Gerrit Renker | [PATCH 03/37] dccp: List management for new feature negotiation |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
