I repulled all the trees an hour or two ago, installed everything on an 8-way x86_64 box and: stack-protector: Testing -fstack-protector-all feature No -fstack-protector-stack-frame! -fstack-protector-all test failed ------------[ cut here ]------------ WARNING: at kernel/panic.c:369 __stack_chk_test+0x4b/0x51() Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.25-mm1 #4 Call Trace: [<ffffffff80256692>] ? print_modules+0x88/0x8f [<ffffffff80237b70>] warn_on_slowpath+0x58/0x7f [<ffffffff802388fe>] ? printk+0x67/0x69 [<ffffffff8034ec74>] ? debug_write_lock_after+0x18/0x1f [<ffffffff8034ed43>] ? _raw_write_unlock+0x29/0x7b [<ffffffff804f0254>] ? _write_unlock+0x9/0xb [<ffffffff8023d25e>] ? insert_resource+0xe3/0xea [<ffffffff80237be2>] __stack_chk_test+0x4b/0x51 [<ffffffff8092f912>] kernel_init+0x16c/0x29e [<ffffffff8020ce58>] child_rip+0xa/0x12 [<ffffffff8092f7a6>] ? kernel_init+0x0/0x29e [<ffffffff8020ce4e>] ? child_rip+0x0/0x12 ---[ end trace da2bc9ee81defeda ]--- usb/sysfs: ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 17 (level, low) -> IRQ 17 uhci_hcd 0000:00:1d.0: UHCI Host Controller uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 1 uhci_hcd 0000:00:1d.0: irq 17, io base 0x00002080 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 2 ports detected sysfs: duplicate filename '189:0' can not be created ------------[ cut here ]------------ WARNING: at fs/sysfs/dir.c:425 sysfs_add_one+0x42/0x7c() Modules linked in: uhci_hcd(+) Pid: 600, comm: insmod Tainted: G W 2.6.25-mm1 #4 Call Trace: [<ffffffff80256692>] ? print_modules+0x88/0x8f [<ffffffff80237b70>] warn_on_slowpath+0x58/0x7f [<ffffffff802388fe>] ? printk+0x67/0x69 [<ffffffff804f0249>] ? _spin_unlock+0x9/0xb [<ffffffff802a932f>] ? ifind+0x72/0x82 [<ffffffff802e0c49>] ? sysfs_ilookup_test+0x0/0x14 [...
An old T21 is failing to boot and the relevant message appears to be
[ 1.929536] Probing IDE interface ide0...
[ 36.939317] ide0: Wait for ready failed before probe !
[ 37.502676] ide0: DISABLED, NO IRQ
[ 37.506356] ide0: failed to initialize IDE interface
The owner of ide-mm-ide-add-struct-ide_io_ports-take-2.patch with the "DISABLED, NO IRQ" message is cc'd. I've attached the config, full boot log and lspci -v for the machine in question. I'll start reverting some of these patches to see if ide-mm-ide-add-struct-ide_io_ports-take-2.patch is really the culprit.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
Hi, Please try reverting ide-fix-hwif-s-initialization.patch first - it has already been dropped from IDE tree because people were reporting problems similar to the one encountered by you. Thanks, Bart --
Thanks. I reverted this patch and ide-mm-ide-make-ide_hwifs-static.patch (for compile breakage reasons). It's better but still fails to find the IDE device. What is better is that it finds ide0 at; ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 but does not identify any of the disks nor does it find ide1. For comparison, a "good" dmesg looks like [ 1.793244] Probing IDE interface ide0... [ 2.235292] hda: IBM-DJSA-220, ATA DISK drive [ 2.915457] Probing IDE interface ide1... [ 3.787516] hdc: CRN-8241U, ATAPI CD/DVD-ROM drive [ 4.475650] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 [ 4.478096] ide1 at 0x170-0x177,0x376 on irq 15 [ 4.484547] hda: max request size: 128KiB [ 4.522696] hda: 39070080 sectors (20003 MB) w/1874KiB Cache, CHS=41344/15/63 [ 4.530706] hda: cache flushes not supported [ 4.538724] hda: hda1 hda2 hda3 hda4 [ 4.569606] hdc: ATAPI 24X CD-ROM drive, 128kB Cache [ 4.587678] Uniform CD-ROM driver Revision: 3.20 [ 4.595690] Driver 'sd' needs updating - please use bus_type methods Here is the bootlog with the two patches reverted. root (hd0,0) Filesystem type is ext2fs, partition type 0x83 kernel /boot/vmlinuz-2.6.25-mm1 root=/dev/hda1 mminit_loglevel=4 logle vel=9 console=tty0 console=ttyS0,9600 ro earlyprintk=serial,ttyS0,9600 kernelco re=384MB movablecore=384MB profile=sleep,2 resume=/dev/hda2 [Linux-bzImage, setup=0x2c00, size=0x1d9390] savedefault boot [ 0.000000] Initializing cgroup subsys cpuset [ 0.000000] Linux version 2.6.25-mm1 (mel@arnold) (gcc version 4.2.3 (Debian 4.2.3-3)) #1 SMP Tue Apr 29 10:04:35 IST 2008 [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable) [ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved) [ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) [ 0.000000] BIOS-e820: 0000000000100000 - 000000001fff0000 (usable) [ 0.000000] BIOS-e820: 000000001fff0000 - ...
Interestingly, bisection firmly blames this patch and QEMU boots with the two patches reverted but fails with them applied so that patch does cause problems. The failure on the laptop must be depending on some follow-on patch. I tried a hatchet-job revert of the IDE patches between IDE-START and IDE-END in the series file and it similarly fails to probe the IDE devices. So either I made a mess of the reverts (strong possibility) or there is more than one problem patch. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
The third patch that needed reverting was gregkh-pci-pci-clean-up-resource-alignment-management.patch (owners added to cc). The relevant hint in a diff between a broken and a working bootlog was:
 system 00:09: ioport range 0x15e0-0x15ef has been reserved
+ PCI: bogus alignment of resource 7 [100:1ff] (flags 100) of 0000:00:02.0
+ PCI: bogus alignment of resource 8 [100:1ff] (flags 100) of 0000:00:02.0
+ PCI: bogus alignment of resource 9 [4000000:7ffffff] (flags 1200) of 0000:00:02.0
+ PCI: bogus alignment of resource 10 [4000000:7ffffff] (flags 200) of 0000:00:02.0
+ PCI: bogus alignment of resource 7 [100:1ff] (flags 100) of 0000:00:02.1
+ PCI: bogus alignment of resource 8 [100:1ff] (flags 100) of 0000:00:02.1
+ PCI: bogus alignment of resource 9 [4000000:7ffffff] (flags 1200) of 0000:00:02.1
+ PCI: bogus alignment of resource 10 [4000000:7ffffff] (flags 200) of 0000:00:02.1
With the resource alignment patch and the two IDE patches reverted, the laptop is able to boot.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
Thanks for tracking it down.
Hmm, it seems that the above patch was merged a week ago:
commit bda0c0afa7a694bb1459fd023515aca681e4d79a
Merge: 904e0ab... af40b48...
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon Apr 21 15:58:35 2008 -0700
Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6
...
PCI: clean up resource alignment management
...
but it could be that the issue has been already fixed in git tree
(could you verify it please?).
BTW according to lspci output you should be able to use piix driver
instead of ide_generic on this laptop.
Thanks,
Bart
--
I know but the config is a bit minimal for faster building as it's only intended for sniff-testing patches. Thanks for the help.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
ide-mm-ide-add-struct-ide_io_ports-take-2.patch is now in mainline so a quick confirmation would be to test Linus's tree. --
2.6.25 and latest git are both booting fine. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
Another runtime warning on the t61p: Brought up 2 CPUs Total of 2 processors activated (9583.80 BogoMIPS). CPU0 attaching sched-domain: domain 0: span 00000000,00000003 groups: 00000000,00000001 00000000,00000002 domain 1: span 00000000,00000003 groups: 00000000,00000003 CPU1 attaching sched-domain: domain 0: span 00000000,00000003 groups: 00000000,00000002 00000000,00000001 domain 1: span 00000000,00000003 groups: 00000000,00000003 ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:2677 check_flags+0x84/0x11f() Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.25-mm1 #15 Call Trace: [<ffffffff8105f7ec>] ? print_modules+0x88/0x8f [<ffffffff81037b55>] warn_on_slowpath+0x58/0x7f [<ffffffff81056143>] ? trace_hardirqs_off+0xd/0xf [<ffffffff810560b7>] ? trace_hardirqs_off_caller+0x1d/0x9c [<ffffffff81056143>] ? trace_hardirqs_off+0xd/0xf [<ffffffff810560b7>] ? trace_hardirqs_off_caller+0x1d/0x9c [<ffffffff81056143>] ? trace_hardirqs_off+0xd/0xf [<ffffffff81058576>] ? __lock_acquire+0x809/0x893 [<ffffffff810560b7>] ? trace_hardirqs_off_caller+0x1d/0x9c [<ffffffff81056143>] ? trace_hardirqs_off+0xd/0xf [<ffffffff812b94d1>] ? __atomic_notifier_call_chain+0x0/0x81 [<ffffffff8105627e>] check_flags+0x84/0x11f [<ffffffff81058914>] lock_acquire+0x54/0xb4 [<ffffffff812b9515>] __atomic_notifier_call_chain+0x44/0x81 [<ffffffff8100a2c2>] ? mwait_idle+0x0/0x49 [<ffffffff812b9561>] atomic_notifier_call_chain+0xf/0x11 [<ffffffff8100a228>] __exit_idle+0x27/0x29 [<ffffffff8100b33c>] cpu_idle+0xdf/0xf7 [<ffffffff812b10da>] start_secondary+0xb2/0xb4 ---[ end trace 93d72a36b9146f22 ]--- possible reason: unannotated irqs-on. irq event stamp: 34 hardirqs last enabled at (33): [<ffffffff812b63f0>] trace_hardirqs_on_thunk+0x3a/0x3f hardirqs last disabled at (34): [<ffffffff81056143>] trace_hardirqs_off+0xd/0xf ...
oop, there's more: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA firewire_core: created device fw0: GUID 00016c2000174bad, S400 PM: Device usb4 failed to restore: error -113 eth0: Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX eth0: 10/100 speed: disabling TSO PM: Device usb5 failed to restore: error -113 PM: Device usb7 failed to restore: error -113 sd 0:0:0:0: [sda] Starting disk PM: Image restored successfully. Restarting tasks ... done. PM: Basic memory bitmaps freed Those USB restore failures are new. They're similar to the ones on the doesnt-resume-properly-any-more Vaio. They came out from the machine's second (successful) resume-from-disk. --
I got USB messages after s2ram + suspend to disk combination, too, but machine seems to work. ata1.00: ACPI cmd ef/10:03:00:00:00:a0 succeeded ata1.00: configured for UDMA/100 ata1.00: configured for UDMA/100 ata1: EH complete sd 0:0:0:0: [sda] 117210240 512-byte hardware sectors (60012 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 117210240 512-byte hardware sectors (60012 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA PM: Device usb2 failed to restore: error -113 PM: Device usb3 failed to restore: error -113 PM: Device usb4 failed to restore: error -113 PM: Image restored successfully. Restarting tasks ... done. PM: Basic memory bitmaps freed wlan0: RX disassociation from 00:11:2f:0e:95:a0 (reason=7) wlan0: disassociated (Apart from some wireless problems, solved by reconnecting...) (And ipw3945 LED indication now seems to work, good!) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Try rmmod usb / insmod usb around suspend to see if it is usb-specific, or if something went seriously wrong in core. Or you might just bisect it ;-). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
There's no need to worry about them. They merely indicate that the root hubs didn't resume along with everything else, because they were already suspended when the system went to sleep and so they were left suspended. The return codes in usbcore will be changed soon so that this won't appear to be an error. Alan Stern --
I found another machine! This one's an old 4-way Nocona (x86_64) http://userweb.kernel.org/~akpm/config-x.txt http://userweb.kernel.org/~akpm/dmesg-x.txt CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 CPU0: Thermal monitoring enabled (TM1) ACPI: Core revision 20080321 Parsing all Control Methods: Table [DSDT](id 0001) - 461 Objects with 50 Devices 130 Methods 11 Regions tbxface-0598 [00] tb_load_namespace : ACPI Tables successfully acquired evxfevnt-0091 [00] enable : Transition to ACPI mode successful ------------[ cut here ]------------ WARNING: at arch/x86/kernel/genapic_64.c:86 read_apic_id+0x31/0x67() Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.25-mm1 #16 Call Trace: [<ffffffff8025272f>] ? print_modules+0x88/0x8f [<ffffffff80233493>] warn_on_slowpath+0x58/0x81 [<ffffffff80351ceb>] ? debug_spin_lock_after+0x18/0x1f [<ffffffff8035217a>] ? _raw_spin_lock+0x116/0x120 [<ffffffff80228398>] ? sub_preempt_count+0x6d/0x74 [<ffffffff804e9ba3>] ? _spin_unlock_irqrestore+0x33/0x40 [<ffffffff803523e6>] ? debug_smp_processor_id+0x32/0xc4 [<ffffffff8021ede5>] read_apic_id+0x31/0x67 [<ffffffff8066f7f2>] verify_local_APIC+0xa7/0x163 [<ffffffff8066e837>] native_smp_prepare_cpus+0x1ed/0x301 [<ffffffff80669ab2>] kernel_init+0x5a/0x276 [<ffffffff804e9a1e>] ? _spin_unlock_irq+0x2a/0x35 [<ffffffff8022b7c2>] ? finish_task_switch+0x68/0x7f [<ffffffff8020c1d8>] child_rip+0xa/0x12 [<ffffffff80669a58>] ? kernel_init+0x0/0x276 [<ffffffff8020c1ce>] ? child_rip+0x0/0x12 ---[ end trace 4eaa2a86a8e2da22 ]--- ------------[ cut here ]------------ WARNING: at arch/x86/kernel/genapic_64.c:86 read_apic_id+0x31/0x67() Modules linked in: Pid: 1, comm: swapper Tainted: G W 2.6.25-mm1 #16 Call Trace: [<ffffffff8025272f>] ? print_modules+0x88/0x8f [<ffffffff80233493>] warn_on_slowpath+0x58/0x81 [<ffffffff80351ceb>] ...
that came in via the UV-APIC patchset but the warning is entirely
harmless. At that point we've got a single CPU running only so
preemption of that code to another CPU is not possible.
native_smp_prepare_cpus() should probably just disable preemption, that
way we could remove all those ugly preempt disable-enable calls from the
called functions - per the patch below. (not boot tested yet - might
provoke atomic-scheduling warnings if i forgot about some schedule point
in this rather large codepath)
Ingo
------------------->
Subject: x86: disable preemption in native_smp_prepare_cpus
From: Ingo Molnar <mingo@elte.hu>
Date: Fri Apr 18 11:07:10 CEST 2008
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/smpboot.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux-x86.q/arch/x86/kernel/smpboot.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/smpboot.c
+++ linux-x86.q/arch/x86/kernel/smpboot.c
@@ -1181,6 +1181,7 @@ static void __init smp_cpu_index_default
  */
 void __init native_smp_prepare_cpus(unsigned int max_cpus)
 {
+	preempt_disable();
 	nmi_watchdog_default();
 	smp_cpu_index_default();
 	current_cpu_data = boot_cpu_data;
@@ -1237,6 +1238,7 @@ void __init native_smp_prepare_cpus(unsi
 	printk(KERN_INFO "CPU%d: ", 0);
 	print_cpu_info(&cpu_data(0));
 	setup_boot_clock();
+	preempt_enable();
 }
 /*
  * Early setup to make printk work.
--
that should be the patch below.
Ingo
------------>
Subject: x86: disable preemption in native_smp_prepare_cpus
From: Ingo Molnar <mingo@elte.hu>
Date: Fri Apr 18 11:07:10 CEST 2008
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/smpboot.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
Index: linux-x86.q/arch/x86/kernel/smpboot.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/smpboot.c
+++ linux-x86.q/arch/x86/kernel/smpboot.c
@@ -1181,6 +1181,7 @@ static void __init smp_cpu_index_default
  */
 void __init native_smp_prepare_cpus(unsigned int max_cpus)
 {
+	preempt_disable();
 	nmi_watchdog_default();
 	smp_cpu_index_default();
 	current_cpu_data = boot_cpu_data;
@@ -1197,7 +1198,7 @@ void __init native_smp_prepare_cpus(unsi
 	if (smp_sanity_check(max_cpus) < 0) {
 		printk(KERN_INFO "SMP disabled\n");
 		disable_smp();
-		return;
+		goto out;
 	}
 	preempt_disable();
@@ -1237,6 +1238,8 @@ void __init native_smp_prepare_cpus(unsi
 	printk(KERN_INFO "CPU%d: ", 0);
 	print_cpu_info(&cpu_data(0));
 	setup_boot_clock();
+out:
+	preempt_enable();
 }
 /*
  * Early setup to make printk work.
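[Editorial sketch, not part of the thread.] The difference between the two versions of the patch above is the early-exit path: a bare "return" after the sanity check would skip the matching preempt_enable(). The following user-space sketch shows the same single-exit shape under stated assumptions. fake_preempt_count, fake_preempt_disable(), fake_preempt_enable() and prepare_cpus_sketch() are hypothetical stand-ins, not kernel APIs, and the sketch ignores the inner preempt_disable() that the real function already contains.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's preempt count; not the real API. */
static int fake_preempt_count;

static void fake_preempt_disable(void) { fake_preempt_count++; }
static void fake_preempt_enable(void)  { fake_preempt_count--; }

/*
 * Mirrors the shape of the early-return fix: every exit path funnels
 * through "out", so the disable at the top is always paired with exactly
 * one enable. Returns true if setup ran to completion, false if it
 * bailed out early.
 */
static bool prepare_cpus_sketch(bool sanity_ok)
{
	bool done = false;

	fake_preempt_disable();

	if (!sanity_ok)
		goto out;	/* a bare "return" here would leak the count */

	/* ... the rest of the setup work would go here ... */
	done = true;
out:
	fake_preempt_enable();
	return done;
}
```

Calling prepare_cpus_sketch() with either argument should leave fake_preempt_count at zero, which is the property the "goto out" version of the patch restores.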
--
that's the stackprotector self-test: you probably have a gcc that cannot build a proper stackprotector kernel. No damage other than having no stackprotector. Arjan Cc:-ed. Ingo --
On Fri, Apr 18, 2008 at 2:03 AM, Andrew Morton
Andrew, you don't seem to have slab debugging enabled:
# CONFIG_DEBUG_SLAB is not set
And quite frankly, the oops looks unlikely to be a slab bug but rather
a plain old slab corruption caused by the callers...
Pekka
--
hm, there's sel_netnode_free() in the stackframe - that's from security/selinux/netnode.c. Andrew, any recent changes in that area? Ingo --
I've reverted the -mm only change to that file in
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-akpm
commit f777964ad75cf4a119d911d12e81948d2402677f
Author: James Morris <jmorris@namei.org>
Date: Fri Apr 18 20:27:24 2008 +1000
Revert "SELinux: Made netnode cache adds faster"
This reverts commit 6bf8f41d4efdf9d4eeb4f7df9c591e281f7da93e.
Possible cause of slab corruption in -mm.
--
James Morris
<jmorris@namei.org>
--
Keep in mind that slab might have been corrupted by someone else much earlier but we didn't notice due to the lack of CONFIG_DEBUG_SLAB. --
Yes, I'd agree. All has been peachy since I dropped git-selinux. --
On Thu, 17 Apr 2008 16:03:31 -0700 do you have a stack-protector capable GCC? I guess not. This is a catch-22. You do not have stack-protector. Should we make that a silent failure? or do you want to know that you don't have a security feature you thought you had.... complaining seems to be the right thing to do imo. -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
A #warning sounds more appropriate. --
this warning is telling the user that the security feature that got enabled in the .config is completely, 100% not working due to using a stack-protector-incapable GCC. it's analogous as if there was a bug in gcc that made SELinux totally ineffective in some mitigate-exploit-damage scenarios. No harm done on a perfectly bug-free system - but once a bug happens that SELinux should have mitigated, the breakage becomes real. Having a prominent warning is the _minimum_. having a build failure would be nice too because this is a build environment problem. (not a build warning - warnings can easily be missed because on a typical kernel build there's so many false positives that get emitted by various other warning mechanisms) Arjan? Ingo --
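[Editorial sketch, not part of the thread.] For readers unfamiliar with what the self-test is guarding, the idea behind a stack canary can be illustrated in user space. This is only an analogy under stated assumptions: the real -fstack-protector canary is placed and checked by compiler-generated code, not by hand. struct guarded_frame, CANARY_VALUE and copy_and_check() are hypothetical names invented for this example.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0xDEADC0DEUL	/* arbitrary illustrative constant */

/* Hypothetical "frame": a buffer followed by a guard word. */
struct guarded_frame {
	char buf[8];
	unsigned long canary;
};

/*
 * Copy len bytes into the frame starting at buf, deliberately without
 * checking len against sizeof(buf), then report whether the guard word
 * survived. Returns 1 if the canary is intact, 0 if it was clobbered,
 * -1 if len does not even fit in the frame.
 */
static int copy_and_check(const char *src, size_t len)
{
	struct guarded_frame frame;

	if (len > sizeof(frame))
		return -1;

	memset(&frame, 0, sizeof(frame));
	frame.canary = CANARY_VALUE;

	/* Writing through the struct's bytes keeps this well-defined C. */
	memcpy((unsigned char *)&frame, src, len);

	return frame.canary == CANARY_VALUE;
}
```

A copy that stays within buf leaves the guard word alone. A copy that runs past it changes the guard word, and the check notices. A compiler without working stack-protector support performs no such check at all, which is the situation the boot-time warning reports.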
Not really. In the selinux case we don't know that it'll break at compile Yeah, #error would work too. --
On Fri, 18 Apr 2008 00:28:58 -0700
I'm totally fine with that, but I think I need Sam's help on making that happen
the right way; this is going to need makefile fu :(
Sam:
Basically what I need is that if the
scripts/gcc-x86_64-has-stack-protector.sh script fails, the build aborts with
a message/#error that says that the compiler is not capable of supporting this feature.
Right now the script is used like this:
stackp := $(CONFIG_SHELL) $(srctree)/scripts/gcc-x86_64-has-stack-protector.sh
stackp-$(CONFIG_CC_STACKPROTECTOR) := $(shell $(stackp) \
"$(CC)" -fstack-protector )
It's obviously easy to make this script print a warning.. but how do we make it stop the build?
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--ok I found a way that works for me:
From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH] stackprotector: turn not having the right gcc into an #error
If the user selects the stack-protector config option, but does not have
a gcc that has the right bits enabled (for example because it isn't built
with a glibc that supports TLS, as is common for cross-compilers, but also
because it may be too old), then the runtime test fails right now.
Andrew rightfully points out that this is a condition we can detect at
build time, and we should error out at that point instead.
This patch adds an error message for this scenario. This error accomplishes
two goals
1) the user is informed that the security option he selected isn't available
2) the user has enough info to turn off the CONFIG option that won't work for him,
and would make the runtime test fail anyway.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
arch/x86/Makefile | 2 +-
kernel/panic.c | 3 +++
2 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 3cff3c8..c3e0eee 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -73,7 +73,7 @@ else
stackp := $(CONFIG_SHELL) $(srctree)/scripts/gcc-x86_64-has-stack-protector.sh
stackp-$(CONFIG_CC_STACKPROTECTOR) := $(shell $(stackp) \
- "$(CC)" -fstack-protector )
+ "$(CC)" "-fstack-protector -DGCC_HAS_SP" )
stackp-$(CONFIG_CC_STACKPROTECTOR_ALL) += $(shell $(stackp) \
"$(CC)" -fstack-protector-all )
diff --git a/kernel/panic.c b/kernel/panic.c
index c92c1e2..7cbcd8e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -321,6 +321,9 @@ EXPORT_SYMBOL(warn_on_slowpath);
#ifdef CONFIG_CC_STACKPROTECTOR
+#ifndef GCC_HAS_SP
+#error You have selected the CONFIG_CC_STACKPROTECTOR option, but the gcc used does not support this.
+#endif
static unsigned long __stack_check_testing;
/*
 *...
you noticed it ;-) Distro maintainers will notice it too if it pops up
when something breaks StackProtector. Normal user might not notice. (but
normal user might not notice a few hundred guest roots either)
but ... the real thing that made it slip into your config was that it
was default-enabled in x86/latest - the patch below should fix that.
we need the warning: it could have caught the toplevel Makefile change
last October that broke StackProtector completely. So no, we won't be and
cannot be silent about this anymore - we need and now have an end-to-end
test about it.
Ingo
------------------>
Subject: stackprotector: non default
From: Ingo Molnar <mingo@elte.hu>
Date: Fri Apr 18 11:13:17 CEST 2008
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/Kconfig | 1 -
1 file changed, 1 deletion(-)
Index: linux-x86.q/arch/x86/Kconfig
===================================================================
--- linux-x86.q.orig/arch/x86/Kconfig
+++ linux-x86.q/arch/x86/Kconfig
@@ -1146,7 +1146,6 @@ config CC_STACKPROTECTOR
 	bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)"
 	depends on X86_64
 	select CC_STACKPROTECTOR_ALL
-	default y
 	help
 	  This option turns on the -fstack-protector GCC feature. This
 	  feature puts, at the beginning of functions, a canary value on
--
For what it's worth I just looked over the changes in netnode.c and nothing is jumping out at me. The changes ran fine for me when tested on the later 2.6.25-rcX kernels but I suppose that doesn't mean a whole lot. I've got a 4-way x86_64 box but it needs to be installed (which means I'm not going to be able to do anything useful with it until tomorrow at the earliest). I'll try it out and see if I can recreate the problem. -- paul moore linux @ hp --
On Thu, 17 Apr 2008 19:55:46 -0400
I dropped git-selinux and that crash seems to have gone away. It took about
five minutes before, but would presumably have happened earlier if I'd
reduced the cache size.
btw, wouldn't this
--- a/security/selinux/netnode.c~a
+++ a/security/selinux/netnode.c
@@ -190,7 +190,7 @@ static int sel_netnode_insert(struct sel
 	if (sel_netnode_hash[idx].size == SEL_NETNODE_HASH_BKT_LIMIT) {
 		struct sel_netnode *tail;
 		tail = list_entry(node->list.prev, struct sel_netnode, list);
-		list_del_rcu(node->list.prev);
+		list_del_rcu(&tail->list);
 		call_rcu(&tail->rcu, sel_netnode_free);
 	} else
 		sel_netnode_hash[idx].size++;
_
be a bit clearer? If it's correct - I didn't try too hard :)
--
Looks good to me, although before I fix this let me try and figure out why this code is causing the machine to puke all over itself. Priorities you know :) -- paul moore linux @ hp --
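[Editorial sketch, not part of the thread.] The readability change suggested above relies on one property of list_entry(): taking the address of the embedded list member of the returned entry gives back the original pointer. The user-space sketch below demonstrates only that equivalence. It says nothing about whether node->list.prev is the right element to delete in the first place. The list helpers are minimal re-implementations written for this example, and struct demo_node and same_pointer() are hypothetical.

```c
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert "new" right after "head", as the kernel's list_add() does. */
static void list_add_head(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Hypothetical node type; only the embedded list member matters here. */
struct demo_node {
	int id;
	struct list_head list;
};

/*
 * Returns 1 when &list_entry(p, ...)->list and p are the same pointer,
 * which is why the two list_del_rcu() spellings in the diff name the
 * same list element.
 */
static int same_pointer(struct list_head *p)
{
	struct demo_node *entry = list_entry(p, struct demo_node, list);

	return &entry->list == p;
}
```

With two nodes added at the head of an initialised list, head.prev points at the first node added (the tail), and same_pointer(head.prev) holds for it.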
On Thu, 17 Apr 2008 19:55:46 -0400 Perhaps it was tested only against slub? That config uses slab. --
Yes, I believe it was testing it with slub. -- paul moore linux @ hp --
On Thu, 17 Apr 2008 16:03:31 -0700 With git-selinux at top-of tree it's repeatably hanging in the CPA self-tests (git-x86 stuff). Last two lines are: CPA self-test: 4k 8704 large 4847 gb 0 x 0[0-0] miss 0 (clear as mud ;)) I will find the config knob to disable that test. Of course, it could be telling me that CPA is buggy. --
On Thu, 17 Apr 2008 16:40:34 -0700 Disabling CPA_DEBUG didn't help. It's still hanging. The final initcall is init_kgdbts() and disabling KGDB prevents the hang. --
In this case you do not have to disable kgdb, but just disable the kgdb test suite. Certainly I would be interested to know where it is failing, as it would indicate either a regression caused by a change somewhere else in the kernel or a latent defect in kgdb that was triggered. The kgdb test suite exercises a number of kernel fault systems as well as arch-specific single stepping when it runs, and when it fails it is likely worth tracking down which test failed and why.
If you are looking to bypass the kgdb test suite you have two options. The kernel option that runs the tests on boot (which is not on by default) is CONFIG_KGDB_TESTS_ON_BOOT; make sure this is off. Alternatively, in an already compiled kernel that had the on-boot testing turned on, you can turn off the tests by passing the kgdbts parameter with nothing after the = sign, like:
kgdbts=
In terms of debugging what happened, if you have console output you can save, please do send me the output of kernel boot with the kernel boot argument:
kgdbts=V2
That enables verbose logging of exactly what is going on and will show where the wheels fall off the cart. If the kernel is dying silently it means the early exception code has completely failed in some way on the kernel architecture that was selected, and of course the .config is always useful in this case.
Thanks,
Jason.
--
incidentally, just today, in overnight testing i triggered a similar hang in the KGDB self-test: http://redhat.com/~mingo/misc/config-Thu_Apr_17_23_46_36_CEST_2008.bad to get a similar tree to the one i tested, pick up sched-devel/latest from: http://people.redhat.com/mingo/sched-devel.git/README pick up that failing .config, do 'make oldconfig' and accept all the defaults to get a comparable kernel to mine. (kgdb is embedded in sched-devel.git.) the hang was at: [ 12.504057] Calling initcall 0xffffffff80b800c1: init_kgdbts+0x0/0x1b() [ 12.511298] kgdb: Registered I/O driver kgdbts. [ 12.515062] kgdbts:RUN plant and detach test [ 12.520283] kgdbts:RUN sw breakpoint test [ 12.524651] kgdbts:RUN bad memory access test [ 12.529052] kgdbts:RUN singlestep breakpoint test full log: http://redhat.com/~mingo/misc/log-Thu_Apr_17_23_46_36_CEST_2008.bad note that this was a 64-bit config too - our tests do a perfect mix of 50% 32-bit and 50% 64-bit kernels. So single-stepping of the kernel broke in some circumstances. find the boot log below. (it also includes all command line parameters) This is the first time ever i saw the self-test in KGDB hanging, so it's some recent non-KGDB change that provoked it or made it more likely. The KGDB self-test runs very frequently in my bootup tests: [ 12.508236] kgdb: Registered I/O driver kgdbts. [ 12.511245] kgdbts:RUN plant and detach test [ 12.517418] kgdbts:RUN sw breakpoint test [ 12.521056] kgdbts:RUN bad memory access test [ 12.525515] kgdbts:RUN singlestep breakpoint test [ 12.531483] kgdbts:RUN hw breakpoint test [ 12.536142] kgdbts:RUN hw write breakpoint test [ 12.541007] kgdbts:RUN access write breakpoint test [ 12.546223] kgdbts:RUN do_fork for 100 breakpoints so the latest kgdb-light tree literally survived thousands of such tests since it was changed last. unfortunately, the condition was not reproducible - i booted it once more and then it came up just f...
So I pulled your tree and I would agree there was a problem. But it seems unrelated to kgdb. I bisected the tree because it worked starting with the kgdb-light merge. It fails once with the patch below, but it is not clear as to why other than the lock must have something to do with it. I'll submit a patch to the kgdb test suite to increase the amount of loops through the single step test as it is it can definitely catch things :-) Jason. From 84556fe84dd975161e70b782d7d7cc7bd080c06a Mon Sep 17 00:00:00 2001 From: Ingo Molnar <mingo@elte.hu> Date: Thu, 28 Feb 2008 21:00:21 +0100 Subject: [PATCH 0883/1078] sched: make cpu_clock() globally synchronous Alexey Zaytsev reported (and bisected) that the introduction of cpu_clock() in printk made the timestamps jump back and forth. Make cpu_clock() more reliable while still keeping it fast when it's called frequently. Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/sched.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 49 insertions(+), 3 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index 8dcdec6..7377222 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -632,11 +632,39 @@ int sysctl_sched_rt_runtime = 950000; */ #define RUNTIME_INF ((u64)~0ULL) +static const unsigned long long time_sync_thresh = 100000; + +static DEFINE_PER_CPU(unsigned long long, time_offset); +static DEFINE_PER_CPU(unsigned long long, prev_cpu_time); + /* - * For kernel-internal use: high-speed (but slightly incorrect) per-cpu - * clock constructed from sched_clock(): + * Global lock which we take every now and then to synchronize + * the CPUs time. This method is not warp-safe, but it's good + * enough to synchronize slowly diverging time sources and thus + * it's good enough for tracing: */ -unsigned long long cpu_clock(int cpu) +static DEFINE_SPINLOCK(time_sync_lock); +static unsigned long long prev_global_time; + +static unsigned long long __sync_cpu_cloc...
With the patch below, it seems 100% reproducible to me (7 out of 7
bootups hung).
The number of loops it could do before hanging were, in order: 697,
898, 237, 55, 45, 92, 59
It seems timing-related, so I'm guessing it could be some interaction
with interrupts?
Vegard
diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c
index 6d6286c..ee87820 100644
--- a/drivers/misc/kgdbts.c
+++ b/drivers/misc/kgdbts.c
@@ -895,7 +895,13 @@ static void kgdbts_run_tests(void)
 	v1printk("kgdbts:RUN bad memory access test\n");
 	run_bad_read_test();
 	v1printk("kgdbts:RUN singlestep breakpoint test\n");
-	run_singlestep_break_test();
+
+	while(1) {
+		static int i = 0;
+
+		run_singlestep_break_test();
+		printk(KERN_EMERG "test #%d successfull\n", i++);
+	}
 	/* ===Optional tests=== */
--
cool! Jason: i think that particular self-test should be repeated 1000 times before reporting success ;-) Ingo --
I assume this was SMP? While I had not tried it yet, my guess would have been this did not happen on a UP kernel. If it does occur on a UP kernel it means the problem is squarely between the task scheduling after the exception is handled and the kgdb state logic for re-entering the debug state after a single step exception occurs. It seems reasonable to go for 1000 iterations of this particular test to declare success as pointed out by Ingo. Previous versions of kgdb handled some of the irq + single step + cpu sync slightly differently and it is entirely possible there is a regression there. Jason. --
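[Editorial sketch, not part of the thread.] The suggestion of repeating the single-step test a fixed number of times, rather than looping forever as in the debugging patch, can be sketched in user space as follows. This is an illustration under stated assumptions, not the kgdbts implementation: selftest_fn, run_repeated(), struct flaky_ctx and flaky_test() are all hypothetical names. A bounded loop like this only catches a test that returns a failure. It cannot catch a hang such as the one reported here, which would need a watchdog or timeout as well.

```c
/* Signature of a self-test: returns 0 on success, nonzero on failure. */
typedef int (*selftest_fn)(void *ctx);

/*
 * Run "test" up to "iterations" times. Returns -1 if every run passed,
 * otherwise the zero-based index of the first failing run.
 */
static int run_repeated(selftest_fn test, void *ctx, int iterations)
{
	for (int i = 0; i < iterations; i++) {
		if (test(ctx) != 0)
			return i;
	}
	return -1;
}

/* Hypothetical flaky test: fails once its call counter reaches fail_at. */
struct flaky_ctx {
	int calls;
	int fail_at;
};

static int flaky_test(void *ctx)
{
	struct flaky_ctx *c = ctx;

	/* Nonzero (failure) only on the call whose index equals fail_at. */
	return c->calls++ == c->fail_at;
}
```

Reporting the index of the first failing run gives the same kind of data as the loop counts quoted earlier in the thread (697, 898, 237, ...), which is what made the timing-dependent nature of the failure visible.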
On Fri, Apr 18, 2008 at 3:02 PM, Jason Wessel
Yes. But now that I realize this, I tried running same kernel with
qemu, using -smp 16, and it seems to be stuck here:
[ 16.562659] kgdb: Registered I/O driver kgdbts.
[ 16.565875] kgdbts:RUN plant and detach test
and the code is at kgdb_handle_exception():
/*
* Wait for the other CPUs to be notified and be waiting for us:
*/
for_each_online_cpu(i) {
while (!atomic_read(&cpu_in_kgdb[i]))
cpu_relax();
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
--
Unless you have a qemu with the NMI patches, kgdb does not work on SMP with qemu. The very first test is going to fail because the IPI sent by the kernel is not handled in qemu's hardware emulation.
Jason.
--
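Why a lost IPI turns into a hang can be sketched with a userspace simulation of the wait loop quoted earlier in the thread. This is not the kgdb code: `cpu_in_debugger`, `wait_for_cpus`, `SKETCH_NR_CPUS` and the spin limit are all assumptions for illustration, and the loop in the thread has no such limit. The sketch assumes each CPU sets its flag when it is notified; a CPU that never gets the notification never sets it, so an unbounded wait on that flag never finishes.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* One flag per CPU, set once that CPU has entered the debugger. */
static atomic_int cpu_in_debugger[SKETCH_NR_CPUS];

/*
 * Wait for every CPU to set its flag, giving up after `max_spins`
 * polls of a single flag. Returns -1 when all CPUs checked in,
 * otherwise the index of the first CPU that never responded.
 */
static int wait_for_cpus(long max_spins)
{
	int cpu;

	assert(max_spins > 0);

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		long spins = 0;

		while (!atomic_load(&cpu_in_debugger[cpu])) {
			if (++spins >= max_spins)
				return cpu;
		}
	}
	return -1;
}
```

With a bound like this, the caller at least learns which CPU failed to respond, which is the kind of information the printk() bracketing in this thread was used to find by hand.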
Oops, no, and that makes sense. I now picked up qemu 0.9.1 and applied the three NMI/SMI patches by Jan Kiszka. So in qemu it seems to run fine now, except that I need to prod it sometimes (it gets stuck in cpu_clock() and I have to break/continue from gdb to make it proceed). Oh, there it made it to 1056, and gdb can't interrupt anymore. Hmm. This is probably not a very good But booting with nosmp on real hardware gets easily above 100,000 iterations of the loop (before I reboot), so it seems to be related to that, anyway. Vegard --
It gets stuck in kgdb_roundup_cpus(), verified by putting a printk() before and after this call (in kgdb_handle_exception()). Simple, but effective :-) Vegard --
Interesting, that's the new major:minor code. I'll go poke at it... thanks, greg k-h --
Is this with the deprecated CONFIG_USB_DEVICE_CLASS=y? They have the same dev_t as usb_device and would be a reason for the duplicates. Thanks, Kay --
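The duplicate can be illustrated with a small userspace sketch. It is not sysfs code: `add_dev_name`, the `names` table and `SKETCH_MAX_ENTRIES` are hypothetical. It only assumes what the warning shows, namely that an entry is named "<major>:<minor>", so two devices that share a dev_t produce the same name and the second registration is rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 16	/* hypothetical table size */

static char names[SKETCH_MAX_ENTRIES][32];
static int nr_names;

/*
 * Register a "<major>:<minor>" name. Returns 0 on success, -1 if the
 * name already exists (the duplicate case), -2 if the table is full.
 */
static int add_dev_name(unsigned int major, unsigned int minor)
{
	char name[32];
	int len;
	int i;

	len = snprintf(name, sizeof(name), "%u:%u", major, minor);
	assert(len > 0 && (size_t)len < sizeof(name));

	for (i = 0; i < nr_names; i++) {
		if (strcmp(names[i], name) == 0)
			return -1;
	}
	if (nr_names >= SKETCH_MAX_ENTRIES)
		return -2;

	strcpy(names[nr_names++], name);
	return 0;
}
```

Registering 189:0 twice, as the two device classes with the same dev_t would, fails on the second call, which corresponds to the "duplicate filename '189:0'" warning in the original report.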
The mac g5 is warning us about stuff too:
io scheduler deadline registered
io scheduler cfq registered
io scheduler bfq registered
proc_dir_entry '00' already registered
Call Trace:
[c00000017a0fbb80] [c000000000012018] .show_stack+0x58/0x1dc (unreliable)
[c00000017a0fbc30] [c00000000013f68c] .proc_register+0x218/0x260
[c00000017a0fbce0] [c00000000013fab8] .proc_mkdir_mode+0x40/0x74
[c00000017a0fbd60] [c0000000001f49a8] .pci_proc_attach_device+0x90/0x134
[c00000017a0fbe00] [c0000000005f0084] .pci_proc_init+0x68/0xa0
[c00000017a0fbe80] [c0000000005cbc94] .kernel_init+0x1ec/0x430
[c00000017a0fbf90] [c000000000026fc0] .kernel_thread+0x4c/0x68
nvidiafb: Device ID: 10de0141
nvidiafb: CRTC0 analog not found
http://userweb.kernel.org/~akpm/config-g5.txt
http://userweb.kernel.org/~akpm/dmesg-g5.txt
--
On Fri, 18 Apr 2008 02:48:19 +0200 --
On Thu, Apr 17, 2008 at 4:03 PM, Andrew Morton The duplicate filename <major>:<minor> messages are coming from "sysfs-add-sys-dev-char-block-to-lookup-sysfs-path-by-major-minor.patch" now in Greg's tree. I'll take a look. -- Dan --
