I repulled all the trees an hour or two ago, installed everything on an
8-way x86_64 box and:

stack-protector:
Testing -fstack-protector-all feature
No -fstack-protector-stack-frame!
-fstack-protector-all test failed
------------[ cut here ]------------
WARNING: at kernel/panic.c:369 __stack_chk_test+0x4b/0x51()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.25-mm1 #4

Call Trace:
[<ffffffff80256692>] ? print_modules+0x88/0x8f
[<ffffffff80237b70>] warn_on_slowpath+0x58/0x7f
[<ffffffff802388fe>] ? printk+0x67/0x69
[<ffffffff8034ec74>] ? debug_write_lock_after+0x18/0x1f
[<ffffffff8034ed43>] ? _raw_write_unlock+0x29/0x7b
[<ffffffff804f0254>] ? _write_unlock+0x9/0xb
[<ffffffff8023d25e>] ? insert_resource+0xe3/0xea
[<ffffffff80237be2>] __stack_chk_test+0x4b/0x51
[<ffffffff8092f912>] kernel_init+0x16c/0x29e
[<ffffffff8020ce58>] child_rip+0xa/0x12
[<ffffffff8092f7a6>] ? kernel_init+0x0/0x29e
[<ffffffff8020ce4e>] ? child_rip+0x0/0x12

---[ end trace da2bc9ee81defeda ]---

usb/sysfs:
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 17 (level, low) -> IRQ 17
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:1d.0: irq 17, io base 0x00002080
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
sysfs: duplicate filename '189:0' can not be created
------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:425 sysfs_add_one+0x42/0x7c()
Modules linked in: uhci_hcd(+)
Pid: 600, comm: insmod Tainted: G W 2.6.25-mm1 #4

Call Trace:
[<ffffffff80256692>] ? print_modules+0x88/0x8f
[<ffffffff80237b70>] warn_on_slowpath+0x58/0x7f
[<ffffffff802388fe>] ? printk+0x67/0x69
[<ffffffff804f0249>] ? _spin_unlock+0x9/0xb
[<ffffffff802a932f>] ? ifind+0x72/0x82
[<ffffffff802e0c49>] ? sysfs_ilookup_test+0x0/0x14
[...
An old T21 is failing to boot and the relevant message appears to be
[ 1.929536] Probing IDE interface ide0...
[ 36.939317] ide0: Wait for ready failed before probe !
[ 37.502676] ide0: DISABLED, NO IRQ
[ 37.506356] ide0: failed to initialize IDE interface

The owner of ide-mm-ide-add-struct-ide_io_ports-take-2.patch, the patch with the
"DISABLED, NO IRQ" message, is cc'd. I've attached the config, full boot log
and lspci -v for the machine in question. I'll start reverting some of
these patches to see if ide-mm-ide-add-struct-ide_io_ports-take-2.patch
is really the culprit.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
Hi,
Please try reverting ide-fix-hwif-s-initialization.patch first - it has
already been dropped from IDE tree because people were reporting problems
similar to the one encountered by you.

Thanks,
Bart
--
Thanks.
I reverted this patch and ide-mm-ide-make-ide_hwifs-static.patch (for compile
breakage reasons). It's better but still fails to find the IDE device.
What is better is that it finds ide0 at:

ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

but does not identify any of the disks nor does it find ide1. For
comparison, a "good" dmesg looks like:

[ 1.793244] Probing IDE interface ide0...
[ 2.235292] hda: IBM-DJSA-220, ATA DISK drive
[ 2.915457] Probing IDE interface ide1...
[ 3.787516] hdc: CRN-8241U, ATAPI CD/DVD-ROM drive
[ 4.475650] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 4.478096] ide1 at 0x170-0x177,0x376 on irq 15
[ 4.484547] hda: max request size: 128KiB
[ 4.522696] hda: 39070080 sectors (20003 MB) w/1874KiB Cache, CHS=41344/15/63
[ 4.530706] hda: cache flushes not supported
[ 4.538724] hda: hda1 hda2 hda3 hda4
[ 4.569606] hdc: ATAPI 24X CD-ROM drive, 128kB Cache
[ 4.587678] Uniform CD-ROM driver Revision: 3.20
[ 4.595690] Driver 'sd' needs updating - please use bus_type methods

Here is the bootlog with the two patches reverted.
root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /boot/vmlinuz-2.6.25-mm1 root=/dev/hda1 mminit_loglevel=4 loglevel=9 console=tty0 console=ttyS0,9600 ro earlyprintk=serial,ttyS0,9600 kernelcore=384MB movablecore=384MB profile=sleep,2 resume=/dev/hda2
[Linux-bzImage, setup=0x2c00, size=0x1d9390]
savedefault
boot
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Linux version 2.6.25-mm1 (mel@arnold) (gcc version 4.2.3 (Debian 4.2.3-3)) #1 SMP Tue Apr 29 10:04:35 IST 2008
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
[ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
[ 0.000000] BIOS-e820: 000000001fff0000 - ...
Interestingly, bisection firmly blames this patch and QEMU boots with the two
patches reverted but fails with them applied so that patch does cause problems.
The failure on the laptop must be depending on some follow-on patch. I tried
a hatchet-job revert of the IDE patches between IDE-START and IDE-END in
the series file and it similarly fails to probe the IDE devices. So either
I made a mess of the reverts (strong possibility) or there is more than one
problem patch.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
The third patch that needed reverting was
gregkh-pci-pci-clean-up-resource-alignment-management.patch (owners added
to cc). The relevant hint in a diff between a broken and a working bootlog was:

system 00:09: ioport range 0x15e0-0x15ef has been reserved
+ PCI: bogus alignment of resource 7 [100:1ff] (flags 100) of 0000:00:02.0
+ PCI: bogus alignment of resource 8 [100:1ff] (flags 100) of 0000:00:02.0
+ PCI: bogus alignment of resource 9 [4000000:7ffffff] (flags 1200) of 0000:00:02.0
+ PCI: bogus alignment of resource 10 [4000000:7ffffff] (flags 200) of 0000:00:02.0
+ PCI: bogus alignment of resource 7 [100:1ff] (flags 100) of 0000:00:02.1
+ PCI: bogus alignment of resource 8 [100:1ff] (flags 100) of 0000:00:02.1
+ PCI: bogus alignment of resource 9 [4000000:7ffffff] (flags 1200) of 0000:00:02.1
+ PCI: bogus alignment of resource 10 [4000000:7ffffff] (flags 200) of 0000:00:02.1

With the resource alignment patch and the two IDE patches reverted, the
laptop is able to boot.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
Thanks for tracking it down.
Hmm, it seems that the above patch was merged a week ago:
commit bda0c0afa7a694bb1459fd023515aca681e4d79a
Merge: 904e0ab... af40b48...
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon Apr 21 15:58:35 2008 -0700

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6
...
PCI: clean up resource alignment management
...

but it could be that the issue has been already fixed in git tree
(could you verify it please?).

BTW according to lspci output you should be able to use piix driver
instead of ide_generic on this laptop.

Thanks,
Bart
--
I know but the config is a bit minimal for faster building as it's only
intended for sniff-testing patches.

Thanks for the help.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
ide-mm-ide-add-struct-ide_io_ports-take-2.patch is now in mainline so a
quick confirmation would be to test Linus's tree.
--
2.6.25 and latest git are both booting fine.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
Another runtime warning on the t61p:
Brought up 2 CPUs
Total of 2 processors activated (9583.80 BogoMIPS).
CPU0 attaching sched-domain:
domain 0: span 00000000,00000003
groups: 00000000,00000001 00000000,00000002
domain 1: span 00000000,00000003
groups: 00000000,00000003
CPU1 attaching sched-domain:
domain 0: span 00000000,00000003
groups: 00000000,00000002 00000000,00000001
domain 1: span 00000000,00000003
groups: 00000000,00000003
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2677 check_flags+0x84/0x11f()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-mm1 #15

Call Trace:
[<ffffffff8105f7ec>] ? print_modules+0x88/0x8f
[<ffffffff81037b55>] warn_on_slowpath+0x58/0x7f
[<ffffffff81056143>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff810560b7>] ? trace_hardirqs_off_caller+0x1d/0x9c
[<ffffffff81056143>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff810560b7>] ? trace_hardirqs_off_caller+0x1d/0x9c
[<ffffffff81056143>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff81058576>] ? __lock_acquire+0x809/0x893
[<ffffffff810560b7>] ? trace_hardirqs_off_caller+0x1d/0x9c
[<ffffffff81056143>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff812b94d1>] ? __atomic_notifier_call_chain+0x0/0x81
[<ffffffff8105627e>] check_flags+0x84/0x11f
[<ffffffff81058914>] lock_acquire+0x54/0xb4
[<ffffffff812b9515>] __atomic_notifier_call_chain+0x44/0x81
[<ffffffff8100a2c2>] ? mwait_idle+0x0/0x49
[<ffffffff812b9561>] atomic_notifier_call_chain+0xf/0x11
[<ffffffff8100a228>] __exit_idle+0x27/0x29
[<ffffffff8100b33c>] cpu_idle+0xdf/0xf7
[<ffffffff812b10da>] start_secondary+0xb2/0xb4

---[ end trace 93d72a36b9146f22 ]---
possible reason: unannotated irqs-on.
irq event stamp: 34
hardirqs last enabled at (33): [<ffffffff812b63f0>] trace_hardirqs_on_thunk+0x3a/0x3f
hardirqs last disabled at (34): [<ffffffff81056143>] trace_hardirqs_off+0xd/0xf
...
oop, there's more:
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
firewire_core: created device fw0: GUID 00016c2000174bad, S400
PM: Device usb4 failed to restore: error -113
eth0: Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
eth0: 10/100 speed: disabling TSO
PM: Device usb5 failed to restore: error -113
PM: Device usb7 failed to restore: error -113
sd 0:0:0:0: [sda] Starting disk
PM: Image restored successfully.
Restarting tasks ... done.
PM: Basic memory bitmaps freed

Those USB restore failures are new. They're similar to the ones on the
doesnt-resume-properly-any-more Vaio. They came out from the machine's
second (successful) resume-from-disk.
--
I got USB messages after s2ram + suspend to disk combination, too, but
machine seems to work.

ata1.00: ACPI cmd ef/10:03:00:00:00:a0 succeeded
ata1.00: configured for UDMA/100
ata1.00: configured for UDMA/100
ata1: EH complete
sd 0:0:0:0: [sda] 117210240 512-byte hardware sectors (60012 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 117210240 512-byte hardware sectors (60012 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
PM: Device usb2 failed to restore: error -113
PM: Device usb3 failed to restore: error -113
PM: Device usb4 failed to restore: error -113
PM: Image restored successfully.
Restarting tasks ... done.
PM: Basic memory bitmaps freed
wlan0: RX disassociation from 00:11:2f:0e:95:a0 (reason=7)
wlan0: disassociated

(Apart from some wireless problems, solved by reconnecting...)
(And ipw3945 LED indication now seems to work, good!)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
Try rmmod usb / insmod usb around suspend to see if it is
usb-specific, or if something went seriously wrong in core.

Or you might just bisect it ;-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
There's no need to worry about them. They merely indicate that the
root hubs didn't resume along with everything else, because they were
already suspended when the system went to sleep and so they were left
suspended. The return codes in usbcore will be changed soon so that
this won't appear to be an error.

Alan Stern
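
For anyone decoding the log lines above: 113 is the value of EHOSTUNREACH in Linux's include/asm-generic/errno.h (the table x86 uses), so "error -113" reads as -EHOSTUNREACH. Below is a minimal user-space sketch for picking such codes out of a saved dmesg. The helper names pm_restore_error() and is_host_unreachable() are made up for illustration and are not kernel functions, and the constant is hard-coded on the assumption that the log came from an architecture using the generic errno table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Value of EHOSTUNREACH in Linux's include/asm-generic/errno.h (used by x86).
 * Hard-coded so the sketch does not depend on the build host's errno.h. */
#define LINUX_EHOSTUNREACH 113

/* Hypothetical helper: extract the numeric code from a line such as
 * "PM: Device usb4 failed to restore: error -113".
 * Returns 0 if the line does not have that shape. */
static int pm_restore_error(const char *line)
{
    const char *p;
    int code = 0;

    assert(line != NULL);
    p = strstr(line, "failed to restore: error ");
    if (p && sscanf(p, "failed to restore: error %d", &code) == 1)
        return code;
    return 0;
}

/* Nonzero if the line reports -EHOSTUNREACH. */
static int is_host_unreachable(const char *line)
{
    return pm_restore_error(line) == -LINUX_EHOSTUNREACH;
}
```

Feeding each "PM: Device ... failed to restore" line to is_host_unreachable() separates the benign root-hub case described above from any other restore error that might deserve a closer look.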
--
I found another machine! This one's an old 4-way Nocona (x86_64)
http://userweb.kernel.org/~akpm/config-x.txt
http://userweb.kernel.org/~akpm/dmesg-x.txt

CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM1)
ACPI: Core revision 20080321
Parsing all Control Methods:
Table [DSDT](id 0001) - 461 Objects with 50 Devices 130 Methods 11 Regions
tbxface-0598 [00] tb_load_namespace : ACPI Tables successfully acquired
evxfevnt-0091 [00] enable : Transition to ACPI mode successful
------------[ cut here ]------------
WARNING: at arch/x86/kernel/genapic_64.c:86 read_apic_id+0x31/0x67()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.25-mm1 #16

Call Trace:
[<ffffffff8025272f>] ? print_modules+0x88/0x8f
[<ffffffff80233493>] warn_on_slowpath+0x58/0x81
[<ffffffff80351ceb>] ? debug_spin_lock_after+0x18/0x1f
[<ffffffff8035217a>] ? _raw_spin_lock+0x116/0x120
[<ffffffff80228398>] ? sub_preempt_count+0x6d/0x74
[<ffffffff804e9ba3>] ? _spin_unlock_irqrestore+0x33/0x40
[<ffffffff803523e6>] ? debug_smp_processor_id+0x32/0xc4
[<ffffffff8021ede5>] read_apic_id+0x31/0x67
[<ffffffff8066f7f2>] verify_local_APIC+0xa7/0x163
[<ffffffff8066e837>] native_smp_prepare_cpus+0x1ed/0x301
[<ffffffff80669ab2>] kernel_init+0x5a/0x276
[<ffffffff804e9a1e>] ? _spin_unlock_irq+0x2a/0x35
[<ffffffff8022b7c2>] ? finish_task_switch+0x68/0x7f
[<ffffffff8020c1d8>] child_rip+0xa/0x12
[<ffffffff80669a58>] ? kernel_init+0x0/0x276
[<ffffffff8020c1ce>] ? child_rip+0x0/0x12

---[ end trace 4eaa2a86a8e2da22 ]---
------------[ cut here ]------------
WARNING: at arch/x86/kernel/genapic_64.c:86 read_apic_id+0x31/0x67()
Modules linked in:
Pid: 1, comm: swapper Tainted: G W 2.6.25-mm1 #16

Call Trace:
[<ffffffff8025272f>] ? print_modules+0x88/0x8f
[<ffffffff80233493>] warn_on_slowpath+0x58/0x81
[<ffffffff80351ceb>] ...
that came in via the UV-APIC patchset but the warning is entirely
harmless. At that point we've got a single CPU running only so
preemption of that code to another CPU is not possible.

native_smp_prepare_cpus() should probably just disable preemption, that
way we could remove all those ugly preempt disable-enable calls from the
called functions - per the patch below. (not boot tested yet - might
provoke atomic-scheduling warnings if i forgot about some schedule point
in this rather large codepath)

Ingo
------------------->
Subject: x86: disable preemption in native_smp_prepare_cpus
From: Ingo Molnar <mingo@elte.hu>
Date: Fri Apr 18 11:07:10 CEST 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/smpboot.c | 2 ++
1 file changed, 2 insertions(+)

Index: linux-x86.q/arch/x86/kernel/smpboot.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/smpboot.c
+++ linux-x86.q/arch/x86/kernel/smpboot.c
@@ -1181,6 +1181,7 @@ static void __init smp_cpu_index_default
*/
void __init native_smp_prepare_cpus(unsigned int max_cpus)
{
+ preempt_disable();
nmi_watchdog_default();
smp_cpu_index_default();
current_cpu_data = boot_cpu_data;
@@ -1237,6 +1238,7 @@ void __init native_smp_prepare_cpus(unsi
printk(KERN_INFO "CPU%d: ", 0);
print_cpu_info(&cpu_data(0));
setup_boot_clock();
+ preempt_enable();
}
/*
* Early setup to make printk work.
--
that should be the patch below.
Ingo
------------>
Subject: x86: disable preemption in native_smp_prepare_cpus
From: Ingo Molnar <mingo@elte.hu>
Date: Fri Apr 18 11:07:10 CEST 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/smpboot.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-x86.q/arch/x86/kernel/smpboot.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/smpboot.c
+++ linux-x86.q/arch/x86/kernel/smpboot.c
@@ -1181,6 +1181,7 @@ static void __init smp_cpu_index_default
*/
void __init native_smp_prepare_cpus(unsigned int max_cpus)
{
+ preempt_disable();
nmi_watchdog_default();
smp_cpu_index_default();
current_cpu_data = boot_cpu_data;
@@ -1197,7 +1198,7 @@ void __init native_smp_prepare_cpus(unsi
if (smp_sanity_check(max_cpus) < 0) {
printk(KERN_INFO "SMP disabled\n");
disable_smp();
- return;
+ goto out;
}

preempt_disable();
@@ -1237,6 +1238,8 @@ void __init native_smp_prepare_cpus(unsi
printk(KERN_INFO "CPU%d: ", 0);
print_cpu_info(&cpu_data(0));
setup_boot_clock();
+out:
+ preempt_enable();
}
/*
* Early setup to make printk work.
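
The difference between the two versions of the patch is the early exit: with preempt_disable() added at the top, a plain return on the smp_sanity_check() failure path would skip the matching preempt_enable(), which is why the second version jumps to an out: label instead. A user-space sketch of that pairing rule follows. The counter and the function bodies are stand-ins written for illustration, not the kernel implementations, and sanity_check() is a hypothetical substitute for smp_sanity_check().

```c
#include <assert.h>

/* Stand-in for the kernel's preempt count; user-space illustration only. */
static int preempt_count;

static void preempt_disable(void)
{
    preempt_count++;
}

static void preempt_enable(void)
{
    assert(preempt_count > 0);  /* an enable must follow a disable */
    preempt_count--;
}

/* Hypothetical substitute for smp_sanity_check(): negative means failure. */
static int sanity_check(int max_cpus)
{
    return max_cpus > 0 ? 0 : -1;
}

/* Shape of the first patch: the early return skips preempt_enable(). */
static void prepare_cpus_early_return(int max_cpus)
{
    preempt_disable();
    if (sanity_check(max_cpus) < 0)
        return;                 /* leaves preempt_count raised by one */
    preempt_enable();
}

/* Shape of the second patch: every exit path goes through out:. */
static void prepare_cpus_goto_out(int max_cpus)
{
    preempt_disable();
    if (sanity_check(max_cpus) < 0)
        goto out;
    /* ... the rest of the setup would run here ... */
out:
    preempt_enable();
}
```

Calling prepare_cpus_early_return(0) leaves the counter at 1, while prepare_cpus_goto_out() returns it to 0 on both paths.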
--
that's the stackprotector self-test: you probably have a gcc that cannot
build a proper stackprotector kernel. No damage other than having no
stackprotector. Arjan Cc:-ed.

Ingo
--
On Fri, Apr 18, 2008 at 2:03 AM, Andrew Morton
Andrew, you don't seem to have slab debugging enabled:
# CONFIG_DEBUG_SLAB is not set
And quite frankly, the oops looks unlikely to be a slab bug but rather
a plain old slab corruption caused by the callers...

Pekka
--
hm, there's sel_netnode_free() in the stackframe - that's from
security/selinux/netnode.c. Andrew, any recent changes in that area?

Ingo
--
I've reverted the -mm only change to that file in
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-akpm
commit f777964ad75cf4a119d911d12e81948d2402677f
Author: James Morris <jmorris@namei.org>
Date: Fri Apr 18 20:27:24 2008 +1000

Revert "SELinux: Made netnode cache adds faster"
This reverts commit 6bf8f41d4efdf9d4eeb4f7df9c591e281f7da93e.
Possible cause of slab corruption in -mm.
--
James Morris
<jmorris@namei.org>
--
Keep in mind that slab might have been corrupted by someone else much
earlier but we didn't notice due to the lack of CONFIG_DEBUG_SLAB.
--
Yes, I'd agree. All has been peachy since I dropped git-selinux.
--
On Thu, 17 Apr 2008 16:03:31 -0700
do you have a stack-protector capable GCC? I guess not.
This is a catch-22. You do not have stack-protector. Should we make that
a silent failure? or do you want to know that you don't have a security
feature you thought you had.... complaining seems to be the right thing to do imo.
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
A #warning sounds more appropriate.
--
this warning is telling the user that the security feature that got
enabled in the .config is completely, 100% not working due to using a
stack-protector-incapable GCC.

it's analogous as if there was a bug in gcc that made SELinux totally
ineffective in some mitigate-exploit-damage scenarios. No harm done on a
perfectly bug-free system - but once a bug happens that SELinux should
have mitigated, the breakage becomes real. Having a prominent warning is
the _minimum_.

having a build failure would be nice too because this is a build
environment problem. (not a build warning - warnings can easily be
missed because on a typical kernel build there's so many false positives
that get emitted by various other warning mechanisms) Arjan?

Ingo
--
Not really. In the selinux case we don't know that it'll break at compile
Yeah, #error would work too.
--
On Fri, 18 Apr 2008 00:28:58 -0700
I'm totally fine with that, but I think I need Sam's help on making that happen
the right way; this is going to need makefile fu :(

Sam:
Basically what I need is that if the
scripts/gcc-x86_64-has-stack-protector.sh script fails, the build aborts with
a message/#error that says that the compiler is not capable of supporting this feature.

Right now the script is used like this:

stackp := $(CONFIG_SHELL) $(srctree)/scripts/gcc-x86_64-has-stack-protector.sh
stackp-$(CONFIG_CC_STACKPROTECTOR) := $(shell $(stackp) \
"$(CC)" -fstack-protector )

It's obviously easy to make this script print a warning.. but how do we make it stop the build?
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
ok I found a way that works for me:
From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH] stackprotector: turn not having the right gcc into an #error

If the user selects the stack-protector config option, but does not have
a gcc that has the right bits enabled (for example because it isn't built
with a glibc that supports TLS, as is common for cross-compilers, but also
because it may be too old), then the runtime test fails right now.

Andrew rightfully points out that this is a condition we can detect at
build time, and we should error out at that point instead.

This patch adds an error message for this scenario. This error accomplishes
two goals:
1) the user is informed that the security option he selected isn't available
2) the user has enough info to turn off the CONFIG option that won't work for him,
and would make the runtime test fail anyway.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
arch/x86/Makefile | 2 +-
kernel/panic.c | 3 +++
2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 3cff3c8..c3e0eee 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -73,7 +73,7 @@ else

stackp := $(CONFIG_SHELL) $(srctree)/scripts/gcc-x86_64-has-stack-protector.sh
stackp-$(CONFIG_CC_STACKPROTECTOR) := $(shell $(stackp) \
- "$(CC)" -fstack-protector )
+ "$(CC)" "-fstack-protector -DGCC_HAS_SP" )
stackp-$(CONFIG_CC_STACKPROTECTOR_ALL) += $(shell $(stackp) \
"$(CC)" -fstack-protector-all )

diff --git a/kernel/panic.c b/kernel/panic.c
index c92c1e2..7cbcd8e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -321,6 +321,9 @@ EXPORT_SYMBOL(warn_on_slowpath);

#ifdef CONFIG_CC_STACKPROTECTOR
+#ifndef GCC_HAS_SP
+#error You have selected the CONFIG_CC_STACKPROTECTOR option, but the gcc used does not support this.
+#endif
static unsigned long __stack_check_testing;
/*
*...
you noticed it ;-) Distro maintainers will notice it too if it pops up
when something breaks StackProtector. Normal user might not notice. (but
normal user might not notice a few hundred guest roots either)

but ... the real thing that made it slip into your config was that it
was default-enabled in x86/latest - the patch below should fix that.

we need the warning: it could have caught the toplevel Makefile change
last October that broke StackProtector completely. So no, we wont be and
cannot be silent about this anymore - we need and now have an end-to-end
test about it.

Ingo
------------------>
Subject: stackprotector: non default
From: Ingo Molnar <mingo@elte.hu>
Date: Fri Apr 18 11:13:17 CEST 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/Kconfig | 1 -
1 file changed, 1 deletion(-)

Index: linux-x86.q/arch/x86/Kconfig
===================================================================
--- linux-x86.q.orig/arch/x86/Kconfig
+++ linux-x86.q/arch/x86/Kconfig
@@ -1146,7 +1146,6 @@ config CC_STACKPROTECTOR
bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)"
depends on X86_64
select CC_STACKPROTECTOR_ALL
- default y
help
This option turns on the -fstack-protector GCC feature. This
feature puts, at the beginning of functions, a canary value on
--
For what it's worth I just looked over the changes in netnode.c and
nothing is jumping out at me. The changes ran fine for me when tested
on the later 2.6.25-rcX kernels but I suppose that doesn't mean a whole
lot.

I've got a 4-way x86_64 box but it needs to be installed (which means
I'm not going to be able to do anything useful with it until tomorrow
at the earliest). I'll try it out and see if I can recreate the
problem.
--
paul moore
linux @ hp
--
On Thu, 17 Apr 2008 19:55:46 -0400
I dropped git-selinux and that crash seems to have gone away. It took about
five minutes before, but would presumably have happened earlier if I'd
reduced the cache size.

btw, wouldn't this
--- a/security/selinux/netnode.c~a
+++ a/security/selinux/netnode.c
@@ -190,7 +190,7 @@ static int sel_netnode_insert(struct sel
if (sel_netnode_hash[idx].size == SEL_NETNODE_HASH_BKT_LIMIT) {
struct sel_netnode *tail;
tail = list_entry(node->list.prev, struct sel_netnode, list);
- list_del_rcu(node->list.prev);
+ list_del_rcu(&tail->list);
call_rcu(&tail->rcu, sel_netnode_free);
} else
sel_netnode_hash[idx].size++;
_

be a bit clearer? If it's correct - I didn't try too hard :)
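
A note on why the two spellings in that diff unlink the same element: tail is computed with list_entry() from node->list.prev, so &tail->list is by construction the same address as node->list.prev. The sketch below checks that identity in user space. struct netnode is a cut-down, hypothetical stand-in for struct sel_netnode, and the list_head and list_entry definitions are simplified copies of the kernel idioms, not the kernel headers themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space copy of the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Simplified list_entry(): step back from the member to its container. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down stand-in for struct sel_netnode. */
struct netnode {
    int id;
    struct list_head list;
};

/* Returns 1 if both arguments used in the diff name the same list_head. */
static int same_victim(struct netnode *node)
{
    struct netnode *tail;

    assert(node != NULL && node->list.prev != NULL);
    tail = list_entry(node->list.prev, struct netnode, list);
    return node->list.prev == &tail->list;
}
```

This only shows that the suggested spelling is equivalent to the original one and easier to read; it says nothing about whether node->list.prev is the right element to evict, which is the question still open in the thread.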
--
Looks good to me, although before I fix this let me try and figure out
why this code is causing the machine to puke all over itself.
Priorities you know :)
--
paul moore
linux @ hp
--
On Thu, 17 Apr 2008 19:55:46 -0400
Perhaps it was tested only against slub? That config uses slab.
--
Yes, I believe it was tested with slub.
--
paul moore
linux @ hp
--
On Thu, 17 Apr 2008 16:03:31 -0700
With git-selinux at top-of tree it's repeatably hanging in the CPA
self-tests (git-x86 stuff). Last two lines are:

CPA self-test:
4k 8704 large 4847 gb 0 x 0[0-0] miss 0

(clear as mud ;))
I will find the config knob to disable that test. Of course, it could be
telling me that CPA is buggy.
--
On Thu, 17 Apr 2008 16:40:34 -0700
Disabling CPA_DEBUG didn't help. It's still hanging. The final initcall
is init_kgdbts() and disabling KGDB prevents the hang.
--
In this case you do not have to disable kgdb, but just disable the
kgdb test suite. Certainly I would be interested to know where it is
failing as it would indicate that there is a regression that is caused
by a change that occurred somewhere else in the kernel or a latent
defect in kgdb was triggered. The kgdb test suite exercises a number
of kernel fault systems as well as arch specific single stepping when
it runs and when it fails it is likely worth it to track down which
test failed and why.

If you are looking to bypass the kgdb test suite you have two options.
The kernel option that runs the tests on boot (which is not on by
default) is CONFIG_KGDB_TESTS_ON_BOOT; make sure this is off.

You can turn off the tests in an already compiled kernel that had the
testing turned on at boot by adding the kgdbts boot argument with nothing
on the other side of the = sign. Like:

kgdbts=

In terms of debugging what happened, if you have console output you
can save, please do send me the output of kernel boot with the kernel
boot argument:

kgdbts=V2

That enables verbose logging of exactly what is going on and will show
where wheels fall off the cart. If the kernel is dying silently it
means the early exception code has completely failed in some way on
the kernel architecture that was selected, and of course the .config
is always useful in this case.

Thanks,
Jason.
--
incidentally, just today, in overnight testing i triggered a similar
hang in the KGDB self-test:

http://redhat.com/~mingo/misc/config-Thu_Apr_17_23_46_36_CEST_2008.bad

to get a similar tree to the one i tested, pick up sched-devel/latest
from:

http://people.redhat.com/mingo/sched-devel.git/README

pick up that failing .config, do 'make oldconfig' and accept all the
defaults to get a comparable kernel to mine. (kgdb is embedded in
sched-devel.git.)

the hang was at:
[ 12.504057] Calling initcall 0xffffffff80b800c1: init_kgdbts+0x0/0x1b()
[ 12.511298] kgdb: Registered I/O driver kgdbts.
[ 12.515062] kgdbts:RUN plant and detach test
[ 12.520283] kgdbts:RUN sw breakpoint test
[ 12.524651] kgdbts:RUN bad memory access test
[ 12.529052] kgdbts:RUN singlestep breakpoint test

full log:

http://redhat.com/~mingo/misc/log-Thu_Apr_17_23_46_36_CEST_2008.bad

note that this was a 64-bit config too - our tests do a perfect mix of
50% 32-bit and 50% 64-bit kernels. So single-stepping of the kernel
broke in some circumstances.

find the boot log below. (it also includes all command line parameters)

This is the first time ever i saw the self-test in KGDB hanging, so it's
some recent non-KGDB change that provoked it or made it more likely. The
KGDB self-test runs very frequently in my bootup tests:

[ 12.508236] kgdb: Registered I/O driver kgdbts.
[ 12.511245] kgdbts:RUN plant and detach test
[ 12.517418] kgdbts:RUN sw breakpoint test
[ 12.521056] kgdbts:RUN bad memory access test
[ 12.525515] kgdbts:RUN singlestep breakpoint test
[ 12.531483] kgdbts:RUN hw breakpoint test
[ 12.536142] kgdbts:RUN hw write breakpoint test
[ 12.541007] kgdbts:RUN access write breakpoint test
[ 12.546223] kgdbts:RUN do_fork for 100 breakpoints

so the latest kgdb-light tree literally survived thousands of such tests
since it was changed last.

unfortunately, the condition was not reproducible - i booted it once
more and then it came up just f...
So I pulled your tree and I would agree there was a problem. But it
seems unrelated to kgdb. I bisected the tree because it worked starting
with the kgdb-light merge.

It fails once with the patch below, but it is not clear as to why other
than the lock must have something to do with it.

I'll submit a patch to the kgdb test suite to increase the number of
loops through the single step test; as it is, it can definitely catch
things :-)

Jason.
From 84556fe84dd975161e70b782d7d7cc7bd080c06a Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 28 Feb 2008 21:00:21 +0100
Subject: [PATCH 0883/1078] sched: make cpu_clock() globally synchronous

Alexey Zaytsev reported (and bisected) that the introduction of
cpu_clock() in printk made the timestamps jump back and forth.

Make cpu_clock() more reliable while still keeping it fast when it's
called frequently.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 8dcdec6..7377222 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -632,11 +632,39 @@ int sysctl_sched_rt_runtime = 950000;
*/
#define RUNTIME_INF ((u64)~0ULL)

+static const unsigned long long time_sync_thresh = 100000;
+
+static DEFINE_PER_CPU(unsigned long long, time_offset);
+static DEFINE_PER_CPU(unsigned long long, prev_cpu_time);
+
/*
- * For kernel-internal use: high-speed (but slightly incorrect) per-cpu
- * clock constructed from sched_clock():
+ * Global lock which we take every now and then to synchronize
+ * the CPUs time. This method is not warp-safe, but it's good
+ * enough to synchronize slowly diverging time sources and thus
+ * it's good enough for tracing:
*/
-unsigned long long cpu_clock(int cpu)
+static DEFINE_SPINLOCK(time_sync_lock);
+static unsigned long long prev_global_time;
+
+static unsigned long long __sync_cpu_cloc...
With the patch below, it seems 100% reproducible to me (7 out of 7
bootups hung).

The number of loops it could do before hanging were, in order: 697,
898, 237, 55, 45, 92, 59

It seems timing-related, so I'm guessing it could be some interaction
with interrupts?

Vegard
diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c
index 6d6286c..ee87820 100644
--- a/drivers/misc/kgdbts.c
+++ b/drivers/misc/kgdbts.c
@@ -895,7 +895,13 @@ static void kgdbts_run_tests(void)
v1printk("kgdbts:RUN bad memory access test\n");
run_bad_read_test();
v1printk("kgdbts:RUN singlestep breakpoint test\n");
- run_singlestep_break_test();
+
+ while(1) {
+ static int i = 0;
+
+ run_singlestep_break_test();
+ printk(KERN_EMERG "test #%d successfull\n", i++);
+ }

/* ===Optional tests=== */
--
cool! Jason: i think that particular self-test should be repeated 1000
times before reporting success ;-)

Ingo
--
I assume this was SMP?
While I had not tried it yet, my guess would have been this did not
happen on a UP kernel. If it does occur on a UP kernel it means the
problem is squarely between the task scheduling after the exception is
handled and the kgdb state logic for re-entering the debug state after a
single step exception occurs.

It seems reasonable to go for 1000 iterations of this particular test to
declare success as pointed out by Ingo. Previous versions of kgdb
handled some of the irq + single step + cpu sync slightly differently
and it is entirely possible there is a regression there.

Jason.
--
On Fri, Apr 18, 2008 at 3:02 PM, Jason Wessel
Yes. But now that I realize this, I tried running same kernel with
qemu, using -smp 16, and it seems to be stuck here:

[ 16.562659] kgdb: Registered I/O driver kgdbts.
[ 16.565875] kgdbts:RUN plant and detach test

and the code is at kgdb_handle_exception():
/*
* Wait for the other CPUs to be notified and be waiting for us:
*/
for_each_online_cpu(i) {
while (!atomic_read(&cpu_in_kgdb[i]))
cpu_relax();

Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
--
Unless you have a qemu with the NMI patches, kgdb does not work on SMP
with qemu. The very first test is going to fail because the IPI sent by
the kernel is not handled in qemu's hardware emulation.

Jason.
--
Oops, no, and that makes sense.
I now picked up qemu 0.9.1 and applied the three NMI/SMI patches by Jan Kiszka.
So in qemu it seems to run fine now, except that I need to prod it
sometimes (it gets stuck in cpu_clock() and I have to break/continue
from gdb to make it proceed). Oh, there it made it to 1056, and gdb
can't interrupt anymore. Hmm. This is probably not a very good

But booting with nosmp on real hardware gets easily above 100,000
iterations of the loop (before I reboot), so it seems to be related to
that, anyway.

Vegard
--
It gets stuck in kgdb_roundup_cpus(), verified by putting a printk()
before and after this call (in kgdb_handle_exception()). Simple, but
effective :-)

Vegard
--
Interesting, that's the new major:minor code. I'll go poke at it...
thanks,
greg k-h
--
Is this with the deprecated CONFIG_USB_DEVICE_CLASS=y? They have the
same dev_t as usb_device and would be a reason for the duplicates.

Thanks,
Kay
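
To make the collision concrete: the entry named in the warning, '189:0', is a device number printed as major:minor, and 189 is the major used for USB device nodes. Two devices that carry the same dev_t therefore map to the same name, and the second attempt to create it triggers the "duplicate filename" warning. Below is a rough user-space sketch of that mapping. The MAJOR/MINOR/MKDEV macros follow the kernel's include/linux/kdev_t.h layout (20 minor bits), while devt_name() and names_collide() are hypothetical helpers, and the "%u:%u" format is inferred from the warning text rather than taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same split as the kernel's include/linux/kdev_t.h: 20 bits of minor. */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) (((ma) << MINORBITS) | (mi))

/* Hypothetical helper: format a device number as "<major>:<minor>",
 * the shape seen in the sysfs warning. */
static void devt_name(unsigned int devt, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%u:%u", MAJOR(devt), MINOR(devt));
}

/* Returns 1 if two device numbers would ask for the same entry name. */
static int names_collide(unsigned int devt_a, unsigned int devt_b)
{
    char a[32], b[32];

    devt_name(devt_a, a, sizeof(a));
    devt_name(devt_b, b, sizeof(b));
    return strcmp(a, b) == 0;
}
```

With these definitions, names_collide(MKDEV(189, 0), MKDEV(189, 0)) is true, which is the situation described above when a usb_device and its deprecated class device share one dev_t.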
--
The mac g5 is warning us about stuff too:
io scheduler deadline registered
io scheduler cfq registered
io scheduler bfq registered
proc_dir_entry '00' already registered
Call Trace:
[c00000017a0fbb80] [c000000000012018] .show_stack+0x58/0x1dc (unreliable)
[c00000017a0fbc30] [c00000000013f68c] .proc_register+0x218/0x260
[c00000017a0fbce0] [c00000000013fab8] .proc_mkdir_mode+0x40/0x74
[c00000017a0fbd60] [c0000000001f49a8] .pci_proc_attach_device+0x90/0x134
[c00000017a0fbe00] [c0000000005f0084] .pci_proc_init+0x68/0xa0
[c00000017a0fbe80] [c0000000005cbc94] .kernel_init+0x1ec/0x430
[c00000017a0fbf90] [c000000000026fc0] .kernel_thread+0x4c/0x68
nvidiafb: Device ID: 10de0141
nvidiafb: CRTC0 analog not found

http://userweb.kernel.org/~akpm/config-g5.txt
http://userweb.kernel.org/~akpm/dmesg-g5.txt
--
On Fri, 18 Apr 2008 02:48:19 +0200
--
On Thu, Apr 17, 2008 at 4:03 PM, Andrew Morton
The duplicate filename <major>:<minor> messages are coming from
"sysfs-add-sys-dev-char-block-to-lookup-sysfs-path-by-major-minor.patch"
now in Greg's tree. I'll take a look.
--
Dan
--
