ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm1/
- Added the kgdb tree as git-kgdb-light (Jason Wessel, Ingo Molnar)
- Added a random-security-stuff-apart-from-selinux tree as git-security-testing (James Morris)
- suspend-to-disk is still busted on my x86_64 t61p (git-x86, iirc)
Boilerplate:
- See the `hot-fixes' directory for any important updates to this patchset.
- To fetch an -mm tree using git, use (for example)
  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1
- -mm kernel commit activity can be reviewed by subscribing to the mm-commits mailing list.
  echo "subscribe mm-commits" | mail majordomo@vger.kernel.org
- If you hit a bug in -mm and it is not obvious which patch caused it, it is most valuable if you can perform a bisection search to identify which patch introduced the bug. Instructions for this process are at
  http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
  But beware that this process takes some time (around ten rebuilds and reboots), so consider reporting the bug first and if we cannot immediately identify the faulty patch, then perform the bisection search.
- When reporting bugs, please try to Cc: the relevant maintainer and mailing list on any email.
- When reporting bugs in this kernel via email, please also rewrite the email Subject: in some manner to reflect the nature of the bug. Some developers filter by Subject: when looking for messages to read.
- Occasional snapshots of the -mm lineup are uploaded to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on the mm-commits list. These probably are at least compilable.
- More-than-daily -mm snapshots may be found at http://userweb.kernel.org/~akpm/mmotm/. These are almost certainly not compilable.
Changes since 2.6.25-rc3-mm1:
...
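The "around ten rebuilds and reboots" estimate follows from bisection halving the range of suspect patches at every step: an ordered series of N patches needs at most about log2(N) test kernels, so ten builds cover a series on the order of a thousand patches (2^10 = 1024). A minimal sketch in C, assuming the unpatched base kernel is good and the fully patched one is bad; `kernel_is_bad()` and `simulated_first_bad` are hypothetical stand-ins for a real rebuild-and-reboot test:

```c
#include <assert.h>

/* Hypothetical stand-in for "apply the first `applied` patches of the
 * series, rebuild, reboot and check for the bug". A real search replaces
 * this with an actual build/boot test; here a fixed culprit is simulated. */
static int simulated_first_bad = 1;

static int kernel_is_bad(int applied)
{
	return applied >= simulated_first_bad;
}

/* Bisect an ordered series of n patches. Precondition: the base kernel
 * (0 patches) is good and the full series (n patches) is bad. Returns the
 * 1-based number of the first bad patch; *builds counts test kernels. */
static int bisect_series(int n, int *builds)
{
	int good = 0;	/* largest prefix known to be good */
	int bad = n;	/* smallest prefix known to be bad */

	assert(n >= 1);
	*builds = 0;
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		(*builds)++;
		if (kernel_is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}
```

For example, with 1000 patches and patch 737 as the simulated culprit, bisect_series(1000, &builds) returns 737 after at most ten builds. The plain integer range is a simplification; the linked instructions describe how the search is actually carried out on an -mm tree.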
Hi Andrew,
The 2.6.25-rc5-mm1 kernel build fails with allyesconfig
LD .tmp_vmlinux1
fs/built-in.o: In function `reiser4_debugtrap':
/root/kernels/linux-2.6.25-rc5/fs/reiser4/debug.c:295: undefined reference to `breakpoint'
make: *** [.tmp_vmlinux1] Error 1
This build failure was introduced by reiser4.patch; I think breakpoint()
is being used where kgdb_breakpoint() should be.
--- linux-2.6.25-rc5/fs/reiser4/debug.c 2008-03-11 22:12:45.000000000 +0530
+++ linux-2.6.25-rc5/fs/reiser4/~debug.c 2008-03-11 23:14:54.000000000 +0530
@@ -291,8 +291,8 @@ void reiser4_debugtrap(void)
{
/* do nothing. Put break point here. */
#if defined(CONFIG_KGDB) && !defined(CONFIG_REISER4_FS_MODULE)
- extern void breakpoint(void);
- breakpoint();
+ extern void kgdb_breakpoint(void);
+ kgdb_breakpoint();
#endif
}
#endif
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
wow, kgdb is enabled again.. Thanks! Edward --
Hi Andrew,
The 2.6.25-rc5-mm1 kernel build fails with allmodconfig
MODPOST 2279 modules
ERROR: "probe_4drives" [drivers/ide/ide-core.ko] undefined!
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
On Tue, 11 Mar 2008 18:25:02 +0530
Yes, it has been doing this for a while. But apparently it doesn't happen for Bart with just his patch queue. <slightlypeeved>If your subsystem fails in my tree, that doesn't automatically make it my problem</slightlypeeved> I'll take a look and see what went wrong.
--
No need to peeve - it is not like I forgot about the problem or ignored it after I got your _first_ mail. I just couldn't reproduce it here, so instead I have been working on making _all_ probe_* variables static (it should make the problem vanish alongside some other nice improvements).
Thanks,
Bart
--
On Tue, 11 Mar 2008 18:25:02 +0530 Caused by ide-mm-ide-add-ide-4drives-host-driver-take-3.patch. Applying that patch alone to current mainline causes the above error after i386 `make allmodconfig'. Just exporting the symbol doesn't fix it, so something funny is going on. probe_4drives should not be initialised to zero. probe_4drives should not be declared extern in drivers/ide/ide.c - please declare it in a header which is included by the definition site and by all users. --
I was aware of the warnings and this was only temporary (it is already fixed by to-be-posted-today patch which removes deprecated "idex=" kernel parameters and makes _all_ probe_* variables static). Thanks, Bart --
randconfig (x86_64) with PCI=n PARAVIRT=y VSMP=n ends with
arch/x86/kernel/built-in.o: In function `is_vsmp_box':
(.text+0x1178d): undefined reference to `early_pci_allowed'
arch/x86/kernel/built-in.o: In function `is_vsmp_box':
(.text+0x117a9): undefined reference to `read_pci_config'
arch/x86/kernel/built-in.o: In function `vsmp_init':
(.init.text+0x4fcc): undefined reference to `early_pci_allowed'
arch/x86/kernel/built-in.o: In function `vsmp_init':
(.init.text+0x501a): undefined reference to `read_pci_config'
make[1]: *** [.tmp_vmlinux1] Error 1
config attached.
---
~Randy
Would anyone object to making PARAVIRT depend on PCI, since the vSMP paravirt bits depend on PCI config space to determine whether the system is vSMP? If not, this patch should suffice. Glauber?
Thanks,
Kiran
---
Make PARAVIRT depend on PCI.
vSMP PARAVIRT ops probe the pci config space to determine if the system is indeed a ScaleMP vSMP box. Hence, depend on PCI to enable PARAVIRT.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Index: linux-2.6.24/arch/x86/Kconfig
===================================================================
--- linux-2.6.24.orig/arch/x86/Kconfig	2008-03-11 16:38:26.000000000 -0700
+++ linux-2.6.24/arch/x86/Kconfig	2008-03-11 16:50:52.000000000 -0700
@@ -384,7 +384,7 @@ source "arch/x86/lguest/Kconfig"
 config PARAVIRT
 	bool "Enable paravirtualization code"
-	depends on !(X86_VISWS || X86_VOYAGER)
+	depends on !(X86_VISWS || X86_VOYAGER) && PCI
 	help
 	  This changes the kernel so it can modify itself when it is run
 	  under a hypervisor, potentially improving performance significantly
--
Works for me. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> -- ~Randy --
NAK. Xen doesn't depend on PCI at all. Why not make VSMP depend on
PCI? Then you could put something like:
#ifdef CONFIG_X86_VSMP
extern void vsmp_init(void);
extern int is_vsmp_box(void);
#else
static inline void vsmp_init(void)
{
}
static inline int is_vsmp_box(void)
{
	return 0;
}
#endif
in an appropriate header.
Hm, looks like arch/x86/kernel/Makefile should be
obj-$(CONFIG_X86_VSMP) += vsmp_64.o
rather than making it depend directly on CONFIG_PARAVIRT.
J
--
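Jeremy's suggestion is the usual config-stub pattern: when the option is off, callers see empty static inline functions, so generic code can call vsmp_init() and is_vsmp_box() unconditionally and nothing references the PCI helpers at link time. A user-space sketch in C; here CONFIG_X86_VSMP is only a preprocessor symbol standing in for the Kconfig option, and setup_platform() is a hypothetical caller:

```c
/* Stand-in for the Kconfig result: leave undefined to mimic VSMP=n.
 * #define CONFIG_X86_VSMP 1 */

#ifdef CONFIG_X86_VSMP
/* Real definitions would live in a separately built object that is
 * compiled only when the option is enabled. */
extern void vsmp_init(void);
extern int is_vsmp_box(void);
#else
/* Option off: empty stubs, so callers need no #ifdef of their own. */
static inline void vsmp_init(void)
{
}

static inline int is_vsmp_box(void)
{
	return 0;
}
#endif

/* Hypothetical generic setup code calling the helpers unconditionally. */
static int setup_platform(void)
{
	vsmp_init();
	return is_vsmp_box();
}
```

With the symbol left undefined, as in the PCI=n VSMP=n randconfig above, setup_platform() simply returns 0 and has no external references; defining CONFIG_X86_VSMP switches to the extern declarations, which then need the real vsmp object linked in, as the obj-$(CONFIG_X86_VSMP) Makefile line would arrange.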
hm, that's not a good idea - there's nothing in lguest, Xen and even KVM that is inherently tied to PCI. Ingo --
whee. Things are going much, much more smoothly now than they were in 2.6.24-rcX and 2.6.23-rcX. Tree integration problems are negligible, build errors are far fewer, and runtime problems seem to be fewer too. Fingers crossed. I guess this is due to a combination of a) linux-next b) intensive whining and c) extra care which maintainers are taking (due to a) and b)) I suspect that fewer people are testing linux-next and -mm nowadays. We should encourage them to do so, although given the general trainwreckishness of current mainline, this isn't really where our effort should be expended. --
On Tue, Mar 11, 2008 at 9:39 PM, Andrew Morton
2.6.25-rc3-mm1 worked nicely for me, but 2.6.25-rc5-mm1 does not boot.
dmesg:
[ 0.000000] Linux version 2.6.25-rc5-mm1 (root@treogen) (gcc version 4.2.3 (Gentoo 4.2.3 p1.0)) #1 SMP Wed Mar 12 19:51:41 CET 2008
[ 0.000000] Command line: earlyprintk=serial,ttyS0,115200 console=ttyS0,115200 console=tty1 crypt_root=/dev/md1 sata_nv.swncq=1
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000dffd0000 (usable)
[ 0.000000] BIOS-e820: 00000000dffd0000 - 00000000dffde000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000dffde000 - 00000000e0000000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
[ 0.000000] BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
[ 0.000000] BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
[ 0.000000] console [earlyser0] enabled
[ 0.000000] end_pfn_map = 1179648
[ 0.000000] DMI present.
[ 0.000000] ACPI: RSDP 000FB080, 0024 (r2 ACPIAM)
[ 0.000000] ACPI: XSDT DFFD0100, 0064 (r1 A_M_I_ OEMXSDT 4000713 MSFT 97)
[ 0.000000] ACPI: FACP DFFD0290, 00F4 (r3 A_M_I_ OEMFACP 4000713 MSFT 97)
[ 0.000000] ACPI: DSDT DFFD0450, 4FC5 (r1 S0027 S0027000 0 INTL 20051117)
[ 0.000000] ACPI: FACS DFFDE000, 0040
[ 0.000000] ACPI: APIC DFFD0390, 0080 (r1 A_M_I_ OEMAPIC 4000713 MSFT 97)
[ 0.000000] ACPI: MCFG DFFD0410, 003C (r1 A_M_I_ OEMMCFG 4000713 MSFT 97)
[ 0.000000] ACPI: OEMB DFFDE040, 0060 (r1 A_M_I_ AMI_OEM 4000713 MSFT 97)
[ 0.000000] ACPI: HPET DFFD5420, 0038 (r1 A_M_I_ OEMHPET0 4000713 MSFT 97)
[ 0.000000] ACPI: MCFG ...
On Wed, 12 Mar 2008 20:33:02 +0100 So you aren't using netconsole. I had a series of hangs yesterday which went away when netconsole was disabled. I think netconsole is still OK, so it looks like it died during networking initialisation. Could you please add initcall_debug to the boot command line so we can see which function it is getting stuck in? --
On Wed, Mar 12, 2008 at 8:44 PM, Andrew Morton
Yes, here is the result:
[ 2.573979] PCI-DMA: Disabling AGP.
[ 2.577639] PCI-DMA: aperture base @ 8000000 size 65536 KB
[ 2.589504] PCI-DMA: using GART IOMMU.
[ 2.593258] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
[ 2.600132] initcall pci_iommu_init+0x0/0x20() returned 0 after 19 msecs
[ 2.622146] calling hpet_late_init+0x0/0x140()
[ 2.626689] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
[ 2.633022] hpet0: 3 32-bit timers, 25000000 Hz
[ 2.638562] initcall hpet_late_init+0x0/0x140() returned 0 after 9 msecs
[ 2.654545] calling clocksource_done_booting+0x0/0x20()
[ 2.659855] initcall clocksource_done_booting+0x0/0x20()<6>Time: hpet clocksource has been installed.
[ 2.662185] returned 0 after 0 msecs
[ 2.688448] calling init_pipe_fs+0x0/0x60()
[ 2.695423] initcall init_pipe_fs+0x0/0x60() returned 0 after 0 msecs
[ 2.705784] calling init_mnt_writers+0x0/0x70()
[ 2.711681] initcall init_mnt_writers+0x0/0x70() returned 0 after 0 msecs
[ 2.721678] calling eventpoll_init+0x0/0x90()
[ 2.731644] initcall eventpoll_init+0x0/0x90() returned 0 after 0 msecs
[ 2.738295] calling anon_inode_init+0x0/0x130()
[ 2.751614] initcall anon_inode_init+0x0/0x130() returned 0 after 0 msecs
[ 2.771585] calling pcie_aspm_init+0x0/0x30()
[ 2.779297] initcall pcie_aspm_init+0x0/0x30() returned 0 after 2 msecs
[ 2.793911] calling acpi_event_init+0x0/0x52()
-> This time the system already appeared to hang here, but just pressing the 'Alt' key let it continue until the network hang. (I tried this a second time; again it paused here until I pressed a key.)
[ 94.857929] initcall acpi_event_init+0x0/0x52() returned 0 after 29276 msecs
[ 94.865002] calling pnp_system_init+0x0/0x20()
[ 94.877935] system 00:06: ioport range 0x4d0-0x4d1 has been reserved
[ 94.884286] system 00:06: ioport range 0x7b0-0x7df has been reserved
[ 94.897886] system 00:06: ...
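With initcall_debug, each initcall logs a "calling X" line before it runs and an "initcall X ... returned" line afterwards, so the function a boot is stuck in is the first one with no matching completion. A small C sketch of that check, assuming only the two message shapes visible in the log above; find_stuck_initcall() is a hypothetical helper, not a kernel tool. It matches on the "initcall X" prefix rather than the full "returned" text because interleaved printk output can push "returned" onto a later line, as happened with clocksource_done_booting above:

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan initcall_debug output (one log line per array entry, boot order)
 * and copy into `out` the first initcall that logged "calling NAME" but
 * never logged a later line containing "initcall NAME". Returns 1 if such
 * a call was found, 0 if every started initcall completed.
 */
static int find_stuck_initcall(const char *const lines[], int n,
			       char *out, size_t outlen)
{
	for (int i = 0; i < n; i++) {
		const char *p = strstr(lines[i], "calling ");
		char name[128];
		char done[160];
		size_t len;
		int finished = 0;

		if (!p)
			continue;
		p += strlen("calling ");

		/* The name runs to the next space or newline, e.g.
		 * "acpi_event_init+0x0/0x52()". */
		len = strcspn(p, " \n");
		if (len >= sizeof(name))
			len = sizeof(name) - 1;
		memcpy(name, p, len);
		name[len] = '\0';

		/* Completion lines start with "initcall NAME"; the
		 * "returned" part may end up on a later line. */
		snprintf(done, sizeof(done), "initcall %s", name);
		for (int j = i + 1; j < n && !finished; j++)
			if (strstr(lines[j], done))
				finished = 1;

		if (!finished) {
			snprintf(out, outlen, "%s", name);
			return 1;
		}
	}
	return 0;
}
```

Fed the log lines up to the 2.793911 timestamp, it reports acpi_event_init+0x0/0x52(), which matches where Torsten's boot paused until a key was pressed.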
On Wed, Mar 12, 2008 at 9:01 PM, Torsten Kaiser
CONFIG_PCIEASPM does not change anything. Also, testing the range of ipc patches you suggested to Badari did not fix it.
I did a bisect; these patches are currently remaining, but I do not have the time for more bisect steps until tomorrow:
git-scsi-misc
git-sh
execute-tasklets-in-the-same-order-they-were-queued
git-sched
sched: work around hrtick related lockup
sched: make sure jiffies is up to date before calling __update_rq_clock()
sched: fix rq->clock overflows detection with CONFIG_NO_HZ
sched: make cpu_clock() globally synchronous
sched: remove isolcpus
ftrace: make the task state char-string visible to all
sched: add latency tracer callbacks to the scheduler
latencytop: optimize LT_BACKTRACEDEPTH loops a bit
sched: cleanup old and rarely used 'debug' features.
[SCSI] zfcp: convert zfcp to use target reset and device reset handler
[SCSI] qla4xxx: Add target reset functionality
[SCSI] scsi_error: add target reset handler
[SCSI] ps3rom: Simplify fill_from_dev_buffer()
[SCSI] scsi_debug: use shost_priv macro
[SCSI] scsi_debug: remove unnecessary checking
[SCSI] scsi_debug: remove scsi_debug.h
[SCSI] scsi_debug: stop including drivers/scsi/scsi.h
[SCSI] Remove random noop unchecked_isa_dma users
[SCSI] aacraid: READ_CAPACITY_16 shouldn't trust allocation length in cdb
[SCSI] st: show options currently set in sysfs
[SCSI] st: add option to use SILI in variable block reads
[SCSI] gdth: remove command accessors
[SCSI] aic94xx: Use sas_request_addr() to provide SAS WWN if the adapter lacks one
[SCSI] libsas: Provide a transport-level facility to request SAS addrs
[SCSI] ips: sg chaining support to the path to non I/O commands
[SCSI] gdth: convert to PCI hotplug API
[SCSI] gdth: PCI probe cleanups, prep for PCI hotplug API conversion
rtc: rtc-sh: Add support for periodic IRQs.
sh: SuperH KEYSC keypad data for Solution Engine 7722
sh: SuperH KEYSC keypad data for MigoR
sh: SuperH KEYSC platform driver
Torsten
--
Yes. I found the following patch to be the culprit.
sched: make sure jiffies is up to date before calling __update_rq_clock()
Torsten, looking at your output, it looks like it hung at the same
place. Backing out this patch should help; try it out. I'm guessing
you also have CONFIG_DETECT_SOFTLOCKUP=y in your config?
commit 60befbc1c0b6d141c9c26e61ddd303aedd1e7396
Author: Guillaume Chazarain <guichaz@yahoo.fr>
Date: Mon Mar 10 08:16:41 2008 +0100
sched: make sure jiffies is up to date before calling __update_rq_clock()
Now that __update_rq_clock() uses jiffies to detect clock overflows,
make sure jiffies are up to date before touch_softlockup_watchdog().
Removed a touch_softlockup_watchdog() call becoming redundant with the
added tick_nohz_update_jiffies().
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -66,6 +66,7 @@
#include <linux/unistd.h>
#include <linux/pagemap.h>
#include <linux/hrtimer.h>
+#include <linux/tick.h>
#include <asm/tlb.h>
#include <asm/irq_regs.h>
@@ -913,7 +914,7 @@ void sched_clock_idle_wakeup_event(u64 delta_ns)
rq->prev_clock_raw = now;
rq->clock += delta_ns;
spin_unlock(&rq->lock);
- touch_softlockup_watchdog();
+ tick_nohz_update_jiffies();
}
EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
Thanks,
Badari
--
Hi, I got the following lockdep warning: (add linux-acpi to cc)
[ 0.097109] ACPI: Core revision 20070126
[ 0.097282] INFO: trying to register non-static key.
[ 0.097355] the code is fine but needs lockdep annotation.
[ 0.097428] turning off the locking correctness validator.
[ 0.097503] Pid: 0, comm: swapper Not tainted 2.6.25-rc5-mm1 #3
[ 0.097578] [<c0127bf8>] ? printk+0x18/0x20
[ 0.097716] [<c014b01c>] __lock_acquire+0x40c/0x760
[ 0.097822] [<c0181ba0>] ? alloc_debug_processing+0xb0/0x140
[ 0.097959] [<c014b969>] lock_acquire+0x79/0xb0
[ 0.098063] [<c0140204>] ? down_trylock+0x14/0x40
[ 0.098197] [<c03df9e8>] _spin_lock_irqsave+0x48/0xa0
[ 0.098303] [<c0140204>] ? down_trylock+0x14/0x40
[ 0.098436] [<c0140204>] down_trylock+0x14/0x40
[ 0.098540] [<c027c7ea>] acpi_os_wait_semaphore+0x3e/0xb9
[ 0.098647] [<c029263e>] acpi_ut_acquire_mutex+0x34/0x72
[ 0.098753] [<c0289ab1>] acpi_ns_root_initialize+0x19/0x250
[ 0.098859] [<c05453e6>] acpi_initialize_subsystem+0x42/0x64
[ 0.098966] [<c0545725>] acpi_early_init+0x50/0xef
[ 0.099070] [<c052b7f6>] start_kernel+0x1e6/0x250
[ 0.099175] [<c052b1a0>] ? unknown_bootoption+0x0/0x130
[ 0.099310] [<c052b008>] __init_begin+0x8/0x10
[ 0.099414] =======================
--
The kernel won't build if CONFIG_NO_HZ=y and CONFIG_PREEMPT_RCU=y:
$ grep -e PREEMPT -e HZ .config
CONFIG_NO_HZ=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RCU=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_DEBUG_PREEMPT=y
$
$ make
...
CC init/main.o
In file included from include/linux/rcupdate.h:60,
from include/linux/rculist.h:11,
from include/linux/dcache.h:9,
from include/linux/fs.h:279,
from include/linux/proc_fs.h:6,
from init/main.c:15:
include/linux/rcupreempt.h: In function 'rcu_enter_nohz':
include/linux/rcupreempt.h:91: error: 'HZ' undeclared (first use in this function)
include/linux/rcupreempt.h:91: error: (Each undeclared identifier is reported only once
include/linux/rcupreempt.h:91: error: for each function it appears in.)
include/linux/rcupreempt.h: In function 'rcu_exit_nohz':
include/linux/rcupreempt.h:99: error: 'HZ' undeclared (first use in this function)
make[1]: *** [init/main.o] Error 1
make: *** [init] Error 2
$
At first glance, I would suspect these patches:
add-warn_on_secs-macro.patch
use-warn_on_secs-in-rcupreempth.patch
~~
laurent
--
hm, it works OK for me, but I don't have your full config. This, I guess:
--- a/include/asm-generic/bug.h~add-warn_on_secs-macro-fix-fix
+++ a/include/asm-generic/bug.h
@@ -2,7 +2,7 @@
#define _ASM_GENERIC_BUG_H
#include <linux/compiler.h>
-
+#include <linux/param.h>
#ifdef CONFIG_BUG
_
--
Yes, it does work, thanks. But it hangs on boot, and I'm unable to get any information with the SysRq keys. I'll start a bisection to narrow down the problem. I attached my .config FWIW.
~~
laurent
-------
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.25-rc5-mm1
# Wed Mar 12 08:03:47 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not ...
Works fine here with those very same settings, except for a BUG message and a warning which I'll report separately and which don't seem to have any serious consequences. This is a 32-bit build on a rather ordinary Pentium D/Intel motherboard/openSUSE 10.3 workstation.
--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
Hi Andrew,
The 2.6.25-rc5-mm1 kernel build fails with randconfig compile
CC arch/x86/kernel/asm-offsets.s
In file included from include/asm/irqflags.h:59,
from include/linux/irqflags.h:46,
from include/asm/system.h:11,
from include/asm/processor.h:21,
from include/asm/atomic_32.h:5,
from include/asm/atomic.h:2,
from include/linux/crypto.h:20,
from arch/x86/kernel/asm-offsets_32.c:7,
from arch/x86/kernel/asm-offsets.c:2:
include/asm/paravirt.h: In function ...
--
Hi Andrew,
The 2.6.25-rc5-mm1 kernel panics while bootup on powerpc
returning from prom_init
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000000d5dc
cpu 0x0: Vector: 300 (Data Access) at [c0000000007636e0]
pc: c00000000000d5dc: .do_IRQ+0x74/0x1f4
lr: c00000000000d5a8: .do_IRQ+0x40/0x1f4
sp: c000000000763960
msr: 8000000000001032
dar: 0
dsisr: 40000000
current = 0xc000000000688e60
paca = 0xc000000000689900
pid = 0, comm = swapper
enter ? for help
[c000000000763a00] c000000000004c24 hardware_interrupt_entry+0x24/0x28
--- Exception: 501 (Hardware Interrupt) at c0000000006021b0 .free_bootmem_core+0x94/0xcc
[link register ] c00000000060373c .free_bootmem_with_active_regions+0x78/0xb8
[c000000000763cf0] c000000000602610 .init_bootmem_core+0x5c/0xfc (unreliable)
[c000000000763d80] c0000000005eb68c .do_init_bootmem+0x964/0xaf0
[c000000000763e50] c0000000005e03b0 .setup_arch+0x1a4/0x218
[c000000000763ee0] c0000000005d76bc .start_kernel+0xe8/0x424
[c000000000763f90] c000000000008590 .start_here_common+0x60/0xd0
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--
Beats me. Maybe we're still enabling interrupts too early. But the new semaphore code got fixed (didn't it?) --
On the 7th, according to my records. Easy to check -- look in kernel/semaphore.c and see whether down() is using spin_lock_irqsave (good) or spin_lock_irq (bad). -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
down() looks OK, but there's still a spin_lock_irq() in __down_common(), although I don't know if it makes sense for us to be in __down() at that stage. cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person
The spin_lock_irq in __down_common is correct. We're going to schedule(), so we spin_unlock_irq() to save us passing the flags into the helper function. If we had interrupts disabled on entry, there's an Aieee for that. -- Intel are signing my paycheques ... these opinions are still mine --
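The difference being discussed is how each lock variant treats the interrupt state on unlock: spin_unlock_irqrestore() puts back whatever state spin_lock_irqsave() found, while spin_unlock_irq() always re-enables interrupts. That is why down(), which may be reached early in boot with interrupts still off, wants the irqsave form, while a slow path that is about to call schedule() can use the _irq form, since sleeping with interrupts disabled is already a bug. A toy C model of just the flag handling; the model_* functions and the irqs_enabled variable are illustrative only, and no real locking happens:

```c
#include <stdbool.h>

/* Toy model of the local CPU's interrupt-enable flag. This is NOT the
 * kernel API: these functions only mimic how spin_lock_irq()/
 * spin_unlock_irq() and spin_lock_irqsave()/spin_unlock_irqrestore()
 * treat that flag, ignoring the lock itself. */
static bool irqs_enabled = true;

static void model_lock_irq(void)
{
	irqs_enabled = false;
}

static void model_unlock_irq(void)
{
	irqs_enabled = true;	/* unconditional: ignores the caller's state */
}

static void model_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;	/* remember what the caller had */
	irqs_enabled = false;
}

static void model_unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;	/* restore exactly what was saved */
}
```

Starting from irqs_enabled = false (early boot), a save/restore pair leaves interrupts off, whereas a lock_irq/unlock_irq pair leaves them on: the premature enable Andrew suspected.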
Hi All, Sorry for all the noise :-(. Something was wrong in the test setup on my end: the kernel was 2.6.25-rc3-mm1, not 2.6.25-rc5-mm1. This bug is not seen in the 2.6.25-rc5-mm1 kernel. -- Thanks & Regards, Kamalesh Babulal, --
Won't lockdep/irqtrace warn if that happens? You don't yet have the lockdep patches for ppc64 (I'm still trying to find out why they break iSeries), but it should warn of such a spurious IRQ enable on other archs too... At least, from a quick look at the code, it -seems- that it does have such a test. Cheers, Ben. --
Is this only on one machine ? happens all the time ? I ran into similar issues on rc3-mm1. rc5-mm1 seems to be working fine for me on ppc64. Thanks, Badari --
I am having trouble booting rc5-mm1 on my x86_64 (ppc64 boots and works fine). It seems to be a networking issue (hangs on boot). Here are the messages on the console (not really useful to me). On a good kernel (rc5), the next set of messages would be:
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
msgmni has been set to 13864 for ipc namespace ffffffff806903a0
..
Sorry for not being really useful here. But I would like to know if it's a known issue. Or should I start bisecting?
Thanks,
Badari
Linux version 2.6.25-rc5-mm1 (root@elm3b29) (gcc version 4.1.0 (SUSE Linux)) #1 SMP Wed Mar 12 12:27:14 PDT 2008
Command line: root=/dev/hda2 vga=0x314 crashkernel=64M@16M selinux=0 console=tty0 console=ttyS0,38400 resume=/dev/hda1 resume=/dev/hda1 splash=silent showopts
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000ca000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dfef0000 (usable)
BIOS-e820: 00000000dfef0000 - 00000000dfeff000 (ACPI data)
BIOS-e820: 00000000dfeff000 - 00000000dff00000 (ACPI NVS)
BIOS-e820: 00000000dff00000 - 00000000e0000000 (usable)
BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 00000001e0000000 (usable)
end_pfn_map = 1966080
DMI 2.3 present.
ACPI: RSDP 000F6970, 0024 (r2 PTLTD )
ACPI: XSDT DFEFC625, 003C (r1 PTLTD XSDT 6040000 LTP 0)
ACPI: FACP DFEFED02, 00F4 (r3 AMD HAMMER ...
Would be good, please. I guess here:
#
# ipc
#
ipc-use-ipc_buildid-directly-from-ipc_addid.patch
ipc-use-ipc_buildid-directly-from-ipc_addid-cleanup.patch
#
ipc-scale-msgmni-to-the-amount-of-lowmem.patch
ipc-scale-msgmni-to-the-number-of-ipc-namespaces.patch
ipc-define-the-slab_memory_callback-priority-as-a-constant.patch
ipc-recompute-msgmni-on-memory-add--remove.patch
ipc-invoke-the-ipcns-notifier-chain-as-a-work-item.patch
ipc-recompute-msgmni-on-ipc-namespace-creation-removal.patch
ipc-do-not-recompute-msgmni-anymore-if-explicitly-set-by-user.patch
ipc-re-enable-msgmni-automatic-recomputing-msgmni-if-set-to-negative.patch
#
ipc-semaphores-code-factorisation.patch
ipc-shared-memory-introduce-shmctl_down.patch
ipc-message-queues-introduce-msgctl_down.patch
ipc-semaphores-move-the-rwmutex-handling-inside-semctl_down.patch
ipc-semaphores-remove-one-unused-parameter-from-semctl_down.patch
ipc-get-rid-of-the-use-_setbuf-structure.patch
ipc-introduce-ipc_update_perm.patch
ipc-consolidate-all-xxxctl_down-functions.patch
ipc-consolidate-all-xxxctl_down-functions-fix.patch
would be the place to start looking.
--
Hi Andrew,
Finally narrowed down the problem to git-sched.patch in rc5-mm1. I am going to try to find which individual patch in that git tree caused my amd64 boot hang.
Peter, Ingo - here are the boot messages on the console. Any ideas? Config file attached.
Thanks,
Badari
Linux version 2.6.25-rc5 (root@elm3b29) (gcc version 4.1.0 (SUSE Linux)) #11 SMP Thu Mar 13 12:28:17 PDT 2008
Command line: root=/dev/hda2 vga=0x314 crashkernel=64M@16M selinux=0 console=tty0 console=ttyS0,38400 resume=/dev/hda1 resume=/dev/hda1 splash=silent showopts
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000ca000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dfef0000 (usable)
BIOS-e820: 00000000dfef0000 - 00000000dfeff000 (ACPI data)
BIOS-e820: 00000000dfeff000 - 00000000dff00000 (ACPI NVS)
BIOS-e820: 00000000dff00000 - 00000000e0000000 (usable)
BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 00000001e0000000 (usable)
end_pfn_map = 1966080
DMI 2.3 present.
ACPI: RSDP 000F6970, 0024 (r2 PTLTD )
ACPI: XSDT DFEFC625, 003C (r1 PTLTD XSDT 6040000 LTP 0)
ACPI: FACP DFEFED02, 00F4 (r3 AMD HAMMER 6040000 PTEC F4240)
ACPI: DSDT DFEFC661, 262D (r1 AMD-K8 AMDACPI 6040000 MSFT 100000D)
ACPI: FACS DFEFFFC0, 0040
ACPI: SRAT DFEFEDF6, 0160 (r1 AMD HAMMER 6040000 AMD 1)
ACPI: APIC DFEFEF56, 00AA (r1 PTLTD APIC 6040000 LTP 0)
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 1 -> APIC 1 -> Node 1
SRAT: PXM 2 -> APIC 2 -> Node 2
SRAT: PXM 3 -> APIC 3 -> Node 3
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 0-e0000000
SRAT: Node 0 PXM 0 0-180000000
SRAT: PXM 1 (100000000-1a0000000) overlaps with PXM 0 (0-180000000)
SRAT: SRAT not used.
Scanning NUMA ...
Further narrowed it down to following patch in git-sched.patch.
When I back out this patch from rc5-mm1, my amd64 box boots fine.
commit 60befbc1c0b6d141c9c26e61ddd303aedd1e7396
Author: Guillaume Chazarain <guichaz@yahoo.fr>
Date: Mon Mar 10 08:16:41 2008 +0100
sched: make sure jiffies is up to date before calling __update_rq_clock()
Now that __update_rq_clock() uses jiffies to detect clock overflows,
make sure jiffies are up to date before touch_softlockup_watchdog().
Removed a touch_softlockup_watchdog() call becoming redundant with the
added tick_nohz_update_jiffies().
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -66,6 +66,7 @@
#include <linux/unistd.h>
#include <linux/pagemap.h>
#include <linux/hrtimer.h>
+#include <linux/tick.h>
#include <asm/tlb.h>
#include <asm/irq_regs.h>
@@ -913,7 +914,7 @@ void sched_clock_idle_wakeup_event(u64 delta_ns)
rq->prev_clock_raw = now;
rq->clock += delta_ns;
spin_unlock(&rq->lock);
- touch_softlockup_watchdog();
+ tick_nohz_update_jiffies();
}
EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
Thanks,
Badari
--
I didn't know this patch could prevent booting, but anyway it should have been removed a long time ago: http://lkml.org/lkml/2008/1/25/408 Thanks. -- Guillaume --
I don't know what's happening either, but my debug output shows that
tick_nohz_update_jiffies() always returns early due to the following
check, without calling touch_softlockup_watchdog().
if (!ts->tick_stopped)
return;
BTW, I have CONFIG_DETECT_SOFTLOCKUP=y in my config.
Thanks,
Badari
--
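Badari's finding can be illustrated with a simplified model. Before the patch, sched_clock_idle_wakeup_event() touched the softlockup watchdog unconditionally; after it, the touch happens only inside tick_nohz_update_jiffies(), and only past the tick_stopped check, so whenever the tick is running the watchdog is never touched on this path. The C sketch below assumes that control flow, as Badari's debugging implies; the *_model names and the struct are hypothetical and omit the real jiffies update, and the model does not by itself explain why the boot hangs:

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not kernel code. */
struct tick_sched_model {
	bool tick_stopped;
};

static int watchdog_touches;	/* counts watchdog "touches" */

static void touch_softlockup_watchdog_model(void)
{
	watchdog_touches++;
}

/* Shape described above: bail out unless the tick is stopped; only
 * then (after updating jiffies, elided here) touch the watchdog. */
static void tick_nohz_update_jiffies_model(struct tick_sched_model *ts)
{
	if (!ts->tick_stopped)
		return;
	touch_softlockup_watchdog_model();
}

/* Tail of sched_clock_idle_wakeup_event() before the patch. */
static void wakeup_tail_before(struct tick_sched_model *ts)
{
	(void)ts;
	touch_softlockup_watchdog_model();
}

/* Tail of sched_clock_idle_wakeup_event() after the patch. */
static void wakeup_tail_after(struct tick_sched_model *ts)
{
	tick_nohz_update_jiffies_model(ts);
}
```

With tick_stopped false, wakeup_tail_before() touches the watchdog once while wakeup_tail_after() does not touch it at all; only with tick_stopped true do the two paths agree.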
5/2.6.25-rc5-mm1/
This still complains during startup:
<6>[ 0.063442] Checking 'hlt' instruction... OK.
<0>[ 0.068233] BUG: spinlock bad magic on CPU#0, swapper/0
<0>[ 0.068996] lock: c2c19380, .magic: 00000000, .owner: swapper/0, .owner_cpu: 0
<4>[ 0.069227] Pid: 0, comm: swapper Not tainted 2.6.25-rc5-mm1-testing #1
<4>[ 0.069369] [spin_bug+124/135] spin_bug+0x7c/0x87
<4>[ 0.069563] [_raw_spin_unlock+25/113] _raw_spin_unlock+0x19/0x71
<4>[ 0.069752] [_spin_unlock+29/60] _spin_unlock+0x1d/0x3c
<4>[ 0.069941] [mnt_want_write+98/136] mnt_want_write+0x62/0x88
<4>[ 0.070131] [sys_mkdirat+134/214] sys_mkdirat+0x86/0xd6
<4>[ 0.070322] [clean_path+22/74] ? clean_path+0x16/0x4a
<4>[ 0.070558] [kfree+216/236] ? kfree+0xd8/0xec
<4>[ 0.070793] [sys_mkdir+16/18] sys_mkdir+0x10/0x12
<4>[ 0.070995] [do_name+274/435] do_name+0x112/0x1b3
<4>[ 0.071184] [write_buffer+29/44] write_buffer+0x1d/0x2c
<4>[ 0.071371] [flush_window+100/179] flush_window+0x64/0xb3
<4>[ 0.071558] [unpack_to_rootfs+1580/2233] unpack_to_rootfs+0x62c/0x8b9
<4>[ 0.071747] [populate_rootfs+32/265] populate_rootfs+0x20/0x109
<4>[ 0.071995] [alternative_instructions+339/344] ? alternative_instructions+0x153/0x158
<4>[ 0.072235] [start_kernel+835/853] start_kernel+0x343/0x355
<4>[ 0.072422] [i386_start_kernel+8/10] i386_start_kernel+0x8/0xa
<4>[ 0.072610] =======================
<6>[ 0.072808] Unpacking initramfs... done
System comes up fine, though. Not sure whom to CC. The machine is a dual-core Pentium D running a 32-bit kernel. Let me know if you want me to provide more information or test anything.
HTH
T.
--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
On Thu, 13 Mar 2008 00:54:43 +0100 I thought we already fixed this, actually. Maybe we just talked about it a bit? --
I'm really confused by this one. It looks to me like the initcalls got all out of whack in their ordering. There's no way in hell that the populate_rootfs() call should be happening right next to cpu If you can send me your vmlinux (not vmlinuz), I'll see how the initcalls are laid out in it. What distro and compiler are you on? -- Dave --
Hi Tim, Could you send me your full dmesg along with your kernel .config? I think this is an ordering issue in bootup, but I'd like to be sure. Bonus points if I can also have your initrd. :) -- Dave --
Dave, Ok, you asked for it. Find the lot at: http://gollum.phnxsoft.com/~ts/linux/ I guess you know what to expect, sizewise. :-) openSUSE 10.3 and the toolchain it brought along, including GCC 4.2.1.
HTH
T.
--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits. Unopened, best before: (see reverse side)
Tim, thanks for the excellent debugging info. It's making this much easier. I actually booted your vmlinux in a kvm image and I got the same error. However, it is in a *completely* bogus place: certainly before initcalls get run and before the lock you saw the BUG_ON() for got initialized. I'm going to go try and find a gcc-4.2 and compile on that. Andrew, I don't think this is an actual bug in the r/o bind mount code, but a random, way early call to populate_rootfs(), somehow. I'll keep looking into it, though. Notice in the dmesg that this all happens even before the first initcall is made? It isn't the SMP alternatives freeing, at least. I booted this on SMP, too, with the same results.
[ 0.074025] Intel machine check architecture supported.
[ 0.075013] Intel machine check reporting enabled on CPU#0.
[ 0.076010] Compat vDSO mapped to ffffe000.
[ 0.078030] Checking 'hlt' instruction... OK.
[ 0.083987] SMP alternatives: switching to UP code
[ 0.084023] Freeing SMP alternatives: 9k freed
[ 0.086006] BUG: spinlock bad magic on CPU#0, swapper/0
[ 0.087002] lock: c1754380, .magic: 00000000, .owner: swapper/0, .owner_cpu: 0
[ 0.088003] Pid: 0, comm: swapper Not tainted 2.6.25-rc5-mm1-testing #2
[ 0.089001] [<c01f728c>] spin_bug+0x7c/0x87
[ 0.091002] [<c01f72b0>] _raw_spin_unlock+0x19/0x71
[ 0.093001] [<c0301922>] _spin_unlock+0x1d/0x3c
[ 0.095001] [<c01981aa>] mnt_want_write+0x62/0x88
[ 0.097000] [<c018c382>] sys_mkdirat+0x86/0xd6
[ 0.098695] [<c04260ab>] ? clean_path+0x16/0x4a
[ 0.100000] [<c017fd6f>] ? kfree+0xd8/0xec
[ 0.101999] [<c018c3e2>] sys_mkdir+0x10/0x12
[ 0.103999] [<c0426353>] do_name+0x112/0x1b3
[ 0.104999] [<c042558b>] write_buffer+0x1d/0x2c
[ 0.106999] [<c04255fe>] flush_window+0x64/0xb3
[ 0.108998] [<c04272f5>] unpack_to_rootfs+0x62c/0x8b9
[ 0.111000] [<c0127d76>] ? printk+0x15/0x17
[ 0.112671] [<c0118982>] ? free_init_pages+0x82/0x8d
[ 0.113998] [<c04275a2>] ...
For those of you new to this thread, here's the initial report: http://marc.info/?t=120536629300001&r=1&w=2 I'm pretty sure the root cause of this bug is this commit:
ACPI: basic initramfs DSDT override support
71fc47a9adf8ee89e5c96a47222915c5485ac437
Which did this hunk:
@@ -648,6 +654,7 @@ asmlinkage void __init start_kernel(void)
 	check_bugs();
+	populate_rootfs(); /* For DSDT override from initramfs */
 	acpi_early_init(); /* before LAPIC and SMP init */
 	/* Do the rest non-__init'ed, we're now alive */
 	rest_init();
...
Well, the fs initcalls aren't actually done until during rest_init(), including initializing my mnt_writer[] spinlocks. I guess I could statically initialize them, but that's not the root of the problem, it's just the canary in the coal mine. I think the populate_rootfs() call is completely bogus and certainly can't be done before the initcalls. But, I don't immediately have any better suggestions for you. Can you delay the ACPI init until after the fs initcalls are made? -- Dave --
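To make the failure mode concrete, here is a user-space model (not kernel code) of the ordering problem Dave describes: the mnt_writer locks sit in zero-filled storage and only receive a valid magic value when their fs_initcall runs, so anything that takes them earlier, such as populate_rootfs() called straight from start_kernel(), trips the debug check. Every `model_` name and the magic constant are hypothetical stand-ins, not kernel APIs.

```c
#include <stdio.h>

/* Hypothetical magic value; the kernel's debug spinlocks use their own. */
#define MODEL_LOCK_MAGIC 0xdead4eadu

struct model_lock {
    unsigned int magic;  /* only set by model_init_mnt_writers() */
    int locked;
};

/* Zero-filled, like static or per-CPU storage before any init code runs. */
static struct model_lock mnt_writer_lock_model;

/* Stands in for init_mnt_writers(), normally run as an fs_initcall. */
void model_init_mnt_writers(void)
{
    mnt_writer_lock_model.magic = MODEL_LOCK_MAGIC;
    mnt_writer_lock_model.locked = 0;
}

/* Returns 0 on success, -1 if the lock was never initialized
 * (the condition the kernel reports as "spinlock bad magic"). */
int model_lock_and_unlock(struct model_lock *lock)
{
    if (lock->magic != MODEL_LOCK_MAGIC) {
        fprintf(stderr, "BUG: lock bad magic: .magic: %08x\n", lock->magic);
        return -1;
    }
    lock->locked = 1;   /* the critical section would go here */
    lock->locked = 0;
    return 0;
}

/* Stands in for populate_rootfs() -> sys_mkdir() -> mnt_want_write(). */
int model_populate_rootfs(void)
{
    return model_lock_and_unlock(&mnt_writer_lock_model);
}
```

Calling `model_populate_rootfs()` before `model_init_mnt_writers()` reproduces the "bad magic" report; calling it afterwards succeeds, which is all that moving the init earlier (or initializing statically) buys.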
Time to just revert that one? It caused some other issues too, iirc. Len? Linus --
Hi, I have made a patch to fix problems with regards to early userspace calls (http://lkml.org/lkml/2008/2/23/306) but I don't think it will solve this bug. So far I had not heard of problems with filesystem initialization. I'm not sure it would be possible to delay acpi_early_init() until after the fs initcalls. Maybe Len knows. How about trying the opposite: what is the bare minimum to initialize so that the rootfs can be populated and read? Would it be possible to have a kind of early_mnt_writer_initialize() that would do that? See you, Eric --
I *can* probably do it earlier, maybe even statically, but I think you're missing the point a bit here. We've just been super lucky so far that populate_rootfs() doesn't depend on any other initcalls (or at least BUG_ON() because of them). There may be some more buglets hiding around. It'd be a shame to have to have "super_early_fs_initcall()" logic for every part of the VFS or any other initcall for that matter that you might need. How do we tell all future VFS hackers that they have to do this so that the next guy doesn't break it? I certainly missed it. :) We could separate out the initcalls and just have the fs ones run before the rest do. But, I'm not sure what interactions *THAT* might have. There are arch-specific initcalls, and I have no idea if the fs init code depends on *those*. That's a lot of code to check. The patch itself nails it when it says:
+	/*
+	 * Never do this at home, only the user-space is allowed to open a file.
+	 * The clean way would be to use the firmware loader. But this code must be run
+	 * before there is any userspace available. So we need a static/init firmware
+	 * infrastructure, which doesn't exist yet...
+	 */
I think requiring FS access this early in the boot process is just broken. It seems like the author of the patch knew a better way and tried to get away with a hack. I think it backfired. :) -- Dave --
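Dave's worry about running "just the fs ones" early can be illustrated with a simplified model of ordered initcall levels. This sketch assumes a reduced level list (the kernel's real list has more levels and is built from linker sections, not an array), and `model_run_initcalls_upto()` is a hypothetical helper. Because levels are strictly ordered, bringing the fs level forward implicitly drags every lower level, including the arch level, along with it.

```c
#include <stddef.h>

/* Reduced, hypothetical model of ordered initcall levels. */
enum model_level { CORE, POSTCORE, ARCH, SUBSYS, FS, DEVICE, LATE };

typedef int (*model_initcall_t)(void);

struct model_initcall {
    enum model_level level;
    model_initcall_t fn;
};

/* Runs every not-yet-run initcall whose level is <= max_level,
 * level by level, in table order within a level.
 * Returns the number of initcalls invoked by this call. */
int model_run_initcalls_upto(struct model_initcall *calls, int *done,
                             size_t n, enum model_level max_level)
{
    int ran = 0;

    for (int lvl = CORE; lvl <= (int)max_level; lvl++) {
        for (size_t i = 0; i < n; i++) {
            if (!done[i] && (int)calls[i].level == lvl) {
                calls[i].fn();   /* return value ignored in this model */
                done[i] = 1;
                ran++;
            }
        }
    }
    return ran;
}
```

With a mixed table, `model_run_initcalls_upto(calls, done, n, FS)` runs arch and subsys entries before any fs entry; a later call with `LATE` finishes the rest without re-running anything.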
Actually, each time I look at init/main.c I feel like we are super lucky Well, my point was that actually populate_rootfs() does _very_ little with regard to FS manipulation, acpi_find_dsdt_initrd() even less. The task of checking that everything needed is available beforehand is certainly not of the same magnitude as the labour of the Danaides, as you seemed to imply ;-) The fact is, this patch has been tested a lot, because it's been used by several distributions for a long time. I expect that the only potential I'm actually the author of this comment... The static/init firmware infrastructure that I mentioned was more just about a way to hide the fs access in a special part of the kernel, not avoiding it. We used to have a different way but it was even uglier: append the DSDT after the initramfs, and then access it _directly_. This implies teaching populate_rootfs() not to panic when seeing DSDTs, and losing the benefit of the compression. That said, I'm really not against a completely different approach. All that is needed is being able to read a file early at boot (the DSDT) without having to recompile the kernel each time the file is modified. For instance, someone once mentioned modifying the in-kernel DSDT by unlinking and relinking the bzImage. If someone can show me how to do that, I'd be happy to implement it... Eric --
The problem is defining how much "very little" is, and making sure that
all the other kernel developers agree with you on it.
Anyway, I'm sick of too much bitching and too little coding. Andrew,
here's a patch for -mm that will at least shut up the spinlock warnings.
Al, you'll also need something similar to this for when you get Linus to
pull your git tree that has the r/o bind mount patches.
It's a hack, but I don't know any better way to do it until the ACPI
mess gets cleaned up.
Arjan, is there a way to statically set lockdep classes for a spinlock
that I'm missing?
I'll leave it to everyone else to describe the evils of calling into
*any* fs code before the fs initcalls have been made.
-- Dave
I'm not happy with this patch, but I don't see an easier way
to do it. We can't statically initialize the lockdep classes
as far as I can see.
---
linux-2.6.git-dave/fs/namespace.c | 3 +--
linux-2.6.git-dave/include/linux/mount.h | 1 +
linux-2.6.git-dave/init/main.c | 8 ++++++++
3 files changed, 10 insertions(+), 2 deletions(-)
diff -puN fs/namei.c~robind-statically-initialize-locks fs/namei.c
diff -puN fs/namespace.c~robind-statically-initialize-locks fs/namespace.c
--- linux-2.6.git/fs/namespace.c~robind-statically-initialize-locks 2008-03-14 16:12:44.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c 2008-03-14 16:16:43.000000000 -0700
@@ -158,7 +158,7 @@ struct mnt_writer {
} ____cacheline_aligned_in_smp;
static DEFINE_PER_CPU(struct mnt_writer, mnt_writers);
-static int __init init_mnt_writers(void)
+int __init init_mnt_writers(void)
{
int cpu;
for_each_possible_cpu(cpu) {
@@ -169,7 +169,6 @@ static int __init init_mnt_writers(void)
}
return 0;
}
-fs_initcall(init_mnt_writers);
static void unlock_mnt_writers(void)
{
diff -puN init/main.c~robind-statically-initialize-locks init/main.c
--- linux-2.6.git/init/main.c~robind-statically-initialize-locks 2008-03-14 16:13:02.000000000 -0700
+++ ...
Sorry to say, it doesn't. That is, it does shut up the warning I reported, but there's a new one appearing now instead, three lines later. Here's the dmesg diff:
@@ -216,29 +216,30 @@ CPU0: Thermal monitoring enabled
 Compat vDSO mapped to ffffe000.
 Checking 'hlt' instruction... OK.
-BUG: spinlock bad magic on CPU#0, swapper/0
- lock: c2c19380, .magic: 00000000, .owner: swapper/0, .owner_cpu: 0
-Pid: 0, comm: swapper Not tainted 2.6.25-rc5-mm1-testing #2
- [<c01f728c>] spin_bug+0x7c/0x87
- [<c01f72b0>] _raw_spin_unlock+0x19/0x71
- [<c0301922>] _spin_unlock+0x1d/0x3c
- [<c01981aa>] mnt_want_write+0x62/0x88
- [<c018c382>] sys_mkdirat+0x86/0xd6
- [<c04260ab>] ? clean_path+0x16/0x4a
- [<c017fd6f>] ? kfree+0xd8/0xec
- [<c018c3e2>] sys_mkdir+0x10/0x12
- [<c0426353>] do_name+0x112/0x1b3
- [<c042558b>] write_buffer+0x1d/0x2c
- [<c04255fe>] flush_window+0x64/0xb3
- [<c04272f5>] unpack_to_rootfs+0x62c/0x8b9
- [<c04275a2>] populate_rootfs+0x20/0x109
- [<c0429ed2>] ? alternative_instructions+0x153/0x158
- [<c04248f5>] start_kernel+0x343/0x355
- [<c0424019>] i386_start_kernel+0x8/0xa
- =======================
 Unpacking initramfs... done
-Freeing initrd memory: 8767k freed
+Freeing initrd memory: 8834k freed
 ACPI: Core revision 20070126
+INFO: trying to register non-static key.
+the code is fine but needs lockdep annotation.
+turning off the locking correctness validator.
+Pid: 0, comm: swapper Not tainted 2.6.25-rc5-mm1-testing #3
+ [<c014321e>] __lock_acquire+0x144/0xb6e
+ [<c010b1a2>] ? native_sched_clock+0xe0/0xff
+ [<c017fc57>] ? kmem_cache_alloc+0x89/0xc9
+ [<c0142ce0>] ? trace_hardirqs_on+0xe8/0x11d
+ [<c014404f>] lock_acquire+0x6a/0x90
+ [<c013b460>] ? down_trylock+0xc/0x27
+ [<c03016cb>] _spin_lock_irqsave+0x42/0x72
+ [<c013b460>] ? down_trylock+0xc/0x27
+ [<c013b460>] down_trylock+0xc/0x27
+ [<c021fa65>] acpi_os_wait_semaphore+0x67/0x13d
+ [<c023a39e>] acpi_ut_acquire_mutex+0x65/0xcf
+ [<c0230261>] ...
I've reverted the whole thing. Or rather, since there were various small fixup commits over time, and a simple revert doesn't really work, I ended up just removing the option and the code that was conditional on it - that way, if we really want to fight this out some time (after 2.6.25 is out) or some vendor wants to use a known-broken option anyway, there's a simple and fairly clean commit to revert the revert. It's commit 9a9e0d685553af76cb6ae2af93cca4913e7fcd47, see http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9a9e0d... for details if you aren't a git person. But quite frankly I don't think that we even want to re-introduce this in that form. If we really want to have a dynamic custom DSDT, I think we should do the whole DSDT replacement *much* later by ACPI (like just before driver loading or something like that). If the BIOS-provided DSDT is _so_ broken that we cannot even get core stuff like the CPU's going, I think it has more serious issues than any custom DSDT will ever fix, but letting ACPI actually switch DSDT's at run-time (instead of just replacing it when looking for it very very early in the boot sequence) in order to work around some device issues sounds reasonably sane. So how about aiming to make that DSDT-replacement something you can do from any kernel module, _after_ the original DSDT has already been parsed? And then the whole "load it from initrd" turns into a regular thing that we can do pretty early, but that we don't have to do quite _this_ early! Linus --
So that avoids the VFS layer issues, but it's still strictly much worse than just having a run-time loading. What's the problem with just loading a new DSDT later? Potentially as in *much* later: including when user-space is all up-and-running? For things like DVD install images, you'd quite possibly want to have a few known-workaround DSDT images with the installer, and just say "ok, we want to fix up this ACPI crap in order to get working suspend/resume" kind of thing. So what's the reason for pushing for this insanely-early workaround in the first place, instead of letting user-space do something like cat my-dsdt-image > /proc/sys/acpi/DSDT or whatever at runtime? Linus --
Yeah, or probably more something like this nowadays ;-)
cat my-dsdt-image > /sys/firmware/acpi/tables/DSDT
As I said in my previous email, I'm already convinced that a late override of ACPI tables would be very interesting to investigate. However, this cannot be taken lightly. A _lot_ of places in the kernel depend on ACPI, and nothing has ever been done in the direction of dynamically modifying the ACPI tables. The implementation is likely to be much bigger than the current 100-line patch. That said, it should be possible to make some assumptions without restricting the functionality much, such as:
* every object present in the original table is still present in the new table
* they keep the same name
Len, do you think it would be feasible? How do you think the implementation could be done? Eric --
I agree with Linus' decision to revert/disable this feature. I think it is appropriate to muck with this in -mm, but not in -rc6. I don't think re-loading the DSDT at run-time would be practical. First, booting with the OEM DSDT may nullify the benefit of overriding the OEM DSDT -- the damage may have already been done. Secondly, unwinding everything that depends on the DSDT is on the order of kexec or suspend/resume. We're talking about all the stuff that PNP does at boot time, plus device discovery and driver binding. The feature on the table here is an initrd DSDT override. We already have the ability to statically compile a DSDT override into the kernel image. That capability is sufficient for kernel developers. The initrd version of the DSDT override is really for one scenario: somebody who has a BIOS that even Windows can't deal with -- so no amount of "Windows bug compatibility" will help Linux with it. They must be capable enough to generate or acquire a modified DSDT. They must be unwilling/unable to re-build their kernel from scratch each time they update it, e.g. following debian unstable updates etc. I think that customer deserves support, particularly because they get bragging rights that Linux works better on a box built for Windows than Windows does :-) However, I don't think there are enough customers like this to justify a huge effort that would add risk to Linux. -Len --
For a Linux distro to ship DSDT override images, they'd have to have some licensing & support arrangement with the OEM who actually owns that BIOS code. While this wouldn't defy any laws of physics, it doesn't look compatible with current industry business practices. OEMs are more likely to simply ship a BIOS update ISO. -Len --
You have interpreted code running (AML), and you want to replace it with different code? Akin to changing from one kernel to a different one during runtime? Yes, I guess it might work for very simple changes, but if you need to change data structures between the original and modified DSDT, you are in for big trouble, right? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Heh. That gave me an idea. Can we use kexec for this? Let's say you get as far in boot as the initrd and realize that you're running on one of these screwed up systems. Can you stick the new DSDT somewhere known (and safe) in memory, and kexec yourself back to the beginning of the kernel boot? When you boot up the second time, you have the new, shiny DSDT there which is, of course, used instead of the bogus BIOS one. It costs you some bootup time, but we're talking about working around really busted hardware here. -- Dave --
Hmmm. I guess we should turn off acpi mode, kexec, turn on acpi mode with new dsdt. Turning off acpi is not exactly easy, but specs describe how to do it... So yes, this is hard but doable. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Why do you think it's necessary to turn off ACPI mode? What will not work if we keep it on all the time? BTW, let me summarize my understanding of the kexec approach:
* userspace writes the new DSDT (cat my-dsdt-image > /sys/firmware/acpi/tables/DSDT)
* the kernel doesn't use this DSDT directly but keeps it somewhere warm and fuzzy in RAM
* userspace does a kexec
* the new kernel boots and at some (early) point, dsdt_override() is called. It detects that the special place in RAM for a new DSDT is used, and provides this pointer to ACPI as the new place to read the DSDT.
Dave, am I correctly understanding the scenario you had in mind? I have practically no knowledge of kexec. Is there a documented way to pass a big chunk of data from one kernel to another? How can I do that? Eric --
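Whatever transport is eventually used, the second kernel in Eric's scenario has to decide whether the "special place in RAM" really holds a DSDT before handing it to ACPI. A minimal sanity check, assuming the blob begins with the standard 36-byte ACPI table header (4-byte signature, little-endian 32-bit length at offset 4, revision at offset 8, checksum at offset 9), might look like this. `check_dsdt_blob()` is a hypothetical helper, and how the blob gets into memory is left open.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACPI_HEADER_LEN 36  /* size of the standard ACPI table header */

/* Returns 0 if buf holds a plausible DSDT no longer than buf_len,
 * -1 otherwise. Checks the signature, the declared length, and the
 * ACPI rule that all bytes of a table sum to zero (mod 256). */
int check_dsdt_blob(const uint8_t *buf, size_t buf_len)
{
    uint32_t len;
    uint8_t sum = 0;

    if (buf_len < ACPI_HEADER_LEN || memcmp(buf, "DSDT", 4) != 0)
        return -1;

    /* Length field: bytes 4..7, little-endian, covers header + body. */
    len = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
          ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    if (len < ACPI_HEADER_LEN || len > buf_len)
        return -1;

    for (uint32_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum == 0 ? 0 : -1;
}
```

A check like this only guards against a stale or corrupted handoff area; it says nothing about whether the AML inside is any better than the BIOS version.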
Yes, and now the ACPI layer tries to enable already-enabled ACPI... which is a no-no according to the spec, but you may be able to get away with it. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html help save the Klánovice forest: http://www.ujezdskystrom.info/ --
Yeah, that's basically what I was thinking. But, this is only for a case where we can't do the real runtime replacement that Linus has been advocating. That approach is clearly superior, but I would imagine that it'll require some serious ACPI Heh. Documented, no. What OS do you think this is? ;) I'm not sure it has ever really been needed before. At one point, kexec just made a copy of the e820 table to tell the new kernel where its RAM was. If you carved out a chunk of memory and marked it as reserved, the new kernel could go looking there. kexec is Eric Biederman's (on cc) baby, and he might have some more concrete suggestions for you. -- Dave --
I see a problem here. This could work. And if it is successful, the "kexec reboot around busted hw"-trick is used for other stuff as well. So your broken machine reboots with some fix, then it reboots with the custom DSDT. Is the previous fix preserved? Then a third problem is hit, another kexec reboot. Is the first fix _and_ the custom DSDT preserved on this reboot? Or do we get an infinite sequence of reboots, alternating between a couple of completely unrelated fixes for bad hw/bios... Once there is more than one fix utilizing this trick, some "protocol" for managing a string of kexec fixes might become necessary. Helge Hafting --
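Helge's "protocol" could be as simple as a list of tagged entries that each kexec'd kernel passes on and only updates by tag, so fixes accumulate instead of alternating. The record layout, tag size and `fix_list_put()` below are purely hypothetical, for illustration only.

```c
#include <string.h>

#define MAX_FIXES 8
#define TAG_LEN 8

/* Hypothetical record handed from one kexec'd kernel to the next. */
struct fix_entry {
    char tag[TAG_LEN];          /* e.g. "DSDT", NUL-padded */
    unsigned long phys_addr;    /* where the payload was preserved */
    unsigned long len;
};

struct fix_list {
    int count;
    struct fix_entry entries[MAX_FIXES];
};

/* Adds a new entry or replaces the one with the same tag, so every
 * reboot carries all earlier fixes forward instead of dropping them.
 * Returns the entry index, or -1 if the list is full or the tag is
 * too long. */
int fix_list_put(struct fix_list *fl, const char *tag,
                 unsigned long phys_addr, unsigned long len)
{
    size_t tag_len = strlen(tag);
    int i;

    if (tag_len >= TAG_LEN)
        return -1;
    for (i = 0; i < fl->count; i++)
        if (strncmp(fl->entries[i].tag, tag, TAG_LEN) == 0)
            break;
    if (i == fl->count) {
        if (fl->count == MAX_FIXES)
            return -1;
        fl->count++;
    }
    memset(fl->entries[i].tag, 0, TAG_LEN);
    memcpy(fl->entries[i].tag, tag, tag_len);
    fl->entries[i].phys_addr = phys_addr;
    fl->entries[i].len = len;
    return i;
}
```

Replacing by tag rather than appending is the design choice that avoids the alternating-fixes loop: a second DSDT override updates the existing slot and leaves the unrelated fix in place.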
I recommend that you make a new proposal for 2.6.26 that applies on top of Linus' top-of-tree and that we include lkml in hashing it out rather than just linux-acpi. thanks, -Len --
Hi Tim, Again, thanks for the excellent bug reporting. This is actually a different problem (and not my code again, thank goodness). I think a few of these got fixed in current -mm. According So, this looks like an on-stack ACPI structure that got initialized wrongly. At least we already have those dudes on the cc. :) But, this might also get fixed by reverting the patch as Linus just did. It might just be best to wait for another -mm release and see how it settles out. -- Dave --
Actually looks like the semaphore thing again, it's a spinlock inside of Looks like another of the semaphore thingies.. Does this go away once you apply the semaphore lockdep fixup from here: http://lkml.org/lkml/2008/3/12/63 --
Yes, it does. With that patch on top of Dave's, I see no stack backtraces in dmesg anymore.
HTH
T.
--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits. Unopened, best before: (see reverse side)
DSDT's are generally 4KB to 64KB, so I don't think compression for a DSDT override is important. -Len --
5/2.6.25-rc5-mm1/ Late during boot, this issues the following warning on my Pentium D, apparently when trying to load an appropriate CPU frequency driver:
[ 56.759128] ------------[ cut here ]------------
[ 56.765058] WARNING: at drivers/base/sys.c:173 sysdev_driver_register+0x34/0xce()
[ 56.776027] Modules linked in: acpi_cpufreq(+) speedstep_lib ip6table_filter ip6_tables x_tables ipv6 microcode firmware_class loop osst st sr_mod cdrom pata_acpi bas_gigaset snd_hda_intel gigaset isdn snd_pcm ata_generic<6>ip_tables: (C) 2000-2006 Netfilter Core Team
[ 56.785293] snd_timer aic7xxx slhc snd ohci1394 rtc_cmos ieee1394 shpchp crc_ccitt iTCO_wdt e1000e rtc_core iTCO_vendor_support soundcore scsi_transport_spi watchdog_core pci_hotplug intel_agp button thermal agpgart rtc_lib processor i2c_i801 watchdog_dev parport_pc i2c_core snd_page_alloc parport pata_marvell sg ext3 jbd mbcache linear sd_mod usbhid hid ff_memless ahci libata scsi_mod ehci_hcd uhci_hcd usbcore dm_snapshot dm_mod
[ 56.805358] Pid: 2856, comm: modprobe Not tainted 2.6.25-rc5-mm1-testing #1
[ 56.810766] [<c01272e9>] warn_on_slowpath+0x41/0x6d
[ 56.820628] [<c0230065>] ? acpi_ns_lookup+0x2b5/0x497
[ 56.830455] [<c0230e25>] ? acpi_evaluate_object+0x23e/0x249
[ 56.840414] [<c02ff809>] ? mutex_unlock+0x8/0xa
[ 56.848380] [<fa9fec1d>] ? acpi_processor_preregister_performance+0x4e6/0x4f1 [processor]
[ 56.858297] [<c0286438>] ? cpufreq_register_driver+0x42/0xfc
[ 56.868263] [<c026423d>] sysdev_driver_register+0x34/0xce
[ 56.877974] [<c0286476>] cpufreq_register_driver+0x80/0xfc
[ 56.887327] [<facde034>] acpi_cpufreq_init+0x34/0x3a [acpi_cpufreq]
[ 56.897290] [<c014ad7a>] sys_init_module+0x1816/0x1943
[ 56.907304] [<facb5000>] ? icmp_checkentry+0x0/0x14 [ip_tables]
[ 56.917255] [<c0183cd2>] ? sys_read+0x3b/0x60
[ 56.925094] [<c0106aec>] sysenter_past_esp+0x6d/0xc5
[ 56.935071] ...
This implies that a cpufreq module is getting registered twice in the sysdev code :( thanks, greg k-h --
On Thu, Mar 13, 2008 at 11:34:39AM -0700, Greg KH wrote:
> On Thu, Mar 13, 2008 at 01:15:52AM +0100, Tilman Schmidt wrote:
> > On 11.03.2008 09:14, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm1/
> >
> > Late during boot, this issues the following warning on my Pentium D,
> > apparently when trying to load an appropriate CPU frequency driver:
> > ...
On Thu, Mar 13, 2008 at 01:15:52AM +0100, Tilman Schmidt wrote:
> On 11.03.2008 09:14, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm1/
>
> Late during boot, this issues the following warning on my Pentium D,
> apparently when trying to load an appropriate CPU frequency driver:
> ...
Sure, that would be simple to do. Will change it now, and should show up in the next -mm. thanks, greg k-h --
> Full dmesg please, with CPU_FREQ_DEBUG=y, and boot with cpufreq.debug=7
You can find it at http://gollum.phnxsoft.com/~ts/linux/dmesg.out and the corresponding .config right beside it at http://gollum.phnxsoft.com/~ts/linux/config-2.6.25-rc5-mm1 CCing linux-acpi as you did in your other mail.
HTH
T.
--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits. Unopened, best before: (see reverse side)
On Fri, Mar 14, 2008 at 01:01:18AM +0100, Tilman Schmidt wrote:
> > Full dmesg please, with CPU_FREQ_DEBUG=y, and boot with cpufreq.debug=7
>
> You can find it at
> http://gollum.phnxsoft.com/~ts/linux/dmesg.out
> and the corresponding .config right beside it at
> http://gollum.phnxsoft.com/~ts/linux/config-2.6.25-rc5-mm1
>
> CCing linux-acpi as you did in your other mail.
The interesting bits..
[ 46.075145] cpufreq-core: trying to register driver centrino
here we've done sysdev_driver_register(&cpu_sysdev_class,&cpufreq_sysdev_driver); (see cpufreq_register_driver in drivers/cpufreq/cpufreq.c) This is the only place we register sysdev entries.
[ 46.075155] cpufreq-core: adding CPU 0
[ 46.075163] speedstep-centrino: found unsupported CPU with Enhanced SpeedStep: send /proc/cpuinfo to cpufreq@lists.linux.org.uk
[ 46.075167] cpufreq-core: initialization failed
this ENODEVs
[ 46.075173] cpufreq-core: adding CPU 1
[ 46.075176] cpufreq-core: initialization failed
Same for the 2nd CPU.
[ 46.075180] cpufreq-core: no CPU initialized for driver centrino
here we hit this part of cpufreq_register_driver
	/* if all ->init() calls failed, unregister */
	if (ret) {
		dprintk("no CPU initialized for driver %s\n", driver_data->name);
		sysdev_driver_unregister(&cpu_sysdev_class, &cpufreq_sysdev_driver);
So we release all the refs.
[ 46.075185] cpufreq-core: unregistering CPU 0
[ 46.075190] cpufreq-core: unregistering CPU 1
These are the sysdev callbacks.
[ 46.429147] powernow: This module only works with AMD K7 CPUs
[ 47.081642] speedstep-lib: x86: f, model: 6
[ 47.081649] speedstep-ich: Intel(R) SpeedStep(TM) capable processor not found
These drivers don't even get as far as calling cpufreq_register_driver, they ENODEV way before things get ...
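The registration flow walked through above (register the sysdev driver, try `->init()` on every CPU, unregister again if none succeeded) can be modelled in a few lines. This simplified sketch uses hypothetical `model_` names rather than the real cpufreq or sysdev APIs. In it, a balanced unregister leaves nothing behind for the next driver (here, acpi-cpufreq after centrino) to trip over, and a "registered twice" warning like the one Greg mentions only fires if a registration is left in place. It does not explain the actual warning, whose root cause is not established in this part of the thread.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a sysdev driver registration slot. */
struct model_sysdev_slot {
    int registered;
};

int model_sysdev_register(struct model_sysdev_slot *s)
{
    if (s->registered) {
        fprintf(stderr, "WARNING: driver registered twice\n");
        return -EEXIST;
    }
    s->registered = 1;
    return 0;
}

void model_sysdev_unregister(struct model_sysdev_slot *s)
{
    s->registered = 0;
}

/* Register, try cpu_init() on every CPU, and undo the registration
 * if no CPU could be initialized. Returns 0 or a negative errno. */
int model_cpufreq_register(struct model_sysdev_slot *s,
                           int (*cpu_init)(int cpu), int nr_cpus)
{
    int ret = model_sysdev_register(s);
    int ok = 0;

    if (ret)
        return ret;
    for (int cpu = 0; cpu < nr_cpus; cpu++)
        if (cpu_init(cpu) == 0)
            ok++;
    if (!ok) {
        model_sysdev_unregister(s);   /* "if all ->init() calls failed" */
        return -ENODEV;
    }
    return 0;
}
```

A failing driver followed by a working one registers cleanly; only a second registration while the first is still in place produces the warning.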
Please set CONFIG_ACPI_DEBUG and boot the system with the options "acpi.debug_layer=0x01010000 acpi.debug_level=0x1f". It would be great if the acpidump output were attached as well. --
CONFIG_ACPI_DEBUG is already set, but I cannot reboot the machine Available now at http://gollum.phnxsoft.com/~ts/linux/acpidump.out
HTH
T.
--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits. Unopened, best before: (see reverse side)
Ok, that took a bit longer than I hoped, but the result is now finally available at: http://gollum.phnxsoft.com/~ts/linux/dmesg-acpidebug.out Note that I doctored this a bit: the dmesg buffer had already overflowed by the time I ran the dmesg command, so I manually prepended the missing part from the file /var/log/boot.msg into which SUSE saves the early kernel messages. The border between the two is marked off by the string "~~~~~~~~splice~~~~~~~~", and I left a line of overlap to make it very clear. The output of acpidump is unchanged wrt what I already posted. (Unsurprisingly, but nevertheless I checked. Call me paranoid. ;-)
HTH
T.
--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits. Unopened, best before: (see reverse side)
This kernel hangs during shutdown, and that prevents automatic poweroff. I have one small patch that improves the iwl3945 wireless driver a little. Dell D830 laptop, 64-bit SMP. It looks like this:
*** Last service has quit.
*** Your system will now POWER OFF!
Goodbye
Bug: unable to handle kernel paging request at ffffffff8020a7ad
IP: [<ffffffff80211b5a>] text_poke+0xe/0x15
PGD 203067 PUD 2+7+63 PMD 7f3ba163 PTE 20a161
Oops: 0003 [1] SMP
last sysfs file: /sys/devices/LNXSYSTM:00/device:00/ACPI0003:00/power_supply/AC/online
CPU 0
Modules linked in: tun pcmcia dock piix iTCO_wdt ata_piix watchdog_core watchdog_dev intel_agp ata_generic hci_usb
Pid: 7606, comm: initng Not tainted 2.6.25-rc5-mm1
RIP: 0010:[<ffffffff80211b5a>] [<ffffffff80211b5a>] text_poke+0xe/0x15
RSP: 0000:ffff81007e559cb8 EFLAGS: 00010083
(register dump omitted, but I can reproduce anytime if it matters)
Process initng (pid 7606, threadinfo...)
call trace:
alternatives_smp_unlock
alternatives_smp_switch
? schedule_timeout
__cpu_die
_cpu_down
disable_nonboot_cpus
kernel_power_off
sys_reboot
? handle_mm_fault
? __up_read
? do_page_fault
? __put_user
? error_exit
system_call_after_swapgs
(rest omitted)
sysrq still works at this point. sysrq+P gives:
CPU 0:
Modules linked in (same as before)
Pid: 0, comm: swapper Tainted: G D 2.6.25-rc5-mm1
RIP ... acpi_idle_enter
(register dump omitted)
acpi_idle_enter_bm
menu_select
cpuidle_idle_call
cpuidle_idle_call
default_idle
cpu_idle
rest_init
sysrq+O fails to deactivate a mouse, complains that the disk may not be spun down properly, prepares for sleep state S5, but doesn't power off. sysrq doesn't work after this. Helge Hafting --
Yes, I was hitting the text_poke() oops with 2.6.25-rc3-mm1 but not with 2.6.25-rc5-mm1. This _might_ have been due to a snafu in git-x86: it had a [patch 2/2] from Mathieu but was missing the needed [patch 1/2]. But I don't know if this was the cause and I don't know whether 2.6.25-rc3-mm1's git-x86 had the same problem. --
Andrew Morton wrote: The problem seems to be solved in 2.6.25-rc6. Helge Hafting --
I'm noticing a strange effect with this:
On my openSUSE 10.3 development machine with SUSEs default MTA
Postfix installed, I occasionally send a pre-formatted mail by
feeding it directly into "/usr/sbin/sendmail -t". If I try that
while running a 2.6.25-rc5-mm1 kernel, I get:
ts@xenon:~/kernel> /usr/sbin/sendmail -t < patch-usb-reduce-syslog-clutter-v3
postdrop: warning: can't open /proc/net/if_inet6 (Permission denied) - skipping IPv6 configuration
sendmail: warning: command "/usr/sbin/postdrop -r" exited with status 1
sendmail: fatal: ts(1000): unable to execute /usr/sbin/postdrop -r: Success
ts@xenon:~/kernel>
and unsurprisingly, the mail is not sent. If I do the same as root,
everything works as usual, there is no console output from the
sendmail command, and the mail goes out as it should. All other
networking applications appear to be running normally.
On a 2.6.25-rc5 (non-mm) kernel I do not need to run the sendmail
command as root. It works just as well if I run it as myself.
IPv6 is not in use on that machine. The Ethernet interface has
just the link local IPv6 address. Possibly relevant information:
ts@xenon:~> /sbin/ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:19:D1:03:D8:FF
inet addr:192.168.59.102 Bcast:192.168.59.255 Mask:255.255.255.0
UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1
RX packets:78 errors:0 dropped:0 overruns:0 frame:0
TX packets:145 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:9547 (9.3 Kb) TX bytes:17952 (17.5 Kb)
Memory:92c00000-92c20000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:2 errors:0 dropped:0 overruns:0 frame:0
TX packets:2 errors:0 dropped:0 overruns:0 ...
Hi Tilman, Is it possible to have the config file you used to compile the kernel? --
Sure. You can find it at http://gollum.phnxsoft.com/~ts/linux/config-2.6.25-rc5-mm1 HTH T.
I also tried to reproduce your problem with Postfix (on a Debian distro) but failed to obtain the error message. While googling for the error string, I found this link, which reports the same kind of error when Postfix is used with grsecurity (in 2006): http://blog.jensthebrain.de/archives/2006/12/11/IPv6-Probleme-mit-Postfix-und-grsecurity I barely understand German, so I'm not sure it is related to your problem. Benjamin --
The userspace failure described there is indeed the same as mine: Postfix' sendmail command tries to open "/proc/net/if_inet6", which fails with EACCES. But I have never installed grsecurity on this machine, and the problem appeared for me only with kernel 2.6.25-rc5-mm1, not when running kernel 2.6.25-rc5 on the same machine, so I guess the cause must be something different. What's also strange is that I can "cat /proc/net/if_inet6" from the command line as the same non-root user with no problem at all. strace of "cat /proc/net/if_inet6" has:
open("/proc/net/if_inet6", O_RDONLY|O_LARGEFILE) = 3
strace of "/usr/sbin/sendmail", however, has:
open("/proc/net/if_inet6", O_RDONLY) = -1 EACCES (Permission denied)
Both run as:
ts@xenon:~> id
uid=1000(ts) gid=100(users) groups=0(root),14(uucp),16(dialout),33(video),100(users),112(bacula)
HTH T.
It's the one that comes with openSUSE 10.3: ts@xenon:~> rpm -q postfix Sure, no problem. You may find them at http://gollum.phnxsoft.com/~ts/linux/main.cf http://gollum.phnxsoft.com/~ts/linux/strace.log HTH T.
Thank you very much, I will try to reproduce it with a simple program. --
Tilman, I've finally managed to reproduce your problem with Postfix on one of my victims. Earlier, in the afternoon, I wrote a piece of code that triggered a similar behaviour, but I wasn't sure it was exactly the problem you found. So I rebuilt Postfix, added some traces and, voila, got the same issue as yours. (The version of Postfix originally installed on my machine seems to have IPv6 disabled.) I bisected the problem to the commit "[NET]: Make /proc/net a symlink on /proc/self/net (v3)". Here is what happens:
- Recently /proc/net has been moved to /proc/self/net, and /proc/net is now a symlink to this directory.
- Before that, everybody could access /proc/net and read /proc/net/if_inet6:
  dr-xr-xr-x 6 root root 0 2008-03-05 15:23 /proc/net
- Now, /proc/self/net has a more restrictive access mode, and only the owner of the process can enter the directory:
  dr-xr--r-- 5 toto toto 0 Mar 19 17:30 net
  This is not a problem in most cases, but it becomes annoying when a process decides to change its UID or GID: it may lose access to its own /proc/self/net entries.
- What happens in the Postfix case is that the 'sendmail' process executes the '/usr/sbin/postdrop' binary to enqueue the message, but unfortunately '/usr/sbin/postdrop' has the setgid bit set:
  -rwxr-sr-x 1 root postdrop 479475 Mar 19 17:14 /usr/sbin/postdrop
  The process's egid changes, and this seems to prevent access to /proc/self/net/if_inet6. :)
I've attached a tiny test program that can be used to reproduce the problem without Postfix.
- Either execute it as root and give it an unprivileged uid as argument:
  ./test-proc_net_if_inet6 1001
- Or change its ownership and access mode to -rwxr-sr-x root postdrop and execute it as an ordinary (unprivileged) user:
  chown root:postdrop test-proc_net_if_inet6; chmod 2755 test-proc_net_if_inet6
  ./test-proc_net_if_inet6
I've found the cause but not the fix. :) (Adding Pavel in cc:)
Regards, Benjamin
On Wed, 19 Mar 2008 18:52:41 +0100 Thanks for that - most useful. Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the regression is also in mainline? 2.6.25-rc6? --
On Wed, Mar 19, 2008 at 10:16 PM, Andrew Morton Yes, it is in mainline. I reproduced it on 2.6.25-rc5. Benjamin --
From: Andrew Morton <akpm@linux-foundation.org> It is in 2.6.25-rc6, correct. If Pavel or someone else doesn't produce a good fix soon I'll revert the guilty change as this bug is worse than the problem that changeset fixes. --
Andre Noll sent a patch to LKML, acked by Pavel: "Fix permissions of /proc/net" http://thread.gmane.org/gmane.linux.kernel/655148 Benjamin --
Have you tested that patch? Rafael --
Also tested here. It fixes the regression. Benjamin --
My results:
up to 2.6.25-rc5 -- good
2.6.25-rc5-mm1 -- bad
2.6.25-rc6 -- bad
HTH T.
On Thu, 13 Mar 2008 23:07:30 +0100 Actually I later dropped signals-send_signal-factor-out-signal_group_exit-checks.patch at Oleg's request. But I don't think we did that because it was known to be buggy, so perhaps the same bug crept back in in another form.. --
Laurent, thanks a lot!
Yes, currently I suspect we have another bug.
Also, while doing this patch I forgot that we should fix the bugs with init first!
(I will try to make the patch soon.)
Laurent, any chance you can try 2.6.25-rc5-mm1 + the patch below?
It's unlikely to help, but it would be great to be sure.
Oleg.
--- MM/kernel/signal.c~ 2008-03-14 08:08:07.000000000 +0300
+++ MM/kernel/signal.c 2008-03-14 08:08:17.000000000 +0300
@@ -719,6 +719,10 @@ static void complete_signal(int sig, str
/*
* This signal will be fatal to the whole group.
*/
+if (is_global_init(p)) {
+ printk(KERN_CRIT "ERR!! init is killed by %d\n", sig);
+ WARN_ON_ONCE(1);
+} else
if (!sig_kernel_coredump(sig)) {
/*
* Start a group exit and wake everybody up.
--
Great. Thanks a lot, Laurent! So what happens is this: we have a very old bug (bugs, actually) with the global init and signals, which I have tried to fix many times but couldn't find a simple solution for. A fatal signal sent to init doesn't really kill it (we have the check in get_signal_to_deliver), but it sets SIGNAL_GROUP_EXIT. This is wrong: now init can't exec, this has other bad implications, and it is just insane. With signals-send_signal-factor-out-signal_group_exit-checks.patch, a task with SIGNAL_GROUP_EXIT doesn't receive signals. While this change itself is (I hope) correct, the "killed" /sbin/init now can't see SIGCHLD.
Not a kernel problem, but this looks a bit strange to me: init has SIG_DFL for SIGUSR1, and someone does kill(1, SIGUSR1). Note that init was explicitly targeted; the signal was not sent to a pgrp or to -1. Most likely Ubuntu knows what it is doing, and I can't find any email at ubuntu.com to cc... Oleg. --
Hello,
The build on my laptop (32-bit x86) fails.
sound/drivers/pcsp/pcsp.c: In function 'snd_pcsp_create':
sound/drivers/pcsp/pcsp.c:54: error: 'loops_per_jiffy' undeclared (first use in this function)
sound/drivers/pcsp/pcsp.c:54: error: (Each undeclared identifier is reported only once
sound/drivers/pcsp/pcsp.c:54: error: for each function it appears in.)
Seems like the patch below is needed.
Mariusz
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
--- linux-2.6.25-rc5-mm1-a/sound/drivers/pcsp/pcsp.c 2008-03-16 21:34:28.000000000 +0100
+++ linux-2.6.25-rc5-mm1-b/sound/drivers/pcsp/pcsp.c 2008-03-16 21:58:58.000000000 +0100
@@ -12,6 +12,7 @@
 #include <sound/initval.h>
 #include <sound/pcm.h>
 #include <linux/input.h>
+#include <linux/delay.h>
 #include <asm/bitops.h>
 #include "pcsp_input.h"
 #include "pcsp.h"
--
Hello,
The gregkh-pci-pci-sparc64-use-generic-pci_enable_resources.patch, which replaces arch-specific code with the generic pci_enable_resources(), makes my sparc64 box unable to boot (that's what quilt bisection says). At first I see these messages:
hme 0000:00:01.1: device not available because of BAR 0 [1ff80008000:1ff8000f01f] collisions
sym53c8xx 0000:00:03.0: device not available because of BAR 0 [1fe02010400:1fe020104ff] collisions
sym53c8xx 0000:00:03.1: device not available because of BAR 0 [1fe02010800:1fe020108ff] collisions
and finally, the infamous
VFS: Cannot open root device "sda3" or unknown-block(0,0)
Mariusz
PS. I attached the .config used at bisection time.
# lspci
0000:00:00.0 Host bridge: Sun Microsystems Computer Corp. Psycho PCI Bus Module
0000:00:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
0000:00:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
0000:00:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0001:00:00.0 Host bridge: Sun Microsystems Computer Corp. Psycho PCI Bus Module
$ uname -a
Linux sparc64 2.6.25-rc5 #2 SMP PREEMPT Fri Mar 28 12:16:30 CET 2008 sparc64 sun4u TI UltraSparc II (BlackBird) GNU/Linux
From: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Yes, that generic code won't work because of the NULL r->parent check. Alpha, ARM, V32, FRV, IA64, MIPS, MN10300, PARISC, PPC, SH, V850, X86, and Xtensa are all likely to run into problems because of this change. The only platform that did the check as a test of r->parent being NULL is Powerpc. The rest either didn't check (like sparc64), or tested it by going: if (!r->start && r->end) So the amount of potential breakage from this change is enormous. --
ppc and x86 won't have a problem, I haven't checked the others, sparc64 Yup, though that makes sense to do it that way on platforms that Not that big, but yeah, it should be limited to platforms that actually build a resource-tree and keep track of assigned & allocated resources, which sparc64 doesn't (which is fair enough, if your firmware is 100% right and your kernel never has to assign things itself). A non-NULL parent is a 100% indication that the resource was properly claimed and put in the resource-tree (and thus is non-conflicting) on those platforms, but it's unused on sparc64. Basically, on platforms like x86 or powerpc, the PCI subsystem at boot builds a resource tree by collecting resources for all enabled devices and bridges in a first pass, then all others in a second pass, checking for conflicts or unassigned ones, and potentially re-assigning and re-allocating bridges if necessary. Sparc64 takes a different approach: it basically doesn't bother with a full resource tree, and just claims what drivers claim, which is fine as long as you are certain that you always get a perfectly well assigned & non-conflicting setup done by your firmware. The "full featured" approach is necessary for platforms where this isn't the case, such as powerpc, even with a pretty good OF like Apple's, since they love to not assign resources that they know their MacOS driver will not need (such as not assigning IO space and closing it on the P2P bridge), which doesn't necessarily work with the requirements of the Linux drivers, in addition to the gross bugs some versions have when using cards with P2P bridges on them. In addition, we also need that resource management to be able to dynamically assign resources after boot, as our OF doesn't stay alive to do it, such as when using cardbus cards or other types of hotplug things for which the firmware doesn't do dynamic resource allocation. So, the meat of the original patch isn't bad per se. There is definitely a ...
