ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc8...
- The scheduler devel tree has been restored
- The driver tree is presently busted, so I reverted it to the 2.6.23-rc8-mm1
version.
- It's now a nearly-32MB diff.
Boilerplate:
- See the `hot-fixes' directory for any important updates to this patchset.
- To fetch an -mm tree using git, use (for example)
git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1
- -mm kernel commit activity can be reviewed by subscribing to the
mm-commits mailing list.
echo "subscribe mm-commits" | mail majordomo@vger.kernel.org
- If you hit a bug in -mm and it is not obvious which patch caused it, it is
most valuable if you can perform a bisection search to identify which patch
introduced the bug. Instructions for this process are at
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
But beware that this process takes some time (around ten rebuilds and
reboots), so consider reporting the bug first and if we cannot immediately
identify the faulty patch, then perform the bisection search.
- When reporting bugs, please try to Cc: the relevant maintainer and mailing
list on any email.
- When reporting bugs in this kernel via email, please also rewrite the
email Subject: in some manner to reflect the nature of the bug. Some
developers filter by Subject: when looking for messages to read.
- Occasional snapshots of the -mm lineup are uploaded to
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
the mm-commits list.

Changes since 2.6.23-rc8-mm2:
origin.patch
git-acpi.patch
git-alsa.patch
git-arm.patch
git-audit-master.patch
git-avr32.patch
git-cifs.patch
git-cpufreq.patch
git-powerpc.patch
git-powerpc-galak.patch
git-drm.patch
git-dvb.patch
git-hwmon.patch
git-gfs2-nmw....
Here's a nearly-useless bug report for ya:
The -mm kernels between rc4-mm1 and rc8-mm1 have been locking up for
me in X when run for about a day or so. The crash does not appear
correlated with any particular action on my part - it seems just as
likely to happen if I'm typing in a local editor or over an ssh
session or using my web browser. The capslock light blinks, so it's
probably generating a trace and it still responds to sysrq but no logs
make it to disk. As my laptop generally doesn't stay put for a whole
day, I haven't managed to capture any of these traces.

System is a Thinkpad R51, here's the config.
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc8-mm2
# Mon Oct 8 21:58:21 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CGROUPS is not set
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL...
Hi Andrew,
The compilation with the cross compiler for the PowerPC-405 on the powerbox
fails at linking

LD init/built-in.o
LD .tmp_vmlinux1
ld: arch/powerpc/kernel/head_64.o(.text+0x80c8): sibling call optimization to `.text.init.refok' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `.text.init.refok' extern
ld: arch/powerpc/kernel/head_64.o(.text+0x8160): sibling call optimization to `.text.init.refok' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `.text.init.refok' extern
ld: arch/powerpc/kernel/head_64.o(.text+0x81c4): sibling call optimization to `.text.init.refok' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `.text.init.refok' extern
ld: final link failed: Bad value
make: *** [.tmp_vmlinux1] Error 1

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
-
Adding CC to the powerpc mailing list
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
-
Locks up hard at very early boot on my Dell Latitude - grub says loading
kernel, the screen clears, and we lock up before we get penguins.

-rc8-mm1 was OK. I'm off to go bisect, figured I'd drop a heads-up.
It doesn't ring a bell, sorry. hpet-force-enable-on-ich34.patch is known
to be bad, but it causes failure later in the boot than that.
-
OK, now I'm confoozled. I built -rc8-mm2, and it bricked. Usually my first
test is then using quilt to push just origin.patch, so I know if I'm needing
to bisect Linus git or Andrew -mm.

Starting with a clean 23-rc8, and using 'quilt push origin.patch' to just add
the Linus changes *also* gets me a brick. So I did a git bisect between 23-rc8 and
the first commit listed in origin.patch, and got down to this:

f7f847b01571e86044dc77e03d92f43699652f8d is first bad commit
commit f7f847b01571e86044dc77e03d92f43699652f8d
Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Wed Sep 26 15:21:33 2007 -0700

(Here's the 'git bisect log':
git-bisect start
# good: [4942de4a0e914f205d351a81873f4f63986bcc3c] Linux 2.6.23-rc8
git-bisect good 4942de4a0e914f205d351a81873f4f63986bcc3c
# bad: [f7f847b01571e86044dc77e03d92f43699652f8d] Revert "x86-64: Disable local APIC timer use on AMD systems with C1E"
git-bisect bad f7f847b01571e86044dc77e03d92f43699652f8d
# good: [4d3fac08718b49fc256bdb447a479d089ca97b78] Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
git-bisect good 4d3fac08718b49fc256bdb447a479d089ca97b78
# good: [d85f57938ad1d674dff8077a2e6a36a45dbe0e22] Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
git-bisect good d85f57938ad1d674dff8077a2e6a36a45dbe0e22
# good: [d8c4a2f9d9e7827362fd7ab0b5d9637c6af5ac5b] mv643xx_eth: duplicate methods in initializer
git-bisect good d8c4a2f9d9e7827362fd7ab0b5d9637c6af5ac5b
# good: [255129d1e9ca0ed3d69d5517fae3e03d7ab4b806] NLM: Fix a circular lock dependency in lockd
git-bisect good 255129d1e9ca0ed3d69d5517fae3e03d7ab4b806
# good: [e66485d747505e9d960b864fc6c37f8b2afafaf0] x86-64: Disable local APIC timer use on AMD systems with C1E
git-bisect good e66485d747505e9d960b864fc6c37f8b2afafaf0
# good: [df912ea4ae7233d1504fbd861ee127bd7ee5781d] xen: execve's error paths don't pin the mm before unpinning
git-bisect good df912ea4ae7233d1504fbd861ee127bd7ee578...
http://lkml.org/lkml/2007/9/27/322, perhaps?
-
Yep, that was it - I applied that one patch on top of -rc8-mm2 and it
came up without complaint. That was certainly one that would make the CPU head
off into the weeds very early in boot.

I need to figure out how I managed to botch the git bisect - it flagged the
very last commit, when the problem was commit N-1.
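
For readers unfamiliar with why a bisection costs "around ten rebuilds and reboots", here is a minimal sketch of the search over a linear patch series. It is only an illustration, not the quilt or git-bisect implementation: is_bad(), first_bad_index and builds are hypothetical names, and the oracle stands in for "build, boot and see whether it bricks". An off-by-one like the one described above typically comes from marking a single step wrongly, which this loop would faithfully narrow down to the wrong neighbour.

```c
#include <assert.h>

/* Hypothetical oracle state: the first patch count at which the tree breaks. */
static int first_bad_index = 700;
static int builds;	/* number of simulated build+boot cycles */

/* Stand-in for "apply the first n patches, build, boot, report bad or good". */
static int is_bad(int n_applied)
{
	builds++;
	return n_applied >= first_bad_index;
}

/*
 * Given a known-good count and a known-bad count, return the smallest
 * count that is bad. Each loop iteration halves the remaining range,
 * so roughly log2(bad - good) test builds are needed.
 */
static int bisect(int good, int bad)
{
	assert(good < bad);
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* culprit is at or before mid */
		else
			good = mid;	/* culprit is after mid */
	}
	return bad;
}
```

Calling bisect(0, 1000) on a series of a thousand patches needs at most ten oracle calls, which matches the estimate in the boilerplate above.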
Hi,
The kernel reports warnings about duplicate sysfs filenames under
rc8-mm1 and rc8-mm2.
1.
----cut----
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfb93e, last bus=3
PCI: Using configuration type 1
Setting up standard PCI resources
sysfs: duplicate filename 'usbcore' can not be created
WARNING: at fs/sysfs/dir.c:433 sysfs_add_one()
[<c010528f>] dump_trace+0x1bf/0x1d0
[<c0105358>] show_trace_log_lvl+0x18/0x30
[<c010537f>] show_trace+0xf/0x20
[<c01054a2>] dump_stack+0x12/0x20
[<c01c6680>] sysfs_add_one+0xa0/0xe0
[<c01c69e8>] create_dir+0x48/0xb0
[<c01c6a99>] sysfs_create_dir+0x29/0x50
[<c02501eb>] create_dir+0x1b/0x50
[<c02504b6>] kobject_add+0x46/0x150
[<c05a38a0>] kernel_param_sysfs_setup+0x50/0xb0
[<c05a39ee>] param_sysfs_builtin+0xee/0x130
[<c05a3a54>] param_sysfs_init+0x24/0x60
[<c0592866>] do_initcalls+0x46/0x1e0
[<c0592aa2>] kernel_init+0x62/0xb0
[<c0104fb3>] kernel_thread_helper+0x7/0x14
=======================
kobject_add failed for usbcore with -EEXIST, don't try to register
things with the same name in the same directory.
[<c010528f>] dump_trace+0x1bf/0x1d0
[<c0105358>] show_trace_log_lvl+0x18/0x30
[<c010537f>] show_trace+0xf/0x20
[<c01054a2>] dump_stack+0x12/0x20
[<c0250566>] kobject_add+0xf6/0x150
[<c05a38a0>] kernel_param_sysfs_setup+0x50/0xb0
[<c05a39ee>] param_sysfs_builtin+0xee/0x130
[<c05a3a54>] param_sysfs_init+0x24/0x60
[<c0592866>] do_initcalls+0x46/0x1e0
[<c0592aa2>] kernel_init+0x62/0xb0
[<c0104fb3>] kernel_thread_helper+0x7/0x14
=======================
Module 'usbcore' failed to be added to sysfs, error number -17
The system will be unstable now.
ACPI: EC: Look up EC in DSDT

2.
----cut----
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 21 (level, low) -> IRQ 18
PCI: Setting latency timer of devic...
That is very weird, do you have USB both built in and as a module?
I think I need to clean up the double stack trace here, that's really not
needed :)

thanks,
greg k-h
-
Hi,
I debugged the problem and found that it is a bug in kernel/params.c. I
will send a patch later to fix it.

Regards
dave
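
As an aside on the "error number -17" in the log: that value is -EEXIST, returned when a second entry with the same name is added to one directory. The sketch below is a toy model only; struct toy_dir and toy_dir_add() are hypothetical names and this is not how sysfs or kobject_add() is implemented. It just shows the kind of uniqueness check that produces the error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16
#define TOY_MAX_NAME    32

/* Hypothetical stand-in for one directory: a flat set of names. */
struct toy_dir {
	char names[TOY_MAX_ENTRIES][TOY_MAX_NAME];
	int count;
};

/*
 * Add a name to the directory.
 * Returns 0 on success, -EEXIST if the name is already present,
 * -ENAMETOOLONG if it does not fit, -ENOSPC if the directory is full.
 */
static int toy_dir_add(struct toy_dir *dir, const char *name)
{
	int i;

	assert(dir && name);
	if (strlen(name) >= TOY_MAX_NAME)
		return -ENAMETOOLONG;
	for (i = 0; i < dir->count; i++)
		if (strcmp(dir->names[i], name) == 0)
			return -EEXIST;	/* same name, same directory */
	if (dir->count == TOY_MAX_ENTRIES)
		return -ENOSPC;
	strcpy(dir->names[dir->count++], name);
	return 0;
}
```

Adding "usbcore" twice to the same toy_dir succeeds the first time and fails with -EEXIST the second time, which is the situation the warning complains about.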
-
From: Greg KH <greg@kroah.com>
I get some identical warnings from iftab in userspace:
eth1 renamed to switch
sysfs: duplicate filename 'switch' can not be created
WARNING: at fs/sysfs/dir.c:433 sysfs_add_one()

Call Trace:
[<ffffffff802cb99c>] sysfs_add_one+0xac/0xe0
[<ffffffff802cc8fc>] sysfs_create_link+0xac/0x140
[<ffffffff803fa632>] device_rename+0x1c2/0x220
[<ffffffff80500ccc>] dev_change_name+0xbc/0x250
[<ffffffff80501198>] dev_ifsioc+0x338/0x3a0
[<ffffffff8050156d>] dev_ioctl+0x36d/0x3c0
[<ffffffff80271965>] handle_mm_fault+0x1a5/0x6f0
[<ffffffff804f215d>] sock_ioctl+0x7d/0x250
[<ffffffff80293cb1>] do_ioctl+0x31/0x90
[<ffffffff80293f2b>] vfs_ioctl+0x21b/0x2d0
[<ffffffff8029402a>] sys_ioctl+0x4a/0x80
[<ffffffff8020bc8e>] system_call+0x7e/0x83

net switch: device_rename: sysfs_create_symlink failed (-17)
eth2 renamed to adsl
sysfs: duplicate filename 'adsl' can not be created
WARNING: at fs/sysfs/dir.c:433 sysfs_add_one()

Call Trace:
[<ffffffff802cb99c>] sysfs_add_one+0xac/0xe0
[<ffffffff802cc8fc>] sysfs_create_link+0xac/0x140
[<ffffffff803fa632>] device_rename+0x1c2/0x220
[<ffffffff80500ccc>] dev_change_name+0xbc/0x250
[<ffffffff80501198>] dev_ifsioc+0x338/0x3a0
[<ffffffff8050156d>] dev_ioctl+0x36d/0x3c0
[<ffffffff80271965>] handle_mm_fault+0x1a5/0x6f0
[<ffffffff804f215d>] sock_ioctl+0x7d/0x250
[<ffffffff80293cb1>] do_ioctl+0x31/0x90
[<ffffffff80293f2b>] vfs_ioctl+0x21b/0x2d0
[<ffffffff8029402a>] sys_ioctl+0x4a/0x80
[<ffffffff8020bc8e>] system_call+0x7e/0x83

net adsl: device_rename: sysfs_create_symlink failed (-17)
switch: no link during initialization.
ip_tables: (C) 2000-2006 Netfilter Core Team

this happens when iftab renames my network interfaces. Booting
2.6.21-rc1-mm2 said:

eth1 renamed to switch
eth2 renamed to adsl
ip_tables: (C) 2000-2006 Netfilter Core Team

2.6.23-rc4...
I don't think so, below is my .config file: (It works under rc8 tree)
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc8-mm2
# Sat Sep 29 16:55:51 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=16
# CONFIG_CGROUPS is not set
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_KPAGEMAP=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y...
# find /proc >/dev/null
find: WARNING: Hard link count is wrong for /proc/net: this may be a bug in your
filesystem driver. Automatically turning on find's -noleaf option. Earlier
results may have failed to include directories that should have been searched.
# stat net
File: `net'
Size: 0 Blocks: 0 IO Block: 1024 directory
Device: 3h/3d Inode: 4026531864 Links: 2
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2007-09-28 18:21:24.651209759 +0200
Modify: 2007-09-28 18:21:24.651209759 +0200
Change: 2007-09-28 18:21:24.651209759 +0200
# stat net/
File: `net/'
Size: 0 Blocks: 0 IO Block: 1024 directory
Device: 3h/3d Inode: 4026531909 Links: 4
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2007-09-28 18:26:48.813048220 +0200
Modify: 2007-09-28 18:26:48.813048220 +0200
Change: 2007-09-28 18:26:48.813048220 +0200

hmm, this is some kind of weirdness :)
regards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
Yes.
I can explain it. For the network namespace stuff we need special handling
of /proc/net so that depending on the network namespace we are resolving
against you see a different behavior. So you actually are observing
two different directories, one being a magic invisible symlink to the
other.

Currently I am resolving against current (which has a number of
limitations), hence the weird ugly effect you are currently seeing.

So it looks like I need to either make /proc/net a symlink to
/proc/self/net or make the network namespace something that we capture
at mount time of /proc.

This was my don't-get-hung-up-on-this-implementation-detail version.
Thanks for pointing out it has user visible problems. I will see
what I can do to resolve this.

Eric
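
The two stat invocations in the report differ only in the trailing slash: `stat net` inspects the name itself, while `stat net/` forces the name to be resolved as a directory. A small sketch of that comparison, assuming a POSIX system; same_inode() is a hypothetical helper name, and nothing here is specific to /proc:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/*
 * Compare what the bare name refers to (lstat, no link following)
 * with what the slash-terminated spelling resolves to (stat).
 * Returns 1 if both are the same device+inode, 0 if they differ,
 * -1 if either lookup fails.
 */
static int same_inode(const char *plain, const char *with_slash)
{
	struct stat a, b;

	assert(plain && with_slash);
	if (lstat(plain, &a) != 0 || stat(with_slash, &b) != 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

For an ordinary directory the result is 1. For any name that is silently redirected on lookup, such as a symlink or the "magic invisible symlink" described above, the two inode numbers differ and the result is 0, which is what find's hard link count check tripped over.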
-
Hello !
I just found that warning in my logs. It seems that it's been
happening since rc7-mm1 at least.

Thanks !
C.
WARNING: at /home/legoater/linux/2.6.23-rc8-mm2/net/ipv4/tcp_input.c:2314 tcp_fastretrans_alert()
Call Trace:
<IRQ> [<ffffffff8040fdc3>] tcp_ack+0xcd6/0x1894
[<ffffffff80411c79>] tcp_data_queue+0x5be/0xae7
[<ffffffff80412b54>] tcp_rcv_established+0x61f/0x6df
[<ffffffff80254146>] __lock_acquire+0x8a1/0xf1b
[<ffffffff80419cfd>] tcp_v4_do_rcv+0x3e/0x394
[<ffffffff8041a66f>] tcp_v4_rcv+0x61c/0x9a9
[<ffffffff803ff1e3>] ip_local_deliver+0x1da/0x2a4
[<ffffffff803ffb4e>] ip_rcv+0x583/0x5c9
[<ffffffff8046d33f>] packet_rcv_spkt+0x19a/0x1a8
[<ffffffff803e081c>] netif_receive_skb+0x2cf/0x2f5
[<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
[<ffffffff803e09e8>] net_rx_action+0xb8/0x191
[<ffffffff8023a927>] __do_softirq+0x5f/0xe0
[<ffffffff8020c98c>] call_softirq+0x1c/0x28
[<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
[<ffffffff8023aa1e>] irq_exit+0x4e/0x50
[<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
[<ffffffff80209cb9>] mwait_idle+0x0/0x4d
[<ffffffff8020bce6>] ret_from_intr+0x0/0xf
<EOI> [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
[<ffffffff802099fb>] enter_idle+0x22/0x24
[<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
[<ffffffff80476a91>] rest_init+0x55/0x57
[<ffffffff80630815>] start_kernel+0x2d6/0x2e2
[<ffffffff80630134>] _sinittext+0x134/0x13b
-
...Thanks for the report, I'll have a look at what could still break
fackets_out...

--
i.
-
On Fri, 28 Sep 2007, Ilpo J
I'm trying now to reproduce this WARNING.
It seems that the n/w behaves differently during the week ends. Probably
taking a break.

C.
-
On Sat, 29 Sep 2007, Cedric Le Goater wrote:
> Ilpo J
got it !
r3-06.test.meiosys.com login: WARNING: at /home/legoater/linux/2.6.23-rc8-mm2/net/ipv4/tcp_input.c:2314 tcp_fastretrans_alert()
Call Trace:
<IRQ> [<ffffffff8040fdc3>] tcp_ack+0xcd6/0x18af
[<ffffffff80412b6f>] tcp_rcv_established+0x61f/0x6df
[<ffffffff80254146>] __lock_acquire+0x8a1/0xf1b
[<ffffffff80419d19>] tcp_v4_do_rcv+0x3e/0x394
[<ffffffff8041a68b>] tcp_v4_rcv+0x61c/0x9a9
[<ffffffff803ff1e3>] ip_local_deliver+0x1da/0x2a4
[<ffffffff803ffb4e>] ip_rcv+0x583/0x5c9
[<ffffffff8046d35b>] packet_rcv_spkt+0x19a/0x1a8
[<ffffffff803e081c>] netif_receive_skb+0x2cf/0x2f5
[<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
[<ffffffff803e09e8>] net_rx_action+0xb8/0x191
[<ffffffff8023a927>] __do_softirq+0x5f/0xe0
[<ffffffff8020c98c>] call_softirq+0x1c/0x28
[<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
[<ffffffff8023aa1e>] irq_exit+0x4e/0x50
[<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
[<ffffffff80209cb9>] mwait_idle+0x0/0x4d
[<ffffffff8020bce6>] ret_from_intr+0x0/0xf
<EOI> [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
[<ffffffff802099fb>] enter_idle+0x22/0x24
[<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
[<ffffffff80476aa1>] rest_init+0x55/0x57
[<ffffffff80630815>] start_kernel+0x2d6/0x2e2
[<ffffffff80630134>] _sinittext+0x134/0x13b

TCP 0
I wasn't doing any particular test on n/w so it took me a while to figure
out how I was triggering the WARNING. Apparently, this is happening when I
run ketchup, but not always. This test machine is behind many firewall &
routers so it might be a reason.

tcpdump gave me this output for a wget on kernel.org :
10:51:14.835981 IP r3-06.test.meiosys.com.40322 > pub2.kernel.org.http: S 737836267:737836267(0) win 5840 <mss 1460,sackOK,timestamp 1309245 0,nop,wscale 7>
10:51:14.975153 IP pub2.kernel.org.http > r3-06.test.meiosys.com.40321: F 524:524(0) ack 166 win 58...
I'm currently out of ideas where it could come from... so lets try
brute-force checking as your test case is not very high-speed... This
could hide it though... :-(

Please put the patch below on top of clean rc8-mm2 (it includes the patch
I gave you last time) and try to reproduce.... These counter bugs can
survive for some time until the !sacked_out condition occurs, so the patch
below tries to find out when the inconsistency occurs for the first time
regardless of sacked_out (I also removed some statics, which hopefully
reduces compiler inlining for easier reading of the output). I tried this
myself (except for verify()s in frto funcs and minor printout
modifications), didn't trigger for me.

--
i.

---
include/net/tcp.h | 3 +
net/ipv4/tcp_input.c | 23 +++++++++--
net/ipv4/tcp_ipv4.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++
net/ipv4/tcp_output.c | 6 ++-
4 files changed, 128 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 991ccdc..54a0d91 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -43,6 +43,9 @@

#include <linux/seq_file.h>

+extern void tcp_verify_fackets(struct sock *sk);
+extern void tcp_print_queue(struct sock *sk);
+
extern struct inet_hashinfo tcp_hashinfo;

extern atomic_t tcp_orphan_count;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e22ffe7..1d7367d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1140,7 +1140,7 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
return dup_sack;
}

-static int
+int
tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_una)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1160,6 +1160,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
int first_sack_index;

if (!tp->sacked_out) {
+ if (WARN_ON(tp->fackets_out))
+ tcp_print_queue(sk);
tp->fackets_out ...
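
The "brute-force checking" idea is to recount the cached counters from the queue on every event and complain at the first mismatch, instead of waiting for a later condition to expose it. Below is a toy userspace sketch of that idea, not the debugging patch itself: struct toy_queue and toy_verify() are hypothetical names, one segment is treated as one packet, and the model assumes fackets_out counts the segments up to and including the highest SACKed one. The only relation taken from the thread is that fackets_out is expected to be zero whenever sacked_out is zero.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a retransmit queue: one flag per segment, oldest first. */
struct toy_queue {
	const unsigned char *sacked;	/* sacked[i] != 0 if segment i was SACKed */
	size_t len;
	unsigned int sacked_out;	/* cached counter under test */
	unsigned int fackets_out;	/* cached counter under test */
};

/*
 * Recount both counters by walking the whole queue and compare them
 * with the cached values. Returns 0 if they agree, -1 on any mismatch.
 */
static int toy_verify(const struct toy_queue *q)
{
	unsigned int sacked = 0, fackets = 0;
	size_t i;

	assert(q && (q->sacked || q->len == 0));
	for (i = 0; i < q->len; i++) {
		if (q->sacked[i]) {
			sacked++;
			/* assumed meaning: segments up to the highest SACKed one */
			fackets = (unsigned int)(i + 1);
		}
	}
	if (sacked != q->sacked_out || fackets != q->fackets_out)
		return -1;
	return 0;
}
```

Running such a check after every update is slow, which is acceptable here because the reproducing workload is not high-speed.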
On Tue, 2 Oct 2007, Ilpo J
> On Tue, 2 Oct 2007, Ilpo J
Laurent,
It triggered a WARNING on first run in qemu:
[ 0.310000] WARNING: at arch/x86_64/kernel/smp.c:397 smp_call_function_mask()
[ 0.310000]
[ 0.310000] Call Trace:
[ 0.310000] [<ffffffff8100dbde>] dump_trace+0x3ee/0x4a0
[ 0.310000] [<ffffffff8100dcd3>] show_trace+0x43/0x70
[ 0.310000] [<ffffffff8100dd15>] dump_stack+0x15/0x20
[ 0.310000] [<ffffffff8101cd44>] smp_call_function_mask+0x94/0xa0
[ 0.310000] [<ffffffff8101cd69>] smp_call_function+0x19/0x20
[ 0.310000] [<ffffffff8104277f>] on_each_cpu+0x1f/0x50
[ 0.310000] [<ffffffff81026eac>] global_flush_tlb+0x8c/0x110
[ 0.310000] [<ffffffff81025c85>] free_init_pages+0xe5/0xf0
[ 0.310000] [<ffffffff81549b5e>] alternative_instructions+0x7e/0x150
[ 0.310000] [<ffffffff8154a2ea>] check_bugs+0x1a/0x20
[ 0.310000] [<ffffffff81540c4a>] start_kernel+0x2da/0x380
[ 0.310000] [<ffffffff81540132>] _sinittext+0x132/0x140

Here is the more complete log:
[ 0.000000] Linux version 2.6.23-rc8-mm2 (wfg@intel) (gcc version 4.2.1 (Debian 4.2.1-5)) #3 SMP Fri Sep 28 10:29:34 CST 2007
[ 0.000000] Command line: root=/dev/hda rw console=ttyS0 clock=pit init=/bin/bash
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 0000000039ff0000 (usable)
[ 0.000000] BIOS-e820: 0000000039ff0000 - 000000003a000000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[ 0.000000] end_pfn_map = 1048576
[ 0.000000] DMI not present or invalid.
[ 0.000000] ACPI: RSDP 000FAA30, 0014 (r0 BOCHS )
[ 0.000000] ACPI: RSDT 39FF0000, 002C (r0 BOCHS BXPCRSDT 1 BXPC 1)
[ 0.000000] ACPI: FACP 39FF002C, 0074 (r0 BOCHS B...
the reason is the WARN_ON():
390 int smp_call_function_mask(cpumask_t mask,
391 void (*func)(void *), void *info,
392 int wait)
393 {
394 int ret;
395
396 /* Can deadlock when called with interrupts disabled */
397 WARN_ON(irqs_disabled());
398
399 spin_lock(&call_lock);
400 ret = __smp_call_function_mask(mask, func, info, wait);
401 spin_unlock(&call_lock);
402 return ret;
403 }

The patch I sent to Andi didn't include this WARN_ON() and it's why I didn't
find this issue. (see http://lkml.org/lkml/2007/8/24/101)

smp_call_function_mask() is called by smp_call_function() which calls a function
on all CPU except current.
The comment of smp_call_function() specifies:
...
* You must not call this function with disabled interrupts or from a
* hardware interrupt handler or from a bottom half handler.
* Actually there are a few legal cases, like panic.
*/

So this WARN_ON() is correct, and the caller (global_flush_tlb()) doesn't follow
this rule.

I guess this WARN_ON() is only needed when we have current CPU in provided mask.
So I think we should change:

int smp_call_function (void (*func) (void *info), void *info, int nonatomic,
			int wait)
{
	return smp_call_function_mask(cpu_online_map, func, info, wait);
}
("cpu_online_map" is a bad choice, comment also specifies: "run a function on
all other CPU")

to

int smp_call_function (void (*func) (void *info), void *info, int nonatomic,
			int wait)
{
	int ret;
	cpumask_t allbutself;

	allbutself = cpu_online_map;
	cpu_clear(smp_processor_id(), allbutself);

	spin_lock(&call_lock);
	ret = __smp_call_function_mask(allbutself, func, info, wait);
	spin_unlock(&call_lock);
	return ret;
}
(which is smp_call_function_mask() without the WARN_ON() and without current cpu
...
umm, I think all the smp_call_function functions are deadlocky if called
with local interrupts disabled, regardless of whether the calling CPU is in
the mask.

If CPU A is sending a cross-cpu call to CPU B and CPU B is sending a
cross-cpu call to CPU A, and they both have local interrupts disabled...
-
.
OK, so there are two errors:
1- one I introduce myself (without any help from anyone) where
smp_call_function() calls all online CPUs instead of calling all CPUs except itself.

2- one in global_flush_tlb() which calls smp_call_function() with irqs disabled.

I think I should at least correct #1 ?
Laurent
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
I'd be pretty surprised if one was able to write a bug like that. You mean
the CPU sends an IPI to itself and then loops around until it has serviced
that IPI? And this works? Wow.

And on_each_cpu() can call the handler function twice?
That would be a big bug, and surely we would already have picked it up.
<looks>
argh, mainline's x86_64 smp_call_function() doesn't do the check. We've
had *heaps* of bugs in i386 where people were running smp_call_foo() under

I think we should correct all bugs ;)
-
In fact, it works because __smp_call_function_mask() makes :
cpus_and(mask, mask, allbutself);
So it removes current cpu from the mask. BTW, I don't have to modify
smp_call_function(): it is correct as it is written (except from a semantic

We don't live in a perfect world... ;-)
Moreover, I'm not able to reproduce #2 ...
Fengguang could you send me your .config and your qemu command line parameters ?

Laurent
PS: if semantic is important, you can apply attached patch...
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth

--------------080006050504030805060306
Content-Type: application/mbox;
name="0001-smp_call_function-call-function-on-all-CPUs.patch"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
filename*0="0001-smp_call_function-call-function-on-all-CPUs.patch"

QWNjb3JkaW5nIGNvbW1lbnQgb2Ygc21wX2NhbGxfZnVuY3Rpb24oKSwgCnNtcF9jYWxsX2Z1
bmN0aW9uKCkgInJ1biBhIGZ1bmN0aW9uIG9uIGFsbCBvdGhlciBDUFVzLiIsCm5vdCBvbiBh
bGwgb25saW5lIENQVXMuCgpTaWduZWQtb2ZmLWJ5OiBMYXVyZW50IFZpdmllciA8TGF1cmVu
dC5WaXZpZXJAYnVsbC5uZXQ+Ci0tLQogc21wLmMgfCAgICA3ICsrKysrKy0KIDEgZmlsZSBj
aGFuZ2VkLCA2IGluc2VydGlvbnMoKyksIDEgZGVsZXRpb24oLSkKCmRpZmYgLS1naXQgYS9h
cmNoL3g4Nl82NC9rZXJuZWwvc21wLmMgYi9hcmNoL3g4Nl82NC9rZXJuZWwvc21wLmMKLS0t
IGEvYXJjaC94ODZfNjQva2VybmVsL3NtcC5jCisrKyBiL2FyY2gveDg2XzY0L2tlcm5lbC9z
bXAuYwpAQCAtNDU5LDcgKzQ1OSwxMiBAQCBFWFBPUlRfU1lNQk9MKHNtcF9jYWxsX2Z1bmN0
aW9uX3NpbmdsZSk7CiBpbnQgc21wX2NhbGxfZnVuY3Rpb24gKHZvaWQgKCpmdW5jKSAodm9p
ZCAqaW5mbyksIHZvaWQgKmluZm8sIGludCBub25hdG9taWMsCiAJCQlpbnQgd2FpdCkKIHsK
LQlyZXR1cm4gc21wX2NhbGxfZnVuY3Rpb25fbWFzayhjcHVfb25saW5lX21hcCwgZnVuYywg
aW5mbywgd2FpdCk7CisJY3B1bWFza190IGFsbGJ1dHNlbGY7CisKKwlhbGxidXRzZWxmID0g
Y3B1X29ubGluZV9tYXA7CisJY3B1X2NsZWFyKHNtcF9wcm9jZXNzb3JfaWQoKSwgYWxsYnV0
c2VsZik7CisKKwlyZX...
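
To make the cpus_and(mask, mask, allbutself) remark concrete, here is a small userspace sketch of the mask arithmetic. It is a toy model under stated assumptions: toy_cpumask_t is a hypothetical 64-bit stand-in for cpumask_t (bit n set means CPU n), and the helper names are made up for illustration rather than taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_cpumask_t;	/* hypothetical: bit n set means CPU n */

/* All online CPUs except the caller, like cpu_clear(self, online). */
static toy_cpumask_t toy_allbutself(toy_cpumask_t online, unsigned int self)
{
	assert(self < 64);
	return online & ~((toy_cpumask_t)1 << self);
}

/*
 * CPUs that would actually receive the cross-call: the requested mask
 * intersected with "all but self", mirroring cpus_and(mask, mask, allbutself).
 */
static toy_cpumask_t toy_call_targets(toy_cpumask_t requested,
				      toy_cpumask_t online, unsigned int self)
{
	return requested & toy_allbutself(online, self);
}
```

With four CPUs online and CPU 2 calling, passing the full online map as the request still yields a target set without CPU 2. On a single-CPU system the target set is empty, so no IPI is sent at all, but a check placed before the masking step would still fire.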
Thank god it's a bug in alternative_instructions() instead of global_flush_tlb().
The following patch fixed it.
===
call free_init_pages() with irqs enabled in alternative_instructions()

It fixes the warning message in smp_call_function*(), which should not be called
with irqs disabled.

[ 0.310000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.310000] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.310000] CPU 0/0 -> Node 0
[ 0.310000] SMP alternatives: switching to UP code
[ 0.310000] Freeing SMP alternatives: 25k freed
[ 0.310000] WARNING: at arch/x86_64/kernel/smp.c:397 smp_call_function_mask()
[ 0.310000]
[ 0.310000] Call Trace:
[ 0.310000] [<ffffffff8100dbde>] dump_trace+0x3ee/0x4a0
[ 0.310000] [<ffffffff8100dcd3>] show_trace+0x43/0x70
[ 0.310000] [<ffffffff8100dd15>] dump_stack+0x15/0x20
[ 0.310000] [<ffffffff8101cd44>] smp_call_function_mask+0x94/0xa0
[ 0.310000] [<ffffffff8101d0b2>] smp_call_function+0x32/0x40
[ 0.310000] [<ffffffff8104277f>] on_each_cpu+0x1f/0x50
[ 0.310000] [<ffffffff81026eac>] global_flush_tlb+0x8c/0x110
[ 0.310000] [<ffffffff81025c85>] free_init_pages+0xe5/0xf0
[ 0.310000] [<ffffffff81549b5e>] alternative_instructions+0x7e/0x150
[ 0.310000] [<ffffffff8154a2ea>] check_bugs+0x1a/0x20
[ 0.310000] [<ffffffff81540c4a>] start_kernel+0x2da/0x380
[ 0.310000] [<ffffffff81540132>] _sinittext+0x132/0x140
[ 0.310000]
[ 0.320000] ACPI: Core revision 20070126
[ 0.560000] Using local APIC timer interrupts.
[ 0.590000] Detected 62.496 MHz APIC timer.
[ 0.590000] Brought up 1 CPUs

Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
arch/i386/kernel/alternative.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- linux-2.6.23-rc8-mm2.orig/arch/i386/kernel/alternative.c
...
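
The patch body is cut off above, so the following is only a sketch of the ordering change its title describes, assuming the warning arises because the freeing step runs inside an irqs-off region. Everything here is a userspace toy: toy_irqs_off models the local interrupt flag, toy_cross_call() models the WARN_ON(irqs_disabled()) check, and none of the names are kernel functions.

```c
#include <assert.h>

static int toy_irqs_off;	/* hypothetical model of the local irq flag */
static int toy_warnings;	/* how often the WARN_ON-style check fired */

/* Models the irqs_disabled() check at the top of the cross-call path. */
static void toy_cross_call(void)
{
	if (toy_irqs_off)
		toy_warnings++;
}

/* Freeing the pages ends in a global TLB flush, i.e. a cross-call. */
static void toy_free_init_pages(void)
{
	toy_cross_call();
}

/* Placeholder for the instruction patching that wants irqs off. */
static void toy_patch_code(void)
{
}

/* Ordering seen in the trace: pages are freed while irqs are still off. */
static int toy_alternatives_before(void)
{
	toy_warnings = 0;
	toy_irqs_off = 1;
	toy_patch_code();
	toy_free_init_pages();
	toy_irqs_off = 0;
	return toy_warnings;
}

/* Reordered: irqs are re-enabled first, then the pages are freed. */
static int toy_alternatives_after(void)
{
	toy_warnings = 0;
	toy_irqs_off = 1;
	toy_patch_code();
	toy_irqs_off = 0;
	assert(toy_irqs_off == 0);	/* irqs are back on before we free */
	toy_free_init_pages();
	return toy_warnings;
}
```

The first ordering trips the check once, the second never does, while the patching step itself stays inside the irqs-off region in both.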
OK. I applied your patch and reran it.
Here are the qemu cmdline, warning messages and .config:

qemu-system-x86_64 -m 928 -hda /dev/sdb5 -kernel ./arch/x86_64/boot/bzImage -nographic -append "root=/dev/hda rw console=ttyS0 clock=pit init=/bin/bash"
[ 0.310000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.310000] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.310000] CPU 0/0 -> Node 0
[ 0.310000] SMP alternatives: switching to UP code
[ 0.310000] Freeing SMP alternatives: 25k freed
[ 0.310000] WARNING: at arch/x86_64/kernel/smp.c:397 smp_call_function_mask()
[ 0.310000]
[ 0.310000] Call Trace:
[ 0.310000] [<ffffffff8100dbde>] dump_trace+0x3ee/0x4a0
[ 0.310000] [<ffffffff8100dcd3>] show_trace+0x43/0x70
[ 0.310000] [<ffffffff8100dd15>] dump_stack+0x15/0x20
[ 0.310000] [<ffffffff8101cd44>] smp_call_function_mask+0x94/0xa0
[ 0.310000] [<ffffffff8101d0b2>] smp_call_function+0x32/0x40
[ 0.310000] [<ffffffff8104277f>] on_each_cpu+0x1f/0x50
[ 0.310000] [<ffffffff81026eac>] global_flush_tlb+0x8c/0x110
[ 0.310000] [<ffffffff81025c85>] free_init_pages+0xe5/0xf0
[ 0.310000] [<ffffffff81549b5e>] alternative_instructions+0x7e/0x150
[ 0.310000] [<ffffffff8154a2ea>] check_bugs+0x1a/0x20
[ 0.310000] [<ffffffff81540c4a>] start_kernel+0x2da/0x380
[ 0.310000] [<ffffffff81540132>] _sinittext+0x132/0x140
[ 0.310000]
[ 0.320000] ACPI: Core revision 20070126
[ 0.560000] Using local APIC timer interrupts.
[ 0.590000] Detected 62.496 MHz APIC timer.
[ 0.590000] Brought up 1 CPUs

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc8-mm2
# Fri Sep 28 10:26:40 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_NONIRQ_WAKEUP=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CO...
Le 27.09.2007 11:22, Andrew Morton a
On Thu, 27 Sep 2007 21:18:55 +0200
-
This sure looks like a result of the reiserfs xattr code being
really sucky and passing a NULL vfsmount to dentry_open.

Dave will probably find a bandaid to work around this, but the
right fix is to stop using a file struct here entirely. If you
look at reiserfs_xattr_set it's not actually used at all except
for passing it to ->prepare_write and ->commit_write which then
don't use it at all. All that crap should be rewritten to just
pass the dentry around. Note that all this should not acquire
writer counts on the vfsmount - we have done this already
before calling into the xattr methods.
-
I'm perplexed as to why a 'struct file' is needed, too. The 'struct
file' doesn't get used for anything _but_ the dentry, and the functions
to which it is passed (reiserfs_prepare/commit_write()) don't use it.
In fact, some other callers just pass NULL.

This is a completely naive implementation, and I've only tested that it
compiles, but does anybody see a reason we can't just do this?

BTW, do reiserfs developers know that you can put function declarations
in header files? ;) 4 different .c files have this:

int reiserfs_commit_write(struct page *page, unsigned from, unsigned to);
int reiserfs_prepare_write(struct page *page, unsigned from, unsigned to);

---
lxc-dave/fs/reiserfs/inode.c | 15 ++++------
lxc-dave/fs/reiserfs/ioctl.c | 10 ++-----
lxc-dave/fs/reiserfs/xattr.c | 59 +++++++++++++------------------------------
3 files changed, 28 insertions(+), 56 deletions(-)

diff -puN fs/open.c~fix-reiserfs-oops fs/open.c
diff -puN fs/file_table.c~fix-reiserfs-oops fs/file_table.c
diff -puN include/linux/mount.h~fix-reiserfs-oops include/linux/mount.h
diff -puN fs/reiserfs/xattr.c~fix-reiserfs-oops fs/reiserfs/xattr.c
--- lxc/fs/reiserfs/xattr.c~fix-reiserfs-oops 2007-09-27 13:36:38.000000000 -0700
+++ lxc-dave/fs/reiserfs/xattr.c 2007-09-27 13:49:02.000000000 -0700
@@ -194,27 +194,6 @@ static struct dentry *get_xa_file_dentry
return xafile;
}

-/* Opens a file pointer to the attribute associated with inode */
-static struct file *open_xa_file(const struct inode *inode, const char *name,
- int flags)
-{
- struct dentry *xafile;
- struct file *fp;
-
- xafile = get_xa_file_dentry(inode, name, flags);
- if (IS_ERR(xafile))
- return ERR_PTR(PTR_ERR(xafile));
- else if (!xafile->d_inode) {
- dput(xafile);
- return ERR_PTR(-ENODATA);
- }
-
- fp = dentry_open(xafile, NULL, O_RDWR);
- /* dentry_open dputs the dentry if it fails */
-
- return fp;
-}
-
/*
* this is very similar to fs/reiserfs/dir.c:reiserfs_readdir, but
* we need to dro...
I doubt this will work. These are also used for the ->prepare_write
and ->commit_write aops, and the method signature definitely wants
a file there, even if it's zero.
-
Oddly enough, I don't see those functions being used in aops:
const struct address_space_operations reiserfs_address_space_operations = {
.writepage = reiserfs_writepage,
.readpage = reiserfs_readpage,
.readpages = reiserfs_readpages,
.releasepage = reiserfs_releasepage,
.invalidatepage = reiserfs_invalidatepage,
.sync_page = block_sync_page,
.write_begin = reiserfs_write_begin,
.write_end = reiserfs_write_end,
.bmap = reiserfs_aop_bmap,
.direct_IO = reiserfs_direct_IO,
.set_page_dirty = reiserfs_set_page_dirty,
};

Plus, reiserfs seems to compile with that patch I just sent. Sure as
heck surprised me.

-- Dave
-
On Thu, 27 Sep 2007 14:27:14 -0700
That'll be because reiserfs-convert-to-new-aops.patch switched reiserfs over
to ->write_begin() and ->write_end().

So your stuff becomes dependent on Nick's stuff, and Nick's stuff is still
failing on NFS, I think.
-
I'd rather avoid the parameter removal for now; that makes it less
entangled, and it's an unrelated cleanup anyway.

Btw, there's more abuse of this sort in reiserfs. Various other places
in xattr.c call dentry_open directly without the vfsmount as well. And
handling of an external journal uses filp_open, which is similarly stupid;
it should use open_bdev_excl like xfs or the generic code to open the
main filesystem block device.
-
And here's a patch to stop the filp abuse in the journal code. An additional
benefit is that the block device is now properly claimed when opened by
device number.

Index: linux-2.6/fs/reiserfs/journal.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/journal.c 2007-09-28 09:18:50.000000000 +0200
+++ linux-2.6/fs/reiserfs/journal.c 2007-09-28 09:28:36.000000000 +0200
@@ -2544,11 +2544,9 @@ static int release_journal_dev(struct su
result = 0;
- if (journal->j_dev_file != NULL) {
- result = filp_close(journal->j_dev_file, NULL);
- journal->j_dev_file = NULL;
- journal->j_dev_bd = NULL;
- } else if (journal->j_dev_bd != NULL) {
+ if (journal->j_dev_bd != NULL) {
+ if (journal->j_dev_bd->bd_dev != super->s_dev)
+ bd_release(journal->j_dev_bd);
result = blkdev_put(journal->j_dev_bd);
journal->j_dev_bd = NULL;
}
@@ -2573,7 +2571,6 @@ static int journal_init_dev(struct super
result = 0;

journal->j_dev_bd = NULL;
- journal->j_dev_file = NULL;
jdev = SB_ONDISK_JOURNAL_DEVICE(super) ?
new_decode_dev(SB_ONDISK_JOURNAL_DEVICE(super)) : super->s_dev;

@@ -2590,35 +2587,34 @@ static int journal_init_dev(struct super
"cannot init journal device '%s': %i",
__bdevname(jdev, b), result);
return result;
- } else if (jdev != super->s_dev)
+ } else if (jdev != super->s_dev) {
+ result = bd_claim(journal->j_dev_bd, journal);
+ if (result) {
+ blkdev_put(journal->j_dev_bd);
+ return result;
+ }
+
set_blocksize(journal->j_dev_bd, super->s_blocksize);
+ }
+
return 0;
}

- journal->j_dev_file = filp_open(jdev_name, 0, 0);
- if (!IS_ERR(journal->j_dev_file)) {
- struct inode *jdev_inode = journal->j_dev_file->f_mapping->host;
- if (!S_ISBLK(jdev_inode->i_mode)) {
- reiserfs_warning(super, "journal_init_dev: '%s' is "
- "not a block device", jdev_name);
- result = -E...
It worked today, it turned out to be a UML bug. Real hardware seemed to
work properly, but will test a bit more tomorrow.
-
On Thu, 27 Sep 2007 14:51:25 -0700
Actually, we should rename reiserfs_prepare_write and reiserfs_commit_write
to something else to reduce confusion. Probably lots of other filesystems
would benefit from the same change, post-Nick's-stuff.
-
Got it. I'll try to reproduce.
-- Dave
-
On HP nx6325:
1) The audio is back (thanks for reverting x86_64-mm-cpa-einval.patch)
2) CPU hotplug is busted (onlining of CPU1 kills the kernel), probably due to
the same issue that I'm having with the -hrt version of 2.6.23-rc8 (we're
debugging it right now)

3) Some call traces unrelated to 2) appear in dmesg:
a) This probably is due to the fact that I've compiled two RTC drivers, by
mistake ;-)

Duplicate file names "rtc" detected.
Call Trace:
/home/rafael/src/mm/linux-2.6.23-rc8-mm2/drivers/usb/core/inode.c: creating file
'001'
usb usb3: new device found, idVendor=0000, idProduct=0000
usb usb3: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: OHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.23-rc8-mm2 ohci_hcd
usb usb3: SerialNumber: 0000:00:13.1
[<ffffffff802d1b77>] proc_register+0x15b/0x173
[<ffffffff802d1c03>] create_proc_entry+0x74/0x89
[<ffffffff8811e05a>] :rtc_core:rtc_proc_add_device+0x25/0x48
[<ffffffff8811d229>] :rtc_core:rtc_device_register+0x182/0x215
[<ffffffff802d7f8b>] sysfs_add_one+0xbf/0xc9
[<ffffffff88160587>] :rtc_cmos:cmos_do_probe+0xa5/0x220
[<ffffffff88160743>] :rtc_cmos:cmos_pnp_probe+0x41/0x43
[<ffffffff80358805>] pnp_device_probe+0x7b/0xa2
[<ffffffff803825e4>] driver_probe_device+0x100/0x18f
[<ffffffff80382793>] __driver_attach+0x70/0xae
[<ffffffff80382723>] __driver_attach+0x0/0xae
[<ffffffff803818f8>] bus_for_each_dev+0x49/0x7a
[<ffffffff803823f6>] driver_attach+0x1c/0x1e
[<ffffffff80381cc5>] bus_add_driver+0x86/0x1d6
[<ffffffff803829a5>] driver_register+0x72/0x76
[<ffffffff803585a1>] pnp_register_driver+0x1c/0x1e
[<ffffffff880e3010>] :rtc_cmos:cmos_init+0x10/0x12
[<ffffffff80258398>] sys_init_module+0x16df/0x1864
[<ffffffff8811d0a7>] :rtc_core:rtc_device_register+0x0/0x215
[<ffffffff8020bfee>] system_call+0x7e/0x83

rtc_cmos 00:07: rtc core: registered ...
This one is fixed by the following patch:
---
From: Rafael J. Wysocki <rjw@sisk.pl>

Fix CPU hotplug breakage on HP nx6325 and similar boxes caused by a reference
to disable_apic_timer (labeled as __initdata) from the CPU initialization code.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
arch/x86_64/kernel/apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc8-mm2/arch/x86_64/kernel/apic.c
===================================================================
--- linux-2.6.23-rc8-mm2.orig/arch/x86_64/kernel/apic.c
+++ linux-2.6.23-rc8-mm2/arch/x86_64/kernel/apic.c
@@ -42,7 +42,7 @@

int apic_verbosity;
static int apic_calibrate_pmtmr __initdata;
-int disable_apic_timer __initdata;
+int disable_apic_timer __cpuinitdata;

/* Local APIC timer works in C2? */
int local_apic_timer_c2_ok;
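
A rough user-space sketch of why the annotation matters (this is not kernel code, and every name in it is hypothetical): memory tagged __initdata is released once boot finishes, while __cpuinitdata is kept when CPU hotplug is enabled, so a variable read when a CPU is onlined later must live in the latter. The 0xcc poison byte is an assumption modelled on the kernel's free-init poisoning.

```c
#include <assert.h>
#include <string.h>

/* Assumption: byte pattern written over init memory once it is freed. */
#define FAKE_POISON 0xcc

/* Hypothetical stand-ins for the two possible homes of the variable. */
static unsigned char fake_init_data[sizeof(int)];    /* models .init.data: discarded after boot */
static unsigned char fake_cpuinit_data[sizeof(int)]; /* models .cpuinit.data: kept for CPU hotplug */

/* Store an int into one of the fake sections. */
static void fake_store(unsigned char *slot, int value)
{
    assert(slot != NULL);
    memcpy(slot, &value, sizeof(value));
}

/* Load an int back from one of the fake sections. */
static int fake_load(const unsigned char *slot)
{
    int value;

    assert(slot != NULL);
    memcpy(&value, slot, sizeof(value));
    return value;
}

/* Models freeing init memory at the end of boot: only .init.data is poisoned. */
static void fake_free_initmem(void)
{
    memset(fake_init_data, FAKE_POISON, sizeof(fake_init_data));
}

/*
 * Models the check made when a CPU is brought online after boot.
 * Returns nonzero if the flag read from the given slot says
 * "APIC timer disabled".
 */
static int fake_timer_disabled_on_cpu_up(const unsigned char *slot)
{
    return fake_load(slot) != 0;
}
```

Storing 0 in both slots, calling fake_free_initmem(), and then querying each slot shows the difference: the init-data copy now reads as garbage (nonzero), while the cpuinit-data copy still reads as 0.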
-
As I see it, with your configuration this is a reference from a .text
section to a .init.data section.

So this ought to have been flagged by the section mismatch
checks in modpost.

I assume you did not see such a warning??
Sam
-
I did:
WARNING: vmlinux.o(.text+0x7778): Section mismatch: reference to .init.data:disable_apic_timer (between 'identify_cpu' and 'IRQ0x20_interrupt')
---
~Randy
Phaedrus says that Quality is about caring.
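
For anyone who wants to grep build logs for this class of problem, here is a minimal user-space sketch that picks apart a warning in the format quoted above. The struct and function names are made up for illustration, and the format string only assumes the layout of that one quoted line; other modpost versions may word the warning differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one warning line. */
struct section_mismatch {
    char object[64];        /* e.g. "vmlinux.o" */
    char from_section[32];  /* section holding the reference, e.g. ".text" */
    unsigned long offset;   /* offset of the reference within that section */
    char to_section[32];    /* section being referenced, e.g. ".init.data" */
    char symbol[64];        /* referenced symbol */
};

/* Parse one warning line; returns 0 on success, -1 if it does not match. */
int parse_section_mismatch(const char *line, struct section_mismatch *out)
{
    int fields;

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    fields = sscanf(line,
                    "WARNING: %63[^(](%31[^+]+%lx): Section mismatch: "
                    "reference to %31[^:]:%63s",
                    out->object, out->from_section, &out->offset,
                    out->to_section, out->symbol);
    return fields == 5 ? 0 : -1;
}

/* Nonzero if regular code (.text) refers to something in an .init* section. */
int is_text_to_init(const struct section_mismatch *m)
{
    assert(m != NULL);
    return strcmp(m->from_section, ".text") == 0 &&
           strncmp(m->to_section, ".init", 5) == 0;
}
```

Fed the warning quoted above, this should yield from_section ".text", to_section ".init.data" and symbol "disable_apic_timer", which is exactly the text-to-init-data reference Sam describes.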
-
Thanks. I look forward to the day I can make them errors so people
cannot ignore them.
That said, a lot of individuals have done a very good job getting
rid of section mismatch warnings all over the kernel.

Sam
-
Yes.
Greetings,
Rafael
-
I did only on one build machine (multiple times). I tried
to reproduce it on another machine and could not. I blame
---
~Randy
Phaedrus says that Quality is about caring.
-
That's possible. I use the default openSUSE 10.2 toolchain.
Greetings,
Rafael
-
Doh, I knew I blew it.
Good catch, thanks,
-
Some good news from here. :-)
With the patch below applied 2.6.23-rc8-mm2 works fine on the nx6325 _with_ NO_HZ
and HIGH_RES_TIMERS set. Suspend and hibernation work as well, happy me.

NO_HZ and HIGH_RES_TIMERS also work on this box with the hrt patch plus the
C1E-related fix on top of 2.6.23-rc8.

Does it make sense to test CPU_IDLE too at this point?
Greetings,
--
"Premature optimization is the root of all evil." - Donald Knuth
-
Hi Andrew,
The drivers/net/ibm_newemac/mal seems to be broken with 2.6.23-rc8-mm2 as well; it was
reported on 2.6.23-rc8-mm1 (http://lkml.org/lkml/2007/9/25/173).

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
-
