ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/

- Lots of x86 updates

- This is a 25MB diff against mainline, which is rather large.

Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the mm-commits mailing list.

  echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is most valuable if you can perform a bisection search to identify which patch introduced the bug. Instructions for this process are at

  http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and reboots), so consider reporting the bug first and if we cannot immediately identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing list on any email.

- When reporting bugs in this kernel via email, please also rewrite the email Subject: in some manner to reflect the nature of the bug. Some developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on the mm-commits list.

Changes since 2.6.21-rc5-mm4:

 origin.patch
 git-acpi.patch
 git-alsa.patch
 git-agpgart.patch
 git-arm.patch
 git-avr32.patch
 git-cifs.patch
 git-cpufreq.patch
 git-powerpc.patch
 git-drm.patch
 git-dvb.patch
 git-gfs2-nmw.patch
 git-hid.patch
 git-ia64.patch
 git-ieee1394.patch
 git-infiniband.patch
 git-input.patch
 git-jfs.patch
 ...
Since 2.6.21-rc5-mm1, one of the test.kernel.org machines (elm3b239) has not been able to boot because it cannot find the SCSI device. You can view http://test.kernel.org/abat/82623/debug/console.log for the latest boot log (rc6-mm1).

I tracked this down to the git-scsi-misc patch in the -mm tree and then bisected the scsi-misc git tree until I reached the commit below from Mark Salyzyn:

fe76df4235986cfacc2d3b71cef7c42bc1a6dd6c
[SCSI] aacraid: Fix blocking issue with container probing function (cast update)

This is a pretty big patch, so hopefully Mark can take a look at it. lspci shows

01:02.0 RAID bus controller: Adaptec AAC-RAID (rev 02)
0f:02.0 SCSI storage controller: Adaptec AIC-9410W SAS (Razor ASIC non-RAID) (rev 08)
1d:02.0 SCSI storage controller: Adaptec AIC-9410W SAS (Razor ASIC non-RAID) (rev 08)
2b:02.0 SCSI storage controller: Adaptec AIC-9410W SAS (Razor ASIC non-RAID) (rev 08)

on 2.6.21-rc6. Let me know if I can provide more details.

-- 
Steve Fox
IBM Linux Technology Center
-
Thanks to Steve Fox and Duane Cox for their help investigating this issue; I'd like to report that we found the problem. The issue is with the patch Steve Fox isolated (fe76df4235986cfacc2d3b71cef7c42bc1a6dd6c): it does not accommodate older adapters properly and issues a command they do not support when retrieving storage parameters about the arrays. This simple patch resolves the problem (and more accurately mimics the logic of the original code before the patch).

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patches.

The attached patch is against the current scsi-misc-2.6 and also applies to 2.6.21-rc6-mm1. Please consider it for expedited inclusion.

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Hi Rafael,

At what point during boot does it hang? Can you send me the last few messages before the hang? A full dmesg with cpuidle not configured would help as well.

Thanks,
Venki
-
When mounting the root filesystem. It hangs completely; even the magic SysRq key does not work.

Freeing unused kernel memory: 240k freed
Write protecting the kernel read-only data: 4356k
PM: Adding info for No Bus:vcs1
PM: Adding info for No Bus:vcsa1
ACPI: Invalid PBLK length [0]
cpuidle: driver acpi_idle failed to attach to cpu 0
cpuidle: using driver acpi_idle
ACPI: Thermal Zone [THRM] (59 C)
ACPI: Fan [FN00] (on)
Attempting manual resume
swsusp: Resume From Partition 22:3
PM: Checking swsusp image.
PM: Resume from disk failed.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hdc6, internal journal

Attached.

Greetings,
Rafael
Rafael: The patch below should fix the hang.
Len: Please include this patch in acpi-test.
Thanks,
Venki
Prevent hang on x86-64, when ACPI processor driver is added as a module on
a system that does not support C-states.
x86-64 expects all idle handlers to enable interrupts before returning from
the idle handler. This is due to races between enter_idle() and exit_idle().
Make cpuidle_idle_call() conform to this when there is no pm_idle_old.
Also, make cpuidle look at the return values of cpuidle_attach_driver() and set
current_driver to NULL if attach fails on all CPUs.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Index: linux-2.6.21-rc6-mm1/drivers/cpuidle/cpuidle.c
===================================================================
--- linux-2.6.21-rc6-mm1.orig/drivers/cpuidle/cpuidle.c
+++ linux-2.6.21-rc6-mm1/drivers/cpuidle/cpuidle.c
@@ -43,6 +43,8 @@ static void cpuidle_idle_call(void)
if (dev->status != CPUIDLE_STATUS_DOIDLE) {
if (pm_idle_old)
pm_idle_old();
+ else
+ local_irq_enable();
return;
}
Index: linux-2.6.21-rc6-mm1/drivers/cpuidle/driver.c
===================================================================
--- linux-2.6.21-rc6-mm1.orig/drivers/cpuidle/driver.c
+++ linux-2.6.21-rc6-mm1/drivers/cpuidle/driver.c
@@ -107,11 +107,20 @@ int cpuidle_switch_driver(struct cpuidle
cpuidle_curr_driver = drv;
if (drv) {
+ int ret = 1;
list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
- cpuidle_attach_driver(dev);
- if (cpuidle_curr_governor)
+ if (cpuidle_attach_driver(dev) == 0)
+ ret = 0;
+
+ /* If attach on all devices fail, switch to NULL driver */
+ if (ret)
+ cpuidle_curr_driver = NULL;
+
+ if (cpuidle_curr_driver && cpuidle_curr_governor) {
+ printk(KERN_INFO "cpuidle: using driver %s\n",
+ drv->name);
cpuidle_install_idle_handler();
- printk(KERN_INFO "cpuidle: using driver %s\n", drv->name);
+ }
}
return 0;
-
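The two behaviours described in Venki's changelog can be modelled outside the kernel. The following is a minimal user-space sketch, not the cpuidle code itself: `struct idle_device`, `attach_driver()`, `switch_driver()`, `idle_call()` and the `irqs_enabled` flag are hypothetical stand-ins for `cpuidle_attach_driver()`, `cpuidle_switch_driver()`, `cpuidle_idle_call()` and the CPU's interrupt flag. It assumes, as the changelog implies, that the idle handler is entered with interrupts disabled and must return with them enabled.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_IDLE_DEVICES 2   /* hypothetical: two detected CPUs */

struct idle_driver {
	const char *name;
};

struct idle_device {
	int attach_result;   /* what attaching the driver returns here; 0 = success */
	bool doidle;         /* models dev->status == CPUIDLE_STATUS_DOIDLE */
};

static struct idle_device devices[NR_IDLE_DEVICES];
static const struct idle_driver *curr_driver;
static void (*pm_idle_old)(void);   /* previous idle handler; may be NULL */
static bool irqs_enabled;           /* models the local interrupt flag */

/* Stand-in for cpuidle_attach_driver(): 0 on success, negative on failure. */
static int attach_driver(struct idle_device *dev)
{
	dev->doidle = (dev->attach_result == 0);
	return dev->attach_result;
}

/* Patched switch logic: keep drv only if it attached to at least one device. */
static void switch_driver(const struct idle_driver *drv)
{
	curr_driver = drv;
	if (drv) {
		int ret = 1;

		for (size_t i = 0; i < NR_IDLE_DEVICES; i++)
			if (attach_driver(&devices[i]) == 0)
				ret = 0;

		/* If attach on all devices failed, switch to no driver. */
		if (ret)
			curr_driver = NULL;
	}
}

/* Patched early-return path of the idle call. */
static void idle_call(struct idle_device *dev)
{
	irqs_enabled = false;   /* assumed: entered with interrupts disabled */

	if (!dev->doidle) {
		if (pm_idle_old)
			pm_idle_old();          /* assumed to re-enable interrupts itself */
		else
			irqs_enabled = true;    /* models local_irq_enable() */
		return;
	}
	/* A real driver would enter a C-state here and return with IRQs on. */
	irqs_enabled = true;
}
```

With this logic, if the attach failed on every CPU, the driver would be dropped rather than announced with "cpuidle: using driver ...", and an idle call on a device that is not ready returns with interrupts enabled even when there is no old handler to fall back to.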
Get this Oops:

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff802f9320>] hugetlbfs_set_page_dirty+0x4/0xc
PGD 414e067 PUD 4198067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: devices/system/node/node0/cpumap
CPU 1
Modules linked in: ipv6 hidp rfcomm l2cap bluetooth sunrpc video button battery asus_acpi ac lp parport_pc parport nvram amd_rng rng_core i2c_amd756 i2c_core
Pid: 6053, comm: readback Not tainted 2.6.21-rc6-mm1-autokern1 #1
RIP: 0010:[<ffffffff802f9320>]  [<ffffffff802f9320>] hugetlbfs_set_page_dirty+0x4/0xc
RSP: 0018:ffff810004145d90  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff81003f1ad000 RCX: 000000000000003f
RDX: ffff810004771dc0 RSI: ffff810004145db0 RDI: ffff81003f1ad000
RBP: 8000000007800040 R08: 0000000001258020 R09: ffff81000160ad84
R10: 0000000000000282 R11: ffffffff802f931c R12: ffff8100035db7c0
R13: ffff810003675c38 R14: 00002aaaaae00000 R15: ffff810001022820
FS:  00002ac8d0bd6590(0000) GS:ffff81000160acc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000047b7000 CR4: 00000000000006e0
Process readback (pid: 6053, threadinfo ffff810004144000, task ffff81000177b140)
Stack:  ffffffff80283f95 ffff810004145d98 ffff810004145d98 ffff810000000000
 00002aaaaac00000 ffff810003675c38 00002aaaaae00000 00002aaaaac00000
 ffff8100047b68b8 00000036d5f18000 ffffffff80284060 ffff81003fc066c0
Call Trace:
 [<ffffffff80283f95>] __unmap_hugepage_range+0xcf/0x163
 [<ffffffff80284060>] unmap_hugepage_range+0x37/0x57
 [<ffffffff802761e4>] unmap_vmas+0xf6/0x744
 [<ffffffff8027a197>] exit_mmap+0x78/0xed
 [<ffffffff802313bc>] mmput+0x45/0xb7
 [<ffffffff80236636>] do_exit+0x23d/0x811
 [<ffffffff80236c86>] sys_exit_group+0x0/0xe
 [<ffffffff80209b6e>] system_call+0x7e/0x83
Code: f0 0f ba 28 04 31 c0 c3 48 89 c8 48 c7 c1 5f 9b 2f 80 48 89
RIP  [<ffffffff802f9320>] hugetlbfs_set_page_dirty+0x4/0xc
 RSP <ffff810004145d90>
CR2: 0000000000000000
Fixing recursive fault but ...
Correct.

Acked-by: Christoph Lameter <clameter@sgi.com>

Who is off to look for more of these.
-
I'm seeing this while booting:
ima (ima_init): No TPM chip found(rc = -19), activating TPM-bypass!
=========================
[ BUG: held lock freed! ]
-------------------------
swapper/1 is freeing memory c04c7660-c04c76a3, with a lock still held there!
(ima_queue_lock){--..}, at: [<c0202710>] ima_create_htable+0x10/0x90
1 lock held by swapper/1:
#0: (ima_queue_lock){--..}, at: [<c0202710>] ima_create_htable+0x10/0x90
stack backtrace:
[<c0105959>] dump_trace+0x1d9/0x210
[<c01059aa>] show_trace_log_lvl+0x1a/0x30
[<c0106612>] show_trace+0x12/0x20
[<c01066d6>] dump_stack+0x16/0x20
[<c014fd3a>] debug_check_no_locks_freed+0x17a/0x180
[<c014cdbf>] debug_mutex_init+0x1f/0x50
[<c0145451>] __mutex_init+0x41/0x50
[<c020277d>] ima_create_htable+0x7d/0x90
[<c020286f>] ima_init+0x3f/0x270
[<c051b765>] init_evm+0x1f5/0x250
[<c05015d2>] kernel_init+0x132/0x320
[<c010532f>] kernel_thread_helper+0x7/0x18
=======================
I saw this in -rc5-mm4 also.
I couldn't find a contact address in MAINTAINERS, so I've CC'd the
two authors listed on top of ima_create_htable.c , as well as the
first submitter of the IMA stuff I found in my LKML archive.
As an aside, this computer does have (some sort of) TPM chip, but
the driver is built as a module and is not loaded at this point (not a
worry for me; I don't intend to use it).
--
Joseph Fannin
jfannin@gmail.com || jhf@columbus.rr.com
On Sun, 8 Apr 2007 14:35:59 -0700,
Add the missing arch_trampoline_kprobe() for s390.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
arch/s390/kernel/kprobes.c | 7 +++++++
1 files changed, 7 insertions(+)
--- linux-2.6.21-rc6-mm1.orig/arch/s390/kernel/kprobes.c
+++ linux-2.6.21-rc6-mm1/arch/s390/kernel/kprobes.c
@@ -662,3 +662,10 @@ int __init arch_init_kprobes(void)
{
return register_kprobe(&trampoline_p);
}
+
+int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+{
+ if (p->addr == (kprobe_opcode_t *) & kretprobe_trampoline)
+ return 1;
+ return 0;
+}
-
-
This patch makes the needlessly global struct proc_kpagemap static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
---
--- linux-2.6.21-rc6-mm1/fs/proc/proc_misc.c.old 2007-04-10 00:52:35.000000000 +0200
+++ linux-2.6.21-rc6-mm1/fs/proc/proc_misc.c 2007-04-10 00:52:49.000000000 +0200
@@ -732,7 +732,7 @@
return ret;
}
-struct proc_dir_entry *proc_kpagemap;
+static struct proc_dir_entry *proc_kpagemap;
static struct file_operations proc_kpagemap_operations = {
.llseek = mem_lseek,
.read = kpagemap_read,
-
Acked-by: Matt Mackall <mpm@selenic.com> -- Mathematics is the supreme nostalgia of our time. -
is_exported() can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
---
include/linux/module.h | 7 -------
kernel/module.c | 2 +-
2 files changed, 1 insertion(+), 8 deletions(-)
--- linux-2.6.21-rc6-mm1/include/linux/module.h.old 2007-04-10 01:04:03.000000000 +0200
+++ linux-2.6.21-rc6-mm1/include/linux/module.h 2007-04-10 01:05:09.000000000 +0200
@@ -382,8 +382,6 @@
/* Look for this name: can be of form module:name. */
unsigned long module_kallsyms_lookup_name(const char *name);
-int is_exported(const char *name, const struct module *mod);
-
extern void __module_put_and_exit(struct module *mod, long code)
__attribute__((noreturn));
#define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code);
@@ -558,11 +556,6 @@
return 0;
}
-static inline int is_exported(const char *name, const struct module *mod)
-{
- return 0;
-}
-
static inline int register_module_notifier(struct notifier_block * nb)
{
/* no events will happen anyway, so this can always succeed */
--- linux-2.6.21-rc6-mm1/kernel/module.c.old 2007-04-10 01:05:16.000000000 +0200
+++ linux-2.6.21-rc6-mm1/kernel/module.c 2007-04-10 01:05:36.000000000 +0200
@@ -1746,7 +1746,7 @@
}
#ifdef CONFIG_KALLSYMS
-int is_exported(const char *name, const struct module *mod)
+static int is_exported(const char *name, const struct module *mod)
{
if (!mod && lookup_symbol(name, __start___ksymtab, __stop___ksymtab))
return 1;
-
This patch makes the following needlessly global functions static:
- aops.c: ocfs2_write_data_page()
- dlmglue.c: ocfs2_dump_meta_lvb_info()
- file.c: ocfs2_set_inode_size()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
---
fs/ocfs2/aops.c | 6 ++---
fs/ocfs2/dlmglue.c | 54 ++++++++++++++++++++++++---------------------
fs/ocfs2/dlmglue.h | 7 -----
fs/ocfs2/file.c | 8 +++---
fs/ocfs2/file.h | 5 ----
5 files changed, 36 insertions(+), 44 deletions(-)
--- linux-2.6.21-rc6-mm1/fs/ocfs2/aops.c.old 2007-04-10 00:38:47.000000000 +0200
+++ linux-2.6.21-rc6-mm1/fs/ocfs2/aops.c 2007-04-10 00:38:55.000000000 +0200
@@ -934,9 +934,9 @@
* Returns a negative error code or the number of bytes copied into
* the page.
*/
-int ocfs2_write_data_page(struct inode *inode, handle_t *handle,
- u64 *p_blkno, struct page *page,
- struct ocfs2_write_ctxt *wc, int new)
+static int ocfs2_write_data_page(struct inode *inode, handle_t *handle,
+ u64 *p_blkno, struct page *page,
+ struct ocfs2_write_ctxt *wc, int new)
{
int ret, copied = 0;
unsigned int from = 0, to = 0;
--- linux-2.6.21-rc6-mm1/fs/ocfs2/dlmglue.h.old 2007-04-10 00:41:39.000000000 +0200
+++ linux-2.6.21-rc6-mm1/fs/ocfs2/dlmglue.h 2007-04-10 00:47:06.000000000 +0200
@@ -119,11 +119,4 @@
struct ocfs2_dlm_debug *ocfs2_new_dlm_debug(void);
void ocfs2_put_dlm_debug(struct ocfs2_dlm_debug *dlm_debug);
-/* aids in debugging and tracking lvbs */
-void ocfs2_dump_meta_lvb_info(u64 level,
- const char *function,
- unsigned int line,
- struct ocfs2_lock_res *lockres);
-#define mlog_meta_lvb(__level, __lockres) ocfs2_dump_meta_lvb_info(__level, __PRETTY_FUNCTION__, __LINE__, __lockres)
-
#endif /* DLMGLUE_H */
--- linux-2.6.21-rc6-mm1/fs/ocfs2/dlmglue.c.old 2007-04-10 00:42:19.000000000 +0200
+++ linux-2.6.21-rc6-mm1/fs/ocfs2/dlmglue.c 2007-04-10 00:44:23.000000000 +0200
@@ -103,6 +103,35 @@
static void ocfs2_dentry_post_unlock(struct ocfs2_super ...
2.6.21-rc6-mm1 locks up during boot. The last message is:

usbcore: registered new interface driver hiddev

Then it hangs so hard that not even sysrq+B has any effect. With 2.6.18-rc5-mm1, the next messages I normally get are:

usbcore: registered new interface driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
usbcore: registered new interface driver usbserial

This is an x86-64 single-processor machine.

Helge Hafting
-
On Wed, 11 Apr 2007 21:42:27 +0200 OK. If you add initcall_debug to the kernel boot command line, what's the last thing we call? -
The last messages (handwritten, somewhat shortened):

calling hid_init+0x0/0x10() returned 0 ran for 0 msec
calling hid_init+0x0/0x50()
usbcore registered new interface driver hiddev

and then it hangs completely.

Helge Hafting
-
On Thu, 12 Apr 2007 01:07:00 +0200 OK, thanks. If it happens to be, I'll bisect it down. Chances are it won't, and it gets merged, and we get to futz around with it for a week or two while holding up 2.6.22. I can only think we must enjoy doing it this way. -
Hi Helge,

does 2.6.21-rc6 (without any -mm patches) work fine? Could you please

- try booting without any HID devices plugged in (e.g. USB mice, USB keyboards) to see whether the problem persists?
- recompile 2.6.21-rc6-mm1 with git-hid.patch reverted to see if it helps?

I am unfortunately not able to reproduce it here on x86_64.

Thanks,
-- 
Jiri Kosina
-
Pulled the USB mouse; this moved the crash around. usbhid was registered anyway, but later than usual. The last messages:

md: <...>
cpuidle: <...>
sdhci: <...>
sdhci: <...>
usbcore: registered new interface hiddev
usbcore: registered new interface usbhid
drivers/hid/usbhid/hid_core.c v2.6 USB HID core driver
Advanced linux sound architecture <...>
ACPI: PCI Interrupt 0000:00:06.0[A]->GSI 17 (level,low)->IRQ 17

And then it hung. Rebooting into rc5-mm4, I got this as the next messages:

gameport: Trident 4DWave is pci0000:00:06.0/gameport0, speed 1966kHz
ALSA device list:
  #0: Trident TRID4DWAVENX PCI Audio at 0x9400, irq 17
oprofile: using NMI interrupt.
Netfilter messages via NETLINK v0.30.

Just downloaded it. Unfortunately, it will not revert cleanly:

$ patch -p1 -R --dry-run < ../git-hid.patch
patching file drivers/hid/Kconfig
patching file drivers/hid/Makefile
patching file drivers/hid/hid-core.c
Hunk #1 succeeded at 30 (offset -1 lines).
Hunk #2 succeeded at 871 (offset -1 lines).
Hunk #3 succeeded at 968 (offset -1 lines).
Hunk #4 succeeded at 984 (offset -1 lines).
patching file drivers/hid/hid-input.c
Hunk #1 succeeded at 433 (offset 2 lines).
Hunk #2 succeeded at 533 (offset 2 lines).
patching file drivers/hid/hidraw.c
patching file drivers/hid/usbhid/Kconfig
patching file drivers/hid/usbhid/Makefile
patching file drivers/hid/usbhid/hid-core.c
Unreversed patch detected!  Ignore -R? [n]
Apply anyway? [n]
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file drivers/hid/usbhid/hid-core.c.rej
patching file drivers/hid/usbhid/hid-ff.c
patching file drivers/hid/usbhid/hid-lgff.c
patching file drivers/hid/usbhid/hid-pidff.c
patching file drivers/hid/usbhid/hid-plff.c
patching file drivers/hid/usbhid/hid-tmff.c
patching file drivers/hid/usbhid/hid-zpff.c
patching file drivers/hid/usbhid/hiddev.c
patching file drivers/hid/usbhid/usbhid.h
patching file drivers/hid/usbhid/usbkbd.c
patching file drivers/hid/usbhid/usbmouse.c
patching file ...
Do you compile with CONFIG_HIDRAW? -- Jiri Kosina -
No, that one is not set. I did use the new SLUB thing - could that possibly be the cause? Going back to SLAB is easy enough. Helge Hafting -
Went back to SLAB, got a compile error. Did a make clean and compiled again. Got some warnings:

  LD      vmlinux
  SYSMAP  System.map
  SYSMAP  .tmp_System.map
  MODPOST vmlinux
WARNING: init/built-in.o - Section mismatch: reference to .init.text:kernel_init from .text.rest_init after 'rest_init' (at offset 0xe)
WARNING: mm/built-in.o - Section mismatch: reference to .init.text: from .text.kmem_cache_create after 'kmem_cache_create' (at offset 0x40b)
WARNING: mm/built-in.o - Section mismatch: reference to .init.text: from .text.kmem_cache_create after 'kmem_cache_create' (at offset 0x568)
  AS      arch/x86_64/boot/bootsect.o
  LD      arch/x86_64/boot/bootsect
  AS      arch/x86_64/boot/setup.o
  LD      arch/x86_64/boot/setup
  AS      arch/x86_64/boot/compressed/head.o
  CC      arch/x86_64/boot/compressed/misc.o
  OBJCOPY arch/x86_64/boot/compressed/vmlinux.bin
  GZIP    arch/x86_64/boot/compressed/vmlinux.bin.gz
  LD      arch/x86_64/boot/compressed/piggy.o
  LD      arch/x86_64/boot/compressed/vmlinux
  OBJCOPY arch/x86_64/boot/vmlinux.bin
  HOSTCC  arch/x86_64/boot/tools/build
  BUILD   arch/x86_64/boot/bzImage
Root device is (8, 49)
Boot sector 512 bytes.
Setup is 7302 bytes.
System is 3075 kB
Kernel: arch/x86_64/boot/bzImage is ready  (#11)

Then I booted this, and it hung exactly the same way. I thought SLUB was reasonably safe; it is new but not marked experimental.

Helge Hafting
-
Helge,

with your .config, my machine hangs upon IPMI initialization; the last thing I see before the total freeze is

ipmi_si: Trying PCI-specified kcs state machine at mem address 0xd0121000, slave address 0x0, irq 5

(this was run on a 32-bit machine). When I turn IPMI off, I can't reproduce your hang; everything runs smoothly. Could you please try recompiling the kernel with IPMI disabled, to see whether it could be related?

Corey added to CC.

Thanks,
-- 
Jiri Kosina
-
Was that with ipmi linked into vmlinux? (Please send the output of grep IPMI .config) I thought we fixed that. -
I thought we fixed that too :( Can you run with the "print out what init function is running" option and see if it really is the ipmi driver that is dying or not? thanks, greg k-h -
Confirmed. 2.6.21-rc6-mm1 with CONFIG_IPMI_SI=y hangs upon boot on the already mentioned printk from ipmi_si. With CONFIG_IPMI_SI=m the boot succeeds. When manually trying to modprobe ipmi_si after that, the modprobe itself hangs, but the machine remains usable otherwise. I still wonder if this could be related to what Helge was originally reporting. -- Jiri Kosina -
Does this same .config hang in 2.6.21-rc6 without the -mm stuff? thanks, greg k-h -
Actually, after approximately 6 minutes 30 seconds, the modprobe finishes with -ENODEV and the following is printed to dmesg:

ipmi_si: There appears to be no BMC at this location
ACPI: PCI interrupt for device 0000:02:00.4 disabled
ipmi_si: Unable to find any System Interface(s)

Anyway, I just checked that I get precisely the same behavior with plain 2.6.21-rc6, so we can rule out -mm for this issue. It's possible that this system has some broken KCS. I will try to narrow this down. So the USB-related hang Helge is seeing is a different story.

-- 
Jiri Kosina
-
My guess is that this system spaces out its KCS registers, but there appears to be no way to specify register spacing or offsets with PCI. That would mean that the configuration register appears operational to the driver, but the data register is returning bogus data. Thus it appears "sort of" working to the driver, and it takes a long time to time out. I'm pretty sure it's possible to test to figure out where the registers are really located. However, I have no way to test this change. All the other configuration methods have a way to discover this information. Jiri, we should probably take this offline if you want to continue to work on it. Thanks, -corey -
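Corey's hypothesis is about register spacing: a driver computes each register's address from a base address and a spacing, so if the hardware spaces its registers differently from what the driver assumes, some accesses go to the wrong location. A minimal sketch of that address arithmetic, assuming the conventional KCS layout of a data register at index 0 and a status/command register at index 1; `kcs_reg_addr()` is a hypothetical helper, not the ipmi_si code:

```c
#include <stdint.h>

/* Conventional KCS register indices (assumed): data at 0, status/command at 1. */
enum kcs_reg {
	KCS_DATA_REG = 0,
	KCS_STATUS_CMD_REG = 1,
};

/* Hypothetical helper: address of KCS register `reg` when the interface's
 * registers start at `base` and are `spacing` bytes apart. */
static uintptr_t kcs_reg_addr(uintptr_t base, enum kcs_reg reg, unsigned int spacing)
{
	return base + (uintptr_t)reg * spacing;
}
```

With the base address from Jiri's log (0xd0121000), the status/command register lands at 0xd0121001 with a spacing of 1 but at 0xd0121004 with a spacing of 4, so a driver assuming the wrong spacing would be reading a different address from the one the hardware uses.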
Removed IPMI, recompiled, rebooted, crashed the same way. Helge Hafting -
Jiri, can you send me the output of "lspci -x" ? -corey -
OK, so it hangs somewhere near usbhid's hid_init(), and usb_register() has already been invoked. Could you please apply the superstupid patch below and send me the output up to the point it hangs? I am curious to know whether it hangs somewhere inside usb_register(), or elsewhere. Thanks.

diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
index 1ddca31..d930f62 100644
--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -1550,15 +1550,22 @@ static int __init hid_init(void)
 	retval = hiddev_init();
 	if (retval)
 		goto hiddev_init_fail;
+	printk(KERN_DEBUG "hid_init: before usb_register()\n");
 	retval = usb_register(&hid_driver);
+	printk(KERN_DEBUG "hid_init: after usb_register(), retuned %d\n", retval);
 	if (retval)
 		goto usb_register_fail;
 	info(DRIVER_VERSION ":" DRIVER_DESC);
+	printk(KERN_DEBUG "hid_init: returning 0\n");
+	dump_stack();
 	return 0;
 usb_register_fail:
+	printk(KERN_DEBUG "hid_init: calling hiddev_exit()\n");
 	hiddev_exit();
 hiddev_init_fail:
+	printk(KERN_DEBUG "hid_init: returning %d\n", retval);
+	dump_stack();
 	return retval;
 }
-
Are you sure this is the correct patch - against 2.6.21-rc6-mm1 ? -
Well, I am pretty sure:

box:~/scratch # wget ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm... 2>&1
box:~/scratch # wget ftp://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.tar.bz2>/dev/null 2>&1
box:~/scratch # wget ftp://ftp.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.21-rc6.bz2>/dev/null 2>&1
box:~/scratch # tar xf linux-2.6.20.tar.bz2
box:~/scratch # cd linux-2.6.20/
box:~/scratch/linux-2.6.20 # mv ../patch-2.6.21-rc6.bz2 .
box:~/scratch/linux-2.6.20 # bunzip2 patch-2.6.21-rc6.bz2
box:~/scratch/linux-2.6.20 # patch -p1 < patch-2.6.21-rc6 >/dev/null 2>&1; echo $?
0
box:~/scratch/linux-2.6.20 # mv ../2.6.21-rc6-mm1.bz2 .
box:~/scratch/linux-2.6.20 # bunzip2 2.6.21-rc6-mm1.bz2
box:~/scratch/linux-2.6.20 # patch -p1 < 2.6.21-rc6-mm1 >/dev/null 2>&1; echo $?
0
box:~/scratch/linux-2.6.20 # cat tmp.patch
diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
index 1ddca31..d930f62 100644
--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -1550,15 +1550,22 @@ static int __init hid_init(void)
 	retval = hiddev_init();
 	if (retval)
 		goto hiddev_init_fail;
+	printk(KERN_DEBUG "hid_init: before usb_register()\n");
 	retval = usb_register(&hid_driver);
+	printk(KERN_DEBUG "hid_init: after usb_register(), retuned %d\n", retval);
 	if (retval)
 		goto usb_register_fail;
 	info(DRIVER_VERSION ":" DRIVER_DESC);
+	printk(KERN_DEBUG "hid_init: returning 0\n");
+	dump_stack();
 	return 0;
 usb_register_fail:
+	printk(KERN_DEBUG "hid_init: calling hiddev_exit()\n");
 	hiddev_exit();
 hiddev_init_fail:
+	printk(KERN_DEBUG "hid_init: returning %d\n", retval);
+	dump_stack();
 	return retval;
 }
box:~/scratch/linux-2.6.20 # patch -p1 < tmp.patch
patching file drivers/hid/usbhid/hid-core.c
box:~/scratch/linux-2.6.20 #

So I guess you are ...
Jiri Kosina wrote:

I don't know about 2.6.21-rc6, but 2.6.21-rc7 (from fresh sources) is good. It boots up without hanging, and my USB devices work too. Should I test rc7-mm1 then?

Helge Hafting
-
That would also be useful. But really, identifying the offending patch using bisection would help most. And it should be pretty easy and not too time-consuming for you, as the bug triggers immediately upon boot in your case. If you are not comfortable bisecting Andrew's quilt patchset by hand, don't forget that it is also possible to obtain the -mm tree through git, which provides a very convenient means of bisecting. This is what I usually do.

-- 
Jiri Kosina
-
If there is an offending patch at all - my rc6-mm1 kernel must have been built from messed-up sources - we saw that when your patch did not apply. So my source had errors - right in the USB part. I haven't tested a correct rc6-mm1, so I don't even know if it Indeed - it is easy to spot. :-) Helge Hafting -
I recompiled 2.6.21-rc6-mm1 from fresh sources. It still hangs initializing USB, but this time your patch applied. I rebooted with your patch, and got:

Detailed lists of all the USB devices found (printer, mouse, ...)

Then usbcore registered various drivers, such as usblp, usb-storage, libusual, usbserial, ipaq. These messages were intermixed with messages from the md RAID system initializing. The three last lines were:

sdhci: Secure digital host controller interface driver
sdhci: copyright Pierre Ossman
usbcore: registered new interface driver hiddev

And then the machine hung completely. I'll have a look at bisecting. :-(
-
2.6.21-rc6 boots up fine. Both rc6 and rc7 have a different problem: the machine tends to hang after a few minutes of work in X. That hang is unusual in that moving the mouse still moves the X cursor, but everything else stops and sysrq fails me. But that is another story.

rc6 boots, rc6-mm1 hangs at the "usbcore registered hiddev" message.

Bisection:
1, 2, 3: the first three hang at "usbcore registered hiddev"
4, 5, 6: the next three hang at a message about ACPI PCI[A]->IRQ17

I decided to keep marking these hangers as "bad" while bisecting. I don't really know if this could be the same thing or completely different issues. If they are different, then one problem will mask the other anyway, so calling every hanging kernel "bad" will at least find the first broken patch.

7: boots up ok!
8, 9, 10: hang at the above-mentioned ACPI message

The (first) "hanging" patch in 2.6.21-rc6-mm1 is: git-acpi.patch

Helge Hafting
-
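Helge's procedure is a binary search over the ordered patch series, and it relies on the failure being monotonic: once the culprit is applied, every later kernel is bad. That is why treating every hanging kernel as "bad" still finds the first broken patch. A minimal sketch under that assumption; `bisect_first_bad()`, `struct probe_ctx` and the `culprit_probe()` stand-in predicate are hypothetical, with each call of the predicate representing one rebuild and reboot:

```c
#include <stdbool.h>
#include <stddef.h>

/* Predicate: is the kernel built with patches [0..applied_through] bad? */
typedef bool (*is_bad_fn)(size_t applied_through, void *ctx);

/* Return the index of the first bad patch in a series of n patches, or n if
 * every kernel tested good. Assumes badness is monotonic in the index. */
static size_t bisect_first_bad(size_t n, is_bad_fn is_bad, void *ctx)
{
	size_t lo = 0, hi = n;   /* the answer always lies in [lo, hi] */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (is_bad(mid, ctx))
			hi = mid;        /* culprit is mid or earlier */
		else
			lo = mid + 1;    /* culprit is after mid */
	}
	return lo;
}

/* Hypothetical stand-in for "build, boot, and see whether it hangs". */
struct probe_ctx {
	size_t culprit;      /* index of the patch that introduces the hang */
	unsigned int probes; /* number of rebuild+reboot cycles used */
};

static bool culprit_probe(size_t applied_through, void *ctx)
{
	struct probe_ctx *p = ctx;

	p->probes++;
	return applied_through >= p->culprit;
}
```

For a series of up to 1023 patches the loop makes at most 10 probes, consistent with the "around ten rebuilds and reboots" estimate in the -mm announcement.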
Hi Helge,

thanks for the effort. If you take stock rc6-mm1 and revert just git-acpi.patch, does the machine behave correctly?

-- 
Jiri Kosina
-
It would be easier and would produce a clearer result to test just 2.6.21-rc7 + 2.6.21-rc7-mm2's origin.patch + 2.6.21-rc7-mm2's acpi.patch from ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm... -
Just compiled & booted such a kernel - it came up fine! So it looks like USB is fine then, and the problem is in that ACPI patch. Helge Hafting -
On Fri, 27 Apr 2007 23:04:58 +0200 OK, thanks. Len&co: we've established that 2.6.21-rc6-mm1's git-acpi.patch -
On Sun, Apr 08, 2007 at 02:35:59PM -0700, Andrew Morton wrote:

After bisecting I can finally say what breaks resume from STR here, tadaaaaa: CPU_IDLE. I first spotted git-acpi.patch, then reapplied it and disabled CPU_IDLE; now my laptop resumes. Is there any other useful information I should add?

$ cat /sys/devices/system/cpu/cpuidle/*
acpi_idle no governors acpi_idle no governor
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
stepping : 6
cpu MHz : 1000.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3671.24
clflush size : 64

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
stepping : 6
cpu MHz : 1000.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 15805.85
clflush size : 64

--
-
Please check whether the patch at http://marc.info/?l=linux-acpi&m=117523651630038&w=2 fixes the issue.

Thanks,
Shaohua
-
I have the same system as Mattia, and when I applied this patch and turned CPU_IDLE back on, I got a panic on boot. Unfortunately, the EIP scrolled off screen, so I can't get a line number. (I had the same STR breakage as him; STR did not work with CPU_IDLE turned on, and it did work with CPU_IDLE turned off.) I'm running +rc6+mm(April 11) on a Sony VAIO SZ. joshua -
Is it possible for you to get the log from a serial console? I thought you could at least see some log info on the screen; if you don't have a serial console, please write it down. The boot panic surprises me, as it works here.

Thanks,
Shaohua
-
It looks like there is an init order issue with the sysfs files. The refreshed patch
below should fix your bug.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Index: 21-rc6-mm1/drivers/acpi/processor_idle.c
===================================================================
--- 21-rc6-mm1.orig/drivers/acpi/processor_idle.c 2007-04-17 13:41:29.000000000 +0800
+++ 21-rc6-mm1/drivers/acpi/processor_idle.c 2007-04-17 14:03:56.000000000 +0800
@@ -624,7 +624,7 @@ int acpi_processor_cst_has_changed(struc
return -ENODEV;
acpi_processor_get_power_info(pr);
- return cpuidle_force_redetect(&per_cpu(cpuidle_devices, pr->id));
+ return cpuidle_force_redetect(per_cpu(cpuidle_devices, pr->id));
}
/* proc interface */
Index: 21-rc6-mm1/drivers/cpuidle/cpuidle.c
===================================================================
--- 21-rc6-mm1.orig/drivers/cpuidle/cpuidle.c 2007-04-17 13:41:29.000000000 +0800
+++ 21-rc6-mm1/drivers/cpuidle/cpuidle.c 2007-04-17 14:42:17.000000000 +0800
@@ -18,7 +18,7 @@
#include "cpuidle.h"
-DEFINE_PER_CPU(struct cpuidle_device, cpuidle_devices);
+DEFINE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
EXPORT_PER_CPU_SYMBOL_GPL(cpuidle_devices);
DEFINE_MUTEX(cpuidle_lock);
@@ -34,13 +34,13 @@ static void (*pm_idle_old)(void);
*/
static void cpuidle_idle_call(void)
{
- struct cpuidle_device *dev = &__get_cpu_var(cpuidle_devices);
+ struct cpuidle_device *dev = __get_cpu_var(cpuidle_devices);
struct cpuidle_state *target_state;
int next_state;
/* check if the device is ready */
- if (dev->status != CPUIDLE_STATUS_DOIDLE) {
+ if (!dev || dev->status != CPUIDLE_STATUS_DOIDLE) {
if (pm_idle_old)
pm_idle_old();
return;
@@ -117,19 +117,32 @@ static int cpuidle_add_device(struct sys
int cpu = sys_dev->id;
struct cpuidle_device *dev;
- dev = &per_cpu(cpuidle_devices, cpu);
+ dev = per_cpu(cpuidle_devices, cpu);
- dev->cpu = cpu;
mutex_lock(&cpuidle_lock);
if (cpu_is_offline(cpu)) {
...Yes, that did fix the hang on resume from STR -- that now works fine. However:

joshua@rebirth:/sys/devices/system/cpu/cpuidle$ cat available_drivers current_driver
<NULL>
joshua@rebirth:/sys/devices/system/cpu/cpuidle$ cat available_governors current_governor
ladder
ladder

Is this correct? For reference, my config is http://joshuawise.com/config.gz -- I didn't see any options for cpuidle drivers to access ACPI states...

joshua
-
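Shaohua's refreshed patch turns the per-CPU `cpuidle_devices` from an embedded `struct cpuidle_device` into a pointer, and `cpuidle_idle_call()` now falls back to the old `pm_idle_old` routine while that pointer is still NULL or the device is not in `CPUIDLE_STATUS_DOIDLE`. The following is only a userspace sketch of that guard, assuming a plain array can stand in for per-CPU data; every `sketch_*` name, the counters and the status enum are hypothetical, not kernel APIs.

```c
#define SKETCH_NR_CPUS 2

/* Hypothetical stand-ins for the kernel's CPUIDLE_STATUS_* values. */
enum sketch_status { SKETCH_STATUS_INIT, SKETCH_STATUS_DOIDLE };

struct sketch_idle_device {
    int cpu;
    enum sketch_status status;
};

/* One pointer per CPU, mirroring DEFINE_PER_CPU(struct cpuidle_device *, ...).
 * Globals are zero-initialized, so every slot starts out NULL, meaning
 * "no device registered for this CPU yet". */
struct sketch_idle_device *sketch_devices[SKETCH_NR_CPUS];

int fallback_calls;  /* how often the old idle routine ran */
int governed_calls;  /* how often the cpuidle path ran */

static void sketch_old_idle(void)
{
    fallback_calls++;
}

/* Publish a device for a CPU (hypothetical registration step). */
void sketch_register(int cpu, struct sketch_idle_device *dev)
{
    dev->cpu = cpu;
    sketch_devices[cpu] = dev;
}

void sketch_idle_call(int cpu)
{
    struct sketch_idle_device *dev = sketch_devices[cpu];

    /* Same shape as the patched check: fall back if the device does not
     * exist yet or is not ready to idle. */
    if (!dev || dev->status != SKETCH_STATUS_DOIDLE) {
        sketch_old_idle();
        return;
    }
    governed_calls++;  /* a real driver would pick and enter a C-state here */
}
```

With an embedded struct there is no way to tell "not registered yet" apart from a zeroed device, whereas a NULL pointer makes that state explicit, which fits the init-order explanation given for the fix.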
drivers/ieee1394/ieee1394_transactions.c fails to build for me if CONFIG_SMP=n. gcc complains:

  CC      drivers/ieee1394/ieee1394_transactions.o
drivers/ieee1394/ieee1394_transactions.c: In function 'hpsb_get_tlabel':
drivers/ieee1394/ieee1394_transactions.c:183: error: 'TASK_INTERRUPTIBLE' undeclared (first use in this function)
drivers/ieee1394/ieee1394_transactions.c:183: error: (Each undeclared identifier is reported only once
drivers/ieee1394/ieee1394_transactions.c:183: error: for each function it appears in.)
drivers/ieee1394/ieee1394_transactions.c:183: warning: implicit declaration of function 'signal_pending'
drivers/ieee1394/ieee1394_transactions.c:183: warning: implicit declaration of function 'schedule'
drivers/ieee1394/ieee1394_transactions.c: In function 'hpsb_free_tlabel':
drivers/ieee1394/ieee1394_transactions.c:213: error: 'TASK_INTERRUPTIBLE' undeclared (first use in this function)
make[2]: *** [drivers/ieee1394/ieee1394_transactions.o] Error 1
make[1]: *** [drivers/ieee1394] Error 2
make: *** [drivers] Error 2

I fixed this by adding
#include <linux/sched.h>
before
#include <linux/wait.h>
That is probably not the correct fix, but it gives me a working kernel.

Diff between a working .config and a failing one (created by switching SMP off with menuconfig):

--- config.works 2007-04-09 20:54:30.182374075 +0200
+++ .config 2007-04-09 20:54:47.317863059 +0200
@@ -3,3 +3,3 @@
 # Linux kernel version: 2.6.21-rc6-mm1
-# Mon Apr 9 16:01:11 2007
+# Mon Apr 9 20:54:47 2007
 #
@@ -36,3 +36,3 @@
 CONFIG_EXPERIMENTAL=y
-CONFIG_LOCK_KERNEL=y
+CONFIG_BROKEN_ON_SMP=y
 CONFIG_INIT_ENV_ARG_LIMIT=32
@@ -57,3 +57,2 @@
 CONFIG_IKCONFIG_PROC=y
-CONFIG_CPUSETS=y
 # CONFIG_SYSFS_DEPRECATED is not set
@@ -104,3 +103,2 @@
 CONFIG_KMOD=y
-CONFIG_STOP_MACHINE=y
 
@@ -151,5 +149,3 @@
 CONFIG_MTRR=y
-CONFIG_SMP=y
-# CONFIG_SCHED_SMT is not set
-CONFIG_SCHED_MC=y
+# CONFIG_SMP is not set
 CONFIG_PREEMPT_NONE=y
@@ -157,21 +153,12 @@
 # CONFIG_PREEMPT is not ...
Thanks, I'll add this to linux1394-2.6.git (which exposed the problem) ASAP. On the other hand, the culprit is actually include/linux/wait.h, which IMO should include the headers it needs itself.
--
Stefan Richter
-=====-=-=== -=-- -=--=
http://arcgraph.de/sr/
-
And while I am at it:

From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Subject: ieee1394: some more includes

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
 drivers/ieee1394/ieee1394_transactions.c | 3 +++
 1 file changed, 3 insertions(+)

Index: linux/drivers/ieee1394/ieee1394_transactions.c
===================================================================
--- linux.orig/drivers/ieee1394/ieee1394_transactions.c
+++ linux/drivers/ieee1394/ieee1394_transactions.c
@@ -10,13 +10,16 @@
  */
 
 #include <linux/bitops.h>
+#include <linux/compiler.h>
 #include <linux/hardirq.h>
 #include <linux/spinlock.h>
+#include <linux/string.h>
 #include <linux/sched.h> /* because linux/wait.h is broken if CONFIG_SMP=n */
 #include <linux/wait.h>
 
 #include <asm/bug.h>
 #include <asm/errno.h>
+#include <asm/system.h>
 
 #include "ieee1394.h"
 #include "ieee1394_types.h"

--
Stefan Richter
-=====-=-=== -=-- -=--=
http://arcgraph.de/sr/
-
Has something related to PTYs changed in this kernel?
I have to enable legacy PTY handling on a couple of boxes to get ssh working.
Otherwise, I get openpty() errors, and neither sshd nor virtual terminals (aterm) are
able to get a terminal.
User space (udev) is the same on all three boxes; one works and two fail.
I have /dev/ptmx everywhere, and /dev/pts is mounted.
Any ideas?
TIA
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2008.0 (Cooker) for i586
Linux 2.6.20-jam10 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP PREEMPT
-
Not as far as I know, but there were some kobject_uevent changes which
I have CONFIG_PM_LEGACY unset in at least one of my test configs and it
Nope.
Can you please check 2.6.21-rc7-mm1 and see whether that fixes it? If so, it might have been the kobject_uevent thing.
-
I will, thanks.
A couple of questions (since udev behaviour is sooooooo distro-dependent):
- What should I have in /dev if I don't use legacy PTYs? As I understand
  it, only /dev/ptmx and /dev/pts/*, with no /dev/tty* or /dev/pty*?
- If my setup, for whatever strange reason, has /dev/tty* stored somewhere
  (/dev/.udev, links.conf...) and they get created, I suppose that opening
  /dev/tty will give ENODEV?
TIA
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2008.0 (Cooker) for i586
Linux 2.6.20-jam10 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #4 SMP PREEMPT
-
My FC5 CONFIG_LEGACY_PTYS=n box has no /dev/ptmx, /dev/pts/*, all of /dev/tty0 through /dev/tty63 and no /dev/pty*. I'm not sure where all the /dev/tty*'s came from - perhaps a static udev
Well, /dev/tty is attached to your current tty, and /dev/tty2 will get you talking to the second VT. I can't immediately think what /dev/tty22 is attached to.
-
Linux has traditionally used the BSD-like names /dev/ptyxx for
masters and /dev/ttyxx for slaves of pseudo terminals. This scheme
has a number of problems. The GNU C library glibc 2.1 and later,
however, supports the Unix98 naming standard: in order to acquire a
pseudo terminal, a process opens /dev/ptmx; the number of the pseudo
terminal is then made available to the process and the pseudo
terminal slave can be accessed as /dev/pts/<number>. What was
traditionally /dev/ttyp2 will then be /dev/pts/2, for example.
So if all of userspace is Unix98-aware, /dev/ptmx and /dev/pts/* are all
you need. In your setup it looks like you are not able to use Unix98 PTYs,
but because udev has created the tty* nodes, things still work.
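The Unix98 sequence quoted above (open /dev/ptmx, learn the PTY number, then open /dev/pts/<number>) corresponds to the POSIX calls posix_openpt(), grantpt(), unlockpt() and ptsname(). A minimal sketch, assuming glibc on Linux where posix_openpt() opens /dev/ptmx; `open_unix98_pty` is a hypothetical helper name, not a glibc function.

```c
#define _XOPEN_SOURCE 600  /* exposes posix_openpt(), grantpt(), unlockpt(), ptsname() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate a Unix98 pseudo terminal.
 * Returns the master fd and copies the slave path (e.g. "/dev/pts/2")
 * into slave_name, or returns -1 on any failure.
 */
int open_unix98_pty(char *slave_name, size_t len)
{
    /* On glibc/Linux this opens the master side via /dev/ptmx. */
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;

    /* Fix up slave ownership/permissions, then unlock it so it can be opened. */
    if (grantpt(master) != 0 || unlockpt(master) != 0) {
        close(master);
        return -1;
    }

    /* The slave lives on the devpts mount: /dev/pts/<number>. */
    const char *name = ptsname(master);
    if (name == NULL || (size_t)snprintf(slave_name, len, "%s", name) >= len) {
        close(master);
        return -1;
    }
    return master;
}
```

A caller would then open the returned slave path with open(path, O_RDWR | O_NOCTTY). If devpts is not mounted (or is hidden) at /dev/pts, that slave path would likely not be openable, which fits the openpty() failures described in this thread.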
I supposed it was something like: you always open /dev/tty, but kernel+glibc
redirect you to /dev/ttyXX, which is your _real_ terminal.
I will try to check the docs...
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2008.0 (Cooker) for i586
Linux 2.6.20-jam10 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #4 SMP PREEMPT
-
Oops, no, /dev/tty?? are for virtual consoles.
But I think I found the problem.
In short, /dev/pts is mounted before /dev. I remounted it and ssh worked
fine again.
I'll dig into Mandriva's rc scripts to check this...
Anyway, I see no plain 'mount' command in /sbin/start_udev; all are
'mount --move' commands. So I think it assumes /dev/pts is already mounted and
tries to move it.
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2008.0 (Cooker) for i586
Linux 2.6.20-jam10 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #4 SMP PREEMPT
-
As (in)famous last words: I think Unix98 PTYs really don't like mount --move
for /dev/pts. If I mount it manually after boot, everything works fine.
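The diagnosis here is that devpts ended up on /dev/pts before /dev itself was mounted, so the later /dev mount hides it. One hedged way to spot that ordering is to scan the mount table with glibc's setmntent()/getmntent() and check whether something is mounted on /dev after devpts is mounted on /dev/pts. This assumes the table lists entries in mount order, which is a heuristic; `check_devpts_order` and its return codes are hypothetical, not part of udev or any distribution's tooling.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum devpts_state {
    DEVPTS_UNREADABLE = -1, /* mount table could not be opened */
    DEVPTS_ABSENT = 0,      /* no devpts mounted on /dev/pts */
    DEVPTS_OK = 1,          /* devpts on /dev/pts, nothing mounted on /dev later */
    DEVPTS_HIDDEN = 2       /* a later /dev mount likely covers /dev/pts */
};

/* Heuristic: assumes entries in mounts_file appear in mount order. */
int check_devpts_order(const char *mounts_file)
{
    FILE *f = setmntent(mounts_file, "r");
    if (f == NULL)
        return DEVPTS_UNREADABLE;

    int state = DEVPTS_ABSENT;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        if (strcmp(m->mnt_dir, "/dev/pts") == 0 &&
            strcmp(m->mnt_type, "devpts") == 0)
            state = DEVPTS_OK;      /* a (re)mount of devpts is visible again */
        else if (state == DEVPTS_OK && strcmp(m->mnt_dir, "/dev") == 0)
            state = DEVPTS_HIDDEN;  /* /dev mounted after devpts: covers it */
    }
    endmntent(f);
    return state;
}
```

On a live system one would pass "/proc/mounts". A DEVPTS_HIDDEN result would match the symptom described above, and remounting devpts on /dev/pts by hand would be expected to bring the result back to DEVPTS_OK.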
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2008.0 (Cooker) for i586
Linux 2.6.20-jam10 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #4 SMP PREEMPT
-
