Re: [PATCH] SCSI: fix isa/pcmcia compile problem

To: <linux-kernel@...>
Date: Thursday, January 17, 2008 - 6:35 am

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc8...

- selinux is busted on one of my two selinux-enabled test machines.

- suspend-to-ram and suspend-to-disk are totally hosed on one of my test
machines. I guess I get to bisect this.

- git-nfsd is dropped due to conflicts with git-nfs

- git-newsetup is dropped due to conflicts with git-x86 (I think)

- git-perfmon is dropped due to conflicts with git-x86 (I think)

- git-kgdb is dropped due to conflicts with git-damn-near-everything

- git-block is dropped due to conflicts with the IDE tree

- kvm probably doesn't work properly because I couldn't be bothered fixing
the conflicts between git-kvm and the driver tree

- the volume of rejects and build errors which are caused by subsystem
maintainers fiddling with other people's stuff is quite out of control.
Something needs to happen here.

Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
mm-commits mailing list.

echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
most valuable if you can perform a bisection search to identify which patch
introduced the bug. Instructions for this process are at

http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

But beware that this process takes some time (around ten rebuilds and
reboots), so consider reporting the bug first and if we cannot immediately
identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
list on any email.

- When reporting...
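
The "around ten rebuilds and reboots" figure in the bisection item above is what a binary search gives for a series of roughly a thousand patches (2^10 = 1024). The series length is an illustrative assumption here, not a count of the actual -mm series. A minimal sketch in C follows. `bisect_series`, `is_bad_fn` and `bad_from` are hypothetical names, and the predicate stands in for "build, boot and test a tree with the first N patches applied":

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if a tree with the first `applied`
 * patches of the series shows the bug. */
typedef int (*is_bad_fn)(size_t applied, void *ctx);

/*
 * Return the 1-based index of the first patch that introduces the bug.
 * Assumes the tree with 0 patches is good, the tree with all n patches
 * is bad, and the bug stays present once introduced.
 * *steps receives the number of build-and-test rounds used.
 */
static size_t bisect_series(size_t n, is_bad_fn is_bad, void *ctx,
                            unsigned int *steps)
{
    size_t good = 0;    /* bug absent with this many patches applied */
    size_t bad = n;     /* bug present with this many patches applied */

    assert(n > 0);
    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        (*steps)++;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Demo predicate: ctx points at the index of the culprit patch. */
static int bad_from(size_t applied, void *ctx)
{
    return applied >= *(size_t *)ctx;
}
```

With a 1000-patch series and the culprit at position 700, this needs at most ten test rounds, which is why reporting the bug first and bisecting only if nobody recognises it is usually the cheaper order.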

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Friday, January 25, 2008 - 5:59 pm

I'm still seeing my mystery-crash that I had since 2.6.24-rc3-mm2.

The crashed kernel was 2.6.24-rc8-mm1 with the following patches:

* personal fix for the "do_md_run returned -22"-problem
I'm just moving the analyze_sbs(mddev); above the test.

* git-sched-fix-bug_on.patch
* hotfix-libata-scsi-corruption.patch

The crash (captured via serial console):
Jan 25 21:40:01 treogen cron[6553]: (root) CMD (test -x
/usr/sbin/run-crons && /usr/sbin/run-crons )
Jan 25 20:40:44 treogen syslog-ng[4839]: I/O error occurred while
writing; fd='5', error='Input/output error (5)'
[ 1242.319555] ------------[ cut here ]------------
[ 1242.319557] kernel BUG at lib/list_debug.c:33!
[ 1242.319558] invalid opcode: 0000 [1] SMP
[ 1242.319560] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 1242.319562] CPU 3
[ 1242.319563] Modules linked in:

The cursor on the receiving machine stayed after the : in the last
line, and the crashed machine blinked caps lock and scroll lock.

I don't have a clue what the syslog-ng error is about or why this line
is one hour too early.
At 20:40 this kernel wasn't even built yet, and syslog-ng started with
the correct timezone:
Jan 25 21:26:26 treogen syslog-ng[4839]: syslog-ng starting up; version='2.0.6'

As I'm seeing this bug during times of both network and hard disk
activity, could this be related to the problem discussed in the thread
"[PATCH rc8-mm1] hotfix libata-scsi corruption"? The line fixed in the
mm-hotfix seems to be too new to cause this in -rc3-mm2, but these
alignment problems seem to touch more than this, and I'm not clear on
how old this might be.

(If this matters: The crashing system is running the smartd daemon
from smartmontools version 5.37)

I hope I will have time to try git-misc-tree on Sunday...

Torsten
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <sparclinux@...>, David Miller <davem@...>, <perex@...>
Date: Thursday, January 24, 2008 - 8:04 pm

Hello,

I was digging through the gentoo bugzilla and found this:

http://bugs.gentoo.org/show_bug.cgi?id=141823

As you can see, this bug has been present since at least 2.6.17. I can reproduce
it here on my hardware with 2.6.24-rc8-mm1. All you need to do is install
mp3blaster on sparc64, run:

$ mp3blaster some_mp3_file.mp3

and stop it by pressing ctrl-c. It oopses when you stop it. It doesn't happen
every time but it'll oops in a few tries.

This is my trace:

Unable to handle kernel paging request at virtual address 0000000100024000
tsk->{mm,active_mm}->context = 0000000000000dd8
tsk->{mm,active_mm}->pgd = fffff800bf5d6000
\|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
mp3blaster(3254): Oops [#1]
TSTATE: 0000000080009607 TPC: 000000000056a320 TNPC: 000000000056a324 Y: 00000000 Not tainted
TPC: <memcpy+0x1a8/0x13c0>
g0: 0000000000795b00 g1: 0000000100024ff8 g2: 0000000000000000 g3: 0000000000000038
g4: fffff800bf581580 g5: fffff8007f988000 g6: fffff800bdd70000 g7: 0000000000001000
o0: fffff800bdbd6740 o1: 0000000100024000 o2: 0000000000000008 o3: 000000000056a300
o4: fffff800bdbd6080 o5: 0000000000000000 sp: fffff800bdd72b21 ret_pc: 000000000056c228
RPC: <memcpy_user_stub+0x14/0x24>
l0: fffff800bdd73468 l1: fffff800bdd73370 l2: 0000000000755000 l3: fffff800bdb35fa0
l4: fffff800bdb34001 l5: 0000000000000000 l6: 0000000000000000 l7: 0000000000000000
i0: fffff800bdbd6080 i1: 0000000100023880 i2: 0000000000001780 i3: fffff800bf3419b0
i4: 00000000000005e0 i5: 0000000000000000 i6: fffff800bdd72be1 i7: 00000000100554cc
I7: <snd_pcm_lib_write_transfer+0x94/0xe0 [snd_pcm]>
Caller[00000000100554cc]: snd_pcm_lib_write_transfer+0x94/0xe0 [snd_pcm]
Caller[0000000010053294]: snd_pcm_lib_write1+0x15c/0x2a0 [snd_pcm]
Caller[0000000010074924]: snd_pcm_oss_sync+0x28c/0x2c0 [snd_pcm_oss]
Caller[00000000100756d0]: snd_pcm_oss_release+0x38/0xa0 [snd_pcm_oss]
Caller[00000000004c8eb0]: __fput+0xb8/0x1e0
Cal...

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <sparclinux@...>, David Miller <davem@...>, <perex@...>
Date: Friday, January 25, 2008 - 1:11 pm

At Fri, 25 Jan 2008 01:04:34 +0100,

This looks similar to a bug I fixed a while ago. Damn, it's still there.

Could you build with CONFIG_SND_DEBUG=y? It adds some sanity checks
and might catch the fatal condition.

thanks,

Takashi
--

To: Takashi Iwai <tiwai@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <sparclinux@...>, David Miller <davem@...>, <perex@...>
Date: Friday, January 25, 2008 - 2:34 pm

Done. I don't think it changed much though :-/ If you have any other ideas please
let me know.

Unable to handle kernel paging request at virtual address 0000000100024000
tsk->{mm,active_mm}->context = 0000000000000dac
tsk->{mm,active_mm}->pgd = fffff800bdb64000
\|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
mp3blaster(3201): Oops [#1]
TSTATE: 0000000080009600 TPC: 000000000056a294 TNPC: 000000000056a298 Y: 00000000 Not tainted
TPC: <memcpy+0x11c/0x13c0>
g0: 0000000000795b00 g1: 0000000100025ec0 g2: 0000000000000000 g3: 0000000000000038
g4: fffff800bf58a5a0 g5: fffff8007f988000 g6: fffff800bc91c000 g7: 0000000000001ec0
o0: fffff800bdeb2080 o1: 0000000100024000 o2: 0000000000000008 o3: fffff801bdeb6080
o4: fffff800bdeb2080 o5: 0000000000000000 sp: fffff800bc91eb21 ret_pc: 000000000056c228
RPC: <memcpy_user_stub+0x14/0x24>
l0: fffff800bc91f468 l1: fffff800bc91f370 l2: 0000000000755000 l3: fffff800bde7dfa0
l4: fffff800bde7c001 l5: 0000000000000001 l6: fffff800bc818000 l7: 0000000000446600
i0: fffff800bdeb2080 i1: 0000000100024000 i2: 0000000000001f00 i3: 00000000100575b8
i4: 00000000000007c0 i5: fffff800bdeb0000 i6: fffff800bc91ebe1 i7: 0000000010054a30
I7: <snd_pcm_lib_write_transfer+0xb8/0x120 [snd_pcm]>
Caller[0000000010054a30]: snd_pcm_lib_write_transfer+0xb8/0x120 [snd_pcm]
Caller[000000001005215c]: snd_pcm_lib_write1+0x164/0x2e0 [snd_pcm]
Caller[0000000010074cc4]: snd_pcm_oss_sync+0x28c/0x2c0 [snd_pcm_oss]
Caller[0000000010075b64]: snd_pcm_oss_release+0x8c/0xe0 [snd_pcm_oss]
Caller[00000000004c8eb0]: __fput+0xb8/0x1e0
Caller[00000000004c5e58]: filp_close+0x60/0x80
Caller[00000000004723d4]: put_files_struct+0xfc/0x120
Caller[0000000000473a98]: do_exit+0x160/0x840
Caller[00000000004741ac]: do_group_exit+0x34/0xc0
Caller[000000000047db0c]: get_signal_to_deliver+0x274/0x300
Caller[0000000000449f64]: do_signal32+0x2c/0x1240
Caller[0000000000434758]: do_notify_resume+0x1c0/0x5a0
Caller[0000...

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <sparclinux@...>, David Miller <davem@...>, <perex@...>
Date: Monday, January 28, 2008 - 7:55 am

At Fri, 25 Jan 2008 19:34:35 +0100,

OK, could you try the patch below?

thanks,

Takashi

---

diff -r edbe1b84179b sound/core/oss/pcm_oss.c
--- a/sound/core/oss/pcm_oss.c Mon Jan 28 12:30:17 2008 +0100
+++ b/sound/core/oss/pcm_oss.c Mon Jan 28 12:56:13 2008 +0100
@@ -1621,6 +1621,7 @@ static int snd_pcm_oss_sync(struct snd_p
snd_pcm_format_set_silence(runtime->format,
runtime->oss.buffer,
size1);
+ size1 /= runtime->channels; /* frames */
fs = snd_enter_user();
snd_pcm_lib_write(substream, (void __user *)runtime->oss.buffer, size1);
snd_leave_user(fs);
--
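
A note on why the one-line change above matters, as far as can be read from the patch: the silence fill takes a count of samples, while snd_pcm_lib_write() takes a count of frames (one sample per channel). Passing the sample count unchanged would ask for `channels` times more data than the buffer holds, which would fit the memcpy fault in the traces. The sketch below only illustrates the unit conversion; `samples_to_frames` and `frames_to_bytes` are hypothetical helpers, not ALSA functions:

```c
#include <assert.h>
#include <stddef.h>

/* Convert a count of interleaved samples into a count of frames. */
static size_t samples_to_frames(size_t samples, unsigned int channels)
{
    assert(channels > 0);
    return samples / channels;
}

/* Number of bytes a transfer of `frames` frames touches. */
static size_t frames_to_bytes(size_t frames, unsigned int channels,
                              unsigned int bytes_per_sample)
{
    return frames * channels * bytes_per_sample;
}
```

For a stereo, 16-bit stream, a buffer of 4096 samples is 2048 frames, i.e. 8192 bytes. Treating 4096 as a frame count would cover 16384 bytes, twice the buffer.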

To: Takashi Iwai <tiwai@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <sparclinux@...>, David Miller <davem@...>, <perex@...>
Date: Monday, January 28, 2008 - 6:13 pm

Great news. It works fine now. I tested it for some time, but I'll test it even
more tomorrow.

Thanks,

--

To: <ilpo.jarvinen@...>
Cc: Andrew Morton <akpm@...>, <netdev@...>, David Miller <davem@...>, <krkumar2@...>, LKML <linux-kernel@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Thursday, January 24, 2008 - 2:44 am

Hi,

The following call trace is seen with the 2.6.24-rc8-mm1 kernel. It is the same as one of
the call traces for which you posted a debug patch at http://marc.info/?l=linux-netdev&m=120107165228368&w=2
I was not able to apply the debug patch. Could you kindly rebase it onto 2.6.24-rc8-mm1, or let
me know if I can help in debugging this call trace?

Jan 24 11:13:57 p55lp6 kernel: [60656.708573] Badness at net/ipv4/tcp_input.c:2506
Jan 24 11:13:57 p55lp6 kernel: [60656.708583] NIP: c0000000003776e0 LR: c0000000003776a8 CTR: c0000000003aaf8c
Jan 24 11:13:57 p55lp6 kernel: [60656.708597] REGS: c00000000f6f34a0 TRAP: 0700 Not tainted (2.6.24-rc8-mm1)
Jan 24 11:13:57 p55lp6 kernel: [60656.708608] MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000088 XER: 00000018
Jan 24 11:13:57 p55lp6 kernel: [60656.708636] TASK = c000000000571710[0] 'swapper' THREAD: c000000000670000 CPU: 0
Jan 24 11:13:57 p55lp6 kernel: [60656.708648] GPR00: 00000000fffffffc c00000000f6f3720 c000000000664ab0 0000000000000000
Jan 24 11:13:57 p55lp6 kernel: [60656.708663] GPR04: 0000000000000001 000000000000040e 00000000000000d0 0000000000000000
Jan 24 11:13:57 p55lp6 kernel: [60656.708677] GPR08: 0000000000000000 00000000001bce5f 0000000000000001 fffffffffffffffc
Jan 24 11:13:57 p55lp6 kernel: [60656.708690] GPR12: c00000000f6f34f0 c000000000572180 0000000000000000 0000000000000004
Jan 24 11:13:57 p55lp6 kernel: [60656.708704] GPR16: 0000000000000001 000000003e3133e9 0000000000000402 000000000000011f
Jan 24 11:13:57 p55lp6 kernel: [60656.708718] GPR20: 000000000000011f 0000000000000004 0000000000000002 000000003e3133e9
Jan 24 11:13:57 p55lp6 kernel: [60656.708732] GPR24: c00000004c09fe80 0000000000000000 0000000000000000 000000000000040e
Jan 24 11:13:57 p55lp6 kernel: [60656.708745] GPR28: 0000000000000002 000000000000040e c000000000628570 c0000001791ff8d8
Jan 24 11:13:57 p55lp6 kernel: [60656.708760] NIP [c0000000003776e0] .tcp_fastretrans_alert+0xfc/0xe20
Jan 24 11:13:58 p55lp6 kernel: [60656.708778] LR [c000000...

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <linux-cifs-client@...>, <samba-technical@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Thursday, January 24, 2008 - 2:09 am

Hi Andrew,

The machine drops into xmon while running fsstress
on a cifs-mounted partition.

1:mon> r
R00 = 0000000000000000 R16 = 0000000000000000
R01 = c00000017527f910 R17 = 0000000000000000
R02 = d000000000862258 R18 = 0000000000000000
R03 = 0000000000000001 R19 = 0000000000000000
R04 = 0000000000000001 R20 = c00000013999fcb0
R05 = 0000000000000000 R21 = c00000013c589978
R06 = 0000000000000000 R22 = c00000015a9e6e00
R07 = 0000000000000001 R23 = c0000001786dfdf0
R08 = 0000000000000000 R24 = c00000018b5a7f50
R09 = c00000019012ff08 R25 = c00000017527fc10
R10 = 0000000000000001 R26 = 000000000000477d
R11 = c0000000003d4f90 R27 = c00000013999fcb0
R12 = d0000000008323e8 R28 = c00000017527fc10
R13 = c000000000572380 R29 = c0000001504afdf0
R14 = 0000000000000000 R30 = d000000000860168
R15 = 0000000000000000 R31 = d00000000085a338
pc = d000000000819e04 .find_writable_file+0x8c/0x1c0 [cifs]
lr = d000000000819de0 .find_writable_file+0x68/0x1c0 [cifs]
msr = 8000000000009032 cr = 24000888
ctr = c0000000003d4f90 xer = 0000000000000000 trap = 300
dar = c00000019012ff30 dsisr = 40010000
1:mon> t
[c00000017527f9b0] d00000000081f808 .cifs_setattr+0x178/0xb04 [cifs]
[c00000017527fae0] c0000000001202e4 .notify_change+0x1e0/0x414
[c00000017527fba0] c000000000101e2c .do_truncate+0x74/0xa8
[c00000017527fc80] c000000000102170 .sys_truncate+0x1c8/0x21c
[c00000017527fdb0] c000000000015f2c .compat_sys_truncate64+0x18/0x30
[c00000017527fe30] c000000000008734 syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 000000000ff2620c
SP (ff87f070) is in userspace
1:mon> e
cpu 0x1: Vector: 300 (Data Access) at [c00000017527f690]
pc: d000000000819e04: .find_writable_file+0x8c/0x1c0 [cifs]
lr: d000000000819de0: .find_writable_file+0x68/0x1c0 [cifs]
sp: c00000017527f910
msr: 8000000000009032
dar: c00000019012ff30
dsisr: 40010000
current = 0xc00000018761f640
paca = 0xc000000000572380
pid = 12920, comm = fsst...

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, David Miller <davem@...>, <sparclinux@...>
Date: Tuesday, January 22, 2008 - 4:30 pm

Hello,

Issuing the "sysrq-s sysrq-u" sequence causes these warnings on sparc64:

------------[ cut here ]------------
WARNING: at fs/file_table.c:49 __fput+0x1a8/0x1e0()
Modules linked in: sg sr_mod cdrom
Call Trace:
[00000000004c9ac8] __fput+0x1b0/0x1e0
[00000000004c6978] filp_close+0x60/0x80
[00000000004c6a18] sys_close+0x80/0xe0
[00000000004062d4] linux_sparc_syscall32+0x3c/0x40
[0000000000012f1c] 0x12f24
---[ end trace 6dbe14ff8ec57744 ]---
------------[ cut here ]------------
WARNING: at fs/file_table.c:49 __fput+0x1a8/0x1e0()
Modules linked in: sg sr_mod cdrom
Call Trace:
[00000000004c9ac8] __fput+0x1b0/0x1e0
[00000000004c6978] filp_close+0x60/0x80
[00000000004c6a18] sys_close+0x80/0xe0
[00000000004062d4] linux_sparc_syscall32+0x3c/0x40
[0000000000012f1c] 0x12f24
---[ end trace 6dbe14ff8ec57744 ]---
------------[ cut here ]------------
WARNING: at fs/file_table.c:49 __fput+0x1a8/0x1e0()
Modules linked in: sg sr_mod cdrom
Call Trace:
[00000000004c9ac8] __fput+0x1b0/0x1e0
[00000000004c6978] filp_close+0x60/0x80
[00000000004c6a18] sys_close+0x80/0xe0
[00000000004062d4] linux_sparc_syscall32+0x3c/0x40
[0000000000012f1c] 0x12f24
---[ end trace 6dbe14ff8ec57744 ]---
------------[ cut here ]------------
WARNING: at fs/file_table.c:49 __fput+0x1a8/0x1e0()
Modules linked in: sg sr_mod cdrom
Call Trace:
[00000000004c9ac8] __fput+0x1b0/0x1e0
[00000000004c6978] filp_close+0x60/0x80
[00000000004c6a18] sys_close+0x80/0xe0
[00000000004062d4] linux_sparc_syscall32+0x3c/0x40
[0000000000012f1c] 0x12f24
---[ end trace 6dbe14ff8ec57744 ]---

Regards,

Mariusz
--

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: <linux-kernel@...>, <davem@...>, <sparclinux@...>, Dave Hansen <haveblue@...>, Christoph Hellwig <hch@...>
Date: Tuesday, January 22, 2008 - 5:02 pm

That's

WARN_ON(f->f_mnt_write_state == FILE_MNT_WRITE_TAKEN);

cc's added...
--

To: Andrew Morton <akpm@...>
Cc: Mariusz Kozlowski <m.kozlowski@...>, <linux-kernel@...>, <davem@...>, <sparclinux@...>, Christoph Hellwig <hch@...>
Date: Tuesday, January 22, 2008 - 7:13 pm

The emergency remount code forcibly removes FMODE_WRITE from
filps. The r/o bind mount code notices that this was done
without a proper mnt_drop_write() and properly gives a
warning.

This patch does a mnt_drop_write() and also notes in the
filp that this was done to suppress any warning that would
have otherwise been triggered.

I also wonder if inode->i_writecount is made inconsistent
by the emergency remount code. I guess it is, but the
damage is limited to a single inode instead of being
visible more globally like the mnt write count. Probably
not really worth fixing.

BTW, this wasn't just a sparc thing. I triggered it in about
3 seconds on a plain old x86 machine.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

linux-2.6.git-dave/fs/super.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

diff -puN fs/super.c~robind-sysrq-fix fs/super.c
--- linux-2.6.git/fs/super.c~robind-sysrq-fix 2008-01-22 14:33:38.000000000 -0800
+++ linux-2.6.git-dave/fs/super.c 2008-01-22 14:51:13.000000000 -0800
@@ -37,6 +37,7 @@
#include <linux/idr.h>
#include <linux/kobject.h>
#include <linux/mutex.h>
+#include <linux/file.h>
#include <asm/uaccess.h>

@@ -566,10 +567,26 @@ static void mark_files_ro(struct super_b
{
struct file *f;

+retry:
file_list_lock();
list_for_each_entry(f, &sb->s_files, f_u.fu_list) {
- if (S_ISREG(f->f_path.dentry->d_inode->i_mode) && file_count(f))
- f->f_mode &= ~FMODE_WRITE;
+ struct vfsmount mnt;
+ if (!S_ISREG(f->f_path.dentry->d_inode->i_mode))
+ continue;
+ if (!file_count(f))
+ continue;
+ if (!(f->f_mode & FMODE_WRITE))
+ continue;
+ f->f_mode &= ~FMODE_WRITE;
+ f->f_mnt_write_state |= FILE_MNT_WRITE_RELEASED;
+ mnt = f->f_path.mnt;
+ file_list_unlock();
+ /*
+ * This can sleep, so we can't hold
+ * the file_list_lock() spinlock.
+ */
+ mnt_drop_write(...
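
The locking pattern the (truncated) hunk above appears to follow is: walk the list under the lock, and when an entry needs a call that may sleep, update the entry, drop the lock, make the call, and restart the walk from the top. Below is a userspace sketch of that pattern only. It is not the kernel code; `toy_lock`, `may_sleep_drop` and `struct item` are hypothetical stand-ins, and the "lock" is a plain flag so the no-sleep-under-lock rule can be asserted:

```c
#include <assert.h>
#include <stddef.h>

struct item {
    int writable;   /* stands in for f_mode & FMODE_WRITE */
    int released;   /* stands in for the "write already released" note */
};

static int lock_held;   /* toy lock: 1 while "held" */
static int drops;       /* how many times the sleeping call ran */

static void toy_lock(void)   { assert(!lock_held); lock_held = 1; }
static void toy_unlock(void) { assert(lock_held);  lock_held = 0; }

/* Stand-in for a call that may sleep: must never run with the lock held. */
static void may_sleep_drop(struct item *it)
{
    assert(!lock_held);
    (void)it;
    drops++;
}

static void mark_all_ro(struct item *items, size_t n)
{
    size_t i;

retry:
    toy_lock();
    for (i = 0; i < n; i++) {
        struct item *it = &items[i];

        if (!it->writable)
            continue;
        /* Update the entry while still under the lock ... */
        it->writable = 0;
        it->released = 1;
        /* ... then unlock before the call that may sleep. */
        toy_unlock();
        may_sleep_drop(it);
        /* The list may have changed meanwhile, so rescan from the top. */
        goto retry;
    }
    toy_unlock();
}
```

Each pass clears one writable entry before restarting, so the loop terminates. The cost is a rescan per entry, which is probably acceptable for an emergency path.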

To: Dave Hansen <haveblue@...>
Cc: <m.kozlowski@...>, <linux-kernel@...>, <davem@...>, <sparclinux@...>, <hch@...>
Date: Friday, February 1, 2008 - 7:34 pm

On Tue, 22 Jan 2008 15:13:58 -0800

this doesn't even compile. How much confidence am I supposed
to have that once I've fixed it, it will actually work?
--

To: Dave Hansen <haveblue@...>
Cc: Andrew Morton <akpm@...>, Mariusz Kozlowski <m.kozlowski@...>, <linux-kernel@...>, <davem@...>, <sparclinux@...>, Christoph Hellwig <hch@...>
Date: Wednesday, January 23, 2008 - 1:47 am

The right fix is to not simply remove FMODE_WRITE, but just remove
this whole function. Until we have a proper revoke it will cause more
harm than good.
--

To: Andrew Morton <akpm@...>
Cc: Mariusz Kozlowski <m.kozlowski@...>, <linux-kernel@...>, <davem@...>, <sparclinux@...>, Christoph Hellwig <hch@...>
Date: Tuesday, January 22, 2008 - 6:28 pm

I've reproduced it, and I'm looking into it. I should have a patch
shortly.

-- Dave

--

To: Andrew Morton <akpm@...>, Gerd Knorr <kraxel@...>
Cc: <linux-kernel@...>
Date: Monday, January 21, 2008 - 2:53 pm

(Gerd Knorr cc'ed because 'git blame' says he last touched the line of code
I ended up touching - if this needs other cc:'s, will somebody who knows who
should review please add them?)

This gave the NVidia binary driver indigestion:

X:2772 conflicting cache attribute d0000000-d0006000 uncached<->default

While researching this one and adding some debugging printk's and dump_stack()s,
I found that this was caused by:

[ 0.444051] reserve_mattr swapper:1 setting d0000000-d0500000 to default
[ 0.444055] Pid: 1, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
[ 0.444057]
[ 0.444057] Call Trace:
[ 0.444065] [<ffffffff8034dfe7>] ? ioremap_page_range+0x17b/0x244
[ 0.444069] [<ffffffff802259ba>] reserve_mattr+0x91/0x27d
[ 0.444072] [<ffffffff80224c24>] __ioremap+0xe6/0x146
[ 0.444077] [<ffffffff806f7297>] vesafb_probe+0x196/0x6b2
...

So the vesafb driver had decided to tag an even *larger* space as 'default'..
I'm *guessing* that in fact, this area should have been non-caching (since
it's the video memory for the card), but nothing cared/flagged before.

Pigheaded-and-probably-wrong brute-force fix that works on my laptop, but
somebody who actually understands the vesafb code should check that in fact
the space *should* be non-caching.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>

--- linux-2.6.24-rc8-mm1/drivers/video/vesafb.c.dist 2007-10-09 16:31:38.000000000 -0400
+++ linux-2.6.24-rc8-mm1/drivers/video/vesafb.c 2008-01-20 11:11:57.000000000 -0500
@@ -286,7 +286,7 @@ static int __init vesafb_probe(struct pl
info->pseudo_palette = info->par;
info->par = NULL;

- info->screen_base = ioremap(vesafb_fix.smem_start, vesafb_fix.smem_len);
+ info->screen_base = ioremap_nocache(vesafb_fix.smem_start, vesafb_fix.smem_len);
if (!info->screen_base) {
printk(KERN_ERR
"vesafb: abort, cannot ioremap video memory 0x%x @ 0x%lx\n",
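
To make the "conflicting cache attribute" message concrete, here is a toy model of the situation described above: one mapping reserves d0000000-d0500000 as 'default', and a later request for d0000000-d0006000 as 'uncached' overlaps it with a different attribute. This is only a sketch of the overlap-plus-mismatch idea. It is not the kernel's reserve_mattr() logic, and `struct mem_range`, `ranges_overlap` and `attr_conflict` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

enum cache_attr { ATTR_DEFAULT, ATTR_UNCACHED };

struct mem_range {
    uint64_t start;         /* inclusive */
    uint64_t end;           /* exclusive */
    enum cache_attr attr;
};

/* Two half-open ranges overlap if each starts before the other ends. */
static int ranges_overlap(const struct mem_range *a, const struct mem_range *b)
{
    assert(a->start < a->end && b->start < b->end);
    return a->start < b->end && b->start < a->end;
}

/* A request conflicts if it overlaps a reservation with a different attribute. */
static int attr_conflict(const struct mem_range *reserved,
                         const struct mem_range *req)
{
    return ranges_overlap(reserved, req) && reserved->attr != req->attr;
}
```

In this model, switching the earlier mapping to uncached makes the conflict disappear, which would be consistent with the ioremap_nocache() change working on the laptop. Whether uncached is actually the right attribute for the framebuffer is the open question raised above.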

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Monday, January 21, 2008 - 2:31 pm

This problem is fixed in Paul Moore's latest spin of the networking patches. I
was able to quilt up a fixed-up -rc8-mm1 with an updated git-lblnet.patch built
from his latest patch-bomb (labelled as 'RFC PATCH v12', posted to the selinux
and linux-security-module lists on Thursday), so the next pull of that git tree
into -mm should be OK...

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <venkatesh.pallipadi@...>, <suresh.b.siddha@...>, <mingo@...>
Date: Sunday, January 20, 2008 - 12:31 pm

The e100 network driver is failing to load properly on an old laptop. The
dmesg output is as follows

[ 68.875508] e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
[ 68.877091] e100: Copyright(c) 1999-2006 Intel Corporation
[ 68.881216] ACPI: PCI Interrupt 0000:00:03.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
[ 68.893113] modprobe:2736 conflicting cache attribute e8120000-e8121000 uncached<->default
[ 68.897090] e100: 0000:00:03.0: e100_probe: Cannot map device registers, aborting.
[ 68.901108] ACPI: PCI interrupt for device 0000:00:03.0 disabled
[ 68.905106] e100: probe of 0000:00:03.0 failed with error -12

The "conflicting cache attribute" message appears to be part of the PAT patches
in the git-x86 tree (cc's added). It may be a coincidence, but reverting the
git-net related patches didn't fix it, whereas reverting git-x86 and any
dependencies needed to make quilt work did.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

--

To: Mel Gorman <mel@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <venkatesh.pallipadi@...>, <suresh.b.siddha@...>, <mingo@...>
Date: Sunday, January 20, 2008 - 12:35 pm

Hi, Mel,

Found and fixed the problem. The patch is available at

http://lkml.org/lkml/2008/1/17/534

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: Andrew Morton <akpm@...>, <linux-kernel@...>, <venkatesh.pallipadi@...>, <suresh.b.siddha@...>, <mingo@...>
Date: Sunday, January 20, 2008 - 2:24 pm

Ah, my bad. I missed it when reading the thread earlier. Confirmed, this
patch fixes the problem. Thanks.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Saturday, January 19, 2008 - 9:10 pm

Something goes wrong in timer land. I get this on boot:

..MP-BIOS bug: 8254 timer not connected to IO-APIC
Disabling APIC timer
------------[ cut here ]------------
WARNING: at /home/rafael/src/mm/linux-2.6.24-rc8-mm1/kernel/time/clockevents.c:165 clockevents_register_device+0x36/0xc4()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-rc8-mm1-rjw #7

Call Trace:
[<ffffffff8023874b>] warn_on_slowpath+0x58/0x6b
[<ffffffff80213a3c>] ? native_read_tsc+0x18/0x22
[<ffffffff80239527>] ? printk+0x67/0x69
[<ffffffff802528ee>] clockevents_register_device+0x36/0xc4
[<ffffffff80220812>] setup_APIC_timer+0x61/0x68
[<ffffffff806c838f>] setup_boot_APIC_clock+0x1ba/0x1c5
[<ffffffff806c7450>] smp_prepare_cpus+0x521/0x542
[<ffffffff806bc56e>] kernel_init+0x64/0x2ef
[<ffffffff8020ce78>] child_rip+0xa/0x12
[<ffffffff806bc50a>] ? kernel_init+0x0/0x2ef
[<ffffffff8020ce6e>] ? child_rip+0x0/0x12

---[ end trace ca143223eefdc828 ]---
SMP alternatives: switching to SMP code
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 3990.34 BogoMIPS (lpj=7980687)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
AMD Turion(tm) 64 X2 Mobile Technology TL-60 stepping 02
------------[ cut here ]------------
Brought up 2 CPUs
WARNING: at /home/rafael/src/mm/linux-2.6.24-rc8-mm1/kernel/time/clockevents.c:165 clockevents_register_device+0x36/0xc4()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1-rjw #7

Call Trace:
CPU0 attaching sched-domain:
[<ffffffff8023874b>] warn_on_slowpath+0x58/0x6b
domain 0: span 00000000,00000000,00000000,00000003
groups: 00000000,00000000,00000000,00000001 00000000,00000000,00000000,00000002
[<ffffffff80239527>] ? printk+0x67/0x69
CPU1 attaching sched-domain:
domain 0: span 00000000,0000000...

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Sunday, January 20, 2008 - 6:24 am

ok, since we disable the APIC timer, i suspect this warning can be
disregarded - the bootup is otherwise fine, right?

Thomas - why do we register it while it's disabled? I have put in that
warning to detect APIC miscalibrations.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Thomas Gleixner <tglx@...>
Date: Sunday, January 20, 2008 - 7:21 am

Well, for an unknown reason, there's a ~5 s delay during resume from RAM (in
fact I can also trigger it in the "core" test mode, ie. without entering the
sleep state), which I thought might be related. If not, it might be necessary
to bisect again. Sigh.

Thanks,
Rafael
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Friday, January 18, 2008 - 2:06 pm

Odd nobody else has seen this... oldconfig fails for me on Debian...
kconfig/conf.c is using setlocale() without including the locale.h
header.

HOSTCC scripts/kconfig/conf.o
scripts/kconfig/conf.c: In function 'main':
scripts/kconfig/conf.c:502: warning: implicit declaration of function
'setlocale'
scripts/kconfig/conf.c:502: error: 'LC_ALL' undeclared (first use in
this function)
scripts/kconfig/conf.c:502: error: (Each undeclared identifier is
reported only once

Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>

--- a/scripts/kconfig/conf.c 2008-01-17 15:45:59.000000000 -0800
+++ b/scripts/kconfig/conf.c 2008-01-18 10:01:54.000000000 -0800
@@ -3,6 +3,8 @@
* Released under the terms of the GNU GPL v2.0.
*/

+#include <locale.h>
+
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
--
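
For reference, a minimal standalone example of the header dependency fixed above: setlocale() and LC_ALL are declared in <locale.h>, so a file that calls setlocale() without including it gets exactly the implicit-declaration warning and "'LC_ALL' undeclared" error shown. The helper name `use_c_locale` is hypothetical, and the "C" locale is used here only because every hosted C implementation provides it:

```c
#include <assert.h>
#include <locale.h>   /* declares setlocale() and the LC_* macros */

/* Select the portable "C" locale for all categories and return its name. */
static const char *use_c_locale(void)
{
    const char *name = setlocale(LC_ALL, "C");

    /* The "C" locale always exists, so this request should not fail. */
    assert(name != NULL);
    return name;
}
```

Removing the <locale.h> include from this snippet should reproduce the same class of build failure with a current compiler.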

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <kvm-devel@...>
Date: Friday, January 18, 2008 - 1:26 pm

Hi, Andrew,

The following changes got KVM up and running for me

This patch fixes the kvm build on 2.6.24-rc8-mm1. First, it enables
the KVM build; second, it replaces set_kset_name() with the .name member.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

arch/x86/Makefile | 2 +-
virt/kvm/kvm_main.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86/Makefile~fix-kvm-build arch/x86/Makefile
--- linux-2.6.24-rc8/arch/x86/Makefile~fix-kvm-build 2008-01-18 22:42:41.000000000 +0530
+++ linux-2.6.24-rc8-balbir/arch/x86/Makefile 2008-01-18 22:42:47.000000000 +0530
@@ -185,7 +185,7 @@ core-y += arch/x86/vdso/
core-$(CONFIG_IA32_EMULATION) += arch/x86/ia32/

# kvm host support - uncomment when merging
-# core-$(CONFIG_KVM) += arch/x86/kvm/
+core-$(CONFIG_KVM) += arch/x86/kvm/

# drivers-y are linked after core-y
drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
diff -puN virt/kvm/kvm_main.c~fix-kvm-build virt/kvm/kvm_main.c
--- linux-2.6.24-rc8/virt/kvm/kvm_main.c~fix-kvm-build 2008-01-18 22:42:41.000000000 +0530
+++ linux-2.6.24-rc8-balbir/virt/kvm/kvm_main.c 2008-01-18 22:42:47.000000000 +0530
@@ -1260,7 +1260,7 @@ static int kvm_resume(struct sys_device
}

static struct sysdev_class kvm_sysdev_class = {
- set_kset_name("kvm"),
+ .name = "kvm",
.suspend = kvm_suspend,
.resume = kvm_resume,
};
_

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
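
The second hunk above swaps a helper macro for a plain designated initializer on the `.name` member. As a small illustration of that initializer style, here is a sketch with a stand-in structure. `struct toy_class` and the `toy_*` functions are hypothetical and much simpler than the kernel's struct sysdev_class:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a class descriptor with a name and two hooks. */
struct toy_class {
    const char *name;
    int (*suspend)(void);
    int (*resume)(void);
};

static int toy_suspend(void) { return 0; }
static int toy_resume(void)  { return 0; }

/* Designated initializers name each member, so field order does not matter. */
static const struct toy_class toy_kvm_class = {
    .name    = "kvm",
    .suspend = toy_suspend,
    .resume  = toy_resume,
};

static const char *toy_class_name(const struct toy_class *c)
{
    assert(c != NULL);
    return c->name;
}
```

Any member not mentioned in such an initializer is zero-initialized, which is one reason this style tends to survive structure changes better than positional or macro-based initialization.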

To: <balbir@...>
Cc: <linux-kernel@...>, <kvm-devel@...>, Greg KH <greg@...>
Date: Tuesday, January 22, 2008 - 4:40 am

This patch straddles such a pickle of other patches (driver tree, kvm, git-x86) that
there doesn't seem much point in me untangling it. Presumably people will fix things
up as various trees merge into 2.6.25-rc1.

As long as Greg remembers to try to build kvm ;)
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Dave Jones <davej@...>
Date: Friday, January 18, 2008 - 9:34 am

Suspend and hibernation are also broken on my HP nx6325, which is caused by
git-cpufreq.patch. Reverting this patch and
drivers-cpufreq-add-calls-to-cpufreq_cpu_put.patch makes things work again.

I reported this already for 2.6.24-rc6-mm1 and Dave said he would look at it
in January. It's still January, so hopefully he's still going to do that. ;-)

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, January 18, 2008 - 1:10 pm

On Fri, Jan 18, 2008 at 02:34:59PM +0100, Rafael J. Wysocki wrote:
> On Thursday, 17 of January 2008, Andrew Morton wrote:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc8...
> >
> > - selinux is busted on one of my two selinux-enabled test machines.
> >
> > - suspend-to-ram and suspend-to-disk are totally hosed on one of my test
> > machines. I guess I get to bisect this.
>
> Suspend and hibernation are also broken on my HP nx6325, which is caused by
> git-cpufreq.patch. Reverting this patch and
> drivers-cpufreq-add-calls-to-cpufreq_cpu_put.patch makes things work again.
>
> I reported this already for 2.6.24-rc6-mm1 and Dave said he would look at it
> in January. It's still January, so hopefully he's still going to do that. ;-)

Given that laptop has a K8 CPU, it's highly likely that it's this patch..
http://userweb.kernel.org/~davej/pn.diff
Can you revert just that on top of -mm (or just try it standalone on top of -rc8)
and confirm that it is problematic?

The rest of the stuff in cpufreq.git looks benign at first look.

Dave

--
http://www.codemonkey.org.uk
--

To: Dave Jones <davej@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, January 18, 2008 - 4:50 pm

Reverting it from -mm makes things work. I'll check if it breaks things
when applied on top of -rc8.
--

To: Dave Jones <davej@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, January 18, 2008 - 5:55 pm

Yes, it does (ie. pn.diff alone on top of -rc8 breaks suspend 100% of the
time).
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, January 18, 2008 - 12:53 pm

On Fri, Jan 18, 2008 at 02:34:59PM +0100, Rafael J. Wysocki wrote:
> On Thursday, 17 of January 2008, Andrew Morton wrote:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc8...
> >
> > - selinux is busted on one of my two selinux-enabled test machines.
> >
> > - suspend-to-ram and suspend-to-disk are totally hosed on one of my test
> > machines. I guess I get to bisect this.
>
> Suspend and hibernation are also broken on my HP nx6325, which is caused by
> git-cpufreq.patch. Reverting this patch and
> drivers-cpufreq-add-calls-to-cpufreq_cpu_put.patch makes things work again.
>
> I reported this already for 2.6.24-rc6-mm1 and Dave said he would look at it
> in January. It's still January, so hopefully he's still going to do that. ;-)

Yeah, I put myself out of action for a few weeks. I'm back now.
I'll look at this today. Thanks for the reminder.

Dave

--
http://www.codemonkey.org.uk
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <linuxppc-dev@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 4:36 am

Hi Andrew,

The following oops was seen while running kernbench on one of the test machines
(a power4+ box). I tried to reproduce the oops but was unsuccessful.
I will try to reproduce it with debug info compiled in.

Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: 0000000000004570 LR: 000000000fc42dc0 CTR: 0000000000000000
REGS: c00000077b6bf8c0 TRAP: 0300 Not tainted (2.6.24-rc8-mm1-autotest)
MSR: 8000000000001000 <ME> CR: 28022422 XER: 00000000
DAR: c00000077b6bfce0, DSISR: 000000000a000000
TASK = c000000773164c40[19588] 'as' THREAD: c00000077b6bc000 CPU: 1
GPR00: 0000000000004000 c00000077b6bfb40 0000000000007346 000000000000d032
GPR04: 000000000000043a 0000000000000000 000000000000000c 0000000000000004
GPR08: 000000000fd278c8 0000000048022424 c00000077b6bfe30 0000998be2321500
GPR12: 8000000000001030 c0000000005f6280 0000000010030000 0000000010030000
GPR16: 0000000010030000 0000000010050000 000000001006aac0 0000000010053cd0
GPR20: 0000000000000000 0000000000000fe0 0000000010050000 0000000010050000
GPR24: 0000000000000ff8 0000000000000fe8 0000000000000062 000000000fd27490
GPR28: 000000000fd274c8 0000000010099420 000000000fd25ff4 000000001009a400
NIP [0000000000004570] 0x4570
LR [000000000fc42dc0] 0xfc42dc0
Call Trace:
[c00000077b6bfb40] [c00000077b292000] 0xc00000077b292000 (unreliable)
Instruction dump:
48000000 XXXXXXXX XXXXXXXX XXXXXXXX 41820008 XXXXXXXX XXXXXXXX XXXXXXXX
48000010 XXXXXXXX XXXXXXXX XXXXXXXX f92101a0 XXXXXXXX XXXXXXXX XXXXXXXX

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--

To: Kamalesh Babulal <kamalesh@...>
Cc: <linux-kernel@...>, <linuxppc-dev@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 4:44 am

odd. Where did the stack trace go?
--

To: Andrew Morton <akpm@...>
Cc: Kamalesh Babulal <kamalesh@...>, <linux-kernel@...>, <linuxppc-dev@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 5:01 am

It's there, it's just really really short (one line). The link
register is in userspace and the stack pointer looks to be right at
the top of a kernel stack area.

The trap was a data access exception which is very odd given that the
machine is in real mode (MMU off) with the pc at 0x4570. Actually it
looks like the machine probably got a data access exception somewhere
(probably in userspace, probably a page fault or similar) and then got
another exception before it had finished saving the state from the
first exception.

Kamalesh, do you still have the vmlinux? If so could you disassemble
the area from say 0x4500 to 0x4600, and find out what is the closest
symbol before 0xc000000000004570 from System.map, and show us those?
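
The System.map lookup requested above amounts to finding the last symbol whose address is at or below 0xc000000000004570. A minimal C sketch, assuming the usual "address type name" layout of System.map lines; the function name nearest_symbol is made up for illustration and is not an existing tool:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan System.map-style text ("<hex address> <type> <name>" per line)
 * and report the symbol with the highest address that is still at or
 * below `target`. Works on unsorted input too.
 * Returns 1 and fills name/addr on success, 0 if no symbol qualifies.
 */
static int nearest_symbol(FILE *map, uint64_t target,
                          char *name, size_t name_len, uint64_t *addr)
{
    char line[512], sym[256], type;
    uint64_t a;
    int found = 0;

    while (fgets(line, sizeof(line), map)) {
        if (sscanf(line, "%" SCNx64 " %c %255s", &a, &type, sym) != 3)
            continue;               /* skip lines that do not parse */
        if (a > target)
            continue;               /* symbol starts after the target */
        if (!found || a >= *addr) {
            *addr = a;
            snprintf(name, name_len, "%s", sym);
            found = 1;
        }
    }
    return found;
}
```

Called on an open System.map with the target 0xc000000000004570, it fills in the enclosing symbol's name and start address; the offset into that symbol is the target minus the returned address.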

Paul.
--

To: Paul Mackerras <paulus@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linuxppc-dev@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 5:34 am

I tried reproducing the problem and succeeded, with the following trace,
in which the pc is at 0x4570 as in the one above:

Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: 0000000000004570 LR: 000000000ff0288c CTR: 000000000ff013e0
REGS: c00000077e61f8c0 TRAP: 0300 Not tainted (2.6.24-rc8-mm1-autotest)
MSR: 8000000000001000 <ME> CR: 28000422 XER: 00000000
DAR: c00000077e61fce0, DSISR: 000000000a000000
TASK = c00000077207f880[23480] 'cc1' THREAD: c00000077e61c000 CPU: 3
GPR00: 0000000000004000 c00000077e61fb40 0000000000000088 000000000000d032
GPR04: 0000000000000088 000000000000030c 00000000fefefeff 000000007f7f7f7f
GPR08: 0000000000008000 0000000044000428 c00000077e61fe30 0000998be2321500
GPR12: 8000000000001030 c0000000005f6680 0000000010030000 0000000010030000
GPR16: 00000000105b0000 00000000105b0000 0000000010440000 00000000105b0000
GPR20: 00000000105b0000 00000000105b0000 00000000105b0000 00000000105b0000
GPR24: 00000000105b0000 00000000105b0000 00000000105b0000 00000000ffa11b24
GPR28: 0000000000000000 00000000ffffffff 000000000ffebff4 000000000ffec408
NIP [0000000000004570] 0x4570
LR [000000000ff0288c] 0xff0288c
Call Trace:
[c00000077e61fb40] [c00000077e61fcf0] 0xc00000077e61fcf0 (unreliable)
[c00000077e61fbd0] [0000000010440000] 0x10440000
Instruction dump:
48000000 XXXXXXXX XXXXXXXX XXXXXXXX 41820008 XXXXXXXX XXXXXXXX XXXXXXXX
48000010 XXXXXXXX XXXXXXXX XXXXXXXX f92101a0 XXXXXXXX XXXXXXXX XXXXXXXX

The disassembled vmlinux from 0x4500 to 0x4600

c000000000004500: f9 4d 01 68 std r10,360(r13)
c000000000004504: 48 02 89 f9 bl c00000000002cefc <.slb_allocate_realmode>
c000000000004508: e9 4d 01 68 ld r10,360(r13)
c00000000000450c: e8 6d 01 60 ld r3,352(r13)
c000000000004510: 81 2d 01 5c lwz r9,348(r13)
c000000000004514: 7d 48 03 a6 mtlr r10
c000000000004518: 71 8a 00 02 andi. r10,r12,2
c00000000000451...

To: Kamalesh Babulal <kamalesh@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linuxppc-dev@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 6:26 am

Actually, how much RAM does this machine have? If it has less than
32GB, then the problem is that the kernel stack pointer is bogus.
(How it got to be bogus is the interesting question, of course. :)
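
The arithmetic behind the RAM question can be checked by hand. A rough sketch, assuming the ppc64 kernel linear mapping starts at 0xc000000000000000 and that RAM is contiguous from physical address 0 (both are assumptions here, and the helper names are illustrative only): subtracting the base from the stack pointer in the first oops (GPR01 = c00000077b6bfb40) gives the physical offset it would have to correspond to.

```c
#include <stdint.h>

/* Assumed start of the ppc64 kernel linear mapping. */
#define LINEAR_BASE 0xc000000000000000ULL

/* Offset of a linear-mapping kernel address from the start of RAM. */
static uint64_t linear_offset(uint64_t kaddr)
{
    return kaddr - LINEAR_BASE;
}

/* Number of whole GiB that lie below the address in the linear mapping. */
static unsigned int offset_gib(uint64_t kaddr)
{
    return (unsigned int)(linear_offset(kaddr) >> 30);
}
```

For c00000077b6bfb40 the offset is 0x77b6bfb40, which lies between 29 and 30 GiB, so whether the pointer can refer to real memory depends on how much RAM is installed and how it is laid out.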

Paul.

--

To: Paul Mackerras <paulus@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linuxppc-dev@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 25, 2008 - 2:05 am

Hi Paul,

This kernel oops is seen in 2.6.24-rc8-git(2,3,4,5,7,8) and in 2.6.24.

Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: 0000000000004570 LR: 000000001030e594 CTR: 000000001012ddd0
REGS: c000000771f9f8c0 TRAP: 0300 Not tainted (2.6.24-autotest)
MSR: 8000000000001000 <ME> CR: 28000482 XER: 20000000
DAR: c000000771f9fce0, DSISR: 000000000a000000
TASK = c00000077b9c6000[19197] 'cc1' THREAD: c000000771f9c000 CPU: 2
GPR00: 0000000000000064 c000000771f9fb40 00000000f7fdb470 0000000000000000
GPR04: 0000000000000002 0000000000000000 0000000000782498 00000000003ff3ff
GPR08: 00000000aaaaaaab 0000000040000484 c000000771f9fe30 0000998be2321500
GPR12: 8000000000003030 c0000000005c5680 0000000010030000 0000000010030000
GPR16: 00000000105b0000 00000000105b0000 0000000010440000 00000000105b0000
GPR20: 00000000105b0000 00000000105f0000 0000000000000000 00000000ffd00b44
GPR24: 00000000105b0000 00000000105b0000 00000000105b0000 00000000105b0000
GPR28: 00000000105b0000 0000000010604684 0000000000000100 00000000105f75a8
NIP [0000000000004570] 0x4570
LR [000000001030e594] 0x1030e594
Call Trace:
[c000000771f9fb40] [c000000771f9fcf0] 0xc000000771f9fcf0 (unreliable)
Instruction dump:
48000000 XXXXXXXX XXXXXXXX XXXXXXXX 41820008 XXXXXXXX XXXXXXXX XXXXXXXX
48000010 XXXXXXXX XXXXXXXX XXXXXXXX f92101a0 XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace a8c779b801674eed ]---
-- 0:conmux-control -- time-stamp -- Jan/24/08 16:40:29 --
-- 0:conmux-control -- time-stamp -- Jan/24/08 16:47:56 --
Unable to handle kernel paging request for data at address 0xc00000077168f870
Faulting instruction address: 0x00004570
Oops: Kernel access of bad area, sig: 11 [#2]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: 0000000000004570 LR: c00000000004a310 CTR: 0000000000000000
REGS: c00000077168f450 TRAP: 0300 Tainted: G D (2.6.24-autotest)
MSR: 8000000000001000 <ME> CR: 28000242 XER: 00000000
DAR: c00000077168f870, DSISR: 000000000...

To: Paul Mackerras <paulus@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linuxppc-dev@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 6:44 am

Hi Paul,

The machine has around 30GB of RAM. Do you want me to take
git-powerpc.patch out of the series and try reproducing the oops?

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--

To: Kamalesh Babulal <kamalesh@...>
Cc: Paul Mackerras <paulus@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linuxppc-dev@...>, Andy Whitcroft <apw@...>
Date: Friday, January 18, 2008 - 6:54 am

Kamalesh, I thought I saw Paul's request for trying without
git-powerpc.patch (it's in a separate email).

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: Kamalesh Babulal <kamalesh@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linuxppc-dev@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 6:19 am

So it's in the code that gets called on an unrecoverable SLB fault.
That's bad, we should never get those. Does this happen with mainline
too, or only with -rc8-mm1? I don't understand why we should start
seeing this problem unless something has changed in
arch/powerpc/kernel or arch/powerpc/mm (well I suppose a bug somewhere
else could cause memory corruption which might be able to lead to
this).

Does it still happen if you take git-powerpc.patch out of the series?

Paul.
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <linux-kbuild@...>, <linuxppc-dev@...>, Sam Ravnborg <sam@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 3:09 am

Hi Andrew,

The kernel build fails during headers_check on a power box:

CHECK include/asm/nvram.h
/usr/local/autobench/autotest/tmp/build/linux/usr/include/asm/nvram.h requires linux/list.h, which does not exist in exported headers
make[3]: *** [/usr/local/autobench/autotest/tmp/build/linux/usr/include/asm/.check.nvram.h] Error 1
make[2]: *** [asm-powerpc] Error 2
make[1]: *** [headers_check] Error 2
make: *** [vmlinux] Error 2

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--

To: Kamalesh Babulal <kamalesh@...>
Cc: <linux-kernel@...>, <linux-kbuild@...>, <linuxppc-dev@...>, Sam Ravnborg <sam@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 3:38 am

doh.

--- a/include/asm-powerpc/nvram.h~include-asm-powerpc-nvramh-needs-listh-fix
+++ a/include/asm-powerpc/nvram.h
@@ -11,7 +11,6 @@
#define _ASM_POWERPC_NVRAM_H

#include <linux/errno.h>
-#include <linux/list.h>

#define NVRW_CNT 0x20
#define NVRAM_HEADER_LEN 16 /* sizeof(struct nvram_header) */
@@ -59,6 +58,9 @@ struct nvram_header {
};

#ifdef __KERNEL__
+
+#include <linux/list.h>
+
struct nvram_partition {
struct list_head partition;
struct nvram_header header;
_

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <sam@...>, <linux-kbuild@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 2:14 am

Hi Andrew,

The kernel build fails with the following error message:

scripts/mkubootimg/crc32.c:15:18: error: zlib.h: No such file or directory
scripts/mkubootimg/crc32.c:77: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'crc_table'
scripts/mkubootimg/crc32.c:153: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'crc32'
make[2]: *** [scripts/mkubootimg/crc32.o] Error 1
make[1]: *** [scripts/mkubootimg] Error 2
make: *** [scripts] Error 2

The patch causing this build failure may be git-kbuild.patch.

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--

To: Kamalesh Babulal <kamalesh@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <sam@...>, <linux-kbuild@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Sunday, January 20, 2008 - 11:30 am

A dependency on zlib was introduced by that patch. I installed
zlib1g-dev but I see that this is expected to be fixed without internal
dependencies anyway.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--

To: Kamalesh Babulal <kamalesh@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-kbuild@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Friday, January 18, 2008 - 4:06 am

The mkubootimg patches in kbuild.git have been reverted - but that was
after akpm merged kbuild.git.
So it is fixed in the next -mm.

The workaround for now is to just remove the line
containing "mkubootimg" in scripts/Makefile.

(Assuming you do not need the uImage target).

Sam
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Josh Boyer <jwboyer@...>, Sam Ravnborg <sam@...>
Date: Thursday, January 17, 2008 - 5:29 pm

The "kbuild: rework arch specific Makefiles to use mkubootimg"
changeset in Kbuild git introduces a kernel build dependency on the
system zlib.h headers; scripts/mkubootimg/crc32.c wants it.

The build errors out if those headers aren't installed. Was this
intentional?

--
Joseph Fannin
jfannin@gmail.com

--

To: Joseph Fannin <jfannin@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Sam Ravnborg <sam@...>
Date: Thursday, January 17, 2008 - 6:21 pm

On Thu, 17 Jan 2008 16:29:59 -0500

No, it wasn't. The first patch in that series should have included the
zlib.h header itself as well.

Sam, I'll respin the first patch and send it to you shortly. Sorry for
the trouble.

josh
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 6:15 pm

Hello,

The script below kills powerpc. The oopses get longer and more
wonderful with every subsequent 'cat'ed file.

/proc/<pid>/task/<pid>/pagemap seems to be the cause of the oops. The
important thing is that it oopses for a random process from /proc (that is,
not the first one in the list). So not every 'cat /proc/<pid>/task/<pid>/pagemap'
causes an oops.

I could try to bisect this but this powerpc box is an iMac G3 (CPU at 400MHz)
and it will take time. So any hints are appreciated.

Regards,

Mariusz

script:
---------
#!/bin/bash

for i in `find /proc/*/ -readable -type f`; do
	echo -n "cat $i > /dev/null ... ";
	logger -t proc_loop $i;
	sync;
	cat $i > /dev/null;
	echo "done";
done
----------
syslog:
proc_loop: /proc/3731/task/3731/pagemap
kernel: BUG: sleeping function called from invalid context at fs/proc/task_mmu.c:554
kernel: in_atomic():1, irqs_disabled():0
kernel: Call Trace:
kernel: [cf1cddf0] [c000840c] show_stack+0x3c/0x194 (unreliable)
kernel: [cf1cde20] [c002b2ec] __might_sleep+0xf4/0x108
kernel: [cf1cde30] [c00d2d54] add_to_pagemap+0x40/0x11c
kernel: [cf1cde50] [c00d2f44] pagemap_pte_range+0xa8/0x10c
kernel: [cf1cde70] [c0081b30] walk_page_range+0x148/0x23c
kernel: [cf1cdeb0] [c00d3104] pagemap_read+0x15c/0x244
kernel: [cf1cdef0] [c0092144] vfs_read+0xc4/0x16c
kernel: [cf1cdf10] [c009261c] sys_read+0x4c/0x90
kernel: [cf1cdf40] [c001328c] ret_from_syscall+0x0/0x40
kernel: --- Exception: c01 at 0xff5a364
kernel: LR = 0x10002f60
kernel: BUG: scheduling while atomic: cat/8929/0x00000002
kernel: Call Trace:
kernel: [cf1cde90] [c000840c] show_stack+0x3c/0x194 (unreliable)
kernel: [cf1cdec0] [c002db24] __schedule_bug+0x64/0x78
kernel: [cf1cdee0] [c027207c] schedule+0x304/0x32c
kernel: [cf1cdf40] [c0013a5c] recheck+0x0/0x28
kernel: --- Exception: c01 at 0xff5a364
kernel: LR = 0x10002f60
kernel: BUG: scheduling while atomic: cat/8929/0x00000007
kernel: Call Trace:
kernel: [cf1cde90] [c000840c] show...

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>, Matt Mackall <mpm@...>
Date: Thursday, January 17, 2008 - 6:51 pm

On Thu, 17 Jan 2008 23:15:27 +0100

It's not really an oops - it's a warning. add_to_pagemap() is doing a
put_user() inside pagemap_pte_range->pte_offset_map->kmap_atomic.

A known bug, I'm afraid.

How to fix?

- double-buffer the data to be copied to userspace or

- take a local copy of the pte page then work on that instead or

- play copy_to_user_inatomic() tricks.
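
The second option can be illustrated outside the kernel. This is only a userspace sketch, narrowed to copying one entry at a time rather than a whole pte page; every name in it (atomic_depth, map_entry, unmap_entry, copy_out, walk) is a made-up stand-in, not kernel API. The counter plays the role of the atomic context that kmap_atomic enters, and the assert plays the role of the __might_sleep() check that fired in the report.

```c
#include <assert.h>
#include <stdint.h>

static int atomic_depth;    /* stand-in for "we are inside kmap_atomic" */

/* Stand-in for pte_offset_map(): entering the mapping window. */
static const uint64_t *map_entry(const uint64_t *table, unsigned int idx)
{
    atomic_depth++;         /* no sleeping allowed from here on */
    return &table[idx];
}

/* Stand-in for pte_unmap(): leaving the mapping window. */
static void unmap_entry(void)
{
    atomic_depth--;
}

/* Stand-in for put_user(): may sleep, so it must not run atomically. */
static int copy_out(uint64_t *dst, uint64_t val)
{
    assert(atomic_depth == 0);
    *dst = val;
    return 0;
}

/* Take a local copy inside the window, close it, then do the copy-out. */
static int walk(const uint64_t *table, unsigned int n, uint64_t *out)
{
    for (unsigned int i = 0; i < n; i++) {
        const uint64_t *p = map_entry(table, i);
        uint64_t val = *p;  /* local copy taken while mapped */
        int err;

        unmap_entry();
        err = copy_out(&out[i], val);
        if (err)
            return err;
    }
    return 0;
}
```

Moving the copy_out() call between map_entry() and unmap_entry() makes the assert fire, which mirrors the reported warning.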

It would be really nice to get the maps4 stuff merged this time around but

hm. Not sure how that happened. The arch code thinks we're running in
--

To: Andrew Morton <akpm@...>
Cc: Mariusz Kozlowski <m.kozlowski@...>, <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 7:39 pm

Hmm, this fell off my radar. How about something like this as a minimal
fix (untested as -mm is a complete doorstop for me at the moment)?

diff -r 5595adaea70f fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c Thu Jan 17 13:26:54 2008 -0600
+++ b/fs/proc/task_mmu.c Thu Jan 17 17:29:21 2008 -0600
@@ -582,20 +583,26 @@
{
struct pagemapread *pm = private;
pte_t *pte;
- int err = 0;
+ int offset = 0, err = 0;

pte = pte_offset_map(pmd, addr);
- for (; addr != end; pte++, addr += PAGE_SIZE) {
+ for (; addr != end; offset++, addr += PAGE_SIZE) {
u64 pfn = PM_NOT_PRESENT;
- if (is_swap_pte(*pte))
- pfn = swap_pte_to_pagemap_entry(*pte);
- else if (pte_present(*pte))
- pfn = pte_pfn(*pte);
+ if (is_swap_pte(pte[offset]))
+ pfn = swap_pte_to_pagemap_entry(pte[offset]);
+ else if (pte_present(pte[offset]))
+ pfn = pte_pfn(pte[offset]);
+#ifdef CONFIG_HIGHPTE
+ pte_unmap(pte);
err = add_to_pagemap(addr, pfn, pm);
+ pte = pte_offset_map(pmd, addr);
+#else
+ err = add_to_pagemap(addr, pfn, pm);
+#endif
if (err)
return err;
}
- pte_unmap(pte - 1);
+ pte_unmap(pte);

cond_resched();

--
Mathematics is the supreme nostalgia of our time.

--

To: Matt Mackall <mpm@...>
Cc: <m.kozlowski@...>, <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 8:05 pm

On Thu, 17 Jan 2008 17:39:54 -0600

Good point, it really can be that simple.

Do we need the ifdef? pte_offset_map/pte_unmap should be super-cheap on
!CONFIG_HIGHPTE builds.

--

To: Andrew Morton <akpm@...>
Cc: <m.kozlowski@...>, <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 8:12 pm

In that case, pte_unmap is free, pte_offset_map is just a bit of math.
So yeah, we can simplify this. How about:

diff -r 5595adaea70f fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c Thu Jan 17 13:26:54 2008 -0600
+++ b/fs/proc/task_mmu.c Thu Jan 17 18:11:13 2008 -0600
@@ -582,20 +583,20 @@
{
struct pagemapread *pm = private;
pte_t *pte;
- int err = 0;
+ int offset = 0, err = 0;

- pte = pte_offset_map(pmd, addr);
- for (; addr != end; pte++, addr += PAGE_SIZE) {
+ for (; addr != end; offset++, addr += PAGE_SIZE) {
u64 pfn = PM_NOT_PRESENT;
- if (is_swap_pte(*pte))
- pfn = swap_pte_to_pagemap_entry(*pte);
- else if (pte_present(*pte))
- pfn = pte_pfn(*pte);
+ pte = pte_offset_map(pmd, addr);
+ if (is_swap_pte(pte[offset]))
+ pfn = swap_pte_to_pagemap_entry(pte[offset]);
+ else if (pte_present(pte[offset]))
+ pfn = pte_pfn(pte[offset]);
+ pte_unmap(pte);
err = add_to_pagemap(addr, pfn, pm);
if (err)
return err;
}
- pte_unmap(pte - 1);

cond_resched();

--
Mathematics is the supreme nostalgia of our time.

--

To: Matt Mackall <mpm@...>
Cc: <m.kozlowski@...>, <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 8:29 pm

On Thu, 17 Jan 2008 18:12:48 -0600

Do we need `offset' at all?

You have

static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
			     void *private)
{
	struct pagemapread *pm = private;
	pte_t *pte;
	int offset = 0, err = 0;

	for (; addr != end; offset++, addr += PAGE_SIZE) {
		u64 pfn = PM_NOT_PRESENT;
		pte = pte_offset_map(pmd, addr);
		if (is_swap_pte(pte[offset]))
			pfn = swap_pte_to_pagemap_entry(pte[offset]);
		else if (pte_present(pte[offset]))
			pfn = pte_pfn(pte[offset]);
		pte_unmap(pte);
		err = add_to_pagemap(addr, pfn, pm);
		if (err)
			return err;
	}

	cond_resched();

	return err;
}

but I think we just do s/pte[offset]/*pte/. The virtual address should be
the only thing we need to increment as we walk across the addresses here?
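
The double indexing is easy to see in a toy model. A sketch under stated assumptions: 4K pages, a 512-slot table, and a slot_for() helper standing in for pte_offset_map(), which already returns the slot for the address it is given. All names and sizes here are illustrative, not the kernel's.

```c
#include <stddef.h>

#define SIM_PAGE_SHIFT  12      /* assumed 4K pages */
#define SIM_TABLE_SLOTS 512     /* hypothetical entries per table */

/* Stand-in for pte_offset_map(): pointer to the slot that covers addr. */
static const unsigned long *slot_for(const unsigned long *table,
                                     unsigned long addr)
{
    return &table[(addr >> SIM_PAGE_SHIFT) & (SIM_TABLE_SLOTS - 1)];
}

/* Slot reached when the pointer is remapped each step AND indexed again. */
static size_t slot_with_offset(const unsigned long *table,
                               unsigned long start, unsigned int step)
{
    unsigned long addr = start + ((unsigned long)step << SIM_PAGE_SHIFT);
    const unsigned long *p = slot_for(table, addr);

    return (size_t)(&p[step] - table);      /* the pte[offset] form */
}

/* Slot reached when the remapped pointer is simply dereferenced. */
static size_t slot_with_deref(const unsigned long *table,
                              unsigned long start, unsigned int step)
{
    unsigned long addr = start + ((unsigned long)step << SIM_PAGE_SHIFT);

    return (size_t)(slot_for(table, addr) - table);     /* the *pte form */
}
```

On the fourth page of a walk starting at address 0 (step 3), the dereferencing form lands on slot 3 while the indexed form lands on slot 6, so once the mapping is redone per iteration only the address needs to advance.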

--

To: Andrew Morton <akpm@...>
Cc: <m.kozlowski@...>, <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 8:47 pm

Looks like no.

I wonder if there's a good argument for adding a pte_offset_val() which
would let us do:

pteval = pte_offset_val(pmd, addr);

and shrink the map/unmap window and overhead here and possibly
elsewhere?

Anyway, updated but still untested patch now with revealing comment:

diff -r 5595adaea70f fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c Thu Jan 17 13:26:54 2008 -0600
+++ b/fs/proc/task_mmu.c Thu Jan 17 18:45:57 2008 -0600
@@ -584,18 +585,19 @@
pte_t *pte;
int err = 0;

- pte = pte_offset_map(pmd, addr);
- for (; addr != end; pte++, addr += PAGE_SIZE) {
+ for (; addr != end; addr += PAGE_SIZE) {
u64 pfn = PM_NOT_PRESENT;
+ pte = pte_offset_map(pmd, addr);
if (is_swap_pte(*pte))
pfn = swap_pte_to_pagemap_entry(*pte);
else if (pte_present(*pte))
pfn = pte_pfn(*pte);
+ /* unmap so we're not in atomic when we copy to userspace */
+ pte_unmap(pte);
err = add_to_pagemap(addr, pfn, pm);
if (err)
return err;
}
- pte_unmap(pte - 1);

cond_resched();

--
Mathematics is the supreme nostalgia of our time.

--

To: Matt Mackall <mpm@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Friday, January 18, 2008 - 1:23 pm

I patched the ppc32 kernel with this and ran tests on /proc.
This patch helps. No more BUGs and oopses :)

Thanks,

--

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Friday, January 18, 2008 - 1:33 pm

Thanks, Andrew's already queued it up.

--
Mathematics is the supreme nostalgia of our time.

--

To: Matt Mackall <mpm@...>
Cc: <m.kozlowski@...>, <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 9:07 pm

On Thu, 17 Jan 2008 18:47:17 -0600

That worked out nicely.

Wasn't the old code potentially pte_unmap()ping the wrong address? If we
enter with addr==end?
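
The concern can be shown with plain index arithmetic. A small sketch that models the old loop shape with an integer index in place of the pte pointer (the function name is made up, and whether the walker can really call in with addr == end is an open question here):

```c
/*
 * Index handed to the unmap call by the old loop shape:
 *   pte = pte_offset_map(pmd, addr);
 *   for (; addr != end; pte++, addr += PAGE_SIZE) { ... }
 *   pte_unmap(pte - 1);
 * `first` is the index of the first mapped entry, `count` the number
 * of pages walked.
 */
static long old_unmap_index(long first, long count)
{
    long pte = first;

    for (long i = 0; i < count; i++)
        pte++;

    return pte - 1;
}
```

With a non-empty range the result is the last entry visited, but with count == 0 it is first - 1, one slot before the entry that was mapped; if the mapped entry was the first in its page, that address falls in a different page.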

--

To: Andrew Morton <akpm@...>
Cc: <m.kozlowski@...>, <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 9:16 pm

Cool, feel free to add:

Yes, that was busted.

--
Mathematics is the supreme nostalgia of our time.

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Thursday, January 17, 2008 - 5:10 pm

Hmm. On my Thinkpad R51, this gives me:

Uncompressing linux... Ok, booting kernel.

..and nothing more with the attached .config.

--
Mathematics is the supreme nostalgia of our time.

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Thursday, January 17, 2008 - 9:29 pm

Also, both mainline git and x86.git build and boot fine.

--
Mathematics is the supreme nostalgia of our time.

--

To: Matt Mackall <mpm@...>
Cc: <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Friday, January 18, 2008 - 1:08 am

Tried your config on the old PIII. Boots OK.

Then tried it on the Vaio and the machine instantly stops with blinking
leds. But it also does this with mainline. You use pentium-M and so do I
on that machine. Odd.

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Friday, January 18, 2008 - 9:55 am

Hmm, I don't think I was getting the blinking LEDs, so I suspect mine
was dying even earlier, perhaps in setup.s.

--
Mathematics is the supreme nostalgia of our time.

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <linuxppc-dev@...>, <raisch@...>, <jgarzik@...>
Date: Thursday, January 17, 2008 - 3:06 pm

Hi,

My powerpc build-all-defconfigs script found the following:

mpc837x_mds_defconfig. Breakage looks like it came from libata's
for_each_sg() patch.

drivers/ata/sata_fsl.c: In function 'sata_fsl_fill_sg':
drivers/ata/sata_fsl.c:337: error: redeclaration of 'si' with no linkage
drivers/ata/sata_fsl.c:326: error: previous declaration of 'si' was here

powerpc_allyesconfig:

drivers/net/ehea/ehea_main.c: In function 'ehea_driver_sysfs_add':
drivers/net/ehea/ehea_main.c:2812: error: 'struct device_driver' has no member named 'kobj'
drivers/net/ehea/ehea_main.c:2815: error: 'struct device_driver' has no member named 'kobj'
drivers/net/ehea/ehea_main.c:2818: error: 'struct device_driver' has no member named 'kobj'
drivers/net/ehea/ehea_main.c: In function 'ehea_driver_sysfs_remove':
drivers/net/ehea/ehea_main.c:2830: error: 'struct device_driver' has no member named 'kobj'

-Olof
--

To: Olof Johansson <olof@...>
Cc: <linux-kernel@...>, <linuxppc-dev@...>, <raisch@...>, <jgarzik@...>, Greg KH <greg@...>, Kay Sievers <kay.sievers@...>
Date: Thursday, January 17, 2008 - 3:35 pm

--- a/drivers/ata/sata_fsl.c~git-libata-all-fix-drivers-ata-sata_fslc
+++ a/drivers/ata/sata_fsl.c
@@ -323,7 +323,6 @@ static unsigned int sata_fsl_fill_sg(str
struct scatterlist *sg;
unsigned int num_prde = 0;
u32 ttl_dwords = 0;
- unsigned int si;

/*
* NOTE : direct & indirect prdt's are contigiously allocated

Looks like the driver tree wrecking ball failed to visit that driver.

--

To: Andrew Morton <akpm@...>
Cc: Olof Johansson <olof@...>, <linux-kernel@...>, <linuxppc-dev@...>, <raisch@...>, <jgarzik@...>, Kay Sievers <kay.sievers@...>
Date: Thursday, January 17, 2008 - 6:00 pm

Crap, I thought I fixed that one, but the patch never made it out...
I'll fix that tomorrow, sorry about that.

greg k-h
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 2:18 pm

Hello,

This is from powerpc (iMac G3):

CC [M] sound/ppc/awacs.o
In file included from sound/ppc/awacs.c:24:
include/asm/nvram.h:62: error: field 'partition' has incomplete type
make[1]: *** [sound/ppc/awacs.o] Error 1
make: *** [sound/ppc/awacs.o] Error 2

Regards,

Mariusz

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: <linux-kernel@...>, <paulus@...>, <linuxppc-dev@...>
Date: Thursday, January 17, 2008 - 3:27 pm

hm.

--- a/include/asm-powerpc/nvram.h~include-asm-powerpc-nvramh-needs-listh
+++ a/include/asm-powerpc/nvram.h
@@ -11,6 +11,7 @@
#define _ASM_POWERPC_NVRAM_H

#include <linux/errno.h>
+#include <linux/list.h>

#define NVRW_CNT 0x20
#define NVRAM_HEADER_LEN 16 /* sizeof(struct nvram_header) */
_

I wonder why mainline isn't busted actually.

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Thursday, January 17, 2008 - 2:13 pm

Please see if reverting pm-acquire-device-locks-on-suspend-rev-3.patch helps
and if it doesn't, please see if reverting git-acpi.patch helps.

Everyone having suspend/hibernation problems with this kernel, please check
first whether reverting pm-acquire-device-locks-on-suspend-rev-3.patch (and the
fixes) helps.

Thanks,
Rafael

--

To: Ingo Molnar <mingo@...>, <tglx@...>
Cc: <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Thursday, January 17, 2008 - 2:44 pm

<snip>

grepping around and looking through the code, I notice it is because
these variables just do not exist for 32 bit NUMA. I am not sure how to
go about it, and will just leave it to folks who know what they are
doing there :).

--
regards,
Dhaval
--

To: Dhaval Giani <dhaval@...>
Cc: <tglx@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Mike Travis <travis@...>
Date: Friday, January 18, 2008 - 4:59 am

Yes, Mike Travis has, I think, some patches in the works for this build
problem. Disabling NUMA on 32-bit is the solution meanwhile.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: Dhaval Giani <dhaval@...>, <tglx@...>, <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Friday, January 18, 2008 - 12:16 pm

I have the fix for this problem coming...

Thanks,
Mike
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <mingo@...>
Date: Thursday, January 17, 2008 - 12:48 pm

Booting on x86_64 SMP gives me:

------------[ cut here ]------------
kernel BUG at kernel/sched_rt.c:228!
invalid opcode: 0000 [1] SMP
last sysfs file: /sys/devices/pci0000:40/0000:40:0c.0/0000:41:00.0/0000:42:08.0/class
CPU 2
Modules linked in: parport_pc lp parport tg3 cciss ehci_hcd ohci_hcd uhci_hcd
Pid: 12738, comm: 5-1.test Not tainted 2.6.24-rc8-mm1 #1
RIP: 0010:[<ffffffff8023077b>] [<ffffffff8023077b>] update_curr_rt+0x27/0x87
RSP: 0018:ffff8101e6805e38 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff81027f8591e0 RCX: ffff81000100fb80
RDX: 0000000000000000 RSI: ffff81026eb8b1e0 RDI: ffff810001014980
RBP: ffff8101e6805e48 R08: ffffffff8067d960 R09: 00000000000031c1
R10: 0000000000000000 R11: 0000000000000246 R12: ffff81026eb8b1e0
R13: ffff810001014980 R14: 0000000000000001 R15: 00000000ffffffff
FS: 0000000041a07940(0063) GS:ffff81027f80d700(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000338b00c2d0 CR3: 0000000254169000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process 5-1.test (pid: 12738, threadinfo ffff8101e6804000, task ffff81026c8de8f0)
Stack: 0000000000000000 ffff81026eb8b348 ffff8101e6805e78 ffffffff80231c9b
0000000000000000 ffff81026eb8b1e0 ffff810001014980 ffff81026eb8b1e0
ffff8101e6805e98 ffffffff8022f1bb ffff8101e6805ea8 ffff810001014980
Call Trace:
[<ffffffff80231c9b>] dequeue_task_rt+0x1f/0x5e
[<ffffffff8022f1bb>] dequeue_task+0x13/0x1e
[<ffffffff8022f1e8>] deactivate_task+0x22/0x2a
[<ffffffff802343f9>] sched_setscheduler+0x22e/0x32f
[<ffffffff8023486a>] do_sched_setscheduler+0x5f/0x6e
[<ffffffff802348a0>] sys_sched_setscheduler+0x14/0x18
[<ffffffff8020c0f9>] tracesys+0xdc/0xe1

Code: 48 89 c8 c3 55 48 89 e5 53 48 83 ec 08 48 8b 9f a8 07 00 00 8b 83 b0 01 00 00 48 8b 8b 98 01 00 00 83 f8 01 74 09 83 f8 02 74 04 <0f> 0b eb fe 48 8b 9...

To: Randy Dunlap <randy.dunlap@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <mingo@...>
Date: Thursday, January 17, 2008 - 1:15 pm

Hmm that would be me messing up in: 4c121cce

- if (!task_has_rt_policy(curr))
- return;
+ BUG_ON(!task_has_rt_policy(curr));

Does reverting that help?

Strange thing is, at the time I was pretty sure that ought not
happen... /me reaches for a brown paper bag.

--

To: Randy Dunlap <randy.dunlap@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <mingo@...>
Date: Thursday, January 17, 2008 - 1:11 pm

Hmm, that would be me messing up in : 4c121cce

- if (!task_has_rt_policy(curr))
- return;
+ BUG_ON(!task_has_rt_policy(curr));

Does reverting that fix it?

--

To: Peter Zijlstra <peterz@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <mingo@...>
Date: Thursday, January 17, 2008 - 3:14 pm

Ack. Andrew, do you want this hotfix?

---
From: Peter Zijlstra <peterz@infradead.org>

Revert the BUG_ON(). Condition is OK and happens.

Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
---
kernel/sched_rt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.24-rc8-mm1.orig/kernel/sched_rt.c
+++ linux-2.6.24-rc8-mm1/kernel/sched_rt.c
@@ -225,7 +225,8 @@ static void update_curr_rt(struct rq *rq
struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
u64 delta_exec;

- BUG_ON(!task_has_rt_policy(curr));
+ if (!task_has_rt_policy(curr))
+ return;

delta_exec = rq->clock - curr->se.exec_start;
if (unlikely((s64)delta_exec < 0))
--

To: Randy Dunlap <randy.dunlap@...>
Cc: Peter Zijlstra <peterz@...>, <linux-kernel@...>, <mingo@...>
Date: Thursday, January 17, 2008 - 3:38 pm

Thanks - I already added that to
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc8...

There are quite a few fixes there so I'd remind people to apply those as
well before testing.

--

To: Peter Zijlstra <peterz@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <mingo@...>
Date: Thursday, January 17, 2008 - 2:05 pm

Yes, it does. Thanks.

---
~Randy
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Thursday, January 17, 2008 - 12:15 pm

Hi Andrew,

The kernel build fails with the following error:

drivers/scsi/aha152x.o: In function `aha152x_host_reset_host':
/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/aha152x.c:1324: multiple definition of `aha152x_host_reset_host'
drivers/scsi/pcmcia/built-in.o:/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/aha152x.c:1324: first defined here
drivers/scsi/aha152x.o: In function `aha152x_release':
/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/aha152x.c:908: multiple definition of `aha152x_release'
drivers/scsi/pcmcia/built-in.o:/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/aha152x.c:908: first defined here
ld: Warning: size of symbol `aha152x_release' changed from 68 in drivers/scsi/pcmcia/built-in.o to 100 in drivers/scsi/aha152x.o
drivers/scsi/aha152x.o: In function `aha152x_probe_one':
/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/aha152x.c:772: multiple definition of `aha152x_probe_one'
drivers/scsi/pcmcia/built-in.o:/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/aha152x.c:772: first defined here
drivers/scsi/fdomain.o:(.data+0x0): multiple definition of `fdomain_driver_template'
drivers/scsi/pcmcia/built-in.o:(.data+0x5a0): first defined here
drivers/scsi/fdomain.o: In function `fdomain_16x0_bus_reset':
/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:1568: multiple definition of `fdomain_16x0_bus_reset'
drivers/scsi/pcmcia/built-in.o:/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:1568: first defined here
drivers/scsi/fdomain.o: In function `__fdomain_16x0_detect':
/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:894: multiple definition of `__fdomain_16x0_detect'
drivers/scsi/pcmcia/built-in.o:/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:894: first defined here
ld: Warning: size of symbol `__fdomain_16x0_detect' changed from 1206 in drivers/scsi/pcmcia/built-in.o to 1700 in drivers/scsi/fdomain.o
drivers/scsi/fdomain.o: In function `fdomain_setup':
/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomai...

To: Kamalesh Babulal <kamalesh@...>
Cc: <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>, James Bottomley <James.Bottomley@...>, Tejun Heo <htejun@...>
Date: Thursday, January 17, 2008 - 3:11 pm

Neat. Seems that the scsi build system is linking together two copies of
drivers/scsi/aha152x.o. One via drivers/scsi/aha152x.o directly and the
other via drivers/scsi/pcmcia/built-in.o.

Please send the .config.

I'm looking suspiciously at this, from git-scsi-misc:

commit 8ae732a91df051aba6820068a47b631a06599d84
Author: Tejun Heo <htejun@gmail.com>
Date: Fri Dec 7 22:36:23 2007 +0900

[SCSI] make pcmcia directory use obj-y|m instead of subdir-y|m

subdir-y|m isn't supposed to contain modules or built-in components.
Change subdir-$(CONFIG_PCMCIA) to obj-$(CONFIG_PCMCIA).

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index b5441f5..93e1428 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -17,7 +17,7 @@
CFLAGS_aha152x.o = -DAHA152X_STAT -DAUTOCONF
CFLAGS_gdth.o = # -DDEBUG_GDTH=2 -D__SERIAL__ -D__COM2__ -DGDTH_STATISTICS

-subdir-$(CONFIG_PCMCIA) += pcmcia
+obj-$(CONFIG_PCMCIA) += pcmcia/

obj-$(CONFIG_SCSI) += scsi_mod.o
obj-$(CONFIG_SCSI_TGT) += scsi_tgt.o

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>, James Bottomley <James.Bottomley@...>, Tejun Heo <htejun@...>
Date: Friday, January 18, 2008 - 2:37 am

Hi Andrew,

The patch from Tejun Heo fixes the aha152x.c build failure, but the second part
of the build failure, shown below, still occurs.

drivers/scsi/fdomain.o:(.data+0x0): multiple definition of `fdomain_driver_template'
drivers/scsi/pcmcia/built-in.o:(.data+0x5a0): first defined here
drivers/scsi/fdomain.o: In function `fdomain_16x0_bus_reset':
/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:1568: multiple definition of `fdomain_16x0_bus_reset'
drivers/scsi/pcmcia/built-in.o:/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:1568: first defined here
drivers/scsi/fdomain.o: In function `__fdomain_16x0_detect':
/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:894: multiple definition of `__fdomain_16x0_detect'
drivers/scsi/pcmcia/built-in.o:/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:894: first defined here
ld: Warning: size of symbol `__fdomain_16x0_detect' changed from 1206 in drivers/scsi/pcmcia/built-in.o to 1700 in drivers/scsi/fdomain.o
drivers/scsi/fdomain.o: In function `fdomain_setup':
/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:554: multiple definition of `fdomain_setup'
drivers/scsi/pcmcia/built-in.o:/home/kamalesh/scrap/linux-2.6.24-rc8/drivers/scsi/fdomain.c:554: first defined here
make[2]: *** [drivers/scsi/built-in.o] Error 1
make[1]: *** [drivers/scsi] Error 2
make: *** [drivers] Error 2

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--

To: Kamalesh Babulal <kamalesh@...>
Cc: <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>, James Bottomley <James.Bottomley@...>, Tejun Heo <htejun@...>
Date: Friday, January 18, 2008 - 3:27 am

Tejun has more fixing to do, I suspect ;)

I assume a basic allyesconfig will weed out most remaining problems of this
sort. Problem is, it needs to be done for all architectures (and even that
might not suffice). So old-fashioned code inspection is also needed.
--

To: Kamalesh Babulal <kamalesh@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>, James Bottomley <James.Bottomley@...>
Date: Friday, January 18, 2008 - 3:20 am

aha152x.c and fdomain.c are built twice - once for the ISA driver and
once for the PCMCIA one. Through #ifdefs, the compiled code is
slightly different; thus, global symbols need to be given different
names depending on which flavor is being built. This patch adds a
GLOBAL() macro to aha152x.h and fdomain.h which changes the symbol
name depending on PCMCIA.

This bug has always existed but has been masked by the fact that
drivers/scsi/pcmcia used subdir-(y|m) instead of obj-(y|m), which meant
drivers/scsi/pcmcia/built-in.o was not linked into the kernel and thus
the duplicate symbols never met at link time.
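
A rough illustration of the renaming idea, as a standalone C sketch. The macro definition itself is in the header part of the patch, so everything here is an assumption made for illustration: the `isa_` and `pcmcia_` prefixes, the `DEMO_*` macro names, and `struct demo_host` are all hypothetical and are not taken from aha152x.h or fdomain.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host structure, just enough to have something to fill in. */
struct demo_host {
    int         io_port;
    const char *flavor;
};

/*
 * Hypothetical name-mangling macros. Each pastes a prefix onto the symbol
 * so that the two builds export different names to the linker.
 */
#define DEMO_ISA_GLOBAL(sym)    isa_##sym
#define DEMO_PCMCIA_GLOBAL(sym) pcmcia_##sym

/*
 * One shared function body, instantiated once per flavor. In the driver
 * the same effect comes from compiling one .c file twice; a macro is used
 * here only so that both flavors fit in a single translation unit.
 */
#define DEMO_DEFINE_PROBE(GLOBAL, name)                             \
    void GLOBAL(demo_probe_one)(struct demo_host *host, int port);  \
    void GLOBAL(demo_probe_one)(struct demo_host *host, int port)   \
    {                                                               \
        assert(host != NULL);                                       \
        host->io_port = port;                                       \
        host->flavor = name;                                        \
    }

DEMO_DEFINE_PROBE(DEMO_ISA_GLOBAL, "isa")        /* isa_demo_probe_one()    */
DEMO_DEFINE_PROBE(DEMO_PCMCIA_GLOBAL, "pcmcia")  /* pcmcia_demo_probe_one() */
```

Both functions come from the same source text, yet the linker sees `isa_demo_probe_one` and `pcmcia_demo_probe_one`, so the two copies can sit in one image without a multiple-definition error.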

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
Ah... missed that one. Here's the updated version.

drivers/scsi/aha152x.c | 12 ++++++------
drivers/scsi/aha152x.h | 20 +++++++++++++++++---
drivers/scsi/fdomain.c | 20 ++++++++++----------
drivers/scsi/fdomain.h | 21 +++++++++++++++++----
drivers/scsi/pcmcia/aha152x_stub.c | 10 ++++++----
drivers/scsi/pcmcia/fdomain_stub.c | 10 ++++++----
6 files changed, 62 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/aha152x.c b/drivers/scsi/aha152x.c
index ea8c699..0204f44 100644
--- a/drivers/scsi/aha152x.c
+++ b/drivers/scsi/aha152x.c
@@ -769,7 +769,7 @@ static irqreturn_t swintr(int irqno, void *dev_id)
return IRQ_HANDLED;
}

-struct Scsi_Host *aha152x_probe_one(struct aha152x_setup *setup)
+struct Scsi_Host *GLOBAL(aha152x_probe_one)(struct aha152x_setup *setup)
{
struct Scsi_Host *shpnt;

@@ -905,7 +905,7 @@ out_host_put:
return NULL;
}

-void aha152x_release(struct Scsi_Host *shpnt)
+void GLOBAL(aha152x_release)(struct Scsi_Host *shpnt)
{
if (!shpnt)
return;
@@ -1327,7 +1327,7 @@ static void reset_ports(struct Scsi_Host *shpnt)
* Reset the host (bus and controller)
*
*/
-int aha152x_host_reset_host(struct Scsi_Host *shpnt)
+int GLOBAL(aha152x_host_reset_host)(struct Scsi_Host *shpnt)
{
DPRINTK(debug_eh, KERN_DEBU...

To: Tejun Heo <htejun@...>
Cc: Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>, James Bottomley <James.Bottomley@...>
Date: Monday, January 21, 2008 - 5:56 am

The right fix would be to compile it only once and attach it to both
busses. It would be nice if someone could look into that instead of
hacking around the issue.

--

To: Christoph Hellwig <hch@...>
Cc: Tejun Heo <htejun@...>, Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>
Date: Monday, January 21, 2008 - 10:59 am

I agree in principle, but without the hardware such a change would be
untested ... which is what makes me worry about it.

James

--

To: Tejun Heo <htejun@...>
Cc: Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>
Date: Friday, January 18, 2008 - 10:58 am

Actually, isn't the better fix just to return to the original behaviour?

As you pointed out, using the subdir instead of obj meant that although
the modules were built, the drivers were never linked into the main
kernel. According to the records, this has been the default forever, so
there can be no-one anywhere relying on these drivers being built in.
Actually, as old style pcmcia drivers, I'm not sure there's much value
building them into the kernel anyway.

So just modify scsi/pcmcia/Kconfig to make them all depend on m.

James

--

To: James Bottomley <James.Bottomley@...>
Cc: Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>
Date: Friday, January 18, 2008 - 7:27 pm

Yeap, there is no problem if you don't allow them to be linked into the
kernel. If that's how you want it, please go ahead.

I personally think it's a bit odd to disallow building into the kernel
because of a peculiarity of the implementation (including c files and
compiling them slightly differently). Also, no one reporting a problem
doesn't necessarily mean no one has attempted it and failed.

Thanks.

--
tejun
--

To: Tejun Heo <htejun@...>
Cc: Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>
Date: Friday, January 18, 2008 - 7:32 pm

Heh ... I'll make you a deal. Find just one user of one of these
drivers who can make use of them built in, and I'll apply the patch.

I'm just a bit reluctant to touch these drivers, since they're all
incredibly ancient. We don't have good luck with simple transformation
patches on the older drivers ... and it seems to take months before
anyone notices there's a problem.

James

--

To: Tejun Heo <htejun@...>
Cc: Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>
Date: Friday, January 18, 2008 - 7:47 pm

This is the patch that will return them to their original behaviour.

James

---
diff --git a/drivers/scsi/pcmcia/Kconfig b/drivers/scsi/pcmcia/Kconfig
index fa481b5..53857c6 100644
--- a/drivers/scsi/pcmcia/Kconfig
+++ b/drivers/scsi/pcmcia/Kconfig
@@ -6,7 +6,8 @@ menuconfig SCSI_LOWLEVEL_PCMCIA
bool "PCMCIA SCSI adapter support"
depends on SCSI!=n && PCMCIA!=n

-if SCSI_LOWLEVEL_PCMCIA && SCSI && PCMCIA
+# drivers have problems when build in, so require modules
+if SCSI_LOWLEVEL_PCMCIA && SCSI && PCMCIA && m

config PCMCIA_AHA152X
tristate "Adaptec AHA152X PCMCIA support"

--

To: James Bottomley <James.Bottomley@...>
Cc: Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>
Date: Friday, January 18, 2008 - 7:54 pm

Looks good to me.

--
tejun
--

To: James Bottomley <James.Bottomley@...>
Cc: Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>
Date: Friday, January 18, 2008 - 7:46 pm

I don't think I can. I didn't even know they were isa ones before

Alright then, please go ahead and disallow built-in.

Thanks.

--
tejun
--

To: James Bottomley <James.Bottomley@...>
Cc: Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>
Date: Friday, January 18, 2008 - 7:28 pm

Actually what's better would be to make all symbols static and include
the c file directly into the stub file. How about that?

--
tejun
--

To: Tejun Heo <htejun@...>
Cc: Kamalesh Babulal <kamalesh@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>, James Bottomley <James.Bottomley@...>
Date: Friday, January 18, 2008 - 3:30 am

Hi Tejun Heo,

Thanks, I have tested the patch, it fixes both build failures.

Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
---
Ah... missed that one. Here's the updated version.

drivers/scsi/aha152x.c | 12 ++++++------
drivers/scsi/aha152x.h | 20 +++++++++++++++++---
drivers/scsi/fdomain.c | 20 ++++++++++----------
drivers/scsi/fdomain.h | 21 +++++++++++++++++----
drivers/scsi/pcmcia/aha152x_stub.c | 10 ++++++----
drivers/scsi/pcmcia/fdomain_stub.c | 10 ++++++----
6 files changed, 62 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/aha152x.c b/drivers/scsi/aha152x.c
index ea8c699..0204f44 100644
--- a/drivers/scsi/aha152x.c
+++ b/drivers/scsi/aha152x.c
@@ -769,7 +769,7 @@ static irqreturn_t swintr(int irqno, void *dev_id)
return IRQ_HANDLED;
}

-struct Scsi_Host *aha152x_probe_one(struct aha152x_setup *setup)
+struct Scsi_Host *GLOBAL(aha152x_probe_one)(struct aha152x_setup *setup)
{
struct Scsi_Host *shpnt;

@@ -905,7 +905,7 @@ out_host_put:
return NULL;
}

-void aha152x_release(struct Scsi_Host *shpnt)
+void GLOBAL(aha152x_release)(struct Scsi_Host *shpnt)
{
if (!shpnt)
return;
@@ -1327,7 +1327,7 @@ static void reset_ports(struct Scsi_Host *shpnt)
* Reset the host (bus and controller)
*
*/
-int aha152x_host_reset_host(struct Scsi_Host *shpnt)
+int GLOBAL(aha152x_host_reset_host)(struct Scsi_Host *shpnt)
{
DPRINTK(debug_eh, KERN_DEBUG "scsi%d: host reset\n", shpnt->host_no);

@@ -1345,7 +1345,7 @@ int aha152x_host_reset_host(struct Scsi_Host *shpnt)
*/
static int aha152x_host_reset(Scsi_Cmnd *SCpnt)
{
- return aha152x_host_reset_host(SCpnt->device->host);
+ return GLOBAL(aha152x_host_reset_host)(SCpnt->device->host);
}

/*
@@ -3916,7 +3916,7 @@ static int __init aha152x_init(void)

for (i=0; i<setup_count; i++) {
if ( request_region(setup[i].io_port, IO_RA...

To: Andrew Morton <akpm@...>
Cc: Kamalesh Babulal <kamalesh@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>, James Bottomley <James.Bottomley@...>
Date: Thursday, January 17, 2008 - 8:53 pm

aha152x.c is built twice - once for the ISA driver and once for the
PCMCIA one. Through #ifdefs, the compiled code is slightly
different; thus, global symbols need to be given different names
depending on which flavor is being built. This patch adds a GLOBAL()
macro to aha152x.h which changes the symbol name depending on PCMCIA.

This bug has always existed but has been masked by the fact that
drivers/scsi/pcmcia used subdir-(y|m) instead of obj-(y|m), which meant
drivers/scsi/pcmcia/built-in.o was not linked into the kernel and thus
the duplicate symbols never met at link time.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
drivers/scsi/aha152x.c | 12 ++++++------
drivers/scsi/aha152x.h | 20 +++++++++++++++++---
drivers/scsi/pcmcia/aha152x_stub.c | 10 ++++++----
3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/aha152x.c b/drivers/scsi/aha152x.c
index ea8c699..0204f44 100644
--- a/drivers/scsi/aha152x.c
+++ b/drivers/scsi/aha152x.c
@@ -769,7 +769,7 @@ static irqreturn_t swintr(int irqno, void *dev_id)
return IRQ_HANDLED;
}

-struct Scsi_Host *aha152x_probe_one(struct aha152x_setup *setup)
+struct Scsi_Host *GLOBAL(aha152x_probe_one)(struct aha152x_setup *setup)
{
struct Scsi_Host *shpnt;

@@ -905,7 +905,7 @@ out_host_put:
return NULL;
}

-void aha152x_release(struct Scsi_Host *shpnt)
+void GLOBAL(aha152x_release)(struct Scsi_Host *shpnt)
{
if (!shpnt)
return;
@@ -1327,7 +1327,7 @@ static void reset_ports(struct Scsi_Host *shpnt)
* Reset the host (bus and controller)
*
*/
-int aha152x_host_reset_host(struct Scsi_Host *shpnt)
+int GLOBAL(aha152x_host_reset_host)(struct Scsi_Host *shpnt)
{
DPRINTK(debug_eh, KERN_DEBUG "scsi%d: host reset\n", shpnt->host_no);

@@ -1345,7 +1345,7 @@ int aha152x_host_reset_host(struct Scsi_Host *shpnt)
*/
static int aha152x_host_reset(Scsi_Cmnd *SCpnt)
{
- return aha152x_host_reset_host(SCpnt->device->host);
+ return GLOBAL(aha1...

To: Tejun Heo <htejun@...>
Cc: Andrew Morton <akpm@...>, Kamalesh Babulal <kamalesh@...>, <linux-kernel@...>, <linux-scsi@...>, <fischer@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Samuel Ortiz <samuel@...>, James Bottomley <James.Bottomley@...>
Date: Friday, January 18, 2008 - 2:29 am

Hi Tejun Heo,

Thanks, I have tested the patch; it fixes the build failure in aha152x.c.
Tested-By: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>

Thanks & Regards,
Kamalesh Babulal.
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Torsten Kaiser <just.for.lkml@...>, NeilBrown <neilb@...>, <mingo@...>, <linux-raid@...>
Date: Thursday, January 17, 2008 - 11:23 am

Still the same md issue (do_md_run returns -22, i.e. -EINVAL) as in -rc6-mm1, reported
by Torsten here:
http://lkml.org/lkml/2007/12/27/45

Is there a fix for this anywhere?

With 0.90 RAID 0 and RAID 1 arrays, commenting this out helps:
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8633bd4..9b8ecc8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3292,8 +3292,8 @@ static int do_md_run(mddev_t * mddev)
* Analyze all RAID superblock(s)
*/
if (!mddev->raid_disks) {
- if (!mddev->persistent)
- return -EINVAL;
+/* if (!mddev->persistent)
+ return -EINVAL;*/
analyze_sbs(mddev);
}

I guess persistence gets marked in analyze_sbs() -> validate_super() anyway?
--

To: Jiri Slaby <jirislaby@...>
Cc: <linux-kernel@...>, Torsten Kaiser <just.for.lkml@...>, NeilBrown <neilb@...>, <mingo@...>, <linux-raid@...>
Date: Thursday, January 17, 2008 - 3:01 pm

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>
Date: Thursday, January 17, 2008 - 9:56 am

This patch fixes the following build error:
...
drivers/char/mem.c: In function ‘read_mem’:
drivers/char/mem.c:136: error: implicit declaration of function ‘unxlate_dev_mem_ptr’
...

Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>

---

Index: linux/include/asm-um/io.h
===================================================================
--- linux.orig/include/asm-um/io.h
+++ linux/include/asm-um/io.h
@@ -27,6 +27,7 @@ static inline void * phys_to_virt(unsign
* access
*/
#define xlate_dev_mem_ptr(p) __va(p)
+#define unxlate_dev_mem_ptr(p, ptr)

/*
* Convert a virtual cached pointer to an uncached pointer
--

To: WANG Cong <xiyou.wangcong@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>
Date: Thursday, January 17, 2008 - 2:11 pm

I see this on sparc64 as well:

CC drivers/char/mem.o
drivers/char/mem.c: In function 'read_mem':
drivers/char/mem.c:136: error: implicit declaration of function 'unxlate_dev_mem_ptr'
make[2]: *** [drivers/char/mem.o] Error 1
make[1]: *** [drivers/char] Error 2
make: *** [drivers] Error 2

Does sparc64 need a similar fix?

Regards,

Mariusz
--

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: WANG Cong <xiyou.wangcong@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>
Date: Thursday, January 17, 2008 - 2:56 pm

Probably - it seems that xlate_dev_mem_ptr can now introduce
side-effects which need to be undone with unxlate_dev_mem_ptr.
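
To make the pairing concrete, here is a small userspace model. It is a sketch under the assumption guessed at above, namely that the translate step leaves some state behind. All names (`demo_xlate`, `demo_unxlate`, `demo_read_mem`, the counter) are hypothetical and only mimic the shape of the calls; none of this is the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned char demo_phys_mem[64];  /* pretend physical memory */
static int demo_live_mappings;           /* side effect left by demo_xlate() */

/* Translate a "physical" offset to a pointer, leaving a mapping behind. */
static void *demo_xlate(size_t phys)
{
    assert(phys < sizeof(demo_phys_mem));
    demo_live_mappings++;
    return &demo_phys_mem[phys];
}

/* Undo the side effect of demo_xlate() for the same offset and pointer. */
static void demo_unxlate(size_t phys, void *ptr)
{
    assert(ptr == &demo_phys_mem[phys]);
    demo_live_mappings--;
}

/*
 * Copy up to len bytes starting at phys into buf, pairing every
 * demo_xlate() with a demo_unxlate(). Returns the number of bytes copied.
 */
static size_t demo_read_mem(size_t phys, void *buf, size_t len)
{
    void *ptr;

    if (phys >= sizeof(demo_phys_mem))
        return 0;
    if (len > sizeof(demo_phys_mem) - phys)
        len = sizeof(demo_phys_mem) - phys;

    ptr = demo_xlate(phys);
    memcpy(buf, ptr, len);
    demo_unxlate(phys, ptr);
    return len;
}
```

An arch where the translate step has nothing to clean up can make the undo step empty, which is what the one-line define in the patch above does for UML.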

Jeff

--
Work email - jdike at linux dot intel dot com
--

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: WANG Cong <xiyou.wangcong@...>, <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Venkatesh Pallipadi <venkatesh.pallipadi@...>
Date: Thursday, January 17, 2008 - 2:56 pm

The PAT patches strike again.

Ingo, I think you might need to toss some cross-compilers into that build
test setup of yours.

--

To: Andrew Morton <akpm@...>, Mariusz Kozlowski <m.kozlowski@...>
Cc: WANG Cong <xiyou.wangcong@...>, <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 3:38 pm

These functions were defined for the other archs in asm-generic/iomap.h.
We need all archs to include it from io.h. I now see that only a few archs
include it.

Apart from unxlate, there is also ioremap_wc which is defined in the
same way.

I can send a patch for this. But I don't have a cross-compiler setup for
all archs to test. Andrew, I will need your help.

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Andrew Morton <akpm@...>, Mariusz Kozlowski <m.kozlowski@...>, WANG Cong <xiyou.wangcong@...>, <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 5:14 pm

And while we're on the subject, what's the deal with these, in
include/asm-x86/io.h?

#define ioremap_wc ioremap_wc
#define unxlate_dev_mem_ptr unxlate_dev_mem_ptr

Jeff

--
Work email - jdike at linux dot intel dot com
--

To: Jeff Dike <jdike@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Andrew Morton <akpm@...>, Mariusz Kozlowski <m.kozlowski@...>, WANG Cong <xiyou.wangcong@...>, <linux-kernel@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 5:41 pm

If archs want to override the defaults for these two functions, they define
the above and then include asm-generic/iomap.h.

Archs which don't want to implement anything in these new functions just have to
include asm-generic/iomap.h, which has the proper stubs.

So a patch like the one below is what is required here, making all archs
include asm-generic/iomap.h (without the other patch that
defines a null unxlate in the arch-specific header).

Totally untested.

Thanks,
Venki

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

Index: linux-2.6.git/include/asm-arm/io.h
===================================================================
--- linux-2.6.git.orig/include/asm-arm/io.h 2008-01-17 06:28:06.000000000 -0800
+++ linux-2.6.git/include/asm-arm/io.h 2008-01-17 06:39:13.000000000 -0800
@@ -27,6 +27,8 @@
#include <asm/byteorder.h>
#include <asm/memory.h>

+#include <asm-generic/iomap.h>
+
/*
* ISA I/O bus memory addresses are 1:1 with the physical address.
*/
Index: linux-2.6.git/include/asm-avr32/io.h
===================================================================
--- linux-2.6.git.orig/include/asm-avr32/io.h 2008-01-17 06:28:06.000000000 -0800
+++ linux-2.6.git/include/asm-avr32/io.h 2008-01-17 06:39:13.000000000 -0800
@@ -10,6 +10,8 @@

#include <asm/arch/io.h>

+#include <asm-generic/iomap.h>
+
/* virt_to_phys will only work when address is in P1 or P2 */
static __inline__ unsigned long virt_to_phys(volatile void *address)
{
Index: linux-2.6.git/include/asm-blackfin/io.h
===================================================================
--- linux-2.6.git.orig/include/asm-blackfin/io.h 2008-01-17 06:28:06.000000000 -0800
+++ linux-2.6.git/include/asm-blackfin/io.h 2008-01-17 06:39:13.000000000 -0800
@@ -8,6 +8,8 @@
#endif
#include <linux/compiler.h>

+#include <asm-generic/iomap.h>
+
/*
* These are for ISA/PCI shared memory _only_ and should never be used
* on any other type of...

To: Venki Pallipadi <venkatesh.pallipadi@...>
Cc: Andrew Morton <akpm@...>, Mariusz Kozlowski <m.kozlowski@...>, WANG Cong <xiyou.wangcong@...>, <linux-kernel@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 7:08 pm

[Empty message]

--

To: Jeff Dike <jdike@...>
Cc: Andrew Morton <akpm@...>, Mariusz Kozlowski <m.kozlowski@...>, WANG Cong <xiyou.wangcong@...>, <linux-kernel@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 7:17 pm

Later there is code in generic.h which is doing
#ifndef ioremap_wc
#define ioremap_wc ioremap_nocache
#endif
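
Put together, the override idiom can be tried out in one standalone file. This is a sketch with made-up names: the `demo_` functions, the enum, and in particular `demo_ioremap_other` are hypothetical, and the "arch" and "generic" sections stand in for two separate headers.

```c
#include <assert.h>
#include <stddef.h>

enum demo_mapping { DEMO_UNCACHED, DEMO_WRITE_COMBINING };

/* ---- hypothetical "arch" header ---- */

static inline enum demo_mapping demo_ioremap_nocache(size_t size)
{
    assert(size > 0);
    return DEMO_UNCACHED;
}

/* This arch has a real write-combining mapping ... */
static inline enum demo_mapping demo_ioremap_wc(size_t size)
{
    assert(size > 0);
    return DEMO_WRITE_COMBINING;
}
/* ... and announces it by defining a macro with the function's own name. */
#define demo_ioremap_wc demo_ioremap_wc

/* ---- hypothetical "generic" header, included afterwards ---- */

#ifndef demo_ioremap_wc
#define demo_ioremap_wc demo_ioremap_nocache    /* not taken here */
#endif

/* demo_ioremap_other is purely illustrative: the arch did not provide it. */
#ifndef demo_ioremap_other
#define demo_ioremap_other demo_ioremap_nocache /* fallback is taken */
#endif
```

The self-referential define expands to itself exactly once, so calls still reach the arch function; its only job is to make the later #ifndef test false.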

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Andrew Morton <akpm@...>, Mariusz Kozlowski <m.kozlowski@...>, WANG Cong <xiyou.wangcong@...>, <linux-kernel@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 10:19 pm

Ah, that makes a bit more sense.

It'd be nice if there was less of a WTF factor there, though.

Jeff

--
Work email - jdike at linux dot intel dot com

--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Andrew Morton <akpm@...>, WANG Cong <xiyou.wangcong@...>, <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 4:42 pm

I can confirm that this fixes the build problem for sparc64 here.

Regards,

Mariusz
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Mariusz Kozlowski <m.kozlowski@...>, WANG Cong <xiyou.wangcong@...>, <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>, David Miller <davem@...>, <sparclinux@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 3:44 pm

Well.

- there are a bunch of cross-compiler binaries in
http://userweb.kernel.org/~akpm/cross-compilers/

- I'll (again) encourage Ingo to add cross-compilation testing to his
auto-testing setup.

- I ran out of steam (and the selinux bug crashed my remote
cross-compilation test box) so I didn't do much cross-compilation testing
on rc8-mm1: just alpha and ia64 iirc.

I'd suggest that you just prepare a best-effort patch and when I next get
around to doing more cross-compilation any problems should get weeded out.

--

To: WANG Cong <xiyou.wangcong@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>
Date: Thursday, January 17, 2008 - 1:59 pm

ACK

Jeff

--
Work email - jdike at linux dot intel dot com
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>
Date: Thursday, January 17, 2008 - 9:54 am

Hi Andrew,

The 2.6.24-rc8-mm1 kernel panics during boot with the following message:

Dual Core AMD Opteron(tm) Processor 270 stepping 02
Unable to handle kernel paging request at 0000000000004a78 RIP:
[<ffffffff8026f966>] __alloc_pages+0x40/0x31e
PGD 0
Oops: 0000 [1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-rc8-mm1-autotest #1
RIP: 0010:[<ffffffff8026f966>] [<ffffffff8026f966>] __alloc_pages+0x40/0x31e
RSP: 0000:ffff81003f9b9c60 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 0000000000004a70 RSI: 0000000000000605 RDI: ffffffff805a6f66
RBP: 00000000000000d0 R08: 00380800000000c0 R09: 000000000003db89
R10: ffffe20000fe6880 R11: ffffffff806287b0 R12: 0000000000004a70
R13: 0000000000000000 R14: 0000000000000286 R15: ffff81003f9b6000
FS: 0000000000000000(0000) GS:ffffffff80664000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000004a78 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff81003f9b8000, task ffff81003f9b6000)
Stack: 000000000000c0d0 0000001000000000 ffffffff8027574f ffff81000000e5c8
000000000000c0d0 ffffffff8026f320 ffff81003f9b9c88 0000000000000000
0000000000000000 ffffffff807fac90 ffffffff807fac90 0000000000000286
Call Trace:
[<ffffffff8027574f>] ? zone_statistics+0x3f/0x97
[<ffffffff8026f320>] ? get_page_from_freelist+0x463/0x5b5
[<ffffffff8028d7b8>] ? new_slab+0x10e/0x261
[<ffffffff8028d92b>] ? get_new_slab+0x20/0xaa
[<ffffffff8028dad8>] ? __slab_alloc+0x123/0x182
[<ffffffff8026e5a1>] ? process_zones+0x79/0x15e
[<ffffffff8028db73>] ? kmem_cache_alloc_node+0x3c/0x70
[<ffffffff8026e5a1>] ? process_zones+0x79/0x15e
[<ffffffff804f15b9>] ? _spin_lock_irqsave+0x9/0xe
[<ffffffff8026e6b9>]...

To: Kamalesh Babulal <kamalesh@...>
Cc: <linux-kernel@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 2:54 pm

Can you please bisect it? I'd start with git-x86. These:

ssb-add-ssb_pcihost_set_power_state-function.patch
b44-power-down-phy-when-interface-down.patch
drivers-net-wireless-iwlwifi-iwl-3945c-fix-printk-warning.patch
drivers-net-wireless-iwlwifi-iwl-4965c-fix-printk-warning.patch
drivers-net-wireless-rt2x00-rt2x00usbc-fix-uninitialized-var-warning.patch
-> git-ipwireless_cs.patch
#
revert-kvm-stuff-to-make-git-x86-apply.patch
git-x86.patch
git-x86-fixup.patch
git-x86-fixup-2.patch
acpi-default-unmap-fixpatch.patch
git-x86-vs-pm-acquire-device-locks-on-suspend-rev-3.patch
git-x86-fix-doubly-merged-patch.patch
pci-dont-load-acpi_php-when-acpi-is-disabled.patch
pci-dont-load-acpi_php-when-acpi-is-disabled-fix.patch
#
#X86-ANDI-START
#X86-ANDI-END
#
#
-> iommu-sg-merging-add-device_dma_parameters-structure.patch

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Sunday, January 20, 2008 - 2:24 am

The kernel boots with the patches applied up to this point and fails at the next checkpoint.

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--

To: Kamalesh Babulal <kamalesh@...>
Cc: <linux-kernel@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Christoph Lameter <clameter@...>
Date: Sunday, January 20, 2008 - 3:17 am

There is no way in which
iommu-sg-merging-add-device_dma_parameters-structure.patch can cause
__alloc_pages to crash. I'd be suspecting some weird interaction between
this patch's changes to kernel layout and the real bug.

I don't know what the real bug is though. Perhaps x86_64 memory
enumeration or NUMA initialisation problems. Does it look familiar to
anyone?

--

To: Andrew Morton <akpm@...>
Cc: Kamalesh Babulal <kamalesh@...>, <linux-kernel@...>, Andy Whitcroft <apw@...>, Balbir Singh <balbir@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 3:00 pm

Arjan added them. They mean that those addresses are Questionable,

---
~Randy
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <apw@...>, <balbir@...>
Date: Thursday, January 17, 2008 - 9:34 am

Hi Andrew,

The kernel build fails with the following errors:

arch/x86/kernel/mpparse_32.c: In function `smp_read_mpc_oem':
arch/x86/kernel/mpparse_32.c:318: error: `oemtable' undeclared (first use in this function)
arch/x86/kernel/mpparse_32.c:318: error: (Each undeclared identifier is reported only once
arch/x86/kernel/mpparse_32.c:318: error: for each function it appears in.)
arch/x86/kernel/mpparse_32.c:332: error: `mpc_phys' undeclared (first use in this function)

This patch is build tested only.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
---
--- linux-2.6.24-rc8/arch/x86/kernel/mpparse_32.c 2008-01-17 18:02:45.000000000 +0530
+++ linux-2.6.24-rc8/arch/x86/kernel/~mpparse_32.c 2008-01-17 18:17:29.000000000 +0530
@@ -32,6 +32,7 @@
#include <mach_apic.h>
#include <mach_apicdef.h>
#include <mach_mpparse.h>
+#include <asm/mpspec_def.h>

/* Have we found an MP table */
int smp_found_config;
@@ -329,7 +330,7 @@ static void __init smp_read_mpc_oem(unsi
oem_length = oemtable->oem_length;
/* Unmap header and map full base table */
early_iounmap(oemtable, sizeof(struct mp_config_oemtable));
- oemtable = (struct mp_config_oemtable *)early_ioremap(mpc_phys,
+ oemtable = (struct mp_config_oemtable *)early_ioremap(oemtable_phys,
oem_length);
if (!oemtable) {
printk("MPTABLE: full oemtable map error!\n");
--

To: Kamalesh Babulal <kamalesh@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <apw@...>, <balbir@...>
Date: Thursday, January 17, 2008 - 10:26 am

thanks, applied.

Ingo
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Jeff Dike <jdike@...>, <user-mode-linux-devel@...>
Date: Thursday, January 17, 2008 - 9:21 am

Hi, Andrew!

Building UML fails in the current -mm tree. ;(

The patch below fixes this build error:
...
include/asm/arch/system.h:8:22: error: asm/nops.h: No such file or directory
...

Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>

---

Index: linux/include/asm-um/nops.h
===================================================================
--- /dev/null
+++ linux/include/asm-um/nops.h
@@ -0,0 +1,6 @@
+#ifndef __UM_NOPS_H
+#define __UM_NOPS_H
+
+#include "asm/arch/nops.h"
+
+#endif
--

To: WANG Cong <xiyou.wangcong@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <user-mode-linux-devel@...>
Date: Thursday, January 17, 2008 - 1:59 pm

ACK

Jeff

--
Work email - jdike at linux dot intel dot com
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Intel E/100 mailing list <e1000-devel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 8:46 am

Hi, Andrew,

Maybe it was one of the conflicts, but my system fails to get
ethernet working with this version. I see

e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
ACPI: PCI Interrupt 0000:04:08.0[A] -> GSI 20 (level, low) -> IRQ 20
modprobe:2584 conflicting cache attribute 50000000-50001000 uncached<->default
e100: 0000:04:08.0: e100_probe: Cannot map device registers, aborting.
ACPI: PCI interrupt for device 0000:04:08.0 disabled
e100: probe of 0000:04:08.0 failed with error -12

Other interesting boot information

Using ACPI (MADT) for SMP configuration information
PM: Registered nosave memory: 000000000008f000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
PM: Registered nosave memory: 000000003e5d1000 - 000000003e6e5000
PM: Registered nosave memory: 000000003f574000 - 000000003f57c000
PM: Registered nosave memory: 000000003f62d000 - 000000003f631000
PM: Registered nosave memory: 000000003f6a7000 - 000000003f6e9000
PM: Registered nosave memory: 000000003f6ed000 - 000000003f6ff000
Allocating PCI resources starting at 50000000 (gap: 40000000:bff80000)

PCI: Bridge: 0000:00:1c.0
IO window: disabled.
MEM window:
0x50300000-0x503fffff
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.2
IO window: disabled.
MEM window:
0x50400000-0x504fffff
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.3
IO window: disabled.
MEM window:
0x50500000-0x505fffff
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
IO window: 1000-1fff
MEM window:
0x50000000-0x500fffff
PREFETCH window: disabled.

I have yet to get to the root cause, but thought I'd report it first to
the x86 and ACPI list to see if someone has seen the problem before.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: <balbir@...>
Cc: <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Intel E/100 mailing list <e1000-devel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Venkatesh Pallipadi <venkatesh.pallipadi@...>
Date: Thursday, January 17, 2008 - 2:40 pm

It appears that the new PAT code didn't like e100's pci_iomap(). Venki, can you
take a look please?

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Intel E/100 mailing list <e1000-devel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, Venkatesh Pallipadi <venkatesh.pallipadi@...>
Date: Thursday, January 17, 2008 - 4:25 pm

I tried booting with nopat with no effect.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: Andrew Morton <akpm@...>, <balbir@...>
Cc: <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Intel E/100 mailing list <e1000-devel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 3:22 pm

This seems similar to a problem we saw yesterday. It may not be specific to
e1000. It may be in some generic PCI code.

Some address range here is being mapped with conflicting types.
Somewhere the range was mapped with default (write-back). Later
pci_iomap() is mapping that region as uncacheable which is basically
aliasing. The PAT code detects the aliasing and fails the second, uncacheable
request, which leads to the failure.
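
The aliasing check described above can be sketched in user space. This is only
an illustrative model, not the kernel's code: reserve_range(), free_range(),
the mattr enum and the fixed-size table are hypothetical names invented for
this example, and the real reserve_mattr()/free_mattr() signatures may differ.

```c
#include <assert.h>

/* Hypothetical cache attributes; the real PAT code tracks more types. */
enum mattr { MATTR_DEFAULT, MATTR_UNCACHED };

struct resv {
    unsigned long start, end;   /* half-open range [start, end) */
    enum mattr attr;
    int used;
};

#define MAX_RESV 16
static struct resv table[MAX_RESV];

/*
 * Record [start, end) with the given attribute.
 * Returns 0 on success, -1 if an overlapping range already holds a
 * different attribute (aliasing) or the table is full.
 */
int reserve_range(unsigned long start, unsigned long end, enum mattr attr)
{
    int slot = -1;

    assert(start < end);
    for (int i = 0; i < MAX_RESV; i++) {
        if (!table[i].used) {
            if (slot < 0)
                slot = i;       /* remember the first free slot */
            continue;
        }
        if (start < table[i].end && table[i].start < end &&
            table[i].attr != attr)
            return -1;          /* conflicting cache attribute */
    }
    if (slot < 0)
        return -1;
    table[slot] = (struct resv){ start, end, attr, 1 };
    return 0;
}

/* Drop an exact-match reservation. Returns 0 if found, -1 otherwise. */
int free_range(unsigned long start, unsigned long end)
{
    for (int i = 0; i < MAX_RESV; i++) {
        if (table[i].used && table[i].start == start &&
            table[i].end == end) {
            table[i].used = 0;
            return 0;
        }
    }
    return -1;
}
```

In this model, reserving 0x50000000-0x50001000 as default and then as uncached
fails on the second call. Once free_range() has been called for that range,
the uncached request succeeds.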

We are trying to find who exactly is mapping this with default at the
beginning.
Balbir: Full dmesg with debug boot parameter may help.

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Intel E/100 mailing list <e1000-devel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 7:04 pm

Venki/Andrew,

I think I found the root cause of the problem and a fix for it.
The fix works for me.

Description
-----------

With the introduction of reserve_mattr() and free_mattr(), the ioremap* routines
started using them. The recent 2.6.24-rc8-mm1 kernel has a peculiar problem
wherein certain devices disappear. In my case, for example:

e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
ACPI: PCI Interrupt 0000:04:08.0[A] -> GSI 20 (level, low) -> IRQ 20
modprobe:2584 conflicting cache attribute 50000000-50001000 uncached<->default
e100: 0000:04:08.0: e100_probe: Cannot map device registers, aborting.
ACPI: PCI interrupt for device 0000:04:08.0 disabled

On further analysis, it was discovered that quirk_e100_interrupt() calls
ioremap(), which reserves memory attributes for the e100 card, but iounmap()
does not free them. The patch below fixes this problem by relaxing the check
on (p->flags >> 20), which tests for architecture-specific bits set in the
vm_struct's flags member. ioremap() unconditionally reserves memory
attributes, so iounmap() should undo it.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

arch/x86/mm/ioremap_32.c | 2 +-
arch/x86/mm/ioremap_64.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86/mm/ioremap_32.c~fix-mattr-issue arch/x86/mm/ioremap_32.c
--- linux-2.6.24-rc8/arch/x86/mm/ioremap_32.c~fix-mattr-issue 2008-01-18 04:25:33.000000000 +0530
+++ linux-2.6.24-rc8-balbir/arch/x86/mm/ioremap_32.c 2008-01-18 04:25:53.000000000 +0530
@@ -220,7 +220,7 @@ void iounmap(volatile void __iomem *addr
}

/* Reset the direct mapping. Can block */
- if (p->flags >> 20) {
+ if (p->flags) {
free_mattr(p->phys_addr, p->phys_addr + get_vm_area_size(p),
p->flags>>20);
ioremap_change_attr(p->phys_addr, get_vm_area_size(p), 0);
diff -puN arch/x86/mm/ioremap_64....

To: Balbir Singh <balbir@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Intel E/100 mailing list <e1000-devel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, <andreas.herrmann3@...>
Date: Thursday, January 17, 2008 - 9:42 pm

Thanks, Balbir. But the appended fix is cleaner and more appropriate. Can you
please check whether it works?
---

Fix iounmap() to call free_mattr() unconditionally.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

diff --git a/arch/x86/mm/ioremap_32.c b/arch/x86/mm/ioremap_32.c
index ae9c8b3..4d5bea8 100644
--- a/arch/x86/mm/ioremap_32.c
+++ b/arch/x86/mm/ioremap_32.c
@@ -201,12 +201,11 @@ void iounmap(volatile void __iomem *addr)
return;
}

+ free_mattr(p->phys_addr, p->phys_addr + get_vm_area_size(p),
+ p->flags>>20);
/* Reset the direct mapping. Can block */
- if (p->flags >> 20) {
- free_mattr(p->phys_addr, p->phys_addr + get_vm_area_size(p),
- p->flags>>20);
+ if (p->flags >> 20)
ioremap_change_attr(p->phys_addr, get_vm_area_size(p), 0);
- }

/* Finally remove it */
o = remove_vm_area((void *)addr);
diff --git a/arch/x86/mm/ioremap_64.c b/arch/x86/mm/ioremap_64.c
index 022b645..c766327 100644
--- a/arch/x86/mm/ioremap_64.c
+++ b/arch/x86/mm/ioremap_64.c
@@ -183,12 +183,11 @@ void iounmap(volatile void __iomem *addr)
return;
}

+ free_mattr(p->phys_addr, p->phys_addr + get_vm_area_size(p),
+ p->flags>>20);
/* Reset the direct mapping. Can block */
- if (p->flags >> 20) {
- free_mattr(p->phys_addr, p->phys_addr + get_vm_area_size(p),
- p->flags>>20);
+ if (p->flags >> 20)
ioremap_change_attr(p->phys_addr, get_vm_area_size(p), 0);
- }

/* Finally remove it */
o = remove_vm_area((void *)addr);
--

To: Siddha, Suresh B <suresh.b.siddha@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Intel E/100 mailing list <e1000-devel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, <andreas.herrmann3@...>
Date: Friday, January 18, 2008 - 1:06 am

Yes, it looks better. p->flags is always set, so the check was not doing much.
I also tested it and it works for me!
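
To illustrate the point about the flags check, here is a small user-space
sketch. It assumes the area flags are built as VM_IOREMAP | (attr << 20), as
the shift by 20 in the patches suggests. VM_IOREMAP_FLAG, ATTR_SHIFT and the
helper names are stand-ins made up for this example, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define VM_IOREMAP_FLAG 0x00000001UL  /* stand-in for the kernel's VM_IOREMAP */
#define ATTR_SHIFT      20            /* attribute bits live above bit 20 */

/* Build area flags the way ioremap is assumed to: base flag plus attributes. */
unsigned long make_area_flags(unsigned long attr_bits)
{
    return VM_IOREMAP_FLAG | (attr_bits << ATTR_SHIFT);
}

/* Original condition: true only when some attribute bit is set. */
bool old_check(unsigned long flags)
{
    return (flags >> ATTR_SHIFT) != 0;
}

/* Condition from the first fix: true for any ioremap area. */
bool relaxed_check(unsigned long flags)
{
    return flags != 0;
}
```

For a default mapping (attribute bits zero) old_check() is false, so the free
was skipped, while relaxed_check() is always true because the base flag is
always set. That is why calling free_mattr() unconditionally is equivalent
and simpler.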

Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>

Balbir
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: <balbir@...>, <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Intel E/100 mailing list <e1000-devel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 3:40 pm

It sounds to me like you need considerably more runtime debugging and
reporting support in that code. Ensure that it generates enough output
both during regular operation and during failures for you to be able to
diagnose things in a single iteration.

We can always take it out later.

--

To: Andrew Morton <akpm@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, <balbir@...>, <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Intel E/100 mailing list <e1000-devel@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
Date: Thursday, January 17, 2008 - 7:33 pm

The patch below makes the interesting printks from PAT non-DEBUG.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

Index: linux-2.6.git/arch/x86/mm/ioremap.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap.c 2008-01-17 03:18:59.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap.c 2008-01-17 08:11:51.000000000 -0800
@@ -25,10 +25,13 @@
*/
void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
{
- if (pat_wc_enabled)
+ if (pat_wc_enabled) {
+ printk(KERN_INFO "ioremap_wc: addr %lx, size %lx\n",
+ phys_addr, size);
return __ioremap(phys_addr, size, _PAGE_WC);
- else
+ } else {
return ioremap_nocache(phys_addr, size);
+ }
}
EXPORT_SYMBOL(ioremap_wc);

Index: linux-2.6.git/arch/x86/mm/ioremap_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_32.c 2008-01-17 03:18:59.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_32.c 2008-01-17 08:10:58.000000000 -0800
@@ -164,6 +164,8 @@

void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
{
+ printk(KERN_INFO "ioremap_nocache: addr %lx, size %lx\n",
+ phys_addr, size);
return __ioremap(phys_addr, size, _PAGE_UC);
}
EXPORT_SYMBOL(ioremap_nocache);
Index: linux-2.6.git/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c 2008-01-17 03:18:59.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_64.c 2008-01-17 08:10:13.000000000 -0800
@@ -144,7 +144,7 @@

void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
{
- printk(KERN_DEBUG "ioremap_nocache: addr %lx, size %lx\n",
+ printk(KERN_INFO "ioremap_nocache: addr %lx, size %lx\n",
phys_addr, size);
return __ioremap(phys_addr, size, _PAGE_UC);
}
Index: linux-2.6.git/arch/x86/mm/pat.c
=================================================...

To: Andrew Morton <akpm@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Intel E/100 mailing list <e1000-devel@...>, <linux-kernel@...>, Linux ACPI mailing list <linux-acpi@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>, <balbir@...>
Date: Thursday, January 17, 2008 - 3:47 pm

It's probably the e100 screaming interrupt disable quirk code doing the

FWIW (nothing) I agree.
--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <linux390@...>, <linux-s390@...>, Andy Whitcroft <apw@...>
Date: Thursday, January 17, 2008 - 8:41 am

Hi Andrew,

The 2.6.24-rc8-mm1 kernel build fails on s390x with the following build errors:

arch/s390/kernel/ipl.c: In function `ipl_register_fcp_files':
arch/s390/kernel/ipl.c:415: error: `ipl_subsys' undeclared (first use in this function)
arch/s390/kernel/ipl.c:415: error: (Each undeclared identifier is reported only once
arch/s390/kernel/ipl.c:415: error: for each function it appears in.)
arch/s390/kernel/ipl.c: In function `ipl_init':
arch/s390/kernel/ipl.c:449: error: implicit declaration of function `firmware_register'
arch/s390/kernel/ipl.c:449: error: `ipl_subsys' undeclared (first use in this function)
CC arch/s390/kernel/dis.o
arch/s390/kernel/ipl.c: In function `on_panic_show':
arch/s390/kernel/ipl.c:766: error: implicit declaration of function `shutdown_action_str'
arch/s390/kernel/ipl.c:766: error: `on_panic_action' undeclared (first use in this function)
arch/s390/kernel/ipl.c:766: warning: format argument is not a pointer (arg 3)
arch/s390/kernel/ipl.c:766: warning: format argument is not a pointer (arg 3)
arch/s390/kernel/ipl.c: In function `on_panic_store':
arch/s390/kernel/ipl.c:773: error: `SHUTDOWN_REIPL_STR' undeclared (first use in this function)
arch/s390/kernel/ipl.c:774: error: `on_panic_action' undeclared (first use in this function)
arch/s390/kernel/ipl.c:774: error: `SHUTDOWN_REIPL' undeclared (first use in this function)
arch/s390/kernel/ipl.c:775: error: `SHUTDOWN_DUMP_STR' undeclared (first use in this function)
arch/s390/kernel/ipl.c:777: error: `SHUTDOWN_DUMP' undeclared (first use in this function)
arch/s390/kernel/ipl.c:778: error: `SHUTDOWN_STOP_STR' undeclared (first use in this function)
arch/s390/kernel/ipl.c:780: error: `SHUTDOWN_STOP' undeclared (first use in this function)
arch/s390/kernel/ipl.c: At top level:
arch/s390/kernel/ipl.c:879: error: redefinition of 'ipl_register_fcp_files'
arch/s390/kernel/ipl.c:412: error: previous definition of 'ipl_register_fcp_files' was here
arch/s390/kernel/ipl.c:904: error: redefinition of 'ipl_init'
arch/s390/kern...

To: Kamalesh Babulal <kamalesh@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linux390@...>, <linux-s390@...>, Andy Whitcroft <apw@...>
Date: Thursday, January 17, 2008 - 10:05 am

This is the fallout from the merge conflict between git390 and Greg's git
tree. This will stay broken until Greg's tree has been merged. Please
ignore the compile failures for now.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

--
