Re: [-mm patch] make tcp_splice_data_recv() static

From: Andrew Morton
Date: Friday, August 31, 2007 - 9:58 pm

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc4/2.6.23-rc4-mm1/

- git-kbuild is broken and has been dropped

- git-ixgb is broken by git-net and has been dropped

- git-md-accel is broken by MD fixes and has been dropped

- git-v9fs breaks the build on all non-x86 architectures, so the fs has been
  disabled in config

- dynticks-for-x86_64 has returned



Changes since 2.6.23-rc3-mm1:


 origin.patch
 git-acpi.patch
 git-alsa.patch
 git-audit-master.patch
 git-avr32.patch
 git-cifs.patch
 git-cpufreq.patch
 git-powerpc.patch
 git-dvb.patch
 git-hwmon.patch
 git-gfs2-nmw.patch
 git-hid.patch
 git-ia64.patch
 git-ieee1394.patch
 git-infiniband.patch
 git-input.patch
 git-jfs.patch
 git-jg-misc.patch
 git-kvm.patch
 git-libata-all.patch
 git-m32r.patch
 git-mips.patch
 git-mmc.patch
 git-mtd.patch
 git-ubi.patch
 git-netdev-all.patch
 git-net.patch
 git-backlight.patch
 git-nfs.patch
 git-nfsd.patch
 git-ocfs2.patch
 git-r8169.patch
 git-selinux.patch
 git-s390.patch
 git-sched.patch
 git-sh.patch
 git-scsi-misc.patch
 git-scsi-rc-fixes.patch
 git-block.patch
 git-unionfs.patch
 git-v9fs.patch
 git-watchdog.patch
 git-wireless.patch
 git-ipwireless_cs.patch
 git-newsetup.patch
 git-xfs.patch
 git-cryptodev.patch
 git-xtensa.patch
 git-kgdb.patch

 git ...
From: Adrian Bunk
Date: Saturday, September 1, 2007 - 7:18 am

This patch doesn't fix 
drivers-video-geode-lxfb_corec-fix-lxfb_setup-warning.patch, together 
they break the compilation (2 - 1 - 1 = 0 < 1):

<--  snip  -->

...
  CC      drivers/video/geode/lxfb_core.o
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/drivers/video/geode/lxfb_core.c: In function ‘lxfb_setup’:
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/drivers/video/geode/lxfb_core.c:567: error: ‘opt’ undeclared (first use in this function)
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/drivers/video/geode/lxfb_core.c:567: error: (Each undeclared identifier is reported only once
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/drivers/video/geode/lxfb_core.c:567: error: for each function it appears in.)
make[4]: *** [drivers/video/geode/lxfb_core.o] Error 1

<--  snip  -->

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Satyam Sharma
Date: Saturday, September 1, 2007 - 10:03 am

True. Both patches (Eugene's and mine) are independent fixes for the same
"unused variable" warning. Both seem to have been included in -mm, with the
result that the build broke. One of the two patches (either mine or
Eugene's) is superfluous and should be dropped.
-

From: Adrian Bunk
Date: Saturday, September 1, 2007 - 8:19 am

One ktime_sub_ns() should be enough for everyone - and the net tree 
already adds one (even with a correct EXPORT_SYMBOL...).

<--  snip  -->

...
  CC      kernel/hrtimer.o
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/kernel/hrtimer.c:313: error: redefinition of 'ktime_sub_ns'
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/kernel/hrtimer.c:289: error: previous definition of 'ktime_sub_ns' was here
make[2]: *** [kernel/hrtimer.o] Error 1

<--  snip  -->


cu
Adrian

-

From: Adrian Bunk
Date: Saturday, September 1, 2007 - 8:44 am

<--  snip  -->

...
  CC      arch/mips/kernel/asm-offsets.s
In file included from include2/asm/processor.h:22,
                 from include2/asm/thread_info.h:15,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/thread_info.h:21,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/preempt.h:9,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/spinlock.h:49,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/seqlock.h:29,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/time.h:8,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/timex.h:57,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/sched.h:52,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/arch/mips/kernel/asm-offsets.c:13:
include2/asm/system.h:415:39: error: asm-generic/cmpxchg-local.h: No such file or directory
make[2]: *** [arch/mips/kernel/asm-offsets.s] Error 1

<--  snip  -->


cu
Adrian

-

From: Mathieu Desnoyers
Date: Monday, September 3, 2007 - 10:27 pm

Hello,

It is because "Add cmpxchg64 and cmpxchg64_local to mips" has been added to
git-mips.patch, but it depends on
"add-cmpxchg-local-to-generic-for-up.patch", which is not merged yet.

It was an error in my series file.
add-cmpxchg-local-to-generic-for-up.patch should come before these
patches:

i386-cmpxchg64-80386-80486-fallback.patch
add-cmpxchg64-to-alpha.patch
add-cmpxchg64-to-mips.patch
add-cmpxchg64-to-powerpc.patch
add-cmpxchg64-to-x86_64.patch



-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
-

From: Ralf Baechle
Date: Tuesday, September 4, 2007 - 3:21 am

I had add-cmpxchg64-to-mips.patch queued myself also but removed it a few
days ago, so next -mm (if it's not out yet?) should be ok again.

  Ralf
-

From: KAMEZAWA Hiroyuki
Date: Friday, August 31, 2007 - 11:53 pm

I hit two problems while compiling rc4-mm1 on an x86/UP system:

One in pcnet32.c (patch is attached below).
One in the crypto config.

== compile log ==
drivers/net/pcnet32.c: In function 'pcnet32_netif_stop':
drivers/net/pcnet32.c:445: warning: unused variable 'lp'
drivers/net/pcnet32.c: In function 'pcnet32_netif_start':
drivers/net/pcnet32.c:455: warning: unused variable 'lp'
drivers/net/pcnet32.c: In function 'pcnet32_interrupt':
drivers/net/pcnet32.c:2622: error: 'struct net_device' has no member named 'napi'
....
crypto/built-in.o: In function `update2':
digest.c:(.text+0x94a): undefined reference to `crypto_km_types'
digest.c:(.text+0x9bf): undefined reference to `crypto_km_types'

digest.c (CONFIG_CRYPTO) uses an object from crypto/scatterwalk.c (CONFIG_CRYPTO_ALGAPI).
I hit this when CONFIG_CRYPTO_ALGAPI=m; I have to set CONFIG_CRYPTO_ALGAPI=y.

Regards,
-Kame.
== cut from here ==

 Tiny bug fix for pcnet32.c (seems to work; please confirm).

 Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 drivers/net/pcnet32.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: devel-2.6.23-rc4-mm1/drivers/net/pcnet32.c
===================================================================
--- devel-2.6.23-rc4-mm1.orig/drivers/net/pcnet32.c
+++ devel-2.6.23-rc4-mm1/drivers/net/pcnet32.c
@@ -2619,7 +2619,7 @@ pcnet32_interrupt(int irq, void *dev_id)
 			break;
 		}
 #else
-		pcnet32_rx(dev, dev->napi.weight);
+		pcnet32_rx(dev, lp->napi.weight);
 		if (pcnet32_tx(dev)) {
 			/* reset the chip to clear the error condition, then restart */
 			lp->a.reset(ioaddr);

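Why the one-line fix works: with the net-2.6.24 NAPI rework pulled into -mm, struct net_device no longer has a napi member (hence the compile error), and the NAPI context is embedded in the driver's private data instead, so the weight is read through lp. Below is a minimal user-space sketch of that layout. The structs, priv_from_napi() and rx_budget() are simplified, hypothetical stand-ins, not pcnet32's real code; the container_of() here mirrors the kernel macro's idea of stepping back from an embedded member to its containing struct, which is how a new-style poll callback recovers its private data from the napi_struct pointer it is handed.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel macro: step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real structs have many more fields. */
struct napi_struct {
	int weight;               /* max packets to process per poll */
};

struct net_device {
	const char *name;         /* note: no 'napi' member any more */
};

struct pcnet32_private_sketch {
	int rx_ring_size;
	struct napi_struct napi;  /* NAPI context lives in driver-private data */
	struct net_device *dev;
};

/* What a new-style poll callback does first: recover the private struct
 * from the napi_struct pointer it is given. */
static struct pcnet32_private_sketch *priv_from_napi(struct napi_struct *napi)
{
	assert(napi != NULL);
	return container_of(napi, struct pcnet32_private_sketch, napi);
}

/* The interrupt path in the patch reads the weight via lp, not via dev. */
static int rx_budget(const struct pcnet32_private_sketch *lp)
{
	return lp->napi.weight;
}
```

The same pattern applies to any driver converted in that tree: anything that used to say dev->napi now has to go through the driver's own private structure.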


-

From: Andrew Morton
Date: Friday, August 31, 2007 - 11:58 pm

cc netdev, thanks.
-

From: Herbert Xu
Date: Saturday, September 1, 2007 - 1:54 am

Sorry, only tested on x86-64 which doesn't have HIGHMEM.

I've just pushed the following fix into cryptodev-2.6.

commit 25531e010a2a1d0099b62d473244d09e72402ce5
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Sat Sep 1 16:52:13 2007 +0800

    [CRYPTO] api: Kill crypto_km_types

    When scatterwalk is built as a module digest.c was broken because it
    requires the crypto_km_types structure which is in scatterwalk.  This
    patch removes the crypto_km_types structure by encoding the logic into
    crypto_kmap_type directly.

    In fact, this even saves a few bytes of code (not to mention the data
    structure itself) on i386 which is about the only place where it's
    needed.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/crypto/internal.h b/crypto/internal.h
index 60acad9..abb01f7 100644
--- a/crypto/internal.h
+++ b/crypto/internal.h
@@ -50,11 +50,16 @@ extern struct list_head crypto_alg_list;
 extern struct rw_semaphore crypto_alg_sem;
 extern struct blocking_notifier_head crypto_chain;
 
-extern enum km_type crypto_km_types[];
-
 static inline enum km_type crypto_kmap_type(int out)
 {
-	return crypto_km_types[(in_softirq() ? 2 : 0) + out];
+	enum km_type type;
+
+	if (in_softirq())
+		type = out * (KM_SOFTIRQ1 - KM_SOFTIRQ0) + KM_SOFTIRQ0;
+	else
+		type = out * (KM_USER1 - KM_USER0) + KM_USER0;
+
+	return type;
 }
 
 static inline void *crypto_kmap(struct page *page, int out)
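To see why the arithmetic in the patch can replace the exported crypto_km_types[] table, here is a small sketch comparing the two. It assumes a cut-down, illustrative enum km_type (the real per-arch enum has more slots in a different order) and replaces in_softirq() with a plain softirq flag; kmap_type_table() and kmap_type_arith() are hypothetical names. For out equal to 0 or 1, out * (X1 - X0) + X0 yields X0 or X1 wherever those slots sit in the enum, so the result matches the table lookup.

```c
#include <assert.h>

/* Cut-down, illustrative enum km_type; only the four slots used by the
 * crypto code matter here, and their positions are arbitrary. */
enum km_type {
	KM_BOUNCE_READ,
	KM_USER0,
	KM_USER1,
	KM_IRQ0,
	KM_IRQ1,
	KM_SOFTIRQ0,
	KM_SOFTIRQ1,
	KM_TYPE_NR
};

/* Old scheme: table indexed by (softirq ? 2 : 0) + out. */
static const enum km_type crypto_km_types[] = {
	KM_USER0, KM_USER1, KM_SOFTIRQ0, KM_SOFTIRQ1,
};

static enum km_type kmap_type_table(int softirq, int out)
{
	return crypto_km_types[(softirq ? 2 : 0) + out];
}

/* New scheme, as in the patch: out (0 or 1) picks the first or second
 * slot of the pair arithmetically, so no table has to be exported. */
static enum km_type kmap_type_arith(int softirq, int out)
{
	assert(out == 0 || out == 1);
	if (softirq)
		return out * (KM_SOFTIRQ1 - KM_SOFTIRQ0) + KM_SOFTIRQ0;
	return out * (KM_USER1 - KM_USER0) + KM_USER0;
}
```

Since the mapping is now computed inline in crypto/internal.h, digest.c no longer needs a symbol that lives in scatterwalk, which is what broke the CONFIG_CRYPTO_ALGAPI=m build.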
-

From: Satyam Sharma
Date: Saturday, September 1, 2007 - 2:09 pm

Tangential, but I've often wondered what the upside is of keeping
CONFIG_CRYPTO_ALGAPI as a separate config option in the first place. Every
single item in crypto/ ends up "select"ing it (directly or transitively),
so it makes sense to just do away with it and keep it == CONFIG_CRYPTO
in the Makefile, thusly:


[PATCH] crypto: Remove CONFIG_CRYPTO_ALGAPI config option

All other options in crypto/ end up selecting it anyway, so let's make it
a default part of the rest of the "core" crypto code, which gets built
whenever CONFIG_CRYPTO == y.

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 arch/s390/crypto/Kconfig |    4 ----
 crypto/Kconfig           |   37 -------------------------------------
 crypto/Makefile          |    7 ++-----
 drivers/crypto/Kconfig   |    2 --
 4 files changed, 2 insertions(+), 48 deletions(-)

diff --git a/arch/s390/crypto/Kconfig b/arch/s390/crypto/Kconfig
index d1defbb..d35f901 100644
--- a/arch/s390/crypto/Kconfig
+++ b/arch/s390/crypto/Kconfig
@@ -1,7 +1,6 @@
 config CRYPTO_SHA1_S390
 	tristate "SHA1 digest algorithm"
 	depends on S390
-	select CRYPTO_ALGAPI
 	help
 	  This is the s390 hardware accelerated implementation of the
 	  SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2).
@@ -9,7 +8,6 @@ config CRYPTO_SHA1_S390
 config CRYPTO_SHA256_S390
 	tristate "SHA256 digest algorithm"
 	depends on S390
-	select CRYPTO_ALGAPI
 	help
 	  This is the s390 hardware accelerated implementation of the
 	  SHA256 secure hash standard (DFIPS 180-2).
@@ -20,7 +18,6 @@ config CRYPTO_SHA256_S390
 config CRYPTO_DES_S390
 	tristate "DES and Triple DES cipher algorithms"
 	depends on S390
-	select CRYPTO_ALGAPI
 	select CRYPTO_BLKCIPHER
 	help
 	  This us the s390 hardware accelerated implementation of the
@@ -29,7 +26,6 @@ config CRYPTO_DES_S390
 config CRYPTO_AES_S390
 	tristate "AES cipher algorithms"
 	depends on S390
-	select CRYPTO_ALGAPI
 	select CRYPTO_BLKCIPHER
 	help
 	  This is the s390 hardware ...
From: Herbert Xu
Date: Saturday, September 1, 2007 - 6:46 pm

NACK.  ALGAPI exists so that it can be built as a module, as
opposed to CRYPTO which is always built-in.  It's already
invisible to the user so I don't see why you have a problem
with it.

Cheers,
-

From: Satyam Sharma
Date: Saturday, September 1, 2007 - 7:52 pm

I had already noticed that, and was even *expecting* you to reply with
*exactly* this ;-)

[ BTW CRYPTO is _not_ always built-in -- but only when CONFIG_CRYPTO=y ]

Anyway, the natural follow-up to your argument is -- why is the other
stuff in CRYPTO always built-in too ?

Take the crypto_alloc_xxx() callchain for example (I chose it because
it is the _first_ call any cryptoapi user ever has to make, and hence
it's the one that deals with module-loading stuff).

So what finally got exported out of crypto/ to the rest of the kernel
was just the crypto_alloc_xxx() wrapper. That resolves to a call to
crypto_alloc_base() in crypto/api.c, which first loads the specific
low-level algo modules, and then proceeds to crypto_init_ops(), which
itself may, say, resolve to a crypto_init_digest_ops() -- the only
interface exported from digest.c.

The point is, because the module-loading (if necessary) already takes
place before the call to digest.c is made, there is _no_ reason why
even digest.c can't be made modular -- or _any_ of the other CRYPTO
stuff (with the exception of api.c itself, of course) that "always
gets built-in" as you mentioned above.

And so caring about the optimization of making ALGAPI modular rather
than simply built-in with rest of "core" crypto stuff such as digest.c
(which could _also_ have been made modular by the same logic but wasn't)
sounds like a bogus argument to me. [ BTW did you notice that the
__crypto_alloc_tfm() has been EXPORT_SYMBOL'ed _only_ because of one
solitary modular-callsite in algapi.c ? ]


Satyam
-

From: Herbert Xu
Date: Saturday, September 1, 2007 - 8:59 pm

The mid-level code such as digest.c is only built-in because it is
legacy code.  All the new mid-level code, such as blkcipher/hash, is
registered dynamically.

Once all the digest stuff has been converted to hash, digest.c
will be removed.

Cheers,
-

From: Kamalesh Babulal
Date: Saturday, September 1, 2007 - 4:55 am

<snip>
Hi Kamezawa,

I got the pcnet32.c compile failure too, and after applying the patch the
compile no longer fails.

Thanks & Regards,
Kamalesh Babulal.
-

From: thunder7
Date: Saturday, September 1, 2007 - 11:13 am

From: Andrew Morton <akpm@linux-foundation.org>
On this machine (Athlon 64 X2 4600, 4 GiB memory, lots of disks),
2.6.23-rc1-mm2 runs fine. 2.6.23-rc4-mm1 reproducibly dies within seconds of starting
an rsync session on another PC against this machine.

NULL pointer dereference
code:	nv_napi_poll+0x108
trace:	net_rx_action+0xab
	__do_softirq+0x74
	call_softirq+0x1c
	do_softirq+0x3d
	irq_exit+0x85
	do_IRQ+0x85
	ret_from_intr+0x0

.config and dmesg output below.

Kind regards,
Jurriaan

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc4-mm1
# Sat Sep  1 08:20:54 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_NONIRQ_WAKEUP=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_NR_QUICK=2
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CONTAINERS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
# ...
From: Jeff Garzik
Date: Saturday, September 1, 2007 - 12:05 pm

(added netdev to CC)

I'm guessing that this is net-2.6.24.git's NAPI update.

	Jeff


-

From: Satyam Sharma
Date: Saturday, September 1, 2007 - 5:54 pm

The dmesg you posted below doesn't cover the messages from this oops
itself. As you mentioned you can reproduce this oops easily, please do so,
and post the *full* oops log (if it doesn't get logged to disk, you can
try taking a digicam photo, or write down *all* the messages and post them here).
I built an x86_64 kernel as per your .config, but don't see any memory
dereference at nv_napi_poll+0x108 -- could be toolchain differences.

Else, can you run:
$ gdb ./vmlinux

and then:
(gdb) l *nv_napi_poll+0x108

and send us the output?


Satyam
-

From: thunder7
Date: Saturday, September 1, 2007 - 10:36 pm

From: Satyam Sharma <satyam@infradead.org>
That seems to be the easier option:

AMD64 :gdb /usr/src/linux-2.6.23-rc4-mm1/vmlinux
GNU gdb 6.6-debian
(gdb) l *nv_napi_poll+0x108
0xffffffff80418f28 is in nv_napi_poll (drivers/net/forcedeth.c:2470).
2465				if ((flags & NV_RX2_CHECKSUMMASK) == NV_RX2_CHECKSUMOK2)/*ip and tcp */ {
2466					skb->ip_summed = CHECKSUM_UNNECESSARY;
2467				} else {
2468					if ((flags & NV_RX2_CHECKSUMMASK) == NV_RX2_CHECKSUMOK1 ||
2469					    (flags & NV_RX2_CHECKSUMMASK) == NV_RX2_CHECKSUMOK3) {
2470						skb->ip_summed = CHECKSUM_UNNECESSARY;
2471					}
2472				}
2473	
2474				/* got a valid packet - forward it to the network core */
(gdb) q

as for toolchain differences: this is Debian Unstable, up-to-date as of
yesterday morning.

If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux middle 2.6.23-rc1-mm2 #1 SMP Wed Aug 1 14:58:22 CEST 2007 x86_64 GNU/Linux
 
Gnu C                  4.1.3
Gnu make               3.81
binutils               Binutils
util-linux             2.13
mount                  2.13
module-init-tools      3.3-pre11
e2fsprogs              1.40.2
reiserfsprogs          3.6.19
Linux C Library        6.1
Dynamic linker (ldd)   2.6.1
Procps                 3.2.7
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.97
Modules Loaded         nf_nat_ftp nf_nat_irc nf_conntrack_irc nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat ipt_REJECT ipt_LOG xt_limit nf_conntrack_ipv4 xt_state xt_tcpudp iptable_filter ip_tables x_tables snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore k8temp it87 hwmon_vid hwmon i2c_nforce2

Good luck,
Jurriaan
-- 
His pride could ...
From: thunder7
Date: Saturday, September 1, 2007 - 11:19 pm

From: Satyam Sharma <satyam@infradead.org>
There are 4 pictures of oopses here:

http://www.xs4all.nl/~thunder7/oops_2623rc4mm1_1.jpg
http://www.xs4all.nl/~thunder7/oops_2623rc4mm1_2.jpg
http://www.xs4all.nl/~thunder7/oops_2623rc4mm1_3.jpg
http://www.xs4all.nl/~thunder7/oops_2623rc4mm1_4.jpg

image quality, well, they're readable.

Good luck,
Jurriaan
-- 
management n.
1. Corporate power elites distinguished primarily by their distance from
actual productive work and their chronic failure to manage (see also suit).
Spoken derisively, as in "Management decided that ...". 2. Mythically, a
vast bureaucracy responsible for all the world's minor irritations.
Hackers' satirical public notices are often signed `The Mgt'; this derives
from the "Illuminatus" novels (see the Bibliography in Appendix C).
Debian (Unstable) GNU/Linux 2.6.23-rc1-mm2 2x2010 bogomips load 0.43
the Jack Vance Integral Edition: http://www.integralarchive.org
-

From: Satyam Sharma
Date: Sunday, September 2, 2007 - 2:55 am

OK, I've been poring over forcedeth.c and the newly introduced NAPI code,
but haven't debugged this yet, so I'll at least lay out the situation so
that somebody more experienced on netdev can pick up from here with
minimal time wasted.

Here's what's happening (repeatedly, reproducibly) on Jurriaan's x64 box:

(1) The following NULL dereference oops:

    nv_rx_process_optimized(), inlined from nv_napi_poll(), found that
    "skb" i.e. np->get_rx_ctx->skb == NULL when trying to update
    skb->ip_summed.

(2) The following BUG in napi_complete():

    BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));

    from the nv_napi_poll()->__netif_rx_complete()->napi_complete()
    callchain is triggering. IOW napi_complete() found that a NAPI
    poll wasn't/shouldn't have been scheduled at all (!)

The above two problems appear to be occurring independently, AFAICT.


Satyam
-

From: Andrew James Wade
Date: Thursday, September 13, 2007 - 8:51 pm

I have an Oops that may be related:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000025
printing eip: c037d81b *pde = 00000000
Oops: 0000 [#1]
last sysfs file: /devices/pci0000:00/0000:00:01.0/0000:01:00.0/class

Pid: 0, comm: swapper Not tainted (2.6.23-rc4-mm1-config2 #2)
EIP: 0060:[<c037d81b>] EFLAGS: 00010246 CPU: 0
EIP is at tcp_rto_min+0xb/0x15
EAX: 00000032 EBX: c4c98b68 ECX: fffffffe EDX: 00000000
ESI: c4c98b68 EDI: c055f600 EBP: c4432e40 ESP: c0596dec
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c0596000 task=c052a340 task.ti=c0568000)
Stack: c037d8de c4c98b68 c4c98b68 c037e0ec 00000001 c037f879 c052a8b4 c052a340
       00000000 00000001 c25e1e60 00000000 00000000 00000001 8c176265 8c17678a
       00000000 00000001 00000001 00000000 8c17678a 86000000 ffffffff 007d8b21
Call Trace:
 [<c037d8de>] tcp_rtt_estimator+0xb9/0xfe
 [<c037e0ec>] tcp_ack_saw_tstamp+0x14/0x43
 [<c037f879>] tcp_ack+0x6b8/0x17b8
 [<c03833cc>] tcp_rcv_established+0x519/0x5f1
 [<c038838d>] tcp_v4_do_rcv+0x28/0x2f8
 [<c038a4ce>] tcp_v4_rcv+0x7df/0x83d
 [<c0372542>] ip_local_deliver+0xcc/0x148
 [<c0372975>] ip_rcv+0x3b7/0x3de
 [<c035fa0e>] netif_receive_skb+0x17a/0x1c2
 [<c02cc121>] rtl8139_poll+0x2d9/0x425
 [<c03616d7>] net_rx_action+0xa8/0xc8
 [<c011e8e0>] __do_softirq+0x40/0x90
 [<c010635d>] do_softirq+0x4d/0xb6
 =======================
INFO: lockdep is turned off.
Code: 24 8b 82 88 03 00 00 89 82 40 05 00 00 a1 a0 23 53 c0 89 82 44 05 00 00 83 c4 0c 5b 5e 5f 5d c3 8b 90 88 00 00 00 b8 32 00 00 00 <f6> 42 25 20 74 03 8b 42 54 c3 56
 85 d2 b9 01 00 00 00 0f 45 ca
EIP: [<c037d81b>] tcp_rto_min+0xb/0x15 SS:ESP 0068:c0596dec
Kernel panic - not syncing: Fatal exception in interrupt

config:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc4-mm1
# Wed Sep 12 19:53:26 ...
From: Dhaval Giani
Date: Monday, September 17, 2007 - 6:57 am

Hi,

Any solutions for this one? I too have been hitting it on my system.


=======================
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000025
printing eip: c03e790d *pdpt = 00000000097c2001 <1>*pde = 0000000000000000 
Oops: 0000 [#1] SMP 
last sysfs file: /class/vc/vcs1/dev
Modules linked in:

Pid: 0, comm: swapper Not tainted (2.6.23-rc4-mm1-cpuctl #12)
EIP: 0060:[<c03e790d>] EFLAGS: 00010246 CPU: 0
EIP is at tcp_rto_min+0xe/0x19
EAX: 00000032 EBX: cc4a8180 ECX: 00000095 EDX: 00000000
ESI: cc4a8180 EDI: c05b28e0 EBP: c05c7cfc ESP: c05c7cfc
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c05c6000 task=c0559cc0 task.ti=c05c6000)
Stack: c05c7d0c c03e79d2 cc4a8180 cc4a8180 c05c7d18 c03e9eb2 cc48a200 c05c7d68 
       c03ea37b 00000001 ffffff8f 00000000 ca52e8c0 005c7d48 006f7c09 00000001 
       86cff480 00000000 00000000 00000001 0000000c 000333ff c05c7d94 ffffffff 
Call Trace:
 [<c0105c64>] show_trace_log_lvl+0x19/0x2e
 [<c0105d26>] show_stack_log_lvl+0x99/0xa8
 [<c0105e2e>] show_registers+0xb6/0x185
 [<c010604a>] die+0x108/0x1ed
 [<c0419b3e>] do_page_fault+0x64e/0x735
 [<c04180b2>] error_code+0x72/0x78
 [<c03e79d2>] tcp_rtt_estimator+0xba/0x100
 [<c03e9eb2>] tcp_ack_saw_tstamp+0x17/0x47
 [<c03ea37b>] tcp_clean_rtx_queue+0x298/0x45d
 [<c03eaac2>] tcp_ack+0x183/0x2d4
 [<c03ec6a0>] tcp_rcv_established+0xd3/0x5ba
 [<c03f35ae>] tcp_v4_do_rcv+0x25/0xc2
 [<c03f3ac0>] tcp_v4_rcv+0x475/0x7c0
 [<c03db831>] ip_local_deliver+0xd9/0x17a
 [<c03dbcf2>] ip_rcv+0x420/0x45a
 [<c03c65be>] netif_receive_skb+0x22b/0x249
 [<c02e215f>] tg3_rx+0x24c/0x359
 [<c02e2349>] tg3_poll+0xdd/0x17c
 [<c03c6814>] net_rx_action+0x114/0x14a
 [<c0129163>] __do_softirq+0x73/0xe6
 [<c012920f>] do_softirq+0x39/0x51
 [<c012928d>] irq_exit+0x47/0x49
 [<c0106a95>] do_IRQ+0x5d/0x71
 [<c01058a2>] common_interrupt+0x2e/0x34
 [<c0103123>] cpu_idle+0x9e/0xb7
 [<c041557e>] rest_init+0x52/0x54
 [<c05cc73d>] start_kernel+0x21f/0x221
 ...
From: Denis V. Lunev
Date: Monday, September 17, 2007 - 7:07 am

I have also seen this OOPS on an e1000 card, so it looks driver-independent.

By the way, this one has been triggered semi-reliably by
'git pull'.

Regards,
	Den

-

From: Vlad Yasevich
Date: Monday, September 17, 2007 - 2:00 pm

Do you have this patch:

commit 5c127c58ae9bf196d787815b1bd6b0aec5aee816
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Fri Aug 31 14:39:44 2007 -0700

    [TCP]: 'dst' can be NULL in tcp_rto_min()
    
    Reported by Rick Jones.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1ee7212..bbad2cd 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -560,7 +560,7 @@ static u32 tcp_rto_min(struct sock *sk)
        struct dst_entry *dst = __sk_dst_get(sk);
        u32 rto_min = TCP_RTO_MIN;
 
-       if (dst_metric_locked(dst, RTAX_RTO_MIN))
+       if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
                rto_min = dst->metrics[RTAX_RTO_MIN-1];
        return rto_min;

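A user-space sketch of what the one-line fix guards against. It assumes a stripped-down struct dst_entry holding only metrics[], RTAX_LOCK == 1 and RTAX_RTO_MIN == 13 as in include/linux/rtnetlink.h of this period, and HZ == 250, so TCP_RTO_MIN (HZ/5) is 50; the EAX value 0x32 in the i386 oopses above is consistent with that. rto_min_for() is a hypothetical name for the fixed function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HZ           250          /* assumption, see above */
#define TCP_RTO_MIN  (HZ / 5)
#define RTAX_LOCK    1
#define RTAX_RTO_MIN 13
#define RTAX_MAX     13           /* number of metrics slots */

/* Stripped-down stand-in for struct dst_entry. */
struct dst_entry {
	uint32_t metrics[RTAX_MAX];
};

static uint32_t dst_metric(const struct dst_entry *dst, int metric)
{
	assert(metric >= 1 && metric <= RTAX_MAX);
	return dst->metrics[metric - 1];
}

/* A metric is "locked" if its bit is set in the RTAX_LOCK word. */
static int dst_metric_locked(const struct dst_entry *dst, int metric)
{
	return (dst_metric(dst, RTAX_LOCK) & (1u << metric)) != 0;
}

/* Fixed logic: a socket may have no cached route (dst == NULL), so check
 * that before dereferencing; otherwise fall back to TCP_RTO_MIN. */
static uint32_t rto_min_for(const struct dst_entry *dst)
{
	uint32_t rto_min = TCP_RTO_MIN;

	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
		rto_min = dst->metrics[RTAX_RTO_MIN - 1];
	return rto_min;
}
```

Without the `dst &&` test, the NULL case reads the lock word at a small offset from address 0, which is the kind of low faulting address seen in all the tcp_rto_min oopses in this thread.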
-

From: Satyam Sharma
Date: Monday, September 17, 2007 - 4:56 pm

As Vlad Yasevich mentioned, this one is already fixed in 23-rc6.

The forcedeth oops is unrelated, but multiple people have reported that
same oops now -- adding Manfred Spraul to CC. [ original thread is at:
http://lkml.org/lkml/2007/9/1/115 ]
-

From: Alexey Dobriyan
Date: Saturday, September 1, 2007 - 7:36 pm

Good news is that, contrary to popular belief, -mm is not a horrible piece
of crap, and NO_HZ on x86_64 worked here straight away.


The bad news is that something knocked the box off the net, then panicked it:

Box: Core 2 Duo (E6400), 2G RAM
Setup: x86_64 kernel, no preemption, SLUB with debugging on and almost
       all other debugging on
       atl1 NIC driver, connected to master box, netconsoling to it as well
Load: sequential kernel build with -j9 on many configs I do here (easy)
      LTP in infinite loop
      gdb testsuite in infinite loop with "ulimit -c unlimited"
      ssh session feeding all the above to master box

Box was left alone for several hours, strange things happened while I
was away:
	* unpingable box, frozen ssh sessions
	* still can login via VT console
	* SysRq+t works (see dmesg)
	* SysRq+t left "atl1 0000:03:00.0: tx busy" after output

In this state the box was left alone for a couple more hours, and
eventually panicked with (see full dmesg at the end)

Unable to handle kernel NULL pointer dereference at 0000000000000039 RIP: 
 [<ffffffff803b6f7c>] tcp_rto_min+0xc/0x20

which corresponds to:

	<tcp_rto_min>:
		mov	0x100(%rdi),%rdx
		mov	$0x14,%eax
		testb	$0x20,0x39(%rdx)	<===

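The faulting address follows from the disassembly. Assuming RTAX_RTO_MIN == 13 and a little-endian lock word, dst_metric_locked(dst, RTAX_RTO_MIN) tests bit 13 (mask 0x2000), which lives in byte 1 of the word as mask 0x20. If the lock word sits at offset 0x38 of struct dst_entry on this x86_64 build (inferred from the instruction, not checked against the source), then testb $0x20,0x39(%rdx) with dst == NULL touches address 0x39, matching the oops. The i386 reports in this thread show the same pattern with base 0x24: the code bytes f6 42 25 20 decode to testb $0x20,0x25(%edx), and the fault is at 00000025. A sketch of the calculation (lock_bit_location() is a made-up helper):

```c
#include <assert.h>
#include <stdint.h>

#define RTAX_RTO_MIN 13   /* assumption: value in include/linux/rtnetlink.h */

/* For bit 'nr' of a little-endian 32-bit word, find the byte that holds
 * it and the mask a byte-wide test (such as x86 testb) would use. */
static void lock_bit_location(int nr, unsigned int *byte_off, uint8_t *mask)
{
	assert(nr >= 0 && nr < 32);
	*byte_off = (unsigned int)nr / 8;
	*mask = (uint8_t)(1u << (nr % 8));
}
```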

See below for full dmesg with SysRq+t output, oops and .config and
tcp_rto_min disassembly:

P.S.: uh-oh, it's "[TCP] Allow minimum RTO ..." aka 05bb1fad1cde

Linux version 2.6.23-rc4-mm1 (ad@core2) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #1 SMP Sat Sep 1 10:53:14 MSD 2007
Command line: root=/dev/sda2 netconsole=@10.10.0.42/eth0,9353@10.10.0.1/00:80:48:45:EC:73 ignore_loglevel
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff90000 (usable)
 BIOS-e820: 000000007ff90000 - 000000007ff9e000 (ACPI data)
 BIOS-e820: 000000007ff9e000 - 000000007ffe0000 (ACPI NVS)
 ...
From: Satyam Sharma
Date: Saturday, September 1, 2007 - 10:02 pm

tcp_rto_min() lacks a check-for-NULL. You want 5c127c58ae9bf196 from

Yup, it came from this last commit in net-2.6 before -rc5.

[ Considering it's pretty core code (and thus the oops is fairly easily
  reproducible), I initially thought this must've come from net-2.6.24.
  I suspect a lot of testers might hit this, so it would be wise to put
  that patch up as a hot-fix? ]


Satyam
-

From: Andrew Morton
Date: Sunday, September 2, 2007 - 1:52 pm

Yeah, the net tree has been quite bad lately.  Unusually bad - it's usually
one of the good ones.

It also breaks a lot of the net driver work in several other trees (I dropped
git-ixgbe.patch wholesale because of this).  But there isn't a lot we can
do about that.  
-

From: Alexey Dobriyan
Date: Sunday, September 2, 2007 - 2:19 pm

OK, I'm currently running with "dst entry can be NULL" fix from net
tree, so far it's fine.
-

From: Adrian Bunk
Date: Sunday, September 2, 2007 - 2:14 am

defconfig fails with the following error on parisc:

<--  snip  -->

...
  CC      net/core/gen_estimator.o
In file included from include2/asm/bitops.h:111,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/net/core/gen_estimator.c:18:
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/asm-generic/bitops/non-atomic.h: In function '__set_bit':
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/asm-generic/bitops/non-atomic.h:17: error: implicit declaration of function 'BIT_MASK'
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/asm-generic/bitops/non-atomic.h:18: error: implicit declaration of function 'BIT_WORD'
make[3]: *** [net/core/gen_estimator.o] Error 1

<--  snip  -->

Either a direct #include <asm/bitops.h> must be forbidden (with an #error),
or the move of the #defines to include/linux/bitops.h must be reverted.
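For context, the two macros non-atomic.h can no longer see are tiny. A sketch of how __set_bit() uses them, assuming the definitions that moved to include/linux/bitops.h (BIT_MASK(nr) is 1UL << (nr % BITS_PER_LONG), BIT_WORD(nr) is nr / BITS_PER_LONG). set_bit_nonatomic() and test_bit_plain() are stand-in names, and BITS_PER_LONG is derived here from sizeof(long) rather than taken from the kernel headers.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)
#define BIT_MASK(nr)  (1UL << ((nr) % BITS_PER_LONG))  /* bit within its word */
#define BIT_WORD(nr)  ((nr) / BITS_PER_LONG)           /* which word holds it */

/* Non-atomic set, shaped like __set_bit() in asm-generic/bitops/non-atomic.h. */
static void set_bit_nonatomic(unsigned long nr, unsigned long *addr)
{
	unsigned long mask = BIT_MASK(nr);
	unsigned long *p = addr + BIT_WORD(nr);

	*p |= mask;
}

static int test_bit_plain(unsigned long nr, const unsigned long *addr)
{
	return (addr[BIT_WORD(nr)] & BIT_MASK(nr)) != 0;
}
```

Because non-atomic.h expands these macros, any file that pulls it in without the header that now defines them fails exactly as in the log above.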

cu
Adrian

-

From: Jiri Slaby
Date: Tuesday, September 4, 2007 - 10:53 am

Just to let you know that I'm working on the former.

thanks,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
-

From: Adrian Bunk
Date: Sunday, September 2, 2007 - 4:25 am

This patch fixes the following compile error:

<--  snip  -->

...
  LD      .tmp_vmlinux1
net/built-in.o: In function `inet6_csk_xmit':
(.text+0x72b0f): undefined reference to `flow_cache_genid'
net/built-in.o: In function `inet6_csk_xmit':
(.text+0x72be5): undefined reference to `flow_cache_genid'
make[1]: *** [.tmp_vmlinux1] Error 1

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -5,6 +5,7 @@
 #   IPv6 as module will cause a CRASH if you try to unload it
 config IPV6
 	tristate "The IPv6 protocol"
+	select XFRM
 	default m
 	---help---
 	  This is complemental support for the IP version 6.

-

From: Masahide NAKAMURA
Date: Monday, September 3, 2007 - 3:43 am

Hello,

On Sun, 2 Sep 2007 13:25:57 +0200


Thank you for catching this. The issue is caused by the patch
"[IPV6] XFRM: Fix connected socket to use transformation."
which I sent to netdev
(a85d5450ddeb959bdf9e4603f9c06e9d79217cfa on net-2.6.24).

I'd prefer to modify the original patch to use "#ifdef CONFIG_XFRM"
rather than changing the kernel config dependencies. Does that make sense?

Please review the attached patch.

-- 
Masahide NAKAMURA
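
The alternative proposed here avoids `select XFRM` by compiling the reference to `flow_cache_genid` only when `CONFIG_XFRM` is set, so a CONFIG_XFRM=n build never asks the linker for that symbol. A minimal sketch of the pattern, assuming a hypothetical helper `sketch_flow_genid()`; the actual patch attached to the mail is not reproduced here:

```c
/* Define this to model a CONFIG_XFRM=y build; flow_cache_genid must
 * then be provided by the XFRM code at link time. */
/* #define CONFIG_XFRM 1 */

#ifdef CONFIG_XFRM
extern int flow_cache_genid;
#endif

/* Hypothetical helper: current flow-cache generation, or 0 without XFRM.
 * The #else branch never names flow_cache_genid, so there is no
 * undefined reference when XFRM is not configured. */
static int sketch_flow_genid(void)
{
#ifdef CONFIG_XFRM
	return flow_cache_genid;
#else
	return 0;
#endif
}
```

The trade-off versus the Kconfig approach is that IPv6 stays usable without pulling in XFRM, at the cost of an #ifdef in the C code.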
From: Noriaki TAKAMIYA
Date: Thursday, September 6, 2007 - 3:01 am

I'm sorry for not checking more carefully.

  As Eric said, this issue should be fixed by the patch attached in
  the following mail.

--
Noriaki TAKAMIYA
-

From: Torsten Kaiser
Date: Saturday, September 1, 2007 - 9:07 am

Kernel 2.6.23-rc4-mm1 works on one of my systems with:
00:00.0 Host bridge: VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge (rev 01)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:0e.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)

It now has a working HPET.

The bad:
sata_sil24 and/or libata are broken.
On my second system (MCP55 + SiI 3132) I see this:
[    3.890000] scsi0 : sata_sil24
[    3.900000] scsi1 : sata_sil24
[    3.900000] ata1: SATA max UDMA/100 host m128@0xefeffc00 port 0xefef8000 irq 16
[    3.920000] ata2: SATA max UDMA/100 host m128@0xefeffc00 port 0xefefa000 irq 16
[    4.300000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    4.360000] ata1.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[    4.370000] ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    4.430000] ata1.00: configured for UDMA/100
[    4.500000] ieee1394: Node added: ID:BUS[0-00:1023]  GUID[0010dc00005cc354]
[    4.500000] ieee1394: Host added: ID:BUS[0-01:1023]  GUID[0011d80000c4c261]
[    4.790000] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    ...
From: Andrew Morton
Date: Saturday, September 1, 2007 - 9:16 am

From: Satyam Sharma
Date: Saturday, September 1, 2007 - 3:06 pm

Got these on an i386 build with CONFIG_MODVERSIONS=y ...

WARNING: "div64_64" [net/netfilter/xt_connbytes.ko] has no CRC!
WARNING: "div64_64" [net/ipv4/tcp_cubic.ko] has no CRC!

Full .config at: http://www.cse.iitk.ac.in/users/ssatyam/config-mm
-

From: Adrian Bunk
Date: Saturday, September 1, 2007 - 3:40 pm

That's expected since the fix is in git-kbuild.

cu
Adrian

-

From: Sam Ravnborg
Date: Saturday, September 1, 2007 - 4:15 pm

As Adrian already commented, it is fixed in kbuild.git.
It happens because genksyms did not know __extension__ and error recovery
in the parser was bad. I only managed to add support for __extension__;
the error recovery is still not fixed :-(

kbuild.git is not part of this -mm due to me fucking up the above fix.
That is corrected now so it will be in next -mm.

	Sam
-

From: Satyam Sharma
Date: Saturday, September 1, 2007 - 4:12 pm

kernel/softlockup.c: In function 'softlockup_tick':
kernel/softlockup.c:125: warning: 'regs' is used uninitialized in this function

So let's fix softlockup-improve-debug-output.patch to actually work,
and do what it claimed in the changelog :-)

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 softlockup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- kernel/softlockup.c~fix	2007-09-02 04:23:49.000000000 +0530
+++ kernel/softlockup.c	2007-09-02 04:34:45.000000000 +0530
@@ -80,7 +80,7 @@ void softlockup_tick(void)
 	int this_cpu = smp_processor_id();
 	unsigned long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
 	unsigned long print_timestamp;
-	struct pt_regs *regs;
+	struct pt_regs *regs = get_irq_regs();
 	unsigned long now;
 
 	if (touch_timestamp == 0) {
-

From: Satyam Sharma
Date: Sunday, September 2, 2007 - 5:37 am

^^^^^^^^^^

Ick, I botched a trivial patch, it doesn't even apply. Updated one below
(with indentation fix as added bonus :-)


[PATCH -mm] softlockup-improve-debug-output.patch fix (v2)

kernel/softlockup.c: In function 'softlockup_tick':
kernel/softlockup.c:125: warning: 'regs' is used uninitialized in this function

is a genuine bug (in all probability it will cause an oops or, if we're
lucky, just print wrong info). So let's fix
softlockup-improve-debug-output.patch to actually work as intended.

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 kernel/softlockup.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.23-rc4-mm1/kernel/softlockup.c~fix	2007-09-02 17:58:23.000000000 +0530
+++ linux-2.6.23-rc4-mm1/kernel/softlockup.c	2007-09-02 17:58:48.000000000 +0530
@@ -80,7 +80,7 @@ void softlockup_tick(void)
 	int this_cpu = smp_processor_id();
 	unsigned long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
 	unsigned long print_timestamp;
-	struct pt_regs *regs;
+	struct pt_regs *regs = get_irq_regs();
 	unsigned long now;
 
 	if (touch_timestamp == 0) {
@@ -121,7 +121,7 @@ void softlockup_tick(void)
 	spin_lock(&print_lock);
 	printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %lus! [%s:%d]\n",
 			this_cpu, now - touch_timestamp,
-				current->comm, task_pid_nr(current));
+			current->comm, task_pid_nr(current));
 	if (regs)
 		show_regs(regs);
 	else
-

From: Ingo Molnar
Date: Sunday, September 2, 2007 - 5:28 am

Thanks! Not sure how that bug slipped in, in my tree it does this:

 +       struct pt_regs *regs;
 ...
 +       regs = get_irq_regs();

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo
-
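
The v2 fix initialises `regs` from `get_irq_regs()`, which returns the register snapshot saved when the current interrupt was entered. softlockup_tick() runs from the timer interrupt, so it should normally get a usable pointer, and the existing `if (regs)` test covers the case where it does not. In the broken version the pointer was declared and tested but never assigned, so the test read garbage. A user-space model of that control flow, where `sketch_irq_regs` and `sketch_get_irq_regs()` are hypothetical stand-ins, not kernel APIs:

```c
struct sketch_regs { unsigned long ip; };

/* Hypothetical stand-in for the per-CPU pointer set on interrupt entry. */
static struct sketch_regs *sketch_irq_regs;

static struct sketch_regs *sketch_get_irq_regs(void)
{
	return sketch_irq_regs;
}

/* Returns 1 if a register dump would be printed, 0 for the else branch. */
static int sketch_report_lockup(void)
{
	/* Initialised at declaration, as in the v2 fix. Ingo's tree did the
	 * equivalent with a separate assignment after the declaration. */
	struct sketch_regs *regs = sketch_get_irq_regs();

	if (regs)
		return 1;	/* would call show_regs(regs) */
	return 0;		/* would take the existing else branch */
}
```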

From: Satyam Sharma
Date: Wednesday, September 5, 2007 - 11:52 pm

Hi Ingo,



You're very right indeed -- this bit was absent from -rc4-mm1's
softlockup-improve-debug-output.patch but now that I looked at your
original patch at http://lkml.org/lkml/2007/7/17/180, it becomes
obvious this was simply a mismerge issue after all :-)

[ Andrew, feel free to ignore my patch in case you just resolve
  the mismerge by yourself. ]


BTW, would something similar be useful in __schedule_bug() too?
I sure think so -- I'm not sure if EIP holds anything useful there,
but CPU#, EFLAGS and the init_utsname() stuff would be definitely
helpful ...


[PATCH] sched: Use show_regs() to improve __schedule_bug() output

A full register dump along with stack backtrace would make the "scheduling
while atomic" message more helpful. Use show_regs() instead of dump_stack()
for this. We already know we're atomic in here (that is why this function
was called) so show_regs()'s atomicity expectations are guaranteed.

Also, modify the output of the "BUG: scheduling while atomic:" header a bit
to keep task->comm and task->pid together and preempt_count() after them.

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 kernel/sched.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index b533d6d..4fb07c1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -63,6 +63,7 @@
 #include <linux/unistd.h>
 
 #include <asm/tlb.h>
+#include <asm/irq_regs.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -3404,12 +3405,19 @@ EXPORT_SYMBOL(sub_preempt_count);
  */
 static noinline void __schedule_bug(struct task_struct *prev)
 {
-	printk(KERN_ERR "BUG: scheduling while atomic: %s/0x%08x/%d\n",
-		prev->comm, preempt_count(), prev->pid);
+	struct pt_regs *regs = get_irq_regs();
+
+	printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
+		prev->comm, prev->pid, preempt_count());
+
 	debug_show_held_locks(prev);
 	if (irqs_disabled())
 ...
From: Ingo Molnar
Date: Monday, October 22, 2007 - 5:35 am

thanks, applied.

	Ingo
-

From: Valdis.Kletnieks
Date: Monday, September 3, 2007 - 9:36 am

Thanks for catching this, it was actually managing to inspire a full-scale
panic - flashing LEDs and the like.  Now to go track down the probably
self-inflicted cause of the soft-lockup message.. ;)
From: Satyam Sharma
Date: Saturday, September 1, 2007 - 4:42 pm

drivers/acpi/tables/tbutils.c: In function 'acpi_tb_parse_root_table':
drivers/acpi/tables/tbutils.c:403: warning: 'rsdt_address' may be used uninitialized in this function

has been verified to be a bogus warning. Let's just initialize the
variable to zero and shut this up.

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

I didn't use uninitialized_var() here because drivers/acpi/ is dual-licensed
stuff and used elsewhere, where that macro may be unavailable (?)

 drivers/acpi/tables/tbutils.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.23-rc4-mm1/drivers/acpi/tables/tbutils.c~fix	2007-09-02 05:07:02.000000000 +0530
+++ linux-2.6.23-rc4-mm1/drivers/acpi/tables/tbutils.c	2007-09-02 05:07:14.000000000 +0530
@@ -400,7 +400,7 @@ acpi_tb_parse_root_table(acpi_physical_a
 	u32 table_count;
 	struct acpi_table_header *table;
 	acpi_physical_address address;
-	acpi_physical_address rsdt_address;
+	acpi_physical_address rsdt_address = 0;
 	u32 length;
 	u8 *table_entry;
 	acpi_status status;
-

From: Adrian Bunk
Date: Saturday, September 1, 2007 - 5:19 pm

Please use uninitialized_var() instead.

cu
Adrian

-

From: Satyam Sharma
Date: Saturday, September 1, 2007 - 6:02 pm

Len, would it be okay to use uninitialized_var() in drivers/acpi/ code?
-
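
Adrian's suggestion relies on `uninitialized_var()`, which with gcc (as far as I know) expands to a self-assignment, `x = x`. That silences the "may be used uninitialized" warning without storing a dummy value and, unlike `= 0`, does not suggest that zero is meaningful. A sketch with a local copy of that definition, named `sketch_uninitialized_var` to make clear it is not the kernel header, and a hypothetical `sketch_classify()` that writes its out-parameter only on the path where the caller later reads it:

```c
/* Local copy, assumed to match the gcc definition in spirit. */
#define sketch_uninitialized_var(x) x = x

/* Hypothetical: sets *err only when it fails (returns NULL). */
static const char *sketch_classify(int key, int *err)
{
	if (key < 0) {
		*err = -1;
		return 0;
	}
	return "class";
}

static int sketch_enqueue(int key)
{
	int sketch_uninitialized_var(err);
	const char *cl = sketch_classify(key, &err);

	if (cl == 0)
		return err;	/* only read on the path that assigned it */
	return 0;
}
```

Whether the macro may be used in the dual-licensed ACPI code is exactly Satyam's open question to Len; the sketch only shows what the annotation does.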

From: Satyam Sharma
Date: Saturday, September 1, 2007 - 6:30 pm

net/sched/sch_cbq.c: In function 'cbq_enqueue':
net/sched/sch_cbq.c:383: warning: 'ret' may be used uninitialized in this function

has been verified to be a bogus warning. So let's shut it up.

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 net/sched/sch_cbq.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.23-rc4-mm1/net/sched/sch_cbq.c~fix	2007-09-02 06:45:08.000000000 +0530
+++ linux-2.6.23-rc4-mm1/net/sched/sch_cbq.c	2007-09-02 06:44:37.000000000 +0530
@@ -380,7 +380,7 @@ cbq_enqueue(struct sk_buff *skb, struct 
 {
 	struct cbq_sched_data *q = qdisc_priv(sch);
 	int len = skb->len;
-	int ret;
+	int uninitialized_var(ret);
 	struct cbq_class *cl = cbq_classify(skb, sch, &ret);
 
 #ifdef CONFIG_NET_CLS_ACT
-

From: Patrick McHardy
Date: Sunday, September 2, 2007 - 4:36 am

Acked-by: Patrick McHardy <kaber@trash.net>

-

From: Satyam Sharma
Date: Sunday, September 2, 2007 - 6:00 am

A typo results in build breakage:

drivers/char/nozomi.c:2204: error: syntax error before ‘__attribute__’
make[2]: *** [drivers/char/nozomi.o] Error 1

when CONFIG_HOTPLUG=n. This was actually meant to be __devexit_p.

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 drivers/char/nozomi.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.23-rc4-mm1/drivers/char/nozomi.c~fix	2007-09-02 16:16:59.000000000 +0530
+++ linux-2.6.23-rc4-mm1/drivers/char/nozomi.c	2007-09-02 16:17:07.000000000 +0530
@@ -2201,7 +2201,7 @@ static struct pci_driver nozomi_driver =
 	.name = NOZOMI_NAME,
 	.id_table = nozomi_pci_tbl,
 	.probe = nozomi_card_init,
-	.remove = __devexit(nozomi_card_exit),
+	.remove = __devexit_p(nozomi_card_exit),
 };
 
 static __init int nozomi_init(void)
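
The two macros do different jobs: `__devexit` is a section annotation for the function definition, while `__devexit_p()` wraps a pointer to that function and yields NULL when the exit code may be discarded. With CONFIG_HOTPLUG=n, `__devexit(nozomi_card_exit)` expands to an `__attribute__` where an initializer expression is expected, hence the syntax error. A user-space sketch of the pointer macro, with hypothetical names and a `SKETCH_HOTPLUG` switch standing in for the real configuration test (as I understand it, the real header also keeps the pointer for modular builds):

```c
#include <stddef.h>

/* Define to model a configuration where device exit code is kept. */
/* #define SKETCH_HOTPLUG 1 */

#ifdef SKETCH_HOTPLUG
#define sketch_devexit_p(fn) (fn)
#else
#define sketch_devexit_p(fn) NULL	/* exit code discarded: no pointer */
#endif

struct sketch_pci_driver {
	const char *name;
	void (*remove)(void *dev);
};

static void sketch_card_exit(void *dev)
{
	(void)dev;	/* teardown would go here */
}

static struct sketch_pci_driver sketch_driver = {
	.name   = "sketch",
	.remove = sketch_devexit_p(sketch_card_exit),
};
```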
From: Laurent Riffard
Date: Sunday, September 2, 2007 - 12:01 pm

[...]

Alan,

libata-correct-handling-of-srst-reset-sequences.patch broke 80-wire

2.6.23-rc3-mm1 and 2.6.23-rc4 work fine (ata1 devices are configured
for UDMA/100).

A few weeks ago, I wrote a patch to solve a wrong cable detection
problem after suspend-to-disk/resume, and it solves this problem
too. Is it the right way to go?



via_do_set_mode overwrites 80-wire cable detection bits. Let's
preserve them.

Signed-off-by: Laurent Riffard <laurent.riffard@free.fr>
---
 drivers/ata/pata_via.c |    7 +++++++
 1 file changed, 7 insertions(+)

Index: linux-2.6-mm/drivers/ata/pata_via.c
===================================================================
--- linux-2.6-mm.orig/drivers/ata/pata_via.c
+++ linux-2.6-mm/drivers/ata/pata_via.c
@@ -245,6 +245,7 @@ static void via_do_set_mode(struct ata_p
 	unsigned long T =  1000000000 / via_clock;
 	unsigned long UT = T/tdiv;
 	int ut;
+	u8 cable80_status;
 	int offset = 3 - (2*ap->port_no) - adev->devno;


@@ -294,9 +295,14 @@ static void via_do_set_mode(struct ata_p
 			ut = t.udma ? (0xe0 | (FIT(t.udma, 2, 9) - 2)) : 0x07;
 			break;
 	}
+
+	/* Get 80-wire cable detection bit */
+	pci_read_config_byte(pdev, 0x50 + offset, &cable80_status);
+	cable80_status &= 0x10;
+
 	/* Set UDMA unless device is not UDMA capable */
 	if (udma_type)
-		pci_write_config_byte(pdev, 0x50 + offset, ut);
+		pci_write_config_byte(pdev, 0x50 + offset, ut | cable80_status);
 }

 static void via_set_piomode(struct ata_port *ap, struct ata_device *adev)


-
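
The patch turns a blind write of the per-drive UDMA timing register into a read-modify-write: it reads the current byte, keeps only bit 0x10 (the 80-wire cable detection bit, per the patch comment), and ORs it into the new timing value. A sketch of just that bit arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

#define SKETCH_CABLE80_BIT 0x10	/* bit preserved by the patch */

/* Combine a freshly computed UDMA timing byte with the cable-detect
 * bit taken from the register's previous contents. */
static uint8_t sketch_merge_udma_timing(uint8_t old_reg, uint8_t ut)
{
	return (uint8_t)(ut | (old_reg & SKETCH_CABLE80_BIT));
}
```

In the case visible in the hunk, the computed value is either 0xe0 plus a 3-bit field or 0x07, neither of which touches bit 0x10, so the OR cannot clobber timing bits.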

From: Alan Cox
Date: Sunday, September 2, 2007 - 12:20 pm

Agreed, on a reset case we may otherwise get confused and misdetect the

Acked-by: Alan Cox <alan@redhat.com>

Thanks a lot
-

From: Jeff Garzik
Date: Monday, September 10, 2007 - 6:50 pm

Laurent Riffard wrote:
> Le 01.09.2007 06:58, Andrew Morton a 
From: Rafael J. Wysocki
Date: Sunday, September 2, 2007 - 1:39 pm

It fails to boot on my HPC nx6325 (hangs very early, before any messages reach
the console), because of this patch:

x86_64-convert-to-clockevents.patch

(as identified by bisection).

Unfortunately, after reverting it I had to revert quite a lot of other patches
(hpet-related mostly).

The failing .config is attached.

Greetings,
Rafael

From: Thomas Gleixner
Date: Monday, September 3, 2007 - 1:36 am

Sigh. Can you try

noapictimer
nohpet

on the kernel commandline please ?

Also it would be interesting whether the -hrt patchset on top of rc5 has
the same problem:

http://www.tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2

Thanks,

	tglx


-

From: Rafael J. Wysocki
Date: Monday, September 3, 2007 - 3:15 am

I'll try that later.

Greetings,
Rafael
-

From: Rafael J. Wysocki
Date: Monday, September 3, 2007 - 1:51 pm

This one boots normally.

Greetings,
Rafael
-

From: Thomas Gleixner
Date: Monday, September 3, 2007 - 6:03 pm

Thanks. That narrows down the wreckage window substantially.

	tglx


-

From: Randy Dunlap
Date: Sunday, September 2, 2007 - 9:30 pm

on x86_64:
drivers/watchdog/core/watchdog_dev.c:84: warning: format '%i' expects type 'int', but argument 5 has type 'size_t'


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-

From: Wim Van Sebroeck
Date: Monday, September 3, 2007 - 12:25 pm

I'll have a look at it.

Greetings,
Wim.
-

From: Satyam Sharma
Date: Monday, September 3, 2007 - 3:32 pm

Hi,




How about ... (unrelated cleanup thrown in, but SCNR)



* Fix this warning:

  drivers/watchdog/core/watchdog_dev.c:84:
  warning: format '%i' expects type 'int', but argument 5 has type 'size_t'

* CONFIG_xxx options are directly usable by preprocessor directives.

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 drivers/watchdog/core/Makefile       |    5 -----
 drivers/watchdog/core/watchdog_dev.c |    6 +++---
 2 files changed, 3 insertions(+), 8 deletions(-)

--- linux-2.6.23-rc4-mm1/drivers/watchdog/core/Makefile~fix	2007-09-04 03:12:27.000000000 +0530
+++ linux-2.6.23-rc4-mm1/drivers/watchdog/core/Makefile	2007-09-04 03:12:45.000000000 +0530
@@ -4,8 +4,3 @@
 
 # The Generic Watchdog Driver
 obj-$(CONFIG_WATCHDOG_CORE)		+= watchdog_core.o watchdog_dev.o
-
-ifeq ($(CONFIG_WATCHDOG_DEBUG_CORE), y)
-EXTRA_CFLAGS += -DDEBUG
-endif
-
--- linux-2.6.23-rc4-mm1/drivers/watchdog/core/watchdog_dev.c~fix	2007-09-04 02:37:12.000000000 +0530
+++ linux-2.6.23-rc4-mm1/drivers/watchdog/core/watchdog_dev.c	2007-09-04 03:10:58.000000000 +0530
@@ -36,7 +36,7 @@
 #include <linux/init.h>		/* For __init/__exit/... */
 #include <linux/uaccess.h>	/* For copy_to_user/put_user/... */
 
-#ifdef DEBUG
+#ifdef CONFIG_WATCHDOG_DEBUG_CORE
 #define trace(format, args...) \
 	printk(KERN_INFO "%s(" format ")\n", __FUNCTION__ , ## args)
 #define dbg(format, arg...) \
@@ -81,7 +81,7 @@ static DEFINE_MUTEX(watchdog_register_mt
 static ssize_t watchdog_write(struct file *file, const char __user *data,
 				size_t len, loff_t *ppos)
 {
-	trace("%p, %p, %i, %p", file, data, len, ppos);
+	trace("%p, %p, %zu, %p", file, data, len, ppos);
 
 	if (!watchdogdev ||
 	    !watchdogdev->watchdog_ops ||
@@ -144,7 +144,7 @@ static int watchdog_ioctl(struct inode *
 		.identity =		"Watchdog Device",
 	};
 
-	trace("%p, %p, %i, %li", inode, file, cmd, arg);
+	trace("%p, %p, %u, %li", inode, file, cmd, arg);
 
 	if (!watchdogdev || !watchdogdev->watchdog_ops)
 		return ...
From: Wim Van Sebroeck
Date: Tuesday, September 4, 2007 - 2:21 pm

Patch works for me. I applied it to the linux-2.6-watchdog-mm tree.

Greetings,
Wim.
-
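
On x86_64 `size_t` is 64 bits while `%i` consumes an `int`, so the old trace format did not match its argument; `%zu` is the C99 conversion for `size_t`. A sketch of the corrected formatting using `snprintf()` so the result can be checked; `sketch_format_write_trace()` is hypothetical and only mirrors the length part of the trace call in `watchdog_write()`:

```c
#include <stdio.h>
#include <stddef.h>

/* Format the length argument the way the fixed trace() call does. */
static int sketch_format_write_trace(char *buf, size_t bufsz, size_t len)
{
	return snprintf(buf, bufsz, "watchdog_write(len=%zu)", len);
}
```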

From: Zach Carter
Date: Tuesday, September 4, 2007 - 10:54 am

Folks,

I've got these messages since installing 2.6.23-rc4-mm1:

sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 5 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 5 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000

The laptop is a Sony VAIO SZ430N/B

Despite the errors, the interface appears to be working well enough.

I'd be happy to supply additional information, try out patches, or 
submit a bugzilla if needed.

thanks!

-Zach
-

From: Stephen Hemminger
Date: Tuesday, September 4, 2007 - 2:36 pm

On Tue, 4 Sep 2007 10:54:32 -0700

I already told Andrew to please drop this last patch, because
it causes interrupt messages. It seems masking off the IRQ
in hardware doesn't prevent that interrupt!
-

From: Valdis.Kletnieks
Date: Wednesday, September 5, 2007 - 7:37 am

(Warning - if discussion of binary modules bothers you, hit delete now..)

Dell Latitude D840, x86_64 kernel

memory-controller-memory-accounting-v7.patch causes the NVidia graphics driver
to go into a soft-lockup:

BUG: soft lockup - CPU#0 stuck for 11s! [X:2733]
CPU 0:
Modules linked in: irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm irda crc_ccitt nf_conntrack_ftp xt_pkttype ipt_REJECT ipt_osf nf_conntrack_ipv4 xt_ipisforif ipt_recent ipt_LOG xt_u32 iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack nfnetlink ip6t_LOG xt_limit ip6table_filter ip6_tables x_tables vmnet(P)(U) vmmon(U) sha256 aes fan container bay acpi_cpufreq nvram arc4 ecb pcmcia iwl3945 firmware_class yenta_socket nvidia(P)(U) mac80211 iTCO_wdt rsrc_nonstatic iTCO_vendor_support ohci1394 watchdog_core ieee1394 watchdog_dev pcmcia_core cfg80211 video thermal output button battery processor ac intel_agp rtc
Pid: 2733, comm: X Tainted: P        2.6.23-rc4-mm1 #1
RIP: 0010:[<ffffffff80520e16>]  [<ffffffff80520e16>] _spin_lock+0x5b/0x75
RSP: 0018:ffff810007ecdcf8  EFLAGS: 00000202
RAX: 0000000000000000 RBX: ffff810007ecdd08 RCX: 0000000000000173
RDX: ffff8100040fe000 RSI: 00007f3cbf672000 RDI: ffff81000111ec90
RBP: 0000000000000006 R08: ffffffff80687d85 R09: 0000000000010000
R10: ffff810007ecdd60 R11: 00000001000355e8 R12: 00000000000002c7
R13: 0000000000000000 R14: 000000000000000a R15: 0000000000000002
FS:  00007f3cbf65f780(0000) GS:ffffffff806c6000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f3cbc83d540 CR3: 00000000046af000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
[<ffffffff8027b358>] get_locked_pte+0x100/0x114
[<ffffffff8027b3d5>] vm_insert_page+0x69/0x100
[<ffffffff8841e5f8>] :nvidia:nv_kern_mmap+0x712/0x7c0
[<ffffffff8027ec71>] mmap_region+0x222/0x426
[<ffffffff80327dc6>] ...
From: Andrew Morton
Date: Wednesday, September 5, 2007 - 8:12 am

It's legitimate.  That change was supposed to be a no-op.


(is it not a bit weird from a naming POV that we have

Seems to me that there's a missing pte_unmap_lock() in insert_page().

Also, a hunk in do_anonymous_page() is indented one tabstop too far, which
makes me suspect that patch(1) might have put it in the wrong place. 
Balbir, can you please check that?

diff -puN mm/memory.c~memory-controller-memory-accounting-v7-fix mm/memory.c
--- a/mm/memory.c~memory-controller-memory-accounting-v7-fix
+++ a/mm/memory.c
@@ -1135,7 +1135,7 @@ static int insert_page(struct mm_struct 
 {
 	int retval;
 	pte_t *pte;
-	spinlock_t *ptl;  
+	spinlock_t *ptl;
 
 	retval = mem_container_charge(page, mm);
 	if (retval)
@@ -1160,6 +1160,7 @@ static int insert_page(struct mm_struct 
 	set_pte_at(mm, addr, pte, mk_pte(page, prot));
 
 	retval = 0;
+	pte_unmap_unlock(pte, ptl);
 	return retval;
 out_unlock:
 	pte_unmap_unlock(pte, ptl);
@@ -2184,8 +2185,8 @@ static int do_anonymous_page(struct mm_s
 	if (!page)
 		goto oom;
 
-		if (mem_container_charge(page, mm))
-			goto oom_free_page;
+	if (mem_container_charge(page, mm))
+		goto oom_free_page;
 
 	entry = mk_pte(page, vma->vm_page_prot);
 	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
_

-
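
The lockup fits the classic missing-unlock pattern: insert_page() took the PTE lock, and its success path returned without dropping it, so the next attempt to take that lock most likely spun forever (hence `_spin_lock` under `get_locked_pte()` in the trace). Andrew's fix adds an explicit `pte_unmap_unlock()` before the early return; letting success fall through to the existing `out_unlock` label would be equivalent here. A toy model of that single-exit shape, with a hypothetical lock type and names (none of this is mm/memory.c code):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the page-table spinlock; acquiring it while
 * already held models the spin the soft-lockup detector reported. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* a real spinlock would spin here forever */
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = 0;
}

/* Success and failure both leave via out_unlock, so no path can
 * return with the lock still held. */
static int toy_insert_page(struct toy_lock *ptl, int slot_busy)
{
	int retval;

	toy_lock_acquire(ptl);
	retval = -EBUSY;
	if (slot_busy)
		goto out_unlock;
	/* ... install the mapping here ... */
	retval = 0;
out_unlock:
	toy_lock_release(ptl);
	return retval;
}
```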

From: Balbir Singh
Date: Wednesday, September 5, 2007 - 8:20 am

Yes, this fix looks right as well.

Thanks for catching them so quickly.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
-

From: Valdis.Kletnieks
Date: Wednesday, September 5, 2007 - 8:58 am

Confirming that this patch fixes things.

Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
From: Valdis.Kletnieks
Date: Wednesday, September 5, 2007 - 12:46 pm

git-alsa.patch breaks audio on my laptop; it worked fine in -rc3-mm1.  Almost
certainly bustification in the Intel HDA rewrite.

Symptoms:  alsamixer finds the chipset, can adjust the volumes and mute/unmute,
and /usr/bin/play is able to write a .wav to the ALSA device without complaint.
However, no sound actually comes out.  Very "lights are on but nobody is home".

Dell Latitude D820, lspci reports:

00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
and alsamixer reports finding a "SigmaTel STAC9200"

% grep HDA_ .config
CONFIG_SND_HDA_INTEL=y
# CONFIG_SND_HDA_HWDEP is not set
CONFIG_SND_HDA_CODEC_REALTEK=y
CONFIG_SND_HDA_CODEC_ANALOG=y
CONFIG_SND_HDA_CODEC_SIGMATEL=y
# CONFIG_SND_HDA_CODEC_VIA is not set
# CONFIG_SND_HDA_CODEC_ATIHDMI is not set
# CONFIG_SND_HDA_CODEC_CONEXANT is not set
# CONFIG_SND_HDA_CODEC_CMEDIA is not set
# CONFIG_SND_HDA_CODEC_SI3054 is not set
CONFIG_SND_HDA_GENERIC=y
# CONFIG_SND_HDA_POWER_SAVE is not set

dmesg under -rc4-mm1:
Advanced Linux Sound Architecture Driver Version 1.0.14 (Fri Jul 20 09:12:58 2007 UTC).
ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 21 (level, low) -> IRQ 21
hda_intel: position_fix set to 1 for device 1028:01cc
ALSA device list:
  #0: HDA Intel at 0xefffc000 irq 506

dmesg under -rc3-mm1:

Advanced Linux Sound Architecture Driver Version 1.0.14 (Fri Jul 20 09:12:58 2007 UTC).
ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 21 (level, low) -> IRQ 21
hda_intel: position_fix set to 1 for device 1028:01cc
ALSA device list:
  #0: HDA Intel at 0xefffc000 irq 506

(Yes, they look the same to me, too...)

I'd provide more info, if I had a clue what else to add...

 

From: Valdis.Kletnieks
Date: Wednesday, September 5, 2007 - 12:54 pm

For the record, REALTEK and ANALOG got set to Y in my bisect build because of
the infamous "defaults to Y" syndrome - after I saw Sigmatel go by I wised up
and started saying N, but forgot to clean up the first two.  Those two are
disabled in the live -rc4-mm1 .config, and it has the same issue.

From: Takashi Iwai
Date: Wednesday, September 5, 2007 - 1:22 pm

At Wed, 05 Sep 2007 15:54:55 -0400,

The "default Y" is the correct behavior in this case.  These configs
are just splits from a single config, corresponding to all Y.


Takashi
-

From: Takashi Iwai
Date: Wednesday, September 5, 2007 - 1:11 pm

At Wed, 05 Sep 2007 15:46:34 -0400,

First, check /proc/asound/card0/codec#* to see whether the STAC9200 is
identified properly.  If yes, check the mixer contents (ideally, run
"alsactl -f somefile store") and see whether "Master Playback Volume" is
raised, "Master Playback Switch" unmuted, "Front..." raised/unmuted,
and "PCM ..." raised/unmuted, etc.

If this still doesn't work, try giving the model=ref option to
snd-hda-intel.  If it's still not OK, please try bisecting the
git.kernel.org/perex/alsa.git mm branch...


Takashi
-

From: Takashi Iwai
Date: Wednesday, September 5, 2007 - 1:27 pm

At Wed, 05 Sep 2007 22:11:20 +0200,

BTW, there are 10 different models to test for Dell with STAC9200
(dell-d2[1-3] and dell-m2[1-7], see
Documentation/sound/alsa/ALSA-Configuration.txt), so I recommend
building it as a module to save boot time :)


Takashi
-

From: Valdis.Kletnieks
Date: Wednesday, September 5, 2007 - 2:16 pm

modprobe snd_hda_intel model=dell-m23 

was the magic incantation.  I'm sure that every user who trips over this
is going to call it a regression, since the -rc3-mm1 module was able to
get it right without hints. ;)
From: Takashi Iwai
Date: Wednesday, September 5, 2007 - 2:39 pm

At Wed, 05 Sep 2007 17:16:49 -0400,

Well, it's indeed a regression.  There seem to be mistakes in the pin
configuration order.

Could you try the patch below (without model option)?


thanks,

Takashi

diff -r 3a300e020eca pci/hda/patch_sigmatel.c
--- a/pci/hda/patch_sigmatel.c	Wed Sep 05 19:14:38 2007 +0200
+++ b/pci/hda/patch_sigmatel.c	Wed Sep 05 23:37:25 2007 +0200
@@ -563,8 +563,8 @@ static unsigned int ref9200_pin_configs[
     102801E8
 */
 static unsigned int dell9200_d21_pin_configs[8] = {
-	0x400001f0, 0x400001f1, 0x01a19021, 0x90100140,
-	0x01813122, 0x02214030, 0x01014010, 0x02a19020,
+	0x400001f0, 0x400001f1, 0x02214030, 0x01014010, 
+	0x02a19020, 0x01a19021, 0x90100140, 0x01813122,
 };
 
 /* 
@@ -573,8 +573,8 @@ static unsigned int dell9200_d21_pin_con
     102801C1
 */
 static unsigned int dell9200_d22_pin_configs[8] = {
-	0x400001f0, 0x400001f1, 0x02a19021, 0x90100140,
-	0x400001f2, 0x0221401f, 0x01014010, 0x01813020,
+	0x400001f0, 0x400001f1, 0x0221401f, 0x01014010, 
+	0x01813020, 0x02a19021, 0x90100140, 0x400001f2,
 };
 
 /* 
@@ -587,8 +587,8 @@ static unsigned int dell9200_d22_pin_con
     102801E3
 */
 static unsigned int dell9200_d23_pin_configs[8] = {
-	0x400001f0, 0x400001f1, 0x01a19021, 0x90100140,
-	0x400001f2, 0x0221401f, 0x01014010, 0x01813020,
+	0x400001f0, 0x400001f1, 0x0221401f, 0x01014010, 
+	0x01813020, 0x01a19021, 0x90100140, 0x400001f2, 
 };
 
 
@@ -598,8 +598,8 @@ static unsigned int dell9200_d23_pin_con
     102801D8 (Dell Inspiron 640m)
 */
 static unsigned int dell9200_m21_pin_configs[8] = {
-	0x40c003fa, 0x03441340, 0x03a11020, 0x401003fc,
-	0x403003fd, 0x0321121f, 0x0321121f, 0x408003fb,
+	0x40c003fa, 0x03441340, 0x0321121f, 0x90170310,
+	0x408003fb, 0x03a11020, 0x401003fc, 0x403003fd,
 };
 
 /* 
@@ -611,8 +611,8 @@ static unsigned int dell9200_m21_pin_con
     102801D6 
 */
 static unsigned int dell9200_m22_pin_configs[8] = {
-	0x40c003fa, 0x0144131f, 0x03A11020, 0x401003fb, 
-	0x40f000fc, 0x0321121f, ...
From: Valdis.Kletnieks
Date: Thursday, September 6, 2007 - 7:10 am

That patch makes it work as expected, at least on my Dell.  Do we need to
find testers for the other 9 varieties of Dell Sigmatel chipsets, or was it
the same basic error on all 10, so if it works on one it should be OK for
the others?

From: Takashi Iwai
Date: Thursday, September 6, 2007 - 7:17 am

At Thu, 06 Sep 2007 10:10:54 -0400,

It's the same logic error for all Dell pin configs, so yes, the others
should be OK if it works for you.


thanks,

Takashi
-
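
For the Dell d21/d22/d23 tables, the reordering in Takashi's patch is the same permutation each time: the entries at positions 5-7 move to positions 2-4, and the old entries 2-4 move to 5-7 (the m21 hunk uses the same permutation but also replaces a duplicated 0x0321121f with 0x90170310). That is consistent with "the same logic error for all Dell pin configs". A sketch that applies the permutation to the old `dell9200_d21_pin_configs` values, checkable against the new table in the patch; the helper and array names are hypothetical:

```c
#include <stddef.h>

#define SKETCH_NUM_PINS 8

/* new[i] = old[perm[i]], as read off the d21/d22/d23 hunks */
static const size_t sketch_pin_perm[SKETCH_NUM_PINS] = {
	0, 1, 5, 6, 7, 2, 3, 4,
};

static void sketch_reorder_pins(const unsigned int *old, unsigned int *out)
{
	size_t i;

	for (i = 0; i < SKETCH_NUM_PINS; i++)
		out[i] = old[sketch_pin_perm[i]];
}
```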

From: Mathieu Desnoyers
Date: Thursday, September 6, 2007 - 12:37 pm

Hi Andrew,

I got a link error on myri10ge when building 2.6.23-rc4-mm1 on x86_64:

ERROR: "lro_flush_all" [drivers/net/myri10ge/myri10ge.ko] undefined!
ERROR: "lro_receive_frags" [drivers/net/myri10ge/myri10ge.ko] undefined!
make[2]: *** [__modpost] Error 1
make[1]: *** [modules] Error 2
make: *** [_all] Error 2

Mathieu

My config:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc4-mm1
# Thu Sep  6 11:02:54 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_NONIRQ_WAKEUP=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_NR_QUICK=2
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=16
CONFIG_CONTAINERS=y
# CONFIG_CONTAINER_DEBUG is not set
# CONFIG_CONTAINER_NS is not set
# CONFIG_CONTAINER_CPUACCT is not set
CONFIG_CPUSETS=y
# CONFIG_RESOURCE_COUNTERS is not ...
From: David Miller
Date: Thursday, September 6, 2007 - 1:40 pm

From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

myri10ge needs some LRO ifdeffery.

-

From: David Miller
Date: Thursday, September 6, 2007 - 1:48 pm

From: David Miller <davem@davemloft.net>

Actually the fix is even simpler, missing select in Kconfig.

I've checked the following fix for this into the net-2.6.24
tree.

commit 9fd380e892e078b582920325357292c07eeeecc9
Author: David S. Miller <davem@kimchee.(none)>
Date:   Thu Sep 6 21:44:36 2007 +0100

    [MYRI10GE]: Need to select INET_LRO.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b92b7dc..7d1a84e 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2496,6 +2496,7 @@ config MYRI10GE
 	depends on PCI
 	select FW_LOADER
 	select CRC32
+	select INET_LRO
 	---help---
 	  This driver supports Myricom Myri-10G Dual Protocol interface in
 	  Ethernet mode. If the eeprom on your board is not recent enough,
-

From: Jeff Garzik
Date: Friday, September 7, 2007 - 4:59 pm

Yes, that's the correct fix.  ACK.


-

From: Daniel Walker
Date: Friday, September 7, 2007 - 5:25 pm

Didn't catch this one. I guess -mm is a little out of date.

Daniel

-

From: Avuton Olrich
Date: Saturday, October 13, 2007 - 3:03 pm

This bug still exists, though it is now in mainline (unless, of course,
randconfig is still making bad configs). I just bisected it with this
config[1].

Errors out with:
drivers/built-in.o: In function `myri10ge_poll':
myri10ge.c:(.text+0xce259): undefined reference to `lro_receive_frags'
myri10ge.c:(.text+0xce37c): undefined reference to `lro_flush_all'

It bisects back to this commit:
sbh@shapeshifter /tmp/tester/linux-2.6 $ git-bisect bad
1e6e9342d41ff80ced0ad5dfcf084926700cdfc5 is first bad commit
commit 1e6e9342d41ff80ced0ad5dfcf084926700cdfc5
Author: Andrew Gallatin <gallatin@myri.com>
Date:   Mon Sep 17 11:37:42 2007 -0700

    [MYRI10GE]: Use LRO.

    Singed off by: Andrew Gallatin <gallatin@myri.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

This is with linux-2.6.git
master: 8d8fe64237646fdd2c2de2722ec4189a5999119d

[1] http://avuton.googlepages.com/undef-reference-lro_receive_frags.config
-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
-

From: Mel Gorman
Date: Sunday, September 9, 2007 - 5:22 am

(Sent to the list based on the CCs in net-add-ath5k-wireless-driver-fix.patch.
If that is in error, apologies.)


I thought I would give the ath5k driver a shot on my ThinkPad T60p to see
what happened, but it wasn't particularly successful. lspci -v shows:

03:00.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
        Subsystem: IBM ThinkPad 11a/b/g Wireless LAN Mini Express Adapter (AR5BXB6)
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at edf00000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 2
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
        Capabilities: [60] Express Legacy Endpoint IRQ 0
        Capabilities: [90] MSI-X: Enable- Mask- TabSize=1
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel

During boot, the following relevant messages show up in dmesg:

ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 17 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:03:00.0 to 64
Uhhuh. NMI received for unknown reason b1 on CPU 0.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue
ath5k_hw_nic_wakeup: failed to resume the MAC Chip
ACPI: PCI interrupt for device 0000:03:00.0 disabled
ath_pci: probe of 0000:03:00.0 failed with error -5

Needless to say, it fails to bring up networking later. I have no real idea
how to debug something like this. Any suggestions?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
-

From: Robert de Rooy
Date: Friday, September 14, 2007 - 8:12 am

I just tried the Fedora 8test2 LiveCD, which includes the 0.9.5-BSD version
of the ath5k driver, and got exactly the same result on my ThinkPad T60.
Fedora 8test2 uses a 2.6.23-based kernel.
-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:24 pm

This patch fixes the following compile error:

<--  snip  -->

...
  CC      arch/alpha/kernel/asm-offsets.s
In file included from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/bitops.h:17,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/kernel.h:15,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/linux/sched.h:50,
                 from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/arch/alpha/kernel/asm-offsets.c:9:
include2/asm/bitops.h: In function 'clear_bit_unlock':
include2/asm/bitops.h:75: error: implicit declaration of function 'smp_mb'
make[2]: *** [arch/alpha/kernel/asm-offsets.s] Error 1

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---
6df784c9aa4ba1ff2062b63e733c645e8b1e5203 
diff --git a/include/asm-alpha/bitops.h b/include/asm-alpha/bitops.h
index ffec8a8..381b4f5 100644
--- a/include/asm-alpha/bitops.h
+++ b/include/asm-alpha/bitops.h
@@ -2,6 +2,7 @@
 #define _ALPHA_BITOPS_H
 
 #include <asm/compiler.h>
+#include <asm/barrier.h>
 
 /*
  * Copyright 1994, Linus Torvalds.
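The error arises because clear_bit_unlock() in the -mm asm-alpha/bitops.h calls smp_mb() (bitops.h:75 in the log) without including the header that declares it. Presumably that barrier sits ahead of a plain clear_bit(), giving the unlock release semantics. Below is a rough user-space analogue, not the kernel code. C11 atomics with acquire/release ordering stand in for the kernel's barrier-plus-bitop pair, and the demo_ names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Lock side: set the bit and report whether it was already set.
 * Acquire ordering keeps the critical section after the bit is taken. */
static int demo_test_and_set_bit_lock(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < DEMO_BITS_PER_LONG);	/* single-word sketch */
	mask = 1UL << nr;
	return (atomic_fetch_or_explicit(word, mask,
					 memory_order_acquire) & mask) != 0;
}

/* Unlock side: release ordering makes every store done while the bit
 * was held visible before the bit reads as clear, which is the job
 * smp_mb() does before the clear on Alpha. */
static void demo_clear_bit_unlock(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < DEMO_BITS_PER_LONG);
	mask = 1UL << nr;
	atomic_fetch_and_explicit(word, ~mask, memory_order_release);
}
```

A full barrier, as the Alpha code uses, is stronger than release ordering. The sketch only shows the minimum ordering an unlock needs.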

-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:24 pm

ide_get_error_location() is no longer used.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---

 drivers/ide/ide-io.c |   35 -----------------------------------
 include/linux/ide.h  |    5 -----
 2 files changed, 40 deletions(-)

924249789a0c0d577c5c5bfa91f4e514b7ebde60 
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index c1692d9..ec835e3 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -322,41 +322,6 @@ static void ide_complete_pm_request (ide_drive_t *drive, struct request *rq)
 	spin_unlock_irqrestore(&ide_lock, flags);
 }
 
-/*
- * FIXME: probably move this somewhere else, name is bad too :)
- */
-u64 ide_get_error_location(ide_drive_t *drive, char *args)
-{
-	u32 high, low;
-	u8 hcyl, lcyl, sect;
-	u64 sector;
-
-	high = 0;
-	hcyl = args[5];
-	lcyl = args[4];
-	sect = args[3];
-
-	if (ide_id_has_flush_cache_ext(drive->id)) {
-		low = (hcyl << 16) | (lcyl << 8) | sect;
-		HWIF(drive)->OUTB(drive->ctl|0x80, IDE_CONTROL_REG);
-		high = ide_read_24(drive);
-	} else {
-		u8 cur = HWIF(drive)->INB(IDE_SELECT_REG);
-		if (cur & 0x40) {
-			high = cur & 0xf;
-			low = (hcyl << 16) | (lcyl << 8) | sect;
-		} else {
-			low = hcyl * drive->head * drive->sect;
-			low += lcyl * drive->sect;
-			low += sect - 1;
-		}
-	}
-
-	sector = ((u64) high << 24) | low;
-	return sector;
-}
-EXPORT_SYMBOL(ide_get_error_location);
-
 /**
  *	ide_end_drive_cmd	-	end an explicit drive command
  *	@drive: command 
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 48871f9..65de5c3 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -1088,11 +1088,6 @@ extern ide_startstop_t ide_do_reset (ide_drive_t *);
 extern void ide_init_drive_cmd (struct request *rq);
 
 /*
- * this function returns error location sector offset in case of a write error
- */
-extern u64 ide_get_error_location(ide_drive_t *, char *);
-
-/*
  * "action" parameter type for ide_do_drive_cmd() below.
  */
 typedef enum {
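For reference, the sector arithmetic in the LBA paths of the removed ide_get_error_location() reduces to the bit assembly below. This is a user-space sketch with a hypothetical name. It skips the CHS fallback and all register I/O (HWIF(drive)->OUTB/INB, ide_read_24()).

```c
#include <assert.h>
#include <stdint.h>

/* LBA branches of the removed helper: the high-cylinder, low-cylinder
 * and sector taskfile bytes form the low 24 bits, and "high" holds the
 * bits above them (4 head bits in LBA28 mode, or the 24 bits that
 * ide_read_24() fetched when flush-cache-ext is supported). */
static uint64_t demo_error_sector(uint8_t hcyl, uint8_t lcyl, uint8_t sect,
				  uint32_t high)
{
	uint32_t low = ((uint32_t)hcyl << 16) | ((uint32_t)lcyl << 8) | sect;

	assert(high <= 0xffffffu);	/* at most 24 more bits */
	return ((uint64_t)high << 24) | low;
}
```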

-

From: Bartlomiej Zolnierkiewicz
Date: Tuesday, September 11, 2007 - 2:27 pm

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

Since git-block contains the patch that removes the only user of
ide_get_error_location(), I think this patch should also be merged
through the block tree.  Jens?

PS: none of the blkdev_issue_flush() users uses the *error_sector argument.


-

From: Jens Axboe
Date: Tuesday, September 11, 2007 - 10:54 pm

I had hoped its existence would be enough incentive, but that didn't
happen. I'll make a note to kill that again.

-- 
Jens Axboe

-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:24 pm

This patch makes three needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---

 drivers/dma/ioat_dma.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

c633b44cd60648f456a11bb38fd9193ce4d6acdc 
diff --git a/drivers/dma/ioat_dma.c b/drivers/dma/ioat_dma.c
index e4c3afe..66c5bb5 100644
--- a/drivers/dma/ioat_dma.c
+++ b/drivers/dma/ioat_dma.c
@@ -47,8 +47,8 @@
 static void ioat_dma_start_null_desc(struct ioat_dma_chan *ioat_chan);
 static void ioat_dma_memcpy_cleanup(struct ioat_dma_chan *ioat_chan);
 
-struct ioat_dma_chan *ioat_lookup_chan_by_index(struct ioatdma_device *device,
-						int index)
+static struct ioat_dma_chan *ioat_lookup_chan_by_index(struct ioatdma_device *device,
+						       int index)
 {
 	return device->idx[index];
 }
@@ -716,7 +716,7 @@ MODULE_PARM_DESC(ioat_interrupt_style,
  * ioat_dma_setup_interrupts - setup interrupt handler
  * @device: ioat device
  */
-int ioat_dma_setup_interrupts(struct ioatdma_device *device)
+static int ioat_dma_setup_interrupts(struct ioatdma_device *device)
 {
 	struct ioat_dma_chan *ioat_chan;
 	int err, i, j, msixcnt;
@@ -826,7 +826,7 @@ err_no_irq:
  * ioat_dma_remove_interrupts - remove whatever interrupts were set
  * @device: ioat device
  */
-void ioat_dma_remove_interrupts(struct ioatdma_device *device)
+static void ioat_dma_remove_interrupts(struct ioatdma_device *device)
 {
 	struct ioat_dma_chan *ioat_chan;
 	int i;

-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:25 pm

This patch makes four needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---

 drivers/usb/serial/ch341.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

c7046a47d2d1dd5dc6a8fcc298b8c5f7497b3aaa 
diff --git a/drivers/usb/serial/ch341.c b/drivers/usb/serial/ch341.c
index eb68106..6b252ce 100644
--- a/drivers/usb/serial/ch341.c
+++ b/drivers/usb/serial/ch341.c
@@ -66,7 +66,8 @@ static int ch341_control_in(struct usb_device *dev,
 	return r;
 }
 
-int ch341_set_baudrate(struct usb_device *dev, struct ch341_private *priv)
+static int ch341_set_baudrate(struct usb_device *dev,
+			      struct ch341_private *priv)
 {
 	short a, b;
 	int r;
@@ -108,14 +109,15 @@ int ch341_set_baudrate(struct usb_device *dev, struct ch341_private *priv)
 	return r;
 }
 
-int ch341_set_handshake(struct usb_device *dev, struct ch341_private *priv)
+static int ch341_set_handshake(struct usb_device *dev,
+			       struct ch341_private *priv)
 {
 	dbg("ch341_set_handshake(%d,%d)", priv->dtr, priv->rts);
 	return ch341_control_out(dev, 0xa4,
 		~((priv->dtr?1<<5:0)|(priv->rts?1<<6:0)), 0);
 }
 
-int ch341_get_status(struct usb_device *dev)
+static int ch341_get_status(struct usb_device *dev)
 {
 	char *buffer;
 	int r;
@@ -142,7 +144,7 @@ out:	kfree(buffer);
 
 /* -------------------------------------------------------------------------- */
 
-int ch341_configure(struct usb_device *dev, struct ch341_private *priv)
+static int ch341_configure(struct usb_device *dev, struct ch341_private *priv)
 {
 	char *buffer;
 	int r;
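The one-liner in ch341_set_handshake() packs DTR into bit 5 and RTS into bit 6, then inverts everything. As a result, an asserted line ends up as a 0 bit. The helper below spells that out. Its name and macros are hypothetical. Truncating to 16 bits reflects the width of a USB control request's wValue field, not anything shown in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as used by ch341_set_handshake() in the patch above. */
#define DEMO_CH341_DTR (1 << 5)
#define DEMO_CH341_RTS (1 << 6)

/* Builds the value ch341_set_handshake() passes to ch341_control_out():
 * set the bits for asserted lines, invert the result, and keep the low
 * 16 bits. */
static uint16_t demo_ch341_handshake_value(int dtr, int rts)
{
	int bits = (dtr ? DEMO_CH341_DTR : 0) | (rts ? DEMO_CH341_RTS : 0);

	return (uint16_t)~bits;
}
```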

-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:25 pm

nfs_wb_page_priority() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---

 fs/nfs/write.c         |    3 ++-
 include/linux/nfs_fs.h |    1 -
 2 files changed, 2 insertions(+), 2 deletions(-)

30370f47093c3d812929d84a5a6be79ccb55a2b3 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 3e9e268..37953fd 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1424,7 +1424,8 @@ out:
 	return ret;
 }
 
-int nfs_wb_page_priority(struct inode *inode, struct page *page, int how)
+static int nfs_wb_page_priority(struct inode *inode, struct page *page,
+				int how)
 {
 	loff_t range_start = page_offset(page);
 	loff_t range_end = range_start + (loff_t)(PAGE_CACHE_SIZE - 1);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index f5414fc..e247a40 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -430,7 +430,6 @@ extern long nfs_sync_mapping_wait(struct address_space *, struct writeback_contr
 extern int nfs_wb_all(struct inode *inode);
 extern int nfs_wb_nocommit(struct inode *inode);
 extern int nfs_wb_page(struct inode *inode, struct page* page);
-extern int nfs_wb_page_priority(struct inode *inode, struct page* page, int how);
 extern int nfs_wb_page_cancel(struct inode *inode, struct page* page);
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
 extern int  nfs_commit_inode(struct inode *, int);

-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:25 pm

This patch makes the following needlessly global code static:
- vmcoreinfo_data[]
- vmcoreinfo_size
- vmcoreinfo_append_str()

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---

 include/linux/kexec.h |   14 -----------
 kernel/kexec.c        |   52 +++++++++++++++++++++++++-----------------
 2 files changed, 32 insertions(+), 34 deletions(-)

e6dbb01497c12aa69b47914da4db1cfd23e9813e 
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 99f2d6f..7cce357 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -123,21 +123,8 @@ int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 void crash_save_vmcoreinfo(void);
 void arch_crash_save_vmcoreinfo(void);
-void vmcoreinfo_append_str(const char *fmt, ...);
 unsigned long paddr_vmcoreinfo_note(void);
 
-#define SYMBOL(name) \
-	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)
-#define SIZE(name) \
-	vmcoreinfo_append_str("SIZE(%s)=%d\n", #name, sizeof(struct name))
-#define OFFSET(name, field) \
-	vmcoreinfo_append_str("OFFSET(%s.%s)=%d\n", #name, #field, \
-			      &(((struct name *)0)->field))
-#define LENGTH(name, value) \
-	vmcoreinfo_append_str("LENGTH(%s)=%d\n", #name, value)
-#define CONFIG(name) \
-	vmcoreinfo_append_str("CONFIG_%s=y\n", #name)
-
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
 
@@ -177,7 +164,6 @@ extern struct resource crashk_res;
 typedef u32 note_buf_t[KEXEC_NOTE_BYTES/4];
 extern note_buf_t *crash_notes;
 extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
-extern unsigned int vmcoreinfo_size;
 extern unsigned int vmcoreinfo_max_size;
 
 
diff --git a/kernel/kexec.c b/kernel/kexec.c
index af2c035..c84a387 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -36,9 +36,9 @@
 note_buf_t* crash_notes;
 
 /* vmcoreinfo stuff */
-unsigned char vmcoreinfo_data[VMCOREINFO_BYTES];
+static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES];
 u32 ...
From: Ken'ichi Ohmichi
Date: Sunday, September 9, 2007 - 7:55 pm

Hi Adrian,



Kernel compilation fails with your patch because architecture-specific
functions need to access the above data/functions:

# make
[snip]
arch/ia64/kernel/machine_kexec.c: In function 'arch_crash_save_vmcoreinfo':
arch/ia64/kernel/machine_kexec.c:134: error: implicit declaration of function 'SYMBOL'
arch/ia64/kernel/machine_kexec.c:135: error: implicit declaration of function 'LENGTH'
arch/ia64/kernel/machine_kexec.c:139: error: implicit declaration of function 'SIZE'
arch/ia64/kernel/machine_kexec.c:139: error: 'node_memblk_s' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:139: error: (Each undeclared identifier is reported only once
arch/ia64/kernel/machine_kexec.c:139: error: for each function it appears in.)
arch/ia64/kernel/machine_kexec.c:140: error: implicit declaration of function 'OFFSET'
arch/ia64/kernel/machine_kexec.c:140: error: 'start_paddr' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:141: error: 'size' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:144: error: implicit declaration of function 'CONFIG'
arch/ia64/kernel/machine_kexec.c:144: error: 'PGTABLE_3' undeclared (first use in this function)
make[1]: *** [arch/ia64/kernel/machine_kexec.o] Error 1
make: *** [arch/ia64/kernel] Error 2
#


Thanks
Ken'ichi Ohmichi
-

From: Adrian Bunk
Date: Monday, September 10, 2007 - 5:20 am

Thanks, I missed this.

That's 80% my fault and 20% the fault of the generic names
SYMBOL/SIZE/OFFSET/LENGTH/CONFIG, which make it impossible to grep for
them (and make namespace conflicts quite possible).

Can we get these #defines properly prefixed (e.g. KEXEC_SYMBOL etc.) so
that other people will not repeat my mistake and namespace conflicts
are avoided?
TIA
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Maneesh Soni
Date: Monday, September 10, 2007 - 10:53 pm

CRASH_DUMP_ or VMCORE_ would be a better prefix, as the dump filtering
functionality is not directly related to kexec.
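The prefixing idea can be sketched in user space. The append helper and macros below are modelled on the ones that Adrian's patch removes from kexec.h. The VMCOREINFO_ names follow the prefix suggested here but are illustrative, as are DEMO_VMCOREINFO_BYTES, struct demo_memblk_s and demo_save_vmcoreinfo(). There are two deliberate deviations from the quoted macros. OFFSET uses offsetof(), and SIZE and OFFSET print with %zu, because the originals pass a pointer and a size_t to %d.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_VMCOREINFO_BYTES 4096	/* assumed size for this sketch */

/* File-local, as the patch makes them: only this file writes here. */
static char vmcoreinfo_data[DEMO_VMCOREINFO_BYTES];
static size_t vmcoreinfo_size;

/* Append one formatted line; truncate silently once the buffer is full. */
static void vmcoreinfo_append_str(const char *fmt, ...)
{
	size_t room = sizeof(vmcoreinfo_data) - vmcoreinfo_size;
	va_list args;
	int n;

	assert(room >= 1);	/* a truncated write always keeps the NUL */
	va_start(args, fmt);
	n = vsnprintf(vmcoreinfo_data + vmcoreinfo_size, room, fmt, args);
	va_end(args);
	if (n < 0)
		return;
	vmcoreinfo_size += (size_t)n < room ? (size_t)n : room - 1;
}

/* Prefixed (hypothetical) names instead of bare SYMBOL/SIZE/... */
#define VMCOREINFO_SYMBOL(name) \
	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)
#define VMCOREINFO_SIZE(name) \
	vmcoreinfo_append_str("SIZE(%s)=%zu\n", #name, sizeof(struct name))
#define VMCOREINFO_OFFSET(name, field) \
	vmcoreinfo_append_str("OFFSET(%s.%s)=%zu\n", #name, #field, \
			      offsetof(struct name, field))
#define VMCOREINFO_LENGTH(name, value) \
	vmcoreinfo_append_str("LENGTH(%s)=%d\n", #name, (int)(value))
#define VMCOREINFO_CONFIG(name) \
	vmcoreinfo_append_str("CONFIG_%s=y\n", #name)

/* Hypothetical data shaped like ia64's node_memblk/node_memblk_s. */
struct demo_memblk_s {
	int nid;
	unsigned long start_paddr;
	unsigned long size;
};
struct demo_memblk_s demo_memblk[4];

/* What an arch hook such as arch_crash_save_vmcoreinfo() might emit. */
void demo_save_vmcoreinfo(void)
{
	VMCOREINFO_SYMBOL(demo_memblk);
	VMCOREINFO_LENGTH(demo_memblk, 4);
	VMCOREINFO_SIZE(demo_memblk_s);
	VMCOREINFO_OFFSET(demo_memblk_s, start_paddr);
	VMCOREINFO_CONFIG(DEMO);
}
```

With prefixed names, a grep for VMCOREINFO_ finds every arch user, which is the problem Adrian ran into with the bare names.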

Thanks
Maneesh


-- 
Maneesh Soni
Linux Technology Center,
IBM India Systems and Technology Lab, 
Bangalore, India
-

From: Ken'ichi Ohmichi
Date: Wednesday, September 12, 2007 - 12:37 am

Hi Adrian, Maneesh,

Maneesh Soni wrote:
 > On Mon, Sep 10, 2007 at 02:20:40PM +0200, Adrian Bunk wrote:
 >> On Mon, Sep 10, 2007 at 11:55:49AM +0900, Ken'ichi Ohmichi wrote:
 >>> Hi Adrian,
 >>>
 >>>
 >>> 2007/09/09 22:25:16 +0200, Adrian Bunk <bunk@kernel.org> wrote:
 >>>> On Fri, Aug 31, 2007 at 09:58:22PM -0700, Andrew Morton wrote:
 >>>>> ...
 >>>>> Changes since 2.6.23-rc3-mm1:
 >>>>> ...
 >>>>> +add-vmcoreinfo.patch
 >>>>> ...
 >>>>>  misc
 >>>>> ...
 >>>> This patch makes the following needlessly global code static:
 >>>> - vmcoreinfo_data[]
 >>>> - vmcoreinfo_size
 >>>> - vmcoreinfo_append_str()
 >>> The kernel compiling fails with your patch because architecture-specific
 >>> function should access the above data/function:
 >>>
 >>> # make
 >>> [snip]
 >>> arch/ia64/kernel/machine_kexec.c: In function 'arch_crash_save_vmcoreinfo':
 >>> arch/ia64/kernel/machine_kexec.c:134: error: implicit declaration of function 'SYMBOL'
 >>> arch/ia64/kernel/machine_kexec.c:135: error: implicit declaration of function 'LENGTH'
 >>> arch/ia64/kernel/machine_kexec.c:139: error: implicit declaration of function 'SIZE'
 >>> arch/ia64/kernel/machine_kexec.c:139: error: 'node_memblk_s' undeclared (first use in this function)
 >>> arch/ia64/kernel/machine_kexec.c:139: error: (Each undeclared identifier is reported only once
 >>> arch/ia64/kernel/machine_kexec.c:139: error: for each function it appears in.)
 >>> arch/ia64/kernel/machine_kexec.c:140: error: implicit declaration of function 'OFFSET'
 >>> arch/ia64/kernel/machine_kexec.c:140: error: 'start_paddr' undeclared (first use in this function)
 >>> arch/ia64/kernel/machine_kexec.c:141: error: 'size' undeclared (first use in this function)
 >>> arch/ia64/kernel/machine_kexec.c:144: error: implicit declaration of function 'CONFIG'
 >>> arch/ia64/kernel/machine_kexec.c:144: error: 'PGTABLE_3' undeclared (first use in this function)
 >>> make[1]: *** [arch/ia64/kernel/machine_kexec.o] Error 1
 >>> make: *** [arch/ia64/kernel] ...
From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:25 pm

This hydra had more than one head...

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---

 arch/i386/kernel/irq.c    |    2 --
 arch/powerpc/kernel/irq.c |    1 -
 arch/s390/kernel/irq.c    |    1 -
 arch/sh/kernel/irq.c      |    1 -
 arch/x86_64/kernel/irq.c  |    1 -
 5 files changed, 6 deletions(-)

68791fe88172ac3c2dbb0fbbffb8befc7b59e3f7 
diff --git a/arch/i386/kernel/irq.c b/arch/i386/kernel/irq.c
index a6b2c7e..de1601f 100644
--- a/arch/i386/kernel/irq.c
+++ b/arch/i386/kernel/irq.c
@@ -231,8 +231,6 @@ asmlinkage void do_softirq(void)
 
 	local_irq_restore(flags);
 }
-
-EXPORT_SYMBOL(do_softirq);
 #endif
 
 /*
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index dfad0e4..65c2409 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -395,7 +395,6 @@ void do_softirq(void)
 
 	local_irq_restore(flags);
 }
-EXPORT_SYMBOL(do_softirq);
 
 
 /*
diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index 8f0cbca..c36d812 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -95,7 +95,6 @@ asmlinkage void do_softirq(void)
 
 	local_irq_restore(flags);
 }
-EXPORT_SYMBOL(do_softirq);
 
 void init_irq_proc(void)
 {
diff --git a/arch/sh/kernel/irq.c b/arch/sh/kernel/irq.c
index 0340498..4b49d03 100644
--- a/arch/sh/kernel/irq.c
+++ b/arch/sh/kernel/irq.c
@@ -245,7 +245,6 @@ asmlinkage void do_softirq(void)
 
 	local_irq_restore(flags);
 }
-EXPORT_SYMBOL(do_softirq);
 #endif
 
 void __init init_IRQ(void)
diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 87423b7..3542f0c 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -236,4 +236,3 @@ asmlinkage void do_softirq(void)
 	}
  	local_irq_restore(flags);
 }
-EXPORT_SYMBOL(do_softirq);

-

From: David Miller
Date: Wednesday, September 12, 2007 - 6:14 am

From: Adrian Bunk <bunk@kernel.org>

Applied, thanks.
-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:25 pm

raise_softirq_irqoff no longer has any modular user.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---
eff0407b63757cdd4164a0bdde0313e8f154b6dc 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index abae56c..ce38b56 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -335,8 +335,6 @@ inline fastcall void raise_softirq_irqoff(unsigned int nr)
 		wakeup_softirqd();
 }
 
-EXPORT_SYMBOL(raise_softirq_irqoff);
-
 void fastcall raise_softirq(unsigned int nr)
 {
 	unsigned long flags;

-

From: Christoph Hellwig
Date: Sunday, September 9, 2007 - 1:41 pm

This should probably go in through Dave's tree, as it removes this
rather annoying user.

-

From: David Miller
Date: Wednesday, September 12, 2007 - 6:15 am

From: Christoph Hellwig <hch@infradead.org>

Yep, I've just tossed it into my tree.

Thanks.
-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:25 pm

This patch makes the following needlessly global functions static:
- lock_page_container()
- unlock_page_container()
- __mem_container_move_lists()

Additionally, there was no reason for the "mem_control_type" object.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---

 mm/memcontrol.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

b582cc510b6b0a182dc56025828e7a3c566b9724 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8162d98..49bf04f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -91,7 +91,7 @@ enum {
 	MEM_CONTAINER_TYPE_CACHED,
 	MEM_CONTAINER_TYPE_ALL,
 	MEM_CONTAINER_TYPE_MAX,
-} mem_control_type;
+};
 
 static struct mem_container init_mem_container;
 
@@ -156,18 +156,18 @@ struct page_container *page_get_page_container(struct page *page)
 		(page->page_container & ~PAGE_CONTAINER_LOCK);
 }
 
-void __always_inline lock_page_container(struct page *page)
+static void __always_inline lock_page_container(struct page *page)
 {
 	bit_spin_lock(PAGE_CONTAINER_LOCK_BIT, &page->page_container);
 	VM_BUG_ON(!page_container_locked(page));
 }
 
-void __always_inline unlock_page_container(struct page *page)
+static void __always_inline unlock_page_container(struct page *page)
 {
 	bit_spin_unlock(PAGE_CONTAINER_LOCK_BIT, &page->page_container);
 }
 
-void __mem_container_move_lists(struct page_container *pc, bool active)
+static void __mem_container_move_lists(struct page_container *pc, bool active)
 {
 	if (active)
 		list_move(&pc->lru, &pc->mem_container->active_list);

-

From: Balbir Singh
Date: Monday, September 10, 2007 - 1:23 am

This looks good as well


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
-

From: Jan Engelhardt
Date: Monday, September 10, 2007 - 12:58 pm

Yes, typedefs are bad. And because this comes up so often,
I also have the link: http://lkml.org/lkml/2006/11/21/34



	Jan
-- 
-

From: Jan Engelhardt
Date: Monday, September 10, 2007 - 12:59 pm

Humm. Judging from the @@-line, it looks like:

enum {
	MEM_CONTAINER_TYPE_WHATEVER
} mem_control_type;

making it actually a variable name. Confusing at best.


	Jan
-- 
-

From: Adrian Bunk
Date: Monday, September 10, 2007 - 2:59 pm

It's not about style: your "mem_control_type" was not a type name,
it was an (unused) variable.


It seems the intended code was:

enum mem_control_type {
        MEM_CONTAINER_TYPE_UNSPEC = 0,
        MEM_CONTAINER_TYPE_MAPPED,
        MEM_CONTAINER_TYPE_CACHED,
        MEM_CONTAINER_TYPE_ALL,
        MEM_CONTAINER_TYPE_MAX,
};
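The point Jan and Adrian make is easy to check in isolation. In C, a name after the closing brace of an enum declares an object, while a name after the enum keyword is a tag that names a type. The identifiers below are hypothetical stand-ins for the MEM_CONTAINER_TYPE_* constants.

```c
#include <assert.h>

/* Shape of the original code: no tag, so the trailing name declares a
 * file-scope object of an anonymous enum type (zero-initialized). */
enum {
	DEMO_A_UNSPEC = 0,
	DEMO_A_MAPPED,
	DEMO_A_MAX,
} demo_anon_type;

/* Shape of the intended code: a tag names a type and allocates nothing. */
enum demo_tagged_type {
	DEMO_T_UNSPEC = 0,
	DEMO_T_MAPPED,
	DEMO_T_MAX,
};

/* The tag can be used to declare variables and parameters. */
static int demo_is_mapped(enum demo_tagged_type t)
{
	return t == DEMO_T_MAPPED;
}
```

Tags and ordinary identifiers live in separate C namespaces, so both forms compile without complaint. Only the first one allocates storage, which is why the unused object went unnoticed.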


cu
Adrian


-

From: Balbir Singh
Date: Monday, September 10, 2007 - 7:41 pm

Yes, thinking again, what you say makes sense.


-- 
	Thanks,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:25 pm

This patch makes the following needlessly global variables static:
- sctp_memory_pressure
- sctp_memory_allocated
- sctp_sockets_allocated

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---
3c211ad074038414ebc156b1abbc3df78dc60cb2 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 37e7306..f53545a 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -112,9 +112,9 @@ extern int sysctl_sctp_mem[3];
 extern int sysctl_sctp_rmem[3];
 extern int sysctl_sctp_wmem[3];
 
-int sctp_memory_pressure;
-atomic_t sctp_memory_allocated;
-atomic_t sctp_sockets_allocated;
+static int sctp_memory_pressure;
+static atomic_t sctp_memory_allocated;
+static atomic_t sctp_sockets_allocated;
 
 static void sctp_enter_memory_pressure(void)
 {

-

From: Neil Horman
Date: Monday, September 10, 2007 - 7:05 am

Looks fine to me
Acked-by: Neil Horman <nhorman@tuxdriver.com>

Neil

-- 
/***************************************************
 *Neil Horman
 *nhorman@tuxdriver.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/
-

From: David Miller
Date: Wednesday, September 12, 2007 - 6:18 am

From: Adrian Bunk <bunk@kernel.org>

Applied, thanks.
-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:25 pm

tcp_splice_data_recv() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---
233aefd2a215430c16bd02eca06fb8a4b6079f7a 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 22576e4..6623796 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -515,8 +515,9 @@ static inline void tcp_push(struct sock *sk, int flags, int mss_now,
 	}
 }
 
-int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
-			 unsigned int offset, size_t len)
+static int tcp_splice_data_recv(read_descriptor_t *rd_desc,
+				struct sk_buff *skb,
+				unsigned int offset, size_t len)
 {
 	struct tcp_splice_state *tss = rd_desc->arg.data;
 

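Making a function static only requires that nothing outside its translation unit refers to it by name. Its address can still be handed to a callback-driven reader in the same file. tcp_splice_data_recv()'s parameter list matches the actor callbacks that tcp_read_sock() invokes. The sketch below models that pattern with hypothetical demo_ types standing in for read_descriptor_t and tcp_read_sock(). It is a simplification, not the TCP code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for read_descriptor_t. */
struct demo_desc {
	size_t count;	/* bytes the caller still wants */
	size_t copied;	/* bytes consumed so far */
};

/* Hypothetical stand-in for an sk_read_actor_t-style callback type. */
typedef int (*demo_actor_t)(struct demo_desc *desc, const char *buf,
			    size_t offset, size_t len);

/* static: no other translation unit can name this function, yet its
 * address can still be passed to a reader within this file. */
static int demo_data_recv(struct demo_desc *desc, const char *buf,
			  size_t offset, size_t len)
{
	size_t n = len < desc->count ? len : desc->count;

	(void)buf;
	(void)offset;
	desc->count -= n;
	desc->copied += n;
	return (int)n;	/* bytes consumed from this chunk */
}

/* Loosely modelled on a tcp_read_sock()-style loop: hand the actor
 * chunks of at most `chunk` bytes until it stops consuming. */
static size_t demo_read_sock(const char *data, size_t total, size_t chunk,
			     struct demo_desc *desc, demo_actor_t actor)
{
	size_t off = 0;

	assert(chunk > 0);
	while (off < total && desc->count > 0) {
		size_t len = total - off < chunk ? total - off : chunk;
		int used = actor(desc, data + off, off, len);

		if (used <= 0)
			break;
		off += (size_t)used;
	}
	return off;
}

/* The only externally visible symbol in this sketch. */
size_t demo_splice(const char *data, size_t total, size_t want)
{
	struct demo_desc desc = { .count = want, .copied = 0 };

	return demo_read_sock(data, total, 4, &desc, demo_data_recv);
}
```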
-

From: David Miller
Date: Wednesday, September 12, 2007 - 6:21 am

From: Adrian Bunk <bunk@kernel.org>

I'll let Jens or similar pick this one up since it
obviously won't apply to my tree.

-

From: Jens Axboe
Date: Wednesday, September 12, 2007 - 10:44 am

I'll shove it in my #splice-net branch, where it originates from.

-- 
Jens Axboe

-

From: Adrian Bunk
Date: Sunday, September 9, 2007 - 1:26 pm

do_try_to_free_pages() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---
23781fa6792c518c8581ceeaf08db251574e8430 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b34b29d..9104cf8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1248,8 +1248,8 @@ static unsigned long shrink_zones(int priority, struct zone **zones,
  * holds filesystem locks which prevent writeout this might not work, and the
  * allocation attempt will fail.
  */
-unsigned long do_try_to_free_pages(struct zone **zones, gfp_t gfp_mask,
-					struct scan_control *sc)
+static unsigned long do_try_to_free_pages(struct zone **zones, gfp_t gfp_mask,
+					  struct scan_control *sc)
 {
 	int priority;
 	int ret = 0;

-

From: Balbir Singh
Date: Monday, September 10, 2007 - 1:24 am

Thanks, looks good!

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
-

From: Andy Whitcroft
Date: Monday, September 10, 2007 - 10:43 am

I am seeing the following compile errors on all of my powerpc platforms:

  CC      kernel/sched.o
  kernel/sched.c: In function `cpu_to_phys_group':
  kernel/sched.c:5937: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
  kernel/sched.c:5937: error: (Each undeclared identifier is reported only once
  kernel/sched.c:5937: error: for each function it appears in.)
  kernel/sched.c:5937: warning: type defaults to `int' in declaration of `type name'
  kernel/sched.c:5937: error: invalid type argument of `unary *'
  kernel/sched.c: In function `build_sched_domains':
  kernel/sched.c:6172: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
  kernel/sched.c:6172: warning: type defaults to `int' in declaration of `type name'
  kernel/sched.c:6172: error: invalid type argument of `unary *'
  kernel/sched.c:6183: warning: type defaults to `int' in declaration of `type name'
  kernel/sched.c:6183: error: invalid type argument of `unary *'
  make[1]: *** [kernel/sched.o] Error 1
  make: *** [kernel] Error 2
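The identifier in this error, per_cpu__cpu_sibling_map, is the name the per_cpu() accessor builds by token pasting. The message therefore most likely means kernel/sched.c now uses per_cpu(cpu_sibling_map, cpu) while powerpc has no per-CPU variable of that name. That reading is an inference from the error text, not something stated in the thread. The simplified model below uses plain arrays instead of the kernel's per-CPU sections and offsets. DEMO_DEFINE_PER_CPU, demo_per_cpu and sibling_mask are hypothetical.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4

/* Simplified model: each "per-CPU variable" is an array indexed by CPU,
 * and its symbol gets a demo_per_cpu__ prefix via token pasting. */
#define DEMO_DEFINE_PER_CPU(type, name) \
	type demo_per_cpu__##name[DEMO_NR_CPUS]
#define demo_per_cpu(name, cpu) (demo_per_cpu__##name[(cpu)])

DEMO_DEFINE_PER_CPU(unsigned long, sibling_mask);

/* Had sibling_mask been declared as a plain array instead, this would
 * fail with "'demo_per_cpu__sibling_mask' undeclared", the same shape
 * as the powerpc error. */
void demo_set_siblings(int cpu, unsigned long mask)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	demo_per_cpu(sibling_mask, cpu) = mask;
}
```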

-apw
-

From: Andy Whitcroft
Date: Monday, September 10, 2007 - 10:49 am

I have a couple of old NUMA-Q systems which are unable to read their
boot disks with 2.6.23-rc4-mm1.  The disks appear to be recognised, and
even the partition tables are read correctly, but then they go pop:

  qla1280: QLA1040 found on PCI bus 0, dev 10
  Clocksource tsc unstable (delta = 99922590 ns)
  Time: jiffies clocksource has been installed.
  scsi(0:0): Resetting SCSI BUS
  scsi0 : QLogic QLA1040 PCI to SCSI Host Adapter
         Firmware version:  7.65.06, Driver version 3.26
  scsi 0:0:0:0: Direct-Access     IBM      DGHS18X          0360 PQ: 0 ANSI: 3
  scsi(0:0:0:0): Sync: period 10, offset 12, Wide
  scsi 0:0:1:0: Direct-Access     IBM OEM  DCHS09X          5454 PQ: 0 ANSI: 2
  scsi(0:0:1:0): Sync: period 10, offset 12, Wide
  scsi 0:0:2:0: Direct-Access     IBM OEM  DCHS09X          5454 PQ: 0 ANSI: 2
  scsi(0:0:2:0): Sync: period 10, offset 12, Wide
  scsi 0:0:3:0: Direct-Access     IBM OEM  DCHS09X          5454 PQ: 0 ANSI: 2
  scsi(0:0:3:0): Sync: period 10, offset 12, Wide
  scsi 0:0:4:0: Direct-Access     IBM OEM  DCHS09X          5454 PQ: 0 ANSI: 2
  scsi(0:0:4:0): Sync: period 10, offset 12, Wide
  st: Version 20070203, fixed bufsize 32768, s/g segs 256
  sd 0:0:0:0: [sda] 35843670 512-byte hardware sectors (18352 MB)
  sd 0:0:0:0: [sda] Write Protect is off
  sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
  sd 0:0:0:0: [sda] 35843670 512-byte hardware sectors (18352 MB)
  sd 0:0:0:0: [sda] Write Protect is off
  sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
   sda: sda1
  sd 0:0:0:0: [sda] Attached SCSI disk
  sd 0:0:1:0: [sdb] 17796077 512-byte hardware sectors (9112 MB)
  sd 0:0:1:0: [sdb] Write Protect is off
  sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
  sd 0:0:1:0: [sdb] 17796077 512-byte hardware sectors (9112 MB)
  sd 0:0:1:0: [sdb] Write Protect is off
  sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
  ...
From: Andrew Morton
Date: Monday, September 10, 2007 - 11:19 am

The only patch which touches qla1280 is git-block.patch.  From a quick
squizz the change looks OK, although it's tricky and something might have
broken.

(the dprintk at line 2929 needs to print remseg, not seg_cnt).

Can you retest with that change reverted (below)?  If it's not that then
perhaps something in scsi core broke, dunno.


diff -puN drivers/scsi/qla1280.c~revert-1 drivers/scsi/qla1280.c
--- a/drivers/scsi/qla1280.c~revert-1
+++ a/drivers/scsi/qla1280.c
@@ -2775,7 +2775,7 @@ qla1280_64bit_start_scsi(struct scsi_qla
 	struct device_reg __iomem *reg = ha->iobase;
 	struct scsi_cmnd *cmd = sp->cmd;
 	cmd_a64_entry_t *pkt;
-	struct scatterlist *sg = NULL, *s;
+	struct scatterlist *sg = NULL;
 	__le32 *dword_ptr;
 	dma_addr_t dma_handle;
 	int status = 0;
@@ -2889,16 +2889,13 @@ qla1280_64bit_start_scsi(struct scsi_qla
 	 * Load data segments.
 	 */
 	if (seg_cnt) {	/* If data transfer. */
-		int remseg = seg_cnt;
 		/* Setup packet address segment pointer. */
 		dword_ptr = (u32 *)&pkt->dseg_0_address;
 
 		if (cmd->use_sg) {	/* If scatter gather */
 			/* Load command entry data segments. */
-			for_each_sg(sg, s, seg_cnt, cnt) {
-				if (cnt == 2)
-					break;
-				dma_handle = sg_dma_address(s);
+			for (cnt = 0; cnt < 2 && seg_cnt; cnt++, seg_cnt--) {
+				dma_handle = sg_dma_address(sg);
 #if defined(CONFIG_IA64_GENERIC) || defined(CONFIG_IA64_SGI_SN2)
 				if (ha->flags.use_pci_vchannel)
 					sn_pci_set_vchan(ha->pdev,
@@ -2909,12 +2906,12 @@ qla1280_64bit_start_scsi(struct scsi_qla
 					cpu_to_le32(pci_dma_lo32(dma_handle));
 				*dword_ptr++ =
 					cpu_to_le32(pci_dma_hi32(dma_handle));
-				*dword_ptr++ = cpu_to_le32(sg_dma_len(s));
+				*dword_ptr++ = cpu_to_le32(sg_dma_len(sg));
+				sg++;
 				dprintk(3, "S/G Segment phys_addr=%x %x, len=0x%x\n",
 					cpu_to_le32(pci_dma_hi32(dma_handle)),
 					cpu_to_le32(pci_dma_lo32(dma_handle)),
-					cpu_to_le32(sg_dma_len(sg_next(s))));
-				remseg--;
+					cpu_to_le32(sg_dma_len(sg)));
 ...
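The revert above goes back to walking the list with plain `sg++`, while the code it replaces used `for_each_sg()`, which advances with `sg_next()`. Once the SCSI midlayer hands a driver a chained list, those two walks no longer visit the same entries. The sketch below uses a userspace mock (`struct mock_sg`, `mock_sg_next()` and the helpers are hypothetical, not the kernel's scatterlist API) where one slot in the first table is a link to where the list continues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock, NOT the kernel's struct scatterlist: an entry is
 * either a data segment or, when is_chain is set, a link to another
 * table that continues the list. */
struct mock_sg {
	unsigned int length;
	int is_chain;
	struct mock_sg *chain;	/* used only when is_chain != 0 */
};

/* Rough analogue of sg_next(): step to the following slot and, if that
 * slot is a chain link, jump to the table it points to. */
static struct mock_sg *mock_sg_next(struct mock_sg *sg)
{
	sg++;
	if (sg->is_chain)
		sg = sg->chain;
	return sg;
}

/* Walk nents data segments following chain links, like for_each_sg(). */
static unsigned long total_len_chained(struct mock_sg *sg, int nents)
{
	unsigned long sum = 0;

	for (int i = 0; i < nents; i++) {
		sum += sg->length;
		if (i + 1 < nents)	/* don't step past the last entry */
			sg = mock_sg_next(sg);
	}
	return sum;
}

/* Walk with plain sg++, as in the reverted loop: the chain link is read
 * as if it were data, and whatever follows it in memory is read too. */
static unsigned long total_len_flat(struct mock_sg *sg, int nents)
{
	unsigned long sum = 0;

	for (int i = 0; i < nents; i++, sg++)
		sum += sg->length;
	return sum;
}
```

With a pool laid out as {100, 200, link to slot 4, stale 999, 300, 400} and nents = 4, the chained walk sums 1000 while the flat walk sums 1299, because it counts the link slot and the stale entry instead of the last two real segments.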
From: Torsten Kaiser
Date: Monday, September 10, 2007 - 11:59 am

I reported a similar problem on Sep 1, but until now got no response.
The system boots, reads the partition tables, starts the RAID and then

From my log:
[    3.890000] scsi0 : sata_sil24
[    3.900000] scsi1 : sata_sil24
[    3.900000] ata1: SATA max UDMA/100 host m128@0xefeffc00 port 0xefef8000 irq 16
[    3.920000] ata2: SATA max UDMA/100 host m128@0xefeffc00 port 0xefefa000 irq 16
[    4.300000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    4.360000] ata1.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[    4.370000] ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    4.430000] ata1.00: configured for UDMA/100
[    4.500000] ieee1394: Node added: ID:BUS[0-00:1023]  GUID[0010dc00005cc354]
[    4.500000] ieee1394: Host added: ID:BUS[0-01:1023]  GUID[0011d80000c4c261]
[    4.790000] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    4.850000] ata2.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[    4.860000] ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    4.920000] ata2.00: configured for UDMA/100
[    4.930000] scsi 0:0:0:0: Direct-Access     ATA      MAXTOR STM332082 3.AA PQ: 0 ANSI: 5
[    4.960000] sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
[    4.980000] sd 0:0:0:0: [sda] Write Protect is off
[    4.990000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    4.990000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.020000] sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
[    5.040000] sd 0:0:0:0: [sda] Write Protect is off
[    5.050000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    5.050000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.080000]  sda: sda1 sda2
[    5.110000] sd 0:0:0:0: [sda] Attached SCSI disk
[    5.120000] scsi 1:0:0:0: Direct-Access     ATA      MAXTOR STM332082 3.AA PQ: 0 ANSI: 5
[    5.140000] sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors ...
From: Andrew Morton
Date: Monday, September 10, 2007 - 12:20 pm

You still haven't had a response ;)  Let's add a cc.

Oh, you reported it against 2.6.23-rc4-mm1
(http://lkml.org/lkml/2007/9/1/92) and I did cc linux-ide in my response.

I'll continue to point out where this sort of thing occurs because last
week I was told that a reason why so many bug reports are ignored is because
Andy is using qla1280.  You're using sata.  So it's probably a different

Can you please confirm that this bug is present in -mm and not present in
mainline (yet)?

Thanks.
-

From: Torsten Kaiser
Date: Monday, September 10, 2007 - 12:38 pm

But the mail from Andy was a nice point to try another cc, i.e.

Yes, but you (Andrew) also said in response to Andy: "If it's not that then
perhaps something in scsi core broke, dunno." So I wanted to add that

Currently using 2.6.23-rc3-mm1, that works for me.
Now downloading 2.6.23-rc5-git1...

Torsten
-

From: FUJITA Tomonori
Date: Monday, September 10, 2007 - 12:42 pm

On Mon, 10 Sep 2007 12:20:38 -0700


This might be an sg chaining bug too (probably the sg chaining libata
patch).

Can you try the following patch that I've just sent:

http://lkml.org/lkml/2007/9/10/251

The patch also disables sg list chaining for libata.
-

From: Torsten Kaiser
Date: Monday, September 10, 2007 - 1:43 pm

With this patch 2.6.23-rc4-mm1 works for me.
Mainline 2.6.23-rc5-git1 works also without needing any patches.

Torsten
-

From: Jens Axboe
Date: Tuesday, September 11, 2007 - 1:32 am

OK, thanks for testing that. I'll merge Tomo's patch so that we can
selectively enable drivers when we KNOW they work, instead of trying to
do this (massive) operation wholesale.

-- 
Jens Axboe

-

From: FUJITA Tomonori
Date: Monday, September 10, 2007 - 12:10 pm

On Mon, 10 Sep 2007 11:19:26 -0700

Can you try this patch (against 2.6.23-rc4-mm1)?

From 592bd2049cb3e6e1f1dde7cf631879f26ddffeaa Mon Sep 17 00:00:00 2001
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Mon, 10 Sep 2007 04:17:13 +0100
Subject: [PATCH] qla1280: sg chaining fixes

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 drivers/scsi/qla1280.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/qla1280.c b/drivers/scsi/qla1280.c
index bd805ec..7c1eaec 100644
--- a/drivers/scsi/qla1280.c
+++ b/drivers/scsi/qla1280.c
@@ -2977,8 +2977,8 @@ qla1280_64bit_start_scsi(struct scsi_qla_host *ha, struct srb * sp)
 						cpu_to_le32(pci_dma_hi32(dma_handle)),
 						cpu_to_le32(pci_dma_lo32(dma_handle)),
 						cpu_to_le32(sg_dma_len(s)));
-					remseg--;
 				}
+				remseg -= cnt;
 				dprintk(5, "qla1280_64bit_start_scsi: "
 					"continuation packet data - b %i, t "
 					"%i, l %i \n", SCSI_BUS_32(cmd),
@@ -3250,6 +3250,8 @@ qla1280_32bit_start_scsi(struct scsi_qla_host *ha, struct srb * sp)
 
 				/* Load continuation entry data segments. */
 				for_each_sg(sg, s, remseg, cnt) {
+					if (cnt == 7)
+						break;
 					*dword_ptr++ =
 						cpu_to_le32(pci_dma_lo32(sg_dma_address(s)));
 					*dword_ptr++ =
@@ -3260,6 +3262,7 @@ qla1280_32bit_start_scsi(struct scsi_qla_host *ha, struct srb * sp)
 						cpu_to_le32(pci_dma_lo32(sg_dma_address(s))),
 						cpu_to_le32(sg_dma_len(s)));
 				}
+				remseg -= cnt;
 				dprintk(5, "qla1280_32bit_start_scsi: "
 					"continuation packet data - "
 					"scsi(%i:%i:%i)\n", SCSI_BUS_32(cmd),
-- 
1.5.2.4


-
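One reading of the fix above: in the 64-bit path, `remseg` was decremented inside a `for_each_sg()` loop whose bound is `remseg` itself, so the bound shrank while the loop ran; the 32-bit continuation loop gains the 7-slot cap. Both now subtract `cnt` once, after the loop. A minimal userspace sketch of that loop shape follows; the helper names are hypothetical and only the cap of 7 comes from the patch.

```c
#include <assert.h>

/* Hypothetical helpers modelling one continuation entry. cap is the
 * number of segment slots per entry; *remseg is the count of segments
 * still to be loaded. Each returns how many segments went into the entry. */

/* Pre-fix shape: the loop bound is *remseg, and the body decrements
 * *remseg too, so the bound shrinks while the loop runs. */
static int load_entry_buggy(int *remseg, int cap)
{
	int cnt;

	for (cnt = 0; cnt < *remseg; cnt++) {
		if (cnt == cap)
			break;
		(*remseg)--;	/* moves the loop bound mid-loop */
	}
	return cnt;
}

/* Post-fix shape: iterate against a stable bound, stop when the entry
 * is full, then subtract everything loaded in one step. */
static int load_entry_fixed(int *remseg, int cap)
{
	int cnt;

	for (cnt = 0; cnt < *remseg; cnt++) {
		if (cnt == cap)
			break;	/* entry full */
	}
	*remseg -= cnt;
	return cnt;
}
```

With 10 segments left and 7 slots, the fixed helper loads 7 and leaves 3; the buggy one stops after 5 even though the entry had room for 7.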

From: Andy Whitcroft
Date: Thursday, September 13, 2007 - 10:34 am

Yep this patch seems to sort out booting on these boxes.  The other one

-apw
-

From: Paul Jackson
Date: Friday, September 14, 2007 - 9:16 pm

This patch works for me.

I was getting the scsi errors reported earlier in
this thread, running 2.6.23-rc4-mm1 on one of our
big SGI Altix systems.

Applying this patch fixed it, so far as I can tell,
which is to say my system boots cleanly once again.

Thanks.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401
-

From: FUJITA Tomonori
Date: Saturday, September 15, 2007 - 3:52 am

On Fri, 14 Sep 2007 21:16:35 -0700

Thanks for testing!

Jens, we can now enable the use_sg_chaining option for qla1280.


From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH] qla1280: enable use_sg_chaining option

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 drivers/scsi/qla1280.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/qla1280.c b/drivers/scsi/qla1280.c
index 7c1eaec..83249af 100644
--- a/drivers/scsi/qla1280.c
+++ b/drivers/scsi/qla1280.c
@@ -4259,6 +4259,7 @@ static struct scsi_host_template qla1280_driver_template = {
 	.sg_tablesize		= SG_ALL,
 	.cmd_per_lun		= 1,
 	.use_clustering		= ENABLE_CLUSTERING,
+	.use_sg_chaining	= ENABLE_SG_CHAINING,
 };
 
 
-- 
1.5.2.4

-

From: Jens Axboe
Date: Monday, September 17, 2007 - 6:28 am

Added, thanks!

-- 
Jens Axboe

-

From: FUJITA Tomonori
Date: Monday, September 17, 2007 - 7:32 am

On Mon, 17 Sep 2007 15:28:19 +0200

Thanks.

BTW, please don't forget to integrate the following patches:


- revert sg segment size ifdefs

http://marc.info/?l=linux-scsi&m=118881264013097&w=2

- remove sglist_len

http://marc.info/?l=linux-scsi&m=118907920405100&w=2
-

From: Jens Axboe
Date: Tuesday, September 18, 2007 - 3:18 am

Added, and I rebased the sglist-* branches to current again. So
everything should be fully up to date once more.

-- 
Jens Axboe

-

From: FUJITA Tomonori
Date: Tuesday, September 18, 2007 - 5:25 am

On Tue, 18 Sep 2007 12:18:40 +0200

Thanks, here are a few more things.

- please drop the iscsi patch since Mike has major changes to iscsi
I/O path.

- ipr sg chaining needs to be disabled since libata is not ready.

- you can add Doug's ACK to scsi_debug patch:

http://marc.info/?l=linux-scsi&m=118926325931801&w=2
-

From: Jens Axboe
Date: Tuesday, September 18, 2007 - 5:51 am

All done.

-- 
Jens Axboe

-

From: FUJITA Tomonori
Date: Monday, September 10, 2007 - 12:31 pm

On Mon, 10 Sep 2007 11:19:26 -0700

Even if we revert the qla1280 patch, scsi-ml still sends chained sg
lists, so that won't work.

The following patch disables sg list chaining for qla1280. If the fix
that I've just sent doesn't work, please try this.

-
From: FUJITA Tomonori <tomof@acm.org>
Subject: [PATCH] add use_sg_chaining option to scsi_host_template

This option is true if a low-level driver can support sg
chaining. This will be removed eventually when all the drivers are
converted to support sg chaining. q->max_phys_segments is set to
SCSI_MAX_SG_SEGMENTS if false.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 arch/ia64/hp/sim/simscsi.c            |    1 +
 drivers/scsi/3w-9xxx.c                |    1 +
 drivers/scsi/3w-xxxx.c                |    1 +
 drivers/scsi/BusLogic.c               |    1 +
 drivers/scsi/NCR53c406a.c             |    3 ++-
 drivers/scsi/a100u2w.c                |    1 +
 drivers/scsi/aacraid/linit.c          |    1 +
 drivers/scsi/aha1740.c                |    1 +
 drivers/scsi/aic7xxx/aic79xx_osm.c    |    1 +
 drivers/scsi/aic7xxx/aic7xxx_osm.c    |    1 +
 drivers/scsi/aic7xxx_old.c            |    1 +
 drivers/scsi/arcmsr/arcmsr_hba.c      |    1 +
 drivers/scsi/dc395x.c                 |    1 +
 drivers/scsi/dpt_i2o.c                |    1 +
 drivers/scsi/eata.c                   |    3 ++-
 drivers/scsi/hosts.c                  |    1 +
 drivers/scsi/hptiop.c                 |    1 +
 drivers/scsi/ibmmca.c                 |    1 +
 drivers/scsi/ibmvscsi/ibmvscsi.c      |    1 +
 drivers/scsi/initio.c                 |    1 +
 drivers/scsi/ipr.c                    |    1 +
 drivers/scsi/lpfc/lpfc_scsi.c         |    2 ++
 drivers/scsi/mac53c94.c               |    1 +
 drivers/scsi/megaraid.c               |    1 +
 drivers/scsi/megaraid/megaraid_mbox.c |    1 +
 drivers/scsi/megaraid/megaraid_sas.c  |    1 +
 drivers/scsi/mesh.c                   |    1 +
 drivers/scsi/nsp32.c                  |    1 ...
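The opt-in described in this patch (drivers set `.use_sg_chaining = ENABLE_SG_CHAINING`, and queues for the others get `max_phys_segments` capped at `SCSI_MAX_SG_SEGMENTS`) is what lets chaining be enabled driver by driver. A minimal sketch of that decision, assuming a separate, larger limit for drivers that opt in; the struct, function and both numeric limits below are hypothetical stand-ins, not the kernel's values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limits for illustration only; the real values come from
 * the SCSI headers and may differ. */
#define MOCK_SCSI_MAX_SG_SEGMENTS	128
#define MOCK_SCSI_MAX_SG_CHAIN_SEGMENTS	2048

/* Cut-down stand-in for struct scsi_host_template. */
struct mock_host_template {
	const char *name;
	bool use_sg_chaining;	/* set only by drivers known to handle chains */
};

/* Per-queue segment limit: drivers that have not opted in stay at the
 * non-chaining limit, so they never see a chained list. */
static unsigned int mock_max_phys_segments(const struct mock_host_template *t)
{
	return t->use_sg_chaining ? MOCK_SCSI_MAX_SG_CHAIN_SEGMENTS
				  : MOCK_SCSI_MAX_SG_SEGMENTS;
}
```

Defaulting the flag to false keeps unconverted drivers on their old behaviour; each driver is switched over only after someone has tested it.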
From: Andy Whitcroft
Date: Friday, September 14, 2007 - 1:10 am

On Tue, Sep 11, 2007 at 04:31:12AM +0900, FUJITA Tomonori wrote:

Ok, the other patch _did_ work, but this got tested anyhow and it did

-apw
-

From: Torsten Kaiser
Date: Friday, September 14, 2007 - 6:01 am

Sorry to confirm this. My RAID5 got destroyed a second time.
To summarize what worked / not worked / and seems to work for me:

First 2 tries with unpatched rc4-mm1: Both times one sata_sil24 drive got kicked
Then I switched back to rc3-mm1, 18 boots with that kernel worked.
Then I tried the patched rc4-mm1 and it worked too.
The next boot also worked, but the third time kicked a drive out again.
But as nobody reads logs, I did not notice that and kept using the
patched rc4-mm1.
The next 5 times the system worked normally with the two remaining drives.
The sixth boot kicked the second sata_sil24 drive. That I did notice...
After reassembling the RAID, I'm now back to the patched rc4-mm1, which
did boot correctly this time.
So the patch just makes it unlikelier to hit the bug. Instead of
failing 2 out of 2 times, it only failed 2 out of 8 times.
I compared the rc4-mm1 boot from a working case and the case where it
kicked the first drive. Nothing seems to stand out...


145c145
< CPU 0: aperture @ 4000000 size 32 MB
154c154
< Calibrating delay using timer specific routine.. 5203.23 BogoMIPS (lpj=26016160)
169c169
< APIC timer calibration result 12499998
173c173
< Calibrating delay using timer specific routine.. 5222.40 BogoMIPS (lpj=26112010)
182c182
< Calibrating delay using timer specific routine.. 5222.73 BogoMIPS (lpj=26113694)
191c191
< Calibrating delay using timer specific routine.. 5223.07 BogoMIPS (lpj=26115369)
269d268
< Switched to high resolution mode on CPU 3
502,509c502,509
< raid6: int64x1   2634 MB/s
< raid6: int64x2   3244 MB/s
< raid6: int64x4   3405 MB/s
< raid6: int64x8   2614 MB/s
< raid6: sse2x1    3607 MB/s
< raid6: sse2x2    4834 MB/s
< raid6: sse2x4    4946 MB/s
< raid6: using algorithm sse2x4 (4946 MB/s)
567c567
< md1: bitmap initialized from disk: read 10/10 pages, set 96 bits

Another good boot also showed the aperture at a similar high address:
CPU 0: aperture @ b7f2000000 size 32 MB
And that good boot also showed the "correct" ...
From: Andrew Morton
Date: Friday, September 14, 2007 - 1:15 pm

Let's keep linux-ide cc'ed, please.
-

From: Laurent Riffard
Date: Monday, September 10, 2007 - 1:19 pm

On 01.09.2007 06:58, Andrew Morton wrote:
From: Laurent Riffard
Date: Thursday, September 13, 2007 - 3:50 pm

I dug through git-block.patch and the culprit seems to be commit
c94f1c4ac87862675c8d70941973bc3a69aff5d8 "bio: use memset() in
bio_init()".

Maybe the real bug is a bad bio initialization in the pktcdvd driver,
-

From: Andrew Morton
Date: Thursday, September 13, 2007 - 4:05 pm

On Fri, 14 Sep 2007 00:50:25 +0200

I think I'll be dropping git-block.  There were a number of problems
in rc4-mm1 (for which I have a sprinkling of messy-looking patches
somewhere ahead of my current cursor) and nothing seems to have
happened in the git tree for a month or so.
-

From: Jens Axboe
Date: Friday, September 14, 2007 - 1:00 am

Huh? It's not even two weeks old. And here we go again, git-block is
getting dropped and you'll be complaining about lack of testing next.
I'll update the branches today as discussed with Tomo, that should work
fine.

-- 
Jens Axboe

-

From: Jens Axboe
Date: Friday, September 14, 2007 - 1:30 am

Branches updated with the scsi host template addition and the qla sg
chaining fix.

-- 
Jens Axboe

-

From: Jens Axboe
Date: Friday, September 14, 2007 - 2:33 am

At least pktcdvd doesn't expect bio->bi_io_vec[] to be cleared, that's
why it's oopsing now. I'll revert this bit for now, thanks for the
report.

-- 
Jens Axboe

-
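Jens's diagnosis can be illustrated with a userspace mock (the types and functions below are hypothetical, not the kernel's `struct bio`). The old `bio_init()` reset members one by one and did not touch `bi_io_vec` or `bi_destructor`, so pktcdvd could re-init a preallocated bio and keep its vector; after "bio: use memset() in bio_init()", every member is cleared. The field-by-field version is simplified to two members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down bio with only the members this thread mentions. */
struct mock_bio_vec {
	unsigned int bv_len;
};

struct mock_bio {
	unsigned long bi_sector;
	unsigned short bi_max_vecs;
	struct mock_bio_vec *bi_io_vec;
	void (*bi_destructor)(struct mock_bio *);
};

/* Old style (simplified): reset individual members, leaving bi_io_vec
 * and bi_destructor as they were. */
static void mock_bio_init_fieldwise(struct mock_bio *bio)
{
	bio->bi_sector = 0;
	bio->bi_max_vecs = 0;
}

/* New style: wipe the whole structure, pointers included. */
static void mock_bio_init_memset(struct mock_bio *bio)
{
	memset(bio, 0, sizeof(*bio));
}
```

Code that reuses a bio and expects `bi_io_vec` to survive the re-init works with the first version and ends up with a NULL vector with the second.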

From: Jens Axboe
Date: Friday, September 14, 2007 - 4:06 am

Rethinking this, I think bio_init() is doing the right thing; only
pktcdvd seems to rely on it preserving some members. So I'd rather fix up
pktcdvd instead.

Does this work for you?

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index fadbfd8..98343a1 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1142,16 +1142,20 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
 	 * Schedule reads for missing parts of the packet.
 	 */
 	for (f = 0; f < pkt->frames; f++) {
+		struct bio_vec *vec;
+
 		int p, offset;
 		if (written[f])
 			continue;
 		bio = pkt->r_bios[f];
+		vec = bio->bi_io_vec;
 		bio_init(bio);
 		bio->bi_max_vecs = 1;
 		bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
 		bio->bi_bdev = pd->bdev;
 		bio->bi_end_io = pkt_end_io_read;
 		bio->bi_private = pkt;
+		bio->bi_io_vec = vec;
 
 		p = (f * CD_FRAMESIZE) / PAGE_SIZE;
 		offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
@@ -1448,6 +1452,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 	pkt->w_bio->bi_bdev = pd->bdev;
 	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
 	pkt->w_bio->bi_private = pkt;
+	pkt->w_bio->bi_io_vec = bvec;
 	for (f = 0; f < pkt->frames; f++)
 		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
 			BUG();

-- 
Jens Axboe

-

From: Laurent Riffard
Date: Friday, September 14, 2007 - 12:04 pm

Well, it's better: I was able to mount the DVD-RW, sync, and write data,
but kernel oopsed when I unmounted the drive:

[  529.295829] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
[  529.296490] printing eip: 00000000 *pde = 00000000 
[  529.297106] Oops: 0000 [#1] PREEMPT 
[  529.297702] last sysfs file: /block/pktcdvd0/range
[  529.298284] Modules linked in: udf binfmt_misc pktcdvd radeon drm lp nls_iso8859_1 nls_cp850 vfat fat reiser4 lzo_decompress lzo_compress eeprom w83781d hwmon_vid snd_ens1371 gameport snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event firewire_ohci firewire_core snd_seq crc_itu_t sg snd_timer snd_seq_device 8250_pnp snd sr_mod cdrom rtc ohci1394 i2c_viapro 8250 serial_core uhci_hcd soundcore snd_page_alloc floppy pcspkr ne2k_pci 8390 parport_pc via686a ieee1394 usbcore parport ata_generic via_agp agpgart evdev reiserfs sd_mod pata_via libata scsi_mod dm_mirror dm_mod
[  529.302127] 
[  529.302785] Pid: 3718, comm: umount Not tainted (2.6.23-rc4-mm1 #73)
[  529.303493] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
[  529.304207] EIP is at _stext+0x3feff000/0x19
[  529.304911] EAX: c30ded90 EBX: cb110da8 ECX: 00000000 EDX: c30ded90
[  529.305640] ESI: 00000001 EDI: cb0c7748 EBP: cb1dfe98 ESP: cb1dfe90
[  529.306389]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[  529.307136] Process umount (pid: 3718, ti=cb1df000 task=c27157b0 task.ti=cb1df000)
[  529.307213] Stack: c017b4bf 00000000 cb1dfeb0 e1c0e57a cb1115d8 cb0c7748 c1e4a828 c26663c8 
[  529.308122]        cb1dfec4 e1c0e650 cb1dfec4 c017c15f 00000000 cb1dfee4 c017c8f3 c1e4a834 
[  529.309040]        00000000 c1e4a8bc c1e4a828 e1f12ea0 00000000 cb1dfeec c017c9ab cb1dfef8 
[  529.309972] Call Trace:
[  529.311464]  [show_trace_log_lvl+26/47] show_trace_log_lvl+0x1a/0x2f
[  529.312264]  [show_stack_log_lvl+155/163] show_stack_log_lvl+0x9b/0xa3
[  529.313056]  [show_registers+160/482] show_registers+0xa0/0x1e2
[ ...
From: Laurent Riffard
Date: Thursday, September 20, 2007 - 2:25 pm

Jens,

this patch, applied on top of your previous patch, solved it.



pktcdvd: don't rely on bio_init() preserving bio->bi_destructor

Signed-off-by: Laurent Riffard <laurent.riffard@free.fr>
---
 drivers/block/pktcdvd.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6-mm/drivers/block/pktcdvd.c
===================================================================
--- linux-2.6-mm.orig/drivers/block/pktcdvd.c
+++ linux-2.6-mm/drivers/block/pktcdvd.c
@@ -1156,6 +1156,7 @@ static void pkt_gather_data(struct pktcd
 		bio->bi_end_io = pkt_end_io_read;
 		bio->bi_private = pkt;
 		bio->bi_io_vec = vec;
+		bio->bi_destructor = pkt_bio_destructor;
 
 		p = (f * CD_FRAMESIZE) / PAGE_SIZE;
 		offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
@@ -1453,6 +1454,7 @@ static void pkt_start_write(struct pktcd
 	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
 	pkt->w_bio->bi_private = pkt;
 	pkt->w_bio->bi_io_vec = bvec;
+	pkt->w_bio->bi_destructor = pkt_bio_destructor;
 	for (f = 0; f < pkt->frames; f++)
 		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
 			BUG();




-
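Taken together, Jens's patch and this follow-up make pktcdvd save its private `bi_io_vec` before `bio_init()` and restore it, together with `bi_destructor`, afterwards. The umount oops (EIP 00000000) is consistent with a later call through a destructor pointer that `bio_init()` had cleared. A minimal userspace sketch of the reuse pattern; all names are hypothetical stand-ins, with `mock_pkt_bio_destructor` playing the role of `pkt_bio_destructor`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cut-down bio; not the kernel's struct bio. */
struct mock_bio_vec {
	unsigned int bv_len;
};

struct mock_bio {
	unsigned long bi_sector;
	struct mock_bio_vec *bi_io_vec;
	void (*bi_destructor)(struct mock_bio *);
};

static int destructor_calls;

/* Stand-in for pktcdvd's pkt_bio_destructor. */
static void mock_pkt_bio_destructor(struct mock_bio *bio)
{
	(void)bio;
	destructor_calls++;
}

/* memset-style init, as in the new bio_init(). */
static void mock_bio_init(struct mock_bio *bio)
{
	memset(bio, 0, sizeof(*bio));
}

/* Reuse a preallocated bio: save the private vector, re-init, then
 * restore the vector and the destructor. */
static void mock_reuse_bio(struct mock_bio *bio, unsigned long sector)
{
	struct mock_bio_vec *vec = bio->bi_io_vec;

	mock_bio_init(bio);
	bio->bi_sector = sector;
	bio->bi_io_vec = vec;
	bio->bi_destructor = mock_pkt_bio_destructor;
}

/* Final release calls through the pointer; had the destructor been left
 * NULL, this call would jump to address 0. */
static void mock_bio_put(struct mock_bio *bio)
{
	bio->bi_destructor(bio);
}
```

Restoring members after the init, rather than making `bio_init()` skip them, keeps `bio_init()` simple and puts the knowledge of which members pktcdvd owns in pktcdvd itself.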

From: Jens Axboe
Date: Thursday, September 20, 2007 - 10:19 pm

Ah great, thanks for following up on this! Applied.

-- 
Jens Axboe

-
