Re: 2.6.25-rc8-mm1 sparc64 build problem: size of array 'type name' is negative

From: Andrew Morton
Date: Tuesday, April 1, 2008 - 9:32 pm

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc8/2.6.25-rc8-mm1/

- Added the wm97xx touchscreen driver tree, as git-wm97xx.patch (Mark Brown
  <broonie@opensource.wolfsonmicro.com>)

- git-alsa.patch has been replaced by git-alsa-tiwai.patch

- git-drm.patch is dropped due to build errors

- git-md-accel.patch has been replaced with git-async_tx.patch

- git-xfs.patch is dropped due to extensive git rejects.

- Added the VFS git tree, as git-vfs.patch (Al Viro <viro@zeniv.linux.org.uk>)



Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.  These probably are at least ...
From: Dmitri Vorobiev
Date: Tuesday, April 1, 2008 - 10:40 pm

Hi Andrew,

MIPS build fails with the following:

$ make ARCH=mips CROSS_COMPILE=mips-unknown-linux-gnu-
...
[skipped]
...
  CC      arch/mips/mips-boards/generic/init.o
In file included from include/asm/cacheflush.h:13,
                 from arch/mips/mips-boards/generic/init.c:30:
include/linux/mm.h:411:63: "NR_PAGEFLAGS" is not defined
include/linux/mm.h:459:62: "NR_PAGEFLAGS" is not defined
make[1]: *** [arch/mips/mips-boards/generic/init.o] Error 1
make: *** [arch/mips/mips-boards/generic] Error 2

Thanks,
Dmitri
--

From: Andrew Morton
Date: Tuesday, April 1, 2008 - 11:03 pm

ahh, yup, known problem, sorry.  We are slowly working on a fix.
--

From: Christoph Lameter
Date: Wednesday, April 2, 2008 - 10:33 am

This is the fix that I posted a couple of days ago after Andrew noted the
problem:

From: Christoph Lameter <clameter@sgi.com>
Subject: Allow override of definition for asm constant

MIPS has a different way of defining asm constants which causes trouble
for bounds.h generation (see also the Kbuild script).

Add a new per arch CONFIG variable

	CONFIG_ASM_SYMBOL_PREFIX

which can be set to define an alternate prefix for asm constant definitions.
Use this for MIPS to make bounds determination work right.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 arch/mips/Kconfig |    7 +++++++
 kernel/bounds.c   |   11 ++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

Index: linux-2.6.25-rc5-mm1/arch/mips/Kconfig
===================================================================
--- linux-2.6.25-rc5-mm1.orig/arch/mips/Kconfig	2008-03-31 13:14:26.888383587 -0700
+++ linux-2.6.25-rc5-mm1/arch/mips/Kconfig	2008-03-31 13:14:28.028403612 -0700
@@ -2019,6 +2019,13 @@ config I8253
 config ZONE_DMA32
 	bool
 
+#
+# Used to override gas symbol setup in kernel/bounds.c.
+#
+config ASM_SYMBOL_PREFIX
+	string
+	default "@@@#define "
+
 source "drivers/pcmcia/Kconfig"
 
 source "drivers/pci/hotplug/Kconfig"
Index: linux-2.6.25-rc5-mm1/kernel/bounds.c
===================================================================
--- linux-2.6.25-rc5-mm1.orig/kernel/bounds.c	2008-03-31 13:14:26.904383870 -0700
+++ linux-2.6.25-rc5-mm1/kernel/bounds.c	2008-03-31 13:14:28.028403612 -0700
@@ -9,8 +9,17 @@
 #include <linux/page-flags.h>
 #include <linux/mmzone.h>
 
+#ifdef CONFIG_ASM_SYMBOL_PREFIX
+#define PREFIX CONFIG_ASM_SYMBOL_PREFIX
+#else
+/*
+ * Standard gas way of defining an asm symbol
+ */
+#define PREFIX "->"
+#endif
+
 #define DEFINE(sym, val) \
-	asm volatile("\n->" #sym " %0 " #val : : "i" (val))
+	asm volatile("\n" PREFIX #sym " %0 " : : "i" (val))
 
 #define BLANK() asm volatile("\n->" : :)
 
--

From: Andrew Morton
Date: Wednesday, April 2, 2008 - 11:29 am

On Wed, 2 Apr 2008 10:33:32 -0700 (PDT)

I'm obviously missing something here.

i386 generates

->NR_PAGEFLAGS $18 __NR_PAGEFLAGS       #

mips generates

->NR_PAGEFLAGS 18 __NR_PAGEFLAGS         #

The only difference is the "$".  This can be trivially handled in the sed
expression which filters this .s file.

Why are we diddling with that "->" thing, and why does it even exist?
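
A minimal sketch in C of what such a filter does with the two lines above, assuming the input format is `->NAME [$|#]VALUE COMMENT`. The function name is hypothetical and this is not the Kbuild sed script itself, only an illustration of why the "$" is easy to absorb:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn one "->NAME [$#]VALUE COMMENT" line from the
 * compiler-generated .s file into a #define line, with COMMENT carried
 * along as a trailing C comment.
 * Returns 0 on success, -1 if the line does not match.
 */
int offsets_line_to_define(const char *line, char *out, size_t outlen)
{
	const char *name, *value, *comment;
	size_t name_len, value_len;

	if (strncmp(line, "->", 2) != 0)
		return -1;
	name = line + 2;
	name_len = strcspn(name, " ");
	if (name_len == 0 || name[name_len] != ' ')
		return -1;

	value = name + name_len + 1;
	value += strspn(value, "$#");	/* i386 emits "$18", mips emits "18" */
	value_len = strcspn(value, " ");
	if (value_len == 0 || value[value_len] != ' ')
		return -1;

	comment = value + value_len + 1;
	snprintf(out, outlen, "#define %.*s %.*s /* %s */",
		 (int)name_len, name, (int)value_len, value, comment);
	return 0;
}
```

With this, the i386 and mips lines both come out as the same `#define NR_PAGEFLAGS 18` line.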
--

From: Christoph Lameter
Date: Wednesday, April 2, 2008 - 11:33 am

For some reason the asm-offset.c for mips generates it differently and the 

Maybe the simple solution is to drop the strange mips way of doing things 

I guess this is some convention to allow the Kbuild sed script to extract
the value. There must be some reason that they added the strange prefix.

ccing Sam who may shed some light on this.
--

From: Sam Ravnborg
Date: Wednesday, April 2, 2008 - 12:06 pm

When the asm-offsets stuff was consolidated, the mips variant
did not match the others.
I do not recall if I ever tried this on a mips tool-chain, and as
my dev box is busted atm I cannot even test it out now.

I would be happy if we could kill the MIPS specific sed expression
in the top-level Kbuild file.

Ralf - can you take a look at this and see if mips really generates
different assembler syntax which warrants the different sed expression.

If mips really needs a different sed expression then we should adjust
it so the output is similar to the other archs.

	Sam
--

From: Ralf Baechle
Date: Thursday, April 3, 2008 - 9:02 am

The reason for MIPS doing things a little differently is that the resulting
<asm/asm-offsets.h> doesn't look like machine generated gibberish.  So
how about the patch below, which combines the two sed expressions?

  Ralf

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

diff --git a/Kbuild b/Kbuild
index 7136de7..2bd4a3c 100644
--- a/Kbuild
+++ b/Kbuild
@@ -52,10 +52,8 @@ targets += arch/$(SRCARCH)/kernel/asm-offsets.s
 
 # Default sed regexp - multiline due to syntax constraints
 define sed-y
-	"/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}"
+	"/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}; /^@@@/{s/^@@@//; s/ \#.*\$$//; p;};"
 endef
-# Override default regexp for specific architectures
-sed-$(CONFIG_MIPS) := "/^@@@/{s/^@@@//; s/ \#.*\$$//; p;}"
 
 quiet_cmd_offsets = GEN     $@
 define cmd_offsets
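
A rough C rendering of the added `@@@` branch, assuming it behaves as the expression reads: drop the `@@@` marker and cut the line at the first " #". The helper name and the sample inputs in the note below are made up for illustration:

```c
#include <string.h>

/*
 * Hypothetical helper mirroring the "@@@" branch: copy the text after the
 * "@@@" marker into out and cut it at the first " #" (the trailing
 * assembler comment).  Returns 0 on success, -1 if the line does not
 * start with "@@@" or the result does not fit in out.
 */
int mips_text_line(const char *line, char *out, size_t outlen)
{
	const char *body, *cut;
	size_t len;

	if (strncmp(line, "@@@", 3) != 0)
		return -1;
	body = line + 3;
	cut = strstr(body, " #");
	len = cut ? (size_t)(cut - body) : strlen(body);
	if (len >= outlen)
		return -1;
	memcpy(out, body, len);
	out[len] = '\0';
	return 0;
}
```

A line such as `@@@#define NR_PAGEFLAGS 18 # __NR_PAGEFLAGS` would come out as `#define NR_PAGEFLAGS 18`, and a free-form string emitted through text() passes through with only the marker removed.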
--

From: Christoph Lameter
Date: Thursday, April 3, 2008 - 3:17 pm

Well, but it is machine generated, and it may be best if mips did the
same as what is done in the other arches? We do not want special arch
cases in Kbuild.

How about this patch?


Subject: Standardize mips asm-offsets.c somewhat

mips uses a different pattern to signal a constant in the asm code generated
by asm-offsets.c which in turn requires special handling in Kbuild and 
causes trouble for the new mechanism to count the number of page flags.

Remove the special handling and make mips use the same string as all the
other arches (->).

It seems that MIPS tried to have nice looking asm output. Sadly this
patch disturbs that nice formatting somewhat and makes it look like asm
output for any other arch.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 Kbuild                         |    2 
 arch/mips/kernel/asm-offsets.c |  392 ++++++++++++++++++++---------------------
 2 files changed, 196 insertions(+), 198 deletions(-)

Index: linux-2.6.25-rc8-mm1/Kbuild
===================================================================
--- linux-2.6.25-rc8-mm1.orig/Kbuild	2008-04-03 14:53:38.581697916 -0700
+++ linux-2.6.25-rc8-mm1/Kbuild	2008-04-03 14:53:41.411694858 -0700
@@ -54,8 +54,6 @@ targets += arch/$(SRCARCH)/kernel/asm-of
 define sed-y
 	"/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}"
 endef
-# Override default regexp for specific architectures
-sed-$(CONFIG_MIPS) := "/^@@@/{s/^@@@//; s/ \#.*\$$//; p;}"
 
 quiet_cmd_offsets = GEN     $@
 define cmd_offsets
Index: linux-2.6.25-rc8-mm1/arch/mips/kernel/asm-offsets.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/arch/mips/kernel/asm-offsets.c	2008-04-03 14:53:38.601695308 -0700
+++ linux-2.6.25-rc8-mm1/arch/mips/kernel/asm-offsets.c	2008-04-03 14:59:46.939017142 -0700
@@ -20,193 +20,193 @@
 #define text(t) __asm__("\n@@@" t)
 #define _offset(type, member) (&(((type *)NULL)->member))
 #define offset(string, ptr, ...
From: Dmitri Vorobiev
Date: Thursday, April 3, 2008 - 4:26 pm

I confirm that with this patch applied, the kernel build succeeds. Did
not try to boot it, though.

Thanks,

--

From: Ralf Baechle
Date: Friday, April 4, 2008 - 3:24 am

Almost.  It compiles into a usable header but breaks the text() macro
which is used to emit a comment (actually any string literal) into the

With your patch nothing will be emitted.  The existing non-MIPS sed
expression in Kbuild doesn't allow for that, which is why I added the
handling of @@@-prefixed strings to the sed expression.  And once that
is there the remaining asm-offsets.c change is no longer needed.

  Ralf
--

From: Christoph Lameter
Date: Friday, April 4, 2008 - 10:36 am

The text macro still emits the same text. Nothing is changed. Why does 

Why would kbuild have to handle comments?

--

From: Christoph Lameter
Date: Friday, April 4, 2008 - 10:50 am

Ahh you want to insert comments into the generated
include/asm-*/asm-offsets.h. Hmmm, the header comments state that it was
generated so one would hopefully look at the source file instead?

If we want comments etc in there then we may want to do it in some
standardized fashion that works across all arches. Most of the
arch/*/asm-offsets.c contents are exactly the same. Mips is deviating the
most. If we could put some of the common stuff into common header files
then this may turn out to be a nice code cleanup.

--

From: Valdis.Kletnieks
Date: Tuesday, April 1, 2008 - 11:04 pm

Dell Latitude D820, Core2 T7200, x86_64.

Built my usual .config cleanly, booted OK, has gone for a half hour
of fairly representative usage without any oopsen or other dmesg surprises...
From: Andrew Morton
Date: Tuesday, April 1, 2008 - 11:15 pm

Yes, it passed testing on my six test machines without any runtime problems
at all.  Weird.

Lots of compile-time problems, but that's usual.
--

From: Kamalesh Babulal
Date: Tuesday, April 1, 2008 - 11:25 pm

Hi Andrew,

The 2.6.25-rc8-mm1 kernel panics during bootup on the power machine(s).

[    0.000000] ------------[ cut here ]------------
[    0.000000] kernel BUG at arch/powerpc/mm/init_64.c:240!
[    0.000000] Oops: Exception in kernel mode, sig: 5 [#1]
[    0.000000] SMP NR_CPUS=32 NUMA PowerMac
[    0.000000] Modules linked in:
[    0.000000] NIP: c0000000003d1dcc LR: c0000000003d1dc4 CTR: c00000000002b6ac
[    0.000000] REGS: c00000000049b960 TRAP: 0700   Not tainted  (2.6.25-rc8-mm1-autokern1)
[    0.000000] MSR: 9000000000021032 <ME,IR,DR>  CR: 44000088  XER: 20000000
[    0.000000] TASK = c0000000003f9c90[0] 'swapper' THREAD: c000000000498000 CPU: 0
[    0.000000] GPR00: c0000000003d1dc4 c00000000049bbe0 c0000000004989d0 0000000000000001 
[    0.000000] GPR04: d59aca40f0000000 000000000b000000 0000000000000010 0000000000000000 
[    0.000000] GPR08: 0000000000000004 0000000000000001 c00000027e520800 c0000000004bf0f0 
[    0.000000] GPR12: c0000000004bf020 c0000000003fa900 0000000000000000 0000000000000000 
[    0.000000] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[    0.000000] GPR20: 0000000000000000 0000000000000000 0000000000000000 4000000001400000 
[    0.000000] GPR24: 00000000017d64b0 c0000000003d6250 0000000000000000 c000000000504000 
[    0.000000] GPR28: 0000000000000000 cf000000001f8000 0000000001000000 cf00000000000000 
[    0.000000] NIP [c0000000003d1dcc] .vmemmap_populate+0xb8/0xf4
[    0.000000] LR [c0000000003d1dc4] .vmemmap_populate+0xb0/0xf4
[    0.000000] Call Trace:
[    0.000000] [c00000000049bbe0] [c0000000003d1dc4] .vmemmap_populate+0xb0/0xf4 (unreliable)
[    0.000000] [c00000000049bc70] [c0000000003d2ee8] .sparse_mem_map_populate+0x38/0x60
[    0.000000] [c00000000049bd00] [c0000000003c242c] .sparse_early_mem_map_alloc+0x54/0x94
[    0.000000] [c00000000049bd90] [c0000000003c250c] .sparse_init+0xa0/0x20c
[    0.000000] [c00000000049be50] [c0000000003ab7d0] .setup_arch+0x1ac/0x218
[    0.000000] [c00000000049bee0] ...
From: Andrew Morton
Date: Tuesday, April 1, 2008 - 11:39 pm

int __meminit vmemmap_populate(struct page *start_page,
					unsigned long nr_pages, int node)
{
	unsigned long mode_rw;
	unsigned long start = (unsigned long)start_page;
	unsigned long end = (unsigned long)(start_page + nr_pages);
	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;

	mode_rw = _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_COHERENT | PP_RWXX;

	/* Align to the page size of the linear mapping. */
	start = _ALIGN_DOWN(start, page_size);

	for (; start < end; start += page_size) {
		int mapped;
		void *p;

		if (vmemmap_populated(start, page_size))
			continue;

		p = vmemmap_alloc_block(page_size, node);
		if (!p)
			return -ENOMEM;

		pr_debug("vmemmap %08lx allocated at %p, physical %08lx.\n",
			start, p, __pa(p));

		mapped = htab_bolt_mapping(start, start + page_size,
					__pa(p), mode_rw, mmu_linear_psize,
					mmu_kernel_ssize);
=====>		BUG_ON(mapped < 0);
	}

	return 0;
}

Beats me.  pseries?  Badari has been diddling with the bolted memory code
in git-powerpc...
--

From: Kamalesh Babulal
Date: Wednesday, April 2, 2008 - 12:08 am

One of the machines is a Power5 and the other is a PowerMac G5; the
same kernel panic is seen on both.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--

From: Michael Ellerman
Date: Wednesday, April 2, 2008 - 12:17 am

Can you enable DEBUG_LOW in arch/powerpc/platforms/pseries/lpar.c, that
should show what's happening in hpte_insert().

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
From: Kamalesh Babulal
Date: Wednesday, April 2, 2008 - 10:09 am

Just defining DEBUG_LOW did not produce any debug information, so I added some printk()s to
htab_bolt_mapping() and pSeries_lpar_hpte_insert():

[boot]0012 Setup Arch
htab_bolt_mapping (vstart cf00000000000000, vend cf00000001000000, pstart 3000000,mode 190, psize 4, ssize 0)
htab_bolt_mapping: calling c000000000888f00
_hpte_insert(group=252078, va=d59aca40f0000000, pa=0000000003000000, rflags=194, vflags=10, psize=4 ssize=0)
htab_bolt_mapping (vstart cf00000000000000, vend cf00000001000000, pstart 4000000,mode 190, psize 4, ssize 0)
htab_bolt_mapping: calling c000000000888f00
_hpte_insert(group=252078, va=d59aca40f0000000, pa=0000000004000000, rflags=194, vflags=10, psize=4 ssize=0)
htab_bolt_mapping (vstart cf00000000000000, vend cf00000001000000, pstart 5000000,mode 190, psize 4, ssize 0)
htab_bolt_mapping: calling c000000000888f00
_hpte_insert(group=252078, va=d59aca40f0000000, pa=0000000005000000, rflags=194, vflags=10, psize=4 ssize=0)
htab_bolt_mapping (vstart cf00000000000000, vend cf00000001000000, pstart 6000000,mode 190, psize 4, ssize 0)
htab_bolt_mapping: calling c000000000888f00
_hpte_insert(group=252078, va=d59aca40f0000000, pa=0000000006000000, rflags=194, vflags=10, psize=4 ssize=0)
htab_bolt_mapping (vstart cf00000000000000, vend cf00000001000000, pstart 8000000,mode 190, psize 4, ssize 0)
htab_bolt_mapping: calling c000000000888f00
_hpte_insert(group=252078, va=d59aca40f0000000, pa=0000000008000000, rflags=194, vflags=10, psize=4 ssize=0)
htab_bolt_mapping (vstart cf00000000000000, vend cf00000001000000, pstart 9000000,mode 190, psize 4, ssize 0)
htab_bolt_mapping: calling c000000000888f00
_hpte_insert(group=252078, va=d59aca40f0000000, pa=0000000009000000, rflags=194, vflags=10, psize=4 ssize=0)
htab_bolt_mapping (vstart cf00000000000000, vend cf00000001000000, pstart a000000,mode 190, psize 4, ssize 0)
htab_bolt_mapping: calling c000000000888f00
_hpte_insert(group=252078, va=d59aca40f0000000, pa=000000000a000000, rflags=194, vflags=10, psize=4 ssize=0)
htab_bolt_mapping ...
From: Badari Pulavarty
Date: Wednesday, April 2, 2008 - 11:15 am

Kamalesh,

With your config, I am able to reproduce the problem. I haven't touched
that part of the code, but I can take a look at it. It looks like we are
trying to create a mapping for the same "vaddr" multiple times and we get
failures after a few creates. I am not sure why we are trying to create it
so many times with the same vaddr.

Thanks,
Badari

--

From: Badari Pulavarty
Date: Wednesday, April 2, 2008 - 12:22 pm

Okay. Found it.

Root cause is:

mm-make-mem_map-allocation-continuous.patch
and its friends in -mm.

You have to call sparse_init_one_section() on each pmap and usemap
as we allocate them, since valid_section() depends on it (which is needed
by vmemmap_populate() to check whether the section is populated or not).
On ppc, we need to call htab_bolt_mapping() on each section and
we need to skip existing sections.

These patches tried to group all allocations together and then call
sparse_init_one_section() later - which is not good :(

Please let me know if it doesn't make sense - I will try to explain
better :)
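
The ordering problem can be shown with a toy model. This is only a sketch under stated assumptions: several sections are taken to share one large vmemmap page, the "populated?" check is taken to see only sections that have already been initialised, and every `toy_` name is made up rather than taken from the kernel:

```c
#include <string.h>

#define TOY_SECTIONS 8	/* sections whose struct pages share one large page */

static int toy_section_valid[TOY_SECTIONS];
static int toy_bolt_calls;	/* mappings created for the shared vmemmap page */

/* The "already populated?" test only sees sections that were initialised. */
static int toy_vmemmap_populated(void)
{
	int i;

	for (i = 0; i < TOY_SECTIONS; i++)
		if (toy_section_valid[i])
			return 1;
	return 0;
}

static void toy_vmemmap_populate(void)
{
	if (!toy_vmemmap_populated())
		toy_bolt_calls++;	/* stands in for htab_bolt_mapping() */
}

/* Returns the number of mappings created for the one shared page. */
int toy_sparse_init(int init_each_section_as_we_go)
{
	int i;

	memset(toy_section_valid, 0, sizeof(toy_section_valid));
	toy_bolt_calls = 0;

	for (i = 0; i < TOY_SECTIONS; i++) {
		toy_vmemmap_populate();
		if (init_each_section_as_we_go)
			toy_section_valid[i] = 1;
	}
	if (!init_each_section_as_we_go)
		for (i = 0; i < TOY_SECTIONS; i++)
			toy_section_valid[i] = 1;
	return toy_bolt_calls;
}
```

Initialising each section as it is allocated gives one mapping for the shared page. Deferring the initialisation until all allocations are done gives one mapping attempt per section for the same address, which matches the repeated vstart values in the log.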

Thanks,
Badari

--

From: Yinghai Lu
Date: Wednesday, April 2, 2008 - 2:57 pm

Will send you a patch to work around it...

YH
--

From: Andy Whitcroft
Date: Friday, April 4, 2008 - 2:24 am

It does look like this is resolved with the patch below, if my testing
is to be believed (results out on TKO):

    [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
    From: Yinghai Lu <yhlu.kernel@gmail.com>

Andrew, I believe you just sucked that up into -mm.

-apw
--

From: Kamalesh Babulal
Date: Wednesday, April 2, 2008 - 2:02 am

Hi Andrew,

The 2.6.25-rc8-mm1 kernel build fails on x86_64, when compiled with randconfig option 

In file included from include/net/dst.h:15,
                 from include/net/sock.h:57,
                 from include/linux/if_pppox.h:145,
                 from fs/compat_ioctl.c:39:
include/net/neighbour.h: In function 
From: Miles Lane
Date: Wednesday, April 2, 2008 - 3:49 am

CC [M]  drivers/net/wireless/iwlwifi/iwl3945-base.o
drivers/net/wireless/iwlwifi/iwl3945-base.c: In function
'iwl3945_build_tx_cmd_basic':
drivers/net/wireless/iwlwifi/iwl3945-base.c:2492: error: 'struct
iwl3945_priv' has no member named 'rxtxpackets'
make[4]: *** [drivers/net/wireless/iwlwifi/iwl3945-base.o] Error 1

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.25-rc8-mm1
# Tue Apr  1 21:44:54 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General ...
From: Valdis.Kletnieks
Date: Wednesday, April 2, 2008 - 4:08 am

Apparently not ready for prime time...



From: Chatre, Reinette
Date: Wednesday, April 2, 2008 - 9:58 am

Thanks! John Linville just posted a fix for this problem to
wireless-testing ("drivers/net/wireless/iwlwifi/iwl-3945.h: correct
CONFIG_IWL4965_LEDS typo")

Reinette
--

From: Valdis.Kletnieks
Date: Wednesday, April 2, 2008 - 12:15 pm

And with John's fix, I'm able to build with IWL3945_LEDS defined and
there's now an "ooooh shiny" LED that hasn't worked since I got the laptop. :)
From: Mariusz Kozlowski
Date: Wednesday, April 2, 2008 - 9:20 am

Hello,

	sparc64 box, gcc 4.1.2

  CC      arch/sparc64/mm/init.o
arch/sparc64/mm/init.c: In function 'paging_init':
arch/sparc64/mm/init.c:1303: error: size of array 'type name' is negative

and this is 

BUILD_BUG_ON(BITS_PER_LONG - NR_PAGEFLAGS != 32);

	Mariusz
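
For readers who have not met this message before: it is what a failed BUILD_BUG_ON() looks like, because the macro is built on an array type whose size goes negative when the condition is true. A minimal sketch, assuming a definition along the lines of the kernel's. The macro name and the constants below are stand-ins, not the sparc64 values:

```c
/* Sketch of the array-size trick; MY_BUILD_BUG_ON is a stand-in name. */
#define MY_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Illustrative values only: a 64-bit long and 32 page flags. */
#define DEMO_BITS_PER_LONG 64
#define DEMO_NR_PAGEFLAGS  32

int demo_layout_ok(void)
{
	/* Condition is false, so the array has size 1 and this compiles. */
	MY_BUILD_BUG_ON(DEMO_BITS_PER_LONG - DEMO_NR_PAGEFLAGS != 32);
	return 1;
}
```

Changing DEMO_NR_PAGEFLAGS to any other value makes the array size -1, and the compile stops with an error of the kind shown above (the exact wording depends on the compiler).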
From: Andrew Morton
Date: Wednesday, April 2, 2008 - 9:30 am

yup, thanks.  That's due to some page-flag rework in the memory management
queue, the same patches which broke mips as well.  I'm pushing cross-compilers
in Christoph's direction and hoping stuff gets fixed...
--

From: Valdis.Kletnieks
Date: Wednesday, April 2, 2008 - 12:12 pm

(Yes, I know the kernel is tainted.  Hopefully the traceback will make
enough sense that it won't matter.  I think I cc'd most everybody who is
listed in MAINTAINERS or had a non-trivial jbd, quota, or ext3 patch in the broken-out/)

So I was running a 'yum update' on my laptop, walked away to ask a cow-orker
a question, and came back to find it had BUG'ed twice...  Amazingly
enough, although it died in ext3 code, it apparently only nuked whatever
filesystem it was handling, as syslog was still able to log the gory details
into a file in /var. Given that a kernel rpm was the one it failed on, the
I/O was almost certainly on either / or /boot - both ext3. / is mounted
with quotas, /boot isn't, so I'm betting on /

Apr  2 13:48:07 turing-police yum: Updated: texlive-texmf-latex-2007-18.fc9.noarch
Apr  2 13:48:08 turing-police yum: Updated: 1:openoffice.org-xsltfilter-2.4.0-12.4.fc9.x86_64
Apr  2 13:48:09 turing-police yum: Updated: 1:openoffice.org-javafilter-2.4.0-12.4.fc9.x86_64
Apr  2 13:48:12 turing-police yum: Updated: kernel-headers-2.6.25-0.185.rc7.git6.fc9.x86_64

(here, it started updating kernel-2.6.25-0.185.rc7.git6 and died while I wasn't looking)

[34895.379293] ------------[ cut here ]------------
[34895.379299] kernel BUG at fs/jbd/transaction.c:275!
[34895.379302] invalid opcode: 0000 [1] PREEMPT SMP 
[34895.379306] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
[34895.379309] CPU 0 
[34895.379311] Modules linked in: gspca(U) compat_ioctl32 videodev v4l1_compat irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm irda crc_ccitt coretemp vmnet(P)(U) vmmon(P)(U) nf_conntrack_ftp xt_pkttype ipt_REJECT ipt_osf nf_conntrack_ipv4 xt_ipisforif ipt_recent ipt_LOG xt_u32 iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_LOG xt_limit ip6table_filter ip6_tables x_tables sha256_generic aes_generic acpi_cpufreq tpm_tis arc4 pcmcia ecb iwl3945 yenta_socket nvidia(P)(U) iTCO_wdt firmware_class iTCO_vendor_support rsrc_nonstatic ...
From: Andrew Morton
Date: Wednesday, April 2, 2008 - 12:30 pm

On Wed, 02 Apr 2008 15:12:49 -0400

The backtrace tells it all - we were inside a transaction for filesystem A,
went into page reclaim, reclaimed an inode for filesystem B and then
DQUOT_DROP() tried to start a transaction on filesystem B.  JBD doesn't
like cross-fs nested transactions (it'll corrupt task_struct.journal_info,
and will cause ab/ba deadlocks).  So it went BUG.

Presumably something in the quota updates in -mm caused this.
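
A toy model of that rule, assuming the check that fired compares the journal of the handle already recorded for the task with the journal being started. All `toy_` names are hypothetical, and the model returns NULL where the kernel goes BUG:

```c
#include <stddef.h>

/* Toy stand-ins for journal_t, handle_t and task_struct.journal_info. */
struct toy_journal { const char *name; };
struct toy_handle  { struct toy_journal *journal; int refs; };

static struct toy_handle *toy_journal_info;	/* per-task in the real kernel */
static struct toy_handle toy_slot;

/*
 * Start (or nest) a transaction.  Nesting on the same journal just takes
 * another reference; nesting on a different journal is the case that is
 * refused.
 */
struct toy_handle *toy_journal_start(struct toy_journal *journal)
{
	struct toy_handle *h = toy_journal_info;

	if (h) {
		if (h->journal != journal)
			return NULL;	/* cross-fs nested transaction */
		h->refs++;
		return h;
	}
	toy_slot.journal = journal;
	toy_slot.refs = 1;
	toy_journal_info = &toy_slot;
	return toy_journal_info;
}

/* Drop one reference; the last one clears the per-task handle. */
void toy_journal_stop(struct toy_handle *h)
{
	if (--h->refs == 0)
		toy_journal_info = NULL;
}
```

Starting on filesystem A, then nesting on A again, succeeds. Starting on B while the handle for A is still live is the refused case, which is the sequence described above with reclaim in the middle.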
--

From: Jan Kara
Date: Thursday, April 3, 2008 - 1:57 am

I think quota is innocent in this ;). We start a transaction in
ext3_dquot_drop() for quite some time already. The problem is really in
inode_alloc_security(), as Josef pointed out. We really aren't allowed to
allocate with GFP_KERNEL there because the reclaim code could as well
decide to just write an inode on a different filesystem...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--

From: Josef Bacik
Date: Wednesday, April 2, 2008 - 12:27 pm

<snip>

Try this patch, it will keep us from re-entering the fs when we aren't supposed
to.  cc'ing Eric Paris since he's the only selinux guy I know :).  I don't think
any of the other allocations in here need to be fixed, but I didn't look too
carefully.

Signed-off-by: Josef Bacik <jbacik@redhat.com>


diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c2fef7b..820d07a 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -180,7 +180,7 @@ static int inode_alloc_security(struct inode *inode)
 	struct task_security_struct *tsec = current->security;
 	struct inode_security_struct *isec;
 
-	isec = kmem_cache_zalloc(sel_inode_cache, GFP_KERNEL);
+	isec = kmem_cache_zalloc(sel_inode_cache, GFP_NOFS);
 	if (!isec)
 		return -ENOMEM;
 
@@ -2429,7 +2429,7 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
 		return -EOPNOTSUPP;
 
 	if (name) {
-		namep = kstrdup(XATTR_SELINUX_SUFFIX, GFP_KERNEL);
+		namep = kstrdup(XATTR_SELINUX_SUFFIX, GFP_NOFS);
 		if (!namep)
 			return -ENOMEM;
 		*name = namep;
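
Why the flag change helps, as a toy model: reclaim triggered by an allocation may call back into filesystem code only when the allocation allows it. The flag values and `toy_` functions below are invented for illustration and are not the kernel's GFP definitions:

```c
/* Toy flag bits; the real GFP_* values differ. */
#define TOY_GFP_WAIT 0x1u
#define TOY_GFP_IO   0x2u
#define TOY_GFP_FS   0x4u
#define TOY_GFP_KERNEL (TOY_GFP_WAIT | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_NOFS   (TOY_GFP_WAIT | TOY_GFP_IO)

static int toy_fs_reentries;	/* times reclaim called back into fs code */

/* Stand-in for reclaim under memory pressure. */
static void toy_reclaim(unsigned int flags)
{
	if (flags & TOY_GFP_FS)
		toy_fs_reentries++;	/* e.g. dropping an inode on another fs */
}

/*
 * Stand-in allocator: always "under pressure", so it always reclaims.
 * Returns the running count of re-entries into filesystem code.
 */
int toy_alloc(unsigned int flags)
{
	if (flags & TOY_GFP_WAIT)
		toy_reclaim(flags);
	return toy_fs_reentries;
}
```

An allocation with TOY_GFP_NOFS never bumps the counter, while one with TOY_GFP_KERNEL does. The second case is the re-entry the patch is meant to prevent when the caller is already inside a transaction.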
--

From: Andrew Morton
Date: Wednesday, April 2, 2008 - 12:39 pm

On Wed, 2 Apr 2008 15:27:15 -0400

Might fix it.  But 2.6.24's inode_alloc_security() also uses GFP_KERNEL and
doesn't have this bug.  What changed?


--

From: Josef Bacik
Date: Wednesday, April 2, 2008 - 12:41 pm

I don't see why the problem couldn't happen in 2.6.24, I'm sure if I generate
enough memory pressure and start creating a bunch of files I could reproduce the
same thing.  /me wanders off to try,

Josef 
--

From: Stephen Smalley
Date: Thursday, April 3, 2008 - 11:18 am

Looks legitimate, although we've been doing that since Linux 2.6.0-test3
(selinux merge) for inode_alloc_security and d_instantiate, and since
Linux 2.6.14 for inode_init_security, so something is at least
triggering it more easily now.  inode_doinit_with_dentry looks like
another instance and security_context_to_sid_core as well.

-- 
Stephen Smalley
National Security Agency

--

From: James Morris
Date: Thursday, April 3, 2008 - 4:02 pm

Thanks, I'll push this to Linus, but note that further analysis is 
required.


-- 
James Morris
<jmorris@namei.org>
--

From: Stephen Smalley
Date: Friday, April 4, 2008 - 5:46 am

Please review.

More cases where SELinux must not re-enter the fs code.
Called from the d_instantiate security hook.

Signed-off-by:  Stephen Smalley <sds@tycho.nsa.gov>

---

 security/selinux/hooks.c            |    7 ++++---
 security/selinux/include/security.h |    3 ++-
 security/selinux/ss/services.c      |   12 +++++++-----
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 41a049f..95b51b6 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1143,7 +1143,7 @@ static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dent
 		}
 
 		len = INITCONTEXTLEN;
-		context = kmalloc(len, GFP_KERNEL);
+		context = kmalloc(len, GFP_NOFS);
 		if (!context) {
 			rc = -ENOMEM;
 			dput(dentry);
@@ -1161,7 +1161,7 @@ static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dent
 			}
 			kfree(context);
 			len = rc;
-			context = kmalloc(len, GFP_KERNEL);
+			context = kmalloc(len, GFP_NOFS);
 			if (!context) {
 				rc = -ENOMEM;
 				dput(dentry);
@@ -1185,7 +1185,8 @@ static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dent
 			rc = 0;
 		} else {
 			rc = security_context_to_sid_default(context, rc, &sid,
-			                                     sbsec->def_sid);
+							     sbsec->def_sid,
+							     GFP_NOFS);
 			if (rc) {
 				printk(KERN_WARNING "%s:  context_to_sid(%s) "
 				       "returned %d for dev=%s ino=%ld\n",
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index f7d2f03..44e12ec 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -86,7 +86,8 @@ int security_sid_to_context(u32 sid, char **scontext,
 int security_context_to_sid(char *scontext, u32 scontext_len,
 	u32 *out_sid);
 
-int security_context_to_sid_default(char *scontext, u32 scontext_len, u32 *out_sid, u32 def_sid);
+int ...
From: James Morris
Date: Sunday, April 6, 2008 - 4:54 pm

-- 
James Morris
<jmorris@namei.org>
--

From: Jan Kara
Date: Friday, April 4, 2008 - 3:15 am

I guess it is just the combination of someone using SELinux + quota
(or several journaling filesystems) + being unlucky under memory
pressure that makes this happen only rarely. Josef, have you been
successful in reproducing the problem under older kernel?

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs
--

From: Josef Bacik
Date: Friday, April 4, 2008 - 5:53 am

Not yet, I haven't been lucky enough apparently.  I'm going to kick it off on a
couple of boxes over the weekend and see if I can't hit it on one of them
instead of just relying on the one box.

Josef
--

From: Stephen Smalley
Date: Thursday, April 3, 2008 - 11:25 am

-- 
Stephen Smalley
National Security Agency

--

From: Dave Airlie
Date: Thursday, April 3, 2008 - 5:11 am

Actually git-agp.patch is broken, and should have been dropped. the
drm patch is fine.

I did mention this to you in two separate e-mails :)

Dave.
--

From: Andrew Morton
Date: Thursday, April 3, 2008 - 9:59 am

git-drm has a bunch of git rejects against mainline.  I had a go at fixing
them but it didn't work out and I had other stuff to look at.
--

From: Tilman Schmidt
Date: Thursday, April 3, 2008 - 4:08 pm

This fails to come up on my development machine, apparently because it
has trouble accessing the SATA hard disks.
Hardware: Intel Pentium D940, Intel DQ965GF board, two SATA hard disks.
Some unusual things I noticed during the boot process:

- a message "doing fast boot" that looked unfamiliar; unfortunately
  it scrolled off too quickly to note its context

- for each of the two SATA ports in use, a message
  "SATA port is slow to respond, please be patient"
  accompanied by about 10 secs wait

- it actually got past the point where it mounts the root file system,
  so it must have thought it could access the disks

- finally, the system hung completely after the SUSE startup messages

  Setting current sysctl status from /etc/sysctl.conf
  net.ipv4.icmp_echo_ignore_broadcasts = 1

  with a dead keyboard and I had to hit the Win^Wreset button.

- After rebooting into 2.6.24-rc8 (which works fine), nothing had been
  written to the disks, not even the dmesg output which SUSE usually
  dumps into /var/log/boot.msg early during startup.

Before I try booting that kernel again, any instructions on what to
watch out for? Is netconsole usable again? Other ideas?

Regards,
Tilman

From: Andrew Morton
Date: Thursday, April 3, 2008 - 4:17 pm

On Fri, 04 Apr 2008 01:08:19 +0200


Usual stuff: `diff -u dmesg-2.6.25-rc8 dmesg-2.6.25-rc8-mm1'.  Bisection.

Thanks.
--

From: Tilman Schmidt
Date: Wednesday, April 9, 2008 - 7:29 am

This is taking longer than I hoped, so here's a little progress report.


That message doesn't make it into dmesg. It's apparently a Suse thing,

These messages seem to be a separate issue. I also get them with
a .config that otherwise brings up the system successfully. That
allowed me to capture a dmesg, so here are some possibly interesting
hunks of the diff between a mainline kernel and a working 2.6.25-rc8-mm1
one:

--- dmesg-2.6.25-rc8-git.nots-reordered 2008-04-09 15:29:52.000000000 +0200
+++ dmesg-2.6.25-rc8-mm1.nots   2008-04-09 00:48:42.000000000 +0200
@@ -1,4 +1,4 @@
- Linux version 2.6.25-rc8-testing-00210-g51ac03f (ts@xenon) (gcc version 4.2.1 (SUSE Linux)) #37 SMP PREEMPT Wed Apr 9 01:27:07 CEST 2008
+ Linux version 2.6.25-rc8-mm1-testing (ts@xenon) (gcc version 4.2.1 (SUSE Linux)) #6 SMP PREEMPT Wed Apr 9 00:24:23 CEST 2008
   BIOS-provided physical RAM map:
    BIOS-e820: 0000000000000000 - 000000000008f000 (usable)
    BIOS-e820: 000000000008f000 - 00000000000a0000 (reserved)

[...]

@@ -244,12 +277,10 @@
   CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
   CPU1: Thermal monitoring enabled
   CPU1: Intel(R) Pentium(R) D CPU 3.20GHz stepping 04
- Total of 2 processors activated (12796.06 BogoMIPS).
+ Total of 2 processors activated (12796.87 BogoMIPS).
   ENABLING IO-APIC IRQs
   ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
- checking TSC synchronization [CPU#0 -> CPU#1]:
- Measured 560 cycles TSC warp between CPUs, turning off TSC clock.
- Marking TSC unstable due to: check_tsc_sync_source failed.
+ checking TSC synchronization [CPU#0 -> CPU#1]: passed.
   Brought up 2 CPUs
   CPU0 attaching sched-domain:
    domain 0: span 03

[Nice - at last a kernel that likes my TSC; not sure if it matters though.]

@@ -846,26 +880,36 @@
   PCI: Setting latency timer of device 0000:00:1f.2 to 64
   scsi0 : ahci
   PM: Adding info for No Bus:host0
+ PM: Adding info for No Bus:host0
   scsi1 : ahci
   PM: Adding info for No ...

From: Tilman Schmidt
Date: Sunday, April 13, 2008 - 5:28 pm

Final report, seeing -mm2 is out:

- Netconsole works. (grumblestupidsusefirewallgrumble)

- The hang during boot only happens with kernels compiled with
  CONFIG_CIFS_EXPERIMENTAL=y
  It also doesn't always happen at the same point in the boot sequence.
  I'm suspecting it might be triggered by some network packet.
  Anyway, it's obviously *not* a SATA problem.
  (That was just me jumping to conclusions, because ...)

- That leaves only the messages

  ata1: port is slow to respond, please be patient (Status 0x80)
  ata1: COMRESET failed (errno=-16)

  and accompanying delays during boot, for each installed SATA disk.
  I'll try to find the time to retest this with 2.6.25-rc8-mm2.

Thanks,
Tilman

-- 
Tilman Schmidt                                  E-Mail: tilman@imap.cc
Wehrhausweg 66                                  Fax: +49 228 4299019
53227 Bonn, Germany

From: Andrew Morton
Date: Sunday, April 13, 2008 - 7:05 pm

I don't remember seeing a report of the CIFS hang.

It might be caused by
bkl-removal-convert-cifs-over-to-unlocked_ioctl.patch, but it's hard to see

That would be good, thanks.
--

From: Tilman Schmidt
Date: Tuesday, April 15, 2008 - 4:33 pm

Done. The messages and delays do *not* happen with 2.6.25-rc8-mm2.

HTH
Tilman


From: Jiri Slaby
Date: Friday, April 4, 2008 - 1:16 pm

After
$ echo -n 4-1.2 >/sys/bus/usb/drivers/usb/unbind
$ echo -n 4-1.2 >/sys/bus/usb/drivers/usb/bind

I have this in logs:

sysfs: duplicate filename 'usbdev4.12_ep81' can not be created
------------[ cut here ]------------
WARNING: at /home/l/latest/xxx/fs/sysfs/dir.c:425 sysfs_add_one+0x99/0xc0()
Modules linked in: usbhid hid nls_cp437 vfat fat usb_storage tun bitrev ipv6 
arc4 ecb crypto_blkcipher cryptomgr crypto_algapi ath5k mac80211 sr_mod crc32 
ohci1394 rtc_cmos cfg80211 ieee1394 floppy rtc_core ehci_hcd rtc_lib ff_memless 
cdrom [last unloaded: hid]
Pid: 539, comm: bash Tainted: G        W 2.6.25-rc8-mm1_64 #395

Call Trace:
  [<ffffffff8022f07f>] warn_on_slowpath+0x5f/0x80
  [<ffffffff80230197>] ? printk+0x67/0x70
  [<ffffffff802d9bd0>] ? sysfs_ilookup_test+0x0/0x20
  [<ffffffff802a12e8>] ? ifind+0x58/0xc0
  [<ffffffff802d9bd0>] ? sysfs_ilookup_test+0x0/0x20
  [<ffffffff802d9f49>] sysfs_add_one+0x99/0xc0
  [<ffffffff802daf68>] sysfs_create_link+0xa8/0x130
  [<ffffffff8038ebda>] device_add+0x2aa/0x4d0
  [<ffffffff80310c26>] ? kobject_init+0x36/0x80
  [<ffffffff8038ee19>] device_register+0x19/0x20
  [<ffffffff803dbbec>] usb_create_ep_files+0x19c/0x320
  [<ffffffff803dadb3>] usb_create_sysfs_intf_files+0xd3/0x100
  [<ffffffff803d630c>] usb_set_configuration+0x3ac/0x5f0
  [<ffffffff803df81a>] generic_probe+0x7a/0xb0
  [<ffffffff803d83fa>] usb_probe_device+0x3a/0x40
  [<ffffffff80390ceb>] driver_probe_device+0x9b/0x1a0
  [<ffffffff803901b3>] driver_bind+0xb3/0x100
  [<ffffffff8038f8a7>] drv_attr_store+0x27/0x30
  [<ffffffff802d94ab>] sysfs_write_file+0xeb/0x140
  [<ffffffff8028cc57>] vfs_write+0xc7/0x170
  [<ffffffff8028d2f0>] sys_write+0x50/0x90
  [<ffffffff8020b5eb>] system_call_after_swapgs+0x7b/0x80

---[ end trace 6ee6d593d4e510b4 ]---




I think, this is a 2.6.25-rc5-mm1 regression, there
while :; do
   echo -n 4-1.2 >/sys/bus/usb/drivers/usb/unbind
   echo -n 4-1.2 >/sys/bus/usb/drivers/usb/bind
   usleep 10000
done
went just fine for about ...

From: Greg KH
Date: Friday, April 4, 2008 - 1:51 pm

Does this also show up in 2.6.25-rc8 without -mm?

I thought I fixed this already, I don't see what slipped into -mm that
would have caused it to come back.  Time to run some more tests...

Oh, also note that binding and unbinding the main "usb" driver is not
encouraged, or even supported.  I'm amazed it works, as this is not
something that any "real" user would do as it makes no sense at all
because we have no "alternative" drivers yet for the main USB device.

thanks,

greg k-h
--

From: Alan Stern
Date: Friday, April 4, 2008 - 2:23 pm

It's a real bug.  I don't have time to track it down now.  Next week...

Alan Stern

--

From: Alan Stern
Date: Friday, April 4, 2008 - 8:46 pm

Here's the answer.  The bug was introduced when the definition of 
device_is_registered() in include/linux/device.h was changed.  The old 
definition returned 0 when called inside a driver's remove method for a 
device being unregistered, whereas the new definition returns 1.  I 
don't know when this change was made.

This patch ought to fix the problem.  Jiri, can you confirm that it 
works?

Alan Stern

-----------------------------------------------------------

Removing an interface's sysfs files before unregistering the interface
doesn't work properly, because usb_unbind_interface() will reinstall
altsetting 0 and thereby create new sysfs files.  This patch (as1074)
removes the files after the unregistration is finished.  It's not
quite as clean, but at least it works.

Also, there's no need to check if an interface has been registered
before removing its sysfs files.  If it hasn't been registered then
the files won't have been created, so usb_remove_sysfs_intf_files()
will simply do nothing.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

---

Index: usb-2.6/drivers/usb/core/message.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/message.c
+++ usb-2.6/drivers/usb/core/message.c
@@ -1089,8 +1089,8 @@ void usb_disable_device(struct usb_devic
 				continue;
 			dev_dbg(&dev->dev, "unregistering interface %s\n",
 				interface->dev.bus_id);
-			usb_remove_sysfs_intf_files(interface);
 			device_del(&interface->dev);
+			usb_remove_sysfs_intf_files(interface);
 		}
 
 		/* Now that the interfaces are unbound, nobody should
@@ -1231,7 +1231,7 @@ int usb_set_interface(struct usb_device 
 	 */
 
 	/* prevent submissions using previous endpoint settings */
-	if (iface->cur_altsetting != alt && device_is_registered(&iface->dev))
+	if (iface->cur_altsetting != alt)
 		usb_remove_sysfs_intf_files(iface);
 	usb_disable_interface(dev, iface);
 
@@ -1330,8 +1330,7 @@ int usb_reset_configuration(struct ...

From: Greg KH
Date: Friday, April 4, 2008 - 9:37 pm

I've changed that in the -mm tree to make some PCI stuff much easier.  I
didn't realize that USB was depending on when this was being set, sorry
about it.

I like your fix better, it makes the code path much simpler :)

thanks,

greg k-h
--

From: Alan Stern
Date: Saturday, April 5, 2008 - 7:16 am

Well, it's not really any _simpler_, since all I did was interchange 
two lines of code.

But I agree this way is better.  It doesn't depend on the behavior of 
device_is_registered() in the ill-defined situation where the device is 
in the middle of being unregistered.
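
To make the ordering argument concrete, here is a toy user-space model, not
kernel code.  Every name in it (toy_intf, toy_device_del, disable_old_order,
disable_new_order) is made up for illustration.  It assumes only what the
patch description says: the unbind step reinstalls altsetting 0 and can
thereby create new sysfs files.  It also assumes, as a simplification, that
this creation happens only when the registered check reports true during
remove.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a USB interface; NOT the kernel's struct usb_interface. */
struct toy_intf {
	bool registered;	/* known to the (toy) driver core */
	bool ep_files;		/* endpoint sysfs files currently exist */
};

/* Models usb_remove_sysfs_intf_files(): harmless if no files exist. */
static void toy_remove_files(struct toy_intf *i)
{
	i->ep_files = false;
}

/*
 * Models the unbind step inside device_del().  'reg_in_remove' is what the
 * registered check reports while the driver's remove method runs: false
 * for the old definition, true for the new one (hypothetical parameter).
 */
static void toy_device_del(struct toy_intf *i, bool reg_in_remove)
{
	if (reg_in_remove)
		i->ep_files = true;	/* altsetting 0 reinstalled, files recreated */
	i->registered = false;
}

/* Old order: remove files, then unregister.  Returns true if files leak. */
bool disable_old_order(struct toy_intf *i, bool reg_in_remove)
{
	toy_remove_files(i);
	toy_device_del(i, reg_in_remove);
	return i->ep_files;
}

/* New order (as in the patch): unregister, then remove files. */
bool disable_new_order(struct toy_intf *i, bool reg_in_remove)
{
	toy_device_del(i, reg_in_remove);
	toy_remove_files(i);
	return i->ep_files;
}
```

In this model the old order leaves files behind only when reg_in_remove is
true, which would match the duplicate-filename warning on the next bind.
The new order leaves nothing behind for either value, so it does not depend
on what the check returns in the middle of unregistration.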

Alan Stern

--

From: Jiri Slaby
Date: Saturday, April 5, 2008 - 1:17 am

Tested-by: Jiri Slaby <jirislaby@gmail.com>

Works well, thanks.
--

From: Valdis.Kletnieks
Date: Sunday, April 6, 2008 - 11:21 pm

Been seeing these crop up once in a while - can take hours after a reboot
before I see the first one, but once I see one, I'm likely to see more, at
a frequency of anywhere from ~5seconds to ~10 minutes between BUG msgs.

BUG: scheduling while atomic: swapper/0/0xffffffff
Pid: 0, comm: swapper Tainted: P          2.6.25-rc8-mm1 #4

Call Trace:
 [<ffffffff8020b2f4>] ? default_idle+0x0/0x74
 [<ffffffff8022be19>] __schedule_bug+0x5d/0x61
 [<ffffffff80552aea>] schedule+0x11a/0x9e4
 [<ffffffff805536ce>] ? preempt_schedule+0x3c/0xaa
 [<ffffffff802480f1>] ? hrtimer_forward+0x82/0x96
 [<ffffffff804600a4>] ? cpuidle_idle_call+0x0/0xd5
 [<ffffffff8020b2f4>] ? default_idle+0x0/0x74
 [<ffffffff8020b2e0>] cpu_idle+0xf6/0x10a
 [<ffffffff80540cb2>] rest_init+0x86/0x8a

Eventually, I end up with a basically hung system, and need to alt-sysrq-B.

Yes, I know it's tainted, and it's possible the root cause is a self-inflicted
buggy module - but the traceback above seems odd.  Did some of my code manage
to idle the CPU while is_atomic was set, or is the path from cpu_idle on down
doing something it shouldn't be?

(I admit being confused - if my code was the source of the is_atomic error,
shouldn't it have been caught on the *previous* call to schedule - the one
that ran through all the queues and decided we should invoke idle?)

From: Andrew Morton
Date: Sunday, April 6, 2008 - 11:48 pm

Sounds sane.  Perhaps preempt_count is getting mucked up in interrupt
context?

iirc there's some toy in either the recently-added tracing code or still in
the -rt tree which would help find a missed unlock, but I forget what it was.
Ingo will know...
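
For illustration only, a toy user-space counter showing how a value like the
0xffffffff in the BUG line could arise.  This assumes the last field of that
line is preempt_count printed as 32-bit hex, which the thread does not state.
The toy_* names are hypothetical and none of this is the kernel's
implementation.  Under that assumption the value is -1, which would point at
one enable too many rather than a missed unlock.

```c
#include <stdio.h>

/* Toy preemption counter; a stand-in, not the kernel's preempt_count. */
static int toy_preempt_count;

void toy_preempt_disable(void) { toy_preempt_count++; }
void toy_preempt_enable(void)  { toy_preempt_count--; }

/* Nonzero counter means "atomic": scheduling now would be flagged. */
int toy_in_atomic(void)
{
	return toy_preempt_count != 0;
}

/* Format the counter as 32-bit hex, the way the BUG line appears to. */
int toy_format_count(char *buf, size_t len)
{
	return snprintf(buf, len, "0x%08x", (unsigned int)toy_preempt_count);
}
```

A balanced disable/enable pair leaves the counter at zero.  One stray extra
toy_preempt_enable() drives it to -1, which formats as 0xffffffff and is
still treated as atomic because the check is only "nonzero".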


--
