Re: 2.6.25-rc3-mm1 (9p docs)

From: Andrew Morton
Date: Tuesday, March 4, 2008 - 2:19 am

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc3/2.6.25-rc3-mm1/



Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.  These probably are at least compilable.

- More-than-daily -mm snapshots may be found at
  http://userweb.kernel.org/~akpm/mmotm/.  These are almost certainly not
  compilable.
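
The bisection search recommended in the boilerplate above is a binary search over the ordered patch series. Ten test steps are enough to separate about a thousand candidates (2^10 = 1024), which matches the "around ten rebuilds" estimate. Below is a minimal sketch in C. The names `bisect_series`, `is_bad_fn` and `bad_from` are hypothetical, the predicate stands in for "rebuild and reboot with the first n patches applied", and this is not the procedure from bisecting-mm-trees.txt itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if the tree with the first `n` patches
 * of the series applied shows the bug.  In practice this is one rebuild
 * and one reboot. */
typedef int (*is_bad_fn)(size_t n, void *ctx);

/*
 * Return the 1-based index of the first patch that introduces the bug,
 * assuming the tree with 0 patches applied is good and the tree with all
 * `total` patches applied is bad.  If `steps` is non-NULL it receives the
 * number of times the predicate was evaluated.
 */
size_t bisect_series(size_t total, is_bad_fn is_bad, void *ctx, size_t *steps)
{
    size_t good = 0;     /* known good: first `good` patches applied */
    size_t bad = total;  /* known bad: first `bad` patches applied */
    size_t n = 0;

    assert(total > 0);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
        n++;
    }
    if (steps)
        *steps = n;
    return bad;
}

/* Example predicate for experiments: the bug arrives with patch *ctx. */
int bad_from(size_t n, void *ctx)
{
    return n >= *(size_t *)ctx;
}
```

Calling `bisect_series(1000, bad_from, &culprit, &steps)` with `culprit` set to some patch index returns that index after at most ten predicate calls.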



Changes since 2.6.25-rc2-mm1:


 origin.patch
 git-x86.patch
 git-acpi.patch
 git-alsa.patch
 git-avr32.patch
 git-cifs.patch
 git-cpufreq.patch
 git-powerpc.patch
 git-drm.patch
 git-dvb.patch
 git-hwmon.patch
 git-gfs2-nmw.patch
 git-dlm.patch
 git-hid.patch
 ...
From: Cornelia Huck
Date: Tuesday, March 4, 2008 - 4:59 am

On Tue, 4 Mar 2008 01:19:28 -0800,

This should go into 2.6.25, as it fixes a panic (see
http://marc.info/?l=linux-kernel&m=120411157302447&w=2,
http://marc.info/?l=linux-kernel&m=120412001416810&w=2).
--

From: Greg KH
Date: Tuesday, March 4, 2008 - 12:35 pm

Will add it to that queue to send to Linus in a bit, thanks for poking
me.

Hint: when sending patches, please at least change the Subject so that I
don't accidentally pass it by.  This one was buried in a longer thread
that I missed the first time through.

thanks,

greg k-h
--

From: Kamalesh Babulal
Date: Tuesday, March 4, 2008 - 6:12 am

Hi Andrew,

The 2.6.25-rc3-mm1 kernel panics during bootup on a power box. The machine booted
without the panic on the third attempt, but badness call traces were seen while running
tests.

1) The kernel panic on first attempt

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000000cb2c
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in:
NIP: c00000000000cb2c LR: c00000000000caf8 CTR: 0000000000000226
REGS: c00000000068f360 TRAP: 0300   Not tainted  (2.6.25-rc3-mm1-autotest)
MSR: 8000000000001032 <ME,IR,DR>  CR: 28000024  XER: 20000001
DAR: 0000000000000000, DSISR: 0000000040000000
TASK = c0000000005c8590[0] 'swapper' THREAD: c00000000068c000 CPU: 0
GPR00: c00000000068f5e0 c00000000068f5e0 c00000000068e690 0000000000000000 
GPR04: 00000000000035e0 000000000087264e c000000008011280 c000000000594000 
GPR08: c0000000005c9300 0000000000000000 c000000000591090 c00000000068c000 
GPR12: 8000000000009032 c0000000005c9300 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000008000 0000000000000000 
GPR20: 0000000000000000 0000000000000000 000000000000007f 0000000000018000 
GPR24: 0000000000000001 0000000000000080 0000000000000018 0000000000000000 
GPR28: 0000000000000c00 c000000000588988 c000000000639be8 c000000008001c00 
NIP [c00000000000cb2c] .do_IRQ+0x74/0x1c4
LR [c00000000000caf8] .do_IRQ+0x40/0x1c4
Call Trace:
[c00000000068f5e0] [c00000000000caf8] .do_IRQ+0x40/0x1c4 (unreliable)
[c00000000068f680] [c000000000004790] hardware_interrupt_entry+0x18/0x1c
--- Exception: 501 at .memset+0x70/0xfc
    LR = .__alloc_bootmem_core+0x39c/0x3dc
[c00000000068f970] [c00000000068fa10] init_thread_union+0x3a10/0x4000 (unreliable)
[c00000000068fa30] [c00000000057237c] .__alloc_bootmem_node+0x38/0x8c
[c00000000068fad0] [c0000000003c477c] .zone_wait_table_init+0x74/0x108
[c00000000068fb60] [c0000000003d9058] .init_currently_empty_zone+0x40/0x11c
[c00000000068fc00] ...
From: Michael Neuling
Date: Tuesday, March 4, 2008 - 7:40 am

I'm not getting a crash but I am getting this:

   start_kernel(): bug: interrupts were enabled *very* early, fixing it

...and you're getting a null pointer access here (in do_IRQ):

	irq = ppc_md.get_irq();

Are we somehow enabling interrupts before we've set up ppc_md.get_irq?
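
The failure mode in question can be sketched in plain C. `ppc_md` is a table of function pointers that platform setup code fills in, so an interrupt taken before setup ends up calling through a NULL pointer. The names below (`machdep_ops`, `md_ops`, `handle_irq_checked`, `example_get_irq`) are hypothetical stand-ins and not the powerpc code; the check is only one possible way to make the early case visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine-dependent ops table such as ppc_md. */
struct machdep_ops {
    unsigned int (*get_irq)(void);
};

#define NO_IRQ_AVAILABLE ((unsigned int)-1)

/* File-scope object: every member is NULL until setup code fills it in. */
struct machdep_ops md_ops;

/*
 * Defensive variant of the call made in do_IRQ(): report "no irq"
 * instead of jumping through a NULL pointer when an interrupt arrives
 * before the platform hook has been installed.
 */
unsigned int handle_irq_checked(const struct machdep_ops *ops)
{
    assert(ops != NULL);
    if (ops->get_irq == NULL)
        return NO_IRQ_AVAILABLE;  /* too early: hook not installed yet */
    return ops->get_irq();
}

/* Example hook that (hypothetical) platform setup code would install. */
unsigned int example_get_irq(void)
{
    return 16;
}
```

Before `md_ops.get_irq` is assigned, `handle_irq_checked(&md_ops)` returns `NO_IRQ_AVAILABLE`; the unchecked call in the trace above faults at address 0 instead.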

--

From: Andrew Morton
Date: Tuesday, March 4, 2008 - 11:33 am

Yes, we are - it's the semaphore rewrite which is doing this in
start_kernel().  It's being discussed.

Enabling interrupts too early was discovered to be fatal on
powerpc years ago.  It looks like that remains the case.

--

From: Benjamin Herrenschmidt
Date: Wednesday, March 5, 2008 - 1:23 am

Yes, it is and will probably always be. All that semaphore mucking
around that hard-enables interrupts is just asking for trouble (and on
more than just powerpc... heh, what do you do if your main interrupt
controller hasn't even been initialized yet?)

Ben.


--

From: Benjamin Herrenschmidt
Date: Wednesday, March 5, 2008 - 5:03 pm

Regarding these issues: I could make it non-fatal and just WARN_ON,
provided that I have a way to differentiate legal vs. illegal calls
to local_irq_enable(). We already have that function mostly out of
line in C code due to our lazy irq disabling scheme, so the overhead of
testing some global kernel state would be minimal here.

However, I don't see anything around init/main.c:start_kernel() that I
can use. What do you reckon we should do here? Add some kind of global
we set before calling local_irq_enable()? Or make early_boot_irqs_on()
do that generically?

It's currently defined as an empty inline without CONFIG_TRACE_IRQFLAGS
but we could make it set a flag instead.
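
The idea amounts to a global flag that is set once interrupts may legally be enabled, plus a check in the out-of-line enable path. Below is a minimal userspace sketch of that bookkeeping. All names (`irqs_allowed`, `mark_irqs_allowed`, `checked_irq_enable` and the two accessors) are hypothetical, and nothing here touches real interrupt state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool irqs_allowed;   /* set once boot has progressed far enough */
static bool irqs_enabled;   /* models the CPU's interrupt-enable state */
static unsigned int early_enable_warnings;

/* What a non-empty early_boot_irqs_on() could do (an assumption). */
void mark_irqs_allowed(void)
{
    assert(!irqs_enabled);  /* interrupts must still be off at this point */
    irqs_allowed = true;
}

/*
 * Out-of-line enable path: warn about an illegal early call and leave
 * interrupts off instead of crashing later in the interrupt handler.
 * Returns true if interrupts were actually enabled.
 */
bool checked_irq_enable(void)
{
    if (!irqs_allowed) {
        early_enable_warnings++;
        fprintf(stderr, "warning: interrupts enabled too early\n");
        return false;
    }
    irqs_enabled = true;
    return true;
}

unsigned int early_warning_count(void)
{
    return early_enable_warnings;
}

bool irqs_are_enabled(void)
{
    return irqs_enabled;
}
```

Refusing the early enable is a design choice of this sketch. A kernel version could just as well warn and then enable anyway on platforms where that is known to be harmless.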

I'm pretty sure other archs have similar problems, especially in the
embedded world where you are booted with random junk firmwares that may
leave devices, interrupt controllers etc... in random state, and
enabling incoming IRQs before the arch code properly initializes the
main interrupt controller can be fatal. I know of at least one ARM board
I worked on a while ago that had a similar issue.

On ppc32, unfortunately, our local_irq_enable/restore are nice inlines
that whack the appropriate MSR bits directly, thus adding a test for a
global flag would add some bloat/overhead that I'd like to avoid, at
least until we decide to also do lazy disabling on those, if ever...

Cheers,
Ben.


--

From: Andrew Morton
Date: Wednesday, March 5, 2008 - 5:44 pm

On Thu, 06 Mar 2008 11:03:31 +1100


I'd have thought that the way to do this would be to add it to lockdep -
lockdep already has all the infrastructure and code sites to do this.

Set some special flag saying its-ok-to-enable-interrupts-now and test that
in lockdep.

akpm:/usr/src/25> grep LOCKDEP arch/powerpc/Kconfig 
akpm:/usr/src/25> 

losers ;)

Still, doing it for

akpm:/usr/src/25> grep -l LOCKDEP arch/*/Kconfig 
arch/arm/Kconfig
arch/avr32/Kconfig
arch/mips/Kconfig
arch/s390/Kconfig
arch/sh/Kconfig
arch/sparc64/Kconfig
arch/um/Kconfig
arch/x86/Kconfig

should give pretty good coverage.
--

From: Benjamin Herrenschmidt
Date: Wednesday, March 5, 2008 - 5:52 pm

[Empty message]
From: Andrew Morton
Date: Tuesday, March 4, 2008 - 11:36 am

/* Convert GFP flags to their corresponding migrate type */
static inline int allocflags_to_migratetype(gfp_t gfp_flags)
{
        WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);

Mel, Pekka: would you have some head-scratching time for this one please?
--

From: Pekka Enberg
Date: Tuesday, March 4, 2008 - 11:47 am

On Tue, 04 Mar 2008 18:42:19 +0530 Kamalesh Babulal

On Tue, Mar 4, 2008 at 8:36 PM, Andrew Morton

Sure. Just to double-check, this is with SLAB, right? Do you see this with SLUB?
--

From: Pekka Enberg
Date: Tuesday, March 4, 2008 - 12:18 pm

What we have is __getblk() -> __getblk_slow() -> grow_buffers() -> 
grow_dev_page() doing find_or_create_page() with __GFP_MOVABLE set. That 
path then eventually does radix_tree_preload -> kmem_cache_alloc() to a 
cache that has SLAB_RECLAIM_ACCOUNT set which implies __GFP_RECLAIMABLE 
(for both SLAB and SLUB). So we oops there.

I suspect the WARN_ON() is bogus although I really don't know that part 
of the code all too well. Mel?
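
The condition behind the warning can be modelled in a few lines of C. The caller passes the movable hint, the slab cache adds the reclaimable hint, and the check fires when both bits are set at once. The bit values and the `X_`-prefixed names are illustrative assumptions (the real definitions are in include/linux/gfp.h), and `migratetype_of` is only a model of allocflags_to_migratetype() with assert() standing in for WARN_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only, not the kernel's. */
#define X_GFP_RECLAIMABLE  0x1u
#define X_GFP_MOVABLE      0x2u
#define X_GFP_MOVABLE_MASK (X_GFP_RECLAIMABLE | X_GFP_MOVABLE)

/* True when both mobility hints are set at once. */
bool both_mobility_bits_set(unsigned int gfp_flags)
{
    return (gfp_flags & X_GFP_MOVABLE_MASK) == X_GFP_MOVABLE_MASK;
}

/* Models a cache created with SLAB_RECLAIM_ACCOUNT: the allocator ORs
 * the reclaimable hint into whatever flags the caller passed down. */
unsigned int slab_page_flags(unsigned int caller_flags, bool reclaim_account)
{
    return reclaim_account ? (caller_flags | X_GFP_RECLAIMABLE)
                           : caller_flags;
}

/* Model of the conversion: movable selects bit 1, reclaimable bit 0. */
int migratetype_of(unsigned int gfp_flags)
{
    assert(!both_mobility_bits_set(gfp_flags));  /* stands in for WARN_ON */
    return (((gfp_flags & X_GFP_MOVABLE) != 0) << 1) |
           ((gfp_flags & X_GFP_RECLAIMABLE) != 0);
}
```

With these definitions, `both_mobility_bits_set(slab_page_flags(X_GFP_MOVABLE, true))` is true, which corresponds to the find_or_create_page() path described above.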

			Pekka
--

From: Benjamin Herrenschmidt
Date: Wednesday, March 5, 2008 - 1:22 am

We are taking a HW interrupt ... we aren't supposed to take HW
interrupts that early during boot afaik.

Is it yet another case of somebody hard-enabling interrupts with
local_irq_enable() ?


--

From: Randy Dunlap
Date: Tuesday, March 4, 2008 - 9:35 am

i386 allmodconfig gives me this:

ERROR: "probe_4drives" [drivers/ide/ide-core.ko] undefined!

---
~Randy
--

From: Bartlomiej Zolnierkiewicz
Date: Thursday, March 6, 2008 - 2:14 pm

Hi,


It was also reported by Andrew & Stephen, but it doesn't happen here with
the IDE tree.  It is also quite strange that only probe_4drives causes an
error and the other probe_* variables don't.

I think that it is caused by something else in -mm / linux-next...

Thanks,
Bart
--

From: Randy Dunlap
Date: Tuesday, March 4, 2008 - 9:45 am

With
CONFIG_BLK_CPQ_DA=m
CONFIG_BLK_CPQ_CISS_DA=m
# CONFIG_CISS_SCSI_TAPE is not set

I'm getting
In file included from drivers/block/cciss.c:230:
drivers/block/cciss_scsi.c:1498:38: error: macro parameters must be comma-separated
drivers/block/cciss.c: In function 'cciss_seq_show_header':
drivers/block/cciss.c:271: error: implicit declaration of function 'cciss_seq_tape_report'
drivers/block/cciss.c: In function 'cciss_proc_write':
drivers/block/cciss.c:392: error: implicit declaration of function 'cciss_engage_scsi'
make[2]: *** [drivers/block/cciss.o] Error 1
make[1]: *** [drivers/block] Error 2
make[1]: *** Waiting for unfinished jobs....

---
~Randy
--

From: Miller, Mike (OS Dev)
Date: Tuesday, March 4, 2008 - 10:02 am

Randy,
It looks like you have the original broken patch. I resubmitted and I think Jens picked up the fixed patch but I don't know where it is...  :(

-- mikem
--

From: Randy Dunlap
Date: Tuesday, March 4, 2008 - 10:14 am

s/you/latest -mm/

I thought that this had been fixed, but I can't find it either... :(

Jens, did you queue a patch for this?

-- 
~Randy
--

From: Jens Axboe
Date: Tuesday, March 4, 2008 - 11:14 am

From: Hugh Dickins
Date: Tuesday, March 4, 2008 - 12:12 pm

use-page_cache_xxx-in-ext2.patch gave me lots of EXT2-fs error (device
loop0): ext2_find_entry: dir 52629 size 5120 exceeds block count 2
so I stopped it quickly.  Creating a directory entry was muddling up the
directory and the linked inode, writing directory page out to the latter.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 fs/ext2/dir.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 2.6.25-rc3-mm1/fs/ext2/dir.c	2008-03-04 11:37:47.000000000 +0000
+++ linux/fs/ext2/dir.c	2008-03-04 18:25:24.000000000 +0000
@@ -472,7 +472,7 @@ void ext2_set_link(struct inode *dir, st
 int ext2_add_link (struct dentry *dentry, struct inode *inode)
 {
 	struct inode *dir = dentry->d_parent->d_inode;
-	struct address_space *mapping = inode->i_mapping;
+	struct address_space *mapping = dir->i_mapping;
 	const char *name = dentry->d_name.name;
 	int namelen = dentry->d_name.len;
 	unsigned chunk_size = ext2_chunk_size(dir);
--

From: Kamalesh Babulal
Date: Tuesday, March 4, 2008 - 12:20 pm

Hi Andrew,

A kernel BUG is triggered while running the libhugetlbfs tests with the 2.6.25-rc3-mm1
kernel on both x86 and power machines.

------------[ cut here ]------------
kernel BUG at mm/hugetlb.c:295!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/node/possible
Modules linked in:

Pid: 5484, comm: counters Not tainted (2.6.25-rc3-mm1-autokern1 #1)
EIP: 0060:[<c10535cf>] EFLAGS: 00010202 CPU: 0
EIP is at alloc_buddy_huge_page+0x7a/0xb0
EAX: c13acd01 EBX: f7d3a000 ECX: 00000000 EDX: 00006363
ESI: 00000001 EDI: 00000000 EBP: 00000000 ESP: f5539ebc
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process counters (pid: 5484, ti=f5538000 task=f60afa20 task.ti=f5538000)
Stack: 00000001 c1053669 fffffff4 00000001 f5539ecc f5539ecc 00000001 fffffff4 
       f55d0e78 00000001 c105480c 00000001 00200000 c1054875 00000000 f54426c0 
       00200000 00000000 f54426c0 c10b0fb8 fffffff4 00200000 00000000 f55d0e78 
Call Trace:
 [<c1053669>] gather_surplus_pages+0x64/0x16d
 [<c105480c>] hugetlb_acct_memory+0x1e/0x4a
 [<c1054875>] hugetlb_reserve_pages+0x3d/0x6b
 [<c10b0fb8>] hugetlbfs_file_mmap+0x9b/0xe1
 [<c104bf9f>] mmap_region+0x1dc/0x3ae
 [<c104bd42>] do_mmap_pgoff+0x27f/0x28e
 [<c1005af2>] sys_mmap2+0x5a/0x78
 [<c10029fa>] syscall_call+0x7/0xb
 =======================
Code: c1 e8 ed 27 1c 00 85 db 74 41 83 7b 04 00 75 10 68 c0 93 27 c1 e8 02 92 fc ff 58 e8 c1 02 fb ff f0 ff 4b 04 0f 94 c0 84 c0 74 04 <0f> 0b eb fe c7 43 38 3e 33 05 c1 8b 03 c1 e8 1c ff 04 85 60 ce 
EIP: [<c10535cf>] alloc_buddy_huge_page+0x7a/0xb0 SS:ESP 0068:f5539ebc
---[ end trace 5a47484f8fe93a33 ]---


------------[ cut here ]------------
cpu 0x3: Vector: 700 (Program Check) at [c0000000fb277740]
    pc: c0000000000c6f54: .alloc_buddy_huge_page+0x120/0x1dc
    lr: c0000000000c6f20: .alloc_buddy_huge_page+0xec/0x1dc
    sp: c0000000fb2779c0
   msr: 8000000000029032
  current = 0xc0000000fc4cae90
  paca    = 0xc0000000004fae80
    pid   = 6828, comm = counters
kernel BUG at ...
From: Andrew Morton
Date: Tuesday, March 4, 2008 - 12:51 pm

On Wed, 05 Mar 2008 00:50:17 +0530

Please send Adam a copy of that libhugetlbfs test ;)

hugetlb-correct-page-count-for-surplus-huge-pages.patch adds:

        if (page) {
                /*
                 * This page is now managed by the hugetlb allocator and has
                 * no users -- drop the buddy allocator's reference.
                 */
                int page_count = put_page_testzero(page);
                BUG_ON(page_count != 0);


--

From: Adam Litke
Date: Tuesday, March 4, 2008 - 3:01 pm

Ugh, I got bitten by put_page_testzero().  It returns 1 when the page
count has dropped to zero; the return value is a boolean, not the page count.
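
The mix-up is easy to reproduce with a userspace model of a decrement-and-test helper. `struct fake_page`, `fake_put_testzero` and `fake_page_count` are hypothetical stand-ins, and the real put_page_testzero() operates atomically, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reference count held in struct page. */
struct fake_page {
    int refcount;
};

/*
 * Like put_page_testzero(): drop one reference and report whether the
 * count reached zero.  The result is a boolean, not the new count.
 */
bool fake_put_testzero(struct fake_page *p)
{
    assert(p->refcount > 0);
    p->refcount--;
    return p->refcount == 0;
}

/* Read the current count, like page_count(). */
int fake_page_count(const struct fake_page *p)
{
    return p->refcount;
}
```

Starting from a count of 1, `fake_put_testzero()` returns true and `fake_page_count()` then returns 0. Treating the return value as a count, as in `BUG_ON(page_count != 0)`, therefore fires in exactly the case that was meant to be the good one.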

My initial version had a BUG_ON() with side-effects.  When a reviewer
pointed it out, I thought I could fix the patch up on its way out the
door.  I have self-administered my punishment.  This patch will fix it:

Signed-off-by: Adam Litke <agl@us.ibm.com>

--- mm/hugetlb.c.orig	2008-03-04 13:36:30.000000000 -0800
+++ mm/hugetlb.c	2008-03-04 13:39:30.000000000 -0800
@@ -291,8 +291,8 @@ static struct page *alloc_buddy_huge_pag
 		 * This page is now managed by the hugetlb allocator and has
 		 * no users -- drop the buddy allocator's reference.
 		 */
-		int page_count = put_page_testzero(page);
-		BUG_ON(page_count != 0);
+		put_page_testzero(page);
+		VM_BUG_ON(page_count(page));
 		nid = page_to_nid(page);
 		set_compound_page_dtor(page, free_huge_page);
 		/*
 
-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--

From: Kamalesh Babulal
Date: Wednesday, March 5, 2008 - 12:52 am

Hi Adam,

Thanks, the patch fixes the kernel BUG seen while running the libhugetlbfs test.



-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--

From: Randy Dunlap
Date: Tuesday, March 4, 2008 - 1:24 pm

Both x86_64 and i386 builds throw these messages at me:

  LD      arch/x86/kernel/acpi/realmode/wakeup.elf
ld: warning: dot moved backwards before `.text'
ld: warning: dot moved backwards before `.text'
ld: warning: dot moved backwards before `.text'
  OBJCOPY arch/x86/kernel/acpi/realmode/wakeup.bin


---
~Randy
--

From: Rafael J. Wysocki
Date: Tuesday, March 4, 2008 - 3:33 pm

I think I saw something like this on a system with an "older" toolchain.
I'm not seeing it on openSUSE 10.3, though (using gcc 4.2.1).

Added CCs to the experts.

Thanks,
Rafael
--

From: Sam Ravnborg
Date: Wednesday, March 5, 2008 - 12:40 am

Google turned up this post:
http://sourceware.org/ml/binutils/2006-08/msg00235.html

I have no time to dig into it further over the next few days.

	Sam
--

From: Randy Dunlap
Date: Tuesday, March 4, 2008 - 2:26 pm

"make htmldocs" gives me:

  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/basic/docproc
make[1]: *** No rule to make target `Documentation/DocBook/9p-overview.eps', needed by `Documentation/DocBook/9p.xml'.  Stop.
make: *** [htmldocs] Error 2


Are we missing the .eps and .png files?

---
~Randy
--

From: Eric Van Hensbergen
Date: Tuesday, March 4, 2008 - 2:43 pm

Actually looks like we are missing a .fig (which generates the .eps or
.png as appropriate) and the template file.
Ugh, sorry, I must have messed up the patch.  I'll fix it in my tree tonight.

                -eric
--

From: Jiri Slaby
Date: Wednesday, March 5, 2008 - 3:51 am

This probably causes userspace damage:

dbus:
prctl(0x8, 0x1, 0, 0, 0)                = -1 EINVAL (Invalid argument)

named:
named: -u with Linux threads not supported: requires kernel support for 
prctl(PR_SET_KEEPCAPS)
prctl(0x8, 0x1, 0, 0, 0)          = -1 EINVAL (Invalid argument)

ntpd:
prctl(0x8, 0x1, 0xffffffffffffffa8, 0x1, 0) = -1 EINVAL (Invalid argument)
prctl(0x8, 0x1, 0, 0, 0)          = -1 EINVAL (Invalid argument)

$ grep CONFIG_SECURITY .config
# CONFIG_SECURITY is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
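
Option 0x8 in the strace output is PR_SET_KEEPCAPS, so the failure can be checked with a few lines of C. This is a minimal sketch; `try_set_keepcaps` is a hypothetical helper name, while `prctl()` and `PR_SET_KEEPCAPS` come from `<sys/prctl.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/*
 * Issue the same call the strace lines show (option 0x8 is
 * PR_SET_KEEPCAPS).  Returns 0 on success or the errno value on
 * failure, which is EINVAL on the affected kernel.
 */
int try_set_keepcaps(int on)
{
    assert(on == 0 || on == 1);  /* the only values the call accepts */
    if (prctl(PR_SET_KEEPCAPS, (unsigned long)on, 0UL, 0UL, 0UL) == -1)
        return errno;
    return 0;
}
```

A result of `EINVAL` from `try_set_keepcaps(1)` reproduces what dbus, named and ntpd run into above.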
--

From: Jiri Slaby
Date: Wednesday, March 5, 2008 - 3:59 am

sorry, s/probably//
--

From: Serge E. Hallyn
Date: Wednesday, March 5, 2008 - 7:06 am

Thanks, Jiri.  Does the following patch work for you?

This patch addresses the !CONFIG_SECURITY case, but not the case of
using the dummy LSM.  The default these days is to have capabilities
compiled in no matter what, but it is still possible to have
CONFIG_SECURITY=y and CONFIG_SECURITY_CAPABILITIES=n, in which
case prctl(0x8) will return -EINVAL.  Do we want dummy to call
cap_task_prctl() as well, or are we ok with userspace getting -EINVAL
given that there are in fact no capabilities at that point and
the userspace code is clearly expecting them?

thanks,
-serge

From 4a66f19580489a3ac84f0a145e4585c09e65c88e Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <serue@us.ibm.com>
Date: Wed, 5 Mar 2008 06:02:32 -0800
Subject: [PATCH 1/1] capabilities: use cap_task_prctl when !CONFIG_SECURITY

capabilities-implement-per-process-securebits.patch introduced
cap_task_prctl() and moved the handling of capability-related
prctl into it.  So when !CONFIG_SECURITY, the default
security_task_prctl() needs to call cap_task_prctl() the way
other default hooks call capability helpers when they exist.

This fixes a slew of userspace breakages when
CONFIG_SECURITY=n.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
---
 include/linux/security.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 83763b0..861d6da 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2228,7 +2228,7 @@ static inline int security_task_prctl (int option, unsigned long arg2,
 				       unsigned long arg4,
 				       unsigned long arg5, long *rc_p)
 {
-	return 0;
+	return cap_task_prctl(option, arg2, arg3, arg4, arg5, rc_p);
 }
 
 static inline void security_task_reparent_to_init (struct task_struct *p)
-- 
1.5.1

--

From: Jiri Slaby
Date: Wednesday, March 5, 2008 - 8:18 am

Tested-by: Jiri Slaby <jirislaby@gmail.com>
--

From: Andrew Morgan
Date: Sunday, March 9, 2008 - 9:28 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Acked-by: Andrew G. Morgan <morgan@kernel.org>

Cheers

Andrew

Serge E. Hallyn wrote:
|
| This patch address the !CONFIG_SECURITY case, but not the case of
| using the dummy LSM.  The default these days is to have capabilities
| compiled in no matter what, but it is still possible to have
| CONFIG_SECURITY=y and CONFIG_SECURITY_CAPABILITIES=n, in which
| case prctl(0x8) will return -EINVAL.  Do we want dummy to call
| cap_prctl() as well, or are we ok with userspace getting -EINVAL
| given that there are in fact no capabilities at that point and
| the userspace code is clearly expecting them?
|
| thanks,
| -serge
|
| From 4a66f19580489a3ac84f0a145e4585c09e65c88e Mon Sep 17 00:00:00 2001
| From: Serge E. Hallyn <serue@us.ibm.com>
| Date: Wed, 5 Mar 2008 06:02:32 -0800
| Subject: [PATCH 1/1] capabilities: use cap_task_prctl when !CONFIG_SECURITY
|
| capabilities-implement-per-process-securebits.patch introduced
| cap_task_prctl() and moved the handling of capability-related
| prctl into it.  So when !CONFIG_SECURITY, the default
| security_task_prctl() needs to call cap_task_prctl() the way
| other default hooks call capability helpers when they exist.
|
| This fixes a slew of userspace breakages when
| CONFIG_SECURITY=n.
|
| Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
| ---
|  include/linux/security.h |    2 +-
|  1 files changed, 1 insertions(+), 1 deletions(-)
|
| diff --git a/include/linux/security.h b/include/linux/security.h
| index 83763b0..861d6da 100644
| --- a/include/linux/security.h
| +++ b/include/linux/security.h
| @@ -2228,7 +2228,7 @@ static inline int security_task_prctl (int option, unsigned long arg2,
|  				       unsigned long arg4,
|  				       unsigned long arg5, long *rc_p)
|  {
| -	return 0;
| +	return cap_task_prctl(option, arg2, arg3, arg4, arg5, rc_p);
|  }
|
|  static inline void security_task_reparent_to_init (struct task_struct *p)
-----BEGIN PGP SIGNATURE-----
Version: ...
From: Pavel Emelyanov
Date: Wednesday, March 5, 2008 - 6:04 am

With CONFIG_SYSFS not set got this on boot:

kobject: '<NULL>' (f88774c8): is not initialized, yet kobject_put() is being called.
------------[ cut here ]------------
WARNING: at lib/kobject.c:652 kobject_put+0x29/0x3c()
Modules linked in: sky2 e1000
Pid: 1303, comm: modprobe Not tainted 2.6.25-rc3-mm1 #79
 [<c041855b>] warn_on_slowpath+0x40/0x66
 [<c041c687>] irq_exit+0x50/0x67
 [<c040cc70>] smp_apic_timer_interrupt+0x6e/0x7a
 [<c0403380>] apic_timer_interrupt+0x28/0x30
 [<c0418e36>] vprintk+0x2b0/0x2df
 [<c04118e8>] __update_rq_clock+0x1d/0x110
 [<c0565e43>] schedule_timeout+0x13/0x86

 [<c05656c2>] wait_for_common+0xd1/0x123
 [<c0418e79>] printk+0x14/0x18
 [<c04b34bf>] kobject_put+0x29/0x3c
 [<c0431e39>] free_module+0x2f/0x72
 [<c04328dd>] sys_init_module+0xa61/0x15d2

 [<c04ba863>] pci_bus_read_config_byte+0x0/0x58
 [<c0454f87>] vfs_read+0x6c/0x8b
 [<c0455323>] sys_read+0x3c/0x63
 [<c04028b2>] sysenter_past_esp+0x5f/0x85

 =======================
---[ end trace d50646e8e8e48682 ]---
BUG: atomic counter underflow at:
Pid: 1303, comm: modprobe Tainted: G        W 2.6.25-rc3-mm1 #79
 [<c04b4042>] kref_put+0x3a/0x55
 [<c0431e39>] free_module+0x2f/0x72
 [<c04328dd>] sys_init_module+0xa61/0x15d2
 [<c04ba863>] pci_bus_read_config_byte+0x0/0x58
 [<c0454f87>] vfs_read+0x6c/0x8b
 [<c0455323>] sys_read+0x3c/0x63
 [<c04028b2>] sysenter_past_esp+0x5f/0x85
 =======================

And the same on any (in this case sky2) module unload (load is OK):

sky2 eth1: disabling interface
kobject: '<NULL>' (f886cb48): is not initialized, yet kobject_put() is being called.
------------[ cut here ]------------
WARNING: at lib/kobject.c:652 kobject_put+0x29/0x3c()
Modules linked in: e1000 [last unloaded: sky2]
Pid: 3216, comm: rmmod Tainted: G        W 2.6.25-rc3-mm1 #80
 [<c041855b>] warn_on_slowpath+0x40/0x66
 [<c041c687>] irq_exit+0x50/0x67
 [<c040cc70>] smp_apic_timer_interrupt+0x6e/0x7a
 [<c0403380>] apic_timer_interrupt+0x28/0x30
 [<c0418e36>] vprintk+0x2b0/0x2df
 [<c04118e8>] ...
From: Pavel Emelyanov
Date: Wednesday, March 5, 2008 - 6:12 am

Sorry, I forgot to change the subject in the previous letter.
Better late than never.
--

From: Kay Sievers
Date: Wednesday, March 5, 2008 - 6:31 am

From: Pavel Emelyanov
Date: Wednesday, March 5, 2008 - 6:38 am

From: Kay Sievers
Date: Wednesday, March 5, 2008 - 6:54 am

Ok. Care to enable CONFIG_DEBUG_KOBJECT, and post the part of the log
that happens right before the WARN()? We might get a hint where to look
for the stuff that goes wrong.

Thanks,
Kay

--

From: Pavel Emelyanov
Date: Wednesday, March 5, 2008 - 7:28 am

Hm... Not sure how many lines are required, but here are the ones
that are related to the sky2 module, which is loaded and then removed:

kobject: 'sky2' (f74de280): kobject_add_internal: parent: 'drivers', set: 'drivers'
PCI: Setting latency timer of device 0000:02:00.0 to 64
sky2 0000:02:00.0: v1.21 addr 0xdeefc000 irq 16 Yukon-EC (0xb6) rev 2
kobject: 'net' (f7512200): kobject_add_internal: parent: '0000:02:00.0', set: '<NULL>'
kobject: 'eth1' (f74ccb64): kobject_add_internal: parent: 'net', set: 'devices'
kobject: 'eth1' (f74ccb64): kobject_uevent_env
kobject: 'eth1' (f74ccb64): fill_kobj_path: path = '/devices/pci0000:00/0000:00:03.0/0000:02:00.0/net/eth1'
sky2 eth1: addr 00:0e:0c:3b:d8:8a
kobject: 'sky2' (f74de280): kobject_uevent_env
kobject: 'sky2' (f74de280): fill_kobj_path: path = '/bus/pci/drivers/sky2'
sky2 eth1: enabling interface
sky2 eth1: disabling interface
kobject: 'eth1' (f74ccb64): kobject_uevent_env
kobject: 'eth1' (f74ccb64): fill_kobj_path: path = '/devices/pci0000:00/0000:00:03.0/0000:02:00.0/net/eth1'
kobject: 'net' (f7512200): kobject_cleanup

kobject: 'net' (f7512200): auto cleanup kobject_del
kobject: 'net' (f7512200): calling ktype release
kobject: (f7512200): dynamic_kobj_release
kobject: 'net': free name
kobject: 'eth1' (f74ccb64): kobject_cleanup
kobject: 'eth1' (f74ccb64): calling ktype release
kobject: 'eth1': free name
kobject: 'sky2' (f74de280): kobject_cleanup

kobject: 'sky2' (f74de280): auto cleanup 'remove' event
kobject: 'sky2' (f74de280): kobject_uevent_env
kobject: 'sky2' (f74de280): fill_kobj_path: path = '/bus/pci/drivers/sky2'
kobject: 'sky2' (f74de280): auto cleanup kobject_del
kobject: 'sky2' (f74de280): calling ktype release

kobject: 'sky2': free name
kobject: '<NULL>' (f886cb48): is not initialized, yet kobject_put() is being called.
------------[ cut here ]------------
WARNING: at lib/kobject.c:652 kobject_put+0x29/0x3c()
Modules linked in: e1000 [last unloaded: sky2]
Pid: 3188, comm: rmmod Tainted: G        W ...
From: Greg KH
Date: Wednesday, March 5, 2008 - 9:40 am

Hm, but with CONFIG_SYSFS set this does not show up?

thanks,

greg k-h
--

From: Pavel Emelyanov
Date: Wednesday, March 5, 2008 - 9:59 am

From: Greg KH
Date: Wednesday, March 5, 2008 - 10:07 am

Thanks.  Odds are we have some sysfs issue in the module core.  That code
really needs to be refactored; I'll go work on it to see if we can
isolate all of that code into one file, which should make these
kinds of things easier to find.

thanks,

greg k-h
--

From: Valdis.Kletnieks
Date: Wednesday, March 5, 2008 - 12:21 am

x86_64, mostly 64-bit userspace, Dell Latitude D820, T7200 Core2 Duo...

So I gave CONFIG_PROFILE_LIKELY another try, and this time the thing actually
booted and got into userspace, but stuff started dying in rc.sysinit.

According to dmesg, they all died at the same place:

[    4.841459] rename_device[686]: segfault at ffffffffff7009be ip ffffffffff7009be sp 7fff7ccfb958 error 14
[    4.842384] rename_device[984]: segfault at ffffffffff7009be ip ffffffffff7009be sp 7fffb6fe9c68 error 14
[    4.843298] rename_device[981]: segfault at ffffffffff7009be ip ffffffffff7009be sp 7fffc18504c8 error 14
[    4.844184] rename_device[983]: segfault at ffffffffff7009be ip ffffffffff7009be sp 7fff512c8f48 error 14
[    6.099486] rename_device[1513]: segfault at ffffffffff7009be ip ffffffffff7009be sp 7fff47e88ad8 error 14
[    5.769289] rename_device[1516]: segfault at ffffffffff7009be ip ffffffffff7009be sp 7fffa317edd8 error 14
[    7.457229] fsck.ext3[1576]: segfault at ffffffffff7009be ip ffffffffff7009be sp 7fff3be947f8 error 14

(Note that not everything died: some renames, an fsck, and maybe something I
missed did, but a lot of other stuff worked, such as the dmesg, grep, cat and
uname that I ran and a lot of things that rc.sysinit invoked, so that may tell
us something...)

/proc/self/maps says that's near:

ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

And my System.map says:

ffffffff80855a0c A __bss_stop
ffffffff80855a0c A _end
ffffffffff600000 T vgettimeofday
ffffffffff600100 t vread_tsc
ffffffffff600122 t vread_hpet
ffffffffff600140 D __vsyscall_gtod_data
ffffffffff600400 T vtime
ffffffffff600800 T vgetcpu
ffffffffff600870 D __vgetcpu_mode
ffffffffff600880 D __jiffies
ffffffffff600c00 T venosys_1
ffffffffff700000 A VDSO64_PRELINK
ffffffffff7005b0 A VDSO64_jiffies
ffffffffff7005b8 A VDSO64_vgetcpu_mode
ffffffffff7005c0 A VDSO64_vsyscall_gtod_data
<file ends there>

So we're in the same 4K as the VDSO64_* values, but some 0x4fe past ...
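
The address comparison above is simple page arithmetic, which can be sketched in C. The helper names (`page_base`, `page_offset`, `in_range`) are hypothetical, and the addresses used with them are the ones quoted from dmesg, /proc/self/maps and System.map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K UINT64_C(0x1000)

/* Start of the 4K page containing `addr`. */
uint64_t page_base(uint64_t addr)
{
    return addr & ~(PAGE_SIZE_4K - 1);
}

/* Offset of `addr` within its 4K page. */
uint64_t page_offset(uint64_t addr)
{
    return addr & (PAGE_SIZE_4K - 1);
}

/* True if `addr` lies in the half-open range [start, end). */
bool in_range(uint64_t addr, uint64_t start, uint64_t end)
{
    assert(start <= end);
    return addr >= start && addr < end;
}
```

For the faulting ip ffffffffff7009be, `page_base()` gives ffffffffff700000, the VDSO64_PRELINK value, and `page_offset()` gives 0x9be. `in_range()` against ffffffffff600000-ffffffffff601000 is false, so the address lies outside the mapped [vsyscall] page.
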
From: Andi Kleen
Date: Wednesday, March 5, 2008 - 10:45 am

Try this patch:

Remove unlikelies in vsyscall path

Remove the unlikely() annotations in the vsyscall path that conflict with
unlikely profiling.  They shouldn't be needed anyway, because gcc predicts
a condition leading to an early return as unlikely by default, and
for the loops it shouldn't make much difference.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/x86/kernel/vsyscall_64.c
===================================================================
--- linux.orig/arch/x86/kernel/vsyscall_64.c
+++ linux/arch/x86/kernel/vsyscall_64.c
@@ -128,7 +128,7 @@ static __always_inline void do_vgettimeo
 		seq = read_seqbegin(&__vsyscall_gtod_data.lock);
 
 		vread = __vsyscall_gtod_data.clock.vread;
-		if (unlikely(!__vsyscall_gtod_data.sysctl_enabled || !vread)) {
+		if (!__vsyscall_gtod_data.sysctl_enabled || !vread) {
 			gettimeofday(tv,NULL);
 			return;
 		}
@@ -169,7 +169,7 @@ time_t __vsyscall(1) vtime(time_t *t)
 {
 	struct timeval tv;
 	time_t result;
-	if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
+	if (!__vsyscall_gtod_data.sysctl_enabled)
 		return time_syscall(t);
 
 	vgettimeofday(&tv, NULL);
Index: linux/arch/x86/vdso/vclock_gettime.c
===================================================================
--- linux.orig/arch/x86/vdso/vclock_gettime.c
+++ linux/arch/x86/vdso/vclock_gettime.c
@@ -48,7 +48,7 @@ static noinline int do_realtime(struct t
 		ts->tv_sec = gtod->wall_time_sec;
 		ts->tv_nsec = gtod->wall_time_nsec;
 		ns = vgetns();
-	} while (unlikely(read_seqretry(&gtod->lock, seq)));
+	} while (read_seqretry(&gtod->lock, seq));
 	timespec_add_ns(ts, ns);
 	return 0;
 }
@@ -77,7 +77,7 @@ static noinline int do_monotonic(struct 
 		ns = gtod->wall_time_nsec + vgetns();
 		secs += gtod->wall_to_monotonic.tv_sec;
 		ns += gtod->wall_to_monotonic.tv_nsec;
-	} while (unlikely(read_seqretry(&gtod->lock, seq)));
+	} while (read_seqretry(&gtod->lock, seq));
 	vset_normalized_timespec(ts, secs, ns);
 	return 0;
 }
@@ -105,7 +105,7 @@ int ...
From: Andrew Morton
Date: Wednesday, March 5, 2008 - 11:02 am

Yes, but both those files now have:

/*
 * likely and unlikely explode when used in vdso in combination with
 * profile-likely-unlikely-macros.patch
 */
#undef likely
#define likely(x) (x)
#undef unlikely
#define unlikely(x) (x)

at the top, so it'll be something else.  Perhaps a `likely' snuck in via an
inline in a header file.  It would be better to add a #define DONT_DO_THAT
at the top of arch/x86/kernel/vsyscall_64.c and
arch/x86/vdso/vclock_gettime.c, then use that to defeat likely-profiling.
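
A per-file opt-out of this kind could look roughly like the following userspace sketch. The counters and `profile_branch()` are hypothetical stand-ins for the profiler's bookkeeping, and the way `SUPPRESS_LIKELY_PROFILING` is tested here is an assumption about the header side, not the actual include/linux/compiler.h change.

```c
#include <assert.h>

/* Hypothetical counters standing in for the profiler's bookkeeping. */
static unsigned long likely_hits, likely_misses;

/* Record whether the prediction held, then pass the condition through. */
static inline int profile_branch(int cond, int expected)
{
    assert(expected == 0 || expected == 1);
    if (cond == expected)
        likely_hits++;
    else
        likely_misses++;
    return cond;
}

/*
 * A file that must not reference profiling code defines
 * SUPPRESS_LIKELY_PROFILING before its first #include and gets the
 * plain versions, which have no side effects and no external references.
 */
#ifdef SUPPRESS_LIKELY_PROFILING
# define likely(x)   (!!(x))
# define unlikely(x) (!!(x))
#else
# define likely(x)   profile_branch(!!(x), 1)
# define unlikely(x) profile_branch(!!(x), 0)
#endif
```

The advantage over `#undef`/`#define` in each file is that inline functions pulled in from headers after the define are covered as well, as long as every use goes through these macros.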

 arch/x86/kernel/vsyscall_64.c  |   11 ++---------
 arch/x86/vdso/vclock_gettime.c |   11 ++---------
 include/linux/compiler.h       |    3 ++-
 3 files changed, 6 insertions(+), 19 deletions(-)

diff -puN arch/x86/kernel/vsyscall_64.c~profile-likely-unlikely-macros-fix arch/x86/kernel/vsyscall_64.c
--- a/arch/x86/kernel/vsyscall_64.c~profile-likely-unlikely-macros-fix
+++ a/arch/x86/kernel/vsyscall_64.c
@@ -17,6 +17,8 @@
  *  want per guest time just set the kernel.vsyscall64 sysctl to 0.
  */
 
+#define SUPPRESS_LIKELY_PROFILING
+
 #include <linux/time.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
@@ -46,15 +48,6 @@
 #define __syscall_clobber "r11","cx","memory"
 
 /*
- * likely and unlikely explode when used in vdso in combination with
- * profile-likely-unlikely-macros.patch
- */
-#undef likely
-#define likely(x) (x)
-#undef unlikely
-#define unlikely(x) (x)
-
-/*
  * vsyscall_gtod_data contains data that is :
  * - readonly from vsyscalls
  * - written by timer interrupt or systcl (/proc/sys/kernel/vsyscall64)
diff -puN arch/x86/vdso/vclock_gettime.c~profile-likely-unlikely-macros-fix arch/x86/vdso/vclock_gettime.c
--- a/arch/x86/vdso/vclock_gettime.c~profile-likely-unlikely-macros-fix
+++ a/arch/x86/vdso/vclock_gettime.c
@@ -9,6 +9,8 @@
  * Also alternative() doesn't work.
  */
 
+#define SUPPRESS_LIKELY_PROFILING
+
 #include <linux/kernel.h>
 #include <linux/posix-timers.h>
 #include <linux/time.h>
@@ -23,15 +25,6 @@
 
 #define gtod ...
From: Andi Kleen
Date: Wednesday, March 5, 2008 - 11:22 am

I think you need to do it differently. Not undef/define, but set
some symbol that is checked by the unlikely profiler and it won't

Possible.  The problem is that there are now vsyscall functions in
other files too, especially hpet_64.c and tsc_64.c

Perhaps this is something that should be just checked in modpost instead. 
Any external references from the vsyscall section to another section
should be flagged as an error (cc'ed Sam in case he wants to look at that)

-Andi
--

From: Valdis.Kletnieks
Date: Wednesday, March 5, 2008 - 3:26 pm

Confirming that this patch works and my system goes multi-user cleanly.

Actual numbers after about 10 minutes of uptime:

% wc -l /proc/likely_prof 
2635 /proc/likely_prof
% grep '^[^ ]' /proc/likely_prof 
Likely Profiling Results
[+- ] Type | # True | # False | Function:Filename@Line
+unlikely |        1|        0  in_dev_get()@:include/linux/inetdevice.h@185
+unlikely |      513|        0  dst_input()@:include/net/dst.h@254
-likely   |        0|      148  ip6_mc_input()@:net/ipv6/ip6_input.c@271
-likely   |        0|        1  sock_error()@:include/net/sock.h@1211
-likely   |      851|     1219  tcp_transmit_skb()@:net/ipv4/tcp_output.c@493
+unlikely |        1|        0  signal_pending()@:include/linux/sched.h@1927
-likely   |        0|  1172946  audit_syscall_entry()@:kernel/auditsc.c@1522
+unlikely |  1172716|        0  syscall_trace_enter()@:arch/x86/kernel/ptrace.c@1556
-likely   |        0|  1173020  audit_syscall_exit()@:kernel/auditsc.c@1551
+unlikely |  1172831|        0  syscall_trace_leave()@:arch/x86/kernel/ptrace.c@1573
-likely   |        0|     1272  audit_alloc()@:kernel/auditsc.c@841
+unlikely |        3|        0  icmp_unreach()@:net/ipv4/icmp.c@773
+unlikely |        2|        1  nf_ct_attach()@:net/netfilter/core.c@230
-likely   |        0|        2  dst_gc_task()@:net/core/dst.c@82
+unlikely |      143|       61  fput_light()@:include/linux/file.h@77
+unlikely |      892|      424  _read_unlock_irqrestore()@:kernel/spinlock.c@375
+unlikely |       28|        0  sched_move_task()@:kernel/sched.c@7835
+unlikely |       28|        0  sched_move_task()@:kernel/sched.c@7828
+unlikely |      108|        0  verify_export_symbols()@:kernel/module.c@1401
+unlikely |      313|        0  verify_export_symbols()@:kernel/module.c@1393
+unlikely |       14|        0  ll_front_merge_fn()@:block/blk-merge.c@347
-likely   |       17|     1150  audit_free()@:kernel/auditsc.c@1428
-likely   |       17|  1174290  audit_get_context()@:kernel/auditsc.c@711
+unlikely |       ...
From: Andrew Morton
Date: Wednesday, March 5, 2008 - 4:49 pm

On Wed, 05 Mar 2008 17:26:25 -0500


These are all the ones which we got wrong on your setup, yes?

I wonder if assuming that current->audit_context is NULL is realistic
nowadays.

--

From: Valdis.Kletnieks
Date: Wednesday, March 5, 2008 - 12:59 pm

Nope, sorry... same behavior.  Apparently it's a (un)likely someplace
else...

I'm trying to figure out what's at 0x9be into the vdso, but not having
a lot of luck.
From: Andi Kleen
Date: Wednesday, March 5, 2008 - 2:56 pm

You can run objdump -Sr on the vdso/vsyscall object files and see
if there are any external references to unlikely-related functions. If so,
the problem is in the function containing the reference.

-Andi
--

From: Badari Pulavarty
Date: Wednesday, March 5, 2008 - 2:34 pm

Hi Andrew,

Not able to boot 2.6.25-rc3-mm1 on my ppc64 box.
2.6.25-rc2-mm1 and 2.6.25-rc3 boot fine.

I applied slab.c fix also.

Any other known issues ? My config file attached.
Here are the messages on the console.

Thanks,
Badari

Linux/PowerPC load: root=/dev/sda3 selinux=0 elevator=cfq numa=debug
kernelcore=1024M
Finalizing device tree... using OF tree (promptr=00c39a50)
OF stdout device is: /vdevice/vty@30000000
Hypertas detected, assuming LPAR !
command line: root=/dev/sda3 selinux=0 elevator=cfq numa=debug
kernelcore=1024M
memory layout at init:
  alloc_bottom : 00000000023d0000
  alloc_top    : 0000000008000000
  alloc_top_hi : 0000000072000000
  rmo_top      : 0000000008000000
  ram_top      : 0000000072000000
Looking for displays
instantiating rtas at 0x00000000077ca000 ... done
0000000000000000 : boot cpu     0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x00000000023d1000 -> 0x00000000023d21cf
Device tree struct  0x00000000023d3000 -> 0x00000000023e0000
Calling quiesce ...
returning from prom_init


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.25-rc3-mm1
# Wed Mar  5 10:34:39 2008
#
CONFIG_PPC64=y

#
# Processor support
#
# CONFIG_POWER4_ONLY is not set
CONFIG_POWER3=y
CONFIG_POWER4=y
# CONFIG_TUNE_CELL is not set
CONFIG_PPC_FPU=y
# CONFIG_ALTIVEC is not ...
From: Andrew Morton
Date: Wednesday, March 5, 2008 - 2:54 pm

On Wed, 05 Mar 2008 13:34:14 -0800

The semaphore consolidation code enables interrupts early in boot, when it
shouldn't.  This tends to make powerpc blow up.  Could be that this is what
you're hitting.

Matthew, is this going to be fixed soon?

Thanks.  
--

From: Badari Pulavarty
Date: Wednesday, March 5, 2008 - 3:35 pm

Yes. I just backed out git-semaphore.patch and machine booted fine.

Thanks,
Badari

--

From: Stephen Rothwell
Date: Wednesday, March 5, 2008 - 4:17 pm

Hi Andrew,

On Wed, 5 Mar 2008 13:54:25 -0800 Andrew Morton <akpm@linux-foundation.org>

There is a new version of these patches in the current linux-next tree ...

--
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
From: Valdis.Kletnieks
Date: Thursday, March 6, 2008 - 8:58 pm

Dell Latitude D820, x86_64, Core2 Duo T7200

'shutdown -h' blows up at the very end. shutdown -r works OK. I caught this one
with netconsole.  There's another, different, crash I've been seeing a bit
earlier in the shutdown -h as well, but I haven't been able to catch that one
yet...

[   74.254402] CPU 1 is now offline
[   74.255395] SMP alternatives: switching to UP code
[   74.256373] BUG: unable to handle kernel paging request at ffffffff8020a023
[   74.256373] IP: [<ffffffff80211872>] alternatives_smp_unlock+0x66/0x7b
[   74.256373] PGD 203067 PUD 207063 PMD 7e4cc163 PTE 20a161
[   74.256373] Oops: 0003 [1] PREEMPT SMP 
[   74.256373] last sysfs file: /sys/devices/virtual/block/dm-14/dev
[   74.256373] CPU 0 
[   74.256373] Modules linked in: rtc sha256_generic aes_generic acpi_cpufreq tpm_tis arc4 ecb pcmcia iwl3945 iTCO_wdt ohci1394 firmware_class iTCO_vendor_support yenta_socket watchdog_core thermal rsrc_nonstatic mac80211 snd_hda_intel intel_agp watchdog_dev ieee1394 pcmcia_core processor button ac battery cfg80211
[   74.256373] Pid: 1767, comm: halt Not tainted 2.6.25-rc3-mm1 #8
[   74.256373] RIP: 0010:[<ffffffff80211872>]  [<ffffffff80211872>] alternatives_smp_unlock+0x66/0x7b
[   74.256373] RSP: 0018:ffff81007ac63d10  EFLAGS: 00010093
[   74.256373] RAX: ffffffff80573190 RBX: ffff81007f83a8c0 RCX: ffffffff80563cec
[   74.256373] RDX: ffffffff8020a023 RSI: ffffffff8078a0b8 RDI: ffffffff80783018
[   74.256373] RBP: ffff81007ac63d28 R08: 0000000000000001 R09: ffffffff80563cec
[   74.256373] R10: ffffffff80200000 R11: ffff81007ac63d1f R12: 0000000000000000
[   74.256373] R13: 0000000000000001 R14: 0000000000000246 R15: ffff81007d156340
[   74.256373] FS:  00007f2d0ab206f0(0000) GS:ffffffff8076e000(0000) knlGS:0000000000000000
[   74.256373] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   74.256373] CR2: ffffffff8020a023 CR3: 000000007edf3000 CR4: 00000000000006e0
[   74.256373] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   74.256373] DR3: ...
From: Andrew Morton
Date: Thursday, March 6, 2008 - 11:16 pm

Yes, I hit a similar one during halt on the t61p.  But because of the
netconsole bustage I was only able to see (on the screen) oops #2 - oops #1
had scrolled off.  oops #2 had a similar trace and the EIP was in
text_poke().

I suppose one of us should bisect it.
--

From: Valdis.Kletnieks
Date: Friday, March 7, 2008 - 12:52 am

OK, I finally managed to catch the *other* failure I was seeing at shutdown,
and it appears to be a variant on the same theme, so readers may feel free to
ignore the rest of this note unless they care about the gory details...

Apparently, if I booted with 'ignore_loglevel' (which is my default when using
netconsole), I hit the above traceback and I'm dead in the water, no alt-sysrq,
need to hold down the power button for 5 seconds.

If I boot with 'quiet' instead, I get the set of tracebacks below, which made
the original BUG scroll off-screen and obscured the fact that it's the same
failure. Adding to the confusion, if it failed in this mode, alt-sysrq still
worked just fine, so alt-sysrq-S-S-U-B got me a reboot.

Now that I know that at least *part* of the issue is the same, I can go
bisecting.  Somebody *else* can ponder why ignore_loglevel/quiet causes the
big difference in behavior after the BUG, that part is beyond my ken...

[  168.036824] BUG: unable to handle kernel paging request at ffffffff8020a023
[  168.037300] IP: [<ffffffff80211872>] alternatives_smp_unlock+0x66/0x7b
[  168.037745] PGD 203067 PUD 207063 PMD 7f989163 PTE 20a161
[  168.037781] Oops: 0003 [1] PREEMPT SMP 
[  168.037781] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
[  168.037781] CPU 0 
[  168.037781] Modules linked in: rtc irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm irda crc_ccitt sha256_generic aes_generic acpi_cpufreq tpm_tis arc4 ecb iwl3945 pcmcia nvidia(P)(U) firmware_class mac80211 ohci1394 snd_hda_intel cfg80211 yenta_socket ieee1394 iTCO_wdt iTCO_vendor_support thermal rsrc_nonstatic ac processor watchdog_core battery watchdog_dev button pcmcia_core intel_agp [last unloaded: x_tables]
[  168.037781] Pid: 3115, comm: halt Tainted: P          2.6.25-rc3-mm1 #8
[  168.037781] RIP: 0010:[<ffffffff80211872>]  [<ffffffff80211872>] alternatives_smp_unlock+0x66/0x7b
[  168.037781] RSP: 0000:ffff81007dbebd10  EFLAGS: 00010093
[  168.037781] RAX: ffffffff80573190 ...
From: Thomas Gleixner
Date: Friday, March 7, 2008 - 1:06 am

Can you decode ffffffff8020a023 via addr2line please ?

Thanks,
	tglx
--

From: Valdis.Kletnieks
Date: Friday, March 7, 2008 - 1:23 am

It's been a long day, and I couldn't get addr2line to work, it kept saying '??:0'.

However, this is in my System.map:

ffffffff8020a000 t poll_idle
ffffffff8020a009 t do_nothing
ffffffff8020a00f T set_personality_64bit
ffffffff8020a041 T release_thread
ffffffff8020a07d T arch_randomize_brk

so set_personality_64bit+0x14 or so?
From: Thomas Gleixner
Date: Friday, March 7, 2008 - 1:34 am

----------------------------------------------------------^^^^^^

The PTE has the RW bit cleared, so the fault is not a big surprise.

Thanks,
	tglx
--

From: Valdis.Kletnieks
Date: Friday, March 7, 2008 - 12:30 pm

Probably not surprisingly, the quilt bisect says the problem is git-x86.patch.
From: Andrew Morton
Date: Wednesday, March 12, 2008 - 12:32 am

Did rc5-mm1 fix this?
--

From: Valdis.Kletnieks
Date: Wednesday, March 12, 2008 - 7:19 pm

Nope, still blows up with exactly the same traceback.

I may have to try again to figure out how to bisect the git-x86 tree - Ingo
sent me a pointer to his git-x86 cheat sheet, I looked at it but I couldn't
figure out how to tell 'git bisect' that the starting good spot was "whatever
corresponded to the git-x86 patch in 24-rc8-mm1" and bad was "25-rc3-mm1". I
tried using the first commit ID listed in the patch, but that gave me this:

(looking at first few lines of the git-x86.patch in the 25-rc3-mm1 broken-out):

commit fa70e201463a7f3d86b995249e57a8e27b31b5f8
Author: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Date:   Sun Feb 24 11:57:22 2008 +0100

but then:

% git bisect bad fa70e201463a7f3d86b995249e57a8e27b31b5f8
fatal: Needed a single revision
Bad rev input: fa70e201463a7f3d86b995249e57a8e27b31b5f8

And I didn't see any release tags in the x86 git tree that I could specify
either.

(Once I get the good and bad markers set, it "should be easy" - I've managed
to git-bisect through Linus's git tree before, but that was always easy
because "bad" was HEAD and "good" had a nice v2.6.2mumble-rcN tag to specify...)

From: Andrew Morton
Date: Wednesday, March 12, 2008 - 7:32 pm

Yes, it's all a bit mysterious.  I just look in the changelog, which was
pulled out of the git diff via various means.

See how
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm...
starts with 5813a19cba5735b629cdeb156863dab814759128 and ends with
816543f9bf2fb77ff52083216a4537eb4e3058ec.  Use
5813a19cba5735b629cdeb156863dab814759128 as good and
816543f9bf2fb77ff52083216a4537eb4e3058ec as bad.

--

From: Valdis.Kletnieks
Date: Wednesday, March 12, 2008 - 8:57 pm

I *hope* I'm mis-reading Ingo's directions when I cut-n-pasted them -
first I pulled down the two trees, tried to bisect, had it give me the
"need a single revision" error, then I checked out a tree - and got a
*different* funky opaque error message when I tried to bisect:

[/usr/src/valdis/x86.git] git-init-db
Initialized empty Git repository in .git/
[/usr/src/valdis/x86.git] git-remote add linus git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
[/usr/src/valdis/x86.git] git-remote add x86 git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
[/usr/src/valdis/x86.git] git-remote update
Updating linus
warning: no common commits
(...)
Resolving deltas: 100% (598008/598008), done.
From git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
 * [new branch]      master     -> linus/master
remote: Counting objects: 105, done.
(...)
From git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
 * [new tag]         v2.6.12    -> v2.6.12
 * [new tag]         v2.6.12-rc2 -> v2.6.12-rc2
 (...)
 * [new tag]         v2.6.25-rc4 -> v2.6.25-rc4
 * [new tag]         v2.6.25-rc5 -> v2.6.25-rc5
Updating x86
remote: Counting objects: 2651, done.
(...)
Resolving deltas: 100% (1979/1979), completed with 310 local objects.
From git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
 * [new branch]      base       -> x86/base
 * [new branch]      for-akpm   -> x86/for-akpm
 * [new branch]      for-linus  -> x86/for-linus
 * [new branch]      latest     -> x86/latest
 * [new branch]      master     -> x86/master
 * [new branch]      origin     -> x86/origin
 * [new branch]      testing    -> x86/testing
[/usr/src/valdis/x86.git] git bisect start
[/usr/src/valdis/x86.git] git bisect good 5813a19cba5735b629cdeb156863dab814759128
fatal: Needed a single revision
Bad rev commit: ^{commit}
[/usr/src/valdis/x86.git] git branch list
fatal: Not a valid object name: 'master'.
[/usr/src/valdis/x86.git] git checkout -b ...
From: Andrew Morton
Date: Wednesday, March 12, 2008 - 9:27 pm

Try this:

echo "git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git#for-akpm" > .git/branches/git-foo
git-fetch git-foo
git-checkout git-foo
git-bisect start
git-bisect good 968f7910e8d10e5273977248f3d89193b32e8c20
git-bisect bad c28550f4f68a894a3c05141762f388b5a14f33e3
--

From: Valdis.Kletnieks
Date: Friday, March 14, 2008 - 11:50 am

Trying it against what I already pulled down:

[/usr/src/valdis/x86.git] echo "git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git#for-akpm" > .git/branches/git-foo
[/usr/src/valdis/x86.git] git-fetch git-foo
remote: Counting objects: 1642, done.
remote: Compressing objects: 100% (261/261), done.
remote: Total 1296 (delta 1090), reused 1238 (delta 1034)
Receiving objects: 100% (1296/1296), 197.24 KiB | 215 KiB/s, done.
Resolving deltas: 100% (1090/1090), completed with 218 local objects.
[/usr/src/valdis/x86.git] git-checkout git-foo
error: pathspec 'git-foo' did not match any file(s) known to git.
Did you forget to 'git add'?
[/usr/src/valdis/x86.git] git-bisect start
won't bisect on seeked tree
[/usr/src/valdis/x86.git] git-checkout -b git-foo git-foo
git checkout: updating paths is incompatible with switching branches/forcing
Did you intend to checkout 'git-foo' which can not be resolved as commit?

Trying again against a totally clean new directory:

[/usr/src/valdis] git --version
git version 1.5.4.3
[/usr/src/valdis] rm -rf x86.git
[/usr/src/valdis] mkdir x86.git
[/usr/src/valdis] cd x86.git
[/usr/src/valdis/x86.git] git-init-db
Initialized empty Git repository in .git/
[/usr/src/valdis/x86.git] git-remote add linus git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
[/usr/src/valdis/x86.git] git-remote update
Updating linus
warning: no common commits
remote: Counting objects: 721254, done.
remote: Compressing objects: 100% (130309/130309), done.
remote: Total 721254 (delta 598318), reused 711930 (delta 589976)
Receiving objects: 100% (721254/721254), 175.04 MiB | 3535 KiB/s, done.
Resolving deltas: 100% (598318/598318), done.
From git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
 * [new branch]      master     -> linus/master
remote: Counting objects: 105, done.
remote: Compressing objects: 100% (105/105), done.
remote: Total 105 (delta 0), reused 102 (delta 0)
Receiving objects: 100% (105/105), 30.40 KiB, ...
From: Ingo Molnar
Date: Friday, March 21, 2008 - 6:41 am

the best way to bisect the x86.git-only commits is to do:

  git-bisect bad x86/latest
  git-bisect good x86/base

the 'base' branch is the upstream tree that x86.git is based against. 
This will minimize the number of bisection points as well, because 
you'll only bisect x86.git patches.

[ and make sure you test x86/base first to establish that it's truly
  'good' :-) ]

	Ingo
--

From: Valdis.Kletnieks
Date: Friday, March 21, 2008 - 12:38 pm

OK, *that* got the bisect running.  However, after a few bisections, things
are getting weird...

(Note - I haven't done a git pull or update for a week and a bit, so the tree is
as of 03/14 or so...)

'git bisect log' reports:

git-bisect start
# bad: [21a418440c44b6a2cdf38fea2533a5398d6fd939] Move mp_bus_id_to_node to numa.c
git-bisect bad 21a418440c44b6a2cdf38fea2533a5398d6fd939
# good: [dba92d3bc49c036056a48661d2d8fefe4c78375a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
git-bisect good dba92d3bc49c036056a48661d2d8fefe4c78375a
# good: [53f0f2bc547fd13a70a6adb86592301ec83b9fc7] x86 mmiotrace: comment about user space ABI
git-bisect good 53f0f2bc547fd13a70a6adb86592301ec83b9fc7
# good: [53f0f2bc547fd13a70a6adb86592301ec83b9fc7] x86 mmiotrace: comment about user space ABI
git-bisect good 53f0f2bc547fd13a70a6adb86592301ec83b9fc7
# good: [53f0f2bc547fd13a70a6adb86592301ec83b9fc7] x86 mmiotrace: comment about user space ABI
git-bisect good 53f0f2bc547fd13a70a6adb86592301ec83b9fc7
# good: [2702dd1be087ac7307b731d884ee48db6e1cdff6] x86: create smpcommon.c
git-bisect good 2702dd1be087ac7307b731d884ee48db6e1cdff6
# good: [ad42b55d36238ebb9fa4d7a538ef691a76397c46] x86: add KERN_INFO to show_unhandled_signals printout
git-bisect good ad42b55d36238ebb9fa4d7a538ef691a76397c46
# good: [56b412e63863ea82a5720315076c7dbd1d9888cd] x86: change x86 to use generic find_next_bit
git-bisect good 56b412e63863ea82a5720315076c7dbd1d9888cd
# good: [42de918f25dc9a49fb9688e22c2a3f2b156cc1bf] x86: prevent unconditional writes to DebugCtl MSR
git-bisect good 42de918f25dc9a49fb9688e22c2a3f2b156cc1bf

At this point, 'git bisect visualize' shows 9 commits left to bisect through,
and all are dated 03/10 or later.  However, since 25-rc3-mm1 had the problem,
it had to be something in-tree as of 03/05.

Is it possible that the problem code was in the git-x86 tree when Andrew
pulled for -rc3-mm1 and -rc5-mm1, but had been reverted by the time I grabbed
the tree, ...
From: Ingo Molnar
Date: Friday, March 21, 2008 - 12:58 pm

no, we frequently regenerate the x86.git tree so the dates have little 
relevance. If for any particular pull, x86/base is good and x86/latest 
is bad, then the bug is somewhere in those 200-300 patches in between. 
They are lined up linearly so should be perfectly bisectable.

	Ingo
--

From: Valdis.Kletnieks
Date: Friday, March 21, 2008 - 1:05 pm

OK, off to go try the last few bisects then...


From: Ingo Molnar
Date: Friday, March 21, 2008 - 1:12 pm

well ... your git bisection log does look suspiciously 'good', so 
something is wrong there i think :-(

the chance to get 8 'good' bisection points in a row is 1:256. OTOH, the 
freshest x86 patches are always at the 'end' of the queue - which are 
also the ones most likely to break anything.

Are you sure the x86/base point is indeed 'good'? You can check it via:

 git-checkout -b tmp x86/base

and build+boot it.

	Ingo
--

From: Valdis.Kletnieks
Date: Friday, March 21, 2008 - 8:11 pm

On the other hand, this was broken in 25-rc3-mm1, so it's not a "fresh"

Did that, and it's good (as in 'shutdown -h now' powers off rather than BUG and
hanging).

"You're at Witt's End" -- Adventure, c. 1978

OK.. so far I've got:

25-rc3-mm1 is bad
25-rc5-mm1 is bad, and bisected down to git-x86.patch
x86/base as pulled last week is good
x86/latest, bisected to within its last 9 entries, has been good at every step.

So I can't seem to replicate it using the git-x86 tree, but bisecting -mm
implicates it.  How very strange.

I even went and pulled Andrew's mmotm pile as of this afternoon, and got that
to build after having to heave only a dozen patches over the side and one or
two hand-fixes of patches - and *that* one is good too.

So I'm thinking that it was some "bump in the night" that was broken in the
x86 tree when Andrew pulled it for 25-rc5-mm1, but was fixed by the time I
pulled it a few days later to start git-bisecting it.

Given that -mmotm isn't showing the problem, I'm having a hard time coming
up with enthusiasm to keep chasing it.  If I see it happen again in a -mm
or Linus kernel, I'll restart the chase then....


From: Ingo Molnar
Date: Saturday, March 22, 2008 - 5:09 am

if it ever reappears then please check x86/latest first (without any 
other -mm bits) and notify us.

	Ingo
--
