Re: 2.6.28-rc6-git1 -- BUG: unable to handle kernel paging request at ffff8800be8b0019

Previous thread: [PATCH] x86: fixup config space size of CPU functions for AMD family 11h by Andreas Herrmann on Tuesday, November 25, 2008 - 9:18 am. (2 messages)

Next thread: [PATCH] LEDs: allow led-drivers to use a wider than 0...255 range of brightness values by Guennadi Liakhovetski on Tuesday, November 25, 2008 - 9:57 am. (5 messages)
From: Miles Lane
Date: Tuesday, November 25, 2008 - 9:50 am

# find /sys -name "*" | xargs tail

[ 3866.840809] BUG: unable to handle kernel paging request at ffff8800be8b0019
[ 3866.840819] IP: [<ffffffff80262f07>] kallsyms_lookup+0x20/0x120
[ 3866.840828] PGD 202063 PUD 117f067 PMD 1374067 PTE 80000000be8b0160
[ 3866.840835] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 3866.840842] last sysfs file: /sys/kernel/slab/kmalloc-64/alloc_calls
[ 3866.840846] CPU 1
[ 3866.840849] Modules linked in: hci_uart serport ipv6 af_packet
binfmt_misc sco bridge stp llc rfcomm bnep l2cap bluetooth
acpi_cpufreq cpufreq_performance cpufreq_powersave iptable_filter
ip_tables sbp2 parport_pc lp parport snd_hda_intel pcmcia snd_pcm_oss
snd_mixer_oss snd_pcm snd_page_alloc snd_hwdep snd_seq_dummy
snd_seq_oss snd_seq_midi tifm_7xx1 iwl3945 evdev snd_rawmidi tifm_core
psmouse rfkill rng_core yenta_socket rsrc_nonstatic iTCO_wdt
iTCO_vendor_support mac80211 snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd battery ac shpchp button soundcore loop sd_mod
sr_mod cdrom sg ohci1394 ieee1394 ehci_hcd uhci_hcd sky2 usbcore
thermal processor fan unix
[ 3866.840943] Pid: 7717, comm: tail Not tainted 2.6.28-rc6-git1 #6
[ 3866.840946] RIP: 0010:[<ffffffff80262f07>]  [<ffffffff80262f07>]
kallsyms_lookup+0x20/0x120
[ 3866.840953] RSP: 0018:ffff880073d63da8  EFLAGS: 00010292
[ 3866.840956] RAX: 0000000000000f9a RBX: ffffffff80695c45 RCX: ffff880073d63e00
[ 3866.840959] RDX: ffff880073d63df8 RSI: ffff880073d63df0 RDI: ffffffff80695c45
[ 3866.840962] RBP: ffff880073d63dd8 R08: ffff8800be8aff9a R09: 0000000000000000
[ 3866.840965] R10: ffff88013e8aff91 R11: 000000000000000a R12: ffff8800be8aff9a
[ 3866.840968] R13: ffff880073d63df8 R14: ffff880073d63df0 R15: ffff880073d63e00
[ 3866.840972] FS:  00007f707c7996e0(0000) GS:ffff8800bf002898(0000)
knlGS:0000000000000000
[ 3866.840975] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3866.840978] CR2: ffff8800be8b0019 CR3: 0000000090844000 CR4: 00000000000006e0
[ 3866.840982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: ...
From: Rafael J. Wysocki
Date: Wednesday, November 26, 2008 - 4:26 pm

Did that happen after a resume from suspend to RAM, by chance?

Rafael
--

From: Pekka Enberg
Date: Thursday, November 27, 2008 - 6:37 am

Hi,

(I'm jumping in as Andrew forwarded the bug to us thinking it's SLUB related.)


Could be. I looked at the oops and I'm pretty sure SLUB is not at
fault here. Decoding the oopsing code:

  [ 3866.841062] Code: df e8 39 d2 ff ff 5b 41 5c c9 c3 55 48 89 e5 41 57 49 89
  cf 41 56 49 89 f6 41 55 49 89 d5 41 54 4d 89 c4 53 48 89 fb 48 83 ec
08 <41> c6
  40 7f 00 41 c6 00 00 48 81 ff 00 90 20 80 72 09 48 81 ff

results in:

  0000000000000000 <.text>:
     0:   41 c6 40 7f 00          movb   $0x0,0x7f(%r8)   <<<<----
     5:   41 c6 00 00             movb   $0x0,(%r8)
     9:   48 81 ff 00 90 20 80    cmp    $0xffffffff80209000,%rdi
    10:   72 09                   jb     0x1b
    12:   48                      rex.W
    13:   81                      .byte 0x81
    14:   ff                      .byte 0xff

which looks like this:

  000000000000023e <kallsyms_lookup>:
   */
  const char *kallsyms_lookup(unsigned long addr,
                              unsigned long *symbolsize,
                              unsigned long *offset,
                              char **modname, char *namebuf)
  {
   23e:   41 56                   push   %r14
   240:   49 89 f6                mov    %rsi,%r14
   243:   41 55                   push   %r13
   245:   49 89 d5                mov    %rdx,%r13
   248:   41 54                   push   %r12
   24a:   49 89 cc                mov    %rcx,%r12
   24d:   55                      push   %rbp
   24e:   48 89 fd                mov    %rdi,%rbp
   251:   53                      push   %rbx
          namebuf[KSYM_NAME_LEN - 1] = 0;
   252:   41 c6 40 7f 00          movb   $0x0,0x7f(%r8) <<<<----
   */

That is, we're oopsing because someone is passing a bogus 'namebuf' to
kallsyms_lookup(). This is further confirmed by looking at the value of R8:

  [ 3866.840962] RBP: ffff880073d63dd8 R08: ffff8800be8aff9a R09:
0000000000000000

and adding 0x7f to it:

  0xffff8800be8aff9a + 0x7f = 0xffff8800be8b0019

which equals to the ...
From: Vegard Nossum
Date: Thursday, November 27, 2008 - 6:50 am

Looks good, but I think you're looking at the wrong version of
sprint_symbol(). Try:

commit 966c8c12dc9e77f931e2281ba25d2f0244b06949
Author: Hugh Dickins <hugh@veritas.com>
Date:   Wed Nov 19 15:36:36 2008 -0800

    sprint_symbol(): use less stack

...

@@ -304,17 +304,24 @@ int sprint_symbol(char *buffer, unsigned long address)
        char *modname;
        const char *name;
        unsigned long offset, size;
-       char namebuf[KSYM_NAME_LEN];
+       int len;

-       name = kallsyms_lookup(address, &size, &offset, &modname, namebuf);
+       name = kallsyms_lookup(address, &size, &offset, &modname, buffer);


...so it might just be the caller's fault (depending on whether this
patch was correct or not).


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Hugh Dickins
Date: Thursday, November 27, 2008 - 6:54 am

I think you're looking at a 2.6.28-rc5 sprint_symbol() there:
the world has moved on since those days.  I changed it to use the
supplied "buffer" instead of local "namebuf" in 2.6.28-rc6, so we
have to wonder if my patch is to blame - though I don't see it.

Sorry, I'm eating lunch then about to go out for a couple of hours:
can't look into it now, but maybe this info will help you to make
better sense of what's going on.

Hugh
--

From: Pekka Enberg
Date: Thursday, November 27, 2008 - 7:13 am

Hi Hugh,

(I'm adding Vegard to the cc as he pointed out the same thing.)


Oh, right. I think I see where this is going. The buffer is coming
from sysfs and is PAGE_SIZE long. In SLUB, we do check for overflows
but list_locations() allows us to get as close as 100 bytes bytes from
the edge of the page. Unfortunately, kallsyms_lookup() does:

  const char *kallsyms_lookup(unsigned long addr,
                              unsigned long *symbolsize,
                              unsigned long *offset,
                              char **modname, char *namebuf)
  {
          namebuf[KSYM_NAME_LEN - 1] = 0;
          namebuf[0] = 0;

where KSYM_NAME_LEN is 128 which will tip us over the page edge
potentially triggering an oops. The proper fix is probably to pass a
length of namebuf to kallsyms_lookup() but it's probably best to
revert the patch this late in the release cycle?

                        Pekka
--

From: Hugh Dickins
Date: Thursday, November 27, 2008 - 10:34 am

Thanks for working that out.

Several possibilities, I don't know which is the right one for now,
hope Andrew can judge.

I don't see passing the length of namebuf to kallsyms_lookup()
as called for: it could have been designed that way from the start,
but it wasn't, and I don't see this bug as any reason to change that.

Reverting my patch for now: that's certainly a reasonable possibility,
but leaves us with several other such bugs.  Suggested patch below,
but the ftrace part of it worries me a little, since it's within a
structure and maybe it's a bad idea to enlarge that at this point;
I've also not _really_ done the arithmetic needed for the slub one.

An alternative quick just-for-now fix might be to remove that
	namebuf[KSYM_NAME_LEN - 1] = 0;
from kallsyms_lookup(): as I understand it (please check), that
could only make sense in cases where the symbol is KSYM_NAME_LEN
long or longer - in which case, all of the places fixed in the
patch below would be causing corruption already, even without my
patch.  I think.  Maybe that "= 0" even serves no purpose at all?

Hugh

[PATCH] KSYM_SYMBOL_LEN fixes

Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked to
my 966c8c12dc9e77f931e2281ba25d2f0244b06949 sprint_symbol(): use less stack
exposing a bug in slub's list_locations() - kallsyms_lookup() writes a 0
to namebuf[KSYM_NAME_LEN-1], but that was beyond the end of page provided.

The 100 slop which list_locations() allows at end of page looks roughly
enough for all the other stuff it might print after the symbol before
it checks again: break out KSYM_SYMBOL_LEN earlier than before.

Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they need
KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer where
it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies them.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 fs/proc/base.c         |    2 +-
 include/linux/ftrace.h |    2 +-
 kernel/latencytop.c    |    2 ...
From: Pekka Enberg
Date: Thursday, November 27, 2008 - 2:23 pm

Hi Hugh,


For SLUB, I think you can just replace the 100 with KSYM_NAME_LEN and it 
will work fine. The 100 constant is already supposed to be big enough to 
fit the symbol and the rest.


I like this approach the best but I worry it's a can of worms especially 
this late in the cycle.

As a stop-gap measure, I'm fine with your proposed patch either as-is or 
with the suggestions I had for the SLUB case.

		Pekka
--

From: Hugh Dickins
Date: Thursday, November 27, 2008 - 2:56 pm

That would fix the bug that my sprint_symbol patch has exposed
(or introduced, according to your perspective!); but that 100 can only
be big enough to fit symbol and all the rest if you assume that maximum
symbol length is ... I'm not sure exactly ... somewhere around 20 bytes?

If we assume that, then why does KSYM_NAME_LEN allow 128?  I'll agree
that 128 seems rather on the high side (I'd abhor symbols of anywhere
near that length in any code I came near), but that's what's allowed,

I'd be _much_ happier with my proposed patch if we get an Ack from
Steven on the ftrace.h part of it.

Hugh
--

From: Christoph Lameter
Date: Thursday, November 27, 2008 - 3:51 pm

Your fix looks fine. The maximum symbol length needs to be supported by
list_locations() althought a line with a symbol like that would look a bit
weird.


--

From: Pekka Enberg
Date: Friday, November 28, 2008 - 12:57 am

Yup,

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>

--

From: Steven Rostedt
Date: Thursday, November 27, 2008 - 4:27 pm

Acked-by: Steven Rostedt <srostedt@redhat.com>

Even on Turkey day ;-)

-- Steve

--

From: Steven Rostedt
Date: Thursday, November 27, 2008 - 4:31 pm

[ Added Frederic ]


Acked-by: Steven Rostedt <srostedt@redhat.com>

The boot tracer was written by Frederic.

-- Steve

--

From: Frederic Weisbecker
Date: Thursday, November 27, 2008 - 5:16 pm

No problem for me.
The ring buffer seems large enough to contain this entry size for each initcall :-)

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

--

From: Andrew Morton
Date: Friday, November 28, 2008 - 11:40 pm

This makes rather a mess when applied ahead of the (massive) ftrace
changes in linux-next.

afacit these two func[] arrays:

--- /dev/null
+++ b/include/trace/boot.h
@@ -0,0 +1,56 @@
+#ifndef _LINUX_TRACE_BOOT_H
+#define _LINUX_TRACE_BOOT_H
+
+/*
+ * Structure which defines the trace of an initcall
+ * while it is called.
+ * You don't have to fill the func field since it is
+ * only used internally by the tracer.
+ */
+struct boot_trace_call {
+	pid_t			caller;
+	char			func[KSYM_NAME_LEN];
+};
+
+/*
+ * Structure which defines the trace of an initcall
+ * while it returns.
+ */
+struct boot_trace_ret {
+	char			func[KSYM_NAME_LEN];
+	int				result;
+	unsigned long long	duration;		/* nsecs */
+};


will need to be changed to KSYM_SYMBOL_LEN.  I think.  Please fix it up
carefully when this all lands on your doorstep.

--

From: Pekka Enberg
Date: Friday, November 28, 2008 - 12:56 am

I looked into this and unfortunately that won't help us. For example,
module_address_lookup() will copy KSYM_NAME_LEN - 1 bytes to namebuf
anyway so we'll overflow the buffer.

It would probably be a good idea to add a comment to kallsyms_lookup()
stating that namebuf needs to have at least KSYM_NAME_LEN bytes
available.

		Pekka

--

From: Andrew Morton
Date: Monday, December 1, 2008 - 1:51 pm

On Thu, 27 Nov 2008 17:34:27 +0000 (GMT)

The ftrace.h change causes this:

In file included from arch/x86/kernel/machine_kexec_64.c:14:
include/linux/ftrace.h:234: error: 'MODULE_NAME_LEN' undeclared here (not in a function)

The fault really lies with include/linux/kallsyms.h, I think:

#define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s]") + (KSYM_NAME_LEN - 1) + \
                         2*(BITS_PER_LONG*3/10) + (MODULE_NAME_LEN - 1) + 1)

but it doesn't include module.h, so it requires that users of this
header perform the include.

But I'm a bit reluctant to include module.h from kallsyms.h because
kallsyms.h is a simple low-level thing:

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/stddef.h>

and it wouldn't surprise me if module.h was including kallsyms.h by
some means.

So for now I'll try including module.h from ftrace.c.  It would be nice
to fix the kallsyms.h dependency..

--

From: Hugh Dickins
Date: Monday, December 1, 2008 - 3:47 pm

You ended up choosing ftrace.h, which seems more to the point.

Thank you for doing the work I should never have given you.

Hugh
--

From: Rusty Russell
Date: Monday, December 1, 2008 - 5:57 pm

Hmm... and I recently define MODULE_NAME_LEN in terms of MAX_PARAM_PREFIX_LEN 
for similar reasons.

Perhaps we should move MODULE_NAME_LEN to kernel.h and define everything in 
terms of that?  It seems to be a common need.

Rusty.
--

From: Miles Lane
Date: Thursday, November 27, 2008 - 9:14 am

I don't recall doing a suspend/resume.  I have tried to reproduce this
BUG, but haven't managed it thus far.  I will keep trying.  I can't spend
too much time on it, as I have a work project to complete.

Thanks,
          Miles

--

Previous thread: [PATCH] x86: fixup config space size of CPU functions for AMD family 11h by Andreas Herrmann on Tuesday, November 25, 2008 - 9:18 am. (2 messages)

Next thread: [PATCH] LEDs: allow led-drivers to use a wider than 0...255 range of brightness values by Guennadi Liakhovetski on Tuesday, November 25, 2008 - 9:57 am. (5 messages)