Re: Redzone overwritten with CONFIG_SECURITY

Previous thread: [PATCH] fix SMP ordering hole in fcntl_setlk() (CVE-2008-1669) by Miloslav Semler on Monday, May 26, 2008 - 6:39 am. (1 message)

Next thread: [GIT PULL] SLUB/SLOB updates for 2.6.26 by Pekka J Enberg on Monday, May 26, 2008 - 8:39 am. (1 message)
From: Eric Sesterhenn
Date: Monday, May 26, 2008 - 7:34 am

hi,

i enabled CONFIG_SECURITY on current git and get tons of
Redzone overwritten errors during early boot, even
with CONFIG_SECURITY_CAPABILITIES and CONFIG_SECURITY_NETWORK
disabled. After a while it ends with a kernel panic
saying: not syncing: Out of memory and no killable process...
Root partition is ext3 format.

At the moment i dont have a camera at hand, so i'll try
to write down everything which looks interesting, please tell
me if i missed something.

The first 24 bytes of the overwritten section contain
zeros. Then we have a constant 0x18, and three changing
values. The next three bytes contain exactly the same
values, first the 0x18, then the two changing ones.

The only value i found so far matching the 0x18 and
which might be related to CONFIG_SECURITY is CAP_SYS_RESOURCE
defined in /include/linux/capability.h

BUG hugetlbfs_inode_cache: Redzone overwritten

INFO: 0xccd8e250-0xccd8e253. First byte 0x0 instead of 0xbb
Info: Slab 0xc119d1c0 objects=12 used=0 fs=0xccd8e000 flags=0x400020c3
Info: Object 0xccd8e00 offset=0 fp=0xccd8e280

Object 0xccd8e00: 00 00 00 ...
Object 0xccd8e10: 00 00 00 00 00 00 00 00 00 18 e0 d8 cc 18 e0 d8 cc
Object 0xccd8e20  00 00 00 ...
...

Pid: 1, comm: swapper Not tainted 2.6.26-rc3-00436-gb373303 #42
print_trailer
check_bytes_and_report
check_object
__slab_alloc
kmem_cache_alloc
? hugetlbfs_alloc_inode
? hugetlbfs_alloc_inode
hugetlbfs_alloc_inode
alloc_inode
new_inode
hugetlbfs_get_inode
hugetlbfs_fill_super
? sget
? set_anon_super
get_sb_nodev
hugetlbfs_get_sb
? hugetlbfs_fill_super
vfs_kern_mount
kern_mount_data
init_hugetlbfs_fs
? init_once
? kernel_init
kernel_init


Config follows:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.26-rc3
# Mon May 26 15:10:47 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
# CONFIG_GENERIC_LOCKBREAK is not ...
From: Eric Sesterhenn
Date: Tuesday, May 27, 2008 - 7:00 am

hi,

i tested a kmemcheck kernel as an attempt to debug
this further... seems CONFIG_SECURITY is unrelated to
this, but slub debugging only catches the
overwrite if i enable CONFIG_SECURITY.

with slub_debug=FZPU i get the warning at
init_object+0x63:

(gdb) l *(init_object+0x63)
0xc0187243 is in init_object (mm/slub.c:544).
539	{
540		u8 *p = object;
541	
542		if (s->flags & __OBJECT_POISON) {
543			memset(p, POISON_FREE, s->objsize - 1);
544			p[s->objsize - 1] = POISON_END;
545		}
546	
547		if (s->flags & SLAB_RED_ZONE)
548			memset(p + s->objsize,

if i set slub_debug=- i get the kmemcheck warning at

(gdb) l *(__slab_alloc+0x238)
0xc0187bc8 is in __slab_alloc (mm/slub.c:303).
298		return *(void **)(object + s->offset);
299	}
300	
301	static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
302	{
303		*(void **)(object + s->offset) = fp;
304	}
305	
306	/* Loop over all objects in a slab */
307	#define for_each_object(__p, __s, __addr, __objects) \

I used the kmemcheck git tree from
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-kmemcheck-4.git

In case you need some of the other kmemcheck output please
let me know.

Greetings, Eric

--

From: Vegard Nossum
Date: Tuesday, May 27, 2008 - 7:23 am

Hello!


Oy, whow! :-)

I actually tried to reproduce your problem yesterday to see if

This is sort of expected. kmemcheck is not directly incompatible with
slub debugging, but it may produce some false positives (that we
haven't worked out yet). So I recommend that you turn slub debugging

Hm, yes. It would be nice to see the actual kmemcheck error message as
well in order to determine the cause of this.

I don't really see how that write (= fp) can cause an error, so it has
to be the s->offset dereference that is doing it. That seems extremely
unlikely and would indicate a bug in SLUB itself...

Out of curiosity, will your crash go away entirely if you compile the

It would be nice to see the whole dmesg if you can get it.

You should also make sure you have either

CONFIG_KMEMCHECK_ENABLED_BY_DEFAULT=y

set in your config or that you are booting with the kmemcheck=1
command-line option; otherwise, you'll only get the first warning
before kmemcheck auto-disables itself. Forcing it to stay on will
potentially give us more useful output.

There is actually a newer kmemcheck tree which supports
kmemcheck+SLAB, but the version you are running should be usable for
debugging your problem, so I'm not going to ask you to try that.

Thanks for trying it out, it would feel good if kmemcheck would
finally be useful for something :-) Good luck.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Eric Sesterhenn
Date: Tuesday, May 27, 2008 - 7:53 am

hi,


ah, wouldnt a config option or warning message make sense while

Ok, here we go, i tried to write it down as good as possible

BUG: unable to handle kernel paging request at cb801000
IP: [(c0187bc8)] __slab_alloc+0x238/0x5e0
*pde = 01019067 *pte = 0x801962
Thread overran stack, or stack corrupted
Oops: 0002 [#1] PREEMPT
Modules linked in:

Pid: 0, comm: swapper Not tainted (2.6.25-x86-latest,git....)
EIP: 0060:[(c0187bc8)] EFLAGS: 00010286 CPU:0
EIP is at __slab_alloc+0x238/0x5e0
EAX: c08a3d40 EBX: cf801000 ECX: 00000000 EDX: c125b024
ESI: cf801000 EDI: cf801000 EBP: c08a9f54 ESP: c08a9f24
 DS: 007b ES: 007b FS: 0000 CS: 0000 SS: 0068
Process swapper (pid: 0, ti=c08a9000 task=c08583c0, task.ti=c08a9000)
Stack: c125b024 c0441b07 00000008 00000000 ffffffff 000000d0 c08a3d40 00000010
	c125b024 00000000 c08a3d40 00000286 c08a9f78 c0189187 c0443d85 c08a3ddc
	c0443d85 000000d0 00000000 c08a9fb4 0000000a c08a9f98 c0443d85 c08a9fb8
Call Trace:
? vsnprintf+0x2d7
? __kmalloc+0xf7
? kvasprintf+0x35
? kvasprintf+0x35
? kvasprintf+0x35
? kasprintf+0x17
? kmem_cache_init+0xcd
? start_kernel+0x1c9
? unknown_bootoption+0x0
? i386_start_kernel+0x8
=====
Code: 0a 8b 59 04 0f af c3 8d 04 07 39 f8 76 36 89 fb eb 04 89 f3 89 ce 8b 55 f0 89 d9 8b 45 e8 e8 d0 ea ff ff 8b 45 e8 8b 48 0c 01 cb(89) 33 8b 5d f0 8b 50 04 0f b7 43 0a 8d 0c 32 0f af c2 8d 04 07
EIP: [(c0187bc8)] __slab_alloc+0x238/0x5e0 SS:ESP 0068:c08a9f24
--- end trace ---
Kernel panic - not syncing: attempted to kill the idle task!


At the moment i dont have a camera at hand and netconsole doesnt



yeah, its a nice project, is there a reason why it isnt in mainline yet?

Thanks for your help, Eric

--

From: Pekka Enberg
Date: Tuesday, May 27, 2008 - 7:55 am

Hi Eric,


Unfortunately kmemcheck does not catch writes to red-zone so it won't
help you debug the original issue.

                        Pekka
--

From: Pekka Enberg
Date: Tuesday, May 27, 2008 - 8:00 am

(added some cc's)


So what kernel version is this and what's the last known version that
worked? As it's an early boot crash, maybe you can try to do git bisect
--

From: Eric Sesterhenn
Date: Tuesday, May 27, 2008 - 8:11 am

this is 2.6.26-rc4, i didnt test any earlier versions so far
(ok, i did test some pre -rc4 git versions i think 4 days
ago, but they also showed the problem) this is the
first time that i enabled CONFIG_SECURITY on my testbox.

I am currently trying to reproduce this with SLAB as Vegard
suggested for the kmemcheck report. After this I'll retest this
on a fresh tree to make sure it isnt something buggy on my part
and try some older kernels.

Greetings, Eric
--

From: Eric Sesterhenn
Date: Tuesday, May 27, 2008 - 9:11 am

ok, with CONFIG_SECURITY, SLAB and CONFIG_DEBUG_SLAB on the
-rc3 based kmemcheck tree enabled i dont get any error.
I am currently building a fresh -rc4 (with SLUB) to make
sure this is for real, then i'll try that again with SLAB
and then start testing older kernels. 

Greetings, Eric

--

From: Pekka Enberg
Date: Tuesday, May 27, 2008 - 10:59 am

Then it's likely that we corrupt hugetlbfs_inode_cachep because of
SLUB merging and the real problem is somewhere else. Can you also try
passing 'slub_nomerge' as a kernel parameter with SLUB?
--

From: Christoph Lameter
Date: Tuesday, May 27, 2008 - 11:04 am

Enabling Redzone disables merging. So it's unrelated to merging.
 
--

From: Chris Wright
Date: Tuesday, May 27, 2008 - 10:47 am

ok, do_initcalls time frame with that config...CONFIG_SECURITY isn't
really doing any allocations, nor much in the way of memory writes.
It would get called into via the:

 hugetlbfs_get_inode
   new_inode
     alloc_inode

I couldn't recreate with that config.
--

From: Eric Sesterhenn
Date: Wednesday, May 28, 2008 - 3:03 am

I did a fresh git-clone and tried again without being able
to reproduce this.
I diffed all .h and .c files and except for
the autogenerated ones they are exactly the same...

Is it possible that this was caused because a file didnt
get rebuilt correctly? I can still reproduce it with the
old checkout. Sorry if this causes unnecessary noise :(

Greetings, Eric
--

From: Chris Wright
Date: Wednesday, May 28, 2008 - 2:51 pm

I had wondered that, since data structures will grow w/ CONFIG_SECURITY
set (like inode, for example).  I haven't encountered a Kbuild dependency
bug in quite a while though.

thanks,
-chris
--

From: Chris Wright
Date: Saturday, May 31, 2008 - 4:24 pm

Yeah, this thing is miscompiled (thanks for the vmlinux).

$ cd /tmp/CONFIG_SECURITY-mem-corruption
$ gdb -q vmlinux
(gdb) p sizeof(struct inode)
$1 = 596
(gdb) p sizeof(struct hugetlbfs_inode_info)
$2 = 592

struct hugetlbfs_inode_info {
	struct shared_policy policy;
	struct inode vfs_inode;
};
 
The hugetlbfs_inode_info structure isn't updated with the 4 extra bytes
added from CONFIG_SECURITY to struct inode.

If you're interested in more gory details, you can look at:
$ eu-readelf -winfo vmlinux > readelf-info.out

thanks,
-chris
--

From: Pekka Enberg
Date: Tuesday, May 27, 2008 - 11:25 am

Hi Eric,


Unfortunately this transcript is not useful for serious debugging. The
object ranges in the printout don't match ("0xccd8e250" vs "0xccd8e00") and you
didn't write down the contents of the "Redzone" range that has the
corrupting data. So a serial console output or a picture of the oops
would be much appreciated.
--
