This patch is a logical extension of the protection provided by CONFIG_DEBUG_RODATA to LKMs. The protection is provided by splitting module_core and module_init into three logical parts each and setting appropriate page access permissions for each individual section: 1. Code: RO+X 2. RO data: RO+NX 3. RW data: RW+NX In order to achieve proper protection, layout_sections() have been modified to align each of the three parts mentioned above onto page boundary. Next, the corresponding page access permissions are set right before successful exit from load_module(). Further, free_module() and sys_init_module have been modified to set module_core and module_init as RW+NX right before calling module_free(). By default, the original section layout and access flags are preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y, the patch will page-align each group of sections to ensure that each page contains only one type of content and will enforce RO/NX for each group of pages. V1: Initial proof-of-concept patch. V2: The patch have been re-written to reduce the number of #ifdefs and to make it architecture-agnostic. Code formatting have been corrected also. V3: Opportunistic RO/NX protectiuon is now unconditional. Section page-alignment is enabled when CONFIG_DEBUG_RODATA=y. V4: Removed most macros and improved coding style. V5: Changed page-alignment and RO/NX section size calculation V6: Fixed comments. Restricted RO/NX enforcement to x86 only V7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added calls to set_all_modules_text_rw() and set_all_modules_text_ro() in ftrace V8: updated for compatibility with linux 2.6.33-rc5 V9: coding style fixes V10: more coding style fixes V11: minor adjutments for -tip V12: minor adjutments for v2.6.35-rc2-tip V13: minor adjutments for v2.6.37-rc1-tip Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Reviewed-by: James Morris ...
Commit-ID: 84e1c6bb38eb318e456558b610396d9f1afaabf0 Gitweb: http://git.kernel.org/tip/84e1c6bb38eb318e456558b610396d9f1afaabf0 Author: matthieu castet <castet.matthieu@free.fr> AuthorDate: Tue, 16 Nov 2010 22:35:16 +0100 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Thu, 18 Nov 2010 13:32:56 +0100 x86: Add RO/NX protection for loadable kernel modules This patch is a logical extension of the protection provided by CONFIG_DEBUG_RODATA to LKMs. The protection is provided by splitting module_core and module_init into three logical parts each and setting appropriate page access permissions for each individual section: 1. Code: RO+X 2. RO data: RO+NX 3. RW data: RW+NX In order to achieve proper protection, layout_sections() have been modified to align each of the three parts mentioned above onto page boundary. Next, the corresponding page access permissions are set right before successful exit from load_module(). Further, free_module() and sys_init_module have been modified to set module_core and module_init as RW+NX right before calling module_free(). By default, the original section layout and access flags are preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y, the patch will page-align each group of sections to ensure that each page contains only one type of content and will enforce RO/NX for each group of pages. -v1: Initial proof-of-concept patch. -v2: The patch have been re-written to reduce the number of #ifdefs and to make it architecture-agnostic. Code formatting has also been corrected. -v3: Opportunistic RO/NX protection is now unconditional. Section page-alignment is enabled when CONFIG_DEBUG_RODATA=y. -v4: Removed most macros and improved coding style. -v5: Changed page-alignment and RO/NX section size calculation -v6: Fixed comments. Restricted RO/NX enforcement to x86 only -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added calls to set_all_modules_text_rw() and ...
This is incompatible with CONFIG_JUMP_LABEL: [ 252.093624] BUG: unable to handle kernel paging request at ffffffffa0680764 [ 252.094008] IP: [<ffffffff81225ee0>] generic_swap+0xa/0x1a [ 252.094008] PGD 1a1e067 PUD 1a22063 PMD 1093ac067 PTE 8000000109786161 [ 252.094008] Oops: 0003 [#1] PREEMPT SMP [ 252.094008] Pid: 3740, comm: modprobe Not tainted 2.6.37-rc3-mmotm1123 #1 0X564R/Latitude E6500 [ 252.094008] RIP: 0010:[<ffffffff81225ee0>] [<ffffffff81225ee0>] generic_swap+0xa/0x1a [ 252.094008] RSP: 0018:ffff88011a217d98 EFLAGS: 00010206 [ 252.094008] RAX: 00000000000000d9 RBX: 0000000000000030 RCX: 000000000000007c [ 252.094008] RDX: 0000000000000017 RSI: ffffffffa0680794 RDI: ffffffffa0680764 [ 252.094008] RBP: ffff88011a217d98 R08: 0000000000000000 R09: ffff88011a217d38 [ 252.094008] R10: 0000000000000000 R11: 000000000000013d R12: ffffffffa0680764 [ 252.094008] R13: 0000000000000018 R14: 0000000000000018 R15: 0000000000000018 [ 252.094008] FS: 00007f6ecb897720(0000) GS:ffff8800df100000(0000) knlGS:0000000000000000 [ 252.094008] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 252.094008] CR2: ffffffffa0680764 CR3: 000000011911e000 CR4: 00000000000406e0 [ 252.094008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 252.094008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 252.094008] Process modprobe (pid: 3740, threadinfo ffff88011a216000, task ffff88011861e300) [ 252.094008] Stack: [ 252.094008] ffff88011a217e28 ffffffff81226007 ffffffff81a31158 0000000000000000 [ 252.094008] 0000000000000030 0000000000000048 ffffffff8105fa34 ffffffff81225ed6 [ 252.094008] ffffffff00000018 ffffffff00000018 0000000000000018 00000030ffffffff [ 252.094008] Call Trace: [ 252.094008] [<ffffffff81226007>] sort+0x117/0x1b0 [ 252.094008] [<ffffffff8105fa34>] ? jump_label_cmp+0x0/0x1b [ 252.094008] [<ffffffff81225ed6>] ? generic_swap+0x0/0x1a [ 252.094008] [<ffffffff8105faa6>] ...
Le Wed, 24 Nov 2010 22:41:07 -0500, could you try the attached patch ? on module load, we sort the __jump_table section. So we should make it writable. Matthieu
Confirming that fixes the issue I was seeing, thanks...
Acked-by: Kees Cook <kees.cook@canonical.com> Can this please get committed to tip? Thanks, -Kees -- Kees Cook Ubuntu Security Team --
Le Wed, 8 Dec 2010 14:19:51 -0800, I think it is not need anymore with Steven Rostedt patch [1] Matthieu --
Hi, Which patch was that? (Or, rather, what's a good url to see it?) -Kees -- Kees Cook Ubuntu Security Team --
Ah-ha! Thanks for the pointer. So, instead of the other explicit jump_table fix, can this get pushed into tip? Thanks! -Kees -- Kees Cook Ubuntu Security Team --
What's the status of this bug? If we still need the patch then please submit it standalone with a proper subject line, with acks/signoffs added, etc. Thanks, Ingo --
Steve Rostedt's patch that moves the setting of the page permissions seems to make this patch no longer necessary. I tripped over this same issue, but the version in the latest -mmotm does not need it, as it includes Steve's fix.
It would be nice to see that fix submitted so that it gets into the tree that introduced the bug. Steve, Andrew? Thanks, Ingo --
Hi Ingo, Which tree/branch should I base this on? I could do write up formal patch and push it out. -- Steve --
Please use tip:x86/security, it contains the RO/NX patches: 691513f70d39: x86: Resume trampoline must be executable 84e1c6bb38eb: x86: Add RO/NX protection for loadable kernel modules 5bd5a452662b: x86: Add NX protection for kernel data Thanks! Ingo --
Can I add your "Tested-by: ..." to the commit? Thanks, -- Steve --
Sure. I can verify that the patch series (a) doesn't break well-behaved modules and (b) it *does* trigger for misbehaving modules (I sent the VirtualBox crew a bug report for a module that tried to write to an area marked executable).
This patch breaks function tracer: ------------[ cut here ]------------ WARNING: at kernel/trace/ftrace.c:1014 ftrace_bug+0x114/0x171() Hardware name: Precision WorkStation 470 Modules linked in: i2c_core(+) Pid: 86, comm: modprobe Not tainted 2.6.37-rc2+ #68 Call Trace: [<ffffffff8104e957>] warn_slowpath_common+0x85/0x9d [<ffffffffa00026db>] ? __process_new_adapter+0x7/0x34 [i2c_core] [<ffffffffa00026db>] ? __process_new_adapter+0x7/0x34 [i2c_core] [<ffffffff8104e989>] warn_slowpath_null+0x1a/0x1c [<ffffffff810a9dfe>] ftrace_bug+0x114/0x171 [<ffffffffa00026db>] ? __process_new_adapter+0x7/0x34 [i2c_core] [<ffffffff810aa0db>] ftrace_process_locs+0x1ae/0x274 [<ffffffffa00026db>] ? __process_new_adapter+0x7/0x34 [i2c_core] [<ffffffff810aa29e>] ftrace_module_notify+0x39/0x44 [<ffffffff814405cf>] notifier_call_chain+0x37/0x63 [<ffffffff8106e054>] __blocking_notifier_call_chain+0x46/0x5b [<ffffffff8106e07d>] blocking_notifier_call_chain+0x14/0x16 [<ffffffff8107ffde>] sys_init_module+0x73/0x1f3 [<ffffffff8100acf2>] system_call_fastpath+0x16/0x1b ---[ end trace 2aff4f4ca53ec746 ]--- ftrace faulted on writing [<ffffffffa00026db>] __process_new_adapter+0x7/0x34 [i2c_core] Here we set the text read only before we call the notifiers. The function tracer changes the calls to mcount into nops via a notifier Below is the patch that fixes this bug. -- Steve Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> diff --git a/kernel/module.c b/kernel/module.c index ba421e6..5ccf52c 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -2804,18 +2804,6 @@ static struct module *load_module(void __user *umod, kfree(info.strmap); free_copy(&info); - /* Set RO and NX regions for core */ - set_section_ro_nx(mod->module_core, - mod->core_text_size, - mod->core_ro_size, - mod->core_size); - - /* Set RO and NX regions for init ...
That seems fine. I note that both before and after this patch we potentially execute code in the module (via parse_args) before we set it ro & nx. But fixing this last bit of coverage is probably not worth the pain... Cheers, Rusty. --
Rusty, can I take this as an "Acked-by"? Thanks, -- Steve --
Yep... Acked-by: Rusty Russell <rusty@rustcorp.com.au> Thanks, Rusty. --
Hi, Le Mon, 29 Nov 2010 13:15:42 -0500, That's look fine for me. But I wonder why ftrace_arch_code_modify_prepare isn't called ? It is only called when we start/stop tracing ? Matthieu --
Correct. There's no reason to use it for the changing of mcount callers to nops. For core kernel code, it happens before SMP is enabled. For modules, it happens before the module code is executed. Except for what Rusty stated with the module parameters, but if you are executing complex code with that, you deserve what you get ;-) The initial changes are made outside of stop machine. The code is not being executed by anyone else. -- Steve --
