Re: [PATCH] tracing/function-return-tracer: don't trace kfree while it frees the return stack

Previous thread: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related by Fabio Comolli on Sunday, November 23, 2008 - 9:17 am. (20 messages)

Next thread: [RFC -tip] x86: introduce ENTRY(KPROBE)_X86 assembly helpers to catch unbalanced declaration by Cyrill Gorcunov on Sunday, November 23, 2008 - 9:57 am. (20 messages)
From: Frederic Weisbecker
Date: Sunday, November 23, 2008 - 9:33 am

Impact: fix a crash

While I killed the cat process, I got sometimes the following (but rare)
crash:

[   65.689027] Pid: 2969, comm: cat Not tainted (2.6.28-rc6-tip #83) AMILO Li 2727
[   65.689027] EIP: 0060:[<00000000>] EFLAGS: 00010082 CPU: 1
[   65.689027] EIP is at 0x0
[   65.689027] EAX: 00000000 EBX: f66cd780 ECX: c019a64a EDX: f66cd780
[   65.689027] ESI: 00000286 EDI: f66cd780 EBP: f630be2c ESP: f630be24
[   65.689027]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   65.689027] Process cat (pid: 2969, ti=f630a000 task=f66cd780 task.ti=f630a000)
[   65.689027] Stack:
[   65.689027]  00000012 f630bd54 f630be7c c012c853 00000000 c0133cc9 f66cda54 f630be5c
[   65.689027]  f630be68 f66cda54 f66cd88c f66cd878 f7070000 00000001 f630be90 c0135dbc
[   65.689027]  f614a614 f630be68 f630be68 f65ba200 00000002 f630bf10 f630be90 c012cad6
[   65.689027] Call Trace:
[   65.689027]  [<c012c853>] ? do_exit+0x603/0x850
[   65.689027]  [<c0133cc9>] ? next_signal+0x9/0x40
[   65.689027]  [<c0135dbc>] ? dequeue_signal+0x8c/0x180
[   65.689027]  [<c012cad6>] ? do_group_exit+0x36/0x90
[   65.689027]  [<c013709c>] ? get_signal_to_deliver+0x20c/0x390
[   65.689027]  [<c0102b69>] ? do_notify_resume+0x99/0x8b0
[   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
[   65.689027]  [<c014db9b>] ? trace_hardirqs_on+0xb/0x10
[   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
[   65.689027]  [<c02e39b0>] ? n_tty_write+0x0/0x340
[   65.689027]  [<c02e1812>] ? redirected_tty_write+0x82/0x90
[   65.689027]  [<c019ee99>] ? vfs_write+0x99/0xd0
[   65.689027]  [<c02e1790>] ? redirected_tty_write+0x0/0x90
[   65.689027]  [<c019f342>] ? sys_write+0x42/0x70
[   65.689027]  [<c01035ca>] ? work_notifysig+0x13/0x19
[   65.689027] Code:  Bad EIP value.
[   65.689027] EIP: [<00000000>] 0x0 SS:ESP 0068:f630be24

This is because on do_exit(), kfree is called to free the return addresses stack
but kfree is traced and stored its return address in this stack.
This patch fixes it.

Signed-off-by: ...
From: Ingo Molnar
Date: Sunday, November 23, 2008 - 9:40 am

heh, nice one :)

note that we also need to keep gcc from reordering things here (no 
matter how unlikely in this particular case).

(also, small detail: we prefer a newline after variable definitions.)

Full commit attached below.

	Ingo

-------------->
From eae849ca034c7f1015f0a6f17421ebc737f0a069 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Sun, 23 Nov 2008 17:33:12 +0100
Subject: [PATCH] tracing/function-return-tracer: don't trace kfree while it frees the return stack

Impact: fix a crash

While I killed the cat process, I got sometimes the following (but rare)
crash:

[   65.689027] Pid: 2969, comm: cat Not tainted (2.6.28-rc6-tip #83) AMILO Li 2727
[   65.689027] EIP: 0060:[<00000000>] EFLAGS: 00010082 CPU: 1
[   65.689027] EIP is at 0x0
[   65.689027] EAX: 00000000 EBX: f66cd780 ECX: c019a64a EDX: f66cd780
[   65.689027] ESI: 00000286 EDI: f66cd780 EBP: f630be2c ESP: f630be24
[   65.689027]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   65.689027] Process cat (pid: 2969, ti=f630a000 task=f66cd780 task.ti=f630a000)
[   65.689027] Stack:
[   65.689027]  00000012 f630bd54 f630be7c c012c853 00000000 c0133cc9 f66cda54 f630be5c
[   65.689027]  f630be68 f66cda54 f66cd88c f66cd878 f7070000 00000001 f630be90 c0135dbc
[   65.689027]  f614a614 f630be68 f630be68 f65ba200 00000002 f630bf10 f630be90 c012cad6
[   65.689027] Call Trace:
[   65.689027]  [<c012c853>] ? do_exit+0x603/0x850
[   65.689027]  [<c0133cc9>] ? next_signal+0x9/0x40
[   65.689027]  [<c0135dbc>] ? dequeue_signal+0x8c/0x180
[   65.689027]  [<c012cad6>] ? do_group_exit+0x36/0x90
[   65.689027]  [<c013709c>] ? get_signal_to_deliver+0x20c/0x390
[   65.689027]  [<c0102b69>] ? do_notify_resume+0x99/0x8b0
[   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
[   65.689027]  [<c014db9b>] ? trace_hardirqs_on+0xb/0x10
[   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
[   65.689027]  [<c02e39b0>] ? n_tty_write+0x0/0x340
[   65.689027]  [<c02e1812>] ? ...
From: Ingo Molnar
Date: Sunday, November 23, 2008 - 9:44 am

hm, this reminds me - this is the wrong place for the callback - the 
call stack is freed too early.

instead of freeing it in do_exit() (like it's done right now), it 
should be freed when the task struct and thread info is freed: in 
kernel/fork.c:free_task() - okay?

The difference is minor but could allow more complete tracing: as 
right now we'll skip the entries that get generated when we schedule 
away from a dead task.

	Ingo
--

From: Frederic Weisbecker
Date: Sunday, November 23, 2008 - 10:43 am

[Empty message]
From: Steven Rostedt
Date: Sunday, November 23, 2008 - 12:24 pm

I first thought that too, but thinking about it, if gcc does do that, then
it will break the logic for a correct C program.

t is passed in as a pointer, then it modifies the contents of t (which 
could be a global pointer), then it calls a external function, that might 
also reference the global pointer.

This means that if it were to reorder the two, it would break C, because 
the compiler can not assume that the called function will read the global 
pointer either.

In other words, the compiler should not need to worry about SMP or 
modifications done by interrupts or other threads. But the compiler should 
always preserve the order that is assumed by a single context.

-- Steve
--

From: Ingo Molnar
Date: Sunday, November 23, 2008 - 12:35 pm

Correct, but this assumes that kfree is a C function. Which it might 
not necessarily be: it could be optimized via an inline in certain 
cases, etc. It's best to document such cases explicitly.

In any case, the real solution is what i suggested in the previous 
mail, to do the freeing from the task-struct freeing path in 
kernel/fork.c:free_task() - that has other advantages as well.

	Ingo
--

From: Steven Rostedt
Date: Sunday, November 23, 2008 - 12:48 pm

Yeah, I thought about kfree being optimized out somehow, but thinking 
about what kfree does, it seems difficult to imagine how that could 

Yeah, but sometimes it's good to talk about quirks of a compiler, even on 
obsoleted situations ;-)

-- Steve

--

Previous thread: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related by Fabio Comolli on Sunday, November 23, 2008 - 9:17 am. (20 messages)

Next thread: [RFC -tip] x86: introduce ENTRY(KPROBE)_X86 assembly helpers to catch unbalanced declaration by Cyrill Gorcunov on Sunday, November 23, 2008 - 9:57 am. (20 messages)