Re: [PROBLEM] hard-lock with kmemtrace, relayfs, and splice

Previous thread: [PATCH] ext4: add an option to control error handling on file data by Hidehiro Kawai on Friday, October 10, 2008 - 5:04 am. (1 message)

Next thread: [Patch] kernel/kallsyms.c: fix double return's by Américo on Friday, October 10, 2008 - 5:43 am. (2 messages)
To: Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <zanussi@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Friday, October 10, 2008 - 5:42 am

(I'm cc'ing Tom, Jens, and LKML.)

To recap, with a CONFIG_KMEMTRACE enabled kernel from the
"topic/kmemtrace" branch of:

git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6.git topic/kmemtrace

running the "kmemtraced" program from

git://git.kernel.org/pub/scm/linux/kernel/git/penberg/kmemtrace-user.git

results to a hard lock on my machine. I am unable to find anything
obviously wrong with it and as I can read/write the relay files just
fine, I'm beginning to think it's problem in relayfs splice
implementation.

Tom, thoughts?

Pekka

--

To: Pekka Enberg <penberg@...>, Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <zanussi@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Friday, October 24, 2008 - 12:44 am

after doing a

git clone git://repo.or.cz/linux-2.6/kmemtrace.git

I reboot the OS and encountered several application crashes.....the
logs are as per attached....please comment.

for bug0 and bug1, the system go into a state of complete
non-responsive, even connecting via SSH into the system becomes
impossible, and bug0 and bug1 was generated just before it goes into
this state.

for bug2, the mouse seemed to respond, i managed to output the dmesg
(which is bug2 itself).

any comments?

--
Regards,
Peter Teoh

To: Peter Teoh <htmldeveloper@...>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@...>, <zanussi@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Friday, October 24, 2008 - 10:15 am

Hmm, you're hitting out-of-memory. I can't immediately see how kmemtrace
is at fault here. Eduard, thoughts?

--

To: Pekka Enberg <penberg@...>, Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <zanussi@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Friday, October 24, 2008 - 8:56 pm

b2suICAganVzdCBpZ25vcmUgaXQuICAgaSB0aGluayBiYXNlZCBvbiAiZ2l0IGxvZyIgb2YKImdp
dDovL3JlcG8ub3IuY3ovbGludXgtMi42L2ttZW10cmFjZS5naXQiOgoKY29tbWl0IDA0NDFlNWZm
NmFiNzFjZjdhM2U5ZWMzMTE2ZjE0ZDBmZDdkMjBkNTEKQXV0aG9yOiBFZHVhcmQgLSBHYWJyaWVs
IE11bnRlYW51IDxlZHVhcmQubXVudGVhbnVAbGludXgzNjAucm8+CkRhdGU6ICAgVGh1IEp1bCAx
MCAyMDoyMDowNSAyMDA4ICswMzAwCgogICAga21lbXRyYWNlOiBTTE9CIGhvb2tzLgoKdGhlIHJl
cG8gc2VlbWVkIHdheSBvdXQgb2YgZGF0ZS4gICBCdXQgYSBjbG9uZSBvZiBQZWtrYSdzIHNsYWIt
Mi42J3MKdG9waWMva21lbXRyYWNlIGJyYW5jaCBoYXZlIG5vIHByb2JsZW0uICAgVGhlIHN0ZXBz
IGFyZSBhcyBmb2xsb3dzOgoKMS4gICBnaXQgY2xvbmUgZ2l0Oi8vZ2l0Lmtlcm5lbC5vcmcvcHVi
L3NjbS9saW51eC9rZXJuZWwvZ2l0L3BlbmJlcmcvc2xhYi0yLjYuZ2l0CjIuICAgZ2l0IGNoZWNr
b3V0IC1tICBvcmlnaW4vdG9waWMva21lbXRyYWNlCjMuICAgZ2l0IGNoZWNrb3V0IC1iIGttZW10
cmFjZQoKZ2l0IGxvZyBnaXZlczoKCmNvbW1pdCA1MWIxOWJlMzUzNWM4ZmJjY2U2YjZmODM4ZDg5
YjlhNmE0Y2M1YjkyCkF1dGhvcjogVG9tIFphbnVzc2kgPHphbnVzc2lAY29tY2FzdC5uZXQ+CkRh
dGU6ICAgRnJpIE9jdCAxMCAyMzo1ODo1MSAyMDA4IC0wNTAwCgogICAgcmVsYXlmczogZml4IGlu
ZmluaXRlIGxvb3Agd2l0aCBzcGxpY2UoKQoKQnV0IHRoZSB1c2Vyc3BhY2UgdG9vbCBJIHN0aWxs
IGdldCBpdCBmcm9tOiBnaXQ6Ly9yZXBvLm9yLmN6L2ttZW10cmFjZS11c2VyLmdpdAoKQ29ycmVj
dD8KCk5vdyBJIGhhdmUgc29tZSBwcm9ibGVtczoKCmEuICAgSSB3b3VsZCBsaWtlIHRvIGV4dHJh
Y3Qgb3V0IGFsbCB0aGUgY29tbWl0IGFzIGRpZmYgLSBtYXkgSSBrbm93CmhvdyB0byBkbyB0aGF0
PyAoImdpdCBsb2ciIG9ubHkgZ2l2ZXMgdGhlIGRlc2NyaXB0aXZlIHBhcnQpLgpiLiAgIGJvb3Qt
dGltZSBtZW1vcnkgcHJvZmlsaW5nLi4uLmhvdyBjYW4gaXQgYmUgZG9uZSAob3IgZXh0cmFjdGVk
Cm91dCk/ICAoa21lbXRyYWNlLXVzZXIgZG9lcyBub3QgaGF2ZSB0aGF0PykKYy4gICBwbGVhc2Ug
cHJvdmlkZSBzb21lIHBvaW50ZXJzIHRvIGRvY3VtZW50YXRpb246ICAgaG93IGRvIEkKaW50ZXJw
cmV0IHRoZSBmb2xsb3dpbmc6CgpBbGxvY2F0aW9uICM4MzQ2OCAoQ1BVMCkgYWxyZWFkeSBleGlz
dHMhCiAgICAgICAgYnkgX19rbWFsbG9jX3RyYWNrX2NhbGxlcisyNQogICAgICAgIGxhc3QgdG91
Y2hlZCBieSBfX2ttYWxsb2NfdHJhY2tfY2FsbGVyCkFsbG9jYXRpb24gIzgzNzQwIChDUFUwKSBh
bHJlYWR5IGV4aXN0cyEKICAgICAgICBieSBfX2ttYWxsb2NfdHJhY2tfY2FsbGVyKzI1CiAgICAg
ICAgbGFzdCB0b3VjaGVkIGJ5IF9fa21hbGxvY190cmFja19jYWxsZXIKCkZvciBteSBwdXJwb3Nl
LCBJIHdvdWxkIGxpa2Ug...

To: Peter Teoh <htmldeveloper@...>
Cc: Pekka Enberg <penberg@...>, <zanussi@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Saturday, October 25, 2008 - 10:04 am

Hi,

Yes, that repo may be outdated, please use Pekka's for the latest

If you enable kmemtrace at boot-time using the "kmemtrace.enabled=yes"
parameter, kmemtrace will start logging just after kmem_cache_init()
finishes. There are plans for making tracing possible even earlier,
however it involves quite a big rework of how relay allocates its

The userspace app writes out those messages when it looks like a memory
region has been allocated twice. In reality, it's most likely a free

I have made provisions for exporting the whole stack trace, however it

Much of the booting sequence is done on a single CPU, then the kernel

Thanks for testing.

Cheers,
Eduard

--

To: Pekka Enberg <penberg@...>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Saturday, October 11, 2008 - 12:58 am

It looks like you hit the same problem as described here:

commit 8191ecd1d14c6914c660dfa007154860a7908857

splice: fix infinite loop in generic_file_splice_read()

relay uses the same loop but it never got noticed or fixed. Can you try
the following patch:

diff --git a/kernel/relay.c b/kernel/relay.c
index 8d13a78..6a4d439 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -1318,12 +1318,9 @@ static ssize_t relay_file_splice_read(struct file *in,
if (ret < 0)
break;
else if (!ret) {
- if (spliced)
- break;
- if (flags & SPLICE_F_NONBLOCK) {
+ if (flags & SPLICE_F_NONBLOCK)
ret = -EAGAIN;
- break;
- }
+ break;
}

*ppos += ret;

It worked for me, but I also had to apply the following patch to
kmemtraced:

diff --git a/kmemtraced.c b/kmemtraced.c
index 217478d..324ced9 100644
--- a/kmemtraced.c
+++ b/kmemtraced.c
@@ -109,6 +109,8 @@ static void *reader_thread(void *data)
if (retval < 0)
panic("splice() (from) failed: %s\n",
strerror(errno));
+ if (!retval)
+ continue;
retval = splice(pipe_fd[0], NULL, log_fd, NULL,
128, SPLICE_F_MOVE);
if (retval < 0)

Otherwise it would end up hanging kmemtraced in the second splice (pipe
to log_fd) if the return from the first splice was 0 (i.e. there's no
data available (and we can never know if there will ever be any
more)).

I'm not sure why kmemtraced is only splicing 128 bytes at a time - it
seems to defeat the purpose - or why it wouldn't be using poll to know
when there's at least a whole sub-buffer to splice, but to each his own.
Hopefully the kernel patch at least fixes the loop.

Tom

--

To: Tom Zanussi <zanussi@...>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Monday, October 13, 2008 - 2:57 am

Hi Tom,

Indeed. That fixes the deadlock.

However, now I don't get anything to the cpu*.out files if I run
kmemtraced with kmemtrace disabled. If I enable kmemtrace manually and
then run kmemtraced, I do receive some data. I did apply the
kmemtrace-user patch as well.

Hmm?

Pekka

--

To: Pekka Enberg <penberg@...>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Tuesday, October 14, 2008 - 12:03 am

Hi Pekka,

To me, that sounds like how it should work - if kmemtrace is disabled,
it shouldn't be logging anything, and that's in fact what I saw when
debugging this - it started out disabled and therefore nothing being
logged to relay (printks confirmed that). When I wrote 1 to the enabled
file, data started getting logged to relay and to the *.out files.

So I don't know why the enabled state behaves the way it does, or if
it's unexpected, but that anyway doesn't seem like a relay problem to
me.

--

To: Tom Zanussi <zanussi@...>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Tuesday, October 14, 2008 - 1:13 am

Hi Tom,

Oh, right. Looking at kmemtraced.c, we never enable kmemtrace, just
disable it (which doesn't make much sense). Bug in README or the code.
Eduard?

Btw, Tom, you can add my

Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>

to your patch if you want. Are you going to send it to Andrew or do you
want me to pick it up with rest of the kmemtrace patches?

Pekka
--

To: Tom Zanussi <zanussi@...>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Tuesday, October 14, 2008 - 3:05 am

From: Pekka Enberg <penberg@cs.helsinki.fi>
Date: Tue, 14 Oct 2008 10:03:51 +0300
Subject: [PATCH] kmemtrace: enable tracing in kmemtraced

If kmemtrace is not enabled at boot, we need to enable it before we start
relaying data to userspace.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
---
kmemtraced.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kmemtraced.c b/kmemtraced.c
index 217478d..5cc1ada 100644
--- a/kmemtraced.c
+++ b/kmemtraced.c
@@ -165,8 +165,6 @@ int main(int argc, char *argv[])
if (!readers)
panic("Out of memory!\n");

- write_str("/sys/kernel/debug/kmemtrace/enabled", "0");
-
printf("Copying /proc/kallsyms...\n");
copy_kallsyms();

@@ -180,6 +178,8 @@ int main(int argc, char *argv[])
strerror(errno));
}

+ write_str("/sys/kernel/debug/kmemtrace/enabled", "1");
+
printf("Logging... Press Control-C to stop.\n");

while (sigwait(&signals, &signal) == 0) {
--
1.5.4.3

--

To: Pekka Enberg <penberg@...>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Tuesday, October 14, 2008 - 1:46 am

Hi Pekka,

If you pick it up that would be fine with me. Here's my sob:

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>

Thanks,

--

To: Tom Zanussi <zanussi@...>
Cc: Pekka Enberg <penberg@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Tuesday, October 14, 2008 - 3:30 am

Great, this makes it go swifter. I'll commit the change to
kmemtrace-user.

--

To: Tom Zanussi <zanussi@...>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Tuesday, October 14, 2008 - 2:58 am

To: Tom Zanussi <zanussi@...>
Cc: Pekka Enberg <penberg@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Saturday, October 11, 2008 - 2:17 pm

Cheers,
Eduard

--

To: Pekka Enberg <penberg@...>
Cc: <zanussi@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Friday, October 10, 2008 - 7:51 am

Hmm, I've seen kmemtraced not stopping, but no hard locks. It seems not
even SIGKILL can stop it. As far as I know, SIGKILL should stop even a
blocking splice() (in case this is the problem). This is what I did:
# ./kmemtraced &
# killall -KILL kmemtraced
... kmemtraced still running and using CPU.

Eduard

--

Previous thread: [PATCH] ext4: add an option to control error handling on file data by Hidehiro Kawai on Friday, October 10, 2008 - 5:04 am. (1 message)

Next thread: [Patch] kernel/kallsyms.c: fix double return's by Américo on Friday, October 10, 2008 - 5:43 am. (2 messages)