When i first setup blktrace on a s390 z/VM guest to trace to another
system and then put some load on the disk traced, the system oopses in
subbuf_splice_actor. The setup is as simple as
# blktrace -h tracehost -d /dev/sda
# dd if=/dev/sda of=/dev/null
This is the stack trace from the current 2.6.25-rc5, i added
noinline to subbuf_splice_actor, otherwise it will be inlined:
Unable to handle kernel pointer dereference at virtual kernel address 0000000000000000
Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: binfmt_misc vmur
CPU: 1 Not tainted 2.6.25-rc5 #10
Process blktrace (pid: 2655, task: 000000002bc38238, ksp: 000000002b0d79a8)
Krnl PSW : 0704100180000000 00000000000874e2 (subbuf_splice_actor+0x212/0x364)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
Krnl GPRS: 0a00000000000001 000000002b2bb000 0000000000001000 00000000000000c8
0000000000001000 0000000000001000 0000000000000000 0000000000000200
0000000000019000 0000000000000019 0000000000066fd8 000000002b0d79e8
000003e040ed7938 0000000000000000 000000000008749e 000000002b0d79e8
Krnl Code: 00000000000874d4: e31050b00004 lg %r1,176(%r5)
00000000000874da: 1854 lr %r5,%r4
00000000000874dc: e3cc10000004 lg %r12,0(%r12,%r1)
>00000000000874e2: e3c320000024 stg >%r12,0(%r3,%r2)
00000000000874e8: e330b2700014 lgf %r3,624(%r11)
00000000000874ee: eb330004000d sllg %r3,%r3,4
00000000000874f4: e320b2680004 lg %r2,616(%r11)
00000000000874fa: 1814 lr %r1,%r4
Call Trace:
([<000000000008749e>] subbuf_splice_actor+0x1ce/0x364)
[<00000000000876a2>] relay_file_splice_read+0x6e/0xfc
[<00000000000e4f90>] do_splice_to+0x9c/0xb4
[<00000000000e545c>] splice_direct_to_actor+0xd8/0x21c
[<00000000000e55ec>] do_splice_direct+0x4c/0x70
[<00000000000bc2be>] do_sendfile+0x1b6/0x228 ...That is indeed a bug, does this work for you?
diff --git a/kernel/relay.c b/kernel/relay.c
index d080b9d..39d1fa8 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -1066,7 +1066,7 @@ static int subbuf_splice_actor(struct file *in,
unsigned int flags,
int *nonpad_ret)
{
- unsigned int pidx, poff, total_len, subbuf_pages, ret;
+ unsigned int pidx, poff, total_len, subbuf_pages, nr_pages, ret;
struct rchan_buf *rbuf = in->private_data;
unsigned int subbuf_size = rbuf->chan->subbuf_size;
uint64_t pos = (uint64_t) *ppos;
@@ -1098,7 +1098,9 @@ static int subbuf_splice_actor(struct file *in,
pidx = (read_start / PAGE_SIZE) % subbuf_pages;
poff = read_start & ~PAGE_MASK;
- for (total_len = 0; spd.nr_pages < subbuf_pages; spd.nr_pages++) {
+ nr_pages = min_t(unsigned int, subbuf_pages, PIPE_BUFFERS);
+
+ for (total_len = 0; spd.nr_pages < nr_pages; spd.nr_pages++) {
unsigned int this_len, this_end, private;
unsigned int cur_pos = read_start + total_len;
--
Jens Axboe
--
With the patch, i can run dd and 'blktrace -h traceserver' without the oops. But the output from blktrace only contains only zeros and no usable data for blkparse. Using blktrace to write the data directly to disk, without using the blktrace server works. Is there anything i should look for to help debugging the problem? Christof --
We should probably get Tom in the loop, as he is the relay expert. I'll make sure the above patch gets into 2.6.25, as it is definitely a bug that needs fixing. -- Jens Axboe --
http://relayfs.sourceforge.net/contact.html mentions Tom Zanussi, but his email address seems to be no longer valid. I copy Dave Wilder here, since he is mentioned as relay maintainer on the web page. Dave, can you have a look at this? I can easily reproduce the problem on s390 Linux for testing and getting more debug information. Christof --
I will take a look at the problem. Any thing special about the test you are running I need to know? --
The test is about using the blktrace network feature on a 64 bit s390 guest on z/VM. I don't know if 64 bit and z/VM are related to the problem scenario, it is the only setup i tested. To reproduce, start somewhere 'blktrace -l' as a server and issue blktrace -h tracehostname dd if=/dev/sda of=/dev/null to start blktrace and some I/O load. Christof --
If the problem is in the splice part of relay, then the problem will likely go away if you add the -s parameter to blktrace. Might be handy to know if you are experimenting. -- Jens Axboe --
I still have your patch applied. blktrace with -s returns the trace data, as expected and blktrace without the -s returns data, but nothing useful for blkparse. Christof --
Two issues i found: First, in subbuf_splice_actor the function pointer spd_release is not initialized, but later, splice_to_pipe calls while (page_nr < spd_pages) spd->spd_release(spd, page_nr++); I would guess that the function pointer should point to spd_release_page or something similar. The second one: It seems that the loop in splice_direct_to_actor first calls do_splice_to to get the trace data into the pipe and then actor, to move the data out to the network socket. With some debug printks, i see that splice_to_pipe waits on the call pipe_wait(pipe) until there is some free space again. But if the same process should loop through the read and write parts, the pipe_wait would deadlock the process, or am i missing something here? Christof --
Hi,
Yeah, I no longer work for IBM - I need to update the relayfs site to
Not sure anyone's still looking into this, but it doesn't look like an
s390 problem specifically. Apparently what happened was that some
internal changes to splice seem to have broken the relay splice_read
implementation. The above patch fixes part of the problem; the patch
below should fix the rest, and AFAICT it doesn't break anything else.
Signed-off-by: Tom Zanussi <zanussi@comcast.net>
diff --git a/fs/splice.c b/fs/splice.c
index a861bb3..068b210 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1094,7 +1094,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
if (ret > 0)
- *ppos += ret;
+ *ppos = sd.pos;
return ret;
}
diff --git a/kernel/relay.c b/kernel/relay.c
index d6204a4..dc873fb 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -1162,7 +1162,7 @@ static ssize_t relay_file_splice_read(struct file *in,
ret = 0;
spliced = 0;
- while (len) {
+ while (len && !spliced) {
ret = subbuf_splice_actor(in, ppos, pipe, len, flags, &nonpad_ret);
if (ret < 0)
break;
--
Sorry for the late response and yes, our tests on s390 are still tracking the issue. I just ran a short test on 2.6.25 with the above patch applied. With this patch, blktrace client/server mode works again with the sendfile() call and returns valid data for blkparse. Thanks everybody for helping with this problem. Jens, will you pick up this patch for inclusion? Christof --
Certainly, thanks to Tom for providing this and you for testing! -- Jens Axboe --
Tom, did you verify that sd.pos is always updated? -- Jens Axboe --
Yeah, everything else that can be a sendfile source ends up using generic_file_splice_read(), which updates sd.pos as expected. Tom --
Goodie, thanks for confirming! -- Jens Axboe --
