Re: blktrace/relay/s390: Oops in subbuf_splice_actor

Previous thread: linux-next20080314 build fails with !CONFIG_PM by Kamalesh Babulal on Friday, March 14, 2008 - 12:57 am. (7 messages)

Next thread: [PATCH] fsldma: Fix the DMA halt when using DMA_INTERRUPT async_tx transfer. by Zhang Wei on Friday, March 14, 2008 - 1:47 am. (1 message)
From: Christof Schmitt
Date: Friday, March 14, 2008 - 1:43 am

When i first setup blktrace on a s390 z/VM guest to trace to another
system and then put some load on the disk traced, the system oopses in
subbuf_splice_actor. The setup is as simple as

# blktrace -h tracehost -d /dev/sda
# dd if=/dev/sda of=/dev/null

This is the stack trace from the current 2.6.25-rc5, i added
noinline to subbuf_splice_actor, otherwise it will be inlined:

Unable to handle kernel pointer dereference at virtual kernel address 0000000000000000 
Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC 
Modules linked in: binfmt_misc vmur 
CPU: 1 Not tainted 2.6.25-rc5 #10 
Process blktrace (pid: 2655, task: 000000002bc38238, ksp: 000000002b0d79a8) 
Krnl PSW : 0704100180000000 00000000000874e2 (subbuf_splice_actor+0x212/0x364) 
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3 
Krnl GPRS: 0a00000000000001 000000002b2bb000 0000000000001000 00000000000000c8 
           0000000000001000 0000000000001000 0000000000000000 0000000000000200 
           0000000000019000 0000000000000019 0000000000066fd8 000000002b0d79e8 
           000003e040ed7938 0000000000000000 000000000008749e 000000002b0d79e8 
Krnl Code: 00000000000874d4: e31050b00004       lg      %r1,176(%r5) 
           00000000000874da: 1854               lr      %r5,%r4 
           00000000000874dc: e3cc10000004       lg      %r12,0(%r12,%r1) 
          >00000000000874e2: e3c320000024       stg    >%r12,0(%r3,%r2) 
           00000000000874e8: e330b2700014       lgf     %r3,624(%r11) 
           00000000000874ee: eb330004000d       sllg    %r3,%r3,4 
           00000000000874f4: e320b2680004       lg      %r2,616(%r11) 
           00000000000874fa: 1814               lr      %r1,%r4 
Call Trace: 
([<000000000008749e>] subbuf_splice_actor+0x1ce/0x364) 
 [<00000000000876a2>] relay_file_splice_read+0x6e/0xfc 
 [<00000000000e4f90>] do_splice_to+0x9c/0xb4 
 [<00000000000e545c>] splice_direct_to_actor+0xd8/0x21c 
 [<00000000000e55ec>] do_splice_direct+0x4c/0x70 
 [<00000000000bc2be>] do_sendfile+0x1b6/0x228 ...
From: Jens Axboe
Date: Friday, March 14, 2008 - 4:58 am

That is indeed a bug, does this work for you?

diff --git a/kernel/relay.c b/kernel/relay.c
index d080b9d..39d1fa8 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -1066,7 +1066,7 @@ static int subbuf_splice_actor(struct file *in,
 			       unsigned int flags,
 			       int *nonpad_ret)
 {
-	unsigned int pidx, poff, total_len, subbuf_pages, ret;
+	unsigned int pidx, poff, total_len, subbuf_pages, nr_pages, ret;
 	struct rchan_buf *rbuf = in->private_data;
 	unsigned int subbuf_size = rbuf->chan->subbuf_size;
 	uint64_t pos = (uint64_t) *ppos;
@@ -1098,7 +1098,9 @@ static int subbuf_splice_actor(struct file *in,
 	pidx = (read_start / PAGE_SIZE) % subbuf_pages;
 	poff = read_start & ~PAGE_MASK;
 
-	for (total_len = 0; spd.nr_pages < subbuf_pages; spd.nr_pages++) {
+	nr_pages = min_t(unsigned int, subbuf_pages, PIPE_BUFFERS);
+
+	for (total_len = 0; spd.nr_pages < nr_pages; spd.nr_pages++) {
 		unsigned int this_len, this_end, private;
 		unsigned int cur_pos = read_start + total_len;
 

-- 
Jens Axboe

--

From: Christof Schmitt
Date: Friday, March 14, 2008 - 6:05 am

With the patch, i can run dd and 'blktrace -h traceserver' without the
oops. But the output from blktrace only contains only zeros and no
usable data for blkparse. Using blktrace to write the data directly to
disk, without using the blktrace server works. Is there anything i
should look for to help debugging the problem?

Christof
--

From: Jens Axboe
Date: Friday, March 14, 2008 - 6:10 am

We should probably get Tom in the loop, as he is the relay expert. I'll
make sure the above patch gets into 2.6.25, as it is definitely a bug
that needs fixing.

-- 
Jens Axboe

--

From: Christof Schmitt
Date: Friday, March 14, 2008 - 6:22 am

http://relayfs.sourceforge.net/contact.html mentions Tom Zanussi, but
his email address seems to be no longer valid. I copy Dave Wilder
here, since he is mentioned as relay maintainer on the web page.

Dave, can you have a look at this? I can easily reproduce the problem
on s390 Linux for testing and getting more debug information.

Christof
--

From: David Wilder
Date: Friday, March 14, 2008 - 8:21 am

I will take a look at the problem.
Any thing special about the test you are running I need to know?
--

From: Christof Schmitt
Date: Friday, March 14, 2008 - 9:28 am

The test is about using the blktrace network feature on a 64 bit s390
guest on z/VM. I don't know if 64 bit and z/VM are related to the
problem scenario, it is the only setup i tested. To reproduce, start
somewhere 'blktrace -l' as a server and issue 
blktrace -h tracehostname
dd if=/dev/sda of=/dev/null
to start blktrace and some I/O load.

Christof
--

From: Jens Axboe
Date: Monday, March 17, 2008 - 1:08 am

If the problem is in the splice part of relay, then the problem will
likely go away if you add the -s parameter to blktrace. Might be handy
to know if you are experimenting.

-- 
Jens Axboe

--

From: Christof Schmitt
Date: Monday, March 17, 2008 - 8:19 am

I still have your patch applied. blktrace with -s returns the trace
data, as expected and blktrace without the -s returns data, but
nothing useful for blkparse.

Christof
--

From: Christof Schmitt
Date: Wednesday, March 19, 2008 - 2:16 am

Two issues i found: First, in subbuf_splice_actor the function pointer
spd_release is not initialized, but later, splice_to_pipe calls
	while (page_nr < spd_pages)
		spd->spd_release(spd, page_nr++);

I would guess that the function pointer should point to
spd_release_page or something similar.

The second one: It seems that the loop in splice_direct_to_actor first
calls do_splice_to to get the trace data into the pipe and then
actor, to move the data out to the network socket. With some debug
printks, i see that splice_to_pipe waits on the call pipe_wait(pipe)
until there is some free space again. But if the same process should
loop through the read and write parts, the pipe_wait would deadlock
the process, or am i missing something here?

Christof
--

From: Tom Zanussi
Date: Monday, April 7, 2008 - 9:23 pm

Hi,

Yeah, I no longer work for IBM - I need to update the relayfs site to

Not sure anyone's still looking into this, but it doesn't look like an
s390 problem specifically.  Apparently what happened was that some
internal changes to splice seem to have broken the relay splice_read
implementation.  The above patch fixes part of the problem; the patch
below should fix the rest, and AFAICT it doesn't break anything else.

Signed-off-by: Tom Zanussi <zanussi@comcast.net>

diff --git a/fs/splice.c b/fs/splice.c
index a861bb3..068b210 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1094,7 +1094,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 
 	ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
 	if (ret > 0)
-		*ppos += ret;
+		*ppos = sd.pos;
 
 	return ret;
 }
diff --git a/kernel/relay.c b/kernel/relay.c
index d6204a4..dc873fb 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -1162,7 +1162,7 @@ static ssize_t relay_file_splice_read(struct file *in,
 	ret = 0;
 	spliced = 0;
 
-	while (len) {
+	while (len && !spliced) {
 		ret = subbuf_splice_actor(in, ppos, pipe, len, flags, &nonpad_ret);
 		if (ret < 0)
 			break;



--

From: Christof Schmitt
Date: Wednesday, April 23, 2008 - 12:06 am

Sorry for the late response and yes, our tests on s390 are still
tracking the issue. I just ran a short test on 2.6.25 with the above
patch applied. With this patch, blktrace client/server mode works
again with the sendfile() call and returns valid data for blkparse.

Thanks everybody for helping with this problem.

Jens, will you pick up this patch for inclusion?

Christof
--

From: Jens Axboe
Date: Wednesday, April 23, 2008 - 12:08 am

Certainly, thanks to Tom for providing this and you for testing!

-- 
Jens Axboe

--

From: Jens Axboe
Date: Wednesday, April 23, 2008 - 12:45 am

Tom, did you verify that sd.pos is always updated?

-- 
Jens Axboe

--

From: Tom Zanussi
Date: Wednesday, April 23, 2008 - 9:32 pm

Yeah, everything else that can be a sendfile source ends up using
generic_file_splice_read(), which updates sd.pos as expected.

Tom



--

From: Jens Axboe
Date: Thursday, April 24, 2008 - 3:49 am

Goodie, thanks for confirming!

-- 
Jens Axboe

--

Previous thread: linux-next20080314 build fails with !CONFIG_PM by Kamalesh Babulal on Friday, March 14, 2008 - 12:57 am. (7 messages)

Next thread: [PATCH] fsldma: Fix the DMA halt when using DMA_INTERRUPT async_tx transfer. by Zhang Wei on Friday, March 14, 2008 - 1:47 am. (1 message)