Re: disk statistics issue in 2.6.27

Previous thread: Kernel 2.6.26.5 -> 2.6.27.2 [USB REGRESSION] (USB -> D_STATE) by Justin Piszcz on Sunday, October 19, 2008 - 4:26 am. (24 messages)

Next thread: [PATCH] Export tiny shmem_file_setup for DRM-GEM by Hugh Dickins on Sunday, October 19, 2008 - 4:52 am. (6 messages)
From: Miquel van Smoorenburg
Date: Sunday, October 19, 2008 - 4:31 am

I just upgraded one of our servers in the nntp cluster to 2.6.27.1 -
most of the others are running 2.6.26.something.

I noticed that the "iostat -k -x 2" output does't make any sense.
The number of reads/sec and number of writes/sec are about what I
would expect, and so are the other fields, but rkB/sec and wkB/sec
are  completely off-scale: gigabytes read/written per second.

To be sure it wasn't a bug in iostat I wrote a small perl script
to process /sys/block/sda/stats, and it shows the same problem.

Note that the stats in /proc/diskstats and /sys/block/<dev>/stats
are the same.

I've tried both the cfq and deadline I/O scheduler - no difference.

$ perl mystat.pl 
Device:   r/s   w/s     rkB/s     wkB/s
sda       141    53   2301120   1795200

Device:   r/s   w/s     rkB/s     wkB/s
sda       145     7   2366400   3394560

I compiled a 2.6.26.6 kernel with the exact same .config and it
doesn't show the problem.

I've been staring at include/linux/genhd.h and block/genhd.c
for a while but I just don't see it.

The mystat.pl perl script and my .config are below.
The machine is a dual xeon 2.2 Ghz, 32 bit, 4 GB mem, /dev/sda
is a 3-scsi-disk RAID5 array on an adaptec 2005S controller.

Any idea what could be causing this ?

Mike.

# ---------- mystat.pl --------------

#! /usr/bin/perl

my $dev = 'sda';

sub getstat {
	my $fn = "/sys/block/$_[0]/stat";
	open(FD, $fn) or die "$fn: $!\n";
	$_ = <FD>;
	close FD;
	split;
}

my @last = getstat($dev);
while (1) {
	sleep(2);
	my @s = getstat($dev);
	print "Device:   r/s   w/s     rkB/s     wkB/s\n";
	printf "%-8.8s %4d  %4d  %8d  %8d\n\n", $dev,
		$s[0] - $last[0], $s[5] - $last[5],
		($s[2] - $last[2]) / 2, ($s[6] - $last[6]) / 2;
	@last = @s;
}

# ---------- .config --------------

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.27.1
# Sat Oct 18 19:48:07 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not ...
From: Jens Axboe
Date: Monday, October 20, 2008 - 10:12 am

Weird, I cannot reproduce this at all, iostat works fine for me in .26,
.27 and current -git as well. So it's just a plain SCSI drive from
linux, no software raid or dm?

Are the reported values in iostat any sort of multiple of the real
throughtput, or is is just insanely large?

-- 
Jens Axboe

--

From: Miquel van Smoorenburg
Date: Wednesday, October 22, 2008 - 7:37 am

It looks like it's a multiple, but it appears to vary between 128 and
512, and is different for reads and writes, so I do not know what to
make of it.

I tried to reproduce it on different systems - on another box with a
2010S controller I'm seeing the same thing. But on a different box with
SATA drives I cannot reproduce the problem.

Thanks for looking at this. I guess I'll have to start putting debug
statements in block/gendisk.c to find out what's wrong. I might not have
time for it until after next week though, but I won't give up :)

Mike.

--

From: Jens Axboe
Date: Wednesday, October 22, 2008 - 10:17 am

It's all really weird, especially if you see it on drivers both using
the SCSI layer. And it's a multiple of the transfer size, then it can't
be something like requeues skewing the count. But do look into it, I'll
do some testing on various drivers tomorrow as well.

-- 
Jens Axboe

--

From: Miquel van Smoorenburg
Date: Thursday, October 23, 2008 - 9:05 am

I added some debug statements to block/blk-core.c, and it appears that
blk_end_io() is always called with nr_bytes == 16320 Kbytes (16711680).

Ofcourse I should have noticed earlier that iostat -x 2 always prints
32640 as "average request size" (in sectors).

Mike.

--

From: Jens Axboe
Date: Friday, October 31, 2008 - 5:47 am

You didn't happen to log a backtrace from that, did you? That's a weird
nr_bytes, it's 0xff0000.

I'm assuming this then fixes it, just wondering what the heck is going
on here.

diff --git a/block/blk-core.c b/block/blk-core.c
index 33c8d4b..a372618 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1712,11 +1710,12 @@ static int __end_that_request_first(struct request *req, int error,
 	if (blk_fs_request(req) && req->rq_disk) {
 		const int rw = rq_data_dir(req);
 		struct hd_struct *part;
-		int cpu;
+		int cpu, sectors;
 
+		sectors = min_t(int, nr_bytes >> 9, req->hard_nr_sectors);
 		cpu = part_stat_lock();
 		part = disk_map_sector_rcu(req->rq_disk, req->sector);
-		part_stat_add(cpu, part, sectors[rw], nr_bytes >> 9);
+		part_stat_add(cpu, part, sectors[rw], sectors);
 		part_stat_unlock();
 	}
 

-- 
Jens Axboe

--

From: Miquel van Smoorenburg
Date: Monday, November 3, 2008 - 2:39 pm

I didn't try the above patch yet (it might work), but I did find the
cause. I bisected the problem and it was caused by commit
427e59f09fdba387547106de7bab980b7fff77be , something in the scsi layer.
More on linux-scsi (I'll Cc you)

Thanks

Mike.

--

Previous thread: Kernel 2.6.26.5 -> 2.6.27.2 [USB REGRESSION] (USB -> D_STATE) by Justin Piszcz on Sunday, October 19, 2008 - 4:26 am. (24 messages)

Next thread: [PATCH] Export tiny shmem_file_setup for DRM-GEM by Hugh Dickins on Sunday, October 19, 2008 - 4:52 am. (6 messages)