I just upgraded one of our servers in the nntp cluster to 2.6.27.1 -
most of the others are running 2.6.26.something.
I noticed that the "iostat -k -x 2" output does't make any sense.
The number of reads/sec and number of writes/sec are about what I
would expect, and so are the other fields, but rkB/sec and wkB/sec
are completely off-scale: gigabytes read/written per second.
To be sure it wasn't a bug in iostat I wrote a small perl script
to process /sys/block/sda/stats, and it shows the same problem.
Note that the stats in /proc/diskstats and /sys/block/<dev>/stats
are the same.
I've tried both the cfq and deadline I/O scheduler - no difference.
$ perl mystat.pl
Device: r/s w/s rkB/s wkB/s
sda 141 53 2301120 1795200
Device: r/s w/s rkB/s wkB/s
sda 145 7 2366400 3394560
I compiled a 2.6.26.6 kernel with the exact same .config and it
doesn't show the problem.
I've been staring at include/linux/genhd.h and block/genhd.c
for a while but I just don't see it.
The mystat.pl perl script and my .config are below.
The machine is a dual xeon 2.2 Ghz, 32 bit, 4 GB mem, /dev/sda
is a 3-scsi-disk RAID5 array on an adaptec 2005S controller.
Any idea what could be causing this ?
Mike.
# ---------- mystat.pl --------------
#! /usr/bin/perl
my $dev = 'sda';
sub getstat {
my $fn = "/sys/block/$_[0]/stat";
open(FD, $fn) or die "$fn: $!\n";
$_ = <FD>;
close FD;
split;
}
my @last = getstat($dev);
while (1) {
sleep(2);
my @s = getstat($dev);
print "Device: r/s w/s rkB/s wkB/s\n";
printf "%-8.8s %4d %4d %8d %8d\n\n", $dev,
$s[0] - $last[0], $s[5] - $last[5],
($s[2] - $last[2]) / 2, ($s[6] - $last[6]) / 2;
@last = @s;
}
# ---------- .config --------------
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.27.1
# Sat Oct 18 19:48:07 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not ...Weird, I cannot reproduce this at all, iostat works fine for me in .26, .27 and current -git as well. So it's just a plain SCSI drive from linux, no software raid or dm? Are the reported values in iostat any sort of multiple of the real throughtput, or is is just insanely large? -- Jens Axboe --
It looks like it's a multiple, but it appears to vary between 128 and 512, and is different for reads and writes, so I do not know what to make of it. I tried to reproduce it on different systems - on another box with a 2010S controller I'm seeing the same thing. But on a different box with SATA drives I cannot reproduce the problem. Thanks for looking at this. I guess I'll have to start putting debug statements in block/gendisk.c to find out what's wrong. I might not have time for it until after next week though, but I won't give up :) Mike. --
It's all really weird, especially if you see it on drivers both using the SCSI layer. And it's a multiple of the transfer size, then it can't be something like requeues skewing the count. But do look into it, I'll do some testing on various drivers tomorrow as well. -- Jens Axboe --
I added some debug statements to block/blk-core.c, and it appears that blk_end_io() is always called with nr_bytes == 16320 Kbytes (16711680). Ofcourse I should have noticed earlier that iostat -x 2 always prints 32640 as "average request size" (in sectors). Mike. --
You didn't happen to log a backtrace from that, did you? That's a weird
nr_bytes, it's 0xff0000.
I'm assuming this then fixes it, just wondering what the heck is going
on here.
diff --git a/block/blk-core.c b/block/blk-core.c
index 33c8d4b..a372618 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1712,11 +1710,12 @@ static int __end_that_request_first(struct request *req, int error,
if (blk_fs_request(req) && req->rq_disk) {
const int rw = rq_data_dir(req);
struct hd_struct *part;
- int cpu;
+ int cpu, sectors;
+ sectors = min_t(int, nr_bytes >> 9, req->hard_nr_sectors);
cpu = part_stat_lock();
part = disk_map_sector_rcu(req->rq_disk, req->sector);
- part_stat_add(cpu, part, sectors[rw], nr_bytes >> 9);
+ part_stat_add(cpu, part, sectors[rw], sectors);
part_stat_unlock();
}
--
Jens Axboe
--
I didn't try the above patch yet (it might work), but I did find the cause. I bisected the problem and it was caused by commit 427e59f09fdba387547106de7bab980b7fff77be , something in the scsi layer. More on linux-scsi (I'll Cc you) Thanks Mike. --
