Hi,
Comparison testing with tiobench between kernels 2.6 and 2.4 shows results for 2.6 that are 2 to 3 times worse than for 2.4.
I tried changing schedulers (anticipatory to deadline) and tuning the *_expire parameters for
the device, but I can't reach the 2.4 results :(
Has anybody had positive results from tuning the 2.6 kernel and got a real improvement with 2.6 over 2.4?
Can anybody confirm that performance in 2.6 has degraded since 2.4?
If that is wrong, can somebody point me to how to tune the 2.6 kernel properly?
Thanks in any case,
Sincerely, Alex
speed
Alex > Has anybody had positive results from tuning the 2.6 kernel and got a real improvement with 2.6 over 2.4?
Hi Alex, I agree with you that the 2.6 series, with or without any latency patches and tweaks, is much slower than the 2.4 series. I use 2.4.${latest} with Con Kolivas's (spelling?) -lck series patches, which works the best.
I have a K6 CPU with only 32MB RAM, so take it for what it's worth.
I am not blaming any Linux developers here, though. I just want to make that very clear.
BTW, try the tiny kernel patch (check Google) and see if it helps.
The link --> http://selenic.com/tiny/
Try this, Alex:
http://selenic.com/tiny/
http://selenic.com/tiny/2.6.8-rc2-tiny1.patch.bz2
more data needed
To get useful answers, you will have to share some data with us: which filesystem, what hardware (CPU, amount of RAM, hard disk, controller and interface parameters), and the exact kernel revisions. Is the filesystem fresh and empty, or are you simply seeing fragmentation effects here?
Are you, for example, running into http://kerneltrap.org/node/view/3039/8677 (I don't know if this applies to tiobench)? There is work going on here (a quick fix in 2.6.7 IIRC, better fixes waiting), so once again "2.6" is not specific enough for an answer.
more data
Hi all,
Thanks for your answers. Last evening I ran new tests. I ran them with two filesystems: ext2 and ext3.
The partition was reformatted each time before testing.
My test platform hardware:
127MB LOWMEM available.
Detected 805.648 MHz processor.
Using tsc for high-res timesource
Memory: 125512k/131008k available (2332k kernel code, 4940k reserved, 977k data, 172k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 1585.15 BogoMIPS
CPU: AMD Athlon(tm) processor stepping 02
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
Disk i/o subsystem:
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci0000:00:07.1
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:pio
hdd: FUJITSU MPB3043ATU, ATA DISK drive
hdd: max request size: 128KiB
hdd: 8448300 sectors (4325 MB), CHS=8940/15/63, UDMA(33)
/dev/ide/host0/bus1/target1/lun0: p1
Benchmarking was done with tiobench. Command line:
tiobench.pl --size 1024 --block 4096 --dir /mnt/storage/
where:
File size : 1024 MB
Block size: 4096 bytes
Directory : mount point of /dev/hdd1
2.6.8.1 Kernel
Unit information
================
File size = megabytes
Blk Size = bytes
Rate = megabytes per second
CPU% = percentage of CPU used during the test
Latency = milliseconds
Lat% = percent of requests that took longer than X seconds
CPU Eff = Rate divided by CPU% - throughput per cpu load
Sequential Reads
             File   Blk Num                     Avg     Maximum     Lat%     Lat%   CPU
Identifier   Size  Size Thr   Rate (CPU%)   Latency     Latency      >2s     >10s   Eff
---------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.6.8.1      1024  4096   1   9.45 4.607%     0.413       46.22  0.00000  0.00000   205
2.6.8.1      1024  4096   2   8.66 4.259%     0.884      301.52  0.00000  0.00000   203
2.6.8.1      1024  4096   4   8.42 4.073%     1.843      567.67  0.00000  0.00000   207
2.6.8.1      1024  4096   8   8.37 4.093%     3.677     1090.28  0.00000  0.00000   205
Random Reads
             File   Blk Num                     Avg     Maximum     Lat%     Lat%   CPU
Identifier   Size  Size Thr   Rate (CPU%)   Latency     Latency      >2s     >10s   Eff
---------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.6.8.1      1024  4096   1   0.32 0.442%    12.368       33.70  0.00000  0.00000    71
2.6.8.1      1024  4096   2   0.33 0.274%    23.241      330.81  0.00000  0.00000   120
2.6.8.1      1024  4096   4   0.33 0.273%    46.039      339.86  0.00000  0.00000   119
2.6.8.1      1024  4096   8   0.33 0.286%    87.117      650.52  0.00000  0.00000   114
Sequential Writes
             File   Blk Num                     Avg     Maximum     Lat%     Lat%   CPU
Identifier   Size  Size Thr   Rate (CPU%)   Latency     Latency      >2s     >10s   Eff
---------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.6.8.1      1024  4096   1   9.63 6.257%     0.384      626.52  0.00000  0.00000   154
2.6.8.1      1024  4096   2   9.55 6.266%     0.611    47134.93  0.00038  0.00038   152
2.6.8.1      1024  4096   4   9.16 6.072%     1.233    74109.65  0.00839  0.00076   151
2.6.8.1      1024  4096   8   8.82 5.705%     2.847    95546.61  0.05798  0.00076   155
Random Writes
             File   Blk Num                     Avg     Maximum     Lat%     Lat%   CPU
Identifier   Size  Size Thr   Rate (CPU%)   Latency     Latency      >2s     >10s   Eff
---------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.6.8.1      1024  4096   1   0.43 0.382%     0.692       38.00  0.00000  0.00000   112
2.6.8.1      1024  4096   2   0.44 0.369%     5.310     1457.97  0.00000  0.00000   118
2.6.8.1      1024  4096   4   0.43 0.378%    14.866     1643.34  0.00000  0.00000   113
2.6.8.1      1024  4096   8   0.42 0.426%    21.147     1970.78  0.00000  0.00000    99
2.4.26 Kernel
Unit information
================
File size = megabytes
Blk Size = bytes
Rate = megabytes per second
CPU% = percentage of CPU used during the test
Latency = milliseconds
Lat% = percent of requests that took longer than X seconds
CPU Eff = Rate divided by CPU% - throughput per cpu load
Sequential Reads
             File   Blk Num                     Avg     Maximum     Lat%     Lat%   CPU
Identifier   Size  Size Thr   Rate (CPU%)   Latency     Latency      >2s     >10s   Eff
---------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.4.26       1024  4096   1   9.41 4.401%     0.415       38.36  0.00000  0.00000   214
2.4.26       1024  4096   2   5.63 2.771%     1.386      152.60  0.00000  0.00000   203
2.4.26       1024  4096   4   5.63 2.544%     2.723      252.17  0.00000  0.00000   221
2.4.26       1024  4096   8   5.52 2.648%     5.495      374.46  0.00000  0.00000   209
Random Reads
             File   Blk Num                     Avg     Maximum     Lat%     Lat%   CPU
Identifier   Size  Size Thr   Rate (CPU%)   Latency     Latency      >2s     >10s   Eff
---------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.4.26       1024  4096   1   0.31 0.333%    12.759       38.35  0.00000  0.00000    92
2.4.26       1024  4096   2   0.30 0.155%    25.654       55.18  0.00000  0.00000   195
2.4.26       1024  4096   4   0.31 0.261%    48.772      122.97  0.00000  0.00000   120
2.4.26       1024  4096   8   0.33 0.333%    92.522      275.71  0.00000  0.00000    98
Sequential Writes
             File   Blk Num                     Avg     Maximum     Lat%     Lat%   CPU
Identifier   Size  Size Thr   Rate (CPU%)   Latency     Latency      >2s     >10s   Eff
---------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.4.26       1024  4096   1   9.55 4.926%     0.386     4909.10  0.00763  0.00000   194
2.4.26       1024  4096   2   8.97 4.623%     0.795     6157.89  0.00992  0.00000   194
2.4.26       1024  4096   4   8.45 4.439%     1.702     6238.79  0.02289  0.00000   190
2.4.26       1024  4096   8   7.68 4.130%     3.714     6695.59  0.04158  0.00000   186
Random Writes
             File   Blk Num                     Avg     Maximum     Lat%     Lat%   CPU
Identifier   Size  Size Thr   Rate (CPU%)   Latency     Latency      >2s     >10s   Eff
---------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.4.26       1024  4096   1   0.39 0.274%     0.021        0.35  0.00000  0.00000   142
2.4.26       1024  4096   2   0.40 0.407%     0.037       16.29  0.00000  0.00000    98
2.4.26       1024  4096   4   0.41 0.312%     0.063       40.56  0.00000  0.00000   130
2.4.26       1024  4096   8   0.42 0.322%     0.075       56.65  0.00000  0.00000   130
As shown above, I did not reproduce my previous results ;). Results with the 2.4.26 kernel are almost the same as with 2.6.8.1.
New tests with SMP-enabled kernels, the sysbench utility, and the tiny patch (for the 2.6 kernel) will be presented soon.
Sincerely, Alex
PS
The difference in results between ext2 and ext3 is between 0% and 1.2%.
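The "CPU Eff" column in the tables above can be sanity-checked against its stated definition (Rate divided by CPU%). This is only a minimal sketch: the four rows are copied from the sequential-read tables, the function name `cpu_efficiency` is made up for illustration, and rounding to a whole number is an assumption about how tiobench prints the column.

```python
# Sketch: recompute tiobench's "CPU Eff" as Rate / CPU%.
# cpu_efficiency is a hypothetical helper, not part of tiobench.

def cpu_efficiency(rate_mb_s: float, cpu_percent: float) -> float:
    """Throughput per unit of CPU load: rate divided by CPU usage as a fraction."""
    return rate_mb_s / (cpu_percent / 100.0)

# (kernel, threads, rate in MB/s, CPU %, CPU Eff as reported in the tables)
rows = [
    ("2.6.8.1", 1, 9.45, 4.607, 205),
    ("2.6.8.1", 2, 8.66, 4.259, 203),
    ("2.4.26", 1, 9.41, 4.401, 214),
    ("2.4.26", 2, 5.63, 2.771, 203),
]

for kernel, threads, rate, cpu, reported in rows:
    computed = cpu_efficiency(rate, cpu)
    print(f"{kernel} thr={threads}: computed {computed:.1f}, reported {reported}")
```

For these four rows the recomputed value rounds to the reported one, so both kernels spend about the same CPU per megabyte on sequential reads.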
Sequential read
The most significant difference in your test is that 2.6.8.1 is much better than 2.4.26 in sequential reading with multiple threads.
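To put numbers on that, here is a small sketch that divides the 2.6.8.1 sequential read rates by the 2.4.26 ones for each thread count. The rates are copied from the tables in the previous post; the variable names are only illustrative.

```python
# Sequential read rates in MB/s, keyed by number of threads,
# copied from the tiobench tables posted above.
seq_read_26 = {1: 9.45, 2: 8.66, 4: 8.42, 8: 8.37}  # kernel 2.6.8.1
seq_read_24 = {1: 9.41, 2: 5.63, 4: 5.63, 8: 5.52}  # kernel 2.4.26

# Ratio > 1 means 2.6.8.1 read faster than 2.4.26 at that thread count.
ratios = {thr: seq_read_26[thr] / seq_read_24[thr] for thr in seq_read_26}

for thr, ratio in sorted(ratios.items()):
    print(f"{thr} thread(s): 2.6.8.1 / 2.4.26 = {ratio:.2f}")
```

With these figures the two kernels are about equal with one thread, and 2.6.8.1 is roughly 1.5 times faster with two or more threads.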
If 2.6 was so bad at first, then the question is whether 2.6 became so much better or 2.4 somehow became much slower.
Which IO scheduler did you run the test with? (AS or CFQ?)
It would be great if you ran exactly the same benchmark(s), adjusting just one variable each time, e.g. SMP on, then a separate run with tiny, etc., to make it clear where the improvement/degradation comes from.
I guess everyone would love it if you would benchmark other filesystems too, like XFS, Reiser4, Reiser3, JFS, ext3 with different mount options, etc. ;-)
Scheduler for 2.6 kernel
I use the default scheduler, AS.
About filesystems: it would possibly be better to exclude filesystem processing from the benchmark and run the benchmark utility on the device directly. What do you think?
That way we would exclude many sources of deviation. :)
exclude filesystem processing?
I don't know what you mean by "exclude filesystem processing", and I also don't know what tiobench does. It doesn't make one big file and do random and sequential reads/writes on just that, does it? That would be rather uninformative.
what I really test.
I am not interested in which filesystem is better.
I am interested in the performance of the disk I/O subsystem, not in the performance of the supported filesystems. So in my test I try to bypass the filesystem layer.
I really started this topic because:
1. I got a performance degradation under some circumstances (in comparison with the 2.4.26 kernel).
2. I can't influence I/O performance by changing the /sys/block/?da/queue/iosched/* parameters. I only ever got worse results. :(
3. I don't understand why I get 9.5 MB/s throughput on an IDE disk when it works in UDMA(33) mode (instead of 33 MB/s).
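When reporting tuning attempts like the one in point 2, it helps to post the tunable values that were in effect for each run. A minimal sketch that dumps whatever files exist under the iosched directory named above; the function name and the `sysfs_root` parameter are hypothetical, `hdd` is taken from the dmesg output earlier in the thread, and which files appear depends on the scheduler and kernel in use.

```python
from pathlib import Path

def read_iosched_tunables(device: str, sysfs_root: str = "/sys") -> dict:
    """Return {tunable name: current value} for <sysfs_root>/block/<device>/queue/iosched/.

    Returns an empty dict if the directory does not exist, for example when
    the active scheduler exposes no tunables or the device name is wrong.
    """
    iosched_dir = Path(sysfs_root) / "block" / device / "queue" / "iosched"
    if not iosched_dir.is_dir():
        return {}
    tunables = {}
    for entry in sorted(iosched_dir.iterdir()):
        if not entry.is_file():
            continue
        try:
            tunables[entry.name] = entry.read_text().strip()
        except OSError:
            # Skip entries that cannot be read (e.g. write-only files).
            continue
    return tunables

if __name__ == "__main__":
    # "hdd" is the disk from the dmesg output; adjust for other machines.
    for name, value in read_iosched_tunables("hdd").items():
        print(f"{name} = {value}")
```

Saving this output next to each tiobench result makes it possible to tell which parameter change produced which number.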
what do you expect?
What do you expect from your poor MPB3043ATU 4 GB disk? This hard disk model appears in postings from 1998, so it is at least 6 years old.
Even benchmark data is hard to find, but I did at least find http://www.vpk.psc.ru/testhdd.htm, where the "Av.L.Sp. (Average Linear Read Speed)" is listed as "8,50", totally in line with the other models.
So actually your disk is too _fast_ :)
UDMA33 is only the name of the _interface_ between disk and controller, so if you are lucky, data coming out of the disk cache streams at this speed (but this never happens, because Linux caches the data on its own, and the OS caches are much bigger...), and only if it is the only device on the cable.
It would be absurd to build disks that are faster than their interface, but being slower is certainly OK and useful (the interface then ensures that the data is transported at full platter speed under any circumstances).
As you may know, linear read speed is a somewhat theoretical number, because seeks are glacially slow (mechanical parts have to move). Using the 9.3 ms from the table, in the worst case (a seek for every 512-byte sector) this is about 55 kByte/s. If you read through the page cache (in 4k chunks) you get a whopping 440 kByte/s in this case...
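The worst-case figures above follow from dividing the request size by the seek time. A minimal sketch of that arithmetic, assuming every request costs exactly one 9.3 ms seek and ignoring transfer time and rotational latency; the function name is made up for illustration.

```python
def seek_bound_throughput(bytes_per_seek: int, seek_time_ms: float) -> float:
    """Bytes per second when every request of bytes_per_seek bytes costs one full seek."""
    return bytes_per_seek / (seek_time_ms / 1000.0)

SEEK_MS = 9.3  # seek time quoted in the post above

for chunk in (512, 4096):  # one sector vs. one 4k page-cache chunk
    kb_per_s = seek_bound_throughput(chunk, SEEK_MS) / 1000.0
    print(f"{chunk:4d}-byte requests: about {kb_per_s:.0f} kByte/s")
```

Both results are tiny compared with the roughly 9 MB/s linear rate, which is why the random read and write rows in the tiobench tables sit well below 1 MB/s.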
I found performance improvement switching to 2.6
It was an old box, P2-233 MHz, with an ancient 500 MB drive. The boot-up time was pretty long, but the anticipatory scheduler really makes booting faster. Must be its ability to support lots of concurrent reading better, even if that comes at a cost otherwise (less raw performance, because the disk heads wait after seeks when the hand-tuned heuristics say it might pay off).
The deadline scheduler should provide 2.4-like performance, particularly if you increase filesystem readahead (was it blockdev --setra 511 that they recommended?).
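For scale: `blockdev --setra` takes its argument in 512-byte sectors, so the value mentioned above can be converted to a size as in this small sketch. The helper name is hypothetical, and the 128 KiB comparison value is the "max request size" from the dmesg output posted earlier in the thread.

```python
SECTOR_BYTES = 512  # blockdev --setra counts readahead in 512-byte sectors

def readahead_kib(sectors: int) -> float:
    """Convert a readahead setting in 512-byte sectors to KiB."""
    return sectors * SECTOR_BYTES / 1024

setra = 511  # value recalled in the post above
print(f"--setra {setra} is {readahead_kib(setra)} KiB of readahead")
print(f"that is {readahead_kib(setra) / 128:.1f} times the 128 KiB max request size")
```

So a setting of 511 asks for just under 256 KiB of readahead per device.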
I'm now using CFQ because it allows the system to respond faster to things like "I open a new terminal" when I am doing heavy disk activity. Anticipatory really pushes the waiting period beyond tenable in that kind of situation. However, CFQ does not seem to have the favourable characteristics of AS when it comes to booting; AS still rocks there. It's now somewhat slower to boot the system...