Hi
we are experiencing massive performance problems with two of our
Linux servers that contain 3ware controllers on a Tyan mainboard and
a couple of 1 TB disks.

During the daily cron job that uses rsync to sync a 500 GB file system
from another machine to the RAID on the 3ware controller, the load
jumps up and the machine becomes sluggish as hell. For example, an
ssh login to that machine takes minutes to complete, and LDAP becomes
unreliable while the rsync job is running. Even Nagios complains that
the machine is down while rsync is running.

We tried the CentOS 2.6.18-based kernel, 2.6.23.y and linus-git from
today, but all three kernels show the same very poor performance as
soon as data is written to the disks on the 3ware controller.

In particular, commit 1e6c38c, i.e.
[SCSI] 3w-9xxx: fix abysmal write performance on some motherboards
which is contained in linus-current but not in the other two kernels
mentioned above, does not seem to make any difference.

We also tried different RAID configurations, to no avail. ATM we're
using a RAID 10 over 4 disks with write cache enabled.

Below there's some more info about the card, dmesg and lspci output
and our kernel config. A similar machine works fine with FreeBSD,
so I really think it's a problem with the Linux driver.

ATM this machine is only used as a fallback for the main server,
so we'll be able to reboot and test patches.

Thanks
Andre
--------------------------------------------------------------------
From the 3DM2 web interface:

Model: 9500S-4LP
Firmware: FE9X 2.08.00.009
Driver: 2.26.02.010
BIOS: BE9X 2.03.01.052
Memory Installed 112 MB
# of Ports 4
# of Units 1
# of Drives 4
--------------------------------------------------------------------
From dmesg (linus-git):

Driver 'sd' needs updating - please use bus_type methods
3ware 9000 Storage Controller device driver for Linux v2.26.02.010.
ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 24 (level, low) -> IRQ 24
in...
Could you give some numbers, please?
However there are some known issues:
http://forums.storagereview.net/index.php?showtopic=25923
http://tumbleweed.org.za/2007/02/16/horrific-performance-with-3ware-raid

Symptoms are reasonable performance with large block ops, but really
bad performance with small block ops. For example,

time (cp -a linux-2.6.24.2 linux-2.6.24.2b; sync)

takes 50 seconds here, even with some tuning, on a 9650SE in a 4-disk
RAID 5 setup (very bad; a single disk does it in <30s!!!).
But reading and writing large files with large block sizes usually
goes above 100 MB/s.

Best regards,
Arnd
--
Thanks, this helped a lot. However, there does not seem to be a way
to make the system more responsive, which is really the problem we
are experiencing.

Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
This is not 3ware-specific, but kernel 2.6.24 has new per-device write
throttling that might help with the responsiveness issue:

http://kernelnewbies.org/LinuxChanges#head-92340ffcec39e7c2a09fd933243fb...
http://lwn.net/Articles/245600/

Also, check whether the 3ware controller has a background initialize
or verify in progress, since that will obviously slow things down until
it is complete.

Tony
--
Yes, but we tried both 2.6.24 and 2.6.25-rc, so Peter's new
write-throttling code doesn't seem to help much in our situation. I'll
play a bit with the various /proc/sys/vm/* knobs to see if that makes
a difference.

That's certainly not the case.
Thanks
Andre
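Among the /proc/sys/vm/* knobs, the ones that govern writeback of dirty
pages (vm.dirty_ratio, vm.dirty_background_ratio and the two
*_centisecs timers) are the usual starting point. A minimal sketch for
recording their current values before experimenting; the function name
and the optional root argument are illustrative assumptions, not part of
any existing tool.

```sh
#!/bin/sh
# Hypothetical helper: print the writeback-related VM knobs.
# The optional first argument replaces /proc as the root of the tree
# (an assumption made only so the function can be pointed at a copy).

show_vm_writeback_knobs() {
    root="${1:-/proc}"
    for knob in dirty_ratio dirty_background_ratio \
                dirty_expire_centisecs dirty_writeback_centisecs; do
        f="$root/sys/vm/$knob"
        if [ -r "$f" ]; then
            printf '%s = %s\n' "vm.$knob" "$(cat "$f")"
        else
            # Knob missing on this kernel (or in the given tree).
            printf '%s = (not available)\n' "vm.$knob"
        fi
    done
}

show_vm_writeback_knobs "$@"
```

Individual values can then be changed as root with sysctl, e.g.
sysctl -w vm.dirty_background_ratio=5; that number is only an example,
not something measured or recommended in this thread.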
--
Andre,
Can you try turning down /sys/block/sdX/device/queue_depth to 16 and
see if that improves your responsiveness?

-Adam
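Adam's suggestion can be wrapped in a small script that shows the
current queue depth before changing it. A minimal sketch, assuming the
3ware unit appears as a SCSI disk such as sda; set_queue_depth and the
SYSFS_ROOT override are hypothetical names. The value is read back after
writing because the driver may not accept every requested depth.

```sh
#!/bin/sh
# Hypothetical helper: report and optionally lower the SCSI queue depth.
# SYSFS_ROOT (default /sys) is an illustrative override, not a kernel
# interface. Writing the real attribute requires root.

set_queue_depth() {
    dev="${1:-}"       # e.g. sda
    depth="${2:-}"     # e.g. 16; empty means "only report"
    root="${SYSFS_ROOT:-/sys}"
    f="$root/block/$dev/device/queue_depth"

    if [ -z "$dev" ] || [ ! -e "$f" ]; then
        echo "no queue_depth attribute for '$dev'" >&2
        return 1
    fi
    echo "$dev: current queue_depth $(cat "$f")"

    if [ -n "$depth" ]; then
        # Reject anything with a non-digit, and "0" or leading zeros.
        case "$depth" in
            *[!0-9]*|0*) echo "depth must be a positive integer" >&2; return 1 ;;
        esac
        echo "$depth" > "$f" || return 1
        # Read back: the driver decides what it actually accepts.
        echo "$dev: new queue_depth $(cat "$f")"
    fi
}

if [ $# -gt 0 ]; then
    set_queue_depth "$@"
fi
```

Run as root, e.g. sh qdepth.sh sda 16 (the script name is arbitrary).
Like any sysfs setting, it does not survive a reboot and has to be
reapplied, for example from a boot script.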
--
Yes, that setting seems to improve responsiveness greatly.
Thanks a lot
Andre
--
You're putting your box under astronomical load. This is generally
regarded as a bad idea, regardless of how well your storage controller
is performing. Can you measure the single-threaded throughput (say,
copying one huge file, and then syncing) to give us a baseline
performance figure? rsync will happily peg your box, your network, and
your cat if you let it.

-- Chris
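One way to get the baseline Chris asks for is to time a single large
sequential write that is flushed to disk before the clock stops. A
minimal sketch; write_baseline, the target path and the sizes are
hypothetical. The optional block-size argument allows the small- vs
large-block comparison Arnd describes above.

```sh
#!/bin/sh
# Hypothetical helper: single-threaded write baseline.
# conv=fsync (supported by GNU and BusyBox dd) makes dd fsync the output
# file before exiting, so the time includes the disk writes and not just
# filling the page cache. Resolution is whole seconds (date +%s).

write_baseline() {
    target="$1"              # file on the array under test
    size_mb="${2:-1024}"     # amount to write, in MiB
    bs_kb="${3:-1024}"       # block size in KiB; size should be a multiple
    count=$(( size_mb * 1024 / bs_kb ))

    start=$(date +%s)
    dd if=/dev/zero of="$target" bs=$(( bs_kb * 1024 )) count="$count" \
        conv=fsync 2>/dev/null || return 1
    end=$(date +%s)

    elapsed=$(( end - start ))
    if [ "$elapsed" -gt 0 ]; then
        echo "wrote ${size_mb} MiB in ${elapsed}s (~$(( size_mb / elapsed )) MiB/s)"
    else
        echo "wrote ${size_mb} MiB in <1s"
    fi
}

if [ $# -gt 0 ]; then
    write_baseline "$@"
fi
```

For example, sh baseline.sh /raid/tmpfile 4096 1024 versus
sh baseline.sh /raid/tmpfile 4096 4 (path and script name are
placeholders), while checking how long an ssh login takes.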
--
The machine also becomes sluggish when I write directly to the RAID
array, e.g. with a simple

dd if=/dev/zero of=tmpfile

Single-threaded throughput seems to be OK (140 MB/s). The problem is
that the machine becomes unresponsive.

Andre
--
Actually, it's normal for pdflush to spawn up to 8 threads when you're
dirtying memory faster than it can be written to disk. Load going to 4

Does the machine become unresponsive during the single-threaded test, or
only when doing the rsync?

-- Chris
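To watch memory being dirtied faster than it is written back, the Dirty
and Writeback counters in /proc/meminfo can be sampled while dd or rsync
runs. A minimal sketch; the function names and the MEMINFO override are
hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers: sample dirty/writeback page cache from meminfo.
# MEMINFO (default /proc/meminfo) is an illustrative override.

dirty_snapshot() {
    # "^Writeback:" matches only the Writeback line, not WritebackTmp.
    awk '/^Dirty:/ {d=$2} /^Writeback:/ {w=$2}
         END {printf "Dirty: %d kB  Writeback: %d kB\n", d, w}' \
        "${MEMINFO:-/proc/meminfo}"
}

# Print one sample every N seconds (default 2) until interrupted.
watch_dirty() {
    interval="${1:-2}"
    while :; do
        dirty_snapshot
        sleep "$interval"
    done
}

if [ "${1:-}" = "--watch" ]; then
    watch_dirty "${2:-2}"
fi
```

Running sh dirty.sh --watch 2 (arbitrary script name) in one terminal
during the dd test and again during rsync shows whether Dirty keeps
growing while the box stalls.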
--
It takes noticeably longer to ssh into the machine in the
single-threaded case as well (using dd to write to the device), but the
system remains usable. When the rsync job is running, it quickly becomes
unusable.

However, reducing the queue depth with

echo 16 > /sys/block/sda/device/queue_depth

as suggested by Adam, solves all problems.
Thanks all for your help.
Andre
--
Hello Andre,
do you have the write-back cache of the controller enabled for your disks?
When you disable this cache, the controller will also disable the disks'
own write caches, which results in write performance of only 3 to 8 MB/s
per disk.

Cheers,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
--
Yes, I do. Performance is poor anyway.
Andre
