Please CC me on replies, as I am not subscribed.

Hi,

for a while now I have been having problems writing large files sequentially to EXT2 filesystems on CCISS-based boxes. The problem is that writing multiple files in parallel is extremely slow compared to writing a single file in non-DIO mode. When using DIO, the scaling is almost "perfect". The problem manifests itself in RHEL4 kernels (2.6.9-X) and any mainline kernel up to 2.6.24-rc8.

The systems in question are HP/DL380G4 with 2 CPUs, 8 GB memory, SmartArray6i (CCISS) with BBWC and 4x72GB@10krpm disks in a RAID5 configuration. The environment is 64-bit RHEL4.3.

The problem can be reproduced by running 1, 2 or 3 parallel "dd" processes, or "iozone" with 1, 2 or 3 threads.

Curiously, there was a period from 2.6.24-rc1 until 2.6.24-rc5 where the problem went away. It turned out that this was due to a "regression" that was "fixed" by the commit below. Unfortunately this is not good for my systems, but it might shed some light on the underlying problem:

[Quoted commit: a change to buffered_rmqueue() in linux-2.6.24-rc6/mm/page_alloc.c; the commit message and diff did not survive intact.]

Reverting this patch from 2.6.24-rc8 gives the good performance reported below (rc8*). So, apparently CCISS is very sensitive to the page ordering.

Here are the numbers (MB/sec), including sync time. I compare 2.6.24-rc8 (rc8) and 2.6.24-rc8 with the above commit reverted (rc8*). Reported is the combined throughput for 1, 2 and 3 iozone threads; for reference, the DIO numbers are included as well. Raw numbers are attached.

Test            rc8     rc8*
----------------------------------------
1x3GB           56      90
1x3GB-DIO       86      86
2x1.5GB         9.5     87
2x1.5GB-DIO     80      85
3x1GB           16.5    85
3x1GB-DIO       85      85

One can see that in mainline/rc8 all non-DIO numbers are smaller than the corresponding DIO numbers and than the non-DIO numbers from rc8*. The performance for 2 and 3 threads in mainline/rc8 is just bad.
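For readers without iozone at hand, the buffered (non-DIO) parallel-write measurement described above can be approximated with a small script. This is only a rough sketch and not the tool used for the numbers in this mail: the function names, the 1 MiB chunk size and the use of threads instead of separate "dd" processes are all assumptions made for illustration.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # assumed size of each write() call: 1 MiB


def write_file(path, total_bytes):
    """Write total_bytes sequentially through the page cache, then fsync."""
    buf = b"\0" * CHUNK
    written = 0
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            n = min(CHUNK, total_bytes - written)
            written += f.write(buf[:n])
        os.fsync(f.fileno())  # count the sync time as part of the run
    return written


def combined_throughput(directory, writers, bytes_per_writer):
    """Run `writers` parallel writers and return the combined rate in MB/sec."""
    paths = [os.path.join(directory, f"testfile.{i}") for i in range(writers)]
    sizes = [bytes_per_writer] * writers
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=writers) as pool:
        totals = list(pool.map(write_file, paths, sizes))
    elapsed = max(time.monotonic() - start, 1e-9)  # guard against a zero interval
    for p in paths:
        os.remove(p)  # clean up the test files
    return sum(totals) / (1024 * 1024) / elapsed
```

With a hypothetical mount point /mnt/test, calling combined_throughput("/mnt/test", 2, 1536 * 1024 * 1024) would correspond roughly to the 2x1.5GB case. The sketch covers only the buffered path; a DIO comparison would need O_DIRECT with aligned buffers, which is left out here.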
Of course I have the option to revert commit ....54b6d for my systems, but I think a more general solution would be better. If I can help track the real problem down, I am open to suggestions.

Cheers
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
