Re: arcmsr & areca-1660 - strange behaviour under heavy load

Previous thread: ATM CONTRACT PAYMENT. by ATM CARD PAYMENT on Saturday, February 23, 2008 - 1:26 am. (1 message)

Next thread: Tiny cpusets -- cpusets for small systems? by Paul Jackson on Saturday, February 23, 2008 - 8:09 am. (8 messages)
To: <linux-kernel@...>
Date: Saturday, February 23, 2008 - 7:20 am

Hi,

I've found a strange problem either in the arcmsr driver, or maybe in the
areca-1660 card itself...
When a system running from a RAID of SAS disks connected to an areca-1660 card
gets under heavy I/O load, it becomes unusable after some time. I can reproduce
this 100% of the time, although it needs quite specific conditions:
It can be reproduced on a 2x quad-core machine; RAM has to be limited to
~192MB to cause heavy paging.
The only thing needed to trigger the problem is to start a loop doing kernel
compilation using make -j 8 - this loads the system heavily because of the
lack of memory. After a few correct compile runs the system gets into a
state where all programs, including the basic ones (ls, cp, ...), start
crashing... dmesg (when it works) doesn't show anything strange...
After a reboot, the system is OK again.
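
For anyone trying to reproduce this, the loop described above could be sketched roughly as follows. This is only an illustration, not the script used in the report: the function name, its arguments and the MAKE override are assumptions, and the machine is assumed to have been booted with a small memory limit (for example mem=192M) beforehand.

```shell
#!/bin/sh
# Sketch of the reproduction loop: rebuild a kernel tree over and over and
# stop at the first failed build. All names here are illustrative.

stress_build_loop() {
    ksrc=$1        # path to an already configured kernel source tree
    jobs=$2        # number of parallel make jobs (the report used 8)
    max_runs=$3    # give up after this many successful builds
    run=1
    while [ "$run" -le "$max_runs" ]; do
        echo "build run $run"
        # Clean first so that every iteration does the full amount of work.
        if ! ${MAKE:-make} -C "$ksrc" clean >/dev/null 2>&1 ||
           ! ${MAKE:-make} -C "$ksrc" -j "$jobs" >/dev/null 2>&1; then
            echo "build failed on run $run"
            return 1
        fi
        run=$((run + 1))
    done
    echo "all $max_runs runs completed"
    return 0
}
```

A call such as `stress_build_loop /usr/src/linux 8 50` (path and run count are placeholders) would then stop at the first run where the compiler starts crashing.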
I have tested it on different motherboards, with different CPUs and RAM (all
properly tested with memtest), with two different areca cards and
different drives. I can't reproduce the problem on the same hardware when
using a different RAID card (e.g. adaptec). All test systems were properly
cooled.
I have tried all available areca firmware versions, two different distributions
(oracle linux and centos), and kernels ranging from the distribution ones to the latest git snapshot.
Could somebody please give me some hints on how to hunt this problem?
Areca support doesn't seem to be very interested in the problem :-(
Thanks a lot in advance
BR
nik

-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

--
To: Nikola Ciprich <extmaillist@...>
Cc: <linux-kernel@...>, <linux-scsi@...>, Nick Cheng <nick.cheng@...>, Erich Chen <erich@...>
Date: Sunday, February 24, 2008 - 8:10 pm

(cc's added)

Please get the machine into this state of memory exhaustion then take
copies of the output of the following, and send them via reply-to-all to
this email:

- cat /proc/meminfo

- cat /proc/slabinfo

- dmesg -c > /dev/null ; echo m > /proc/sysrq-trigger ; dmesg -c
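
The three commands above can be wrapped in a small helper so that all of the output ends up in one directory that is easy to attach to a reply. This is a sketch only: the function name, the output file names and the PROC_ROOT override are assumptions made for illustration. It needs root to read /proc/slabinfo and to write to /proc/sysrq-trigger, and note that dmesg -c clears the kernel ring buffer.

```shell
#!/bin/sh
# Collect /proc/meminfo, /proc/slabinfo and the sysrq-m memory dump into
# one directory. PROC_ROOT exists only so the helper can be tried against
# a copy of /proc; it defaults to the real one.

collect_mem_state() {
    outdir=$1
    proc=${PROC_ROOT:-/proc}
    mkdir -p "$outdir" || return 1

    for f in meminfo slabinfo; do
        if [ -r "$proc/$f" ]; then
            cat "$proc/$f" > "$outdir/$f.txt"
        else
            echo "cannot read $proc/$f (root needed?)" > "$outdir/$f.txt"
        fi
    done

    if [ -w "$proc/sysrq-trigger" ]; then
        dmesg -c > /dev/null                 # drop old messages
        echo m > "$proc/sysrq-trigger"       # ask the kernel to dump memory info
        dmesg -c > "$outdir/sysrq-m.txt"     # keep only the fresh dump
    else
        echo "sysrq-trigger not writable, skipped" > "$outdir/sysrq-m.txt"
    fi
}
```

Running `collect_mem_state /tmp/arcmsr-report` as root while the machine is in the bad state would leave three text files in that directory.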

Thanks.
--
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <linux-scsi@...>, Nick Cheng <nick.cheng@...>, Erich Chen <erich@...>, <kopi@...>
Date: Tuesday, February 26, 2008 - 5:35 am

Hi

On Sun, 24 Feb 2008, Andrew Morton wrote:

Hi Andrew,
thanks a lot for the reply, I'm attaching the requested information.
Please let me know if you need more information/testing, whatever.
I'll be glad to help.
BR

--
To: Nikola Ciprich <extmaillist@...>
Cc: <linux-kernel@...>, <linux-scsi@...>, Nick Cheng <nick.cheng@...>, Erich Chen <erich@...>, <kopi@...>
Date: Tuesday, February 26, 2008 - 1:43 pm

Alas, that all looks OK to me.

You never get any out-of-memory messages, and no oom-killing messages?

Possibly what is happening here is that in this low-memory condition, some
of the driver's internal memory-allocation attempts are failing, and the
driver isn't correctly handling this.  This is a rare situation which may
well not have been hit in anyone else's testing.

I expect that the Areca engineers will be able to reproduce this with a
suitably small "mem=" kernel boot option.  If not, they could perhaps
investigate the kernel's fault-injection framework, which permits
simulation of page allocation failures.
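
As a rough illustration of the fault-injection suggestion: on a kernel built with CONFIG_FAULT_INJECTION, CONFIG_FAIL_PAGE_ALLOC and CONFIG_FAULT_INJECTION_DEBUG_FS, and with debugfs mounted at /sys/kernel/debug, page allocation failures can be configured through files under fail_page_alloc. The helper below is a sketch under those assumptions; the function name and the FAULT_DIR override are made up for this example, and the values to use are a matter of experiment.

```shell
#!/bin/sh
# Configure the kernel's fail_page_alloc fault injector via debugfs.
# FAULT_DIR is an illustrative override; it defaults to the usual location.

configure_fail_page_alloc() {
    dir=${FAULT_DIR:-/sys/kernel/debug/fail_page_alloc}
    if [ ! -d "$dir" ]; then
        echo "no $dir: fault injection not built in or debugfs not mounted" >&2
        return 1
    fi
    printf '%s\n' "$1" > "$dir/probability"  # chance of failure, in percent
    printf '%s\n' "$2" > "$dir/interval"     # eligible calls between failures
    printf '%s\n' "$3" > "$dir/times"        # max failures, -1 means no limit
    printf '%s\n' 1    > "$dir/verbose"      # report injected failures in dmesg
}
```

For example, `configure_fail_page_alloc 10 100 -1` as root would start injecting occasional page allocation failures; writing 0 to probability turns the injection off again.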
--
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <linux-scsi@...>, Nick Cheng <nick.cheng@...>, Erich Chen <erich@...>, <kopi@...>
Date: Tuesday, February 26, 2008 - 3:29 pm

Hi Andrew,
no. Right now I have the machine in the weird state: swap (3GB) is empty,
most of the RAM is free as well (~100MB), and gcc crashes even when
trying to compile a C program with an empty main function. So it doesn't seem
to be a problem with memory exhaustion.
Hopefully the areca guys will be able to find out what is going on. But
anyway, if you have any other idea what I should check/try, please let
me know, as I have to admit that I'd really like to hunt it down myself
(and yes, there is some vanity on my side here :))
thanks a lot once more
cheers
nik




--
To: Nikola Ciprich <extmaillist@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, Nick Cheng <nick.cheng@...>, Erich Chen <erich@...>, <kopi@...>
Date: Tuesday, February 26, 2008 - 5:04 pm

Maybe memory fragmentation?  Perhaps the driver tries to allocate a
large block of memory and cannot find a contiguous block of the right
size.
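
One way to get a feel for fragmentation is /proc/buddyinfo, which lists, per zone, how many free blocks of each order (2^order contiguous pages) are available. The helper below is a sketch that sums the free blocks at or above a given order; the function name and its arguments are assumptions for illustration, and which order the driver would actually need is not known from this thread.

```shell
#!/bin/sh
# Summarise /proc/buddyinfo: count free blocks of at least a given order
# in every zone. The second argument lets a saved copy be analysed.

high_order_free() {
    min_order=$1
    file=${2:-/proc/buddyinfo}
    awk -v min="$min_order" '
        $1 == "Node" {
            node = $2
            sub(/,/, "", node)               # "0," -> "0"
            blocks = 0
            # Fields 5, 6, 7, ... hold the counts for order 0, 1, 2, ...
            for (i = 5; i <= NF; i++)
                if (i - 5 >= min) blocks += $i
            printf "node %s zone %s: %d free blocks of order >= %d\n",
                   node, $4, blocks, min
        }' "$file"
}
```

If `high_order_free 3` reports zero for every zone while the machine misbehaves, large contiguous allocations are likely to fail at that moment.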

Maybe the driver developers used different kernel .config options than
you are using.

Try increasing the value in /proc/sys/vm/min_free_kbytes.
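
A minimal sketch of that change, assuming root access: read the current value, multiply it, and write it back. The function name, the default factor of 2 and the MIN_FREE_FILE override are illustrative choices, not a recommendation for a particular value. The setting does not survive a reboot.

```shell
#!/bin/sh
# Raise vm.min_free_kbytes by a factor (default 2) and report old and new
# values. MIN_FREE_FILE is an illustrative override of the sysctl path.

raise_min_free_kbytes() {
    file=${MIN_FREE_FILE:-/proc/sys/vm/min_free_kbytes}
    factor=${1:-2}
    old=$(cat "$file") || return 1
    new=$((old * factor))
    echo "$new" > "$file" || return 1
    echo "min_free_kbytes: $old -> $new"
}
```

Calling `raise_min_free_kbytes 4` as root would quadruple the reserve the kernel tries to keep free.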

Try switching some things like SLAB or SLUB, try booting with
kernelcore=512M to enable the Movable memory zone, or try 64-bit vs
32-bit kernels.
-- 
Zan Lynx <zlynx@acm.org>
--
To: 'Nikola Ciprich' <extmaillist@...>
Cc: 'Andrew Morton' <akpm@...>, <linux-kernel@...>, <linux-scsi@...>, 'Erich Chen' <erich@...>, <kopi@...>, <support@...>, 'Zan Lynx' <zlynx@...>
Date: Tuesday, February 26, 2008 - 9:53 pm

Hi Nikola,
Please put support@areca.com.tw in the loop.
I am sure Areca support, Kevin, has taken over your case.
If you like, please let him know your configuration and operations to
synchronize both sides.
Thank you for your patience and sorry for your inconvenience,

--
To: 'Nikola Ciprich' <extmaillist@...>, 'Andrew Morton' <akpm@...>
Cc: <linux-kernel@...>, <linux-scsi@...>, 'Erich Chen' <erich@...>, <kopi@...>, <kevin34@...>, <billion.wu@...>
Date: Tuesday, February 26, 2008 - 6:30 am

Hi Nikola,
As I said, we will test on our site.
Our support team will help you to settle the issue.
Sorry for your inconvenience,

--