aacraid on Dell 2650's

Submitted by lairdcp
on September 30, 2004 - 1:48pm

Hi All,

Any ideas greatly appreciated. I am working with a university that
has 8 Dell 2650s with RHEL 3 AS installed, patched via RHN to
kernel version 2.4.21-20.EL. I used the latest Dell Snap CD and
then installed the latest PERC 3Di firmware (6092). I am also running
the aacraid driver, which as I understand it is the latest from Dell's website.

The problem is that on the first couple of boots on each machine I get:

Kernel panic on boot
call trace: [] scsi_setup_host[scsi_mod] 0xb6 (0xc989fe88)
aac_pci_tb [aacraid] 0x0 (0xc989feac)
aac_pci_driver[aacraid] 0x2a (0xc989febc)
etc etc

On the third boot it comes up OK, but if I have to boot again
some minutes later it fails again.

I have also run the Elite diags on the disks which come back fine.

This is going into prod in around a month and will be a great Linux
site, so any help would be great.

I am talking to Dell about this, but as they haven't come up with much
I thought I would flag it here.

Many Thanks,

Colin

I've had a similar issue with

Anonymous
on
November 19, 2004 - 9:53am

I've had a similar issue with the RAID controller on the 2650s. I'm running Red Hat 9.0 with a patched 2.4.25 kernel (from kernel.org; I don't bother with Red Hat's RPM kernels).
I had to get the Red Hat kernel patch for the newest aacraid drivers, extract the rpm, and use that source to replace the kernel drivers. I recompiled and it's working nicely. I'm having a hell of a time getting a 2.6 kernel to work, though.
The link below might help you with aacraid issues.
Thanks
http://linux.dell.com/files/aacraid/

eXklusve@hotmail.com

Problems with Dell Poweredge 2650

Anonymous
on
November 29, 2004 - 5:26pm

I have been trying to get RHEL AS 3 working on a PowerEdge 2650 with the same kernel version as you. I decided not to use the hardware RAID but instead to set the drives up as 5 containers of 36G each and use software RAID. I partitioned 1G on each of two drives as RAID 1 for /boot. I configured 1G from each of the remaining 3 drives as RAID 0 and used it for swap. The remaining disk space on all 5 drives was set up as RAID 5 for root (/).

I have been having awful problems with this. Fortunately this is a replacement machine, so I have had the luxury of being able to play with it. However, 3 months without a stable machine is just too much. Yesterday it reported an aacraid hang(?) and then produced file system errors. For a while I thought it was the way I had partitioned the disks, but I am now finding more information on the web. Sometimes I think I have it stable, and then it produces more problems.

Would it be better to trash the machine and move to a PowerEdge 2850? With this RAID controller and the Broadcom NICs, it seems it would be best to avoid this hardware, and perhaps Dell hardware, for RHEL.

regards

mike

Questions

Anonymous
on
November 29, 2004 - 8:15pm

I guess the first question to ask is if you tried this on more than one of the machines. From your post it seems you did, but I wanted to be clear on that; it could just be that one of the machines (the one you are testing on) has a bad controller.

Barring that, I guess the thing to do is to try out some different kernels. Personally, I would first try out the kernel that originally comes with RHEL 3 AS (the one you installed with). If that still has problems, try out a kernel.org kernel (probably one of the newest ones, 2.4.27 or 2.4.28); I know it is a pain in the butt, but you will at least get to see if it is one of RedHat's patches.

I only say all of this because at my old job we used Linux with the aacraid driver on PowerEdge 2650s without any problems. We were running a kernel.org 2.4.18 at the time, so if the newest kernel.org kernels don't work, you might want to start with that as a baseline.

Re Questions

Anonymous
on
November 29, 2004 - 9:38pm

Many thanks for your advice and comments.

Unfortunately I don't have a spare 2650 or 2850 at the moment. However I will test some of the kernels as you suggest. Interesting that you didn't have any problems with the 2650's in your old job with the older kernels.

cheers

mike

Re Questions

Anonymous
on
November 29, 2004 - 11:39pm

I have a stable 2650 with Perc3/Di running RHAS 2.1 (2.4.9) and also one running vanilla 2.4.21.

Following are the boot messages

Red Hat/Adaptec aacraid driver (1.1-5[2340])
AAC0: kernel 2.8-0 build 6089
AAC0: monitor 2.8-0 build 6089
AAC0: bios 2.8-0 build 6089
AAC0: serial 32e021d3

Red Hat/Adaptec aacraid driver (1.1.3 Feb 18 2004 19:54:03)
AAC0: kernel 2.7.4 build 3170
AAC0: monitor 2.7.4 build 3170
AAC0: bios 2.7.0 build 3170
AAC0: serial 524841d3fafaf001

aacraid / Dell PE2650 / RHEL 3

e4jet
on
December 10, 2004 - 2:07pm

We have the PERC3Di (6092 build) on 2 PE2650 machines. RHEL 3.0 ES update 1 (2.4.21-9.ELsmp) works fine on both. RHEL 3.0 ES update 3 (2.4.21-20.ELsmp) crashes often during boot (on both) as described by lairdcp at the beginning of this thread. Update 3 is using aacraid version 1.1.5.2340. I'm not an expert, but this seems to be the problem. Dell offers aacraid 1.1.4-2302 on their support site. The following steps were used to install this module:

get aacraid-1.1.4-2302-RHEL3-A01.tar from Dell.
rpm -i dkms-1.00-1.noarch.rpm
rpm -i aacraid-1.1.4-2302dkms.noarch.rpm

ln -s /var/dkms/aacraid/1.1.4-2302 /var/dkms/aacraid/1.1.4.2302

dkms build -m aacraid -v 1.1.4-2302
dkms install -m aacraid -v 1.1.4-2302

check /lib/modules/2.4.21-20.ELsmp/kernel/drivers/scsi/aacraid
for a new module...

I've booted the machine 10 times without a panic (previously I was getting 5 panics out of 10 boots). I did see /var fail to unmount during umount2 (it claimed the device was busy) about 4 out of the 10 times. I'm going to run Bonnie++ on the box for the weekend to see if this is going to be a stable config. I'll post my results next week. If you try this, please post your results here as well.

Thanks,
-e4jet
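
One way to confirm which aacraid build actually loaded after a swap like this is to pull the version string out of the driver banner in the boot log. A minimal sketch, assuming the banner has the "Red Hat/Adaptec aacraid driver (...)" form quoted in this thread; the function name aacraid_banner_version is made up for illustration:

```shell
#!/bin/sh
# Print the version token from aacraid driver banner lines read on stdin.
# ASSUMPTION: the banner looks like
#   "Red Hat/Adaptec aacraid driver (<version> <optional build date>)"
# Lines that do not contain such a banner produce no output.
aacraid_banner_version() {
    sed -n 's/.*aacraid driver (\([^ )]*\).*/\1/p'
}
```

Typical use would be `dmesg | aacraid_banner_version`. If the old version string still shows up, the new module was probably not the one loaded at boot (for example because the initrd still carries the old one).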

RE: aacraid / Dell PE2650 / RHEL 3

e4jet
on
December 13, 2004 - 7:25am

Good news. The Dell-provided aacraid module held up through 2+ days of continuous bonnie++ tests. I ran 2 loops, one against each LUN. Each test ran 171 times. The test scores were consistent throughout the test.

-e4jet

What were the results of the

Ryan (not verified)
on
December 16, 2004 - 11:05am

What were the results of the test runs? The hardware RAID results seem to be a bit low on throughput, though seek/sec isn't too bad. I'm using a 2.6.9 derived kernel with the aacraid driver that comes with the kernel, and everything has been perfectly stable, but I wouldn't mind more throughput.

sargeras scsi # dmesg|grep AAC
AAC0: kernel 2.7.4 build 3170
AAC0: monitor 2.7.4 build 3170
AAC0: bios 2.7.0 build 3170
AAC0: serial fb9c61d3fafaf001
sargeras scsi # dmesg|grep aac
Red Hat/Adaptec aacraid driver (1.1.2-lk2 Dec 16 2004)
sargeras scsi # cat /proc/version
Linux version 2.6.9-gentoo-r9 (root@sargeras) (gcc version 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)) #9 SMP Thu Dec 16 11:44:42 CST 2004

I have 3 2650s that are about to be deployed each with 4 146GB 10K RPM drives.

I know it isn't much... but I

Anonymous (not verified)
on
March 16, 2005 - 3:49pm

I know it isn't much... but I almost cracked my head open on this one. The solution for me was simply to upgrade the RAID controller to the latest firmware version on the Dell site (as of around the 9th of March, 2005). 2.6.11 now works flawlessly on a PE2650.

Red Hat/Adaptec aacraid driver (1.1.2-lk2 Mar 16 2005)
ACPI: PCI interrupt 0000:06:08.1[A] -> GSI 30 (level, low) -> IRQ 30
AAC0: kernel 2.8.4 build 6092
AAC0: monitor 2.8.4 build 6092
AAC0: bios 2.8.0 build 6092
AAC0: serial 124410d3fafaf001
AAC0: 64bit support enabled.
AAC0: 64 Bit DAC enabled
scsi2 : percraid
Vendor: DELL Model: fish Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
megaraid cmm: 2.20.2.5 (Release Date: Fri Jan 21 00:01:03 EST 2005)
megaraid: 2.20.4.5 (Release Date: Thu Feb 03 12:27:22 EST 2005)
megaraid: probe new device 0x101e:0x1960:0x1028:0x0493: bus 4:slot 0:func 0
ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 24 (level, low) -> IRQ 24
megaraid: fw version:[197O] bios version:[3.35]
scsi3 : LSI Logic MegaRAID driver
scsi[3]: scanning scsi channel 0 [Phy 0] for non-raid devices
Vendor: DELL Model: PV22XS Rev: E.14
Type: Processor ANSI SCSI revision: 03
scsi[3]: scanning scsi channel 1 [Phy 1] for non-raid devices
scsi[3]: scanning scsi channel 2 [virtual] for logical drives
Vendor: MegaRAID Model: LD 0 RAID5 908G Rev: 197O
Type: Direct-Access ANSI SCSI revision: 02

Where did you get the firmware?

Kirk A (not verified)
on
April 11, 2005 - 3:19pm

Hello,
The title says it all. I cannot find the site to download the
firmware update. BTW, it is now April 11th. Is the problem truly gone?

Thanks,
Kirk.

Fixed ! PE 2650 failures with Perc 3/Di card

Kirk A (not verified)
on
April 17, 2005 - 5:41pm

Hello All,

As of April 17, 2005, I have had 5 solid days of uptime with my
Dell PE 2650 after installing the new aacraid 1.1.5 drivers.
Whether it fails or keeps working, I will continue to post...

I obtained the following file
aacraid_drv_linux_1.1.5-2371.rpm
from
http://www.adaptec.com/worldwide/support/drivers_by_product.jsp?sess=no&...

I unpacked it using rpm which creates:
/opt/Adaptec/aacraid/aacraid_patches.tgz
/opt/Adaptec/aacraid/aacraid_prebuilt.tgz
/opt/Adaptec/aacraid/aacraid_source.tgz
/opt/Adaptec/aacraid/adpt_mk_initrd
/opt/Adaptec/aacraid/adpt_mkinitrd
/opt/Adaptec/aacraid/chk_lilo
/opt/Adaptec/aacraid/create_device_nodes
/opt/Adaptec/aacraid/grub.awk
/opt/Adaptec/aacraid/install.sh
/opt/Adaptec/aacraid/lilo.awk
/opt/Adaptec/aacraid/module.equiv
/opt/Adaptec/aacraid/read.me

I unpacked the aacraid_source.tgz file and copied it into my
2.6.11 kernel tree (drivers/scsi/aacraid). I then compiled
my kernel and installed it. I do run with the kernel option apm=off;
however, with earlier versions of the driver this flag did not
help. Finally, I am running FC3. I did not update the BIOS. I did
not update the controller firmware. I repeat: the only changes
were the aacraid driver and, I guess, the 2.6.11 vs. 2.6.9 kernel.

Thanks to Frank Free for helping me throughout all of this.

Cheers,
Kirk.
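
The drop-in step Kirk describes could be scripted roughly as follows. This is only a sketch: it assumes the driver sources sit at the top level of aacraid_source.tgz (the archive layout is not given in the post), and the function name, the example kernel tree path, and the .orig backup suffix are invented for illustration.

```shell
#!/bin/sh
# Replace drivers/scsi/aacraid in a kernel tree with the contents of a
# driver source tarball, keeping the stock directory as a backup.
# ASSUMPTION: the tarball holds the source files at its top level.
replace_aacraid_source() {
    tarball=$1      # e.g. /opt/Adaptec/aacraid/aacraid_source.tgz
    ktree=$2        # e.g. /usr/src/linux-2.6.11 (hypothetical path)
    target="$ktree/drivers/scsi/aacraid"

    [ -f "$tarball" ] || { echo "no such tarball: $tarball" >&2; return 1; }
    [ -d "$target" ]  || { echo "no such directory: $target" >&2; return 1; }
    [ -e "$target.orig" ] && { echo "backup already exists: $target.orig" >&2; return 1; }

    mv "$target" "$target.orig" || return 1   # keep the stock driver around
    mkdir "$target" || return 1
    tar -xzf "$tarball" -C "$target"          # unpack the new sources in place
}
```

After this the kernel is rebuilt and installed as usual. If the archive does not ship its own Makefile, the one in the .orig directory would have to be carried over before building.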

I've found that Bonnie is a p

Anonymous (not verified)
on
April 21, 2005 - 6:40am

I've found that Bonnie is a poor way to test for this error. We also had 2-3 days of uptime with non-stop Bonnie runs and thought the problem was solved. The problem is easier to trigger by copying small files. Instead of waiting days/weeks for the error to trigger, we've started running parallel copies of our /usr directory to different directories. The problem now triggers within hours, usually within 40 minutes.
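
A parallel-copy test along those lines might look like the sketch below. The function name, the default copy count, and the destination layout are assumptions, not the poster's actual script; the diff pass afterward is an addition meant to catch silent corruption as well as outright hangs.

```shell
#!/bin/sh
# Copy one source tree to several destinations at the same time, then
# compare every copy with the source.
# Usage: parallel_copy_test SRC DEST_BASE [COPIES]
# DEST_BASE/copyN must not exist yet, or cp will nest SRC inside it.
parallel_copy_test() {
    src=$1
    dest_base=$2
    copies=${3:-4}      # number of simultaneous copies (arbitrary default)

    mkdir -p "$dest_base" || return 1

    i=1
    while [ "$i" -le "$copies" ]; do
        cp -a "$src" "$dest_base/copy$i" &    # run each copy in the background
        i=$((i + 1))
    done
    wait                                      # block until all copies finish

    status=0
    i=1
    while [ "$i" -le "$copies" ]; do
        diff -r "$src" "$dest_base/copy$i" >/dev/null || {
            echo "copy$i differs from $src" >&2
            status=1
        }
        i=$((i + 1))
    done
    return "$status"
}
```

For example, `parallel_copy_test /usr /scratch/usrtest 4` could be run in a loop, removing /scratch/usrtest between passes; /scratch/usrtest is a hypothetical path on the array under test. On a live /usr the diff pass may also report files that changed during the run, so a mismatch is a prompt to look closer rather than proof of a driver fault.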

aacraid general problems 2.6.9

Alex Marquez (not verified)
on
August 29, 2005 - 2:23am

Hi all... I resolved all my SCSI aacraid problems by updating the aacraid driver from 1.1.2k to 1.1.5. I have 2 machines that have been running without a crash for about 10 days. Before this change, I got a random SCSI crash every day or every few hours. I hope it will stay stable for 3 months or so. If not, 10 days is still better than 1. :-)
It's easy: get the kernel source, put the new driver in the correct location, compile, install, and nothing more.
This is a good guide for compiling:
http://kerneltrap.org/node/2465
