OpenBSD creator Theo de Raadt offered an overview of the new RAID management functionality coming in the upcoming OpenBSD 3.8. For now, only ami(4) MegaRAID controllers are supported, but the new bioctl tool is meant to be extended to manage additional RAID controllers in the future. In his email, Theo gives examples of using the bioctl command to probe and manage RAID drives. He notes, "I would like to make it clear that for 3.8, this support will only work for the ami(4) raid controllers. Hopefully some other people will come helping us to make controllers from other vendors work too. About half of the code is a framework to permit RAID controller drivers to do the right thing."
The framework deliberately supports only basic functionality, since it is ultimately meant to serve many different RAID controllers. Theo explains, "the functionality supplied is also very basic, almost minimal. But this is done like this on purpose, since we believe that we could support this functionality on all RAID controllers in the same way, without special 'but that controller is so different' mindsets entering the picture. RAID management should (and can be) be no more complicated than ifconfig managing network interfaces." The functionality considered essential is the ability to know when something is wrong, automatic allocation of a hot spare when a volume degrades, the ability to locate drives by blinking their lights, the ability to promote newly inserted drives to hot spares, and the ability to turn off the beeper. "Everything else is just icing," Theo concludes. "These are the micro operations which really matter."
From: Theo de Raadt [email blocked]
To: misc
Subject: RAID management support coming in OpenBSD 3.8
Date: Fri, 09 Sep 2005 15:18:58 -0600

I thought it was time to give some details about the (minimal) RAID management stuff coming in OpenBSD 3.8. Most of this code has been written by Marco Peereboom with some help from David Gwynne and Michael Shalayeff. Moral support and direction from me and Bob Beck who has a pile of these AMI setups.

Here is a demonstration. First, a piece of dmesg output, so that we can see which device is going to be handled:

    ami0 at pci1 dev 8 function 0 "Symbios Logic MegaRAID" rev 0x01: apic 9 int 8 (irq 10) Dell 518/64b/lhc
    ami0: FW 350O, BIOS v1.09, 128MB RAM
    ami0: 2 channels, 0 FC loops, 2 logical drives
    scsibus2 at ami0: 40 targets
    sd0 at scsibus2 targ 0 lun 0: <AMI, Host drive #00, > SCSI2 0/direct fixed
    sd0: 349400MB, 44542 cyl, 255 head, 63 sec, 512 bytes/sec, 715571200 sec total
    sd1 at scsibus2 targ 1 lun 0: <AMI, Host drive #01, > SCSI2 0/direct fixed
    sd1: 349400MB, 44542 cyl, 255 head, 63 sec, 512 bytes/sec, 715571200 sec total
    scsibus3 at ami0: 16 targets
    ses0 at scsibus3 targ 6 lun 0: <DELL, PV22XS, E.17> SCSI3 3/processor fixed
    scsibus4 at ami0: 16 targets
    ses1 at scsibus4 targ 6 lun 0: <DELL, PV22XS, E.17> SCSI3 3/processor fixed

OK, this is an AMI raid controller. It has come up with 3 scsi busses; one for the virtual RAID volumes which there are two of, and two SCSI busses which match the real SCSI busses that are on the controller (to expose the SES or SAFTE enclosure management controllers, and so that we can talk pass-through to the real disks).
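The dmesg lines give each logical drive's size both as a sector count and in MB, and the two can be reconciled with a little arithmetic. A minimal sketch (the helper names are ours): the sector count times the 512-byte sector size gives the byte count that bioctl reports in its Size column, and dividing that by 2^20 reproduces the "349400MB" figure, so the MB here are binary megabytes.

```python
SECTOR_SIZE = 512  # from "512 bytes/sec" in the sd0/sd1 dmesg lines


def sectors_to_bytes(sectors: int, sector_size: int = SECTOR_SIZE) -> int:
    """Total size in bytes of a disk that reports `sectors` sectors."""
    return sectors * sector_size


def bytes_to_mib(nbytes: int) -> int:
    """Whole mebibytes (2**20 bytes), discarding any remainder."""
    return nbytes // (1 << 20)


# Figures copied from the sd0/sd1 lines of the dmesg excerpt.
total = sectors_to_bytes(715571200)
print(total)                # 366372454400
print(bytes_to_mib(total))  # 349400
```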
If we wish to probe further details, we use

    # bioctl ami0
    Volume  Status            Size  Device
     ami0 0 Online    366372454400  sd0    RAID5
          0 Online     73403465728  0:0.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          1 Online     73403465728  0:2.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          2 Online     73403465728  0:4.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          3 Online     73403465728  0:8.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          4 Online     73403465728  1:10.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          5 Online     73403465728  1:12.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 1 Online    366372454400  sd1    RAID5
          0 Online     73403465728  0:1.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          1 Online     73403465728  0:3.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          2 Online     73403465728  0:5.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          3 Online     73403465728  1:9.0  ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          4 Online     73403465728  1:11.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          5 Online     73403465728  1:13.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 2 Unused     73403465728  1:14.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 3 Hot spare  73403465728  1:15.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>

Here we can see which physical drives are on the controller, and how they are configured into volumes. Two volumes have been created, both of which are rather large. The drives are on two scsi busses; for instance, 1:12.0 means SCSI bus 1, scsi target 12, lun 0. With additional options to bioctl(8), we could find out some more (mostly irrelevant) information. There are also two additional devices which we know about: one is unused (ie. not registered with the AMI firmware at the moment), and one is a Hot Spare.

Let's cause some havoc. First, I want to pick a drive that I am going to unplug, to mimic a failure. Let's see... 1:9.0 looks good to me.

    # bioctl -b 1.9 ami0

When I look at the array, one of the drives is now blinking. I made it blink just because I prefer to pull drives out of my sd1 filesystems rather than the sd0 filesystems. And otherwise I wouldn't be able to show off the blink support. Anyways, I pull that particular drive.
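Physical disks are identified by their position on the controller as channel:target.lun, and later in the mail the .lun part is left off (`bioctl -H 1:9`). A hypothetical parser for that notation, assuming an omitted lun means lun 0; the names `DiskAddress` and `parse_address` are ours, not part of bioctl:

```python
from typing import NamedTuple


class DiskAddress(NamedTuple):
    channel: int  # SCSI bus on the controller
    target: int   # SCSI target id on that bus
    lun: int      # logical unit number


def parse_address(text: str) -> DiskAddress:
    """Parse 'channel:target[.lun]', e.g. '1:12.0' or '1:9'.

    Assumption: a missing lun is treated as lun 0.
    """
    channel_part, sep, rest = text.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {text!r}")
    target_part, dot, lun_part = rest.partition(".")
    lun = int(lun_part) if dot else 0
    return DiskAddress(int(channel_part), int(target_part), lun)


print(parse_address("1:12.0"))  # DiskAddress(channel=1, target=12, lun=0)
```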
Immediately some churning starts, and if I re-run bioctl I can see what has happened:

    # bioctl ami0
    Volume  Status            Size  Device
     ami0 0 Online    366372454400  sd0    RAID5
          0 Online     73403465728  0:0.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          1 Online     73403465728  0:2.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          2 Online     73403465728  0:4.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          3 Online     73403465728  0:8.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          4 Online     73403465728  1:10.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          5 Online     73403465728  1:12.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 1 Degraded  366372454400  sd1    RAID5
          0 Online     73403465728  0:1.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          1 Online     73403465728  0:3.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          2 Online     73403465728  0:5.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          3 Rebuild    73403465728  1:15.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          4 Online     73403465728  1:11.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          5 Online     73403465728  1:13.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 2 Unused     73403465728  1:14.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>

Drive 1:15 automatically became a part of the "sd1" volume, and is currently rebuilding. If I access a filesystem on sd1, I will notice that it is a little bit slower.
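The sd1 volume keeps working while "Degraded" because RAID5 stores parity alongside the data: in each stripe the parity block is the bytewise XOR of the data blocks, so any single missing block equals the XOR of the survivors. Rebuilding onto 1:15.0 amounts to recomputing the lost member's blocks that way. A toy sketch of the idea, not the AMI firmware's code (it ignores parity rotation, block sizes and everything else a real controller does):

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def make_stripe(data: List[bytes]) -> List[Optional[bytes]]:
    """Data blocks followed by a parity block, so the whole stripe XORs to zero."""
    return list(data) + [xor_blocks(data)]


def reconstruct(stripe: List[Optional[bytes]]) -> bytes:
    """Recover the one missing block (marked None) from the surviving blocks."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    return xor_blocks([blk for blk in stripe if blk is not None])


# Six members, like sd1: five data blocks plus one parity block per stripe.
stripe = make_stripe([b"disk", b"data", b"goes", b"here", b"!!!!"])
lost, stripe[3] = stripe[3], None  # pull the member in slot 3
assert reconstruct(stripe) == lost
```

The same parity overhead explains the sizes in the listing: each six-drive volume offers a little under five members' worth of space (5 × 73403465728 bytes is slightly more than the 366372454400 reported).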
Of course the RAID array is beeping so loudly I think my ears are going to burst, so I must shut it up:

    # bioctl -a quiet ami0

When I reinsert the drive that I previously unplugged, I see:

    # bioctl ami0
    Volume  Status            Size  Device
     ami0 0 Online    366372454400  sd0    RAID5
          0 Online     73403465728  0:0.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          1 Online     73403465728  0:2.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          2 Online     73403465728  0:4.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          3 Online     73403465728  0:8.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          4 Online     73403465728  1:10.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          5 Online     73403465728  1:12.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 1 Degraded  366372454400  sd1    RAID5
          0 Online     73403465728  0:1.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          1 Online     73403465728  0:3.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          2 Online     73403465728  0:5.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          3 Rebuild    73403465728  1:15.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          4 Online     73403465728  1:11.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          5 Online     73403465728  1:13.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 2 Unused     73403465728  1:9.0  ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 3 Unused     73403465728  1:14.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>

Drive 1:9 has come back as "Unused". Let's make it a Hot Spare, so that I can use it later.
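Because the listing is plain, line-oriented text, the "know when something is wrong" check can be scripted around it. A hypothetical helper, assuming the column layout shown in this mail (controller name, index, status, size, device) stays stable; that layout is an assumption of this sketch, not a format guarantee:

```python
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (index, status, device)


def parse_listing(text: str, controller: str = "ami0") -> List[Entry]:
    """Return (index, status, device) for each top-level line of a bioctl listing.

    Top-level lines start with the controller name and describe either a volume
    ("sd1") or a disk outside any volume ("1:14.0"); member lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 5 or tokens[0] != controller:
            continue  # header, member-disk line, or blank
        if tokens[2:4] == ["Hot", "spare"]:
            status, rest = "Hot spare", tokens[4:]
        else:
            status, rest = tokens[2], tokens[3:]
        if len(rest) < 2:
            continue  # too few columns for size and device
        entries.append((tokens[1], status, rest[1]))  # rest[0] is the size
    return entries


def needs_attention(entries: List[Entry]) -> List[Entry]:
    """Volumes (sdN devices) whose status is anything other than Online."""
    return [e for e in entries if e[2].startswith("sd") and e[1] != "Online"]


def unused_disks(entries: List[Entry]) -> List[str]:
    """Disks not claimed by the firmware: candidates for bioctl -H."""
    return [e[2] for e in entries if e[1] == "Unused"]
```

Run against the listing above, `needs_attention` would report sd1 as Degraded and `unused_disks` would list 1:9.0 and 1:14.0.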
    # bioctl -H 1:9 ami0
    # bioctl ami0
    Volume  Status            Size  Device
     ami0 0 Online    366372454400  sd0    RAID5
          0 Online     73403465728  0:0.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          1 Online     73403465728  0:2.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          2 Online     73403465728  0:4.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          3 Online     73403465728  0:8.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          4 Online     73403465728  1:10.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          5 Online     73403465728  1:12.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 1 Degraded  366372454400  sd1    RAID5
          0 Online     73403465728  0:1.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          1 Online     73403465728  0:3.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          2 Online     73403465728  0:5.0  ses0 <MAXTOR ATLAS15K2_73SCA JNZ6>
          3 Rebuild    73403465728  1:15.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          4 Online     73403465728  1:11.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
          5 Online     73403465728  1:13.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 2 Hot spare  73403465728  1:9.0  ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>
     ami0 3 Unused     73403465728  1:14.0 ses1 <MAXTOR ATLAS15K2_73SCA JNZ6>

Now if I get another failure, there is a drive to perform a failover to again. Earlier we had mentioned the SES and SAFTE enclosure monitors. Their statistics are available as well.
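The sequence demonstrated so far (a member is pulled, a hot spare takes its slot and rebuilds, the reinserted drive returns as Unused, and `bioctl -H` promotes it to Hot spare) can be modelled as a small state machine. This is a toy model of the behaviour observed in the mail, not the firmware's logic; every class and method name is made up:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Volume:
    width: int          # number of members when healthy
    members: List[str]  # channel:target.lun addresses, in slot order
    rebuilding: Set[str] = field(default_factory=set)

    @property
    def status(self) -> str:
        # Short a member, or still rebuilding one: the volume is degraded.
        healthy = len(self.members) == self.width and not self.rebuilding
        return "Online" if healthy else "Degraded"


@dataclass
class Controller:
    volumes: Dict[str, Volume]
    hot_spares: List[str] = field(default_factory=list)
    unused: List[str] = field(default_factory=list)

    def pull(self, disk: str) -> None:
        """A member disappears: put a hot spare in its slot, if one exists."""
        for vol in self.volumes.values():
            if disk in vol.members:
                slot = vol.members.index(disk)
                if self.hot_spares:
                    spare = self.hot_spares.pop(0)
                    vol.members[slot] = spare
                    vol.rebuilding.add(spare)
                else:
                    del vol.members[slot]
                return
        raise KeyError(f"{disk} is not a volume member")

    def insert(self, disk: str) -> None:
        """A newly inserted (or reinserted) drive shows up as Unused."""
        self.unused.append(disk)

    def make_hot_spare(self, disk: str) -> None:
        """The Unused -> Hot spare transition done with bioctl -H in the mail."""
        self.unused.remove(disk)
        self.hot_spares.append(disk)

    def finish_rebuild(self, disk: str) -> None:
        """Mark a rebuild complete (the mail expects a couple of hours for sd1)."""
        for vol in self.volumes.values():
            vol.rebuilding.discard(disk)


if __name__ == "__main__":
    ctl = Controller(
        volumes={"sd1": Volume(6, ["0:1.0", "0:3.0", "0:5.0", "1:9.0", "1:11.0", "1:13.0"])},
        hot_spares=["1:15.0"],
        unused=["1:14.0"],
    )
    ctl.pull("1:9.0")            # 1:15.0 takes slot 3 and starts rebuilding
    print(ctl.volumes["sd1"].status)  # Degraded
    ctl.insert("1:9.0")          # comes back as Unused
    ctl.make_hot_spare("1:9.0")  # ready for the next failure
```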
    # sysctl hw.sensors
    hw.sensors.0=ses0, psu0, OK, indicator, On
    hw.sensors.1=ses0, psu1, OK, indicator, On
    hw.sensors.2=ses0, fan0, OK, percent, 33.33%
    hw.sensors.3=ses0, fan1, OK, percent, 33.33%
    hw.sensors.4=ses0, fan2, OK, percent, 33.33%
    hw.sensors.5=ses0, fan3, OK, percent, 33.33%
    hw.sensors.6=ses0, temp0, OK, temp, 26.00 degC / 78.80 degF
    hw.sensors.7=ses0, temp1, OK, temp, 25.00 degC / 77.00 degF
    hw.sensors.8=ses0, temp2, OK, temp, 27.00 degC / 80.60 degF
    hw.sensors.9=ses0, temp3, OK, temp, 28.00 degC / 82.40 degF
    hw.sensors.10=ses1, psu0, OK, indicator, On
    hw.sensors.11=ses1, psu1, OK, indicator, On
    hw.sensors.12=ses1, fan0, OK, percent, 33.33%
    hw.sensors.13=ses1, fan1, OK, percent, 33.33%
    hw.sensors.14=ses1, fan2, OK, percent, 33.33%
    hw.sensors.15=ses1, fan3, OK, percent, 33.33%
    hw.sensors.16=ses1, temp0, OK, temp, 26.00 degC / 78.80 degF
    hw.sensors.17=ses1, temp1, OK, temp, 25.00 degC / 77.00 degF
    hw.sensors.18=ses1, temp2, OK, temp, 27.00 degC / 80.60 degF
    hw.sensors.19=ses1, temp3, OK, temp, 28.00 degC / 82.40 degF

We can use sensorsd(8) to watch these status indicators for problems. When this code was first written, I used to toggle one of the RAID enclosure power switches for kicks, just so that I could see the values change.

I would like to make it clear that for 3.8, this support will only work for the ami(4) raid controllers. Hopefully some other people will come helping us to make controllers from other vendors work too. About half of the code is a framework to permit RAID controller drivers to do the right thing. The amount of code to support this is very small compared to typical vendor RAID management solutions.

The functionality supplied is also very basic, almost minimal. But this is done like this on purpose, since we believe that we could support this functionality on all RAID controllers in the same way, without special "but that controller is so different" mindsets entering the picture.
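The hw.sensors lines shown earlier in this mail follow a regular "device, sensor, status, type, value" pattern, which is what makes automated watching of the kind sensorsd(8) does straightforward. A hypothetical sketch of such a check, not sensorsd's implementation or its configuration format: it parses each line, flags any sensor whose status is not OK, and flags temperatures above a threshold (the 40 degC default is an arbitrary example value).

```python
from typing import Iterable, List, NamedTuple


class Sensor(NamedTuple):
    index: int
    device: str  # e.g. ses0
    name: str    # e.g. temp0
    status: str  # e.g. OK
    kind: str    # indicator, percent, temp
    value: str   # raw value text


def parse_sensor(line: str) -> Sensor:
    """Parse one 'hw.sensors.N=dev, name, status, kind, value' line."""
    key, _, rhs = line.partition("=")
    index = int(key.rsplit(".", 1)[1])
    device, name, status, kind, value = (f.strip() for f in rhs.split(",", 4))
    return Sensor(index, device, name, status, kind, value)


def celsius(value: str) -> float:
    """Extract the degC figure from a value like '26.00 degC / 78.80 degF'."""
    return float(value.split()[0])


def alarms(lines: Iterable[str], max_temp_c: float = 40.0) -> List[Sensor]:
    """Sensors a watcher would complain about: non-OK status, or too hot."""
    flagged = []
    for s in map(parse_sensor, lines):
        too_hot = s.kind == "temp" and celsius(s.value) > max_temp_c
        if s.status != "OK" or too_hot:
            flagged.append(s)
    return flagged
```

With the readings in this mail, every sensor is OK and the hottest reports 28 degC, so `alarms` would return an empty list.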
RAID management should (and can be) be no more complicated than ifconfig managing network interfaces. The typical administrator needs

    - to know when something is wrong
    - automatic Hot Swap allocation on volume degrade
    - to blink and unblink drives (to find them)
    - to be able to upgrade newly inserted drives to Hot Swap status
    - to shut off the damn beeper

Everything else is just icing. These are the micro operations which really matter. All other operations on the volumes make it OK to reboot into the card BIOS.

At this point in this mail, I would love to show the output of the RAID array back in normal status, but it will take a couple of hours for that volume to be rebuilt.

If anyone is serious about attempting to write the back-end code for another RAID driver already in our tree, please contact marco [email blocked]. But don't bother him with other stuff...
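The list of micro operations is short enough to write down as one interface that every controller back end would implement, much as ifconfig drives any network interface. A hypothetical sketch (class and method names are invented; this is not the actual bio(4) framework or its ioctls). Automatic hot-spare allocation happens inside the controller in the demonstration, so here it surfaces only through the status that `volume_status` reports.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class RaidController(ABC):
    """Hypothetical minimal interface: one method per micro operation."""

    @abstractmethod
    def volume_status(self) -> Dict[str, str]:
        """Know when something is wrong: map each volume to its status."""

    @abstractmethod
    def set_blink(self, disk: str, on: bool) -> None:
        """Blink or unblink a drive so it can be found in the enclosure."""

    @abstractmethod
    def make_hot_spare(self, disk: str) -> None:
        """Promote a newly inserted drive to hot spare."""

    @abstractmethod
    def silence_alarm(self) -> None:
        """Shut off the beeper."""


def degraded_volumes(ctl: RaidController) -> List[str]:
    """Vendor-neutral monitoring code that works with any back end."""
    return [vol for vol, status in ctl.volume_status().items() if status != "Online"]


class FakeController(RaidController):
    """In-memory back end, handy for testing code written against the interface."""

    def __init__(self) -> None:
        self.volumes: Dict[str, str] = {"sd0": "Online", "sd1": "Degraded"}
        self.blinking: Set[str] = set()
        self.hot_spares: List[str] = []
        self.alarm_on = True

    def volume_status(self) -> Dict[str, str]:
        return dict(self.volumes)

    def set_blink(self, disk: str, on: bool) -> None:
        if on:
            self.blinking.add(disk)
        else:
            self.blinking.discard(disk)

    def make_hot_spare(self, disk: str) -> None:
        self.hot_spares.append(disk)

    def silence_alarm(self) -> None:
        self.alarm_on = False
```

Keeping the interface this small is the point of the design: a new driver only has to map these few calls onto its own firmware commands, and anything more elaborate is left to the card BIOS.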
From: David Gwynne [email blocked]
Subject: Re: RAID management support coming in OpenBSD 3.8
Date: Sat, 10 Sep 2005 17:07:56 +1000

From: "Theo de Raadt" [email blocked]
> I thought it was time to give some details about the (minimal) RAID
> management stuff coming in OpenBSD 3.8. Most of this code has been
> written by Marco Peereboom with some help from David Gwynne and
> Michael Shalayeff. Moral support and direction from me and Bob Beck
> who has a pile of these AMI setups.

I'd also like to say that I wouldn't have been able to do this stuff without donations from the following people:

    Ben Hooper
    Chris Bensend
    Travis Gillitzer
    Mark Uemura
    Greg Tod

Thanks go to these guys for helping us get something going for 3.8.

dlg