Re: 4 partition raid 5 with 2 disks active and 2 spare, how to force?

Previous thread: Auto Rebuild on hot-plug by Neil Brown on Wednesday, March 24, 2010 - 5:35 pm. (33 messages)

Next thread: Re: Use of WD20EARS with MDADM by Bill Davidsen on Wednesday, April 14, 2010 - 12:53 pm. (52 messages)
From: Anshuman Aggarwal
Date: Thursday, March 25, 2010 - 2:30 am

All, thanks in advance...particularly Neil.

My raid5 setup has 4 partitions, 2 of which are showing up as spare and 2 as active. Running mdadm --assemble --force gives me the following error:
2 active devices and 2 spare cannot start device

It is a raid 5 with a version 1.2 superblock and 4 devices, in the order sda1, sdb5, sdc5, sdd5. I have lvm2 on top of this with other devices, so as you all know the data is irreplaceable, blah blah.

I know that this device has not been written to for a while, so the data can be considered intact (hopefully all of it) if I can get the device to start up, but I'm not sure of the best way to coax the kernel into assembling it. Relevant information follows:

=== This device is working fine === 
mdadm --examine  -e1.2 /dev/sdb5
/dev/sdb5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 42c56ea0:2484f566:387adc6c:b3f6a014
           Name : GATEWAY:127  (local to host GATEWAY)
  Creation Time : Sat Aug 22 09:44:21 2009
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 586099060 (279.47 GiB 300.08 GB)
     Array Size : 1758296832 (838.42 GiB 900.25 GB)
  Used Dev Size : 586098944 (279.47 GiB 300.08 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f8ebb9f8:b447f894:d8b0b59f:ca8e98eb

Internal Bitmap : 2 sectors from superblock
    Update Time : Fri Mar 19 00:56:15 2010
       Checksum : 1005cfbc - correct
         Events : 3796145

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : .AA. ('A' == active, '.' == missing)

=== This device is marked spare, can be marked active (IMHO) ===
mdadm --examine  -e1.2 /dev/sdd5
/dev/sdd5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 42c56ea0:2484f566:387adc6c:b3f6a014
           Name : GATEWAY:127  (local to host GATEWAY)
  Creation Time : Sat Aug 22 09:44:21 2009
     Raid Level : raid5
   Raid Devices : 4

 ...
From: Michael Evans
Date: Thursday, March 25, 2010 - 4:37 am

On Thu, Mar 25, 2010 at 2:30 AM, Anshuman Aggarwal

You have a raid 5 array.

(drive numbers first, then the data and parity chunks on each drive, as an example)
1234

123P
45P6
7P89
...

You are missing two drives, meaning every stripe has lost two chunks
(data and parity, or two data chunks) and there is NOT enough parity
left to recover them.

It's like seeing:

.23.
.5P.
.P8.

and expecting to somehow recover the missing data when it is no longer
within the clean information.
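Michael's argument can be checked with a toy model. RAID 5 parity is the byte-wise XOR of the data chunks in a stripe, so one lost chunk can be solved for, while two lost chunks leave a single equation with two unknowns. The sketch below is a simplification: the 2-byte chunks are made up, real chunks in this array are 64K, and the rotating parity position of the left-symmetric layout is ignored.

```python
from functools import reduce

def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks; RAID 5 parity uses this operation."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

# One stripe of a toy 4-drive array: three data chunks plus one parity chunk.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = xor_chunks(data)

# Case 1: one drive lost. XOR of the three survivors rebuilds the lost chunk.
rebuilt = xor_chunks([data[1], data[2], parity])
print(rebuilt == data[0])  # True

# Case 2: two drives lost (first data chunk AND parity, like ".23.").
# The survivors only constrain d0 ^ P, so every guess for d0 has a matching P.
def consistent(d0, p):
    """True if (d0, p) agrees with the surviving chunks data[1] and data[2]."""
    return xor_chunks([d0, data[1], data[2], p]) == bytes(len(p))

fake_d0 = b"\x00\x00"
fake_p = xor_chunks([fake_d0, data[1], data[2]])
print(consistent(data[0], parity), consistent(fake_d0, fake_p))  # True True
```

Both the real and the invented pair satisfy the parity equation, which is why the two-drive case cannot be resolved from the surviving drives alone.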

Your only hope is to assemble the array in read-only mode with the
other devices, if they can even still be read.  In that case you might
at least be able to recover nearly all of your data; hopefully any
missing areas are in unimportant files or unallocated space.

At this point you should be EXTREMELY CAREFUL, and DO NOTHING without
having a good solid plan in place.  Rushing /WILL/ cause you to lose
data that might still potentially be recovered.
--

From: Anshuman Aggarwal
Date: Thursday, March 25, 2010 - 7:09 am

Thanks Michael, I understand why the multiple failure would cause me to lose data, which is why I wanted to consult this mailing list before proceeding.

Could you tell me how to keep the array read-only and forcibly mark one or both of these spares as active? Also, once I am able to use these spares as active and the data is not consistent in a particular stripe, how does the kernel resolve the inconsistency (i.e., which data does it use, the data chunks or the parity)? This last one is just academic interest, since it'll be difficult to figure out which is the right data anyway.

Thanks,
Anshuman


--

From: Michael Evans
Date: Thursday, March 25, 2010 - 8:38 pm

On Thu, Mar 25, 2010 at 7:09 AM, Anshuman Aggarwal

Please, read the wikipedia page first,

http://en.wikipedia.org/wiki/RAID

and then this

http://wiki.tldp.org/LVM-on-RAID (some links need updating, but it's
still up to date for concepts)


With that background nearly out of the way, please stop, and read them
both again.  Yes, seriously.  In order to prevent data loss you'll
need to have a good understanding of what RAID does, so that you can
watch out for ways it can fail.

The next step, before we do /anything/ else is for you to post the
COMPLETE output of these commands.

mdadm -Dvvs
mdadm -Evvs

They will help everyone on the list better understand the state of the
metadata records and what potential solutions might be possible.
--

From: Anshuman Aggarwal
Date: Friday, March 26, 2010 - 9:28 am

Thanks again. I have visited those pages (twice, no less), and nothing about the concepts (both raid and lvm) seems to have changed since I last studied them.

My problem is that I'm not familiar enough with the recovery tools and the common practical pitfalls to do this comfortably without the hand holding of this mailing list :)

Here is the requested output:
Note: Since I have 3-4 other arrays running (root device etc.) that have nothing to do with this one and are all working fine, I am only posting the output for the relevant devices (to avoid confusing everybody). Please let me know if you still require the full output.

mdadm -Dvvs /dev/md_d127
mdadm: md device /dev/md_d127 does not appear to be active.

mdadm --assemble  /dev/md_d127 /dev/sda1 /dev/sdb5 /dev/sdc5 /dev/sdd5
mdadm: /dev/md_d127 assembled from 2 drives and 1 spare - not enough to start the array.

This says that /dev/md_d127 is not active (because it's not active in /proc/mdstat).
mdadm -Evvs  /dev/sda1 /dev/sdb5 /dev/sdc5 /dev/sdd5
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 42c56ea0:2484f566:387adc6c:b3f6a014
           Name : GATEWAY:127  (local to host GATEWAY)
  Creation Time : Sat Aug 22 09:44:21 2009
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 586099060 (279.47 GiB 300.08 GB)
     Array Size : 1758296832 (838.42 GiB 900.25 GB)
  Used Dev Size : 586098944 (279.47 GiB 300.08 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 571fa32b:d76198a1:0f5d3a2d:31f6d6b8

Internal Bitmap : 2 sectors from superblock
    Update Time : Fri Mar 19 00:56:15 2010
       Checksum : 7e769165 - expected aa523227
         Events : 3796145

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : spare
   Array State : .AA. ('A' == active, '.' == missing)
/dev/sdb5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID ...
From: Michael Evans
Date: Friday, March 26, 2010 - 12:04 pm

On Fri, Mar 26, 2010 at 9:28 AM, Anshuman Aggarwal

Obviously you still do not understand the problem, then: you did not
understand it previously, and you say you learned nothing new.

Also, you added additional arguments to the commands I provided when
that was neither required nor desired.

However enough data was returned to see one thing:  ALL of the events
counters show the same number.

That is extremely odd; usually in this situation at least one device
will have a lower number.
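Comparing the counters by eye across four dumps is error-prone, so a small script can do it. This is a minimal sketch, assuming the `mdadm --examine` text layout shown in this thread (mdadm 3.1.x with v1.2 metadata); the parser only looks at the `Events` and `Device Role` lines, and the sample input is abridged from the dumps quoted above.

```python
import re

def examine_fields(text):
    """Map each device in `mdadm --examine` output to its Events and Device Role."""
    devices = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(/dev/\S+):\s*$", line)
        if header:  # a new device section starts, e.g. "/dev/sda1:"
            current = header.group(1)
            devices[current] = {}
            continue
        field = re.match(r"^\s*(Events|Device Role)\s*:\s*(.+?)\s*$", line)
        if field and current is not None:
            devices[current][field.group(1)] = field.group(2)
    return devices

# Abridged from the --examine dumps quoted in this thread.
sample = """\
/dev/sda1:
         Events : 3796145
   Device Role : spare
/dev/sdb5:
         Events : 3796145
   Device Role : Active device 2
"""

info = examine_fields(sample)
events = {dev: int(f["Events"]) for dev, f in info.items()}
print(events)
print("all counters equal:", len(set(events.values())) == 1)
```

In practice the input would be the saved `--examine` output for every member, so that a device with a lower counter stands out immediately.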


If possible please describe what happened to cause this in the first place.

Also, you'll find these links more directly relevant to your problem:

https://raid.wiki.kernel.org/index.php/RAID_Recovery

Reading my local copy of the manpage (which is slightly outdated; you
should really get the latest stable mdadm release, compile, install
and read the manual to confirm it's still not there), I can't find any
way of bringing an array up in read-only mode without using missing
devices, which is what the permutation script tries to do.
Additionally, without knowing what type of event is being recovered
from (I suspect either simultaneous disconnection of half the drives,
or something you've done since, because it looks like something
happened), I cannot offer concrete advice on how to proceed.

However, there are two main routes open to you at this point: posting
a fresh message asking how to create an array read-only for use in
data recovery, or following some variant of the perl script's steps
that the linked document mentions.
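The RAID_Recovery page linked above describes a Perl script that works through possible device orders. The sketch below is not that script and runs nothing; it only illustrates the size of the search space, assuming the three partitions believed good are placed in a 4-slot array with `missing` filling the remaining slot. Each candidate would still have to be tested by the procedure the wiki describes.

```python
from itertools import permutations

def candidate_orders(members, raid_devices):
    """Yield each distinct slot order of the surviving members padded with 'missing'."""
    padded = list(members) + ["missing"] * (raid_devices - len(members))
    seen = set()
    for order in permutations(padded):
        if order not in seen:  # with 2+ 'missing' entries, permutations repeat
            seen.add(order)
            yield order

# Hypothetical input: the three partitions believed usable, 4-slot array.
orders = list(candidate_orders(["/dev/sdb5", "/dev/sdc5", "/dev/sdd5"], 4))
print(len(orders))  # 24
```

Knowing one member's slot from its superblock (for example "Device Role : Active device 2") shrinks this list considerably, which is why the metadata dumps matter.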
--

From: Anshuman Aggarwal
Date: Sunday, March 28, 2010 - 8:18 am

Michael,
I am running mdadm 3.1.2 (the latest stable, I think), compiled from source (FYI, on Ubuntu Karmic, 2.6.31-20-generic).

Here is what happened: the device /dev/sda1 failed once, but I wondered if it was a freak accident, so I tried adding it back, and it started resyncing. Somewhere in this process the disk /dev/sda1 stalled and the server needed a reboot. After that boot, I got 2 spares (/dev/sda1, /dev/sdd5) and 2 active devices (/dev/sdb5, /dev/sdc5).

Maybe I need to do a build with --assume-clean and the devices in the right order (which I'm positive I can remember). It would be nice if you could please double-check:
mdadm --build -n 4 -l 5 -e1.2 --assume-clean /dev/md127 /dev/sda1 /dev/sdb5 /dev/sdc5 /dev/sdd5

Again, thanks for your time...

John,
I did try what you said without any luck (--assemble --force, but it refuses to accept the spare as a valid device, and 2 active on a 4-member device isn't good enough).
--

From: Anshuman Aggarwal
Date: Sunday, March 28, 2010 - 9:35 am

Some more info:

I did try this command with the following result:

mdadm --build -n 4 -l 5 -e1.2 --assume-clean /dev/md127 /dev/sda1 /dev/sdb5 /dev/sdc5 /dev/sdd5
mdadm: Raid level 5 not permitted with --build.

Should I try this?

Thanks
--

From: Luca Berra
Date: Sunday, March 28, 2010 - 10:32 pm

From your description above, /dev/sda was the failed one, so you should
not add it to the array. Use the word "missing" in its place.

L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
--

From: Michael Evans
Date: Sunday, March 28, 2010 - 11:41 pm

In addition to using missing for the device you know has failed, I
VERY highly suggest running a check, or some other read-only operation,
on the resulting raid device to make sure you can read all of the
data.  Be sure to check the dmesg/system logs to make sure that there
were no noted storage errors.  If there were not, it is /probably/
safe to re-add the previously failed disk and resync it.

While checking that your array data can be read, you should probably
also run the SMART tests via smartctl (or a gui for it) on the
'failed' disk to see if it was a sign of something worse.

In any case, I do NOT recommend using anything within the raid
container other than in read-only mode until the resync is complete.
You may need to use portions of sda that are still good in more
elaborate ways to recover data that is readable there, but not
readable on sdd or other drives.  Read/write mode or even FSCK on the
array contents will only increase the chances of data being out of
sync.
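For the "some read-only operation" Michael suggests, one option is simply to read the whole assembled device and note any offsets that fail. A minimal sketch, assuming Linux and Python 3 (`os.pread` is Unix-only); the 1 MiB chunk size is arbitrary, and on a real system the path would be the assembled md device (for example a hypothetical `/dev/md127`), not the temporary file used for the demonstration. It opens the target `O_RDONLY` and never writes, but it is no substitute for reading dmesg, where the kernel logs the underlying errors.

```python
import os
import tempfile

def scan_readable(path, chunk=1 << 20):
    """Read `path` end to end without writing; return (bytes_read, bad_offsets)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # on Linux this also sizes block devices
        bytes_read, bad_offsets, offset = 0, [], 0
        while offset < size:
            length = min(chunk, size - offset)
            try:
                bytes_read += len(os.pread(fd, length, offset))
            except OSError:
                bad_offsets.append(offset)  # unreadable region: record it, keep going
            offset += length
        return bytes_read, bad_offsets
    finally:
        os.close(fd)

# Demonstration on a throwaway file rather than a real array.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 2500)
    tmp.flush()
    result = scan_readable(tmp.name, chunk=1024)
print(result)  # (2500, [])
```

Any offsets reported for a real device are the regions where, as Michael says, the still-readable parts of the failed disk might have to be used.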
--

From: Anshuman Aggarwal
Date: Tuesday, April 6, 2010 - 11:07 am

I've just had to recreate my raid5 device by using 
mdadm --create --assume-clean -n4 -l5 -e1.2 -c64 

in order to recover my data (because --assemble would not work, even with --force).
The problem:
 * Data Offset in the new array is much larger.
 * Internal Bitmap starts a different number of sectors from the superblock.
 * Array Size is smaller, though the disks are the same.

How can I get these to be the same as they were in the original array?


I have tried to make sure that nothing gets written to the md device except the metadata during create. 
All of these are important because the fs on top of the LVM on top of the md would need all the data it can to fsck properly and I don't want it starting on the wrong offset. 

I am including the output from mdadm --examine from before and after the create


New...

/dev/sdb5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 8588b69c:c0579680:8a63486a:cbcb0e7d
           Name : GATEWAY:511  (local to host GATEWAY)
  Creation Time : Tue Apr  6 01:53:25 2010
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 586097284 (279.47 GiB 300.08 GB)
     Array Size : 1758290688 (838.42 GiB 900.24 GB)
  Used Dev Size : 586096896 (279.47 GiB 300.08 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 13d6a075:c1cad6dc:c13c3d98:e4b980e9

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  6 23:23:07 2010
       Checksum : df3cb34f - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : .AAA ('A' == active, '.' == missing)
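The Data Offset and Array Size differences listed above are linked. With 4 devices, RAID 5 capacity is 3 × Used Dev Size, and Data Offset + Avail Dev Size is identical in the old and new superblocks, so the larger Data Offset (2048 instead of 272 sectors) comes straight out of each member's usable space and the Array Size shrinks with it. The sketch below checks that arithmetic using values copied from the two `--examine` dumps of /dev/sdb5; it assumes the sizes are 512-byte sectors, which the GiB figures in the dumps bear out.

```python
SECTOR_BYTES = 512  # assumption: the sizes in these dumps are 512-byte sectors

def array_sectors(raid_devices, used_dev_size):
    """RAID 5 capacity: each stripe gives one device's worth of space to parity."""
    return (raid_devices - 1) * used_dev_size

def to_gib(sectors):
    return sectors * SECTOR_BYTES / 2**30

# Values copied from the old and new `mdadm --examine` output for /dev/sdb5.
old = {"avail": 586099060, "used": 586098944, "offset": 272}
new = {"avail": 586097284, "used": 586096896, "offset": 2048}

for name, sb in (("old", old), ("new", new)):
    size = array_sectors(4, sb["used"])
    print(name, size, round(to_gib(size), 2), sb["offset"] + sb["avail"])
# old 1758296832 838.42 586099332
# new 1758290688 838.42 586099332
```

The last column is the same for both, so the partitions did not change; only the space reserved in front of the data did.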



--

From: Neil Brown
Date: Tuesday, April 6, 2010 - 3:55 pm

On Tue, 6 Apr 2010 23:37:02 +0530

Use the same version of mdadm as you used to originally create the array.
Probably 2.6.9 from the data, though 3.1.1 seems to create the same layout.
So anything before 3.1.2.

I really should write a "--recreate" for mdadm which uses whatever parameters
it finds already on the devices.
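Neil's "--recreate" idea amounts to reading the geometry back out of the old superblocks instead of relying on the defaults of whatever mdadm is installed. The sketch below is hypothetical and builds the command line without running it; `recreate_argv` and the slot map are illustrations, not an mdadm feature. Note what it cannot do: it does not carry over Data Offset, which is exactly the parameter that changed in this thread, and that is why Neil recommends recreating with the older mdadm instead.

```python
import shlex

def recreate_argv(md_dev, fields, roles):
    """Build (but do not run) an `mdadm --create` command from old superblock fields.

    fields: values copied from `mdadm --examine`;
    roles: slot number -> device path; unlisted slots become "missing".
    Data Offset is NOT reproduced by this sketch.
    """
    n = int(fields["Raid Devices"])
    slots = [roles.get(i, "missing") for i in range(n)]
    return [
        "mdadm", "--create", md_dev, "--assume-clean",
        "--metadata=" + fields["Version"],
        "--level=" + fields["Raid Level"],
        "--raid-devices=" + str(n),
        "--chunk=" + fields["Chunk Size"].rstrip("K"),
        "--layout=" + fields["Layout"],
    ] + slots

# Fields from the original superblock above; the slot map is HYPOTHETICAL.
fields = {"Version": "1.2", "Raid Level": "raid5", "Raid Devices": "4",
          "Chunk Size": "64K", "Layout": "left-symmetric"}
roles = {1: "/dev/sdc5", 2: "/dev/sdb5", 3: "/dev/sdd5"}
argv = recreate_argv("/dev/md127", fields, roles)
print(shlex.join(argv))
```

Printing rather than executing is deliberate: with --assume-clean, a wrong slot order or offset is accepted silently, so the command deserves a human review before anything is run.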


--

From: Berkey B Walker
Date: Tuesday, April 6, 2010 - 5:24 pm

I think that is great thinking, Sir.
--

From: Anshuman Aggarwal
Date: Wednesday, April 7, 2010 - 12:27 am

Since I already tried to recreate the array using 3.1.2 with superblock 1.2, would that have overwritten much other data on the device? Also, is the superblock format documented somewhere, such as a diagram explaining what is stored where?


--

From: Neil Brown
Date: Wednesday, April 7, 2010 - 6:15 am

On Wed, 7 Apr 2010 12:57:20 +0530

I don't think it will have overwritten any data, but I don't have enough info
to be 100% certain.

If you used --assume-clean, and did not write anything to the array, then
only the superblock and bitmap will have been written.

The superblock that you wrote will be the same location as the old
superblock, so writing that will not corrupt data.

The bitmap will have been written 8 sectors from superblock rather than 2,
but it will probably have been a smaller bitmap.
If you report
  mdadm -X /dev/sdb5
I can tell you how big the bitmap is, and so whether it would have extended
into the data which was at 272 sectors from the start of the device.
So the bitmap would have to exceed 266 sectors for it to over-write any data.

The only superblock documentation I know of is in the source code for mdadm
and the kernel.


--

From: John Robinson
Date: Friday, March 26, 2010 - 12:29 pm

On 26/03/2010 16:28, Anshuman Aggarwal wrote:
[...]

You said sda was broken, so forget that. Goodness knows how sdd5 managed 
to end up being a spare. I think you want `mdadm --assemble /dev/md_d127 
--force /dev/sd[bcd]5`. I don't think you can start it read-only but 
with a member missing you're not going to get a resync going so this is 
unlikely to cause data loss. Still, don't do this if you don't believe 
it's the correct answer, and certainly don't blame me if it wastes your 
data. Good luck!

Cheers,

John.
--
