Re: mdadm: failed devices become spares!

Previous thread: mdadm: failed devices become spares! by Pierre Vignéras on Sunday, May 16, 2010 - 8:40 am. (1 message)

From: Leslie Rhorer
Date: Sunday, May 16, 2010 - 12:56 pm

It's not quite clear to me from the link whether your drives are
truly toast, or not.  If they are, then you are hosed.  Assuming not, then
you need to use 

`mdadm --examine /dev/sdxx` and `mdadm -Dt /dev/mdyy`

	to determine precisely all the parameters and the order of the block
devices in the array.  You need the chunk size, the superblock type, which
slot was occupied by each device in the array (this may not be the same as
when the array was created), the size of the array (if it did not fill the
entire partition in every case), the RAID level, etc.  Once you are certain
you have all the information needed to re-create the array, if need be,
then try to re-assemble the array with

`mdadm --assemble --force /dev/mdyy`

	If it works, then fsck the file system.  (I think I noticed you are
using XFS.  If so, do not use xfs_check.  Instead, use xfs_repair with the
-n option, which only reports problems and modifies nothing.)  After you
have a clean file system, issue the command

`echo repair > /sys/block/mdyy/md/sync_action`

	to re-sync the array.  If the array does not assemble, then you will
need to stop it and re-create it using the parameters you obtained from your
research above, adding the --assume-clean switch so that no resync is
started if one of them is wrong.  If the fsck won't work after re-creating
the array, then you probably got one or more of the parameters incorrect.
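
A minimal sketch of the information-gathering step described above, assuming the output of `mdadm --examine` has already been saved to a text file and uses the 0.90-metadata layout quoted later in this thread. The helper name examine_summary and the file name examine.txt are hypothetical; the function only reads text and never touches the drives.

```shell
# Summarise, per component device, the fields needed to re-create an array.
# Reads saved `mdadm --examine` output (0.90 metadata layout) on stdin.
# examine_summary is a hypothetical helper name, not part of mdadm.
examine_summary() {
    awk -F' : ' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        /^\/dev\//     { dev = $0; sub(/:.*$/, "", dev) }  # "/dev/sdc1:" header
        /Raid Level/   { level = trim($2) }
        /Raid Devices/ { ndev  = trim($2) }
        /Chunk Size/   { chunk = trim($2) }
        /^this/        { split($0, f, " ")                 # f[5] = RaidDevice slot
                         printf "%s level=%s devices=%s chunk=%s slot=%s\n",
                                dev, level, ndev, chunk, f[5] }
    '
}

# Usage (assumes the output was saved beforehand, e.g. to examine.txt):
#   examine_summary < examine.txt
```

Keeping such a summary next to the raw output makes it easier to compare slots across devices before deciding on a device order.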

--

From: Pierre Vignéras
Date: Monday, May 17, 2010 - 11:10 am

Thanks for your help. Here is what I did:

 
# cat /proc/mdstat          
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4] 
[...]
md2 : inactive sdc1[2](S) sdd1[5](S) sdf1[4](S) sde1[3](S)
      1250274304 blocks                                   
[...]                                                          
                              
# mdadm --examine /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sde1
/dev/sdc1:                                             
          Magic : a92b4efc                             
        Version : 00.90.00                             
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009                                  
     Raid Level : raid10                                                    
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)                          
     Array Size : 625137152 (596.18 GiB 640.14 GB)                          
   Raid Devices : 4                                                         
  Total Devices : 4                                                         
Preferred Minor : 2                                                         

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean                   
Internal Bitmap : present                 
 Active Devices : 2                       
Working Devices : 4                       
 Failed Devices : 0                       
  Spare Devices : 2                       
       Checksum : 5baf7939 - correct      
         Events : 90612                   

         Layout : near=2, far=1
     Chunk Size : 64K          

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   ...
From: Tim Small
Date: Monday, May 17, 2010 - 2:09 pm

If you want to experiment with different ways of getting the data back,
but without risking writing anything to the drives, you could do this:

1. Use dmsetup to create copy-on-write "virtual drives" which
"see-through" to the content of your real drives, but don't risk writing
anything at all to them.

2. Use mdadm --create --assume-clean ...blahblah...
/dev/mapper/cow_drive_1  .....

to force mdadm to put the array back together the way you think it was
(the output of examine will be useful here).  You'll need to specify (at
least - from memory):

. chunk size (what mdadm sets with --chunk; the stripe unit)
. metadata version (this affects metadata location on the drives)
. correct device order (with or without a single failed drive)


... after that you can run a read-only (or read-write) check on the COW
md partition to verify that you've got your data back, then mount it
read-only etc.  Once you're happy that your commands are going to get
things running again, you can run them "for real" on the non-COW devices.
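
A rough sketch of step 1, assuming a non-persistent device-mapper snapshot whose copy-on-write store is a loop device backed by a sparse file. The helper name snapshot_table, the file /tmp/cow_1.img, the loop device and the sector count are illustrative assumptions; only cow_drive_1 comes from the text above. The function merely prints the table line, and the privileged commands are left as comments.

```shell
# Print a device-mapper table line for a non-persistent snapshot:
#   <start> <length> snapshot <origin> <COW device> <P|N> <chunk size in sectors>
# snapshot_table is a hypothetical helper name.
snapshot_table() {
    # $1 = origin size in 512-byte sectors, $2 = origin device, $3 = COW store
    printf '0 %s snapshot %s %s N 8\n' "$1" "$2" "$3"
}

# Hypothetical usage as root (commented out: it needs real devices):
#   truncate -s 2G /tmp/cow_1.img
#   cow=$(losetup -f --show /tmp/cow_1.img)
#   size=$(blockdev --getsz /dev/sdc1)
#   dmsetup create cow_drive_1 --table "$(snapshot_table "$size" /dev/sdc1 "$cow")"

# Illustrative values only:
snapshot_table 625137152 /dev/sdc1 /dev/loop0
```

Writes made through /dev/mapper/cow_drive_1 would then land in the COW store, and removing the mapping discards them.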

See the recent list archives for my post on using a similar set of
commands for HW RAID data forensics, along with references....

HTH,

Tim.
--

From: Neil Brown
Date: Monday, May 17, 2010 - 6:30 pm

On Mon, 17 May 2010 20:10:36 +0200

Something strange...
I cannot explain the 'SpareActive' messages.
Most of the rest makes sense.

You had a RAID10 - 4 drives in near=2 mode.  So the first two disks contain
identical data, and the second two are also identical and contain the rest.
The second device failed due to a write error.
Why it seemed to become a spare I'm not sure.  I'm not at all sure it did
become a spare immediately; your logs aren't conclusive on that point.
It did eventually become a spare, but that could be because you "removed and
added the devices", which would have changed them from 'failed' to 'spare'.

Then the first device in the array reported an error and so was failed.
After this you would not be able to read or write to the even chunks of the
array; xfs noticed and complained.
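
To illustrate why losing the first two devices takes out the even chunks, here is a small sketch of the near=2 mapping, assuming copy c of chunk k lives on slot (2k + c) mod n. The helper name near2_slots is hypothetical, and the sketch ignores offsets within each device.

```shell
# Print the RAID-device slots holding the two copies of a given chunk
# in an md RAID10 array with layout near=2.
# near2_slots is a hypothetical helper; $1 = chunk number, $2 = raid devices.
near2_slots() {
    first=$(( ($1 * 2) % $2 ))
    second=$(( ($1 * 2 + 1) % $2 ))
    printf '%s %s\n' "$first" "$second"
}

# Four devices, as in this array: even chunks map to slots 0 and 1,
# odd chunks to slots 2 and 3.
for chunk in 0 1 2 3; do
    printf 'chunk %s -> slots %s\n' "$chunk" "$(near2_slots "$chunk" 4)"
done
```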

By this time sdf1 seemed to be a spare so it gave recovery a try.  The
recovery process discovered there was nowhere to read good data from and
immediately gave up.

However, if the devices really are OK, then sdf1 and sdc1 should contain
identical data (except that the superblocks would be slightly different).
You could check this with "cmp -l", though that might not be very efficient.
Also sdd1 and sde1 should be identical.
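
One possible alternative to "cmp -l", sketched under the assumption that only a yes/no answer is wanted: checksum the same leading byte range of both devices. With 0.90 metadata the superblock sits within the last 128 KiB of each component, so stopping short of that region should skip the expected differences. The helper name same_prefix is hypothetical, and a matching CRC is strong evidence rather than proof.

```shell
# Succeed if the first $3 bytes of $1 and $2 have the same checksum.
# same_prefix is a hypothetical helper; it only reads from its arguments.
same_prefix() {
    [ "$(head -c "$3" "$1" | cksum)" = "$(head -c "$3" "$2" | cksum)" ]
}

# Hypothetical usage as root (commented out: it reads whole devices):
#   bytes=$(( $(blockdev --getsize64 /dev/sdc1) - 131072 ))
#   same_prefix /dev/sdc1 /dev/sdf1 "$bytes" && echo "data areas match"
```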

I suggest that you try:

 mdadm -S /dev/md2
 mdadm -C /dev/md2 -l 10 -n 4 -c 64 -e 0.90 /dev/sdc1 missing /dev/sdd1 missing  --assume-clean

and then see what the data on md2 looks like.
You could equally try sdf1 in place of sdc1, or sde1 in place of sdd1
(make sure you double check the device names, don't assume I got them right).

Once you have a combination that looks good, you can add the other two devices
and they will recover and you should have your data back.

BUT be warned.  Something caused some errors to be reported.  Unless you find
out what that was and fix it, errors will occur again.  I have no idea what
might have caused those errors.  Bad media? Bad controller? Bad USB
controller? Bad luck?

I wouldn't write new data, or even perform a ...
From: Neil Brown
Date: Monday, May 17, 2010 - 7:06 pm

On Tue, 18 May 2010 11:30:16 +1000

Actually I can explain that I think.

When a device fails it gets marked as faulty, then as soon as there is no
more pending IO it gets moved out of the array.  "mdadm -D" will show it with
a larger 'Number' and a 'RaidDevice' of '-'.
Normally these happen almost as a single operation, though a lot of pending
IO can slow it down.

"mdadm --monitor" identifies devices based on 'Number', so it would normally
see a working device disappear, which is reported as a failure, and a
'faulty/spare' device appear, which it ignores.

However, if --monitor gets to check the array between the above two events, it
will first see that the working drive is now faulty, so it reports a failure,
and then see that the faulty device isn't faulty any more and in fact isn't
even there.  The "isn't even there" bit doesn't register and it treats it as
'SpareActive'.

I should fix that.

So I'm quite sure now that your devices didn't really become spares until you
removed and added them, which is exactly the way to turn failed devices
into spares.

NeilBrown

--

From: MRK
Date: Tuesday, May 18, 2010 - 3:25 pm

However in one case the two events are not detected in the same round:

Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2,
component device /dev/sdf1
Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md device
/dev/md2, component device /dev/sdf1


1 minute passes between the two entries. I suppose that's the mdadm 
daemon polling time.

In the other case all the entries are at the same time

Apr 13 08:00:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2,
component device /dev/sdd1
Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device
/dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos last message repeated 7 times
[...many times that messages..]


...plus, in this second case the SpareActive triggers a lot of times
within that same second (Pierre, you cut it short: are all of the "many
times that messages" entries at the exact same time, or do they span a few
seconds?)

It looks to me like some kind of usb failure where the USB connection or 
USB bridge momentarily fails then immediately gets re-detected and 
re-added to the system. But since there are no usb entries in dmesg, 
that would also be an issue of the usb driver. Could the problem also be 
a mixture with some unwise udev triggers of Debian, maybe somehow 
causing the auto-re-add of the drive to the RAID?

Pierre:
- can you post your mdadm.conf?
- USB is not good for RAID imho. Many times in my life I have seen problems
with USB/SATA bridges where the drive would get disconnected under high I/O
activity and then reconnected after a few seconds. Anyway, re-adding it
to the RAID shouldn't have happened. Also, in my case there were "usb"
entries in dmesg.
--

From: Simon Matthews
Date: Wednesday, May 19, 2010 - 12:56 pm

I can second that. At one time I had a USB backup drive that was
configured as half of a RAID 1 set. This was so that the drive could
immediately be used in the event of a massive failure of the file
server.

Pulling this USB drive before stopping the RAID device caused the
machine to become unresponsive. I think it was trying to do some kind
of I/O; all I know is that a hard boot was the only way I could get
the machine out of that condition.

Simon
--

From: Pierre Vignéras
Date: Friday, May 21, 2010 - 2:00 pm

Well, I was probably tired when I tried to filter the log for the bug report.
It seems that this 'last message repeated 7 times' is for the:

Apr 13 08:00:02 phobos kernel: [5814019.208017] nfsd: non-standard errno: 5

not for the:

Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device 
/dev/md2, component device /dev/sdd1

I looked into my log and can't find anything else. Sorry, sorry, sorry if

Sure, but I am not sure it will be useful:

$ cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=13f4fdef:db0bd815:77e02d4f:1bda00b4
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=4a120782:2ed3053c:e99784b3:b8e5f7bf
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=b3c7212a:e95c5081:24bf28c1:396de87f
ARRAY /dev/md2 level=raid10 num-devices=4 UUID=b34f4192:f823df58:24bf28c1:396de87f
ARRAY /dev/md3 level=raid5 num-devices=3 

Well, that is what I discovered: USB and RAID do not currently work well
together (hmm, on Debian stable I am not sure we can say 'currently'; the
kernel is:

$ uname -a
Linux phobos 2.6.26-2-686 #1 SMP Tue Mar 9 17:35:51 UTC 2010 i686 GNU/Linux
$

).

Anyway, it would be a great feature if USB could be used for a RAID setup, at
least for end users (actually, my setup uses a "special" layout for running
RAID on several heterogeneous drives, which I described here:

http://www.linuxconfig.org/prouhd-raid-for-the-end-user

)

Thanks for your help and regards.
-- ...
From: Pierre Vignéras
Date: Friday, May 21, 2010 - 2:27 pm

Sorry for the delay in replying...

This is a short mail to let you know that my RAID array is currently
recovering, thanks to the valuable input from the users of this mailing
list. You are great!

For the curious, what I did is the following:

# ##### Do not forget the '--assume-clean' as I almost did! ;-(
# mdadm -C /dev/md2 -l 10 -n 4 -c 64 -e 0.90 --assume-clean /dev/sdd1 missing /dev/sdc1 missing
# vgchange -a y                                                                                               
# xfs_repair -n -t 1 -v /dev/my-vg/my-lv
# mount -o ro /dev/my-vg/my-lv /mnt/tmp
# find /mnt/tmp
# du -ks /mnt/tmp/
# umount /mnt/tmp
# #### Required: XFS asked the log to get replayed
# mount /dev/my-vg/my-lv /mnt/tmp/
# umount /mnt/tmp
# xfs_repair  -t 1 -v /dev/my-vg/my-lv
# mdadm --manage /dev/md2 --add /dev/sde1
# mdadm --manage /dev/md2 --add /dev/sdf1

The array is currently at 25 % of the recovery process. A bit too soon to say
that everything is fine... By the way, I am quite sure now that my USB
controllers (or the USB driver, or whatever else in the chain apart from the
disks themselves) are buggy: all the other RAIDs of my setup are gone!
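
For following the recovery progress mentioned above, a small sketch that pulls the percentage out of /proc/mdstat-style text. It assumes the progress line has the usual "recovery = NN.N%" form; the helper name recovery_percent is hypothetical.

```shell
# Extract the recovery percentage from /proc/mdstat-style text on stdin.
# recovery_percent is a hypothetical helper; it prints nothing when no
# recovery is in progress.
recovery_percent() {
    sed -n 's/.*recovery *= *\([0-9.]*\)%.*/\1/p'
}

# Usage on a machine with md arrays:
#   recovery_percent < /proc/mdstat
```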

I will try to recover them using the same kind of process, so that I can back
up all the data. Do you think that using BBR would "solve" the problem (or at
least postpone it until BBR itself runs out of free sectors), since each time
the trouble started with a sector (write?) error?

Anyway, again, thanks a lot to all of you. 
Open Source rocks! ;-)
-- 
Pierre Vignéras
--

From: Pierre Vignéras
Date: Tuesday, May 18, 2010 - 4:07 pm

Well, actually, here is what I have:

phobos:~# mdadm --examine /dev/sd[c-f]1
/dev/sdc1:                             
          Magic : a92b4efc             
        Version : 00.90.00             
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009                                  
     Raid Level : raid10                                                    
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)                          
     Array Size : 625137152 (596.18 GiB 640.14 GB)                          
   Raid Devices : 4                                                         
  Total Devices : 4                                                         
Preferred Minor : 2                                                         

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean                   
Internal Bitmap : present                 
 Active Devices : 2                       
Working Devices : 4                       
 Failed Devices : 0                       
  Spare Devices : 2                       
       Checksum : 5baf7939 - correct      
         Events : 90612                   

         Layout : near=2, far=1
     Chunk Size : 64K          

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1      
   5     5       8       49        5      spare   /dev/sdd1      
/dev/sdd1:                                                       
          Magic : a92b4efc                                       
        Version : 00.90.00                                       
           UUID : ...
From: Neil Brown
Date: Tuesday, May 18, 2010 - 6:45 pm

On Wed, 19 May 2010 01:07:40 +0200

The fact that sdc1 appears to have the same content as sde1 perfectly matches
the fact that these two devices think they are devices "2" and "3" in the
array, so they still contain half of your data.  This is good.

The fact that sdf1 appears to match sdd1 partly but not completely suggests
that they were devices "0" and "1", but that one of them has had other stuff
written to it.


The way to find out is to try and see.
If you create an array following the above pattern it will not change any
data on the devices, just the superblock, which you have a record of in this
email now.
So you should try creating an array, run "fsck -n" and see if the filesystem
looks OK.  If it does, mount ( -o ro ) and see what it looks like.

Then try the other possibility and see how that compares.
Given the current names of devices, the list given to the mdadm command
should be:

   /dev/sdd1 missing /dev/sdc1 missing
or
   /dev/sdf1 missing /dev/sdc1 missing

Hopefully one of those will mount and fsck successfully.
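
The two candidate orders above can be written out as a dry run before anything is executed. This sketch only prints the commands; the remaining options are copied from the create command suggested earlier in the thread, and the helper name print_candidates is hypothetical.

```shell
# Dry run: print, rather than execute, one create command per candidate
# for slot 0. Slots 1 and 3 stay "missing"; slot 2 is /dev/sdc1.
# print_candidates is a hypothetical helper name.
print_candidates() {
    for first in /dev/sdd1 /dev/sdf1; do
        printf 'mdadm -C /dev/md2 -l 10 -n 4 -c 64 -e 0.90 --assume-clean %s missing /dev/sdc1 missing\n' "$first"
    done
}

print_candidates
```

Each printed command would be followed by a read-only check ("fsck -n", then a mount with -o ro), with "mdadm -S /dev/md2" run before trying the next candidate.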


--
