Hello Linux-Raid list,
------------------------
My problem in a nutshell:
------------------------
I am unable to mount a RAID-0 (EXT3?) filesystem which I previously
assembled with mdadm under Ubuntu 9.10 (32-bit). This RAID-0 array was
originally created by my Thecus N4100 NAS.
I am getting the following console message:
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
See test [T007] below for detailed messages.
In other words, I cannot recover my stored data.
Can you help? It's crucial to me.
Thanks in advance,
David
------------------------
My story in (very) short:
------------------------
I own a Thecus N4100 NAS -- works perfectly -- with 4 x 400GB disks
running as a RAID-0 array. There was no physical disk error and the RAID
was perfectly healthy.
In Feb 2010, I had to remove the 4 disks from the NAS rack in
order to remount the RAID under a regular Linux box. I placed the 4
disks in USB cases, labelled the cases (1,2,3,4) according to the
disks' original order in the NAS rack (see Figure 1 below), and tried to
rebuild the RAID-0 array by means of mdadm under Ubuntu 9.10, using
the disk connections depicted in Figure 2.
After several trial and error manipulations (not only, but in
particular, to regenerate the RAID superblocks), I was able to
re-create the RAID-0 array but... I am unable to mount the RAID file
system in the end.
Panic: I attempted to put the disks back into the NAS to check
whether the RAID was still "alive" in the NAS. The NAS rebooted for
about 10 minutes (what did it do? I do not know), then reported that
my RAID configuration was gone. I wrote nothing to the NAS, I properly
shut the NAS down, and put the disks back into their USB cases in order
to resolve this RAID issue (if resolvable?) with mdadm.
I performed several tests (reported below) which formally describe the
situation I am facing.
I do need your help to understand what is wrong and whether (and how)
to solve this issue.
NOTE: I am currently working with 4 disk images of my physical disks
as depicted in Figure 3 below, so I can safely perform any destructive
tests and manipulations you suggest, with no risk to the original
disks. I (only...) need 15 hours to recreate a disk image set.
-----------------
My main questions:
-----------------
1) Test [T006]
***********
(1.1) Is there something wrong with my RAID superblocks, and if so, what?
mdadm agrees to assemble the disks and reports a healthy RAID, BUT the
mdadm -E details for each disk reference, without exception, a fixed
/dev/sdc2 partition in the array, which is in no way involved in the array
(although there is an existing /dev/sdc2 partition on my /dev/sdc
disk).
(1.2) Can this be a source of the problems?
(1.3) Can this apparent RAID superblock inconsistency be due to some
deeper EXT2/3 filesystem issue or corruption mentioned in question 2
below?
(1.4) How can I practically perform a deep RAID superblock check on
each disk, other than mdadm -E /dev/diskN, that would explain such an
inconsistency?
2) Test [T007]: EXT2/3 file system issues
***************************************
(2.1) Do I FIRST need to resolve the EXT2/3 filesystem issues reported
when mounting the RAID filesystem (that is, assuming the EXT2/3 issues
cause the RAID issue)?
(2.2) Or conversely: are the EXT2/3 issues the consequence of a RAID issue?
(2.3) At which level do I need to work in order to sort this issue out?
(2.4) Any advice on working methodology is welcome...
3) Do I need to recover my RAID partition in order to mount it, or is
there some RAID-related manipulation or configuration that I missed with
mdadm and that prevents me from mounting it?
4) Test [T003]
***********
The tests performed with Palimpsest report a 201 MB "unknown" partition:
(4.1) Where does this disk zone come from?
(4.2) Was it accidentally created by the NAS when the disks were
put back into its rack? (I sent a ticket to Thecus support about
this point; no answer yet.)
(4.3) Is this 201MB unknown zone a feature common to all
RAID-0 arrays? If yes, what is this zone supposed to contain?
(4.4) If yes to question (4.2): does this mean my RAID-0 data are
definitively lost because the N4100 implicitly deleted a part of the
RAID partition?
5) Tests [T010] to [T012]
**********************
Tests executed with TestDisk report a Linux partition that seems to be
living beside the RAID component partition, and could be a "lost"
partition buried in the 201MB unknown zone.
(5.1) Is this partition a feature of RAID-0 arrays?
(5.2) Is this an inconsistency caused by misusing mdadm? Or by the
NAS when the disks were put back into their rack?
(5.3) How can I resolve that?
6) Any advice, tests to perform, manipulations, etc. are welcome.
----------------------------------
THECUS N4100 initial configuration:
----------------------------------
- Firmware : 1.3.06 (SSH plugin installed)
- RAID Level : RAID-0
- Disks : 4 x Seagate Barracuda ST3400832AS 400 GB
- Total RAID capacity : 1.6 TB
- Used space : around 75%
- Seagate ST3400832AS features (from manufacturer):
* Total capacity : 400 GB
* Usable capacity : 372.6 GB
* Cylinders : 16383
* Heads : 16
* Sectors : 63
- Figure 1: Original disk ordering in the NAS rack:
********
+------------+
* Top disk : | Disk 1 |
+------------+
* Next disk : | Disk 2 |
+------------+
* Third disk : | Disk 3 |
+------------+
* Bottom disk : | Disk 4 |
+------------+
- Figure 2: Disks connections in the Linux box:
********
Thecus USB Linux Disk
N4100 devices devices partitions
+--------+
| | --> 201MB (Unknown)
| DISK 1 | --> USB Disk 1 --> /dev/sdf |
| | --> 372.4GB (RAID compon.1)
+--------+
+--------+
| | --> 201MB (Unknown)
| DISK 2 | --> USB Disk 2 --> /dev/sdg |
| | --> 372.4GB (RAID compon.2)
+--------+
+--------+
| | --> 201MB (Unknown)
| DISK 3 | --> USB Disk 3 --> /dev/sdh |
| | --> 372.4GB (RAID compon.3)
+--------+
+--------+
| | --> 201MB (Unknown)
| DISK 4 | --> USB Disk 4 --> /dev/sdi |
| | --> 372.4GB (RAID compon.4)
+--------+
- Figure 3: Corresponding disk images situation:
********
Thecus Disk Loop Mapped disk
N4100 images devices partitions
+--------+ --> 201MB (Unknown)
| | | /dev/mapper/loop0p1
| DISK 1 | --> disk0.hd --> /dev/loop0 |
| | --> 372.4GB (RAID compo.1)
+--------+ /dev/mapper/loop0p2
+--------+ --> 201MB (Unknown)
| | | /dev/mapper/loop1p1
| DISK 2 | --> disk1.hd --> /dev/loop1 |
| | --> 372.4GB (RAID compo.2)
+--------+ /dev/mapper/loop1p2
+--------+ --> 201MB (Unknown)
| | | /dev/mapper/loop2p1
| DISK 3 | --> disk2.hd --> /dev/loop2 |
| | --> 372.4GB (RAID compo.3)
+--------+ /dev/mapper/loop2p2
+--------+ --> 201MB (Unknown)
| | | /dev/mapper/loop3p1
| DISK 4 | --> disk3.hd --> /dev/loop3 |
| | --> 372.4GB (RAID compo.4)
+--------+ /dev/mapper/loop3p2
-------------------------------------------------------
Performed tests & manipulation descriptions and results:
-------------------------------------------------------
PART 1: TESTS USING mdadm
*************************
------
[T001] Connecting USB disk 1, disk 2, disk 3, disk 4 and gathering information.
------
The purpose of this test suite is to verify the response of the system
when connecting each physical RAID disk as a USB device.
* ACTION *
I am connecting disk 1 as USB device /dev/sdf to my Ubuntu system:
* messages.log *
May 15 14:42:19 obelix kernel: [176690.908772] usb 1-7.3.3: new high
speed USB device using ehci_hcd and address 11
May 15 14:42:19 obelix kernel: [176691.002540] usb 1-7.3.3:
configuration #1 chosen from 1 choice
May 15 14:42:19 obelix kernel: [176691.011777] scsi10 : SCSI emulation
for USB Mass Storage devices
May 15 14:42:24 obelix kernel: [176696.059395] scsi 10:0:0:0:
Direct-Access ST340083 2AS PQ: 0 ANSI: 2 CCS
May 15 14:42:24 obelix kernel: [176696.060124] sd 10:0:0:0: Attached
scsi generic sg8 type 0
May 15 14:42:24 obelix kernel: [176696.071314] sd 10:0:0:0: [sdf]
781422768 512-byte logical blocks: (400 GB/372 GiB)
May 15 14:42:24 obelix kernel: [176696.075622] sd 10:0:0:0: [sdf]
Write Protect is off
May 15 14:42:24 obelix kernel: [176696.078622] sdf: sdf1 sdf2
May 15 14:42:24 obelix kernel: [176696.116632] sd 10:0:0:0: [sdf]
Attached SCSI disk
* ACTION *
I am connecting disk 2 as USB device /dev/sdg to my Ubuntu system:
* messages.log *
May 15 14:52:11 obelix kernel: [177282.841023] usb 1-7.3.1: new high
speed USB device using ehci_hcd and address 12
May 15 14:52:11 obelix kernel: [177282.936281] usb 1-7.3.1:
configuration #1 chosen from 1 choice
May 15 14:52:11 obelix kernel: [177282.955419] scsi11 : SCSI emulation
for USB Mass Storage devices
May 15 14:52:16 obelix kernel: [177287.961386] scsi 11:0:0:0:
Direct-Access ST340083 2AS PQ: 0 ANSI: 2
May 15 14:52:16 obelix kernel: [177287.962147] sd 11:0:0:0: Attached
scsi generic sg9 type 0
May 15 14:52:16 obelix kernel: [177287.969607] sd 11:0:0:0: [sdg]
781422768 512-byte logical blocks: (400 GB/372 GiB)
May 15 14:52:16 obelix kernel: [177287.975128] sd 11:0:0:0: [sdg]
Write Protect is off
May 15 14:52:16 obelix kernel: [177287.980862] sdg: sdg1 sdg2
May 15 14:52:16 obelix kernel: [177288.011894] sd 11:0:0:0: [sdg]
Attached SCSI disk
* ACTION *
I am connecting disk 3 as USB device /dev/sdh to my Ubuntu system:
* messages.log *
May 15 14:59:33 obelix kernel: [177724.441158] usb 1-7.2: new high
speed USB device using ehci_hcd and address 14
May 15 14:59:33 obelix kernel: [177724.536461] usb 1-7.2:
configuration #1 chosen from 1 choice
May 15 14:59:33 obelix kernel: [177724.543552] scsi13 : SCSI emulation
for USB Mass Storage devices
May 15 14:59:38 obelix kernel: [177729.545857] scsi 13:0:0:0:
Direct-Access ST340083 2AS PQ: 0 ANSI: 2
May 15 14:59:38 obelix kernel: [177729.546667] sd 13:0:0:0: Attached
scsi generic sg10 type 0
May 15 14:59:38 obelix kernel: [177729.552659] sd 13:0:0:0: [sdh]
781422768 512-byte logical blocks: (400 GB/372 GiB)
May 15 14:59:38 obelix kernel: [177729.556128] sd 13:0:0:0: [sdh]
Write Protect is off
May 15 14:59:38 obelix kernel: [177729.561068] sdh: sdh1 sdh2
May 15 14:59:38 obelix kernel: [177729.590054] sd 13:0:0:0: [sdh]
Attached SCSI disk
* ACTION *
I am connecting disk 4 as USB device /dev/sdi to my Ubuntu system:
* messages.log *
May 15 15:00:14 obelix kernel: [177765.658207] usb 1-7.3.4: new high
speed USB device using ehci_hcd and address 15
May 15 15:00:14 obelix kernel: [177765.752468] usb 1-7.3.4:
configuration #1 chosen from 1 choice
May 15 15:00:14 obelix kernel: [177765.773190] scsi14 : SCSI emulation
for USB Mass Storage devices
May 15 15:00:19 obelix kernel: [177770.777746] scsi 14:0:0:0:
Direct-Access ST340083 2AS PQ: 0 ANSI: 2
May 15 15:00:19 obelix kernel: [177770.778639] sd 14:0:0:0: Attached
scsi generic sg11 type 0
May 15 15:00:19 obelix kernel: [177770.789192] sd 14:0:0:0: [sdi]
781422768 512-byte logical blocks: (400 GB/372 GiB)
May 15 15:00:19 obelix kernel: [177770.796334] sd 14:0:0:0: [sdi]
Write Protect is off
May 15 15:00:19 obelix kernel: [177770.805059] sdi: sdi1 sdi2
May 15 15:00:19 obelix kernel: [177770.837077] sd 14:0:0:0: [sdi]
Attached SCSI disk
* ACTION *
I am now collecting summary information about the USB RAID disks
connected to my Ubuntu box:
$ sudo blkid
* CONSOLE-OUT *
(... other devices ...)
/dev/sdf2: UUID="ecfe8404-2f35-4a45-d668-56da8e136666" TYPE="linux_raid_member"
/dev/sdg2: UUID="ecfe8404-2f35-4a45-d668-56da8e136666" TYPE="linux_raid_member"
/dev/sdh2: UUID="ecfe8404-2f35-4a45-d668-56da8e136666" TYPE="linux_raid_member"
/dev/sdi2: UUID="ecfe8404-2f35-4a45-d668-56da8e136666" TYPE="linux_raid_member"
* QUESTION *
The 4 RAID disk partitions are detected as "Linux Raid Members" and
share the same UUID, which should be normal since they belong to the
same RAID array. Is this right?
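As a small cross-check of this point, the UUID printed by blkid here and the one printed by mdadm in [T006] can be compared once the separators are removed. This is only a sketch in Python; the helper name normalize_uuid is made up for the illustration, and both strings are copied from the outputs in this report.

```python
# Compare the array UUID as printed by blkid and by mdadm.
# Only the punctuation differs between the two tools.

def normalize_uuid(text: str) -> str:
    """Drop separators and keep the lowercase hex digits only."""
    return "".join(ch for ch in text.lower() if ch in "0123456789abcdef")

blkid_uuid = "ecfe8404-2f35-4a45-d668-56da8e136666"  # from blkid, [T001]
mdadm_uuid = "ecfe8404:2f354a45:d66856da:8e136666"   # from mdadm, [T006]

same = normalize_uuid(blkid_uuid) == normalize_uuid(mdadm_uuid)
print("same array UUID:", same)
```

Members of one md array are expected to carry the same array UUID, so identical values on the four partitions are what one would hope to see.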
------
[T002] Disks 1, 2, 3 and 4 geometry and partitions using fdisk -l
------
The purpose of this test suite is to report the physical geometry and
partitioning information returned by fdisk -l for each physical RAID
disk.
* ACTION *
I am examining the geometry and partitioning of each physical disk as
reported by fdisk.
$ sudo fdisk -l /dev/sdf
* CONSOLE-OUT *
Disk /dev/sdf: 400.1 GB, 400088457216 bytes
16 heads, 63 sectors/track, 775221 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier : 0x00000000
Device Boot Start End Blocks Id System
/dev/sdf1 1 389 196024+ 83 Linux
/dev/sdf2 390 775221 390515328 fd Linux raid autodetect
$ sudo fdisk -l /dev/sdg
* CONSOLE-OUT *
Disk /dev/sdg: 400.1 GB, 400088457216 bytes
16 heads, 63 sectors/track, 775221 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier : 0x00000000
Device Boot Start End Blocks Id System
/dev/sdg1 1 389 196024+ 83 Linux
/dev/sdg2 390 775221 390515328 fd Linux raid autodetect
$ sudo fdisk -l /dev/sdh
* CONSOLE-OUT *
Disk /dev/sdh: 400.1 GB, 400088457216 bytes
16 heads, 63 sectors/track, 775221 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier : 0x00000000
Device Boot Start End Blocks Id System
/dev/sdh1 1 389 196024+ 83 Linux
/dev/sdh2 390 775221 390515328 fd Linux raid autodetect
$ sudo fdisk -l /dev/sdi
* CONSOLE-OUT *
Disk /dev/sdi: 400.1 GB, 400088457216 bytes
16 heads, 63 sectors/track, 775221 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier : 0x00000000
Device Boot Start End Blocks Id System
/dev/sdi1 1 389 196024+ 83 Linux
/dev/sdi2 390 775221 390515328 fd Linux raid autodetect
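The fdisk figures above can be re-derived from the reported geometry, which helps relate them to the sizes shown by other tools. The sketch below is plain arithmetic in Python; the one assumption (not stated by fdisk) is that the first partition starts one track, i.e. 63 sectors, into the disk, which is the traditional DOS layout.

```python
SECTOR = 512                  # bytes per sector, from the kernel log
SECTORS_PER_CYL = 16 * 63     # heads * sectors/track = 1008, from fdisk

def cylinders_to_sectors(first_cyl: int, last_cyl: int) -> int:
    """Number of sectors covered by an inclusive cylinder range."""
    return (last_cyl - first_cyl + 1) * SECTORS_PER_CYL

# Partition 1: cylinders 1..389, minus the first track (assumption).
p1_sectors = cylinders_to_sectors(1, 389) - 63
# Partition 2: cylinders 390..775221.
p2_sectors = cylinders_to_sectors(390, 775221)

print("p1:", p1_sectors, "sectors,", p1_sectors * SECTOR, "bytes")
print("p2:", p2_sectors, "sectors,", p2_sectors // 2, "1K-blocks")
```

Under that assumption, partition 1 comes out at 392049 sectors (196024.5 1K-blocks, which fdisk prints as 196024+) and partition 2 at 390515328 1K-blocks, matching the table.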
------
[T003] Using palimpsest to view disks partition structures.
------
* ACTION *
For this diagnostic, I am using graphical disk manager application
"palimpsest" under Gnome to visualize the 4 USB disk devices /dev/sdf,
/dev/sdg, /dev/sdh, /dev/sdi in order to confirm the results I got
from previous fdisk -l commands.
* RESULTS *
- See attached images 01 to 05.
- The 4 USB disks are correctly displayed in the disk tree (image01.png)
- For each USB disk, there is an unknown or unused 201MB partition
(image02.png)
- Each Seagate disk contains a second partition labelled "Linux Raid
Member" (image03.png)
- The 4 disks are detected as a coherent RAID drive (image04.png)
- The assembled filesystem is reported "mountable" by Palimpsest (image05.png)
* COMMENTS *
- On image 04, one notices that only the second partition (/dev/sdf2,
/dev/sdg2, /dev/sdh2, /dev/sdi2) typed as "linux raid member"
partition of each disk is used for assembling the final RAID drive.
- The assembled filesystem is reported to be an ext2 filesystem
- I am unable to mount the RAID filesystem by using Palimpsest.
------
[T004] Using mdadm to assemble the full disks as one single RAID-0 device.
------
* ACTION *
I am using the standard RAID management tool mdadm to assemble the 4 USB
physical disks as one single RAID device. I am using switch -A (not
switch --create) because I have already created the array previously
and regenerated the persistent superblocks on each disk.
Nevertheless, please note that I am explicitly mentioning which
devices (and their order) are involved in the assembled array.
$ sudo mdadm -A /dev/md0 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
^ ^ ^ ^
| | | |
DISK 1 DISK 2 DISK 3 DISK 4
* CONSOLE-OUT *
mdadm: no recogniseable superblock on /dev/sdf
mdadm: /dev/sdf has no superblock - assembly aborted
* COMMENTS *
This error seems logical: for each disk, only the second partition,
labelled "Linux raid member" is supposed to be part of the RAID array.
------
[T005] mdadm to assemble the "linux raid" partitions as one single
RAID-0 device.
------
* ACTION *
Same test as [T004]. But this time, I am explicitly assembling the
"linux raid member" partition of each disk. See [T001] for the
partitions of each disk.
$ sudo mdadm -A /dev/md0 /dev/sdf2 /dev/sdg2 /dev/sdh2 /dev/sdi2
^ ^ ^ ^
| | | |
RAID comp.1 RAID comp.2 RAID comp.3 RAID comp.4
* CONSOLE-OUT *
mdadm: /dev/md0 has been started with 4 drives.
* messages.log *
May 15 16:42:49 obelix kernel: [183920.968499] md: md0 stopped.
May 15 16:42:49 obelix kernel: [183921.161066] md: bind<sdg2>
May 15 16:42:49 obelix kernel: [183921.173482] md: bind<sdh2>
May 15 16:42:49 obelix kernel: [183921.181697] md: bind<sdi2>
May 15 16:42:49 obelix kernel: [183921.183694] md: bind<sdf2>
May 15 16:42:49 obelix kernel: [183921.186312] raid0: looking at sdf2
May 15 16:42:49 obelix kernel: [183921.186318] raid0: comparing
sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186323] with sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186327] raid0: END
May 15 16:42:49 obelix kernel: [183921.186330] raid0: ==> UNIQUE
May 15 16:42:49 obelix kernel: [183921.186333] raid0: 1 zones
May 15 16:42:49 obelix kernel: [183921.186337] raid0: looking at sdi2
May 15 16:42:49 obelix kernel: [183921.186342] raid0: comparing
sdi2(781030528)
May 15 16:42:49 obelix kernel: [183921.186346] with sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186349] raid0: EQUAL
May 15 16:42:49 obelix kernel: [183921.186353] raid0: looking at sdh2
May 15 16:42:49 obelix kernel: [183921.186358] raid0: comparing
sdh2(781030528)
May 15 16:42:49 obelix kernel: [183921.186362] with sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186365] raid0: EQUAL
May 15 16:42:49 obelix kernel: [183921.186369] raid0: looking at sdg2
May 15 16:42:49 obelix kernel: [183921.186374] raid0: comparing
sdg2(781030528)
May 15 16:42:49 obelix kernel: [183921.186378] with sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186381] raid0: EQUAL
May 15 16:42:49 obelix kernel: [183921.186384] raid0: FINAL 1 zones
May 15 16:42:49 obelix kernel: [183921.186393] raid0: done.
May 15 16:42:49 obelix kernel: [183921.186397] raid0 : md_size is
3124122112 sectors.
May 15 16:42:49 obelix kernel: [183921.186401] ******* md0
configuration *********
May 15 16:42:49 obelix kernel: [183921.186405] zone0=[sdf2/sdg2/sdh2/sdi2/]
May 15 16:42:49 obelix kernel: [183921.186417] zone offset=0kb
device offset=0kb size=1562061056kb
May 15 16:42:49 obelix kernel: [183921.186421]
**********************************
May 15 16:42:49 obelix kernel: [183921.186423]
May 15 16:42:49 obelix kernel: [183921.186446] md0: detected capacity
change from 0 to 1599550521344
May 15 16:42:49 obelix kernel: [183921.194595] md0: unknown partition table
* COMMENTS *
The command now apparently worked. The RAID array seems to be
assembled. In test [T006] below, I am performing a simple RAID diagnosis
using the mdadm command.
* QUESTION *
Messages.log indicates that there is no partition table available on
device /dev/md0. Is this normal?
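The sizes in the kernel log above can be reproduced from the partition size reported by fdisk. The sketch below assumes the usual version 0.90 superblock rule (the member size is rounded down to a 64 KiB boundary and the last 64 KiB is reserved for the superblock); the function name is made up for the illustration.

```python
MD_RESERVED_SECTORS = 128   # 64 KiB reserved by the 0.90 superblock format

def md090_usable_sectors(partition_sectors: int) -> int:
    """Usable size of a member carrying a version 0.90 superblock:
    round down to a 64 KiB boundary, then reserve the last 64 KiB."""
    aligned = partition_sectors & ~(MD_RESERVED_SECTORS - 1)
    return aligned - MD_RESERVED_SECTORS

partition_sectors = 390515328 * 2        # fdisk "Blocks" are 1 KiB units
member_sectors = md090_usable_sectors(partition_sectors)
array_sectors = 4 * member_sectors       # RAID-0 of four equal members
array_bytes = array_sectors * 512

print(member_sectors, array_sectors, array_bytes)
```

The three values match the 781030528, 3124122112 and 1599550521344 printed by the kernel, so the array geometry itself looks self-consistent. As for the "unknown partition table" message: if the filesystem was created directly on the md device, there would be no partition table to find, and the message would be expected in that case.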
------
[T006] Diagnosing the assembled RAID array using mdadm
------
* ACTION *
Listing the assembled arrays at kernel
$ sudo cat /proc/mdstat
* CONSOLE-OUT *
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : active raid0 sdf2[0] sdi2[3] sdh2[2] sdg2[1]
1562061056 blocks 64k chunks
* COMMENTS *
The kernel sees a RAID-0 device /dev/md0 assembled with the following
disks devices ordered as /dev/sdf2, /dev/sdg2, /dev/sdh2, and
/dev/sdi2.
* ACTION *
Let's get details about the assembled RAID-0 device /dev/md0
$ sudo mdadm -D /dev/md0
* CONSOLE-OUT *
/dev/md0:
Version : 00.90
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Array Size : 1562061056 (1489.70 GiB 1599.55 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Feb 19 01:23:02 2010
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Chunk Size : 64K
UUID : ecfe8404:2f354a45:d66856da:8e136666
Events : 0.1
Number Major Minor RaidDevice State
0 8 82 0 active sync /dev/sdf2
1 8 98 1 active sync /dev/sdg2
2 8 114 2 active sync /dev/sdh2
3 8 130 3 active sync /dev/sdi2
* COMMENTS *
This result seems consistent!
* ACTION *
Let's get details about RAID component partition /dev/sdf2 (DISK 1) with mdadm:
$ sudo mdadm -E /dev/sdf2
* CONSOLE-OUTPUT *
/dev/sdf2:
Magic : a92b4efc
Version : 00.90.00
UUID : ecfe8404:2f354a45:d66856da:8e136666
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Used Dev Size : 0
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Feb 19 01:23:02 2010
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : c0d7901b - correct
Events : 1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 34 0 active sync /dev/sdc2
0 0 8 34 0 active sync /dev/sdc2
1 1 8 50 1 active sync
2 2 8 66 2 active sync
3 3 8 82 3 active sync /dev/sdf2
* COMMENTS *
This result does not look consistent:
- Why is /dev/sdc2 mentioned here as the current device? Should be /dev/sdf2.
- Why are devices 1 and 2 left blank?
- Why is device /dev/sdf2 (current device) mentioned as device 3?
* ACTION *
Let's get details about RAID component partition /dev/sdg2 (DISK 2) with mdadm:
$ sudo mdadm -E /dev/sdg2
* CONSOLE-OUTPUT *
/dev/sdg2:
Magic : a92b4efc
Version : 00.90.00
UUID : ecfe8404:2f354a45:d66856da:8e136666
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Used Dev Size : 0
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Feb 19 01:23:02 2010
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : c0d7902d - correct
Events : 1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 50 1 active sync
0 0 8 34 0 active sync /dev/sdc2
1 1 8 50 1 active sync
2 2 8 66 2 active sync
3 3 8 82 3 active sync /dev/sdf2
* COMMENTS *
This result does not look consistent:
- Why is (blank) mentioned here as the current device? Should be /dev/sdg2.
- Why are devices 1 and 2 left blank?
- Why is device /dev/sdf2 mentioned as device 3?
* ACTION *
Let's get details about RAID component partition /dev/sdh2 (DISK 3) with mdadm:
$ sudo mdadm -E /dev/sdh2
* CONSOLE-OUTPUT *
/dev/sdh2:
Magic : a92b4efc
Version : 00.90.00
UUID : ecfe8404:2f354a45:d66856da:8e136666
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Used Dev Size : 0
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Feb 19 01:23:02 2010
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : c0d7903f - correct
Events : 1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 66 2 active sync
0 0 8 34 0 active sync /dev/sdc2
1 1 8 50 1 active sync
2 2 8 66 2 active sync
3 3 8 82 3 active sync /dev/sdf2
* COMMENTS *
This result does not look consistent:
- Why is (blank) mentioned here as the current device? Should be /dev/sdh2.
- Why are devices 1 and 2 left blank?
- Why is device /dev/sdf2 mentioned as device 3?
* ACTION *
Let's get details about RAID component partition /dev/sdi2 (DISK 4) with mdadm:
$ sudo mdadm -E /dev/sdi2
* CONSOLE-OUTPUT *
/dev/sdi2:
Magic : a92b4efc
Version : 00.90.00
UUID : ecfe8404:2f354a45:d66856da:8e136666
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Used Dev Size : 0
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Feb 19 01:23:02 2010
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : c0d79051 - correct
Events : 1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 82 3 active sync /dev/sdf2
0 0 8 34 0 active sync /dev/sdc2
1 1 8 50 1 active sync
2 2 8 66 2 active sync
3 3 8 82 3 active sync /dev/sdf2
* COMMENTS *
This result does not look consistent:
- Why is /dev/sdf2 mentioned here as the current device? Should be /dev/sdi2.
- Why are devices 1 and 2 left blank?
- Why is device /dev/sdf2 mentioned as device 3?
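One possible reading of these device tables, offered as an assumption rather than a diagnosis: a 0.90 superblock stores the major/minor numbers the members had when the superblock was written, and mdadm -E prints whatever /dev name currently owns each number (or nothing if no such node exists). Decoding the numbers with the classic SCSI disk numbering (major 8, 16 minors per disk) gives the following; the helper sd_name is made up for this sketch.

```python
import string

def sd_name(major: int, minor: int) -> str:
    """Map a (major, minor) pair to a classic sdXN name.
    Handles only major 8, which covers sda..sdp (16 minors per disk)."""
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk, part = divmod(minor, 16)
    name = "sd" + string.ascii_lowercase[disk]
    return name + (str(part) if part else "")

recorded = [(8, 34), (8, 50), (8, 66), (8, 82)]    # tables from mdadm -E
current = [(8, 82), (8, 98), (8, 114), (8, 130)]   # table from mdadm -D

print("recorded:", [sd_name(*pair) for pair in recorded])
print("current: ", [sd_name(*pair) for pair in current])
```

The recorded numbers decode to sdc2, sdd2, sde2 and sdf2, and the current ones to sdf2, sdg2, sdh2 and sdi2. If this reading is right, the tables would simply reflect the names the disks had when the superblocks were regenerated, with blanks where no /dev/sdd2 or /dev/sde2 exists today, and would not by themselves indicate corruption.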
------
[T007] Mounting the assembled RAID's filesystem as ext3-fs.
------
$ sudo mount -t ext3 /dev/md0 /media/N4100
* CONSOLE-OUT *
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
* kern.log *
May 15 16:48:08 obelix kernel: [184240.160548] EXT3-fs error (device
md0): ext3_check_descriptors: Block bitmap for group 1920 not in group
(block 0)!
May 15 16:48:08 obelix kernel: [184240.163677] EXT3-fs: group
descriptors corrupted!
* COMMENTS *
There is an obvious filesystem issue on the assembled filesystem,
which seems related to corrupted ext3 group descriptors.
* ACTION *
I re-issue the mount command, this time not forcing the filesystem type:
$ sudo mount /dev/md0 /media/N4100
* CONSOLE-OUT *
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
* kern.log *
May 15 16:51:42 obelix kernel: [184453.959766] EXT2-fs error (device
md0): ext2_check_descriptors: Block bitmap for group 1920 not in group
(block 0)!
May 15 16:51:42 obelix kernel: [184453.959783] EXT2-fs: group
descriptors corrupted!
* QUESTION *
Is this issue related to the apparent inconsistencies of the mdadm
diagnosis performed on each individual disk in [T006]?
* COMMENTS *
In its current state, the RAID filesystem of device /dev/md0 cannot be
mounted and exhibits severe inconsistencies...
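It may help to work out where on /dev/md0 the descriptor for group 1920 lives. The sketch below assumes a 4 KiB filesystem block size, 32-byte group descriptors, and a descriptor table that starts in the block right after the superblock block; none of these values is confirmed by the outputs above. The chunk size and member count come from /proc/mdstat.

```python
BLOCK_SIZE = 4096     # assumption: 4 KiB filesystem blocks
DESC_SIZE = 32        # assumption: classic ext2/ext3 descriptor size
CHUNK = 64 * 1024     # RAID chunk size, from /proc/mdstat
MEMBERS = 4           # number of RAID-0 members

def descriptor_location(group: int):
    """Return (byte offset on the array, chunk index, member index)
    of the group descriptor for `group`."""
    per_block = BLOCK_SIZE // DESC_SIZE
    block = 1 + group // per_block            # table starts at block 1
    offset = block * BLOCK_SIZE + (group % per_block) * DESC_SIZE
    chunk = offset // CHUNK
    return offset, chunk, chunk % MEMBERS

last_ok = descriptor_location(1919)
first_bad = descriptor_location(1920)
print("group 1919:", last_ok)
print("group 1920:", first_bad)
```

Under these assumptions, group 1919 is the last descriptor stored in the first 64 KiB chunk (first member), and group 1920 is the first descriptor of the second chunk (second member). The first complaint therefore falls exactly on a chunk boundary, which would be consistent with a striping problem (member order, chunk size or data offset) rather than with filesystem damage alone. This is a hint to investigate, not a proof.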
------
[T008] RAID array device /dev/md0 geometry and partitioning information
------
$ sudo fdisk -l /dev/md0
* CONSOLE-OUT *
Disk /dev/md0: 1599.6 GB, 1599550521344 bytes
2 heads, 4 sectors/track, 390515264 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier : 0x00000000
Disk /dev/md0 doesn't contain a valid partition table.
* QUESTIONS *
Is there anything to fix here? How?
------
[T009] Checking what is wrong on /dev/md0 filesystem by means of fsck.ext3
------
* COMMENTS *
Not performing any write action on the assembled physical array...
$ sudo e2fsck -n /dev/md0
* CONSOLE-OUT *
e2fsck 1.41.9 (22-Aug-2009)
e2fsck: Group descriptors look bad... trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear? no
e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
* COMMENTS *
This diagnostic is insufficient for now but I do not want to perform
any intrusive diagnostic on the physical disks.
PART 2: TESTS USING testdisk on disk images
*******************************************
------
[T010] Global analysis of the assembled raid image /dev/md0
------
* COMMENTS *
I am performing a testdisk analysis on the final RAID device assembled
from the 4 disk images disk0.hd, disk1.hd, disk2.hd and disk3.hd.
$ sudo testdisk /dev/md0
* CONSOLE-OUT *
Screen 1
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
TestDisk is free software, and
comes with ABSOLUTELY NO WARRANTY.
Select a media (use Arrow keys, then press Enter):
Disk /dev/md0 - 1599 GB / 1489 GiB
[Proceed ] [ Quit ]
Note: Disk capacity must be correctly detected for a successful recovery.
If a disk listed above has incorrect size, check HD jumper settings, BIOS
detection, and install the latest OS patches and disk drivers.
---------------------------------------------------------------------
Screen 2
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB
Please select the partition table type, press Enter when done.
[Intel ] Intel/PC partition
[EFI GPT] EFI GPT partition map (Mac i386, some x86_64...)
[Mac ] Apple partition map
[None ] Non partitioned media
[Sun ] Sun Solaris partition
[XBox ] XBox partition
[Return ] Return to disk selection
Note: Do NOT select 'None' for media with only a single partition. It's very
rare for a drive to be 'Non-partitioned'.
---------------------------------------------------------------------
* ACTION *
I select None: indeed, according to my assumption, /dev/md0 SHOULD be an
ext3-fs filesystem, and therefore does not contain sub-partitions.
Screen 3
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB - CHS 390515264 2 4
[ Analyse ] Analyse current partition structure and search for lost partitions
[ Advanced ] Filesystem Utils
[ Geometry ] Change disk geometry
[ Options ] Modify options
[ Quit ] Return to disk selection
Note: Correct disk geometry is required for a successful recovery. 'Analyse'
process may give some warnings if it thinks the logical geometry is mismatched.
---------------------------------------------------------------------
* ACTION *
I select option Analyse
Screen 4
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB - CHS 390515264 2 4
Current partition structure:
Partition Start End Size in sectors
P ext2 0 0 1 390515263 1 4 3124122112
[Quick Search]
Try to locate partition
---------------------------------------------------------------------
* ACTION *
I select Quick Search. The Quick Search analysis gets started... and
the following result is displayed.
Screen 5
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB - CHS 390515264 2 4
Partition Start End Size in sectors
P ext2 0 0 1 390515199 1 4 3124121600
Structure: Ok.
Keys T: change type, P: list files,
Enter: to continue
EXT2 Large file Sparse superblock, 1599 GB / 1489 GiB
---------------------------------------------------------------------
* ACTION *
I press P to list the files on this filesystem.
Screen 6
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
P ext2 0 0 1 390515199 1 4 3124121600
Directory /
No file found, filesystem seems damaged.
Use Right arrow to change directory, c to copy,
h to hide deleted files, q to quit
---------------------------------------------------------------------
* COMMENTS *
What is wrong?
* ACTION *
I press Q to return to screen 5. Then I press Enter to continue.
Screen 7
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB - CHS 390515264 2 4
Partition Start End Size in sectors
P ext2 0 0 1 390515199 1 4 3124121600
Write isn't available because the partition table type "None" has been selected.
[ Quit ] [Deeper Search]
Try to find more partitions
---------------------------------------------------------------------
* COMMENTS *
So, what can I do? I cannot write partition organization to the disk
because I have selected "none" as the partition structure for the
analysis... How can I practically modify that?
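One detail in the screens above may be worth noting: the ext2 filesystem found by Quick Search is slightly smaller than the device. The arithmetic, sketched in Python with the values copied from Screens 4 and 5:

```python
device_sectors = 3124122112   # size of /dev/md0 (Screen 4, kernel log)
fs_sectors = 3124121600       # ext2 filesystem found by Quick Search (Screen 5)
MEMBERS = 4                   # RAID-0 members
SECTOR = 512                  # bytes per sector

gap_sectors = device_sectors - fs_sectors
gap_bytes = gap_sectors * SECTOR
gap_per_member = gap_sectors // MEMBERS

print(gap_sectors, "sectors =", gap_bytes // 1024, "KiB in total,",
      gap_per_member, "sectors per member")
```

The gap is 512 sectors (256 KiB), that is 128 sectors or exactly one 64 KiB chunk per member. Whether this means the original array members were one chunk smaller than the re-created ones, or simply that the filesystem was created a little smaller than the device, cannot be decided from these numbers alone; it is only flagged here as something to check.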
------
[T011] Analysis of the unknown 201MB partition.
------
* NOTES *
I am running this TestDisk analysis on the first loopback partition,
mapped as /dev/mapper/loop0p1.
Unlike with the /dev/md0 device, it seems I cannot perform any RAID
assembly of the 4 p1 partitions /dev/mapper/loop0p1,
/dev/mapper/loop1p1, /dev/mapper/loop2p1 and /dev/mapper/loop3p1,
because this disk zone does not seem to contain any RAID superblocks.
To be clear, the 201MB unknown zone reported on each disk DOES NOT look
like a RAID partition (unless its type was accidentally changed by the
above manipulations). Therefore, and unlike with the assembled device
/dev/md0, I am forced to run TestDisk on one disk image at a time.
I select the first disk image /dev/loop0, and I therefore execute
TestDisk on its p1 partition as follows:
$ sudo testdisk /dev/mapper/loop0p1
* CONSOLE-OUT *
Screen 1
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
TestDisk is free software, and
comes with ABSOLUTELY NO WARRANTY.
Select a media (use Arrow keys, then press Enter):
Disk /dev/mapper/loop0p1 - 200 MB / 191 MiB
[Proceed ] [ Quit ]
Note: Disk capacity must be correctly detected for a successful recovery.
If a disk listed above has incorrect size, check HD jumper settings, BIOS
detection, and install the latest OS patches and disk drivers.
---------------------------------------------------------------------
Screen 2
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/mapper/loop0p1 - 200 MB / 191 MiB
Please select the partition table type, press Enter when done.
[Intel ] Intel/PC partition
[EFI GPT] EFI GPT partition map (Mac i386, some x86_64...)
[Mac ] Apple partition map
[None ] Non partitioned media
[Sun ] Sun Solaris partition
[XBox ] XBox partition
[Return ] Return to disk selection
Note: Do NOT select 'None' for media with only a single partition. It's very
rare for a drive to be 'Non-partitioned'.
---------------------------------------------------------------------
* ACTION *
I select Intel/PC partition in case this zone contains a deleted
partition.
Screen 3
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/mapper/loop0p1 - 200 MB / 191 MiB - CHS 392049 1 1
Current partition structure:
Partition Start End Size in sectors
Partition sector doesn't have the endmark 0xAA55
*=Primary bootable P=Primary L=Logical E=Extended D=Deleted
[Quick Search]
Try to locate partition
---------------------------------------------------------------------
* COMMENTS *
Where does the missing 0xAA55 endmark error come from?
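For reference, the endmark is the two-byte signature 0x55 0xAA stored
at offsets 510 and 511 of a sector holding an Intel/PC partition table.
Since I asked TestDisk to treat p1 as an Intel-partitioned disk, it
presumably looked for this signature in the first sector of p1. A
minimal Python sketch of that check follows; has_boot_signature is a
hypothetical helper name, and the check is read-only.

```python
def has_boot_signature(path):
    """Return True if bytes 510-511 of the first sector are 0x55 0xAA."""
    with open(path, "rb") as dev:
        sector0 = dev.read(512)
    # A short read (tiny file) cannot hold a valid partition sector.
    return len(sector0) == 512 and sector0[510:512] == b"\x55\xaa"
```

A whole-disk image such as /dev/loop0 would be expected to pass this
check if its partition table is intact, whereas the first sector of an
ordinary partition generally carries no such table.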
* ACTION *
I select the Quick Search option, and I get the following result.
Screen 4
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/mapper/loop0p1 - 200 MB / 191 MiB - CHS 392049 1 1
Partition Start End Size in sectors
No partition found or selected for recovery
[ Quit ] [Deeper Search]
Try to find more partitions
---------------------------------------------------------------------
* COMMENTS *
Unknown zone p1 does not contain any partition.
------
[T012] Analysis of one RAID disk image /dev/loop0
------
* NOTES *
I execute this testdisk analysis on the first loopback disk mapped as
/dev/loop0.
Identical results would also be found performing a testdisk analysis
on images /dev/loop1, /dev/loop2, or /dev/loop3.
Please note that:
- I ONLY perform an analysis on ONE disk image, not on the entire
/dev/md0 RAID device image,
- I am performing the analysis of ONE whole disk image, unlike test
[T011] where I ONLY analyzed the 201 MB unknown partition.
By doing this test, I expect TestDisk will give me accurate partition
information about each individual disk involved in the RAID array, and
in particular, I do hope I will get more accurate information about
this 201 MB zone.
$ sudo testdisk /dev/loop0
* CONSOLE-OUT *
Screen 1
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
TestDisk is free software, and
comes with ABSOLUTELY NO WARRANTY.
Select a media (use Arrow keys, then press Enter):
Disk /dev/loop0 - 400 GB / 372 GiB
[Proceed ] [ Quit ]
Note: Disk capacity must be correctly detected for a successful recovery.
If a disk listed above has incorrect size, check HD jumper settings, BIOS
detection, and install the latest OS patches and disk drivers.
---------------------------------------------------------------------
Screen 2
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB
Please select the partition table type, press Enter when done.
[Intel ] Intel/PC partition
[EFI GPT] EFI GPT partition map (Mac i386, some x86_64...)
[Mac ] Apple partition map
[None ] Non partitioned media
[Sun ] Sun Solaris partition
[XBox ] XBox partition
[Return ] Return to disk selection
Note: Do NOT select 'None' for media with only a single partition. It's very
rare for a drive to be 'Non-partitioned'.
---------------------------------------------------------------------
* ACTION *
I select Intel/PC partition, because I assume there should be a
regular partition available in which the RAID ext2 (or 3) partition is
referenced.
Screen 3
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 781422768 1 1
[ Analyse ] Analyse current partition structure and search for lost partitions
[ Advanced ] Filesystem Utils
[ Geometry ] Change disk geometry
[ Options ] Modify options
[ MBR Code ] Write TestDisk MBR code to first sector
[ Delete ] Delete all data in the partition table
[ Quit ] Return to disk selection
Note: Correct disk geometry is required for a successful recovery. 'Analyse'
process may give some warnings if it thinks the logical geometry is mismatched.
---------------------------------------------------------------------
* COMMENTS *
The disk geometry CHS=781422768 1 1 reported by TestDisk does not
match the CHS information reported by fdisk -l in test [T010].
I decide to correct the TestDisk geometry parameters by replacing them
with the parameters CHS=775221 16 63 reported by fdisk -l in test [T010].
(Note: I also performed test [T018] with no geometry information
change: the results of this test are not reported in this document,
because the results are inconsistent.)
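A quick arithmetic check on the two geometries, as a minimal Python
sketch assuming 512-byte sectors: multiplying cylinders, heads and
sectors per track gives the total sector count each geometry
describes.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def total_sectors(cylinders, heads, sectors_per_track):
    """Number of sectors addressed by a CHS geometry."""
    return cylinders * heads * sectors_per_track


testdisk_default = total_sectors(781422768, 1, 1)   # geometry shown by TestDisk
fdisk_geometry = total_sectors(775221, 16, 63)      # geometry shown by fdisk -l

print(testdisk_default)                  # 781422768
print(fdisk_geometry)                    # 781422768
print(fdisk_geometry * SECTOR_BYTES)     # 400088457216
```

Both geometries address the same 781422768 sectors (about 400 GB), so
they differ only in how cylinder boundaries are drawn, not in the disk
size they imply.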
* ACTION *
I select Analyse.
Screen 4
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 775221 16 63
Current partition structure:
Partition Start End Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
1 P Linux 0 1 1 388 15 63 392049
1 P Linux 0 1 1 388 15 63 392049
2 P Linux RAID 389 0 1 775220 15 63 781030656 [md0]
No partition is bootable
[Quick Search] [ Backup ]
Try to locate partition
---------------------------------------------------------------------
* COMMENTS *
This time I seem to get some more information about the global
partition structure of the disk:
- Partition 2 is obviously the RAID component partition
- Partition 1 is supposedly a Linux partition. But where is this
partition? Furthermore, there seem to be two traces of the same
partition... Was a second partition created on top of an older one?
Up to now, there seems to be a ray of hope: the RAID partition is
indeed referenced in the partition table of a RAID disk, AND
there also seems to be a Linux partition, probably damaged. I suspect
that this Linux partition may have been created by the NAS Thecus
N4100 and may contain the SHARED FOLDERS configuration and access
rights...
Nevertheless, does the fact that I cannot see this Linux partition 1
prevent me from accessing and mounting the RAID partition 2?
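To check that the Start/End columns of Screen 4 agree with the "Size
in sectors" column, here is a minimal Python sketch of the usual
CHS-to-LBA conversion, assuming the 16-head, 63-sector geometry
selected above and 512-byte sectors. The helper names are hypothetical.

```python
HEADS = 16      # heads per cylinder (geometry chosen in Screen 3)
SECTORS = 63    # sectors per track


def chs_to_lba(cylinder, head, sector):
    """Convert a CHS address to a zero-based sector number (sectors start at 1)."""
    return (cylinder * HEADS + head) * SECTORS + (sector - 1)


def size_in_sectors(start_chs, end_chs):
    """Inclusive sector count between two CHS addresses."""
    return chs_to_lba(*end_chs) - chs_to_lba(*start_chs) + 1


p1 = size_in_sectors((0, 1, 1), (388, 15, 63))
p2 = size_in_sectors((389, 0, 1), (775220, 15, 63))

print(p1, p1 * 512)   # 392049 200729088
print(p2)             # 781030656
```

Partition 1 therefore spans sectors 63 to 392111 (the 201 MB zone) and
partition 2 starts immediately after it at sector 392112; the two
entries do not overlap.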
* ACTION *
I select Quick search in order to perform some partition search.
Results are reported in Screen 5 below.
Screen 5
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 775221 16 63
The harddisk (400 GB / 372 GiB) seems too small! (< 1599 GB / 1489 GiB)
Check the harddisk size: HD jumpers settings, BIOS detection...
The following partitions can't be recovered:
Partition Start End Size in sectors
Linux 389 0 1 3099715 15 47 3124121600
Linux 397 0 1 3099723 15 47 3124121600
[ Continue ]
EXT2 Large file Sparse superblock, 1599 GB / 1489 GiB
---------------------------------------------------------------------
* COMMENTS *
The hard disk seems too small!! How is this possible? What is wrong? I
am using the correct geometry information, am I not?
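The numbers in Screen 5 can be compared directly. The following
minimal Python sketch, assuming 512-byte sectors and the 16/63
geometry, computes where the first candidate partition would end and
compares that with the last sector of /dev/loop0.

```python
SECTOR_BYTES = 512                   # assumption: 512-byte sectors

disk_sectors = 775221 * 16 * 63      # /dev/loop0 under the fdisk geometry
start_lba = 389 * 16 * 63            # first candidate starts at cylinder 389
fs_sectors = 3124121600              # size reported for both candidates

end_lba = start_lba + fs_sectors - 1

print("last sector of the disk     :", disk_sectors - 1)
print("last sector of the candidate:", end_lba)
print("candidate size in GB        :", fs_sectors * SECTOR_BYTES // 10**9)
print("candidate / disk size ratio :", round(fs_sectors / disk_sectors, 3))
```

The candidate is roughly four times the size of one disk and matches
the 1599 GB reported for /dev/md0, which suggests that TestDisk found
an ext2 superblock describing the filesystem of the whole 4-disk array
rather than a partition of this single disk.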
Screen 6
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 775221 16 63
Warning: the current number of heads per cylinder is 16
but the correct value may be 255.
You can use the Geometry menu to change this value.
It's something to try if
- some partitions are not found by TestDisk
- or the partition table can not be written because partitions overlaps.
[ Continue ]
---------------------------------------------------------------------
* QUESTION *
What am I supposed to do now? The geometry of /dev/loop0 matches that
reported by the fdisk -l tests performed previously. There cannot be a
disk geometry issue, can there?
Screen 7
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 775221 16 63
Partition Start End Size in sectors
L Linux RAID 775220 13 62 775220 15 63 128 [md0]
Structure: Ok. Use Up/Down Arrow keys to select partition.
Use Left/Right Arrow keys to CHANGE partition characteristics:
*=Primary bootable P=Primary L=Logical E=Extended D=Deleted
Keys A: add partition, L: load backup, T: change type,
Enter: to continue
md 0.90.0 Raid 0: devices 0(8,34)* 1(8,50) 2(8,66) 3(8,82), 65 KB / 64 KiB
---------------------------------------------------------------------
* COMMENTS *
Now, only the RAID partition is shown in the list. Linux partition 1
has disappeared... Why?
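The 128-sector md 0.90.0 entry of Screen 7 can be tied back to the
/dev/md0 size seen earlier (CHS 390515264 2 4). The minimal Python
sketch below assumes the md 0.90 rule that each member loses its last
64 KiB-aligned block of 128 sectors to the superblock, four
equal-sized members, and no further rounding to the chunk size.

```python
MD_RESERVED_SECTORS = 128   # 64 KiB kept for the 0.90 superblock


def usable_sectors(member_sectors):
    """Data sectors left on a member once the 0.90 superblock is set aside."""
    return (member_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS


members = 4
member_sectors = 781030656            # partition 2 size from Screen 4
md0_expected = members * usable_sectors(member_sectors)
md0_from_chs = 390515264 * 2 * 4      # geometry TestDisk showed for /dev/md0
fs_sectors = 3124121600               # ext2 size found on /dev/md0

print(usable_sectors(member_sectors))  # 781030528
print(md0_expected, md0_from_chs)      # 3124122112 3124122112
print(md0_expected - fs_sectors)       # 512
```

Under these assumptions the assembled array has exactly the expected
size, and the ext2 filesystem reported by TestDisk fits inside it with
512 sectors to spare.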