To me, things do not look good for a quick fix. It kinda looks like you
killed it. Can you give any details about how things died, and exactly
what you did after things started going south? What are you using for a
controller? It sounds like it is ready for the dump. Any messages from
the controller itself?
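To narrow down which members are stale, you could tally the Events counts in
the examine_sd?1 files with a few lines of Python. This is only a rough sketch.
It assumes each file holds plain mdadm -E output containing an "Events : ..."
line. The helper names (events_count, group_by_events) are made up for
illustration.

```python
import glob
import re
from collections import defaultdict

# Assumed format: lines such as "         Events : 1234" (1.x metadata) or
# "         Events : 0.1234" (0.90 metadata); the value is kept as a string.
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\S+)", re.MULTILINE)


def events_count(text):
    """Return the Events value from one mdadm -E dump, or None if absent."""
    m = EVENTS_RE.search(text)
    return m.group(1) if m else None


def group_by_events(paths):
    """Map each distinct Events value to the list of files reporting it."""
    groups = defaultdict(list)
    for path in sorted(paths):
        with open(path) as f:
            groups[events_count(f.read())].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Assumes the examine_sd?1 files were saved into the current directory.
    for events, files in group_by_events(glob.glob("examine_sd?1")).items():
        print(events, len(files), " ".join(files))
```

Members whose count lags behind the rest are the ones md would treat as
out of date.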
b-
Kyler Laird wrote:
> Recently a drive failed on one of our file servers. The machine has
> three RAID6 arrays (15 1TB each plus spares). I let the spare rebuild
> and then started the process of replacing the drive.
>
> Unfortunately I'd misplaced the list of drive IDs so I generated a new
> list in order to identify the failed drive. I used "smartctl" and made
> a quick script to scan all 48 drives and generate pretty output. That
> was a mistake. After running it a couple times one of the controllers
> failed and several disks in the first array were failed.
>
> I worked on the machine for a while. (It has an NFS root.) I got some
> information from it before it rebooted (via watchdog). I've dumped all
> of the information here.
>
> http://lairds.us/temp/ucmeng_md/
>
> In mdstat_0 you can see the status of the arrays right after the
> controller failure. mdstat_1 shows the status after reboot.
>
> sys_block shows a listing of the block devices so you can see that the
> problem drives are on controller 1.
>
> The examine_sd?1 files show mdadm -E output from each drive in md0. Note that
> the Events count is different for the drives on the problem controller.
>
> I'd like to know if this is something I can recover. I do have backups
> but it's a huge pain to recover this much data.
>
> Thank you.
>
> --kyler
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html