On 9/14/07, Andy Whitcroft <apw@shadowen.org> wrote:
Sorry to have to confirm this: my RAID5 got destroyed a second time.
To summarize what worked, what did not, and what seems to work for me:
First two tries with unpatched rc4-mm1: both times one sata_sil24 drive got kicked out of the array.
Then I switched back to rc3-mm1; 18 boots with that kernel all worked.
Then I tried the patched rc4-mm1 and it worked too.
The next boot also worked, but the third boot kicked a drive out again.
But as nobody reads logs, I did not notice that and kept using the
patched rc4-mm1.
The next 5 times the system worked normally with the two remaining drives.
The sixth boot kicked the second sata_sil24 drive. That I did notice...
After reassembling the RAID, I'm now back on the patched rc4-mm1, which
booted correctly this time.
So the patch only makes the bug less likely to hit: instead of
failing 2 out of 2 times, it failed 2 out of 8 times.
I compared the boot log of a working rc4-mm1 boot with the log of the boot
that kicked the first drive. Nothing seems to stand out:
< == good rc4-mm1 boot
145c145
< CPU 0: aperture @ 4000000 size 32 MB
---
154c154
< Calibrating delay using timer specific routine.. 5203.23 BogoMIPS (lpj=26016160)
---
169c169
< APIC timer calibration result 12499998
---
173c173
< Calibrating delay using timer specific routine.. 5222.40 BogoMIPS (lpj=26112010)
---
182c182
< Calibrating delay using timer specific routine.. 5222.73 BogoMIPS (lpj=26113694)
---
191c191
< Calibrating delay using timer specific routine.. 5223.07 BogoMIPS (lpj=26115369)
---
269d268
< Switched to high resolution mode on CPU 3
270a270
502,509c502,509
< raid6: int64x1 2634 MB/s
< raid6: int64x2 3244 MB/s
< raid6: int64x4 3405 MB/s
< raid6: int64x8 2614 MB/s
< raid6: sse2x1 3607 MB/s
< raid6: sse2x2 4834 MB/s
< raid6: sse2x4 4946 MB/s
< raid6: using algorithm sse2x4 (4946 MB/s)
---
567c567
< md1: bitmap initialized from disk: read 10/10 pages, set 96 bits
---
568a569,655
571a659,663
576a669,672
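Much of a raw diff between two boot logs is timing and benchmark jitter (delay calibration, raid6 throughput). A minimal sketch, assuming the two logs have been saved as lists of lines: mask every number before diffing so that only lines present in one boot but not the other remain. The helper names `normalize` and `structural_diff` are hypothetical, written for illustration only; they use nothing beyond Python's standard `re` and `difflib`.

```python
import difflib
import re

# Matches integers and decimals, e.g. "5203.23", "26016160", the "2" in "sse2x4".
_NUMBER = re.compile(r"\d+(\.\d+)?")


def normalize(line: str) -> str:
    """Hypothetical helper: replace every number with '#' so jitter is ignored."""
    return _NUMBER.sub("#", line.rstrip("\n"))


def structural_diff(good: list[str], bad: list[str]) -> list[str]:
    """Hypothetical helper: unified diff of two boot logs after masking numbers."""
    a = [normalize(line) for line in good]
    b = [normalize(line) for line in bad]
    return list(difflib.unified_diff(a, b, fromfile="good", tofile="bad", lineterm=""))
```

Masking every number also hides differences that live only in numbers (such as the aperture address or a device index), so this is a complement to the plain diff, not a replacement for it.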
Another good boot also showed the aperture at a similar high address:
CPU 0: aperture @ b7f2000000 size 32 MB
And that good boot also showed the "correct" BogoMIPS:
Calibrating delay using timer specific routine.. 5205.43 BogoMIPS (lpj=26027183)
Calibrating delay using timer specific routine.. 5200.01 BogoMIPS (lpj=26000052)
Calibrating delay using timer specific routine.. 5200.01 BogoMIPS (lpj=26000082)
Calibrating delay using timer specific routine.. 5200.03 BogoMIPS (lpj=26000166)
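The BogoMIPS figure in each calibration line is derived from the loops-per-jiffy (lpj) value printed next to it, using integer division: `lpj / (500000/HZ)` for the whole part and `(lpj / (5000/HZ)) % 100` for the two decimals. The sketch below reproduces that arithmetic, assuming HZ=100. That HZ value is an assumption inferred because it reproduces every logged pair; it is not stated in the report. It shows that the 5200 vs. 5222 BogoMIPS difference corresponds to lpj values differing by well under 1%.

```python
# Assumption: HZ=100, inferred from the logged lpj/BogoMIPS pairs, not stated in the report.
HZ = 100


def bogomips_str(lpj: int, hz: int = HZ) -> str:
    """Format lpj as BogoMIPS with the same integer arithmetic the kernel's printk uses."""
    whole = lpj // (500000 // hz)
    frac = (lpj // (5000 // hz)) % 100
    return f"{whole}.{frac:02d}"


# (lpj, printed BogoMIPS) pairs copied from the boot logs quoted in this report.
LOGGED = [
    (26016160, "5203.23"),
    (26112010, "5222.40"),
    (26113694, "5222.73"),
    (26115369, "5223.07"),
    (26027183, "5205.43"),
    (26000052, "5200.01"),
    (26000082, "5200.01"),
    (26000166, "5200.03"),
]

for lpj, printed in LOGGED:
    print(f"lpj={lpj}: computed {bogomips_str(lpj)}, logged {printed}")
```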
Is there anything more I can provide to help debug this?
Torsten