Re: [smartmontools-support] exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

From: Jonas Petersson
Date: Sunday, August 31, 2008 - 3:00 am

Hi again Justin,

Justin Piszcz wrote:

Very much so, yes.

At best, all disk access will hang for a while and then resume after the 
reset has worked out - this often happens a couple of times per day now.

At worst, the reset will not work and the disk is remounted read-only 
and I can sort of use the system a bit this way. It seems somewhat 
random how much still works: Up until today I could at least always use 
dmesg and tail various logs to try to hunt down what happened, but this 
morning dmesg could not be found and I got I/O errors when accessing 
anything in /var/log. Rebooting helped as usual.

This fatal variant has happened about every second day lately.

For the first two weeks I had the system, nothing at all like this 
showed up: I have log files since July 26 and the first recorded 
(resettable) glitch is from Aug 16. Obviously, any non-resettable 
problem would have been easy to spot.
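
A minimal sketch of how such a first-occurrence check could be scripted, 
assuming syslog-style lines that start with a "Mon DD" date and carry the 
"exception Emask" text from the subject line. The function name and the 
date prefix format are assumptions for illustration, not something taken 
from the actual logs discussed here.

```python
import re
from collections import Counter

# Assumption: lines look like "Aug 16 09:12:01 host kernel: ata1.00: exception Emask ..."
# The "Mon DD" prefix is the classic syslog layout; other loggers may differ.
EXCEPTION_RE = re.compile(r"^(\w{3}\s+\d+)\s.*\bata\d+(?:\.\d+)?: exception Emask")


def glitches_per_day(lines):
    """Count libata 'exception Emask' lines per calendar day (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.match(line)
        if match:
            # Collapse padding so "Aug  3" and "Aug 3" land in the same bucket.
            day = " ".join(match.group(1).split())
            counts[day] += 1
    return counts
```

Passing it an open kernel log file (the path varies by distribution) gives 
a per-day count; since Counter keeps insertion order, the first key is the 
first day on which a glitch was recorded.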


Yes, I would not point fingers at the ICH8 chipset either: The other 
MacBookPro I have experimented with now is a 2,2 (ATI based) and has 
ICH7, but I'm 99.9% sure my previous MacBookPro 3,1 (nvidia based) was 
ICH8 and it worked flawlessly (I saw no reason to swap for the 4,1 
version, but it was stolen from me in June). As far as I know the 
significant differences with my current MBP are just: higher screen 
resolution, multitouch ("iphone") touchpad and more memory. Alas, I 
didn't keep a lshw dump.


I'll just clarify that the errno after "revalidation failed" is not 
always -5. When it ends up fatal I've also seen -3 and possibly 
something else too. I would have taken a screen shot this morning if 
only dmesg had worked. :-(
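
For reference, the negative numbers can be mapped to the standard errno 
names with a few lines of Python. This is only a sketch: the exact 
"(errno=-N)" message shape in the regular expression is an assumption, and 
the helper name is made up for illustration. On Linux, 5 is EIO and 3 is 
ESRCH; what the driver actually means by them is a separate question.

```python
import errno
import os
import re

# Assumption: the kernel line contains "revalidation failed (errno=-5)" or similar.
REVALIDATION_RE = re.compile(r"revalidation failed \(errno=(-?\d+)\)")


def decode_revalidation_errno(line):
    """Return (code, symbolic name, description) or None if the line does not match."""
    match = REVALIDATION_RE.search(line)
    if match is None:
        return None
    # The kernel reports errors as negative values; errno tables use the positive code.
    code = abs(int(match.group(1)))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)
```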



For the record: My current theory is that it is some kind of hardware 
problem, either in the disk or on the motherboard, so I have persuaded 
my local Apple Store to swap the hard disk on Monday, and then they will 
run their full hardware stress test (4+ hours according to the local 
techie). The stress test was apparently suggested by the central repair 
people (who have no idea I run Linux on it; the local techie knows, but 
has no problem with it as long as I keep a small OSX partition), so I 
guess this sort of hints that they are aware of hardware issues.

(Note: I've had the same techie replace a broken motherboard in the past 
when the Linux messages were at least as clear as the OSX ones - in 
that case drives would in the end only show up in the boot menu when 
the system had cooled down for at least 20 minutes. To be on the safe 
side, I've upped the minimum fan speed by 50% to ensure all sensors give 
me happy readings all the time - luckily the 4,1 fans are very silent 
compared to the 2,2)

I hope to have everything back in shape on Wednesday and I'll let you 
know how it fares.

BTW: For a while I displayed the hddtemp sensor all the time along with 
coretemp etc, but I now understand that this is also SMART-based so I've 
turned it off in the past weeks' experimentation. Again, it seemed to 
work flawlessly for months on my previous (stolen) MBP 3,1.

			Best / Jonas