I am not a subscriber to this list. Please cc me in correspondence.
Thanks for your consideration.
Bob Montgomery
Contracting at HP, Fort Collins
-----------------------------
Incorrect method of reading the RTC
I think the code used to read the RTC in arch/x86_64/kernel/time.c is
incorrect. This code was added in patch:
2006-03-28 Matt Mackall [PATCH] RTC: Remove RTC UIP synchronization on x86_64
I have been investigating intermittent boot problems on x86_64 systems
running a variant of 2.6.18.
My tests were originally designed to test kdump, by hanging a CPU and
allowing the nmi_watchdog to crash the system to obtain the dumpfile.
During these tests, the system fails to come back up about once in every
100 (or perhaps 300) tests, with:
"ALERT! /dev/sda2 does not exist. Dropping to a shell!"
This occurs when udevsettle returns immediately instead of waiting
the usual 15-18 seconds for the disks to get ready. And *that*
happens when the system wakes up thinking the year is 2135.
Sun Sep 25 07:07:56 UTC 2135
That's the result of BCD conversion and mktime() when every field
of the RTC time and date comes back "0xbf". (Note, the example time
shown had advanced since being initially set.)
The code that reads the Real Time Clock during boot up (currently in
arch/x86_64/kernel/time.c:get_persistent_time, but in get_cmos_time
on my kernel) uses a loop sort of like this:
	spin_lock_irqsave(&rtc_lock, flags);
	do {
		sec = CMOS_READ(RTC_SECONDS);
		min = CMOS_READ(RTC_MINUTES);
		hour = CMOS_READ(RTC_HOURS);
		day = CMOS_READ(RTC_DAY_OF_MONTH);
		mon = CMOS_READ(RTC_MONTH);
		year = CMOS_READ(RTC_YEAR);
	} while (sec != CMOS_READ(RTC_SECONDS));
	spin_unlock_irqrestore(&rtc_lock, flags);
The idea appears to be that if a final read of RTC_SECONDS returns
the same value as the initial read, then it has successfully avoided
the update cycle of the RTC and thus acquired good values. I don't
believe that...
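A user-space model of the loop shows how the comparison can be
satisfied by bad data. This is only a sketch: fake_cmos_read is a
hypothetical stand-in for CMOS_READ, and it assumes (as the 2135 date
suggests) that for some window every register reads back 0xbf,
including the final read of RTC_SECONDS.

```c
/* MC146818 register indexes, as in the kernel headers. */
#define RTC_SECONDS		0
#define RTC_MINUTES		2
#define RTC_HOURS		4
#define RTC_DAY_OF_MONTH	7
#define RTC_MONTH		8
#define RTC_YEAR		9

static int garbage_reads;	/* how many reads return 0xbf */
static int reads_done;		/* reads performed so far */

/* Hypothetical stand-in for CMOS_READ: 0xbf while the assumed bad
 * window lasts, then a fixed plausible BCD value. */
static unsigned char fake_cmos_read(int reg)
{
	(void)reg;
	if (reads_done++ < garbage_reads)
		return 0xbf;
	return 0x30;
}

/* Same loop shape as the boot code. Returns the number of passes and
 * stores the seconds value that the loop finally accepted. */
static int read_rtc_loop(unsigned char *sec_out)
{
	unsigned char sec, min, hour, day, mon, year;
	int passes = 0;

	do {
		sec = fake_cmos_read(RTC_SECONDS);
		min = fake_cmos_read(RTC_MINUTES);
		hour = fake_cmos_read(RTC_HOURS);
		day = fake_cmos_read(RTC_DAY_OF_MONTH);
		mon = fake_cmos_read(RTC_MONTH);
		year = fake_cmos_read(RTC_YEAR);
		passes++;
	} while (sec != fake_cmos_read(RTC_SECONDS));

	(void)min; (void)hour; (void)day; (void)mon; (void)year;
	*sec_out = sec;
	return passes;
}
```

If the bad window covers all seven reads of one pass (garbage_reads = 7),
the loop exits after a single pass with sec == 0xbf, because the first
and last reads of RTC_SECONDS agree. If the window ends partway through
(say garbage_reads = 3), the comparison fails and a second pass returns
good values. So in this model the check only catches a change during the
pass, not a pass that is bad from start to finish.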