Hi, I've been running an athlon64 in 64-bit mode without problems,
up to and including 2.6.19.1. A couple of weeks ago I decided to use
it for testing x86 builds, since then it's been nothing but trouble
in 32-bit mode. It still works fine when I boot it in 64-bit mode.

I already had a 32-bit system on the disk, but it was somewhat old
(gcc-3.4.6, udev from a long while ago, glibc-2.3.4) so I wasn't
totally surprised when it started to spontaneously reboot.

Eventually, I installed a recent system to build a fresh 32-bit
system. Still suffered from reboots - sometimes within a few
minutes of booting, sometimes it was fine for hours. Tried various
versions of 2.6.18.x, eventually thought it was ok, built my new
system in several stages. On Saturday it was running the fresh system
under 2.6.18.6 for most of the day without problems (although
admittedly the first part was from the console, and only the last 2
or 3 hours were running X). Yesterday I left it building arts, and
it rebooted. It was then able to finish building much of kde, and
then built 2.6.19.1. Booted into 2.6.19.1, spent several hours using
the desktop and running compiles and tests.

Today, in 2.6.19.1, the keyboard LEDs for caps and scroll lock
started flashing about 30 minutes after I'd booted it, so I guess it
had oopsed. Unfortunately, nothing from the oops made it to the
logs, although SysRq+b worked, so I guess I need to look at the
SysRq options before it happens again.

So, at the moment I've still got nothing in the logs from any of
this, and it isn't predictable. This all happens while running X
(originally 6.8.2, now 7.1). I'm beginning to despair of getting
any indications about what is going wrong. Any suggestions, please?

Current ver_linux and config follow.
Ken
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux bluesbreaker 2.6.19.1 #1 PREEMPT Sun Dec 31 17:44:47 GMT 2006 i68...
Use work_on_cpu instead of cpumask games.
Doug: note one subtle change. If we can't get to CPU 0 for some
reason, the return will be -EINVAL not -EBUSY.

Doug: want me to fix this in the caller? ie:
return work_on_cpu(0, generate_smi, smi_cmd) == 0 ? 0 : -EBUSY;
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Douglas_Warzecha@dell.com
---
drivers/firmware/dcdbas.c | 46 +++++++++++++++++++---------------------------
1 file changed, 19 insertions(+), 27 deletions(-)

diff -r 155c7f2e2e30 drivers/firmware/dcdbas.c
--- a/drivers/firmware/dcdbas.c Thu Oct 23 11:22:04 2008 +1100
+++ b/drivers/firmware/dcdbas.c Thu Oct 23 11:23:38 2008 +1100
@@ -237,31 +237,9 @@ static ssize_t host_control_on_shutdown_
return count;
}

-/**
- * smi_request: generate SMI request
- *
- * Called with smi_data_lock.
- */
-static int smi_request(struct smi_cmd *smi_cmd)
+static long generate_smi(void *_smi_cmd)
{
- cpumask_t old_mask;
- int ret = 0;
-
- if (smi_cmd->magic != SMI_CMD_MAGIC) {
- dev_info(&dcdbas_pdev->dev, "%s: invalid magic value\n",
- __func__);
- return -EBADR;
- }
-
- /* SMI requires CPU 0 */
- old_mask = current->cpus_allowed;
- set_cpus_allowed_ptr(current, &cpumask_of_cpu(0));
- if (smp_processor_id() != 0) {
- dev_dbg(&dcdbas_pdev->dev, "%s: failed to get CPU 0\n",
- __func__);
- ret = -EBUSY;
- goto out;
- }
+ struct smi_cmd *smi_cmd = _smi_cmd;

/* generate SMI */
asm volatile (
@@ -273,10 +251,24 @@ static int smi_request(struct smi_cmd *s
"c" (smi_cmd->ecx)
: "memory"
);
+ return 0;
+}

-out:
- set_cpus_allowed_ptr(current, &old_mask);
- return ret;
+/**
+ * smi_request: generate SMI request
+ *
+ * Called with smi_data_lock.
+ */
+static int smi_request(struct smi_cmd *smi_cmd)
+{
+ if (smi_cmd->magic != SMI_CMD_MAGIC) {
+ dev_info(&dcdbas_pdev->dev, "%s: invalid magic value\n",
+ __func__);
+ return -EBADR;
+ }
+
+ /* SMI requires CPU 0 ...
A shot in the dark at the spontaneous reset issue, but no clue on the 32 vs 64-bit observation...
See if ACPI exports any temperature readings under /proc/acpi/thermal_zone/*/temperature
and if so, keep an eye on them to see if there is an indication of a thermal problem.

(And if ACPI doesn't, maybe lmsensors can find something.)
cheers,
-Len
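
A minimal Python sketch of the check suggested above, for anyone who wants to poll the readings rather than cat them by hand. It assumes the procfs ACPI layout named in the message, where each zone's temperature file holds a line such as "temperature:             45 C". The function name read_thermal_zones and its base parameter are illustrative only, not part of any existing tool.

```python
import re
from pathlib import Path

# Assumed line format of the procfs ACPI interface: "temperature:   45 C"
_TEMP_LINE = re.compile(r"temperature:\s*(-?\d+)\s*C")


def read_thermal_zones(base="/proc/acpi/thermal_zone"):
    """Return {zone_name: degrees_C} for every zone that exposes a reading.

    An empty dict means ACPI exports no temperature readings under `base`.
    """
    readings = {}
    base_path = Path(base)
    if not base_path.is_dir():
        return readings
    for temp_file in sorted(base_path.glob("*/temperature")):
        match = _TEMP_LINE.search(temp_file.read_text())
        if match:
            # The zone name is the directory that holds the temperature file.
            readings[temp_file.parent.name] = int(match.group(1))
    return readings
```

Calling read_thermal_zones() every few seconds during a long compile and logging the result to disk would show whether the temperature climbs shortly before a reset.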
-
Thanks, but there is nothing there. I never managed to get
lmsensors configured (as in 'calibrated') for the hardware I tried it
on, but I was starting to think about retrying it. But first, I'm
just about to start testing with memtest86+ in case something in the
memory has gone bad.

ĸen
--
das eine Mal als Tragödie, das andere Mal als Farce
-
You might remove and re-insert the DIMMS.
Sometimes there are poor contacts if the DIMMS are not fully seated and clicked in.

The real mystery is the 32 vs 64-bit thing.
Are the devices configured the same way -- ie are they both in IOAPIC mode
and /proc/interrupts looks the same for both modes?

-Len
-
Too late, I've started memtest-86+. If it seems ok after an
overnight run, I'll take a look at /proc/interrupts. How can I tell
it is in IOAPIC mode, please ? Google was not helpful for this, but
if it's an override, the only things on my command lines are root=
and video= settings.

Certainly, it seems likely that the configs could be fairly
different in their detail.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce
-
(did anyone ever answer this?)
In IO-APIC mode, /proc/interrupts contains entries like these:
           CPU0       CPU1
  0:  121218123          0    IO-APIC-edge  timer
  1:     715259          0    IO-APIC-edge  i8042
  6:          5          0    IO-APIC-edge  floppy
  7:          0          0    IO-APIC-edge  parport0
  9:          0          0   IO-APIC-level  acpi
 12:   10011272          0    IO-APIC-edge  i8042
 14:   11561548          0    IO-APIC-edge  ide0
 66:    4525183          0         PCI-MSI  libata
 74:       1711          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb6
 82:          4          0   IO-APIC-level  ohci_hcd:usb2, ohci_hcd:usb3, ohci_hcd:usb4, ohci_hcd:usb5
 98:     101326          0         PCI-MSI  HDA Intel
106:   17747181          0         PCI-MSI  eth0
169:          0          0   IO-APIC-level  uhci_hcd:usb9
177:          3          0   IO-APIC-level  ohci1394
185:         15          0   IO-APIC-level  uhci_hcd:usb8, aic79xx
193:     427962          0   IO-APIC-level  uhci_hcd:usb7, aic79xx

If not in IO-APIC mode, lots of those will say "XT-PIC" instead
---
~Randy
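
To compare the two boots without eyeballing the listing, the rule of thumb above can be sketched in Python. This is a rough sketch only: it assumes the controller column uses the "IO-APIC-...", "XT-PIC" and "PCI-MSI" labels seen in this thread, and the function names interrupt_controllers and in_ioapic_mode are made up for illustration.

```python
from collections import Counter

# Controller labels as they appear in the /proc/interrupts listing above.
KINDS = ("IO-APIC", "XT-PIC", "PCI-MSI")


def interrupt_controllers(proc_interrupts_text):
    """Count numbered IRQ lines per controller kind."""
    counts = Counter()
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        # Keep only numbered IRQ lines such as "14:"; skip the CPU header, NMI:, ERR: etc.
        if not fields or not fields[0].rstrip(":").isdigit():
            continue
        for field in fields[1:]:
            # "IO-APIC-edge" and "IO-APIC-level" both fold into "IO-APIC".
            kind = next((k for k in KINDS if field.startswith(k)), None)
            if kind is not None:
                counts[kind] += 1
                break
    return counts


def in_ioapic_mode(proc_interrupts_text):
    """Heuristic: at least one IRQ line is routed through the IO-APIC."""
    return interrupt_controllers(proc_interrupts_text)["IO-APIC"] > 0
```

Running it on the text of /proc/interrupts saved from the 32-bit and the 64-bit boot, and comparing the two Counter results, gives a quick answer to whether both kernels route interrupts the same way.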
-
I eventually found it ("Local APIC support on uniprocessors") in
menuconfig. In the meantime, I'd moved my 32-bit activity to a
different box (also athlon64, but a bit faster) and I had one oops
on that. At least, I assume it was an oops - the caps and scroll
LEDs flashed, but I couldn't do anything with MagicSysrq, not even
force a reboot. Ran diff on the various configs, changed to IO-APIC
plus an unrelated change to use libata for the cdrom. The faster box
_seems_ stable (used for a couple of hours, and then for a whole day)
so I'm back on the original problem machine.

Last night I reconfigured the kernel (select X86_UP_APIC, deselect
ACPI_VIDEO [ had been a module ], select ACPI_DEBUG, select PCI_MSI
(had been on in my 64-bit configs), removed some ATA/ATAPI drivers I
didn't need). I was running on the 'old' 2.6.19.1 while I built it,
and again got the flashing LEDs after the build, but nothing logged
although I was able to force a reboot with SysRq b.

I guess that when it does have problems, it is mostly within 30
minutes of booting - otherwise, it can be up all day. So, for the
moment I'm hopeful that changing the config will help, but it will
be several days before I feel at all confident.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce
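
The config comparison described above ("Ran diff on the various configs") can be done option by option instead of line by line, which avoids noise from reordered or commented lines. The following Python sketch assumes the usual .config syntax ("CONFIG_FOO=y" and "# CONFIG_FOO is not set") and treats an option missing from a file as unset; parse_config and diff_configs are hypothetical helper names.

```python
def parse_config(text):
    """Map option name -> value ('y', 'm', a literal, or 'n' when explicitly unset)."""
    options = {}
    unset_suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(unset_suffix):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO
            options[line[2:-len(unset_suffix)]] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs."""
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        # Assumption: an option absent from one file counts as unset ('n').
        before, after = old.get(name, "n"), new.get(name, "n")
        if before != after:
            changes[name] = (before, after)
    return changes
```

Feeding it the 32-bit and 64-bit configs would list differences such as X86_UP_APIC or PCI_MSI directly, which is the kind of detail the reconfiguration above was chasing.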
-
Update: it doesn't seem to relate to using/not using APIC.
Last Thursday it twice failed during booting (from cold), with
messages all over the screen. Again, nothing reached the logs
despite my best attempts. The second time, I scrolled back to try
to copy it by hand, but there were several sets of messages and I'm
not sure if I'd managed to get to the first of them, or if that had
dropped out. It seemed to be something to do with highmem (also
noticed highmem in the screen messages the first time). I tried to
copy it by hand, but it all scrolled out before I'd copied anything
useful as a load of 'atkbd.c Spurious ACK on isa0060/serio' messages
appeared.

Today, I've built 2.6.19.2 without highmem (the box only has 1GB,
dunno why I'd included that in the original config) and I will
continue to wait patiently for either a week without problems, or
something that I can manage to note - although I think at the moment
that the second coming of the great prophet Zarquon is more likely.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce
-
Bizarre - it panic'd again last Thursday while I was in X, but I
still didn't manage to log any output. At the weekend, I had the
bright idea of using chattr +j on the syslog to try to journal any
data, and since then it has been fine. So, it isn't down to highmem, and
I still can't trigger it reliably, or get any trace. Tried running
as x86_64 this morning (because cold starts on Thursdays seem
particularly problematic, perhaps it's a time/power-supply-noise
problem), then x86 from a cold start this afternoon.

Time to hope it won't bite me too often, and move on to testing
2.6.20-rc6.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce
-
Try disabling preempt.
-
Thanks for the suggestion - any particular reason ? That box is
running as 64-bit at the moment, and likely to stay that way for the
rest of this week while it builds userspace, but I'm not averse to
running 32-bit on it again if it serves a purpose.

However, in the vain hope I can actually get it to log something,
wouldn't it be more useful to continue running _with_ preempt ?

I'm starting to think it might have been yet another victim of the
solar flares, and the fact it was always running a 32-bit kernel
could have been coincidence.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce
-
Obviously papering over a severe bug, but why is it necessary for you to run a
32bit kernel to test 32bit userspace? If your 64bit kernel is stable, use the
IA32 emulation surely?

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-
My 64-bit is pure64 on this machine, so it doesn't have any
suitable libs or tools. Anyway, I really do need a 32-bit kernel
to test some linuxfromscratch build instructions.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce
-
Sorry, I think last night is still interfering with my own logic
circuits. Yes, I could use 'linux32' to change the personality as a
work-around now that I've built the system. Mainly, I was hoping
somebody would notice something bad in the config, but I might use
the work-around in the meantime. Thanks for reminding me about it.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce
-
Personally when I built an embedded LFS for a customer, I wrote a dummy "arch"
and "uname" and then bootstrapped the 32bit LFS book, then built a cross
compiler with the CLFS book and built a 64bit kernel. Seemed to work okay.

However, there isn't 100% compatibility in a 64bit kernel for all syscalls, I
think one of the VFAT syscall wrappers is currently broken.

[ 5807.639755] ioctl32(war3.exe:4998): Unknown cmd fd(9) cmd(82187201){02}
arg(00221000) on /home/alistair/.wine/drive_c/Program Files/Warcraft III

Other than that, I've had no problem with running a purely 32bit userspace.
--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-
