Re: [i2c] Various problems on Axis 700 Lite VIA C7

Previous thread: [OOPS] AXIS 700 Lite (VIA C7 CPU) BUG with 2.6.23-rc9-git (i2c) by Guennadi Liakhovetski on Thursday, October 4, 2007 - 6:46 pm. (4 messages)

Next thread: race with page_referenced_one->ptep_test_and_clear_young and pagetable setup/pulldown by Jeremy Fitzhardinge on Thursday, October 4, 2007 - 9:43 pm. (12 messages)
To: <linux-kernel@...>
Cc: <linux-usb-devel@...>
Date: Thursday, October 4, 2007 - 7:19 pm

Booting a git snapshot from about 6 hours ago, I get the following:

USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.0 disabled
uhci_hcd 0000:00:10.0: init 0000:00:10.0 fail, -16
uhci_hcd: probe of 0000:00:10.0 failed with error -16
ACPI: PCI Interrupt 0000:00:10.1[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.1 disabled
uhci_hcd 0000:00:10.1: init 0000:00:10.1 fail, -16
uhci_hcd: probe of 0000:00:10.1 failed with error -16
ACPI: PCI Interrupt 0000:00:10.2[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.2 disabled
uhci_hcd 0000:00:10.2: init 0000:00:10.2 fail, -16
uhci_hcd: probe of 0000:00:10.2 failed with error -16
ACPI: PCI Interrupt 0000:00:10.3[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.3 disabled
uhci_hcd 0000:00:10.3: init 0000:00:10.3 fail, -16
uhci_hcd: probe of 0000:00:10.3 failed with error -16
ACPI: PCI Interrupt 0000:00:10.4[C] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.4 disabled
ehci_hcd 0000:00:10.4: init 0000:00:10.4 fail, -16
ehci_hcd: probe of 0000:00:10.4 failed with error -16
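The probe failures above all report error -16. Kernel code returns negative errno values, so the number can be decoded with the host's errno table. Below is a minimal Python sketch, assuming a Linux host (errno numbering is platform-specific). `decode_kernel_error` is a hypothetical helper, not a kernel or library API:

```python
import errno
import os


def decode_kernel_error(code: int) -> str:
    """Map a negative kernel return value such as -16 to its errno name and text.

    Assumes the host's errno numbering matches Linux (true on a Linux host).
    """
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{code} = -{name} ({os.strerror(num)})"


print(decode_kernel_error(-16))
```

On a Linux host this prints `-16 = -EBUSY (Device or resource busy)`.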

With "pci=routeirq" the result is the same, except that it's "IRQ 17" instead of 18,
and the line

ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21

is missing. It works with the Debian etch default 2.6.18 kernel. /proc/interrupts under
.23-rc9-...:

$ cat /proc/interrupts
CPU0
0: 31756 IO-APIC-edge timer
1: 2 IO-APIC-edge i8042
8: 1 IO-APIC-edge rtc
9: 0 IO-APIC-fasteoi acpi
12: 4 IO-APIC-edge i8042
16: 2627 IO-APIC-fasteoi sata_via
19: 472 IO-APIC-fasteoi eth0
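To compare which drivers actually registered interrupt handlers under the two kernels, a /proc/interrupts listing can be reduced to an IRQ-to-handler map. This is a minimal Python sketch that assumes the single-CPU, simplified column layout pasted above (real files have one count column per CPU and may contain non-numeric rows such as NMI). `parse_interrupts` is a hypothetical helper:

```python
def parse_interrupts(text: str) -> dict:
    """Return {irq_number: handler_names} for the numeric IRQ rows of a single-CPU listing."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the "CPU0" header, blank lines and non-numeric rows like "NMI:".
        if not fields or not fields[0].rstrip(":").isdigit():
            continue
        irq = int(fields[0].rstrip(":"))
        # Assumed layout: "<irq>: <count> <chip-type> <handler(s)>"
        result[irq] = " ".join(fields[3:])
    return result


# The 2.6.23-rc9 listing from the report above.
sample = """CPU0
0: 31756 IO-APIC-edge timer
1: 2 IO-APIC-edge i8042
8: 1 IO-APIC-edge rtc
9: 0 IO-APIC-fasteoi acpi
12: 4 IO-APIC-edge i8042
16: 2627 IO-APIC-fasteoi sata_via
19: 472 IO-APIC-fasteoi eth0"""

irqs = parse_interrupts(sample)
# Any uhci_hcd/ehci_hcd handler would contain "hci" in its name.
usb = {irq: h for irq, h in irqs.items() if "hci" in h}
print(usb)
```

With this listing `usb` is empty, and IRQ 18, which ACPI assigned to the USB controllers in the log, does not appear at all.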

Under 2.6.18:

A...

To: Guennadi Liakhovetski <g.liakhovetski@...>
Cc: <linux-kernel@...>, <linux-usb-devel@...>
Date: Friday, October 5, 2007 - 10:25 am

What do you get with CONFIG_USB_DEBUG enabled?

Alan Stern

-

To: Greg KH <gregkh@...>, Alan Stern <stern@...>
Cc: Jean Delvare <khali@...>, <linux-kernel@...>, <i2c@...>, <linux-usb-devel@...>
Date: Friday, October 5, 2007 - 4:22 pm

Hi

OK, after a day of bisecting, it turns out to be a compiler problem.
gcc-3.3.5 produces at least these two problems (an Oops on i2c-viapro probe
and disabled IRQs in USB), whereas 4.1.2 has shown no problems so far. Up to
now, 3.3.5 had no trouble compiling 2.6.20+ kernels here, for example, for a
P-II SMP machine. Does it look at all realistic that such "random" run-time
problems are caused by a miscompilation?...

Thanks
Guennadi
---
Guennadi Liakhovetski
-

To: Guennadi Liakhovetski <g.liakhovetski@...>
Cc: Greg KH <gregkh@...>, Alan Stern <stern@...>, <linux-kernel@...>, <i2c@...>, <linux-usb-devel@...>
Date: Friday, October 5, 2007 - 5:13 pm

Hi Guennadi,

Miscompilation can do just about anything. There have been a number of other
reports about compiler issues lately. Ingo Molnar here:
http://kerneltrap.org/Linux/Compiler_Optimization_Bugs_and_World_Domination
Me here:
http://marc.info/?l=linux-kernel&m=119127234804440&w=2

The trend I am seeing is that we are optimizing for, and testing with,
recent compilers (gcc 4.1 and later) and that older compilers tend to
break, even though compilers as old as gcc 3.2 are still supposed to be
supported. Not good.

--
Jean Delvare
-

To: Guennadi Liakhovetski <g.liakhovetski@...>
Cc: Greg KH <gregkh@...>, Alan Stern <stern@...>, Jean Delvare <khali@...>, <linux-kernel@...>, <i2c@...>, <linux-usb-devel@...>
Date: Friday, October 5, 2007 - 4:47 pm

I can't speak about compilers (though it still looks somewhat possible to me),
but I can say a bit about the platform/CPU. You can find my thread titled
"VIA C7 anyone" from several months back in the archives; that was my first
experience. Since then, I have received several emails from others with similar
problems.

The thing is that at least the boards I'm using (though I suspect it's the CPU,
not the board) are somewhat... flaky, so to say, and their reliability (or
even ability to work) depends on several factors, ranging from production
conditions (the environment at the time they were produced) to various
thermal factors.

It seems that quite a significant percentage of C7-based boards are
flaky/unreliable; replacing one with another from the same batch usually
fixes the problem.

Next, some boards are VERY sensitive to temperature, and their thermal
sensors are WRONG, it seems, 100% of the time, showing ~20% less
temperature than there really is (say, when the sensor shows 35 degrees
Celsius, the temperature is really about 45..50 degrees). When the temperature
(on SOME samples) rises above 40 degrees, the system becomes unreliable
and may crash randomly here and there.

Moreover, due to the geometry of the CPU chip, with a very small area that
touches the heatsink and a relatively large heatsink, it is sometimes enough
just to touch the heatsink to shift it out of position, leaving bad thermal
contact between the CPU and the heatsink and resulting in high temperatures
and system instability.

And even more interesting: it seems that some instruction sequences are
misinterpreted more frequently (under the "abnormal" conditions above)
than other sequences doing the same thing. That is, the same program
compiled with gcc-3.4 may work almost 100% correctly, while the same thing
compiled with gcc-4.1 may fail almost always (usually with a segmentation
fault), or exactly the opposite.

So umm.... ;)

I'm running a VIA C7 on the very machine I'm typing on right now, not a
single glitch since the time I...

To: Michael Tokarev <mjt@...>
Cc: Greg KH <gregkh@...>, Alan Stern <stern@...>, Jean Delvare <khali@...>, <linux-kernel@...>, <i2c@...>, <linux-usb-devel@...>
Date: Friday, October 5, 2007 - 5:30 pm

On Sat, 6 Oct 2007, Michael Tokarev wrote:

Hm, well, for testing I could only compile a kernel for an i686 with an
Intel chipset, with an "otherwise the same" config, using these two compiler
versions...

Well, this very system has been running git and compiling kernels for
itself the whole day today without a single issue. The kernel miscompiled
by gcc-3.3.5 was built on another machine. So, I hope my specific
sample is stable. And I need it to be stable, because I'm going to run my
mail server on it... BTW, I compiled a tickless kernel on it: so far,
without looking into user space, after 12 min of uptime there have been
40290 timer interrupts, i.e., 17Hz, not bad.

As for sensors: my system seems to have a w83627ehf chip, and the "sensors"
output (under 2.6.22) looks pretty funny too:

# sensors
w83627ehf-i2c-9191-290
ERROR: Can't get adapter or algorithm?!?
VCore: +0.98 V (min = +0.00 V, max = +1.74 V)
in1: +12.41 V (min = +3.17 V, max = +9.24 V) ALARM
AVCC: +3.26 V (min = +3.82 V, max = +1.79 V) ALARM
3VCC: +3.26 V (min = +2.86 V, max = +1.49 V) ALARM
in4: +1.54 V (min = +1.38 V, max = +1.46 V) ALARM
in5: +1.59 V (min = +2.04 V, max = +0.95 V) ALARM
in6: +4.71 V (min = +4.48 V, max = +3.05 V) ALARM
VSB: +3.26 V (min = +4.08 V, max = +4.08 V) ALARM
VBAT: +3.20 V (min = +3.57 V, max = +3.02 V) ALARM
in9: +1.59 V (min = +2.04 V, max = +2.01 V) ALARM
Case Fan: 0 RPM (min = 3668 RPM, div = 16) ALARM
CPU Fan: 0 RPM (min = 4440 RPM, div = 16) ALARM
Aux Fan: 0 RPM (min = 3125 RPM, div = 16) ALARM
fan5: 0 RPM (min = 0 RPM, div = 8)
Sys Temp: +39C (high = -6C, hyst = -2C) ALARM
CPU Temp: +43.0C (high = +100.0C, hyst = +95.0C)
AUX Temp: +42.5C (high = +100.0C, hyst = +95.0C)

Maybe at least the CPU Temp correlates with the real value :-) Yes,
I'll check in the BIOS next time I boot with a monitor and a keyboard
connected.
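Part of what makes this output look odd is that several voltage channels have a min limit above their max limit, so every reading falls outside such an inverted window. This minimal Python sketch flags those channels. It assumes the exact one-line "name: value V (min = ..., max = ...)" format pasted above. `inverted_limits` and `VOLT_RE` are hypothetical names, not part of lm-sensors:

```python
import re

# Matches lines such as "AVCC: +3.26 V (min = +3.82 V, max = +1.79 V) ALARM"
VOLT_RE = re.compile(
    r"^(?P<name>[^:]+):\s+(?P<val>[+-][\d.]+) V\s+"
    r"\(min = (?P<min>[+-][\d.]+) V, max = (?P<max>[+-][\d.]+) V\)"
)


def inverted_limits(text: str) -> list:
    """Return the names of voltage channels whose min limit exceeds the max limit."""
    bad = []
    for line in text.splitlines():
        m = VOLT_RE.match(line.strip())
        if m and float(m["min"]) > float(m["max"]):
            bad.append(m["name"])
    return bad


# Voltage lines from the w83627ehf output above.
sample = """VCore: +0.98 V (min = +0.00 V, max = +1.74 V)
in1: +12.41 V (min = +3.17 V, max = +9.24 V) ALARM
AVCC: +3.26 V (min = +3.82 V, max = +1.79 V) ALARM
3VCC: +3.26 V (min = +2.86 V, max = +1.49 V) ALARM
in4: +1.54 V (min = +1.38 V, max = +1.46 V) ALARM
in5: +1.59 V (min = +2.04 V, max = +0.95 V) ALARM
in6: +4.71 V (min = +4.48 V, max = +3.05 V) ALARM
VSB: +3.26 V (min = +4.08 V, max = +4.08 V) ALARM
VBAT: +3.20 V (min = +3.57 V, max = +3.02 V) ALARM
in9: +1.59 V (min = +2.04 V, max = +2.01 V) ALARM"""

print(inverted_limits(sample))
```

For these lines it reports AVCC, 3VCC, in5, in6, VBAT and in9. The in1 and in4 alarms come from readings above a sane max limit instead.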

Thanks for the info!
Guennadi
---
Guennadi Liakhovetski
-

To: Michael Tokarev <mjt@...>
Cc: <linux-usb-devel@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Alan Stern <stern@...>, <i2c@...>
Date: Saturday, October 6, 2007 - 11:10 am

No, that doesn't work. I forgot that on that PC I can only boot with
"acpi=noirq", so the whole ACPI IRQ-mapping code is not used. Apart from
that, I did build such a kernel for that PC and noticed no problems. So either
the only two "miscompiled" places were i2c-viapro and ACPI IRQ routing, or
it indeed only triggers problems on the C7...

Thanks
Guennadi
---
Guennadi Liakhovetski
-

To: Alan Stern <stern@...>
Cc: <linux-kernel@...>, <linux-usb-devel@...>
Date: Friday, October 5, 2007 - 10:50 am

I'll try as soon as my bisect is done. Interestingly, both problems on
this system (this one and http://lkml.org/lkml/2007/10/4/417) have so far
regressed together: even somewhere after 23-rc6, both USB and i2c-viapro
still work... It might also be some configuration options that got
lost while bisecting .22 - .23-rc9.

Thanks
Guennadi
---
Guennadi Liakhovetski
-
