[attached the DSDT.dsl file fyi]

Alan, thank you for the pointers. I have been doing variations on this testing theme for a while - I get intrigued by a good debugging challenge, and after all it's my machine...

Two relevant new data points, and then some more suggestions:

1. It appears to be a real port. SMI traps are not happening in the normal outb to 80. Hundreds of them execute perfectly with the expected instruction counts. If I can trace the particular event that creates the hard freeze (getting really creative, here) and stop before the freeze disables the entire computer, I will. That may be an SMI, or perhaps any other kind of interrupt or exception. Maybe someone knows how to safely trace through an impending SMI while doing printk's or something?

2. It appears to be the standard POST diagnostic port. On a whim, I disassembled my DSDT code, and studied it more closely. It turns out that there are a bunch of "Store(..., DBUG)" instructions scattered throughout, and when you look at what DBUG is defined as, it is defined as an IO Port at IO address DBGP, which is a 1-byte value = 0x80. So the ACPI BIOS thinks it has something to do with debugging. There's a little strangeness here, however, because the value sent to the port occasionally has something to do with arguments to the ACPI operations relating to sleep and wakeup ... could just be that those arguments are distinctive.

In thinking about this, I recognize a couple of things. ACPI is telling us something when it declares a reference to port 80 in its code. It's not telling us the function of this port on this machine, but it is telling us that it is being used by the BIOS. This could be a reason to put out a printk warning message... 'warning: port 80 is used by ACPI BIOS - if you are experiencing problems, you might try an alternate means of iodelay.'

Second, it seems likely that there are one of two possible reasons that the port 80 writes cause hang/freezes:

1. buffer overflow ...
That does imply some muppet 'extended' the debug interface for power management on your laptop. Also pretty much proves that for such systems we do have to move from port 0x80 to another delay approach. Ingo - the fact that so many ISA bus devices need _p to mean "ISA bus clocks" says to me we should keep the _p port 0x80 using variant for old systems/device combinations (eg ISA ethernet cards) which won't show up in any problem system (we know this from 15 odd years of testing), but stop using it for PCI and embedded devices on modern systems. Alan --
yes, ISA is fragile, and no way do we want to remove the delay, but are there strong counter-arguments against doing the clean thing and adding an udelay(2) (or udelay(1)) to replace those _p() uses in ISA drivers? That removes the global effect once and forever. Initially for standalone drivers without early bootup functionality, not platform drivers that might need to run before we have calibrated udelay. if someone runs a fresh new kernel on an ancient device then timings _will_ change a bit, no matter what we do. Alignments change, the compiler output will change (old compilers get deprecated so a new compiler might have to be picked), cache effects change - and this is inevitable. The important thing is to not eliminate the delays - but we sure dont have to keep them cycle accurate (we couldnt even if we wanted to). The only way to get the _exact same_ behavior is to not change the kernel at all. Ingo --
why should it be significantly slower? Ingo --
On Tue, 1 Jan 2008 19:46:59 +0100 out 80h, al is only two bytes. Any alternative that has been suggested in this discussion will use more space. mov dx, alt_port; out dx, al will be larger, a function call will definitely be a lot larger. People have been making changes to the kernel to save a couple of hundred bytes of text size. On old hardware (or anything with an ISA bus which I'd guess includes the Geode SCx200 SoC which is basically a MediaGX processor, a southbridge and an ISA bus with a Super I/O chip on it) an out to 80h will use exactly one ISA cycle. A call to udelay will need a margin, so it will be slightly slower. And that's assuming that you can find out the speed of the ISA bus, if you can't you'll have to assume the slowest possible bus (6 MHz I guess) which will be a lot slower. I don't know if the difference in code size or the udelay will be significantly slower, but I think it might be. And to take the MediaGX as an example, the TSC is not usable on that CPU, so Linux has to use the PIT timer for gettimeofday. As I wrote in a different post, I believe the PIT on the SCx200 needs outb_p to work reliably. So if outb_p becomes significantly slower that will affect a critical path on a very common embedded CPU. I'm not sure what Alan meant with his comments about locking, but if changing outb_p to use an udelay means that we have to add locking, that is also going to affect the code size and speed. /Christer --
Not to disagree with the point but more like 8 (1 us at 8 MHz). It's the
There's also the bit about microseconds being very loosely defined pre loops_per_jiffy calibration. Per CPU-family init helps somewhat but certainly for family 6 (Pentium Pro, II, III -- lots of hardware with ISA
Explained here: http://lkml.org/lkml/2007/12/30/136
However, that's not an argument. Missing locking is a bug, and current outb I/O delay use hiding it doesn't change that.
Rene. --
On Tue, 01 Jan 2008 20:59:20 +0100 Thanks, I had missed that one. I'm afraid that some PC104 systems may still use ancient video cards. /Christer --
PC/104 is actual ISA, not even LPC... -hpa --
You missed a word "wrongly". It has been "wrongly stated" I've been going through the ISA cases which are the majority. Generally speaking they are correct. We have a couple of "interesting" PCI users who most definitely want udelay() or removal of _p. We have various chipset cases which want looking at in detail. The ISA drivers however If you use wall clock timings it will be slower. Alan --
On Tue, 1 Jan 2008 22:01:43 +0100
And once again, the _p in the code that talks to the PIT is very much
non-bogus. And it is a critical path that's called a lot. The i8253
PIT and the i8259 interrupt controller are probably the only ones that
are relevant on a modern machine, and it seems that even some fairly
modern chipsets have limitations on how fast you can drive them.
BTW, I just checked the Intel M8253 data sheet (dead tree variant), and
it says under A.C Characteristics, READ CYCLE:
Recovery Time Between /READ and Any Other Control Signal: 1 us
So at least for the original M8253 a udelay(1) might be more
appropriate than outb_p, since the delay is not expressed in clock
cycles but absolute time.
The data sheet for the Intel M8259A says:
End of /RD to Next Command: 300 ns
End of /WR to Next Command: 370 ns
On the other hand, I don't know how all the i8253/i8259 clones or the
numerous variants of Super I/O chips behave. It wouldn't surprise me
if some Super I/O chip uses the ISA bus clock to latch the values
I didn't say that, I said I'm afraid it will be slower. :-)
/Christer
--
I wouldn't even be surprised if most all would... Rene. --
I actually analyzed the case of the PIT in the case of the implementation of a real chipset. In our case, running the PIT at 1.19318 MHz when the rest of the chipset core was running at 100 MHz introduced a huge amount of extra complexity and we really wanted to get rid of it. As it turns out, the PIT interface is ill-defined if run at a higher frequency; you can get undefined values as a result of a write followed by a read if there is no intervening PIT clock, which of course in the standard interface never happens. So in the end, we had to build all the synchronizers, backpressure controls and other crap that went along with an additional clock domain. As a result of that experience, I really don't think you will *ever* see a PIT that runs at a modern frequency. Building a 100 MHz PIC, however, was not a problem, and being able to sink accesses at full speed meant we didn't have to implement flow control. -hpa --
If text size becomes a problem in this case, then we can use an alternatives-like mechanism to fix up the kernel. However, realistically this probably should be a function call *combined with* the out and in; that reduces the impact somewhat. -hpa --
On Tue, 01 Jan 2008 13:21:47 -0800 That's a very good point. So for the PIT it should be possible to have two clocksources, one with the _p and one without, that one can switch between with a kernel command line option. So there shouldn't be any slowdown at all due to that. The i8259 init code is not time critical, so should be able to use a "reasonable" delay. Besides the above there are only a handful of _p uses outside of real ISA device drivers, and those should not be relevant for a modern PC unless somebody wants to use an 8390 based PCMCIA card, but we could tell them "don't do that then". But I'd better shut up and let Alan continue on his review of the _p use in the drivers. /Christer --
We need to build 8390.c twice anyway - once for PCI once for ISA with the _p changes whichever way it gets done. PCMCIA can use whichever we decide is right. Anyone know if PCMCIA is guaranteed to be 8MHz ? --
On Tue, 1 Jan 2008 23:12:50 +0000 It's not. It's perfectly ok to drive a PCMCIA bus slower than that, IIRC we used a much slower clock speed than that on a StrongARM platform I worked on a couple of years ago. The PCMCIA CIS (Card Information Structure) allows the following device speeds: 100, 150, 200 and 250 ns. The memory card spec also allows 600 and 300 ns. The standard I/O card cycle speed is 255 ns. I believe that is "the shortest access time for a read/write cycle", and I can't tell if that is comparable to one ISA clock cycle or if it's comparable to 8 ISA bus cycles. On the other hand, there is no clock line in a PCMCIA connector, so for PCMCIA devices any delays should be absolute times, or based on some clock that is internal to the card. How that fits with the 8390 data sheet talking about bus clocks, I don't know. /Christer --
and that's exactly what x86.git#mm does now. Ingo --
#1 udelay has to be for the worst case bus clock (6MHz) while the device may be at 10MHz or even 12MHz ISA. So it slows stuff down unnecessarily - and stuff that really really is slow enough as is.
#2 Most of the ancient wind up relics with ISA bus don't have a tsc so their udelay value is kind of iffy.
#3 Not changing it is the lowest risk for a lot of the old ISA code that never occurs on newer boxes
If we have an isa_inb_p() as a specific statement of "I am doing an ISA bus dependent delay on ancient crap hardware" then we can avoid the risk of breakage. We wouldn't use it for non ISA, and certainly not for stuff like chipset logic which requires a more thorough fix as it occurs on all
ISA bus cycles are *slow*, the subtle processor cache and gcc triggered timing changes are lost in the noise.
Alan --
udelay is supposed to be reliable. If someone runs a new kernel and has no TSC (which might happen even on modern hardware or with notsc) _and_ finds that udelay is not calibrated well enough then that's a kernel bug Not changing the kernel _at all_ is what is the "lowest risk" option. If the kernel is changed, it should be tested - and if we have a buggy udelay, that should be fixed - because it could cause many other bugs in other drivers. yes, there are always risks in changing something, but using udelay is a gcc triggered timing changes can easily add up to a LOT more - especially if a loop is involved and especially on older hardware. Remember, 1 microsecond is just a handful of instructions on real old hardware. The kernel's timings are _not_ immutable, never were, never will be. Ingo --
On Tue, 1 Jan 2008 19:45:24 +0100 How do you find out the speed of the ISA bus? AFAIK there is no standardized way to do that. On the Geode SC2200 the ISA bus speed is usually the PCI clock divided by 4 giving 33MHz/4=8.3MHz or 30/4=7.5MHz, but with no external ISA devices it's possible to overclock the ISA bus to /3 to run it at 11MHz or so. But without poking at some CPU and southbridge specific registers to find out the PCI bus speed and the ISA bus divisor you can't really tell. So if you do udelay based on a 6MHz clock (I think you can safely assume that any 386 based system runs the ISA bus at least that fast) you'll waste at least 30% and maybe even 100% more time for the delay after every _p call. /Christer --
12MHz is valid for ISA although not a good idea - even IBM issued some systems with 12MHz ISA before discovering many vendors had assumed 8 was it. --
You miss the point entirely. The delay is in bus clocks not CPU clocks, not tsc clocks not PIT clocks, and it is permitted to vary by a factor of two. So you'll worst case halve the speed of network packet up/download
As you say - it's only a few instructions so small udelays tend to be
Not for ISA bus hardware. For chipset logic, for PCI yes - for ISA stuff no. It's all about ISA clocks not wall clocks.
Alan --
ok, you are right. How about we go with one of your suggestions: rename the API family to isa_*_p() in the affected ISA drivers? That makes it perfectly clear that this is an ISA related historic quirk that we just cannot properly emulate in an acceptable fashion. It will also make the least amount of changes to these truly historic drivers. The main maintenance thing we are interested in is to have no subsequent new uses of this API and to eliminate these accesses from modern hardware - and naming it clearly 'ISA' and making it dependent on CONFIG_ISA would likely achieve that purpose.

oh, another thing: there are 100+ mails in this thread while there are only 3 mails in the thread that lists 61 not-yet-fixed-in-2.6.24 regressions:

| Listed regressions statistics:
|
|   Date          Total  Pending  Unresolved
|   ----------------------------------------
|   Today           139       38          23

which is a sad proportion of attention :-/

Ingo --
FYI - another quirky Quanta motherboard from HP, with DMI readings reported to me.
-------- Original Message --------
Date: Wed, 2 Jan 2008 16:23:27 +1030
From: Joel Stanley <joel.stanley@adelaide.edu.au>
To: David P. Reed <dpreed@reed.com>
Subject: Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)
Using port80.c, I could hard lock a HP Pavilion tx1000 laptop on the Quanta 30BF
Tonight, I will try compiling a kernel with these values added to your patch.
Some history, feel free to ignore if it's not relevant: ubuntu feisty's 2.6.22 based kernel worked fine, iirc. We were having issues with sound, so tried fedora8's .23 based kernel, but this would sporadically hard lock. Ubuntu hardy's 2.6.24 appeared fine, for the 2 hours or so I used it last night, until using the port80.c program, obviously.
Cheers, Joel --
For no binary changes at all, and if going through all those outb_p() users
anyway, might/could as well just manually split them then:
outb_p() --> outb();
slow_down_io();
and then just leave out the slow_down_io() call in the non-ISA spots.
slow_down_io() could be renamed isa_io_delay() or anything (paravirt is a
little annoying there) if someone cares but then it's a complete identity
transformation for any driver that does care.
Would IMO also make for a somewhat better API than an isa_outb_p() as
there's nothing particularly ISA about the outb method itself -- many ISA
drivers use plain outb() as well.
Rene.
--
On 02-01-08 16:35, Rene Herman wrote:
> On 02-01-08 14:47, Alan Cox wrote:
>
>>> ok, you are right. How about we go with one of your suggestions:
>>> rename the API family to isa_*_p() in the affected ISA drivers? That
>>> makes it perfectly clear that this is an ISA related historic quirk
>>> that we just cannot properly emulate in an acceptable fashion. It
>>> will also make the least amount of changes to these truly historic
>>> drivers.
>>
>> Works for me. We need to build two versions of 8390.c now but thats no
>> big deal and sorts PCMCIA out too.
>
> For no binary changes at all, and if going through all those outb_p()
> users anyway, might/could as well just manually split them then:
>
> outb_p() --> outb();
> slow_down_io();
>
> and then just leave out the slow_down_io() call in the non-ISA spots.
> slow_down_io() could be renamed isa_io_delay() or anything (paravirt is
> a little annoying there) if someone cares but then it's a complete
> identity transformation for any driver that does care.
>
> Would IMO also make for a somewhat better API than an isa_outb_p() as
> there's nothing particularly ISA about the outb method itself -- many ISA
> drivers use plain outb() as well.
Would just need this bit of io.h arch unification from the original patch and
that's it:
diff --git a/include/asm-x86/io_64.h b/include/asm-x86/io_64.h
index a037b07..97cb8c6 100644
--- a/include/asm-x86/io_64.h
+++ b/include/asm-x86/io_64.h
@@ -35,13 +35,20 @@
* - Arnaldo Carvalho de Melo <acme@conectiva.com.br>
*/
-#define __SLOW_DOWN_IO "\noutb %%al,$0x80"
+static inline void native_io_delay(void)
+{
+ asm volatile("outb %%al,$0x80" : : : "memory");
+}
+static inline void slow_down_io(void)
+{
+ native_io_delay();
#ifdef REALLY_SLOW_IO
-#define __FULL_SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO
-#else
-#define __FULL_SLOW_DOWN_IO ...

Alan - in googling around the net yesterday looking for SuperIO chipsets that claim to support port 80, I have found that "blade" servers from companies like IBM and HP *claim* to have a system for monitoring port 80 diagnostic codes and sending them to the "drawer" management processor through a management backplane. This is a little puzzling, because you'd think they would have noticed port 80 issues, since they run Linux in their systems. Maybe not hangs, but it seems unhelpful to have a lot of noise spewing over a bus that is supposed to provide "management" diagnostics. Anyway, what I did not find was whether there was a particular chipset that provided that port 80 feature on those machines. However, if it's a common "cell" in a design, it may have leaked into the notebook market chipsets too.

Anyone know if the Linux kernels used on blade servers have been patched to not do the port 80 things? I don't think this would break anything there, but it might have been a helpful patch for their purposes. I don't do blades personally or at work (I focus on mobile devices these days, and my personal servers are discrete), so I have no knowledge. It could be that the blade servers have BIOSes that don't do POST codes over port 80, but send them directly to the "drawer" management bus, of course. --
Most of the chipsets let you turn it on and off so presumably the BIOS turns it off before running Linux. That's certainly done by several chipsets and we recently had a bug where a BIOS forgot to turn them off
I'm not aware of such, or requests for them.
Alan --
I have mentioned this before... I think writing zero to port 0xf0 would be an acceptable pause interface (to the extent where we need an I/O port) except on 386 with 387 present; on those systems we can fall back to 0x80. -hpa --
I see that with CR0.NE set (*) we indeed don't care about IGNNE#... However, I'm worried about this comment in arch/x86/kernel/i8259_32.c === /* * New motherboards sometimes make IRQ 13 be a PCI interrupt, * so allow interrupt sharing. */ === Is it really safe to just blindly negate IRQ13 on everything out there, from regular PC through funky embedded thingies? (*) bit 5: rene@7ixe4:~/src/local$ ./smsw msw: 0x3b Rene.
Well, on the PIIX it is and I guess on anything where it's _not_ fully internal an 0xf0 write wouldn't have any effect on IRQ13... When you earlier mentioned this it seemed 0xed switched on DMI would be good enough, but well. Alan, do you have an opinion on the port 0xf0 write? It should probably still be combined with a replacement/deletion for new machines due to the bus-locking "bad for real-time" thing you mentioned earlier but in the short run it could be a fairly low-impact replacement on anything except a 386+387 We should do another timing measurement survey and it makes for slightly worse code if we indeed feel it's not safe enough to write anything other than 0, but otherwise it's quite minimal. Rene. --
Thinking about this, my main worry about 0xf0 as a 0x80 replacement would be systems that have elected to _not_ let port 0xf0 writes flow through to ISA, changing the timing characteristics. Given that it's a known port, someone may have elected to just keep it fully internal. Up to now the datasheets I've read do put it on ISA... Rene. --
On Wed, 02 Jan 2008 00:11:54 +0100 Both 0xed and 0xf0 are mapped to internal functions on the AMD Elan SC400 processor. It is an AMD 486 based system on a chip and since AMD just knew that it would never have a math coprocessor, they reused the 0xf0-0xf2 range for the PCMCIA controller. I guess the AMD Elan SC500 will have similar problems. I seem to recall that back when I was working with the Elan SC400 (sometime around 1998?) there were discussions about finding an alternate delay port because outb to 0x80 messed up the debug port. I think the Elan stopped those discussions because just about every port on the Elan was reused for some alternate purpose. /Christer --
Okay, thanks much. So 0xf0 would be unusable on 386+387 and AMD Elan SC400 and could possibly change timing on an unknown number of systems due to not being put on the bus. 0x80 only fails for some recent HP laptops instead so it seems there would be not enough cause to go with 0xf0 instead of 0x80 as the default choice; if we're quirking around machines anyway it might as well be the DMI based quirking currently suggested. Rene. --
Yeah, the Elan is not supportable anyway without a CONFIG option (it's broken in so many ways), so it doesn't really apply. It's a fuckwit design. -hpa --
It was actually IBM who broke it with the 80286-based PC/AT because of the BIOS compatibility -- the vector #0x10 had already been claimed by the original PC for the video software interrupt call (apparently against Intel's recommendation not to use low 32 interrupt vectors for such purposes), so it could not have been reused as is for FP exception handling without breaking existing software. I suppose a more complicated piece of glue logic could have been used along the lines of what eventually went into the i486, but presumably the relatively low level of integration of the PC/AT made such additional circuits hard to justify even if it indeed was considered. Maciej --
Supposedly the reason was that the DOS-less "cassette BASIC" delivered by Microsoft used all the INT instructions except the reserved ones as a weird bytecode interpreter. Bill Gates was fond of that kind of hacks. -hpa --
There's nothing easier than always writing 0 to the 0x80 to check if it hangs in such case...? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
I did try that. Machine in question does hang when you write 0 to 0x80 in a loop a few thousand times. This particular suspicion was that the problem was caused by the following sort of thing (it's a multi-cpu system...) First, some ACPI code writes "meaningful value" X to port 80 that is sort of a "parameter" to whatever follows. Just because the DSDT disassembly *calls* it the DBUG port doesn't mean it is *only* used for debugging. We (Linux) use it for timing delays, after all... then Linux driver writes some random value (!=X) including zero to port 80. then ACPI writes some other values that cause SMI or some other thing to happen, There are experiments that are not so simple that could rule this particular guess out. I have them on my queue of experiments I might try (locking out ACPI). Of course if the BIOS were GPL, we could look at the comments, etc... I may today pull the laptop apart to see if I can see what chips are on it, besides the nvidia chipset and the processor. That might give a clue as to what SuperIO or other logic chips are there. --
