After much, much testing (months, off and on, pursuing hypotheses), I've discovered that the use of "outb al,0x80" instructions to "delay" after inb and outb instructions causes solid freezes on my HP dv9000z laptop when ACPI is enabled. It takes a fair number of outs to 0x80, but the hard freeze is reliably reproducible by writing a driver that solely does a loop of 50 outb's to 0x80 and calling it in a loop 1000 times from user space. The serious impact is that the /dev/rtc and /dev/nvram devices are very unreliable: "hwclock" freezes very reliably while looping waiting for a new second value, and calling "cat /dev/nvram" in a loop freezes the machine if done a few times in a row. This is reproducible, but requires a fair number of outb's to the 0x80 diagnostic port, and seems to require ACPI to be on. io_64.h is the source of these particular instructions, via the CMOS_READ and CMOS_WRITE macros, which are defined in mc146818_64.h. (I wonder if the same problem occurs in 32-bit mode.) I'm happy to complete and test a patch, but I'm curious what the right approach ought to be. I have to say I have no clue as to what ACPI is doing on this chipset (nvidia MCP51) that would make port 80 do this. A raw random guess is that something is logging POST codes, but if so, it's not clear what is problematic in ACPI mode. Any help/suggestions? Changing the delay instruction sequence from the outb to short jumps might be the safe thing. But Linus, et al. may have experience with that on other architectures like older Pentiums etc. --
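The reproduction described above has a simple shape: an inner burst of 50 writes to port 0x80, invoked 1000 times. A minimal userspace sketch of that shape follows. It is not the original driver, which was not posted in this thread. The names `port_write_fn`, `delay_burst`, `run_repro` and `counting_stub` are hypothetical, and the port write is passed in as a callback so the loop can be exercised with a harmless stub.

```c
#include <assert.h>
#include <stddef.h>

#define DELAY_PORT      0x80   /* legacy diagnostic port used for the _p delay */
#define WRITES_PER_CALL 50     /* burst size given in the report */
#define CALLS           1000   /* number of calls from user space in the report */

/* Hypothetical callback type: performs one byte write to an I/O port. */
typedef void (*port_write_fn)(unsigned char value, unsigned short port, void *ctx);

/* One "driver call": a burst of back-to-back writes to the delay port. */
static void delay_burst(port_write_fn write_port, void *ctx)
{
    for (int i = 0; i < WRITES_PER_CALL; i++)
        write_port(0, DELAY_PORT, ctx);
}

/* The user-space side: repeat the burst; returns the number of calls made. */
static long run_repro(port_write_fn write_port, void *ctx)
{
    long calls = 0;

    assert(write_port != NULL);
    for (int i = 0; i < CALLS; i++) {
        delay_burst(write_port, ctx);
        calls++;
    }
    return calls;
}

/* Harmless stub for dry runs: counts writes instead of touching hardware. */
static void counting_stub(unsigned char value, unsigned short port, void *ctx)
{
    (void)value;
    (void)port;
    ++*(long *)ctx;
}
```

To hit real hardware one would presumably pass a small wrapper around `outb()` from `<sys/io.h>` after a successful `ioperm(0x80, 1, 1)` as root, with the obvious risk of hanging an affected machine.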
Use a variable for the port and do an early quirk to change the port to something safe on your chipset? Ok there might be code using outb_p() before the early quirks, but should be possible to find using instrumentation. Also the port assignment might not be chipset specific, but BIOS specific, then you would need to match the DMI identifier. The disadvantage of that is that there are usually other BIOS I don't think that makes sense to do on anything modern. The trouble is that the jumps will effectively execute near "infinitely fast" on any modern CPU compared to the bus. But the delay really needs to be something that is about IO port speed. Ok in theory you could try to measure an outb using RDTSC and then use udelay, but first you would need a safe port for that already and then RDTSC is not necessarily constant. -Andi --
You don't need to. Port 0x80 historically is about 8uS so just udelay(8) and make sure the initial default delay is conservative enough before the CPU speed is computed. 0x80 should be fine for anything PC compatible anyway, it's specifically reserved as a debug port and supported for *exactly* that purpose by many chipsets. The afflicted laptop should really be taken up with the vendor. If it's got port 0x80 wrong god knows what else it might have problems with. Alan --
Actually, I've seen few pci cards with leds on port 0x80, and I wonder: is our outb_p really correct? I mean, we expect 8usec delay -- historical ISA timing -- but when _PCI_ card with leds is inserted, it is likely to be faster than old ISA, right? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Yes, i guess switching to udelay at least on newer systems would be a good idea. I'm not quite sure about systems without TSC though. -Andi --
Something like this? (Warning, will not probably even compile on x86-64, I do not have 64-bit compiler near me).
(I believe VGA cards do not need slow outputs, plus udelay is not available in uncompressor?)
Signed-off-by: Pavel Machek <pavel@suse.cz> [but it needs fixing x86-64]
Pavel
diff --git a/arch/x86/boot/compressed/misc_32.c b/arch/x86/boot/compressed/misc_32.c
index b74d60d..288e162 100644
--- a/arch/x86/boot/compressed/misc_32.c
+++ b/arch/x86/boot/compressed/misc_32.c
@@ -276,10 +276,10 @@ static void putstr(const char *s)
 	RM_SCREEN_INFO.orig_y = y;
 	pos = (x + cols * y) * 2;	/* Update cursor position */
-	outb_p(14, vidport);
-	outb_p(0xff & (pos >> 9), vidport+1);
-	outb_p(15, vidport);
-	outb_p(0xff & (pos >> 1), vidport+1);
+	outb(14, vidport);
+	outb(0xff & (pos >> 9), vidport+1);
+	outb(15, vidport);
+	outb(0xff & (pos >> 1), vidport+1);
 }
 static void* memset(void* s, int c, unsigned n)
diff --git a/include/asm-x86/io_32.h b/include/asm-x86/io_32.h
index fe881cd..944dc5f 100644
--- a/include/asm-x86/io_32.h
+++ b/include/asm-x86/io_32.h
@@ -3,6 +3,7 @@ #define _ASM_IO_H
 #include <linux/string.h>
 #include <linux/compiler.h>
+#include <linux/delay.h>
 /*
  * This file contains the definitions for the x86 IO instructions
@@ -17,17 +18,6 @@ #include <linux/compiler.h>
  * mistake somewhere.
  */
-/*
- * Thanks to James van Artsdalen for a better timing-fix than
- * the two short jumps: using outb's to a nonexistent port seems
- * to guarantee better timings even on fast machines.
- *
- * On the other hand, I'd like to be sure of a non-existent port:
- * I feel a bit unsafe about using 0x80 (should be safe, though)
- *
- * Linus
- */
-
 /*
  * Bit simplified and optimized by Jan Hubicka
  * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999.
@@ -252,7 +242,7 @@ #endif /* __KERNEL__ */
 static inline void native_io_delay(vo...
Alan, did you double-check that 8 us? I tried to but I seem to not have trustworthy documentation. Rene. --
I remember 16-bit CPU-driven ISA was able to do 2-3 MB/s transfers, that means at least 1 Maccesses/second = up to 1 microsecond/access. Perhaps IO ports accesses were slower than memory? But 8-12 times? Perhaps port 0x80 was using (slower) 8-bit timings? Bus-mastering ISA cards were able to do ca. 5 MB/s with 8 MHz (10 MHz?) clocking, some old machines didn't like it. Googling suggests that a slave access on 8-bit ISA bus was taking 6 cycles by default (including 4 wait states), 16-bit - 3 cycles (with 1 WS). Respectively 0.75 us and 0.375 us, and 0.25 us for 16-bit 0WS memory access (with standard 8 MHz clock). These values could be changed with BIOS setup, and devices could use 0WS or I/O CHRDY signals if they didn't like the defaults (dir 0WS mean 1 WS for 8-bit devices?). -- Krzysztof Halasa --
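The figures quoted above follow from dividing the number of bus cycles per access by the bus clock. A small sketch of that arithmetic, assuming the standard 8 MHz ISA clock mentioned in the message; the function name `isa_access_us` is made up for illustration, and the 2-cycle figure for the 0WS memory access is inferred from the quoted 0.25 us rather than stated in the message.

```c
#include <assert.h>

/* Duration of one bus access in microseconds: cycles / bus clock in MHz. */
static double isa_access_us(unsigned int cycles, double bus_clock_mhz)
{
    assert(bus_clock_mhz > 0.0);
    return (double)cycles / bus_clock_mhz;
}
```

With an 8 MHz clock, `isa_access_us(6, 8.0)` gives 0.75 (8-bit access, 4 wait states), `isa_access_us(3, 8.0)` gives 0.375 (16-bit access, 1 wait state) and `isa_access_us(2, 8.0)` gives 0.25.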
Where did the 8us delay come from? The documentation and source is careful not to say how long the delay is. Would changing it to, say 1us, be technically wrong? Is code that requires 8us correct? --
I think a single ISA bus transaction is 1 µs, so two of them back to back should be 2 µs, not 8 µs... -hpa --
Sigh. And now where do these _two_ transactions come from? (and yes, see Alan's followups, a transaction on a spec bus is 1 us). Rene. --
Stale memory, sorry. -hpa --
Exactly. You think it's 2us, but the documentation doesn't say. The _p functions are generic inasmuch as they provide an unspecified delay. Drivers which work across platforms, and which use _p, therefore have different delays on different platforms. Should the length of the delay be unimportant? I wouldn't have thought so. If it is important, does that mean that such drivers are buggy on some platforms? I really *hate* the idea that access to non-present hardware is used to generate a delay. That sucks so badly. It's worthy of a school-aged hacker, not of a world-leading operating system. It's so not best-practice that it's worst-practice. --
Actually its very good practice. The LPC bus behaviour is absolutely and precisely defined. The timing of the inb is defined in bus clocks which is perfect as the devices needing delay are running at a fraction of busclock usually busclock/2. Older processors did not have a high precision timer so you couldn't calibrate loop based delays for 1uS. Port 0x80 is used all over the place for this, not just in Linux but in a large number of DOS programs and other PC OS's. It's even got specific hardware support in many of the chipsets so that you can make the latched last 0x80 write appear on the parallel port for debugging. Alan --
For newer CPUs udelay() would be probably fine though. We seem to have several documented examples now where the bus aborts trigger hardware bugs, and it is always better to avoid such situations. I still think the best strategy would be to switch based on TSC availability. Perhaps move out*_p out of line to avoid code bloat. -Andi --
Why is TSC significant? udelay() based on bogomips seems to be good enough...? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Maybe I'm not sure how accurate it really is on non TSC system. On the other hand it is unclear that the port 80 IO is always the same time so it's probably ok to vary a bit. So most likely going to udelay() unconditionally is fine. -Andi --
yep, agreed, and have queued up the patch below. I've killed the misc_*.c outb_p() uses because they happen before there's an udelay() available - but that should be perfectly fine anyway: i dont remember any video hardware that needed pauses for cursor updates, i think those _p()'s just came in accidentally. (there's hardware that needed _p() for other aspects of video such as mode switching - but cursor updates ...)
Ingo
------------------>
Subject: x86: fix in/out_p delays
From: Ingo Molnar <mingo@elte.hu>
Debugged by David P. Reed <dpreed@reed.com>.
Do not use port 0x80, it can cause crashes, see:
http://bugzilla.kernel.org/show_bug.cgi?id=6307
http://bugzilla.kernel.org/show_bug.cgi?id=9511
instead of just removing _p postfixes en masse, lets just first remove the 0x80 port usage, then remove any unnecessary _p io ops gradually. It's more debuggable this way.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/boot/compressed/misc_32.c |    8 ++++----
 arch/x86/boot/compressed/misc_64.c |    8 ++++----
 arch/x86/kernel/quirks.c           |   10 ++++++++++
 include/asm-x86/io_32.h            |    5 +----
 include/asm-x86/io_64.h            |    5 +----
 5 files changed, 20 insertions(+), 16 deletions(-)
Index: linux-x86.q/arch/x86/boot/compressed/misc_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/boot/compressed/misc_32.c
+++ linux-x86.q/arch/x86/boot/compressed/misc_32.c
@@ -276,10 +276,10 @@ static void putstr(const char *s)
 	RM_SCREEN_INFO.orig_y = y;
 	pos = (x + cols * y) * 2;	/* Update cursor position */
-	outb_p(14, vidport);
-	outb_p(0xff & (pos >> 9), vidport+1);
-	outb_p(15, vidport);
-	outb_p(0xff & (pos >> 1), vidport+1);
+	outb(14, vidport);
+	outb(0xff & (pos >> 9), vidport+1);
+	outb(15, vidport);
+	outb(0xff & (pos >> 1), vidport+1);
 }
 static void* memset(void* s, int c, unsigned n)
Index: linux...
Hi, On Tue, 11 Dec 2007 12:12:59 +1030 Well, if the delay is so much unspecified, what about _reading_ port 0x80? Will the delay be shorter? And if so, what about reading port 0x80 and writing the value back? inb al,0x80 outb 0x80,al I've been wondering since the beginning of this thread if the problem is not just the value we put to port 0x80, not writing to the port... Just my 0.02 Eur... Paul --
The delay is completely and fully specified in terms of the ISA/LPC clock which certainly for anything modern means a fixed, unchanging value (something very close to 1 us) and even on older PCs that allow some tweaking just means a delay synced to the actual bus clock which is what the _p variants should normally want to accomplish. Yes, as far as I'm aware, an inb() means the same delay but clobbers See? Moreover, this also only makes sense if there's in fact something responding to reads at 0x80 and with port 0x80 being a well-known legacy PC port, a POST monitor would be just about that and writing to _that_ would seem unlikely to have any ill effects other than turning your POST board LED display into a christmas tree. The problem more likely is some piece of hardware getting upset at LPC bus aborts and your suggestion wouldn't fix that. In earlier incarnations of this thread it's been reported that various implementations of the legacy PC timer, DMA controller and PIC needed the delay but just replacing the outb with a udelay(1) would seem very likely to have the desired effect also for those. The only problem with _that_ is that you need a calibrated timing loop first which means not-very-early boot (ie, not while you try to program the timer to calibrate the loop for example). Pavel Machek already posted a patch, although with an overly pessimistic delay value. The problem here is with an x86-64 machine that very likely does not need any delay at all in fact. One thing to do would be to make _any_ delay dependent on 32-bit but given that 64-bit machines can run 32-bit kernels this doesn't fix things fully, although it probably does in practice. Keying of DMI for any delay could be possible. But if the simple udelay(1) just works, all the better. Rene. --
That would be the delay on the i386 (sic) architecture. In general,
though, the delay is:
"Some devices require that accesses to their ports are slowed down.
This functionality is provided by appending a _p to the end of the
function."
-- Documentation/DocBook/deviceiobook.tmpl
(I've not seen any other formal definition.)
Most architectures (Alpha, Arm, Arm2, Blackfin, FRV, h8300, IA64,
PA-RISC, PowerPC, Sparc, Sparc64, V850 and Xtensa) do no pause. M68k
does no pause except in one configuration, when it's the same as i386.
On m32r it's a push and a pop. On SuperH it's similar to i386, only
using 16-bit input. X86-64 is the same as i386!
Thinking that _p gives a pause is perhaps too PC-centric. Why, if a
delay is needed, wouldn't you use a real delay; one that says how long
it should be? --
This particular discussion isn't about anything in general but solely about the delay an outb_p gives you on x86 since what is under discussion is not Because any possible outb_p delay should be synced to the bus-clock, not to any wall-clock. Drivers that want to sync to wall-clock need to use an outb, delay pair as you'd expect. In the real world, driver authors aren't perfect and will have used outb_p as a wall-clock delay which they have gotten away with since it's a nicely specified delay in terms of the ISA/LPC clock and the ISA/LPC clock being fairly (old) to very (new) constant. The delay it gives is very close to 1 us on a spec ISA/LPC bus (*) and as such, even though it may not be the right thing to do from a theoretical standpoint, generally a udelay(1) is going to be a fine replacement from a practical one -- as soon as we _can_ use udelay(), as I also wrote. Rene. (*) some local testing shows it to be almost exactly that for both out and in on my own PC -- a little over. If anyone cares, see attached little test program. The "little over" I don't worry about. 0 us delay is also fine for me and if any code was _that_ fragile it would have broken long ago.
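The "outb, delay pair" mentioned above can be sketched as a plain port write followed by an explicit delay, with no dummy write to port 0x80. This is only an illustration of the idea, not the queued kernel patch: the `io_ops` structure, `outb_then_delay` and the tracing stubs are hypothetical names, and the write and delay operations are injected as callbacks so the pairing can be checked without touching hardware. In kernel code the two hooks would presumably be `outb()` and `udelay()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks standing in for the real port write and delay. */
typedef void (*out_fn)(unsigned char value, unsigned short port, void *ctx);
typedef void (*delay_fn)(unsigned int usec, void *ctx);

struct io_ops {
    out_fn out;
    delay_fn delay;
    void *ctx;
};

/* Write to the device's own port, then wait an explicit number of microseconds. */
static void outb_then_delay(const struct io_ops *ops, unsigned char value,
                            unsigned short port, unsigned int usec)
{
    assert(ops != NULL && ops->out != NULL && ops->delay != NULL);
    ops->out(value, port, ops->ctx);
    ops->delay(usec, ops->ctx);
}

/* Tracing stubs: record what would have been done. */
struct io_trace {
    int writes;                 /* total port writes issued */
    int port80_writes;          /* writes that went to port 0x80 */
    unsigned short last_port;   /* port of the most recent write */
    unsigned int delayed_usec;  /* total requested delay */
};

static void trace_out(unsigned char value, unsigned short port, void *ctx)
{
    struct io_trace *t = ctx;

    (void)value;
    t->writes++;
    t->last_port = port;
    if (port == 0x80)
        t->port80_writes++;
}

static void trace_delay(unsigned int usec, void *ctx)
{
    struct io_trace *t = ctx;

    t->delayed_usec += usec;
}
```

For example, selecting a CMOS register through the index port 0x70 with a 1 us delay issues exactly one write and never touches port 0x80.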
That could be true if outb_p were used only in architecture dependent code, but it's not. It's used in drivers that are supposed to run on all sorts of platforms. Why does a megaraid controller need delays on i386 but not on Sparc, PowerPC, Alpha and others? Is it buggy on most It's most commonly a zero delay. Only in the minority of architectures is it otherwise. If a delay is needed, then put one in, but don't put in a paper promise that's more likely to be ignored than observed. Plenty of doubt has been expressed as to whether _p is widely used without need. Not surprising since it has such a vague specific meaning. One could say, Linux on i386 is liberally sprinkled with needless delays. I suppose it has the advantage that Microsoft will be hard pressed to catch up when finally we remove them. :-) I really prefer accurate code, but I'm also pragmatic and realise that it's far too much work to fix this any time soon. But if it were to be fixed, then perhaps _p would take an additional parameter, measured in cycles of delay. --
Most, probably most-all, of the delays to port operations on modern ix86 machines are not needed at all. Certainly machines that use bridges to expand port I/O to the ISA bus do not need any such delays. There are exactly two (and only two) problems with removing the delays. (1) Older machines which have an actual ISA bus with its attendant capacitance that needs to be charged long enough for the data to become valid before being overwritten by new data. (2) I/O operations that have two ports, one an index port and the other a data port, like the CMOS RTC. Once you set the index port, it takes about 300 ns for it to propagate to the hardware, so there needs to be some delay between the back-to-back CPU operations which can occur much faster than that. On this machine, I have changed all the _p macros so they don't do anything. Since it is a modern machine with N/S bridges, which provide their own delays, everything works. Such would not be the case if I was using a machine that had an actual ISA (or PC-104) bus. Those are not terminated busses, but open-ended capacitors made up of connectors and PC traces. It takes about 300 ns to charge one of those (so 1us is a good delay). BTW, there are no "transactions" on the ISA or EISA bus. It works by using a sequence of operations with minimum setup and hold times. It's very primitive. Cheers, Dick Johnson --
We know this. The problem is that there is no good known way to figure out which machines need it. Also it is typically slow hardware anyways -- the most time critical is probably the 8259, but nobody who cares about performance still uses it except as a fail safe fallback and for those it is better It has been observed to be required talking to some older and PIT etc. Anyways it looks like the discussion here is going in a loop. I had hoped David would post his test results with another port so that we know for sure that the bus aborts (and not port 80) are the problem on his box. But it looks like he doesn't want to do this. Still removing the bus aborts is probably the correct way to go forward. Only needs a patch now. If nobody beats me to it i'll add one later to my tree. -Andi --
Pavel Machek already posted one. His udelay(8) wants to be less -- 1 or "to be safe" perhaps 2. http://lkml.org/lkml/2007/12/9/131 Rene. --
2 at least; that's how long outb(0x80) takes on one of my machines. Actually, ISA can go down to 4MHz, so maybe we should be using 4 usec.... but I guess I'm paranoid here. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
4 isn't sensible. There have been machines capable both of running Linux and their ISA bus at less than 8 MHz (if only for example by picking a 5 divisor on a system that was capable of hosting a 40 Mhz 386/486 but using a slower CPU) but not by much. And machines doing that and running Linux, even more so "today": 0. My posted test program (although there seems to be something wrong with it since it's influenced by compiler optimisation) is showing more than 1 but note that on the vast majority of machines, 0 would in fact do. 1 will on all, 2 will as well. Rene. --
Sadly, I've been busy with other crises in my day job for the last few days. I did modify Rene's test program and ran it on my "problem" machine, with the results below. The interesting part of this is that port 80 seems to respond to "in" instructions faster than the presumably "unused" ports 0xEC and 0XEF (those were mentioned by someone as alternatives to port 80). That, and the fact that the port 80 test reliably freezes the machine solid the second time it is run, and the "hwclock" utility reliably hangs the machine if the port 80's are used in the CMOS_READ/CMOS_WRITE loop, seems to strongly indicate that this chipset or motherboard actually uses port 80, rather than there being a bus problem. Someone might have an in to nVidia to clarify this, since I don't. In any case, the udelay(2) approach seems to be a safe fix for this machine. Hope input from an "outsider" is helpful in going forward. I put a lot of time and effort into tracking down this problem on this particular machine model, largely because I like the machine. Running the (slightly modified to test ports 80, ec, ef instead of just port 80) test when the 2 GHz max speed CPU is running at 800 MHz, here's what I get for port 80 and port ec and port ef. port 80: cycles: out 1430, in 792 port ef: cycles: out 1431, in 1378 port ec: cycles: out 1432, in 1372 ---------------------------- System info: HP Pavilion dv9000z laptop (AMD64x2) PCI bus controller is nVidia MCP51. processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 72 model name : AMD Turion(tm) 64 X2 Mobile Technology TL-60 stepping : 2 cpu MHz : 800.000 cache size : 512 KB --
Don't know if someone else mentioned those but I only said 0xed. That's the value Phoenix BIOSes use (yes, and which H. Peter Anvin reported as being generally problematic as well). It's in fact not all that unexpected it seems that port 0x80 responds to "in" given that it's used by the DMA controller. It's a write that falls on deaf ears. The read is going to be faster if it doesn't time out on an unused port. Although it's not faster for everyone, such as for me indicating that for us port 0x80 is really-really unused, it is for many. See results here: Yes, so it seems. In this case we could in fact also "fix" your situation by just going to 0xed depending on for example DMI. Alan Cox just posted a few At 800 MHz, that's 1.79 / 0.99 microseconds. The precision of the "in" is somewhat interesting. Did someone at nVidia think it's an "in" from 0x80 Rene. --
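The microsecond figures above come from dividing the measured cycle counts by the CPU clock. A minimal sketch of that conversion, assuming the cycle counts are TSC ticks at the reported 800 MHz; the function name `cycles_to_us` is made up for illustration.

```c
#include <assert.h>

/* Convert a cycle count to microseconds: cycles / CPU clock in MHz. */
static double cycles_to_us(unsigned long cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / cpu_mhz;
}
```

For the port 80 measurement of 1430 cycles out and 792 cycles in, this gives about 1.79 us and 0.99 us respectively.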
By the way, _does_ anyone have a contact at nVidia who could clarify? Alan maybe? I'm quite curious what they did... Summary: Unless after booting with "acpi=off", outputs to port 0x80 (the legacy way to delay I/O) reliably, but not immediately, hang MCP51 machines. Outputs to port 0xed do not, indicating it's not a generic bus abort problem. Rene. --
Sorry, the first sentence didn't parse unambiguously for me. Do you mean "acpi=off" works, or that "acpi=off" allows *subsequent* boots to work? I have some people at nVidia I can probably ping. -hpa --
Have them search on Google for: --
Sorry, didn't see this again due to aforementioned horseshit ISP. "acpi=off" works it seems. Report from David Reed here: Rene. --
On Wed, 12 Dec 2007 21:58:25 +0100 I don't. Nvidia are not the most open bunch of people on the planet. This doesn't appear to be a chipset bug anyway but a firmware one (other systems with the same chipset work just fine). The laptop maker might therefore be a better starting point. --
One wonders if it does some SMM trick to capture port 0x80 writes and attempt to haul them off for debugging; it almost sounds like some kind of debugging code got let out into the field. -hpa --
Presumably you have programmable decoders to trigger SMI? If not, then they're probably doing the equivalent in a SuperIO chip or similar. -hpa --
Not implausible. We've got a bug I've been dealing with where a vendor left debug stuff enabled via the parallel port and which clearly "escaped" from the test environment to the BIOS proper. --
Port 0xED, just FYI: cycles: out 1430, in 1370 cycles: out 1429, in 1370 (800 Mhz) --
Which port do you want me to test? Also, I can run the timing test on my machine if you share the source code so I can build it. --
Oh, thought your previous reply was already responding to this. The "other diagnostic port", 0xed. The point is not so much that it's going to be a Thanks, would be interesting. This one: Rene.
Try replacing port 0x80 in include/asm-x86/io_*.h with 0xed... and see if it makes your machine stable. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Okay, this needs to be junked. I don't get it, but I get different results from an -O2 and an -O0 compile on this one. Anyone? Rene. --
Each platform provides its own versions of the various _p functions which work as required for that platform. As to megaraid, I don't have the docs so I couldn't specifically tell you Most of those platforms have hardware that was designed not to need those delays and they know that their CMOS clock etc are not clocked at half "vague specific" ? sorry don't follow you. Its an ISA bus delay on systems that need it (or an LPC bus delay on measured in what, against what, for which bus. inb_p/outb_p are really only meaningful for ISA/LPC bus devices. In those cases it is precisely defined. Its use for PCI devices is a bit suspect and as a general rule probably wrong. Alan --
The _p variants are a universal fixture, defined as ending with a pause, but without specifying the duration. (The duration is architecture specific, mostly zero.) It really isn't a form that should be used in Yes, it's now clear that all of this is so. Regrettably, it's used in dozens of drivers, most having nothing to do with an ISA/LPC bus. If it really is specific to the ISA architecture, then it should only be used in architecture specific code. I think the solution is to remove it. Replace all _p calls with the non-_p variant, and add an explicit udelay. Udelay can initially be set conservatively until it's been properly calibrated, allowing it to be used during early boot. The good news is that it's only used in a few dozen drivers, so that actually might be doable. And then, who knows, maybe Microsoft might have to scratch their corporate heads, trying to find out how to compete with a suddenly much faster Linux! :-p --
Perhaps what was meant is that ISA-tuned timings make little sense on devices that are part of the chipset or on the PCI or PCI-X buses? On the other hand, since we don't know in many cases whether the "_p" was supposed to mean "the time it takes to execute an 'out al,80h'" on whatever bus structure happens to be on whatever machine, the problem is unsolvable. Ranting about whether ISA/LPC is on what machines seems to be of little value in contributing to a constructive solution. It seems to me that in the long term, driver writers would do well to think more clearly about the timings their devices require, when that is possible. They are probably implementation dependent - depending on the clock speed of the particular clock that is driving the particular i/o device. Then there's the social problem of a community development project - which is to get people to tune their code but preserve its ability to run on older and variant machines. --
On Thu, 13 Dec 2007 08:13:29 -0500 No. ISA as LPC bus is alive and well inside and outside chipsets. Welcome to planet earth and the reality of 'its cheaper to reuse cells than design a new one'. For the chipset logic like DMA controllers the _p is absolutely correct. Alan --
Simulating 1 microsecond delays (assuming LPC meets that goal for 0x80) is "absolutely correct" for devices provided on PCI-X running on 3 GHz or greater machines? Well, you are entitled to your opinion. Seems likely that reading the timing specs of such a chipset might be correct, and delaying for a time proportional to CPU speed, rather than assuming running 3000 3GHz clock cycles is needed on a very fast emulation of an old device that probably runs at the fastest bus speed provided in the chipset. Every device has different timing constraints. In the real world that I live in. --
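The "3000 3GHz clock cycles" figure above is simply the delay multiplied by the CPU clock. A small sketch of that arithmetic; the function name `delay_in_cpu_cycles` is made up for illustration.

```c
#include <assert.h>

/* CPU cycles that elapse during a delay: microseconds * CPU clock in MHz. */
static double delay_in_cpu_cycles(double usec, double cpu_mhz)
{
    assert(usec >= 0.0 && cpu_mhz > 0.0);
    return usec * cpu_mhz;
}
```

A 1 us delay on a 3 GHz (3000 MHz) CPU is 3000 cycles; the same delay on a CPU running at 800 MHz is 800 cycles.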
On Thu, 13 Dec 2007 20:50:33 -0500 Yes - the LPC bus clock doesn't change for the CPU clock. --
It not only could be, it _is_ true. Not using an output to port 0x80 is what The latter probably and I don't bleedin' well care. In a discussion about removing the out to 0x80 the only thing that is relevant is what it should No damnit, you misunderstand. I'm saying that an outb_p _should_ be defined in terms of the bus clock since if you want a wall-clock delay you should be using just that. The _hardware_ is synced to the bus clock and therefore, having a delay available that is synced to the bus clock as well makes some sense. And again again again again not withstanding that, a udelay will still be an okay replacement in practice. Rene. --
Hello, On Tue, 11 Dec 2007 14:16:01 +0100 Some results : Core 2Duo 1.73GHz : [root@tux tmp]# ./in out = 2366 in = 2496 [root@tux tmp]# ./in out = 3094 in = 2379 Plain old PIII 600 MHz: [root@www-dev /tmp]# ./in out = 314 in = 543 [root@www-dev /tmp]# ./in out = 319 in = 538 [root@www-dev /tmp]# ./in out = 319 in = 550 [root@www-dev /tmp]# ./in out = 329 in = 531 Opteron 150 2.4GHz : -bash-3.1# ./in out = 4801 in = 4863 -bash-3.1# ./in out = 5041 in = 4909 -bash-3.1# ./in out = 4829 in = 4886 Paul --
Okay, these vary too wildly for you and might I suppose be a serialising artifact or some such. Give me a bit and I'll try to improve it... Rene --
This might be a bit more constant, I suppose. This serialises with cpuid. Don't see a difference locally, but perhaps you do. On a Duron 1300 with an actual ISA bus, "out" is between 1300 and 1600 for me and "in" between 1200 and 1500 with a few flukes above that which will I suppose be caused by the bus (ISA _or_ PCI) being momentarily busy or some such... Rene.
Here's my results on a PIII Xeon, 550mhz, 440GX chipset, and an ISA slot, which until recently was actually used with an 8 port serial card: jfsnew:~/src> sudo ./port80 out: 729 in : 348 jfsnew:~/src> sudo ./port80 out: 729 in : 354 jfsnew:~/src> sudo ./port80 out: 729 in : 350 jfsnew:~/src> sudo ./port80 out: 728 in : 346 jfsnew:~/src> sudo ./port80 out: 730 in : 340 --
Thank you. That's a little odd. The "in" time should be close to the "out" time really. Well, err, <shrug> I guess. For now no one's contemplating replacing the out with an in anyways :-) Rene. --
Hello, On Tue, 11 Dec 2007 16:28:56 +0100 Well, yes, at least on the PIII and the Opteron... Core2 is still ch
