From: Arjan van de Ven <arjan@linux.intel.com> Subject: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in On PCs, PCI extended configuration space (4Kb) is riddled with problems associated with the memory mapped access method (MMCONFIG). At the same time, there are very few machines that actually need or use this extended configuration space. At this point in time, the only sensible action is to make access to the extended configuration space an opt-in operation for those device drivers that need/want access to this space, as well as for those userland diagnostics utilities that (on admin request) want to access this space. It's inevitable that this is done per device rather than per bus; we'll be needing per device PCI quirks to turn this extended config space off over time no matter what; in addition, it gives the least amount of surprise: loading a driver for a device only impacts that one device, not a whole bus worth of devices (although it'll be common to have one physical device per bus on PCI-E). The (desireable) side-effect of this patch is that all enumeration is done using normal configuration cycles. The patch below splits the lower level PCI config space operation (which operate on a bus) in two: one that normally only operates on traditional space, and one that gets used after the driver has opted in to using the extended configuration space. This has lead to a little code duplication, but it's not all that bad (most of it is prototypes in headers and such). Architectures that have a solid reliable way to get to extended configuration space can just keep doing what they do now and allow extended space access from the "traditional" bus ops, and just not fill in the new bus ops. (This could include x86 for, say, BIOS year 2009 and later, but doesn't right now) This patch also adds a sysfs property for each device into which root can write a '1' to enable extended configuration space. The kernel will print a notice into dmesg w...
Comments: 1) [minor] With a bit in struct pci_dev, there is no need for separate raw_pci_ops. That will simplify your patch, with no functionality change. "golden" arches (no pun intended) may implement raw_pci_ops that _always_ work with extended config space, and simply ignore that bit, if that is how their underlying non-mmconfig-nor-type1 hardware is implemented. 2) [non-minor] hmmmm. [jgarzik@core ~]$ lspci -n | wc -l 23 So I would have to perform 23 sysfs twiddles, before I could obtain a full and unabridged 'lspci -vvvxxx'? For the userspace interface, the most-often-used knob for diagnostic purposes will be the easiest one. And that's echo 1 > enable-ext-cfg-space-for-all-buses-ACPI-says-to lspci -vvvxxx 3) [minor] architectures must be able to override pci_enable_ext_config(). see "golden arches". --
Or you force it on with "pci=mmconfig" or something at boot-time. But yes. The *fact* is that MMCONFIG has not just been globally broken, but broken on a per-device basis. I don't know why (and quite frankly, I doubt anybody does), but the PCI device ID corruption happened only for a specific set of devices. Whether it was a timing issue with particular devices or whether it was a timing issue with some particular bridge (and could affect any devices behind that bridge), who knows... It almost certainly was brought on by a borderline (or broken) northbridge, but it apparently only affected specific devices - which makes me suspect that it wasn't *entirely* due to just the northbridge, and it was a combination of things. I don't understand why you cannot seem to accept that per-device thing, in the face of clear data that yes, it really *is* per-device. Not to mention the fact that the way MMIO config setups work, you may well have entire buses that simply aren't accessible with MMIO config at all (because the MMIO config window is not large enough). Furthermore, please accept the fact that of those 23 devices, exactly *none* will actually care. So yes, you'd have to enable it manually for those individual devices, but that's only if you want to do something totally pointless in the first place. So stop this totally inane "it has to be global" crap. It doesn't have to be global at all, and we have hard data showing that it really SHOULD NOT be a global flag. Linus --
On Thu, 27 Dec 2007 06:52:35 -0500 but sadly your second statement is not correct. Part of the complication is that all PCI config ops operate on busses not devices; at first I thought "just add a bit and be done with it", but sadly it's not quite the case. Due to the per-bus nature of the ops, you end up having 2 type of bus operations, and that's just boilerplate (prototypes, exports and stuff) but it makes up most of the lines of the patch In addition, a separate raw_pci_ops (for x86 only!) is needed anyway since it's quite likely that we'll have various options of each case (extended or not) and we want to pick the best one for each case, the easiest one is an option to lspci. Nothing more nothing less. Making a global knob in kernel space is a lot more tricky, and in addition really there's enough cases where userspace wants the one device anyway Doing the "for each device I'm about to dump" in lspci is pretty much as hard as doing see the patch. All pci_enable_ext_config() does is set a flag. The architecture decides what to do with that flag. Golden architectures can just totally ignore the flag and always expose the full space. (In fact, the patch assumes all-but-x86 to be golden here; which is fair) -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
I just thought this might be interesting to the discussion. I recently bought another 2 GB memory for my computer. My hardware is as following: Asus Commando (Intel P965 chipset) Intel Core2 Q6600 4x1 GB Geil PC6400 memory nVidia 8800 gts (old g80 core, 640 mb mem) Without booting with pci=nommeconf i have severe stability issues and often when its not crashing i get slowdowns with the error: kern.log:Jan 15 13:19:40 bilbo kernel: [ 132.046715] NVRM: Xid (0001:00): 6, PE0001 ... repeated x times. In addition the nVidia framebuffer seems to "leak" or not update since i get loads of graphics artifacts. The system works perfectly fine with 2 GB memory and not the pci=nommconf. It works like a charm when using pci=nommconf and 4 GB memory. In adition i have to enable the Northbridge->PCI Memory remap feature in the BIOS to avoid the kernel panicing when trying to access > 3 gb but that is understandable :) My software is Kubuntu 7.10 stock x86_64 kernel, but i do use the binary driver by nVidia. It works like a charm when using pci=nommconf If you guys need any more info about hardware/software from me, please let me know. -- Øyvind Vågen Jægtnes +47 96 22 03 08 (i reject your diurnal rhythm and subsitute my own) --
Can you send me a follow-on patch that documents this in Documentation/ABI please. thanks, greg k-h --
Greg, if you integrate Ivan's patch, you don't need Arjan's patch. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
On Fri, 11 Jan 2008 12:28:20 -0700 Personally I absolutely don't agree with that. Ivan's patch is another attempt to make MMCONFIG work somewhat better, but does not provide the explicit opt-in that I think is required at this point; people have tried to get MMCONFIG stable for a really long time, and failed still upto today. At least my patience is up and this needs to be opt-in. -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
I think I agree with Arjan here, Ivan's patch should also work on top of this one, and will help out some machines. But as he hasn't asked for it to be included in the kernel tree, that's a moot point right now :) thanks, greg k-h --
So your argument is that MMCONFIG sucks, therefore Linux has to have a He didn't? I certainly ask for it to be included. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
What's *your* point? MMCONFIG is known broken. If we ever start enabling it more (ie start using it even if it's not reserved in the e820 tables), all that known breakage will come and bite us in the *ss. We need to have some armor-plated underwear to protect against that ass-biting, and that's what Arjan's patch is. Tell me what *other* armor plating you could have that actually works? Linus --
Ivan's patch doesn't start enabling MMCONFIG in more places than we currently do. It makes us use conf1 accesses for all accesses below The armour plating that already exists -- pci=nommconf. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
.. and I agree with that patch. But there will be people who try to access extended space by mistake, and they'll have a hard-locked machine or No. It needs to be automatic, OR THE OTHER WAY AROUND. Ie we disable the unsafe feature on purpose, and then force people who access it to do so *consciously*. Extended config space is different, for chissake! It's not even like it's just a bigger normal config space where normal config accesses just overflow into it. It really does have different rules etc. Linus --
But they can't. We limit the size they can access to 256 bytes, unless I'd be fine with making mmconfig off by default. Make people pass Yes, but it's also important to enable some of the PCIe features. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Umm. Probing address 256 (or *any* address) using MMCONFIG will simply lock up the machine. HARD. What's so hard to understand about MMCONFIG being broken on certain hardware? Linus --
Did I miss a bug report? The only problems I'm currently aware of are the ones where using MMCONFIG during BAR probing causes a hard lockup on some Intel machines, and the ones where we get bad config data on some AMD machines due to the configuration retry status being mishandled. All the other lockups I'm aware of are already handled by the existing checks. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Hmm. Were all those reports root-caused to just that BAR probing? If so, we may be in better shape than I worried. Linus --
I believe so. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Ditto. One typical problem is that on "Intel(r) 3 Series Experss Chipset Family" MMCONFIG probing of the BAR #2 (frame buffer address) of integrated graphics device locks up the machine (depending on BIOS settings, of course). This happens because the frame buffer of IGD has higher decode priority than MMCONFIG range, as stated in Intel docs... Ivan. --
Ok, so what would the proposed patch look like to help resolve this? Ivan, you posted one a while ago, but never seemed to get any confirmation if it helped or not. Should I use that and drop Arjan's? Or use both? Or something else like the patches proposed by Tony Camuso? thanks, greg k-h --
Actually I'm strongly against Arjan's patch. First, it's based on assumption that the MMCONFIG thing is sort of fundamentally broken on some systems, but none of the facts we have so far does confirm that. And second, I really don't like the implementation as it breaks all non-x86 arches (or forces them to add a set of totally meaningless Tony's patch is a variation of the same idea, so this patch supersedes it. The only argument for using conf1 to access only the first 64 bytes of the config space was some concerns about performance. But the only driver that extensively uses config space at runtime is tg3, and only as a work around some broken revisions of the chip. And even in that case I seriously doubt that mmconf vs. conf1 would make any measurable difference. On the other hand, always using conf1 for the whole 256-byte legacy config space allows us to drop all sorts of black lists, which is a *huge* advantage. Here is the same patch, but with an updated commit message - proper attribution to Loic Prylli, which I somehow missed the first time, sorry. Ivan. --- PCI x86: always use conf1 to access config space below 256 bytes Thanks to Loic Prylli <loic@myri.com>, who originally proposed this idea. Always using legacy configuration mechanism for the legacy config space and extended mechanism (mmconf) for the extended config space is a simple and very logical approach. It's supposed to resolve all known mmconf problems. It still allows per-device quirks (tweaking dev->cfg_size). It also allows to get rid of mmconf fallback code. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> --- arch/x86/pci/mmconfig-shared.c | 35 ----------------------------------- arch/x86/pci/mmconfig_32.c | 22 +++++++++------------- arch/x86/pci/mmconfig_64.c | 22 ++++++++++------------ arch/x86/pci/pci.h | 7 ------- 4 files changed, 19 insertions(+), 67 deletions(-) diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mm...
On Sat, 12 Jan 2008 17:40:30 +0300 btw this is my main objection to your patch; it intertwines the conf1 and mmconfig code even more. When (and I'm saying "when" not "if") systems arrive that only have MMCONFIG for some of the devices, we'll have to detangle this again, and I'm really not looking forward to that. --
conf1 has been a hardcoded dependencies of mmconfig for years. Ivan's patch does not make it worse (in fact it considerably simplifies that code, making it easier to untangle later). IMHO, either your patch or Ivan's can be a good base, but: 1) For your remark above to be given any consideration, your patch should be modified to remove the hardcoded conf1 from the *current* mmconfig code, otherwise we end up with 3 set of ops (mmconfig + conf1+ a possible third set of operations) intertwined in a confusing manner. And removing that dependency is not a straightforward operation unless you also do 2): 2) the pci_enable_ext_config() function and dev->ext_cfg_space field, sysfs interface should be removed from the patch. There has never been a problem reporting crashes or any undefined behaviour while trying to access ext-conf-space, all the problems where *using mmconfig to access legacy-conf-space*. The "if (dev->cfg_space_ext > 0)" checks can instead be replaced by "if (reg >= 256)". Otherwise when using per-device explicit enabling, just *checking* whether ext-conf-space is available by calling pci_enable_ext_config(), will make some of the old problems of *loosing legacy conf-space* come back: you would have introduced a new user-space and kernel API while only solving half the problems, not a good deal. if you do 1) and 2), then you really support the good properties you claimed: - You can use mmconfig for ext-space and something else for legacy-space. - You can use mmconfig for everything (for instance if conf1 is not implemented). Of course it is as straightforward to modify Ivan's patch to also have the same properties. Loic Loic --
On Sun, 13 Jan 2008 13:23:35 -0500 Loic Prylli <loic@myri.com> wrote: This entirely misses the point of why I made the patch. The point is NOT that devices are buggy. The point is that right now, 99.99% of the machines out there do NOT need extended config space (no matter how it gets accessed), yet at the same time they suffered from it's issues for... what 2 years now? The point of my patch was to make people who don't need extended config space, not have to deal with it anymore. Note: There is not a 100% overlap between "need" and "will not be used in the patches that use legacy for < 256". In the other patches posted, extended config space will be used in cases where it won't be with my patch. (Most obvious one is an "lspci -vx" from automated scripts). Is that a problem? We've had 2 years of mess, with one not-enough patch after another. There still are problems TODAY (eg im 2.6.24-rc7). The patch that falls back to an alternative method for below 256 is no doubt a step in the right direction. (although I'm not all that happy about mixing access types, it's not provably incorrect) Is it enough? I'm not sure. Only time can tell I suppose, but the risk side is that if it is not enough, users who don't need the extended config space for functionality will suffer the bugs AGAIN. So in short, my approach was NOT about "fix PCI", it is about "fix the user experience". It's a stopgap for sure, until the underlying mechanism gets reliable. It's been 2 years..... maybe this next step is "it", maybe it isn't. --
I think I got your point the first time, and I agree it is sound. But in my subjective and biased opinion, I just think ext-conf-space is already useful and widespread enough (being used is not the same as being strictly required for basic operation) for your proposed tradeoff to not be optimal (protecting against "future/non-proven" hardware bugs, i.e. bringing non-proven benefits, at the expense of making life harder for ext-conf-space users while bringing additional extra API/code). To take an example from the linux tree: the driver/pci/pcie/aer code uses ext-conf-space for every pcie-root (currently several distributions enable it by default), does it mean opt-in would be automatically activated for most pcie hierarchies (defeating most of the benefits of being opt-in), or we just disable that code by default? Does lspci -v will automatically opt-in all pcie (right now by default it tries to list the extended-capabilities for pcie and pcix), or do we now require manual explicit sysfs operations to get the whole thing? Is is an additional flag to lspci (if so will that flag also apply to pcix, possibly causing a crash for lspci -v To go one step your direction, I have already argued in a couple of emails that I would prefer to not implement ext-conf-space access for any PCI-X devices (removing PCI-X2 from pci_ext_cfg_size), because there we are trying to support devices that we don't really know exists or will ever exists. And protecting against "unproven bugs" makes more FWIW, I have in my tree a patch almost identical to Ivan's dated "December 2005". Because of the constant activity on the mmconfig front (that I thought would make it obsolete), I never took the effort of suggesting it before one month ago (I am not a regular user of linux-kernel). I admit nobody else should view it that way, but for me rather than the last attempt at fixing mmconfig, it's a patch first used two years ago that would have arguably prevented a...
I believe you to be mistaken in this belief. If you take Ivan's patch, conf1 is used for all accesses below 256 bytes. lspci -x only dumps config space up to 64 bytes; lspci -xxxx is needed to show extended pci config space. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
I agree with Arjan about that "not a 100% overlap". It is about the
extra ext-conf-space access done while probing in drivers/pci/probe.c:
dev->cfg_size = pci_cfg_space_size(dev);
(and lspci -v will also query/show the list of extended-caps for
pci-x/pcie-x devices that have some, provided the kernel can access
ext-conf-space).
With Ivan's patch, that line would still cause one extended-conf-space
access at offset 256 for pcie/pci-x2 devices (to check the ability to
query ext-space). Arjan "opt-in" patch would prevent that extra access.
IMHO that access is OK and harmless in all cases, we are already
protected by MCFG/e820 checks, but I agree one can express a different
opinion based on trying to prevent "never-seen/potential" hardware/BIOS
bugs. FWIW it is also there that I was suggested to exclude PCI-X2
devices (when restricted to pcie, that access while probing cannot even
cause the harmless master-abort/0xffffffff), but there is a small trade-off.
Loic
--There is nothing wrong with it; please realize that mmconf and conf1 are just different cpu-side interfaces. Both produce precisely the *same* bus MMCONFIG for *some* of the devices? This doesn't sound realistic from technical point of view. MMCONFIG-only systems? Sure. I really hope to see these. But it won't be PC-AT architecture anymore. It has to be something like alpha, for instance, fully utilizing the 64-bit address space, and we'll have to have the whole low-level PCI infrastructure completely different for these future platforms anyway. Right now, each and every x86 chipset *does* require working conf1 just in order to set up the mmconf aperture. It's the very fundamental thing, sort of design philosophy. Ivan. --
On Sun, 13 Jan 2008 00:49:11 +0300 s/x86/pc/ and not even that. Really this is a huge design mistake in your patch, the hard coding of conf1, and for that reason I really don't think it should go in. We have 4 or so methods on PC today to access config space, probably going to 6 in the next year or two. One of those methods *HARD PICKING* another one as "second best" for cases where it doesn't want to deal with is WRONG. It really needs to be up to the architecture/platform to decide which ops vector is the fallback. And yes on your current PC that might well be conf1. But hardcoding that is not the right thing. We have the vectors, we have the ranking code, just make a "second rank" thing. Oh wait, my patch did that ;) Then let either the mmconfig code or the wrapper above it (doesn't matter, in fact, I can see value of making this decision in the wrapper and keep mmconfig code simple and clean, because maybe mmconfig IS the thing that the architecture says needs to deal with the lower 256 bytes).. Oh wait my patch also did that pretty much ;) The rest of my patch was defaulting to off. Is it that bit that you really hate? --
Arjan, I have not seen your MMCONFIG patch. Would you mind sending me a copy? Thanks. Tony --
On Sat, 12 Jan 2008 19:12:23 -0500 sure ---- On PCs, PCI extended configuration space (4Kb) is riddled with problems associated with the memory mapped access method (MMCONFIG). At the same time, there are very few machines that actually need or use this extended configuration space. At this point in time, the only sensible action is to make access to the extended configuration space an opt-in operation for those device drivers that need/want access to this space, as well as for those userland diagnostics utilities that (on admin request) want to access this space. It's inevitable that this is done per device rather than per bus; we'll be needing per device PCI quirks to turn this extended config space off over time no matter what; in addition, it gives the least amount of surprise: loading a driver for a device only impacts that one device, not a whole bus worth of devices (although it'll be common to have one physical device per bus on PCI-E). The (desireable) side-effect of this patch is that all enumeration is done using normal configuration cycles. The patch below splits the lower level PCI config space operation (which operate on a bus) in two: one that normally only operates on traditional space, and one that gets used after the driver has opted in to using the extended configuration space. This has lead to a little code duplication, but it's not all that bad (most of it is prototypes in headers and such). Architectures that have a solid reliable way to get to extended configuration space can just keep doing what they do now and allow extended space access from the "traditional" bus ops, and just not fill in the new bus ops. (This could include x86 for, say, BIOS year 2009 and later, but doesn't right now) This patch also adds a sysfs property for each device into which root can write a '1' to enable extended configuration space. The kernel will print a notice into dmesg when this happens (including the name of the app) so that if the system crashes a...
Thanks, Arjan. The problem we have been experiencing has to do with Northbridges, not with devices. As far as the device is concerned, after the Northbridge translates the config access into PCI bus cycles, the device has no idea what mechanism drove the Northbridge to the translation. That is to say, the device does not know whether the config cycle on the bus was caused by an MMCONFIG cycle or a legacy Port IO cycle delivered to the Northbridge. In systems that had Northbridges that did not respond correctly to MMCONFIG cycles, like the AMD 8132, we (HP & RH) were blacklisting whole platforms to limit them to Port IO PCI config. However, when platforms emerged using both legacy PCI and PCI express, the platforms that were limited to Port IO config cycles were not express compliant, since the express spec requires the platform to be able to address the full 4096 byte region of config space to be considered express-compliant. The patch I devised concerned itself with Northbridges and separated MMCONFIG-compliant buses from those that could not handle MMCONFIG. Therefore, the express bus in the platform could happily employ MMCONFIG to access the entire 4K region, while the legacy bus with the non-compliant Northbridge could be restricted to Port IO config. However, even with my patch, the problem remained where devices requiring large displacements could overlap the BIOS-mapped MMCONFIG region. In such a situation, where the bus has passed the MMCONFIG test, the MMCONFIG region can get doubly mapped by bus-sizing code, causing the system to hang. The remedy proposed by Loic and implemented by Ivan is actually quite elegant, in that it addresses all these problems quite effectively while eliminating a ration of specialized and somewhat obscure code. In my humble opinion, Port IO config access is here to stay, having been defined as an architected mechanism in the PCI 2.1 spec. This is most especially true for x86. In other words, for x86, I don't think...
On Sat, 12 Jan 2008 20:36:59 -0500 correct for now. HOWEVER, and this is the point Linus has made several times: Just about NOBODY has devices that need the extended config space. At all. So making this opt-in for devices allows our users to boot and use their system if they are in the majority that has no need for even getting Wanne bet there'll be devices that screw this up? THere's devices that even screwed THis kind of patchup has been going on for the better part of a year (well 2 years) by now and it's STILL NOT ENOUGH, as you can see by the more patchups that have You're wrong there. Sad to say, but you're wrong there. -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
The PCI express spec requires the platform to provide access to this space for express-compliance. More devices will be using this space as express There may have been devices that incorrectly applied the PCI spec to various fields in the header, I'll grant you that. However, there is no way a device can determine electrically whether the Northbridge received Port IO or MMCONFIG cycles. This is between the CPU Which is why Loic's proposal and Ivan's implementation of it is so elegant. It solves all these problems in one sweep, and eliminates the code rendered The PCI spec provides for conf1 as an architected solution. It's not going away, and especially not in x86 land where Port IO is built-in to the CPU. --
On Sun, 13 Jan 2008 07:43:11 -0500 PLATFORM not OS :) Windows isn't using it in the server space, and only in the client space it recently started again sadly you're wrong. -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
As someone gently pointed out to me, you are in a position to know this, so I probably am wrong. --
On Sun, 13 Jan 2008 16:28:08 -0500 I suspect Arjan is wrong. It might be some Intel agenda but I still see fairly new driver reference code that is hardcoding port accesses even when designed for Redmond products. Alan --
Agreed. I suspect that the likelihood of conf1 accesses going away in the next five years is slim to none. Linus --
On Mon, 14 Jan 2008 00:54:34 +0000 I find it hard to believe that even they have their drivers do PCI config access via ports directly from the drivers, and especially in driver reference code... -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
To all ... Well, here is what I perceive we've got so far. . Some PCI Northbridges do not work with MMCONFIG. . Some PCI BARs can overlap the MMCONFIG area during bus sizing. It is hoped that new BIOSes will locate MMCONFIG in an area safely out of the way of bus sizing code, but there can be no guarantees. . conf1 is going away in newer x86 implementations in the not too distant future. . The PCI express spec requires platforms to provide access to the extended config area, and there are express devices today using that area for AER. . There is no need to provide different PCI config access mechanisms at device granularity, since the PCI config access mechanism between the CPU and the Northbridge is opaque to the devices. PCI config mechanisms only need to differ at the Northbridge level. . We have a flurry of patches all claiming to solve all or some of these problems. Arjan, I realize it may not be possible for you to answer this question, but I feel compelled to ask it anyway. Is it possible that future x86 architectures will be implementing a SAL-like interface to abstract PCI config access altogether? Or can we condense these patches down to a set that does the following? . If the system is capable of conf1, then PCI config access at offsets < 256 should be confined to conf1. This solution is most effective for existing and legacy systems. . If the system does not support MMCONFIG, of if MMCONFIG is not working, then accesses to offsets > 256 return -1 and an error status. . For systems, where the conf1 mechanism is NOT available, then MMCONFIG should be the PCI access mechanism for all offsets. For such systems, we must assume that the BIOS has become smart enough to locate MMCONFIG in a region safe from encroachment by bus sizing code. --
On Sun, 13 Jan 2008 22:29:23 -0500 not "conf1" but "what the platform thinks is the best method for < 256". We have this nice abstraction for the platform to select the best method... we should use it. And still, it's another attempt to get this fixed (well.. it's been 2 years in the coming so far, maybe this will be the last one, maybe it will not be... we'll see I suppose, but it sucks to be a user who doesn't need any of the functionality that the extended config space provides in theory but gets to suffer more of the issues) I'm all in favor of making this more reliable, but really.. we've thought it was fixed time and time again over the last two years. Please consider limiting the scope of the damage as well. -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
I don't understand. If we're going to differentiate MMCONFIG from some other
access mechanism, it only needs to be done at the Northbridge level. Devices
are electrically ignorant of the protocol used between CPU and Northbridge
Agreed.
So we have Loic and Ivan's patch limiting MMCONFIG accesses to
offsets >= 256.
And we have Matthew's patch that abstracts the method for config
accesses to offsets < 256.
I beleive Matthew has already tested these patches for functionality
on x86. All that's needed is to test for regressions on other arches.
Is there any interest in providing the following?
1. The ability to use MMCONFIG for all accesses on systems that have
no problems with MMCONFIG.
2. For systems using both PCI and PCI express, testing each bus
for MMCONFIG compliance, to determine whether MMCONFIG can be
used for all config accesses or whether the bus must be limited
all to the method abstracted for offsets < 256.
Or does that introduce unnecessary complications?
--On Mon, 14 Jan 2008 08:01:01 -0500 Again this is about having systems that don't need extended config space not use it. At all. The only way to do that is have the drivers say they need it, and not use it otherwise. It has NOTHING to do with how things are wired up. It's pure a kernel level policy decision about whether to use extended config space AT ALL. -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
The problem with compelling device drivers to determine the PCI config mechanism is that it must be forced upon arches that have no PCI configuration quirks or don't even use the same PCI config mechanisms as x86. I don't think that's a good policy. Better to confine arch-specific quirks to the arch-specific code whenever possible. --
On Mon, 14 Jan 2008 10:23:14 -0500 -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
Arjan, you would be foisting this call on device drivers running on arches that don't need any such distinction between extended config space and < 256 bytes. I still think it's a bad policy. Let's endeavor to confine arch-specific quirks to the arch-specific code. --
Microsoft may not but the standard of Taiwanese driver code (and by reference I mean vendor reference not OS supplier reference) is not always great. When you have weeks to write a driver for a product with a 6 month sales lifetime I guess there are other pressures on driver authors. Easy enough for Intel to analyse though. Alan --
I don't know if they 'screwed it up'. There are devices that misbehave when registers are read from pci config space. But this was never guaranteed to be a safe thing to do; it gradualy became clear that people expected to be able to read random registers and manufacturers responded accordingly, but I don't think you were ever guaranteed to be able to peek at bits of config space arbitrarily. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Quite correct... Reading registers can have all sorts of side effects, for example clearing chip conditions. Jeff --
I think this will be OK. We'll end up with three pci_ops, one for mmconfig-only, one for mixed mmconfig-conf1 and one for conf1. We could do with that now actually -- the machines which will definitely go beserk if you try to use mmconfig could have the conf1 ops on those busses. Let's take Ivan's patch for now, and do that patch for 2.6.26. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
On Sat, 12 Jan 2008 17:40:30 +0300 Ivan Kokshaysky <ink@jurassic.park.msu.ru> wrote: no it doesn't! Other arches need no changes. --
Umm, true. I misread your patch. But it doesn't change anything - that wasn't my main objection anyway. Ivan. --
I agree, I quite dislike it too. Even If the breakage on x86 makes us want to totally disable it there, it can be done within the existing PCI ops I believe. I think Arjan's problem is to try to do it per-device since the "standard" PCI ops don't get a pci_dev structure (for obvious reasons). But from what I read in this thread, this per-device enabling/disabling doesn't seem very useful at all. Cheers, Ben. --
Here's a patch (on top of Ivan's) to improve things further.
One of Arjan's big problems with Ivan's patch is the hardcoding of conf1
as the fallback. So I took an idea from Arjan's patch, crossed it
with an idea of my own and came up with this. It gets rid of the
raw_pci_ops as a generic idea, and makes it private to the x86 arch.
It also makes the whole select-which-ops private to the x86 arch without
touching the pci layer at all.
Only compile-tested on x86-64.
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..ffaf02b 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
#define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg) \
(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *value)
{
u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 value)
{
u64 addr;
@@ -91,24 +89,17 @@ pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static struct pci_raw_ops pci_sal_ops = {
- .read = pci_sal_read,
- .write = pci_sal_write
-};
-
-struct pci_raw_ops *raw_pci_ops = &pci_sal_ops;
-
-static int
-pci_read (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 *value)
{
- return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
-static int
-pci_write (struct pci_bus *bus, unsigned int devf...as a general thing I like where this patch is going On Sun, 13 Jan 2008 00:24:15 -0700 would be nice the "reg > 256 && raw_pci_Ext_ops==NULL" case would just call the raw_pci_ops-> pointer, to give that a chance of refusal couldn't this (at least in some next patch) use the vector if it exists? why set BOTH vectors? you probably ONLY want to set the ext one, so that calls to the lower 256 go to the original --
