Where did this patch go? I didn't get notified that anyone dropped it, but I don't see it in current -git.. -
I think Greg doesn't like it, even though we don't have an alternative at this point... Jesse -
Yes, I didn't like it, Ivan didn't like it, and I got reports that it wasn't even needed at all once you upgraded your BIOS to the latest version. So, is this still needed? And if so, can you try to implement what Ivan suggested to do here instead? thanks, greg k-h -
Aren't you guys referring to pci-disable-decode-of-io-memory-during-bar-sizing.patch? This is another one entirely, though related. -
I have no idea, there have been a lot of conflicting patches in this area... -
Oh sorry, my confusion. I don't know what happened to the mmconfig vs. ACPI resources patch. Jesse -
Yes, it's still needed. Auke rescinded his "BIOS upgrade makes it work" message, so something like this is still necessary. Jesse -
He did? Ugh, I can't keep these all straight, sorry. Can someone just send what they think is still needed, and explain why Ivan will not object to it? :) thanks, greg k-h -
Here's a recap of the whole issue just for people's information: Right now we disable MMCONFIG on machines where the MCFG area is not reserved in the E820 memory map since we figure it's not valid. This is a broken heuristic because the PCI Express firmware spec doesn't require that it be so reserved, it only needs to be reserved as an ACPI motherboard resource, and so many times it's not reserved in E820 despite being completely valid and working. The mmconfig-validate-against-acpi-motherboard-resources.patch changes this to validate against the ACPI motherboard resources instead. The second problem is that on some machines, when we are doing BAR sizing on PCI devices, and write all ones to a BAR in order to determine how many bits "stick", the BAR ends up overlapping with the MCFG area. On some chipsets, this causes writes to the MCFG area (like, to restore the original BAR contents) to get decoded by the device instead of by the MCFG mechanism, which means the BAR stays disabled and configuration access stops working, wreaking havoc. Usually on these machines the MMCONFIG is located near the top of 32-bit memory and the PCI device causing problems is a PCI Express graphics card. pci-disable-decode-of-io-memory-during-bar-sizing.patch, and its successors, switch off the device's decoding during sizing so that it won't absorb the accesses to the MCFG table. The concern raised was that this might affect some devices negatively. We do avoid disabling decode on host bridges, as it's known that some of them disable RAM access when you turn decode off, stupidly. I've yet to hear of any other conclusive case where disabling the decode is harmful. In general, if disabling the decode causes issues, the mere fact of doing the BAR sizing could cause the same issues, and that is unavoidable. The other possible workaround would be to avoid using MMCONFIG until the BAR sizing is done. However, this seems like a poor solution. First of all, in the future ...
The only sane solution is the one that people constantly seem to ignore: - only use MMCONFIG if absolutely required by the access itself In other words, make the MMCONFIG code fall back on old-style accesses for any register access to a word with reg+size <= 256. At that point, almost all the issues with mmconfig just go away, BECAUSE NOBODY USES IT, so it doesn't matter if it's broken? The fact is, CONF1 style accesses are just safer, and *work*. Can we please just do that? Linus -
On Tue, 30 Oct 2007 08:15:46 -0700 (PDT) I would suggest a slight twist then: use CONF1 *until* you're using something above 256, and then and only then switch to MMCONFIG from then on for all accesses. That should solve the spec issue (yes I know about specs, but this is one of those things that will bite us in the future if we're not careful) -
You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for startup. Also, there's reason to believe that mixing things up _has_ to work anyway, and if the issue is between "works in practice" and "theory says that you shouldn't mix", I'll take practice every time. Especially since we *know* that the theory is broken. Right now MMCONFIG is effectively disabled very aggressively because it's simply unusably flaky. So the choice is between: - don't use MMCONFIG at all, because it has so many problems - use MMCONFIG sparingly enough to hide the problems and what "you're supposed to do" is simply trumped by Real Life(tm). Because Intel screwed up so badly when they designed that piece of shit. (Where "screwed up badly" is the usual "left it to firmware people" thing, of course. Dammit, Intel *could* have just made it a real PCI BAR in the Northbridge, and specified it as such, and we wouldn't have these problems! But no, it had to be another idiotic "firmware tells where it No. Maybe if you do it per-device, and only *after* probing (ie we have seen multiple, and successful, accesses), but globally, absolutely not. That would be useless. The bugs we have had in this area have been exactly the kinds of things like "we don't know the real size of the MMCONFIG areas" etc. I could easily see device driver writers probing to see if something works, and I absolutely don't think we should just automatically enable MMCONFIG from then on. But maybe we could have a per-device flag that a driver *can* set. Ie have the logic be: - use MMCONFIG if we have to (reg >= 256) OR - use MMCONFIG if the driver specifically asked us to and then drivers that absolutely need it, and know they do, can set that flag. Preferably after they actually verified that it works. That way you _can_ get the "this is how you're supposed to do it" behaviour, but you get it when there is a reasonable chance that it actually works. And quite ...
On Tue, 30 Oct 2007 10:07:48 -0700 (PDT) something like int pci_enable_mmconfig(struct pci_dev *pdev) ? I'll see what I can do ;) -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org -
Yes, that looks fine. It also matches the kinds of things drivers already have to do (ie enable DMA, MSI etc), both conceptually and from a purely syntactic/practical standpoint. I think mmconfig and MSI have a lot in common, in that both are relatively Well, let's see if "pci_enable_mmconfig()" works out first. There probably aren't more than a few drivers that would be interested in this anyway, so it's probably fine. And by the time mmconfig is commonly used, probably all the teething problems are long gone, so we could plan on a "in five years, maybe we could even enable it by default" thing? Linus -
Please remember that the driver is not the sole user of the PCI config space -- user-space programs (e.g., lspci) can access it via sysfs, too. Should we force users of such programs to add a magic kernel parameter to enable MMCONFIG? Does not make much sense. Maybe we should do all bus scanning with conf1 and then try if MMCONFIG returns the same values? Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth "Beware of bugs in the above code; I have only proved it correct, not tried it." -- D.E.K. -
On Thu, 1 Nov 2007 09:31:40 +0100 that is already in the code today but not nearly enough; there's a ton of cases where it's "touch mmconfig and the box is dead"... -
Argh. However I still do not think it is an acceptable reason for breaking all userspace access to the extended config space. At the very least, there should be a sysfs toggle to enable MMCONFIG if root really wishes that (maybe for a particular device, as the drivers would do). Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth Do not believe in miracles -- rely on them. -
The per-device flag is fine with me, but I should make something clear: MMCONFIG IS NOT BROKEN! Nor is finding mmconfig space. We know how to do that too. What's broken is our PCI probing with certain address space layouts that include MMCONFIG space. Since MMCONFIG is in MMIO space (by definition) there will always be the potential for problems when we use MMCONFIG and don't disable decode while sizing BARs (probably a latent bug on many non-PC Linux platforms). So we can either use I/O ports for sizing and only switch to MMCONFIG later, or we can just use it on an as-needed basis, or we can make our PCI probing safe if MMCONFIG is in use. Jesse -
Trust me, it is. The particular problem _you_ had with it is only a small small part of the No. You really don't see the big picture. There's been tons of problems with MMCONFIG. Like the fact that other devices have their IO regions registered on top of it, because the MMCONFIG thing was done as a hidden resource. Or the fact that the area claimed was too small. Or too large. Or not listed at all. The whole thing is a total disaster. I told Intel engineers literally *years* ago to not do that idiotic "hidden IO resources that are described by firmware that then inevitably gets things wrong", and yet what happens? Every single time. Linus -
Yeah, that's definitely a problem, and would be a firmware bug. There's no doubt that firmwares have had trouble with this in the past, but given that Vista now relies on this stuff working, it's a lot more I don't disagree there. I'm just saying the actual mechanism is fine (as illustrated by the numerous non-PC ports of Linux), and this particular problem, at least, isn't really specific to how MMCONFIG is described or configured by the firmware and OS, it's simply a Linux problem. But like I said, the per-device flag Arjan suggested is fine with me... Jesse -
From: Jesse Barnes <jbarnes@virtuousgeek.org> I totally disagree, it is not a Linux problem. The non-PC ports don't have this issue because the PCI config space MMIO area of the PCI controller does not overlap PCI MMIO space at all. So the problem simply cannot happen. They choose to put this MMCONFIG in an area overlapping with PCI MMIO space, which breaks both existing practice and code. -
I agree. I think most of the big problems we had were basically with unreleased or very early systems. But the per-device flag should "just fix it" regardless, and we can go forward assuming that things work, but without breaking those borderline systems. Linus -
Unfortunately I think some such machines *were* released, only because the release engineers figured no one actually uses MMCONFIG yet (Windows == whole world of users), so why worry about that particular bug? The "not E820-reserved" message is actually bogus. I'll bet MMCONFIG works fine on your machine with Robert's patch and the disable decode stuff applied. Jesse -
iirc they tended to hang for whatever reason when the mcfg They were :/ The bug flowed from a Intel reference BIOS to lots of That's a different issue. The spec does not require it to be e820-reserved. That was just a (bogus) heuristic originally added to handle the early BIOS bug. Even BIOS where the mcfg is fine do not reserve it there. There were some patches to check it against ACPI resources (which was presumably better specification conforming and more importantly similar to what M$ does). I'm not sure that fixed all problems and Well the overlapping to MMIO would have been fine as design if BIOS had gotten it right. Trying to design things that BIOS can't get it wrong is probably futile -- the BIOSes would just find more subtle ways to break. The real problem I guess was that Linux was trying to use it before Windows which is usually a bad idea regarding BIOS support at least in the Desktop/Laptop space. -Andi -
If it does, it's not by necessity. As soon as you read the table location out of the ACPI tables you can start using it, and that Fact is, we don't really know how many of these systems with supposedly "broken" MMCONFIG were really just suffering from the overlapping PCI/MMCONFIG address space problem, which is entirely the fault of the Why per device? It's not like the MSI case where both the platform and the device are potentially busted. Whether or not MMCONFIG works has nothing to do with the device, all that matters is whether it works on How will they verify that it works? If it works, then verifying it works is all well and good. If it doesn't work, trying to verify if it does could very well blow up the machine. I've made the point before that if we're going to allow using it at all, we'd better find out if it works or not early on, not after we've been running and somebody decides it's a good idea to try using it and Intel could say what they want on the subject.. but that doesn't necessarily reflect what happens with anyone else's chipset implementations. -
On Tue, 30 Oct 2007 17:41:26 -0600 it's per device so that if you have have no users for this stuff, you'll never ever get bitten by bugs (be it linux or the bios). It's just avoiding the problem for a large class of cases... doesn't mean it's solved for the other class, just in practice it turns it into a much smaller problem. -
Exactly. It's not that we care about trying to protect a system that really needs to use MMCFG - once you really *really* need it, you're screwed if it doesn't work. But currently 99% of all systems will never need it, and for the broken platforms, that's a nice round 100%. So if we make code not use MMCFG by default, and you have to enable it for just those device drivers that really want it and care, you automatically avoid all the broken hardware, with no real downsides. Linus -
Don't be silly. Exactly _BECAUSE_ we cannot trust the firmware, we have to use conf1 (which we can trust) to verify it and/or fix things up. Also, there are several devices that don't show up in the MMCFG things, or just otherwise get it wrong. So just take a look at arch/x86/pci/mmconfig-shared.c and look for "conf1". Really. Damn, I'm nervous taking any MMCFG patches that has you as an author, if you aren't even aware of these kinds of fundamnetal issues. You probably read the standards about how things are "supposed" to work, and then just believed them? Rule #1 in kernel programming: don't *ever* think that things actually work the way they are documented to work. The documentation is a starting point, nothing else. And please be defensive in programming. We *know* conf1 cycles work. The hardware has been extensively tested, and there are no firmware interactions. There is *zero* reasons to use MMCONF cycles for normal devices. Ergo: switching over to MMCONF when not needed is stupid and fragile. Linus -
My point was, it's not inherently necessary in order to use MMCONFIG. I'm not saying the checks (unreachable_devices and pci_mmcfg_check_hostbridge) aren't useful or needed with many real machines. However, in the event that type1 access isn't available we just skip all those checks because we have no other option. It would indeed be a pretty broken spec if there was no way to bootstrap with it I can't really disagree that MMCONFIG doesn't have great advantages for most devices (though it likely is faster on a lot of platforms, which may be significant if the device does lots of config space accesses). So for the moment, avoiding using it except where necessary will likely work out (except if some system does indeed puke on mixing type1 and MMCONFIG). However, what Microsoft is doing with Vista may eventually make a difference in the future. Many hardware vendors seem to use the testing strategy of "test with latest Windows version. Works OK? Ship it." If Vista decides that MMCONFIG is good to use all the time, then type1 access support is likely going to a) end up less tested and b) probably deleted entirely in time. We've seen it before - it used to be that not using ACPI was the safe option on most hardware with Linux. Now you pretty much have to use it because the manufacturers only test with it enabled. I've seen at least one board where the interrupt routing was completely broken with ACPI off, because they obviously only tested in Windows.. -
We have a few bits of code in there to look at the actual MMCONFIG base register, which is good since we can sanity check it against the firmware. However, by itself it's not enough since we may end up using it even though the BIOS didn't tell us about an overlap that really does exist. Robert recognized this and added checks to make sure the values read from the base regs actually matched what the firmware was telling us in the ACPI tables (at least iirc, it's been awhile since I looked at the patch). So don't be too nervous. :) Jesse -
We seem to slowly grow more uses of registers > 256, but yes it's still mostly At least for the AMD GART IOMMU I would prefer if that was not done. It has a pci config space access in a time critical path (flush). And Family10h supports accessing the northbridge using mmconfig. I believe the tigon3 driver also does config space access frequently on some chips. Also there are still the old x86 Macs where conf1 doesn't work. -Andi -
I don't think that was ever true. That was a made-up rumor by the EFI people who were just confused. There's no way to disable conf1 accesses on any known chipset, afaik. Linus -
There was a tester with such a machine and he complained until the heuristic handled this case. So there is evidence it really was like this. -Andi -
From: Andi Kleen <ak@suse.de> This case is not worth optimizing for at all. -
