Where did this patch go? I didn't get notified that anyone dropped it,
but I don't see it in current -git..-
I think Greg doesn't like it, even though we don't have an alternative
at this point...Jesse
-
Yes, I didn't like it, Ivan didn't like it, and I got reports that it
wasn't even needed at all once you upgraded your BIOS to the latest
version.So, is this still needed? And if so, can you try to implement what Ivan
suggested to do here instead?thanks,
greg k-h
-
Yes, it's still needed. Auke rescinded his "BIOS upgrade makes it work"
message, so something like this is still necessary.Jesse
-
He did? Ugh, I can't keep these all straight, sorry.
Can someone just send what they think is still needed, and explain why
Ivan will not object to it? :)thanks,
greg k-h
-
Here's a recap of the whole issue just for people's information:
Right now we disable MMCONFIG on machines where the MCFG area is not
reserved in the E820 memory map since we figure it's not valid. This is
a broken heuristic because the PCI Express firmware spec doesn't require
that it be so reserved, it only needs to be reserved as an ACPI
motherboard resource, and so many times it's not reserved in E820
despite being completely valid and working. The
mmconfig-validate-against-acpi-motherboard-resources.patch changes this
to validate against the ACPI motherboard resources instead.The second problem is that on some machines, when we are doing BAR
sizing on PCI devices, and write all ones to a BAR in order to determine
how many bits "stick", the BAR ends up overlapping with the MCFG area.
On some chipsets, this causes writes to the MCFG area (like, to restore
the original BAR contents) to get decoded by the device instead of by
the MCFG mechanism, which means the BAR stays disabled and configuration
access stops working, wreaking havoc. Usually on these machines the
MMCONFIG is located near the top of 32-bit memory and the PCI device
causing problems is a PCI Express graphics card.
pci-disable-decode-of-io-memory-during-bar-sizing.patch, and its
successors, switch off the device's decoding during sizing so that it
won't absorb the accesses to the MCFG table.The concern raised was that this might affect some devices negatively.
We do avoid disabling decode on host bridges, as it's known that some of
them disable RAM access when you turn decode off, stupidly. I've yet to
hear of any other conclusive case where disabling the decode is harmful.
In general, if disabling the decode causes issues, the mere fact of
doing the BAR sizing could cause the same issues, and that is unavoidable.The other possible workaround would be to avoid using MMCONFIG until the
BAR sizing is done. However, this seems like a poor solution. First of
all, in the future there ma...
The only sane solution is the one that people constantly seem to ignore:
- only use MMCONFIG if absolutely required by the access itself
In other words, make the MMCONFIG code fall back on old-style accesses for
any register access to a word with reg+size <= 256.At that point, almost all the issues with mmconfig just go away, BECAUSE
NOBODY USES IT, so it doesn't matter if it's broken?The fact is, CONF1 style accesses are just safer, and *work*.
Can we please just do that?
Linus
-
We seem to slowly grow more uses of registers > 256, but yes it's still mostly
At least for the AMD GART IOMMU I would prefer if that was not done.
It has a pci config space access in a time critical path (flush).
And Family10h supports accessing the northbridge using mmconfig.I believe the tigon3 driver also does config space access frequently
on some chips.Also there are still the old x86 Macs where conf1 doesn't work.
-Andi
-
From: Andi Kleen <ak@suse.de>
This case is not worth optimizing for at all.
-
I don't think that was ever true. That was a made-up rumor by the EFI
people who were just confused.There's no way to disable conf1 accesses on any known chipset, afaik.
Linus
-
There was a tester with such a machine and he complained until
the heuristic handled this case. So there is evidence it
really was like this.-Andi
-
On Tue, 30 Oct 2007 08:15:46 -0700 (PDT)
I would suggest a slight twist then: use CONF1 *until* you're using
something above 256, and then and only then switch to MMCONFIG from
then on for all accesses.That should solve the spec issue (yes I know about specs, but this is
one of those things that will bite us in the future if we're not
careful)-
You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for
startup.Also, there's reason to believe that mixing things up _has_ to work
anyway, and if the issue is between "works in practice" and "theory says
that you shouldn't mix", I'll take practice every time.Especially since we *know* that the theory is broken. Right now MMCONFIG
is effectively disabled very aggressively because it's simply unusably
flaky. So the choice is between:- don't use MMCONFIG at all, because it has so many problems
- use MMCONFIG sparingly enough to hide the problemsand what "you're supposed to do" is simply trumped by Real Life(tm).
Because Intel screwed up so badly when they designed that piece of shit.(Where "screwed up badly" is the usual "left it to firmware people" thing,
of course. Dammit, Intel *could* have just made it a real PCI BAR in the
Northbridge, and specified it as such, and we wouldn't have these
problems! But no, it had to be another idiotic "firmware tells where itNo.
Maybe if you do it per-device, and only *after* probing (ie we have seen
multiple, and successful, accesses), but globally, absolutely not. That
would be useless. The bugs we have had in this area have been exactly the
kinds of things like "we don't know the real size of the MMCONFIG areas"
etc.I could easily see device driver writers probing to see if something
works, and I absolutely don't think we should just automatically enable
MMCONFIG from then on.But maybe we could have a per-device flag that a driver *can* set. Ie have
the logic be:- use MMCONFIG if we have to (reg >= 256)
OR
- use MMCONFIG if the driver specifically asked us to
and then drivers that absolutely need it, and know they do, can set that
flag. Preferably after they actually verified that it works.That way you _can_ get the "this is how you're supposed to do it"
behaviour, but you get it when there is a reasonable chance that it
actually works.And quite frankly...
If it does, it's not by necessity. As soon as you read the table
location out of the ACPI tables you can start using it, and thatFact is, we don't really know how many of these systems with supposedly
"broken" MMCONFIG were really just suffering from the overlapping
PCI/MMCONFIG address space problem, which is entirely the fault of theWhy per device? It's not like the MSI case where both the platform and
the device are potentially busted. Whether or not MMCONFIG works has
nothing to do with the device, all that matters is whether it works onHow will they verify that it works? If it works, then verifying it works
is all well and good. If it doesn't work, trying to verify if it does
could very well blow up the machine.I've made the point before that if we're going to allow using it at all,
we'd better find out if it works or not early on, not after we've been
running and somebody decides it's a good idea to try using it andIntel could say what they want on the subject.. but that doesn't
necessarily reflect what happens with anyone else's chipset implementations.
-
Don't be silly. Exactly _BECAUSE_ we cannot trust the firmware, we have to
use conf1 (which we can trust) to verify it and/or fix things up.Also, there are several devices that don't show up in the MMCFG things, or
just otherwise get it wrong.So just take a look at arch/x86/pci/mmconfig-shared.c and look for
"conf1".Really. Damn, I'm nervous taking any MMCFG patches that has you as an
author, if you aren't even aware of these kinds of fundamnetal issues. You
probably read the standards about how things are "supposed" to work, and
then just believed them?Rule #1 in kernel programming: don't *ever* think that things actually
work the way they are documented to work. The documentation is a starting
point, nothing else.And please be defensive in programming. We *know* conf1 cycles work. The
hardware has been extensively tested, and there are no firmware
interactions. There is *zero* reasons to use MMCONF cycles for normal
devices. Ergo: switching over to MMCONF when not needed is stupid and
fragile.Linus
-
We have a few bits of code in there to look at the actual MMCONFIG base
register, which is good since we can sanity check it against the
firmware. However, by itself it's not enough since we may end up using
it even though the BIOS didn't tell us about an overlap that really
does exist. Robert recognized this and added checks to make sure the
values read from the base regs actually matched what the firmware was
telling us in the ACPI tables (at least iirc, it's been awhile since I
looked at the patch).So don't be too nervous. :)
Jesse
-
My point was, it's not inherently necessary in order to use MMCONFIG.
I'm not saying the checks (unreachable_devices and
pci_mmcfg_check_hostbridge) aren't useful or needed with many real
machines. However, in the event that type1 access isn't available we
just skip all those checks because we have no other option. It would
indeed be a pretty broken spec if there was no way to bootstrap with itI can't really disagree that MMCONFIG doesn't have great advantages for
most devices (though it likely is faster on a lot of platforms, which
may be significant if the device does lots of config space accesses). So
for the moment, avoiding using it except where necessary will likely
work out (except if some system does indeed puke on mixing type1 and
MMCONFIG).However, what Microsoft is doing with Vista may eventually make a
difference in the future. Many hardware vendors seem to use the testing
strategy of "test with latest Windows version. Works OK? Ship it." If
Vista decides that MMCONFIG is good to use all the time, then type1
access support is likely going to a) end up less tested and b) probably
deleted entirely in time. We've seen it before - it used to be that not
using ACPI was the safe option on most hardware with Linux. Now you
pretty much have to use it because the manufacturers only test with it
enabled. I've seen at least one board where the interrupt routing was
completely broken with ACPI off, because they obviously only tested in
Windows..-
On Tue, 30 Oct 2007 17:41:26 -0600
it's per device so that if you have have no users for this stuff,
you'll never ever get bitten by bugs (be it linux or the bios).
It's just avoiding the problem for a large class of cases... doesn't
mean it's solved for the other class, just in practice it turns it into
a much smaller problem.
-
Exactly. It's not that we care about trying to protect a system that
really needs to use MMCFG - once you really *really* need it, you're
screwed if it doesn't work.But currently 99% of all systems will never need it, and for the broken
platforms, that's a nice round 100%. So if we make code not use MMCFG by
default, and you have to enable it for just those device drivers that
really want it and care, you automatically avoid all the broken hardware,
with no real downsides.Linus
-
The per-device flag is fine with me, but I should make something clear:
MMCONFIG IS NOT BROKEN!
Nor is finding mmconfig space. We know how to do that too.
What's broken is our PCI probing with certain address space layouts that
include MMCONFIG space. Since MMCONFIG is in MMIO space (by
definition) there will always be the potential for problems when we use
MMCONFIG and don't disable decode while sizing BARs (probably a latent
bug on many non-PC Linux platforms).So we can either use I/O ports for sizing and only switch to MMCONFIG
later, or we can just use it on an as-needed basis, or we can make our
PCI probing safe if MMCONFIG is in use.Jesse
-
Trust me, it is.
The particular problem _you_ had with it is only a small small part of the
No. You really don't see the big picture. There's been tons of problems
with MMCONFIG. Like the fact that other devices have their IO regions
registered on top of it, because the MMCONFIG thing was done as a hidden
resource. Or the fact that the area claimed was too small. Or too large.
Or not listed at all.The whole thing is a total disaster. I told Intel engineers literally
*years* ago to not do that idiotic "hidden IO resources that are described
by firmware that then inevitably gets things wrong", and yet what happens?
Every single time.Linus
-
iirc they tended to hang for whatever reason when the mcfg
They were :/ The bug flowed from a Intel reference BIOS to lots of
That's a different issue. The spec does not require it to be e820-reserved.
That was just a (bogus) heuristic originally added to handle the early BIOS
bug. Even BIOS where the mcfg is fine do not reserve it there.There were some patches to check it against ACPI resources (which
was presumably better specification conforming and more importantly
similar to what M$ does). I'm not sure that fixed all problems andWell the overlapping to MMIO would have been fine as design if BIOS had
gotten it right. Trying to design things that BIOS can't get it wrong
is probably futile -- the BIOSes would just find more subtle ways to break.
The real problem I guess was that Linux was trying to use it before
Windows which is usually a bad idea regarding BIOS support at least
in the Desktop/Laptop space.-Andi
-
Unfortunately I think some such machines *were* released, only because
the release engineers figured no one actually uses MMCONFIG yet
(Windows == whole world of users), so why worry about that particular
bug?The "not E820-reserved" message is actually bogus. I'll bet MMCONFIG
works fine on your machine with Robert's patch and the disable decode
stuff applied.Jesse
-
Yeah, that's definitely a problem, and would be a firmware bug. There's
no doubt that firmwares have had trouble with this in the past, but
given that Vista now relies on this stuff working, it's a lot moreI don't disagree there. I'm just saying the actual mechanism is fine
(as illustrated by the numerous non-PC ports of Linux), and this
particular problem, at least, isn't really specific to how MMCONFIG is
described or configured by the firmware and OS, it's simply a Linux
problem.But like I said, the per-device flag Arjan suggested is fine with me...
Jesse
-
I agree. I think most of the big problems we had were basically with
unreleased or very early systems.But the per-device flag should "just fix it" regardless, and we can go
forward assuming that things work, but without breaking those borderline
systems.Linus
-
From: Jesse Barnes <jbarnes@virtuousgeek.org>
I totally disagree, it is not a Linux problem.
The non-PC ports don't have this issue because the PCI config space
MMIO area of the PCI controller does not overlap PCI MMIO space at
all. So the problem simply cannot happen.They choose to put this MMCONFIG in an area overlapping with PCI MMIO
space, which breaks both existing practice and code.
-
On Tue, 30 Oct 2007 10:07:48 -0700 (PDT)
something like
int pci_enable_mmconfig(struct pci_dev *pdev) ?I'll see what I can do ;)
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
-
Please remember that the driver is not the sole user of the PCI config
space -- user-space programs (e.g., lspci) can access it via sysfs, too.
Should we force users of such programs to add a magic kernel parameter
to enable MMCONFIG? Does not make much sense.Maybe we should do all bus scanning with conf1 and then try if MMCONFIG
returns the same values?Have a nice fortnight
--
Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"Beware of bugs in the above code; I have only proved it correct, not tried it." -- D.E.K.
-
On Thu, 1 Nov 2007 09:31:40 +0100
that is already in the code today but not nearly enough; there's a ton
of cases where it's "touch mmconfig and the box is dead"...
-
Argh. However I still do not think it is an acceptable reason for breaking
all userspace access to the extended config space.At the very least, there should be a sysfs toggle to enable MMCONFIG if
root really wishes that (maybe for a particular device, as the drivers
would do).Have a nice fortnight
--
Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Do not believe in miracles -- rely on them.
-
Yes, that looks fine. It also matches the kinds of things drivers already
have to do (ie enable DMA, MSI etc), both conceptually and from a purely
syntactic/practical standpoint.I think mmconfig and MSI have a lot in common, in that both are relatively
Well, let's see if "pci_enable_mmconfig()" works out first. There probably
aren't more than a few drivers that would be interested in this anyway, so
it's probably fine.And by the time mmconfig is commonly used, probably all the teething
problems are long gone, so we could plan on a "in five years, maybe we
could even enable it by default" thing?Linus
-
Aren't you guys referring to
pci-disable-decode-of-io-memory-during-bar-sizing.patch? This is another
one entirely, though related.
-
I have no idea, there have been a lot of conflicting patches in this
area...
-
Oh sorry, my confusion.
I don't know what happened to the mmconfig vs. ACPI resources patch.
Jesse
-
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Andi Kleen | [PATCH x86] [0/16] Various i386/x86-64 changes |
| Vladislav Bolkhovitin | Re: Integration of SCST in the mainstream Linux kernel |
| Pavel Roskin | ndiswrapper and GPL-only symbols redux |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Natalie Protasevich | [BUG] New Kernel Bugs |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Arjan van de Ven | Re: [GIT]: Networking |
