login
Header Space

 
 

Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

Previous thread: [BUG][PATCH -mm] bluetooth : rfcomm add get/put device in del_conn by Dave Young on Tuesday, December 25, 2007 - 6:03 am. (4 messages)

Next thread: TOMOYO Linux Security Goal by Tetsuo Handa on Tuesday, December 25, 2007 - 8:33 am. (15 messages)
To: <linux-kernel@...>
Cc: Jeff Garzik <jeff@...>, Linus Torvalds <torvalds@...>, <gregkh@...>, <inux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Matthew Wilcox <matthew@...>
Date: Tuesday, December 25, 2007 - 7:26 am

From: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Subject: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

On PCs, PCI extended configuration space (4Kb) is riddled with problems 
associated with the memory mapped access method (MMCONFIG). At the same
time, there are very few machines that actually need or use this extended
configuration space. 

At this point in time, the only sensible action is to make access to the
extended configuration space an opt-in operation for those device drivers
that need/want access to this space, as well as for those userland 
diagnostics utilities that (on admin request) want to access this space.

It's inevitable that this is done per device rather than per bus; we'll
be needing per device PCI quirks to turn this extended config space off 
over time no matter what; in addition, it gives the least amount of surprise:
loading a driver for a device only impacts that one device, not a whole bus
worth of devices (although it'll be common to have one physical device per
bus on PCI-E).

The (desireable) side-effect of this patch is that all enumeration is done
using normal configuration cycles.

The patch below splits the lower level PCI config space operation (which
operate on a bus) in two: one that normally only operates on traditional 
space, and one that gets used after the driver has opted in to using the
extended configuration space. This has lead to a little code duplication,
but it's not all that bad (most of it is prototypes in headers and such).

Architectures that have a solid reliable way to get to extended configuration
space can just keep doing what they do now and allow extended space access
from the "traditional" bus ops, and just not fill in the new bus ops.
(This could include x86 for, say, BIOS year 2009 and later, but doesn't
right now)

This patch also adds a sysfs property for each device into which root can
write a '1' to enable extended configuration space. The kernel will print
a notice into dmesg w...
To: Arjan van de Ven <arjan@...>
Cc: <linux-kernel@...>, Linus Torvalds <torvalds@...>, <gregkh@...>, <inux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Matthew Wilcox <matthew@...>
Date: Thursday, December 27, 2007 - 7:52 am

Comments:

1) [minor] With a bit in struct pci_dev, there is no need for separate 
raw_pci_ops.  That will simplify your patch, with no functionality change.

"golden" arches (no pun intended) may implement raw_pci_ops that 
_always_ work with extended config space, and simply ignore that bit, if 
that is how their underlying non-mmconfig-nor-type1 hardware is implemented.


2) [non-minor] hmmmm.

	[jgarzik@core ~]$ lspci -n | wc -l
	23

So I would have to perform 23 sysfs twiddles, before I could obtain a 
full and unabridged 'lspci -vvvxxx'?

For the userspace interface, the most-often-used knob for diagnostic 
purposes will be the easiest one.  And that's

	echo 1 &gt; enable-ext-cfg-space-for-all-buses-ACPI-says-to
	lspci -vvvxxx


3) [minor] architectures must be able to override 
pci_enable_ext_config().  see "golden arches".




--
To: Jeff Garzik <jeff@...>
Cc: Arjan van de Ven <arjan@...>, <linux-kernel@...>, <gregkh@...>, <inux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Matthew Wilcox <matthew@...>
Date: Thursday, December 27, 2007 - 1:52 pm

Or you force it on with "pci=mmconfig" or something at boot-time.

But yes. The *fact* is that MMCONFIG has not just been globally broken, 
but broken on a per-device basis. I don't know why (and quite frankly, I 
doubt anybody does), but the PCI device ID corruption happened only for a 
specific set of devices.

Whether it was a timing issue with particular devices or whether it was a 
timing issue with some particular bridge (and could affect any devices 
behind that bridge), who knows... It almost certainly was brought on by a 
borderline (or broken) northbridge, but it apparently only affected 
specific devices - which makes me suspect that it wasn't *entirely* due to 
just the northbridge, and it was a combination of things.

I don't understand why you cannot seem to accept that per-device thing, in 
the face of clear data that yes, it really *is* per-device. Not to mention 
the fact that the way MMIO config setups work, you may well have entire 
buses that simply aren't accessible with MMIO config at all (because the 
MMIO config window is not large enough).

Furthermore, please accept the fact that of those 23 devices, exactly 
*none* will actually care. So yes, you'd have to enable it manually for 
those individual devices, but that's only if you want to do something 
totally pointless in the first place.

So stop this totally inane "it has to be global" crap. It doesn't have to 
be global at all, and we have hard data showing that it really SHOULD NOT 
be a global flag.

		Linus
--
To: Jeff Garzik <jeff@...>
Cc: <linux-kernel@...>, Linus Torvalds <torvalds@...>, <gregkh@...>, <inux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Matthew Wilcox <matthew@...>
Date: Thursday, December 27, 2007 - 10:09 am

On Thu, 27 Dec 2007 06:52:35 -0500


but sadly your second statement is not correct. Part of the complication is that all PCI config ops
operate on busses not devices; at first I thought "just add a bit and be done with it", but sadly it's
not quite the case. Due to the per-bus nature of the ops, you end up having 2 type of bus operations, 
and that's just boilerplate (prototypes, exports and stuff) but it makes up most of the lines of the patch

In addition, a separate raw_pci_ops (for x86 only!) is needed anyway since it's quite likely that 
we'll have various options of each case (extended or not) and we want to pick the best one for each case,



the easiest one is an option to lspci. Nothing more nothing less.

Making a global knob in kernel space is a lot more tricky, and in addition
really there's enough cases where userspace wants the one device anyway
Doing the "for each device I'm about to dump" in lspci is pretty much as hard as doing

see the patch. All pci_enable_ext_config() does is set a flag.
The architecture decides what to do with that flag. Golden
architectures can just totally ignore the flag and always expose
the full space. 
(In fact, the patch assumes all-but-x86 to be golden here; which is fair)



-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: <linux-kernel@...>
Date: Tuesday, January 15, 2008 - 8:58 am

I just thought this might be interesting to the discussion.

I recently bought another 2 GB memory for my computer.
My hardware is as following:

Asus Commando (Intel P965 chipset)
Intel Core2 Q6600
4x1 GB Geil PC6400 memory
nVidia 8800 gts (old g80 core, 640 mb mem)

Without booting with pci=nommeconf i have severe stability issues and
often when its not crashing i get slowdowns with the error:

kern.log:Jan 15 13:19:40 bilbo kernel: [  132.046715] NVRM: Xid
(0001:00): 6, PE0001
... repeated x times.

In addition the nVidia framebuffer seems to "leak" or not update since
i get loads of graphics artifacts.

The system works perfectly fine with 2 GB memory and not the
pci=nommconf.
It works like a charm when using pci=nommconf and 4 GB memory.

In adition i have to enable the Northbridge-&gt;PCI Memory remap feature
in the BIOS to avoid the kernel panicing when trying to access &gt; 3 gb
but that is understandable :)

My software is Kubuntu 7.10 stock x86_64 kernel, but i do use the
binary driver by nVidia.

It works like a charm when using pci=nommconf

If you guys need any more info about hardware/software from me, please
let me know.

-- 
Øyvind Vågen Jægtnes
+47 96 22 03 08

(i reject your diurnal rhythm and subsitute my own)
--
To: Arjan van de Ven <arjan@...>
Cc: <linux-kernel@...>, Jeff Garzik <jeff@...>, Linus Torvalds <torvalds@...>, <gregkh@...>, <inux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Matthew Wilcox <matthew@...>
Date: Friday, January 11, 2008 - 3:02 pm

Can you send me a follow-on patch that documents this in
Documentation/ABI please.

thanks,

greg k-h
--
To: Greg KH <greg@...>
Cc: Arjan van de Ven <arjan@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, Linus Torvalds <torvalds@...>, <gregkh@...>, <inux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 3:28 pm

Greg, if you integrate Ivan's patch, you don't need Arjan's patch.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To: Matthew Wilcox <matthew@...>
Cc: Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, Linus Torvalds <torvalds@...>, <gregkh@...>, <inux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 3:40 pm

On Fri, 11 Jan 2008 12:28:20 -0700

Personally I absolutely don't agree with that.
Ivan's patch is another attempt to make MMCONFIG work somewhat better,
but does not provide the explicit opt-in that I think is required at
this point; people have tried to get MMCONFIG stable for a really long time,
and failed still upto today. At least my patience is up and this needs
to be opt-in.



-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Arjan van de Ven <arjan@...>
Cc: Matthew Wilcox <matthew@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, Linus Torvalds <torvalds@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 3:45 pm

I think I agree with Arjan here, Ivan's patch should also work on top of
this one, and will help out some machines.

But as he hasn't asked for it to be included in the kernel tree, that's
a moot point right now :)

thanks,

greg k-h
--
To: Greg KH <gregkh@...>
Cc: Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, Linus Torvalds <torvalds@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 3:49 pm

So your argument is that MMCONFIG sucks, therefore Linux has to have a

He didn't?  I certainly ask for it to be included.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To: Matthew Wilcox <matthew@...>
Cc: Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 3:58 pm

What's *your* point?

MMCONFIG is known broken. If we ever start enabling it more (ie start 
using it even if it's not reserved in the e820 tables), all that known 
breakage will come and bite us in the *ss.

We need to have some armor-plated underwear to protect against that 
ass-biting, and that's what Arjan's patch is. 

Tell me what *other* armor plating you could have that actually works?

		Linus
--
To: Linus Torvalds <torvalds@...>
Cc: Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 4:17 pm

Ivan's patch doesn't start enabling MMCONFIG in more places than we
currently do.  It makes us use conf1 accesses for all accesses below

The armour plating that already exists -- pci=nommconf.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To: Matthew Wilcox <matthew@...>
Cc: Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 4:27 pm

.. and I agree with that patch. But there will be people who try to access 
extended space by mistake, and they'll have a hard-locked machine or 

No. It needs to be automatic, OR THE OTHER WAY AROUND.

Ie we disable the unsafe feature on purpose, and then force people who 
access it to do so *consciously*.

Extended config space is different, for chissake! It's not even like it's 
just a bigger normal config space where normal config accesses just 
overflow into it. It really does have different rules etc.

			Linus
--
To: Linus Torvalds <torvalds@...>
Cc: Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 4:42 pm

But they can't.  We limit the size they can access to 256 bytes, unless

I'd be fine with making mmconfig off by default.  Make people pass

Yes, but it's also important to enable some of the PCIe features.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To: Matthew Wilcox <matthew@...>
Cc: Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 5:12 pm

Umm. Probing address 256 (or *any* address) using MMCONFIG will simply 
lock up the machine. HARD.

What's so hard to understand about MMCONFIG being broken on certain 
hardware?

		Linus
--
To: Linus Torvalds <torvalds@...>
Cc: Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 5:17 pm

Did I miss a bug report?  The only problems I'm currently aware of are
the ones where using MMCONFIG during BAR probing causes a hard lockup on
some Intel machines, and the ones where we get bad config data on some
AMD machines due to the configuration retry status being mishandled.

All the other lockups I'm aware of are already handled by the existing
checks.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To: Matthew Wilcox <matthew@...>
Cc: Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 5:28 pm

Hmm. Were all those reports root-caused to just that BAR probing? If so, 
we may be in better shape than I worried.

		Linus
--
To: Linus Torvalds <torvalds@...>
Cc: Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 5:38 pm

I believe so.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To: Matthew Wilcox <matthew@...>
Cc: Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, Greg KH <greg@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>
Date: Friday, January 11, 2008 - 7:58 pm

Ditto.

One typical problem is that on "Intel(r) 3 Series Experss Chipset Family"
MMCONFIG probing of the BAR #2 (frame buffer address) of integrated graphics
device locks up the machine (depending on BIOS settings, of course).
This happens because the frame buffer of IGD has higher decode priority
than MMCONFIG range, as stated in Intel docs...

Ivan.
--
To: Ivan Kokshaysky <ink@...>
Cc: Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>
Date: Friday, January 11, 2008 - 8:26 pm

Ok, so what would the proposed patch look like to help resolve this?

Ivan, you posted one a while ago, but never seemed to get any
confirmation if it helped or not.  Should I use that and drop Arjan's?
Or use both?  Or something else like the patches proposed by Tony
Camuso?

thanks,

greg k-h
--
To: Greg KH <greg@...>
Cc: Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Saturday, January 12, 2008 - 10:40 am

Actually I'm strongly against Arjan's patch. First, it's based on
assumption that the MMCONFIG thing is sort of fundamentally broken
on some systems, but none of the facts we have so far does confirm that.
And second, I really don't like the implementation as it breaks all
non-x86 arches (or forces them to add a set of totally meaningless

Tony's patch is a variation of the same idea, so this patch
supersedes it. The only argument for using conf1 to access only the
first 64 bytes of the config space was some concerns about performance.
But the only driver that extensively uses config space at runtime
is tg3, and only as a work around some broken revisions of the chip.
And even in that case I seriously doubt that mmconf vs. conf1 would
make any measurable difference.
On the other hand, always using conf1 for the whole 256-byte legacy
config space allows us to drop all sorts of black lists, which is
a *huge* advantage.

Here is the same patch, but with an updated commit message -
proper attribution to Loic Prylli, which I somehow missed
the first time, sorry.

Ivan.

---
PCI x86: always use conf1 to access config space below 256 bytes

Thanks to Loic Prylli &lt;loic@myri.com&gt;, who originally proposed
this idea.

Always using legacy configuration mechanism for the legacy config space
and extended mechanism (mmconf) for the extended config space is
a simple and very logical approach. It's supposed to resolve all
known mmconf problems. It still allows per-device quirks (tweaking
dev-&gt;cfg_size). It also allows to get rid of mmconf fallback code.

Signed-off-by: Ivan Kokshaysky &lt;ink@jurassic.park.msu.ru&gt;
---
 arch/x86/pci/mmconfig-shared.c |   35 -----------------------------------
 arch/x86/pci/mmconfig_32.c     |   22 +++++++++-------------
 arch/x86/pci/mmconfig_64.c     |   22 ++++++++++------------
 arch/x86/pci/pci.h             |    7 -------
 4 files changed, 19 insertions(+), 67 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mm...
To: Ivan Kokshaysky <ink@...>
Cc: Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Saturday, January 12, 2008 - 1:45 pm

On Sat, 12 Jan 2008 17:40:30 +0300


btw this is my main objection to your patch; it intertwines the conf1 and mmconfig code even more.
When (and I'm saying "when" not "if") systems arrive that only have MMCONFIG for some of the devices,
we'll have to detangle this again, and I'm really not looking forward to that.
--
To: Arjan van de Ven <arjan@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>
Date: Sunday, January 13, 2008 - 2:23 pm

conf1 has been a hardcoded dependencies of mmconfig for years. Ivan's 
patch does not make it worse (in fact it considerably simplifies that 
code, making it easier to untangle later).


IMHO, either your patch or Ivan's can be a good base, but:

1) For your remark above to be given any consideration, your patch 
should be modified to remove the hardcoded conf1 from the *current* 
mmconfig code, otherwise we end up with 3 set of ops (mmconfig + conf1+ 
a possible third set of operations) intertwined in a confusing manner. 
And removing that dependency is not a straightforward operation unless 
you also do 2):

2) the pci_enable_ext_config() function and dev-&gt;ext_cfg_space field, 
sysfs interface should be removed from the patch.  There has never been 
a problem reporting crashes or any undefined behaviour while trying to 
access ext-conf-space, all the problems where *using mmconfig to access 
legacy-conf-space*. The "if (dev-&gt;cfg_space_ext &gt; 0)" checks can instead 
be replaced by  "if (reg &gt;= 256)".
Otherwise when using per-device explicit enabling, just *checking* 
whether ext-conf-space is available by calling pci_enable_ext_config(), 
will make some of the old problems of *loosing legacy conf-space* come 
back: you would  have introduced a new user-space and kernel API while 
only solving half the problems, not a good deal.

if you do 1) and 2), then you really support the good properties you 
claimed:
- You can use mmconfig for ext-space and something else for legacy-space.
- You can use mmconfig for everything (for instance if conf1 is not 
implemented).

Of course it is as straightforward to modify Ivan's patch to also have 
the same properties.



Loic








Loic

--
To: Loic Prylli <loic@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>
Date: Sunday, January 13, 2008 - 2:41 pm

On Sun, 13 Jan 2008 13:23:35 -0500
Loic Prylli &lt;loic@myri.com&gt; wrote:



This entirely misses the point of why I made the patch. The point is NOT
that devices are buggy. The point is that right now, 99.99% of the machines
out there do NOT need extended config space (no matter how it gets accessed),
yet at the same time they suffered from it's issues for... what 2 years now?
The point of my patch was to make people who don't need extended config space,
not have to deal with it anymore.

Note: There is not a 100% overlap between "need" and "will not be used in 
the patches that use legacy for &lt; 256". In the other patches posted, 
extended config space will be used in cases where it won't be with my 
patch. (Most obvious one is an "lspci -vx" from automated scripts). 
Is  that a problem? We've had 2 years of mess, with one not-enough patch after another.
There still are problems TODAY (eg im 2.6.24-rc7). The patch that falls back
to an alternative method for below 256 is no doubt a step in the right direction. 
(although I'm not all that happy about mixing access types, it's not provably incorrect)
Is it enough? I'm not sure. Only time can tell I suppose, but the risk side is that
if it is not enough, users who don't need the extended config space for functionality
will suffer the bugs AGAIN.

So in short, my approach was NOT about "fix PCI", it is about "fix the user experience".
It's a stopgap for sure, until the underlying mechanism gets reliable. It's been 2 years.....
maybe this next step is "it", maybe it isn't.

--
To: Arjan van de Ven <arjan@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>
Date: Sunday, January 13, 2008 - 4:51 pm

I think I got your point the first time, and I agree it is sound. But in 
my subjective and biased opinion,  I just think ext-conf-space is 
already useful and widespread enough (being used is not the same as 
being strictly required for basic operation) for your proposed tradeoff 
to not be optimal (protecting against "future/non-proven" hardware bugs, 
i.e. bringing non-proven benefits, at the expense of making life harder 
for ext-conf-space users while bringing additional extra API/code).


To take an example from the linux tree: the driver/pci/pcie/aer code 
uses ext-conf-space for every pcie-root (currently several distributions 
enable it by default), does it mean opt-in would be automatically 
activated for most pcie hierarchies (defeating most of the benefits of 
being opt-in), or we just disable that code by default?


Does lspci -v will automatically opt-in all pcie (right now by default 
it tries to list the extended-capabilities for pcie and pcix), or do we 
now require manual explicit sysfs operations to get the whole thing? Is 
is an additional flag to lspci (if so will that flag also apply to pcix, 
possibly causing a crash for lspci -v 



To go one step your direction, I have already argued in a couple of 
emails that I would prefer to not implement ext-conf-space access for 
any PCI-X devices (removing PCI-X2 from pci_ext_cfg_size), because there 
we are trying to support devices that we don't really know exists or 
will ever exists. And protecting against "unproven bugs" makes more 


FWIW, I have in my tree a patch almost identical to Ivan's dated 
"December 2005". Because of the constant activity on the mmconfig front 
(that I thought would make it obsolete), I never took the effort of 
suggesting it before one month ago (I am not a regular user of 
linux-kernel). I admit nobody else should view it that way, but for me  
rather than the last attempt at fixing mmconfig, it's a patch first used 
two years ago that would have arguably prevented a...
To: Arjan van de Ven <arjan@...>
Cc: Loic Prylli <loic@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>
Date: Sunday, January 13, 2008 - 4:43 pm

I believe you to be mistaken in this belief.  If you take Ivan's patch,
conf1 is used for all accesses below 256 bytes.  lspci -x only dumps
config space up to 64 bytes; lspci -xxxx is needed to show extended pci
config space.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To: Matthew Wilcox <matthew@...>
Cc: Arjan van de Ven <arjan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>
Date: Sunday, January 13, 2008 - 5:18 pm

I agree with Arjan about that "not a 100% overlap". It is about the 
extra ext-conf-space access done while probing in drivers/pci/probe.c:
    dev-&gt;cfg_size = pci_cfg_space_size(dev);

(and lspci -v will also query/show the list of extended-caps for 
pci-x/pcie-x devices that have some, provided the kernel can access 
ext-conf-space).

With Ivan's patch, that line would still cause one extended-conf-space 
access at offset 256 for pcie/pci-x2 devices  (to check the ability to 
query ext-space). Arjan "opt-in" patch would prevent that extra access.

IMHO that access is OK and harmless in all cases, we are already 
protected by MCFG/e820 checks, but I agree one can express a different 
opinion based on trying to prevent "never-seen/potential" hardware/BIOS 
bugs. FWIW it is also there that I was suggested to exclude PCI-X2 
devices (when restricted to pcie, that access while probing cannot even 
cause the harmless master-abort/0xffffffff), but there is a small trade-off.


Loic

--
To: Arjan van de Ven <arjan@...>
Cc: Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Saturday, January 12, 2008 - 5:49 pm

There is nothing wrong with it; please realize that mmconf and conf1 are
just different cpu-side interfaces. Both produce precisely the *same* bus

MMCONFIG for *some* of the devices? This doesn't sound realistic
from technical point of view.
MMCONFIG-only systems? Sure. I really hope to see these. But it won't
be PC-AT architecture anymore. It has to be something like alpha,
for instance, fully utilizing the 64-bit address space, and we'll have
to have the whole low-level PCI infrastructure completely different
for these future platforms anyway.
Right now, each and every x86 chipset *does* require working
conf1 just in order to set up the mmconf aperture. It's the very
fundamental thing, sort of design philosophy.

Ivan.
--
To: Ivan Kokshaysky <ink@...>
Cc: Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Saturday, January 12, 2008 - 7:01 pm

On Sun, 13 Jan 2008 00:49:11 +0300


s/x86/pc/

and not even that.

Really this is a huge design mistake in your patch, the hard coding of conf1,
and for that reason I really don't think it should go in.

We have 4 or so methods on PC today to access config space, probably going to 6 in the next year
or two. One of those methods *HARD PICKING* another one as "second best" for cases where it
doesn't want to deal with is WRONG. It really needs to be up to the architecture/platform
to decide which ops vector is the fallback. And yes on your current PC that might well be conf1.
But hardcoding that is not the right thing. We have the vectors, we have the ranking code,
just make a "second rank" thing. 
Oh wait, my patch did that ;)
Then let either the mmconfig code or the wrapper above it (doesn't matter, in fact, I can see
value of making this decision in the wrapper and keep mmconfig code simple and clean,
because maybe mmconfig IS the thing that the architecture says needs to deal with the lower 256 bytes)..

Oh wait my patch also did that pretty much ;)

The rest of my patch was defaulting to off. Is it that bit that you really hate?


--
To: Arjan van de Ven <arjan@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>
Date: Saturday, January 12, 2008 - 8:12 pm

Arjan,

I have not seen your MMCONFIG patch.

Would you mind sending me a copy?

Thanks.

Tony

--
To: <tcamuso@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>
Date: Saturday, January 12, 2008 - 8:40 pm

On Sat, 12 Jan 2008 19:12:23 -0500

sure


----


On PCs, PCI extended configuration space (4Kb) is riddled with problems
associated with the memory mapped access method (MMCONFIG). At the same
time, there are very few machines that actually need or use this
extended configuration space.

At this point in time, the only sensible action is to make access to the
extended configuration space an opt-in operation for those device
drivers that need/want access to this space, as well as for those
userland diagnostics utilities that (on admin request) want to access
this space.

It's inevitable that this is done per device rather than per bus; we'll
be needing per device PCI quirks to turn this extended config space off
over time no matter what; in addition, it gives the least amount of
surprise: loading a driver for a device only impacts that one device,
not a whole bus worth of devices (although it'll be common to have one
physical device per bus on PCI-E).

The (desireable) side-effect of this patch is that all enumeration is
done using normal configuration cycles.

The patch below splits the lower level PCI config space operation (which
operate on a bus) in two: one that normally only operates on traditional
space, and one that gets used after the driver has opted in to using the
extended configuration space. This has lead to a little code
duplication, but it's not all that bad (most of it is prototypes in
headers and such).

Architectures that have a solid reliable way to get to extended
configuration space can just keep doing what they do now and allow
extended space access from the "traditional" bus ops, and just not fill
in the new bus ops.  (This could include x86 for, say, BIOS year 2009
and later, but doesn't right now)

This patch also adds a sysfs property for each device into which root
can write a '1' to enable extended configuration space. The kernel will
print a notice into dmesg when this happens (including the name of the
app) so that if the system crashes a...
To: Arjan van de Ven <arjan@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Saturday, January 12, 2008 - 9:36 pm

Thanks, Arjan.

The problem we have been experiencing has to do with Northbridges,
not with devices.

As far as the device is concerned, after the Northbridge translates
the config access into PCI bus cycles, the device has no idea what
mechanism drove the Northbridge to the translation.

That is to say, the device does not know whether the config cycle
on the bus was caused by an MMCONFIG cycle or a legacy Port IO
cycle delivered to the Northbridge.

In systems that had Northbridges that did not respond correctly to
MMCONFIG cycles, like the AMD 8132, we (HP &amp; RH) were blacklisting
whole platforms to limit them to Port IO PCI config.

However, when platforms emerged using both legacy PCI and PCI express,
the platforms that were limited to Port IO config cycles were not
express compliant, since the express spec requires the platform to
be able to address the full 4096 byte region of config space to
be considered express-compliant.

The patch I devised concerned itself with Northbridges and separated
MMCONFIG-compliant buses from those that could not handle MMCONFIG.

Therefore, the express bus in the platform could happily employ
MMCONFIG to access the entire 4K region, while the legacy bus
with the non-compliant Northbridge could be restricted to Port IO
config.

However, even with my patch, the problem remained where devices
requiring large displacements could overlap the BIOS-mapped
MMCONFIG region. In such a situation, where the bus has passed
the MMCONFIG test, the MMCONFIG region can get doubly mapped by
bus-sizing code, causing the system to hang.

The remedy proposed by Loic and implemented by Ivan is actually
quite elegant, in that it addresses all these problems quite
effectively while eliminating a ration of specialized and somewhat
obscure code.

In my humble opinion, Port IO config access is here to stay, having
been defined as an architected mechanism in the PCI 2.1 spec.

This is most especially true for x86.

In other words, for x86, I don't think...
To: <tcamuso@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Sunday, January 13, 2008 - 12:42 am

On Sat, 12 Jan 2008 20:36:59 -0500

correct for now.
HOWEVER, and this is the point Linus has made several times:
Just about NOBODY has devices that need the extended config space. At all.
So making this opt-in for devices allows our users to boot and use
their system if they are in the majority that has no need for even getting

Wanne bet there'll be devices that screw this up? THere's devices that even screwed

THis kind of patchup has been going on for the better part of a year (well 2 years)
by now and it's STILL NOT ENOUGH, as you can see by the more patchups that have

You're wrong there. Sad to say, but you're wrong there.

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Arjan van de Ven <arjan@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Sunday, January 13, 2008 - 8:43 am

The PCI express spec requires the platform to provide access to this space
for express-compliance. More devices will be using this space as express

There may have been devices that incorrectly applied the PCI spec to
various fields in the header, I'll grant you that.

However, there is no way a device can determine electrically whether the
Northbridge received Port IO or MMCONFIG cycles. This is between the CPU

Which is why Loic's proposal and Ivan's implementation of it is so elegant.
It solves all these problems in one sweep, and eliminates the code rendered

The PCI spec provides for conf1 as an architected solution. It's not
going away, and especially not in x86 land where Port IO is built-in
to the CPU.



--
To: <tcamuso@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Sunday, January 13, 2008 - 1:03 pm

On Sun, 13 Jan 2008 07:43:11 -0500

PLATFORM not OS :)
Windows isn't using it in the server space, and only in the client space it recently started


again sadly you're wrong. 

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Arjan van de Ven <arjan@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Sunday, January 13, 2008 - 5:28 pm

As someone gently pointed out to me, you are in a position to know this,
so I probably am wrong.

--
To: <tcamuso@...>
Cc: Arjan van de Ven <arjan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Sunday, January 13, 2008 - 8:54 pm

On Sun, 13 Jan 2008 16:28:08 -0500

I suspect Arjan is wrong. It might be some Intel agenda but I still see
fairly new driver reference code that is hardcoding port accesses even
when designed for Redmond products.

Alan
--
To: Alan Cox <alan@...>
Cc: <tcamuso@...>, Arjan van de Ven <arjan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Monday, January 14, 2008 - 1:20 am

Agreed. I suspect that the likelihood of conf1 accesses going away in the 
next five years is slim to none.

			Linus
--
To: Alan Cox <alan@...>
Cc: <tcamuso@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Sunday, January 13, 2008 - 9:33 pm

On Mon, 14 Jan 2008 00:54:34 +0000


I find it hard to believe that even they have their drivers do PCI config access via ports directly from the drivers,
and especially in driver reference code...


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Arjan van de Ven <arjan@...>
Cc: Alan Cox <alan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Sunday, January 13, 2008 - 11:29 pm

To all ...

Well, here is what I perceive we've got so far.

. Some PCI Northbridges do not work with MMCONFIG.

. Some PCI BARs can overlap the MMCONFIG area during bus sizing.
   It is hoped that new BIOSes will locate MMCONFIG in an area
   safely out of the way of bus sizing code, but there can be
   no guarantees.

. conf1 is going away in newer x86 implementations in the not
   too distant future.

. The PCI express spec requires platforms to provide access to
   the extended config area, and there are express devices today
   using that area for AER.

. There is no need to provide different PCI config access
   mechanisms at device granularity, since the PCI config access
   mechanism between the CPU and the Northbridge is opaque to
   the devices. PCI config mechanisms only need to differ at
   the Northbridge level.

. We have a flurry of patches all claiming to solve all or some
   of these problems.


Arjan,

I realize it may not be possible for you to answer this question,
but I feel compelled to ask it anyway. Is it possible that future
x86 architectures will be implementing a SAL-like interface to
abstract PCI config access altogether?

Or can we condense these patches down to a set that does the
following?

. If the system is capable of conf1, then PCI config access
   at offsets &lt; 256 should be confined to conf1. This solution
   is most effective for existing and legacy systems.

. If the system does not support MMCONFIG, of if MMCONFIG is
   not working, then accesses to offsets &gt; 256 return -1 and an
   error status.

. For systems, where the conf1 mechanism is NOT available,
   then MMCONFIG should be the PCI access mechanism for all
   offsets. For such systems, we must assume that the BIOS has
   become smart enough to locate MMCONFIG in a region safe from
   encroachment by bus sizing code.



--
To: <tcamuso@...>
Cc: Alan Cox <alan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Monday, January 14, 2008 - 1:05 am

On Sun, 13 Jan 2008 22:29:23 -0500


not "conf1" but "what the platform thinks is the best method for &lt; 256".

We have this nice abstraction for the platform to select the best method... we should use it.

And still, it's another attempt to get this fixed (well.. it's been 2 years in the coming so far, maybe this will
be the last one, maybe it will not be... we'll see I suppose, but it sucks to be a user who doesn't 
need any of the functionality that the extended config space provides in theory but gets to suffer more of the issues)

I'm all in favor of making this more reliable, but really..
we've thought it was fixed time and time again over the last two years. Please consider
limiting the scope of the damage as well.




-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Arjan van de Ven <arjan@...>
Cc: Alan Cox <alan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Monday, January 14, 2008 - 9:01 am

I don't understand. If we're going to differentiate MMCONFIG from some other
access mechanism, it only needs to be done at the Northbridge level. Devices
are electrically ignorant of the protocol used between CPU and Northbridge
Agreed.

So we have Loic and Ivan's patch limiting MMCONFIG accesses to
offsets &gt;= 256.

And we have Matthew's patch that abstracts the method for config
accesses to offsets &lt; 256.

I beleive Matthew has already tested these patches for functionality
on x86. All that's needed is to test for regressions on other arches.

Is there any interest in providing the following?

1. The ability to use MMCONFIG for all accesses on systems that have
    no problems with MMCONFIG.

2. For systems using both PCI and PCI express, testing each bus
    for MMCONFIG compliance, to determine whether MMCONFIG can be
    used for all config accesses or whether the bus must be limited
    all to the method abstracted for offsets &lt; 256.

Or does that introduce unnecessary complications?


--
To: <tcamuso@...>
Cc: Alan Cox <alan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Monday, January 14, 2008 - 10:46 am

On Mon, 14 Jan 2008 08:01:01 -0500


Again this is about having systems that don't need extended config space not use it. At all.
The only way to do that is have the drivers say they need it, and not use it otherwise.
It has NOTHING to do with how things are wired up. It's pure a kernel level policy decision
about whether to use extended config space AT ALL.



-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Arjan van de Ven <arjan@...>
Cc: Alan Cox <alan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Monday, January 14, 2008 - 11:23 am

The problem with compelling device drivers to determine the PCI
config mechanism is that it must be forced upon arches that
have no PCI configuration quirks or don't even use the same
PCI config mechanisms as x86.

I don't think that's a good policy.

Better to confine arch-specific quirks to the arch-specific code
whenever possible.

--
To: <tcamuso@...>
Cc: Alan Cox <alan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Monday, January 14, 2008 - 12:01 pm

On Mon, 14 Jan 2008 10:23:14 -0500



-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Arjan van de Ven <arjan@...>
Cc: Alan Cox <alan@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Monday, January 14, 2008 - 12:08 pm

Arjan, you would be foisting this call on device drivers running on
arches that don't need any such distinction between extended config
space and &lt; 256 bytes.

I still think it's a bad policy.

Let's endeavor to confine arch-specific quirks to the arch-specific
code.

--
To: Arjan van de Ven <arjan@...>
Cc: <tcamuso@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Monday, January 14, 2008 - 5:11 am

Microsoft may not but the standard of Taiwanese driver code (and by
reference I mean vendor reference not OS supplier reference) is not
always great. When you have weeks to write a driver for a product with a
6 month sales lifetime I guess there are other pressures on driver
authors.

Easy enough for Intel to analyse though.

Alan
--
To: Arjan van de Ven <arjan@...>
Cc: <tcamuso@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Sunday, January 13, 2008 - 12:47 am

I don't know if they 'screwed it up'.  There are devices that misbehave
when registers are read from pci config space.  But this was never
guaranteed to be a safe thing to do; it gradualy became clear that
people expected to be able to read random registers and manufacturers
responded accordingly, but I don't think you were ever guaranteed to be
able to peek at bits of config space arbitrarily.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To: Matthew Wilcox <matthew@...>
Cc: Arjan van de Ven <arjan@...>, <tcamuso@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Loic Prylli <loic@...>, Prarit Bhargava <prarit@...>, Chumbalkar, Nagananda <Nagananda.Chumbalkar@...>, Schoeller, Patrick (Linux - Houston, TX) <Patrick.Schoeller@...>, Bhavana Nagendra <bnagendr@...>
Date: Sunday, January 13, 2008 - 2:43 am

Quite correct...  Reading registers can have all sorts of side effects, 
for example clearing chip conditions.

	Jeff



--
To: Arjan van de Ven <arjan@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Saturday, January 12, 2008 - 2:17 pm

I think this will be OK.  We'll end up with three pci_ops, one for
mmconfig-only, one for mixed mmconfig-conf1 and one for conf1.  We could
do with that now actually -- the machines which will definitely go beserk
if you try to use mmconfig could have the conf1 ops on those busses.

Let's take Ivan's patch for now, and do that patch for 2.6.26.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To: Ivan Kokshaysky <ink@...>
Cc: Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Saturday, January 12, 2008 - 11:46 am

On Sat, 12 Jan 2008 17:40:30 +0300
Ivan Kokshaysky &lt;ink@jurassic.park.msu.ru&gt; wrote:

no it doesn't!
Other arches need no changes.
--
To: Arjan van de Ven <arjan@...>
Cc: Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Benjamin Herrenschmidt <benh@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Saturday, January 12, 2008 - 12:23 pm

Umm, true. I misread your patch.
But it doesn't change anything - that wasn't my main objection
anyway.

Ivan.
--
To: Ivan Kokshaysky <ink@...>
Cc: Greg KH <greg@...>, Matthew Wilcox <matthew@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Sunday, January 13, 2008 - 3:08 am

I agree, I quite dislike it too. Even If the breakage on x86 makes us
want to totally disable it there, it can be done within the existing PCI
ops I believe.

I think Arjan's problem is to try to do it per-device since the
"standard" PCI ops don't get a pci_dev structure (for obvious reasons).

But from what I read in this thread, this per-device enabling/disabling
doesn't seem very useful at all.

Cheers,
Ben.


--
To: Benjamin Herrenschmidt <benh@...>
Cc: Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, Arjan van de Ven <arjan@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Sunday, January 13, 2008 - 3:24 am

Here's a patch (on top of Ivan's) to improve things further.

One of Arjan's big problems with Ivan's patch is the hardcoding of conf1
as the fallback.  So I took an idea from Arjan's patch, crossed it
with an idea of my own and came up with this.  It gets rid of the
raw_pci_ops as a generic idea, and makes it private to the x86 arch.
It also makes the whole select-which-ops private to the x86 arch without
touching the pci layer at all.

Only compile-tested on x86-64.

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..ffaf02b 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
 #define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg)	\
 	(((u64) seg &lt;&lt; 28) | (bus &lt;&lt; 20) | (devfn &lt;&lt; 12) | (reg))
 
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
 	      int reg, int len, u32 *value)
 {
 	u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
 	return 0;
 }
 
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
 	       int reg, int len, u32 value)
 {
 	u64 addr;
@@ -91,24 +89,17 @@ pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
 	return 0;
 }
 
-static struct pci_raw_ops pci_sal_ops = {
-	.read =		pci_sal_read,
-	.write =	pci_sal_write
-};
-
-struct pci_raw_ops *raw_pci_ops = &amp;pci_sal_ops;
-
-static int
-pci_read (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
+							int size, u32 *value)
 {
-	return raw_pci_ops-&gt;read(pci_domain_nr(bus), bus-&gt;number,
+	return raw_pci_read(pci_domain_nr(bus), bus-&gt;number,
 				 devfn, where, size, value);
 }
 
-static int
-pci_write (struct pci_bus *bus, unsigned int devf...
To: Matthew Wilcox <matthew@...>
Cc: Benjamin Herrenschmidt <benh@...>, Ivan Kokshaysky <ink@...>, Greg KH <greg@...>, Linus Torvalds <torvalds@...>, Greg KH <gregkh@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <linux-pci@...>, Martin Mares <mj@...>, Tony Camuso <tcamuso@...>, Loic Prylli <loic@...>
Date: Sunday, January 13, 2008 - 1:01 pm

as a general thing I like where this patch is going

On Sun, 13 Jan 2008 00:24:15 -0700

would be nice the "reg &gt; 256 &amp;&amp; raw_pci_Ext_ops==NULL" case would just
call the raw_pci_ops-&gt; pointer, to give that a chance of refusal

	couldn't this (at least in some next patch) use the vector if it exists?


why set BOTH vectors? you probably ONLY want to set the ext one, so 
that calls to the lower 256 go to the original

--