Re: raw_pci_read in quirk_intel_irqbalance

From: Arjan van de Ven
Date: Tuesday, December 25, 2007 - 4:26 am

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

On PCs, PCI extended configuration space (4KB) is riddled with problems
associated with the memory mapped access method (MMCONFIG). At the same
time, there are very few machines that actually need or use this extended
configuration space. 
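Why conf1 cannot reach the extended space follows from how each mechanism forms an address. The sketch below is illustrative only, and its helper names are hypothetical rather than kernel functions: conf1 packs bus, devfn and a register offset into a 32-bit value written to I/O port 0xCF8, leaving only 8 bits for the offset, while MMCONFIG gives each function its own 4 KiB window inside a memory-mapped aperture.

```c
#include <stdint.h>

/* Same encoding as the kernel's PCI_DEVFN(): 5-bit slot, 3-bit function. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/*
 * conf1: this value is written to I/O port 0xCF8, then the data is
 * accessed through port 0xCFC. Only the low 8 bits carry the register
 * offset (dword-aligned), so offsets of 0x100 and above cannot be
 * expressed at all.
 */
static uint32_t conf1_address(unsigned int bus, unsigned int devfn,
                              unsigned int reg)
{
    return 0x80000000u | ((bus & 0xffu) << 16) | ((devfn & 0xffu) << 8) |
           (reg & 0xfcu);
}

/*
 * MMCONFIG: each function owns a 4 KiB window inside a memory-mapped
 * aperture, so a 12-bit offset covers the whole extended space. The
 * result is an offset from the aperture's base address.
 */
static uint64_t mmcfg_offset(unsigned int bus, unsigned int devfn,
                             unsigned int reg)
{
    return ((uint64_t)(bus & 0xffu) << 20) | ((uint64_t)(devfn & 0xffu) << 12) |
           (reg & 0xfffu);
}
```

Under standard conf1, `conf1_address(0, 0, 0x100)` is indistinguishable from offset 0, which is why anything past 256 bytes needs a different access method.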

At this point in time, the only sensible action is to make access to the
extended configuration space an opt-in operation for those device drivers
that need/want access to this space, as well as for those userland 
diagnostic utilities that (on admin request) want to access this space.

It's inevitable that this is done per device rather than per bus; we'll
be needing per-device PCI quirks to turn this extended config space off
over time no matter what; in addition, it gives the least amount of surprise:
loading a driver for a device only impacts that one device, not a whole bus
worth of devices (although it'll be common to have one physical device per
bus on PCI-E).

The (desirable) side-effect of this patch is that all enumeration is done
using normal configuration cycles.

The patch below splits the lower level PCI config space operations (which
operate on a bus) in two: one that normally only operates on traditional
space, and one that gets used after the driver has opted in to using the
extended configuration space. This has led to a little code duplication,
but it's not all that bad (most of it is prototypes in headers and such).

Architectures that have a solid reliable way to get to extended configuration
space can just keep doing what they do now and allow extended space access
from the "traditional" bus ops, and just not fill in the new bus ops.
(This could include x86 for, say, BIOSes dated 2009 and later, but doesn't
right now.)
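A rough model of the split described above, assuming hypothetical types (`cfg_ops`, `bus_cfg`, `dev_cfg`) rather than the kernel's `struct pci_ops`: a bus carries traditional ops plus optional extended ops, and only a device that has opted in is routed through the extended ones. Leaving `ext_ops` NULL models an architecture whose traditional ops already reach the full 4 KiB.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the split; these are not the kernel's types. */
struct cfg_ops {
    int (*read)(unsigned int bus, unsigned int devfn, int reg, int len, uint32_t *val);
};

struct bus_cfg {
    const struct cfg_ops *ops;     /* "traditional" ops, used for enumeration */
    const struct cfg_ops *ext_ops; /* NULL on arches whose ops already cover 4 KiB */
};

struct dev_cfg {
    const struct bus_cfg *bus;
    unsigned int busnr, devfn;
    bool ext_enabled;              /* set when a driver or root opts in */
};

static int dev_cfg_read(const struct dev_cfg *d, int reg, int len, uint32_t *val)
{
    const struct bus_cfg *b = d->bus;

    /* After the opt-in, every access for this one device uses the extended ops. */
    if (b->ext_ops && d->ext_enabled)
        return b->ext_ops->read(d->busnr, d->devfn, reg, len, val);

    /* Without an opt-in, refuse extended space when a separate path exists. */
    if (b->ext_ops && reg >= 256)
        return -EINVAL;

    return b->ops->read(d->busnr, d->devfn, reg, len, val);
}

/* Stub accessors for illustration: each reports which path served the read. */
static int stub_traditional(unsigned int bus, unsigned int devfn, int reg, int len, uint32_t *val)
{
    (void)bus; (void)devfn; (void)reg; (void)len;
    *val = 1;
    return 0;
}

static int stub_extended(unsigned int bus, unsigned int devfn, int reg, int len, uint32_t *val)
{
    (void)bus; (void)devfn; (void)reg; (void)len;
    *val = 2;
    return 0;
}

static const struct cfg_ops traditional_ops = { stub_traditional };
static const struct cfg_ops extended_ops = { stub_extended };
```

The per-device flag is what keeps the blast radius to a single device: other devices on the same bus keep using the traditional ops.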

This patch also adds a sysfs property for each device into which root can
write a '1' to enable extended configuration space. The kernel will print
a notice into dmesg ...
From: Greg KH
Date: Friday, January 11, 2008 - 12:02 pm

Can you send me a follow-on patch that documents this in
Documentation/ABI please.

thanks,

greg k-h
--

From: Arjan van de Ven
Date: Friday, January 11, 2008 - 12:09 pm

On Fri, 11 Jan 2008 11:02:29 -0800

once it's stable enough, say after 1 kernel release, sure

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Greg KH
Date: Friday, January 11, 2008 - 12:14 pm

That's what the Documentation/ABI/testing section is for.  If you add
something new, it needs to be documented now, otherwise it will be
forgotten.

thanks,

greg k-h
--

From: Arjan van de Ven
Date: Friday, January 11, 2008 - 12:54 pm

On Fri, 11 Jan 2008 11:02:29 -0800

---
 Documentation/ABI/testing/sysfs-pci-extended-config |   39 ++++++++++++++++++++
 1 file changed, 39 insertions(+)

Index: linux-2.6.24-rc7/Documentation/ABI/testing/sysfs-pci-extended-config
===================================================================
--- /dev/null
+++ linux-2.6.24-rc7/Documentation/ABI/testing/sysfs-pci-extended-config
@@ -0,0 +1,39 @@
+What:		/sys/devices/pci<bus>/<device>/extended_config_space
+Date:		January 11, 2008
+Contact:	Arjan van de Ven <arjan@linux.intel.com>
+Description:
+		This attribute is intended for use by system-diagnostic
+		software only.
+
+		The kernel may decide to restrict PCI configuration space
+		access for userspace to the first 64 or 256 bytes by
+		default, for stability reasons. This attribute, when
+		present, can be used to request access to the full
+		4KB from the kernel.
+
+		Access to the full 4KB can be requested by
+		writing a '1' into this attribute file. All other values
+		are reserved for future use and should not be used by
+		software at this point.
+
+		The kernel may log the request to the various kernel
+		logging services. The kernel may decide to ignore the
+		request if the kernel deems extended configuration space
+		access not reliable enough for the system or the device.
+		The kernel may decide to not present this attribute
+		if the kernel decides extended config space is reliable
+		and made available by default, or if the kernel decides
+		that extended configuration space will never be
+		accessible.
+
+		Software needs to gracefully handle the request being
+		denied. Software also needs to gracefully deal
+		with this attribute not being present.
+
+		Due to the fragility of extended configuration space,
+		system diagnostic software should only set this attribute
+		on explicit user request, or in the case of GUI-like tools,
+		at least with explicit user permission.
+
+
+

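A sketch of how a diagnostic tool might follow this ABI text, assuming the caller already holds the attribute's path. `request_ext_cfg` is a hypothetical helper, and the attribute itself comes from this proposed patch, so it may simply not exist on a given kernel. The three return values mirror the three outcomes the text asks software to handle gracefully.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum ext_cfg_result {
    EXT_CFG_REQUESTED,   /* write accepted; the kernel may still ignore it */
    EXT_CFG_NOT_PRESENT, /* no attribute: already available or never available */
    EXT_CFG_DENIED,      /* open or write refused; stay within legacy space */
};

/*
 * Hypothetical helper: write '1' (the only value the ABI defines) to a
 * device's extended_config_space attribute. Call it only on explicit
 * user request, as the ABI text asks.
 */
static enum ext_cfg_result request_ext_cfg(const char *attr_path)
{
    int fd = open(attr_path, O_WRONLY);
    if (fd < 0)
        return errno == ENOENT ? EXT_CFG_NOT_PRESENT : EXT_CFG_DENIED;

    ssize_t n = write(fd, "1", 1);
    close(fd);
    return n == 1 ? EXT_CFG_REQUESTED : EXT_CFG_DENIED;
}
```

Because a successful write does not guarantee access, a tool would still need to check how much of the device's config space it can actually read before relying on offsets past 256.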

From: Greg KH
Date: Friday, January 11, 2008 - 1:55 pm

Thanks, I've merged this with the original one.

greg k-h
--

From: Matthew Wilcox
Date: Friday, January 11, 2008 - 12:28 pm

Greg, if you integrate Ivan's patch, you don't need Arjan's patch.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Arjan van de Ven
Date: Friday, January 11, 2008 - 12:40 pm

On Fri, 11 Jan 2008 12:28:20 -0700

Personally I absolutely don't agree with that.
Ivan's patch is another attempt to make MMCONFIG work somewhat better,
but does not provide the explicit opt-in that I think is required at
this point; people have tried to get MMCONFIG stable for a really long time,
and have still failed as of today. At least my patience has run out, and
this needs to be opt-in.
--

From: Matthew Wilcox
Date: Friday, January 11, 2008 - 12:49 pm

So your argument is that MMCONFIG sucks, therefore Linux has to have a

He didn't?  I certainly ask for it to be included.

--

From: Matthew Wilcox
Date: Friday, January 11, 2008 - 1:17 pm

Ivan's patch doesn't start enabling MMCONFIG in more places than we
currently do.  It makes us use conf1 accesses for all accesses below

The armour plating that already exists -- pci=nommconf.

--

From: Linus Torvalds
Date: Friday, January 11, 2008 - 1:27 pm

.. and I agree with that patch. But there will be people who try to access 
extended space by mistake, and they'll have a hard-locked machine or 

No. It needs to be automatic, OR THE OTHER WAY AROUND.

Ie we disable the unsafe feature on purpose, and then force people who 
access it to do so *consciously*.

Extended config space is different, for chrissake! It's not even like it's
just a bigger normal config space where normal config accesses just 
overflow into it. It really does have different rules etc.

			Linus
--

From: Matthew Wilcox
Date: Friday, January 11, 2008 - 1:42 pm

But they can't.  We limit the size they can access to 256 bytes, unless

I'd be fine with making mmconfig off by default.  Make people pass

Yes, but it's also important to enable some of the PCIe features.

--

From: Linus Torvalds
Date: Friday, January 11, 2008 - 2:12 pm

Umm. Probing address 256 (or *any* address) using MMCONFIG will simply 
lock up the machine. HARD.

What's so hard to understand about MMCONFIG being broken on certain 
hardware?

		Linus
--

From: Matthew Wilcox
Date: Friday, January 11, 2008 - 2:17 pm

Did I miss a bug report?  The only problems I'm currently aware of are
the ones where using MMCONFIG during BAR probing causes a hard lockup on
some Intel machines, and the ones where we get bad config data on some
AMD machines due to the configuration retry status being mishandled.

All the other lockups I'm aware of are already handled by the existing
checks.

--

From: Linus Torvalds
Date: Friday, January 11, 2008 - 2:28 pm

Hmm. Were all those reports root-caused to just that BAR probing? If so, 
we may be in better shape than I worried.

		Linus
--

From: Matthew Wilcox
Date: Friday, January 11, 2008 - 2:38 pm

I believe so.

--

From: Ivan Kokshaysky
Date: Friday, January 11, 2008 - 4:58 pm

Ditto.

One typical problem is that on "Intel(r) 3 Series Express Chipset Family"
MMCONFIG probing of the BAR #2 (frame buffer address) of integrated graphics
device locks up the machine (depending on BIOS settings, of course).
This happens because the frame buffer of IGD has higher decode priority
than MMCONFIG range, as stated in Intel docs...
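One way to picture how BAR probing collides with MMCONFIG: to size a 32-bit memory BAR, software writes all ones to it, and while memory decode is still enabled the BAR then claims the range [4 GiB - size, 4 GiB). The sketch below checks whether that temporary window lands on the MMCONFIG aperture; the aperture base of 0xE0000000 in the usage is a hypothetical value chosen for illustration, and the helper names are not kernel functions. If the windows overlap and the frame buffer wins the decode, the read-back and the write that restores the BAR never reach config space.

```c
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GIB 0x100000000ULL

/*
 * Writing all ones to a 32-bit memory BAR of power-of-two `size` sets
 * every writable address bit, so while memory decode stays enabled the
 * BAR claims [4 GiB - size, 4 GiB) until the old value is restored.
 */
static uint64_t sizing_window_base(uint64_t size)
{
    return FOUR_GIB - size;
}

/* True if that temporary window overlaps [mmcfg_base, mmcfg_base + mmcfg_len). */
static bool sizing_hits_mmcfg(uint64_t bar_size, uint64_t mmcfg_base,
                              uint64_t mmcfg_len)
{
    uint64_t lo = sizing_window_base(bar_size);
    return lo < mmcfg_base + mmcfg_len && mmcfg_base < FOUR_GIB;
}
```

With an illustrative 256 MiB aperture at 0xE0000000, a 512 MiB frame buffer BAR overlaps it during sizing while a 256 MiB one does not; real placement depends on the chipset and BIOS settings.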

Ivan.
--

From: Jesse Barnes
Date: Friday, January 11, 2008 - 5:17 pm

Yeah, I'm only aware of 3:
  - the BAR overlapping w/MMCONFIG problem described above
  - ATI chipset config space retry bug
  - VIA (?) chipset host bridges don't respond well to having decode
    disabled (they stop decoding RAM addresses as well)

That's it afaik, so I've never really known where Linus' paranoia comes 
from.  OTOH I haven't been too keen to challenge it either; MMCONFIG 
space is only just beginning to be tested widely with the deployment of 
Vista, so we'll doubtless see more problems on older chipsets if we 
enable it by default.

Jesse
--

From: Greg KH
Date: Friday, January 11, 2008 - 5:26 pm

Ok, so what would the proposed patch look like to help resolve this?

Ivan, you posted one a while ago, but never seemed to get any
confirmation if it helped or not.  Should I use that and drop Arjan's?
Or use both?  Or something else like the patches proposed by Tony
Camuso?

thanks,

greg k-h
--

From: Ivan Kokshaysky
Date: Saturday, January 12, 2008 - 7:40 am

Actually I'm strongly against Arjan's patch. First, it's based on the
assumption that the MMCONFIG thing is sort of fundamentally broken
on some systems, but none of the facts we have so far confirm that.
And second, I really don't like the implementation as it breaks all
non-x86 arches (or forces them to add a set of totally meaningless

Tony's patch is a variation of the same idea, so this patch
supersedes it. The only argument for using conf1 to access only the
first 64 bytes of the config space was some concerns about performance.
But the only driver that extensively uses config space at runtime
is tg3, and only as a workaround for some broken revisions of the chip.
And even in that case I seriously doubt that mmconf vs. conf1 would
make any measurable difference.
On the other hand, always using conf1 for the whole 256-byte legacy
config space allows us to drop all sorts of black lists, which is
a *huge* advantage.
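The routing rule argued for here reduces to one comparison on the register offset. A minimal sketch, with hypothetical stub accessors standing in for conf1 and mmconf. The `cfg_size` bound is folded in only to show how a per-device quirk that lowers `dev->cfg_size` to 256 still hides extended space; in the kernel, callers consult `dev->cfg_size` rather than the raw accessors.

```c
#include <errno.h>
#include <stdint.h>

static int last_path; /* 1 = conf1, 2 = mmconf: lets the stubs be observed */

/* Hypothetical stand-in for a conf1 (port 0xCF8/0xCFC) read. */
static int stub_conf1_read(unsigned int bus, unsigned int devfn,
                           int reg, int len, uint32_t *val)
{
    (void)bus; (void)devfn; (void)reg; (void)len;
    last_path = 1;
    *val = 0;
    return 0;
}

/* Hypothetical stand-in for a memory-mapped (mmconf) read. */
static int stub_mmconf_read(unsigned int bus, unsigned int devfn,
                            int reg, int len, uint32_t *val)
{
    (void)bus; (void)devfn; (void)reg; (void)len;
    last_path = 2;
    *val = 0;
    return 0;
}

/* conf1 for the legacy 256 bytes, mmconf only for offsets above that. */
static int routed_cfg_read(int cfg_size, unsigned int bus, unsigned int devfn,
                           int reg, int len, uint32_t *val)
{
    if (reg < 0 || len <= 0 || reg + len > cfg_size)
        return -EINVAL; /* outside what this device exposes */
    if (reg < 256)
        return stub_conf1_read(bus, devfn, reg, len, val);
    return stub_mmconf_read(bus, devfn, reg, len, val);
}
```

Since enumeration and BAR probing only touch the first 256 bytes, they never go through mmconf under this rule, which is what lets the blacklists go.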

Here is the same patch, but with an updated commit message -
proper attribution to Loic Prylli, which I somehow missed
the first time, sorry.

Ivan.

---
PCI x86: always use conf1 to access config space below 256 bytes

Thanks to Loic Prylli <loic@myri.com>, who originally proposed
this idea.

Always using legacy configuration mechanism for the legacy config space
and extended mechanism (mmconf) for the extended config space is
a simple and very logical approach. It's supposed to resolve all
known mmconf problems. It still allows per-device quirks (tweaking
dev->cfg_size). It also allows us to get rid of the mmconf fallback code.

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
---
 arch/x86/pci/mmconfig-shared.c |   35 -----------------------------------
 arch/x86/pci/mmconfig_32.c     |   22 +++++++++-------------
 arch/x86/pci/mmconfig_64.c     |   22 ++++++++++------------
 arch/x86/pci/pci.h             |    7 -------
 4 files changed, 19 insertions(+), 67 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c ...
From: Benjamin Herrenschmidt
Date: Sunday, January 13, 2008 - 12:08 am

I agree, I quite dislike it too. Even if the breakage on x86 makes us
want to totally disable it there, it can be done within the existing PCI
ops I believe.

I think Arjan's problem is that he tries to do it per-device, since the
"standard" PCI ops don't get a pci_dev structure (for obvious reasons).

But from what I read in this thread, this per-device enabling/disabling
doesn't seem very useful at all.

Cheers,
Ben.


--

From: Matthew Wilcox
Date: Sunday, January 13, 2008 - 12:24 am

Here's a patch (on top of Ivan's) to improve things further.

One of Arjan's big problems with Ivan's patch is the hardcoding of conf1
as the fallback.  So I took an idea from Arjan's patch, crossed it
with an idea of my own and came up with this.  It gets rid of the
raw_pci_ops as a generic idea, and makes it private to the x86 arch.
It also makes the whole select-which-ops private to the x86 arch without
touching the pci layer at all.

Only compile-tested on x86-64.

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..ffaf02b 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
 #define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg)	\
 	(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
 
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
 	      int reg, int len, u32 *value)
 {
 	u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
 	return 0;
 }
 
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
 	       int reg, int len, u32 value)
 {
 	u64 addr;
@@ -91,24 +89,17 @@ pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
 	return 0;
 }
 
-static struct pci_raw_ops pci_sal_ops = {
-	.read =		pci_sal_read,
-	.write =	pci_sal_write
-};
-
-struct pci_raw_ops *raw_pci_ops = &pci_sal_ops;
-
-static int
-pci_read (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
+							int size, u32 *value)
 {
-	return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+	return raw_pci_read(pci_domain_nr(bus), bus->number,
 				 devfn, where, size, value);
 }
 
-static int
-pci_write (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 ...
From: Matthew Wilcox
Date: Sunday, January 13, 2008 - 12:58 am

Oops.  I forgot to check the ordering of mmconfig vs direct probing, so
that patch would end up just using mmconfig for everything.  Not what we
want.  Also, there are three places in mmconfig-shared that probe using
conf1, even though conf1 might have failed.  And if we're going to use
raw_pci_read() when conf1 might have failed and mmconf isn't set up yet,
we need to check raw_pci_ops in raw_pci_read().  Add the check in
raw_pci_write too, just for symmetry.
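The effect of the added `raw_pci_ops` check can be modelled with cut-down stand-ins, where `std_ops` and `ext_ops` play the roles of `raw_pci_ops` and `raw_pci_ext_ops`. Without the check, a read below 256 issued before conf1 has installed its ops would dereference NULL; with it, the read falls through to the extended ops or fails. The interdiff is truncated before the function's final return, so the `-EINVAL` when neither table is set is an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Cut-down model of struct pci_raw_ops: only the read hook. */
struct raw_ops {
    int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
                int reg, int len, uint32_t *val);
};

/* Both start out NULL and are filled in as boot-time probing succeeds. */
static const struct raw_ops *std_ops; /* stands in for raw_pci_ops */
static const struct raw_ops *ext_ops; /* stands in for raw_pci_ext_ops */

static int model_raw_pci_read(unsigned int domain, unsigned int bus,
                              unsigned int devfn, int reg, int len, uint32_t *val)
{
    if (reg < 256 && std_ops)
        return std_ops->read(domain, bus, devfn, reg, len, val);
    if (ext_ops)
        return ext_ops->read(domain, bus, devfn, reg, len, val);
    return -EINVAL; /* assumed: neither access method is available yet */
}

/* Stub mmconfig accessor that returns a recognisable value. */
static int stub_mmcfg_read(unsigned int domain, unsigned int bus, unsigned int devfn,
                           int reg, int len, uint32_t *val)
{
    (void)domain; (void)bus; (void)devfn; (void)reg; (void)len;
    *val = 0xabcd;
    return 0;
}

static const struct raw_ops mmcfg_stub = { stub_mmcfg_read };
```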

I don't like it that mmconfig_32 prints a message and mmconfig_64
doesn't, but fixing that is not part of this patch.

Interdiff:

diff -u b/arch/x86/pci/common.c b/arch/x86/pci/common.c
--- b/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -31,7 +31,7 @@
 int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
 						int reg, int len, u32 *val)
 {
-	if (reg < 256)
+	if (reg < 256 && raw_pci_ops)
 		return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
 	if (raw_pci_ext_ops)
 		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
@@ -41,7 +41,7 @@
 int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
 						int reg, int len, u32 val)
 {
-	if (reg < 256)
+	if (reg < 256 && raw_pci_ops)
 		return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
 	if (raw_pci_ext_ops)
 		return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
diff -u b/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
--- b/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -28,7 +28,7 @@
 static const char __init *pci_mmcfg_e7520(void)
 {
 	u32 win;
-	pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
+	raw_pci_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
 
 	win = win & 0xf000;
 	if(win == 0x0000 || win == 0xf000)
@@ -53,7 +53,7 @@
 
 	pci_mmcfg_config_num = 1;
 
-	pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
+	raw_pci_read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
 
 	/* Enable bit */
 	if ...
From: Arjan van de Ven
Date: Sunday, January 13, 2008 - 10:01 am

as a general thing I like where this patch is going

On Sun, 13 Jan 2008 00:24:15 -0700

it would be nice if the "reg >= 256 && raw_pci_ext_ops == NULL" case just
called the raw_pci_ops-> pointer, to give that a chance of refusal

	couldn't this (at least in a later patch) use the vector if it exists?


why set BOTH vectors? you probably ONLY want to set the ext one, so 
that calls to the lower 256 go to the original

--

From: Matthew Wilcox
Date: Monday, January 14, 2008 - 3:52 pm

We don't have a situation where that can happen -- all the other current
config methods on x86 are limited to <256 bytes.  If we get another

I thought so, but due to the way that things are initialised, mmconfig
happens before conf1.  conf1 is known to be usable, but hasn't set
raw_pci_ops at this point.  Confusing, and not ideal, but fixing this

I had misunderstood how the x86 pci init happened -- I thought conf1
would override this.  It doesn't.

The following patch has been tested on ia64, x86 and x86_64.
It successfully avoids the hang on my G33 machine (i.e. the BAR probing
problem), when applied *after* Ivan's patch.

Greg, please apply Ivan's patch and then this one.

---

PCI: Rationalise raw_pci_ops

Replace raw_pci_ops with raw_pci_read() and raw_pci_write().  This is
a better interface for ACPI, ia64 and now x86.

Make pci_raw_ops private to the x86 arch, and use it to implement
raw_pci_read/write.  Add a raw_pci_ext_ops for extended config space.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..8fd7e82 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
 #define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg)	\
 	(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
 
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
 	      int reg, int len, u32 *value)
 {
 	u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
 	return 0;
 }
 
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
 	       int reg, int len, u32 value)
 {
 	u64 addr;
@@ -91,24 +89,17 @@ pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
 	return 0;
 }
 
-static struct pci_raw_ops pci_sal_ops = ...
From: Adrian Bunk
Date: Monday, January 14, 2008 - 4:04 pm

*ahem*

I don't think anything of what was discussed in this thread would be in 
scope for 2.6.24 (unless Linus wants to let the bunny that brings eggs 
release 2.6.24).

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Loic Prylli
Date: Tuesday, January 15, 2008 - 9:00 am

Why not put a simple fix for the last known remaining mmconfig problems
into 2.6.24?  There have mostly been three bugs related to mmconfig:
- BIOS/hardware: exaggerated MCFG claims: solved long ago
- hardware: buggy CRS+mmconfig chipset: fix included last month
- Linux code: mmconfig incompatible with live BAR-probing: *not fixed*

It would be ironic to not fix the only one that is really confined to 
the Linux code.

Everybody more or less agrees that *any* of the patches submitted so far
solves the known problems and will not cause regressions. The only long
discussion is about how best to prevent the effect of an "imaginary"
fourth bug, and by nature that's a controversial topic.

For 2.6.24, if nothing more than a few lines can be done, either make
pci=nommconf the default and add a pci=mmconf option, and/or apply the
easiest patch to review, i.e. Tony's, which is so small I copy it again
below (using 0x40 or 0x100 for the comparison does not really matter;
personally I would change it to 0x100 to be like Ivan's patch, but
either is much better than nothing). Replacing some mmconfig accesses
with conf1 cannot cause any regression.


Loic


P.S.: with that patch, conf1-less x86 systems requiring mmconfig would
not be supported. But they are like UFOs. There are plenty of them in the
galaxy, but earth sightings are not convincing enough for 2.6.24
support; they can wait for 2.6.25.


diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 1bf5816..4474979 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -73,7 +73,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
     }
 
     base = get_base_addr(seg, bus, devfn);
-    if (!base)
+    if ((!base) || (reg < 0x40))
         return pci_conf1_read(seg,bus,devfn,reg,len,value);
 
     spin_lock_irqsave(&pci_config_lock, flags);
@@ -106,7 +106,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
         return -EINVAL;
 
 ...
From: Greg KH
Date: Tuesday, January 15, 2008 - 10:46 am

Heh, no, because it is _way_ too late for such a patch that hasn't been
tested in any trees, sorry.

2.6.25 is the earliest I'll take such a fix, and if it's really as
simple as you say, I'll consider it for the -stable releases for .24 if
needed.

But so far, we have a zillion patches floating around, claiming
different things, some with signed-off-bys and others without, so for
now, I'll just stick with Arjan's patch in -mm and see if anyone
complains about those releases...

thanks,

greg k-h
--

From: Matthew Wilcox
Date: Tuesday, January 15, 2008 - 10:56 am

I complain about Arjan's patch.  For reasons which have been adequately
gone into already in this thread.

--

From: Tony Camuso
Date: Tuesday, January 15, 2008 - 12:27 pm

I agree with Matthew.

My preference is Ivan's patch using Loic's proposal.

My patch would have tested MMCONFIG before using it, but it didn't
fix the problem where the decode of large displacement devices can
overlap the MMCONFIG region.

Ivan's patch fixes that, and the problem of Northbridges that don't
respond to MMCONFIG, and as a bonus it cleans out some code rendered
unnecessary by his patch.

Linus is confident that conf1 is not going away for at least the
next five years.



--

From: Linus Torvalds
Date: Tuesday, January 15, 2008 - 12:38 pm

Not on PCs. Small birds tell me that there can be all these non-PC x86
subarchitectures that may or may not have conf1.

		Linus
--

From: Matthew Wilcox
Date: Tuesday, January 15, 2008 - 12:40 pm

Right -- hence my patch on top of Ivan's which removes all the assumptions
about conf1 from mmconfig (there are still *references* to conf1 in the
mmconfig code, but they'll only be used if conf1 is functional).

--

From: Loic Prylli
Date: Tuesday, January 15, 2008 - 3:12 pm

But is there an ACPI-compliant system or architecture that only offers
mmconfig for configuration-space access and no other fallback method
(i.e. no conf1, no BIOS, ...)?

2.6.24 supports mmconfig for:
 - ACPI-system with  MCFG
 - a couple of chipsets discovered by conf1


If a system has no conf1, but does not have e820+ACPI+MCFG, or does have 
some other method than mmconfig, it was already irrelevant in the 
discussion of Ivan's initial patch in December (because that system was
either never supported or not affected, and we were trying to fix bugs,
not introduce support for a new class of systems).


Maybe Arjan could share his knowledge, and tell us what system he was 
thinking about (and whether it needed to be supported by 2.6.24) when 
saying:
  "When (and I'm saying "when" not "if") systems arrive that only have 
MMCONFIG for some of the devices."


Anyway Ivan's patch + Matthew's extensions are handling that non-PC 
arch. That combination is advocated by at least:
Ivan Kokshaysky
Matthew Wilcox
Tony Camuso
Loic Prylli
even Arjan said that while he prefers his patch (saying it's more
conservative), he does not see an existing problem with the Ivan/Matthew
combination.

[ simpler, less ambitious fixes can be forgotten if nothing can be done 
for 2.6.24, I can understand that choice ]


The problems I see with Arjan's patch are:
- no word on whether the existing Linux drivers/pci/pcie/aer code should
be converted to opt-in?
- mmconfig still needs to be revisited to sort-out the mix of 
mmconfig+conf1+third-method access.
- you cannot test if ext-conf-space is available without taking risks: 
when pci_enable_ext_config() is called, even legacy-conf-space is 
switched to the new method.  So some administrator action (lspci -v 
+maybe-other-flag) or some driver action (that can optionally use 
ext-conf-space but does not *rely* on it) could cause some devices to 
totally disappear (if some pci hierarchy is handled by mmconfig as a 
0xffffffff ...
From: Grant Grundler
Date: Saturday, January 19, 2008 - 9:58 am

Agreed.
Greg, I think at least two better alternatives were proposed already.
Please review the thread again.

grant
--

From: Tony Camuso
Date: Monday, January 28, 2008 - 11:32 am

Greg,

Have you given Grant's suggestion any further consideration?

I'd like to know how the MMCONFIG issues discussed in this thread are going
to be handled upstream. I have a patch implemented in RHEL 5.2, but I would
rather have the upstream patch implemented, whatever it is.



--

From: Greg KH
Date: Monday, January 28, 2008 - 1:44 pm

Well, everyone still doesn't seem to agree on the proper way forward
here, so for me to just "pick one" isn't very appropriate.

So, can we try again?

Can people submit what they think the change should be?  Right now I
have Arjan's patch in my kernel tree, but will not send it to Linus for
.25 for now, unless everyone thinks that is the best solution at the
moment (which, for me, I'm leaning toward right now...)

thanks,

greg "can't we all just get along?" k-h
--

From: Matthew Wilcox
Date: Monday, January 28, 2008 - 3:31 pm

My opinion is that Ivan's patch followed by my patch is the best way
forward.  I see Arjan's patch as a good prototype, but it introduces a lot
of unnecessary infrastructure (and a userspace interface that I dislike).

I would like to see Ivan's patch merged ASAP as it does fix one of
my machines.  akpm has the patch from me to disable io decoding, and
intends to send it to Linus during this merge window ... that patch
becomes unnecessary if we merge Ivan's patch.

My patch is an incremental improvement that adds some of the features
of Arjan's patch without the extra infrastructure.  I don't think it's
urgent, but it does make some of our internal interfaces cleaner.

--

From: Greg KH
Date: Monday, January 28, 2008 - 3:53 pm

Please send me patches, in a form that can be merged, along with a
proper changelog entry, in the order in which you wish them to be
applied, so I know exactly what changes you are referring to.

thanks,

greg k-h
--

From: Matthew Wilcox
Date: Monday, January 28, 2008 - 7:56 pm

I'll send each patch as a reply to this email.

--

From: Matthew Wilcox
Date: Monday, January 28, 2008 - 7:57 pm

PCI x86: always use conf1 to access config space below 256 bytes

Thanks to Loic Prylli <loic@myri.com>, who originally proposed
this idea.

Always using legacy configuration mechanism for the legacy config space
and extended mechanism (mmconf) for the extended config space is
a simple and very logical approach. It's supposed to resolve all
known mmconf problems. It still allows per-device quirks (tweaking
dev->cfg_size). It also allows us to get rid of the mmconf fallback code.

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 arch/x86/pci/mmconfig-shared.c |   35 -----------------------------------
 arch/x86/pci/mmconfig_32.c     |   22 +++++++++-------------
 arch/x86/pci/mmconfig_64.c     |   22 ++++++++++------------
 arch/x86/pci/pci.h             |    7 -------
 4 files changed, 19 insertions(+), 67 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 4df637e..6b521d3 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -22,42 +22,9 @@
 #define MMCONFIG_APER_MIN	(2 * 1024*1024)
 #define MMCONFIG_APER_MAX	(256 * 1024*1024)
 
-DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS);
-
 /* Indicate if the mmcfg resources have been placed into the resource table. */
 static int __initdata pci_mmcfg_resources_inserted;
 
-/* K8 systems have some devices (typically in the builtin northbridge)
-   that are only accessible using type1
-   Normally this can be expressed in the MCFG by not listing them
-   and assigning suitable _SEGs, but this isn't implemented in some BIOS.
-   Instead try to discover all devices on bus 0 that are unreachable using MM
-   and fallback for them. */
-static void __init unreachable_devices(void)
-{
-	int i, bus;
-	/* Use the max bus number from ACPI here? */
-	for (bus = 0; bus < PCI_MMCFG_MAX_CHECK_BUS; bus++) {
-		for (i = 0; i < 32; i++) {
-			unsigned int devfn = PCI_DEVFN(i, ...
From: Greg KH
Date: Tuesday, January 29, 2008 - 6:21 am

Hm, who wrote this, Ivan?

If so, Matthew, please do not strip off authorship of patches, and place
a "From:" line on the first line above the description, so it is not
lost.

thanks,

greg k-h
--

From: Matthew Wilcox
Date: Tuesday, January 29, 2008 - 4:43 pm

Sorry, I didn't know that was the convention.  I thought the first
Signed-off-by: was assumed to be the author.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Linus Torvalds
Date: Tuesday, January 29, 2008 - 5:04 pm

There's certainly a strong correlation between "first sign-off" and 
authorship, but signing off doesn't guarantee it, and while it's not the 
bulk of patches, it certainly happens that people sign off on patches made 
by others (either because the company has specific people who have the 
right to sign off on things, or simply because the code comes from some 
source that did GPL it, but perhaps didn't sign off on it - hopefully 
rare, but certainly not impossible or unheard of especially for 
one-liners that got picked up from mailing lists etc)

		Linus
--

From: Matthew Wilcox
Date: Monday, January 28, 2008 - 8:03 pm

We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86.  Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().
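A minimal userspace model of that arrangement might look like the following. It assumes two private ops tables, one for the first 256 bytes and one for the rest. The names (struct raw_ops, model_raw_pci_read(), the fake config array) are hypothetical, and the routing rule is an assumption based on the discussion in this thread, not a copy of the x86 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the private x86 ops table. */
struct raw_ops {
	int (*read)(unsigned int seg, unsigned int bus, unsigned int devfn,
		    int reg, int len, uint32_t *val);
};

/* A fake 4 KB config space so the model runs in userspace. */
static uint8_t fake_cfg[4096];
static int last_method;			/* 1 = conf1, 2 = MMCONFIG */

static int read_fake(int reg, int len, uint32_t *val)
{
	uint32_t v = 0;

	assert(reg >= 0 && reg + len <= 4096);
	for (int i = 0; i < len; i++)	/* config space is little-endian */
		v |= (uint32_t)fake_cfg[reg + i] << (8 * i);
	*val = v;
	return 0;
}

static int conf1_read(unsigned int seg, unsigned int bus, unsigned int devfn,
		      int reg, int len, uint32_t *val)
{
	(void)seg; (void)bus; (void)devfn;
	last_method = 1;
	return read_fake(reg, len, val);
}

static int mmcfg_read(unsigned int seg, unsigned int bus, unsigned int devfn,
		      int reg, int len, uint32_t *val)
{
	(void)seg; (void)bus; (void)devfn;
	last_method = 2;
	return read_fake(reg, len, val);
}

static const struct raw_ops std_ops = { conf1_read };
static const struct raw_ops ext_ops = { mmcfg_read };

/* Either pointer may be NULL if that mechanism is unusable. */
static const struct raw_ops *raw_std = &std_ops;
static const struct raw_ops *raw_ext = &ext_ops;

/* The single entry point generic code would see. */
static int model_raw_pci_read(unsigned int seg, unsigned int bus,
			      unsigned int devfn, int reg, int len,
			      uint32_t *val)
{
	if (reg < 0 || len < 1 || len > 4 || reg + len > 4096)
		return -EINVAL;
	if (reg < 256 && raw_std)
		return raw_std->read(seg, bus, devfn, reg, len, val);
	if (raw_ext)
		return raw_ext->read(seg, bus, devfn, reg, len, val);
	return -EINVAL;
}
```

Setting raw_std to NULL models a machine where conf1 does not work, so every access falls through to the extended ops; generic code never needs to know which one answered.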

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 arch/ia64/pci/pci.c               |   25 ++++++++-----------------
 arch/ia64/sn/pci/tioce_provider.c |   16 ++++++++--------
 arch/x86/kernel/quirks.c          |    2 +-
 arch/x86/pci/common.c             |   25 +++++++++++++++++++++++--
 arch/x86/pci/direct.c             |    4 ++--
 arch/x86/pci/fixup.c              |    6 ++++--
 arch/x86/pci/legacy.c             |    2 +-
 arch/x86/pci/mmconfig-shared.c    |    6 +++---
 arch/x86/pci/mmconfig_32.c        |   10 ++--------
 arch/x86/pci/mmconfig_64.c        |    8 +-------
 arch/x86/pci/pci.h                |   15 +++++++++++----
 arch/x86/pci/visws.c              |    3 ---
 drivers/acpi/osl.c                |   25 ++++++-------------------
 drivers/ata/Kconfig               |    3 +++
 drivers/ata/Makefile              |    3 +++
 include/linux/pci.h               |   16 ++++++++--------
 16 files changed, 84 insertions(+), 85 deletions(-)

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..8fd7e82 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
 #define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg)	\
 	(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
 
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
 	      int reg, int len, u32 *value)
 {
 	u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
 	return 0;
 }
 
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned ...
From: Yinghai Lu
Date: Sunday, February 3, 2008 - 12:30 am

related?

YH
--

From: Tony Camuso
Date: Thursday, February 7, 2008 - 8:54 am

Matthew,

Perhaps I missed it, but did you address Yinghai's concerns?


--

From: Arjan van de Ven
Date: Thursday, February 7, 2008 - 9:28 am

On Thu, 07 Feb 2008 10:54:05 -0500

nothing should use these directly. So static is the right answer ;)
--

From: Tony Camuso
Date: Thursday, February 7, 2008 - 9:36 am

Agreed. Thanks, Arjan.

Matthew,
What about the ATA_RAM addition to Kconfig? Was it accidental,
or intended? If intended, how is it related?
--

From: Grant Grundler
Date: Thursday, February 7, 2008 - 7:28 pm

AFAICT, it looks accidental. I can't see how it's related.
He should be back online next week and can answer for himself.

hth,
grant
--

From: Matthew Wilcox
Date: Saturday, February 9, 2008 - 5:41 am

No.  An unrelated patch that I didn't trim out.

--

From: Yinghai Lu
Date: Saturday, February 9, 2008 - 11:25 pm

looks good. it should get into -mm or x86/mm for some testing

YH
--

From: Greg KH
Date: Sunday, February 10, 2008 - 12:21 am

Can I get a revised version of this, without the incorrect hunk?

thanks,

greg k-h
--

From: Matthew Wilcox
Date: Sunday, February 10, 2008 - 7:51 am

Sure.  I've even rebased it against current HEAD.  Damn whitespace
cleanup introducing unnecessary conflicts ....

I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
This patch is just cleanup (and takes care of some future concerns).

From ad4c3f135cda6f5210735231d30ef8e9dbd58c7c Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <matthew@wil.cx>
Date: Sun, 10 Feb 2008 09:45:28 -0500
Subject: [PATCH] Change pci_raw_ops to pci_raw_read/write

We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86.  Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 arch/ia64/pci/pci.c               |   25 ++++++++-----------------
 arch/ia64/sn/pci/tioce_provider.c |   16 ++++++++--------
 arch/x86/kernel/quirks.c          |    2 +-
 arch/x86/pci/common.c             |   25 +++++++++++++++++++++++--
 arch/x86/pci/direct.c             |    4 ++--
 arch/x86/pci/fixup.c              |    6 ++++--
 arch/x86/pci/legacy.c             |    2 +-
 arch/x86/pci/mmconfig-shared.c    |    6 +++---
 arch/x86/pci/mmconfig_32.c        |   10 ++--------
 arch/x86/pci/mmconfig_64.c        |    8 +-------
 arch/x86/pci/pci.h                |   15 +++++++++++----
 arch/x86/pci/visws.c              |    3 ---
 drivers/acpi/osl.c                |   25 ++++++-------------------
 include/linux/pci.h               |   16 ++++++++--------
 14 files changed, 78 insertions(+), 85 deletions(-)

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..8fd7e82 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
 #define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg)	\
 	(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
 
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned ...
From: Grant Grundler
Date: Sunday, February 10, 2008 - 12:13 pm

Willy,
Just wondering...why don't we just pass "struct bus*" through to the
raw_pci* ops?
My thinking is if a PCI bus controller or bridge is discovered, then we should
always create a matching "struct bus *".

Your patch looks fine to me but if you (and others) agree with the above,
I can make patch to change the internal interface. The pci_*_config API
needs to remain the same.


Why are we using raw_pci_read here instead of pci_read_config_dword()?
If the pci_write_config_byte() above works, then I expect the read
to work too.

To be clear, this is not a problem with this patch...rather a separate
problem with the original code.

hth,
grant
--

From: Matthew Wilcox
Date: Sunday, February 10, 2008 - 12:37 pm

ACPI may need to access PCI config space before we've done a PCI bus
walk.  There's an opregion that AML may access that is for PCI config
space, and an apparently unrelated method might happen to contain such a

I have no idea.  I didn't want to change the semantics in this patch.
Presumably the original author would have an idea why they needed to do
this.

--

From: Yinghai Lu
Date: Sunday, February 10, 2008 - 1:16 pm

your patch and Ivan's patch should be merged in one...

YH
--

From: Matthew Wilcox
Date: Sunday, February 10, 2008 - 1:19 pm

Why?

--

From: Yinghai Lu
Date: Sunday, February 10, 2008 - 1:25 pm

Even Greg didn't know that there was another patch that needed to be applied
before this one yesterday.

He said there were some hunks..

YH
--

From: Matthew Wilcox
Date: Sunday, February 10, 2008 - 1:32 pm

I don't believe you.  For example:


Which I then did.

--

From: Yinghai Lu
Date: Sunday, February 10, 2008 - 1:47 pm

then you may need to send the patches to Greg, so that Greg or others don't
need to dig up Ivan's patch:

[PATCH 0/2]...
[PATCH 1/2]... Ivan's patch with from statement
[PATCH 2/2] ... your patch

YH
--

From: Linus Torvalds
Date: Sunday, February 10, 2008 - 1:24 pm

I really don't care whether they get merged as one or separately, but I
think it should be merged _now_ (-rc1 is already delayed), and I'd like to 
see the final versions of both. Does anybody have them in a final 
agreed-upon format (preferably with that oddness in quirk_intel_irqbalance 
also fixed?)

		Linus
--

From: Matthew Wilcox
Date: Sunday, February 10, 2008 - 1:45 pm

I just looked at fixing that -- the reason seems to be that we don't
actually have the struct pci_dev at that point.  I can fix it, but I
think it's actually buggy.  I want to look at some chipset docs to
confirm that though.

I've attached the two patches that I believe are the ones we want.  We
can (and should) fix quirk_intel_irqbalance separately.

From: Matthew Wilcox
Date: Sunday, February 10, 2008 - 4:02 pm

I don't think I fully understand what's going on here.  So here's what
I've been able to glean; hopefully someone who understands this better
can help out.

I happen to have an E7525-based machine, so here's an lspci of bus 0:

00:00.0 Host bridge: Intel Corporation E7525 Memory Controller Hub (rev 0a)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0a)
00:03.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A1 (rev 0a)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0a)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)

The line in question reads:

        /* read xTPR register */
        raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);

That's domain 0, bus 0, device 8, function 0, address 0x4c, length 2.
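The 0x40 in that call is a packed devfn: device number in bits 7:3, function in bits 2:0. The macros below mirror PCI_DEVFN/PCI_SLOT/PCI_FUNC from <linux/pci.h>, redefined so the sketch builds in userspace; format_bdf() is a hypothetical helper that prints the lspci-style bus:device.function form.

```c
#include <assert.h>
#include <stdio.h>

#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Format bus/devfn the way lspci prints it ("bb:dd.f").
 * Returns the length reported by snprintf(). */
static int format_bdf(char *buf, size_t n, unsigned int bus,
		      unsigned int devfn)
{
	assert(bus < 256 && devfn < 256);
	return snprintf(buf, n, "%02x:%02x.%x", bus, PCI_SLOT(devfn),
			PCI_FUNC(devfn));
}
```

With bus 0 and devfn 0x40 this produces "00:08.0", a slot that does not appear in the lspci listing above.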

I've checked the public E7525 and E7520 MCH datasheets, and they don't
document the xTPR registers; nor do any of the devices in the datasheet
have registers documented at 0x4c.

You can see from my lspci above that I don't _have_ a device 8 on bus 0.
The aforementioned documentation says:

"A disabled or non-existent device's configuration ...
From: Matthew Wilcox
Date: Sunday, February 10, 2008 - 10:04 pm

I'd like to thank Grant for pointing out to me that this is exactly what
the write immediately above this is doing -- enabling device 8 to

Here's the patch to implement the above two suggestions:

----

From f565b65591a3f90a272b1d511e4ab1728861fe77 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <matthew@wil.cx>
Date: Sun, 10 Feb 2008 23:18:15 -0500
Subject: [PATCH] Use proper abstractions in quirk_intel_irqbalance

Since we may not have a pci_dev for the device we need to access, we can't
use pci_read_config_word.  But raw_pci_read is an internal implementation
detail; it's better to use the architected pci_bus_read_config_word
interface.  Using PCI_DEVFN instead of a mysterious constant helps
reassure everyone that we really do intend to access device 8.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 arch/x86/kernel/quirks.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 1941482..c47208f 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -11,7 +11,7 @@
 static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
 {
 	u8 config, rev;
-	u32 word;
+	u16 word;
 
 	/* BIOS may enable hardware IRQ balancing for
 	 * E7520/E7320/E7525(revision ID 0x9 and below)
@@ -26,8 +26,11 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
 	pci_read_config_byte(dev, 0xf4, &config);
 	pci_write_config_byte(dev, 0xf4, config|0x2);
 
-	/* read xTPR register */
-	raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
+	/*
+	 * read xTPR register.  We may not have a pci_dev for device 8
+	 * because it might be hidden until the above write.
+	 */
+	pci_bus_read_config_word(dev->bus, PCI_DEVFN(8, 0), 0x4c, &word);
 
 	if (!(word & (1 << 13))) {
 		dev_info(&dev->dev, "Intel E7520/7320/7525 detected; "
-- 
1.5.2.5
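The patch also narrows word from u32 to u16, which matches a 2-byte config read, and then tests bit 13. A userspace sketch of that check follows. It assumes a hypothetical byte array for device 8's config space (the register values are invented) and a stand-in read_cfg_word() in place of pci_bus_read_config_word(); the stand-in does not go through any pci_ops.

```c
#include <assert.h>
#include <stdint.h>

/* Invented contents for bus 0, device 8, function 0 (not real data). */
static uint8_t dev8_cfg[256];

/* Stand-in for pci_bus_read_config_word(): a little-endian 16-bit read
 * from a byte array. Returns -1 for a misaligned offset. */
static int read_cfg_word(const uint8_t *cfg, int where, uint16_t *val)
{
	assert(where >= 0 && where < 255);
	if (where & 1)
		return -1;
	*val = (uint16_t)(cfg[where] | (cfg[where + 1] << 8));
	return 0;
}

/* Same test as the quirk: returns 1 when bit 13 of the xTPR word is
 * clear (the "Intel E7520/7320/7525 detected" branch), 0 when it is
 * set, and -1 if the read fails. */
static int xtpr_bit13_clear(const uint8_t *cfg)
{
	uint16_t word = 0;

	if (read_cfg_word(cfg, 0x4c, &word))
		return -1;
	return !(word & (1 << 13));
}
```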

From: Grant Grundler
Date: Monday, February 11, 2008 - 12:49 am

welcome.


Can you also add a comment which points at the Intel documentation?

http://download.intel.com/design/chipsets/datashts/30300702.pdf
Page 34 documents 0xf4 register.

And I just double-checked that the 0xf4 register value is restored later

Yeah, this should work even though we don't have a dev for it.

Acked-by: Grant Grundler <grundler@parisc-linux.org>

thanks,
grant
--

From: Matthew Wilcox
Date: Monday, February 11, 2008 - 9:15 am

I'm told that these URLs are not guaranteed to be stable.  And
remembering the pain we had when HP decided to relocate all of their
documents, I'm really not inclined to embed a link to a URL in the


Thanks.

--

From: Linus Torvalds
Date: Monday, February 11, 2008 - 10:18 am

I put it in the commit message, but it wasn't on page 34 when I checked (I 
changed it to 69), and I added the name of the datasheet so that if/when
it moves, maybe google can help.

		Linus
--

From: Grant Grundler
Date: Monday, February 11, 2008 - 12:38 pm

Sorry - page 34 was just the first reference to "Extended Configuration
Registers" when I originally scrounged up the info for willy.

It should. But doing a quick check now only shows one other copy
(in .es domain :) when searching for "30300702.pdf". 

Searching for the full document title results in several intel.com
locations and lots of other misc references that don't look quite right.
Many of those just reference the "product brief" and not the data sheet.

yahoo.com gives similar results.

thanks,
--

From: Yinghai Lu
Date: Sunday, February 10, 2008 - 6:49 pm

Andrew,

those two patches just got into Linus's 2.6.25-rc1.

I assume that you will drop
gregkh-pci-pci-make-pci-extended-config-space-a-driver-opt-in.patch in
-mm.

please check some updated patches in -mm that could be affected. hope
it could save you some time

x86-validate-against-acpi-motherboard-resources.patch
x86-clear-pci_mmcfg_virt-when-mmcfg-get-rejected.patch
x86-mmconf-enable-mcfg-early.patch
x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron-v3.patch

YH
From: Robert Hancock
Date: Sunday, February 10, 2008 - 7:53 pm

I don't think any of these patches are affected. They all affect whether 
to use MMCONFIG globally or not, regardless of whether or not particular
accesses will use it.
--

From: Yinghai Lu
Date: Sunday, February 10, 2008 - 10:59 pm

what i mean:

gregkh-pci-pci-make-pci-extended-config-space-a-driver-opt-in.patch is
not needed.

The others need some updates because of the changes made by the "Change pci_raw_ops to
pci_raw_read/write" patch, such as pci_conf1_read becoming static and
unreachable_devices() being gone.

YH
--

From: Andrew Morton
Date: Monday, February 11, 2008 - 3:10 pm

On Sun, 10 Feb 2008 17:49:34 -0800


I have unhappy feelings here - the patches seem to be churning a bit
and when I last sent them to Greg and Ingo they received no apparent
response.

So I think I'll just drop all four.  Please redo, retest and fully
resubmit, thanks.

And we need to work out who owns these patches.  Are they rightly part of
the PCI tree, or of the x86 tree?

--

From: Ingo Molnar
Date: Monday, February 11, 2008 - 3:38 pm

i actually carried them for a while and 
validate-against-acpi-motherboard-resources.patch got a fair bit of test 
time with positive results. So it has a clear ACK from me.

It's something that looks appealing:

| This patch adds validation of the MMCONFIG table against the ACPI
| reserved motherboard resources. If the MMCONFIG table is found to be 
| reserved in ACPI, we don't bother checking the E820 table.  The PCI 
| Express firmware spec apparently tells BIOS developers that 
| reservation in ACPI is required and E820 reservation is optional, so 
| checking against ACPI first makes sense.  Many BIOSes don't reserve 
| the MMCONFIG region in E820 even though it is perfectly functional, 
| and the existing check needlessly disables MMCONFIG in these cases.

anything that isolates Linux from BIOS messups should be music to our 
ears.
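The check described in that changelog boils down to range containment, tried against ACPI first and E820 second. The sketch below assumes simplified inputs: a flat list of half-open address ranges for each source, with made-up names (struct range, mmcfg_region_ok()). A real implementation also has to deal with a region split across several entries, which this does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). Illustrative type. */
struct range {
	uint64_t start, end;
};

/* True if [start, end) lies entirely inside one of the listed ranges. */
static bool covered_by(const struct range *list, size_t n,
		       uint64_t start, uint64_t end)
{
	assert(start < end);
	for (size_t i = 0; i < n; i++)
		if (list[i].start <= start && end <= list[i].end)
			return true;
	return false;
}

/* Order described in the quoted changelog: ACPI motherboard resources
 * first; consult E820 only if ACPI does not reserve the aperture. */
static bool mmcfg_region_ok(const struct range *acpi, size_t n_acpi,
			    const struct range *e820, size_t n_e820,
			    uint64_t base, uint64_t size)
{
	uint64_t end = base + size;

	if (covered_by(acpi, n_acpi, base, end))
		return true;
	return covered_by(e820, n_e820, base, end);
}
```

With an ACPI reservation of [0xE0000000, 0xF0000000), a 256 MB aperture at 0xE0000000 is accepted without consulting E820 at all.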

i also think the mmconf-enable stuff for Barcelona from Yinghai,
albeit not particularly pretty, is probably good too for similar 
reasons. It makes the kernel boot with noacpi which is a good sign IMO. 

I have testsystems that simply do not boot with ACPI turned off - and i 
have a testsystem that locks up hard if it takes an NMI in certain ACPI 
AML sequences ... Just Because.

So i'd ACK them just on general principle - earlier versions of the 
patches were carried in x86.git and caused no particular problems.

but ... then we got complaints from you that stuff collides and that 
such patches should be carried in your or Greg's tree, so we dropped 
them. And there was another 100 KLOC of x86 code to worry about ;-)

So i'd suggest to send those patches upstream, they are system enablers 
and they are at fundamental enough places to be apparent if they cause 
any breakage i think.

	Ingo
--

From: Arjan van de Ven
Date: Monday, January 28, 2008 - 8:05 pm

On Mon, 28 Jan 2008 12:44:31 -0800

I think there's only one fundamental disagreement; and that is:
do we think that things are now totally fixed and no new major issues
will arrive after the "fix yet another mmconfig thing" patches are merged.

If the answer is no, then imho my patch is the right approach; it will limit the damage and doesn't make
the people suffer who don't need extended config space.
If the answer is yes, then my patch is not needed.

This is a judgment call; I'm skeptical, others are more optimistic that after 2 years of messing around
they have finally found the last golden fix.

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Matthew Wilcox
Date: Monday, January 28, 2008 - 8:18 pm

I'm more optimistic because we've so severely restricted the use of
mmconf after these patches that it's unlikely to cause problems.  I also
hear Vista is now using mmconf, so fewer implementations are going to
be buggy at this point.

--

From: Greg KH
Date: Tuesday, January 29, 2008 - 6:19 am

Hahahaha, oh, that's a good one...

But what about the thousands of implementations out there that are
buggy?

I'm with Arjan here, I'm very skeptical.

Matthew, with Arjan's patch, is anything that currently works now
broken?  Why do you feel it is somehow "wrong"?

thanks,

greg k-h
--

From: Tony Camuso
Date: Tuesday, January 29, 2008 - 7:15 am

Greg,

The problem with Arjan's patch, if I understand it correctly, is that it
requires drivers to make a call to access extended PCI config space.

And, IIRC, Arjan's patch encumbers drivers for all arches, even those
that have no MMCONFIG problems.

The patches proposed by Loic, Ivan, Matthew, and myself, all address the
problem in an x86-specific manner that is transparent to the drivers.

--

From: Arjan van de Ven
Date: Tuesday, January 29, 2008 - 7:47 am

On Tue, 29 Jan 2008 09:15:02 -0500

this is not quite correct; the patches from Loic, Ivan, Matthew and you are for a different
problem statement.

Your patch problem statement is "need to fix mmconfig", my patch problem statement is "need
to not make users who don't need it suffer". These are orthogonal problems.


--

From: Tony Camuso
Date: Tuesday, January 29, 2008 - 8:15 am

Yes, but your patch also makes users who need extended PCI config space suffer.

Right now, that isn't a lot of people in x86 land, but your patch encumbers drivers
for non-x86 archs with an additional call to access space that they've never had
a problem with.

As more PCI express drivers start to take advantage of AER and other advanced
express capabilities, the extra call to address a condition specific to legacy
x86 hardware is, IMNSHO, a kludge.

The patches submitted by the others fix the problems with MMCONFIG without
encumbering the drivers to be aware of any difference between legacy config
space and extended config space.

I have tested these patches on a number of systems exhibiting various MMCONFIG-
related pathologies, and they work.




--

From: Arjan van de Ven
Date: Tuesday, January 29, 2008 - 8:29 am

On Tue, 29 Jan 2008 10:15:45 -0500


in addition to pci_enable(), pci_enable_msi(), pci_enable_busmaster() they already need to do
to enable various features?


--

From: Tony Camuso
Date: Tuesday, January 29, 2008 - 9:26 am

These calls are related to generic aspects of the PCI* landscape itself and are
not related to any arch-specific hardware, nor were they devised to address
chipset-specific or BIOS-specific problems.

For the good of all, we should endeavor to avoid putting arch-specific fixes into
the generic code whenever possible.

And in this case, not only is it possible, it's been done and tested.

--

From: Matthew Wilcox
Date: Tuesday, January 29, 2008 - 4:57 pm

Umm .. ia64 already does exactly what I'm proposing for x86.  It uses
one SAL interface for bytes below 256 and a different SAL interface for
bytes 256-4095.

--

From: Tony Camuso
Date: Tuesday, January 29, 2008 - 7:30 pm

The interface is the same, ia64_sal_pci_config_write() and ia64_sal_pci_config_read(),
but a flag bit in the mode argument is used to tell the SAL interface whether to
translate the offset component of the config address as having 8 or 12 bits
of displacement.
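A sketch of that mode selection, for illustration only. PCI_SAL_EXT_ADDRESS is the layout quoted from arch/ia64/pci/pci.c earlier in the thread; the compact 8-bit layout and the exact condition for choosing it are assumptions here, and sal_config_address() is a hypothetical helper, not the ia64 code.

```c
#include <assert.h>
#include <stdint.h>

/* Extended layout, as quoted from arch/ia64/pci/pci.c in this thread. */
#define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg) \
	(((uint64_t)(seg) << 28) | ((bus) << 20) | ((devfn) << 12) | (reg))
/* Compact layout: assumed 8-bit segment and 8-bit register fields. */
#define PCI_SAL_ADDRESS(seg, bus, devfn, reg) \
	(((uint64_t)(seg) << 24) | ((bus) << 16) | ((devfn) << 8) | (reg))

/* Build the config address and the mode flag passed alongside it:
 * mode 0 = 8-bit register offset, mode 1 = 12-bit register offset. */
static uint64_t sal_config_address(unsigned int seg, unsigned int bus,
				   unsigned int devfn, unsigned int reg,
				   int *mode)
{
	assert(bus < 256 && devfn < 256 && reg < 4096);
	if ((seg | reg) <= 255) {
		*mode = 0;
		return PCI_SAL_ADDRESS(seg, bus, devfn, reg);
	}
	*mode = 1;
	return PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
}
```

Offset 0x4c on device 8 gives mode 0, while offset 0x100 on the same device needs mode 1 and the 12-bit layout: one entry point, with a flag choosing the encoding.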

In my estimation, Ivan's patch, in his implementation of Loic's suggestion, is even
more elegant, since there is no need to flag whether the access is for offsets below
256. Ivan's code automatically uses Port IO (or equivalent with Matthew's patch) for
offsets below 256 and MMCONFIG for offsets from 256 to 4095.

And even better, it removes the bitmap that tracks MMCONFIG-unfriendly devices for
the first 16 buses, a solution that assumes systems with bus numbers higher than 16
will get MMCONFIG right, which turned out to be a very wrong assumption. Furthermore,
the config address is translated by the Northbridge. The delivery mechanism to
the Northbridge, whether Port IO or MMCONFIG, is utterly opaque to the devices on the
bus, since all they see is PCI config cycles, not Port IO or MMCONFIG cycles. The test
only needed to be made at the Northbridge level, not at the device level. Ivan's patch
removes all this cruft.
--

From: Matthew Wilcox
Date: Tuesday, January 29, 2008 - 8:45 pm

Maybe I'm insufficiently imaginative.  Can you come up with a plausible
way in which the two patches I posted will succumb to bugs?  After those
patches we only use mmconf if:

 1. conf1 has failed to work
OR
 2. user has compiled their own kernel without support for conf1
OR
 3. kernel probes config space 0x100 to see if it can access extended
    config space (requires the device to be PCIe or PCI-X2)
OR
 4. root attempts to lspci -xxxx or lspci -v
OR
 5. device driver tries to access extended config space

With Arjan's patch, I believe only case 3 changes.  In cases 4 and 5,
either lspci or the device driver will jump through the hoop to enable

lspci is broken.  It used to be able to access extended config space, and
now can't unless it is patched to know about the sysfs flag to enable it.

If you're determined to implement something to disable extended config
space by default, it can be done in a much better way than Arjan's patch
-- less code (both source and object).

--

From: Ivan Kokshaysky
Date: Wednesday, January 30, 2008 - 8:15 am

[Empty message]
From: Arjan van de Ven
Date: Wednesday, January 30, 2008 - 8:42 am

On Wed, 30 Jan 2008 18:15:39 +0300

Xorg doesn't do pci express ..


--

From: Ivan Kokshaysky
Date: Wednesday, January 30, 2008 - 1:14 pm

Xorg core provides a set of PCI config access functions (via sysfs) for
the graphics drivers. These functions do work correctly with offsets > 256
bytes. Can you guarantee that none of PCI-E video drivers use that,

Unfortunately, not completely true. Though it has nothing to do with
extended config space.

Ivan.
--

From: Jesse Barnes
Date: Wednesday, January 30, 2008 - 10:51 pm

Ugg, let's look at the actual data (again); I'm really not sure why people 
are jumping to such dire conclusions about the current state of things.

AIUI we only have 3 issues so far (remember mmconfig has been enabled in -mm 
for a long time):
  1) host bridge decode problems (disabling decode to avoid overlaps can 
cause some bridges to stop decoding RAM addrs, but we have a fix for that)
  2) config space retry on ATI (I think willy already debunked this one?)
  3) some FUD about SMM or other firmware interrupts coming in during BAR 
sizing while decode is disabled (this one is just pure FUD; if we want to 
solve it properly we need a new platform hook to disable SMM/NMI/etc. 
around PCI probing)
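For context on point 3: the usual way to size a BAR is to write all ones to it and read back which address bits stick, so for a short window the BAR holds a meaningless address. That window is why decode gets disabled during sizing. The arithmetic for a 32-bit memory BAR, as a userspace sketch (mem_bar_size() is a hypothetical name; 64-bit and I/O BARs are not handled):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Typical sizing sequence (not modeled here): clear the memory-decode
 * bit in the command register, save the BAR, write 0xFFFFFFFF, read it
 * back, restore the BAR, restore the command register.
 */
#define BAR_MEM_ADDR_MASK 0xFFFFFFF0u	/* bits 3:0 of a memory BAR are flags */

/* Size in bytes of a 32-bit memory BAR, given the read-back value. */
static uint32_t mem_bar_size(uint32_t readback)
{
	uint32_t mask = readback & BAR_MEM_ADDR_MASK;

	assert((readback & 1u) == 0);	/* bit 0 set would mean an I/O BAR */
	if (mask == 0)
		return 0;		/* BAR not implemented */
	return ~mask + 1;		/* lowest writable address bit */
}
```

A read-back of 0xFFF00000 means the device hard-wires the low 20 address bits to zero, i.e. a 1 MB window.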

What else was there?  What reason do we have to think that things are so 
disastrous?

So I really prefer willy's approach to Arjan's alternative...

Jesse
--

From: Arjan van de Ven
Date: Saturday, January 12, 2008 - 8:46 am

On Sat, 12 Jan 2008 17:40:30 +0300
Ivan Kokshaysky <ink@jurassic.park.msu.ru> wrote:

no it doesn't!
Other arches need no changes.
--

From: Ivan Kokshaysky
Date: Saturday, January 12, 2008 - 9:23 am

Umm, true. I misread your patch.
But it doesn't change anything - that wasn't my main objection
anyway.

Ivan.
--

From: Arjan van de Ven
Date: Saturday, January 12, 2008 - 10:45 am

On Sat, 12 Jan 2008 17:40:30 +0300


btw this is my main objection to your patch; it intertwines the conf1 and mmconfig code even more.
When (and I'm saying "when" not "if") systems arrive that only have MMCONFIG for some of the devices,
we'll have to detangle this again, and I'm really not looking forward to that.
--

From: Matthew Wilcox
Date: Saturday, January 12, 2008 - 11:17 am

I think this will be OK.  We'll end up with three pci_ops, one for
mmconfig-only, one for mixed mmconfig-conf1 and one for conf1.  We could
do with that now actually -- the machines which will definitely go berserk
if you try to use mmconfig could have the conf1 ops on those busses.
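A sketch of how such a three-way split could be expressed, assuming each bus is tagged with one of the three ops kinds. The enum names and pick_method() are hypothetical; in the kernel, each kind would presumably be a separate struct pci_ops attached to the bus.

```c
#include <assert.h>

/* Hypothetical names; each bus_ops value stands for one struct pci_ops. */
enum cfg_method { CFG_NONE, CFG_CONF1, CFG_MMCONFIG };
enum bus_ops { OPS_CONF1, OPS_MIXED, OPS_MMCONFIG };

/* Which mechanism would serve register `reg` on a bus given these ops. */
static enum cfg_method pick_method(enum bus_ops ops, int reg)
{
	assert(reg >= 0 && reg < 4096);
	switch (ops) {
	case OPS_CONF1:		/* bus known to misbehave with mmconfig */
		return reg < 256 ? CFG_CONF1 : CFG_NONE;
	case OPS_MIXED:		/* conf1 below 256, mmconfig above */
		return reg < 256 ? CFG_CONF1 : CFG_MMCONFIG;
	case OPS_MMCONFIG:	/* mmconfig-only */
		return CFG_MMCONFIG;
	}
	return CFG_NONE;
}
```

A quirk that knows a bus misbehaves with MMCONFIG would tag it OPS_CONF1, leaving its extended space unreachable rather than risking a hang.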

Let's take Ivan's patch for now, and do that patch for 2.6.26.

--

From: Ivan Kokshaysky
Date: Saturday, January 12, 2008 - 2:49 pm

There is nothing wrong with it; please realize that mmconf and conf1 are
just different cpu-side interfaces. Both produce precisely the *same* bus

MMCONFIG for *some* of the devices? This doesn't sound realistic
from technical point of view.
MMCONFIG-only systems? Sure. I really hope to see these. But it won't
be PC-AT architecture anymore. It has to be something like alpha,
for instance, fully utilizing the 64-bit address space, and we'll have
to have the whole low-level PCI infrastructure completely different
for these future platforms anyway.
Right now, each and every x86 chipset *does* require working
conf1 just in order to set up the mmconf aperture. It's the very
fundamental thing, sort of design philosophy.

Ivan.
--

From: Arjan van de Ven
Date: Saturday, January 12, 2008 - 4:01 pm

On Sun, 13 Jan 2008 00:49:11 +0300


s/x86/pc/

and not even that.

Really this is a huge design mistake in your patch, the hard coding of conf1,
and for that reason I really don't think it should go in.

We have 4 or so methods on PC today to access config space, probably going to 6 in the next year
or two. One of those methods *HARD PICKING* another one as "second best" for cases it
doesn't want to deal with is WRONG. It really needs to be up to the architecture/platform
to decide which ops vector is the fallback. And yes on your current PC that might well be conf1.
But hardcoding that is not the right thing. We have the vectors, we have the ranking code,
just make a "second rank" thing. 
Oh wait, my patch did that ;)
Then let either the mmconfig code or the wrapper above it (doesn't matter, in fact, I can see
value of making this decision in the wrapper and keep mmconfig code simple and clean,
because maybe mmconfig IS the thing that the architecture says needs to deal with the lower 256 bytes)..

Oh wait my patch also did that pretty much ;)

The rest of my patch was defaulting to off. Is it that bit that you really hate?
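The "second rank" idea can be sketched as a ranking table the platform fills in, with the fallback chosen by rank rather than hard-coded to conf1. This is an illustrative sketch, not code from either patch: struct cfg_ops, pick_ops() and ops_for() are hypothetical names, and the policy in ops_for() (prefer the second-ranked method below offset 256) is an assumption standing in for whatever the wrapper decides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one config access method. */
struct cfg_ops {
    const char *name;
    int rank;           /* higher rank = preferred by the platform */
    unsigned limit;     /* first offset the method cannot reach: 256 or 4096 */
};

/* Best-ranked method able to reach `reg`, ignoring `skip` (pass NULL to
 * consider every method). Calling it again with the winner as `skip`
 * yields the "second rank" method. */
static const struct cfg_ops *pick_ops(const struct cfg_ops *const *tbl,
                                      size_t n, unsigned reg,
                                      const struct cfg_ops *skip)
{
    const struct cfg_ops *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i] == skip || reg >= tbl[i]->limit)
            continue;
        if (best == NULL || tbl[i]->rank > best->rank)
            best = tbl[i];
    }
    return best;
}

/* Assumed wrapper policy: below offset 256, prefer the second-ranked
 * method when one exists; otherwise use the best method available. */
static const struct cfg_ops *ops_for(const struct cfg_ops *const *tbl,
                                     size_t n, unsigned reg)
{
    const struct cfg_ops *best = pick_ops(tbl, n, reg, NULL);

    if (best != NULL && reg < 256) {
        const struct cfg_ops *second = pick_ops(tbl, n, reg, best);
        if (second != NULL)
            return second;
    }
    return best;
}
```

On a table containing only an MMCONFIG entry, ops_for() simply returns it for every offset, which is the case a hard-coded conf1 fallback cannot handle.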


--

From: Tony Camuso
Date: Saturday, January 12, 2008 - 5:12 pm

Arjan,

I have not seen your MMCONFIG patch.

Would you mind sending me a copy?

Thanks.

Tony

--

From: Arjan van de Ven
Date: Saturday, January 12, 2008 - 5:40 pm

On Sat, 12 Jan 2008 19:12:23 -0500

sure


----


On PCs, PCI extended configuration space (4KB) is riddled with problems
associated with the memory mapped access method (MMCONFIG). At the same
time, there are very few machines that actually need or use this
extended configuration space.

At this point in time, the only sensible action is to make access to the
extended configuration space an opt-in operation for those device
drivers that need/want access to this space, as well as for those
userland diagnostics utilities that (on admin request) want to access
this space.

It's inevitable that this is done per device rather than per bus; we'll
be needing per device PCI quirks to turn this extended config space off
over time no matter what; in addition, it gives the least amount of
surprise: loading a driver for a device only impacts that one device,
not a whole bus worth of devices (although it'll be common to have one
physical device per bus on PCI-E).

The (desirable) side-effect of this patch is that all enumeration is
done using normal configuration cycles.

The patch below splits the lower-level PCI config space operations (which
operate on a bus) in two: one set that normally only operates on the
traditional space, and one that gets used after the driver has opted in
to using the extended configuration space. This has led to a little code
duplication, but it's not all that bad (most of it is prototypes in
headers and such).

Architectures that have a solid reliable way to get to extended
configuration space can just keep doing what they do now and allow
extended space access from the "traditional" bus ops, and just not fill
in the new bus ops. (This could include x86 for, say, BIOS year 2009
and later, but doesn't right now.)
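A minimal sketch of the split described above, assuming a per-device opt-in flag and two read hooks per bus. The types and names (struct bus_ops_pair, struct fake_dev, cfg_read()) are hypothetical and not taken from the patch; the demo stubs only report which path served the read. An architecture that leaves ext_read NULL keeps its single path for the whole 4 KB, matching the "just not fill in the new bus ops" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE      256
#define CFG_SPACE_EXP_SIZE  4096

typedef int (*cfg_read_fn)(unsigned devfn, unsigned reg, uint32_t *val);

/* Hypothetical per-bus pair of read hooks. */
struct bus_ops_pair {
    cfg_read_fn read;       /* traditional ops: always filled in */
    cfg_read_fn ext_read;   /* extended ops: NULL where `read` already
                               reaches the full 4 KB reliably */
};

/* Hypothetical stand-in for the per-device state. */
struct fake_dev {
    unsigned devfn;
    bool ext_cfg_enabled;   /* set when a driver or root opts in */
};

static int cfg_read(const struct bus_ops_pair *ops, const struct fake_dev *dev,
                    unsigned reg, uint32_t *val)
{
    if (reg >= CFG_SPACE_EXP_SIZE)
        return -1;
    if (ops->ext_read == NULL)              /* single-path architecture */
        return ops->read(dev->devfn, reg, val);
    if (dev->ext_cfg_enabled)               /* opted in: extended ops */
        return ops->ext_read(dev->devfn, reg, val);
    if (reg >= CFG_SPACE_SIZE) {            /* not opted in: hide it */
        *val = 0xffffffffu;
        return -1;
    }
    return ops->read(dev->devfn, reg, val);
}

/* Demo stubs: they only report which path served the read. */
static int demo_read(unsigned devfn, unsigned reg, uint32_t *val)
{
    (void)devfn; (void)reg;
    *val = 1;
    return 0;
}

static int demo_ext_read(unsigned devfn, unsigned reg, uint32_t *val)
{
    (void)devfn; (void)reg;
    *val = 2;
    return 0;
}
```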

This patch also adds a sysfs property for each device into which root
can write a '1' to enable extended configuration space. The kernel will
print a notice into dmesg when this happens (including the name of the
app) so that if the system ...
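The sysfs knob can be modeled as a store handler that accepts only "1" and logs which program flipped it. This is a user-space sketch under stated assumptions: the function name, the struct, and the log wording are hypothetical, not the patch's actual attribute or message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct fake_dev {                   /* hypothetical */
    const char *name;               /* e.g. "0000:00:1c.0" */
    bool ext_cfg_enabled;
};

/* Hypothetical store handler for a per-device attribute. Accepts "1",
 * with or without the trailing newline that `echo 1 >` adds, and records
 * which program asked, so a later hang can be traced to that request.
 * Returns 0 on success (a real sysfs store method returns the count of
 * bytes consumed) or -EINVAL for anything else. */
static int ext_cfg_store(struct fake_dev *dev, const char *buf,
                         const char *comm, char *log, size_t loglen)
{
    if (strcmp(buf, "1") != 0 && strcmp(buf, "1\n") != 0)
        return -EINVAL;
    if (!dev->ext_cfg_enabled) {
        dev->ext_cfg_enabled = true;
        snprintf(log, loglen, "%s: extended config space enabled by %s\n",
                 dev->name, comm);
    }
    return 0;
}
```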
From: Tony Camuso
Date: Saturday, January 12, 2008 - 6:36 pm

Thanks, Arjan.

The problem we have been experiencing has to do with Northbridges,
not with devices.

As far as the device is concerned, after the Northbridge translates
the config access into PCI bus cycles, the device has no idea what
mechanism drove the Northbridge to the translation.

That is to say, the device does not know whether the config cycle
on the bus was caused by an MMCONFIG cycle or a legacy Port IO
cycle delivered to the Northbridge.

In systems that had Northbridges that did not respond correctly to
MMCONFIG cycles, like the AMD 8132, we (HP & RH) were blacklisting
whole platforms to limit them to Port IO PCI config.

However, when platforms emerged using both legacy PCI and PCI express,
the platforms that were limited to Port IO config cycles were not
express compliant, since the express spec requires the platform to
be able to address the full 4096 byte region of config space to
be considered express-compliant.

The patch I devised concerned itself with Northbridges and separated
MMCONFIG-compliant buses from those that could not handle MMCONFIG.

Therefore, the express bus in the platform could happily employ
MMCONFIG to access the entire 4K region, while the legacy bus
with the non-compliant Northbridge could be restricted to Port IO
config.
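This approach keys the decision on the host bridge a bus sits behind rather than on the device. A minimal sketch, assuming each Northbridge is described by a bus-number range and a flag recording whether it passed an MMCONFIG test; struct host_bridge and choose_method() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cfg_method { CFG_CONF1, CFG_MMCONFIG };

struct host_bridge {            /* hypothetical: one entry per Northbridge */
    unsigned first_bus, last_bus;
    bool mmcfg_ok;              /* passed the MMCONFIG sanity test? */
};

/* Find the bridge that owns `bus` and choose the method for `reg`.
 * Returns 0 and sets *m on success, -1 if the access cannot be done. */
static int choose_method(const struct host_bridge *hb, size_t n,
                         unsigned bus, unsigned reg, enum cfg_method *m)
{
    for (size_t i = 0; i < n; i++) {
        if (bus < hb[i].first_bus || bus > hb[i].last_bus)
            continue;
        if (hb[i].mmcfg_ok) {
            *m = CFG_MMCONFIG;              /* full 4 KB reachable */
            return reg < 4096 ? 0 : -1;
        }
        if (reg >= 256)
            return -1;                      /* conf1 cannot reach it */
        *m = CFG_CONF1;
        return 0;
    }
    return -1;                              /* bus behind no known bridge */
}
```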

However, even with my patch, a problem remained: devices requiring
large address windows (BARs) could overlap the BIOS-mapped MMCONFIG
region. In such a situation, where the bus has passed the MMCONFIG
test, the MMCONFIG region can get doubly mapped by bus-sizing code,
causing the system to hang.
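The arithmetic behind the overlap: when a 32-bit memory BAR is sized by writing all ones, it temporarily decodes a window at the top of its addressable range, and a large BAR can land on the MMCONFIG aperture if memory decoding is left enabled while sizing. The sketch below assumes an aperture at 0xE0000000 covering 256 buses (256 MB); those addresses and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Window a 32-bit memory BAR decodes right after all ones were written
 * to it: the writable address bits read back as ones, so the BAR points
 * at the top of the 4 GB space. Bits 3:0 are flag bits, not address. */
static void bar_window_during_sizing(uint32_t readback, uint64_t *base,
                                     uint64_t *size)
{
    uint32_t mask = readback & ~0xFu;

    *base = mask;
    *size = mask & (~mask + 1u);    /* lowest writable bit = BAR size */
}

/* Do half-open ranges [a, a + alen) and [b, b + blen) intersect? */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    return alen != 0 && blen != 0 && a < b + blen && b < a + alen;
}
```

A 512 MB BAR reads back as 0xE0000000 and so covers the whole assumed aperture during sizing; a 1 MB BAR stays near the top of the address space and does not.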

The remedy proposed by Loic and implemented by Ivan is actually
quite elegant, in that it addresses all these problems quite
effectively while eliminating a good deal of specialized and somewhat
obscure code.

In my humble opinion, Port IO config access is here to stay, having
been defined as an architected mechanism in the PCI 2.1 spec.

This is most especially true for x86.

In other words, for x86, I don't think ...
From: Arjan van de Ven
Date: Saturday, January 12, 2008 - 9:42 pm

On Sat, 12 Jan 2008 20:36:59 -0500

correct for now.
HOWEVER, and this is the point Linus has made several times:
Just about NOBODY has devices that need the extended config space. At all.
So making this opt-in for devices allows our users to boot and use
their system if they are in the majority that has no need for even getting

Wanna bet there'll be devices that screw this up? There are devices that even screwed

This kind of patchup has been going on for the better part of a year (well, 2 years)
by now and it's STILL NOT ENOUGH, as you can see by the more patchups that have

You're wrong there. Sad to say, but you're wrong there.

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Matthew Wilcox
Date: Saturday, January 12, 2008 - 9:47 pm

I don't know if they 'screwed it up'.  There are devices that misbehave
when registers are read from pci config space.  But this was never
guaranteed to be a safe thing to do; it gradually became clear that
people expected to be able to read random registers and manufacturers
responded accordingly, but I don't think you were ever guaranteed to be
able to peek at bits of config space arbitrarily.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Jeff Garzik
Date: Saturday, January 12, 2008 - 11:43 pm

Quite correct...  Reading registers can have all sorts of side effects, 
for example clearing chip conditions.
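A toy model of the kind of side effect described here, assuming a hypothetical device whose error register clears on read. (The error bits in the standard PCI Status register are write-1-to-clear instead; device-specific registers are free to behave differently.) The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device with a read-to-clear error register. */
struct rc_dev {
    uint32_t latched_errors;
};

static uint32_t read_status(struct rc_dev *d)
{
    uint32_t v = d->latched_errors;

    d->latched_errors = 0;          /* the side effect of the read */
    return v;
}
```

If a diagnostic dump performs the first read, the driver's own check afterwards sees nothing.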

	Jeff



--

From: Tony Camuso
Date: Sunday, January 13, 2008 - 5:43 am

The PCI express spec requires the platform to provide access to this space
for express-compliance. More devices will be using this space as express

There may have been devices that incorrectly applied the PCI spec to
various fields in the header, I'll grant you that.

However, there is no way a device can determine electrically whether the
Northbridge received Port IO or MMCONFIG cycles. This is between the CPU

Which is why Loic's proposal and Ivan's implementation of it is so elegant.
It solves all these problems in one sweep, and eliminates the code rendered

The PCI spec provides for conf1 as an architected solution. It's not
going away, and especially not in x86 land where Port IO is built-in
to the CPU.



--

From: Arjan van de Ven
Date: Sunday, January 13, 2008 - 10:03 am

On Sun, 13 Jan 2008 07:43:11 -0500

PLATFORM not OS :)
Windows isn't using it in the server space, and only in the client space it recently started


again sadly you're wrong. 

--

From: Tony Camuso
Date: Sunday, January 13, 2008 - 2:28 pm

As someone gently pointed out to me, you are in a position to know this,
so I probably am wrong.

--

From: Alan Cox
Date: Sunday, January 13, 2008 - 5:54 pm

On Sun, 13 Jan 2008 16:28:08 -0500

I suspect Arjan is wrong. It might be some Intel agenda but I still see
fairly new driver reference code that is hardcoding port accesses even
when designed for Redmond products.

Alan
--

From: Arjan van de Ven
Date: Sunday, January 13, 2008 - 6:33 pm

On Mon, 14 Jan 2008 00:54:34 +0000


I find it hard to believe that even they have their drivers do PCI config access via ports directly from the drivers,
and especially in driver reference code...


--

From: Alan Cox
Date: Monday, January 14, 2008 - 2:11 am

Microsoft may not but the standard of Taiwanese driver code (and by
reference I mean vendor reference not OS supplier reference) is not
always great. When you have weeks to write a driver for a product with a
6 month sales lifetime I guess there are other pressures on driver
authors.

Easy enough for Intel to analyse though.

Alan
--

From: Tony Camuso
Date: Sunday, January 13, 2008 - 8:29 pm

To all ...

Well, here is what I perceive we've got so far.

. Some PCI Northbridges do not work with MMCONFIG.

. Some PCI BARs can overlap the MMCONFIG area during bus sizing.
   It is hoped that new BIOSes will locate MMCONFIG in an area
   safely out of the way of bus sizing code, but there can be
   no guarantees.

. conf1 is going away in newer x86 implementations in the not
   too distant future.

. The PCI express spec requires platforms to provide access to
   the extended config area, and there are express devices today
   using that area for AER.

. There is no need to provide different PCI config access
   mechanisms at device granularity, since the PCI config access
   mechanism between the CPU and the Northbridge is opaque to
   the devices. PCI config mechanisms only need to differ at
   the Northbridge level.

. We have a flurry of patches all claiming to solve all or some
   of these problems.


Arjan,

I realize it may not be possible for you to answer this question,
but I feel compelled to ask it anyway. Is it possible that future
x86 architectures will be implementing a SAL-like interface to
abstract PCI config access altogether?

Or can we condense these patches down to a set that does the
following?

. If the system is capable of conf1, then PCI config access
   at offsets < 256 should be confined to conf1. This solution
   is most effective for existing and legacy systems.

. If the system does not support MMCONFIG, or if MMCONFIG is
   not working, then accesses to offsets >= 256 return -1 and an
   error status.

. For systems where the conf1 mechanism is NOT available,
   MMCONFIG should be the PCI access mechanism for all
   offsets. For such systems, we must assume that the BIOS has
   become smart enough to locate MMCONFIG in a region safe from
   encroachment by bus sizing code.
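The three rules above reduce to a small routing function. A minimal sketch, assuming the platform has already determined whether conf1 exists and whether MMCONFIG passed its checks; struct platform_caps and route_access() are hypothetical. PATH_FAIL stands for "return -1 and an error status" as in the second rule.

```c
#include <assert.h>
#include <stdbool.h>

enum cfg_path { PATH_CONF1, PATH_MMCONFIG, PATH_FAIL };

struct platform_caps {              /* hypothetical probe results */
    bool has_conf1;
    bool mmcfg_usable;              /* present and passed sanity checks */
};

static enum cfg_path route_access(const struct platform_caps *p, unsigned reg)
{
    if (reg >= 4096)
        return PATH_FAIL;
    if (reg < 256 && p->has_conf1)
        return PATH_CONF1;          /* rule 1: low 256 bytes via conf1 */
    if (!p->mmcfg_usable)
        return PATH_FAIL;           /* rule 2: caller reports -1 and an error */
    return PATH_MMCONFIG;           /* extended offsets, or rule 3 (no conf1) */
}
```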



--

From: Arjan van de Ven
Date: Sunday, January 13, 2008 - 10:05 pm

On Sun, 13 Jan 2008 22:29:23 -0500


not "conf1" but "what the platform thinks is the best method for < 256".

We have this nice abstraction for the platform to select the best method... we should use it.

And still, it's another attempt to get this fixed (well... it's been 2 years in the making so far; maybe this will
be the last one, maybe it will not be... we'll see I suppose, but it sucks to be a user who doesn't
need any of the functionality that the extended config space provides in theory but gets to suffer more of the issues).

I'm all in favor of making this more reliable, but really..
we've thought it was fixed time and time again over the last two years. Please consider
limiting the scope of the damage as well.




--

From: Tony Camuso
Date: Monday, January 14, 2008 - 6:01 am

I don't understand. If we're going to differentiate MMCONFIG from some other
access mechanism, it only needs to be done at the Northbridge level. Devices
are electrically ignorant of the protocol used between CPU and Northbridge.

Agreed.

So we have Loic and Ivan's patch limiting MMCONFIG accesses to
offsets >= 256.

And we have Matthew's patch that abstracts the method for config
accesses to offsets < 256.

I believe Matthew has already tested these patches for functionality
on x86. All that's needed is to test for regressions on other arches.

Is there any interest in providing the following?

1. The ability to use MMCONFIG for all accesses on systems that have
    no problems with MMCONFIG.

2. For systems using both PCI and PCI Express, testing each bus
    for MMCONFIG compliance, to determine whether MMCONFIG can be
    used for all config accesses or whether the bus must be limited
    entirely to the method abstracted for offsets < 256.

Or does that introduce unnecessary complications?
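One plausible form of the per-bus compliance test in item 2: compare what conf1 and MMCONFIG return for the ID dword of every function conf1 can see. This is a sketch under assumptions, not code from any of the patches; the reader callbacks are hypothetical, and the demo MMCONFIG reader simulates a Northbridge that drops MMCONFIG cycles on bus 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reads the vendor/device ID dword (offset 0) of bus/devfn. */
typedef uint32_t (*id_reader)(unsigned bus, unsigned devfn);

/* A bus is treated as MMCONFIG-compliant only if every function conf1
 * sees returns the same ID through MMCONFIG. */
static bool bus_mmcfg_compliant(unsigned bus, id_reader conf1, id_reader mmcfg)
{
    for (unsigned devfn = 0; devfn < 256; devfn++) {
        uint32_t id = conf1(bus, devfn);

        if (id == 0xffffffffu || id == 0)
            continue;                   /* no function here */
        if (mmcfg(bus, devfn) != id)
            return false;               /* mismatch: keep bus on conf1 */
    }
    return true;
}

/* Demo readers: one function (devfn 0) per bus. */
static uint32_t demo_conf1(unsigned bus, unsigned devfn)
{
    (void)bus;
    return devfn == 0 ? 0x12345678u : 0xffffffffu;
}

static uint32_t demo_mmcfg(unsigned bus, unsigned devfn)
{
    if (bus == 1)
        return 0xffffffffu;             /* simulated broken Northbridge */
    return demo_conf1(bus, devfn);
}
```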


--

From: Arjan van de Ven
Date: Monday, January 14, 2008 - 7:46 am

On Mon, 14 Jan 2008 08:01:01 -0500


Again this is about having systems that don't need extended config space not use it. At all.
The only way to do that is have the drivers say they need it, and not use it otherwise.
It has NOTHING to do with how things are wired up. It's purely a kernel-level policy decision
about whether to use extended config space AT ALL.



--

From: Tony Camuso
Date: Monday, January 14, 2008 - 8:23 am

The problem with compelling device drivers to determine the PCI
config mechanism is that it must be forced upon arches that
have no PCI configuration quirks or don't even use the same
PCI config mechanisms as x86.

I don't think that's a good policy.

Better to confine arch-specific quirks to the arch-specific code
whenever possible.

--

From: Arjan van de Ven
Date: Monday, January 14, 2008 - 9:01 am

On Mon, 14 Jan 2008 10:23:14 -0500



--

From: Tony Camuso
Date: Monday, January 14, 2008 - 9:08 am

Arjan, you would be foisting this call on device drivers running on
arches that don't need any such distinction between extended config
space and < 256 bytes.

I still think it's a bad policy.

Let's endeavor to confine arch-specific quirks to the arch-specific
code.

--

From: Linus Torvalds
Date: Sunday, January 13, 2008 - 10:20 pm

Agreed. I suspect that the likelihood of conf1 accesses going away in the 
next five years is slim to none.

			Linus
--

From: Arjan van de Ven
Date: Sunday, January 13, 2008 - 11:41 am

On Sun, 13 Jan 2008 13:23:35 -0500
Loic Prylli <loic@myri.com> wrote:



This entirely misses the point of why I made the patch. The point is NOT
that devices are buggy. The point is that right now, 99.99% of the machines
out there do NOT need extended config space (no matter how it gets accessed),
yet at the same time they have suffered from its issues for... what, 2 years now?
The point of my patch was to make people who don't need extended config space,
not have to deal with it anymore.

Note: There is not a 100% overlap between "need" and "will not be used in 
the patches that use legacy for < 256". In the other patches posted, 
extended config space will be used in cases where it won't be with my 
patch. (Most obvious one is an "lspci -vx" from automated scripts). 
Is that a problem? We've had 2 years of mess, with one not-enough patch after another.
There still are problems TODAY (e.g. in 2.6.24-rc7). The patch that falls back
to an alternative method for below 256 is no doubt a step in the right direction. 
(although I'm not all that happy about mixing access types, it's not provably incorrect)
Is it enough? I'm not sure. Only time can tell I suppose, but the risk side is that
if it is not enough, users who don't need the extended config space for functionality
will suffer the bugs AGAIN.

So in short, my approach was NOT about "fix PCI", it is about "fix the user experience".
It's a stopgap for sure, until the underlying mechanism gets reliable. It's been 2 years.....
maybe this next step is "it", maybe it isn't.

--

From: Matthew Wilcox
Date: Sunday, January 13, 2008 - 1:43 pm

I believe you to be mistaken in this belief.  If you take Ivan's patch,
conf1 is used for all accesses below 256 bytes.  lspci -x only dumps
config space up to 64 bytes; lspci -xxxx is needed to show extended pci
config space.

--

From: Loic Prylli
Date: Sunday, January 13, 2008 - 2:18 pm

I agree with Arjan about that "not a 100% overlap". It is about the 
extra ext-conf-space access done while probing in drivers/pci/probe.c:
    dev->cfg_size = pci_cfg_space_size(dev);

(and lspci -v will also query/show the list of extended-caps for 
pci-x/pcie-x devices that have some, provided the kernel can access 
ext-conf-space).

With Ivan's patch, that line would still cause one extended-conf-space
access at offset 256 for pcie/pci-x2 devices (to check the ability to
query ext-space). Arjan's "opt-in" patch would prevent that extra access.
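A simplified model of the probe referred to here. It mirrors what the thread describes: PCIe devices, and PCI-X 2.0 devices (assumed here to be identified by the 266/533 MHz bits in the PCI-X Status register), get one dword read at offset 256, and all ones or a failed read is taken to mean 256 bytes. The struct is a hypothetical flattened stand-in for struct pci_dev, and ext_accesses only counts extended-space reads so the extra access is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE      256
#define PCI_CFG_SPACE_EXP_SIZE  4096
#define PCI_X_STATUS_266MHZ     0x40000000u
#define PCI_X_STATUS_533MHZ     0x80000000u

struct probe_dev {                  /* hypothetical, flattened device view */
    bool is_pcie;
    bool has_pcix;
    uint32_t pcix_status;           /* PCI-X Status register contents */
    uint32_t dword_at_256;          /* what a read of offset 256 returns */
    bool read_256_fails;            /* access method cannot reach it */
    unsigned ext_accesses;          /* counts reads at offset >= 256 */
};

static int cfg_space_size(struct probe_dev *d)
{
    if (!d->is_pcie) {
        if (!d->has_pcix)
            return PCI_CFG_SPACE_SIZE;          /* conventional PCI */
        if (!(d->pcix_status & (PCI_X_STATUS_266MHZ | PCI_X_STATUS_533MHZ)))
            return PCI_CFG_SPACE_SIZE;          /* PCI-X 1.0 */
    }
    d->ext_accesses++;                          /* the one probe at 256 */
    if (d->read_256_fails || d->dword_at_256 == 0xffffffffu)
        return PCI_CFG_SPACE_SIZE;
    return PCI_CFG_SPACE_EXP_SIZE;
}
```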

IMHO that access is OK and harmless in all cases, since we are already
protected by MCFG/e820 checks, but I agree one can hold a different
opinion based on trying to prevent "never-seen/potential" hardware/BIOS
bugs. FWIW, that is also where I suggested excluding PCI-X2 devices
(when restricted to pcie, that access while probing cannot even cause
the harmless master-abort/0xffffffff), but there is a small trade-off.


Loic

--

From: Loic Prylli
Date: Sunday, January 13, 2008 - 1:51 pm

I think I got your point the first time, and I agree it is sound. But in
my subjective and biased opinion, I just think ext-conf-space is
already useful and widespread enough (being used is not the same as
being strictly required for basic operation) that your proposed tradeoff
is not optimal (protecting against "future/non-proven" hardware bugs,
i.e. bringing non-proven benefits, at the expense of making life harder
for ext-conf-space users while adding extra API/code).


To take an example from the Linux tree: the drivers/pci/pcie/aer code
uses ext-conf-space for every pcie-root (currently several distributions
enable it by default). Does it mean opt-in would be automatically
activated for most pcie hierarchies (defeating most of the benefits of
being opt-in), or do we just disable that code by default?
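For context on why AER depends on extended space: PCI Express extended capabilities form a linked list starting at offset 0x100, with each 32-bit header holding the capability ID in bits 15:0, a version in bits 19:16 and the next offset in bits 31:20. A sketch of the walk over a 4 KB config image follows; the helper names are illustrative, and ID 0x0001 (PCI_EXT_CAP_ID_ERR) is the AER capability. Without extended access the walk never gets started.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_EXT_CAP_START   0x100
#define PCI_EXT_CAP_ID_ERR  0x0001      /* Advanced Error Reporting */

/* Config space is little-endian regardless of the host. */
static uint32_t get_le32(const uint8_t *cfg, unsigned off)
{
    return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
           (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

static void put_le32(uint8_t *cfg, unsigned off, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        cfg[off + i] = (uint8_t)(v >> (8 * i));
}

/* Offset of extended capability `id` in a 4 KB config image, or 0. */
static unsigned find_ext_cap(const uint8_t *cfg, unsigned id)
{
    unsigned pos = PCI_EXT_CAP_START;
    int ttl = (4096 - 256) / 8;         /* bound the walk on corrupt lists */

    while (pos >= PCI_EXT_CAP_START && ttl-- > 0) {
        uint32_t hdr = get_le32(cfg, pos);

        if (hdr == 0 || hdr == 0xffffffffu)
            return 0;                   /* empty list or unreadable space */
        if ((hdr & 0xffff) == id)       /* bits 15:0: capability ID */
            return pos;
        pos = (hdr >> 20) & 0xffc;      /* bits 31:20: next offset */
    }
    return 0;
}
```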


Will lspci -v automatically opt in all pcie devices (right now by default
it tries to list the extended-capabilities for pcie and pcix), or do we
now require manual explicit sysfs operations to get the whole thing? Is
it an additional flag to lspci (if so will that flag also apply to pcix,
possibly causing a crash for lspci -v



To go one step in your direction, I have already argued in a couple of
emails that I would prefer not to implement ext-conf-space access for
any PCI-X devices (removing PCI-X2 from pci_ext_cfg_size), because there
we are trying to support devices that we don't really know exist or
will ever exist. And protecting against "unproven bugs" makes more


FWIW, I have in my tree a patch almost identical to Ivan's, dated
"December 2005". Because of the constant activity on the mmconfig front
(which I thought would make it obsolete), I never made the effort to
suggest it until a month ago (I am not a regular user of
linux-kernel). I admit nobody else should view it that way, but for me,
rather than the last attempt at fixing mmconfig, it's a patch first used
two years ago that would have arguably ...
From: Øyvind Vågen Jægtnes
Date: Tuesday, January 15, 2008 - 5:58 am

I just thought this might be interesting to the discussion.

I recently bought another 2 GB memory for my computer.
My hardware is as following:

Asus Commando (Intel P965 chipset)
Intel Core2 Q6600
4x1 GB Geil PC6400 memory
nVidia 8800 gts (old g80 core, 640 mb mem)

Without booting with pci=nommconf I have severe stability issues, and
often, when it's not crashing, I get slowdowns with the error:

kern.log:Jan 15 13:19:40 bilbo kernel: [  132.046715] NVRM: Xid (0001:00): 6, PE0001
... repeated x times.

In addition, the nVidia framebuffer seems to "leak" or not update, since
I get loads of graphics artifacts.

The system works perfectly fine with 2 GB memory and without
pci=nommconf.
It works like a charm when using pci=nommconf and 4 GB memory.

In addition, I have to enable the Northbridge->PCI Memory remap feature
in the BIOS to avoid the kernel panicking when trying to access > 3 GB,
but that is understandable :)

My software is Kubuntu 7.10 with the stock x86_64 kernel, but I do use
the binary driver by nVidia.

It works like a charm when using pci=nommconf

If you guys need any more info about hardware/software from me, please
let me know.

-- 
Øyvind Vågen Jægtnes
+47 96 22 03 08

(i reject your diurnal rhythm and subsitute my own)
--

From: Arjan van de Ven
Date: Thursday, December 27, 2007 - 7:09 am

On Thu, 27 Dec 2007 06:52:35 -0500


but sadly your second statement is not correct. Part of the complication is that all PCI config ops
operate on buses, not devices; at first I thought "just add a bit and be done with it", but sadly it's
not quite the case. Due to the per-bus nature of the ops, you end up having 2 types of bus operations,
and that's just boilerplate (prototypes, exports and stuff), but it makes up most of the lines of the patch.

In addition, a separate raw_pci_ops (for x86 only!) is needed anyway, since it's quite likely that
we'll have various options for each case (extended or not) and we want to pick the best one for each case.



the easiest one is an option to lspci. Nothing more nothing less.

Making a global knob in kernel space is a lot more tricky, and in addition
there really are enough cases where userspace wants the one device anyway.
Doing the "for each device I'm about to dump" in lspci is pretty much as hard as doing

see the patch. All pci_enable_ext_config() does is set a flag.
The architecture decides what to do with that flag. Golden
architectures can just totally ignore the flag and always expose
the full space. 
(In fact, the patch assumes all-but-x86 to be golden here; which is fair)



--

From: Linus Torvalds
Date: Thursday, December 27, 2007 - 10:52 am

Or you force it on with "pci=mmconfig" or something at boot-time.

But yes. The *fact* is that MMCONFIG has not just been globally broken, 
but broken on a per-device basis. I don't know why (and quite frankly, I 
doubt anybody does), but the PCI device ID corruption happened only for a 
specific set of devices.

Whether it was a timing issue with particular devices or whether it was a 
timing issue with some particular bridge (and could affect any devices 
behind that bridge), who knows... It almost certainly was brought on by a 
borderline (or broken) northbridge, but it apparently only affected 
specific devices - which makes me suspect that it wasn't *entirely* due to 
just the northbridge, and it was a combination of things.

I don't understand why you cannot seem to accept that per-device thing, in 
the face of clear data that yes, it really *is* per-device. Not to mention 
the fact that the way MMIO config setups work, you may well have entire 
buses that simply aren't accessible with MMIO config at all (because the 
MMIO config window is not large enough).
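The window-size point in concrete terms: each bus needs 1 MB of MMCONFIG aperture (32 devices x 8 functions x 4 KB), so firmware that describes a smaller range leaves the remaining buses reachable only by another method. A minimal sketch, assuming apertures described by a base and a bus range (on x86 this comes from the ACPI MCFG table); struct mmcfg_region and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One MMCONFIG aperture as firmware describes it. */
struct mmcfg_region {
    uint64_t base;
    unsigned start_bus, end_bus;
};

/* Bytes of address space the aperture needs: 1 MB per bus. */
static uint64_t mmcfg_region_size(const struct mmcfg_region *r)
{
    return ((uint64_t)(r->end_bus - r->start_bus) + 1) << 20;
}

/* True if some aperture decodes `bus`. Buses outside every region must
 * use another access method, however well MMCONFIG works elsewhere. */
static bool bus_has_mmcfg(const struct mmcfg_region *r, size_t n, unsigned bus)
{
    for (size_t i = 0; i < n; i++)
        if (bus >= r[i].start_bus && bus <= r[i].end_bus)
            return true;
    return false;
}
```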

Furthermore, please accept the fact that of those 23 devices, exactly 
*none* will actually care. So yes, you'd have to enable it manually for 
those individual devices, but that's only if you want to do something 
totally pointless in the first place.

So stop this totally inane "it has to be global" crap. It doesn't have to 
be global at all, and we have hard data showing that it really SHOULD NOT 
be a global flag.

		Linus
--
