Re: [PATCH 0/5][RFC] Physical PCI slot objects

Previous thread: auto-reboot "broken" in 2.6.23.1 (from 2.6.22.12) by Linda Walsh on Monday, November 12, 2007 - 4:36 pm. (2 messages)

Next thread: [PATCH/RFC] kconfig hints/tips/tricks by Randy Dunlap on Monday, November 12, 2007 - 5:17 pm. (2 messages)
From: Alex Chiang
Date: Monday, November 12, 2007 - 5:08 pm

Hello,

[this patch series touches a few subsystems; hopefully I got all
the right maintainers]

Recently, Matthew Wilcox sent out the following mail about PCI
slots:

  http://marc.info/?l=linux-pci&m=119432330418980&w=2

The following patch series is a rough first cut at implementing
the ideas he outlined, namely, that PCI slots are physical
objects that we care about, independent of their hotplug
capabilities.

We introduce a struct pci_slot as a first-class citizen, and turn
hotplug_slot into a subsidiary structure. A brief glimpse at the
potential for cleanup is given, as we modify the existing hotplug
drivers and remove the multiple get_address() methods. Certainly
more cleanup can be done with this new structure.

Additionally, we introduce an ACPI PCI slot driver, which will
detect all physical PCI slots in the system (as described by
ACPI). De-coupling the notion of a physical slot from its hotplug
capability allows users to understand the physical geography of
their machines, whether their slots are hotpluggable or not.

This patch series builds heavily on Willy's prior work (with a
bit of the typical refresh needed when aiming at the moving
target of the kernel). The ACPI PCI slot driver is new.

I have tested this series on both ia64 (hp rx6600) and x86 (hp
dl380g). On ia64, system firmware provides _SUN methods for the
PCI slots and we get entries in /sys/bus/pci/slots/. On my x86
machine, firmware didn't seem to provide a _SUN, so we don't get
any entries in /sys/bus/pci/slots/, but nothing really bad
happens either. ;)

Comments/feedback are appreciated.

Thanks.

/ac

-

From: Alex Chiang
Date: Monday, November 12, 2007 - 5:12 pm

[Please note, I got Rick Jones' email screwed up in the 0/5
email; it's corrected above.]

From: Alex Chiang <achiang@hp.com>

Rename the slot to be the contents of the 'path' sysfs attribute, and
delete the attribute.  The mapping from pci address to slot name is
supposed to be done through the 'address' file, which will be provided
automatically later in this series of patches.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
---
 drivers/pci/hotplug/sgi_hotplug.c |   27 +--------------------------
 1 files changed, 1 insertions(+), 26 deletions(-)

diff --git a/drivers/pci/hotplug/sgi_hotplug.c b/drivers/pci/hotplug/sgi_hotplug.c
index ef07c36..e1e7c9d 100644
--- a/drivers/pci/hotplug/sgi_hotplug.c
+++ b/drivers/pci/hotplug/sgi_hotplug.c
@@ -91,21 +91,6 @@ static struct hotplug_slot_ops sn_hotplug_slot_ops = {
 
 static DEFINE_MUTEX(sn_hotplug_mutex);
 
-static ssize_t path_show (struct hotplug_slot *bss_hotplug_slot,
-	       		  char *buf)
-{
-	int retval = -ENOENT;
-	struct slot *slot = bss_hotplug_slot->private;
-
-	if (!slot)
-		return retval;
-
-	retval = sprintf (buf, "%s\n", slot->physical_path);
-	return retval;
-}
-
-static struct hotplug_slot_attribute sn_slot_path_attr = __ATTR_RO(path);
-
 static int sn_pci_slot_valid(struct pci_bus *pci_bus, int device)
 {
 	struct pcibus_info *pcibus_info;
@@ -173,18 +158,10 @@ static int sn_hp_slot_private_alloc(struct hotplug_slot *bss_hotplug_slot,
 		return -ENOMEM;
 	bss_hotplug_slot->private = slot;
 
-	bss_hotplug_slot->name = kmalloc(SN_SLOT_NAME_SIZE, GFP_KERNEL);
-	if (!bss_hotplug_slot->name) {
-		kfree(bss_hotplug_slot->private);
-		return -ENOMEM;
-	}
+	bss_hotplug_slot->name = slot->physical_path;
 
 	slot->device_num = device;
 	slot->pci_bus = pci_bus;
-	sprintf(bss_hotplug_slot->name, "%04x:%02x:%02x",
-		pci_domain_nr(pci_bus),
-		((u16)pcibus_info->pbi_buscommon.bs_persist_busnum),
-		device + 1);
 
 	sn_generate_path(pci_bus, ...
From: Alex Chiang
Date: Monday, November 12, 2007 - 5:13 pm

Register one slot per slot, rather than one slot per function.
Change the name of the slot to fake%d instead of the pci address.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
---
 drivers/pci/hotplug/fakephp.c |   75 +++++++++++++++--------------------------
 1 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/pci/hotplug/fakephp.c b/drivers/pci/hotplug/fakephp.c
index 027f686..828335e 100644
--- a/drivers/pci/hotplug/fakephp.c
+++ b/drivers/pci/hotplug/fakephp.c
@@ -93,6 +93,7 @@ static int add_slot(struct pci_dev *dev)
 	struct dummy_slot *dslot;
 	struct hotplug_slot *slot;
 	int retval = -ENOMEM;
+	static int count = 1;
 
 	slot = kzalloc(sizeof(struct hotplug_slot), GFP_KERNEL);
 	if (!slot)
@@ -106,7 +107,8 @@ static int add_slot(struct pci_dev *dev)
 	slot->info->max_bus_speed = PCI_SPEED_UNKNOWN;
 	slot->info->cur_bus_speed = PCI_SPEED_UNKNOWN;
 
-	slot->name = &dev->dev.bus_id[0];
+	slot->name = kmalloc(8, GFP_KERNEL);
+	sprintf(slot->name, "fake%d", count++);
 	dbg("slot->name = %s\n", slot->name);
 
 	dslot = kmalloc(sizeof(struct dummy_slot), GFP_KERNEL);
@@ -141,17 +143,17 @@ error:
 static int __init pci_scan_buses(void)
 {
 	struct pci_dev *dev = NULL;
-	int retval = 0;
+	int lastslot = 0;
 
 	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
-		retval = add_slot(dev);
-		if (retval) {
-			pci_dev_put(dev);
-			break;
-		}
+		if (PCI_FUNC(dev->devfn) > 0 &&
+				lastslot == PCI_SLOT(dev->devfn))
+			continue;
+		lastslot = PCI_SLOT(dev->devfn);
+		add_slot(dev);
 	}
 
-	return retval;
+	return 0;
 }
 
 static void remove_slot(struct dummy_slot *dslot)
@@ -275,23 +277,9 @@ static int enable_slot(struct hotplug_slot *hotplug_slot)
 	return -ENODEV;
 }
 
-/* find the hotplug_slot for the pci_dev */
-static struct hotplug_slot *get_slot_from_dev(struct pci_dev *dev)
-{
-	struct dummy_slot *dslot;
-
-	list_for_each_entry(dslot, &slot_list, node) ...
From: Linas Vepstas
Date: Tuesday, November 13, 2007 - 12:48 pm

Please use snprintf to avoid buffer overruns!

--linas
-

From: Matthew Wilcox
Date: Tuesday, November 13, 2007 - 12:52 pm

Or, since kmalloc can return a 32-byte object at smallest, just allocate
32 bytes and continue using sprintf.  fake%d would overflow after
999,999,999,999,999,999,999,999,999 pci devices, by which time we've
run the machine out of memory anyway.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Rolf Eike Beer
Date: Wednesday, November 14, 2007 - 5:39 am

This is ugly. Please do it the way we already do e.g. for acpiphp: add a=20
char[8] to "struct dummy_slot" and just reference that here. Or better do=20
what this name suggests: kill fakephp completely and use dummyphp[1] instea=
d.

Eike

1) http://opensource.sf-tec.de/kernel/
From: Alex Chiang
Date: Wednesday, November 14, 2007 - 7:17 am

Hi Eike,


I took at look at the code in acpiphp you're talking about, and
I'm not sure why you think the above is "ugly". We're talking
about a runtime vs compile time storage for the name, and doing a
kmalloc/sprintf is a pretty standard idiom.

BTW, I did incorporate both Linas' and Willy's comments, by
changing the kmalloc size to an explicit 32, and using snprintf
instead.

In any case, for your particular comment, I think I'm going to
leave it alone for now, and let Greg weigh in with the final
recommendation.

Thanks for the review.

/ac

-

From: Rolf Eike Beer
Date: Wednesday, November 14, 2007 - 7:49 am

Because we have another allocation of very small size for every slot here.

struct dummy_slot has a size of 4 pointers, that's 16 byte for 32 bit=20
architectures. Putting 8 byte of additional storage here would save a=20
complete allocation on 32 bit platforms. For 64 bit platforms the memory=20
usage is the same but we do less allocations (i.e. less points to fail) and=
=20
less memory fragmentation.

Btw: your code lacks a check if kmalloc() returns NULL.

Eike
From: Alex Chiang
Date: Wednesday, November 14, 2007 - 8:01 am

Hi Eike,


Good points. I'll make your suggested change for v2.

Thanks.

/ac

-

From: Alex Chiang
Date: Monday, November 12, 2007 - 5:14 pm

Introduce struct pci_slot

  - Make pci_slot the primary sysfs entity. hotplug_slot becomes a
    subsidiary structure.
  - Change the prototype of pci_hp_register() to take the bus and
    slot number (on parent bus) as parameters.
  - Remove all the ->get_address methods since this functionality is
    now handled by pci_slot directly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
---
 drivers/pci/Makefile                    |    2 +-
 drivers/pci/hotplug/acpiphp.h           |    1 -
 drivers/pci/hotplug/acpiphp_core.c      |   23 +---
 drivers/pci/hotplug/acpiphp_glue.c      |   16 --
 drivers/pci/hotplug/acpiphp_ibm.c       |    5 +-
 drivers/pci/hotplug/cpci_hotplug_core.c |    2 +-
 drivers/pci/hotplug/cpqphp_core.c       |    4 +-
 drivers/pci/hotplug/fakephp.c           |    2 +-
 drivers/pci/hotplug/ibmphp_ebda.c       |    3 +-
 drivers/pci/hotplug/pci_hotplug_core.c  |  237 +++++++++++--------------------
 drivers/pci/hotplug/pciehp_core.c       |   22 +---
 drivers/pci/hotplug/rpadlpar_sysfs.c    |    4 +-
 drivers/pci/hotplug/sgi_hotplug.c       |    7 +-
 drivers/pci/hotplug/shpchp_core.c       |   17 +--
 drivers/pci/pci.h                       |   13 ++
 drivers/pci/slot.c                      |  157 ++++++++++++++++++++
 include/linux/pci.h                     |   14 ++
 include/linux/pci_hotplug.h             |   12 +--
 18 files changed, 293 insertions(+), 248 deletions(-)
 create mode 100644 drivers/pci/slot.c

diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 5550556..12f0b2d 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the PCI bus specific drivers.
 #
 
-obj-y		+= access.o bus.o probe.o remove.o pci.o quirks.o \
+obj-y		+= access.o bus.o probe.o remove.o pci.o quirks.o slot.o \
 			pci-driver.o search.o pci-sysfs.o rom.o setup-res.o
 obj-$(CONFIG_PROC_FS) += proc.o
 
diff --git a/drivers/pci/hotplug/acpiphp.h ...
From: Linas Vepstas
Date: Tuesday, November 13, 2007 - 12:56 pm

How much migration do you expect?  Some of it? All of it? If the
answer is "all of it", wouldn't it just be easier to rename 
struct hotplug_slot, and go from there?

--linas
-

From: Matthew Wilcox
Date: Tuesday, November 13, 2007 - 1:03 pm

Only the bits that make sense for non-hotpluggable slots.

I tried your suggestion first.  It wasn't much fun.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Alex Chiang
Date: Monday, November 12, 2007 - 5:17 pm

Detect all physical PCI slots as described by ACPI, and create
entries in /sys/bus/pci/slots/.

Not all physical slots are hotpluggable, and the acpiphp module
does not detect them. Now we know the physical PCI geography of
our system, without caring about hotplug.

Signed-off-by: Alex Chiang <achiang@hp.com>
---

Looking to get comments on this patch, as it could use some
work.
 
I borrowed a lot of code from acpiphp_glue.c. If your work is
identifiable and feel that pci_slot.c should have your (c),
please let me know.

More importantly, in register_slot(), we are seeing a lot of
noisy output during boot, as acpi_get_pci_id() complains quite
often about not finding an ACPI-PCI binding on devices we pass
in. I'm not quite sure what is the best way to detect whether
we've already added a slot, so that we can avoid calling
acpi_get_pci_id() so often.

 drivers/acpi/Makefile   |    3 +-
 drivers/acpi/pci_slot.c |  168 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 170 insertions(+), 1 deletions(-)
 create mode 100644 drivers/acpi/pci_slot.c

diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 54e3ab0..f6caec8 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -47,7 +47,8 @@ obj-$(CONFIG_ACPI_FAN)		+= fan.o
 obj-$(CONFIG_ACPI_DOCK)		+= dock.o
 obj-$(CONFIG_ACPI_BAY)		+= bay.o
 obj-$(CONFIG_ACPI_VIDEO)	+= video.o
-obj-y				+= pci_root.o pci_link.o pci_irq.o pci_bind.o
+obj-y				+= pci_root.o pci_link.o pci_irq.o pci_bind.o \
+				   pci_slot.o
 obj-$(CONFIG_ACPI_POWER)	+= power.o
 obj-$(CONFIG_ACPI_PROCESSOR)	+= processor.o
 obj-$(CONFIG_ACPI_CONTAINER)	+= container.o
diff --git a/drivers/acpi/pci_slot.c b/drivers/acpi/pci_slot.c
new file mode 100644
index 0000000..a49a14e
--- /dev/null
+++ b/drivers/acpi/pci_slot.c
@@ -0,0 +1,168 @@
+/*
+ *  pci_slot.c - ACPI PCI Slot Driver
+ *
+ *  The code here is heavily leveraged from the acpiphp module.
+ *  Thanks to Matthew Wilcox <matthew@wil.cx> for much guidance.
+ ...
From: Alex Chiang
Date: Monday, November 12, 2007 - 5:18 pm

Change the semantics of pci_create_slot() such that it does not
require a hotplug release() method when creating a slot. Now, we
can use this interface to create a pci_slot for any physical PCI
slots, not just hotpluggable ones.

Add new pci_slot_add_hotplug() interface so that various PCI hotplug
drivers can add hotplug release() methods to pre-existing physical
slots.

Signed-off-by: Alex Chiang <achiang@hp.com>
---
 drivers/pci/hotplug/pci_hotplug_core.c |    2 +-
 drivers/pci/slot.c                     |   39 +++++++++++++++++++++++++++-----
 include/linux/pci.h                    |    4 ++-
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
index 96c21e2..91afa8e 100644
--- a/drivers/pci/hotplug/pci_hotplug_core.c
+++ b/drivers/pci/hotplug/pci_hotplug_core.c
@@ -568,7 +568,7 @@ int pci_hp_register(struct hotplug_slot *slot, struct pci_bus *bus, int slot_nr)
 		return -EINVAL;
 	}
 
-	pci_slot = pci_create_slot(bus, slot_nr, slot->name, hotplug_release);
+	pci_slot = pci_slot_add_hotplug(bus, slot_nr, hotplug_release);
 	if (IS_ERR(pci_slot))
 		return PTR_ERR(pci_slot);
 	slot->pci_slot = pci_slot;
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 5d830d4..3e91491 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -70,20 +70,48 @@ static int create_sysfs_files(struct pci_slot *slot)
 	return result;
 }
 
+struct pci_slot *pci_slot_add_hotplug(struct pci_bus *parent, int slot_nr,
+				      void (*release)(struct pci_slot *))
+{
+	struct pci_slot *slot;
+	int found = 0;
+
+	down_write(&pci_bus_sem);
+
+	/* This slot should have already been created, so look for it. If
+	 * we can't find it, return -EEXIST.
+	 */
+	for (slot = parent->slot; slot; slot = slot->next)
+		if (slot->number == slot_nr) {
+			found = 1;
+			break;
+		}
+
+	if (!found) {
+		slot = ERR_PTR(-EEXIST);
+		goto out;
+	}
+
+	slot->release = release;
+ ...
From: Greg KH
Date: Tuesday, November 13, 2007 - 10:01 am

I'm still not sold on this idea at all.  I'm really betting that there
is a lot of incorrect acpi slot information floating around in machines
and odd things will show up in these slot entries.

I say this because for a long time there was no "standard" acpi entries
for hotplug slots and different companies did different things.  Hence
the "odd" IBM acpi hotplug implementation as one example.  If this is
going to go anywhere, you need to get IBM to agree that it works
properly with all their machines...

Also, some companies already provide userspace tools to get all of this
information about the different slots in a system and what is where,
from userspace, no kernel changes are needed.  So, why add all this
extra complexity to the kernel if it is not needed?

thanks,

greg k-h
-

From: Matthew Wilcox
Date: Tuesday, November 13, 2007 - 11:33 am

Is that the end of the world?  Instead of having no information, we'll
end up with odd information.  If people complain, we can always
blacklist (indeed, won't the ACPI rolling blacklist catch the majority

Not in terms of slot names.  There were various things that ACPI didn't
specify, like attention and latches, but the description of _SUN hasn't

Do you have any examples of this?

We should, IMO, be improving the way we tell users which device has
a problem.  'tulip7', 'eth4', 'pci 0000:04:1e.3' ... all the user wants
to know is which damn card it is so they can replace it.  And slot name
tells them that.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Greg KH
Date: Tuesday, November 13, 2007 - 11:51 am

I don't think the rolling blacklist will catch this, as the rest of the
ACPI code is "correct".

And yes, incorrect information exported by the kernel is not good at

Ok, again, I want to see the IBM people sign off on this, after testing
on all of their machines, before I'll consider this, as I know the IBM
acpi tables are "odd".

Also, how about Dell machines?  I know they are probably not expecting
this information to show up and who knows if the numbering of their
slots match up with their physical diagrams (I say this based on all of

IBM sells a program that does this for server rooms.  It's probably part
of some Tivoli package somewhere, sorry I don't remember the name.  I
did see it working many years ago and it required no kernel changes at

Not if the slot information is incorrect :)

That's my main concern.

thanks,

greg k-h
-

From: Matthew Wilcox
Date: Tuesday, November 13, 2007 - 1:11 pm

That seems a little higher standard than patches are normally held to.
How about the patches get sent to the appropriate people at IBM (who are
they?) and if we haven't heard a NAK from them after a month, then they

I'm pretty sure that Windows uses the _SUN information, so I think this

Well, yes, it doesn't take kernel effort if you're willing to build
proprietary knowledge into a userspace program.  This seems similar
to the argument that filesystems don't need to be in the kernel since
they can be implemented in user space.  They work better in the kernel,
just like this does.

By the way, the end result of this should be a pci-hotplug system that has
much less code in the drivers, more uniformity between the presentation
to userspace, more functionality and is easier to understand.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Greg KH
Date: Tuesday, November 13, 2007 - 1:19 pm

When you are touching code that I know is going to have problems on a
whole range of different manufacturer's machines, do you expect me to
just ignore that and let you add a feature that is going to be

Again, based on the past history with such things as mis-labeled
ethernet ports, I have doubts.  I can hope, but I need proof based on
the major problems we have had in the past.

It is better to not show information to users at all, then to give them

Don't mix up things like filesystems and exposing pci slots.  That is
just foolish...

thanks,

greg k-h
-

From: Gary Hade
Date: Tuesday, November 13, 2007 - 4:08 pm

I be one of them. :)  I have been involved in many (but not all)
of IBM's x86 based (IBM System x) servers with hotplug capable

I am not fundamentally opposed to this new capability but
share the same concerns that Greg and others have expressed.
So far, I have only tried the changes on one single node
system (IBM x3850) but the below NAK-worthy result supports
the idea that the changes need to be well and widely tested.

Have you possibly considered a kernel option as a kinder and
gentler way of introducing the changes?

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc

====
IBM x3850
Slots 1-2: PCI-X under PCI root bridges
Slots 3-6: PCIe under transparent P2P bridges
Slot 1: PCI-X - populated
Slot 2: PCI-X - !populated
Slot 3: PCIe -  populated
Slot 4: PCIe -  !populated
Slot 5: PCIe -  !populated
Slot 6: PCIe -  populated

result is with 2.6.24-rc2 plus all 4 proposed patches
problem: acpiphp failed to register empty PCIe slots 4 and 5
====

# dmesg
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
acpiphp_glue: found PCI host-bus bridge with hot-pluggable slots
acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:02:01
acpiphp: Slot [1] registered
acpiphp_glue: found PCI host-bus bridge with hot-pluggable slots
acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:06:01
acpiphp: Slot [2] registered
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:0a:00.0
acpiphp_glue: found ACPI PCI Hotplug slot 3 at PCI 0000:0b:00
acpiphp: Slot [3] registered
acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:0f:00.0
acpiphp_glue: found ACPI PCI Hotplug slot 4 at PCI 0000:10:00
acpiphp: pci_hp_register failed with error -17
acpiphp_glue: acpiphp_register_hotplug_slot failed(err code = 0xffffffef)
acpiphp_glue: found ACPI PCI Hotplug slot 4 at PCI 0000:10:00
acpiphp: pci_hp_register failed with error -17
acpiphp_glue: acpiphp_register_hotplug_slot ...
From: Alex Chiang
Date: Tuesday, November 13, 2007 - 6:37 pm

Hi Gary,




Silly question, but I have to ask. :)

I sent out 5 patches -- is this simply a typo on your part, or

Ok, so acpiphp wasn't going to register those slots anyway, since
they are empty. It would have bailed out after not seeing _ADR or
_EJ0 on those slots.

The acpi-pci-slot driver created those slots anyway, which is one
of the points of the patch -- to create sysfs entries even for

[repeated 7x]

We saw this message 8x, once for each SxFy object under your p2p
bridge. I actually somewhat did expect to see this error message
(hence the RFC part of my patch ;)

I currently don't have a good way to determine if we've already
seen an empty slot under a p2p bridge, so we try to register
every SxFy object. Of course, a /sys/bus/pci/slots/4/ entry



Arguably, the right thing happened here. We got entries for empty
slots, and we know their addresses.

If anyone can clue me in on a better way to implement patch 4/5
in my series so that we're not seeing those multiple attempts to
register slots under p2p bridges, I'd love to hear your ideas.

Thanks again for testing.

/ac

-

From: Gary Hade
Date: Wednesday, November 14, 2007 - 5:40 pm

Thanks.  This will allow everyone to focus on the systems where
the changes are most beneficial and not waste a bunch of time



No, acpiphp should (and did before your changes) register all 
hotplug capable slots.  All 6 slots (2 PCI-X, 4 PCIe) in that
system are hotplug capable.  Emptyness shouldn't matter.  If
the empty slots are not registered it is not be possible to
successfully hotplug cards to them.

Without your changes acpiphp loads with the following output.
  acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
  acpiphp: Slot [1] registered
  acpiphp: Slot [2] registered
  acpiphp: Slot [3] registered
  acpiphp: Slot [4] registered
  acpiphp: Slot [5] registered
  acpiphp: Slot [6] registered

With your changes I confirmed that an attempted hotplug
to a boot-time vacant PCIe slot failed as expected.  The
driver saw the insertion event but didn't find anything
to enable:
acpiphp_glue: handle_hotplug_event_bridge: Bus check notify on \_SB_.VP05.CALG
acpiphp_glue: handle_hotplug_event_bridge: re-enumerating slots under \_SB_.VP05.CALG


No, the P2P parent bus is 0000:0f and the P2P child bus is
0000:10 so I believe the real address for slot 4 should be
0000:10:00.

kernel without your changes after loading acpiphp:
  # cat /sys/bus/pci/slots/4/address
  0000:10:00

kernel with your changes both before and after loading acpiphp:
  # cat /sys/bus/pci/slots/4/address

Of course, this kind of confusing noise would not be acceptable

No, the wrong thing happened here.  I expect the slot directories
for the empty slots to look the same as they did before your changes.
This is what the slot directories for empty slots look like without 
your changes.
  # find /sys/bus/pci/slots/[45]
  /sys/bus/pci/slots/4
  /sys/bus/pci/slots/4/power
  /sys/bus/pci/slots/4/attention
  /sys/bus/pci/slots/4/latch
  /sys/bus/pci/slots/4/adapter
  /sys/bus/pci/slots/4/address
  /sys/bus/pci/slots/5
  /sys/bus/pci/slots/5/power
  /sys/bus/pci/slots/5/attention
  ...
From: Alex Chiang
Date: Thursday, November 15, 2007 - 10:36 am

Hi Gary,


Of course, you are right. I am not sure what I was thinking when
I wrote that email yesterday, but I pretty much got it exactly
wrong. I think I confused myself between empty/populated vs
hp/non-hp. In any case, thanks for the patient explanation.

I went back and looked at a vanilla kernel on my machine that has
the following hardware configuration:

HP rx6600
Slot 1-2: PCI-X under PCI root bridges
Slot 3-6: PCIe under transparent P2P bridges
Slot 7-10: PCI-X under PCI root bridges
Slot 1: PCI-X,  !populated, !hp
Slot 2: PCI-X,  populated, !hp
Slot 3: PCIe,   populated, hp
Slot 4: PCIe,   populated, hp
Slot 5: PCIe,   !populated, hp
Slot 6: PCIe,   populated, hp
Slot 7: PCI-X,  !populated, hp
Slot 8: PCI-X,  !populated, hp
Slot 9: PCI-X,  !populated, hp
Slot 10: PCI-X, !populated, hp

On a kernel without my changes, we see:

[root@canola slots]# modprobe acpiphp
[root@canola slots]# ls
10  3  4  5  6  7  8  9
[root@canola slots]# for i in * ; do ls $i ; done
adapter  address  attention  latch  power
adapter  address  attention  latch  power
adapter  address  attention  latch  power
adapter  address  attention  latch  power
adapter  address  attention  latch  power
adapter  address  attention  latch  power
adapter  address  attention  latch  power
adapter  address  attention  latch  power
[root@canola slots]# for i in * ; do cat $i/address ; done
0000:01:02
0000:8b:00
0000:52:00
0000:c5:00
0000:10:00
0000:0a:01
0000:4a:01
0000:01:01

This confirms that you were right -- we see all the hp slots show
up, even though many of them are non-populated. Also, the correct
hp entries show up, just like you'd expect.

Ok, so all that said, after re-implementing my ACPI-PCI slot
driver, we get all the correct answers, but with the additional
appearance of slots 1 and 2 (which aren't hotpluggable):

[root@canola slots]# ls
1  10  2  3  4  5  6  7  8  9
[root@canola slots]# for i in * ; do ls $i ; done
address
adapter  address  attention  latch  ...
From: Gary Hade
Date: Thursday, November 15, 2007 - 4:38 pm

On Thu, Nov 15, 2007 at 10:36:13AM -0700, Alex Chiang wrote:

Yea, looks much better.  Nice to see that the addresses
for slots below the p2p bridges now include the child
instead of the parent bus number. :)


Will do.  It still looks like I will not get the box back
until early next week.  I will only be working Monday and
Tuesday due to holiday/vacation but that will hopefully not

I haven't tried it yet but I feel better seeing that you
are now testing on a system having a PCI topology that
appears similar to some of our systems.


Well, I didn't see any obvious warts but I'm pretty skilled
at missing problems when looking at that kind of info.

Thanks,
Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc

-

From: Alex Chiang
Date: Wednesday, November 14, 2007 - 7:42 am

Hi Gary,


Can you send me the lspci -vv and lspci -vt output from this
machine?

Thanks.

/ac

-

From: Gary Hade
Date: Wednesday, November 14, 2007 - 11:13 am

Alex, I added some slot number hints to the below `lspci -vt`
output.

I am still looking at your other message and will have some 
hopefully helpful comments soon.  I did apply all 5 patches,
4 was a typo.

Thanks,
Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc


Script started on Wed 14 Nov 2007 03:38:14 AM PST
0;root@elm3a9:~[root@elm3a9 ~]# lspci -vt
-+-[0000:19]---00.0-[0000:1a-1d]--+-00.0  Intel Corporation 82571EB Gigabit Ethernet Controller    <- slot 6
 |				  \-00.1  Intel Corporation 82571EB Gigabit Ethernet Controller
 +-[0000:14]---00.0-[0000:15-18]--                                                                 <- slot 5
 +-[0000:0f]---00.0-[0000:10-13]--                                                                 <- slot 4
 +-[0000:0a]---00.0-[0000:0b-0e]----00.0-[0000:0c]----04.0  Adaptec ASC-29320ALP U320              <- slot 3
 +-[0000:06]---00.0  IBM Calgary PCI-X Host Bridge                                                 <- slot 2
 +-[0000:02]-+-00.0  IBM Calgary PCI-X Host Bridge                                                 <- slot 1
 |	     \-01.0  S2io Inc. Xframe II 10Gbps Ethernet
 +-[0000:01]-+-00.0  IBM Calgary PCI-X Host Bridge
 |	     +-01.0  Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
 |	     +-01.1  Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
 |	     \-02.0  Adaptec AIC-9410W SAS (Razor ASIC non-RAID)
 \-[0000:00]-+-00.0  IBM Calgary PCI-X Host Bridge
	     +-01.0  ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
	     +-03.0  NEC Corporation USB
	     +-03.1  NEC Corporation USB
	     +-03.2  NEC Corporation USB 2.0
	     +-0f.0  Broadcom CSB6 South Bridge
	     +-0f.1  Broadcom CSB6 RAID/IDE Controller
	     \-0f.3  Broadcom GCLE-2 Host Bridge
0;root@elm3a9:~[root@elm3a9 ~]# lspci -vvv
00:00.0 Host bridge: IBM Calgary PCI-X Host Bridge (rev 04)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- ...
From: Alex Chiang
Date: Wednesday, November 14, 2007 - 11:36 am

Hi Gary,



Actually, I just reworked my patch this morning, and believe that
I have a much cleaner implementation now that should fix a lot of
the errors you saw.

Incoming!

Thanks.

/ac

-

From: Alex Chiang
Date: Tuesday, November 13, 2007 - 1:36 pm

Who would be a good contact at IBM to get some eyes / machine

Who would be a good contact at Dell for the same?

I don't have as much experience with oddball firmware from
various vendors as others on this list, but given the rather
stable definition of _SUN in the ACPI spec, I'd be surprised to
see vendors abusing that method. [I fully accept the possibility
that I'm just naive ;)]

More likely, a vendor will do what the HP Proliant folks did,
that being simply omitting a _SUN method altogether.

One more thought on that -- at *worst* my patch series will do no
worse than the status quo of what the acpiphp module is doing
today. That module walks through the namespace looking for _SUN
methods, and when it finds them, it creates an entry in exactly
the same spot (/sys/bus/pci/slots/N) that my patch series does.

What this series adds beyond acpiphp is adding entries for slots

Like I said in an earlier email, HP ia64 systems will require a
kernel change to get this information. Whether it comes via a
generic ACPI access layer like dev_acpi, or something like this
patch series, the kernel will still get touched.

Thanks.

/ac

-

From: Greg KH
Date: Tuesday, November 13, 2007 - 2:30 pm

The acpiphp module is not loaded on zillions of non-pci-hotplug

How are you going to ensure that the userspace programs that use those
sysfs files are not going to break into tiny pieces now that you have

And like I said, I'm pretty sure you don't need to touch the kernel
today as there are people doing this just fine from userspace without
any kernel changes needed :)

thanks,

greg k-h
-

From: Bjorn Helgaas
Date: Tuesday, November 13, 2007 - 3:01 pm

Do you know how this Tivoli package works or where it gets the

I think you are assuming userspace can just dump the raw ACPI tables
and extract this information from them.  But I don't think that's
possible because _SUN is a method that can contain arbitrary AML.
That AML has to be *executed*, and you can't do that safely in
userspace.

Bjorn
-

From: Greg KH
Date: Tuesday, November 13, 2007 - 3:16 pm

I was told it was just reading the ACPI tables from userspace.  As I saw
it running on a machine without a modified kernel, I had no reason to
doubt this, but it might be doing something else for all I know.

What I do know is that it somehow does work without any kernel changes
needed, and is in use today by very large data centers without any

Yes, I was assuming that based on the above running code, if this is
somehow impossible to do from userspace, I don't know what the code is
doing.

thanks,

greg k-h
-

From: Matt Domsch
Date: Tuesday, November 13, 2007 - 2:15 pm

The only reported _SUN problems on Dell systems were on the PE6800 and
PE6850 systems, which we've fixed with an updated BIOS several months
ago.  IIRC the values weren't always unique which kind of defeated the
purpose.

The eth0/eth1 "issues" are completely separate, with no relation to
ACPI.  That's purely a PCI tree depth-first discovery
vs. breadth-first discovery (where it indeed seems everyone except
Linux 2.6 uses and expects breadth-first).  We looked at extending
ACPI to export device naming preference, but instead extended SMBIOS
tables because we could get that change in far easier.

The other bit that looks at slot numbers is the old PCI IRQ Routing

Again, I've only seen this be incorrect on those two models of Dell
servers with outdated BIOS versions.

Thanks,
Matt

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
-

From: Alex Chiang
Date: Tuesday, November 13, 2007 - 2:31 pm

FWIW, the ACPI 2.0 spec did not require uniqueness for _SUN.
(although there is a strange table that refers to _SUN as the
slot-unique ID (table 6-1 in spec v2.0b), the actual definition
of _SUN does not mention uniqueness).

Uniqueness was first required in the 3.0 spec.

/ac

-

From: Greg KH
Date: Tuesday, November 13, 2007 - 2:36 pm

Does your code handle if these are not unique?

thanks,

greg k-h
-

From: Alex Chiang
Date: Tuesday, November 13, 2007 - 4:14 pm

Hi Greg,


I hate to punt on this, but I'm not doing anything that acpiphp
is not doing. If we get back non-unique values for _SUN, I'm
pretty sure both acpiphp and my pci_slot driver will both
continue along as far as they can, until they get an error back
from kobject_register() about trying to register an object with
an existing name.

I think I may have a machine that has non-unique values for _SUN.
If so, I'll try and hunt it down and do some testing.

Thanks for the suggestion.

/ac

-

From: Greg KH
Date: Tuesday, November 13, 2007 - 2:32 pm

I know they are different, I was using that as an example of trying to
match up physical descriptions on a machine with the internal kernel
representation.  It isn't always easy and lots of times things go wrong,
if for no other reason than no one is checking these things.

thanks,

greg k-h
-

From: Alex Chiang
Date: Tuesday, November 13, 2007 - 1:21 pm

On HP ia64 systems, that information is locked away in ACPI, and
there's no easy way to get at it. Alex Williamson tried providing
a generic dev_acpi driver, so that userspace could do whatever
they wanted to with the information:

  http://lkml.org/lkml/2004/8/3/106

And that effort kinda died. I'm of mixed feelings on that driver,
since it would be really nice to get unfettered access to the
ACPI namespace, but it's pretty dangerous, since any interesting
thing you might want to do is actually a method (aka, it calls
into firmware) and who knows what side effects there might be.

So from my point of view, if ia64 customers want to know about
the slots they have in their systems, we're gonna have to do
something kernel-intrusive anyhow.

Thanks.

/ac

-

From: Greg KH
Date: Tuesday, November 13, 2007 - 1:26 pm

Doesn't /sys/firmware/acpi give you raw access to the correct tables
already?

And isn't there some other tool that dumps the raw ACPI tables?  I
thought the acpi developers used it all the time when debugging things
with users.

thanks,

greg k-h
-

From: Rick Jones
Date: Tuesday, November 13, 2007 - 3:51 pm

I'm neither an acpi developer (well I don't think that I am :) nor an
end-user, but here are the two things for which I was going to use the
information being presented by Alex's patch:

1) a not-yet, but on track to be released tool to be used by end-users
to diagnose I/O bottlenecks - the information in
/sys/bus/pci/slot/<foo>/address would be used to associated interfaces
and/or pci busses etc with something the end user would grok - the
number next to the slot.

2) I was also going to get the folks doing installers to make use of the
"end-user" slot ID. Even without going to the extreme of the
aforementioned 192 slot system, an 8 slot system with a bunch of
dual-port NICs in it (for example) is going to present this huge list of
otherwise identical entries. Even if the installers show the MAC for a
NIC (or I guess a WWN for an HBA or whatnot) that still doesn't tell one
without prior knowledge of what MACs were installed in which slot, which
slot is associated with a given ethN. Having the end-user slot ID
visible is then going to be a great help to that poor admin who is doing
the install.

rick jones
-

From: Greg KH
Date: Tuesday, November 13, 2007 - 3:56 pm

Why not just use the code in the linux firmware kit that does this
already today from userspace (thanks to Kristen for pointing this out to
me on irc.)?

No kernel changes needed, and you can get started on your userspace
application today, will work on all distros, no problems at all :)

thanks,

greg k-h
-

From: Matthew Wilcox
Date: Tuesday, November 13, 2007 - 4:04 pm

So then we have something that works on ACPI-based machines.  Who will
add support for all the other kinds of firmware and non-firmware based
slots?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Greg KH
Date: Tuesday, November 13, 2007 - 4:07 pm

Well, it seems that the powerpc people do not want this at all, as they
just have "virtual" slots that they don't want people trying to root
around and find the real thing.

What other types of machines can export this kind of information?  How
will you discover non-firmware based slots?

thanks,

greg k-h
-

From: Scott Murray
Date: Tuesday, November 13, 2007 - 11:00 pm

Just to add my 2 cents, I've not seen any CompactPCI gear that had a
way to easily map a PCI peripheral slot to the corresponding physical
slot in the chassis.  In the products I've worked on that use my CPCI
hotplug drivers, we did any required mapping in userspace based on
knowledge of the chassis layout for the particular product.

Scott


-- 
==============================================================================
Scott Murray, scott@spiteful.org

     "Good, bad ... I'm the guy with the gun." - Ash, "Army of Darkness"
-

From: Kristen Carlson Accardi
Date: Tuesday, November 13, 2007 - 4:33 pm

On Tue, 13 Nov 2007 16:04:00 -0700

As far as being able to retrieve the slot number (which it seemed from
the HP manageablity application perspective is the goal here), that
information is available from userspace as well for at least standard PCI
and pcie based systems for occupied slots.  For standard pci, you have
to make something up anyway - for shpchp we just use an incremental 
number and combine it with the bus number to represent the slot.  For
pcie, you can get this info from the slot capabilities register.
-

From: Matthew Wilcox
Date: Tuesday, November 13, 2007 - 5:10 pm

Ummm ... that's not what the /spec/ says.  I've never worked on any shpc
machines, but the shpc driver reads the slot values from the SLOT_CONFIG
register, just like the spec says to.

I don't understand how you're supposed to get the slot number for
standard PCI for occupied slots.  Could you explain?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Kenji Kaneshige
Date: Wednesday, November 14, 2007 - 2:55 am

The slot number for shpc slot is like 'YYYY_XXXX'.

YYYY is the bus number, though I don't know the specific reason
why it was added.

XXXX is slot number decided according to the shpc specification,
as you said.

Thanks,
Kenji Kaneshige

-

From: Kristen Carlson Accardi
Date: Wednesday, November 14, 2007 - 11:38 am

On Wed, 14 Nov 2007 18:55:21 +0900

Kenji is correct, you should listen to him, not me :).

For standard PCI slots that are not hotplug, I would think you'd need
firware support to find the slot numbers -- but I'm not sure on this.

btw - the YYYY was added to prevent duplicate slot numbers on buggy 
machines btw.
-

From: Kristen Carlson Accardi
Date: Tuesday, November 13, 2007 - 3:59 pm

On Tue, 13 Nov 2007 12:26:32 -0800

There are - people should take a look at the Intel Linux Firmware Kit
for an example of how to parse ACPI tables in userspace - the code
is also GPL'd, so you are free to use it in another GPL application.

http://www.linuxfirmwarekit.org/download.php#source

In many DSDTs I've seen, _SUN is hardcoded anyway and can be found
by reading the DSDT from userspace.  This is what the firmwarekit
does to check for duplicate _SUN in one of it's tests.

Kristen

-

From: Bjorn Helgaas
Date: Wednesday, November 14, 2007 - 10:37 am

I see three relevant things in the firmware kit:

  1) ExecuteAmlMethod() in amlpoke/amlpoke.c.  This uses
     /proc/acpi/hotkey/event_config to cause the kernel to
     execute an AML method.  This looks similar to what dev_acpi
     does and is unsafe for the same reasons (the method may have
     side effects that interfere with kernel drivers).  The kernel
     support for this was removed in 2.6.21:

       http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5ee6ed...

  2) execute_aml_method() in acpitable.c.  Similar to above.

  3) parse_SUN_name() in SUN/SUN.c.  This uses acpidump, acpixtract,
     and iasl -d to disassemble the DSDT and SSDTs, then looks for
     things like "Name (_SUN, 0x0000012C)".  That works well if _SUN
     merely returns a constant, and many DSDTs do that.

     But _SUN can be implemented as a control method, and then we have
     a problem because we can't determine the _SUN value by inspecting
     the DSDT.  We have to evaluate the method, and that may require
     operation regions, semaphores, etc., so it can only be done in the
     kernel.

So I agree that the firmware kit has a clever hack that works on much
existing x86 firmware, and it sounds like Tivoli might even rely on
it.  But I don't feel good about it, and it could easily break when
some BIOS writer needs to make _SUN slightly more complicated.

Bjorn
-

From: Greg KH
Date: Wednesday, November 14, 2007 - 10:53 am

Do you know of such BIOSes out there that do this?  Will the above
scheme not work for the ia64 boxes that you know of that are out in the
world today?

thanks,

greg k-h
-

From: Alex Chiang
Date: Wednesday, November 14, 2007 - 12:53 pm

Hi Greg,


Matthew and Bjorn have already been hinting at this, but I'd like
to voice my concern as well.

Even if no one is implementing _SUN as a complicated AML control
method today, that is no guarantee that prevents future firmware
vendors from doing so. The spec allows it, so in my opinion,
pointing to existing implementations and claiming that it all
just works is somewhat unfair (and scary).

If/when the first vendor implements an AML version of _SUN, the
firmware kit will break, and Tivoli will have to react.

I was initially intrigued with you and Kristen pointed out this
tool, but after thinking about it some more, I am not convinced
it is a viable alternative, due to the unknown nature of future
firmware implementations.

Thanks.

/ac

-

From: Alex Chiang
Date: Wednesday, November 14, 2007 - 2:24 pm

Hi Greg,


Talked to some of our ia64 firmware folks, and they confirmed
that our future platforms will implement _SUN as a control
method.

Thanks.

/ac

-

From: Alex Chiang
Date: Wednesday, November 14, 2007 - 2:42 pm

One last mail on this subject -- Bjorn has pointed out to me that
the Dell pe6800 and rez1850 both implement _SUN as control
methods today.

Does Tivoli run on those machines? If so, how is it getting slot
information?

Thanks.

/ac

-

From: Alex Chiang
Date: Thursday, November 15, 2007 - 1:20 pm

Hi Greg, Matt,


I downloaded the firmware kit today and played with it. There is
a test called SUN.exe, which searches through the DSDT, looking
to verify that there are no duplicate _SUN values in the system.

The test breaks on my hp rx6600 (a currently shipping platform),
because _SUN is not a hard-coded value, but implemented as an AML
control method. From the various SSDT dumps:

Here's the _SUN for a slot:
	Method (_SUN, 0, Serialized)
	{
		Store (\SLOT.SUN, ^_UID)
		Local0
		Return (Local0)
	}

Here's the _SUN for a processor:
	Method (_SUN, 0, NotSerialized)
	{
		Store (\CPUL.SUN, ^CPID)
		Local0
		Return (Local0)
	}

So parse_SUN_name() from SUN.c is just plain broken. The test
"passes" because it doesn't find any duplicate values for _SUN,
but what's really going on is that it doesn't find *any* values
of _SUN, so of course there won't be duplicates. ;)

I looked at convincing this test to try and execute the method
using ExecuteAmlMethod() and/or execute_aml_method(), but of
course, that won't work on a modern kernel, as they depend on
/proc/acpi/hotkey/, which was removed (as Bjorn pointed out).

Matt, is there any chance you could see if the firmware kit works
or breaks on your PE6800? 

I downloaded the latest tarball:

  http://www.linuxfirmwarekit.org/download/firmwarekit-r3.tar.gz

And ran it in stand-alone mode. You do need to build the test
suite, but you don't need to run the entire thing -- just run the
SUN/SUN.exe that is generated. (If you get odd complaints about
not being able to find libstandalone.so, just stick it in
/usr/local/lib, modify /etc/ld.so.conf as necessary and run
ldconfig).

Please apply the following debug patch to see how many _SUN
objects the test is actually finding. On my machine, I don't find
any _SUN objects.

Thanks.

/ac

diff --git a/SUN/SUN.c b/SUN/SUN.c
index 90264a7..3b21b9c 100644
--- a/SUN/SUN.c
+++ b/SUN/SUN.c
@@ -260,6 +260,7 @@ static void parse_SUN_name(gpointer data, gpointer ...
From: Matthew Garrett
Date: Wednesday, November 14, 2007 - 10:44 am

Dumping raw ACPI tables isn't adequate - _SUN might be a complex ACPI 
method with multiple reads and writes to raw hardware, and we really 
don't want to do that in userspace. The only way to do this reliably is 
in the kernel.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
-

From: Greg KH
Date: Wednesday, November 14, 2007 - 10:51 am

But it really isn't, as the firmware kit has proven...

thanks,

greg k-h
-

From: Matthew Garrett
Date: Wednesday, November 14, 2007 - 11:03 am

The firmware toolkit will only work if the _SUN method merely returns a 
hardcoded value. The spec doesn't require that that be the case, and 
it's easy to imagine situations where it won't.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
-

From: Linas Vepstas
Date: Tuesday, November 13, 2007 - 1:24 pm

Second that motion.... I don't get it. What are the goals of this
patch, really?  Just to get a "slot geographical location" from the
kernel?  I'm balancing the intellectual appeal of having a kernel
struct for representing physical objects, against the headache of 
reading (debugging, modifying) code that has yet another struct 
doing yet another thing.  So far, the dread of future headaches 
is winning.

On pseries systems, I deal with something called the "partitionable
endpoint", which I think probably usually corresponds to physical 
slots, but I don't really know.  The hardware guys pitch changeups 
all the time.  Some of these are soldered onto the planar, 
so the are not physical slots. But they are, um, "hotpluggable"
in that you can dynamically add & remove them from the system
(even though you cannot physically unplug the ones that are 
soldered on -- you can only assign them to other hardware partitions
(think hardware VM or hardware Xen)).  So, naively, the physical 
slot concept doesn't really map to what I have to work with; 
it just adds one more appendix to it all, one more thing to 
get confused about.

To be clear: above remarks are for the PowerPC boxes. I have
no clue about how things work on the IBM Intel-based boxes.
And Greg's original "get IBM to agree" remark is about the
Intel-based boxes.

--linas
-

From: Alex Chiang
Date: Tuesday, November 13, 2007 - 1:59 pm

[Empty message]
From: Linas Vepstas
Date: Tuesday, November 13, 2007 - 2:41 pm

~ # find /sys/bus/pci/slots -print
/sys/bus/pci/slots
/sys/bus/pci/slots/control
/sys/bus/pci/slots/control/remove_slot
/sys/bus/pci/slots/control/add_slot
/sys/bus/pci/slots/0001:00:02.0
/sys/bus/pci/slots/0001:00:02.0/phy_location
/sys/bus/pci/slots/0001:00:02.0/max_bus_speed
/sys/bus/pci/slots/0001:00:02.0/adapter
/sys/bus/pci/slots/0001:00:02.0/attention
/sys/bus/pci/slots/0001:00:02.0/power
/sys/bus/pci/slots/0001:00:02.6
etc.

~ # cat /sys/bus/pci/slots/0000:00:02.2/phy_location
U787A.001.DNZ00Z5-P1-C2

phy_location corresponds to the printed labels on the box,
and is also shared with the various management consoles
and serivce processors and what not that get involved 
with stuff like that.

Although, please note: on these sysmes, 99% of the "useful"
info about hardware is in /proc/device-tree, and not in /sys
I doubt anyone looks at /sys/bus/pci/slots, I think they mostly
look at the open firmware device tree.

--linas

-

From: Matthew Wilcox
Date: Tuesday, November 13, 2007 - 2:58 pm

Ugh.  Almost two years ago, paulus promised me he was going to fix the
slot name for rpaphp.  Guess he didn't.

This is one of the hateful things about the current design -- hotplug
drivers do too much.  Instead of being just the interface between the
Linux PCI code and the hardware, they create sysfs directories, add files,
and generally have far too much freedom.

We have four different schemes currently for naming in slots/,
1. slot number.  Used by cpqphp, ibmphp, acpiphp, pciehp, shpc.
2. domain:bus:dev:fn.  Used by fakephp.
3a. domain:bus:dev.  Used by rpaphp and sgihp.
3b. Except that rpaphp uses phy_location to present the information that
should be in the name and sgihp uses path.

... I've forgotten what cpci uses.  And yenta doesn't use it.

How is anyone supposed to write sane managability tools in the presence

Right.  This should look like:

# cat /sys/bus/pci/slots/U787A.001.DNZ00Z5-P1-C2/address
0000:00:02

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Andi Kleen
Date: Tuesday, November 13, 2007 - 6:07 pm

It's not only complexity. Each new sysfs entry costs memory.
Memory is not free. There should be always a good reason for those.

-Andi
-

From: Matthew Wilcox
Date: Wednesday, November 14, 2007 - 7:17 am

It's not a lot of memory; it's one directory and a couple of files for
each PCI slot in the system.  Even huge systems have maybe 200 slots.
In order for this to take up as much as one page of ram on a typical PC
with six slots, this would have to consume 680 bytes per directory.  I
don't think sysfs is that inefficient (and if it is, maybe this feature
is not where the problem is, given the 'power' directory per device, the
19 files per scsi device, the huge numbers of symlinks, etc).

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Andi Kleen
Date: Wednesday, November 14, 2007 - 7:35 am

It becomes much more when someone does a find /sys. dentries are 
expensive. They eventually can get pruned again, but it's still
costly to do that.

-Andi
-

From: Matthew Wilcox
Date: Wednesday, November 14, 2007 - 8:00 am

Again, if this is a big concern for you, there are better places to look
at for savings.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Andi Kleen
Date: Wednesday, November 14, 2007 - 8:08 am

Whoever is proposing a feature has the burden to justify that
its usefulness is larger than the overhead/cost it adds.

Doesn't seem to be the case with this one so far.

And in general ignoring overhead in new features is a pretty sad
approach. Big bloat does come in small steps with each new feature.

-Andi
-

From: Matthew Wilcox
Date: Wednesday, November 14, 2007 - 8:12 am

Huh?  There are half a dozen people who think it does, and half a dozen
people who think it doesn't.  Fine, you're on the side which sees no use

A very generic argument which could be used to shoot down any new feature.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Alex Chiang
Date: Wednesday, November 14, 2007 - 8:20 am

I'm a bit confused where you're going with this. I thought the
main point of contention wasn't around the utility of the
patchset, it was around getting the actual information correct.

We've already mentioned several uses that this patchset adds:

  1. allowing managability folks to determine which physical slot
  their failing card might be sitting in (independent of hotplug
  capability).

  2. giving installers a method of presenting the physical slot
  to the user so that the user can choose to, say, do a network
  install off of the card in slot N, vs having to guess based on
  MAC address or something else unfriendly

  3. performance monitoring tools based on slot addresses/numbers

I believe that userspace cares more about those sorts of things

I'm kinda new to the Linux kernel, so I don't really get what
you're saying. Are you saying that the approved method of
exposing kernel information to the user via sysfs is actually
bloat? Even if that information has utility?

Feel free to educate me; I'm here to learn.

Thanks.

/ac

-

From: Kenji Kaneshige
Date: Wednesday, November 14, 2007 - 4:43 am

Hi,

I tried the series of patches and I encountered some problems.
Since I don't have enough time to investigate them, I can only
report them at present. My environment was ia64 machine with 64
hotplug slots (shpc & pcie). Those slots can be handled by
acpiphp too, though I have not tried it yet.

- Problem(1)
  Two kind of Call Traces were displayed so many times at the
  boot time. Those are attached below. I guess it relate to
  the ACPI Namespace definition.

- Problem(2)
  Many of the shpc slot initializations failed with the
  following error messages. I guess the cause is Problem(1).

  shpchp: pci_hp_register failed with error -17
  shpchp: shpchp: slot initialization failed

- Problem(3)
  All of the pcie slot initializations failed with the
  following error messages. I guess the cause is Problem(1).

  pciehp: pci_hp_register failed with error -17
  pciehp: pciehp: slot initialization failed

BTW, I have some comments about the patches.

- I think your patches assume that all the hotplug slots have
  ACPI _SUN method regardless of the hotplug controller type,
  because pci_slot.c is the only place to call pci_slot_create().
  But I think there are hotplug slots without _SUN. If the
  hotplug slot doesn't have _SUN, slot initialization by the
  hotplug controller driver doesn't work (does it?), because
  there are no slot directory.

- I think pci_slot.o should be kernel module so that users can
  select to enable or disable this functionality.

Here is the Call Trace mentioned above:

Unable to register kobject 1024<3>pci_slot: pci_create_slot returned -22
ACPI: Invalid ACPI Bus context for device <NULL>
sysfs: duplicate filename '1024' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()

Call Trace:
 [<a000000100014b60>] show_stack+0x40/0xa0
                                sp=e000014101befaf0 bsp=e000014101be0f60
 [<a000000100014bf0>] dump_stack+0x30/0x60
                                sp=e000014101befcc0 bsp=e000014101be0f48
 ...
Previous thread: auto-reboot "broken" in 2.6.23.1 (from 2.6.22.12) by Linda Walsh on Monday, November 12, 2007 - 4:36 pm. (2 messages)

Next thread: [PATCH/RFC] kconfig hints/tips/tricks by Randy Dunlap on Monday, November 12, 2007 - 5:17 pm. (2 messages)