Re: [patch 2/2] PNP: don't check disabled PCI BARs for conflicts in quirk_system_pci_resources()

Previous thread: sysfs per namespace by Daniel Lezcano on Monday, September 29, 2008 - 8:46 am. (1 message)

Next thread: [RFC PATCH 0/4] Implementation of IR support using the input subsystem by Jon Smirl on Monday, September 29, 2008 - 9:17 am. (7 messages)
From: Bjorn Helgaas
Date: Monday, September 29, 2008 - 8:53 am

These two patches address this 2.6.26 regression, so they should go
in before 2.6.27:
    http://bugzilla.kernel.org/show_bug.cgi?id=11550

This PNP/PCI resource conflict area has been a trouble spot in the
past:
    http://thread.gmane.org/gmane.linux.acpi.devel/27312
    https://bugzilla.redhat.com/show_bug.cgi?id=280641
    https://bugzilla.redhat.com/show_bug.cgi?id=313491

It would be great if Matthew, Avuton, Karl, and Willem could test
these patches to make sure they don't re-introduce the old problem.

Bjorn
--

From: Bjorn Helgaas
Date: Monday, September 29, 2008 - 8:56 am

Add pci_resource_enabled() to determine whether a PCI BAR is
enabled.  Sometimes firmware leaves PCI BARs unconfigured, and
this interface is a way for the OS to identify that situation.
    
This is based on section 3.5 of the PCI Firmware spec, which says:
    
  Since not all devices may be configured prior to the operating
  system handoff, the operating system needs to know whether a
  specific BAR register has been configured by firmware. The operating
  system makes the determination by checking the I/O Enable, and
  Memory Enable bits in the device's command register, and Expansion
  ROM BAR enable bits. If the enable bit is set, then the corresponding
  resource register has been configured.
    
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 1a5fc83..c4c90f8 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -26,6 +26,32 @@
 #include "pci.h"
 
 
+int pci_resource_enabled(struct pci_dev *dev, int bar)
+{
+	u16 command = 0;
+	u32 addr = 0;
+
+	/*
+	 * Based on section 3.5, "Device State at Firmware/Operating System
+	 * Handoff," in the PCI Firmware spec.
+	 */
+	pci_read_config_word(dev, PCI_COMMAND, &command);
+
+	if (pci_resource_flags(dev, bar) & IORESOURCE_IO)
+		return command & PCI_COMMAND_IO;
+
+	if (command & PCI_COMMAND_MEMORY) {
+		if (bar == PCI_ROM_RESOURCE) {
+			pci_read_config_dword(dev, dev->rom_base_reg, &addr);
+			return addr & PCI_ROM_ADDRESS_ENABLE;
+		}
+
+		return 1;
+	}
+
+	return 0;
+}
+
 void pci_update_resource(struct pci_dev *dev, struct resource *res, int resno)
 {
 	struct pci_bus_region region;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c0e1400..28ec520 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -796,6 +796,8 @@ static inline int pci_proc_domain(struct pci_bus *bus)
 }
 #endif /* CONFIG_PCI_DOMAINS */
 
+extern int pci_resource_enabled(struct pci_dev *dev, int bar);
+
 #else /* ...
From: Bjorn Helgaas
Date: Monday, September 29, 2008 - 8:57 am

quirk_system_pci_resources() checks PNP motherboard resource for
conflicts with PCI device BARs.  When doing this, we should ignore
disabled PCI BARs, because they often contain zero and look like
they would conflict with legacy devices at low addresses.
    
This patch addresses this regression from 2.6.26:
    http://bugzilla.kernel.org/show_bug.cgi?id=11550
    
Thanks to Frans Pop for reporting this issue and testing the fixes.
    
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Tested-by: Frans Pop <elendil@planet.nl>

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index 0bdf9b8..ef5ed99 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -247,6 +247,9 @@ static void quirk_system_pci_resources(struct pnp_dev *dev)
 		for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
 			unsigned int type;
 
+			if (!pci_resource_enabled(pdev, i))
+				continue;
+
 			type = pci_resource_flags(pdev, i) &
 					(IORESOURCE_IO | IORESOURCE_MEM);
 			if (!type || pci_resource_len(pdev, i) == 0)
--

From: Linus Torvalds
Date: Monday, September 29, 2008 - 9:34 am

I really don't think this is the right approach.

Maybe the PCI device has been turned off, but the *resource* may still be 
valid.

Wouldn't it be much better to just check whether the resource is inserted 
in the resource tree or not? 

Quite frankly, it looks like your change will basically cause us to look 
over *every* system PnP resource, and for each of them, it will look at 
*every* PCI device, and for each PCI device it will look at *every* BAR, 
and for each BAR it finds it will read the PCI status register, over and 
over and over again.

Now, I doubt you'll be able to wear out the PCI bus, but doesn't this just 
make you go "umm, that's not pretty, and it doesn't make much sense".

If we've detected the PCI resource as being valid by the PCI layer, why 
not just use that information? And afaik, the easy way to check that is 
just whether it's inserted into the resource tree, which in turn is most 
trivially done by just checking whether the resource has a parent.

IOW, why isn't it just doing

	struct resource *res = dev->resource[bar];

	if (!res->parent)
		continue;

or something? Or what was wrong with just checking the res->start for 
being zero? Wherever PnP is relevant, resources that start at zero are 
disabled, no? 

			Linus


--

From: Rene Herman
Date: Monday, September 29, 2008 - 11:31 am

I believe the possible issue is that resources that do _not_ (seem to) 
start at zero might also be disabled.

Bjorn commented that pci_resource_start() returns a CPU address for I/O 
which might not be the actual I/O address on some platforms. I haven't a 
clue if that's actually possible "wherever PnP is relevent" as you put 
it but that seems to otherwise make sense.

If it does though, it might for all I know also be possible to check 
against some ARCH_SPECIFIC_INVALID_IO_ADDRESS instead of plain unadorned 
0 (or just recheck the actual BAR again if not stored anywhere).

But that's the issue as I understood it: we might miss them on some 
platforms if checking against 0...

Rene.
--

From: Linus Torvalds
Date: Monday, September 29, 2008 - 12:13 pm

But that is irrelevant.

If we have registered them in the resource tree, then PnP must ignore 
them.

The fact is, this is not about being enabled or disabled. This is about 
the PnP tree containing resources that we already parsed from the PCI 
stuff, and once we've seen them as PCI resources, there's not really 
anything valuable in the PnP information.

			Linus
--

From: Rene Herman
Date: Tuesday, September 30, 2008 - 2:19 am

Well, if you say so...

Just did the attached which might match that intention. Please do not 
consider this a submission as I've no idea if this is sensible nor if it 
actually helps Frans. Just for discussion. Anything here should arrive 
through Bjorn.

Rene.
From: Linus Torvalds
Date: Tuesday, September 30, 2008 - 7:48 am

I think you got the logic the wrong way around:

+                       /* have we been registered already? */
+                       if (pci_res->parent)
+                               continue;

This ignores resources that are _registered_, but it should be the other 
way around - we should ignore resources that aren't (because they may be 
invalid or they may move around)

+                       pci_start = pci_res->start;
+                       pci_end = pci_res->end;
+
+                       if (pci_end < pci_start || !pci_end)
+                               continue;

Hmm. This should be unnecessary with any registered resource, since the 
resource code wouldn't allow registering such a resource in the first 
place.

Of course, it may be that the PnP code runs too early, and we have only 
parsed the PCI resources, not inserted them into the resource tree yet. If 
so, none of this will work, of course.

		Linus
--

From: Rene Herman
Date: Tuesday, September 30, 2008 - 8:57 am

It doesn't. With the test negated it triggers for all PCI resources (and 
ofcourse my soundcard driver fails again).

If the simple res->start == 0/SOME_ARCH_DEFINE definitely won't do and 
the issue with Bjorn's approach is the bus hammering the solution might 
be to set a per resource flag in pci_dev at parse time?

(that's PCI though and I can't really be here at the moment so over to 
someone else).

Rene.
--

From: Linus Torvalds
Date: Tuesday, September 30, 2008 - 9:29 am

Oh, ok. Looking at it, it does seem that we actually _insert_ the PCI 
resources too late. We do it in pcibios_allocate_resources(), and there we 
even take care to look at whether it was enabled or disabled (we 
prioritize enabled resources, so that a disabled one will never be 
requested before an enabled one and if they clash, it's always the 
disabled one that loses the resource).

But pcibios_allocate_resources() is called from pcibios_resource_survey(), 
which is called from pcibios_init(), which in turn is caled from 
pci_subsys_init() that is a "subsys_initcall()".

In contrast, the PnP fixup thing is called from pnp_fixup_device, called 
from __pnp_add_device(), called from pnp_add_device() (and 
pnp_add_card(), but that should be later), and those in turn from
pnpacpi_add_device and pnpacpi_init(). 

And pnpacpi_init is _also_ a subsys_initcall, but arch/x86/pci/built-in.o 
gets linked in _after_ drivers/built-in.o. That, in turn, is because it's 
marked as a "driver" in the x86 Makefile, and the main Makefile actually 
ends up forcing "drivers-y" to have drivers/ first.

Just for fun, does this patch make a difference and allow you to just 
take the "is it registered" thing into account?

It's a scary change right now, and I wouldn't commit it as is (I think 
that for 2.6.27 the thing to do is to just do the minimal "zero means 
disabled" thing), but having some random driver level initialize before 
the core architecture-specific PCI code does smell. So something like this 
sounds conceptually right anyway.

		Linus

---
 arch/x86/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index f5631da..97d0e86 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -159,7 +159,7 @@ core-$(CONFIG_IA32_EMULATION) += arch/x86/ia32/
 
 # drivers-y are linked after core-y
 drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
-drivers-$(CONFIG_PCI)            += ...
From: Linus Torvalds
Date: Tuesday, September 30, 2008 - 10:10 am

A clearer (and more flexible) solution might be this patch. It's more 
explicit about the fact that it simply makes it clear that any drivers 
that are added by the architecture Makefile will be _first_ in the list of 
drivers.

I suspect we should do the same with 'libs' and 'core' too, for that 
matter, but we don't really use '*.a' libraries any more so link order 
doesn't matter apart from initcall ordering. And libraries should hopfully 
never have that issue. And 'core' is at least right now just the 
initramfs thing. So in practice, it's probably only drivers/ that coul 
have issues like this.

Sam added to Cc, since this is a build issue. What do people think? 

Background: We are very careful to add 'drivers/pci' _before_ 
'drivers/{acpi,pnp}' in the drivers/Makefile, but what happens now is that 
since the arch Makefile adds it's own drivers to the very end, the really 
core PCI initcalls actually get ordered at the end, not the beginning. 

		Linus

---
 Makefile |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 1d03c16..642b9b9 100644
--- a/Makefile
+++ b/Makefile
@@ -458,7 +458,6 @@ scripts: scripts_basic include/config/auto.conf
 
 # Objects we will link into vmlinux / subdirs we need to visit
 init-y		:= init/
-drivers-y	:= drivers/ sound/ firmware/
 net-y		:= net/
 libs-y		:= lib/
 core-y		:= usr/
@@ -517,6 +516,9 @@ endif
 
 include $(srctree)/arch/$(SRCARCH)/Makefile
 
+# Do this _after_ the architecture may have added its own core drivers
+drivers-y	+= drivers/ sound/ firmware/
+
 ifneq (CONFIG_FRAME_WARN,0)
 KBUILD_CFLAGS += $(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN})
 endif
--

From: Linus Torvalds
Date: Tuesday, September 30, 2008 - 10:21 am

It looks like both of these will fall afoul of the fact that ACPI 
currently seems to _expect_ to be called in the old order, ie I get some 
oops in acpi_irq_penalty_init() apparently because it knew that it got 
called later (thanks to being called from pci_acpi_init in the 
arch-specific directory), and knew that the other ACPI subsys setup 
routines had been done before.

Dang.

I guess we'll have to bite the bullet some day and actually create some 
explicit topological ordering of initcalls rather than depend on the 
initcall levels and link order. That is one particular complexity I've 
tried to avoid. But the subtlety of the current ordering is certainly not 
at all good either.

		Linus
--

From: Rene Herman
Date: Tuesday, September 30, 2008 - 12:29 pm

Yes, I also get that oops but other than that, both link order versions 
you sent out work -- ie, booting with acpi=noirq gets me to a functional 
system with the quirk having run for PNP0c02 (acpi=off disables all of 
PNP0c02) and doing its job.

For some reason only some of your messages seem to be making it into my 
inbox (in order, at least) but either of these that is:

http://lkml.org/lkml/2008/9/30/242
http://lkml.org/lkml/2008/9/30/261

With the attached on top, all's working fine for me.

Rene.

From: Rene Herman
Date: Tuesday, September 30, 2008 - 12:37 pm

It does, but that said... placing the attached small debug printk's on 

which does still feel rather clunky (especially the missing "middle" 
resources 7, 8 and 9 for 01.0, my AGP bridge, look a little weird).

(0a.0 is ofcourse my soundcard that is the issue)

The resources array for pci_dev is static -- a pci_dev bitmask of 
enabled resources does sound somewhat nice-ish still perhaps.

Rene.
From: Linus Torvalds
Date: Tuesday, September 30, 2008 - 12:44 pm

Ok. But that means that the last patch I sent out - the one that _only_ 
changes the order for PnP itself, and moves pnpacpi_init and pnpbios_init 
to be fs_initcalls - should also work, and have none of he other 

Don't know why you can't see the messages, but in case this one comes
through but not the other ones, look at the third patch in

	http://lkml.org/lkml/2008/9/30/283

instead.

		Linus
--

From: Rene Herman
Date: Tuesday, September 30, 2008 - 1:48 pm

Yes.

I am fine on current mainline and with this seem to still be fine, with 
or without the quirk changes (*) applied.

(*) http://marc.info/?l=linux-kernel&m=122280330516865&w=2

Frans Pop will need something like those quirk changes on top to have 
his machine stop yelling at him -- assuming it actually works for him 
that is (which it should I guess, but it's not been tested by him yet).

The pci_start == 0 version, attached for convenience and also still 
available from the bugzilla:

http://bugzilla.kernel.org/show_bug.cgi?id=11550

is still the minimal version for Frans' issue.

(I see there are multiple copies of messages that I sent in that marc 
archive. Seem to again be experiencing severe email trouble since I'm 
also not getting back most messages that I see there. Anyways, if you 
get multiple copies, sorry, can't help it it seems, and I need to be 
away after this).

Rene.






From: Ingo Molnar
Date: Tuesday, September 30, 2008 - 12:38 pm

incidentally, i've been talking to Arjan about this recently in context 
of the CONFIG_FASTBOOT feature. Because, as a side-effect, in the long 
run, once the dependencies between initcalls fan out in a more natural 
way, with explicit initcall ordering we'll also be able to boot a bit 
faster and a bit more parallel.

 [ but performance is far less important than robustness, so this idea 
   was on the backburner. ]

and i think on the conceptual level initcall levels and implicit 
ordering are bad in the same way SPL and IPL based locking is worse than 
real, explicit spinlocks.

i think the topological ordering should not be just an extension of the 
current hardcoded initcall levels, but it should be symbol space based: 
i.e. an initcall should depend not on some kind of artificial enum, but 
it should depend on _another initcall_. (a list of initcalls more 
generally)

so instead of the current hardcoded levels:

  core_initcall(sysctl_init);

we could have natural constructs like:

  initcall_depends_on(sysctl_init, securityfs_init);
  initcall_depends_on(sock_init, sysctl_init)

where we create explicit dependencies between actual initcalls just by 
listing their dependencies. In many cases we could express dependencies 
in a natural way:

  initcall_depends_on(some_subsys_init, kmem_cache_init);
  initcall_depends_on(some_subsys_init, sched_init);

which would express the fact that some_subsys_init() must execute after 
kmem_cache_init() and after sched_init().

Each initcall is associated with an 'initcall descriptor', which shows 
which other initcalls this initcall depend on, and whether the initcall 
has been executed already.

during bootup the initcall engine would parse the graph and would 
execute all the 'leaf' initcalls, and would complete the graph 
gradually.

( More details: we'd have a number of compatibility and convenience 
  symbols as well - well-known initialization stages for various 
  customary phases of bootup.

  And at link ...
From: Linus Torvalds
Date: Tuesday, September 30, 2008 - 12:51 pm

Hell no.

We do not want any implicit parallelism in the initcalls. That way lies 
madness.

The probe functions that explicitly know that they are slow (like USB 
detection and/or other individual drivers that have timeouts) should put 
themselves in the background. We should _not_ use the dependency chain to 
do so automatically, because for most cases drivers are totally 
independent, but we still want a _reliable_  and _repeatable_ ordering.

Which means that I will not accept stuff that makes for a parallel bootup 
as a general initcall notion. I want things like network devices to show 
up in the same order for the same kernel, thank you very much - even if 
there is absolutely _zero_ ordering constraints between two independent 
network drivers.

Anything else will inevitably cause just totally random and undebuggable 

Yes, it should be explicit.

However, I don't agree with the notion of having initcalls point to other 
initcalls. One big _idea_ of initcalls is that you can put them anywhere 
in the source code, and that CONFIG_XYZ variables will automatically run 
them or not depending on whether the code was compiled in. Having 

would be a TOTAL DISASTER, because if you do that, then you are 
essentially back to the insane situation where people need to know what 
other parts are enabled.

So no. No "one call depends on another" crap.

But I think we could add a separate notion of a dependancy point, and have 
a setup where we describe "initcall X needs to happen before point A" and 
"initcall Z needs to happen after point A".

And then we can create a separate set of these dependency points, so that 
X and Y don't have to know about each other, they just have to have some 
knowledge about some common synchronization point - one that exists 
regardless of whether X or Y are even compiled in!

			Linus
--

From: Arjan van de Ven
Date: Tuesday, September 30, 2008 - 12:54 pm

On Tue, 30 Sep 2008 12:51:07 -0700 (PDT)

just to avoid any confusion; the current -fastboot tree does not do
this parallel stuff. At all.
(so please don't judge it as doing that)


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Ingo Molnar
Date: Tuesday, September 30, 2008 - 1:01 pm

well, as i mentioned it was and is on the backburner, because we went 
over the same list of problems that you mentioned: harder to read and 
interpret and debug, harder to reproduce boot ordering, etc.

but i'd still like the address the above specific point: it would be 
silly to propagate Kconfig dependencies into the initcall dependencies, 
why do you assume we'd do that?

When PROCFS or PNP is turned off, then their initcall symbols should 
naturally alias to some NOP definition, a function that is immediately 
marked as 'done'. We _already_ have NOP stubs for many initializer 
symbols.


One convenience symbol would be "memory_done()": to indicate that 
kmalloc() and all the other memory allocators are up and running and 
usable.

	Ingo
--

From: Grant Grundler
Date: Tuesday, September 30, 2008 - 11:13 pm

On Tue, Sep 30, 2008 at 12:51:07PM -0700, Linus Torvalds wrote:

We already do this today. :)
Definitions are in include/linux/init.h.
Point A would be "early" ("run before initialing SMP")
The rest could use better definitions and AFAICT aren't that much better
than being named "Point B".

hth,
grant
--

From: Ingo Molnar
Date: Wednesday, October 1, 2008 - 1:26 am

the structural problem with the current level-based initcall design is 
that the current dependencies are implicit (not spelled out clearly 
anywhere in the source code), and bugs in them are often fixed by 
experimenting around (seeing whether it breaks), not by design.

Changing it is a ton of work, and the risks of touching that code might 
eclipse any (often marginal) advantages a new scheme has. Today boot 
code runs only once and it is one of the most under-tested pieces of 
kernel code. Boot code's quality and robustness is at least 1 order of 
magnitude worse than regular kernel code.

But to play the devil's advocate: users have so many problems with weird 
races in boot code today _already_, wouldnt it be better to expose boot 
code to more variations, to put it under environmental pressure that 
ultimately improves its quality?

Init code is often reused during suspend/resume, so by introducing more 
flexibility into initcalls maybe we create enough pressure to fix them 
when it's far easier to fix them. (after bootup - fixing after-resume 
bugs is much harder because often the console is turned off and no 
significant BIOS code ran.)

Today moving an initcall to another level has unknown effects and 
nothing warns about broken dependencies but a bootup crash (often only 
triggering under a specific .config), or a non-working device or some 
other regression.

That is rather fragile and i doubt anyone can argue the opposite.

The question of whether explicit dependencies are better is another 
question and up to debate:

in the long run it is _IMHO_ more robust to express explicit 
dependencies close to the init functions, in the source code:

  initcall_depends_on(this_driver, memory_init);
  initcall_depends_on(this_driver, io_resources_init);

than to rely on the implicit (and undocumented and often forgotten) 
dependencies we have currently.

For example ordering an initcall to after PNP init would be trivial, 
we'd add this to the init dependency ...
From: Grant Grundler
Date: Sunday, October 5, 2008 - 10:34 pm

On Wed, Oct 01, 2008 at 10:26:03AM +0200, Ingo Molnar wrote:

Ingo,
I totally agree with you and liked yours/Arjan's suggestion to make
the dependency explicit. Since linus pointed out explicit dependencies
would be a disaster to maintain, I don't know where to go next. I agree
with him issues will come up. But frobbing the current scheme by increasing
the number of "init points" from 8 to 50 (or more likely a 100 or 200)
feels like it's not making the dependencies explicit either.

We currently make the subsystem/driver dependencies explicit in the
Kconfig files. We hide the lack of a subsystem with null macros
(e.g. "#define foo() {}" ). I'm wondering if this would be one
step making your original suggestion palatable.

I'm also wondering if we should be using the dependency graph
that is effectively coded in Kconfig files to generate the
init calls somehow. Any university students looking for a
very cool and pratical project this year that will make you


Yes. A DEBUG mechanism to record and dump the order when each init
call is started and completed might be one step to satisfy Linus' fear

I'm pretty comfortable that explicit dependencies are better.
The question is how to deal with the issues Linus raised and any

We just need the "memory_init" function to export a stub in

*nod*. I think all of it has be automated. Where automation gets
the dependency info from will be the key to making this work.

hth,
--

From: Linus Torvalds
Date: Wednesday, October 1, 2008 - 8:14 am

Absolutely. 

The problem with the current <linux/init.h> isn't that it doesn't work - 
it's worked pretty well for a long time - it's that we continually tend to 
hit the limits of the few fixed points.

I would just extend on that notion a bit, and also make the markers a bit 
more dynamic. Instead of having just 7-8 levels of initcalls, and a very 
fixed naming that is a bit misleading, I'd like to extend it to maybe 50 
levels, and the levels would be named by the subsystems and then just have 
one single place that orders them.

The old 7 levels (plus the "pure" one) would still exist, so it wouldn't 
need to be a flag-day event.

For example: why would you use "fs_initcall()" for the PnP init? The 
reason is simple: it's not a filesystem init, but it's the one that comes 
after the subsys init and before the actual low-leveld drivers. We used to 
initialize the core filesystem data (VFS) at that stage (we still do, 
although most of the actual filesystems are actually just done using the 
default module-init much later), which explains the naming, but the 
problem is that

 (a) we want to order things at a finer granularity than that
 (b) we want to have a better and more descriptive naming

but the basic notion of having a fixed ordering certainly isn't wrong (and 

It would be *much* better to give them symbolic names (easy enough - we 
just turn them into sections that are symbolic anyway), and then have a 
list of ordering for those symbolic names.

So they sure as hell would be much better than being named "Point B". We 
could get rid of

	subsys_initcall(pci_init)

and instead do

	initcall(pci_init, "pci");
	..
	initcall(pnp_init, "pnp");

and then just list the levels for the linker scripts in one place. So that 
we wouldn't really have to worry about link ordering: if the link ordering 
is wrong, we just add a new initcall level and insert it in the right 
place, and then we can look at the ordering and see it explicitly in one 
place instead ...
From: Yinghai Lu
Date: Wednesday, October 1, 2008 - 9:21 am

On Wed, Oct 1, 2008 at 8:14 AM, Linus Torvalds

don't need to think linking order.
could reorder them in the run time.

struct init_call_st {
        int level;
        char *name;
        initcall_t call;
};

#define INIT_CALL(nameX, levelX, callX) \
                static struct init_call_st __init_call_##nameX __initdata = \
                {       .name = nameX,\
                        .level = levelX,\
                        .call = callX,\
                }; \
                static struct init_call_st *__init_call_ptr_##nameX __used \
                __attribute__((__section__(".init_call.init"))) = \
                        &__init_call_##nameX

in vmlinux.lds.h
#define INIT_CALL_INIT(align)
         \
        . = ALIGN((align));                                             \
        .init_call.init : AT(ADDR(.init_call.init) - LOAD_OFFSET) {     \
                VMLINUX_SYMBOL(__init_call_start) = .;                  \
                *(.init_call.init)                                      \
                VMLINUX_SYMBOL(__init_call_end) = .;                    \
        }

let arch vmlinux.lds.S to have
INIT_CALL_INIT(8)

and init/main.c

void __init do_init_calls(void)
{
        struct init_call_st **daa;
        char *ptr;

        /* sort it?, prescan... */
        for (daa = __init_call_start ; daa < __init_call_end; daa++) {
                struct init_call_st *da = *daa;

...
        }

        for (daa = __init_call_start ; daa < __init_call_end; daa++) {
                struct dyn_array *da = *daa;
                ...
                da->call();
}


YH
--

From: Rolf Eike Beer
Date: Tuesday, September 30, 2008 - 1:05 pm

Why calculate this at boot time? Do you expect this to change between bootups?

Eike
From: Ingo Molnar
Date: Wednesday, October 1, 2008 - 1:52 am

yes, we could indeed map it statically to a single-threaded bootup 
sequence at build time. That is more reproducible, more deterministic, 
etc.

[ What i was thinking about the most generic long-run approach: each CPU 
  processing initcalls in parallel. Then the initcall graph is processed
  in parallel and the ordering and speed of execution (and the grade of
  completion) depends on the number of CPUs, their speed and other
  factors, etc.

  But it would be madness to combine such a facility with the current
  fragile single-threaded boot code. ]

	Ingo
--

From: Linus Torvalds
Date: Tuesday, September 30, 2008 - 11:01 am

Btw, why is that? We very much have a separate

	/**
	 * Reserve motherboard resources after PCI claim BARs,
	 * but before PCI assign resources for uninitialized PCI devices
	 */
	fs_initcall(pnp_system_init);

which is called much later. That seems to be the _right_ point for any 
quirks. It seems that the _real_ problem here is that the PnP device fixup 
is simply called from the wrong point. Ie, why do we do device discovery - 
and thus PnP quirks - in pnp_init (before the PCI bus is actually fully 
initialized!), rather than in pnp_system_init?

Ie, maybe the proper thing to do is to simply be *consistent* in the PnP 
subsystem, and always use fs_initcall (for the stated reasons!) for the 
device discovery. Ie the patch would be something like the appended, and 
then you really can depend on the PCI subsystem having been set up, 
including having all relevant resources inserted into the tree!

Hmm?  Bjorn? Everything that is true about "pnp_system_init" should be 
equally true abote pnpacpi_init and pnpbios_init, no?

			Linus

---
 drivers/pnp/pnpacpi/core.c |    2 +-
 drivers/pnp/pnpbios/core.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pnp/pnpacpi/core.c b/drivers/pnp/pnpacpi/core.c
index c1b9ea3..53561d7 100644
--- a/drivers/pnp/pnpacpi/core.c
+++ b/drivers/pnp/pnpacpi/core.c
@@ -268,7 +268,7 @@ static int __init pnpacpi_init(void)
 	return 0;
 }
 
-subsys_initcall(pnpacpi_init);
+fs_initcall(pnpacpi_init);
 
 static int __init pnpacpi_setup(char *str)
 {
diff --git a/drivers/pnp/pnpbios/core.c b/drivers/pnp/pnpbios/core.c
index 19a4be1..662dfcd 100644
--- a/drivers/pnp/pnpbios/core.c
+++ b/drivers/pnp/pnpbios/core.c
@@ -571,7 +571,7 @@ static int __init pnpbios_init(void)
 	return 0;
 }
 
-subsys_initcall(pnpbios_init);
+fs_initcall(pnpbios_init);
 
 static int __init pnpbios_thread_init(void)
 {
--

From: Linus Torvalds
Date: Tuesday, September 30, 2008 - 11:13 am

Side note: unlike the more radical "move all arch driver initcalls 
earlier", this one actually works for me. But it does cause 
pnp_system_init() to actually run before pnpacpi_init and pnpbios_init, 
due to the link order within PnP. That seems ok, since the system/acpi/pnp 
drivers should all be just different versions of PnP drivers, but I 
thought I'd mention it. Maybe there is some PnP internal reason why 
pnp_system_init should run after the pnp_acpi/bios_init functions.

(And if so, then just changing the order in drivers/pnp/Makefile would fix 
it up again).

Bjorn, is there any reason why this isn't the right thign to do?

Rene - I have this suspicion that just doing this part should already fix 
your issues with _no_ other patch, since now anything PnP does is after 
all the PCI setup code anyway. So does your sound card (or whatever it 
was) work by just doing the subsys_initcall -> fs_initcall change in PnP?

		Linus
--

From: Rene Herman
Date: Tuesday, September 30, 2008 - 12:51 pm

I'll have to wait for the message to arrive here or in a web-archive to 
answer that (...) but please note that I am already fine with the 
original code; it is the fix for my own issue that's IN mainline (ie, 
not only check for MEM resource overlaps but also IO) that now made 
Frans Pop yell due to his machine now spitting out lots of I/O overlap 
warnings -- which turned out to not be real overlaps, but due to a 
uninialized BAR.

You might want Frans to test this though then...

Rene.
--

From: Bjorn Helgaas
Date: Tuesday, September 30, 2008 - 12:16 pm

Right.  The point of this quirk (quirk_system_pci_resources()) is to
prevent the PNP system driver from claiming resources that are actually
used by PCI.  So I don't think there's any reason to run it before we
bind the driver to the device.  PNP doesn't currently have any early/
late concept for quirks, but we probably should add one.

Feels like a post-2.6.27 project though.

Bjorn


--

From: Bjorn Helgaas
Date: Tuesday, September 30, 2008 - 12:12 pm

Sorry for the delay in responding -- I'm officially on vacation
yesterday and today.

I agree that for 2.6.27 the "res->start == 0 means disabled" check
in the PNP quirk seems safest.

I don't like it long-term, because (a) I'd like that knowledge to at
least be in PCI, not PNP, and (b) res->start is the CPU address, not
necessarily the PCI bus address, and the BAR value (== PCI bus address)
is what we're trying to check.  Even if we looked at the BAR value
instead of res->start, I think zero is a valid PCI bus address on
machines where the root bridge applies an address offset.

So something like the patch below (same as what Rene originally
proposed, I think)?

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index 0bdf9b8..b3319e4 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -254,6 +254,10 @@ static void quirk_system_pci_resources(struct pnp_dev *dev)
 
 			pci_start = pci_resource_start(pdev, i);
 			pci_end = pci_resource_end(pdev, i);
+
+			if (!pci_start)
+				continue;
+
 			for (j = 0;
 			     (res = pnp_get_resource(dev, type, j)); j++) {
 				if (res->start == 0 && res->end == 0)

--

From: Bjorn Helgaas
Date: Wednesday, October 1, 2008 - 1:18 pm

Where are we on this regression?  We got side-tracked with all
the initcall ordering stuff, but I don't think that leads to
something appropriate for this late stage of 2.6.27.

I think the safest (and already tested) path is the
"res->start == 0 means disabled" check in the PNP quirk.

Bjorn
--

Previous thread: sysfs per namespace by Daniel Lezcano on Monday, September 29, 2008 - 8:46 am. (1 message)

Next thread: [RFC PATCH 0/4] Implementation of IR support using the input subsystem by Jon Smirl on Monday, September 29, 2008 - 9:17 am. (7 messages)