Re: [sodaville] [PATCH 02/11] x86: Add device tree support

From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

This patchset introduces device tree support on x86. The device tree is 
passed by the bootloader via setup_data. It is used as an additional 
source of information and does not replace the "traditional" x86 boot
page.
Right now we get the following information from it:
- hpet location
- apic & ioapic location
- ioapic's interrupt routing
- legacy devices which are not initialized by bios
- devices which are behind a bus which does not support enumeration like 
  i2c

The series is based on the tip tree and is also available at
  git://git.linutronix.de/users/bigeasy/soda.git ce_of

Sebastian Andrzej Siewior (11):
      x86/kernel: remove conditional early remap in parse_e820_ext
      x86: Add device tree support
      x86/dtb: Add a device tree for CE4100
      x86/dtb: add irq host abstraction
      x86/dtb: add early parsing of APIC and IO APIC
      x86/dtb: add support hpet
      x86/dtb: add support for PCI devices backed by dtb nodes
      x86/dtb: Add generic bus probe
      x86/ioapic: Add OF bindings for IO-APIC
      x86/io_apic: add simply id set
      x86/ce4100: use OF for ioapic

 Documentation/x86/boot_with_dtb.txt   |   20 ++
 arch/x86/Kconfig                      |    7 +
 arch/x86/include/asm/bootparam.h      |    1 +
 arch/x86/include/asm/e820.h           |    2 +-
 arch/x86/include/asm/io_apic.h        |    8 +
 arch/x86/include/asm/irq_controller.h |   12 +
 arch/x86/include/asm/prom.h           |   67 ++++++
 arch/x86/kernel/Makefile              |    1 +
 arch/x86/kernel/apic/io_apic.c        |  144 +++++++++++++
 arch/x86/kernel/e820.c                |    8 +-
 arch/x86/kernel/irqinit.c             |    9 +-
 arch/x86/kernel/prom.c                |  370 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup.c               |   15 ++-
 arch/x86/platform/ce4100/ce4100.c     |   16 ++-
 arch/x86/platform/ce4100/ce4100.dts   |  210 +++++++++++++++++++
 15 files changed, 877 insertions(+), 13 deletions(-)
 create mode 100644 ...
From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

The irq_host abstraction introduced here represents a generic irq_host.
The xlate callback is responsible for parsing irq information such as irq
type and number, and returns the hardware irq number which is reported by
the hardware as active.
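
As an illustration of the xlate contract described above, here is a minimal
userspace sketch. It assumes a controller with #interrupt-cells = <2>, where
the first cell is the interrupt number and the second the trigger type, and
it assumes the cells have already been converted to CPU byte order. The
callback name two_cell_xlate is hypothetical and the list linkage of the
real struct is left out; this is not the code from the patch.

```c
#include <stdint.h>
#include <stddef.h>

/* Userspace model of struct irq_host; the list_head member is omitted. */
struct irq_host {
	int (*xlate)(struct irq_host *h, const uint32_t *intspec,
		     uint32_t intsize, uint32_t *out_hwirq,
		     uint32_t *out_type);
	void *priv;
};

/*
 * Hypothetical callback for a two-cell interrupt specifier:
 * cell 0 = interrupt number, cell 1 = trigger type.
 * Returns 0 on success, -1 if the specifier is too short.
 */
static int two_cell_xlate(struct irq_host *h, const uint32_t *intspec,
			  uint32_t intsize, uint32_t *out_hwirq,
			  uint32_t *out_type)
{
	(void)h;
	if (intsize < 2)
		return -1;
	*out_hwirq = intspec[0];
	*out_type = intspec[1];
	return 0;
}
```

A caller would look up the irq_host that belongs to the interrupt parent of
a device node and hand the raw specifier to ->xlate.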

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/include/asm/irq_controller.h |   12 ++++++++
 arch/x86/include/asm/prom.h           |    2 +
 arch/x86/kernel/prom.c                |   47 ++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/include/asm/irq_controller.h

diff --git a/arch/x86/include/asm/irq_controller.h b/arch/x86/include/asm/irq_controller.h
new file mode 100644
index 0000000..1cbbfd0
--- /dev/null
+++ b/arch/x86/include/asm/irq_controller.h
@@ -0,0 +1,12 @@
+#ifndef __IRQ_CONTROLLER__
+#define __IRQ_CONTROLLER__
+
+struct irq_host {
+	int (*xlate)(struct irq_host *h, const u32 *intspec, u32 intsize,
+			u32 *out_hwirq, u32 *out_type);
+	void *priv;
+	struct device_node *controller;
+	struct list_head l;
+};
+
+#endif
diff --git a/arch/x86/include/asm/prom.h b/arch/x86/include/asm/prom.h
index 8fdb0d2..6c80e53 100644
--- a/arch/x86/include/asm/prom.h
+++ b/arch/x86/include/asm/prom.h
@@ -20,10 +20,12 @@
 #include <asm/irq.h>
 #include <asm/atomic.h>
 #include <asm/setup.h>
+#include <asm/irq_controller.h>
 
 #ifdef CONFIG_OF
 extern void init_dtb(void);
 extern void add_dtb(u64 data);
+void add_interrupt_host(struct irq_host *ih);
 #else
 static inline void init_dtb(void) { }
 static inline void add_dtb(u64 data) { }
diff --git a/arch/x86/kernel/prom.c b/arch/x86/kernel/prom.c
index ba9a096..996fd05 100644
--- a/arch/x86/kernel/prom.c
+++ b/arch/x86/kernel/prom.c
@@ -3,18 +3,63 @@
  */
 
 #include <linux/io.h>
+#include <linux/interrupt.h>
 #include <linux/list.h>
 #include <linux/of.h>
 #include ...
From: Jon Loeliger
Date: Thursday, November 25, 2010 - 12:30 pm

I thought there was an intent and desire to rename the irq_host
as irq_domain.

jdl
--

From: Sebastian Andrzej Siewior
Date: Friday, November 26, 2010 - 7:19 am

AFAIK Benh was thinking about renaming it. I don't know if this is still
the case or when he intends to do so. Once he does so, this can be renamed
as well.

Sebastian
--

From: Benjamin Herrenschmidt
Date: Friday, November 26, 2010 - 2:36 pm

That and moving the powerpc code to a generic place so you don't have to
re-invent your own :-)

I think Grant has patches for that.

Cheers,
Ben.


--

From: Sebastian Andrzej Siewior
Date: Wednesday, December 1, 2010 - 3:31 am

I've found only patches which rename the struct, none that move it into
generic code. Should I rename mine and wait until it appears in generic

Sebastian
--

From: Jon Loeliger
Date: Friday, November 26, 2010 - 8:11 pm

I submitted a first version of that patch already.
I thought we were waiting on gkh's patches to go upstream
before re-submitting the next version of that patch.
If we're ready to proceed with that, we can rebase it

Hrm.  Well, let's ask Ben what order he wants to do...?

jdl
--

From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

The apic & ioapic have to be added to system early because
native_init_IRQ() requires it.
The phys_reg property is used instead of the reg property because in
case of a PCI device this property is not holding the address of the
chip. In this case we can't query the PCI bar information because the
PCI bus is not (yet) up.
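
To see why reg cannot carry the chip address for a PCI device: under the
Open Firmware PCI bus binding the first reg cell (phys.hi) encodes bus,
device and function numbers, not an MMIO address. The sketch below decodes
that cell. It assumes the device tree follows the standard binding layout;
the struct and function names are made up for the example.

```c
#include <stdint.h>

/*
 * Fields of the first "reg" cell (phys.hi) of a PCI child node.
 * Bit layout per the OF PCI binding:
 *   npt000ss bbbbbbbb dddddfff rrrrrrrr
 */
struct pci_reg_addr {
	unsigned int bus;	/* bits 23..16 */
	unsigned int dev;	/* bits 15..11 */
	unsigned int fn;	/* bits 10..8  */
	unsigned int reg;	/* bits  7..0  */
};

static struct pci_reg_addr decode_phys_hi(uint32_t phys_hi)
{
	struct pci_reg_addr a;

	a.bus = (phys_hi >> 16) & 0xff;
	a.dev = (phys_hi >> 11) & 0x1f;
	a.fn  = (phys_hi >> 8) & 0x07;
	a.reg = phys_hi & 0xff;
	return a;
}
```

With this encoding a value such as 0xb800 names device 23, function 0 on
bus 0, which says nothing about where the chip's registers are mapped.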

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/include/asm/prom.h |    4 ++
 arch/x86/kernel/irqinit.c   |    2 +-
 arch/x86/kernel/prom.c      |  105 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/prom.h b/arch/x86/include/asm/prom.h
index 6c80e53..b74a49f 100644
--- a/arch/x86/include/asm/prom.h
+++ b/arch/x86/include/asm/prom.h
@@ -23,12 +23,16 @@
 #include <asm/irq_controller.h>
 
 #ifdef CONFIG_OF
+extern int of_ioapic;
 extern void init_dtb(void);
 extern void add_dtb(u64 data);
+void x86_early_of_parse(void);
 void add_interrupt_host(struct irq_host *ih);
 #else
 static inline void init_dtb(void) { }
 static inline void add_dtb(u64 data) { }
+static inline void x86_early_of_parse(void) { }
+#define of_ioapic 0
 #endif
 
 extern char cmd_line[COMMAND_LINE_SIZE];
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index d5970e2..8030193 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -250,7 +250,7 @@ void __init native_init_IRQ(void)
 			set_intr_gate(i, interrupt[i-FIRST_EXTERNAL_VECTOR]);
 	}
 
-	if (!acpi_ioapic)
+	if (!acpi_ioapic && !of_ioapic)
 		setup_irq(2, &irq2);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/prom.c b/arch/x86/kernel/prom.c
index 996fd05..9551f2f 100644
--- a/arch/x86/kernel/prom.c
+++ b/arch/x86/kernel/prom.c
@@ -10,11 +10,14 @@
 #include <linux/slab.h>
 
 #include <asm/irq_controller.h>
+#include <asm/io_apic.h>
 
 char ...
From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

x86_of_pci_init() does two things:
- it provides a generic irq enable and disable function. enable queries
  the device tree for the interrupt information, calls ->xlate on the
  irq host and updates the pci->irq information for the device.

- it walks through PCI bus(es) in the device tree and adds its children
  (devices) nodes to appropriate pci_dev nodes in kernel. So the dtb
  node information is available at probe time of the PCI device.

Adding a PCI bus based on the information in the device tree is
currently not supported. Right now direct access via ioports is used.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/include/asm/prom.h |    1 +
 arch/x86/kernel/prom.c      |  110 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/prom.h b/arch/x86/include/asm/prom.h
index b74a49f..794c6a1 100644
--- a/arch/x86/include/asm/prom.h
+++ b/arch/x86/include/asm/prom.h
@@ -28,6 +28,7 @@ extern void init_dtb(void);
 extern void add_dtb(u64 data);
 void x86_early_of_parse(void);
 void add_interrupt_host(struct irq_host *ih);
+void __cpuinit x86_of_pci_init(void);
 #else
 static inline void init_dtb(void) { }
 static inline void add_dtb(u64 data) { }
diff --git a/arch/x86/kernel/prom.c b/arch/x86/kernel/prom.c
index f61c541..c02777d 100644
--- a/arch/x86/kernel/prom.c
+++ b/arch/x86/kernel/prom.c
@@ -8,10 +8,12 @@
 #include <linux/of.h>
 #include <linux/of_platform.h>
 #include <linux/slab.h>
+#include <linux/pci.h>
 
 #include <asm/hpet.h>
 #include <asm/irq_controller.h>
 #include <asm/io_apic.h>
+#include <asm/pci_x86.h>
 
 char __initdata cmd_line[COMMAND_LINE_SIZE];
 static LIST_HEAD(irq_hosts);
@@ -100,6 +102,114 @@ void __init add_dtb(u64 data)
 				offsetof(struct setup_data, data));
 }
 
+static int of_irq_map_pci(struct pci_dev ...
From: Benjamin Herrenschmidt
Date: Saturday, November 27, 2010 - 3:33 pm

That's something we need to eventually put into common code, ie matching
device nodes to PCI devices... In the meantime, your approach will do,

I don't quite get the logic in getting to the bus' interrupts if you

Ok so I see what you are trying to do, but I think it's not completely
correct, besides you miss the swizzling when crossing P2P bridges and
similar.
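
For reference, the swizzling mentioned above is the conventional rotation of
the interrupt pin by the device's slot number each time a PCI-to-PCI bridge
is crossed. A minimal sketch follows; the function name is chosen for the
example and pins are numbered 1..4 for INTA..INTD.

```c
#include <stdint.h>

/*
 * Conventional PCI-to-PCI bridge swizzle.
 * pin:  1..4 for INTA..INTD as seen by the device
 * slot: device number on the bridge's secondary bus
 * Returns the pin (1..4) as seen on the bridge's primary side.
 */
static uint8_t swizzle_pin(uint8_t slot, uint8_t pin)
{
	return (uint8_t)((((pin - 1) + slot) % 4) + 1);
}
```

A device without its own node would have this applied once per bridge while
walking up towards the host bridge, until a node with an interrupt map is
found.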

I suppose you looked at powerpc's of_irq_map_pci() so I'm not sure why
you modified it the way you did :-) You should probably either move it
to a generic place or copy it for now with a comment indicating where it

That too won't go down bridges, atom never have any ? (no PCIe root
complex at all ? ever will be ? even then, it should be supported as god
knows what we'll handle in the future).

Eventually we want that matching between PCI devices and OF nodes to be
in generic code, so that's not a big deal to have an "inferior" version
temporarily in there I suppose.


Cheers,
Ben.


--

From: Sebastian Andrzej Siewior
Date: Sunday, November 28, 2010 - 7:04 am

the of_irq_map_one() is here in case the device has an interrupt node in
the device tree. If not, looks it up in the device tree based on the pin
It should be correct. I did not understand the P2P bridge so I left it
Microblaze had its own copy of this code so I thought there is something
specific about it. If it is okay with you, I would move it to drivers/of


Sebastian
--

From: Benjamin Herrenschmidt
Date: Sunday, November 28, 2010 - 3:32 pm

Apart from the accessor pci_device_to_OF_node() which might or might
not be specific, I think the code is pretty common, probably something

Hehe yeah :-) It's actually not a simple problem. For example, we can't
just move the powerpc variant over to generic code as-is bcs ... we have
2 completely different ways of doing it between ppc32 and ppc64 for
historical reasons :-) They also have different "features". This is
something I need to reconcile at some stage.

For example our ppc32 variant support bus renumbering (ie, Linux
assigning different bus numbers than what the DT encodes) while our
ppc64 doesn't, but our ppc64 variant has additional "stuff" to deal with

Cheers,
Ben.

--

From: Sebastian Andrzej Siewior
Date: Thursday, December 2, 2010 - 9:17 am

With this patch I can get rid of my custom of_irq_map_pci() in x86 tree.
The remaining thing would be the inferior pci_device <=> of_node match
code.
This is only x86 tested.

From 8c7eeae45f28ea6737a8f5c5c32026a02432d5cc Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Wed, 1 Dec 2010 21:55:12 +0100
Subject: [PATCH] of: move of_irq_map_pci() into generic code

There is a tiny difference between PPC32 and PPC64. Microblaze uses the
PPC32 variant.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/microblaze/include/asm/prom.h  |   15 -----
 arch/microblaze/kernel/prom_parse.c |   77 ---------------------------
 arch/microblaze/pci/pci-common.c    |    1 +
 arch/powerpc/include/asm/prom.h     |   15 -----
 arch/powerpc/kernel/pci-common.c    |    1 +
 arch/powerpc/kernel/prom_parse.c    |   84 ------------------------------
 drivers/of/Makefile                 |    1 +
 drivers/of/of_pci.c                 |   97 +++++++++++++++++++++++++++++++++++
 include/linux/of_pci.h              |   24 +++++++++
 9 files changed, 124 insertions(+), 191 deletions(-)
 create mode 100644 drivers/of/of_pci.c
 create mode 100644 include/linux/of_pci.h

diff --git a/arch/microblaze/include/asm/prom.h b/arch/microblaze/include/asm/prom.h
index bdc3831..aa3ab12 100644
--- a/arch/microblaze/include/asm/prom.h
+++ b/arch/microblaze/include/asm/prom.h
@@ -67,21 +67,6 @@ struct device_node *of_get_cpu_node(int cpu, unsigned int *thread);
 /* Get the MAC address */
 extern const void *of_get_mac_address(struct device_node *np);
 
-/**
- * of_irq_map_pci - Resolve the interrupt for a PCI device
- * @pdev:	the device whose interrupt is to be resolved
- * @out_irq:	structure of_irq filled by this function
- *
- * This function resolves the PCI interrupt for a given PCI device. If a
- * device-node exists for a given pci_dev, it will use normal OF tree
- * walking. If not, it will implement standard swizzling and walk up the
- * ...
From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

ioapic_xlate provides a translation from the information in the device tree
to ioapic-related information. This includes
- obtaining hw irq which is the vector number "=> pin number + gsi"
- obtaining type (level/edge/..)
- programming this information into ioapic

ioapic_add_ofnode adds an irq_host based on information from the device
tree. The memory address is obtained from the phys_reg property instead
of reg. On the PCI bus we use the reg property for the PCI address. We can't
use the PCI functions because we need the ioapic before the PCI bus is
up. This information (irq_host) is required in order to map a device to
its proper interrupt controller.
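
One way to read "pin number + gsi" above is that the hardware irq is the
IO-APIC's GSI base plus the pin number taken from the interrupt specifier.
The sketch below shows that arithmetic under this assumption. The struct
mirrors the gsi_base/gsi_end fields of mp_ioapic_gsi; the function name is
hypothetical and no IO-APIC is programmed here.

```c
#include <stdint.h>

/* Inclusive GSI range served by one IO-APIC (cf. struct mp_ioapic_gsi). */
struct gsi_range {
	uint32_t gsi_base;
	uint32_t gsi_end;
};

/*
 * Map an IO-APIC input pin to a global system interrupt number.
 * Returns 0 and stores the GSI on success, -1 if the pin lies
 * outside the range of this IO-APIC.
 */
static int pin_to_gsi(const struct gsi_range *r, uint32_t pin, uint32_t *gsi)
{
	if (pin > r->gsi_end - r->gsi_base)
		return -1;
	*gsi = r->gsi_base + pin;
	return 0;
}
```

For a secondary IO-APIC covering GSIs 24..47, pin 3 would map to GSI 27.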

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/include/asm/io_apic.h |    7 +++
 arch/x86/kernel/apic/io_apic.c |  100 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/prom.c         |    8 +++
 3 files changed, 115 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index d854b90..dc1169f 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -175,6 +175,13 @@ struct mp_ioapic_gsi{
 	u32 gsi_base;
 	u32 gsi_end;
 };
+#ifdef CONFIG_X86_OF
+struct mp_of_ioapic {
+	struct device_node *node;
+};
+extern struct mp_of_ioapic mp_of_ioapic[MAX_IO_APICS];
+void __init ioapic_add_ofnode(struct device_node *np);
+#endif
 extern struct mp_ioapic_gsi  mp_gsi_routing[];
 extern u32 gsi_top;
 int mp_find_ioapic(u32 gsi);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ea51151..27a5709 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -43,6 +43,7 @@
 #include <linux/bootmem.h>
 #include <linux/dmar.h>
 #include <linux/hpet.h>
+#include <linux/of_address.h>
 
 #include <asm/idle.h>
 #include <asm/io.h>
@@ -60,6 +61,7 @@
 ...
From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

This patch adds minimal device tree support on x86. The device tree will
be passed to the kernel via setup_data, which requires at least boot
protocol 2.09.
Memory size, restricted memory regions, boot arguments are gathered the
traditional way so things like cmd_line are just here to let the code
compile.
The current plan is to use the device tree as an extension and to gather
information from it which cannot be enumerated and would have to be
hardcoded otherwise. This includes things like
- which devices are on this I2C/ SPI bus?
- how are the interrupts wired to IO APIC?
- where could my hpet be?
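
The setup_data hand-over mentioned above can be pictured as a linked list
of typed blobs that the kernel walks at boot. The following is a userspace
sketch, not the patch itself: the struct layout follows the x86 boot
protocol, the value 2 for SETUP_DTB is an assumption (the bootparam.h hunk
is not shown in this thread), physical addresses are modelled as plain
pointers, and find_dtb is a name made up for the example.

```c
#include <stdint.h>
#include <stddef.h>

#define SETUP_DTB 2	/* assumed value, see note above */

/* One entry of the setup_data list as laid out by the boot protocol. */
struct setup_data {
	uint64_t next;	/* address of the next entry, 0 ends the list */
	uint32_t type;	/* SETUP_* identifier */
	uint32_t len;	/* length of data[] in bytes */
	uint8_t data[];
};

/*
 * Walk the list and return the payload of the first SETUP_DTB entry,
 * or NULL if there is none. *len receives the payload length.
 */
static const uint8_t *find_dtb(const struct setup_data *sd, uint32_t *len)
{
	while (sd) {
		if (sd->type == SETUP_DTB) {
			*len = sd->len;
			return sd->data;
		}
		sd = (const struct setup_data *)(uintptr_t)sd->next;
	}
	return NULL;
}
```

In the kernel the entries live in physical memory and have to be mapped
before they can be read; the sketch skips that step.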

Dirk is working on some patches which provide generic infrastructure for
linking the dtb into the kernel. Once this is in its final shape, we
will relocate the device tree unconditionally. This will remove the
requirement for the boot loader to locate the device tree within lowmem.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 Documentation/x86/boot_with_dtb.txt |   20 +++++++++++
 arch/x86/Kconfig                    |    7 ++++
 arch/x86/include/asm/bootparam.h    |    1 +
 arch/x86/include/asm/prom.h         |   60 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/Makefile            |    1 +
 arch/x86/kernel/irqinit.c           |    7 ++++
 arch/x86/kernel/prom.c              |   60 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup.c             |    4 ++
 8 files changed, 160 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/x86/boot_with_dtb.txt
 create mode 100644 arch/x86/include/asm/prom.h
 create mode 100644 arch/x86/kernel/prom.c

diff --git a/Documentation/x86/boot_with_dtb.txt b/Documentation/x86/boot_with_dtb.txt
new file mode 100644
index 0000000..3ba42b4
--- /dev/null
+++ b/Documentation/x86/boot_with_dtb.txt
@@ -0,0 +1,20 @@
+  Booting x86 with device tree
+=================================
+
+1. ...
From: Sam Ravnborg
Date: Thursday, November 25, 2010 - 3:53 pm

This file is not exported to userspace - so no need to guard with __KERNEL__

	Sam
--

From: Sebastian Andrzej Siewior
Date: Friday, November 26, 2010 - 2:06 am

Sebastian
--

From: Benjamin Herrenschmidt
Date: Friday, November 26, 2010 - 2:42 pm

How does that work with platforms like OLPC that have a real OF ?

One thing we did on powerpc, which among other things allows kexec to work
on such platforms when we kill OF at boot (it might stay alive on OLPC),
is to basically detect very early in asm that we are coming from OF,
have a trampoline that extracts the DT and turns it into a flat dtb, then
continue to the main kernel entry using the dtb method.

That way, one can kexec using the dtb method over and there's one single
entry point for device-tree use.


Linking the dtb into the kernel is something we prefer not doing on
powerpc and I'm curious why you think that applies better on x86...

We -do- have ways to include it in the zImage wrapper. However, this is
different in subtle ways because of the way our zImage wrapper building
works. Basically, we always build all the per-platform .o's of the
wrapper that apply to supported platforms by the kernel. The
binding/linking together of the final wrapper with a kernel image, an
optional dtb and optional initrd is performed by a shell script that can
be used outside of the normal build context.

That means that it's possible for a distro for example to install a
kernel image, all the wrapper .o files and that script, and at runtime
rebuild zImage wrappers with the appropriate dtb without having the
whole built kernel tree at hand.

The direction taken by ARM (and possibly newer powerpc platforms as
well) is to have the dtb be passed by the bootloader. Typically
bootloaders like uboot provide a way to flash the dtb separately so it
can be updated (*).

(*) That brings a separate topic we shall discuss: A consistent way for
versioning the device-tree would be really useful.

Cheers,


--

From: Sebastian Andrzej Siewior
Date: Sunday, November 28, 2010 - 6:49 am

OLPC's openfirmware is embedded into the bootpage where ofw_magic is set
to OLPC_OFW_SIG (0x2057464F). I don't touch this, the device tree is
Similar. We get most critical parameters from the so called bootpage
(the traditional x86 way) which also contains a pointer to the device
tree (we don't have open firmware or something else where we call back).
We plan to relocate the device tree (before it is unflattened) so the
bootloader does not need to know about the memory layout the kernel is
having. 
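
Relocating a flat device tree before unflattening only needs the blob
header: the first two big-endian 32-bit words are the magic (0xd00dfeed)
and the total size. The sketch below copies a blob into a caller-provided
buffer under those assumptions. relocate_dtb and be32_at are names invented
for the example; this is not the planned kernel code.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define FDT_MAGIC 0xd00dfeedu

/* Read a big-endian 32-bit value from an unaligned byte pointer. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Copy the flat device tree at blob into dst.
 * Returns the number of bytes copied, or 0 if the magic is wrong
 * or the blob does not fit into dst_size bytes.
 */
static size_t relocate_dtb(const uint8_t *blob, uint8_t *dst, size_t dst_size)
{
	uint32_t totalsize;

	if (be32_at(blob) != FDT_MAGIC)
		return 0;
	totalsize = be32_at(blob + 4);
	if (totalsize < 8 || totalsize > dst_size)
		return 0;
	memcpy(dst, blob, totalsize);
	return totalsize;
}
```

Because the blob uses offsets rather than pointers internally, a plain copy
is enough; nothing inside it has to be fixed up afterwards.
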
On kexec, the bootpage is built from scratch AFAIK. So the kexec loader
This is only for the case where we do not get a dtb from the bootloader

The reason why you have multiple .o wrapper files is because the specific
platform code is not simply passing the device tree but also adding /
updating nodes like MAC address, bus clocks, ... which are coming from
the (different) bd_t struct or something else. The simpleboot target is
covering the case where you just pass the embedded dtb to kernel without
changing it.

On x86 we want to have the bootloader passing us the final dtb. The
For the distro reason the in-kernel dtb supports multiple dtbs. So a
distro kernel can include all of them into .init.data section and the
user can specify on the command line which device tree he wants. x86 gets
its command line via the bootpage so it is available before we have a

Yes, we want this as well. But what about the old ARMs where the
bootloader did not have dtb support? What about minimal bootloaders which
just initialize the CPU and memory and then jump into the kernel? So the
in-kernel dtb is a simple way to solve this. However I don't know what
This isn't a problem unless you move nodes or deprecate them, right? Or

Sebastian
--

From: Benjamin Herrenschmidt
Date: Sunday, November 28, 2010 - 3:28 pm

Move nodes, deprecate them, yes, but also maybe fix bugs/typos etc... 

For most of these, of course, fixup code can figure things out without a
version. The version has a couple of (minor) advantages, such as being
something easier to get into a bug report rather than the whole tree,
for distro who may want to manage a "pool" of these, or maybe a
"generic" way to provide dtb "overrides" ...

Ben.


--

From: Grant Likely
Date: Thursday, December 30, 2010 - 1:26 am

/me gets ready to dodge tomatoes thrown at him.

Hmmm, back up a minute...

Since Linux on x86 has pretty much always depended on a two stage boot
(firmware boots a bootloader like grub which in turn boots the
kernel), then what is the use case for pursuing an in-kernel dtb
linkage?  simpleimage was used on powerpc for the use-case where there
is no 2nd stage bootloader, but instead only the kernel which is
booted from some firmware that is non-upgradeable (or at least too
risky to upgrade).  Same with the cuImages.  The wrapper is
effectively a 2nd stage bootloader to adapt from what older u-boot
provides and what the kernel needs.

What is the boot sequence for the embedded x86 platforms?  Is there
still a bootloader?  If so, what prevents always depending on the
bootloader to pass in the device tree blob?  If the bootloader is
software (not firmware) then it should be something we have control
over when shipping a distribution.

BTW, don't take microblaze as the example to be emulated.  Some of
the things it does for device tree support are not scalable, like
linking the .dtbs directly into the kernel.

John Bonesio has also prototyped doing a similar zImage bootwrapper on
arm which allows a dtb to be concatenated to the kernel image and
updated before passing it to the kernel.  As it stands, there are no
plans to use in-kernel .dtb linking on ARM.

I know it's not very fair to bring up these issues again right before
the merge window opens.  I got myself overcommitted and dropped the
ball over the last 1.5 months and I beg forgiveness.  However, I do
want to make sure that the right decision is made and I'd be happier
if a consistent scheme is used for passing the .dtb on all

Should equally be able to support this as a boot wrapper with the
added advantage that additional .dtbs could be added to the kernel
image at install time without rebuilding the kernel.

g.
--

From: Rob Landley
Date: Thursday, December 30, 2010 - 1:45 am

There's no one embedded setup on any platform, but one of the few
constants of embedded development is trying to eliminate unnecessary
requirements.

Just on standard-ish PC hardware I've seen people try to stick Linux in the
BIOS flash (generally not enough room), I've seen people try to stick
it as the first stage PXE payload, there's the fun and games with
kexec of emergency kernels for crash dumps...

If the capability to skip an unnecessary bootloader was available, people
would use it.

Rob
--

From: Grant Likely
Date: Thursday, December 30, 2010 - 1:58 pm

Right, but in all of those cases a boot wrapper provides the same
functionality with better flexibility, such as being able to provide
the dtb image(s) at install time instead of compile time.

g.
--

From: H. Peter Anvin
Date: Monday, January 3, 2011 - 9:05 am

Assuming the boot wrapper is written correctly.  I have seen a number of
cases in which it was not, and it being "already locked into firmware"
and not changeable.

It's a nice theory.  And in theory, theory and practice agree.

	-hpa
--

From: H. Peter Anvin
Date: Monday, January 3, 2011 - 9:19 am

By the way, this is the same reason we also allow the initramfs and even
the command line to be compiled in.

	-hpa
--

From: Grant Likely
Date: Monday, January 3, 2011 - 10:52 am

I think we've got an impedance mismatch.

The whole point of the ppc boot wrapper, and the kind of boot wrapper
that I'm talking about here, is that it becomes part of the kernel
image and is *not* part of firmware.  ie. an executable wrapper which
carries the kernel as its payload.  I'm wary too of depending on
firmware to get things right because it can be so painful to change.

g.
--

From: H. Peter Anvin
Date: Monday, January 3, 2011 - 11:06 am

The problem with that kind of boot wrapper is that they are
per-architecture, increasing the differences between architectures
needlessly, and they are often implemented very poorly.

As such, it's nice to have an ultimate fallback that doesn't depend on
anything outside ours -- the kernel community's -- control.

	-hpa
--

From: H. Peter Anvin
Date: Monday, January 3, 2011 - 11:10 am

In the case of x86, it's not just per architecture but actually per
platform interface, which is what aggravates the situation additionally.
 Unfortunately a lot of embedded x86 vendors seem extremely busy
recreating all the mistakes embedded developers on other platforms have
ever made, because "it's what they know"...

	-hpa
--

From: Grant Likely
Date: Thursday, December 30, 2010 - 1:57 pm

Hmmm, I shouldn't be sending email at 1:30 in the morning.  The above
statement is not really true.  One of the use-cases on ARM is using a
device tree with existing firmware that doesn't pass a dt blob.  Right
now there are two possible methods for doing this.  Option one is to
link the .dtbs into the kernel proper and point to them from the
machine struct.  The dtb would be used when a matching machine id is
passed by the firmware.  Option 2 is to select the correct .dtb with a
kernel boot wrapper, which is exactly the method used by the powerpc
boot wrappers and is the mechanism that John Bonesio is prototyping
(hopefully will have patches out on the list early in the new year).
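
Option one above amounts to a table lookup: each machine description
carries a pointer to a linked-in dtb, and the machine id passed by the
firmware selects the entry. A rough sketch of that idea follows. The struct
machine_entry, its fields and find_machine_dtb are hypothetical names for
illustration, not the ARM machine struct or any existing kernel API.

```c
#include <stddef.h>

/* Hypothetical machine table entry with a linked-in device tree blob. */
struct machine_entry {
	unsigned int id;		/* machine id passed by firmware */
	const char *name;
	const unsigned char *dtb;	/* blob linked into the kernel image */
};

/*
 * Return the dtb of the entry whose id matches, or NULL when the
 * firmware passed an id that the kernel has no blob for.
 */
static const unsigned char *
find_machine_dtb(const struct machine_entry *tbl, size_t n, unsigned int id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].id == id)
			return tbl[i].dtb;
	return NULL;
}
```

The trade-off discussed in this thread is visible here: the table is fixed
at compile time, whereas a boot wrapper could be given its blobs at install
time.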

Personally I prefer the boot wrapper method because it means there is
exactly one mechanism for providing the kernel proper with a .dtb and
it allows the set of dtbs to be provided at install time instead of
kernel compile time.  Since the boot wrapper prototype is so-far
successful, it is the approach that I'm going to pursue on ARM (but
I'm not yet completely ruling out option 1).

Peter & Dirk, I realize that this is different from what we talked
about at Plumbers this year.  I'm not saying no to linking the .dtbs
into the kernel proper on x86, and I don't pretend to know the details
of the x86 Linux boot interface, but if dtb linking is merged, then
x86 will probably be the only major architecture to use it (microblaze
doesn't count as major)  :-).

g.
--

From: H. Peter Anvin
Date: Thursday, December 30, 2010 - 5:51 pm

There are a number of different boot loader solutions in use on embedded
platforms, as much as we would like to avoid it.

However, the ability to link in the dtb will provide an
architecture-neutral option of last resort.  I'm not saying it's a good
option, but it's better than random ad hoc stuff, and if that means that
it will only ever be used during in-lab platform bringup, *that is still
a huge win*.

	-hpa
--

From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/platform/ce4100/ce4100.dts |  210 +++++++++++++++++++++++++++++++++++
 1 files changed, 210 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/platform/ce4100/ce4100.dts

diff --git a/arch/x86/platform/ce4100/ce4100.dts b/arch/x86/platform/ce4100/ce4100.dts
new file mode 100644
index 0000000..2901882
--- /dev/null
+++ b/arch/x86/platform/ce4100/ce4100.dts
@@ -0,0 +1,210 @@
+/*
+ * CE4100 on Falcon Falls
+ *
+ * (c) Copyright 2010 Intel Corporation
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+/dts-v1/;
+/ {
+	model = "x86,CE4100";
+	compatible = "x86,CE4100";
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	cpus {
+		x86,Atom@0 {
+			device_type = "cpu";
+		};
+	};
+
+	atom@0 {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		device_type = "soc";
+		compatible = "simple-bus";
+		ranges;
+
+		/* Local APIC */
+		lapic@fee00000 {
+			compatible = "intel,lapic";
+			reg = <0xfee00000 0x1000>;
+			phys_reg = <0xfee00000>;
+		};
+		/* Primary IO-APIC */
+		ioapic1: pic@fec00000 {
+			#interrupt-cells = <2>;
+			compatible = "intel,ioapic";
+			interrupt-controller;
+			device_type = "interrupt-controller";
+			id = <1>;
+			reg = <0xfec00000 0x1000>;
+			phys_reg = <0xfec00000>;
+		};
+
+		hpet@fed00000 {
+			compatible = "intel,hpet";
+			reg = <0xfed00000 0x200>;
+			phys_reg = <0xfed00000>;
+		};
+	};
+
+	isa@legacy {
+		device_type = "isa";
+		compatible = "simple-bus";
+		#address-cells = <2>;
+		#size-cells = <1>;
+		ranges = <0 0 0 0x400>;
+
+		rtc@legacy {
+			compatible = "motorola,mc146818";
+			interrupts ...
From: Benjamin Herrenschmidt
Date: Friday, November 26, 2010 - 2:57 pm

"Atom" would benefit from being more precise, like adding the model
number. Also you want some properties there defining maybe the mask, the
cache characteristics, etc... There's an existing OFW binding for x86, I
suppose you could follow it. A "reg" property at least is mandatory
here.

Also how do you plan to expose threading capability ?

You probably also want some linkage from the processor to the local APIC

What are those phys-reg properties ? Also APICs have some kind of
versioning, they aren't all identical, so your compatible property

All HPETs are identical ? If not, make your compatible property more
precise or if they are generally compatible from a programmer
perspective, use two entries from more generic to more specific, for
example:

	compatible = "intel,hpet","intel,hpet-atom-XXYY"
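
A compatible property with several entries is stored in the blob as
NUL-terminated strings placed back to back, and a driver matches if any
entry equals the string it supports. The sketch below shows such a match on
a raw property value. compatible_index is a name made up for the example
and is not the kernel's of_device_is_compatible().

```c
#include <stddef.h>
#include <string.h>

/*
 * Search a raw "compatible" property (len bytes of NUL-terminated
 * strings stored back to back) for want.
 * Returns the index of the matching entry, or -1 if there is no
 * match or the property is not properly terminated.
 */
static int compatible_index(const char *prop, size_t len, const char *want)
{
	size_t off = 0;
	int idx = 0;

	while (off < len) {
		const char *end = memchr(prop + off, '\0', len - off);

		if (!end)
			return -1;
		if (strcmp(prop + off, want) == 0)
			return idx;
		off = (size_t)(end - prop) + 1;
		idx++;
	}
	return -1;
}
```

A driver that only knows the generic string still binds, while one that
needs to handle a quirk of a particular part can check for the precise
entry.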


So ISA isn't a child of "atom"... that makes "atom" a bit strange as a
node, tho not a big deal per se. I suppose it represents the on-die
peripherals but then you need at least some linkage between that and

What does "simple-bus" mean ? ISA has a well defined binding, you

I think the ISA binding mandates the use of the PNP codes in the
compatible properties, doesn't it ? At least that's the common usage
pattern I've seen so far on powerpc.

Also, "ctrl_reg" and "freq_reg" follow an existing binding ? If not,
then I'd suggest you use "-" instead of "_" which is more common in OFW
land and use more descriptive names since "reg" has a meaning of its own

Here you define a PCI bus with a child device that isn't PCI from what I
can tell, tho the "reg" property content is confusing, and then there's
a unit address that doesn't match "reg" and a "phys_reg" (what the heck
is that ?) that matches the unit-address. Care to explain a bit
more ? :-) I suspect that isn't the right way to represent the secondary
APIC


I notice that the interrupt number isn't part of your mask, is that
expected ? If you decide to make it so, remember that INT_A is 1 not 0
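
A minimal sketch of how an interrupt-map lookup applies the mask, assuming the usual OF PCI convention that only the first unit-address cell (phys.hi) and the pin take part in the match. The struct and function names are hypothetical; the three sample rows are copied from the XO-1.5 /pci interrupt-map quoted further down in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One simplified interrupt-map row (hypothetical layout for illustration). */
struct imap_entry {
	uint32_t phys_hi;	/* first cell of the child unit address */
	uint32_t pin;		/* 1 = INTA ... 4 = INTD, never 0 */
	uint32_t irq;		/* first cell of the parent interrupt specifier */
};

/*
 * Mask the child's unit address and pin, then compare against each row.
 * Returns 0 and stores the parent interrupt in *irq, or -1 if no row matches.
 */
static int imap_lookup(const struct imap_entry *map, size_t n,
		       uint32_t hi_mask, uint32_t pin_mask,
		       uint32_t phys_hi, uint32_t pin, uint32_t *irq)
{
	size_t i;

	assert(map != NULL && irq != NULL);
	for (i = 0; i < n; i++) {
		if ((phys_hi & hi_mask) == map[i].phys_hi &&
		    (pin & pin_mask) == map[i].pin) {
			*irq = map[i].irq;
			return 0;
		}
	}
	return -1;
}
```

With a mask of 0xff00 / 0x7 the register bits of phys.hi are ignored, and a pin value of 0 never matches because INTA is encoded as 1.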

OFW PCI binding, ...
From: Sebastian Andrzej Siewior
Date: Sunday, November 28, 2010 - 9:04 am

I wasn't aware of the OFW binding for X86. I will follow it once I find
I don't have a plan because this CPU has no threading capability. If there
is, I would follow the powerpc way (unless it is already documented how
Like now I walk through the device tree and look for one but that sounds
The second ioapic is behind the PCI bus which uses the reg property for the
devfn number so I can't use it for the chip address. I can't query the
PCI information because the PCI bus is not up yet.
The phys_reg property contains the physical address of the chip. The
boot uart code in powerpc's tree has a virtual-reg property. So I thought
The APIC has a register where you can read the version of the chip, yes.
All hpets should be equal AFAIK. Some behave differently but this was not
intended in the first place. This information is not even included in

Yes, it should represent the on-die peripherals. How should that look
like? The bus at the same level as the cpu node and a link from the cpu to
I added simple bus in order to get probed. But now I remember that this
I do. The reg property of the rtc starts with "1" where 1 means it is an
I posted a patch for this at [0]. Powerpc uses the pnpPNP,b00 node
for the rtc. This node is handled explicitly by chrp and maple. Those two
don't use the generic driver but their own.

The remaining (mpc8572, p2020) handle this via add_rtc() in
arch/powerpc/sysdev/rtc_cmos_setup.c. What they do is, they create a
platform device for the OF node. What is missing is the initialization
of the ctrl_reg register and the frequency. This is performed in a PCI
quirk in quirk_final_uli1575() which is only performed on a few powerpc
machines (is_quirk_valid() has a list).
This looks like a dirty hack to me. I need to add every machine to it
rather than a simple entry into the device tree. If you replace the uli
The reg property contains the devfn number, interrupt mask, pin number.
That is what I've been seeing in PCI nodes. phys_reg is the physical
address of the chip since reg is ...
From: Benjamin Herrenschmidt
Date: Sunday, November 28, 2010 - 3:53 pm

Interesting, I thought I would find it on
http://www.openfirmware.info/Bindings but it's not there...


Atoms have SMT don't they ?

The powerpc way somewhat sucks since it uses a property named
"ibm,interrupt-server#" :-) We might want to define something more
generic here but the base idea is sane.

IE. There is a node per core. Each "thread" is identified by a unique
number we call the "interrupt server number". The "reg" property of a

The day you have multiple Atom's on a board the "looking for one" won't
work well :-) Better have explicit references whenever you can for that

For the second ioapic, see below. This one isn't on PCI and should just

No, virtual-reg is a "hack" which contains a virtual address mapped by
the bootloader in the MMU for use by early boot code before takeover of
the MMU. It's not a physical address.

The physical addresses shall be expressed in the device-tree using the
normal mechanisms so that all the existing code to decode them "just

Ideally, you should add something like

 intel,ioapic-atomXXX intel-ioapic-vYY intel-ioapic

IE. From the most specific to the most generic. That way if a "quirk" is
ever needed due to an errata specific to that chip model that isn't
directly covered by the "version", you get to use that too (unless that
version register also contains things like mask number etc... in which

And of course I'm full of caca above, it shall be from the most specific
to the most generic, so the other way around:


Well, you should probably still at least provide the "specific" part in
compatible so that difference can be quirked easily (though of course


device_type is a nasty bugger, we are trying to get rid of Linux
reliance on it.

Things like "simple-bus" don't rock my boat either, it's adding to the
device-tree "informations" that are specific to the way Linux will
interpret it, which is not how it should be.

In this case I would have said something like "atom,isa-bridge" but


I definitely agree that it's ...
From: Mitch Bradley
Date: Sunday, November 28, 2010 - 6:34 pm

I can't find the x86 binding either, which is a little bit embarrassing 
since I wrote it, albeit about 15 years ago...

If my memory is correct, it is not particularly useful now.  It 
primarily dealt with the ABI for transferring control to the OS and for 
calling back into the OFW client interface.  The only company that used 
it was Network Appliance, back when they were building their own x86 
motherboards (because most off-the-shelf mobo's of that era did not meet 
their stability requirements).

There has been a fair amount of churn since then, in relevant areas like 
x86 privileged architecture, compiler versions and code generation 
policies, popular bootloaders, OSs, and Linux early startup code.  The 
net result is that the ABI that the old binding specified probably isn't 
right for today.

I'd be happy to work with people to develop a new x86 binding.

The OLPC interface might be of some use as a starting point, but would 
need some work.  It is currently in use on AMD Geode, Via C7, and Intel 
Atom based systems, but, among other issues, it conflicts with the 
Physical Address Extension feature.

Mitch
--

From: H. Peter Anvin
Date: Monday, November 29, 2010 - 11:26 am

It conflicts with at least PAE, PAT and x86-64.  Since NX requires PAE,
it also conflicts with that.  As such, I would think it would have to be
considered obsolete.

	-hpa
--

From: Benjamin Herrenschmidt
Date: Monday, November 29, 2010 - 1:03 pm

In any case, what is of interest to Sebastian here isn't the ABI for the
client interface, but the properties to put in the /cpus/ nodes, so we
could start with that...

Cheers,
Ben.


--

From: Sebastian Andrzej Siewior
Date: Monday, November 29, 2010 - 12:44 pm

So for the CPU node I have so far:

	cpus {
		#address-cells = <1>;
		#size-cells = <0>;

		cpu@0 {
			device_type = "cpu";
			compatible = "Intel,CE4100";
			reg = <0>;
			lapic = <&lapic0>;
		};
	};

This one should match ePAPR 1.0. David mentioned threads. I have just one.
No HyperThreading, nothing special. Should I just leave it as is or go
for:
	cpus {
		#address-cells = <1>;
		#size-cells = <0>;

		cpu@0 {
			device_type = "cpu";
			compatible = "Intel,CE4100";
			reg = <0>;
			lapic = <&lapic0>;

			thread@0 {
				reg = <0>;
			};
		};
	};
?

Sebastian
--

From: David Gibson
Date: Wednesday, December 1, 2010 - 5:40 pm

Leave it as is.  For hyperthreading there's a good chance you'll be
able to get away with the simple extension we're planning to use in
ePAPR 1.1, which would be:
	cpu@0 {
		...
		reg = <0 1 2 3>;
		...
	};

For, e.g. a cpu with 4 threads.

If more detailed per-thread information is needed then we or you might
want sub-nodes one day.  But even if we do that, we should allow them
to be omitted in the single-thread case.
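
A minimal sketch of how a consumer could read such a "reg" property, assuming #address-cells = <1> in /cpus so that each thread id is one big-endian 32-bit cell. The helper names are hypothetical and not part of any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit device tree cell. */
static uint32_t dt_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Copy the thread ids from a cpu node's "reg" property into ids[].
 * Returns the number of threads, or 0 if the property is empty, not a
 * whole number of cells, or larger than the caller's array.
 */
static size_t cpu_thread_ids(const uint8_t *reg, size_t len_bytes,
			     uint32_t *ids, size_t max_ids)
{
	size_t i, n;

	assert(reg != NULL && ids != NULL);
	if (len_bytes == 0 || len_bytes % 4 != 0)
		return 0;
	n = len_bytes / 4;
	if (n > max_ids)
		return 0;
	for (i = 0; i < n; i++)
		ids[i] = dt_cell(reg + 4 * i);
	return n;
}
```

A single-thread cpu such as the CE4100 then simply yields one id, so no special case is needed for it.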

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
--

From: Scott Wood
Date: Monday, November 29, 2010 - 12:07 pm

On Mon, 29 Nov 2010 09:53:29 +1100

What is "@legacy"?  I don't think I've seen that in a unit address
before, googling only turns up this device tree, and a quick grep

The motivation for simple-bus comes from Linux, but its definition is
OS-neutral.  It indicates that no special bus knowledge is required to
access the devices under it.

I don't think it applies to ISA, though -- I/O space is special bus
knowledge, and the "ranges" looks weird for memory-space as well.

If we're going to get rid of device_type here, it would be nice to have
some other way to indicate that this node follows the ISA binding,
without having to recognize an implementation-specific compatible.

-Scott

--

From: Benjamin Herrenschmidt
Date: Monday, November 29, 2010 - 1:05 pm

That's never 100% true. In the case of ISA it's even less true due to


The code in drivers/of/address.c uses the name property to match isa
busses.

Cheers,
Ben.


--

From: Mitch Bradley
Date: Monday, November 29, 2010 - 1:32 pm

The usual layout is that the PCI bus is a direct child of
the root node, and the ISA bus is a child of the PCI bus.
That reflects the "Northbridge + Southbridge" wiring that
was common at the time that PCI was first introduced.
It's usually the case that faster and wider buses are closer
to the root, with speed and address width decreasing as you
go away from the root.

The fact that PCI configuration accesses are done via I/O
ports 0xcf8/0xcfc doesn't make it a child of the ISA bus, because
I/O space is inherent in the x86 CPU architecture and thus
can be considered to be part of the root address space.
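
For reference, a small sketch of the value written to the configuration address port under PCI configuration mechanism #1 (address port 0xcf8, data port 0xcfc). The function and macro names are made up for illustration; only the bit layout is the standard one.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CONF1_ADDR_PORT 0xcf8u	/* address register */
#define PCI_CONF1_DATA_PORT 0xcfcu	/* data window */

/*
 * Build the 32-bit value that selects one dword-aligned config register
 * of the function identified by bus and devfn.
 */
static uint32_t pci_conf1_address(uint8_t bus, uint8_t devfn, uint8_t reg)
{
	assert((reg & 3) == 0);		/* registers are selected per dword */
	return 0x80000000u |		/* enable bit */
	       ((uint32_t)bus << 16) |
	       ((uint32_t)devfn << 8) |
	       reg;
}
```

Both ports live in plain x86 I/O space, which is why no ISA bridge has to be traversed to reach them.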

In the systems that I have worked with, the ISA bridge is a
first-class PCI device with a PCI config header, so it fits
naturally underneath the PCI bus.

Here are the properties for PCI and ISA on the OLPC XO-1.5
platform (Via C7 x86 CPU with Via VX855 IO chip):


ok dev /pci
ok .properties
interrupt-map            00000800 00000000 00000000 00000001 ff86bf34 0000000a 00000000
                         00006000 00000000 00000000 00000001 ff86bf34 0000000a 00000000
                         00008000 00000000 00000000 00000001 ff86bf34 0000000a 00000000
                         00008100 00000000 00000000 00000002 ff86bf34 00000009 00000000
                         00008200 00000000 00000000 00000003 ff86bf34 0000000b 00000000
                         00008400 00000000 00000000 00000004 ff86bf34 0000000a 00000000
                         0000a000 00000000 00000000 00000001 ff86bf34 00000009 00000000
interrupt-map-mask       0000ff00 00000000 00000000 00000007
#interrupt-cells         00000001
slot-names               00000000
slave-only               00000000
clock-frequency          01fca055
bus-range                00000000 00000000
#size-cells              00000002
#address-cells           00000003
device_type              pci
name                     pci

ok dev /pci/isa
ok .properties
devsel-speed             00000001
class-code               ...
From: Benjamin Herrenschmidt
Date: Monday, November 29, 2010 - 1:44 pm

Right, tho we have been relaxing that on SoC for some time now, at least
on powerpc, since the PCI bus itself tends to hang off one of the SoC
internal busses (as a sibling of other busses) and that those tend to be
represented in the tree, so we make PCI be a child of that SoC bus.

This is also useful in the case where you have multiple SoCs (some are
capable of SMP interconnects) in which case you really have multiple
separate PCI busses and it's clearer to have each of them be the child

This is actually the case of most systems, tho those Atom SoCs are a bit
weird as, afaik, they don't really have PCI... they just simulate some
kind of PCI config space for on-chip devices, at least that's my
understanding.

Sebastian, do you have a block diagram of the SoC ? Following the actual
bus hierarchy of the chip might be the best approach.

Cheers,


--

From: Mitch Bradley
Date: Monday, November 29, 2010 - 2:32 pm

That seems fine to me.  It's not important that PCI be directly attached 
to the root bus.  I was mostly concerned about the relative positions of 
--

From: Alan Cox
Date: Monday, November 29, 2010 - 4:47 pm

This is true of a lot of devices on most "PCI" chipsets today. Even back
to things like the VIA K6 era chipsets with the V-Bus, or the
MediaGX/Geode where some of PCI space is a hallucination brought on by
SMM traps and BIOS upgrades have been known to add devices to the bus
which are not even neatly on their own sub-tree of any kind. Indeed some

That may not be wise. Your real bus hierarchy may not be architecturally
defined on some systems so you can't incorporate it into code, nor is it
necessarily a hierarchy - eg some of the Geodes.

Alan
--

From: Benjamin Herrenschmidt
Date: Monday, November 29, 2010 - 7:50 pm

Ok, so I'd suggest doing something like:

 - pci is below the corresponding atom node
 - isa is a child of pci

The latter is a useful representation even if it doesn't correspond to
reality. From an address representation perspective, ISA can be
considered somewhat as a subtractive decoding child of PCI (again even
if that's not 100% true), which simplifies the representation in the
device-tree a bit, and allows to still have things like VGA devices on
the PCI segment that decode IO ports in the ISA range.

Cheers,
Ben.


--

From: Sebastian Andrzej Siewior
Date: Tuesday, November 30, 2010 - 4:20 am

Sebastian
--

From: Alan Cox
Date: Monday, November 29, 2010 - 4:42 pm

That isn't strictly true either. On many PC devices the ISA bus (or LPC
bus nowadays) has no hierarchy as such because ISA cycles get issued if
the PCI cycles don't generate a response. In addition some cycles go to
both busses on some chipsets and there are various bits of magic so the
I/O spaces and particularly the memory spaces are intertwined.

So it's not a subordinate bus really, it's a bit weirder. PCMCIA is
probably a sub-bus when you've got a PCI/PCMCIA adapter but ISA in
general is a bit fuzzy.

And then there are systems like PA-RISC where there are multiple entire
PCI/ISA busses hung off the primary bus which is neither 8)

There are also various bits of "architectural" space which are on the
motherboard (traditionally 0x00-0xFF) but some of which are on the CPU in
some cases (Cyrix was 0x21/22 if I remember), and there are other
architectural spaces like the ELCR which are "magic".

The PC is alas to computer architecture what perl is to programming
languages.

Alan
--

From: H. Peter Anvin
Date: Tuesday, November 30, 2010 - 2:18 pm

Actually, it can go both ways -- there are ISA/LPC busses which are true
children of PCI busses -- in particular, are subject to the decoding
restrictions of the host bridge -- and there are those that aren't
logically even if they are physically.  The reason for this is that
subtractive decoding can be done either at the back end (as in a classic
PCI/ISA system with a single PCI bus) or at the front end (as in
HyperTransport for example.)

	-hpa
--

From: Sebastian Andrzej Siewior
Date: Tuesday, November 30, 2010 - 4:51 am

My PCI node does not have a reg property but its child nodes do.
In x86_of_pci_init() [0] I walk through the PCI child nodes, read the reg
property of each node which gives me the devfn of the device. I pass this
to pci_get_slot() and get the pci_dev struct where I attach the of_node. So
it can be used by the pci driver.

How do I create the PCI device <-> OF node mapping with this? I think your
isa node has this as well. 00008800 is the devfn and I simply forgot the
four blocks of zeros.
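
A sketch of how the first cell of a PCI "reg" entry (phys.hi in the OF PCI binding, laid out as npt000ss bbbbbbbb dddddfff rrrrrrrr) splits into bus, device and function. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of interest from a phys.hi cell (hypothetical struct). */
struct pci_unit {
	uint8_t bus;
	uint8_t dev;
	uint8_t fn;
	uint8_t reg;
};

/* Split phys.hi into its bus / device / function / register fields. */
static struct pci_unit decode_phys_hi(uint32_t hi)
{
	struct pci_unit u;

	u.bus = (hi >> 16) & 0xff;
	u.dev = (hi >> 11) & 0x1f;
	u.fn  = (hi >> 8) & 0x07;
	u.reg = hi & 0xff;
	return u;
}

/* Recombine device and function into the devfn byte. */
static uint8_t unit_devfn(struct pci_unit u)
{
	assert(u.dev < 32 && u.fn < 8);
	return (uint8_t)((u.dev << 3) | u.fn);
}
```

For the cell 00008800 this gives bus 0, device 17, function 0, i.e. devfn 0x88, which is the kind of value a pci_get_slot() style lookup takes.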

[0] http://lkml.org/lkml/2010/11/25/294

Sebastian
--

From: Benjamin Herrenschmidt
Date: Tuesday, November 30, 2010 - 1:31 pm

I think Mitch meant the PCI host bridge usually has a reg property in
the address space of the parent, used to identify the bridge uniquely if
you have multiple of them.

Cheers,
Ben.


--

From: David Gibson
Date: Monday, November 29, 2010 - 4:58 pm

Well, sort of.  More specifically, that plus it indicates that the bus
does not support any method of dynamic probing, so the device tree is
the *only* way to figure out what's on it.

--

From: Sebastian Andrzej Siewior
Date: Monday, November 29, 2010 - 12:36 pm

It is available on some Atom CPUs. This one does not support it. It has


Okay, so we want this for a quirk at a later point in time. Now I
understand.


Would "isa-bridge" be acceptable? So I don't have to add a new bus to the 

Okay, so I replace _ with - in ctrl_reg and freq_reg if this is your only

Yes. of_address_to_resource() will do the right thing in this case. It can
only be used after unflatten_device_tree() and I need this earlier.
Now using unflatten_device_tree() earlier isn't that easy, or is it?
I deferred the ioapic init a little, so it is now called from
x86_init.mpparse.get_smp_config() so I have alloc_bootmem() working.
So unflatten_device_tree() seems to work here. The ugly part comes now:
early_init_dt_alloc_memory_arch() expects u64 which works with
phys_to_virt() and the other way around. This isn't really the case with
what __alloc_bootmem() returns. This looks like phys_map to me. Since the dtb code
simply uses phys_to_virt() it doesn't really matter. So it works and I 

Sebastian
--

From: Benjamin Herrenschmidt
Date: Monday, November 29, 2010 - 1:14 pm

More precisely, if something has to depend on a specific
revision/errata/feature, in the future, it would be problematic to have
to modify the device-tree.

The "rule" for compatible is to be a list going from a reasonably
precise description of the specific device to the more generic

Just call it 'isa', as for device_type, we shouldn't need it.

The default "probe list" is crap. If you want to have platform devices
instantiated for the ISA devices from the device-tree, I'd rather you
explicitly do it from the architecture code. As Scott said, "isa"


You can probably do the unflattening way before alloc_bootmem is
available.

The unflattening does a first pass to scan for the size, so all you need
is a way to get a single contiguous chunk of memory, I'm sure x86 has
ways to provide that sort of thing really early before bootmem is

Yeah just __pa what alloc_bootmem returns but as I said, it should
probably be unflattened earlier than that.
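
The "size first, then one contiguous chunk" pattern can be sketched with a toy example. This is not the kernel's unflatten code; pack_names() is a hypothetical stand-in that only shows the dry-run-then-fill idea on a list of node names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Lay out n strings back to back. With buf == NULL this is a dry run
 * that only reports how many bytes are needed; with a real buffer it
 * performs the copy. Both passes return the same size.
 */
static size_t pack_names(const char *const *names, size_t n, char *buf)
{
	size_t off = 0, i;

	assert(names != NULL);
	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]) + 1;

		if (buf)
			memcpy(buf + off, names[i], len);
		off += len;
	}
	return off;
}
```

The caller runs the first pass with NULL, obtains one block of that size from whatever early allocator is available, and runs the second pass into it.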

Peter (CC) should be able to help finding the right spot/API there.

Cheers,

--

From: David Gibson
Date: Sunday, November 28, 2010 - 7:22 pm

In the PowerPC flat-tree world, the newly established convention is to
extend the generic names convention to cpu nodes, so we name the nodes
just "cpu@0" etc. and move the more specific cpu type ("PowerPC,970FX"
/ "x86,Atom" / whatever) to the compatible property.  I'd recommend
this convention to you, even though it's a bit of a break from earlier
standard practice, it makes device tree manipulations by bootloaders

Unless the existing x86 bindings specify something different, I'd
suggest the method we're planning to put into ePAPR 1.1 for PowerPC
chips.  That is, threads sharing an MMU go in the same cpu node, with
the individual thread numbers given as multiple entries in the "reg"
property.

--

From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

parse_setup_data() uses early_memremap() for a PAGE_SIZE mapping in
order to figure out the type & size. If this mapping is not large enough
then parse_e820_ext() will remap this area again via early_ioremap()
since the first mapping is still in use.

This patch attempts to simplify the handling and parse_e820_ext() does
not need to worry about the mapping anymore.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/include/asm/e820.h |    2 +-
 arch/x86/kernel/e820.c      |    8 +-------
 arch/x86/kernel/setup.c     |   11 +++++++++--
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 5be1542..e956492 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -93,7 +93,7 @@ extern void e820_setup_gap(void);
 extern int e820_search_gap(unsigned long *gapstart, unsigned long *gapsize,
 			unsigned long start_addr, unsigned long long end_addr);
 struct setup_data;
-extern void parse_e820_ext(struct setup_data *data, unsigned long pa_data);
+extern void parse_e820_ext(struct setup_data *data);
 
 #if defined(CONFIG_X86_64) || \
 	(defined(CONFIG_X86_32) && defined(CONFIG_HIBERNATION))
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 0c2b7ef..33f6361 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -666,21 +666,15 @@ __init void e820_setup_gap(void)
  * boot_params.e820_map, others are passed via SETUP_E820_EXT node of
  * linked list of struct setup_data, which is parsed here.
  */
-void __init parse_e820_ext(struct setup_data *sdata, unsigned long pa_data)
+void __init parse_e820_ext(struct setup_data *sdata)
 {
-	u32 map_len;
 	int entries;
 	struct e820entry *extmap;
 
 	entries = sdata->len / sizeof(struct e820entry);
-	map_len = sdata->len + sizeof(struct setup_data);
-	if (map_len > PAGE_SIZE)
-		sdata = ...
From: Sebastian Andrzej Siewior
Date: Wednesday, December 8, 2010 - 1:38 am

Nobody commented on this and I haven't seen it merged. Is it good to go?

Sebastian
--

From: Thomas Gleixner
Date: Wednesday, December 8, 2010 - 7:15 am

Yup. It's in my list of stuff to take care of :)

Thanks,

	tglx
--

From: H. Peter Anvin
Date: Wednesday, December 15, 2010 - 4:28 pm

I like the fact that this puts all the mapping in the same layer, but I
also think it's unfortunate to discard the optimization of always
mapping the minimum of <header length, rest of page>; your code will
*always* map-unmap-map the code, even in the (presumably very common)
case of the data fitting on a single page.

Furthermore, your code retains a minor bug from the original code, which
is that if the header is not page-aligned, it may needlessly map more
than one page with unknown content.

The proper way to do it is probably (this is pseudocode):

/* map the rest of the page, but at least the header */
maplen = max(PAGE_SIZE - (pa_data & ~PAGE_MASK),
             sizeof(struct setup_data));
data = early_memremap(pa_data, maplen);
len = data->len + sizeof(struct setup_data);
if (len > maplen) {
	/* too short: drop the mapping (by virtual address) and map it all */
	early_iounmap(data, maplen);
	maplen = len;
	data = early_memremap(pa_data, maplen);
}

/* ... */

early_iounmap(data, maplen);
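
The length arithmetic can be checked in isolation. This is a user-space sketch under the assumption of 4 KiB pages; the SKETCH_ macro and both helpers are hypothetical and only mirror the calculation, not the mapping calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Length of the first mapping: rest of the page, never less than the header. */
static size_t first_maplen(uint64_t pa_data, size_t hdr_size)
{
	size_t rest = SKETCH_PAGE_SIZE -
		      (size_t)(pa_data & (SKETCH_PAGE_SIZE - 1));

	assert(hdr_size > 0);
	return rest > hdr_size ? rest : hdr_size;
}

/* Nonzero when header plus payload exceed the first mapping. */
static int needs_remap(size_t maplen, size_t hdr_size, uint32_t payload_len)
{
	return (size_t)payload_len + hdr_size > maplen;
}
```

A page-aligned header gets a full page; a header starting 8 bytes before a page boundary still gets a mapping large enough to read its length field.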

I also found your patch description to be needlessly hard to follow.
The key point is that it puts all the map manipulation into
parse_setup_data() where it belongs.  Since you're changing an
interface, however, also do note that you have checked that there are no
other callers to parse_e820_ext().

	-hpa
--

From: Sebastian Andrzej Siewior
Date: Thursday, December 16, 2010 - 2:55 am

I just checked that early_memremap() maps the memory if it is not on a



Sebastian
--

From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

Set hpet_address based on information provided from the DTB

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/kernel/prom.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/prom.c b/arch/x86/kernel/prom.c
index 9551f2f..f61c541 100644
--- a/arch/x86/kernel/prom.c
+++ b/arch/x86/kernel/prom.c
@@ -9,6 +9,7 @@
 #include <linux/of_platform.h>
 #include <linux/slab.h>
 
+#include <asm/hpet.h>
 #include <asm/irq_controller.h>
 #include <asm/io_apic.h>
 
@@ -99,6 +100,27 @@ void __init add_dtb(u64 data)
 				offsetof(struct setup_data, data));
 }
 
+static int __init early_scan_hpet(unsigned long node, const char *uname,
+				   int depth, void *data)
+{
+	unsigned long l;
+	int ret;
+	__be32 *cell;
+
+	if (depth != 2)
+		return 0;
+
+	ret = of_flat_dt_is_compatible(node, "intel,hpet");
+	if (!ret)
+		return 0;
+
+	cell = of_get_flat_dt_prop(node, "phys_reg", &l);
+	if (!cell)
+		return 0;
+	hpet_address = of_read_ulong(cell, l / 4);
+	return 1;
+}
+
 static void __init of_lapic_setup(void)
 {
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -195,6 +217,7 @@ void __init x86_early_of_parse(void)
 
 	/* root level address cells */
 	of_scan_flat_dt(early_init_dt_scan_root, NULL);
+	of_scan_flat_dt(early_scan_hpet, NULL);
 	of_apic_setup();
 
 	early_iounmap(initial_boot_params, size);
-- 
1.7.3.2

--

From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:39 am

For now we probe these busses and we change it to board dependent probes
once we have to.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/kernel/prom.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/prom.c b/arch/x86/kernel/prom.c
index c02777d..bbd6064 100644
--- a/arch/x86/kernel/prom.c
+++ b/arch/x86/kernel/prom.c
@@ -102,6 +102,25 @@ void __init add_dtb(u64 data)
 				offsetof(struct setup_data, data));
 }
 
+/*
+ * CE4100 ids. Will be moved to machine_device_initcall() once we have it.
+ */
+static struct of_device_id __initdata ce4100_ids[] = {
+	{ .type = "soc", },
+	{ .compatible = "soc", },
+	{ .compatible = "simple-bus", },
+	{},
+};
+
+static int __init add_bus_probe(void)
+{
+	if (!initial_boot_params)
+		return 0;
+
+	return of_platform_bus_probe(NULL, ce4100_ids, NULL);
+}
+module_init(add_bus_probe);
+
 static int of_irq_map_pci(struct pci_dev *dev, struct of_irq *oirq)
 {
 	struct device_node *node;
-- 
1.7.3.2

--

From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:40 am

and hpet and a few other things....

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/platform/ce4100/ce4100.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/platform/ce4100/ce4100.c b/arch/x86/platform/ce4100/ce4100.c
index 0ede12b..5ed25df 100644
--- a/arch/x86/platform/ce4100/ce4100.c
+++ b/arch/x86/platform/ce4100/ce4100.c
@@ -13,7 +13,11 @@
 #include <linux/irq.h>
 #include <linux/module.h>
 
+#include <asm/prom.h>
 #include <asm/setup.h>
+#include <asm/io.h>
+#include <asm/i8259.h>
+#include <asm/io_apic.h>
 
 static int ce4100_i8042_detect(void)
 {
@@ -24,8 +28,11 @@ static void __init sdv_arch_setup(void)
 {
 }
 
-static void __init sdv_find_smp_config(void)
+static void __cpuinit sdv_pci_init(void)
 {
+	x86_of_pci_init();
+	/* We can't set this earlier, because we need calibrate the timer */
+	legacy_pic = &null_legacy_pic;
 }
 
 /*
@@ -38,5 +45,10 @@ void __init x86_ce4100_early_setup(void)
 	x86_platform.i8042_detect = ce4100_i8042_detect;
 	x86_init.resources.probe_roms = x86_init_noop;
 	x86_init.mpparse.get_smp_config = x86_init_uint_noop;
-	x86_init.mpparse.find_smp_config = sdv_find_smp_config;
+	x86_init.mpparse.find_smp_config = x86_early_of_parse;
+
+#ifdef CONFIG_X86_IO_APIC
+	x86_init.pci.init_irq = sdv_pci_init;
+	x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_apicid;
+#endif
 }
-- 
1.7.3.2

--

From: Sebastian Andrzej Siewior
Date: Thursday, November 25, 2010 - 10:40 am

This one goes through the registered IO-APICs and sets the id which the
core code is using.
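
The uniqueness check boils down to a claim-once bitmap. A stand-alone sketch follows; MAX_IDS is a hypothetical limit standing in for MAX_APICS, and the type and function names are invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_IDS 256u	/* hypothetical limit, stands in for MAX_APICS */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct id_set {
	unsigned long bits[(MAX_IDS + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

/* Claim an id. Returns 0 on success, -1 if out of range or already taken. */
static int id_set_claim(struct id_set *s, unsigned int id)
{
	unsigned long bit;

	assert(s != NULL);
	if (id >= MAX_IDS)	/* valid ids are 0 .. MAX_IDS - 1 */
		return -1;
	bit = 1ul << (id % BITS_PER_WORD);
	if (s->bits[id / BITS_PER_WORD] & bit)
		return -1;
	s->bits[id / BITS_PER_WORD] |= bit;
	return 0;
}
```

Note the range check uses >=, since a bitmap of MAX_IDS bits has no bit numbered MAX_IDS.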

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/include/asm/io_apic.h |    1 +
 arch/x86/kernel/apic/io_apic.c |   44 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index dc1169f..c920657 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -170,6 +170,7 @@ extern int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries);
 
 extern int get_nr_irqs_gsi(void);
 extern void setup_ioapic_ids_from_mpc(void);
+void setup_ioapic_ids_from_apicid(void);
 
 struct mp_ioapic_gsi{
 	u32 gsi_base;
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 27a5709..74cfe9b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2047,6 +2047,50 @@ void __init setup_ioapic_ids_from_mpc(void)
 			apic_printk(APIC_VERBOSE, " ok.\n");
 	}
 }
+/*
+ * We assume here that the ids in mp_ioapics are correct but not yet
+ * written to the ioapic. While doing so we verify that those ids are
+ * unique.
+ */
+static __initdata DECLARE_BITMAP(apic_id_mask, MAX_APICS);
+void __init setup_ioapic_ids_from_apicid(void)
+{
+	union IO_APIC_reg_00 reg_00;
+	int apic_id;
+	unsigned long flags;
+
+	for (apic_id = 0; apic_id < nr_ioapics; apic_id++) {
+
+		if (mp_ioapics[apic_id].apicid >= MAX_APICS) {
+			WARN_ON(1);
+			continue;
+		}
+
+		if (test_bit(mp_ioapics[apic_id].apicid, apic_id_mask)) {
+			WARN_ON(1);
+			continue;
+		}
+
+		set_bit(mp_ioapics[apic_id].apicid, apic_id_mask);
+
+		raw_spin_lock_irqsave(&ioapic_lock, flags);
+		reg_00.raw = io_apic_read(apic_id, 0);
+		raw_spin_unlock_irqrestore(&ioapic_lock, flags);
+
+		if (reg_00.bits.ID == ...
From: Yinghai Lu
Date: Thursday, November 25, 2010 - 2:04 pm

On Thu, Nov 25, 2010 at 9:40 AM, Sebastian Andrzej Siewior

can you update and split setup_ioapic_ids_from_mpc() for your use?

Thanks

Yinghai
--

From: Sebastian Andrzej Siewior
Date: Friday, November 26, 2010 - 4:03 am

Sebastian
--

From: Sebastian Andrzej Siewior
Date: Friday, November 26, 2010 - 9:50 am

I think that I need this function to write the APIC id back to the chip
as it is not initialized at all. apic_version[boot_cpu_physical_apicid]
returns 0x14 and therefore it is skipped. The manual on the other hand
says that these ids should be unique.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: x86@kernel.org
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
---
 arch/x86/include/asm/io_apic.h |    1 +
 arch/x86/kernel/apic/io_apic.c |   28 ++++++++++++++++------------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index dc1169f..185d484 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -170,6 +170,7 @@ extern int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries);
 
 extern int get_nr_irqs_gsi(void);
 extern void setup_ioapic_ids_from_mpc(void);
+void setup_ioapic_ids_from_mpc_nocheck(void);
 
 struct mp_ioapic_gsi{
 	u32 gsi_base;
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 27a5709..56c45ab 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1939,8 +1939,7 @@ void disable_IO_APIC(void)
  *
  * by Matt Domsch <Matt_Domsch@dell.com>  Tue Dec 21 12:25:05 CST 1999
  */
-
-void __init setup_ioapic_ids_from_mpc(void)
+void __init setup_ioapic_ids_from_mpc_nocheck(void)
 {
 	union IO_APIC_reg_00 reg_00;
 	physid_mask_t phys_id_present_map;
@@ -1949,15 +1948,6 @@ void __init setup_ioapic_ids_from_mpc(void)
 	unsigned char old_id;
 	unsigned long flags;
 
-	if (acpi_ioapic)
-		return;
-	/*
-	 * Don't check I/O APIC IDs for xAPIC systems.  They have
-	 * no meaning without the serial APIC bus.
-	 */
-	if (!(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
-		|| APIC_XAPIC(apic_version[boot_cpu_physical_apicid]))
-		return;
 	/*
 	 * This is broken; anything with a real cpu count has to
 	 * circumvent this idiocy regardless.
@@ ...
From: tip-bot for Sebastian Andrzej Siewior
Date: Monday, December 6, 2010 - 6:33 am

Commit-ID:  a38c5380ef9f088be9f49b6e4c5d80af8b1b5cd4
Gitweb:     http://git.kernel.org/tip/a38c5380ef9f088be9f49b6e4c5d80af8b1b5cd4
Author:     Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate: Fri, 26 Nov 2010 17:50:20 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 6 Dec 2010 14:30:28 +0100

x86: io_apic: Split setup_ioapic_ids_from_mpc()

Sodaville needs to setup the IO_APIC ids as the boot loader leaves
them uninitialized. Split out the setter function so it can be called
unconditionally from the sodaville board code.
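
The diff below is truncated before the remaining wrapper is shown, so the
following is only a user-space sketch of the general shape of such a split,
not the committed code. All model_* names and the struct are hypothetical
stand-ins for kernel state; the 0x14 threshold is assumed from APIC_XAPIC().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real variables. */
struct model_platform {
	bool acpi_ioapic;       /* IO-APICs were described by ACPI */
	bool intel_vendor;      /* boot CPU vendor is Intel */
	unsigned int apic_ver;  /* version of the boot CPU's local APIC */
	int id_writes;          /* how often the setter ran */
};

/* The setter: runs whenever it is called, with no policy checks. */
static void model_setup_ids_nocheck(struct model_platform *p)
{
	p->id_writes++;
}

/* The checked entry point: keeps the early returns, then delegates. */
static void model_setup_ids(struct model_platform *p)
{
	if (p->acpi_ioapic)
		return;
	if (!p->intel_vendor || p->apic_ver >= 0x14)
		return;
	model_setup_ids_nocheck(p);
}
```

Generic code would keep calling the checked entry point, while board code
that knows its IDs are uninitialized calls the _nocheck variant directly.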

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101126165020.GA26361@www.tglx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/io_apic.h |    1 +
 arch/x86/kernel/apic/io_apic.c |   28 ++++++++++++++++------------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 240a0a5..d7d46cb 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -169,6 +169,7 @@ extern void mask_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries);
 extern int restore_IO_APIC_setup(struct IO_APIC_route_entry **ioapic_entries);
 
 extern void setup_ioapic_ids_from_mpc(void);
+extern void setup_ioapic_ids_from_mpc_nocheck(void);
 
 struct mp_ioapic_gsi{
 	u32 gsi_base;
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ce3c6fb..4f026a6 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1934,8 +1934,7 @@ void disable_IO_APIC(void)
  *
  * by Matt Domsch <Matt_Domsch@dell.com>  Tue Dec 21 12:25:05 CST 1999
  */
-
-void __init setup_ioapic_ids_from_mpc(void)
+void __init setup_ioapic_ids_from_mpc_nocheck(void)
 {
 	union IO_APIC_reg_00 reg_00;
 	physid_mask_t phys_id_present_map;
@@ -1944,15 +1943,6 @@ void __init setup_ioapic_ids_from_mpc(void)
 ...
From: Yinghai Lu
Date: Tuesday, December 7, 2010 - 1:59 am

For the 32bit mptable path, setup_ids_from_mpc() always writes the io apic id
register, even if no change is needed.

So only do the write when the value read back differs from the mptable one.
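
As a rough user-space sketch of the idea (compare first, write only on
mismatch): struct model_ioapic and model_update_id() are hypothetical names
for illustration and do not correspond to kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one IO-APIC ID register; not kernel code. */
struct model_ioapic {
	uint8_t id;      /* current contents of the ID field */
	int writes;      /* number of register writes performed */
};

/*
 * Write wanted_id only when it differs from what was read back.
 * Returns true if a write happened.
 */
static bool model_update_id(struct model_ioapic *io, uint8_t wanted_id)
{
	if (io->id == wanted_id)
		return false;   /* readout matches the table: nothing to do */

	io->id = wanted_id;
	io->writes++;
	return true;
}
```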

-v2: update to recent setup_ioapic_ids_from_mpc() split..

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/apic/io_apic.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
+++ linux-2.6/arch/x86/kernel/apic/io_apic.c
@@ -2006,9 +2006,12 @@ void __init setup_ioapic_ids_from_mpc_no
 						= mp_ioapics[apic_id].apicid;
 
 		/*
-		 * Read the right value from the MPC table and
-		 * write it into the ID register.
+		 * Update the ID register according to the right value from
+		 *  the MPC table if they are different.
 		 */
+		if (mp_ioapics[apic_id].apicid == reg_00.bits.ID)
+			continue;
+
 		apic_printk(APIC_VERBOSE, KERN_INFO
 			"...changing IO-APIC physical APIC ID to %d ...",
 			mp_ioapics[apic_id].apicid);
--

From: tip-bot for Yinghai Lu
Date: Thursday, December 9, 2010 - 1:56 pm

Commit-ID:  60d79fd99ff3b9c692b260a4d53a203f537c052a
Gitweb:     http://git.kernel.org/tip/60d79fd99ff3b9c692b260a4d53a203f537c052a
Author:     Yinghai Lu <yinghai@kernel.org>
AuthorDate: Tue, 7 Dec 2010 00:59:49 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 9 Dec 2010 21:52:05 +0100

x86, ioapic: Avoid writing io_apic id if already correct

For 32bit mptable path, setup_ids_from_mpc() always writes the io_apic
id register, even if there is no change needed.

Skip the write, when readout and mptable match.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
LKML-Reference: <4CFDF785.7010401@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/apic/io_apic.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 4abf08a..8a02150 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2007,9 +2007,12 @@ void __init setup_ioapic_ids_from_mpc_nocheck(void)
 						= mp_ioapics[apic_id].apicid;
 
 		/*
-		 * Read the right value from the MPC table and
-		 * write it into the ID register.
+		 * Update the ID register according to the right value
+		 * from the MPC table if they are different.
 		 */
+		if (mp_ioapics[apic_id].apicid == reg_00.bits.ID)
+			continue;
+
 		apic_printk(APIC_VERBOSE, KERN_INFO
 			"...changing IO-APIC physical APIC ID to %d ...",
 			mp_ioapics[apic_id].apicid);
--
