Re: 2.6.25-mm1

To: <linux-kernel@...>
Subject: 2.6.25-mm1
Date: Friday, April 18, 2008 - 4:47 am

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25/2.6.25-mm1/ 

- git-xfs is undropped because I finally got around to fixing its clashes
  with git-vfs.

- git-arm-master, git-sparc64 and perhaps others are dropped because they
  don't generate a clean pull.  They might be empty - I didn't check.

- git-kvm remains dropped due to clashes with git-s390 and perhaps git-x86.

- git-selinux is newly dropped due to memory corruption regressions.

- git-nfs is (perhaps permanently) dropped because its content is also in
  git-nfsd.

- git-drm remains reverted due to build failures

- Tomorrow I'll do the -mm merge plans email and I'll dump a couple hundred
  patches on tree maintainers (these have about a 15% yay-he-merged-it rate).

  Then I'm travelling for a poorly-timed week.  I return late in the merge
  window to find out if any of these patches still apply.



Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in ...
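
The "around ten rebuilds and reboots" figure in the boilerplate above follows from the halving nature of a bisection search: ten halvings are enough to single out one patch among up to 1024. A minimal user-space sketch of the idea follows. The names `bisect_first_bad` and `bug_present_after` are made up for illustration, and the predicate stands in for "build, boot and test a tree with the first `count` patches applied".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if a tree with the first `count`
 * patches applied shows the bug.  ctx points at the culprit's index. */
int bug_present_after(size_t count, void *ctx)
{
    size_t culprit = *(size_t *)ctx;
    return count > culprit;
}

/* Find the 0-based index of the patch that introduces the bug.
 * Assumes: applying no patches is good, applying all n is bad.
 * *steps receives the number of build-and-test rounds needed. */
size_t bisect_first_bad(size_t n, int (*shows_bug)(size_t, void *),
                        void *ctx, size_t *steps)
{
    size_t good = 0;    /* this many patches applied: known good */
    size_t bad = n;     /* this many patches applied: known bad */

    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        ++*steps;
        if (shows_bug(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad - 1;
}
```

With n = 1024 the loop always runs exactly ten times, which is where the cost estimate comes from.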
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Al Viro <viro@...>, <linux-fsdevel@...>
Date: Monday, April 21, 2008 - 4:31 am

Hi,

$ ls /usr/share/man/cat3readlin
Segmentation fault

[the file doesn't exist.]
This is probably the same bug as in -rc8-mm2 I reported here:
http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/9008289.html

general protection fault: 0000 [1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/net/eth0/statistics/collisions
CPU 0
Modules linked in: test ipv6 tun bitrev arc4 ecb crypto_blkcipher cryptomgr 
crypto_algapi ath5k mac80211 crc32 sr_mod usbhid ohci1394 rtc_cmos hid rtc_core 
cfg80211 ieee1394 cdrom ehci_hcd rtc_lib ff_memless floppy evdev
Pid: 24838, comm: man Not tainted 2.6.25-mm1_64 #403
RIP: 0010:[<ffffffff802aca27>]  [<ffffffff802aca27>] __d_lookup+0x97/0x160
RSP: 0018:ffff8100337d1b98  EFLAGS: 00010206
RAX: 00f0000000000000 RBX: 00f0000000000000 RCX: 0000000000000012
RDX: ffff8100200830e0 RSI: ffff8100337d1ca8 RDI: ffff810079195708
RBP: ffff8100337d1bf8 R08: ffff8100337d1ca8 R09: 0000000000000000
R10: 000000000000013d R11: 0000000000000246 R12: ffff8100200830c8
R13: 00000000198eaed5 R14: ffff810079195708 R15: ffff8100337d1bc8
FS:  00007f447b5c06f0(0000) GS:ffffffff80664000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001484f88 CR3: 000000005fac4000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process man (pid: 24838, threadinfo ffff8100337d0000, task ffff810034418000)
Stack:  ffff8100337d1ca8 000000000000000b ffff810079195710 0000000b792561a0
  ffff81003136600f ffffffff802f9073 00f0000000000000 0000000000000001
  ffff8100337d1e48 ffff8100337d1e48 ffff8100337d1ca8 ffff8100337d1cb8
Call Trace:
  [<ffffffff802f9073>] ? ext3_lookup+0xc3/0x100
  [<ffffffff802a1e85>] do_lookup+0x35/0x220
  [<ffffffff802a22c2>] __link_path_walk+0x252/0x1010
  [<ffffffff802b20ba>] ? mntput_no_expire+0x2a/0x140
  [<ffffffff802a30ee>] path_walk+0x6e/0xe0
  [<ffffff...
To: Jiri Slaby <jirislaby@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, April 21, 2008 - 5:06 am

On Mon, Apr 21, 2008 at 10:31:40AM +0200, Jiri Slaby wrote:

        hlist_for_each_entry_rcu(dentry, node, head, d_hash) {
                struct qstr *qstr;

                if (dentry->d_name.hash != hash)
                        continue;

walking into node == (struct hlist_node *)0x00f0000000000000...
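
As background on why this appears as a general protection fault rather than an ordinary page fault: on x86-64 with 48-bit virtual addresses, bits 63..47 of a pointer must all be equal, and dereferencing a pointer that breaks this rule raises #GP. The value 0x00f0000000000000 breaks it. A minimal user-space sketch of the check follows. The helper name `is_canonical_x86_64` is made up for illustration, and 48-bit addressing is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is canonical under 48-bit x86-64 addressing,
 * i.e. bits 63..47 are all zeros or all ones. */
int is_canonical_x86_64(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the upper 17 bits */

    return top == 0 || top == 0x1FFFF;
}
```

The bad node value from the oops (it is also in RAX and RBX) fails this check, while the kernel stack pointer and the user-space FS base from the same dump pass it.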

--
To: Al Viro <viro@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, April 21, 2008 - 5:37 am

Yup, true. In the last oops I got stuck on the memcmp a few lines below.

BTW, it's 100% reproducible after it happens once, but fixable by reboot. Are there any 
tests I should run (memtest, some printks stuck in somewhere)?
--
To: Jiri Slaby <jirislaby@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, April 21, 2008 - 5:45 am

Well, if the list has such a turd in it, you'll certainly hit it every time
you walk that list, so 100% reproducible is not surprising.

How well is it reproducible from fresh boot?
--
To: Al Viro <viro@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, April 21, 2008 - 5:59 am

It takes a few days with suspend/resume cycles. This one was booted 12 hours ago, with one 
suspend/resume. Will keep an eye on it and keep you informed.
--
To: Jiri Slaby <jirislaby@...>
Cc: Al Viro <viro@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, April 21, 2008 - 1:23 pm

Shall we see if we can catch it earlier?  I have no idea if this will
help ... I haven't even booted it on a testmachine yet ;-)  If I got
something wrong, it'll BUG() pretty early.

diff --git a/include/linux/list.h b/include/linux/list.h
index 75ce2cb..238ca1e 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -724,10 +724,17 @@ static inline int hlist_empty(const struct hlist_head *h)
 	return !h->first;
 }
 
+#ifdef CONFIG_DEBUG_LIST
+extern void hlist_check(struct hlist_node *n);
+#else
+#define hlist_check(n)		do { } while (0)
+#endif
+
 static inline void __hlist_del(struct hlist_node *n)
 {
 	struct hlist_node *next = n->next;
 	struct hlist_node **pprev = n->pprev;
+	hlist_check(n);
 	*pprev = next;
 	if (next)
 		next->pprev = pprev;
@@ -785,6 +792,7 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
 {
 	struct hlist_node *next = old->next;
 
+	hlist_check(old);
 	new->next = next;
 	new->pprev = old->pprev;
 	smp_wmb();
@@ -840,6 +848,7 @@ static inline void hlist_add_head_rcu(struct hlist_node *n,
 static inline void hlist_add_before(struct hlist_node *n,
 					struct hlist_node *next)
 {
+	hlist_check(next);
 	n->pprev = next->pprev;
 	n->next = next;
 	next->pprev = &n->next;
@@ -849,6 +858,7 @@ static inline void hlist_add_before(struct hlist_node *n,
 static inline void hlist_add_after(struct hlist_node *n,
 					struct hlist_node *next)
 {
+	hlist_check(next);
 	next->next = n->next;
 	n->next = next;
 	next->pprev = &n->next;
@@ -878,6 +888,7 @@ static inline void hlist_add_after(struct hlist_node *n,
 static inline void hlist_add_before_rcu(struct hlist_node *n,
 					struct hlist_node *next)
 {
+	hlist_check(next);
 	n->pprev = next->pprev;
 	n->next = next;
 	smp_wmb();
@@ -906,6 +917,7 @@ static inline void hlist_add_before_rcu(struct hlist_node *n,
 static inline void hlist_add_after_rcu(struct hlist_node *prev,
 				       struct hl...
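
The body of hlist_check() is cut off in the quoted patch, so what it actually verifies is not shown here. As a guess at the kind of invariant such a debug hook could test, the user-space sketch below checks that a node's back-pointer and its successor's back-pointer agree with the hlist layout. The struct definitions mirror the kernel's, but `hlist_node_consistent` and this `hlist_add_head` are illustrative stand-ins, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Insert n at the front of the list headed by h. */
void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Returns 1 if n's links are self-consistent:
 *  - whatever points at n (via pprev) really holds n, and
 *  - n's successor, if any, points back at n->next. */
int hlist_node_consistent(struct hlist_node *n)
{
    if (!n->pprev || *n->pprev != n)
        return 0;
    if (n->next && n->next->pprev != &n->next)
        return 0;
    return 1;
}
```

A check of this shape only catches corruption when a list operation touches a damaged node or its neighbour, so it narrows down where the damage happened without identifying the code that caused it.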
To: Jiri Slaby <jirislaby@...>
Cc: Al Viro <viro@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <linux-fsdevel@...>
Date: Monday, April 21, 2008 - 9:42 am

I think that's exactly the same problem I reported here:
http://lkml.org/lkml/2008/4/20/182
for 2.6.25-git2, so it hit the mainline and seems to be related to RCU.

Thanks,
Rafael
--
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Ingo Molnar <mingo@...>, <linux-mm@...>
Date: Sunday, April 20, 2008 - 7:29 am

Hi, I'm not sure what caused this.

LANG=en strace -fo strace_gcc.txt  gcc -Wp,-MD,drivers/usb/class/.usblp.o.d 
-nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -D__KERNEL__ 
-Iinclude -Iinclude2 -I/home/l/latest/xxx/include -include 
include/linux/autoconf.h -I/home/l/latest/xxx/drivers/usb/class 
-Idrivers/usb/class -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs 
-fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -O2 
-fno-stack-protector -m64 -march=core2 -mno-red-zone -mcmodel=kernel 
-funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 
-DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare 
-fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow 
-I/home/l/latest/xxx/include/asm-x86/mach-default -Iinclude/asm-x86/mach-default 
-fno-omit-frame-pointer -fno-optimize-sibling-calls -g 
-Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" 
-D"KBUILD_BASENAME=KBUILD_STR(usblp)"  -D"KBUILD_MODNAME=KBUILD_STR(usblp)" 
/home/l/latest/xxx/drivers/usb/class/usblp.c -S -o usblp.s
/home/l/latest/xxx/drivers/usb/class/usblp.c: In function 'usblp_submit_read':
/home/l/latest/xxx/drivers/usb/class/usblp.c:977: internal compiler error: 
Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugs.opensuse.org/> for instructions.




strace_gcc.txt:
http://www.fi.muni.cz/~xslaby/sklad/strace_gcc.txt

preprocessor output available here:
http://www.fi.muni.cz/~xslaby/sklad/usblp.E

A reboot fixed it. It happened after a few suspend/resume cycles. The preprocessor output 
is identical to the output after the reboot. Now, the strace looks like:
5341  mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
= 0x7f362e004000
5341  mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
0) = 0x7f362df04000
5341  brk(0x1964000)                    = 0x1964000
5341  brk(0x194c000)                    = 0x194c000
5341  brk(0x...
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Friday, April 18, 2008 - 11:10 pm

New in 2.6.25-mm1 is a hang I'm seeing, just after the kernel prints:

"[    0.160375] NET: Registered protocol family 16"

The hang lasts about five minutes, and then boot continues.  Just
after that, a backtrace is printed; I don't know if it's related.  The
backtrace will follow.

This does not occur in mainline.  It seems it might be related to OLPC
support -- I enabled all those options -- but that's not good
behavior, and I see no warning of this in the help.

I'm sending a number of reports against 2.6.25-mm1, so I've put my
dmesg and .config on a server:

http://home.columbus.rr.com/jfannin3/dmesg.txt
http://home.columbus.rr.com/jfannin3/config-2.6.25-mm1.txt

[    0.160375] NET: Registered protocol family 16
[  400.782683] ------------[ cut here ]------------
[  400.782832] WARNING: at arch/x86/mm/ioremap.c:158 __ioremap_caller+0x27d/0x2e0()
[  400.783022] Modules linked in:
[  400.783169] Pid: 1, comm: swapper Not tainted 2.6.25-mm1 #7
[  400.783300]  [<c0130fa9>] warn_on_slowpath+0x59/0x80
[  400.783480]  [<c0106c2e>] ? profile_pc+0x3e/0x50
[  400.783682]  [<c01374ee>] ? irq_exit+0x4e/0xa0
[  400.783879]  [<c0115aec>] ? smp_apic_timer_interrupt+0x5c/0x90
[  400.784087]  [<c024314c>] ? trace_hardirqs_on_thunk+0xc/0x10
[  400.784298]  [<c01552cd>] ? trace_hardirqs_on_caller+0xcd/0x150
[  400.784506]  [<c024314c>] ? trace_hardirqs_on_thunk+0xc/0x10
[  400.784706]  [<c010416c>] ? restore_nocheck_notrace+0x0/0xe
[  400.784906]  [<c011d0e6>] ? page_is_ram+0xa6/0xd0
[  400.785059]  [<c011d4ed>] __ioremap_caller+0x27d/0x2e0
[  400.785221]  [<c03569d8>] ? _spin_unlock_irqrestore+0x48/0x80
[  400.785421]  [<c017f4cd>] ? ftrace_record_ip+0x7d/0x250
[  400.785621]  [<c0474801>] ? olpc_init+0x31/0x140
[  400.785817]  [<c011d59f>] ioremap_nocache+0x1f/0x30
[  400.785976]  [<c0474801>] ? olpc_init+0x31/0x140
[  400.786165]  [<c0474801>] olpc_init+0x31/0x140
[  400.786318]  [<c04...
To: Joseph Fannin <jfannin@...>
Cc: <linux-kernel@...>, Andres Salomon <dilinger@...>, Ingo Molnar <mingo@...>
Date: Friday, April 18, 2008 - 11:29 pm

Please add initcall_debug to the kernel boot command line - that should

<looks at this again>

That's

                WARN_ON_ONCE(is_ram);

the changelog for the patch which added that warning is information-free
and there's no code comment explaining what went wrong, which makes things
rather harder than they ought to be.

Yes it's due to the new OLPC code.  olpc_init() has

	romsig = ioremap(0xffffffc0, 16);

which we probably just shouldn't do at all unless we're running on the
OLPC hardware.  But we need to do this to find out if we're running on the OLPC
hardware!  Perhaps the warning should just be removed.
--
To: Andrew Morton <akpm@...>
Cc: Joseph Fannin <jfannin@...>, <linux-kernel@...>, Andres Salomon <dilinger@...>, Ingo Molnar <mingo@...>
Date: Saturday, April 19, 2008 - 2:21 pm

On Fri, 18 Apr 2008 20:29:25 -0700


calling ioremap() on something which COULD be ram is... REALLY nasty.
The kernel has to mark that page uncached, for all users and mappings of that memory.
A second hard case then is to find out when the last ioremap() user has
released that memory (since there's several cases where different parts of the same
4K page can be ioremapped) before it can map it cached again. The good news is that
until this olpc patch got in, there were no users of this capability....
Instead of outright forbidding it though we added a warn_on to find out if the
assumption of no users was correct... 
seems it caught some new code which is trying to do this here.

this code should probably be a lot more careful and check that
1) there is no actual kernel memory or something else at this region
   (what if there's some other device there? this code could blow up)
2) the machine won't triple fault or otherwise throw tantrums if
   this hardcoded value is accessed (not automatic on x86!!)
3) it only runs if there's a really high degree of confidence that this really is
   an OLPC device.
or maybe
4) get this address from some other table or system provided resource




-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Andrew Morton <akpm@...>
Cc: Joseph Fannin <jfannin@...>, <linux-kernel@...>, Ingo Molnar <mingo@...>, <jordan.crouse@...>
Date: Saturday, April 19, 2008 - 9:25 am

On Fri, 18 Apr 2008 20:29:25 -0700

Hm.  We could either protect that code with an:

if (!is_geode())
  return;

Or I could add the OpenFirmware patches which would allow us to get
rid of this code, and instead check for the existence of OFW using
that.

The former is quick and easy; the latter is (imo) nicer, so long as
people don't have problems w/ the OFW code.  :)


-- 
Need a kernel or Debian developer?  Contact me, I'm looking for contracts.
--
To: Andres Salomon <dilinger@...>
Cc: <jfannin@...>, <linux-kernel@...>, <mingo@...>, <jordan.crouse@...>
Date: Saturday, April 19, 2008 - 1:38 pm

Do both ;)

The quick-n-easy version sounds suitable for now.
--
To: Andrew Morton <akpm@...>
Cc: <jfannin@...>, <linux-kernel@...>, <mingo@...>, <jordan.crouse@...>
Date: Saturday, April 19, 2008 - 1:50 pm

On Sat, 19 Apr 2008 10:38:33 -0700

Heh, I already had sent the nicer version.  If people have some fundamental
problem w/ it, I can send the quick-n-easy version.


-- 
Need a kernel or Debian developer?  Contact me, I'm looking for contracts.
--
To: Andres Salomon <dilinger@...>
Cc: Andrew Morton <akpm@...>, <jfannin@...>, <linux-kernel@...>, <mingo@...>
Date: Monday, April 21, 2008 - 10:56 am

I prefer the nicer version.  It is not a good policy IMHO to wrap OLPC
specific code with is_geode() and friends.  Even by Geode standards, we've
abused the code greatly for the benefit of the Geode, and few of those
abuses would translate very well even to the general Geode community.  I 
would prefer that we use the is_olpc() and #ifdef wrappers to ensure
that the code that is exclusively OLPC stays exclusively OLPC.

Thanks,
Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.

--
To: Jordan Crouse <jordan.crouse@...>
Cc: Andrew Morton <akpm@...>, <jfannin@...>, <linux-kernel@...>, <mingo@...>
Date: Monday, April 21, 2008 - 11:05 am

On Mon, 21 Apr 2008 08:56:19 -0600

Yeah, like I said; the nicer version is the _correct_ way to do things.  I
just fear that the OFW code isn't ready for merging (see hpa's concerns).

The code is already #ifdef'd (the original reporter had enabled
CONFIG_OLPC), and the code in question is what determines what is_olpc()
should return.  is_geode() is just to narrow the scope of what hardware
the check runs on.




-- 
Need a kernel or Debian developer?  Contact me, I'm looking for contracts.
--
To: Andres Salomon <dilinger@...>
Cc: Andrew Morton <akpm@...>, <jfannin@...>, <linux-kernel@...>, <mingo@...>
Date: Monday, April 21, 2008 - 11:12 am

My bad, I missed the key points.  This still is dangerous on a generic
Geode, but at least if they encounter the problem, we can loudly proclaim
"Don't do that".

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.

--
To: Andrew Morton <akpm@...>
Cc: Joseph Fannin <jfannin@...>, <linux-kernel@...>, Ingo Molnar <mingo@...>, <jordan.crouse@...>, Mitch Bradley <wmb@...>
Date: Saturday, April 19, 2008 - 1:39 pm

Prior to including OFW kernel support, we had to work around the lack of
OFW.  Once OFW support is added, we can switch to using it.  This cleans
up some pre-OFW model detection and OFW signature detection.

Note: this should be a bit nicer to non-OLPC hardware.

Signed-off-by: Andres Salomon <dilinger@debian.org>
---
 arch/x86/kernel/olpc.c |   43 +++++++++++++++++++++++++++++--------------
 1 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/olpc.c b/arch/x86/kernel/olpc.c
index 11670be..3a05683 100644
--- a/arch/x86/kernel/olpc.c
+++ b/arch/x86/kernel/olpc.c
@@ -190,11 +190,11 @@ EXPORT_SYMBOL_GPL(olpc_ec_cmd);
 static void __init platform_detect(void)
 {
 	size_t propsize;
-	u32 rev;
+	uint32_t rev;
 
 	if (ofw("getprop", 4, 1, NULL, "board-revision-int", &rev, 4,
 			&propsize) || propsize != 4) {
-		printk(KERN_ERR "ofw: getprop call failed!\n");
+		printk(KERN_ERR "olpc:  ofw getprop call failed!\n");
 		rev = 0;
 	}
 	olpc_platform_info.boardrev = be32_to_cpu(rev);
@@ -207,26 +207,43 @@ static void __init platform_detect(void)
 }
 #endif
 
-static int __init olpc_init(void)
+static int __init ofw_detect(void)
 {
-	unsigned char *romsig;
+	size_t propsize;
+	char romsig[20];
+	ofw_phandle phandle;
 
-	spin_lock_init(&ec_lock);
+	/* Fetch /openprom/model */
+	if (ofw("finddevice", 1, 1, "/openprom", &phandle) || phandle == ~0)
+		return -ENODEV;
 
-	romsig = ioremap(0xffffffc0, 16);
-	if (!romsig)
-		return 0;
+	if (ofw("getprop", 4, 1, phandle, "model", &romsig, sizeof(romsig),
+			&propsize) || propsize < 7)
+		return -ENODEV;
+		return -ENODEV;
 
+	/* String should look something like "CL1   Q2D08  Q2D" */
 	if (strncmp(romsig, "CL1   Q", 7))
-		goto unmap;
+		return -ENODEV;
 	if (strncmp(romsig+6, romsig+13, 3)) {
-		printk(KERN_INFO "OLPC BIOS signature looks invalid.  "
+		printk(KERN_INFO "olpc:  BIOS signature looks invalid.  "
 				"Assuming not OLPC\n");
-		goto unmap;
+		return -ENODEV;
 	...
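
The two strncmp() calls in the hunk above encode a simple rule: the signature must begin with "CL1   Q", and the three characters at offset 6 must repeat at offset 13, as they do in the example string "CL1   Q2D08  Q2D" from the patch comment. A user-space sketch of the same test follows. The function name `olpc_sig_looks_valid`, the explicit length argument and the 16-byte minimum are assumptions added so the sketch never reads past its buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OLPC_SIG_LEN 16 /* offsets 0..15 are inspected below */

/* Returns 1 if sig looks like an OLPC firmware signature. */
int olpc_sig_looks_valid(const char *sig, size_t len)
{
    if (len < OLPC_SIG_LEN)
        return 0;
    /* Fixed prefix, e.g. "CL1   Q" */
    if (strncmp(sig, "CL1   Q", 7) != 0)
        return 0;
    /* The "Q2D"-style tag at offset 6 must repeat at offset 13. */
    if (strncmp(sig + 6, sig + 13, 3) != 0)
        return 0;
    return 1;
}
```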
To: Andrew Morton <akpm@...>
Cc: Joseph Fannin <jfannin@...>, <linux-kernel@...>, Ingo Molnar <mingo@...>, <jordan.crouse@...>, Mitch Bradley <wmb@...>
Date: Saturday, April 19, 2008 - 1:39 pm

This adds 32-bit support for calling into OFW from the kernel.  It's useful
for querying the firmware for misc hardware information, fetching the device
tree, etc.

There's potentially no reason why other platforms couldn't use this, but
currently OLPC is the main user of it.

This work was originally done by Mitch Bradley.

Signed-off-by: Andres Salomon <dilinger@debian.org>
---
 arch/x86/Kconfig          |    8 +++++
 arch/x86/kernel/Makefile  |    1 +
 arch/x86/kernel/head_32.S |   27 ++++++++++++++++
 arch/x86/kernel/ofw.c     |   75 +++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/ofw.h     |   50 ++++++++++++++++++++++++++++++
 include/asm-x86/setup.h   |    1 +
 6 files changed, 162 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/ofw.c
 create mode 100644 include/asm-x86/ofw.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3b9089b..ce56105 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -661,6 +661,14 @@ config I8K
 	  Say Y if you intend to run this kernel on a Dell Inspiron 8000.
 	  Say N otherwise.
 
+config OPEN_FIRMWARE
+	bool "Support for Open Firmware"
+	default y if OLPC
+	---help---
+	  This option adds support for the implementation of Open Firmware
+	  that is used on the OLPC XO laptop.
+	  If unsure, say N here.
+
 config X86_REBOOTFIXUPS
 	def_bool n
 	prompt "Enable X86 board specific fixups for reboot"
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 9575754..d33600e 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_X86_TRAMPOLINE)	+= trampoline_$(BITS).o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse_$(BITS).o
 obj-$(CONFIG_X86_LOCAL_APIC)	+= apic_$(BITS).o nmi_$(BITS).o
 obj-$(CONFIG_X86_IO_APIC)	+= io_apic_$(BITS).o
+obj-$(CONFIG_OPEN_FIRMWARE)	+= ofw.o
 obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
 obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
diff ...
To: Andres Salomon <dilinger@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>, Mitch Bradley <wmb@...>
Date: Sunday, April 20, 2008 - 6:34 am

how about changing to ofw_32.c?

YH
--
To: Yinghai Lu <yhlu.kernel@...>
Cc: Andres Salomon <dilinger@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Sunday, April 20, 2008 - 11:09 pm

Is your suggestion to change the filename from "ofw.c" to "ofw_32.c"?  
That seems like a good idea to me.

--
To: Mitch Bradley <wmb@...>
Cc: Andres Salomon <dilinger@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Sunday, April 20, 2008 - 11:15 pm

Yes.

BTW, why does OLPC need the OFW runtime service?
Why not just put the info in RAM with some signature, so the
kernel/util just needs to look at the table if needed?

YH
--
To: Yinghai Lu <yhlu.kernel@...>
Cc: Andres Salomon <dilinger@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 12:05 am

In SPARC land, at least on SunOS and Solaris, it was very convenient for 
debugging to interrupt the OS with Stop-A and use OFW to inspect the 
system state.  That was especially handy for live crash analysis.  Dumps 
are useful as far as they go, but they often fail to capture detailed 
I/O device state.

I was hoping to do that on x86 too.  So far we (OLPC) haven't 
implemented a sysrq hook to enter OFW, but I haven't given up hope yet.  
It doesn't cost much to leave OFW around, but once you decide to eject 
it, you can't easily get it back.

Apple made the early decision to eject OFW and just keep a device tree 
table.  That decision was probably due to several factors, including the 
rather lame state of Apple's first OFW implementation and the complexity 
of their OS startup process at the time (which included "trampolining" 
to a 68000 emulator to run their legacy code).  Once they went down that 
path, the die was cast, and the PowerPC community got used to the "OFW 
--
To: Mitch Bradley <wmb@...>
Cc: Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 10:24 am

On Sun, 20 Apr 2008 18:05:26 -1000

I'm not actually convinced that we *do* want to keep OFW resident in memory,
especially given the memory tricks we need to play.  I also don't actually
like the OFW interface that we.  The debugging aspect of it was a
compelling argument up until a week ago (when kernel debuggers started
finally finding their way into the kernel).

However, until we clean up the promfs stuff, there's no chance of getting


-- 
Need a kernel or Debian developer?  Contact me, I'm looking for contracts.
--
To: Andres Salomon <dilinger@...>
Cc: Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 11:54 am

I don't actually think that the debugging aspect was _ever_ a compelling
argument. It might have made it theoretically possible for _Mitch_ to
debug kernel problems, should he be inclined to do so -- but for the
rest of us mere mortals it's just a PITA trying to keep OpenFirmware

I see no reason why we shouldn't be able to create a 'flattened'
device-tree during early boot, like the PowerPC kernel does. And use it
thereafter, having quiesced OpenFirmware. Haven't we already been
working on unifying this between SPARC and PowerPC kernels?

I definitely don't think we need to play these tricks to keep
OpenFirmware resident while the kernel is running. Take a look at your
second patch -- it's _all_ just lookups in the device-tree, and you're
inventing a new way to do it instead of using the existing one.

-- 
dwmw2

--
To: David Woodhouse <dwmw2@...>
Cc: Andres Salomon <dilinger@...>, Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 12:57 pm

If so, would this apply to OLPC as well?

	-hpa
--
To: H. Peter Anvin <hpa@...>
Cc: Andres Salomon <dilinger@...>, Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 2:54 pm

Yes. The 'second patch' to which I refer is the one which makes OLPC
platform code use the calls in OpenFirmware... all of them gratuitous.

-- 
dwmw2

--
To: David Woodhouse <dwmw2@...>
Cc: Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 1:03 pm

On Mon, 21 Apr 2008 16:54:13 +0100

Quite simply, it's a lot more work (*and* we have to play nice w/
sparc and ppc).  I had intended to eventually do it, but first I wanted
to get this stuff in for 2.6.26 so that we could at least boot upstream
kernels on XOs.

I was also hoping to not get into this conversation, but alas.. too


-- 
Need a kernel or Debian developer?  Contact me, I'm looking for contracts.
--
To: Andres Salomon <dilinger@...>
Cc: Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 3:18 pm

It's only more work because we did it the wrong way in the first place.
If only someone had pointed it out at the time... :)

For interaction with device-tree properties in generic code, you should
be using the functions defined in <linux/of.h>.

Creating the static device-tree before we quiesce OpenFirmware surely

Is it only the things in your second patch which need to be made to
work? One of them was already working, by grubbing around in the BIOS
directly -- so all we need is the board revision, isn't it? Can we get
that from the EC for now?

-- 
dwmw2

--
To: David Woodhouse <dwmw2@...>
Cc: Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 3:46 pm

On Mon, 21 Apr 2008 20:18:11 +0100

Yes, and if only we had an infinite number of kernel hackers who had time


We're not adding a device tree right now, we're adding a method for
querying OFW for information.  Eventually that information should be
obtained from a device tree.  However, that's going to take additional time,
and I'd like to get rid of some of these patches that we've been carrying


Well, no, it wasn't already working; that's the reason this whole
thread started.  It was crashing someone's machine.  That's why the OFW
interface, as imperfect as it is, is an _improvement_.



-- 
Need a kernel or Debian developer?  Contact me, I'm looking for contracts.
--
To: Andres Salomon <dilinger@...>
Cc: Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 4:25 pm

You're proposing a new interface between bootloader and kernel as a
temporary hack just to work around that until we fix it properly?

That seems like overkill to me. I'd just go for is_geode() as you
suggested, and maybe PCI configuration tricks to detect the lack of VSA
so we can be _fairly_ sure it's OLPC before we poke at it?

Or why not try '!page_is_ram(0xffffffc0 >> PAGE_SHIFT)' if it's just to
avoid that particular warning? :)
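
To spell out the arithmetic in that suggestion: with 4K pages (PAGE_SHIFT of 12, assumed here), 0xffffffc0 >> 12 is page frame 0xfffff, the last page below 4GB. The user-space sketch below pairs that shift with a toy RAM map lookup. `struct ram_range`, `addr_to_pfn` and `pfn_is_ram` are illustrative stand-ins and not the kernel's page_is_ram() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12    /* assumption: 4K pages */

/* One usable-RAM region, as byte addresses in [start, end). */
struct ram_range { uint64_t start, end; };

uint64_t addr_to_pfn(uint64_t addr)
{
    return addr >> SKETCH_PAGE_SHIFT;
}

/* Returns 1 if the whole page frame lies inside one RAM region. */
int pfn_is_ram(uint64_t pfn, const struct ram_range *map, size_t n)
{
    uint64_t page_start = pfn << SKETCH_PAGE_SHIFT;
    uint64_t page_end = page_start + (1ULL << SKETCH_PAGE_SHIFT);
    size_t i;

    for (i = 0; i < n; i++)
        if (page_start >= map[i].start && page_end <= map[i].end)
            return 1;
    return 0;
}
```

On a machine whose RAM stops well short of 4GB the lookup says "not RAM" and the probe would go ahead. On one whose RAM map covers that page, a guard like this would skip the ioremap() and so avoid the warning.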

-- 
dwmw2

--
To: David Woodhouse <dwmw2@...>
Cc: Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 5:02 pm

On Mon, 21 Apr 2008 21:25:17 +0100


Okay, does anyone have a problem with this?

    




The OFW sig check requires an ioremap that is dangerous on non-OLPC
systems.  Long term, we should be getting the signature from the
device tree (/openprom/model), but for right now just limit the
check to only run on a subset of Geode (GX2/LX) systems.

Signed-off-by: Andres Salomon <dilinger@debian.org>
---
 arch/x86/kernel/olpc.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/olpc.c b/arch/x86/kernel/olpc.c
index 11670be..3e66722 100644
--- a/arch/x86/kernel/olpc.c
+++ b/arch/x86/kernel/olpc.c
@@ -211,6 +211,10 @@ static int __init olpc_init(void)
 {
 	unsigned char *romsig;
 
+	/* The ioremap check is dangerous; limit what we run it on */
+	if (!is_geode() || geode_has_vsa2())
+		return 0;
+
 	spin_lock_init(&ec_lock);
 
 	romsig = ioremap(0xffffffc0, 16);
-- 
1.5.4.4


-- 
Need a kernel or Debian developer?  Contact me, I'm looking for contracts.
--
To: Andres Salomon <dilinger@...>
Cc: David Woodhouse <dwmw2@...>, Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 28, 2008 - 11:06 pm

geode_has_vsa2() is a fairly expensive-looking function and afaict only
needs to be evaluated once per boot.  Perhaps we should cache it somewhere?

--
To: Andrew Morton <akpm@...>
Cc: H. Peter Anvin <hpa@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Tuesday, April 29, 2008 - 1:32 am

On Mon, 28 Apr 2008 20:06:51 -0700

How about this?






This moves geode_has_vsa2 into a .c file, caches the result we get from
the VSA virtual registers, and causes the function to no longer be inline.

Signed-off-by: Andres Salomon <dilinger@debian.org>
---
 arch/x86/kernel/geode_32.c |   19 +++++++++++++++++++
 include/asm-x86/geode.h    |   11 +----------
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/geode_32.c b/arch/x86/kernel/geode_32.c
index 9dad6ca..1cb8225 100644
--- a/arch/x86/kernel/geode_32.c
+++ b/arch/x86/kernel/geode_32.c
@@ -161,6 +161,25 @@ void geode_gpio_setup_event(unsigned int gpio, int pair, int pme)
 }
 EXPORT_SYMBOL_GPL(geode_gpio_setup_event);
 
+static int has_vsa2 = -1;
+
+int geode_has_vsa2(void)
+{
+	if (has_vsa2 == -1) {
+		/*
+		 * The VSA has virtual registers that we can query for a
+		 * signature.
+		 */
+		outw(VSA_VR_UNLOCK, VSA_VRC_INDEX);
+		outw(VSA_VR_SIGNATURE, VSA_VRC_INDEX);
+
+		has_vsa2 = (inw(VSA_VRC_DATA) == VSA_SIG);
+	}
+
+	return has_vsa2;
+}
+EXPORT_SYMBOL_GPL(geode_has_vsa2);
+
 static int __init geode_southbridge_init(void)
 {
 	if (!is_geode())
diff --git a/include/asm-x86/geode.h b/include/asm-x86/geode.h
index 7154dc4..8a53bc8 100644
--- a/include/asm-x86/geode.h
+++ b/include/asm-x86/geode.h
@@ -185,16 +185,7 @@ static inline int is_geode(void)
 	return (is_geode_gx() || is_geode_lx());
 }
 
-/*
- * The VSA has virtual registers that we can query for a signature.
- */
-static inline int geode_has_vsa2(void)
-{
-	outw(VSA_VR_UNLOCK, VSA_VRC_INDEX);
-	outw(VSA_VR_SIGNATURE, VSA_VRC_INDEX);
-
-	return (inw(VSA_VRC_DATA) == VSA_SIG);
-}
+extern int geode_has_vsa2(void);
 
 /* MFGPTs */
 
-- 
1.5.5

--
To: Andres Salomon <dilinger@...>
Cc: <hpa@...>, <mingo@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Tuesday, April 29, 2008 - 4:35 pm

On Tue, 29 Apr 2008 01:32:13 -0400

Looks sane.  Although one wonders if it should be cached as one of the
standard x86 feature bit thingies which show up in /proc/cpuinfo's 'flags'

nit:

--- a/arch/x86/kernel/geode_32.c
+++ a/arch/x86/kernel/geode_32.c
@@ -161,10 +161,10 @@ void geode_gpio_setup_event(unsigned int
 }
 EXPORT_SYMBOL_GPL(geode_gpio_setup_event);
 
-static int has_vsa2 = -1;
-
 int geode_has_vsa2(void)
 {
+	static int has_vsa2 = -1;
+
 	if (has_vsa2 == -1) {
 		/*
 		 * The VSA has virtual registers that we can query for a

--
To: Andrew Morton <akpm@...>
Cc: <hpa@...>, <mingo@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Tuesday, April 29, 2008 - 4:57 pm

On Tue, 29 Apr 2008 13:35:12 -0700

The VSA lives in a weird place between hardware and BIOS.  I'm not
really sure whether it's appropriate for it to be an x86_cap_flags (it
hadn't occurred to me), but I think of it more as BIOS.  Jordan, what do

Looks good.
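
The probe-once-and-cache pattern from the patch and the nit above can be tried outside the kernel.  This is only a sketch: probe_vsa_signature(), probe_calls and fake_vsa_present are hypothetical stand-ins for the outw()/inw() sequence, which cannot run in userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the port I/O probe; counts how often it runs. */
static int probe_calls;
static bool fake_vsa_present = true;

static bool probe_vsa_signature(void)
{
	probe_calls++;
	return fake_vsa_present;
}

/* Returns 1 if the (fake) VSA2 is present, 0 if not; probes only once. */
int cached_has_vsa2(void)
{
	static int has_vsa2 = -1;	/* -1 means "not probed yet" */

	if (has_vsa2 == -1)
		has_vsa2 = probe_vsa_signature();

	/* After the first call the cache holds a plain boolean. */
	assert(has_vsa2 == 0 || has_vsa2 == 1);
	return has_vsa2;
}
```

Like the patch, the sketch takes no lock.  Two racing first callers would both run the probe, which is only harmless if the probe is idempotent.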



--
To: Andres Salomon <dilinger@...>
Cc: Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 5:17 pm

That looks saner to me for now.

Acked-By: David Woodhouse <dwmw2@infradead.org>

-- 
dwmw2

--
To: Andres Salomon <dilinger@...>
Cc: David Woodhouse <dwmw2@...>, Mitch Bradley <wmb@...>, Yinghai Lu <yhlu.kernel@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>
Date: Monday, April 21, 2008 - 5:17 pm

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.

--
To: Mitch Bradley <wmb@...>
Cc: Andres Salomon <dilinger@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 12:50 am

Geode uses SMI to simulate the PCI config space; I wonder whether that could be a problem.

Later, will you have 64-bit runtime services for 64-bit platforms, like UEFI?

YH
--
To: Yinghai Lu <yhlu.kernel@...>
Cc: Andres Salomon <dilinger@...>, H. Peter Anvin <hpa@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 4:03 am

On the current OLPC system, we don't use the SMI-based PCI config space 
simulator.  The code for that "VSA" module is only partially open 
sourced (some of it is open, and some of it is just not available).  The 
parts of it for which we do have source can only be compiled with an old 
proprietary toolchain that is no longer available.

Instead of using the SMI-based simulation, we have added a PCI 
configuration access method in the kernel that supplies the necessary 
information from a table.  The code for that hardware-specific access 
method is roughly 40 lines of code plus a few data tables.
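
As a rough illustration of what a table-driven configuration access method can look like (this is not the OLPC code; the struct, the function name and every register value below are made up for the example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one emulated PCI function and its config data. */
struct fake_pci_dev {
	uint8_t devfn;			/* device/function number */
	const uint32_t *regs;		/* config space as 32-bit words */
	size_t nregs;
};

/* Illustrative values only; not taken from any real hardware table. */
static const uint32_t dev_a_regs[] = { 0x12345678, 0x00000005, 0x06000000 };

static const struct fake_pci_dev table[] = {
	{ .devfn = 0x08, .regs = dev_a_regs, .nregs = 3 },
};

/* Read an aligned 32-bit config register from the table. */
uint32_t table_conf_read32(uint8_t devfn, unsigned int reg)
{
	assert((reg & 3) == 0);	/* sketch handles aligned dword reads only */

	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].devfn != devfn)
			continue;
		if (reg / 4 >= table[i].nregs)
			return 0;	/* registers past the table read as zero here */
		return table[i].regs[reg / 4];
	}

	/* As on real PCI, a read from an absent device returns all-ones. */
	return 0xffffffffu;
}
```

The lookup involves no mode switch and no firmware call, which is the property being contrasted with the SMI-based emulator.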

In the past few weeks, I have developed a rather complete Open 
Firmware-based reimplementation of the SMI PCI config hardware 
emulator.   All-told, it requires over 1000 lines.  It remains to be 
seen whether the complicated version will ultimately be deployed.  
Personally, I find it distasteful to use a lot of code to make the 
hardware pretend that it is something other than what it really is, when 
a much smaller driver works just as well.  The SMI-based emulator is 
quite difficult to understand and maintain, because the Geode SMI 
handling mechanism is complex, incompletely documented, and suffers from 
many of the multiple-mode-switches problems as real-mode to 

Possibly.   64-bit systems are not a problem per se - there have been 
64-bit OFW implementations for 64-bit architectures like SPARC and Alpha 
dating back to a long time ago.  The main issue from my point of view is 
--
To: <wmb@...>
Cc: <yhlu.kernel@...>, <dilinger@...>, <hpa@...>, <ebiederm@...>, <mingo@...>, <akpm@...>, <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 12:26 am

From: Mitch Bradley <wmb@firmworks.com>

In most current SPARC systems, OFW is not usable and is completely
forgotten right after bootup in order to accommodate LDOMs and CPU
hotplug.

It's a better idea, anyways, to develop more pervasive and usable
in-kernel debugger facilities.  Then it doesn't matter if you have
"cool" firmware or not. :-)

--
To: Yinghai Lu <yhlu.kernel@...>
Cc: Andres Salomon <dilinger@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>, Mitch Bradley <wmb@...>
Date: Sunday, April 20, 2008 - 8:07 am

Hm.  This interface seems more than a bit ad hoc.  In particular, I 
*really* don't like the swapper_pg_dir hack.

"There must be a better way."

	-hpa
--
To: H. Peter Anvin <hpa@...>
Cc: Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>, Mitch Bradley <wmb@...>
Date: Sunday, April 20, 2008 - 1:59 pm

On Sun, 20 Apr 2008 08:07:55 -0400

I'm certainly open to suggestions..  Otherwise, I'll poke around and
see if I can come up w/ something.



-- 
Need a kernel or Debian developer?  Contact me, I'm looking for contracts.
--
To: Andres Salomon <dilinger@...>
Cc: Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>, Mitch Bradley <wmb@...>
Date: Sunday, April 20, 2008 - 3:13 pm

It pretty much depends on what the invariants look like.  The 
normal/clean way of doing this kind of thing is via a fixmap entry 
and/or ioremap.

	-hpa
--
To: Andres Salomon <dilinger@...>
Cc: H. Peter Anvin <hpa@...>, Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Sunday, April 20, 2008 - 2:42 pm

The x86 architecture doesn't make this problem easy.

The conventional solution is to have the BIOS operate in real mode.  
When the kernel calls into the BIOS, it has to do a grotesque dance that 
involves jumping through a chain of several segments of different 
flavors, thus gradually shutting down the multi-tiered address 
translation mechanism.  Then, if the BIOS is actually operating in 
protected mode (which is necessary if it is larger than 64K, as all 
modern BIOSes are), it has to perform the inverse process, do the 
requested work, then go back into real mode to return to the kernel.  
The net result is that a "call" into the BIOS involves:

a) Copying the arguments to a real-mode register shadow array
b) Saving all the registers - general ones and a few special ones too
c) Far call to a linear-mapped code segment with an execution address in 
the first 1M of memory
d) Switching to a different stack
e) Turning off page translation
f) Switching from protected mode to real mode (or in some cases, V86 
mode instead, which requires an additional Task State Segment dance to 
set the IO permission mask)
g) Switching to a real-mode interrupt descriptor table

h) Executing an INT instruction

i) Performing the inverse of a - g inside the BIOS

j) Doing the requested work

k) Performing a - g again to get back into real mode

l) Executing an "iret" instruction

m) Performing the inverse of a - g to return to normal operation

The machinery that you need to do all that is predictably complex - 
extra segment descriptors that are set up just-so, several little code 
fragments that must be at special addresses in the first meg, additional 
stacks, a real-mode interrupt table at a fixed address, and several data 
save arrays.  That machinery has to be in assembly language, spanning 
several different instruction set modes.

Compared to that, I think that sharing one or two page directory entries 
at the very top of the virtual address space is pretty clean and 
simple...
To: Mitch Bradley <wmb@...>
Cc: Andres Salomon <dilinger@...>, Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Sunday, April 20, 2008 - 3:12 pm

[long rant about the x86 architecture]

It would be more useful if you described what the actual defined entry 
conditions from OpenFirmware look like, including whether they are 
well-defined for all OF implementations or only for OLPC.

	-hpa
--
To: H. Peter Anvin <hpa@...>
Cc: Andres Salomon <dilinger@...>, Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Sunday, April 20, 2008 - 11:39 pm

Fair enough...

To get the second subquestion out of the way:  At the present time, on 
the x86 architecture, "all OF implementations" and "OLPC" are 
effectively the same.  I am unaware of any other x86 OFW deployments in 
current use.  There have been some in the past, on bespoke systems such 
as Network Appliance servers and at least one settop box, but those have 
fallen by the wayside as those companies have shifted over to commodity 
PC hardware.  The current market status quo is that x86 boards are 
primarily designed for Windows, and thus must run legacy BIOS, with some 
recent migration to EFI, neither of which are open source in the strong 
sense.  While I would like to see more OFW penetration into the larger 
x86 market, I don't really expect it.  x86 motherboard manufacturing is 
becoming more and more difficult as signal speeds increase, leading to a 
decline in the number of manufacturers.  The existing manufacturers 
depend on Windows for sales volume and their internal procedures and 
working knowledge are based on legacy BIOS.

Once upon a time, we had an OFW "binding" document that stipulated the 
interface conditions, with the intention of making that "standard" 
across all OFW-on-x86 systems.  However, by the time OLPC came around, 
there were no other systems to consider, so I felt free to make some 
changes in the interface.  I ended up choosing an ABI that resulted in a 
simple (in the sense of not much code, and no complex state transitions) 
interface with 2.6 Linux kernels.

The interface defined below is not inherently OLPC-specific - it would 
be suitable for any ia32 system that used OFW.  (At a higher level, the 
set of OFW callback functions is architecture-neutral; in this message I 
am focusing on the very low-level details of the ia32 ABI)

The system conditions for the OFW to Linux kernel transition are as follows:

a) OFW can load the Linux kernel from either bzimage format or ELF 
format (either uncompressed or zlib-compressed.)  If the ker...
To: Mitch Bradley <wmb@...>
Cc: H. Peter Anvin <hpa@...>, Andres Salomon <dilinger@...>, Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>
Date: Monday, April 21, 2008 - 11:05 am

/me puts on his coreboot hat

This is off topic slightly, but let it be known that the coreboot project
considers OFW a very valid option for x86 platforms.  A kernel that
worked happily with OFW would greatly encourage people to adopt it in
lieu of other BIOS / firmware solutions.

I return you to your previously scheduled debate.

Jordan

--
To: Jordan Crouse <jordan.crouse@...>
Cc: Mitch Bradley <wmb@...>, Andres Salomon <dilinger@...>, Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>
Date: Monday, April 21, 2008 - 10:58 am

The interface they are proposing is definitely not suitable for upward 
extension, for the reasons already mentioned.  However, they have units 
in the field, and the amount of changes required to support another 
interface should be relatively minor.

Hence my insistence that we don't promote it as *the* OFW interface, but 
*a* OFW interface.

	-hpa
--
To: Mitch Bradley <wmb@...>
Cc: Andres Salomon <dilinger@...>, Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 7:36 am

So let me see here... you want the virtual address range [0xffc00000, 
0xfff00000) to be reserved for OFW, and you are prohibiting the kernel 


I do not like it, simply because it amounts to "initialize this 
otherwise zero-initialized piece of data without making any kind of 
reservations and blindly hope nothing else overwrites it."

I'm also troubled with the assumption that the kernel doesn't use PAE. 
I realize that this is not an issue for OLPC, but it certainly makes 
this a less-than-generic solution.
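
To make the PAE concern concrete, here is a small sketch of how the range [0xffc00000, 0xfff00000) maps onto page directory entries in the two paging modes.  The macro and function names are made up for the example; the shifts are the architectural ones (4 MiB per PDE without PAE, 2 MiB per directory entry with PAE).

```c
#include <assert.h>
#include <stdint.h>

/* Non-PAE 32-bit x86: 1024 PDEs, each covering 4 MiB (address bits 31..22). */
#define PGDIR_SHIFT_NOPAE 22
/* PAE: each page directory entry covers 2 MiB (bit 21 and up select it). */
#define PMD_SHIFT_PAE 21

static_assert((UINT64_C(1) << 32 >> PGDIR_SHIFT_NOPAE) == 1024,
	      "1024 non-PAE PDEs span 4 GiB");

/* Index of the non-PAE PDE that maps vaddr. */
static inline uint32_t pde_index_nopae(uint32_t vaddr)
{
	return vaddr >> PGDIR_SHIFT_NOPAE;
}

/* Number of 2 MiB PAE directory entries touched by [start, end). */
static inline uint32_t pae_entries_spanned(uint32_t start, uint32_t end)
{
	return ((end - 1) >> PMD_SHIFT_PAE) - (start >> PMD_SHIFT_PAE) + 1;
}
```

Without PAE the whole range sits under the single last PDE (index 1023); with PAE the same range straddles two directory entries of a different format, so a firmware-provided non-PAE entry cannot simply be copied across.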

Having mapped page table entries which are not under kernel control is a 
very serious problem for PAT - PAT requires, by hardware specification, 
the kernel to eliminate all potential aliases with different mappings.

One way to deal with this, of course, is to save the firmware-provided 
PGD and only use it for OFW calls.  On the other hand, perhaps a better 
question is to what extent it is needed at all.

Furthermore, since you're using a nonstandard OFW interface (not 
compliant with the x86 OFW binding document), all of this should be 
called something like OLPC_OFW to make it clear that it's the OLPC variant.

If I had designed this, I would probably have used an SMI; since you 
have control over the firmware you can do that.  SMI saves the entire 
machine state including all the modes, cleans them all up for you, and 
puts it all back together at RSM time.  It is slow, of course, but it 
completely decouples the firmware and the OS, which is why it's used.

	-hpa

--
To: Mitch Bradley <wmb@...>
Cc: Andres Salomon <dilinger@...>, Yinghai Lu <yhlu.kernel@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 9:19 am

Okay, stepping back a few steps, it's pretty clear that most of my 
objections aren't really an issue for Geode/OLPC; however, I *really* 
don't want others to pick it up as being "the" Open Firmware interface.

Within those constraints it makes sense to set up the PDEs in 
swapper_pg_dir and let them propagate using the normal mechanisms.

** This is assuming that your OF interface does not rely on a 1:1 
mapping of low memory being present at the time it makes a call.  If it 
*does*, then a separate page directory needs to be maintained for the OF 
class. **


Thus, I'm willing to accept this with these changes:

- Please name things specific to the interface (as opposed to Open 
Firmware in general, like the device tree) olpc_ofw or olpcfw, to denote 
that this is an OLPC-specific interface.  Thus, 
CONFIG_OLPC_OPEN_FIRMWARE or something along those lines.

- Make it explicit in Kconfig that OLPC_OPEN_FIRMWARE conflicts with 
X86_PAE, 64BIT, or X86_PAT.

- Change VMALLOC_END in include/asm-x86/pgtable_32.h so the kernel will 
know to avoid this virtual memory range.

- Add a memory region to arch/x86/mm/dump_pagetables.c.

	-hpa
--
To: Mitch Bradley <wmb@...>
Cc: H. Peter Anvin <hpa@...>, Andres Salomon <dilinger@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 12:54 am

So you are assuming that your uncompressed vmlinux uses less than 8M of space?

You are supposed to check the bzImage to get the uncompressed vmlinux size.

YH
--
To: Yinghai Lu <yhlu.kernel@...>
Cc: H. Peter Anvin <hpa@...>, Andres Salomon <dilinger@...>, Eric W. Biederman <ebiederm@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Joseph Fannin <jfannin@...>, <linux-kernel@...>, <jordan.crouse@...>
Date: Monday, April 21, 2008 - 4:22 am

The 0x800000 ramdisk load address is an OLPC-specific firmware 
implementation detail that could easily be changed without affecting 
anything else. I probably shouldn't have mentioned it because it isn't 
really an integral part of the interface "contract".

I certainly hope that the OLPC kernel never gets anywhere near that 
size.  The OLPC hardware has limited configurability, so it's not 
plausible that the kernel would grow that large to include a huge kit of 
drivers.  If the kernel file becomes large as a result of including the 
initramfs in the same file, the 0x800000 ramdisk load address won't 
apply (because there won't be a separate load of the initramfs file), so 
the kernel could extend way past that boundary with no problems.

If we get to the point where we do need huge kernels on OLPC, we can 
release a firmware upgrade along with the new OS.  We have mechanisms 
for coordinating firmware and OS upgrades.
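
A sketch of the check being asked for, written as plain C rather than firmware code.  The 0x800000 default comes from this message; choose_ramdisk_addr(), its parameters and the 4 KiB alignment are assumptions made for the example, and how the firmware would learn the uncompressed size is left open.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES   0x1000u	/* assumption: 4 KiB pages */
#define RAMDISK_LOAD_ADDR 0x800000u	/* default quoted in this message */

static_assert((PAGE_SIZE_BYTES & (PAGE_SIZE_BYTES - 1)) == 0,
	      "page size must be a power of two");

/* Round addr up to the next page boundary. */
static inline uint32_t page_align_up(uint32_t addr)
{
	return (addr + PAGE_SIZE_BYTES - 1) & ~(PAGE_SIZE_BYTES - 1);
}

/*
 * Keep the fixed default when the uncompressed kernel ends at or below
 * it; otherwise float the ramdisk to the first page above the kernel.
 * kernel_load and kernel_size are hypothetical inputs.
 */
uint32_t choose_ramdisk_addr(uint32_t kernel_load, uint32_t kernel_size)
{
	uint32_t kernel_end = page_align_up(kernel_load + kernel_size);

	return kernel_end > RAMDISK_LOAD_ADDR ? kernel_end : RAMDISK_LOAD_ADDR;
}
```

For a kernel loaded at 1 MiB, any uncompressed size up to 7 MiB leaves the ramdisk at 0x800000; one byte more moves it to 0x801000.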

If a new customer for OFW on x86 appears, I'll remember to float the 
boundary above the bzImage uncompressed size (assuming that the bzimage 
--
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Friday, April 18, 2008 - 10:25 pm

I've been seeing the following backtrace since (I think)
2.6.25-rc8-mm2.

I'm sending multiple reports vs. 2.6.25-mm1, so I'm putting the dmesg
and .config on a server:

http://home.columbus.rr.com/jfannin3/dmesg.txt
http://home.columbus.rr.com/jfannin3/config-2.6.25-mm1.txt

[  842.795144] hm, dftrace overflow: 265 changes (0 total) in 428 usecs
[  842.795182] ------------[ cut here ]------------
[  842.795192] WARNING: at kernel/trace/ftrace.c:658 ftraced+0x1a4/0x1b0()
[  842.795200] Modules linked in: af_packet rfcomm l2cap bluetooth ppdev ipv6 cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave video output wmi pci_slot container dock sbs sbshcbattery iptable_filter ip_tables x_tables ext2 ac lp loop snd_via82xx gameport snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_mpu401_uart snd_seq_dummy snd_seq_oss snd_seq_midi psmouse snd_rawmidi serio_raw snd_seq_midi_event snd_seq button i2c_viapro snd_timer snd_seq_device pcspkr i2c_core snd snd_page_alloc via686a shpchp pci_hotplug parport_pc parport via_agp agpgart soundcore evdev sg sr_mod cdrom sd_mod 8139cp aic7xxx scsi_transport_spi scsi_mod 8139too mii uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fuse ext4dev mbcache jbd2 crc16
[  842.795470] Pid: 13, comm: ftraced Tainted: G        W 2.6.25-mm1 #7
[  842.795497]  [<c0130fa9>] warn_on_slowpath+0x59/0x80
[  842.795541]  [<c013244f>] ? vprintk+0x33f/0x4a0
[  842.795589]  [<c0155216>] ? trace_hardirqs_on_caller+0x16/0x150
[  842.795622]  [<c0354eb0>] ? __mutex_lock_common+0x2b0/0x3c0
[  842.795667]  [<c0155216>] ? trace_hardirqs_on_caller+0x16/0x150
[  842.795688]  [<c015535b>] ? trace_hardirqs_on+0xb/0x10
[  842.795709]  [<c017e4d0>] ? __ftrace_update_code+0x0/0x110
[  842.795730]  [<c017e9f0>] ? ftraced+0x0/0x1b0
[  842.795746]  [<c01325d0>] ? printk+0x20/0x30
[  842.795764]  [<c017e9f0>] ? ftraced+0x0/0x1b0
[  8...
To: Joseph Fannin <jfannin@...>
Cc: <linux-kernel@...>, Steven Rostedt <rostedt@...>, Ingo Molnar <mingo@...>
Date: Friday, April 18, 2008 - 11:08 pm

Seen plenty of them - I think Greg today dropped the offending patch(es).

[  451.915553] sysfs: duplicate filename 'pcspkr' can not be created


I haven't seen that one before.
--
To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>
Date: Friday, April 18, 2008 - 10:13 pm

I've been seeing the following backtraces since 2.6.25-rc8-mm1 -- at
least, since that's the earliest -mm I've built in a while.  I don't
get the same in mainline.

No idea who to CC:  I've sat on this report long enough.

I'm going to send a few different reports in separate mails, so I'll
put my dmesg and .config up on a server:

http://home.columbus.rr.com/jfannin3/dmesg.txt
http://home.columbus.rr.com/jfannin3/config-2.6.25-mm1.txt

[  451.915553] sysfs: duplicate filename 'pcspkr' can not be created
[  451.915731] ------------[ cut here ]------------
[  451.915851] WARNING: at fs/sysfs/dir.c:427 sysfs_add_one+0x85/0xe0()
[  451.915981] Modules linked in: snd_pcsp(+) ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_mpu401_uart snd_seq_dummy snd_seq_oss snd_seq_midi psmouse snd_rawmidi serio_raw snd_seq_midi_event snd_seq button i2c_viapro snd_timer snd_seq_device pcspkr i2c_core snd snd_page_alloc via686a shpchp pci_hotplug parport_pc parport via_agp agpgart soundcore evdev sg sr_mod cdrom sd_mod 8139cp aic7xxx  scsi_transport_spi scsi_mod 8139too mii uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fuse ext4dev mbcache jbd2 crc16
[  451.918960] Pid: 2740, comm: modprobe Tainted: G        W 2.6.25-mm1 #7
[  451.929271]  [<c0130fa9>] warn_on_slowpath+0x59/0x80
[  451.929500]  [<c0132400>] ? vprintk+0x2f0/0x4a0
[  451.929723]  [<c0356adc>] ? _spin_unlock+0x2c/0x50
[  451.929918]  [<c01c6a7a>] ? ifind+0x4a/0xa0
[  451.930126]  [<c0155216>] ? trace_hardirqs_on_caller+0x16/0x150
[  451.930334]  [<c015535b>] ? trace_hardirqs_on+0xb/0x10
[  451.930534]  [<c01325d0>] ? printk+0x20/0x30
[  451.930727]  [<c01fcc45>] sysfs_add_one+0x85/0xe0
[  451.930900]  [<c01fd89e>] create_dir+0x4e/0xb0
[  451.931064]  [<c01fd930>] sysfs_create_dir+0x30/0x50
[  451.931291]  [<c0356adc>] ? _spin_unlock+0x2c/0x50
[  451.931485]  [<c023dac6>] kobject_add_internal...