ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25/2.6.25-mm1/

- git-xfs is undropped because I finally got around to fixing its clashes with git-vfs.
- git-arm-master, git-sparc64 and perhaps others are dropped because they don't generate a clean pull. They might be empty - I didn't check.
- git-kvm remains dropped due to clashes with git-s390 and perhaps git-x86.
- git-selinux is newly dropped due to memory corruption regressions.
- git-nfs is (perhaps permanently) dropped because its content is also in git-nfsd.
- git-drm remains reverted due to build failures
- Tomorrow I'll do the -mm merge plans email and I'll dump a couple hundred patches on tree maintainers (these have about a 15% yay-he-merged-it rate). Then I'm travelling for a poorly-timed week. I return late in the merge window to find out if any of these patches still apply.

Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.
- To fetch an -mm tree using git, use (for example)
  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1
- -mm kernel commit activity can be reviewed by subscribing to the mm-commits mailing list.
  echo "subscribe mm-commits" | mail majordomo@vger.kernel.org
- If you hit a bug in -mm and it is not obvious which patch caused it, it is most valuable if you can perform a bisection search to identify which patch introduced the bug. Instructions for this process are at
  http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
  But beware that this process takes some time (around ten rebuilds and reboots), so consider reporting the bug first and if we cannot immediately identify the faulty patch, then perform the bisection search.
- When reporting bugs, please try to Cc: the relevant maintainer and mailing list on any email.
- When reporting bugs in ...
Hi,
$ ls /usr/share/man/cat3readlin
Segmentation fault
[the file doesn't exist.]
This is probably the same bug as in -rc8-mm2 I reported here:
http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/9008289.html
general protection fault: 0000 [1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/net/eth0/statistics/collisions
CPU 0
Modules linked in: test ipv6 tun bitrev arc4 ecb crypto_blkcipher cryptomgr crypto_algapi ath5k mac80211 crc32 sr_mod usbhid ohci1394 rtc_cmos hid rtc_core cfg80211 ieee1394 cdrom ehci_hcd rtc_lib ff_memless floppy evdev
Pid: 24838, comm: man Not tainted 2.6.25-mm1_64 #403
RIP: 0010:[<ffffffff802aca27>] [<ffffffff802aca27>] __d_lookup+0x97/0x160
RSP: 0018:ffff8100337d1b98 EFLAGS: 00010206
RAX: 00f0000000000000 RBX: 00f0000000000000 RCX: 0000000000000012
RDX: ffff8100200830e0 RSI: ffff8100337d1ca8 RDI: ffff810079195708
RBP: ffff8100337d1bf8 R08: ffff8100337d1ca8 R09: 0000000000000000
R10: 000000000000013d R11: 0000000000000246 R12: ffff8100200830c8
R13: 00000000198eaed5 R14: ffff810079195708 R15: ffff8100337d1bc8
FS: 00007f447b5c06f0(0000) GS:ffffffff80664000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001484f88 CR3: 000000005fac4000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process man (pid: 24838, threadinfo ffff8100337d0000, task ffff810034418000)
Stack: ffff8100337d1ca8 000000000000000b ffff810079195710 0000000b792561a0
ffff81003136600f ffffffff802f9073 00f0000000000000 0000000000000001
ffff8100337d1e48 ffff8100337d1e48 ffff8100337d1ca8 ffff8100337d1cb8
Call Trace:
[<ffffffff802f9073>] ? ext3_lookup+0xc3/0x100
[<ffffffff802a1e85>] do_lookup+0x35/0x220
[<ffffffff802a22c2>] __link_path_walk+0x252/0x1010
[<ffffffff802b20ba>] ? mntput_no_expire+0x2a/0x140
[<ffffffff802a30ee>] path_walk+0x6e/0xe0
[<ffffff...
On Mon, Apr 21, 2008 at 10:31:40AM +0200, Jiri Slaby wrote:
hlist_for_each_entry_rcu(dentry, node, head, d_hash) {
struct qstr *qstr;
if (dentry->d_name.hash != hash)
continue;
walking into node == (struct hlist_node *)0x00f0000000000000...
--
Yup, true. In the last oops I got stuck on the memcmp a few lines below. BTW, it's 100% reproducible once it has happened, but a reboot fixes it. Any tests I should run (memtest, some printks stuck in anywhere)? --
Well, if the list has such a turd in it, you'll certainly hit it every time you walk that list, so 100% reproducible is not surprising. How well is it reproducible from a fresh boot? --
It takes a few days with suspend/resume cycles. This one was booted 12 hours ago and has been through one suspend/resume. Will keep an eye on it and keep you informed. --
Shall we see if we can catch it earlier? I have no idea if this will
help ... I haven't even booted it on a testmachine yet ;-) If I got
something wrong, it'll BUG() pretty early.
diff --git a/include/linux/list.h b/include/linux/list.h
index 75ce2cb..238ca1e 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -724,10 +724,17 @@ static inline int hlist_empty(const struct hlist_head *h)
return !h->first;
}
+#ifdef CONFIG_DEBUG_LIST
+extern void hlist_check(struct hlist_node *n);
+#else
+#define hlist_check(n) do { } while (0)
+#endif
+
static inline void __hlist_del(struct hlist_node *n)
{
struct hlist_node *next = n->next;
struct hlist_node **pprev = n->pprev;
+ hlist_check(n);
*pprev = next;
if (next)
next->pprev = pprev;
@@ -785,6 +792,7 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
{
struct hlist_node *next = old->next;
+ hlist_check(old);
new->next = next;
new->pprev = old->pprev;
smp_wmb();
@@ -840,6 +848,7 @@ static inline void hlist_add_head_rcu(struct hlist_node *n,
static inline void hlist_add_before(struct hlist_node *n,
struct hlist_node *next)
{
+ hlist_check(next);
n->pprev = next->pprev;
n->next = next;
next->pprev = &n->next;
@@ -849,6 +858,7 @@ static inline void hlist_add_before(struct hlist_node *n,
static inline void hlist_add_after(struct hlist_node *n,
struct hlist_node *next)
{
+ hlist_check(next);
next->next = n->next;
n->next = next;
next->pprev = &n->next;
@@ -878,6 +888,7 @@ static inline void hlist_add_after(struct hlist_node *n,
static inline void hlist_add_before_rcu(struct hlist_node *n,
struct hlist_node *next)
{
+ hlist_check(next);
n->pprev = next->pprev;
n->next = next;
smp_wmb();
@@ -906,6 +917,7 @@ static inline void hlist_add_before_rcu(struct hlist_node *n,
static inline void hlist_add_after_rcu(struct hlist_node *prev,
struct hl...
I think that's exactly the same problem I reported here: http://lkml.org/lkml/2008/4/20/182 for 2.6.25-git2, so it has hit mainline and seems to be related to RCU. Thanks, Rafael --
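The body of hlist_check() is not visible in the quoted patch, so the following is only a userspace sketch of the kind of invariant such a check could test: a node's pprev must point at the pointer that points at the node, and its successor must point back. The struct layout mirrors the kernel's hlist; hlist_node_is_sane() is a hypothetical name, and it returns a bool where a kernel version would presumably BUG().

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace copies of the kernel's hlist layout (same field names). */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;	/* address of the pointer that points at us */
};

struct hlist_head {
	struct hlist_node *first;
};

/* Same logic as the kernel's hlist_add_head(). */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Hypothetical check (not the patch's hlist_check()): returns true when
 * the links around n are self-consistent.
 */
static bool hlist_node_is_sane(const struct hlist_node *n)
{
	if (!n->pprev || *n->pprev != n)
		return false;	/* predecessor does not point at us */
	if (n->next && n->next->pprev != &n->next)
		return false;	/* successor does not point back */
	return true;
}
```

Note that a check like this dereferences n->next, so a wild value such as the 0x00f0000000000000 seen in the oops would still fault; the difference is that the fault would happen at the list operation that first touches the bad node rather than at a later lookup.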
Hi,
I'm not sure what caused this.
LANG=en strace -fo strace_gcc.txt gcc -Wp,-MD,drivers/usb/class/.usblp.o.d -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -D__KERNEL__ -Iinclude -Iinclude2 -I/home/l/latest/xxx/include -include include/linux/autoconf.h -I/home/l/latest/xxx/drivers/usb/class -Idrivers/usb/class -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -O2 -fno-stack-protector -m64 -march=core2 -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -I/home/l/latest/xxx/include/asm-x86/mach-default -Iinclude/asm-x86/mach-default -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(usblp)" -D"KBUILD_MODNAME=KBUILD_STR(usblp)" /home/l/latest/xxx/drivers/usb/class/usblp.c -S -o usblp.s
/home/l/latest/xxx/drivers/usb/class/usblp.c: In function 'usblp_submit_read':
/home/l/latest/xxx/drivers/usb/class/usblp.c:977: internal compiler error: Segmentation fault
Please submit a full bug report, with preprocessed source if appropriate.
See <http://bugs.opensuse.org/> for instructions.
strace_gcc.txt: http://www.fi.muni.cz/~xslaby/sklad/strace_gcc.txt
preprocessor output available here: http://www.fi.muni.cz/~xslaby/sklad/usblp.E
A reboot fixed it. It happened after a few suspend/resume cycles. The preprocessor output is identical to the one produced after the reboot. Now, the strace looks like:
5341 mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f362e004000
5341 mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f362df04000
5341 brk(0x1964000) = 0x1964000
5341 brk(0x194c000) = 0x194c000
5341 brk(0x...
New in 2.6.25-mm1 is a hang I'm seeing just after the kernel prints:
"[ 0.160375] NET: Registered protocol family 16"
The hang lasts about five minutes, and then boot continues. Just after that, a backtrace is printed; I don't know if it's related. The backtrace follows below. This does not occur in mainline. It seems it might be related to OLPC support -- I enabled all those options -- but that's not good behavior, and I see no warning of this in the help. I'm sending a number of reports against 2.6.25-mm1, so I've put my dmesg and .config on a server:
http://home.columbus.rr.com/jfannin3/dmesg.txt
http://home.columbus.rr.com/jfannin3/config-2.6.25-mm1.txt
[ 0.160375] NET: Registered protocol family 16
[ 400.782683] ------------[ cut here ]------------
[ 400.782832] WARNING: at arch/x86/mm/ioremap.c:158 __ioremap_caller+0x27d/0x2e0()
[ 400.783022] Modules linked in:
[ 400.783169] Pid: 1, comm: swapper Not tainted 2.6.25-mm1 #7
[ 400.783300] [<c0130fa9>] warn_on_slowpath+0x59/0x80
[ 400.783480] [<c0106c2e>] ? profile_pc+0x3e/0x50
[ 400.783682] [<c01374ee>] ? irq_exit+0x4e/0xa0
[ 400.783879] [<c0115aec>] ? smp_apic_timer_interrupt+0x5c/0x90
[ 400.784087] [<c024314c>] ? trace_hardirqs_on_thunk+0xc/0x10
[ 400.784298] [<c01552cd>] ? trace_hardirqs_on_caller+0xcd/0x150
[ 400.784506] [<c024314c>] ? trace_hardirqs_on_thunk+0xc/0x10
[ 400.784706] [<c010416c>] ? restore_nocheck_notrace+0x0/0xe
[ 400.784906] [<c011d0e6>] ? page_is_ram+0xa6/0xd0
[ 400.785059] [<c011d4ed>] __ioremap_caller+0x27d/0x2e0
[ 400.785221] [<c03569d8>] ? _spin_unlock_irqrestore+0x48/0x80
[ 400.785421] [<c017f4cd>] ? ftrace_record_ip+0x7d/0x250
[ 400.785621] [<c0474801>] ? olpc_init+0x31/0x140
[ 400.785817] [<c011d59f>] ioremap_nocache+0x1f/0x30
[ 400.785976] [<c0474801>] ? olpc_init+0x31/0x140
[ 400.786165] [<c0474801>] olpc_init+0x31/0x140
[ 400.786318] [<c04...
Please add initcall_debug to the kernel boot command line - that should
<looks at this again>
That's
WARN_ON_ONCE(is_ram);
the changelog for the patch which added that warning is information-free
and there's no code comment explaining what went wrong, which makes things
rather harder than they ought to be.
Yes it's due to the new OLPC code. olpc_init() has
romsig = ioremap(0xffffffc0, 16);
which we probably just shouldn't do at all unless we're running on the
OLPC hardware. But we need to do this to find out if we're running on the OLPC
hardware! Perhaps the warning should just be removed.
--
On Fri, 18 Apr 2008 20:29:25 -0700

Calling ioremap() on something which COULD be RAM is... REALLY nasty. The kernel has to mark that page uncached, for all users and mappings of that memory. A second hard case then is to find out when the last ioremap() user has released that memory (since there are several cases where different parts of the same 4K page can be ioremapped) before it can map it cached again.

The good news is that until this olpc patch got in, there were no users of this capability.... Instead of outright forbidding it though we added a warn_on to find out if the assumption of no users was correct... seems it caught some new code which is trying to do this here.

This code should probably be a lot more careful and check that
1) there is no actual kernel memory or something else at this region (what if there's some other device there? this code could blow up)
2) the machine won't triple fault or otherwise throw tantrums if this hardcoded value is accessed (not automatic on x86!!)
3) it only runs if there's a really high degree of confidence that this really is an OLPC device.
or maybe
4) get this address from some other table or system provided resource

-- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
On Fri, 18 Apr 2008 20:29:25 -0700 Hm. We could either protect that code with an: if (!is_geode()) return; Or I could add the OpenFirmware patches which would allow us to get rid of this code, and instead check for the existence of OFW using that. The former is quick and easy; the latter is (imo) nicer, so long as people don't have problems w/ the OFW code. :) -- Need a kernel or Debian developer? Contact me, I'm looking for contracts. --
Do both ;) The quick-n-easy version sounds suitable for now. --
On Sat, 19 Apr 2008 10:38:33 -0700 Heh, I already had sent the nicer version. If people have some fundamental problem w/ it, I can send the quick-n-easy version. -- Need a kernel or Debian developer? Contact me, I'm looking for contracts. --
I prefer the nicer version. It is not a good policy IMHO to wrap OLPC specific code with is_geode() and friends. Even by Geode standards, we've abused the code greatly for the benefit of the Geode, and few of those abuses would translate very well even to the general Geode community. I would prefer that we use the is_olpc() and #ifdef wrappers to ensure that the code that is exclusively OLPC stays exclusively OLPC. Thanks, Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. --
On Mon, 21 Apr 2008 08:56:19 -0600 Yeah, like I said; the nicer version is the _correct_ way to do things. I just fear that the OFW code isn't ready for merging (see hpa's concerns). The code is already #ifdef'd (the original reporter had enabled CONFIG_OLPC), and the code in question is what determines what is_olpc() should return. is_geode() is just to narrow the scope of what hardware the check runs on. -- Need a kernel or Debian developer? Contact me, I'm looking for contracts. --
My bad, I missed the key points. This still is dangerous on a generic Geode, but at least if they encounter the problem, we can loudly proclaim "Don't do that". Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. --
Prior to including OFW kernel support, we had to work around the lack of
OFW. Once OFW support is added, we can switch to using it. This cleans
up some pre-OFW model detection and OFW signature detection.
Note: this should be a bit nicer to non-OLPC hardware.
Signed-off-by: Andres Salomon <dilinger@debian.org>
---
arch/x86/kernel/olpc.c | 43 +++++++++++++++++++++++++++++--------------
1 files changed, 29 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/olpc.c b/arch/x86/kernel/olpc.c
index 11670be..3a05683 100644
--- a/arch/x86/kernel/olpc.c
+++ b/arch/x86/kernel/olpc.c
@@ -190,11 +190,11 @@ EXPORT_SYMBOL_GPL(olpc_ec_cmd);
static void __init platform_detect(void)
{
size_t propsize;
- u32 rev;
+ uint32_t rev;
if (ofw("getprop", 4, 1, NULL, "board-revision-int", &rev, 4,
&propsize) || propsize != 4) {
- printk(KERN_ERR "ofw: getprop call failed!\n");
+ printk(KERN_ERR "olpc: ofw getprop call failed!\n");
rev = 0;
}
olpc_platform_info.boardrev = be32_to_cpu(rev);
@@ -207,26 +207,43 @@ static void __init platform_detect(void)
}
#endif
-static int __init olpc_init(void)
+static int __init ofw_detect(void)
{
- unsigned char *romsig;
+ size_t propsize;
+ char romsig[20];
+ ofw_phandle phandle;
- spin_lock_init(&ec_lock);
+ /* Fetch /openprom/model */
+ if (ofw("finddevice", 1, 1, "/openprom", &phandle) || phandle == ~0)
+ return -ENODEV;
- romsig = ioremap(0xffffffc0, 16);
- if (!romsig)
- return 0;
+ if (ofw("getprop", 4, 1, phandle, "model", &romsig, sizeof(romsig),
+ &propsize) || propsize < 7)
+ return -ENODEV;
+ /* String should look something like "CL1 Q2D08 Q2D" */
if (strncmp(romsig, "CL1   Q", 7))
- goto unmap;
+ return -ENODEV;
if (strncmp(romsig+6, romsig+13, 3)) {
- printk(KERN_INFO "OLPC BIOS signature looks invalid. "
+ printk(KERN_INFO "olpc: BIOS signature looks invalid. "
"Assuming not OLPC\n");
- goto unmap;
+ return -ENODEV;
...
This adds 32-bit support for calling into OFW from the kernel. It's useful for querying the firmware for misc hardware information, fetching the device tree, etc. There's potentially no reason why other platforms couldn't use this, but currently OLPC is the main user of it. This work was originally done by Mitch Bradley.
Signed-off-by: Andres Salomon <dilinger@debian.org>
---
arch/x86/Kconfig | 8 +++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/head_32.S | 27 ++++++++++++++++
arch/x86/kernel/ofw.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
include/asm-x86/ofw.h | 50 ++++++++++++++++++++++++++++++
include/asm-x86/setup.h | 1 +
6 files changed, 162 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/kernel/ofw.c
create mode 100644 include/asm-x86/ofw.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3b9089b..ce56105 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -661,6 +661,14 @@ config I8K
Say Y if you intend to run this kernel on a Dell Inspiron 8000.
Say N otherwise.
+config OPEN_FIRMWARE
+ bool "Support for Open Firmware"
+ default y if OLPC
+ ---help---
+ This option adds support for the implementation of Open Firmware
+ that is used on the OLPC XO laptop.
+ If unsure, say N here.
+
config X86_REBOOTFIXUPS
def_bool n
prompt "Enable X86 board specific fixups for reboot"
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 9575754..d33600e 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_X86_TRAMPOLINE) += trampoline_$(BITS).o
obj-$(CONFIG_X86_MPPARSE) += mpparse_$(BITS).o
obj-$(CONFIG_X86_LOCAL_APIC) += apic_$(BITS).o nmi_$(BITS).o
obj-$(CONFIG_X86_IO_APIC) += io_apic_$(BITS).o
+obj-$(CONFIG_OPEN_FIRMWARE) += ofw.o
obj-$(CONFIG_X86_REBOOTFIXUPS) += reboot_fixups_32.o
obj-$(CONFIG_KEXEC) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC) += relocate_kernel_$(BITS).o crash.o
diff ...
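Both the old ioremap-based code and the OFW-based ofw_detect() in the olpc.c patch apply the same two-step test to the ROM signature: a fixed prefix, then a 3-character model string that must appear twice. A minimal userspace sketch of that test follows. The exact spacing of the signature is an assumption inferred from the offsets 6 and 13 and the 16-byte ioremap length used in the patch; olpc_romsig_looks_valid() is a hypothetical helper name, not a function from the kernel.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper.  romsig must point at no fewer than 16 readable
 * bytes.  Assumed layout (spacing inferred, not confirmed by the thread):
 *   "CL1   Q2D08  Q2D"
 *    0     6      13
 */
static bool olpc_romsig_looks_valid(const char *romsig)
{
	/* Bytes 0..6 must read "CL1   Q" (three spaces). */
	if (strncmp(romsig, "CL1   Q", 7))
		return false;

	/* The 3-byte model at offset 6 must be repeated at offset 13. */
	if (strncmp(romsig + 6, romsig + 13, 3))
		return false;

	return true;
}
```

The second comparison is what lets the code reject a ROM that happens to start with the right prefix but is not laid out like an OLPC signature.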
how about changing to ofw_32.c? YH --
Is your suggestion to change the filename from "ofw.c" to "ofw_32.c"? That seems like a good idea to me. --
Yes. BTW, why does OLPC need an OFW runtime service? Why not just put the info in RAM with some signature, so that the kernel/utilities just need to look at the table if needed? YH --
In SPARC land, at least on SunOS and Solaris, it was very convenient for debugging to interrupt the OS with Stop-A and use OFW to inspect the system state. That was especially handy for live crash analysis. Dumps are useful as far as they go, but they often fail to capture detailed I/O device state. I was hoping to do that on x86 too. So far we (OLPC) haven't implemented a sysrq hook to enter OFW, but I haven't given up hope yet. It doesn't cost much to leave OFW around, but once you decide to eject it, you can't easily get it back. Apple made the early decision to eject OFW and just keep a device tree table. That decision was probably due to several factors, including the rather lame state of Apple's first OFW implementation and the complexity of their OS startup process at the time (which included "trampolining" to a 68000 emulator to run their legacy code). Once they went down that path, the die was cast, and the PowerPC community got used to the "OFW --
On Sun, 20 Apr 2008 18:05:26 -1000 I'm not actually convinced that we *do* want to keep OFW resident in memory, especially given the memory tricks we need to play. I also don't actually like the OFW interface that we. The debugging aspect of it was a compelling argument up until a week ago (when kernel debuggers started finally finding their way into the kernel). However, until we clean up the promfs stuff, there's no chance of getting -- Need a kernel or Debian developer? Contact me, I'm looking for contracts. --
I don't actually think that the debugging aspect was _ever_ a compelling argument. It might have made it theoretically possible for _Mitch_ to debug kernel problems, should he be inclined to do so -- but for the rest of us mere mortals it's just a PITA trying to keep OpenFirmware I see no reason why we shouldn't be able to create a 'flattened' device-tree during early boot, like the PowerPC kernel does. And use it thereafter, having quiesced OpenFirmware. Haven't we already been working on unifying this between SPARC and PowerPC kernels? I definitely don't think we need to play these tricks to keep OpenFirmware resident while the kernel is running. Take a look at your second patch -- it's _all_ just lookups in the device-tree, and you're inventing a new way to do it instead of using the existing one. -- dwmw2 --
If so, would this apply to OLPC as well? -hpa --
Yes. The 'second patch' to which I refer is the one which makes OLPC platform code use the calls in OpenFirmware... all of them gratuitous. -- dwmw2 --
On Mon, 21 Apr 2008 16:54:13 +0100 Quite simply, it's a lot more work (*and* we have to play nice w/ sparc and ppc). I had intended to eventually do it, but first I wanted to get this stuff in for 2.6.26 so that we could at least boot upstream kernels on XOs. I was also hoping to not get into this conversation, but alas.. too -- Need a kernel or Debian developer? Contact me, I'm looking for contracts. --
It's only more work because we did it the wrong way in the first place. If only someone had pointed it out at the time... :) For interaction with device-tree properties in generic code, you should be using the functions defined in <linux/of.h>. Creating the static device-tree before we quiesce OpenFirmware surely Is it only the things in your second patch which need to be made to work? One of them was already working, by grubbing around in the BIOS directly -- so all we need is the board revision, isn't it? Can we get that from the EC for now? -- dwmw2 --
On Mon, 21 Apr 2008 20:18:11 +0100 Yes, and if only we had an infinite number of kernel hackers who had time We're not adding a device tree right now, we're adding a method for querying OFW for information. Eventually that information should be obtained from a device tree. However, that's going to take additional time, and I'd like to get rid of some of these patches that we've been carrying Well, no, it wasn't already working; that's the reason this whole thread started. It was crashing someone's machine. That's why the OFW interface, as imperfect as it is, is an _improvement_. -- Need a kernel or Debian developer? Contact me, I'm looking for contracts. --
You're proposing a new interface between bootloader and kernel as a temporary hack just to work around that until we fix it properly? That seems like overkill to me. I'd just go for is_geode() as you suggested, and maybe PCI configuration tricks to detect the lack of VSA so we can be _fairly_ sure it's OLPC before we poke at it? Or why not try '!page_is_ram(0xffffffc0 >> PAGE_SHIFT)' if it's just to avoid that particular warning? :) -- dwmw2 --
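To make the page_is_ram() suggestion concrete, here is a small worked calculation of which page frame the signature address falls in. It assumes 4 KiB pages (PAGE_SHIFT of 12, as on 32-bit x86); the macro and helper names are illustrative only and are redefined locally rather than taken from kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12				/* assumed: 4 KiB pages */
#define PAGE_SIZE  (UINT32_C(1) << PAGE_SHIFT)

#define OLPC_ROMSIG_PHYS UINT32_C(0xffffffc0)	/* address from olpc_init() */
#define OLPC_ROMSIG_LEN  16u			/* length passed to ioremap() */

/* Page frame number, i.e. the argument page_is_ram() would receive. */
static uint32_t phys_to_pfn(uint32_t phys)
{
	return phys >> PAGE_SHIFT;
}

/* Byte offset of the address inside its page. */
static uint32_t offset_in_page(uint32_t phys)
{
	return phys & (PAGE_SIZE - 1);
}

/* True when [phys, phys + len) does not cross a page boundary. */
static bool fits_in_one_page(uint32_t phys, uint32_t len)
{
	return offset_in_page(phys) + len <= PAGE_SIZE;
}
```

With these definitions the signature sits at offset 0xfc0 of page frame 0xfffff, the last 4 KiB page below 4 GiB, and its 16 bytes stay within that one page, so a single page_is_ram() query would cover the whole mapping.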
On Mon, 21 Apr 2008 21:25:17 +0100
Okay, does anyone have a problem with this?
The OFW sig check requires an ioremap that is dangerous on non-OLPC
systems. Long term, we should be getting the signature from the
device tree (/openprom/model), but for right now just limit the
check to only run on a subset of Geode (GX2/LX) systems.
Signed-off-by: Andres Salomon <dilinger@debian.org>
---
arch/x86/kernel/olpc.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/olpc.c b/arch/x86/kernel/olpc.c
index 11670be..3e66722 100644
--- a/arch/x86/kernel/olpc.c
+++ b/arch/x86/kernel/olpc.c
@@ -211,6 +211,10 @@ static int __init olpc_init(void)
{
unsigned char *romsig;
+ /* The ioremap check is dangerous; limit what we run it on */
+ if (!is_geode() || geode_has_vsa2())
+ return 0;
+
spin_lock_init(&ec_lock);
romsig = ioremap(0xffffffc0, 16);
--
1.5.4.4
--
geode_has_vsa2() is a fairly expensive-looking function and afaict only needs to be evaluated once per boot. Perhaps we should cache it somewhere? --
On Mon, 28 Apr 2008 20:06:51 -0700
How about this?
This moves geode_has_vsa2 into a .c file, caches the result we get from
the VSA virtual registers, and causes the function to no longer be inline.
Signed-off-by: Andres Salomon <dilinger@debian.org>
---
arch/x86/kernel/geode_32.c | 19 +++++++++++++++++++
include/asm-x86/geode.h | 11 +----------
2 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/geode_32.c b/arch/x86/kernel/geode_32.c
index 9dad6ca..1cb8225 100644
--- a/arch/x86/kernel/geode_32.c
+++ b/arch/x86/kernel/geode_32.c
@@ -161,6 +161,25 @@ void geode_gpio_setup_event(unsigned int gpio, int pair, int pme)
}
EXPORT_SYMBOL_GPL(geode_gpio_setup_event);
+static int has_vsa2 = -1;
+
+int geode_has_vsa2(void)
+{
+ if (has_vsa2 == -1) {
+ /*
+ * The VSA has virtual registers that we can query for a
+ * signature.
+ */
+ outw(VSA_VR_UNLOCK, VSA_VRC_INDEX);
+ outw(VSA_VR_SIGNATURE, VSA_VRC_INDEX);
+
+ has_vsa2 = (inw(VSA_VRC_DATA) == VSA_SIG);
+ }
+
+ return has_vsa2;
+}
+EXPORT_SYMBOL_GPL(geode_has_vsa2);
+
static int __init geode_southbridge_init(void)
{
if (!is_geode())
diff --git a/include/asm-x86/geode.h b/include/asm-x86/geode.h
index 7154dc4..8a53bc8 100644
--- a/include/asm-x86/geode.h
+++ b/include/asm-x86/geode.h
@@ -185,16 +185,7 @@ static inline int is_geode(void)
return (is_geode_gx() || is_geode_lx());
}
-/*
- * The VSA has virtual registers that we can query for a signature.
- */
-static inline int geode_has_vsa2(void)
-{
- outw(VSA_VR_UNLOCK, VSA_VRC_INDEX);
- outw(VSA_VR_SIGNATURE, VSA_VRC_INDEX);
-
- return (inw(VSA_VRC_DATA) == VSA_SIG);
-}
+extern int geode_has_vsa2(void);
/* MFGPTs */
--
1.5.5
--On Tue, 29 Apr 2008 01:32:13 -0400
Looks sane. Although one wonders if it should be cached as one of the
standard x86 feature bit thingies which show up in /proc/cpuinfo's 'flags'
nit:
--- a/arch/x86/kernel/geode_32.c
+++ a/arch/x86/kernel/geode_32.c
@@ -161,10 +161,10 @@ void geode_gpio_setup_event(unsigned int
}
EXPORT_SYMBOL_GPL(geode_gpio_setup_event);
-static int has_vsa2 = -1;
-
int geode_has_vsa2(void)
{
+ static int has_vsa2 = -1;
+
if (has_vsa2 == -1) {
/*
* The VSA has virtual registers that we can query for a
--On Tue, 29 Apr 2008 13:35:12 -0700 The VSA lives in a weird place between hardware and BIOS. I'm not really sure whether it's appropriate for it to be an x86_cap_flags (it hadn't occurred to me), but I think of it more as BIOS. Jordan, what do Looks good. --
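The caching pattern from the two patches above (a -1 sentinel meaning "not probed yet", kept in a function-local static as in the nit) can be tried out in userspace. In this sketch slow_probe() is a hypothetical stand-in for the outw()/inw() sequence on the VSA virtual registers, and probe_calls exists only to show that the probe runs once.

```c
#include <stdbool.h>

static int probe_calls;	/* counts how often the "hardware" is touched */

/* Hypothetical stand-in for the VSA register query; always reports "present". */
static bool slow_probe(void)
{
	probe_calls++;
	return true;
}

int cached_has_feature(void)
{
	/* -1: not probed yet, 0: absent, 1: present */
	static int cached = -1;

	if (cached == -1)
		cached = slow_probe();

	return cached;
}
```

The sketch has no locking; two concurrent first callers could both run the probe, which is harmless only when the probe is idempotent.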
That looks saner to me for now. Acked-By: David Woodhouse <dwmw2@infradead.org> -- dwmw2 --
-- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. --
Geode uses SMI to simulate the PCI config space; I wonder whether that could be a problem. Later, will you have a 64-bit runtime service for 64-bit platforms, like UEFI? YH --
On the current OLPC system, we don't use the SMI-based PCI config space simulator. The code for that "VSA" module is only partially open sourced (some of it is open, and some of it is just not available). The parts of it for which we do have source can only be compiled with an old proprietary toolchain that is no longer available. Instead of using the SMI-based simulation, we have added a PCI configuration access method in the kernel that supplies the necessary information from a table. The code for that hardware-specific access method is roughly 40 lines of code plus a few data tables. In the past few weeks, I have developed a rather complete Open Firmware-based reimplementation of the SMI PCI config hardware emulator. All-told, it requires over 1000 lines. It remains to be seen whether the complicated version will ultimately be deployed. Personally, I find it distasteful to use a lot of code to make the hardware pretend that it is something other than what it really is, when a much smaller driver works just as well. The SMI-based emulator is quite difficult to understand and maintain, because the Geode SMI handling mechanism is complex, incompletely documented, and suffers from many of the multiple-mode-switches problems as real-mode to Possibly. 64-bit systems are not a problem per se - there have been 64-bit OFW implementations for 64-bit architectures like SPARC and Alpha dating back to a long time ago. The main issue from my point of view is --
From: Mitch Bradley <wmb@firmworks.com> In most current SPARC systems, OFW is not usable and is completely forgotten right after bootup in order to accommodate LDOMs and CPU hotplug. It's a better idea, anyways, to develop more pervasive and usable in-kernel debugger facilities. Then it doesn't matter if you have "cool" firmware or not. :-) --
Hm. This interface seems more than a bit ad hoc. In particular, I *really* don't like the swapper_pg_dir hack. "There must be a better way." -hpa --
On Sun, 20 Apr 2008 08:07:55 -0400 I'm certainly open to suggestions.. Otherwise, I'll poke around and see if I can come up w/ something. -- Need a kernel or Debian developer? Contact me, I'm looking for contracts. --
It pretty much depends on what the invariants look like. The normal/clean way of doing this kind of thing is via a fixmap entry and/or ioremap. -hpa --
The x86 architecture doesn't make this problem easy. The conventional solution is to have the BIOS operate in real mode. When the kernel calls into the BIOS, it has to do a grotesque dance that involves jumping through a chain of several segments of different flavors, thus gradually shutting down the multi-tiered address translation mechanism. Then, if the BIOS is actually operating in protected mode (which is necessary if it is larger than 64K, as all modern BIOSes are), it has to perform the inverse process, do the requested work, then go back into real mode to return to the kernel.

The net result is that a "call" into the BIOS involves:
a) Copying the arguments to a real-mode register shadow array
b) Saving all the registers - general ones and a few special ones too
c) Far call to a linear-mapped code segment with an execution address in the first 1M of memory
d) Switching to a different stack
e) Turning off page translation
f) Switching from protected mode to real mode (or in some cases, V86 mode instead, which requires an additional Task State Segment dance to set the IO permission mask)
g) Switching to a real-mode interrupt descriptor table
h) Executing an INT instruction
i) Performing the inverse of a - g inside the BIOS
j) Doing the requested work
k) Performing a - g again to get back into real mode
l) Executing an "iret" instruction
m) Performing the inverse of a - g to return to normal operation

The machinery that you need to do all that is predictably complex - extra segment descriptors that are set up just-so, several little code fragments that must be at special addresses in the first meg, additional stacks, a real-mode interrupt table at a fixed address, and several data save arrays. That machinery has to be in assembly language, spanning several different instruction set modes.

Compared to that, I think that sharing one or two page directory entries at the very top of the virtual address space is pretty clean and simple...
[long rant about the x86 architecture] It would be more useful if you described what the actual defined entry conditions from OpenFirmware look like, including whether they are well-defined for all OF implementations or only for OLPC. -hpa --
Fair enough... To get the second subquestion out of the way: At the present time, on the x86 architecture, "all OF implementations" and "OLPC" are effectively the same. I am unaware of any other x86 OFW deployments in current use. There have been some in the past, on bespoke systems such as Network Appliance servers and at least one settop box, but those have fallen by the wayside as those companies have shifted over to commodity PC hardware. The current market status quo is that x86 boards are primarily designed for Windows, and thus must run legacy BIOS, with some recent migration to EFI, neither of which are open source in the strong sense. While I would like to see more OFW penetration into the larger x86 market, I don't really expect it. x86 motherboard manufacturing is becoming more and more difficult as signal speeds increase, leading to a decline in the number of manufacturers. The existing manufacturers depend on Windows for sales volume and their internal procedures and working knowledge are based on legacy BIOS. Once upon a time, we had an OFW "binding" document that stipulated the interface conditions, with the intention of making that "standard" across all OFW-on-x86 systems. However, by the time OLPC came around, there were no other systems to consider, so I felt free to make some changes in the interface. I ended up choosing an ABI that resulted in a simple (in the sense of not much code, and no complex state transitions) interface with 2.6 Linux kernels. The interface defined below is not inherently OLPC-specific - it would be suitable for any ia32 system that used OFW. (At a higher level, the set of OFW callback functions is architecture-neutral; in this message I am focusing on the very low-level details of the ia32 ABI) The system conditions for the OFW to Linux kernel transition are as follows: a) OFW can load the Linux kernel from either bzimage format or ELF format (either uncompressed or zlib-compressed.) If the ker...
/me puts on his coreboot hat This is off topic slightly, but let it be known that the coreboot project considers OFW a very valid option for x86 platforms. A kernel that worked happily with OFW would greatly encourage people to adopt it in lieu of other BIOS / firmware solutions. I return you to your previously scheduled debate. Jordan --
The interface they are proposing is definitely not suitable for upward extension, for the reasons already mentioned. However, they have units in the field, and the amount of changes required to support another interface should be relatively minor. Hence my insistence that we don't promote it as *the* OFW interface, but *a* OFW interface. -hpa --
So let me see here... you want the virtual address range [0xffc00000, 0xfff00000) to be reserved for OFW, and you are prohibiting the kernel [...] I do not like it, simply because it amounts to "initialize this otherwise zero-initialized piece of data without making any kind of reservations and blindly hope nothing else overwrites it." I'm also troubled by the assumption that the kernel doesn't use PAE. I realize that this is not an issue for OLPC, but it certainly makes this a less-than-generic solution. Having mapped page table entries which are not under kernel control is a very serious problem for PAT - PAT requires, by hardware specification, the kernel to eliminate all potential aliases with different mappings. One way to deal with this, of course, is to save the firmware-provided PGD and only use it for OFW calls. On the other hand, perhaps a better question is to what extent it is needed at all. Furthermore, since you're using a nonstandard OFW interface (not compliant with the x86 OFW binding document), all of this should be called something like OLPC_OFW to make it clear that it's the OLPC variant. If I had designed this, I would probably have used an SMI; since you have control over the firmware you can do that. SMI saves the entire machine state including all the modes, cleans them all up for you, and puts it all back together at RSM time. It is slow, of course, but it completely decouples the firmware and the OS, which is why it's used. -hpa --
Okay, stepping back a few steps, it's pretty clear that most of my objections aren't really an issue for Geode/OLPC; however, I *really* don't want others to pick it up as being "the" Open Firmware interface. Within those constraints it makes sense to set up the PDEs in swapper_pg_dir and let them propagate using the normal mechanisms. ** This is assuming that your OF interface does not rely on a 1:1 mapping of low memory being present at the time it makes a call. If it *does*, then a separate page directory needs to be maintained for the OF calls. ** Thus, I'm willing to accept this with these changes: - Please name things specific to the interface (as opposed to Open Firmware in general, like the device tree) olpc_ofw or olpcfw, to denote that this is an OLPC-specific interface. Thus, CONFIG_OLPC_OPEN_FIRMWARE or something along those lines. - Make it explicit in Kconfig that OLPC_OPEN_FIRMWARE conflicts with X86_PAE, 64BIT, or X86_PAT. - Change VMALLOC_END in include/asm-x86/pgtable_32.h so the kernel will know to avoid this virtual memory range. - Add a memory region to arch/x86/mm/dump_pagetables.c. -hpa --
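[Editorial sketch, not part of the original message.] The address arithmetic behind the VMALLOC_END request can be illustrated in standalone C. The range [0xffc00000, 0xfff00000) comes from the thread; the helper names (ofw_pde_index, overlaps_ofw, clamp_vmalloc_end) are hypothetical and are not kernel APIs. The sketch assumes a non-PAE ia32 layout in which each page-directory entry covers 4 MiB, so the whole OFW window falls inside the last PDE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Virtual range reserved for OFW, as quoted in the thread (end is exclusive). */
#define OFW_VIRT_START 0xffc00000u
#define OFW_VIRT_END   0xfff00000u

/* Assumption: non-PAE ia32 paging, one page-directory entry per 4 MiB. */
#define PDE_SHIFT 22

/* Hypothetical helper: index of the page-directory entry covering vaddr. */
static uint32_t ofw_pde_index(uint32_t vaddr)
{
    return vaddr >> PDE_SHIFT;
}

/* Hypothetical helper: does [start, start + len) intersect the OFW window? */
static bool overlaps_ofw(uint32_t start, uint32_t len)
{
    /* 64-bit end so that start + len cannot wrap around 4 GiB. */
    uint64_t end = (uint64_t)start + len;

    return start < OFW_VIRT_END && end > OFW_VIRT_START;
}

/* Hypothetical helper: lower a proposed vmalloc end so it stops below OFW. */
static uint32_t clamp_vmalloc_end(uint32_t vmalloc_end)
{
    return vmalloc_end > OFW_VIRT_START ? OFW_VIRT_START : vmalloc_end;
}
```

Under these assumptions both ends of the window map to PDE 1023, which is why setting up a single entry in swapper_pg_dir would be enough to keep the firmware mapped.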
So you are assuming that your uncompressed vmlinux uses less than 8M of space? You are supposed to check the bzImage to get the uncompressed vmlinux size. YH --
The 0x800000 ramdisk load address is an OLPC-specific firmware implementation detail that could easily be changed without affecting anything else. I probably shouldn't have mentioned it because it isn't really an integral part of the interface "contract". I certainly hope that the OLPC kernel never gets anywhere near that size. The OLPC hardware has limited configurability, so it's not plausible that the kernel would grow that large to include a huge kit of drivers. If the kernel file becomes large as a result of including the initramfs in the same file, the 0x800000 ramdisk load address won't apply (because there won't be a separate load of the initramfs file), so the kernel could extend way past that boundary with no problems. If we get to the point where we do need huge kernels on OLPC, we can release a firmware upgrade along with the new OS. We have mechanisms for coordinating firmware and OS upgrades. If a new customer for OFW on x86 appears, I'll remember to float the boundary above the bzImage uncompressed size (assuming that the bzimage --
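[Editorial sketch, not part of the original message.] The check YH asks for, and the "floating" boundary described in the reply, might look roughly like the following. The 0x800000 address is from the thread; the function names, the 4 KiB alignment, and the idea of passing in a kernel base and an already-known uncompressed size are assumptions made for illustration. How the firmware would actually obtain the uncompressed size from a bzImage is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default ramdisk load address mentioned in the thread (OLPC firmware detail). */
#define RAMDISK_LOAD_ADDR 0x800000u

/* Assumption: the ramdisk is placed on a 4 KiB page boundary. */
#define PAGE_BYTES 4096u

/* Hypothetical helper: does the uncompressed kernel end at or below the ramdisk? */
static bool kernel_fits_below_ramdisk(uint32_t kernel_base, uint32_t kernel_size)
{
    /* 64-bit sum so a large size cannot wrap around 4 GiB. */
    uint64_t kernel_end = (uint64_t)kernel_base + kernel_size;

    return kernel_end <= RAMDISK_LOAD_ADDR;
}

/* Hypothetical helper: keep the default address unless the kernel would
 * overlap it; otherwise move up to the first page boundary past the kernel.
 * Assumes the rounded-up end still fits in 32 bits. */
static uint32_t pick_ramdisk_addr(uint32_t kernel_base, uint32_t kernel_size)
{
    uint64_t kernel_end = (uint64_t)kernel_base + kernel_size;
    uint64_t aligned = (kernel_end + PAGE_BYTES - 1) & ~(uint64_t)(PAGE_BYTES - 1);

    return aligned > RAMDISK_LOAD_ADDR ? (uint32_t)aligned : RAMDISK_LOAD_ADDR;
}
```

For example, with a kernel based at 0x100000, anything up to 0x700000 bytes leaves the default address untouched, while a single byte more pushes the ramdisk to 0x801000.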
I've been seeing the following backtrace since (I think) 2.6.25-rc8-mm2. I'm sending multiple reports vs. 2.6.25-mm1, so I'm putting the dmesg and .config on a server: http://home.columbus.rr.com/jfannin3/dmesg.txt http://home.columbus.rr.com/jfannin3/config-2.6.25-mm1.txt [ 842.795144] hm, dftrace overflow: 265 changes (0 total) in 428 usecs [ 842.795182] ------------[ cut here ]------------ [ 842.795192] WARNING: at kernel/trace/ftrace.c:658 ftraced+0x1a4/0x1b0() [ 842.795200] Modules linked in: af_packet rfcomm l2cap bluetooth ppdev ipv6 cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave video output wmi pci_slot container dock sbs sbshcbattery iptable_filter ip_tables x_tables ext2 ac lp loop snd_via82xx gameport snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_mpu401_uart snd_seq_dummy snd_seq_oss snd_seq_midi psmouse snd_rawmidi serio_raw snd_seq_midi_event snd_seq button i2c_viapro snd_timer snd_seq_device pcspkr i2c_core snd snd_page_alloc via686a shpchp pci_hotplug parport_pc parport via_agp agpgart soundcore evdev sg sr_mod cdrom sd_mod 8139cp aic7xxx scsi_transport_spi scsi_mod 8139too mii uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fuse ext4dev mbcache jbd2 crc16 [ 842.795470] Pid: 13, comm: ftraced Tainted: G W 2.6.25-mm1 #7 [ 842.795497] [<c0130fa9>] warn_on_slowpath+0x59/0x80 [ 842.795541] [<c013244f>] ? vprintk+0x33f/0x4a0 [ 842.795589] [<c0155216>] ? trace_hardirqs_on_caller+0x16/0x150 [ 842.795622] [<c0354eb0>] ? __mutex_lock_common+0x2b0/0x3c0 [ 842.795667] [<c0155216>] ? trace_hardirqs_on_caller+0x16/0x150 [ 842.795688] [<c015535b>] ? trace_hardirqs_on+0xb/0x10 [ 842.795709] [<c017e4d0>] ? __ftrace_update_code+0x0/0x110 [ 842.795730] [<c017e9f0>] ? ftraced+0x0/0x1b0 [ 842.795746] [<c01325d0>] ? printk+0x20/0x30 [ 842.795764] [<c017e9f0>] ? ftraced+0x0/0x1b0 [ 8...
Seen plenty of them - I think Greg today dropped the offending patch(es). [ 451.915553] sysfs: duplicate filename 'pcspkr' can not be created I haven't seen that one before. --
I've been seeing the following backtraces since 2.6.25-rc8-mm1 -- at least, since that's the earliest -mm I've built in a while. I don't get the same in mainline. No idea who to CC: I've sat on this report long enough. I'm going to send a few different reports in separate mails, so I'll put my dmesg and .config up on a server: http://home.columbus.rr.com/jfannin3/dmesg.txt http://home.columbus.rr.com/jfannin3/config-2.6.25-mm1.txt [ 451.915553] sysfs: duplicate filename 'pcspkr' can not be created [ 451.915731] ------------[ cut here ]------------ [ 451.915851] WARNING: at fs/sysfs/dir.c:427 sysfs_add_one+0x85/0xe0() [ 451.915981] Modules linked in: snd_pcsp(+) ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_mpu401_uart snd_seq_dummy snd_seq_oss snd_seq_midi psmouse snd_rawmidi serio_raw snd_seq_midi_event snd_seq button i2c_viapro snd_timer snd_seq_device pcspkr i2c_core snd snd_page_alloc via686a shpchp pci_hotplug parport_pc parport via_agp agpgart soundcore evdev sg sr_mod cdrom sd_mod 8139cp aic7xxx scsi_transport_spi scsi_mod 8139too mii uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fuse ext4dev mbcache jbd2 crc16 [ 451.918960] Pid: 2740, comm: modprobe Tainted: G W 2.6.25-mm1 #7 [ 451.929271] [<c0130fa9>] warn_on_slowpath+0x59/0x80 [ 451.929500] [<c0132400>] ? vprintk+0x2f0/0x4a0 [ 451.929723] [<c0356adc>] ? _spin_unlock+0x2c/0x50 [ 451.929918] [<c01c6a7a>] ? ifind+0x4a/0xa0 [ 451.930126] [<c0155216>] ? trace_hardirqs_on_caller+0x16/0x150 [ 451.930334] [<c015535b>] ? trace_hardirqs_on+0xb/0x10 [ 451.930534] [<c01325d0>] ? printk+0x20/0x30 [ 451.930727] [<c01fcc45>] sysfs_add_one+0x85/0xe0 [ 451.930900] [<c01fd89e>] create_dir+0x4e/0xb0 [ 451.931064] [<c01fd930>] sysfs_create_dir+0x30/0x50 [ 451.931291] [<c0356adc>] ? _spin_unlock+0x2c/0x50 [ 451.931485] [<c023dac6>] kobject_add_internal...
