Re: 2.6.28-rc1: NVRAM being corrupted on ppc64 preventing boot (bisected)

From: Linus Torvalds
Date: Thursday, October 23, 2008 - 9:10 pm

It's been two weeks, so it's time to close the merge window. A 2.6.28-rc1 
is out there, and it's hopefully all good.

The changes in -rc1 are (as usual) too many to really enumerate, with the 
bulk of them being - again as usual - drivers. In fact, that's doubly true 
now that we merged the drivers from the staging tree. The dirstat output 
makes that very obvious:

   3.3% arch/arm/
  14.2% arch/
   3.1% crypto/
   4.0% drivers/media/
   3.7% drivers/net/wireless/
  10.3% drivers/net/
   6.5% drivers/staging/me4000/
   8.5% drivers/staging/slicoss/
   4.8% drivers/staging/wlan-ng/
  29.7% drivers/staging/
  63.6% drivers/
   3.3% include/
   4.6% net/
   4.6% sound/

but some other statistics may be fun: 

 - 7141 non-merge commits (and 419 merges)

 - average non-merge commit:
	39 lines removed, 104 lines added
   (not counting renames)

 - About 880 individual authors
	- 340 of which had just one commit
	- while 183 authors had ten or more commits

 - Most screwed-up clock award goes to:

	Greg Kroah-Hartman

   for his commit 51b90540, which claims to be from April 9 - six years 
   ago! And it's a fix to a driver that was merged this July!

Way to go, Greg!

Anyway, have fun, please test it, and report any interesting anomalies you 
find.

			Linus
--

From: Roland Dreier
Date: Thursday, October 23, 2008 - 9:14 pm

>  - Most screwed-up clock award goes to:
 > 
 > 	Greg Kroah-Hartman
 > 
 >    for his commit 51b90540, which claims to be from April 9 - six years 
 >    ago! And it's a fix to a driver that was merged this July!

Maybe that commit really is from a time before printk supported the
"%zd" format for size_t :)

 - R.
--

From: Greg KH
Date: Friday, October 24, 2008 - 11:08 am

Heh :(

Sorry about this, stupid "default date for my patch" script messed up.
I suck at bash scripting at times...

greg k-h
--

From: Alistair John Strachan
Date: Friday, October 24, 2008 - 4:24 am

It seems if you have a broken asm/ symlink in include/ (which happened as a 
result of the x86 header moves, for me) the kernel won't try to update it 
appropriately, and this breaks "make prepare".

$ make ARCH=x86_64 prepare
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  GEN     include/asm/asm-offsets.h
/bin/sh: include/asm/asm-offsets.h: No such file or directory
make[1]: *** [include/asm/asm-offsets.h] Error 1
make: *** [prepare0] Error 2

rm -f include/asm fixes it

This was just from taking a 2.6.27 tree, git clean -d -f, git pull, make 
oldconfig. Might be a nice thing to fix?

-- 
Cheers,
Alistair.
--

From: Rafael J. Wysocki
Date: Friday, October 24, 2008 - 4:45 am

Hm, I didn't have any problems with compiling .28-rc1 on x86_64.

[Confused.]

Rafael
--

From: Alistair John Strachan
Date: Friday, October 24, 2008 - 5:52 am

This should reproduce it (whether or not it's a use-case we care about is 
another matter). First, make sure your include/asm symlink has been removed, 
then execute the following sequence:

git reset --hard v2.6.27 ; git clean -d -f
git status    # reports "Nothing to commit"

cp /path/to/config .config
make oldconfig prepare
git clean -d -f ; git reset --hard
git status    # reports "Nothing to commit"

Observe at this point that include/asm is valid and points to include/asm-x86, 
despite the clean and reset (I guess this file is being ignored). Now:

git reset --hard v2.6.28-rc1    # or whatever other method you might choose
git clean -d -f                 # removes include/asm-x86

Observe at this point that include/asm is now invalid, and still points to the 
removed include/asm-x86 directory.

cp /path/to/config .config
make oldconfig prepare

Should fail at this point:

scripts/kconfig/conf -o arch/x86/Kconfig
#
# configuration written to .config
#
scripts/kconfig/conf -s arch/x86/Kconfig
  CHK     include/linux/version.h
  UPD     include/linux/version.h
  CHK     include/linux/utsrelease.h
  UPD     include/linux/utsrelease.h
  CC      kernel/bounds.s
  GEN     include/linux/bounds.h
  CC      arch/x86/kernel/asm-offsets.s
  GEN     include/asm/asm-offsets.h
/bin/sh: include/asm/asm-offsets.h: No such file or directory
make[2]: *** [include/asm/asm-offsets.h] Error 1
make[1]: *** [prepare0] Error 2
make: *** [prepare] Error 2

Can you confirm?

I looked at the Makefile and I believe it occurs because the current checks only 
make sure that a symlink exists and, if it does, that its target matches the 
selected architecture. It doesn't actually check that the destination of 
the symlink is valid.

I'd suggest that it should do that too, and if the destination doesn't exist, 
re-write the symlink when it does "mkdir include/asm-x86" further down, but 
I'm not a kbuild expert.
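
The missing check described above can be sketched in shell. This is only an
illustration run in a scratch directory, not a kbuild change; the helper name
is_stale_symlink is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when $1 is a symlink whose target is missing.
is_stale_symlink() {
    # -L is true for any symlink; -e follows the link and fails when the
    # target does not exist.
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Demo in a scratch directory (not a real kernel tree).
demo=$(mktemp -d)
mkdir "$demo/include"
ln -s asm-x86 "$demo/include/asm"        # target does not exist yet

if is_stale_symlink "$demo/include/asm"; then
    echo "stale: include/asm -> $(readlink "$demo/include/asm")"
fi

mkdir "$demo/include/asm-x86"            # now the target exists
is_stale_symlink "$demo/include/asm" || echo "ok: include/asm is valid"

rm -rf "$demo"
```

A check-symlink style rule could call such a test and remove the link when it
reports a stale target, so that the link gets recreated later in the build.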

-- 
Cheers,
Alistair.
--

From: Alexey Dobriyan
Date: Friday, October 24, 2008 - 6:13 am

Use this script for a super-clean, project-agnostic clean:

	$ cat ~/bin/git-mrproper
	#!/bin/sh
	git ls-files -o --directory -z | xargs -0 rm -rf


I'd say nothing should be done here: automatically changing the include/asm
symlink for a different ARCH was unsupported because that is a "big" event, and
the headers move is an equally "big" event.
--

From: Björn
Date: Friday, October 24, 2008 - 7:56 am

JFYI, that should be the same as:
git clean -xdf

The -x makes it wipe out ignored files as well.

Björn
--

From: Linus Torvalds
Date: Friday, October 24, 2008 - 8:17 am

The problem is ignored files.

Yes, git claims everything is clean, but that's because it has been told 
to ignore certain files, and because it has been told to ignore them, it 
will not remove them (without the -x flag) in "git clean", nor will it 
mention them in "git status".

And yes, one of the ignored file patterns is

	include/asm-*/asm-offsets.h

which means that your "git clean -df" didn't *really* clean everything 
from the old include/asm-x86, and because it didn't clean it all it also 
wouldn't be able to remove the old stale directory - since it wasn't 
empty.
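
The effect can be reproduced in a throwaway repository. This is a sketch that
assumes git is installed; the directory and file names are invented for the
demonstration, and the ignore pattern is written to .git/info/exclude so that
the ignore file itself is not an untracked file that gets cleaned away.

```shell
#!/bin/sh
# Scratch repository; nothing here touches a real kernel tree.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Ignore the generated header, using the pattern quoted above.
echo 'include/asm-*/asm-offsets.h' >> .git/info/exclude

mkdir -p include/asm-x86
echo generated > include/asm-x86/asm-offsets.h   # ignored by git
echo leftover  > include/asm-x86/other.h         # untracked, not ignored

git clean -q -d -f        # removes other.h but keeps the ignored file
ls include/asm-x86        # the directory survives because it is not empty

git clean -q -d -f -x     # -x removes ignored files as well
ls include 2>/dev/null || echo "include/ is gone"
```

After the first clean the stale directory is still there with only the ignored
header in it; only the second clean, with -x, gets rid of it.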

You can use "git clean -dfx" to force git to remove ignored files too. And 
"make distclean" should have done it too.

Now, _another_ part (and arguably the really core reason) of this problem 
is that our Makefile rules for the asm include directory are weak and 
unreliable in the presence of already-existing unexpected entries.

And it has caused problems before. For example, if you somehow made the 
symlink not be a symlink at all (by using "cp -LR" for example), or a 
symlink pointing to another architecture (changing architecture builds in 
the same tree without doing a "make clean" in between), you historically 
got really odd results.

In fact, it's broken in subtle ways before, to the point where we now have a 
special "check-symlink" target internally that checks that the symlink is 
correctly set up.

Of course, it didn't check that you had some old stuff in include/asm-x86, 
it only checks for the _traditional_ problems we've had. Not some new odd 
one.

		Linus
--

From: David Miller
Date: Friday, October 24, 2008 - 3:31 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

I guess we could use separate "stamp" files to deal with this.

Along with the generated file "foo" there is a "foo.stamp" file
that is generated with "touch" after "foo" is built.

Then the update rule for "foo" is whether "foo.stamp" is out of date
wrt. its dependencies.
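
A minimal sketch of the stamp-file idea in shell. The helper names
needs_update and build and the file names are hypothetical; kbuild would
express this as make rules rather than as a script. Treating a missing
output file as out of date is an added assumption, and the sketch relies on
the "-nt" file test, which bash and most /bin/sh implementations provide.

```shell
#!/bin/sh
# Hypothetical helper: needs_update TARGET DEP...
# Succeeds (exit 0) when TARGET must be regenerated.
needs_update() {
    target=$1; shift
    stamp="$target.stamp"
    # Missing output or missing stamp: always rebuild.
    [ -e "$target" ] || return 0
    [ -e "$stamp" ] || return 0
    # Rebuild when any dependency is newer than the stamp.
    for dep in "$@"; do
        [ "$dep" -nt "$stamp" ] && return 0
    done
    return 1
}

# Regenerate TARGET from its dependencies, then refresh the stamp.
build() {
    target=$1; shift
    cat "$@" > "$target"
    touch "$target.stamp"
}

work=$(mktemp -d)
echo "source" > "$work/foo.in"
if needs_update "$work/foo" "$work/foo.in"; then
    build "$work/foo" "$work/foo.in"
    echo "built foo"
fi
needs_update "$work/foo" "$work/foo.in" || echo "foo is up to date"
rm -rf "$work"
```

The stamp is touched only after the output has been written, so an
interrupted or removed output shows up as out of date on the next run.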
--

From: Sam Ravnborg
Date: Friday, October 24, 2008 - 3:51 pm

I remember making an attempt at that a long time ago for
the asm symlink, but I don't know why it failed for me.

We used this trick in many archs before,
but as part of the header move to arch/$ARCH we have killed
almost all uses of symlinks to reach certain files.

The asm symlink is only used by asm-offsets.h for most archs these
days, and when I get around to it I will fix that too so we
can kill it entirely.

But first we need to move all archs headers to arch/$ARCH.
And we are getting there.

	Sam
--

From: Sam Ravnborg
Date: Friday, October 24, 2008 - 12:22 pm

I just checked, and make mrproper / make distclean deletes the symlink.

We do not cover the "asm symlink became a dir" problem.
But once all archs have moved their headers, that is implicitly covered anyway.

	Sam
--

From: Sam Ravnborg
Date: Friday, October 24, 2008 - 12:15 pm

The following patch adds another special case where we delete stale symlinks.
In my limited testing it fixes the issue - can you give it a spin?

	Sam

diff --git a/Makefile b/Makefile
index f6703f1..9dc7427 100644
--- a/Makefile
+++ b/Makefile
@@ -961,6 +961,7 @@ export CPPFLAGS_vmlinux.lds += -P -C -U$(ARCH)
 
 # The asm symlink changes when $(ARCH) changes.
 # Detect this and ask user to run make mrproper
+# If asm is a stale symlink (point to dir that does not exist) remove it
 define check-symlink
 	set -e;                                                            \
 	if [ -L include/asm ]; then                                        \
@@ -970,6 +971,7 @@ define check-symlink
 			echo "       set ARCH or save .config and run 'make mrproper' to fix it";             \
 			exit 1;                                            \
 		fi;                                                        \
+		test -e $$asmlink || rm include/asm;                       \
 	fi
 endef
 
--

From: Alistair John Strachan
Date: Friday, October 24, 2008 - 4:44 pm

Fixes it here, thanks.

-- 
Cheers,
Alistair.
--

From: Matt Mackall
Date: Friday, October 24, 2008 - 10:09 am

This fails building on allnoconfig on at least x86-64 because forbid_dac
used by arch/x86/kernel/pci-dma.c is defined off in
drivers/pci/quirks.c, which isn't built if CONFIG_PCI isn't set.

-- 
Mathematics is the supreme nostalgia of our time.

--

From: Matt Mackall
Date: Friday, October 24, 2008 - 10:54 am

(Also fails on x86-32)

Bisection points to: 

5b6985ce8ec7127b4d60ad450b64ca8b82748a3b
intel-iommu: IA64 support

-- 
Mathematics is the supreme nostalgia of our time.

--

From: Randy Dunlap
Date: Friday, October 24, 2008 - 10:57 am

Patch for this has been posted.  I don't have it handy ATM.

~Randy
--

From: Fenghua Yu
Date: Friday, October 24, 2008 - 11:05 am

Yes, the fix was posted yesterday, and it has already been merged into the
linux-next tree. In case you need it, I post it here again.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

---

 arch/ia64/include/asm/iommu.h |    1 -
 arch/ia64/kernel/pci-dma.c    |    7 -------
 arch/x86/include/asm/iommu.h  |    1 -
 arch/x86/kernel/pci-dma.c     |   16 ++++++++++++++++
 drivers/pci/pci.h             |    0 
 drivers/pci/quirks.c          |   14 --------------
 6 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h
index 5fb2bb9..0490794 100644
--- a/arch/ia64/include/asm/iommu.h
+++ b/arch/ia64/include/asm/iommu.h
@@ -11,6 +11,5 @@ extern int force_iommu, no_iommu;
 extern int iommu_detected;
 extern void iommu_dma_init(void);
 extern void machvec_init(const char *name);
-extern int forbid_dac;
 
 #endif
diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 10a75b5..031abbf 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -89,13 +89,6 @@ int iommu_dma_supported(struct device *dev, u64 mask)
 {
 	struct dma_mapping_ops *ops = get_dma_ops(dev);
 
-#ifdef CONFIG_PCI
-	if (mask > 0xffffffff && forbid_dac > 0) {
-		dev_info(dev, "Disallowing DAC for device\n");
-		return 0;
-	}
-#endif
-
 	if (ops->dma_supported_op)
 		return ops->dma_supported_op(dev, mask);
 
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 1972266..1926248 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -9,6 +9,8 @@
 #include <asm/calgary.h>
 #include <asm/amd_iommu.h>
 
+static int forbid_dac __read_mostly;
+
 struct dma_mapping_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
@@ -291,3 +293,17 @@ void pci_iommu_shutdown(void)
 }
 /* Must execute after PCI subsystem */
 fs_initcall(pci_iommu_init);
+
+#ifdef CONFIG_PCI
+/* Many VIA bridges seem to corrupt data for DAC. Disable it here */
+
+static __devinit void ...
--

From: Matt Mackall
Date: Friday, October 24, 2008 - 11:11 am

Thanks, works for me.

-- 
Mathematics is the supreme nostalgia of our time.

--

From: Domenico Andreoli
Date: Friday, October 24, 2008 - 3:28 pm

Hi,


I have a couple of back traces on my parisc. Config file follows.

Let me know if you need a bisection.

thanks,
Domenico


nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Backtrace:
 [<00feca70>] do_ipt_get_ctl+0x358/0x498 [ip_tables]
 [<1031fbcc>] nf_sockopt+0x1cc/0x204
 [<1031fc24>] nf_getsockopt+0x20/0x2c
 [<1032cadc>] ip_getsockopt+0xc0/0x100
 [<102f9890>] sock_common_getsockopt+0x28/0x34
 [<102f73c0>] sys_getsockopt+0x7c/0x104
 [<101190c0>] syscall_exit+0x0/0x14


Kernel Fault: Code=15 regs=ee7c0400 (Addr=01307008)

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000001101111111100001111 Tainted: G        W 
r00-03  0006ff0f 00fec000 00feca70 ee6a6180
r04-07  00000001 00021140 01306000 1044c820
r08-11  ef1a2dc0 000008b0 00000001 ed00d080
r12-15  01306000 fb61abc0 0002078c fb61b450
r16-19  000201c8 00000054 000201c8 00000000
r20-23  00000000 ed786000 00000000 ee6a61b4
r24-27  00000000 01306000 ee6a6180 1041a020
r28-31  00000000 00000000 ee7c0400 01307000
sr00-03  00000000 00000000 00000000 000006b1
sr04-07  00000000 00000000 00000000 00000000

IASQ: 00000000 00000000 IAOQ: 00fec640 00fec644
 IIR: 0ffc1290    ISR: 00000000  IOR: 01307008
 CPU:        1   CR30: ee7c0000 CR31: 11111111
 ORIG_R28: 00000001
 IAOQ[0]: get_counters+0x54/0x12c [ip_tables]
 IAOQ[1]: get_counters+0x58/0x12c [ip_tables]
 RP(r2): do_ipt_get_ctl+0x358/0x498 [ip_tables]
Backtrace:
 [<00feca70>] do_ipt_get_ctl+0x358/0x498 [ip_tables]
 [<1031fbcc>] nf_sockopt+0x1cc/0x204
 [<1031fc24>] nf_getsockopt+0x20/0x2c
 [<1032cadc>] ip_getsockopt+0xc0/0x100
 [<102f9890>] sock_common_getsockopt+0x28/0x34
 [<102f73c0>] sys_getsockopt+0x7c/0x104
 [<101190c0>] syscall_exit+0x0/0x14

Kernel panic - not syncing: Kernel Fault
Rebooting in 60 seconds..


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.28-rc1
# Fri Oct 24 14:14:51 2008
#
CONFIG_PARISC=y
CONFIG_MMU=y
CONFIG_STACK_GROWSUP=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# ...
--

From: Tony Vroon
Date: Friday, October 24, 2008 - 3:53 pm

I'm afraid it fails to boot here entirely.
Gets as far as:
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
hpet0: 3 32-bit timers, 25000000 Hz

And it sits there, instead of showing me it switched into
high-resolution mode as expected:
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 4
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 3
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 5
Switched to high resolution mode on CPU 7
Switched to high resolution mode on CPU 6

I'll go and bisect now if it doesn't ring any bells. It's a dual
quad-core Opteron 2354 on a Tyan n6650W (S2915-E), BIOS 2.07
I've attached .config; personally I'm suspecting commit
1f6d6e8ebe73ba9d9d4c693f7f6f50f661dbd6e4 as I booted a tree from
Wednesday without issue.

No accusation as I can't back that up yet. Any patches that you want me to
try and revert first?

Regards,
Tony V.

--

From: Arjan van de Ven
Date: Friday, October 24, 2008 - 4:01 pm

On Fri, 24 Oct 2008 23:53:28 +0100

I suspect these are totally innocent; the reason I think this is that
select/poll only get used once you hit userspace... and you're hanging
way before that.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Tony Vroon
Date: Sunday, October 26, 2008 - 6:17 am

Entirely correct. It seems commit
4403b406d4369a275d483ece6ddee0088cc0d592 by Linus just fixed it for me.
My boot hang is gone.

Linux prometheus 2.6.28-rc1-00005-g23cf24c #1 SMP Sun Oct 26 13:12:55
GMT 2008 x86_64 Quad-Core AMD Opteron(tm) Processor 2354 AuthenticAMD
GNU/Linux

Regards,
Tony V.
--

From: Mel Gorman
Date: Thursday, October 30, 2008 - 7:26 am

I first encountered this problem in SLES 11 Beta 2 but now I see it
affects 2.6.28-rc1 too.

On some ppc64 machines, NVRAM is being corrupted very early in boot (before
console is initialised). The machine reboots and then fails to find yaboot
printing the error "PReP-BOOT: Unable to load PRep image".  It's nowhere near
as serious as the ftrace+e1000 problem as the machine is not bricked but it's
fairly scary looking, the machine cannot boot and the fix is non-obvious. To
"fix" the machine:

1. Go to OpenFirmware prompt
2. type dev nvram
3. type wipe-nvram

The machine will reboot, reconstruct the NVRAM using some magic and yaboot
will work again, allowing an older kernel to be used. I bisected the problem
down to this commit.

From 91a00302959545a9ae423e99732b1e46eb19e877 Mon Sep 17 00:00:00 2001
From: Paul Mackerras <paulus@samba.org>
Date: Wed, 8 Oct 2008 14:03:29 +0000
Subject: [PATCH] powerpc: Sync RPA note in zImage with kernel's RPA note

Commit 9b09c6d909dfd8de96b99b9b9c808b94b0a71614 ("powerpc: Change the
default link address for pSeries zImage kernels") changed the
real-base value in the CHRP note added by the addnote program from
12MB to 32MB to give more space for Open Firmware to load the zImage.
(The real-base value says where we want OF to position itself in
memory.)  However, this change was ineffective on most pSeries
machines, because the RPA note added by addnote has the "ignore me"
flag set to 1.  This was intended to tell OF to ignore just the RPA
note, but has the side effect of also making OF ignore the CHRP note
(at least on most pSeries machines).

To solve this we have to set the "ignore me" flag to 0 in the RPA
note.  (We can't just omit the RPA note because that is equivalent to
having an RPA note with default values, and the default values are not
what we want.)  However, then we have to make sure the values in the
zImage's RPA note match up with the values that the kernel supplies
later in prom_init.c with either the ...
--

From: Paul Mackerras
Date: Thursday, October 30, 2008 - 1:52 pm

Eek!

Which ppc64 machines has this been seen on, and how were they being
booted (netboot, yaboot, etc.)?

Is it just the Powerstations with their SLOF-based firmware, or is it
IBM pSeries machines as well?

Paul.
--

From: Josh Boyer
Date: Thursday, October 30, 2008 - 2:05 pm

I'm pretty sure it was with pSeries machines.  I saw reports of POWER5
being affected (p520 and p710).  I believe one of them resolved the
issue by upgrading firmware on the machine.

josh
--

From: Dave Kleikamp
Date: Thursday, October 30, 2008 - 2:35 pm

This is true of a p720 (CHRP IBM,9124-720) that I was testing on.  With
upgraded firmware, the problem is gone.

-- 
David Kleikamp
IBM Linux Technology Center

--

From: Mel Gorman
Date: Friday, October 31, 2008 - 3:36 am

Yaboot in my case and I've heard it affected a DVD installation. I don't
know for sure if it affects netboot but as I think it's something the

To be honest, I haven't been brave enough to try this on a Powerstation yet
as I only have the one and I don't know if it's a) affected or b) fixable
with the same workaround. It was an IBM pSeries that was affected in my case
and a few people have hit the problem on pSeries AFAIK.

It's been pointed out that it can be "fixed" by upgrading the firmware but
surely we can avoid breaking the machine in the first place?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Paul Mackerras
Date: Friday, October 31, 2008 - 4:10 am

What changed in that commit was the contents of a couple of structures
that the firmware looks at to see what the kernel wants from
firmware.  Specifically the change was to say that the kernel (or
really the zImage wrapper) would like the firmware to be based at the
32MB point (which is what AIX uses) rather than 12MB (which was the
default on older machines).

So, as I understand it, it's not anything the kernel is actively
doing, it's how the firmware is reacting to what the kernel says it
wants.  And since we are requesting the same value as AIX (as far as I
know) I'm really surprised it caused problems.

We can revert that commit, but I still need to solve the problem that
the distros are facing, namely that their installer kernel + initramfs
images are now bigger than 12MB and can't be loaded if the firmware is
based at 12MB.  That's why I really want to understand the problem in

Have you upgraded the firmware on the machine you saw this problem on?
If not, would you be willing to run some tests for me?

Paul.
--

From: Mel Gorman
Date: Friday, October 31, 2008 - 4:31 am

Same here, it sounds like an innocent change. While it is possible that AIX


Of course. 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Mel Gorman
Date: Friday, October 31, 2008 - 11:36 am

As per an off-line suggestion, I was able to get past the NVRAM problem
using the following patch. The machine still fails to fully boot but it's
due to some modules problem and unrelated to this issue.

From 7e54016ce29eb80026d7ff9a8310cf9c3a7e17a9 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@csn.ul.ie>
Date: Fri, 31 Oct 2008 17:12:46 +0000
Subject: [PATCH] Partial revert of 91a00302, set new_mem_def back to 0

On the suggestion of Paul Mackerras, I tried the following patch. It partially
reverts a change made by commit 91a00302 by setting new_mem_def back to 0.
Once applied, IBM pSeries with old firmware do not corrupt their NVRAM early
in boot.

I do not know why this change fixes the problem. A structure like this is
also in arch/powerpc/boot/addnote.c but it's not clear if it needs to be
similarly changed or not. Paul?

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 arch/powerpc/kernel/prom_init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 23e0db2..d6c8128 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -719,7 +719,7 @@ static struct fake_elf {
 			.max_pft_size = 46,	/* 2^46 bytes max PFT size */
 			.splpar = 1,
 			.min_load = ~0U,
-			.new_mem_def = 1
+			.new_mem_def = 0
 		}
 	}
 };

--

From: Paul Mackerras
Date: Friday, October 31, 2008 - 4:18 am

I do need to know whether it was the vmlinux or the zImage.pseries
that you were loading with yaboot.  That commit you identified affects
the contents of an ELF note in the zImage.pseries that firmware looks
at, as well as a structure in the kernel itself that gets passed as an
argument to a call to firmware.  If you were loading a vmlinux with
yaboot when you saw the corruption occur then that narrows things down
a bit.

Paul.
--

From: Mel Gorman
Date: Friday, October 31, 2008 - 4:32 am

It's the vmlinux file I am seeing problems with.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Benjamin Herrenschmidt
Date: Friday, October 31, 2008 - 4:31 am

Unless I missed something, I think it's narrowed already. When loaded from
yaboot, there is no relevant difference between zImage and vmlinux here.
I.e. yaboot parses the ELF header of the zImage itself and ignores the
special notes anyway, so only the CAS firmware call is relevant in both
cases, no?

Cheers,
Ben.


--

From: Paul Mackerras
Date: Friday, October 31, 2008 - 4:56 am

Good point.  However, it would be the parse-elf-header firmware call,
rather than the CAS firmware call, since 91a00302 modified the
fake_elf structure (to make it consistent with the CAS structure) but
not the CAS structure.

Paul.
--
