Re: [patch 4/5] x86: use BOOTMEM_EXCLUSIVE on 32-bit

From: Greg KH
Date: Sunday, June 22, 2008 - 12:01 pm

This is the start of the stable review cycle for the 2.6.25.9 release.
There are 5 patches in this series, all will be posted as a response to
this one.  If anyone has any issues with these being applied, please let
us know.  If anyone is a maintainer of the proper subsystem, and wants
to add a Signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the Cc:
line.  If you wish to be a reviewer, please email stable@kernel.org to
add your name to the list.  If you want to be off the reviewer list,
also email us.

Responses should be made by Tuesday, June 24, 18:00:00 UTC.  Anything
received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.25.9-rc1.gz
and the diffstat can be found below.


thanks,

the -stable release team

 Makefile                   |    2 +-
 arch/powerpc/kernel/vdso.c |    2 +-
 arch/x86/kernel/setup_32.c |   10 ++++++++--
 drivers/net/atl1/atl1_hw.c |    1 -
 include/asm-x86/page_32.h  |    3 ++-
 mm/memory.c                |   17 +++++++++++++----
 mm/migrate.c               |   10 ++++++++++
 net/sctp/socket.c          |    4 +++-
 8 files changed, 38 insertions(+), 11 deletions(-)
--

From: Greg KH
Date: Sunday, June 22, 2008 - 12:01 pm

2.6.25-stable review patch.  If anyone has any objections, please let us
know.

------------------
From: Bernhard Walle <bwalle@suse.de>

commit d3942cff620bea073fc4e3c8ed878eb1e84615ce upstream

This patch uses BOOTMEM_EXCLUSIVE for the crashkernel reservation on i386
as well, and prints an error message on failure.

The patch is still for 2.6.26 since it is only bug fixing. The unification
of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/x86/kernel/setup_32.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -483,10 +483,16 @@ static void __init reserve_crashkernel(v
 					(unsigned long)(crash_size >> 20),
 					(unsigned long)(crash_base >> 20),
 					(unsigned long)(total_mem >> 20));
+
+			if (reserve_bootmem(crash_base, crash_size,
+					BOOTMEM_EXCLUSIVE) < 0) {
+				printk(KERN_INFO "crashkernel reservation "
+					"failed - memory is in use\n");
+				return;
+			}
+
 			crashk_res.start = crash_base;
 			crashk_res.end   = crash_base + crash_size - 1;
-			reserve_bootmem(crash_base, crash_size,
-					BOOTMEM_DEFAULT);
 		} else
 			printk(KERN_INFO "crashkernel reservation failed - "
 					"you have to specify a base address\n");

-- 
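
For readers following along, here is a rough userspace sketch of what an
exclusive reservation means. It is not the kernel's bootmem code:
toy_reserve(), the RESERVE_* flags and the -EBUSY return value are all
assumptions made for the illustration. The patch itself only checks that
reserve_bootmem() returns a value below zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 64              /* size of the toy memory map, in pages */
#define RESERVE_DEFAULT   0    /* hypothetical: overlapping reservations are tolerated */
#define RESERVE_EXCLUSIVE 1    /* hypothetical: fail if any page is already taken */

static bool reserved[NPAGES];

/* Toy stand-in for reserve_bootmem(); returns 0 on success, -EBUSY on conflict. */
static int toy_reserve(size_t start, size_t count, int flags)
{
	size_t i;

	assert(start <= NPAGES && count <= NPAGES - start);

	if (flags == RESERVE_EXCLUSIVE) {
		for (i = start; i < start + count; i++)
			if (reserved[i])
				return -EBUSY;	/* nothing is modified on failure */
	}
	for (i = start; i < start + count; i++)
		reserved[i] = true;
	return 0;
}
```

With the default flag an overlapping request succeeds silently, which is
how a crashkernel region could end up on top of memory that is already in
use. With the exclusive flag the caller gets an error and can bail out
before filling in crashk_res.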
--

From: Johannes Weiner
Date: Sunday, June 22, 2008 - 1:22 pm

Hi,


You will also need the patch from http://lkml.org/lkml/2008/6/21/103 to
make sure reserve_bootmem() is not void (*)().

	Hannes
--

From: Greg KH
Date: Sunday, June 22, 2008 - 1:30 pm

Ok, let me know when that goes into Linus's tree please.

thanks,

greg k-h
--

From: Adrian Bunk
Date: Sunday, June 22, 2008 - 1:36 pm

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Linus Torvalds
Date: Sunday, June 22, 2008 - 1:36 pm

It already is: 71c2742f5e6348d76ee62085cf0a13e5eff0f00e.

		Linus
--

From: Ingo Molnar
Date: Monday, June 23, 2008 - 1:09 am

thanks. This patch (which was not a build fix but an infrastructure fix 
that the kexec fix in arch/x86 depended on) is well-tested as well, it 
was queued in -tip on June 10th:

| commit 91d48fc80f22817332170082e10de60a75851640
| Author: Bernhard Walle <bwalle@suse.de>
| Date:   Sun Jun 8 15:46:29 2008 +0200
| CommitDate: Tue Jun 10 14:41:56 2008 +0200
|
|    bootmem: add return value to reserve_bootmem_node()
|
|    This patch changes the function reserve_bootmem_node() from void to
|    int, returning -ENOMEM if the allocation fails.
|
|    Signed-off-by: Bernhard Walle <bwalle@suse.de>
|    Signed-off-by: Ingo Molnar <mingo@elte.hu>

so it is a -stable candidate just as much as the kexec fix. (These are 
all fixes for long-standing problems so i guess it can go all the way 
back to all stable kernels that are being maintained.)

	Ingo
--

From: Bernhard Walle
Date: Monday, June 23, 2008 - 3:33 am

Ingo, 

shouldn't we add the reserve_bootmem_generic() fix [1] to 2.6.26-* at
least?


Bernhard

[1] 62b5ebe062c2801f6d40480ae3b91a64c8c8e6cb
-- 
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development
--

From: Ingo Molnar
Date: Monday, June 23, 2008 - 3:53 am

but note that this too has dependencies, it relies on:

 # tip/x86/numa: ddeb8ef: x86: add flags parameter to reserve_bootmem_generic()
 # tip/x86/numa: 62b5ebe: x86: use reserve_bootmem_generic() to reserve crashkernel memory on x86_64

so i've initially delayed the whole topic to v2.6.27.

I've attached both patches below - are they really urgent enough to be 
propagated to tip/x86/urgent and be sent to Linus? AFAICS these are 
ancient issues with kernel crashdumping.

	Ingo

---------------------->
commit ddeb8ef812cbe41739ea3d836681005e9646f922
Author: Bernhard Walle <bwalle@suse.de>
Date:   Sun Jun 8 15:46:30 2008 +0200

    x86: add flags parameter to reserve_bootmem_generic()
    
    This patch adds a 'flags' parameter to reserve_bootmem_generic(), as was
    already done for reserve_bootmem() in commit
    72a7fe3967dbf86cb34e24fbf1d957fe24d2f246.
    
    It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
    change the behaviour. Since the change is x86-specific, I don't think it's
    necessary to add a new API for migration. There are only 4 users of that
    function.
    
    The change is necessary for the next patch, using reserve_bootmem_generic()
    for crashkernel reservation.
    
    Signed-off-by: Bernhard Walle <bwalle@suse.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index 404683b..4901ae3 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -729,10 +729,11 @@ static int __init smp_scan_config(unsigned long base, unsigned long length,
 			if (!reserve)
 				return 1;
 
-			reserve_bootmem_generic(virt_to_phys(mpf), PAGE_SIZE);
+			reserve_bootmem_generic(virt_to_phys(mpf), PAGE_SIZE,
+				BOOTMEM_DEFAULT);
 			if (mpf->mpf_physptr)
 				reserve_bootmem_generic(mpf->mpf_physptr,
-							PAGE_SIZE);
+					PAGE_SIZE, BOOTMEM_DEFAULT);
 #endif
 		return 1;
 		}
diff --git a/arch/x86/mm/init_64.c ...
--

From: Bernhard Walle
Date: Monday, June 23, 2008 - 6:21 am

Ok, you have more experience which patches should go into 2.6.26 at

I only brought up that topic again because it's a regression between
2.6.22 and 2.6.23 caused by 5c3391f9f749023a49c64d607da4fb49263690eb.



Bernhard
-- 
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development
--

From: Greg KH
Date: Monday, June 23, 2008 - 12:20 pm

Hm, but it's not in Linus's tree yet, so I can't take it for stable at
this time :(

thanks,

greg k-h
--

From: Ingo Molnar
Date: Monday, June 23, 2008 - 12:36 pm

it's all fine already: it's the very same patch you just added, but 
different sha1. I just pointed out the lineage and the testing status of 
the patch.

	Ingo
--

From: Greg KH
Date: Sunday, June 22, 2008 - 12:01 pm

2.6.25-stable review patch.  If anyone has any objections, please let us
know.

------------------
From: David S. Miller <davem@davemloft.net>

commit 735ce972fbc8a65fb17788debd7bbe7b4383cc62 upstream

As noticed by Gabriel Campana, the kmalloc() length arg
passed in by sctp_getsockopt_local_addrs_old() can overflow
if ->addr_num is large enough.

Therefore, enforce an appropriate limit.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 net/sctp/socket.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4421,7 +4421,9 @@ static int sctp_getsockopt_local_addrs_o
 	if (copy_from_user(&getaddrs, optval, len))
 		return -EFAULT;
 
-	if (getaddrs.addr_num <= 0) return -EINVAL;
+	if (getaddrs.addr_num <= 0 ||
+	    getaddrs.addr_num >= (INT_MAX / sizeof(union sctp_addr)))
+		return -EINVAL;
 	/*
 	 *  For UDP-style sockets, id specifies the association to query.
 	 *  If the id field is set to the value '0' then the locally bound

-- 
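
A small userspace sketch of the bounds check may help when reviewing the
hunk. union toy_addr is a stand-in whose size is an assumption (the real
union sctp_addr depends on the kernel build), and the toy_* helpers are
hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for union sctp_addr; the real size depends on the kernel build. */
union toy_addr {
	unsigned char v4[16];
	unsigned char v6[28];
};

/*
 * Mirrors the bounds check in the patch: reject counts that are not
 * positive or whose byte size would not fit in an int.
 */
static int toy_check_addr_num(int addr_num)
{
	if (addr_num <= 0 ||
	    (size_t)addr_num >= (INT_MAX / sizeof(union toy_addr)))
		return -EINVAL;
	return 0;
}

/* Allocation length for a count that has already passed the check. */
static size_t toy_alloc_len(int addr_num)
{
	assert(toy_check_addr_num(addr_num) == 0);
	return (size_t)addr_num * sizeof(union toy_addr);
}
```

Any count accepted by toy_check_addr_num() yields a length below INT_MAX,
so the multiplication can no longer wrap when the result is stored in an
int.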
--

From: David Miller
Date: Sunday, June 22, 2008 - 12:23 pm

From: Greg KH <gregkh@suse.de>

Unfortunately, Vlad found another case in SCTP which has
an overflow bug similar to this one.  I'll work on a
fix for that today and submit.
--

From: Greg KH
Date: Sunday, June 22, 2008 - 1:28 pm

Thanks for letting me know, I'll wait for that one as well before doing
this release.

greg k-h
--

From: David Miller
Date: Monday, June 23, 2008 - 2:36 pm

From: Greg KH <gregkh@suse.de>

This one turned out to be a false alarm, and Vlad confirmed my
analysis today.  So there is no other SCTP patch you need to
wait for.

Thanks!
--

From: Greg KH
Date: Monday, June 23, 2008 - 2:43 pm

Great, thanks for letting me know.

greg k-h
--

From: Greg KH
Date: Sunday, June 22, 2008 - 12:01 pm

2.6.25-stable review patch.  If anyone has any objections, please let us
know.

------------------
From: Linus Torvalds <torvalds@linux-foundation.org>

commit 89f5b7da2a6bad2e84670422ab8192382a5aeb9f upstream

KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit
557ed1fa2620dc119adb86b34c614e152a629a80 ("remove ZERO_PAGE") removed
the ZERO_PAGE from the VM mappings, any users of get_user_pages() will
generally now populate the VM with real empty pages needlessly.

We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but
since fault handling no longer uses ZERO_PAGE for new anonymous pages,
we now need to handle that special case in follow_page() instead.

In particular, the removal of ZERO_PAGE effectively removed the core
file writing optimization where we would skip writing pages that had not
been populated at all, and increased memory pressure a lot by allocating
all those useless newly zeroed pages.

This reinstates the optimization by making the unmapped PTE case the
same as for a non-existent page table, which already did this correctly.

While at it, this also fixes the XIP case for follow_page(), where the
caller could not differentiate between the case of a page that simply
could not be used (because it had no "struct page" associated with it)
and a page that just wasn't mapped.

We do that by simply returning an error pointer for pages that could not
be turned into a "struct page *".  The error is arbitrarily picked to be
EFAULT, since that was what get_user_pages() already used for the
equivalent IO-mapped page case.

[ Also removed an impossible test for pte_offset_map_lock() failing:
  that's not how that function works ]

Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: ...
--

From: Linus Torvalds
Date: Sunday, June 22, 2008 - 12:22 pm

Let's wait for the vmware breakage report to sort out first.

	http://lkml.org/lkml/2008/6/22/10

before moving it to -stable.

		Linus
--

From: Greg KH
Date: Sunday, June 22, 2008 - 1:29 pm

Sure, thanks for pointing that out to me, I'll track it as well.

greg k-h
--

From: Jeff Chua
Date: Monday, June 23, 2008 - 8:32 am

I can confirm that the 2nd patch from Linus fixed the problem.

               http://lkml.org/lkml/2008/6/22/107

Sorry it took so long. Traveling.

Thanks,
Jeff.
--

From: Hugh Dickins
Date: Monday, June 23, 2008 - 9:04 am

Long?!  That was very quick, thanks for reporting back.

But I'm afraid you've pushed me into taking another look at that
patch, and I see a problem with it.  To be honest, I've lost the
plot on this issue, and didn't really get what your problem is,
nor how Linus expected to be fixing it.

The problem is that "insane" VM_LOCKED test which he has removed.
I've remembered now what that's about: it's for make_pages_present.
We do want mlocking a readonly area to make its pages present, even
if they're not at this moment writable: we don't want the ZERO_PAGE
substitution in that case.

So I think Linus needs to factor that into the final patch,
whilst at the same time solving whatever is the vmware breakage.

Hugh
--

From: Linus Torvalds
Date: Monday, June 23, 2008 - 9:39 am

The problem is that the old code said:

 - we can use FOLL_ANON, assuming that the vma has no vm_ops, or has no 
   "fault" callback.

That was fundamentally broken. Because you can have a "nopfn" callback. 
But it's hard to notice, since the whole FOLL_ANON code only _used_ to 
trigger if a whole page table was missing.

The VM_LOCKED test was just crazy, but I doubt it was the cause of the 

That's still crazy. make_pages_present() already does:

	write = (vma->vm_flags & VM_WRITE) != 0;

and passes that in to "get_user_pages()". So for a writable mapping, we'll 
elide the FOLL_ANON case anyway, and for a read-only mapping we should 
have used ZERO_PAGE. Damn. Oh, well.

We can certainly re-instate the insane behaviour for mlock(). Not that we 

So here's a third patch to test. It removes the VM_SHARED thing just to 
get us closer to the original code (and because do_no_page() didn't do it 
historically, so let's not do it either), and it re-instates the insane 
VM_LOCKED test with a comment.

Jeff, does this still work with vmware?

		Linus

---
 mm/memory.c |   20 ++++++++++++++++++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 9aefaae..a2ce28d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1045,6 +1045,23 @@ no_page_table:
 	return page;
 }
 
+/* Can we do the FOLL_ANON optimization? */
+static inline int use_zero_page(struct vm_area_struct *vma)
+{
+	/*
+	 * We don't want to optimize FOLL_ANON for make_pages_present()
+	 * when it tries to page in a VM_LOCKED region.
+	 */
+	if (vma->vm_flags & VM_LOCKED)
+		return 0;
+	/*
+	 * And if we have a fault or a nopfn routine, it's not an
+	 * anonymous region.
+	 */
+	return !vma->vm_ops ||
+		(!vma->vm_ops->fault && !vma->vm_ops->nopfn);
+}
+
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, int len, int write, int force,
 		struct page **pages, struct vm_area_struct **vmas)
@@ -1119,8 +1136,7 @@ ...
--

From: Jeff Chua
Date: Monday, June 23, 2008 - 10:05 am

On Tue, Jun 24, 2008 at 12:39 AM, Linus Torvalds


No, this breaks vmware. Does this trace help?

Jun 24 00:54:49.325: vmx| NOT_IMPLEMENTED
/build/mts/release/bora-93057/bora/vmx/main/vmmonPosix.c:774
Jun 24 00:54:49.325: vmx| Backtrace:
Jun 24 00:54:49.325: vmx| Backtrace[0] 0xbffc30c8 eip 0x8052f10
Jun 24 00:54:49.325: vmx| Backtrace[1] 0xbffc34f8 eip 0x80f2f7d
Jun 24 00:54:49.325: vmx| Backtrace[2] 0xbffc3548 eip 0x80e4b15
Jun 24 00:54:49.325: vmx| Backtrace[3] 0xbffc3638 eip 0x837b341
Jun 24 00:54:49.325: vmx| Backtrace[4] 0xbffc3688 eip 0x837cde4
Jun 24 00:54:49.325: vmx| Backtrace[5] 0xbffc36b8 eip 0x80fda89
Jun 24 00:54:49.325: vmx| Backtrace[6] 0xbffc36e8 eip 0x80f36f5
Jun 24 00:54:49.325: vmx| Backtrace[7] 0xbffc3728 eip 0x80f3bd4
Jun 24 00:54:49.325: vmx| Backtrace[8] 0xbffc3788 eip 0x80511be
Jun 24 00:54:49.325: vmx| Backtrace[9] 0xbffc3878 eip 0x8051561
Jun 24 00:54:49.325: vmx| Backtrace[10] 0xbffc38e8 eip 0xb7e374c0
Jun 24 00:54:49.325: vmx| Backtrace[11] 00000000 eip 0x804e7b1
Jun 24 00:54:49.325: vmx| Core dump limit is 0 kb.
Jun 24 00:54:49.326: vmx| Cannot remap region MonWired (addr=(nil),
size=0x13000, offset=0x19000)
Jun 24 00:54:49.326: vmx| Cannot remap region PShareMPN (addr=(nil),
size=0x1000, offset=0x18000)
Jun 24 00:54:49.326: vmx| Remapping region BusMemFrame1 as MAP_PRIVATE
(addr=0xb7f9c000, size=0x1000, offset=0x17000)
Jun 24 00:54:49.326: vmx| Remapping region BusMemFrame0 as MAP_PRIVATE
(addr=0xb7f9d000, size=0x1000, offset=0x16000)
Jun 24 00:54:49.326: vmx| Cannot remap region PhysRegion0 (addr=(nil),
size=0x1000, offset=0x15000)
Jun 24 00:54:49.326: vmx| Msg_Post: Error
Jun 24 00:54:49.326: vmx| [msg.log.error.unrecoverable] VMware
Workstation unrecoverable error: (vmx)
Jun 24 00:54:49.326: vmx| NOT_IMPLEMENTED
/build/mts/release/bora-93057/bora/vmx/main/vmmonPosix.c:774


Thanks,
--

From: Linus Torvalds
Date: Monday, June 23, 2008 - 10:27 am

Not really. I have no idea what vmware does, so any traces from vmware are 
pretty useless.

On the other hand, if you add a trace to the "use_zero_page()" function to 
print out the vm_flags and other details, that probably would help.

That said, since the previous patch _did_ work, I bet that one that does 
both VM_LOCKED and VM_SHARED works too. There was a reason I wanted to do 
that VM_SHARED test. I think the VM_SHARED test is sane, unlike the 
VM_LOCKED test (that is a fairly dubious hack for mlock).

So here's the final version. I bet it works.

		Linus
---
 mm/memory.c |   23 +++++++++++++++++++++--
 1 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 9aefaae..423e0e7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1045,6 +1045,26 @@ no_page_table:
 	return page;
 }
 
+/* Can we do the FOLL_ANON optimization? */
+static inline int use_zero_page(struct vm_area_struct *vma)
+{
+	/*
+	 * We don't want to optimize FOLL_ANON for make_pages_present()
+	 * when it tries to page in a VM_LOCKED region. As to VM_SHARED,
+	 * we want to get the page from the page tables to make sure
+	 * that we serialize and update with any other user of that
+	 * mapping.
+	 */
+	if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
+		return 0;
+	/*
+	 * And if we have a fault or a nopfn routine, it's not an
+	 * anonymous region.
+	 */
+	return !vma->vm_ops ||
+		(!vma->vm_ops->fault && !vma->vm_ops->nopfn);
+}
+
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, int len, int write, int force,
 		struct page **pages, struct vm_area_struct **vmas)
@@ -1119,8 +1139,7 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		foll_flags = FOLL_TOUCH;
 		if (pages)
 			foll_flags |= FOLL_GET;
-		if (!write && !(vma->vm_flags & VM_LOCKED) &&
-		    (!vma->vm_ops || !vma->vm_ops->fault))
+		if (!write && use_zero_page(vma))
 			foll_flags |= FOLL_ANON;
 
 		do {
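
To compare the old and new conditions side by side, here is a userspace
model of the decision. The toy_* types and flag values are assumptions for
illustration only; they do not match the kernel's struct vm_area_struct or
its VM_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's real VM_* constants differ. */
#define TOY_VM_SHARED 0x1u
#define TOY_VM_LOCKED 0x2u

struct toy_vm_ops {
	bool has_fault;   /* models vm_ops->fault != NULL */
	bool has_nopfn;   /* models vm_ops->nopfn != NULL */
};

struct toy_vma {
	unsigned int flags;
	const struct toy_vm_ops *ops;   /* NULL for a plain anonymous mapping */
};

/* Same decision as use_zero_page() in the final patch. */
static bool toy_use_zero_page(const struct toy_vma *vma)
{
	assert(vma != NULL);
	if (vma->flags & (TOY_VM_LOCKED | TOY_VM_SHARED))
		return false;
	return !vma->ops || (!vma->ops->has_fault && !vma->ops->has_nopfn);
}

/* The old test from get_user_pages(), which ignored nopfn and VM_SHARED. */
static bool toy_old_test(const struct toy_vma *vma)
{
	assert(vma != NULL);
	return !(vma->flags & TOY_VM_LOCKED) &&
	       (!vma->ops || !vma->ops->has_fault);
}
```

A mapping that has only a nopfn handler passes the old test but is
rejected by the new one, which is the case described earlier in the
thread. A shared mapping without vm_ops is likewise accepted by the old
test and rejected by the new one.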
--

From: Jeff Chua
Date: Monday, June 23, 2008 - 11:15 am

On Tue, Jun 24, 2008 at 1:27 AM, Linus Torvalds


Yeh, it works great! Thank you.

--

From: Linus Torvalds
Date: Monday, June 23, 2008 - 11:32 am

No, it's fine. It really was a bug, and a long-standing one, just one that 
was probably practically impossible to hit before (because we used to only 
do the FOLL_ANON logic on missing whole page tables, and just about any 
access to any mapping even nearby the one you care about will fill in the 
page tables - so you would have had to be really unlucky to trigger the 
case before).

The patch clearly fixes an issue, and makes the code more readable and 

Thanks for bisecting, reporting and testing. 

			Linus
--

From: Greg KH
Date: Sunday, June 22, 2008 - 12:01 pm

2.6.25-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Radu Cristescu <advantis@gmx.net>

upstream commit: 58c7821c4264a7ddd6f0c31c5caaf393b3897f10

The atl1 driver tries to determine the MAC address thusly:

	- If an EEPROM exists, read the MAC address from EEPROM and
	  validate it.
	- If an EEPROM doesn't exist, try to read a MAC address from
	  SPI flash.
	- If that fails, try to read a MAC address directly from the
	  MAC Station Address register.
	- If that fails, assign a random MAC address provided by the
	  kernel.

We now have a report of a system fitted with an EEPROM containing all
zeros where we expect the MAC address to be, and we currently handle
this as an error condition.  Turns out, on this system the BIOS writes
a valid MAC address to the NIC's MAC Station Address register, but we
never try to read it because we return an error when we find the all-
zeros address in EEPROM.

This patch relaxes the error check and continues looking for a MAC
address even if it finds an illegal one in EEPROM.

http://ubuntuforums.org/showthread.php?t=562617

[jacliburn@bellsouth.net: backport to 2.6.25.7]

Signed-off-by: Radu Cristescu <advantis@gmx.net>
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/net/atl1/atl1_hw.c |    1 -
 1 file changed, 1 deletion(-)

--- a/drivers/net/atl1/atl1_hw.c
+++ b/drivers/net/atl1/atl1_hw.c
@@ -250,7 +250,6 @@ static int atl1_get_permanent_address(st
 			memcpy(hw->perm_mac_addr, eth_addr, ETH_ALEN);
 			return 0;
 		}
-		return 1;
 	}
 
 	/* see if SPI FLAGS exist ? */

-- 
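
The lookup order after this change can be pictured with a simplified
userspace model. This is only a sketch: the toy_* names are hypothetical,
the validity rule (not all zeros, not multicast) is an assumption, and the
real driver splits the steps across more than one function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Valid here means: not all zeros and not a multicast address. */
static bool toy_mac_valid(const unsigned char *mac)
{
	static const unsigned char zero[TOY_ETH_ALEN];

	assert(mac != NULL);
	return memcmp(mac, zero, TOY_ETH_ALEN) != 0 && !(mac[0] & 0x01);
}

enum toy_source { TOY_EEPROM, TOY_SPI_FLASH, TOY_REGISTER, TOY_RANDOM };

/*
 * Walk the candidate sources in order and pick the first valid address.
 * A NULL candidate models a source that is absent or unreadable.
 */
static enum toy_source toy_pick_mac(const unsigned char *eeprom,
				    const unsigned char *flash,
				    const unsigned char *reg,
				    unsigned char *out)
{
	const unsigned char *cand[] = { eeprom, flash, reg };
	int i;

	for (i = 0; i < 3; i++) {
		if (cand[i] && toy_mac_valid(cand[i])) {
			memcpy(out, cand[i], TOY_ETH_ALEN);
			return (enum toy_source)i;
		}
	}
	return TOY_RANDOM;	/* caller would generate a random address */
}
```

An all-zeros EEPROM entry no longer ends the search here. The walk moves
on to the next source, so an address written to the register by the BIOS
is still found.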
--

From: gregkh
Date: Sunday, June 22, 2008 - 12:01 pm

2.6.25-stable review patch.  If anyone has any objections, please let us
know.

------------------
From: Jeremy Fitzhardinge <jeremy@goop.org>

commit ad524d46f36bbc32033bb72ba42958f12bf49b06 upstream

When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging").  This means, in theory,
we could have up to 52 bits of physical address in a pte.

The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
This means that it can only represent physical addresses up to 32+12=44
bits wide.  Rather than widening pfns everywhere, just set 2^44 as the
Linux x86_32-PAE architectural limit for physical address size.

This is a bugfix for two cases:
1. running a 32-bit PAE kernel on a machine with
  more than 64GB RAM.
2. running a 32-bit PAE Xen guest on a host machine with
  more than 64GB RAM

In both cases, a pte could need to have more than 36 bits of physical,
and masking it to 36-bits will cause fairly severe havoc.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/asm-x86/page_32.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/include/asm-x86/page_32.h
+++ b/include/asm-x86/page_32.h
@@ -14,7 +14,8 @@
 #define __PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
 
 #ifdef CONFIG_X86_PAE
-#define __PHYSICAL_MASK_SHIFT	36
+/* 44=32+12, the limit we can fit into an unsigned long pfn */
+#define __PHYSICAL_MASK_SHIFT	44
 #define __VIRTUAL_MASK_SHIFT	32
 #define PAGETABLE_LEVELS	3
 

-- 
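
The arithmetic in the changelog can be checked with a few lines of
userspace C. The toy_* helpers are hypothetical and only reproduce the
mask calculation; they are not taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* Mask that keeps the low 'shift' bits of a 64-bit physical address. */
static uint64_t toy_phys_mask(unsigned int shift)
{
	assert(shift > 0 && shift < 64);
	return (UINT64_C(1) << shift) - 1;
}

/* Largest physical address width reachable with a pfn of 'pfn_bits' bits. */
static unsigned int toy_max_phys_bits(unsigned int pfn_bits)
{
	return pfn_bits + TOY_PAGE_SHIFT;
}
```

An address at exactly 64GB is 1 << 36. Masked with the old 36-bit mask it
becomes zero, while the 44-bit mask leaves it intact, and 44 is what
toy_max_phys_bits(32) returns for a 32-bit pfn.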
--

From: S.Çağlar
Date: Monday, June 23, 2008 - 4:19 am

Hi Greg and -stable team;


Please consider the following commit for -stable as well; it definitely fixes a boot failure caused by the reported oops.

commit 1f6ef2342972dc7fd623f360f84006e2304eb935
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Jun 20 12:19:28 2008 -0700

    [watchdog] hpwdt: fix use of inline assembly

    The inline assembly in drivers/watchdog/hpwdt.c was incredibly broken,
    and included all the function prologue and epilogue stuff, even though
    it was itself then inside a C function where the compiler would add its
    own prologue and epilogue on top of it all.

    This then just _happened_ to work if you had exactly the right compiler
    version and exactly the right compiler flags, so that gcc just happened
    to not create any prologue at all (the gcc-generated epilogue wouldn't
    matter, since it would never be reached).

    But the more proper way to fix it is to simply not do this.  Move the
    inline asm to the top level, with no surrounding function at all (the
    better alternative would be to remove the prologue and make it actually
    use proper description of the arguments to the inline asm, but that's a
    bigger change than the one I'm willing to make right now).

    Tested-by: S.Çağlar Onur <caglar@pardus.org.tr>
    Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


Cheers
-- 
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
--

From: Greg KH
Date: Monday, June 23, 2008 - 12:30 pm

Thanks, I've added that one now as well.

greg k-h
--
