This is the start of the stable review cycle for the 2.6.25.9 release. There are 5 patches in this series, all of which will be posted as a response to this one. If anyone has any issues with these being applied, please let us know. If anyone is a maintainer of the proper subsystem, and wants to add a Signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the Cc: line. If you wish to be a reviewer, please email stable@kernel.org to add your name to the list. If you want to be off the reviewer list, also email us.

Responses should be made by Tuesday, June 24, 18:00:00 UTC. Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.25.9-rc1.gz
and the diffstat can be found below.

thanks,

the -stable release team

Makefile                   |    2 +-
arch/powerpc/kernel/vdso.c |    2 +-
arch/x86/kernel/setup_32.c |   10 ++++++++--
drivers/net/atl1/atl1_hw.c |    1 -
include/asm-x86/page_32.h  |    3 ++-
mm/memory.c                |   17 +++++++++++++----
mm/migrate.c               |   10 ++++++++++
net/sctp/socket.c          |    4 +++-
8 files changed, 38 insertions(+), 11 deletions(-)
--
2.6.25-stable review patch. If anyone has any objections, please let us
know.
------------------
From: Bernhard Walle <bwalle@suse.de>
commit d3942cff620bea073fc4e3c8ed878eb1e84615ce upstream
This patch also uses BOOTMEM_EXCLUSIVE for the crashkernel reservation on
i386 and prints an error message on failure.
The patch is still intended for 2.6.26 since it is only a bug fix. The unification
of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
arch/x86/kernel/setup_32.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -483,10 +483,16 @@ static void __init reserve_crashkernel(v
(unsigned long)(crash_size >> 20),
(unsigned long)(crash_base >> 20),
(unsigned long)(total_mem >> 20));
+
+ if (reserve_bootmem(crash_base, crash_size,
+ BOOTMEM_EXCLUSIVE) < 0) {
+ printk(KERN_INFO "crashkernel reservation "
+ "failed - memory is in use\n");
+ return;
+ }
+
crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
- reserve_bootmem(crash_base, crash_size,
- BOOTMEM_DEFAULT);
} else
printk(KERN_INFO "crashkernel reservation failed - "
"you have to specify a base address\n");
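A minimal userspace model of why the return value matters here: a reservation made with an exclusive flag refuses to overlap memory that is already reserved, and the caller records the crash kernel range only when the reservation succeeded. The toy page bitmap, the `toy_*` names, the page granularity and the `-EBUSY` error code are assumptions for illustration; only the BOOTMEM_DEFAULT/BOOTMEM_EXCLUSIVE names and the "reserve first, then fill in crashk_res" ordering come from the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical flag values modelled on the names used in the patch. */
#define BOOTMEM_DEFAULT   0
#define BOOTMEM_EXCLUSIVE (1 << 0)

#define NPAGES 64                      /* size of the toy "memory" in pages */
static unsigned char reserved[NPAGES]; /* 1 = page already reserved */

/* Toy model of reserve_bootmem(): reserve pages [start, start + count).
 * With BOOTMEM_EXCLUSIVE, fail with -EBUSY (an assumption of this model)
 * if any page is already taken; with BOOTMEM_DEFAULT, overlap silently. */
static int toy_reserve(unsigned start, unsigned count, int flags)
{
	unsigned i;

	if (start > NPAGES || count > NPAGES - start)
		return -EINVAL;
	if (flags & BOOTMEM_EXCLUSIVE)
		for (i = start; i < start + count; i++)
			if (reserved[i])
				return -EBUSY;
	for (i = start; i < start + count; i++)
		reserved[i] = 1;
	return 0;
}

/* Mirrors the caller pattern from the patch: reserve first, and only
 * record the crash kernel range when the reservation succeeded. */
static int toy_reserve_crashkernel(unsigned base, unsigned size)
{
	int ret = toy_reserve(base, size, BOOTMEM_EXCLUSIVE);

	if (ret < 0) {
		printf("crashkernel reservation failed - memory is in use\n");
		return ret;
	}
	printf("crashkernel reserved at page %u (%u pages)\n", base, size);
	return 0;
}
```

The caller's `< 0` check is only meaningful once reserve_bootmem() actually returns an int, which is why the reserve_bootmem() return-value patch is a prerequisite for this fix.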
--
--
Hi, You will also need the patch from http://lkml.org/lkml/2008/6/21/103 to make sure reserve_bootmem() is not void (*)(). Hannes --
Ok, let me know when that goes into Linus's tree please. thanks, greg k-h --
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
--
It already is: 71c2742f5e6348d76ee62085cf0a13e5eff0f00e. Linus --
thanks. This patch (which was not a build fix but an infrastructure fix that the kexec fix in arch/x86 depended on) is well-tested as well; it was queued in -tip on June 10th:
| commit 91d48fc80f22817332170082e10de60a75851640
| Author: Bernhard Walle <bwalle@suse.de>
| Date: Sun Jun 8 15:46:29 2008 +0200
| CommitDate: Tue Jun 10 14:41:56 2008 +0200
|
| bootmem: add return value to reserve_bootmem_node()
|
| This patch changes the function reserve_bootmem_node() from void to
| int, returning -ENOMEM if the allocation fails.
|
| Signed-off-by: Bernhard Walle <bwalle@suse.de>
| Signed-off-by: Ingo Molnar <mingo@elte.hu>
so it is a -stable candidate just as much as the kexec fix. (These are all fixes for long-standing problems, so I guess they can go all the way back to all stable kernels that are being maintained.)
Ingo
--
Ingo, shouldn't we add the reserve_bootmem_generic() fix [1] to 2.6.26-* at least?
Bernhard
[1] 62b5ebe062c2801f6d40480ae3b91a64c8c8e6cb
--
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development
--
but note that this too has dependencies, it relies on:
# tip/x86/numa: ddeb8ef: x86: add flags parameter to reserve_bootmem_generic()
# tip/x86/numa: 62b5ebe: x86: use reserve_bootmem_generic() to reserve crashkernel memory on x86_64
so I've initially delayed the whole topic to v2.6.27.
I've attached both patches below - are they really urgent enough to be
propagated to tip/x86/urgent and be sent to Linus? AFAICS these are
ancient issues with kernel crashdumping.
Ingo
---------------------->
commit ddeb8ef812cbe41739ea3d836681005e9646f922
Author: Bernhard Walle <bwalle@suse.de>
Date: Sun Jun 8 15:46:30 2008 +0200
x86: add flags parameter to reserve_bootmem_generic()
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967dbf86cb34e24fbf1d957fe24d2f246.
It also changes all users to use BOOTMEM_DEFAULT, which effectively doesn't
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.
The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index 404683b..4901ae3 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -729,10 +729,11 @@ static int __init smp_scan_config(unsigned long base, unsigned long length,
if (!reserve)
return 1;
- reserve_bootmem_generic(virt_to_phys(mpf), PAGE_SIZE);
+ reserve_bootmem_generic(virt_to_phys(mpf), PAGE_SIZE,
+ BOOTMEM_DEFAULT);
if (mpf->mpf_physptr)
reserve_bootmem_generic(mpf->mpf_physptr,
- PAGE_SIZE);
+ PAGE_SIZE, BOOTMEM_DEFAULT);
#endif
return 1;
}
diff --git a/arch/x86/mm/init_64.c ...
Ok, you have more experience which patches should go into 2.6.26 at
I only brought up that topic again because it's a regression between 2.6.22 and 2.6.23 caused by 5c3391f9f749023a49c64d607da4fb49263690eb.
Bernhard
--
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development
--
Hm, but it's not in Linus's tree yet, so I can't take it for stable at this time :( thanks, greg k-h --
it's all fine already: it's the very same patch you just added, but different sha1. I just pointed out the lineage and the testing status of the patch. Ingo --
2.6.25-stable review patch. If anyone has any objections, please let us know.
------------------
From: David S. Miller <davem@davemloft.net>
commit 735ce972fbc8a65fb17788debd7bbe7b4383cc62 upstream
As noticed by Gabriel Campana, the kmalloc() length arg passed in by
sctp_getsockopt_local_addrs_old() can overflow if ->addr_num is large
enough. Therefore, enforce an appropriate limit.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
net/sctp/socket.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4421,7 +4421,9 @@ static int sctp_getsockopt_local_addrs_o
if (copy_from_user(&getaddrs, optval, len))
return -EFAULT;
- if (getaddrs.addr_num <= 0) return -EINVAL;
+ if (getaddrs.addr_num <= 0 ||
+ getaddrs.addr_num >= (INT_MAX / sizeof(union sctp_addr)))
+ return -EINVAL;
/*
* For UDP-style sockets, id specifies the association to query.
* If the id field is set to the value '0' then the locally bound
--
--
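A small sketch of the arithmetic behind this fix, assuming a 32-bit kernel where size_t is 32 bits and assuming sizeof(union sctp_addr) is 28 bytes (the size of struct sockaddr_in6 on Linux). It shows how a positive addr_num that passes the old check can make the allocation length wrap to a tiny value, and how the new bound rejects it. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 28 bytes, i.e. sizeof(struct sockaddr_in6), the largest
 * member of union sctp_addr. Used here as a plain constant. */
#define SCTP_ADDR_SIZE ((size_t)28)

/* Byte count a 32-bit kernel would compute for kmalloc(): the product is
 * evaluated in 32 bits, so it silently wraps modulo 2^32. */
static uint32_t kmalloc_len_32bit(int addr_num)
{
	return (uint32_t)addr_num * (uint32_t)SCTP_ADDR_SIZE;
}

/* The check before the fix: only rejects non-positive counts. */
static int addr_num_ok_old(int addr_num)
{
	return !(addr_num <= 0);
}

/* The check after the fix: also rejects counts whose byte size would
 * reach INT_MAX. addr_num is known to be positive before the cast. */
static int addr_num_ok_new(int addr_num)
{
	return !(addr_num <= 0 ||
		 (size_t)addr_num >= (INT_MAX / SCTP_ADDR_SIZE));
}
```

For example, addr_num = 153391690 passes the old check, but 153391690 * 28 = 4294967320, which wraps to a 24-byte allocation; the new check rejects anything from INT_MAX / 28 = 76695844 upwards.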
From: Greg KH <gregkh@suse.de> Unfortunately, Vlad found another case in SCTP which has an overflow bug similar to this one. I'll work on a fix for that today and submit. --
Thanks for letting me know, I'll wait for that one as well before doing this release. greg k-h --
From: Greg KH <gregkh@suse.de> This one turned out to be a false alarm, and Vlad confirmed my analysis today. So there is no other SCTP patch you need to wait for. Thanks! --
2.6.25-stable review patch. If anyone has any objections, please let us know.
------------------
From: Linus Torvalds <torvalds@linux-foundation.org>
commit 89f5b7da2a6bad2e84670422ab8192382a5aeb9f upstream
KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit 557ed1fa2620dc119adb86b34c614e152a629a80 ("remove ZERO_PAGE") removed the ZERO_PAGE from the VM mappings, any users of get_user_pages() will generally now populate the VM with real empty pages needlessly.
We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but since fault handling no longer uses ZERO_PAGE for new anonymous pages, we now need to handle that special case in follow_page() instead.
In particular, the removal of ZERO_PAGE effectively removed the core file writing optimization where we would skip writing pages that had not been populated at all, and increased memory pressure a lot by allocating all those useless newly zeroed pages.
This reinstates the optimization by making the unmapped PTE case the same as for a non-existent page table, which already did this correctly.
While at it, this also fixes the XIP case for follow_page(), where the caller could not differentiate between the case of a page that simply could not be used (because it had no "struct page" associated with it) and a page that just wasn't mapped.
We do that by simply returning an error pointer for pages that could not be turned into a "struct page *". The error is arbitrarily picked to be EFAULT, since that was what get_user_pages() already used for the equivalent IO-mapped page case.
[ Also removed an impossible test for pte_offset_map_lock() failing: that's not how that function works ]
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: ...
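A userspace sketch of the three-way result the commit message describes for follow_page(): NULL for "not mapped", an error pointer for "mapped but no struct page", and a real pointer otherwise. ERR_PTR()/IS_ERR()/PTR_ERR() are re-implemented here after the kernel's <linux/err.h> idiom (MAX_ERRNO is 4095 there); the `toy_*` lookup, its page states and the classification codes are hypothetical and stand in for the real page-table walk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace copies of the kernel's error-pointer idiom. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct page { int id; };   /* stand-in for struct page */

/* Hypothetical page states for the toy lookup below. */
enum toy_pte { TOY_NONE, TOY_NO_STRUCT_PAGE, TOY_NORMAL };

static struct page normal_page = { 1 };

/* Toy follow_page(): NULL means "not mapped" (the caller may fault it in),
 * an error pointer means "mapped, but no struct page behind it",
 * anything else is a real page. */
static struct page *toy_follow_page(enum toy_pte state)
{
	switch (state) {
	case TOY_NONE:
		return NULL;
	case TOY_NO_STRUCT_PAGE:
		return ERR_PTR(-EFAULT);
	default:
		return &normal_page;
	}
}

/* Toy caller: 0 for a usable page, 1 for "needs faulting in",
 * or the negative errno carried by an error pointer. */
static long toy_classify(enum toy_pte state)
{
	struct page *page = toy_follow_page(state);

	if (IS_ERR(page))
		return PTR_ERR(page);
	return page ? 0 : 1;
}
```

The point of the design is that a single pointer return can now carry all three outcomes, so the caller no longer confuses "unusable page" with "unmapped page".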
Let's wait for the vmware breakage report (http://lkml.org/lkml/2008/6/22/10) to be sorted out first, before moving it to -stable. Linus --
Sure, thanks for pointing that out to me, I'll track it as well. greg k-h --
I can confirm that the 2nd patch from Linus fixed the problem.
http://lkml.org/lkml/2008/6/22/107
Sorry it took so long. Traveling.
Thanks,
Jeff.
--
Long?! That was very quick, thanks for reporting back.
But I'm afraid you've pushed me into taking another look at that patch, and I see a problem with it. To be honest, I've lost the plot on this issue, and didn't really get what your problem is, nor how Linus expected to be fixing it.
The problem is that "insane" VM_LOCKED test which he has removed. I've remembered now what that's about: it's for make_pages_present. We do want mlocking a readonly area to make its pages present, even if they're not at this moment writable: we don't want the ZERO_PAGE substitution in that case.
So I think Linus needs to factor that into the final patch, whilst at the same time solving whatever is the vmware breakage.
Hugh
--
The problem is that the old code said:
- we can use FOLL_ANON, assuming that the vma has no vm_ops, or has no
"fault" callback.
That was fundamentally broken, because you can have a "nopfn" callback.
But it's hard to notice, since the whole FOLL_ANON code only _used_ to
trigger if a whole page table was missing.
The VM_LOCKED test was just crazy, but I doubt it was the cause of the
That's still crazy. make_pages_present() already does:
write = (vma->vm_flags & VM_WRITE) != 0;
and passes that in to "get_user_pages()". So for a writable mapping, we'll
elide the FOLL_ANON case anyway, and for a read-only mapping we should
have used ZERO_PAGE. Damn. Oh, well.
We can certainly re-instate the insane behaviour for mlock(). Not that we
So here's a third patch to test. It removes the VM_SHARED thing just to
get us closer to the original code (and because do_no_page() didn't do it
historically, so let's not do it either), and it re-instates the insane
VM_LOCKED test with a comment.
Jeff, does this still work with vmware?
Linus
---
mm/memory.c | 20 ++++++++++++++++++--
1 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 9aefaae..a2ce28d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1045,6 +1045,23 @@ no_page_table:
return page;
}
+/* Can we do the FOLL_ANON optimization? */
+static inline int use_zero_page(struct vm_area_struct *vma)
+{
+ /*
+ * We don't want to optimize FOLL_ANON for make_pages_present()
+ * when it tries to page in a VM_LOCKED region.
+ */
+ if (vma->vm_flags & VM_LOCKED)
+ return 0;
+ /*
+ * And if we have a fault or a nopfn routine, it's not an
+ * anonymous region.
+ */
+ return !vma->vm_ops ||
+ (!vma->vm_ops->fault && !vma->vm_ops->nopfn);
+}
+
int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int len, int write, int force,
struct page **pages, struct vm_area_struct **vmas)
@@ -1119,8 +1136,7 @@ ...
On Tue, Jun 24, 2008 at 12:39 AM, Linus Torvalds
No, this breaks vmware. Does this trace help?
Jun 24 00:54:49.325: vmx| NOT_IMPLEMENTED /build/mts/release/bora-93057/bora/vmx/main/vmmonPosix.c:774
Jun 24 00:54:49.325: vmx| Backtrace:
Jun 24 00:54:49.325: vmx| Backtrace[0] 0xbffc30c8 eip 0x8052f10
Jun 24 00:54:49.325: vmx| Backtrace[1] 0xbffc34f8 eip 0x80f2f7d
Jun 24 00:54:49.325: vmx| Backtrace[2] 0xbffc3548 eip 0x80e4b15
Jun 24 00:54:49.325: vmx| Backtrace[3] 0xbffc3638 eip 0x837b341
Jun 24 00:54:49.325: vmx| Backtrace[4] 0xbffc3688 eip 0x837cde4
Jun 24 00:54:49.325: vmx| Backtrace[5] 0xbffc36b8 eip 0x80fda89
Jun 24 00:54:49.325: vmx| Backtrace[6] 0xbffc36e8 eip 0x80f36f5
Jun 24 00:54:49.325: vmx| Backtrace[7] 0xbffc3728 eip 0x80f3bd4
Jun 24 00:54:49.325: vmx| Backtrace[8] 0xbffc3788 eip 0x80511be
Jun 24 00:54:49.325: vmx| Backtrace[9] 0xbffc3878 eip 0x8051561
Jun 24 00:54:49.325: vmx| Backtrace[10] 0xbffc38e8 eip 0xb7e374c0
Jun 24 00:54:49.325: vmx| Backtrace[11] 00000000 eip 0x804e7b1
Jun 24 00:54:49.325: vmx| Core dump limit is 0 kb.
Jun 24 00:54:49.326: vmx| Cannot remap region MonWired (addr=(nil), size=0x13000, offset=0x19000)
Jun 24 00:54:49.326: vmx| Cannot remap region PShareMPN (addr=(nil), size=0x1000, offset=0x18000)
Jun 24 00:54:49.326: vmx| Remapping region BusMemFrame1 as MAP_PRIVATE (addr=0xb7f9c000, size=0x1000, offset=0x17000)
Jun 24 00:54:49.326: vmx| Remapping region BusMemFrame0 as MAP_PRIVATE (addr=0xb7f9d000, size=0x1000, offset=0x16000)
Jun 24 00:54:49.326: vmx| Cannot remap region PhysRegion0 (addr=(nil), size=0x1000, offset=0x15000)
Jun 24 00:54:49.326: vmx| Msg_Post: Error
Jun 24 00:54:49.326: vmx| [msg.log.error.unrecoverable] VMware Workstation unrecoverable error: (vmx)
Jun 24 00:54:49.326: vmx| NOT_IMPLEMENTED /build/mts/release/bora-93057/bora/vmx/main/vmmonPosix.c:774
Thanks,
--
Not really. I have no idea what vmware does, so any traces from vmware are
pretty useless.
On the other hand, if you add a trace to the "use_zero_page()" function to
print out the vm_flags and other details, that probably would help.
That said, since the previous patch _did_ work, I bet that one that does
both VM_LOCKED and VM_SHARED works too. There was a reason I wanted to do
that VM_SHARED test. I think the VM_SHARED test is sane, unlike the
VM_LOCKED test (that is a fairly dubious hack for mlock).
So here's the final version. I bet it works.
Linus
---
mm/memory.c | 23 +++++++++++++++++++++--
1 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 9aefaae..423e0e7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1045,6 +1045,26 @@ no_page_table:
return page;
}
+/* Can we do the FOLL_ANON optimization? */
+static inline int use_zero_page(struct vm_area_struct *vma)
+{
+ /*
+ * We don't want to optimize FOLL_ANON for make_pages_present()
+ * when it tries to page in a VM_LOCKED region. As to VM_SHARED,
+ * we want to get the page from the page tables to make sure
+ * that we serialize and update with any other user of that
+ * mapping.
+ */
+ if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
+ return 0;
+ /*
+ * And if we have a fault or a nopfn routine, it's not an
+ * anonymous region.
+ */
+ return !vma->vm_ops ||
+ (!vma->vm_ops->fault && !vma->vm_ops->nopfn);
+}
+
int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int len, int write, int force,
struct page **pages, struct vm_area_struct **vmas)
@@ -1119,8 +1139,7 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
foll_flags = FOLL_TOUCH;
if (pages)
foll_flags |= FOLL_GET;
- if (!write && !(vma->vm_flags & VM_LOCKED) &&
- (!vma->vm_ops || !vma->vm_ops->fault))
+ if (!write && use_zero_page(vma))
foll_flags |= FOLL_ANON;
do {
--
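A userspace sketch of the decision the final patch makes, useful for checking the cases discussed in this thread (mlock of a read-only area, shared mappings, mappings with a nopfn handler). The flag values and the reduced vm_operations_struct signatures are hypothetical stand-ins, not the kernel's; `example_nopfn`, the `example_*_ops` tables and `mpp_write_arg()` are illustrative names. The logic of use_zero_page() and the `!write && use_zero_page(vma)` condition follow the patch above, and mpp_write_arg() mirrors the `write = (vma->vm_flags & VM_WRITE) != 0` line Linus quotes from make_pages_present().

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary stand-in flag values (not the kernel's). */
#define VM_WRITE  0x1ul
#define VM_SHARED 0x2ul
#define VM_LOCKED 0x4ul

/* Simplified vm_ops: only "is the handler present?" matters here. */
struct vm_operations_struct {
	int (*fault)(void);
	unsigned long (*nopfn)(void);
};

struct vm_area_struct {
	unsigned long vm_flags;
	const struct vm_operations_struct *vm_ops;
};

/* Hypothetical driver mapping that supplies only a nopfn handler. */
static unsigned long example_nopfn(void) { return 0; }
static const struct vm_operations_struct example_pfn_ops = { .nopfn = example_nopfn };
static const struct vm_operations_struct example_empty_ops = { .fault = NULL };

/* Same decision as the final use_zero_page() in the patch. */
static int use_zero_page(const struct vm_area_struct *vma)
{
	if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
		return 0;
	return !vma->vm_ops ||
	       (!vma->vm_ops->fault && !vma->vm_ops->nopfn);
}

/* The "write" argument make_pages_present() passes to get_user_pages(). */
static int mpp_write_arg(const struct vm_area_struct *vma)
{
	return (vma->vm_flags & VM_WRITE) != 0;
}

/* get_user_pages() only sets FOLL_ANON for reads of zero-page-safe VMAs. */
static int wants_foll_anon(const struct vm_area_struct *vma, int write)
{
	return !write && use_zero_page(vma);
}
```

With this model, a plain anonymous read-only VMA still gets the FOLL_ANON optimization, while an mlocked, shared, or nopfn-backed one does not; the nopfn case is the one the old `!vma->vm_ops->fault` test missed.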
On Tue, Jun 24, 2008 at 1:27 AM, Linus Torvalds
Yeh, it works great! Thank you.
--
No, it's fine. It really was a bug, and a long-standing one, just one that was probably practically impossible to hit before (because we used to only do the FOLL_ANON logic on missing whole page tables, and just about any access to any mapping even nearby the one you care about will fill in the page tables - so you would have had to be really unlucky to trigger the case before).
The patch clearly fixes an issue, and makes the code more readable and
Thanks for bisecting, reporting and testing.
Linus
--
2.6.25-stable review patch. If anyone has any objections, please let us know.
------------------
From: Radu Cristescu <advantis@gmx.net>
upstream commit: 58c7821c4264a7ddd6f0c31c5caaf393b3897f10
The atl1 driver tries to determine the MAC address thusly:
- If an EEPROM exists, read the MAC address from EEPROM and validate it.
- If an EEPROM doesn't exist, try to read a MAC address from SPI flash.
- If that fails, try to read a MAC address directly from the MAC Station Address register.
- If that fails, assign a random MAC address provided by the kernel.
We now have a report of a system fitted with an EEPROM containing all zeros where we expect the MAC address to be, and we currently handle this as an error condition. Turns out, on this system the BIOS writes a valid MAC address to the NIC's MAC Station Address register, but we never try to read it because we return an error when we find the all-zeros address in EEPROM.
This patch relaxes the error check and continues looking for a MAC address even if it finds an illegal one in EEPROM.
http://ubuntuforums.org/showthread.php?t=562617
[jacliburn@bellsouth.net: backport to 2.6.25.7]
Signed-off-by: Radu Cristescu <advantis@gmx.net>
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/net/atl1/atl1_hw.c | 1 -
1 file changed, 1 deletion(-)
--- a/drivers/net/atl1/atl1_hw.c
+++ b/drivers/net/atl1/atl1_hw.c
@@ -250,7 +250,6 @@ static int atl1_get_permanent_address(st
memcpy(hw->perm_mac_addr, eth_addr, ETH_ALEN);
return 0;
}
- return 1;
}
/* see if SPI FLAGS exist ? */
--
--
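A simplified sketch of the lookup order after the fix: an invalid EEPROM address no longer ends the search, so the MAC Station Address register written by the BIOS is still consulted. The validity rule matches the kernel's is_valid_ether_addr() (not multicast, not all zeros); `struct mac_sources`, `pick_mac()`, the example addresses and the fixed fallback address (a placeholder for the driver's random address) are hypothetical and do not reflect how atl1 actually reads the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Same rule as is_valid_ether_addr(): unicast and not all zeros. */
static bool valid_mac(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];

	return !(a[0] & 1) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Hypothetical sources; NULL means "not present / read failed". */
struct mac_sources {
	const uint8_t *eeprom;
	const uint8_t *spi_flash;
	const uint8_t *station_reg;
};

/* Example data for the reported case (values are made up). */
static const uint8_t example_zero_eeprom[ETH_ALEN] = { 0 };
static const uint8_t example_bios_mac[ETH_ALEN] = { 0x00, 0x1b, 0x21, 0x0a, 0x0b, 0x0c };

/* Returns the source used: 0 EEPROM, 1 SPI flash, 2 register, 3 fallback.
 * An invalid candidate falls through to the next source instead of
 * ending the search, which is the behaviour the patch restores. */
static int pick_mac(const struct mac_sources *src, uint8_t out[ETH_ALEN])
{
	const uint8_t *cand[3] = { src->eeprom, src->spi_flash, src->station_reg };
	int i;

	for (i = 0; i < 3; i++) {
		if (cand[i] && valid_mac(cand[i])) {
			memcpy(out, cand[i], ETH_ALEN);
			return i;
		}
	}
	/* Placeholder for the random-address fallback: a fixed, locally
	 * administered unicast address (hypothetical value). */
	memset(out, 0, ETH_ALEN);
	out[0] = 0x02;
	out[ETH_ALEN - 1] = 0x01;
	return 3;
}
```

In the reported configuration (all-zero EEPROM, BIOS-programmed register), this sketch ends up at source 2, whereas the pre-patch code stopped at the EEPROM with an error.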
2.6.25-stable review patch. If anyone has any objections, please let us know.
------------------
From: Jeremy Fitzhardinge <jeremy@goop.org>
commit ad524d46f36bbc32033bb72ba42958f12bf49b06 upstream
When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can potentially have the same number of physical address bits as the 64-bit host ("Enhanced Legacy PAE Paging"). This means, in theory, we could have up to 52 bits of physical address in a pte.
The 32-bit kernel uses a 32-bit unsigned long to represent a pfn. This means that it can only represent physical addresses up to 32+12=44 bits wide. Rather than widening pfns everywhere, just set 2^44 as the Linux x86_32-PAE architectural limit for physical address size.
This is a bugfix for two cases:
1. running a 32-bit PAE kernel on a machine with more than 64GB RAM.
2. running a 32-bit PAE Xen guest on a host machine with more than 64GB RAM.
In both cases, a pte could need to have more than 36 bits of physical address, and masking it to 36 bits will cause fairly severe havoc.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
include/asm-x86/page_32.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/include/asm-x86/page_32.h
+++ b/include/asm-x86/page_32.h
@@ -14,7 +14,8 @@
#define __PAGE_OFFSET _AC(CONFIG_PAGE_OFFSET, UL)
#ifdef CONFIG_X86_PAE
-#define __PHYSICAL_MASK_SHIFT 36
+/* 44=32+12, the limit we can fit into an unsigned long pfn */
+#define __PHYSICAL_MASK_SHIFT 44
#define __VIRTUAL_MASK_SHIFT 32
#define PAGETABLE_LEVELS 3
--
--
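The 32+12=44 arithmetic and the "havoc" from a 36-bit mask can be checked with a few lines of C. This is a sketch assuming 4 KiB pages (PAGE_SHIFT = 12) and a physical mask of the form (1 << shift) - 1; the helper names are hypothetical and this is not the kernel's page-table code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Mask of the physical-address bits kept in a pte for a given
 * __PHYSICAL_MASK_SHIFT (36 before the patch, 44 after). */
static uint64_t phys_mask(unsigned shift)
{
	return (UINT64_C(1) << shift) - 1;
}

/* Largest physical address reachable with a 32-bit pfn:
 * 2^32 pages of 2^12 bytes each = 2^44 bytes (16 TiB). */
static uint64_t pfn32_limit(void)
{
	return (UINT64_C(1) << 32) << PAGE_SHIFT;
}

/* Page-aligned address that survives masking with the given shift. */
static uint64_t masked(uint64_t phys, unsigned shift)
{
	return phys & phys_mask(shift) & ~((UINT64_C(1) << PAGE_SHIFT) - 1);
}
```

For instance, a page at 64 GiB + 0x5000 (0x1000005000) silently becomes 0x5000 under the old 36-bit mask, i.e. a pte pointing at an unrelated low page, while the 44-bit mask preserves it.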
Hi Greg and -stable team;
Please consider the following commit for -stable as well; it definitely fixes a boot failure caused by the reported oops
commit 1f6ef2342972dc7fd623f360f84006e2304eb935
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri Jun 20 12:19:28 2008 -0700
[watchdog] hpwdt: fix use of inline assembly
The inline assembly in drivers/watchdog/hpwdt.c was incredibly broken,
and included all the function prologue and epilogue stuff, even though
it was itself then inside a C function where the compiler would add its
own prologue and epilogue on top of it all.
This then just _happened_ to work if you had exactly the right compiler
version and exactly the right compiler flags, so that gcc just happened
to not create any prologue at all (the gcc-generated epilogue wouldn't
matter, since it would never be reached).
But the more proper way to fix it is to simply not do this. Move the
inline asm to the top level, with no surrounding function at all (the
better alternative would be to remove the prologue and make it actually
use proper description of the arguments to the inline asm, but that's a
bigger change than the one I'm willing to make right now).
Tested-by: S.Çağlar Onur <caglar@pardus.org.tr>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
--
Thanks, I've added that one now as well. greg k-h --
