[PATCH] [24/58] x86_64: Untangle asm/hpet.h from asm/timex.h

To: <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:54 am

- Some more improvements for AMD family 10
- Some help for gcc 4.3
- Out of line string functions for i386 (saves >20k text)
- x86-64 vDSO
- improved fake numa node code from David Rientjes
- various machine checking handling improvements from Tim H.
- timer cleanups and fixes from Thomas Gleixner
- various other cleanup and fixes

Please review. I plan to send them off relatively quickly
because I'm very late with this merge.

-Andi
-
To: <bunk@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Adrian Bunk <bunk@stusta.de>

The Rise CPUs were only very short-lived, and there are no reports of
anyone both owning one and running Linux on it.

Googling for the printk string "CPU: Rise iDragon" didn't find any dmesg
available online.

If it turns out that against all expectations there are actually users
reverting this patch would be easy.

This patch will make the kernel images smaller by a few bytes for all
i386 users.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Dave Jones <davej@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/kernel/cpu/Makefile  |    1 
 arch/i386/kernel/cpu/common.c  |    2 -
 arch/i386/kernel/cpu/rise.c    |   52 -----------------------------------------
 include/asm-i386/processor.h   |    1 
 include/asm-x86_64/processor.h |    1 
 5 files changed, 57 deletions(-)

Index: linux/arch/i386/kernel/cpu/Makefile
===================================================================
--- linux.orig/arch/i386/kernel/cpu/Makefile
+++ linux/arch/i386/kernel/cpu/Makefile
@@ -9,7 +9,6 @@ obj-y	+=	cyrix.o
 obj-y	+=	centaur.o
 obj-y	+=	transmeta.o
 obj-y	+=	intel.o intel_cacheinfo.o addon_cpuid_features.o
-obj-y	+=	rise.o
 obj-y	+=	nexgen.o
 obj-y	+=	umc.o
 
Index: linux/arch/i386/kernel/cpu/common.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/common.c
+++ linux/arch/i386/kernel/cpu/common.c
@@ -606,7 +606,6 @@ extern int nsc_init_cpu(void);
 extern int amd_init_cpu(void);
 extern int centaur_init_cpu(void);
 extern int transmeta_init_cpu(void);
-extern int rise_init_cpu(void);
 extern int nexgen_init_cpu(void);
 extern int umc_init_cpu(void);
 
@@ -618,7 +617,6 @@ void __init early_cpu_init(void)
 	amd_init_cpu();
 	centaur_init_cpu();
 	transmeta_init_cpu();
-	rise_init_cpu();
 	nexgen_init_cpu();
 	umc_init_cpu();
 	early_c...
To: Andi Kleen <ak@...>
Cc: <bunk@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 6:45 am

Why bother? It's a tiny, tiny amount of code and it requires no maintenance,
so removing it achieves nothing and risks (slight, I admit)
breaking someone's box.

Alan

-
To: Alan Cox <alan@...>
Cc: Andi Kleen <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 6:48 am

- It's not only code, it also bloats everyone's kernel image.
- All it did was to fiddle with capabilities - if any computer with
  a Rise cpu running Linux actually exists it should still work.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Adrian Bunk <bunk@...>
Cc: Andi Kleen <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 7:13 am

> - It's not only code, it also bloats everyone's kernel image.

It's a minuscule piece of code that is discarded on boot. Yes, it might
make the image 100 bytes longer, but have you priced a 160GB disk
recently? I don't think 100 bytes of disk and 0 of memory is really worth
saving for any risk at all. It's not even worth the time to apply the

But you've no idea if this is true. Probably some of our other drivers
are bigger and have fewer users.

Alan
-
To: Alan Cox <alan@...>
Cc: Adrian Bunk <bunk@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 8:03 am

The patch is already applied.

Besides the CPU will likely boot even without special handling.

-Andi
-
To: Andi Kleen <ak@...>
Cc: Alan Cox <alan@...>, Adrian Bunk <bunk@...>, <patches@...>, <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Thursday, July 19, 2007 - 10:56 am

You don't know this.  Why risk it?  Just leave the CPU magic as-is.

	Jeff


-
To: <ebiederm@...>, <nanhai.zou@...>, <asit.k.mallick@...>, <keith.packard@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Eric W. Biederman <ebiederm@xmission.com>

On x86_64 kernels, level-triggered irq migration is initiated in the
context of that interrupt (after executing the irq handler), and the
following steps perform the migration:

1. mask IOAPIC RTE entry;     // write to IOAPIC RTE
2. EOI;                       // processor EOI write
3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination
                              // and interrupt vector due to per cpu vector
                              // allocation.
4. unmask IOAPIC RTE entry;   // write to IOAPIC RTE

Because of the per cpu vector allocation in x86_64 kernels, when the irq
migrates to a different cpu, a new vector (corresponding to the new cpu)
gets allocated.

An EOI write to local APIC has a side effect of generating an EOI write for
level trigger interrupts (normally this is a broadcast to all IOAPICs). 
The EOI broadcast generated as a side effect of EOI write to processor may
be delayed while the other IOAPIC writes (step 3 and 4) can go through.

Normally, the EOI generated by local APIC for level trigger interrupt
contains vector number.  The IOAPIC will take this vector number and search
the IOAPIC RTE entries for an entry with matching vector number and clear
the remote IRR bit (indicate EOI).  However, if the vector number is
changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI
is received later.  This will cause the remote IRR to get stuck causing the
interrupt hang (no more interrupt from this RTE).

The current x86_64 kernel assumes that the remote IRR bit is cleared by the
time the IOAPIC RTE is reprogrammed.  Fix this assumption by checking the
remote IRR bit and, if it is still set, delaying the irq migration to the next
interrupt arrival event (hopefully, next time the remote IRR bit will get
cleared before the IOAPIC RTE is reprogrammed).
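
As a rough illustration of the check described above, here is a minimal
userspace sketch. The struct fake_rte type and finish_migration() are
hypothetical stand-ins (real code goes through IOAPIC register accesses), and
unmasking in both cases is an assumption of this sketch, not something the
description states. Only the bit positions (vector in bits 0-7, remote IRR in
bit 14, mask in bit 16) follow the IOAPIC redirection entry layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 32 bits of an IOAPIC redirection table entry (RTE). */
#define RTE_VECTOR_MASK 0xffu
#define RTE_REMOTE_IRR  (1u << 14)
#define RTE_MASKED      (1u << 16)

/* Hypothetical in-memory model of one RTE. */
struct fake_rte {
	uint32_t low;   /* vector, remote IRR, mask bit, ... */
	uint8_t  dest;  /* destination APIC ID */
};

/*
 * Called after the entry was masked and the EOI was issued (steps 1-2).
 * Reprograms the entry only if remote IRR is already clear; otherwise the
 * migration is left for the next interrupt.  Returns true if it migrated.
 */
static bool finish_migration(struct fake_rte *rte, uint8_t new_dest,
			     uint8_t new_vector)
{
	bool moved = false;

	if (!(rte->low & RTE_REMOTE_IRR)) {
		/* step 3: new vector and destination */
		rte->low = (rte->low & ~RTE_VECTOR_MASK) | new_vector;
		rte->dest = new_dest;
		moved = true;
	}
	rte->low &= ~RTE_MASKED;  /* step 4 (assumed to run in both cases) */
	return moved;
}
```

A caller would retry on the next interrupt whenever this returns false, so the
old vector stays in place until the IOAPIC has seen the EOI for it.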

Initial analysis and patch from Nanhai.

Clean up patch from Suresh.

Rewritten to be less intrusive, and to contain a big f...
To: <venkatesh.pallipadi@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Venki Pallipadi <venkatesh.pallipadi@intel.com>

This helps to reduce the frequency at which the CPU must be taken out of a
lower-power state.
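
The idea can be modelled in userspace as follows. This is a deliberately
simplified sketch, not the kernel helper: round_up_to_second() and the HZ value
are hypothetical, and it always rounds up, whereas the rounding rules of
round_jiffies_relative() itself are not shown in this patch. The point is only
that timers whose expiry lands on a whole-second boundary tend to fire
together, so the CPU wakes once instead of several times.

```c
#define HZ 250UL  /* hypothetical tick rate for this sketch */

/*
 * Given the current tick count and a relative delay, return an adjusted
 * delay such that (now + delay) falls on a multiple of HZ.
 */
static unsigned long round_up_to_second(unsigned long now, unsigned long delay)
{
	unsigned long expiry = now + delay;
	unsigned long rem = expiry % HZ;

	if (rem)
		expiry += HZ - rem;  /* push to the next whole second */
	return expiry - now;
}
```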

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Tim Hockin <thockin@hockin.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/kernel/cpu/mcheck/non-fatal.c |    4 ++--
 arch/x86_64/kernel/mce.c                |    9 ++++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

Index: linux/arch/i386/kernel/cpu/mcheck/non-fatal.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mcheck/non-fatal.c
+++ linux/arch/i386/kernel/cpu/mcheck/non-fatal.c
@@ -57,7 +57,7 @@ static DECLARE_DELAYED_WORK(mce_work, mc
 static void mce_work_fn(struct work_struct *work)
 { 
 	on_each_cpu(mce_checkregs, NULL, 1, 1);
-	schedule_delayed_work(&mce_work, MCE_RATE);
+	schedule_delayed_work(&mce_work, round_jiffies_relative(MCE_RATE));
 } 
 
 static int __init init_nonfatal_mce_checker(void)
@@ -82,7 +82,7 @@ static int __init init_nonfatal_mce_chec
 	/*
 	 * Check for non-fatal errors every MCE_RATE s
 	 */
-	schedule_delayed_work(&mce_work, MCE_RATE);
+	schedule_delayed_work(&mce_work, round_jiffies_relative(MCE_RATE));
 	printk(KERN_INFO "Machine check exception polling timer started.\n");
 	return 0;
 }
Index: linux/arch/x86_64/kernel/mce.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce.c
+++ linux/arch/x86_64/kernel/mce.c
@@ -375,7 +375,8 @@ static void mcheck_timer(struct work_str
 	if (mce_notify_user()) {
 		next_interval = max(next_interval/2, HZ/100);
 	} else {
-		next_interval = min(next_interval*2, check_interval*HZ);
+		next_interval = min(next_interval*2,
+				(int)round_jiffies_relative(check_interval*HZ));
 	}
 
 	schedule_delayed_work(&am...
To: <akpm@...>, <ak@...>, <benh@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Andrew Morton <akpm@linux-foundation.org>

Prevent stuff like this:

mm/vmalloc.c: In function 'unmap_kernel_range':
mm/vmalloc.c:75: warning: unused variable 'start'
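
To see why the warning appears, consider this small model. The names mirror the
patch, but flush_tlb_all() here is a counting stub and unmap_range_model() is a
hypothetical caller. With the old macro form, the arguments disappear during
preprocessing, so a caller's local 'start' is never used; with an inline
function the arguments are real uses.

```c
static int flush_count;

/* Stub standing in for the real TLB flush. */
static void flush_tlb_all(void)
{
	flush_count++;
}

/*
 * Old form, for comparison:
 *   #define flush_tlb_kernel_range(start, end) flush_tlb_all()
 * The preprocessor drops 'start' and 'end', so gcc sees them as unused
 * in the caller.
 */
static inline void flush_tlb_kernel_range(unsigned long start,
					  unsigned long end)
{
	/* Casts only quiet -Wunused-parameter in stricter builds. */
	(void)start;
	(void)end;
	flush_tlb_all();
}

/* Hypothetical caller shaped like the one in the warning. */
static void unmap_range_model(unsigned long addr, unsigned long size)
{
	unsigned long start = addr;
	unsigned long end = addr + size;

	flush_tlb_kernel_range(start, end);  /* 'start' is now used */
}
```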

Cc: Andi Kleen <ak@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 include/asm-i386/tlbflush.h |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux/include/asm-i386/tlbflush.h
===================================================================
--- linux.orig/include/asm-i386/tlbflush.h
+++ linux/include/asm-i386/tlbflush.h
@@ -160,7 +160,11 @@ DECLARE_PER_CPU(struct tlb_state, cpu_tl
 	native_flush_tlb_others(&mask, mm, va)
 #endif
 
-#define flush_tlb_kernel_range(start, end) flush_tlb_all()
+static inline void flush_tlb_kernel_range(unsigned long start,
+					unsigned long end)
+{
+	flush_tlb_all();
+}
 
 static inline void flush_tlb_pgtables(struct mm_struct *mm,
 				      unsigned long start, unsigned long end)
-
To: <stern@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Alan Stern <stern@rowland.harvard.edu>

This patch (as921) adds code to the show_regs() routine in i386 and x86_64
to print the contents of the debug registers along with all the others.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/kernel/process.c   |   12 ++++++++++++
 arch/x86_64/kernel/process.c |   10 ++++++++++
 2 files changed, 22 insertions(+)

Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -300,6 +300,7 @@ early_param("idle", idle_setup);
 void show_regs(struct pt_regs * regs)
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
+	unsigned long d0, d1, d2, d3, d6, d7;
 
 	printk("\n");
 	printk("Pid: %d, comm: %20s\n", current->pid, current->comm);
@@ -324,6 +325,17 @@ void show_regs(struct pt_regs * regs)
 	cr3 = read_cr3();
 	cr4 = read_cr4_safe();
 	printk("CR0: %08lx CR2: %08lx CR3: %08lx CR4: %08lx\n", cr0, cr2, cr3, cr4);
+
+	get_debugreg(d0, 0);
+	get_debugreg(d1, 1);
+	get_debugreg(d2, 2);
+	get_debugreg(d3, 3);
+	printk("DR0: %08lx DR1: %08lx DR2: %08lx DR3: %08lx\n",
+			d0, d1, d2, d3);
+	get_debugreg(d6, 6);
+	get_debugreg(d7, 7);
+	printk("DR6: %08lx DR7: %08lx\n", d6, d7);
+
 	show_trace(NULL, regs, &regs->esp);
 }
 
Index: linux/arch/x86_64/kernel/process.c
===================================================================
--- linux.orig/arch/x86_64/kernel/process.c
+++ linux/arch/x86_64/kernel/process.c
@@ -306,6 +306,7 @@ early_param("idle", idle_setup);
 void __show_regs(struct pt_regs * regs)
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L, fs, gs, shadowgs;
+	unsigned long d0, d1, d2, d3, d6, d7;
 	unsigned int fsindex,gs...
To: <nigel@...>, <rdunlap@...>, <ak@...>, <rjw@...>, <pavel@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Nigel Cunningham <nigel@nigel.suspend2.net>

Signed-off-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86_64/kernel/vmlinux.lds.S  |    7 +++++++
 drivers/base/power/trace.c        |    5 ++++-
 include/asm-i386/resume-trace.h   |   13 +++++++++++++
 include/asm-x86_64/resume-trace.h |   13 +++++++++++++
 include/linux/resume-trace.h      |   19 +++++--------------
 kernel/power/Kconfig              |    2 +-
 6 files changed, 43 insertions(+), 16 deletions(-)

Index: linux/arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- linux.orig/arch/x86_64/kernel/vmlinux.lds.S
+++ linux/arch/x86_64/kernel/vmlinux.lds.S
@@ -52,6 +52,13 @@ SECTIONS
 
   RODATA
 
+  . = ALIGN(4);
+  .tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) {
+  	__tracedata_start = .;
+	*(.tracedata)
+  	__tracedata_end = .;
+  }
+
   . = ALIGN(PAGE_SIZE);        /* Align data segment to page size boundary */
 				/* Data */
   .data : AT(ADDR(.data) - LOAD_OFFSET) {
Index: linux/drivers/base/power/trace.c
===================================================================
--- linux.orig/drivers/base/power/trace.c
+++ linux/drivers/base/power/trace.c
@@ -142,6 +142,7 @@ void set_trace_device(struct device *dev
 {
 	dev_hash_value = hash_string(DEVSEED, dev->bus_id, DEVHASH);
 }
+EXPORT_SYMBOL(set_trace_device);
 
 /*
  * We could just take the "tracedata" index into the .tracedata
@@ -162,6 +163,7 @@ void generate_resume_trace(void *traceda
 	file_hash_value = hash_string(lineno, file, FILEHASH);
 	set_magic_time(user_hash_value, file_hash_value, dev_hash_value);
 }
+EXPORT_SYMBOL(gen...
To: <sam@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Sam Ravnborg <sam@ravnborg.org>

Following section mismatch warnings were reported by Andrey Borzenkov:

WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:amd_init_mtrr from .text between 'mtrr_bp_init' (at offset 0x967a) and 'mtrr_attrib_to_str'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:cyrix_init_mtrr from .text between 'mtrr_bp_init' (at offset 0x967f) and 'mtrr_attrib_to_str'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:centaur_init_mtrr from .text between 'mtrr_bp_init' (at offset 0x9684) and 'mtrr_attrib_to_str'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text: from .text between 'get_mtrr_state' (at offset 0xa735) and 'generic_get_mtrr'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text: from .text between 'get_mtrr_state' (at offset 0xa749) and 'generic_get_mtrr'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text: from .text between 'get_mtrr_state' (at offset 0xa770) and 'generic_get_mtrr'

It was tracked down to a few functions missing __init tag.
Compile tested only.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/i386/kernel/cpu/mtrr/generic.c |    2 +-
 arch/i386/kernel/cpu/mtrr/main.c    |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/i386/kernel/cpu/mtrr/generic.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mtrr/generic.c
+++ linux/arch/i386/kernel/cpu/mtrr/generic.c
@@ -79,7 +79,7 @@ static void print_fixed(unsigned base, u
 }
 
 /*  Grab all of the MTRR state for this CPU into *state  */
-void get_mtrr_state(void)
+void __init get_mtrr_state(void)
 {
 	unsigned int i;
 	struct mtrr_var_range *vrs;
Index: linux/arch/i386/kernel/cpu/mtrr/mai...
To: <trux@...>, <lee-in-berlin@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Truxton Fulton <trux@truxton.com>

59f4e7d572980a521b7bdba74ab71b21f5995538 fixed machine rebooting on Truxton's
machine (when no keyboard was present).  But it broke it on Lee's machine.

The patch reinstates the old (pre-59f4e7d572980a521b7bdba74ab71b21f5995538)
code and if that doesn't work out, try the new,
post-59f4e7d572980a521b7bdba74ab71b21f5995538 code instead.
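
In the diff below, the new method also changes from writing a fixed 0x14 to a
read-modify-write of the Controller Command Byte. The core of that change can
be shown in isolation; kbc_set_system_flag() and the KBC_SYSTEM_FLAG name are
hypothetical helpers for this sketch, with only the 0x04 bit value taken from
the patch.

```c
#include <stdint.h>

#define KBC_SYSTEM_FLAG 0x04  /* "System flag" bit in the command byte */

/*
 * Return the command byte to write back: whatever the controller
 * reported, plus the system flag.  All other bits are preserved,
 * unlike a hard-coded 0x14.
 */
static uint8_t kbc_set_system_flag(uint8_t cmd)
{
	return cmd | KBC_SYSTEM_FLAG;
}
```

For a controller that reports 0x61, this writes back 0x65 rather than
overwriting the other configuration bits.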

Cc: Lee Garrett <lee-in-berlin@web.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 include/asm-i386/mach-default/mach_reboot.h |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Index: linux/include/asm-i386/mach-default/mach_reboot.h
===================================================================
--- linux.orig/include/asm-i386/mach-default/mach_reboot.h
+++ linux/include/asm-i386/mach-default/mach_reboot.h
@@ -19,14 +19,37 @@ static inline void kb_wait(void)
 static inline void mach_reboot(void)
 {
 	int i;
+
+	/* old method, works on most machines */
 	for (i = 0; i < 10; i++) {
 		kb_wait();
 		udelay(50);
+		outb(0xfe, 0x64);	/* pulse reset low */
+		udelay(50);
+	}
+
+	/* New method: sets the "System flag" which, when set, indicates
+	 * successful completion of the keyboard controller self-test (Basic
+	 * Assurance Test, BAT).  This is needed for some machines with no
+	 * keyboard plugged in.  This read-modify-write sequence sets only the
+	 * system flag
+	 */
+	for (i = 0; i < 10; i++) {
+		int cmd;
+
+		outb(0x20, 0x64);	/* read Controller Command Byte */
+		udelay(50);
+		kb_wait();
+		udelay(50);
+		cmd = inb(0x60);
+		udelay(50);
+		kb_wait();
+		udelay(50);
 		outb(0x60, 0x64);	/* write Controller Command Byte */
 		udelay(50);
 		kb_wait();
 		udelay(50);
-		outb(0x14, 0x60);	/* set "System flag" */
+		outb(cmd | 0x04, 0x60);	/* set "System flag" */
 		udelay(50);
 		kb_wait();
 		udelay(50);
-
To: <thockin@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Tim Hockin <thockin@google.com>

Background:
 The MCE handler has several paths that it can take, depending on various
 conditions of the MCE status and the value of the 'tolerant' knob.  The
 exact semantics are not well defined and the code is a bit twisty.

Description:
 This patch makes the MCE handler's behavior more clear by documenting the
 behavior for various 'tolerant' levels.  It also fixes or enhances
 several small things in the handler.  Specifically:
     * If RIPV is not set it is not safe to restart, so set the 'no way out'
       flag rather than the 'kill it' flag.
     * Don't panic() on correctable MCEs.
     * If the _OVER bit is set *and* the _UC bit is set (meaning possibly
       dropped uncorrected errors), set the 'no way out' flag.
     * Use EIPV for testing whether an app can be killed (SIGBUS) rather
       than RIPV.  According to docs, EIPV indicates that the error is
       related to the IP, while RIPV simply means the IP is valid to
       restart from.
     * Don't clear the MCi_STATUS registers until after the panic() path.
       This leaves the status bits set after the panic() so clever BIOSes
       can find them (and dumb BIOSes can do nothing).
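
A compact way to read the status-bit rules in this list is as a classification
function. The sketch below is an assumption-laden model, not the handler: it
ignores the 'tolerant' knob entirely, classify_mce() and struct mce_verdict are
hypothetical, and it reads the first bullet as "restart IP not valid", which is
how the RIPV description later in the same list defines the bit. The bit
positions are the architectural ones for MCG_STATUS and MCi_STATUS.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)   /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1)   /* error IP valid */
#define MCI_STATUS_UC   (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_OVER (1ULL << 62)  /* previous errors lost */

/* Hypothetical result type for this sketch. */
struct mce_verdict {
	bool no_way_out;     /* must panic */
	bool may_kill_task;  /* candidate for SIGBUS instead of panic */
};

static struct mce_verdict classify_mce(uint64_t mcgstatus, uint64_t bank_status)
{
	struct mce_verdict v = { false, false };

	/* Cannot restart if the restart IP is not valid. */
	if (!(mcgstatus & MCG_STATUS_RIPV))
		v.no_way_out = true;

	/* Overflow plus uncorrected: errors may have been dropped. */
	if ((bank_status & MCI_STATUS_OVER) && (bank_status & MCI_STATUS_UC))
		v.no_way_out = true;

	/* Assumed rule: only an uncorrected error tied to the IP is killable. */
	if (!v.no_way_out && (bank_status & MCI_STATUS_UC) &&
	    (mcgstatus & MCG_STATUS_EIPV))
		v.may_kill_task = true;

	return v;
}
```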

 This patch also calls nonseekable_open() in mce_open (as suggested by akpm).

Result:
 Tolerant levels behave almost identically to how they always have, but
 now it's well defined.  There's a slightly higher chance of panic()ing
 when multiple errors happen (a good thing, IMHO).  If you take an MBE and
 panic(), the error status bits are not cleared.

Alternatives:
 None.

Testing:
 I used software to inject correctable and uncorrectable errors.  With
 tolerant = 3, the system usually survives.  With tolerant = 2, the system
 usually panic()s (PCC) but not always.  With tolerant = 1, the system
 always panic()s.  When the system panic()s, the BIOS is able to detect
 that the cause of death was an MC4.  I was not able to reproduce the
 case of a non-PCC error in userspace, with...
To: <thockin@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Tim Hockin <thockin@google.com>

Background:
 /dev/mcelog is typically polled manually.  This is less than optimal for
 situations where accurate accounting of MCEs is important.  Calling
 poll() on /dev/mcelog does not work.

Description:
 This patch adds support for poll() to /dev/mcelog.  This results in
 immediate wakeup of user apps whenever the poller finds MCEs.  Because
 the exception handler can not take any locks, it can not call the wakeup
 itself.  Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is
 caught at the next return from interrupt or exit from idle, calling the
 mce_notify_user() routine.  This patch also disables the "fake panic"
 path of the mce_panic(), because it results in printk()s in the exception
 handler and crashy systems.

 This patch also does some small cleanup for essentially unused variables,
 and moves the user notification into the body of the poller, so it is
 only called once per poll, rather than once per CPU.

Result:
 Applications can now poll() on /dev/mcelog.  When an error is logged
 (whether through the poller or through an exception) the applications are
 woken up promptly.  This should not affect any previous behaviors.  If no
 MCEs are being logged, there is no overhead.
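
From userspace, waiting on the device would look roughly like the sketch below.
wait_readable() is a hypothetical helper built on the standard poll() call;
nothing in it is specific to /dev/mcelog, and how the records are then read and
decoded is outside this sketch.

```c
#include <poll.h>

/*
 * Block until fd becomes readable or the timeout (in milliseconds,
 * -1 for no timeout) expires.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

An application would open /dev/mcelog, call wait_readable(fd, -1) in a loop,
and read() the log each time it returns 1.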

Alternatives:
 I considered simply supporting poll() through the poller and not using
 TIF_MCE_NOTIFY at all.  However, the time between an uncorrectable error
 happening and the user application being notified is *the*most* critical
 window for us.  Many uncorrectable errors can be logged to the network if
 given a chance.

 I also considered doing the MCE poll directly from the idle notifier, but
 decided that was overkill.

Testing:
 I used an error-injecting DIMM to create lots of correctable DRAM errors
 and verified that my user app is woken up in sync with the polling interval.
 I also used the northbridge to inject uncorrectable ECC errors, and
 verified (printk() to the rescue) that the notify routine is called and the
 u...
To: <thockin@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Tim Hockin <thockin@google.com>

Background:
 /dev/mcelog is a clear-on-read interface.  It is currently possible for
 multiple users to open and read() the device.  Users are protected from
 each other during any one read, but not across reads.

Description:
 This patch adds support for O_EXCL to /dev/mcelog.  If a user opens the
 device with O_EXCL, no other user may open the device (EBUSY).  Likewise,
 any user that tries to open the device with O_EXCL while another user has
 the device will fail (EBUSY).
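
The open-time rule can be modelled in plain C as below. The variable names
follow the diff, but model_open() and model_release() are hypothetical, the
release side is an assumption (the diff is truncated before it), and the
spinlock is left out because the model is single-threaded.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

static int open_count;   /* number of current openers */
static bool open_exclu;  /* is the device held exclusively? */

/* Returns 0 on success or -EBUSY, like the open handler would. */
static int model_open(int flags)
{
	/* Refuse if someone holds it exclusively, or if the caller wants
	 * exclusivity while others already have it open. */
	if (open_exclu || (open_count && (flags & O_EXCL)))
		return -EBUSY;

	if (flags & O_EXCL)
		open_exclu = true;
	open_count++;
	return 0;
}

/* Assumed release behaviour: drop one opener and any exclusivity. */
static void model_release(void)
{
	open_count--;
	open_exclu = false;
}
```

Clearing open_exclu unconditionally on release is safe in this model because an
exclusive holder is by construction the only opener.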

Result:
 Applications can get exclusive access to /dev/mcelog.  Applications that
 do not care will be unchanged.

Alternatives:
 A simpler choice would be to only allow one open() at all, regardless of
 O_EXCL.

Testing:
 I wrote an application that opens /dev/mcelog with O_EXCL and observed
 that any other app that tried to open /dev/mcelog would fail until the
 exclusive app had closed the device.

Caveats:
 None.

Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86_64/kernel/mce.c |   36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

Index: linux/arch/x86_64/kernel/mce.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce.c
+++ linux/arch/x86_64/kernel/mce.c
@@ -465,6 +465,40 @@ void __cpuinit mcheck_init(struct cpuinf
  * Character device to read and clear the MCE log.
  */
 
+static DEFINE_SPINLOCK(mce_state_lock);
+static int open_count;	/* #times opened */
+static int open_exclu;	/* already open exclusive? */
+
+static int mce_open(struct inode *inode, struct file *file)
+{
+	spin_lock(&mce_state_lock);
+
+	if (open_exclu || (open_count && (file->f_flags & O_EXCL))) {
+		spin_unlock(&mce_state_lock);
+		return -EBUSY;
+	}
+
+	if (file->f_flags & O_EXCL)
+		open_excl...
To: <adurbin@...>, <ak@...>, <rientjes@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Aaron Durbin <adurbin@google.com>

Insert the unclaimed MMCONFIG resources into the resource tree without the
IORESOURCE_BUSY flag during late initialization.  This allows the MMCONFIG
regions to be visible in the iomem resource tree without interfering with
other system resources that were discovered during PCI initialization.

[akpm@linux-foundation.org: nanofixes]
Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/pci/mmconfig-shared.c |   48 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

Index: linux/arch/i386/pci/mmconfig-shared.c
===================================================================
--- linux.orig/arch/i386/pci/mmconfig-shared.c
+++ linux/arch/i386/pci/mmconfig-shared.c
@@ -24,6 +24,9 @@
 
 DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS);
 
+/* Indicate if the mmcfg resources have been placed into the resource table. */
+static int __initdata pci_mmcfg_resources_inserted;
+
 /* K8 systems have some devices (typically in the builtin northbridge)
    that are only accessible using type1
    Normally this can be expressed in the MCFG by not listing them
@@ -170,7 +173,7 @@ static int __init pci_mmcfg_check_hostbr
 	return name != NULL;
 }
 
-static void __init pci_mmcfg_insert_resources(void)
+static void __init pci_mmcfg_insert_resources(unsigned long resource_flags)
 {
 #define PCI_MMCFG_RESOURCE_NAME_LEN 19
 	int i;
@@ -194,10 +197,13 @@ static void __init pci_mmcfg_insert_reso
 			 cfg->pci_segment);
 		res->start = cfg->address;
 		res->end = res->start + (num_buses << 20) - 1;
-		res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+		res->flags = IORESOURCE_MEM | resource_flags;
 		insert_resource(&iomem_resource, res);
 		names += PC...
To: <rientjes@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: David Rientjes <rientjes@google.com>

When we are in the emulated NUMA case, we need to make sure that all existing
apicid_to_node mappings that point to real node ID's now point to the
equivalent fake node ID's.

If we simply iterate over all apicid_to_node[] members for each node, we risk
remapping an entry if it shares a node ID with a real node.  Since apicid's
may not be consecutive, we're forced to create an automatic array of
apicid_to_node mappings and then copy it over once we have finished remapping
fake to real nodes.
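
The need for the temporary array is easiest to see with a small model. In the
sketch below, MAX_APIC, NO_NODE, remap_apicids() and the real_of_fake[] input
are all hypothetical simplifications; the kernel code derives the real node for
each fake node differently. What the sketch keeps is the technique: build the
new mapping in a scratch array, then copy it over in one step.

```c
#include <string.h>

#define MAX_APIC 8      /* hypothetical, kept small for illustration */
#define NO_NODE  0xff   /* "no node" marker */

/*
 * real_of_fake[f] is the real node that fake node f sits on.
 * Rewrites apicid_to_node[] so each entry names a fake node instead.
 */
static void remap_apicids(unsigned char apicid_to_node[MAX_APIC],
			  const unsigned char *real_of_fake, int num_fake)
{
	unsigned char tmp[MAX_APIC];
	int i, j;

	memset(tmp, NO_NODE, sizeof(tmp));
	for (i = 0; i < num_fake; i++)
		for (j = 0; j < MAX_APIC; j++)
			if (apicid_to_node[j] == real_of_fake[i])
				tmp[j] = (unsigned char)i;

	/* Only now overwrite the live table. */
	memcpy(apicid_to_node, tmp, sizeof(tmp));
}
```

If fake node 0 sits on real node 1 and fake node 1 on real node 0, rewriting
the table in place would first turn every 1 into 0 and then turn all of those
into 1; the scratch array avoids that.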

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/mm/srat.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Index: linux/arch/x86_64/mm/srat.c
===================================================================
--- linux.orig/arch/x86_64/mm/srat.c
+++ linux/arch/x86_64/mm/srat.c
@@ -470,10 +470,13 @@ static int __init find_node_by_addr(unsi
  */
 void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
 {
-	int i;
+	int i, j;
 	int fake_node_to_pxm_map[MAX_NUMNODES] = {
 		[0 ... MAX_NUMNODES-1] = PXM_INVAL
 	};
+	unsigned char fake_apicid_to_node[MAX_LOCAL_APIC] = {
+		[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
+	};
 
 	printk(KERN_INFO "Faking PXM affinity for fake nodes on real "
 			 "topology.\n");
@@ -487,9 +490,17 @@ void __init acpi_fake_nodes(const struct
 		if (pxm == PXM_INVAL)
 			continue;
 		fake_node_to_pxm_map[i] = pxm;
+		/*
+		 * For each apicid_to_node mapping that exists for this real
+		 * node, it must now point to the fake node ID.
+		 */
+		for (j = 0; j < MAX_LOCAL_APIC; j++)
+			if (apicid_to_node[j] == nid)
+				fake_apicid_to_node[j] = i;
 	}
 	for (i = 0; i < num_nodes; i++)
 		__acpi_map_pxm_to_node(fake_node_to_pxm_map[i], i);
+	memcpy(apicid_to_node, fake_apicid_to_node, sizeo...
To: <rientjes@...>, <ak@...>, <lenb@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: David Rientjes <rientjes@google.com>

For NUMA emulation, our SLIT should represent the true NUMA topology of the
system but our proximity domain to node ID mapping needs to reflect the
emulated state.

When NUMA emulation has successfully set up fake nodes on the system, a new
function, acpi_fake_nodes(), is called.  This function determines the proximity
domain (_PXM) for each true node found on the system.  It then finds which
emulated nodes have been allocated on this true node as determined by its
starting address.  The node ID to PXM mapping is changed so that each fake
node ID points to the PXM of the true node that it is located on.

If the machine failed to register a SLIT, then we assume there is no special
requirement for emulated node affinity so we use the default LOCAL_DISTANCE,
which is newly exported to this code, as our measurement if the emulated nodes
appear in the same PXM.  Otherwise, we use REMOTE_DISTANCE.
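
The no-SLIT fallback amounts to a two-way choice, sketched here.
emulated_node_distance() and its fake_node_to_pxm argument are hypothetical;
the values 10 and 20 are the kernel's usual LOCAL_DISTANCE and REMOTE_DISTANCE
constants.

```c
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/*
 * Distance between two emulated nodes when no SLIT was registered:
 * local if both sit in the same proximity domain, remote otherwise.
 */
static int emulated_node_distance(const int *fake_node_to_pxm, int a, int b)
{
	return fake_node_to_pxm[a] == fake_node_to_pxm[b] ?
		LOCAL_DISTANCE : REMOTE_DISTANCE;
}
```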

PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we
can compare node_to_pxm() results in generic code (in this case, the SRAT
code).

Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/mm/numa.c     |    1 
 arch/x86_64/mm/srat.c     |   76 ++++++++++++++++++++++++++++++++++++++++++++--
 drivers/acpi/numa.c       |   11 ++++--
 include/acpi/acpi_numa.h  |    1 
 include/asm-x86_64/acpi.h |   11 ++++++
 include/linux/acpi.h      |    3 +
 6 files changed, 96 insertions(+), 7 deletions(-)

Index: linux/arch/x86_64/mm/numa.c
===================================================================
--- linux.orig/arch/x86_64/mm/numa.c
+++ linux/arch/x86_64/mm/numa.c
@@ -484,6 +484,7 @@ out:
 						nodes[i].end >> PAGE_SHIFT);
  		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 	}
+	acpi_fake_nodes(node...
To: <rientjes@...>, <mel@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: David Rientjes <rientjes@google.com>

The logic in e820_find_active_regions() for determining the true active
regions for an e820 entry given a range of PFN's is needed for
e820_hole_size() as well.

e820_hole_size() is called from the NUMA emulation code to determine the
reserved area within an address range on a per-node basis.  Its logic should
duplicate that of finding active regions in an e820 entry because these are
the only true ranges we may register anyway.
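To make the shared logic concrete, here is a small userspace sketch of the idea, not the kernel code itself. It rounds an entry to whole pages, clamps it to the requested PFN range, and derives a hole size from the covered pages. The struct and function names (region, find_active_pfns, hole_pages) are invented for the example, a 4K page size is assumed, and the map entries are assumed not to overlap.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for an e820 RAM entry: byte address and length. */
struct region {
	unsigned long long addr;
	unsigned long long size;
};

/*
 * Find the whole pages of r that fall inside [start_pfn, end_pfn).
 * Returns 1 and fills out_start and out_end, or 0 if nothing is left.
 */
static int find_active_pfns(const struct region *r,
			    unsigned long start_pfn, unsigned long end_pfn,
			    unsigned long *out_start, unsigned long *out_end)
{
	/* Round the start up and the end down so only whole pages count. */
	unsigned long s = (unsigned long)((r->addr + PAGE_SIZE - 1) >> PAGE_SHIFT);
	unsigned long e = (unsigned long)((r->addr + r->size) >> PAGE_SHIFT);

	if (s >= e)
		return 0;	/* entry smaller than a page */
	if (s >= end_pfn || e <= start_pfn)
		return 0;	/* entry entirely outside the range */

	/* Clamp to the requested range. */
	*out_start = s < start_pfn ? start_pfn : s;
	*out_end = e > end_pfn ? end_pfn : e;
	return 1;
}

/* Pages in [start_pfn, end_pfn) not covered by any entry in the map. */
static unsigned long hole_pages(const struct region *map, int n,
				unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long covered = 0, s, e;
	int i;

	for (i = 0; i < n; i++)
		if (find_active_pfns(&map[i], start_pfn, end_pfn, &s, &e))
			covered += e - s;
	return (end_pfn - start_pfn) - covered;
}
```

Because both the registration path and the hole computation go through the same helper, they cannot disagree about which pages of an entry count as active.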

[akpm@linux-foundation.org: cleanup]
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86_64/kernel/e820.c |   82 ++++++++++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 34 deletions(-)

Index: linux/arch/x86_64/kernel/e820.c
===================================================================
--- linux.orig/arch/x86_64/kernel/e820.c
+++ linux/arch/x86_64/kernel/e820.c
@@ -289,47 +289,61 @@ void __init e820_mark_nosave_regions(voi
 	}
 }
 
+/*
+ * Finds an active region in the address range from start_pfn to end_pfn and
+ * returns its range in ei_startpfn and ei_endpfn for the e820 entry.
+ */
+static int __init e820_find_active_region(const struct e820entry *ei,
+					  unsigned long start_pfn,
+					  unsigned long end_pfn,
+					  unsigned long *ei_startpfn,
+					  unsigned long *ei_endpfn)
+{
+	*ei_startpfn = round_up(ei->addr, PAGE_SIZE) >> PAGE_SHIFT;
+	*ei_endpfn = round_down(ei->addr + ei->size, PAGE_SIZE) >> PAGE_SHIFT;
+
+	/* Skip map entries smaller than a page */
+	if (*ei_startpfn >= *ei_endpfn)
+		return 0;
+
+	/* Check if end_pfn_map should be updated */
+	if (ei->type != E820_RAM && *ei_endpfn > end_pfn_map)
+		end_pfn_map = *ei_endpfn;
+
+	/* Skip if map is outside the node */
+	if (ei->type != E820_RAM || ...
To: <clameter@...>, <davem@...>, <ak@...>, <tony.luck@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Christoph Lameter <clameter@sgi.com>

This adds caching of pgds and puds, pmds, pte.  That way we can avoid costly
zeroing and initialization of special mappings in the pgd.

A second quicklist is useful to separate out PGD handling.  We can carry the
initialized pgds over to the next process needing them.

Also clean up the pgd_list handling to use regular list macros.  There is no
need anymore to avoid the lru field.

Move the add/removal of the pgds to the pgdlist into the constructor /
destructor.  That way the implementation is congruent with i386.
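For readers unfamiliar with the idea, a quicklist can be pictured as a stack of free pages that are kept in their initialized state, so the next allocation can skip the zeroing. The following is only a userspace sketch under that assumption. The names ql_alloc, ql_free and ql_trim are hypothetical and not the kernel API, and there is one list here where the patch configures two (NR_QUICK).

```c
#include <stdlib.h>

#define PAGE_SIZE 4096

/* Hypothetical single-CPU quicklist: a stack of free, zeroed pages. */
struct quicklist {
	void *head;	/* first cached page; its first word links to the next */
	int nr_pages;	/* number of pages currently cached */
};

/* Take a page from the cache if possible, else allocate a zeroed one. */
static void *ql_alloc(struct quicklist *q)
{
	void *page = q->head;

	if (page) {
		q->head = *(void **)page;
		*(void **)page = NULL;	/* restore the all-zero state */
		q->nr_pages--;
		return page;
	}
	return calloc(1, PAGE_SIZE);
}

/* Return a page to the cache.  The caller hands it back already zeroed. */
static void ql_free(struct quicklist *q, void *page)
{
	*(void **)page = q->head;
	q->head = page;
	q->nr_pages++;
}

/* Shrink the cache to at most max_pages, as an idle-loop hook might. */
static void ql_trim(struct quicklist *q, int max_pages)
{
	while (q->nr_pages > max_pages) {
		void *page = q->head;

		q->head = *(void **)page;
		q->nr_pages--;
		free(page);
	}
}
```

The trim step corresponds in spirit to the check_pgt_cache() call this patch adds to cpu_idle(): cached pages are only given back when the list has grown too large.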

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86_64/Kconfig          |    8 ++++
 arch/x86_64/kernel/process.c |    1 
 arch/x86_64/kernel/smp.c     |    2 -
 include/asm-x86_64/pgalloc.h |   73 ++++++++++++++++++++++++++++---------------
 include/asm-x86_64/pgtable.h |    1 
 5 files changed, 59 insertions(+), 26 deletions(-)

Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -60,6 +60,14 @@ config ZONE_DMA
 	bool
 	default y
 
+config QUICKLIST
+	bool
+	default y
+
+config NR_QUICK
+	int
+	default 2
+
 config ISA
 	bool
 
Index: linux/arch/x86_64/kernel/process.c
===================================================================
--- linux.orig/arch/x86_64/kernel/process.c
+++ linux/arch/x86_64/kernel/process.c
@@ -207,6 +207,7 @@ void cpu_idle (void)
 			if (__get_cpu_var(cpu_idle_state))
 				__get_cpu_var(cpu_idle_state) = 0;
 
+			check_pgt_cache();
 			rmb();
 			idle = pm_idle;
 			if (!idle)
Index: linux/arch/x86_64/kernel/smp.c
====================...
To: <bunk@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Adrian Bunk <bunk@stusta.de>

timer_irq_works() needlessly became global.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/i386/kernel/io_apic.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/i386/kernel/io_apic.c
===================================================================
--- linux.orig/arch/i386/kernel/io_apic.c
+++ linux/arch/i386/kernel/io_apic.c
@@ -1902,7 +1902,7 @@ __setup("no_timer_check", notimercheck);
  *	- if this function detects that timer IRQs are defunct, then we fall
  *	  back to ISA timer IRQs
  */
-int __init timer_irq_works(void)
+static int __init timer_irq_works(void)
 {
 	unsigned long t1 = jiffies;
 
-
To: <bunk@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Adrian Bunk <bunk@stusta.de>

Every file should include the headers containing the prototypes for its
global functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/i386/kernel/i8253.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux/arch/i386/kernel/i8253.c
===================================================================
--- linux.orig/arch/i386/kernel/i8253.c
+++ linux/arch/i386/kernel/i8253.c
@@ -13,6 +13,7 @@
 #include <asm/delay.h>
 #include <asm/i8253.h>
 #include <asm/io.h>
+#include <asm/timer.h>
 
 #include "io_ports.h"
 
-
To: <bunk@...>, <ak@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Adrian Bunk <bunk@stusta.de>

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/kernel/setup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/i386/kernel/setup.c
===================================================================
--- linux.orig/arch/i386/kernel/setup.c
+++ linux/arch/i386/kernel/setup.c
@@ -466,7 +466,7 @@ void __init setup_bootmem_allocator(void
  *
  * This should all compile down to nothing when NUMA is off.
  */
-void __init remapped_pgdat_init(void)
+static void __init remapped_pgdat_init(void)
 {
 	int nid;
 
-
To: <jbeulich@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: "Jan Beulich" <jbeulich@novell.com>

Constrain __supported_pte_mask and NX handling to just the PAE kernel.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 arch/i386/mm/init.c     |    7 ++++---
 include/asm-i386/page.h |    1 -
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux/arch/i386/mm/init.c
===================================================================
--- linux.orig/arch/i386/mm/init.c
+++ linux/arch/i386/mm/init.c
@@ -471,6 +471,10 @@ void zap_low_mappings (void)
 	flush_tlb_all();
 }
 
+int nx_enabled = 0;
+
+#ifdef CONFIG_X86_PAE
+
 static int disable_nx __initdata = 0;
 u64 __supported_pte_mask __read_mostly = ~_PAGE_NX;
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
@@ -500,9 +504,6 @@ static int __init noexec_setup(char *str
 }
 early_param("noexec", noexec_setup);
 
-int nx_enabled = 0;
-#ifdef CONFIG_X86_PAE
-
 static void __init set_nx(void)
 {
 	unsigned int v[4], l, h;
Index: linux/include/asm-i386/page.h
===================================================================
--- linux.orig/include/asm-i386/page.h
+++ linux/include/asm-i386/page.h
@@ -44,7 +44,6 @@
 extern int nx_enabled;
 
 #ifdef CONFIG_X86_PAE
-extern unsigned long long __supported_pte_mask;
 typedef struct { unsigned long pte_low, pte_high; } pte_t;
 typedef struct { unsigned long long pmd; } pmd_t;
 typedef struct { unsigned long long pgd; } pgd_t;
-
To: <jbeulich@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: "Jan Beulich" <jbeulich@novell.com>

The smp-alt-boot option (smp_alt_once) only matters with CONFIG_HOTPLUG_CPU.
Hence remove its handling in the opposite case.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 arch/i386/kernel/alternative.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: linux/arch/i386/kernel/alternative.c
===================================================================
--- linux.orig/arch/i386/kernel/alternative.c
+++ linux/arch/i386/kernel/alternative.c
@@ -5,9 +5,8 @@
 #include <asm/alternative.h>
 #include <asm/sections.h>
 
-static int noreplace_smp     = 0;
-static int smp_alt_once      = 0;
-static int debug_alternative = 0;
+#ifdef CONFIG_HOTPLUG_CPU
+static int smp_alt_once;
 
 static int __init bootonly(char *str)
 {
@@ -15,6 +14,11 @@ static int __init bootonly(char *str)
 	return 1;
 }
 __setup("smp-alt-boot", bootonly);
+#else
+#define smp_alt_once 1
+#endif
+
+static int debug_alternative;
 
 static int __init debug_alt(char *str)
 {
@@ -23,6 +27,8 @@ static int __init debug_alt(char *str)
 }
 __setup("debug-alternative", debug_alt);
 
+static int noreplace_smp;
+
 static int __init setup_noreplace_smp(char *str)
 {
 	noreplace_smp = 1;
@@ -376,8 +382,6 @@ void __init alternative_instructions(voi
 #ifdef CONFIG_HOTPLUG_CPU
 	if (num_possible_cpus() < 2)
 		smp_alt_once = 1;
-#else
-	smp_alt_once = 1;
 #endif
 
 #ifdef CONFIG_SMP
-
To: <jbeulich@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: "Jan Beulich" <jbeulich@novell.com>

Remove the maxcpus definition from mpparse.c, and adjust the documentation
so that it lists only options that are x86-64 specific.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 Documentation/x86_64/boot-options.txt |    6 ------
 arch/x86_64/kernel/mpparse.c          |    1 -
 2 files changed, 7 deletions(-)

Index: linux/Documentation/x86_64/boot-options.txt
===================================================================
--- linux.orig/Documentation/x86_64/boot-options.txt
+++ linux/Documentation/x86_64/boot-options.txt
@@ -134,12 +134,6 @@ Non Executable Mappings
 
 SMP
 
-  nosmp	Only use a single CPU
-
-  maxcpus=NUMBER only use upto NUMBER CPUs
-
-  cpumask=MASK   only use cpus with bits set in mask
-
   additional_cpus=NUM Allow NUM more CPUs for hotplug
 		 (defaults are specified by the BIOS, see Documentation/x86_64/cpu-hotplug-spec)
 
Index: linux/arch/x86_64/kernel/mpparse.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mpparse.c
+++ linux/arch/x86_64/kernel/mpparse.c
@@ -32,7 +32,6 @@
 
 /* Have we found an MP table */
 int smp_found_config;
-unsigned int __initdata maxcpus = NR_CPUS;
 
 /*
  * Various Linux-internal data structures created from the
-
To: <jbeulich@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: "Jan Beulich" <jbeulich@novell.com>

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 arch/x86_64/mm/fault.c |    2 +-
 arch/x86_64/mm/init.c  |    2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

Index: linux/arch/x86_64/mm/fault.c
===================================================================
--- linux.orig/arch/x86_64/mm/fault.c
+++ linux/arch/x86_64/mm/fault.c
@@ -301,7 +301,7 @@ static int vmalloc_fault(unsigned long a
 	return 0;
 }
 
-int page_fault_trace = 0;
+static int page_fault_trace;
 int exception_trace = 1;
 
 /*
Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -700,8 +700,6 @@ int kern_addr_valid(unsigned long addr) 
 #ifdef CONFIG_SYSCTL
 #include <linux/sysctl.h>
 
-extern int exception_trace, page_fault_trace;
-
 static ctl_table debug_table2[] = {
 	{
 		.ctl_name	= 99,
-
To: <jbeulich@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: "Jan Beulich" <jbeulich@novell.com>

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 arch/i386/kernel/sysenter.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux/arch/i386/kernel/sysenter.c
===================================================================
--- linux.orig/arch/i386/kernel/sysenter.c
+++ linux/arch/i386/kernel/sysenter.c
@@ -336,7 +336,9 @@ struct vm_area_struct *get_gate_vma(stru
 
 int in_gate_area(struct task_struct *task, unsigned long addr)
 {
-	return 0;
+	const struct vm_area_struct *vma = get_gate_vma(task);
+
+	return vma && addr >= vma->vm_start && addr < vma->vm_end;
 }
 
 int in_gate_area_no_task(unsigned long addr)
-
To: <jbeulich@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: "Jan Beulich" <jbeulich@novell.com>

Consolidate the three 32-bit system call entry points so that they all
treat registers in similar ways.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 arch/x86_64/ia32/ia32entry.S |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux/arch/x86_64/ia32/ia32entry.S
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32entry.S
+++ linux/arch/x86_64/ia32/ia32entry.S
@@ -104,7 +104,7 @@ ENTRY(ia32_sysenter_target)
 	pushq	%rax
 	CFI_ADJUST_CFA_OFFSET 8
 	cld
-	SAVE_ARGS 0,0,0
+	SAVE_ARGS 0,0,1
  	/* no need to do an access_ok check here because rbp has been
  	   32bit zero extended */ 
 1:	movl	(%rbp),%r9d
@@ -294,7 +294,7 @@ ia32_badarg:
  */ 				
 
 ENTRY(ia32_syscall)
-	CFI_STARTPROC	simple
+	CFI_STARTPROC32	simple
 	CFI_SIGNAL_FRAME
 	CFI_DEF_CFA	rsp,SS+8-RIP
 	/*CFI_REL_OFFSET	ss,SS-RIP*/
@@ -330,6 +330,7 @@ ia32_sysret:
 
 ia32_tracesys:			 
 	SAVE_REST
+	CLEAR_RREGS
 	movq $-ENOSYS,RAX(%rsp)	/* really needed? */
 	movq %rsp,%rdi        /* &pt_regs -> arg1 */
 	call syscall_trace_enter
-
To: Andi Kleen <ak@...>
Cc: <jbeulich@...>, <patches@...>, <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Thursday, July 19, 2007 - 10:46 am

More comments and/or a less vague patch description would be nice.

What registers?  What behavior is being made common?  Why?

	Jeff


-
To: Jeff Garzik <jeff@...>
Cc: Andrew Morton <akpm@...>, Andi Kleen <ak@...>, <linux-kernel@...>, <patches@...>
Date: Monday, August 6, 2007 - 6:43 am

I think the description says this quite well - which registers are being saved/
cleared is being made consistent (not common).

Jan

-
To: <kiran@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Ravikiran G Thirumalai <kiran@scalex86.org>

Too many remote cpu references due to /proc/stat.

On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem.
On every call to kstat_irqs, the process brings in per-cpu data from all
online cpus.  Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS
results in (256+32*63) * 63 remote cpu references on a 64 cpu config.
/proc/stat is parsed by common commands like top and who, causing
lots of cacheline transfers.
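The figure above is easy to check with a few lines of C. This is just the arithmetic from the message, with invented helper names. Note that the expression as written uses 63 inside the NR_IRQS term and comes to 143136, while taking NR_CPUS = 64 literally gives 145152. Either way it is on the order of 140k remote references per read of /proc/stat.

```c
/* NR_IRQS as described in the message: 256 + 32 * NR_CPUS. */
static unsigned long nr_irqs(unsigned long nr_cpus)
{
	return 256 + 32 * nr_cpus;
}

/*
 * Each per-IRQ count is summed over all online CPUs, and every CPU
 * except the one doing the read is a remote reference.
 */
static unsigned long remote_refs(unsigned long irqs, unsigned long online_cpus)
{
	return irqs * (online_cpus - 1);
}
```

For example, remote_refs(nr_irqs(63), 64) reproduces the (256+32*63) * 63 expression from the text.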

This statistic seems useless. Other 'big iron' arches disable this.
Can we disable computing/reporting this statistic?  This statistic
is not human readable on x86_64 anymore.

If not, can we optimize computing this statistic so as to avoid
too many remote references (patch to follow)?

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 fs/proc/proc_misc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/fs/proc/proc_misc.c
===================================================================
--- linux.orig/fs/proc/proc_misc.c
+++ linux/fs/proc/proc_misc.c
@@ -499,7 +499,8 @@ static int show_stat(struct seq_file *p,
 	}
 	seq_printf(p, "intr %llu", (unsigned long long)sum);
 
-#if !defined(CONFIG_PPC64) && !defined(CONFIG_ALPHA) && !defined(CONFIG_IA64)
+#if !defined(CONFIG_PPC64) && !defined(CONFIG_ALPHA) && !defined(CONFIG_IA64) \
+					&& !defined(CONFIG_X86_64)
 	for (i = 0; i < NR_IRQS; i++)
 		seq_printf(p, " %u", kstat_irqs(i));
 #endif
-
To: Andi Kleen <ak@...>
Cc: <kiran@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 6:21 am

-
To: Christoph Hellwig <hch@...>
Cc: <kiran@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 6:41 am

I guess it's fine on UP only architectures.  I will change it to !CONFIG_SMP
unless someone complains.

-Andi

-
To: Andi Kleen <ak@...>
Cc: Christoph Hellwig <hch@...>, <kiran@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 6:55 am

Making it depending on the kernel configuration will only cause 
surprises for users. And if you really need the data you can get 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: <tglx@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Thomas Gleixner <tglx@linutronix.de>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/time.c |   88 +++++++++++++++++++++++-----------------------
 1 file changed, 44 insertions(+), 44 deletions(-)

Index: linux/arch/x86_64/kernel/time.c
===================================================================
--- linux.orig/arch/x86_64/kernel/time.c
+++ linux/arch/x86_64/kernel/time.c
@@ -220,7 +220,7 @@ unsigned long read_persistent_clock(void
 	/*
 	 * We know that x86-64 always uses BCD format, no need to check the
 	 * config register.
- 	 */
+	 */
 
 	BCD_TO_BIN(sec);
 	BCD_TO_BIN(min);
@@ -233,11 +233,11 @@ unsigned long read_persistent_clock(void
 		BCD_TO_BIN(century);
 		year += century * 100;
 		printk(KERN_INFO "Extended CMOS year: %d\n", century * 100);
-	} else { 
+	} else {
 		/*
 		 * x86-64 systems only exists since 2002.
 		 * This will work up to Dec 31, 2100
-	 	 */
+		 */
 		year += 2000;
 	}
 
@@ -249,45 +249,45 @@ unsigned long read_persistent_clock(void
 #define TICK_COUNT 100000000
 static unsigned int __init tsc_calibrate_cpu_khz(void)
 {
-       int tsc_start, tsc_now;
-       int i, no_ctr_free;
-       unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
-       unsigned long flags;
-
-       for (i = 0; i < 4; i++)
-               if (avail_to_resrv_perfctr_nmi_bit(i))
-                       break;
-       no_ctr_free = (i == 4);
-       if (no_ctr_free) {
-               i = 3;
-               rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
-               wrmsrl(MSR_K7_EVNTSEL3, 0);
-               rdmsrl(MSR_K7_PERFCTR3, pmc3);
-       } else {
-               reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
-               reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
-       }
-       local_irq_save(flags);
-       /* start meauring cycles, incrementing...
To: <tglx@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Thomas Gleixner <tglx@linutronix.de>

Fix coding style, white space wreckage and remove unused code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/apic.c |   73 ++++++++++++++++++----------------------------
 1 file changed, 30 insertions(+), 43 deletions(-)

Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -92,8 +92,9 @@ unsigned int safe_apic_wait_icr_idle(voi
 void enable_NMI_through_LVT0 (void * dummy)
 {
 	unsigned int v;
-	
-	v = APIC_DM_NMI;                        /* unmask and set to NMI */
+
+	/* unmask and set to NMI */
+	v = APIC_DM_NMI;
 	apic_write(APIC_LVT0, v);
 }
 
@@ -120,7 +121,7 @@ void ack_bad_irq(unsigned int irq)
 	 * holds up an irq slot - in excessive cases (when multiple
 	 * unexpected vectors occur) that might lock up the APIC
 	 * completely.
-  	 * But don't ack when the APIC is disabled. -AK
+	 * But don't ack when the APIC is disabled. -AK
 	 */
 	if (!disable_apic)
 		ack_APIC_irq();
@@ -616,7 +617,7 @@ early_param("apic", apic_set_verbosity);
  * Detect and enable local APICs on non-SMP boards.
  * Original code written by Keir Fraser.
  * On AMD64 we trust the BIOS - if it says no APIC it is likely
- * not correctly set up (usually the APIC timer won't work etc.) 
+ * not correctly set up (usually the APIC timer won't work etc.)
  */
 
 static int __init detect_init_APIC (void)
@@ -789,13 +790,13 @@ static void setup_APIC_timer(unsigned in
 	local_irq_save(flags);
 
 	/* wait for irq slice */
- 	if (hpet_address && hpet_use_timer) {
- 		int trigger = hpet_readl(HPET_T0_CMP);
- 		while (hpet_readl(HPET_COUNTER) >= trigger)
- 			/* do nothing */ ;
- 		while (hpet_readl(HPET_COUNTER) &l...
To: <tglx@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Thomas Gleixner <tglx@linutronix.de>

hpet.h in asm-i386 and asm-x86_64 contain tons of duplicated stuff.
Consolidate into one shared header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 include/asm-i386/hpet.h   |  124 +++++++++++++++++-----------------------------
 include/asm-x86_64/hpet.h |   61 ----------------------
 2 files changed, 48 insertions(+), 137 deletions(-)

Index: linux/include/asm-i386/hpet.h
===================================================================
--- linux.orig/include/asm-i386/hpet.h
+++ linux/include/asm-i386/hpet.h
@@ -4,112 +4,82 @@
 
 #ifdef CONFIG_HPET_TIMER
 
-#include <linux/errno.h>
-#include <linux/module.h>
-#include <linux/sched.h>
-#include <linux/kernel.h>
-#include <linux/param.h>
-#include <linux/string.h>
-#include <linux/mm.h>
-#include <linux/interrupt.h>
-#include <linux/time.h>
-#include <linux/delay.h>
-#include <linux/init.h>
-#include <linux/smp.h>
-
-#include <asm/io.h>
-#include <asm/smp.h>
-#include <asm/irq.h>
-#include <asm/msr.h>
-#include <asm/delay.h>
-#include <asm/mpspec.h>
-#include <asm/uaccess.h>
-#include <asm/processor.h>
-
-#include <linux/timex.h>
-
 /*
  * Documentation on HPET can be found at:
  *      http://www.intel.com/ial/home/sp/pcmmspec.htm
  *      ftp://download.intel.com/ial/home/sp/mmts098.pdf
  */
 
-#define HPET_MMAP_SIZE	1024
+#define HPET_MMAP_SIZE		1024
 
-#define HPET_ID		0x000
-#define HPET_PERIOD	0x004
-#define HPET_CFG	0x010
-#define HPET_STATUS	0x020
-#define HPET_COUNTER	0x0f0
-#define HPET_T0_CFG	0x100
-#define HPET_T0_CMP	0x108
-#define HPET_T0_ROUTE	0x110
-#define HPET_T1_CFG	0x120
-#define HPET_T1_CMP	0x128
-#define HPET_T1_ROUTE	0x130
-#def...
To: <tglx@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Thomas Gleixner <tglx@linutronix.de>

The hpet_rtc_interrupt handler still uses pt_regs. Fix it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/hpet.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86_64/kernel/hpet.c
===================================================================
--- linux.orig/arch/x86_64/kernel/hpet.c
+++ linux/arch/x86_64/kernel/hpet.c
@@ -439,7 +439,7 @@ int hpet_rtc_dropped_irq(void)
 	return 1;
 }
 
-irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
+irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id)
 {
 	struct rtc_time curr_time;
 	unsigned long rtc_int_flag = 0;
-
To: <tglx@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Thomas Gleixner <tglx@linutronix.de>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/apic.c    |    4 ++--
 arch/x86_64/kernel/mce_amd.c |    6 +++---
 include/asm-x86_64/apic.h    |    4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -990,8 +990,8 @@ int setup_profiling_timer(unsigned int m
 	return -EINVAL;
 }
 
-void setup_APIC_extened_lvt(unsigned char lvt_off, unsigned char vector,
-			    unsigned char msg_type, unsigned char mask)
+void setup_APIC_extended_lvt(unsigned char lvt_off, unsigned char vector,
+			     unsigned char msg_type, unsigned char mask)
 {
 	unsigned long reg = (lvt_off << 4) + K8_APIC_EXT_LVT_BASE;
 	unsigned int  v   = (mask << 16) | (msg_type << 8) | vector;
Index: linux/arch/x86_64/kernel/mce_amd.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce_amd.c
+++ linux/arch/x86_64/kernel/mce_amd.c
@@ -157,9 +157,9 @@ void __cpuinit mce_amd_feature_init(stru
 			high |= K8_APIC_EXT_LVT_ENTRY_THRESHOLD << 20;
 			wrmsr(address, low, high);
 
-			setup_APIC_extened_lvt(K8_APIC_EXT_LVT_ENTRY_THRESHOLD,
-					       THRESHOLD_APIC_VECTOR,
-					       K8_APIC_EXT_INT_MSG_FIX, 0);
+			setup_APIC_extended_lvt(K8_APIC_EXT_LVT_ENTRY_THRESHOLD,
+						THRESHOLD_APIC_VECTOR,
+						K8_APIC_EXT_INT_MSG_FIX, 0);
 
 			threshold_defaults.address = address;
 			threshold_restart_bank(&threshold_defaults, 0, 0);
Index: linux/include/asm-x86_64/apic.h
===================================================================
--- linux.orig/include/asm-x86_64/apic.h
+++ linux/include/asm-x86_64/...
To: <tglx@...>, <johnstul@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Thomas Gleixner <tglx@linutronix.de>

Remove unused code and variables and do some coding style / whitespace
cleanups while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
---

 arch/x86_64/kernel/tsc.c |   39 +++++++++++----------------------------
 1 file changed, 11 insertions(+), 28 deletions(-)

Index: linux/arch/x86_64/kernel/tsc.c
===================================================================
--- linux.orig/arch/x86_64/kernel/tsc.c
+++ linux/arch/x86_64/kernel/tsc.c
@@ -36,25 +36,9 @@ static inline int check_tsc_unstable(voi
  * first tick after the change will be slightly wrong.
  */
 
-#include <linux/workqueue.h>
-
-static unsigned int cpufreq_delayed_issched = 0;
-static unsigned int cpufreq_init = 0;
-static struct work_struct cpufreq_delayed_get_work;
-
-static void handle_cpufreq_delayed_get(struct work_struct *v)
-{
-	unsigned int cpu;
-	for_each_online_cpu(cpu) {
-		cpufreq_get(cpu);
-	}
-	cpufreq_delayed_issched = 0;
-}
-
-static unsigned int  ref_freq = 0;
-static unsigned long loops_per_jiffy_ref = 0;
-
-static unsigned long tsc_khz_ref = 0;
+static unsigned int  ref_freq;
+static unsigned long loops_per_jiffy_ref;
+static unsigned long tsc_khz_ref;
 
 static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 				 void *data)
@@ -98,10 +82,8 @@ static struct notifier_block time_cpufre
 
 static int __init cpufreq_tsc(void)
 {
-	INIT_WORK(&cpufreq_delayed_get_work, handle_cpufreq_delayed_get);
-	if (!cpufreq_register_notifier(&time_cpufreq_notifier_block,
-				       CPUFREQ_TRANSITION_NOTIFIER))
-		cpufreq_init = 1;
+	cpufreq_register_notifier(&time_cpufreq_notifier_block,
+				  CPUFREQ_TRANSITION_NOTIFIER);
 	return 0;
 }
 
@@ -123,17 +105,18 @@ __cpuinit int unsynchron...
To: <tglx@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Thomas Gleixner <tglx@linutronix.de>

xtime can be initialized including the cmos update from the generic
timekeeping code. Remove the arch specific implementation.
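As a sketch of what that initialization amounts to, the following userspace code mirrors the arch lines this patch deletes from time_init(). It is not the generic timekeeping code, which is not shown in this thread. The struct ts type and the function names set_normalized and init_xtime are stand-ins invented for the example.

```c
#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for struct timespec. */
struct ts {
	long tv_sec;
	long tv_nsec;
};

/* Store sec/nsec with tv_nsec brought into the range [0, NSEC_PER_SEC). */
static void set_normalized(struct ts *t, long sec, long nsec)
{
	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		sec++;
	}
	while (nsec < 0) {
		nsec += NSEC_PER_SEC;
		sec--;
	}
	t->tv_sec = sec;
	t->tv_nsec = nsec;
}

/*
 * Boot-time setup: wall time comes from the persistent clock, and
 * wall_to_monotonic is its negation so that monotonic time starts at 0.
 */
static void init_xtime(long persistent_sec, struct ts *xtime,
		       struct ts *wall_to_monotonic)
{
	xtime->tv_sec = persistent_sec;
	xtime->tv_nsec = 0;
	set_normalized(wall_to_monotonic, -xtime->tv_sec, -xtime->tv_nsec);
}
```

Once the architecture exposes read_persistent_clock(), nothing in these steps is arch specific, which is why they can live in common code.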

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/time.c |   40 +---------------------------------------
 1 file changed, 1 insertion(+), 39 deletions(-)

Index: linux/arch/x86_64/kernel/time.c
===================================================================
--- linux.orig/arch/x86_64/kernel/time.c
+++ linux/arch/x86_64/kernel/time.c
@@ -193,7 +193,7 @@ static irqreturn_t timer_interrupt(int i
 	return IRQ_HANDLED;
 }
 
-static unsigned long get_cmos_time(void)
+unsigned long read_persistent_clock(void)
 {
 	unsigned int year, mon, day, hour, min, sec;
 	unsigned long flags;
@@ -367,11 +367,6 @@ void __init time_init(void)
 {
 	if (nohpet)
 		hpet_address = 0;
-	xtime.tv_sec = get_cmos_time();
-	xtime.tv_nsec = 0;
-
-	set_normalized_timespec(&wall_to_monotonic,
-	                        -xtime.tv_sec, -xtime.tv_nsec);
 
 	if (hpet_arch_init())
 		hpet_address = 0;
@@ -408,54 +403,21 @@ void __init time_init(void)
 	setup_irq(0, &irq0);
 }
 
-
-static long clock_cmos_diff;
-static unsigned long sleep_start;
-
 /*
  * sysfs support for the timer.
  */
 
 static int timer_suspend(struct sys_device *dev, pm_message_t state)
 {
-	/*
-	 * Estimate time zone so that set_time can update the clock
-	 */
-	long cmos_time =  get_cmos_time();
-
-	clock_cmos_diff = -cmos_time;
-	clock_cmos_diff += get_seconds();
-	sleep_start = cmos_time;
 	return 0;
 }
 
 static int timer_resume(struct sys_device *dev)
 {
-	unsigned long flags;
-	unsigned long sec;
-	unsigned long ctime = get_cmos_time();
-	long sleep_length = (ctime - sleep_start) * HZ;
-
-	if (sleep_length < 0) {
-		printk(KERN_WARNIN...
To: <tglx@...>, <johnstul@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Thomas Gleixner <tglx@linutronix.de>

Use the generic cmos update function in kernel/time/ntp.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
---

 arch/x86_64/Kconfig       |    4 ++++
 arch/x86_64/kernel/time.c |   25 +++++++++----------------
 2 files changed, 13 insertions(+), 16 deletions(-)

Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -32,6 +32,10 @@ config GENERIC_TIME_VSYSCALL
 	bool
 	default y
 
+config GENERIC_CMOS_UPDATE
+	bool
+	default y
+
 config ZONE_DMA32
 	bool
 	default y
Index: linux/arch/x86_64/kernel/time.c
===================================================================
--- linux.orig/arch/x86_64/kernel/time.c
+++ linux/arch/x86_64/kernel/time.c
@@ -80,8 +80,9 @@ EXPORT_SYMBOL(profile_pc);
  * sheet for details.
  */
 
-static void set_rtc_mmss(unsigned long nowtime)
+static int set_rtc_mmss(unsigned long nowtime)
 {
+	int retval = 0;
 	int real_seconds, real_minutes, cmos_minutes;
 	unsigned char control, freq_select;
 
@@ -121,6 +122,7 @@ static void set_rtc_mmss(unsigned long n
 	if (abs(real_minutes - cmos_minutes) >= 30) {
 		printk(KERN_WARNING "time.c: can't update CMOS clock "
 		       "from %d to %d\n", cmos_minutes, real_minutes);
+		retval = -1;
 	} else {
 		BIN_TO_BCD(real_seconds);
 		BIN_TO_BCD(real_minutes);
@@ -140,12 +142,17 @@ static void set_rtc_mmss(unsigned long n
 	CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
 	spin_unlock(&rtc_lock);
+
+	return retval;
 }
 
+int update_persistent_clock(struct timespec now)
+{
+	return set_rtc_mmss(now.tv_sec);
+}
 
 void main_timer_handler(void)
 {
-	static unsigned long rtc_update = 0;
 /*
  * Here we are in the time...
To: <chrisw@...>, <johnstul@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Chris Wright <chrisw@sous-sol.org>

When making changes to x86_64 timers, I noticed that touching hpet.h triggered
an unreasonably large rebuild.  Untangling it from timex.h quiets the extra
rebuild quite a bit.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>

---

 drivers/char/rtc.c         |    2 +-
 include/asm-x86_64/apic.h  |    2 ++
 include/asm-x86_64/hpet.h  |    1 -
 include/asm-x86_64/timex.h |    1 -
 4 files changed, 3 insertions(+), 3 deletions(-)

Index: linux/drivers/char/rtc.c
===================================================================
--- linux.orig/drivers/char/rtc.c
+++ linux/drivers/char/rtc.c
@@ -82,7 +82,7 @@
 #include <asm/uaccess.h>
 #include <asm/system.h>
 
-#if defined(__i386__)
+#ifdef CONFIG_X86
 #include <asm/hpet.h>
 #endif
 
Index: linux/include/asm-x86_64/apic.h
===================================================================
--- linux.orig/include/asm-x86_64/apic.h
+++ linux/include/asm-x86_64/apic.h
@@ -86,6 +86,8 @@ extern void setup_apic_routing(void);
 extern void setup_APIC_extened_lvt(unsigned char lvt_off, unsigned char vector,
 				   unsigned char msg_type, unsigned char mask);
 
+extern int apic_is_clustered_box(void);
+
 #define K8_APIC_EXT_LVT_BASE    0x500
 #define K8_APIC_EXT_INT_MSG_FIX 0x0
 #define K8_APIC_EXT_INT_MSG_SMI 0x2
Index: linux/include/asm-x86_64/hpet.h
===================================================================
--- linux.orig/include/asm-x86_64/hpet.h
+++ linux/include/asm-x86_64/hpet.h
@@ -55,7 +55,6 @@
 
 extern int is_hpet_enabled(void);
 extern int hpet_rtc_timer_init(void);
-extern int apic_is_clustered_box(void);
 extern int hpet_arch_init(void);
 extern int hpet_timer_stop_set_go(unsigned long tick);
 extern int hpet_reenable(void);
Ind...
To: <chrisw@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Chris Wright <chrisw@sous-sol.org>

Remove pit_interrupt_hook(), as it only adds an extra layer of indirection.
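
The following userspace sketch illustrates why the wrapper is redundant. All names prefixed with `model_` are hypothetical stand-ins, not kernel symbols; the struct carries only a handler pointer and a counter so that the effect of a tick can be observed. The old path goes through one extra inline function, the new path calls the event handler directly, and both end up doing the same thing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for a clock event device. */
struct model_clock_event_device {
	void (*event_handler)(struct model_clock_event_device *dev);
	int ticks;	/* counts handler invocations, for illustration only */
};

static void model_tick(struct model_clock_event_device *dev)
{
	dev->ticks++;
}

static struct model_clock_event_device model_dev = { model_tick, 0 };
static struct model_clock_event_device *model_global_clock_event = &model_dev;

/* Before the patch: the timer hook calls a wrapper, which calls the handler. */
static void model_pit_interrupt_hook(void)
{
	model_global_clock_event->event_handler(model_global_clock_event);
}

static void model_do_timer_interrupt_hook_old(void)
{
	model_pit_interrupt_hook();
}

/* After the patch: the timer hook calls the handler directly. */
static void model_do_timer_interrupt_hook_new(void)
{
	assert(model_global_clock_event != NULL);
	assert(model_global_clock_event->event_handler != NULL);
	model_global_clock_event->event_handler(model_global_clock_event);
}
```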

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 include/asm-i386/i8253.h                 |   11 -----------
 include/asm-i386/mach-default/do_timer.h |    2 +-
 include/asm-i386/mach-voyager/do_timer.h |    2 +-
 3 files changed, 2 insertions(+), 13 deletions(-)

Index: linux/include/asm-i386/i8253.h
===================================================================
--- linux.orig/include/asm-i386/i8253.h
+++ linux/include/asm-i386/i8253.h
@@ -7,15 +7,4 @@ extern spinlock_t i8253_lock;
 
 extern struct clock_event_device *global_clock_event;
 
-/**
- * pit_interrupt_hook - hook into timer tick
- * @regs:	standard registers from interrupt
- *
- * Call the global clock event handler.
- **/
-static inline void pit_interrupt_hook(void)
-{
-	global_clock_event->event_handler(global_clock_event);
-}
-
 #endif	/* __ASM_I8253_H__ */
Index: linux/include/asm-i386/mach-default/do_timer.h
===================================================================
--- linux.orig/include/asm-i386/mach-default/do_timer.h
+++ linux/include/asm-i386/mach-default/do_timer.h
@@ -12,5 +12,5 @@
 
 static inline void do_timer_interrupt_hook(void)
 {
-	pit_interrupt_hook();
+	global_clock_event->event_handler(global_clock_event);
 }
Index: linux/include/asm-i386/mach-voyager/do_timer.h
===================================================================
--- linux.orig/include/asm-i386/mach-voyager/do_timer.h
+++ linux/include/asm-i386/mach-voyager/do_timer.h
@@ -12,7 +12,7 @@
  **/
 static inline void do_timer_interrupt_hook(void)
 {
-	pit_interrupt_hook();
+	global_clock_event->event_handler(global_clock_event);
 	voyager_timer_interrupt();
 }
 
-
To: <tglx@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: Thomas Gleixner <tglx@linutronix.de>

The current SMI detection logic in read_hpet_tsc() ensures that when an
SMI happens between the read of the HPET counter and the read of the TSC,
the resulting wrong value is used for TSC calibration.

This is not the intention of the function. The comparison must ensure
that we do _NOT_ use such a value.

Fix the check to accept calibration values only when the delta between the
two TSC reads is smaller than a reasonable threshold.
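
The corrected check can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct sample`, `sample_counters()` and the `read_cost` array are hypothetical, the counters are simulated rather than read from hardware, and the `clean` flag is an addition for illustration that read_hpet_tsc() does not have. Each entry of `read_cost` is the number of TSC cycles that elapse during one simulated HPET read; a large value stands for an SMI hitting between the two TSC reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMI_THRESHOLD 50000
#define MAX_TRIES     5

struct sample {
	uint64_t tsc;	/* second TSC read of the last attempt */
	uint32_t hpet;	/* HPET value of the last attempt */
	int tries;	/* number of attempts made */
	int clean;	/* 1 if the last attempt was below the threshold */
};

/*
 * Simulated version of the retry loop: take TSC, HPET, TSC and accept the
 * pair only if the two TSC reads are close together. As in the patch, the
 * last pair is returned even when every attempt was disturbed.
 */
static struct sample sample_counters(const uint64_t *read_cost, int n_costs)
{
	struct sample s = { 0, 0, 0, 0 };
	uint64_t now = 1000;	/* arbitrary starting cycle count */
	int i;

	assert(read_cost != NULL && n_costs >= MAX_TRIES);

	for (i = 0; i < MAX_TRIES; i++) {
		uint64_t tsc1 = now;			/* first TSC read */
		now += read_cost[i];			/* HPET read, maybe stretched by an SMI */
		uint32_t hpet1 = (uint32_t)(now / 100);	/* pretend HPET runs 100x slower */
		uint64_t tsc2 = now;			/* second TSC read */

		s.tsc = tsc2;
		s.hpet = hpet1;
		s.tries = i + 1;
		if ((tsc2 - tsc1) < SMI_THRESHOLD) {
			s.clean = 1;
			break;
		}
	}
	return s;
}
```

With costs of { 200000, 300, 300, 300, 300 } the first attempt is rejected and the second one is accepted. The old comparison (delta greater than TICK_MIN) would have stopped at the first, disturbed attempt instead.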

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/hpet.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/x86_64/kernel/hpet.c
===================================================================
--- linux.orig/arch/x86_64/kernel/hpet.c
+++ linux/arch/x86_64/kernel/hpet.c
@@ -190,7 +190,7 @@ int hpet_reenable(void)
  */
 
 #define TICK_COUNT 100000000
-#define TICK_MIN   5000
+#define SMI_THRESHOLD 50000
 #define MAX_TRIES  5
 
 /*
@@ -205,7 +205,7 @@ static void __init read_hpet_tsc(int *hp
 		tsc1 = get_cycles_sync();
 		hpet1 = hpet_readl(HPET_COUNTER);
 		tsc2 = get_cycles_sync();
-		if (tsc2 - tsc1 > TICK_MIN)
+		if ((tsc2 - tsc1) < SMI_THRESHOLD)
 			break;
 	}
 	*hpet = hpet1;
-
To: <B.Steinbrink@...>, <patches@...>, <linux-kernel@...>
Date: Thursday, July 19, 2007 - 5:55 am

From: [** iso-8859-1 charset **] Bj
To: Andi Kleen <ak@...>
Cc: <patches@...>, <linux-kernel@...>
Subject: