[Posted to linux-kernel some time ago to fix a bug that was workarounded in 2.6.24 in a cleaner better way, but seems to have escaped the attention of the maintainers. Please apply. Still applies to git-x86#testing] --- Run IST traps from user mode preemptive on process stack x86-64 has a few exceptions which run on special architecture supported IST exception stacks: these are nmi, double fault, stack fault, int 3, debug. Previously they would check for a scheduling event on returning if the original CPU state was user mode and then switch to a process stack to schedule. But the actual trap handler would still run on the IST stack with preemption disabled. This patch changes these traps instead to always switch to the process stack when the trap originated from user mode. For kernel traps it keeps running non preemptive on the IST stack because that is much safer (e.g. to still get nmi watchdog events out even when the process stack is corrupted) Then the actual trap handlers can run with preemption enabled or schedule as needed (e.g. to take locks) This fixes a regression I added earlier with print_vma_addr() executing down() in these trap handlers from user space. Strictly the change would have been only needed for debug and int3, but since they share this code as macros it was cleanest to just change all. Cc: jkosina@suse.cz Cc: zdenek.kabelac@gmail.com Signed-off-by: Andi Kleen <ak@suse.de> --- arch/x86/kernel/entry_64.S | 17 ++++++++++------- arch/x86/kernel/traps_64.c | 17 ++++++++++------- 2 files changed, 20 insertions(+), 14 deletions(-) Index: linux/arch/x86/kernel/entry_64.S =================================================================== --- linux.orig/arch/x86/kernel/entry_64.S +++ linux/arch/x86/kernel/entry_64.S @@ -770,12 +770,18 @@ END(spurious_interrupt) .if \ist movq %gs:pda_data_offset, %rbp .endif - movq %rsp,%rdi movq ORIG_RAX(%rsp),%rsi movq $-1,ORIG_RAX(%rsp) .if \ist subq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp) .endif + testl $3,CS(%rsp) + jz 2f /* in kernel? stay on exception stack for now */ + movq %rsp,%rdi + call sync_regs /* Move all state over to process stack */ + movq %rax,%rsp /* switch stack to process stack */ +2: + movq %rsp,%rdi call \sym .if \ist addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp) @@ -801,16 +807,16 @@ END(spurious_interrupt) .macro paranoidexit trace=1 /* ebx: no swapgs flag */ paranoid_exit\trace: - testl %ebx,%ebx /* swapgs needed? */ - jnz paranoid_restore\trace testl $3,CS(%rsp) jnz paranoid_userspace\trace paranoid_swapgs\trace: + testl %ebx,%ebx /* swapgs needed? */ + jnz 1f .if \trace TRACE_IRQS_IRETQ 0 .endif SWAPGS_UNSAFE_STACK -paranoid_restore\trace: +1: RESTORE_ALL 8 jmp irq_return paranoid_userspace\trace: @@ -818,9 +824,6 @@ paranoid_userspace\trace: movl threadinfo_flags(%rcx),%ebx andl $_TIF_WORK_MASK,%ebx jz paranoid_swapgs\trace - movq %rsp,%rdi /* &pt_regs */ - call sync_regs - movq %rax,%rsp /* switch stack for scheduling */ testl $_TIF_NEED_RESCHED,%ebx jnz paranoid_schedule\trace movl %ebx,%edx /* arg3: thread flags */ Index: linux/arch/x86/kernel/traps_64.c =================================================================== --- linux.orig/arch/x86/kernel/traps_64.c +++ linux/arch/x86/kernel/traps_64.c @@ -84,7 +84,8 @@ static inline void conditional_sti(struc static inline void preempt_conditional_sti(struct pt_regs *regs) { - inc_preempt_count(); + if (!user_mode(regs)) + inc_preempt_count(); if (regs->flags & X86_EFLAGS_IF) local_irq_enable(); } @@ -95,7 +96,8 @@ static inline void preempt_conditional_c local_irq_disable(); /* Make sure to not schedule here because we could be running on an exception stack. */ - dec_preempt_count(); + if (!user_mode(regs)) + dec_preempt_count(); } int kstack_depth_to_print = 12; @@ -855,7 +857,7 @@ asmlinkage __kprobes void default_do_nmi io_check_error(reason, regs); } -/* runs on IST stack. */ +/* May run on IST stack. */ asmlinkage void __kprobes do_int3(struct pt_regs * regs, long error_code) { trace_hardirqs_fixup(); @@ -868,9 +870,10 @@ asmlinkage void __kprobes do_int3(struct preempt_conditional_cli(regs); } -/* Help handler running on IST stack to switch back to user stack - for scheduling or signal handling. The actual stack switch is done in - entry.S */ +/* + * Help handler running on IST stack to switch back to user stack. + * The actual stack switch is done in entry.S + */ asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *eregs) { struct pt_regs *regs = eregs; @@ -889,7 +892,7 @@ asmlinkage __kprobes struct pt_regs *syn return regs; } -/* runs on IST stack. */ +/* May run on IST stack. */ asmlinkage void __kprobes do_debug(struct pt_regs * regs, unsigned long error_code) { --
| David Miller | Re: [patch 7/8] fdmap v2 - implement sys_socket2 |
| Sean | Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation,pathname matching |
| Andi Kleen | Re: missing madvise functionality |
| Alan Cox | [PATCH 03/57] ali: watchdog locking and style |
git: | |
| Guido Ostkamp | [PATCH] Fix Solaris Workshop Compiler issues |
| David Lang | Re: mingw, windows, crlf/lf, and git |
| Johannes Schindelin | Re: [kernel.org users] [RFD] On deprecating "git-foo" for builtins |
| Johannes Schindelin | Re: [PATCH] Fix off by one error in prep_exclude. |
| Marco Peereboom | Re: Real men don't attack straw men |
| patrick keshishian | SMTP flood + spamdb |
| Marcos Laufer | dmesg IBM x3650 OpenBSD 4.3 |
| Nick Holland | Re: The Atheros story in much fewer words |
| Hans de Goede | Re: cat /proc/net/tcp takes 0.5 seconds on x86_64 |
| Stephen Hemminger | [RFC] TCP illinois max rtt aging |
| Tilman Schmidt | Re: 2.6.25-rc8: FTP transfer errors |
| Evgeniy Polyakov | Re: Network/block layer race. |
| high memory | 15 hours ago | Linux kernel |
| semaphore access speed | 18 hours ago | Applications and Utilities |
| the kernel how to power off the machine | 19 hours ago | Linux kernel |
| Easter Eggs in windows XP | 21 hours ago | Windows |
| Shared swap partition | 22 hours ago | Linux general |
| Root password | 23 hours ago | Linux general |
| Where/when DNOTIFY is used? | 1 day ago | Linux kernel |
| How to convert Linux Kernel built-in module into a loadable module | 1 day ago | Linux kernel |
| Linux 2.6.24 and I/O schedulers | 1 day ago | Linux kernel |
| USB Driver -- Interrupt Polling -- A Little Help Please | 1 day ago | Linux general |
