x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
non atomic writes to a code region only touched by us (nobody can execute it
since we are protected by the imv_mutex).
- Put imv_set and _imv_set in the architecture independent header.
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.Ok, so the most flexible solution that I see, that should fit for both
x86 and x86_64 would be :
1 byte : "=q" : "a", "b", "c", or "d" register for the i386. For
x86-64 it is equivalent to "r" class (for 8-bit
instructions that do not use upper halves).
2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is in a
general register.- "Redux" immediate values : no need to put a breakpoint, therefore, no
need to know where the instruction starts. It's therefore OK to have a
REX prefix.- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
immediate structure.
- Change the immediate.c update code to support variable length opcodes.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <ak@muc.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Chuck Ebbert <cebbert@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC...
Any reason to keep carrying this completely misleading comment chunk still?
-hpa
--
Hrm, since even the nmi-safe version supports REX-prefixed instructions,
there is no need for an =q constraint on single-byte immediate values
anymore. (thanks to your "discard" section used in the nmi-safe version)Here is the updated patch for the "[patch 13/17] Immediate Values - x86
Optimization". Thanks!Mathieu
Immediate Values - x86 Optimization
x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
non atomic writes to a code region only touched by us (nobody can execute it
since we are protected by the imv_mutex).
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.
- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
immediate structure.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <ak@muc.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Chuck Ebbert <cebbert@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: akpm@osdl.org
---
arch/x86/Kconfig | 1
include/asm-x86/immediate.h | 77 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+)Index: linux-2.6-lttng/include/asm-x86/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/...
WRONG!
Using =r for single-byte values is incorrect for 32-bit code -- that
would permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode.
That is not the incorrect bit, it's the description that is confused.-hpa
--
Ah, right. I remembered there was something that made us use =q, but
could not remember what. I'll describe it correctly.Therefore, this updated patch is bogus. The original one was ok, given
that we change the header to :Immediate Values - x86 Optimization
x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.Note : a movb needs to get its value froma =q constraint.
Quoting "H. Peter Anvin" <hpa@zytor.com>
Using =r for single-byte values is incorrect for 32-bit code -- that would
permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode.Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
non atomic writes to a code region only touched by us (nobody can execute it
since we are protected by the imv_mutex).
- Put imv_set and _imv_set in the architecture independent header.
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.
- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
immediate structure.
- Change the immediate.c update code to support variable length opcodes.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
This comment explains why I use the =q constraint for the 1 bytes
immediate value. It makes sure we use an instruction with 1-byte opcode,
without REX.R prefix, on x86_64.That's required for the NMI-safe version of the immediate values, which
uses a breakpoint, but not for this version based on stop_machine_run().
However, to minimize the amount of changes between the two versions, I
left the =q constraint, which is more restrictive. Is it worth it to use
=r instead ? It will typically let the compiler use a wider range of
registers on x86_64.Thanks,
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
No, it doesn't. That would be "=Q".
-hpa
--
Ok. Sorry, it's been a few months since we looked at this. So the =q
opcode lets the compiler choose instructions with or without REX prefix.
We can allow this because- We don't need the opcode length in the stop_machine_run() version
- we support variable length opcode in the nmi-safe versionAm I remembering correctly now ?
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
Yes. Both =r and =q allow REX opcodes.
-hpa
--
| Christoph Lameter | Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures |
| Greg Kroah-Hartman | [PATCH 005/196] Chinese: add translation of SubmittingDrivers |
| Pekka Enberg | Re: [bug] mm/slab.c boot crash in -git, "kernel BUG at mm/slab.c:2103!" |
| Glauber de Oliveira Costa | [PATCH 08/79] [PATCH] use identify_boot_cpu |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Natalie Protasevich | [BUG] New Kernel Bugs |
| David Miller | [GIT]: Networking |
