Re: [patch 13/17] Immediate Values - x86 Optimization (updated)

Previous thread: [patch 11/17] Implement immediate update via stop_machine_run by Mathieu Desnoyers on Wednesday, April 9, 2008 - 11:08 am. (4 messages)

Next thread: [PATCH] printk: Don't read beyond string arguments' terminating zero by Markus Armbruster on Wednesday, April 9, 2008 - 12:41 pm. (1 message)
To: <akpm@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>
Cc: Andi Kleen <andi@...>, Rusty Russell <rusty@...>, Mathieu Desnoyers <mathieu.desnoyers@...>, Andi Kleen <ak@...>, H. Peter Anvin <hpa@...>, Chuck Ebbert <cebbert@...>, Christoph Hellwig <hch@...>, Jeremy Fitzhardinge <jeremy@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <akpm@...>
Date: Wednesday, April 9, 2008 - 11:08 am

x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.

Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
non atomic writes to a code region only touched by us (nobody can execute it
since we are protected by the imv_mutex).
- Put imv_set and _imv_set in the architecture independent header.
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.

Ok, so the most flexible solution that I see, that should fit for both
x86 and x86_64 would be :
1 byte : "=q" : "a", "b", "c", or "d" register for the i386. For
x86-64 it is equivalent to "r" class (for 8-bit
instructions that do not use upper halves).
2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is in a
general register.

- "Redux" immediate values : no need to put a breakpoint, therefore, no
need to know where the instruction starts. It's therefore OK to have a
REX prefix.

- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
immediate structure.
- Change the immediate.c update code to support variable length opcodes.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <ak@muc.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Chuck Ebbert <cebbert@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC...

To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: <akpm@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Andi Kleen <andi@...>, Rusty Russell <rusty@...>, Andi Kleen <ak@...>, Chuck Ebbert <cebbert@...>, Christoph Hellwig <hch@...>, Jeremy Fitzhardinge <jeremy@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <akpm@...>
Date: Wednesday, April 9, 2008 - 2:01 pm

Any reason to keep carrying this completely misleading comment chunk still?

-hpa
--

To: H. Peter Anvin <hpa@...>
Cc: <akpm@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Andi Kleen <andi@...>, Rusty Russell <rusty@...>, Andi Kleen <ak@...>, Chuck Ebbert <cebbert@...>, Christoph Hellwig <hch@...>, Jeremy Fitzhardinge <jeremy@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <akpm@...>
Date: Wednesday, April 9, 2008 - 4:21 pm

Hrm, since even the nmi-safe version supports REX-prefixed instructions,
there is no need for an =q constraint on single-byte immediate values
anymore. (thanks to your "discard" section used in the nmi-safe version)

Here is the updated patch for the "[patch 13/17] Immediate Values - x86
Optimization". Thanks!

Mathieu

Immediate Values - x86 Optimization

x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.

Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
non atomic writes to a code region only touched by us (nobody can execute it
since we are protected by the imv_mutex).
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.
- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
immediate structure.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <ak@muc.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Chuck Ebbert <cebbert@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: akpm@osdl.org
---
arch/x86/Kconfig | 1
include/asm-x86/immediate.h | 77 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+)

Index: linux-2.6-lttng/include/asm-x86/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/...

To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: <akpm@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Andi Kleen <andi@...>, Rusty Russell <rusty@...>, Andi Kleen <ak@...>, Chuck Ebbert <cebbert@...>, Christoph Hellwig <hch@...>, Jeremy Fitzhardinge <jeremy@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <akpm@...>
Date: Wednesday, April 9, 2008 - 6:33 pm

WRONG!

Using =r for single-byte values is incorrect for 32-bit code -- that
would permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode.
That is not the incorrect bit, it's the description that is confused.

-hpa
--

To: H. Peter Anvin <hpa@...>
Cc: <akpm@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Andi Kleen <andi@...>, Rusty Russell <rusty@...>, Andi Kleen <ak@...>, Chuck Ebbert <cebbert@...>, Christoph Hellwig <hch@...>, Jeremy Fitzhardinge <jeremy@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <akpm@...>
Date: Wednesday, April 9, 2008 - 7:15 pm

Ah, right. I remembered there was something that made us use =q, but
could not remember what. I'll describe it correctly.

Therefore, this updated patch is bogus. The original one was ok, given
that we change the header to :

Immediate Values - x86 Optimization

x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.

Note : a movb needs to get its value froma =q constraint.

Quoting "H. Peter Anvin" <hpa@zytor.com>

Using =r for single-byte values is incorrect for 32-bit code -- that would
permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode.

Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
non atomic writes to a code region only touched by us (nobody can execute it
since we are protected by the imv_mutex).
- Put imv_set and _imv_set in the architecture independent header.
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.
- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
immediate structure.
- Change the immediate.c update code to support variable length opcodes.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--

To: H. Peter Anvin <hpa@...>
Cc: <akpm@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Andi Kleen <andi@...>, Rusty Russell <rusty@...>, Andi Kleen <ak@...>, Chuck Ebbert <cebbert@...>, Christoph Hellwig <hch@...>, Jeremy Fitzhardinge <jeremy@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <akpm@...>
Date: Wednesday, April 9, 2008 - 3:08 pm

This comment explains why I use the =q constraint for the 1 bytes
immediate value. It makes sure we use an instruction with 1-byte opcode,
without REX.R prefix, on x86_64.

That's required for the NMI-safe version of the immediate values, which
uses a breakpoint, but not for this version based on stop_machine_run().
However, to minimize the amount of changes between the two versions, I
left the =q constraint, which is more restrictive. Is it worth it to use
=r instead ? It will typically let the compiler use a wider range of
registers on x86_64.

Thanks,

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--

To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: <akpm@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Andi Kleen <andi@...>, Rusty Russell <rusty@...>, Andi Kleen <ak@...>, Chuck Ebbert <cebbert@...>, Christoph Hellwig <hch@...>, Jeremy Fitzhardinge <jeremy@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <akpm@...>
Date: Wednesday, April 9, 2008 - 6:33 pm

No, it doesn't. That would be "=Q".

-hpa
--

To: H. Peter Anvin <hpa@...>
Cc: <akpm@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Andi Kleen <andi@...>, Rusty Russell <rusty@...>, Andi Kleen <ak@...>, Chuck Ebbert <cebbert@...>, Christoph Hellwig <hch@...>, Jeremy Fitzhardinge <jeremy@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <akpm@...>
Date: Wednesday, April 9, 2008 - 8:42 pm

Ok. Sorry, it's been a few months since we looked at this. So the =q
opcode lets the compiler choose instructions with or without REX prefix.
We can allow this because

- We don't need the opcode length in the stop_machine_run() version
- we support variable length opcode in the nmi-safe version

Am I remembering correctly now ?

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--

To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: <akpm@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Andi Kleen <andi@...>, Rusty Russell <rusty@...>, Andi Kleen <ak@...>, Chuck Ebbert <cebbert@...>, Christoph Hellwig <hch@...>, Jeremy Fitzhardinge <jeremy@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, Alexey Dobriyan <adobriyan@...>, <akpm@...>
Date: Wednesday, April 9, 2008 - 8:47 pm

Yes. Both =r and =q allow REX opcodes.

-hpa
--

Previous thread: [patch 11/17] Implement immediate update via stop_machine_run by Mathieu Desnoyers on Wednesday, April 9, 2008 - 11:08 am. (4 messages)

Next thread: [PATCH] printk: Don't read beyond string arguments' terminating zero by Markus Armbruster on Wednesday, April 9, 2008 - 12:41 pm. (1 message)