[PATCHv4] SGI UV: TLB shootdown using broadcast assist unit

Previous thread: [PATCH 09/23] checkpatch: trailing statement indent: fix end of statement location by Andy Whitcroft on Thursday, June 12, 2008 - 5:05 am. (24 messages)

Next thread: [PATCH] fix HID quirks for aluminium apple wireless keyboards by Paul Collins on Thursday, June 12, 2008 - 5:26 am. (5 messages)
From: Cliff Wickman
Date: Thursday, June 12, 2008 - 5:23 am

From: Cliff Wickman <cpw@sgi.com>

TLB shootdown for SGI UV.

v1: 6/2 original
v2: 6/3 corrections/improvements per Ingo's review
v3: 6/4 split atomic operations off to a separate patch (Jeremy's review)
v4: 6/12 include <mach_apic.h> rather than <asm/mach-bigsmp/mach_apic.h>
         (fixes a !SMP build problem that Ingo found)
         fix the index on uv_table_bases[blade]
Now depends on patch:
    x86 atomic operations: atomic_or_long atomic_inc_short
  which was split off (and improved) at the suggestion of
  Jeremy Fitzhardinge <jeremy@goop.org>

This patch provides the ability to flush TLB's in cpu's that are not on
the local node.  The hardware mechanism for distributing the flush
messages is the UV's "broadcast assist unit".

The hook to intercept TLB shootdown requests is a 2-line change to
native_flush_tlb_others() (arch/x86/kernel/tlb_64.c).

This code has been tested on a hardware simulator. The real hardware
is not yet available.

The shootdown statistics are provided through /proc/sgi_uv/ptc_statistics.
The use of /sys was considered, but would have required the use of
many /sys files.  The debugfs was also considered, but these statistics
should be available on an ongoing basis, not just for debugging.

Issues to be fixed later:
- The IRQ for the messaging interrupt is currently hardcoded as 200
  (see UV_BAU_MESSAGE).  It should be dynamically assigned in the future.
- The use of appropriate udelay()'s is untested, as they are a problem
  in the simulator.

Diffed against 2.6.26-rc4

Signed-off-by: Cliff Wickman <cpw@sgi.com>
---
 arch/x86/kernel/Makefile    |    2 
 arch/x86/kernel/entry_64.S  |    4 
 arch/x86/kernel/tlb_64.c    |    5 
 arch/x86/kernel/tlb_uv.c    |  786 ++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/uv/uv_bau.h |  336 ++++++++++++++++++
 5 files changed, 1132 insertions(+), 1 deletion(-)

Index: 080602.ingo/arch/x86/kernel/entry_64.S
===================================================================
--- ...
From: Nick Piggin
Date: Thursday, June 12, 2008 - 5:35 am

For someone not too familiar with low level x86 (or UV) code, can
you explain why you are hooking at this point? I mean, what it
looks like is either a performance improvement, or for some reason
UV does not support send_IPI_mask out to CPUs "not on the local node".

If the former, what sort of improvement to you expect / see?

If the latter, then why aren't you hooking at send_IPI_mask? If
possible that would obviously be preferable so you aren't
making the tlb flushing harder to follow...

Can you set me straight? :)

Thanks,
Nick
--

From: Cliff Wickman
Date: Thursday, June 12, 2008 - 5:56 am

Hi Nick,


Yes, a performance improvement.  The UV machine has hardware for
broadcasting messages to a set of nodes (represented in a bit mask).  The
messages will raise interrupts at each of the target nodes and provide
the message - all in one step.
(IPI is supported.  In fact this patch falls back to the IPI method

Good question.  The hardware does not exist yet.  But using IPI there
would be one set of packets exchanged to deliver the interrupts and
another set to pull over the flush address, just to start the operation.
I expect the improvement to be significant.

-Cliff
-- 
Cliff Wickman
Silicon Graphics, Inc.
cpw@sgi.com
(651) 683-3824
--

From: Nick Piggin
Date: Thursday, June 12, 2008 - 6:18 am

Thanks, that makes it perfectly clear to me now (the intent, not
the details of the code :))

So long as this raises a maskable interrupt on each target CPU, it

Ah, so you can send a small message with the IPI, and that can be
decoded and used by the target without invoking the cc protocol.
Seems like pretty sweet functionality.

I guess TLB flushing is an obvious candidate, but it could be
quite useful for other operations as well. I wonder if it couldn't
be used to create a slightly more advanced API (than send_IPI)
which other platforms can just implement using cache coherency for
the payload...

For example, some classes of smp_call_function could use this too.

But for now I don't see anything wrong with getting this patch
upstream and looking to generalise it later.
--

From: Cliff Wickman
Date: Thursday, June 12, 2008 - 7:28 am

Hi Nick,


Jack Steiner's thought as well.  But I haven't considered any yet. If
you care to nominate any other such uses for this hardware mechanism

-Cliff
-- 
Cliff Wickman
Silicon Graphics, Inc.
cpw@sgi.com
(651) 683-3824
--

From: Ingo Molnar
Date: Thursday, June 12, 2008 - 6:11 am

style: we try to add a single space after commas, to make it visually 

since you basically hook into this function and replace it, wouldnt it 
make more sense instead to override the flush_tlb_others callback in 


return is not needed at the end of void functions. (there are many other 


why not split the iterator into a helper function? That would avoid all 
these line length artifacts as well. Also, decrease constant name 

'struct bau_activation_descriptor' is too long of a name (and that 


many repetitive __get_cpu_var(bau_control) instances - put that into a 


small nit, we try to keep vertical alignment consistent across files, 



this iteration could avoid the linebreak via intelligent shrinking of 

(similar things in other places as well)

	Ingo
--

Previous thread: [PATCH 09/23] checkpatch: trailing statement indent: fix end of statement location by Andy Whitcroft on Thursday, June 12, 2008 - 5:05 am. (24 messages)

Next thread: [PATCH] fix HID quirks for aluminium apple wireless keyboards by Paul Collins on Thursday, June 12, 2008 - 5:26 am. (5 messages)