What if a TLB flush needed to sleep?

From: Luck, Tony
Date: Tuesday, March 25, 2008 - 1:49 pm

ia64 processors have a "ptc.g" instruction that will purge
a TLB entry across all processors in a system.  On current
cpus there is a limitation that only one ptc.g instruction may
be in flight at a time, so we serialize execution with code
like this:

	spin_lock(&ptcg_lock);
	... execute ptc.g
	spin_unlock(&ptcg_lock);

The architecture allows for more than one purge at a time.
So (without making any declarations about features of
unreleased processors) it seemed like time to update the
code to grab the maximum count from PAL, use that to
initialize a semaphore, and change the code to:

	down(&ptcg_sem);
	... execute ptc.g
	up(&ptcg_sem);

This code lasted about a week before someone ran hackbench
with parameters chosen to cause some swap activity (memory
footprint ~8.5GB on an 8GB system).  The machine promptly
deadlocked.  VM code called the tlbflush code while holding
an anon_vma_lock.  The test system still had a limit of just
one ptc.g at a time, so when some other processor was also
trying to do a purge, down() slept and the process got
swapped.

Now for the questions:

1) Is holding a spin lock a problem for any other arch when
doing a TLB flush (I'm particularly thinking of those that
need to use IPI shootdown for the purge)?

2) Is it feasible to rearrange the MM code so that we don't
hold any locks while doing a TLB flush?  Or should I implement
some sort of spin_only_semaphore?
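
To make the "spin_only_semaphore" idea in question 2 concrete, here is
a minimal userspace sketch using C11 atomics.  It is an illustration
only, not kernel code and not a proposed patch: the names spin_sem,
spin_sem_init, spin_sem_trydown, spin_sem_down and spin_sem_up are all
hypothetical, and the initial count stands in for whatever maximum PAL
reports.  The point is that a waiter busy-waits for a free slot instead
of sleeping, so it could be called with a spin lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counting semaphore whose waiters spin instead of sleeping. */
struct spin_sem {
	atomic_int count;	/* number of free slots remaining */
};

/* max is an assumed stand-in for the purge limit obtained from PAL. */
static void spin_sem_init(struct spin_sem *s, int max)
{
	assert(max > 0);
	atomic_init(&s->count, max);
}

/* Take one slot if any is free; never blocks. */
static bool spin_sem_trydown(struct spin_sem *s)
{
	int c = atomic_load_explicit(&s->count, memory_order_relaxed);

	while (c > 0) {
		/* On failure the compare-exchange reloads c with the current count. */
		if (atomic_compare_exchange_weak_explicit(&s->count, &c, c - 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/* Busy-wait until a slot is free.  A kernel version would relax the CPU here. */
static void spin_sem_down(struct spin_sem *s)
{
	while (!spin_sem_trydown(s))
		;
}

/* Return one slot. */
static void spin_sem_up(struct spin_sem *s)
{
	atomic_fetch_add_explicit(&s->count, 1, memory_order_release);
}
```

With a count of one this behaves like a simple (unfair) spin lock; with
a larger count it admits that many holders at once.  The sketch makes no
attempt at fairness between waiters, which a real implementation would
probably need to consider.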

-Tony
--

Messages in current thread:
What if a TLB flush needed to sleep?, Luck, Tony, (Tue Mar 25, 1:49 pm)
Re: What if a TLB flush needed to sleep?, Alan Cox, (Tue Mar 25, 2:47 pm)
Re: What if a TLB flush needed to sleep?, Matthew Wilcox, (Wed Mar 26, 5:32 am)
Re: What if a TLB flush needed to sleep?, Christoph Lameter, (Wed Mar 26, 12:25 pm)
Re: What if a TLB flush needed to sleep?, Thomas Gleixner, (Wed Mar 26, 1:29 pm)
RE: What if a TLB flush needed to sleep?, Luck, Tony, (Wed Mar 26, 1:29 pm)
Re: What if a TLB flush needed to sleep?, Christoph Lameter, (Wed Mar 26, 6:19 pm)
Re: What if a TLB flush needed to sleep?, Jens Axboe, (Thu Mar 27, 1:09 am)
Re: What if a TLB flush needed to sleep?, Peter Zijlstra, (Thu Mar 27, 6:20 am)
down_spin() implementation, Matthew Wilcox, (Thu Mar 27, 7:15 am)
Re: What if a TLB flush needed to sleep?, Christoph Lameter, (Thu Mar 27, 11:44 am)
Re: down_spin() implementation, Nick Piggin, (Thu Mar 27, 5:01 pm)
Re: down_spin() implementation, Stephen Rothwell, (Thu Mar 27, 9:51 pm)
Re: down_spin() implementation, Nick Piggin, (Thu Mar 27, 10:03 pm)
Re: What if a TLB flush needed to sleep?, Peter Zijlstra, (Fri Mar 28, 2:59 am)
Re: down_spin() implementation, Matthew Wilcox, (Fri Mar 28, 5:45 am)
Re: down_spin() implementation, Matthew Wilcox, (Fri Mar 28, 5:46 am)
Re: down_spin() implementation, Jens Axboe, (Fri Mar 28, 5:51 am)
Re: down_spin() implementation, Matthew Wilcox, (Fri Mar 28, 6:17 am)
Re: down_spin() implementation, Jens Axboe, (Fri Mar 28, 6:24 am)
RE: down_spin() implementation, Luck, Tony, (Fri Mar 28, 2:16 pm)
Re: down_spin() implementation, Arnd Bergmann, (Fri Mar 28, 4:48 pm)
Re: down_spin() implementation, Nick Piggin, (Fri Mar 28, 6:04 pm)