Re: [RFC PATCH] Writer-biased low-latency rwlock v8

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Linus Torvalds <torvalds@...>
Cc: Paul E. McKenney <paulmck@...>, H. Peter Anvin <hpa@...>, Jeremy Fitzhardinge <jeremy@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, Joe Perches <joe@...>, <linux-kernel@...>, Steven Rostedt <rostedt@...>
Date: Saturday, August 23, 2008 - 1:09 am

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
[...]

Hi Linus,

After having had a look 15 times more at your rwlock code, and again at
your answer, in loops, I come to the conclusion that I made a terrible
job at explaining the core idea beneath my rwlock design. So let's try
to make it more obvious. I am going to keep this explanation short.

First, the goal : to kill latency induced by rwlocks.

Now, let me explain why I need at least not one, but _four different_
contention bits. Simply because there are four types of contention, one
for each execution context which may take the read lock. (irq, softirq,
non-preemptable, preemptable)

So let's suppose I share the rwlock between non-preemptable writers,
non-preemptable readers and interrupt context readers. The idea is this:
If I want to take the write lock, I'll have to protect against both
non-preemptable and interrupt readers (and also against other writers).

A way to do that without inducing high interrupt latency with current
rwlocks would be for the writer to take _two_ locks, not just one. The
first excludes preemptable readers and the second excludes irqs. So we
have :

For the writer :

preempt_disable();
write_lock(&preempt_rwlock);

local_irq_disable();
write_lock(&irq_rwlock);

/* Write access */

write_unlock(&irq_rwlock);
local_irq_enable();

write_unlock(&preempt_rwlock);
preempt_enable();

And the readers :

Interrupt :
read_lock(&irq_rwlock);
...
read_unlock(&irq_rwlock);

non-preemptable :
read_lock(&preempt_rwlock);
...
read_unlock(&preempt_rwlock);


If you still wonder why I need to take two locks, thus elevating the
context priority by exluding one context at a time, we can go to this
example showing why current rwlock use in the kernel produces such high
latencies. The current use is :

For the writer, we exclude all the readers in one go :

local_irq_disable();
write_lock(&big_lock_against_all_readers);

/* Write access */

write_unlock(&big_lock_against_all_readers);
local_irq_enable();

write_unlock(&preempt_rwlock);
preempt_enable();

And for the readers :

Interrupt :
read_lock(&big_lock_against_all_readers);
...
read_unlock(&big_lock_against_all_readers);

non-preemptable :
read_lock(&big_lock_against_all_readers);
...
read_unlock(&big_lock_against_all_readers);


And where is the problem ? In the following execution sequence :
- non-preemptable reader takes the read lock
- writer disables irqs
- writer takes the write lock, contended by the reader
  - hrm, the reader can take a while to complete, may be iterating on
    the tasklist, interrupted by irqs and softirqs...

This contention is pretty bad because interrupts are disabled while we
busy-loop. And yes, the new implementation you proposed *does* suffer
from the same problem. Now if you go in the multiple locks scenario I
explained above, you'll notice that you don't have this problem. Then
comes the question : how can we efficiently take many locks, one per
reader execution context we want to exclude.

This is where we need to have at _least_ one contention bit _per
context_. Because at a given moment, non-preemptable readers can be
contended, but not irq readers. But we also have to know when a
particular reader execution context is locked out so that the writer
knows it has exclusive access and can therefore either disable
interrupts and exclude interrupt readers or finally access the critical
section. Therefore, we need to keep one reader count _per context_,
which makes that 4 reader counts (irq, softirq, non-preemptable,
preemptable). And that's where we start being short on bits.

But my current maximums for number of readers is not a problem,
especially because I detect overflow beforehand using cmpxchg and
busy-loop in the rare event these counters would be full. And I am not
limited to NR_CPUS _at all_ anymore, as shows these constants :

#define PTHREAD_RMAX    16384
#define NPTHREAD_RMAX   2048
#define SOFTIRQ_RMAX    512
#define HARDIRQ_RMAX    256

So, before discussing optimization or implementation details further,
I'd like to know if I made myself clear enough about the design goal. If
not, then again I must be doing a terrible job at explaining it and
please let me know. I made a big patch cleanup after v8, so you might
want to wait for us to come to an understanding before going through a
next patch release.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[PATCH] x86_64 : support atomic ops with 64 bits integer val..., Mathieu Desnoyers, (Sat Aug 16, 3:39 am)
Re: [PATCH] x86_64 : support atomic ops with 64 bits integer..., Mathieu Desnoyers, (Sat Aug 16, 11:43 am)
[RFC PATCH] Fair rwlock, Mathieu Desnoyers, (Sat Aug 16, 5:19 pm)
Re: [RFC PATCH] Fair rwlock, Linus Torvalds, (Sat Aug 16, 5:33 pm)
[RFC PATCH] Fair low-latency rwlock v3, Mathieu Desnoyers, (Sun Aug 17, 3:53 am)
Re: [RFC PATCH] Fair low-latency rwlock v3, Linus Torvalds, (Sun Aug 17, 12:17 pm)
[RFC PATCH] Fair low-latency rwlock v5, Mathieu Desnoyers, (Sun Aug 17, 3:10 pm)
Re: [RFC PATCH] Fair low-latency rwlock v5, Peter Zijlstra, (Mon Aug 25, 3:20 pm)
Re: [RFC PATCH] Fair low-latency rwlock v5, Paul E. McKenney, (Mon Aug 18, 7:25 pm)
Re: [RFC PATCH] Fair low-latency rwlock v5, Mathieu Desnoyers, (Tue Aug 19, 2:04 am)
Re: [RFC PATCH] Fair low-latency rwlock v5, Mathieu Desnoyers, (Tue Aug 19, 3:33 am)
Re: [RFC PATCH] Fair low-latency rwlock v5, Linus Torvalds, (Tue Aug 19, 12:48 pm)
[RFC PATCH] Writer-biased low-latency rwlock v8, Mathieu Desnoyers, (Thu Aug 21, 4:50 pm)
Re: [RFC PATCH] Writer-biased low-latency rwlock v8, Linus Torvalds, (Thu Aug 21, 5:00 pm)
Re: [RFC PATCH] Writer-biased low-latency rwlock v8, H. Peter Anvin, (Thu Aug 21, 5:26 pm)
Re: [RFC PATCH] Writer-biased low-latency rwlock v8, Linus Torvalds, (Thu Aug 21, 5:41 pm)
Re: [RFC PATCH] Writer-biased low-latency rwlock v8, Linus Torvalds, (Thu Aug 21, 5:15 pm)
Re: [RFC PATCH] Writer-biased low-latency rwlock v8, Mathieu Desnoyers, (Sat Aug 23, 1:09 am)
Re: [RFC PATCH] Writer-biased low-latency rwlock v8, Linus Torvalds, (Sat Aug 23, 2:02 pm)
Re: [RFC PATCH] Writer-biased low-latency rwlock v8, Mathieu Desnoyers, (Sat Aug 23, 4:30 pm)
Re: [RFC PATCH] Writer-biased low-latency rwlock v8, Linus Torvalds, (Sat Aug 23, 5:40 pm)
Re: [RFC PATCH] Writer-biased low-latency rwlock v8, Linus Torvalds, (Thu Aug 21, 6:22 pm)
Re: [RFC PATCH] Fair low-latency rwlock v5, Mathieu Desnoyers, (Tue Aug 19, 5:06 am)
Re: [RFC PATCH] Fair low-latency rwlock v5, Linus Torvalds, (Mon Aug 18, 2:59 pm)
Re: [RFC PATCH] Fair low-latency rwlock v5 (updated benchmar..., Mathieu Desnoyers, (Sun Aug 17, 5:30 pm)