* Linus Torvalds (torvalds@linux-foundation.org) wrote:[...] Hi Linus, After having had a look 15 times more at your rwlock code, and again at your answer, in loops, I come to the conclusion that I made a terrible job at explaining the core idea beneath my rwlock design. So let's try to make it more obvious. I am going to keep this explanation short. First, the goal : to kill latency induced by rwlocks. Now, let me explain why I need at least not one, but _four different_ contention bits. Simply because there are four types of contention, one for each execution context which may take the read lock. (irq, softirq, non-preemptable, preemptable) So let's suppose I share the rwlock between non-preemptable writers, non-preemptable readers and interrupt context readers. The idea is this: If I want to take the write lock, I'll have to protect against both non-preemptable and interrupt readers (and also against other writers). A way to do that without inducing high interrupt latency with current rwlocks would be for the writer to take _two_ locks, not just one. The first excludes preemptable readers and the second excludes irqs. So we have : For the writer : preempt_disable(); write_lock(&preempt_rwlock); local_irq_disable(); write_lock(&irq_rwlock); /* Write access */ write_unlock(&irq_rwlock); local_irq_enable(); write_unlock(&preempt_rwlock); preempt_enable(); And the readers : Interrupt : read_lock(&irq_rwlock); ... read_unlock(&irq_rwlock); non-preemptable : read_lock(&preempt_rwlock); ... read_unlock(&preempt_rwlock); If you still wonder why I need to take two locks, thus elevating the context priority by exluding one context at a time, we can go to this example showing why current rwlock use in the kernel produces such high latencies. The current use is : For the writer, we exclude all the readers in one go : local_irq_disable(); write_lock(&big_lock_against_all_readers); /* Write access */ write_unlock(&big_lock_against_all_readers); local_irq_enable(); write_unlock(&preempt_rwlock); preempt_enable(); And for the readers : Interrupt : read_lock(&big_lock_against_all_readers); ... read_unlock(&big_lock_against_all_readers); non-preemptable : read_lock(&big_lock_against_all_readers); ... read_unlock(&big_lock_against_all_readers); And where is the problem ? In the following execution sequence : - non-preemptable reader takes the read lock - writer disables irqs - writer takes the write lock, contended by the reader - hrm, the reader can take a while to complete, may be iterating on the tasklist, interrupted by irqs and softirqs... This contention is pretty bad because interrupts are disabled while we busy-loop. And yes, the new implementation you proposed *does* suffer from the same problem. Now if you go in the multiple locks scenario I explained above, you'll notice that you don't have this problem. Then comes the question : how can we efficiently take many locks, one per reader execution context we want to exclude. This is where we need to have at _least_ one contention bit _per context_. Because at a given moment, non-preemptable readers can be contended, but not irq readers. But we also have to know when a particular reader execution context is locked out so that the writer knows it has exclusive access and can therefore either disable interrupts and exclude interrupt readers or finally access the critical section. Therefore, we need to keep one reader count _per context_, which makes that 4 reader counts (irq, softirq, non-preemptable, preemptable). And that's where we start being short on bits. But my current maximums for number of readers is not a problem, especially because I detect overflow beforehand using cmpxchg and busy-loop in the rare event these counters would be full. And I am not limited to NR_CPUS _at all_ anymore, as shows these constants : #define PTHREAD_RMAX 16384 #define NPTHREAD_RMAX 2048 #define SOFTIRQ_RMAX 512 #define HARDIRQ_RMAX 256 So, before discussing optimization or implementation details further, I'd like to know if I made myself clear enough about the design goal. If not, then again I must be doing a terrible job at explaining it and please let me know. I made a big patch cleanup after v8, so you might want to wait for us to come to an understanding before going through a next patch release. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 --
| monstr | [PATCH 27/56] microblaze_v2: support for a.out |
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg KH | [GIT PATCH] driver core patches against 2.6.24 |
| Rafael J. Wysocki | [Bug #10493] mips BCM47XX compile error |
git: | |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | [GIT]: Networking |
| Frans Pop | svc: failed to register lockdv1 RPC service (errno 97). |
