On Fri, Apr 17, 2009 at 06:51:37PM +0200, Peter Zijlstra wrote:

It probably would not be hard to enable preemptable RCU in a
!CONFIG_PREEMPT configuration, which would allow mutexes to be acquired
in these read-side critical sections. After I fix any relevant bugs,
of course...

Well, I am trying to get rid of the summing over all CPUs -- really hard
to make a reasonable hierarchy that way. But yes. ;-)

Given that for classic and hierarchical RCU, rcu_read_lock() and
rcu_read_unlock() just map to preempt_disable() and preempt_enable(),
how is this helping?

Jim Houston did an unlock-driven implementation some years back:
http://marc.theaimsgroup.com/?l=linux-kernel&m=109387402400673&w=2

The read-side overhead can be a problem. And I have gotten grace-period
latencies under 100ns without driving the grace period from the update
side. Of course, these implementations have their downsides as well. ;-)

Compared to the global rwlock, it is a wonderful solution. ;-)

In principle I agree. In practice, this is an infrequently executed
slow path, right?

Or are you concerned about real-time latencies while loading new
iptables or some such?

There of course is a point beyond which this method is slower than
a full RCU grace period. But I bet that Dave's 256-way machine
is not anywhere near big enough to reach that point. Maybe he can
try it and tell us what happens. ;-)

This is a good point. I understand the need to acquire the locks, but
am not fully clear on why we cannot acquire one CPU's lock, gather its
counters, release that lock, acquire the next CPU's lock, and so on.
Maybe a code-complexity issue?
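
For concreteness, the one-lock-at-a-time gathering asked about above
might look something like the user-space sketch below. It uses pthread
mutexes as stand-ins for the kernel's per-CPU locks, and every name in
it (NR_SIM_CPUS, struct cpu_counters, count_packet(), sum_counters())
is made up for illustration rather than taken from the patch under
discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NR_SIM_CPUS 4	/* hypothetical stand-in for the number of CPUs */

/* Hypothetical per-CPU state: one lock guarding that CPU's counters. */
struct cpu_counters {
	pthread_mutex_t lock;
	uint64_t packets;
	uint64_t bytes;
};

static struct cpu_counters percpu[NR_SIM_CPUS];

void counters_init(void)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
		pthread_mutex_init(&percpu[cpu].lock, NULL);
		percpu[cpu].packets = 0;
		percpu[cpu].bytes = 0;
	}
}

/* Read side: a CPU updates only its own counters, under its own lock. */
void count_packet(int cpu, uint64_t len)
{
	assert(cpu >= 0 && cpu < NR_SIM_CPUS);
	pthread_mutex_lock(&percpu[cpu].lock);
	percpu[cpu].packets++;
	percpu[cpu].bytes += len;
	pthread_mutex_unlock(&percpu[cpu].lock);
}

/* Update side: hold at most one CPU's lock at any given time. */
void sum_counters(uint64_t *packets, uint64_t *bytes)
{
	*packets = 0;
	*bytes = 0;
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
		pthread_mutex_lock(&percpu[cpu].lock);
		*packets += percpu[cpu].packets;
		*bytes += percpu[cpu].bytes;
		pthread_mutex_unlock(&percpu[cpu].lock);
	}
}
```

Because only one lock is held at a time, a packet can be counted on one
CPU while another CPU's counters are being summed, so the totals are
consistent per CPU but are not an atomic snapshot across all CPUs.
Whether that is acceptable depends on what the caller needs.
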

Please keep in mind that we are trying to hit 2.6.30 with this fix, so
simplicity is even more important than it usually is. Yes, I have some
idea of the irony of me saying much of anything about simplicity. ;-)

We are making it faster than it used to be by quite a bit by getting rid
of the global lock, so this does sound like a good approach. Here is my
reasoning:

1.	The update-side performance is good, as verified by Jeff Chua.

2.	The per-packet read-side performance is slowed by roughly the
	overhead of an uncontended lock, which comes to about 60ns
	on my laptop. At some point, this 60ns will become critical,
	but I do not believe that we are there yet.

	When it does become critical, a new patch can be produced.
	Such a patch can of course be backported as required -- this
	is a reasonably isolated piece of code, right?
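
A rough way to get a feel for the cost of an uncontended lock on a
given machine is a user-space loop like the sketch below. It times a
pthread mutex rather than a kernel spinlock, and the measurement
includes the loop overhead, so it will not reproduce the 60ns figure
above; the function name uncontended_lock_ns() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/*
 * Return the average time, in nanoseconds, of one lock/unlock pair on
 * a mutex that no other thread ever touches (loop overhead included).
 */
double uncontended_lock_ns(long iterations)
{
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	struct timespec start, end;

	assert(iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iterations; i++) {
		pthread_mutex_lock(&lock);
		pthread_mutex_unlock(&lock);
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9 +
			    (end.tv_nsec - start.tv_nsec);
	return elapsed_ns / iterations;
}
```

Calling uncontended_lock_ns(1000000) and printing the result gives a
per-acquisition estimate for that machine, which can then be compared
against the per-packet processing time to judge how much it matters.
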
Thanx, Paul