On Tuesday 18 November 2008 13:08, Linus Torvalds wrote:
It's much harder to do this with powerpc I think because they would need
to calculate 8 hashes and touch 8 cachelines to prefill 8 translations,
wouldn't they?
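
To make the arithmetic concrete, here is a rough sketch of why eight adjacent
translations cost eight lookups in a hashed page table. It assumes 4K pages,
256MB segments, a simplified primary hash (VSID XOR page index) and 128-byte
PTE groups. The radix comparison assumes 8-byte ptes and 64-byte cachelines.
All names and constants are illustrative, not the kernel's own code.

```c
#include <stdint.h>

#define PAGE_SHIFT       12   /* assumed 4K base pages */
#define SEGMENT_SHIFT    28   /* assumed 256MB segments */
#define PTEG_BYTES       128  /* one PTE group: 8 PTEs x 16 bytes */
#define RADIX_PTE_BYTES  8    /* assumed pte size in a radix page table */
#define RADIX_LINE_BYTES 64   /* assumed cacheline size for the comparison */

/* Simplified primary hash: VSID XOR page index within the segment. */
static uint64_t hpt_primary_hash(uint64_t vsid, uint64_t ea)
{
    uint64_t page_index = (ea & ((1ULL << SEGMENT_SHIFT) - 1)) >> PAGE_SHIFT;
    return vsid ^ page_index;
}

/* Byte offset of the PTE group in the hash table (htab_mask = groups - 1). */
static uint64_t hpt_pteg_offset(uint64_t vsid, uint64_t ea, uint64_t htab_mask)
{
    return (hpt_primary_hash(vsid, ea) & htab_mask) * PTEG_BYTES;
}

/* Count the distinct PTE groups needed to map n consecutive pages (n <= 64). */
static unsigned hpt_groups_touched(uint64_t vsid, uint64_t ea, unsigned n,
                                   uint64_t htab_mask)
{
    uint64_t seen[64];
    unsigned count = 0;

    for (unsigned i = 0; i < n && i < 64; i++) {
        uint64_t off = hpt_pteg_offset(vsid, ea + ((uint64_t)i << PAGE_SHIFT),
                                       htab_mask);
        unsigned j;

        for (j = 0; j < count; j++)
            if (seen[j] == off)
                break;
        if (j == count)
            seen[count++] = off;
    }
    return count;
}

/* Cachelines spanned by n consecutive ptes in one radix page table page. */
static unsigned radix_lines_touched(uint64_t ea, unsigned n)
{
    uint64_t first_pte = ea >> PAGE_SHIFT;
    uint64_t first = (first_pte * RADIX_PTE_BYTES) / RADIX_LINE_BYTES;
    uint64_t last = ((first_pte + n - 1) * RADIX_PTE_BYTES) / RADIX_LINE_BYTES;

    return (unsigned)(last - first + 1);
}
```

With these assumptions, eight consecutive pages land in eight different PTE
groups because the page index changes the low bits of the hash, whereas eight
aligned radix ptes sit in a single cacheline.
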
> low-latency fault handling (for not when you miss in the TLB, but when
The per-page processing costs are interesting too, but IMO there is more
work that should be done to speed up order-0 pages. The patches I wrote to
remove the sync instruction behind smp_mb() in unlock_page sped up pagecache
throughput (populate, write(2), reclaim) on my G5 by something really
crazy like 50%. Most of that is merged, but I'm still sitting on that fancy
unlock_page speedup to remove the final smp_mb.
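
For readers unfamiliar with the pattern, the following is a generic sketch of
where such a barrier sits in an unlock path, written with C11 atomics rather
than kernel primitives. The struct, the function names and the wakeup counter
are hypothetical stand-ins. On powerpc a full fence like this is typically
implemented with the heavyweight sync instruction, which is the cost being
discussed.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LOCKED_SKETCH 0x1UL  /* hypothetical lock bit */

struct page_sketch {
    atomic_ulong flags;   /* holds the lock bit */
    atomic_int waiters;   /* stand-in for "the waitqueue is non-empty" */
    int wakeups;          /* counts wakeups we would have issued */
};

/* Try to take the page lock; acquire ordering on success. */
static bool trylock_page_sketch(struct page_sketch *p)
{
    unsigned long old = atomic_fetch_or_explicit(&p->flags, PG_LOCKED_SKETCH,
                                                 memory_order_acquire);
    return !(old & PG_LOCKED_SKETCH);
}

/* Release the lock, then check for sleepers. */
static void unlock_page_sketch(struct page_sketch *p)
{
    /* Release ordering: earlier stores are visible before the bit clears. */
    atomic_fetch_and_explicit(&p->flags, ~PG_LOCKED_SKETCH,
                              memory_order_release);

    /*
     * Full barrier: order the clear against the waiter check that follows,
     * so a sleeper that queued itself just before the clear is not missed.
     * This is the expensive step on powerpc.
     */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&p->waiters, memory_order_relaxed) > 0)
        p->wakeups++;     /* a real implementation would wake a task here */
}
```

The sketch only shows the barrier's position. How the remaining barrier can
be removed safely is not covered here.
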
I suspect some of the costs are also in powerpc-specific code that inserts
Linux ptes into the hash table. I think some of the synchronisation for
those could possibly be shared with generic code so you don't need the
extra layer of locks there.
