On Tuesday 18 November 2008 13:08, Linus Torvalds wrote:

It's much harder to do this with powerpc I think because they would need to calculate 8 hashes and touch 8 cachelines to prefill 8 translations, wouldn't they?

The per-page processing costs are interesting too, but IMO there is more work that should be done to speed up order-0 pages. The patches I had to remove the sync instruction for smp_mb() in unlock_page sped up pagecache throughput (populate, write(2), reclaim) on my G5 by something really crazy like 50% (most of that's in, but I'm still sitting on that fancy unlock_page speedup to remove the final smp_mb).

I suspect some of the costs are also in powerpc specific code to insert linux ptes into their hash table. I think some of the synchronisation for those could possibly be shared with generic code so you don't need the extra layer of locks there.
