On Wed, 17 Sep 2008 21:58:08 -0700 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

This is the result on an 8-cpu box. I think I have to reduce the footprint of
the fastpath of my patch ;)

Test result of your patch is (2).
==
Xeon 8cpu/2socket/1-node, equipped with 48GB of memory.
Ran the shell/exec benchmark 3 times just after boot.

lps ... loops per sec.
lpm ... loops per min.
(*) Shell tests sometimes fail because of division by zero, etc...

(1). rc6-mm1 (2008/9/13 version)
== Run ==                         == 1st == == 2nd == == 3rd ==
Execl Throughput                     2425.2    2534.5    2465.8 (lps)
C Compiler Throughput                1438.3    1476.3    1459.1 (lpm)
Shell Scripts (1 concurrent)         9360.3    9368.3    9360.0 (lpm)
Shell Scripts (8 concurrent)         3868.0    3870.0    3868.0 (lpm)
Shell Scripts (16 concurrent)        2207.0    2204.0    2201.0 (lpm)
Dc: sqrt(2) to 99 decimal places   101644.3  102184.5  102118.5 (lpm)

(2). (1) + remove-page-cgroup-pointer-v3 (radix-tree + dynamic allocation)
== Run ==                         == 1st == == 2nd == == 3rd ==
Execl Throughput                     2514.1    2548.9    2648.7 (lps)
C Compiler Throughput                1353.9    1324.6    1324.7 (lpm)
Shell Scripts (1 concurrent)         8866.7    8871.0    8856.0 (lpm)
Shell Scripts (8 concurrent)         3674.3    3680.0    3677.7 (lpm)
Shell Scripts (16 concurrent)        failed    failed    2094.3 (lpm)
Dc: sqrt(2) to 99 decimal places    98837.0   98206.9   98250.6 (lpm)

(3). (1) + pre-allocation by "vmalloc" + hash + misc (atomic flags etc..)
== Run ==                         == 1st == == 2nd == == 3rd ==
Execl Throughput                     2385.4    2579.2    2361.5 (lps)
C Compiler Throughput                1424.3    1436.3    1430.6 (lpm)
Shell Scripts (1 concurrent)         9222.0    9234.0    9246.7 (lpm)
Shell Scripts (8 concurrent)         3787.7    3799.3    failed (lpm)
Shell Scripts (16 concurrent)        2165.7    2166.7    failed (lpm)
Dc: sqrt(2) to 99 decimal places   102228.9  102658.5  104049.8 (lpm)

(4). (3) + get/put page charge/uncharge + lazy lru handling
== Run ==                         == 1st == == 2nd == == 3rd ==
Execl Throughput                     2349.4    2335.7    2338.9 (lps)
C Compiler Throughput                1430.8    1445.0    1435.3 (lpm)
Shell Scripts (1 concurrent)         9250.3    9262.0    9265.0 (lpm)
Shell Scripts (8 concurrent)         3831.0    3834.4    3833.3 (lpm)
Shell Scripts (16 concurrent)        2193.3    2195.3    2196.0 (lpm)
Dc: sqrt(2) to 99 decimal places   102956.8  102886.9  101884.6 (lpm)

It seems the "execl" test is affected by footprint and cache hit rate more
than the other tests. I need some more effort to reduce the overhead in (4).

Note:
(1)'s struct page is 64 bytes.
(2)(3)(4)'s struct page is 56 bytes.

-Kame
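
One rough way to compare the four configurations is to average the three runs of each test and express the result relative to (1). The sketch below does that for the three tests that completed in every run; the names RUNS and relative_change are made up for illustration and are not part of the benchmark suite or the patches. With only three runs per configuration, small differences should be read with care.

```python
from statistics import mean

# Per-run results copied from the tables above; only tests that
# completed in all three runs for every configuration are included.
RUNS = {
    "Execl Throughput (lps)": {
        "(1)": [2425.2, 2534.5, 2465.8],
        "(2)": [2514.1, 2548.9, 2648.7],
        "(3)": [2385.4, 2579.2, 2361.5],
        "(4)": [2349.4, 2335.7, 2338.9],
    },
    "C Compiler Throughput (lpm)": {
        "(1)": [1438.3, 1476.3, 1459.1],
        "(2)": [1353.9, 1324.6, 1324.7],
        "(3)": [1424.3, 1436.3, 1430.6],
        "(4)": [1430.8, 1445.0, 1435.3],
    },
    "Shell Scripts (1 concurrent) (lpm)": {
        "(1)": [9360.3, 9368.3, 9360.0],
        "(2)": [8866.7, 8871.0, 8856.0],
        "(3)": [9222.0, 9234.0, 9246.7],
        "(4)": [9250.3, 9262.0, 9265.0],
    },
}


def relative_change(runs, baseline="(1)"):
    """Return the mean of each configuration relative to the baseline mean.

    A value of -0.05 means the configuration averages 5% below the baseline.
    """
    base = mean(runs[baseline])
    return {cfg: (mean(vals) - base) / base for cfg, vals in runs.items()}


if __name__ == "__main__":
    for test, runs in RUNS.items():
        print(test)
        for cfg, change in relative_change(runs).items():
            print(f"  {cfg}: mean {mean(runs[cfg]):9.1f}  vs (1): {change:+.1%}")
```

On these numbers, execl in (4) averages about 5% below (1), while the C compiler and single shell script tests in (4) are each within about 1.5% of (1).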
