Re: tbench regression on each kernel release from 2.6.22 -> 2.6.28

Previous thread: [PATCH] bonding: add more ethtool support by Stephen Hemminger on Monday, August 11, 2008 - 11:34 am. (3 messages)

Next thread: Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem by David Witbrodt on Monday, August 11, 2008 - 12:05 pm. (1 message)
From: Christoph Lameter
Date: Monday, August 11, 2008 - 11:36 am

It seems that the network stack becomes slower over time? Here is a list of
tbench results with various kernel versions:

2.6.22		3207.77 MB/sec
2.6.24		3185.66
2.6.25		2848.83
2.6.26		2706.09
2.6.27(rc2)	2571.03

And linux-next is:

2.6.28(l-next)	2568.74

It shows that there is still work to be done on linux-next. It is too close to
upstream in performance.

Note the KT event between 2.6.24 and 2.6.25. Why is that?
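
To put the figures above on one scale, here is a minimal Python sketch that computes the drop relative to 2.6.22 and the drop from each release to the next. The function names are illustrative only; the numbers are copied from the list above, and the results are assumed to be comparable because they come from the same test setup.

```python
# tbench throughput (MB/sec) copied from the list above, in release order.
RESULTS = {
    "2.6.22": 3207.77,
    "2.6.24": 3185.66,
    "2.6.25": 2848.83,
    "2.6.26": 2706.09,
    "2.6.27(rc2)": 2571.03,
    "2.6.28(l-next)": 2568.74,
}


def drop_vs_baseline(results, baseline="2.6.22"):
    """Percent throughput lost by each kernel relative to the baseline."""
    base = results[baseline]
    return {name: 100.0 * (base - value) / base
            for name, value in results.items()}


def step_drops(results):
    """Percent throughput lost by each kernel relative to the previous entry."""
    items = list(results.items())
    return {cur: 100.0 * (prev_value - cur_value) / prev_value
            for (_, prev_value), (cur, cur_value) in zip(items, items[1:])}


if __name__ == "__main__":
    total = drop_vs_baseline(RESULTS)
    steps = step_drops(RESULTS)
    for name in RESULTS:
        step = steps.get(name)
        step_text = "" if step is None else f"  (step: {step:.2f}%)"
        print(f"{name:16s} {total[name]:6.2f}% below 2.6.22{step_text}")
```

Run on these numbers, the largest single step is the one from 2.6.24 to 2.6.25, which is the drop the question above refers to.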
--

From: Kok, Auke
Date: Monday, August 11, 2008 - 11:50 am

is this with SLAB or with SLUB? SLUB has been known to impact network performance...

Auke


--

From: Christoph Lameter
Date: Monday, August 11, 2008 - 11:56 am

The original testing config was SLAB based so it was used throughout.

--

From: David Miller
Date: Monday, August 11, 2008 - 2:15 pm

From: Christoph Lameter <cl@linux-foundation.org>

Isn't that when some major scheduler changes went in?  I'm not blaming
the scheduler, but rather I'm making the point that there are other
subsystems in the kernel that the networking interacts with that
influences performance at such a low level.  This includes the memory
allocator :-)
--

From: Christoph Lameter
Date: Monday, August 11, 2008 - 2:33 pm

Right, this covers a significant portion of the kernel. SLAB was used since .22
was pretty early for SLUB. And around 2.6.24 we had the merges of the antifrag
logic.

.25 was the point where HR timers came in. By switching off hrtimers I can get
some (minor) portion of performance back. There must be more things in play
though.

Maybe what we are seeing is general bloat in kernel execution paths due to the
growth in complexity?
--

From: David Miller
Date: Monday, August 11, 2008 - 2:50 pm

From: Christoph Lameter <cl@linux-foundation.org>

It could be, and any kind of analysis into this would be great.

I had a change that RCU destroyed sockets and this added a
tiny bit of latency, so I never added it even though it would
have allowed a lot of simplification of socket handling (which
I thought would make up for RCU's latency, but it didn't).
--

From: Kok, Auke
Date: Monday, August 11, 2008 - 2:56 pm

perhaps Rick Jones who maintains netperf could enlighten us on some historic
numbers? he usually seems to be happy to prop up new netperf numbers :)


Auke
--

From: Rick Jones
Date: Monday, August 11, 2008 - 3:11 pm

While this is an excellent opening to talk about how netperf
top-of-trunk can now emit keyword=value results that are (ostensibly)
easier to put into a database than the regular or even CSV output formats,
I cannot fully exploit it by pointing at a database of results :(

rick jones
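
As an illustration of why keyword=value output is convenient for a database, here is a minimal Python sketch that parses such lines into a dictionary and stores them in SQLite. It assumes one `key=value` pair per line; the table layout, the function names, and the `run_id` label are hypothetical and are not taken from netperf.

```python
import sqlite3


def parse_keyval(text):
    """Parse lines of the form key=value into a dict, skipping other lines."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # ignore blank lines and anything that is not key=value
        key, value = line.split("=", 1)  # split on the first '=' only
        record[key.strip()] = value.strip()
    return record


def store(conn, run_id, record):
    """Store one parsed result as (run_id, key, value) rows.

    The schema is a hypothetical example, chosen so that new keys do not
    require a schema change.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (run_id TEXT, key TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [(run_id, key, value) for key, value in record.items()],
    )
    conn.commit()
```

Storing every value as text keeps the sketch independent of which keys a given netperf version emits; a real results database would likely want typed columns for the numeric fields.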
--

From: Andi Kleen
Date: Tuesday, August 12, 2008 - 12:11 am

Wouldn't surprise me. Have you considered doing profiles? 

e.g. just oprofiling the benchmark on the different kernels and see
if there's some obvious difference in the CPU consumers? 

-Andi
--

From: Christoph Lameter
Date: Tuesday, August 12, 2008 - 11:57 am

If I get the time I will try to do that.

Another way to understand why we are accepting the regressions here may be
that we give more consideration to real time issues and deterministic
performance these days. Hardware speed gains compensate for the additional
bloat? (I ran the old kernels on cutting edge hardware after all).
--

From: Ilpo Järvinen
Date: Tuesday, August 12, 2008 - 1:13 am

...IIRC, somebody in the past did even bisect his (probably netperf) 
2.6.24-25 regression to some scheduler change (obviously it might or might 
not be related to this case of yours)...


-- 
 i.
--

From: Zhang, Yanmin
Date: Sunday, August 17, 2008 - 7:05 pm

I did find significant regressions with netperf TCP-RR-1/UDP-RR-1/UDP-RR-512. I start
1 server and 1 client and bind each of them to a logical processor in a
different physical cpu.

Compared with 2.6.22, the regression of TCP-RR-1 on a 16-core tigerton is:
2.6.23		6%
2.6.24		6%
2.6.25		9.7%
2.6.26		14.5%
2.6.27-rc1	22%

Other regressions on other machines are similar.

yanmin


--

From: Ilpo Järvinen
Date: Monday, August 18, 2008 - 12:53 am

> On Tue, 2008-08-12 at 11:13 +0300, Ilpo J
From: Zhang, Yanmin
Date: Monday, August 18, 2008 - 5:56 pm

I reverted the patch against 2.6.27-rc1 and did a quick test with netperf TCP-RR-1
and didn't find any improvement. So your patch is good.
Mostly, I suspect the process scheduler causes the regression. It seems that when there
are only 1 or 2 tasks running on the cpu, the performance isn't good. My netperf testing
is just one example.


--

From: Christoph Lameter
Date: Monday, August 18, 2008 - 7:07 am

There are AIM7 regressions that are similar to tbench.

2.6.22	28436
2.6.26	23064
--

From: Ray Lee
Date: Monday, August 18, 2008 - 7:31 am

On Mon, Aug 18, 2008 at 7:07 AM, Christoph Lameter

Just a shot in the dark -- is this with Group Scheduling on or off?
Off is preferred for benchmarks.
--

From: Christoph Lameter
Date: Monday, August 18, 2008 - 7:34 am

Off
--

From: Zhang, Yanmin
Date: Monday, August 18, 2008 - 6:01 pm

Mostly, AIM7 has about a 4~5% regression on my machines. As the AIM7 result is
stable, 4% is big.

--

From: Zhang, Yanmin
Date: Sunday, August 17, 2008 - 6:48 pm

What's the hardware configuration? Is it dual-core?

I also track tbench performance with the latest kernels on a couple of quad-core machines,
and didn't find such a regression, although the results did fluctuate.

What's the command line you are using to start tbench? I start tbench with CPU_NUM*2.



--
