e1000: Question about polling

Previous thread: [PATCH 2/2] bluetooth : do not move child device other than rfcomm by Dave Young on Monday, February 18, 2008 - 3:58 am. (2 messages)

Next thread: [PATCH][IBMVETH]: Use single_open instead of manual manipulations. by Pavel Emelyanov on Monday, February 18, 2008 - 6:55 am. (6 messages)
To: <netdev@...>
Date: Monday, February 18, 2008 - 5:18 am

Hello all.

An interesting thing:

I have a PC that does NAT. Bandwidth is about 600 Mb/s.

It has 4 CPUs (2 x Core 2 Duo, HT off, 3.2 GHz).

In-kernel irqbalance is off.

nat2 ~ # cat /proc/irq/217/smp_affinity
00000001
nat2 ~ # cat /proc/irq/218/smp_affinity
00000003

The softirq (SI) load on CPU0 and CPU1 is about 90%.

Fine. Then I tried:
echo ffffffff > /proc/irq/217/smp_affinity
echo ffffffff > /proc/irq/218/smp_affinity

I got 100% SI on CPU0.

Question: why?

I have heard that binding the IRQ of one netdevice to one CPU can gain
about 30% in performance, but I have 4 CPUs, so I expected more
performance from writing "ffffffff" to smp_affinity.

The picture looks like this:
CPUs 0-3 reach over 50% SI... bandwidth goes up... 55% SI... bandwidth
goes up... then 100% SI on CPU0.

I remember a patch that fixed a problem like this; it patched the
e1000_clean function. The kernel on this PC (2.6.24-rc7-git2) has that
patch, and the e1000 driver works much better with it (I get 1.5-2x the
bandwidth before I hit 100% SI), but I think it still does not get 100%
of what it could =)

Thanks for your answers, and sorry for my English.

--

To: Badalian Vyacheslav <slavon@...>, <netdev@...>
Date: Wednesday, February 20, 2008 - 4:15 am

do you mean to be balancing interrupts between core 1 and 2 here?
1 = cpu 0
2 = cpu 1
4 = cpu 2
8 = cpu 3

so 1+2 = 3 for irq 218, i.e. balancing between cpu 0 and cpu 1.
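
The mask arithmetic above can be sketched in Python. This is only an
illustration: the helper names cpus_in_mask and mask_for_cpus are made up
for this example and are not part of any kernel tool.

```python
def cpus_in_mask(mask_hex):
    """Return the CPU numbers whose bits are set in a hex affinity mask."""
    # Wider masks are written as comma-separated 32-bit groups; dropping
    # the commas leaves one plain hex number.
    mask = int(mask_hex.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]


def mask_for_cpus(cpus):
    """Build an 8-digit hex mask with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # cpu 0 -> 1, cpu 1 -> 2, cpu 2 -> 4, cpu 3 -> 8
    return f"{mask:08x}"


if __name__ == "__main__":
    print(cpus_in_mask("00000001"))  # mask read from irq 217
    print(cpus_in_mask("00000003"))  # mask read from irq 218
    print(mask_for_cpus([2, 3]))     # mask that would select cpu 2 and cpu 3
```

Read this way, "ffffffff" simply sets every bit, so it names all CPUs the
machine has rather than choosing among them.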

sometimes the cpus will have a paired cache; depending on your bios, it
will be organized like cpu 0/2 = shared cache, and cpu 1/3 = shared
cache.
you can find this out by looking at physical ID and CORE ID in
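
A rough sketch of that grouping in Python, under two assumptions that are
not stated in this thread: that the fields are read from
/proc/cpuinfo-style text (which on x86 Linux lists "processor",
"physical id" and "core id"), and that cores in the same physical package
share a cache. The sample text is invented to match the 0/2 and 1/3 layout
described above; group_by_package is a hypothetical helper name.

```python
from collections import defaultdict

# Invented sample in /proc/cpuinfo style, trimmed to the relevant fields.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 1
core id     : 0

processor   : 2
physical id : 0
core id     : 1

processor   : 3
physical id : 1
core id     : 1
"""


def group_by_package(cpuinfo_text):
    """Map each physical id to the list of logical CPU numbers in it."""
    groups = defaultdict(list)
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "physical id" and cpu is not None:
            groups[int(value)].append(cpu)
    return dict(groups)


if __name__ == "__main__":
    for package, cpus in sorted(group_by_package(SAMPLE).items()):
        print(f"package {package}: cpus {cpus}")
```

On a real machine the same function could be fed the contents of
/proc/cpuinfo instead of SAMPLE.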

because as each adapter generating interrupts gets rotated through cpu0,
it gets "stuck" on cpu0 because the napi scheduling can only run one at
a time, and so each is always waiting in line behind the other to run
its napi poll, always fills its quota (work_done is always != 0) and

only if your performance is not cache limited but cpu horsepower
limited. you're sacrificing cache coherency for cpu power, but if that

the patch helps a little because it decreases the amount of time the
driver spends in napi mode, basically shortening the exit condition
(which reenables interrupts, and therefore balancing) to work_done <

you basically can't get much more than one cpu can do for each nic. it's
possible to get a little more, but my guess is you won't get much. The
best thing you can do is make sure as much traffic as possible stays in
the same cache, on two different cores.

you can try turning off NAPI mode either in the .config, or build the
sourceforge driver with CFLAGS_EXTRA=-DE1000_NO_NAPI, which seems
counterintuitive, but with the non-napi e1000 pushing packets to the
backlog queue on each cpu, you may actually get better performance due
to the balancing.

some day soon (maybe) we'll have some coherent way to have one tx and rx
interrupt per core, and enough queues for each port to be able to handle
1 queue per core.

good luck,
Jesse
--

To: Brandeburg, Jesse <jesse.brandeburg@...>
Cc: <netdev@...>
Date: Wednesday, February 20, 2008 - 5:15 am

Many thanks for this answer. You answered all my questions and for

--

To: Badalian Vyacheslav <slavon@...>
Cc: <netdev@...>
Date: Wednesday, February 20, 2008 - 3:50 am

If some patch works for you, and you can show its advantages here,
you should probably post a link to it here and request merging.

BTW, I wonder if you tried to check if changing CONFIG_HZ makes any
difference here?

Regards,
Jarek P.
--

To: Jarek Poplawski <jarkao2@...>
Cc: <netdev@...>
Date: Wednesday, February 20, 2008 - 5:25 am

Sorry for the sparse information and the mistakes in my letter. Jesse
Brandeburg answered all my questions. In future I will try to be more
accurate when I write letters, and to post more info.

--

To: Badalian Vyacheslav <slavon@...>
Cc: <netdev@...>
Date: Wednesday, February 20, 2008 - 5:47 am

On Wed, Feb 20, 2008 at 12:25:32PM +0300, Badalian Vyacheslav wrote:

OK! Don't disrespect for me - I'll try fix my English next time!)

Jarek P.
--

To: Jarek Poplawski <jarkao2@...>
Cc: <netdev@...>
Date: Wednesday, February 20, 2008 - 7:54 am

Khrm.... I was trying to say that I have a language barrier and may
sometimes compose sentences wrongly. Example below =)
"I'll try fix my English next time!"

--

To: Badalian Vyacheslav <slavon@...>
Cc: <netdev@...>, <jesse.brandeburg@...>
Date: Wednesday, February 20, 2008 - 8:25 am

Don't worry Vyacheslav: I think your message was understandable enough
if you got a good answer from Jesse. (And I've learned something BTW too;
Thanks Jesse!) And after all it's not a language group: we care here
for serious problems!

Jarek P.
--
