Re: convert netisr to real softinterrupt

Previous thread: limit the number of cpus amd64 will attach to the number of cpuinfo slots we have by David Gwynne on Wednesday, November 24, 2010 - 6:07 am. (8 messages)

Next thread: more usb detach love by Jacob Meuser on Wednesday, November 24, 2010 - 1:59 pm. (4 messages)
From: Claudio Jeker
Date: Wednesday, November 24, 2010 - 9:06 am

This diff was made by oga@ some time ago -- I just fixed a few conflicts
and I would really like to see this going in.

netisr needs to be made a normal C function and not this madness it
currently is. This is necessary to imporve the packet scheduling.

Tested myself on i386, amd64, sparc64, sparc, alpha and hppa

Index: arch/alpha/alpha/interrupt.c
===================================================================
RCS file: /cvs/src/sys/arch/alpha/alpha/interrupt.c,v
retrieving revision 1.29
diff -u -p -r1.29 interrupt.c
--- arch/alpha/alpha/interrupt.c	19 Apr 2009 19:13:57 -0000	1.29
+++ arch/alpha/alpha/interrupt.c	5 Nov 2010 15:50:38 -0000
@@ -88,26 +88,6 @@
 #include <sys/device.h>
 #endif
 
-#include <net/netisr.h>
-#include <net/if.h>
-
-#ifdef INET
-#include <netinet/in.h>
-#include <netinet/if_ether.h>
-#include <netinet/ip_var.h>
-#endif
-
-#ifdef INET6
-#ifndef INET
-#include <netinet/in.h>
-#endif
-#include <netinet/ip6.h>
-#include <netinet6/ip6_var.h>
-#endif
-
-#include "ppp.h"
-#include "bridge.h"
-
 #include "apecs.h"
 #include "cia.h"
 #include "lca.h"
@@ -119,8 +99,6 @@ extern struct evcount clk_count;
 
 struct scbvec scb_iovectab[SCB_VECTOIDX(SCB_SIZE - SCB_IOVECBASE)];
 
-void netintr(void);
-
 void	scb_stray(void *, u_long);
 
 void
@@ -480,33 +458,8 @@ badaddr_read(void *addr, size_t size, vo
 
 #endif	/* NAPECS > 0 || NCIA > 0 || NLCA > 0 || NTCASIC > 0 */
 
-int netisr;
-
-void
-netintr()
-{
-	int n;
-
-	while ((n = netisr) != 0) {
-		atomic_clearbits_int(&netisr, n);
-
-#define	DONETISR(bit, fn)						\
-		do {							\
-			if (n & (1 << (bit)))				\
-				fn();					\
-		} while (0)
-
-#include <net/netisr_dispatch.h>
-
-#undef DONETISR
-	}
-}
-
 struct alpha_soft_intr alpha_soft_intrs[SI_NSOFT];
 
-/* XXX For legacy software interrupts. */
-struct alpha_soft_intrhand *softnet_intrhand;
-
 /*
  * softintr_init:
  *
@@ -524,10 +477,6 @@ softintr_init()
 		mtx_init(&asi->softintr_mtx, IPL_HIGH);
 ...
From: Mike Belopuhov
Date: Wednesday, November 24, 2010 - 9:42 am

hi,

i like the diff, but why not get rid of netisr_dispatch.h?
also i'm not sure, but maybe netisr.c copyright should be
attributed to berkeley as we're using their code directly?

otherwise i'm okay.

cheers!

From: Claudio Jeker
Date: Thursday, November 25, 2010 - 3:50 am

I planned to this after this diff goes in. I did not want to change

Are we? The current netisr.c code is a generic implementation of using
softinterrupts to run the netisr and in the future it will get even more
different.  Owain, what do you think? 


From: Mike Belopuhov
Date: Thursday, November 25, 2010 - 4:21 am

On Thu, Nov 25, 2010 at 11:50 AM, Claudio Jeker


actually, this is netbsd code in part.  i've traced it to the Eric
Haszlakiewicz's
<erh@netbsd> commit to sys/arch/sparc/sparc/intr.c [1] and he also introduced
netisr_dispatch.h include.  unfortunately he's not mentioned in any copyrights.

From: Owain Ainsworth
Date: Thursday, November 25, 2010 - 5:30 am

My diff needed some tweaking anyway.

Miod pointed out to me when I first sent this out that some archs use
something slightly different than this generic version.

For example hppa uses a specially aligned netisr variable so it can use
the pa-risc real atomic instructions which are quite a bit faster.

Similarly x86 uses compare-and-swap instead of test, then clear bits.

I meant to add some MD defines so that this could be kept (falling back
to the implementation in this diff)  since miod claims that it is useful
and significantly faster (I did not check this). I've cced miod so he

I think my initial mail possibly mentioned that I was unsure if the
original was needed. Put the original one in there as well if you are
unsure. I won't try and claim that I'm positive that it was entirely my
code (too long ago for a start ;).

Anyway, thanks for taking over this diff, claudio. I'm far too busy
right now to sort it out.

Ta,

-0-
-- 
Grelb's Reminder:
	Eighty percent of all people consider themselves to be above
	average drivers.

From: Henning Brauer
Date: Thursday, November 25, 2010 - 5:41 am

I really want to see a verification that this doesn't slow us down
before it goes in.

and no, sorry, "shouldn't be slower" is not good enough.

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting

From: Henning Brauer
Date: Thursday, November 25, 2010 - 11:12 am

the last word on this was that you'd run them with remote access to my
test setup

be assured i had long run it if i had found the time

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting

From: Claudio Jeker
Date: Thursday, November 25, 2010 - 9:02 am

This is all nice and so but the main reason for me to get the netisr code
into a pure c form is that we need better netisr scheduling. There are
some rather nasty effects because of the current way netisr works and it
has nothing todo with optimising the netisr bitfield itself. 
Some quick tests showed that netisr is causing a >25% drop in forwarding
performance. So fixing netisr and the protocol queues can have a very huge
impact on performance.

If somebody has a good way to allow netisr.c to have MD defines in a clean
way then go for it. I doubt that there is a measurable performance

OK, I will kill netisr_dispatch.h. Diff follows...


From: Henning Brauer
Date: Thursday, November 25, 2010 - 11:13 am

yeah, well, "doubt". pls run a quick test.

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting

From: Claudio Jeker
Date: Monday, December 20, 2010 - 7:45 am

Well this was not "quick" but I finally finsihed the testing.

Tested with packet generator set to a packet size of 512B.
All test were done on the same machine using the dual port ix(4) as
interface. Forwarding performance measured with pf disabled, no
POOL_DEBUG and a pool fix for the mcl2k pool for i386.
No other fiddling was done.  dmesg of the box is below.
The numbers are the top forwarding speeds as measured by the packet
generator right before livelock detection would kick in (which increases
packet loss and lowers the throughput).

amd64:
GENERIC:	TX 428767	RX 428780 pps	18.25% line rate
GENERIC.MP:	TX 417020	RX 417044 pps	17.75% line rate
with netisr diff:
GENERIC:	TX 427622	RX 427173 pps	18.20% line rate
GENERIC.MP:	TX 417020	RX 416997 pps	17.75% line rate

i386:
GENERIC:	TX 440514	RX 440403 pps	18.75% line rate
GENERIC.MP:	TX 428768	RX 428790 pps	18.25% line rate
with netisr diff:
GENERIC:	TX 425330	RX 425320 pps	18.10% line rate
GENERIC.MP:	TX 415875	RX 415619 pps	17.70% line rate

Btw. as this shows amd64 and i386 are more or less equal in performance.
i386 is a tiny little bit faster.

Diff used attached as well. I will commit this version in the next days
later on an atomic_load_and_clear_int() function will be used to help
hppa.

OpenBSD 4.8-current (GENERIC.MP) #2: Mon Dec 20 14:54:04 CET 2010
    root@foo.vantronix.net:/usr/src/sys/arch/i386/compile/GENERIC.MP
cpu0: Intel(R) Xeon(R) CPU E5504 @ 2.00GHz ("GenuineIntel" 686-class) 2.01 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT
real mem  = 3211730944 (3062MB)
avail mem = 3149099008 (3003MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 12/31/99, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.5 @ 0x9a800 (81 entries)
bios0: vendor HP version "O33" date 12/03/2009
bios0: HP ProLiant DL160 G6
acpi0 at bios0: rev 2
acpi0: sleep states S0 S4 ...
From: Mark Kettenis
Date: Thursday, November 25, 2010 - 3:23 pm

The hppa and i386/amd64 approach are rather similar.  Both do an
atomic load and clear, which happens to be the only atomic instruction
available on hppa, and is easy to implement as an atomic swap with 0
on i386/amd64.  But implementing this operation on other architectures
shouldn't be much more difficult, and should work just as well as the
proposed code that uses atomic_clearbits_int().  The MI code would
look something like:

void
netintr(void *unused)
{
	int n;

	n = atomic_load_and_clear(&netisr);
#define DONETISR(bit, fn)			\
		do {				\
			if (n & 1 << (bit))	\
				fn();		\
		} while ( /* CONSTCOND */ 0)
#include <net/netisr_dispatch.h>
}

From: Owain Ainsworth
Date: Thursday, November 25, 2010 - 3:45 pm

The only other thing to note is that you may need to define a type (or
whatever) since iirc the hppa one needs a specific alignment (cacheline
to itself).

Otherwise I like this approach, it is cleaner than what I was thinking
(not knowing exactly what hppa was doing other than it was faster didn't
help ;).

-0-
-- 
Sometimes I worry about being a success in a mediocre world.
		-- Lily Tomlin

Previous thread: limit the number of cpus amd64 will attach to the number of cpuinfo slots we have by David Gwynne on Wednesday, November 24, 2010 - 6:07 am. (8 messages)

Next thread: more usb detach love by Jacob Meuser on Wednesday, November 24, 2010 - 1:59 pm. (4 messages)