Re: firewall is very slow, something's wrong

From: Florin Andrei
Date: Thursday, October 4, 2007 - 5:48 pm

Dual-homed firewall, web server on the private network, firewall is 
doing 1:1 NAT for the web server to the public interface of the 
firewall. em0 is the public interface, em1 is the private one.

In the exact same setup (same hardware even) I am comparing Linux and 
OpenBSD for a firewall. Installed Linux on a hard-disc, OpenBSD on 
another disc, and I'm just swapping discs while I'm testing.
All firewall rules are written as stateless as possible - I don't need 
stateful filtering, the setup is very simple (allow HTTP inbound, allow 
a few ICMP types, and that's it).

With Linux, I achieve gigabit transfer speeds through the firewall 
(saturating the network ports), but the firewall refuses to let any new 
connection through when I flood it with a bunch of small UDP packets 
with random source addresses.

I expected OpenBSD 4.1 to do better. But the thing is, even without the 
UDP flood, the OpenBSD firewall is very slow. I am downloading a huge 
file through it, via HTTP, and all I get is 4 Mbyte / sec. With Linux I 
get 112 Mbyte / sec.
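
For reference, a rough back-of-the-envelope check of what "saturating" a gigabit port means for a single HTTP download. This is only a sketch: it assumes a standard 1500-byte MTU and IPv4 + TCP headers without options, neither of which is stated in the post.

```python
LINE_RATE_BPS = 1_000_000_000      # gigabit Ethernet, bits per second
MTU = 1500                         # assumed standard Ethernet MTU
WIRE_OVERHEAD = 14 + 4 + 8 + 12    # Ethernet header, FCS, preamble, inter-frame gap
IP_TCP_HEADERS = 20 + 20           # IPv4 + TCP without options (assumption)


def tcp_payload_rate(line_rate_bps=LINE_RATE_BPS, mtu=MTU):
    """Upper bound on TCP payload bytes per second for full-size frames."""
    frame_on_wire = mtu + WIRE_OVERHEAD
    payload = mtu - IP_TCP_HEADERS
    return line_rate_bps / 8 * payload / frame_on_wire


rate = tcp_payload_rate()
print(f"{rate / 1e6:.1f} MB/s, {rate / 2**20:.1f} MiB/s")
```

Under these assumptions the ceiling is a little under 119 million bytes per second (about 113 MiB/s), so the 112 Mbyte / sec figure is essentially wire speed, while 4 Mbyte / sec is only a few percent of it.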

Something's wrong. Or I'm doing something wrong.

The hardware is AMD64, Tyan Transport, 2 CPUs 2 cores each. I am using 
the SMP kernel. The network card is Intel Pro/1000 PCI Express 4x dual 
gigabit port, it carries both em0 and em1.

=========================

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33192
         groups: lo
         inet 127.0.0.1 netmask 0xff000000
         inet6 ::1 prefixlen 128
         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
fxp0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:e0:81:4a:0a:7f
         media: Ethernet autoselect (none)
         status: no carrier
bge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:e0:81:4a:0a:a8
         media: Ethernet autoselect (none)
         status: no carrier
bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:e0:81:4a:0a:a9
         media: Ethernet autoselect (none)
         status: ...
From: Stuart Henderson
Date: Friday, October 5, 2007 - 1:31 am

You might want to re-think this, stateless rulesets are usually
slower. This is interesting:


Try setting net.inet.ip.ifq.maxlen to 256 (sysctl/sysctl.conf),
if you still see the congestion count increasing then search for
net.inet.ip.ifq.maxlen in the list archives and have a read.

From: Florin Andrei
Date: Monday, October 8, 2007 - 10:41 am

I raised maxlen to 300. I also enabled ACPI. It's still slow. The 
congestion counter is still not zero - currently at 386.5/s
One good thing is that there used to be a big pause when the kernel was 
booting up, probably waiting for some device or something - now with 
ACPI the pause is smaller. It's still waiting for something, just not as 
much.

I am watching the system with top, set to update every 1s, and I noticed 
there are a lot of interrupt load bursts on CPU0. The percentage of 
interrupt load is very uneven, sometimes as low as 15%, sometimes as 
high as 75%.
I unleashed the UDP flood and the firewall is totally frozen - can't do 
anything even on the local keyboard. Not even the display (running top) 
gets updated anymore. The machine is frozen solid. All network traffic 
stops immediately.
Kill the UDP flood and OpenBSD resumes normal operations.

I tried the uniprocessor kernel and it's exactly the same.

Comparison with Linux on the exact same hardware:
HTTP download speed through the firewall is 112 Mbyte / sec (saturating 
the GigE ports) and the interrupt load is relatively low and constant - 
about 30%.
Under UDP flood with Linux as a firewall, the current download finishes 
up, but a new one cannot get started. The system is not frozen at all, 
it's quite usable, in fact I can heavily overload it (running a bunch of 
CPU hogs) to the point where userspace becomes sluggish and load average 
is up to 250 or so, yet the firewall is not influenced at all.

So what's the deal here? The heavy interrupt load percentage seems to 
indicate an issue with the network driver if I'm not mistaken. But these 
are good and quite popular network cards - Intel Pro/1000 PCI Express 4x 
dual-port gigabit, seen by kernel as em0 and em1

-- 
Florin Andrei

http://florin.myip.org/

From: Claudio Jeker
Date: Sunday, October 7, 2007 - 12:15 pm

I guess you need to "enable acpi" with config(8) as the system is quite
new and most newer systems have busted MP BIOS info. The effect is bad
interrupt routing and other craziness -- which is often felt as slow
systems.


From: Henning Brauer
Date: Tuesday, October 9, 2007 - 4:32 am

First, you want to run 4.2 or -current, that should about double your 
throughput.
then, an i386 kernel should perform considerably better than amd64 for 
firewalling/routing/...
next, you don't want SMP for such tasks. take out the second CPU and 
give it to somebody who can use it, and run the uniprocessor kernel.
last, increase net.inet.ip.ifq.maxlen until you see the congestion 
counter not increasing much any more under load. should not exceed 2500 
by too much. as a rule of thumb, 256 per gigE interface aren't too far 
off.
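
The rule of thumb above can be sketched as a tiny helper. The function names are made up for illustration; the only inputs taken from the post are "256 per gigE interface" and the soft ceiling of roughly 2500. The result is just a starting point: the real test is still whether the congestion counter stops increasing under load.

```python
def suggested_ifq_maxlen(gige_interfaces, per_interface=256, soft_cap=2500):
    """Starting value for net.inet.ip.ifq.maxlen from the rule of thumb.

    per_interface and soft_cap come from the post; treat both as rough.
    """
    return min(gige_interfaces * per_interface, soft_cap)


def sysctl_conf_line(value):
    """Format the setting as a name=value line for /etc/sysctl.conf."""
    return f"net.inet.ip.ifq.maxlen={value}"


# The box in this thread has two gigabit ports in use (em0 and em1).
print(sysctl_conf_line(suggested_ifq_maxlen(2)))
```

For the two-port firewall discussed here that gives a starting value of 512, to be raised only if the congestion counter keeps climbing.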

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

From: Florin Andrei
Date: Tuesday, October 9, 2007 - 9:27 am

Yes, I was looking at a paragraph in the 4.2 release notes and I thought 
all those things might be related exactly to the problem I'm seeing:

##############
Huge performance improvements in the network stack, including:
     * In pf, store routing table ID, queue ID etc directly in the
       packet header mbuf instead of using mbuf tags (which use
       malloc'd memory). This yields a 100% improvement in pf
       performance.
     * Skip TCP/UDP/ICMP/ICMP6 checksumming when not necessary. This
       yields a further 10% improvement in pf performance.
     * A change in the way the kernel random pool is stirred greatly
       increases performance with network interface cards that support
       interrupt mitigation, especially on architectures where reading
       the clock is expensive (such as amd64).
##############


That is surprising. What is the reason?


So, assuming the box is a pure firewall / static router (so just pf and 
static routes), even with multiple interfaces, all those tasks run in a 
single kernel thread?

Now here's the second thing: if this firewall needs to be integrated in 
an environment with dynamic routing, it will need to run some kind of 
dynamic routing daemon(s). For that, I'd like to have at least two cores 
on the system, and a kernel that can take advantage of them.
If the SMP kernel does not actually hurt performance, I might have to 
use it.

-- 
Florin Andrei

http://florin.myip.org/

From: Henning Brauer
Date: Tuesday, October 9, 2007 - 11:03 am

we dunno really. it hasn't been benched in some time so it might not even 



the required locking will cost you more than the second cpu/core 

it does. seriously. locking is not free.

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

From: Douglas A. Tutty
Date: Tuesday, October 9, 2007 - 4:56 pm

Why is this?  Is there a security reason why the kernel is
single-thread; is it OBSD resource limitations (no developer time, no
hardware, etc); is it not enough interest yet?

With interface speeds and bus bandwidth going up, how many interfaces is
it possible to handle at full interface bandwidth on the fastest UP CPU
and how much memory does that take?

If you need more performance, do you build multiple boxes and CARP them?
Virtualization to run multiple OBSDs, each on its own core (ignoring
security issues of virtualization; crack one client is no worse than
having a single OBSD running all interfaces getting cracked).  Or do you
start assembling a big box with multiple MBs each with a UP hooked up to
a pair of drives, all co-located in one box with dual/triple/quad
redundant PSUs?

Not that I'm personally in need of the technology; I'm the one trying to
keep a 486 patched on dialup.  

I'm just interested.

Doug.

From: Ted Unangst
Date: Wednesday, October 10, 2007 - 11:44 am

actually, i think henning wanted to say that the network stack runs in

the stack runs entirely as interrupts.  if there were a thread, we
could add another, but going from 0 to 1 is more work than 1 to 2.

networking workloads do not always divide up among CPUs nicely.
assuming the code is written, just turning 2 or more CPUs loose on a
stream of packets is likely to result in reordering, which is bad.  to
avoid reordering, you need lots of queueing, which hurts performance
and drives up latency.  the problem is unfortunately not as simple as
add a lock here, a thread there, and presto.
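
A toy illustration of the reordering point, nothing to do with the actual kernel code. It is a deterministic simulation under made-up assumptions: packet i arrives at time i, packets are handed round-robin to the workers, and the per-packet costs are invented so that one packet is expensive.

```python
def completion_order(costs, workers=2):
    """Return packet indices in the order they finish processing.

    Packet i arrives at time i and goes to worker i % workers.
    costs[i] is the (hypothetical) processing time of packet i.
    """
    free_at = [0.0] * workers          # when each worker is next idle
    finished = []
    for i, cost in enumerate(costs):
        w = i % workers
        start = max(float(i), free_at[w])
        free_at[w] = start + cost
        finished.append((free_at[w], i))
    return [i for _, i in sorted(finished)]


costs = [5.0, 1.0, 1.0, 1.0]           # first packet is slow to process
order_one = completion_order(costs, workers=1)
order_two = completion_order(costs, workers=2)
print(order_one)  # [0, 1, 2, 3]
print(order_two)  # [1, 3, 0, 2]
```

With one worker the packets leave in arrival order; with two, the cheap packets overtake the expensive one, and putting them back in order would need exactly the kind of queueing described above.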

From: Douglas A. Tutty
Date: Wednesday, October 10, 2007 - 3:25 pm

Right, I see that multiple threads dealing with one interface would be a
problem, but if you had a box with several interfaces, couldn't a
multi-threaded stack work?  Yes, I agree that 1 to 2 threads is totally
different than 2 to n.  

I'm just concerned with what I perceive as two converging trends: 1) the
trend for hardware per-interface bandwidth to increase; 2) the slowing
of advances in single-processor speed.  We're getting multiple cores on
a chip and multiple chips on a board, and multiple interfaces on a box.
What is the answer when the primary to-the-world interface is faster
than the OBSD firewall can handle on a single CPU?

Doug.

From: Florin Andrei
Date: Tuesday, October 9, 2007 - 10:11 pm

Even more offtopic - on Linux I saw there's a kernel thread for each 
interface. Interestingly, while routing 1 Gbps of traffic through the 
system (just a single download of a huge file over HTTP), on Linux 
kernel 2.6.18 both kernel threads are at 35% CPU usage, while on OpenBSD 
4.1 the single kernel thread is at 70...80%. Maybe a coincidence, maybe 
the numbers don't usually translate linearly like that, I don't know.

I like pf, it's a really clever firewall, that's why I'll keep testing 
with 4.2

-- 
Florin Andrei

http://florin.myip.org/

From: Dave Anderson
Date: Tuesday, October 9, 2007 - 8:51 pm

I'm not an OpenBSD developer, but I'd bet that the reason is that BSD
was originally written single-threaded (both because that's much easier
than multi-threaded and because multi-CPU systems were rare back then)
and has not [yet] been changed because changing to a multi-threaded
kernel requires a lot of very finicky work (with innumerable
opportunities to introduce very subtle bugs).

	Dave

-- 
Dave Anderson
<dave@daveanderson.com>

From: Florin Andrei
Date: Tuesday, October 9, 2007 - 1:49 pm

Then I will do some tests with 4.2 on gigabit-capable hardware. If 
anything noteworthy comes out, I'll post the results.
Don't expect something too fancy, but I guess anything is better than 

Hmmm.

Please correct me if I'm wrong:
Let's say a firewall is connected to a pretty fast Internet pipe (in the 
gigabit range). Let's say there's a DDoS against this environment. In 
theory, the firewall would need lots of RAM so that it can deal with the 
incoming nasty packets, create an entry for each packet in the state 
table (don't know the correct name for it in OpenBSD, sorry), then 
expire it after a while.
In theory, the firewall could be tweaked to expire unused states 
quickly, but still, more RAM is better when dealing with a DDoS.

What's still not clear to me is how much RAM I should provision per 1Gb 
of bandwidth on OpenBSD, assuming there's an incoming 
worst-case-scenario DDoS, that consumes RAM (and other resources) on the 
firewall yet leaves some bandwidth open for legitimate traffic (so the 
firewall must be able to continue to let the good traffic pass through). 
Also assuming some tweaking has been done on the firewall to expire the 
bad stuff quickly without affecting legitimate traffic.

But all that depends on the actual legitimate traffic and on the 
firewall rules.

Aw, damn. I was hoping that's not quite the case.

Well, then hopefully the dynamic routing daemons won't get too greedy 
and DoS the firewall from within. :-) Or I may have to re-think the 
whole environment and forget the idea of doing any kind of dynamic 
routing on the firewall - from a security perspective, dynamic routing 
on the firewall sucks anyway.


+-----+-------+-------+
|  \  | i386  | amd64 |
+-----+-------+-------+
| SMP |       |       |
+-----+-------+-------+
| UP  |       |       |
+-----+-------+-------+

-- 
Florin Andrei

http://florin.myip.org/

From: Henning Brauer
Date: Wednesday, October 10, 2007 - 12:35 am

nope.
the kernel will not ever use more than 1 GB (or was it 768MB? memory 
fuzzy).
more than 1 GB of memory on a firewall even hurts. ok, not much. but a 


no, they won't.
they only get the cpu cycles not required for packet forwarding (well, 

no, not really, not if done right.

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

From: Siju George
Date: Wednesday, October 10, 2007 - 6:04 am

I thought running an amd64 kernel would get me twice the speed of
an i386 one on an amd64 machine, since one is 64 bit processing and the
other is just 32 bit :-(

How about on sparc64 systems? Do you get twice the speed compared to
its 32 bit counterpart?

Thank you so much

Kind Regards

Siju

From: Henning Brauer
Date: Wednesday, October 10, 2007 - 6:15 am

so you think a 20 ton truck is twice as fast as a 10 ton truck?

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

From: Peter N. M. Hansteen
Date: Wednesday, October 10, 2007 - 6:35 am

horizontal or vertical motion? assuming a perfectly spherical truck?

-- 
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.datadok.no/ http://www.nuug.no/
"Remember to set the evil bit on all malicious network traffic"
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.

From: Scott Wells
Date: Wednesday, October 10, 2007 - 7:01 am

And is it in a vacuum?


From: Siju George
Date: Wednesday, October 10, 2007 - 8:23 am

O.K I get it :-)
So when does changing from 32 bit to a 64-bit processor actually help?

Kind Regards

Siju

From: Tony Abernethy
Date: Wednesday, October 10, 2007 - 9:20 am

Siju George wrote:

Quoting Paul de Weerd,
"In short: There is no short answer. It depends on what you're doing."
( Not to mention how you do it ;-)

Short answer:
When you *might* need more than a GB or so of RAM/swap. 
Most anything is faster than stuck.

Easy: 2:1 ratio *either direction* which is faster.
Hard: 10:1 ratio (again either direction).
(figure in loading/unloading times on the truck analogy)

From: Stuart Henderson
Date: Wednesday, October 10, 2007 - 10:04 am

There are other changes between i386/amd64 than the number of bits
(e.g. amd64 has more registers, which allows some other changes that
can improve performance for some things), so it depends a lot on
the code being run.

You can't even always say, "software X is faster on arch Y", since
the way you use that software can give different results.

If you're looking for "fastest", just benchmark as close to real-life
use on both, it's the easiest way. You also often need to test whether
what you're trying to run does work correctly on !i386 arch (it's not
uncommon for code to make assumptions which don't hold true on !i386).

Of course, there are reasons other than "fastest" you might choose

I'm not too sure I understand what you're saying here.

From: Robert C Wittig
Date: Wednesday, October 10, 2007 - 7:24 am

64 bit processors (combined with 64 bit capable operating systems) have 
the ability to address more RAM than 32 bit processors because 64^2 is a 
much larger number than 32^2... lots more RAM addresses).

This does not speed things up, though, until you run out of RAM, and 
start having to access the swapfile.

The processor's speed... MHz, GHz, etc., will determine how fast the 
processor itself can process instructions.


-- 
-wittig http://www.robertwittig.com/
         http://robertwittig.net/
         http://robertwittig.org/
.

From: Tony Abernethy
Date: Wednesday, October 10, 2007 - 7:59 am

Actually 2^64 vs 2^32  (64^2 is 2^12, 64 is 2^6, 32 is 2^5)
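
The numbers being compared here are easy to check. A quick sketch in plain integer arithmetic, with nothing assumed beyond byte-addressable memory:

```python
addr32 = 2 ** 32            # addresses reachable with a 32-bit pointer
addr64 = 2 ** 64            # addresses reachable with a 64-bit pointer

print(addr32)               # 4294967296, i.e. 4 GiB
print(addr64 // addr32)     # 4294967296: the 64-bit space is 2^32 times larger
print(64 ** 2, 32 ** 2)     # 4096 1024: squaring the bit width is a different thing
```

So the relevant ratio is 2^64 / 2^32 = 2^32, not 64^2 / 32^2 = 4.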

Other things equal, 64-bit should take twice as long because it 
takes 64 bits to do anything instead of 32 bits.

Not really that simple, because accessing 32 bits can involve
1) accessing the 64 bits that the 32 bits are in.
The 64-bits does affect how big the swap file can be without

From: Paul de Weerd
Date: Wednesday, October 10, 2007 - 7:41 am

On Wed, Oct 10, 2007 at 09:24:25AM -0500, Robert C Wittig wrote:
| Siju George wrote:
|
| >I thought running an amd64 kernel would get me twice the speed of
| >an i386 one on an amd64 machine, since one is 64 bit processing and the
| >other is just 32 bit :-(
| >
|
| 64 bit processors (combined with 64 bit capable operating systems) have
| the ability to address more RAM than 32 bit processors because 64^2 is a
| much larger number than 32^2... lots more RAM addresses).
|
| This does not speed things up, though, until you run out of RAM, and
| start having to access the swapfile.
|
| The processor's speed... MHz, GHz, etc., will determine how fast the
| processor itself can process instructions.

Depending on your software, 64 bit processors can be quite a bit
faster. If you're dealing with 64bit integers, using 64bit registers,
etc., a lower clocked 64bit CPU might be faster than a 32bit CPU
clocking at a higher rate. In short: There is no short answer. It
depends on what you're doing.

From what Henning tells us (and what sounds logical to me), grabbing an
Ethernet frame from a NIC and putting it on another NIC doesn't really
change much from 32bit to 64bit.

Your compiler also comes into play. If that is more tuned towards a
certain 32bit architecture (such as i386) than a certain 64bit arch
(because it's less popular, such as sparc64 or hppa64 or mips64),
this will impact your performance quite a bit.

Cheers,

Paul 'WEiRD' de Weerd

+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/

From: Robert C Wittig
Date: Wednesday, October 10, 2007 - 10:34 am

Paul de Weerd wrote:


Oops! that should have read:



If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig 
RAM and a 64 bit CPU, which would be a better choice, in general?


-- 
-wittig http://www.robertwittig.com/
         http://robertwittig.net/
         http://robertwittig.org/
.

From: Paul de Weerd
Date: Wednesday, October 10, 2007 - 12:01 pm

On Wed, Oct 10, 2007 at 12:34:48PM -0500, Robert C Wittig wrote:
| If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig
| RAM and a 64 bit CPU, which would be a better choice, in general?

There is no such generalization. The amount of RAM you need depends on
the task. For firewalling, you don't need lots. For a high-traffic,
caching webserver you do need much.

If, in general, you are firewalling .. you won't need much RAM. If, in
general, you are doing something else, you might need it. Like I said
in my previous mail, there is no short answer. No quick solution.
Everything has advantages and disadvantages. In some cases you may not
even want to run OpenBSD (*shock* !).

In general, you should look at the specific problem at hand and solve
it with the means available.

Cheers,

Paul 'WEiRD' de Weerd

+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/

From: Henning Brauer
Date: Wednesday, October 10, 2007 - 1:20 pm

for a packet filter/router/...? 32bit 2Gig and take a gig out.
for a database server? 64bit and add ram when required.
there is no "in general".

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

From: Ted Unangst
Date: Wednesday, October 10, 2007 - 11:54 am

64-bit and 1 GB.  it's much easier to add another GB RAM later than to
add 32-bits.

From: Jon Radel
Date: Wednesday, October 10, 2007 - 7:47 am

The increase from 2^32 to 2^64 is even more impressive.  ;-)

--Jon Radel

From: Florin Andrei
Date: Tuesday, October 16, 2007 - 2:57 pm

HOLY SH*T! I tried 4.2. It rocks!

Just the first test that I tried after installing it:
- switched gigabit network
- web server behind 1:1 NATing firewall
- firewall is AMD64 X2 2.4GHz
- downloading 2GB file via HTTP through the firewall in infinite loop
- flooding the firewall with small UDP packets, random source IPs, 
generated as fast as my workstation (AMD64 X2 6400, Intel Pro/1000 PCI 
Express card, Linux Fedora 7, running the kernel-level "pktgen" packet 
generator which is very fast) can crank them out. The packets are 
directed to the NATed address of the web server, to a port that's 
blocked by the firewall.
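
To put "as fast as my workstation can crank them out" in perspective, here is a sketch of the theoretical ceiling for small packets on gigabit Ethernet. The actual rate pktgen reached in this test is not stated; the figures below only bound it, assuming minimum-size 64-byte frames.

```python
def max_frames_per_second(frame_bytes=64, line_rate_bps=1_000_000_000):
    """Upper bound on Ethernet frames per second at a given line rate.

    frame_bytes is the frame size including header and FCS; every frame
    also costs 8 bytes of preamble and a 12-byte inter-frame gap on the wire.
    """
    on_wire_bits = (frame_bytes + 8 + 12) * 8
    return line_rate_bps // on_wire_bits


small = max_frames_per_second(64)      # minimum-size frames
large = max_frames_per_second(1518)    # full-size frames (1500-byte MTU)
print(small, large)
```

Small frames allow roughly 1.49 million packets per second against about 81 thousand for full-size frames, which is why a flood of tiny packets stresses the per-packet (interrupt) path so much harder than a bulk download does.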

Under these conditions, OpenBSD 4.1 as a firewall just keels over and 
dies. All traffic through the firewall just stops in an instant.
Linux 2.6.18 fares slightly better, the current download finishes up, 
but another one won't start.

But the default OpenBSD 4.2 i386 uniprocessor kernel doesn't seem to 
care. The download just keeps going. New downloads are initiated OK 
through the firewall. There are even spare CPU cycles left :-) not many 
(10%) but still. There's a very large percentage of CPU (80...90%) used 
for interrupts.

Good job folks, I'm impressed.

Anyone building gigabit routers and firewalls, don't delay, upgrade to 
4.2. Heck, do that even for 100Mbit systems, this type of DoS doesn't 
need much bandwidth to be effective.

I'll keep doing tests. If anything interesting shows up, I'll post the 
results in a new thread.

-- 
Florin Andrei

http://florin.myip.org/

From: James Hartley
Date: Tuesday, October 16, 2007 - 3:27 pm

First, thanks for sharing your findings.

Secondly, does anyone on the mailing list know of an OpenBSD
equivalent to pktgen?

Thanks.

Jim

From: Stuart Henderson
Date: Tuesday, October 16, 2007 - 3:44 pm

Not in-kernel, but netblast from the netrate package is somewhat
useful.

From: Florin Andrei
Date: Tuesday, October 16, 2007 - 5:14 pm

If anybody has a same-hardware performance comparison between pktgen and 
netblast, please post it. I'm especially interested in generating lots 
of small packets, which is difficult.

-- 
Florin Andrei

http://florin.myip.org/

From: Henning Brauer
Date: Wednesday, October 17, 2007 - 1:38 am

lovely :)

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

From: Florin Andrei
Date: Monday, October 8, 2007 - 10:59 am

Disabled all pf rules including NAT, now it's just "pass in ; pass out"
Now the download is able to saturate the gig ports, about 112 Mbyte / sec.
But it's still not constantly at 112; it sometimes drops about 10% below 
that. When that happens, CPU0 has 0% idle cycles. A lot of interrupts, 
always above 70% on CPU0, going to 99% when the download slows down.
The congestion counter is now 0.

The UDP flood still freezes the system solid (but I discovered that the 
system clock continues to work more or less fine, it's just the text 
console and the firewall that are not responsive).

I still can't match the performance I get from Linux. Any suggestion is 
appreciated.

-- 
Florin Andrei

http://florin.myip.org/

From: Karsten McMinn
Date: Monday, October 8, 2007 - 6:05 pm

while it is dreadfully obvious that there is some weirdness
happening, you'll definitely get more performance by
switching to the latest snapshot or waiting for your 4.2 cd
if it hasn't come yet.  What model transport do you have
and what's the mainboard's BIOS rev?

From: Florin Andrei
Date: Tuesday, October 9, 2007 - 9:07 am

Tyan Transport GT24-B3992
BIOS Date: 03/06/07 09:36:13 Ver: 08.00.11

-- 
Florin Andrei

http://florin.myip.org/

From: knitti
Date: Monday, October 8, 2007 - 1:49 pm

there were in the past postings on this list about problems with quad-port
em NICs. I am absolutely not in a position to tell whether they are relevant
for this situation.  If I remember correctly, there was a problem with TCP
checksum offloading, and a suggested fix in one instance was jumpering
the card down to 66 MHz. I can't tell if this is related in *any* way.

I think there are some people here who *could* tell if you'd post a dmesg.

greetings,
knitti

From: Florin Andrei
Date: Monday, October 8, 2007 - 5:28 pm

# dmesg

OpenBSD 4.1 (GENERIC.MP) #1152: Sat Mar 10 19:22:57 MST 2007
     deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3220754432 (3145268K)
avail mem = 2757828608 (2693192K)
using 22937 buffers containing 322281472 bytes (314728K) of memory
mainbus0 (root)
bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf97e0 (61 entries)
bios0: empty empty
acpi0 at mainbus0: rev 2
acpi0: tables DSDT FACP APIC OEMB SRAT
acpitimer at acpi0 not configured
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Dual-Core AMD Opteron(tm) Processor 2216, 2394.33 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: apic clock running at 205MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Dual-Core AMD Opteron(tm) Processor 2216, 2465.82 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu1: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Dual-Core AMD Opteron(tm) Processor 2216, 2465.82 MHz
cpu2: 
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu2: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu2: ...