I'm running into an interesting problem with joining multiple
multicast feeds. If you join multiple multicast feeds using
setsockopt(...,IP_ADD_MEMBERSHIP...) it causes packets on UNRELATED
multicast feeds to get dropped. We have a multicast feed on a rock
solid network, and we were very surprised to see dropped packets. The
cause was a different process/program being run by a different user
joining a bunch of multicast feeds.

I can recreate this with a fairly simple testcase (attached below.)
The problem doesn't happen with unicast UDP data, and it doesn't
happen with loopback, so you need at least two systems to run this
(and what subscriber to netdev doesn't have at least two systems.) To
recreate, run "receiver" on one system, "sender", on another, and then
"joiner" on the receiving system. You should see a message pop out
saying that packets have been dropped. I've recreated this on a few
different kernel versions (the latest being 2.6.28) and a few
different sets of hardware. I HAVEN'T recreated it if the system
doing the IP_ADD_MEMBERSHIP specifies a specific interface rather than
INADDR_ANY. I'm not sure if that is core to the issue or not. You
may also need to bump the value in
/proc/sys/net/ipv4/igmp_max_memberships (though that hasn't seemed
necessary for me.)

I poked around in igmp.c, but its mojo exceeds my threshold. If
anyone has any ideas or questions I'd be happy to hear them.

diff -uNr null/joiner.c multicast/joiner.c
--- null/joiner.c 1969-12-31 18:00:00.000000000 -0600
+++ multicast/joiner.c 2009-03-14 15:04:10.000000000 -0500
@@ -0,0 +1,44 @@
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <unistd.h>
+
+#define NUMSOCK 55
+
+int main(int argc, char **argv)
+{
+ struct ip_mreq mreq;
+ int i;
+ int sd;
+ char ipaddr[64];
+
+ for (i=0; i&l...
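
For readers following the INADDR_ANY versus specific-interface distinction raised in the report, here is a minimal sketch of the two ways to fill in struct ip_mreq before calling setsockopt(..., IP_ADD_MEMBERSHIP, ...). It is not the truncated joiner.c above; fill_mreq and join_group are hypothetical helper names, and the addresses in the usage note are placeholders.

```c
#define _DEFAULT_SOURCE /* glibc: expose struct ip_mreq under strict -std modes */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill in an ip_mreq for `group`. If `iface` is NULL the kernel chooses the
 * interface (INADDR_ANY); otherwise `iface` is the dotted-quad address of the
 * local interface to join on. Returns 0 on success, -1 on a bad address.
 * (Hypothetical helper, not part of the original testcase.)
 */
int fill_mreq(struct ip_mreq *mreq, const char *group, const char *iface)
{
	assert(mreq != NULL && group != NULL);
	memset(mreq, 0, sizeof(*mreq));

	if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
		return -1;

	if (iface == NULL)
		mreq->imr_interface.s_addr = htonl(INADDR_ANY);
	else if (inet_pton(AF_INET, iface, &mreq->imr_interface) != 1)
		return -1;

	return 0;
}

/* Join `group` on UDP socket `sd`; returns the setsockopt() result. */
int join_group(int sd, const char *group, const char *iface)
{
	struct ip_mreq mreq;

	if (fill_mreq(&mreq, group, iface) != 0)
		return -1;
	return setsockopt(sd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
			  &mreq, sizeof(mreq));
}
```

For example, join_group(sd, "239.1.1.1", NULL) leaves the interface choice to the kernel, while join_group(sd, "239.1.1.1", "192.168.1.10") pins the join to the interface that owns that local address.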
I could not reproduce the problem on my machines (bnx2 adapter), even after changing
NUMSOCK from 55 to 200 in joiner.c.

Is your network a 100Mb one or Gigabit?
Could you try slowing down your joiner?
(It could be a flood of IGMP messages your router/switch cannot cope with.)

Please describe your "rock solid" network setup (kind of network adapters you have, kind of router...)
Each time an address is added, the NIC driver has to reprogram the mcfilter of
the device. Maybe some NICs can drop some packets at this moment...

Does using tcpdump to force promiscuous mode on the device also trigger packet losses?
(see also ifconfig ethX promisc|allmulti)
--
Thanks for trying Eric. Based on your email I did some more testing
and thus far I've
only recreated this on x86_64 arches, not on i386. Which arch did you

The problem originally manifested itself at work on a 24-core Dell
server with 6 NICs. The network
is gigabit with a Cisco 4900 switch. I recreated it in my basement on
my little white-box
system and a cheap Netgear switch. The NIC at work uses the Intel e1000e
driver, the one

I haven't had a chance to play with promiscuous yet...
--
Dave B
--
I tried both, 32 and 64 bit kernels. No problems so far.
--
Eric, based on your inability to recreate this, I tried on some other
hardware I had lying around that has an AMD chipset built-in NIC.
I could not recreate the problem on that hardware. I'm starting to
think this is an e1000 problem. In both the e1000 and e1000e
drivers they do the following logic:

	/* clear the old settings from the multicast hash table */
	for (i = 0; i < mta_reg_count; i++) {
		E1000_WRITE_REG_ARRAY(hw, MTA, i, 0);
		E1000_WRITE_FLUSH();
	}

	/* load any remaining addresses into the hash table */
	for (; mc_ptr; mc_ptr = mc_ptr->next) {
		hash_value = e1000_hash_mc_addr(hw, mc_ptr->da_addr);
		e1000_mta_set(hw, hash_value);
	}

There's clearly a window where the NIC doesn't have the multicast
addresses loaded. This may just be broken-as-designed. If anyone
else happens to have some e1000 hardware and wants to see if you
can recreate this, I'd be curious.

Some other notes just FYI...
- RcvbufErrors in /proc/net/snmp doesn't get incremented when this happens
- there are no messages in dmesg
- frames get dropped when the program calls exit() and all the sockets
  get closed (and multicast joins dropped) as well as when the
  ADD_MEMBERSHIPs happen
- The problem happens even when adding a sleep(1) in between each of the
  ADD_MEMBERSHIP calls.

--
Dave B
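
The RcvbufErrors note above can be checked programmatically. The sketch below is one hedged way to pull a single UDP counter out of /proc/net/snmp, assuming the usual layout of a "Udp:" header line of column names followed by a "Udp:" line of values. udp_counter and udp_counter_from_proc are hypothetical helper names. Sampling the counter before and after a test run and seeing no change would be consistent with the frames being lost before they reach the socket buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one UDP counter by name in text laid out like /proc/net/snmp.
 * Returns the counter value, or -1 if the counter cannot be found.
 */
long udp_counter(const char *text, const char *name)
{
	const char *hdr = NULL, *val = NULL, *line = text;
	char hname[64];
	long value;
	int hused, vused;

	assert(text != NULL && name != NULL);

	/* Find the first two lines that start with "Udp:". */
	while (line && *line) {
		if (strncmp(line, "Udp:", 4) == 0) {
			if (!hdr) {
				hdr = line + 4;
			} else {
				val = line + 4;
				break;
			}
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	if (!hdr || !val)
		return -1;

	/* Walk the column names and the values in lockstep. */
	while (sscanf(hdr, " %63s%n", hname, &hused) == 1 &&
	       sscanf(val, " %ld%n", &value, &vused) == 1) {
		if (strcmp(hname, name) == 0)
			return value;
		hdr += hused;
		val += vused;
	}
	return -1;
}

/* Read /proc/net/snmp and return the named UDP counter, or -1 on error. */
long udp_counter_from_proc(const char *name)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/net/snmp", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return udp_counter(buf, name);
}
```

Calling udp_counter_from_proc("RcvbufErrors") before starting "joiner" and again after the receiver reports a drop gives the two values to compare.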
--
Interesting, this code has been there for eons (and probably this
behavior) but that doesn't mean it's not a problem.

We are in the process of figuring out if there are any hardware corner
cases to changing this code (particularly in e1000).

Initial thoughts are:
1) kcalloc an array that we then populate with the hash functions, and
then program every location only once (never flush)
2) only program a single hash value each time a multicast is added (bad
because we can't tell the difference in the list since the last time
the OS gave us the list)

It really seems like this should be fixable, and I agree that the driver
behavior is far from optimal, however well entrenched.

Jesse
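
To make option 1 concrete, here is a small user-space model (not driver code) that compares the existing clear-then-set sequence with building a shadow array and writing each register once. The hash table is modeled as a plain array, MTA_REGS is an illustrative size, and the hash values are taken as already computed, so nothing here depends on e1000_hash_mc_addr(). Each function returns the number of register writes after which the watched group's bit was clear, which is a rough proxy for the window in which frames for that group could be filtered out.

```c
#include <assert.h>
#include <stdint.h>

#define MTA_REGS 128 /* illustrative: 128 x 32-bit hash table registers */

/* Is hash bit `h` currently set in the modeled table? */
int bit_is_set(const uint32_t *mta, unsigned h)
{
	return (mta[h >> 5] >> (h & 31)) & 1u;
}

/*
 * Existing scheme: zero every register, then set one bit per address.
 * Returns how many writes left the `watched` bit clear.
 */
int reload_clear_then_set(uint32_t *mta, const unsigned *hashes, int n,
			  unsigned watched)
{
	int gaps = 0, i;

	for (i = 0; i < MTA_REGS; i++) {
		mta[i] = 0; /* one modeled register write */
		gaps += !bit_is_set(mta, watched);
	}
	for (i = 0; i < n; i++) {
		assert(hashes[i] < MTA_REGS * 32);
		mta[hashes[i] >> 5] |= 1u << (hashes[i] & 31);
		gaps += !bit_is_set(mta, watched);
	}
	return gaps;
}

/*
 * Option 1: build the whole table in memory first, then write each
 * register exactly once with its final value (no erase pass).
 */
int reload_shadow(uint32_t *mta, const unsigned *hashes, int n,
		  unsigned watched)
{
	uint32_t shadow[MTA_REGS] = {0};
	int gaps = 0, i;

	for (i = 0; i < n; i++) {
		assert(hashes[i] < MTA_REGS * 32);
		shadow[hashes[i] >> 5] |= 1u << (hashes[i] & 31);
	}
	for (i = 0; i < MTA_REGS; i++) {
		mta[i] = shadow[i]; /* one modeled register write */
		gaps += !bit_is_set(mta, watched);
	}
	return gaps;
}
```

In this model, a group that is in both the old and the new list never sees its bit cleared under reload_shadow, because its register goes straight from one value with the bit set to another value with the bit set. Whether real hardware behaves the same way during the write is a separate question that the model does not answer.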
--
From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Just do what tg3 does to fix this now, get fancy and "beautiful"
later.
--
On Wed, Mar 18, 2009 at 12:24 PM, Brandeburg, Jesse
Hi Jesse, thanks for the response...
If you go back in this thread I had a dead easy unprivileged user-land testcase
that causes frame loss. We ran into this in a production environment
(and I kind
of glossed over how long it took to figure out why the hell we were dropping
frames...you can only increase rmem_max so many times ;-) OTOH not that many
people use multicast, and even fewer notice a few dropped frames, so the
priority is probably lowish.

On the other other hand, I'm working in the financial trading space these days,
where Linux is pretty much king....and they're all about multicast.

--
Dave B
--
Here is a patch proposal [RFC] only; I've just briefly tested it for e1000
parts. If you want to give it a spin I would appreciate feedback.

[RFC] e1000: fix loss of multicast packets
From: Jesse Brandeburg <jesse.brandeburg@intel.com>
e1000 (and e1000e, igb, ixgbe, ixgb) all do a series of operations each time a
multicast address is added. The flow goes something like

1) stack adds one multicast address
2) stack passes whole current list of unicast and multicast addresses to
   driver
3) driver clears entire list in hardware
4) driver programs each multicast address using iomem in a loop

This was causing multicast packets to be lost during the reprogramming
process.

reference with test program:
http://kerneltrap.org/mailarchive/linux-netdev/2009/3/14/5160514/thread

Thanks to Dave Boutcher for his report and test program.
This driver fix prepares an array all at once in memory and programs it in
one shot to the hardware, not requiring an "erase" cycle. It would still
be possible for packets to be dropped while the receiver is off during
reprogramming.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Dave Boutcher <daveboutcher@gmail.com>
---

drivers/net/e1000/e1000_main.c | 40 +++++++++++++++++++++++++++++++---------
1 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 26474c9..65697ab 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2328,6 +2328,12 @@ static void e1000_set_rx_mode(struct net_device *netdev)
int mta_reg_count = (hw->mac_type == e1000_ich8lan) ?
E1000_NUM_MTA_REGISTERS_ICH8LAN :
E1000_NUM_MTA_REGISTERS;
+ u32 *mcarray = kzalloc(512, GFP_ATOMIC);
+
+ if (!mcarray) {
+ DPRINTK(PROBE, ERR, "memory allocation failed\n");
+ return;
+ }

if (hw->mac_type == e1000_ich8lan)
rar_entries = E1000_RAR_ENTRIES_ICH8LAN;
@@ -2394,22 +2400,38 @@ static void e1000_set_...
Also, does using a third machine to start your joiner program trigger
packet losses too?

--
