Hi, Whenever you run a monitor interface in mac80211, you can see lots of truesize bugs: SKB BUG: Invalid truesize (464) len=3D307, sizeof(sk_buff)=3D176 It appears to be caused by mac80211's re-injection of the transmitted frame. For those not familiar, here's what happens: When a frame comes in on say wlan0's hard_start_xmit(), it is prepared for transmission by the code there (802.11 headers added etc.) and then scheduled to the master interface. Once it arrives on the master (wmaster0) interface's hard_start_xmit(), it is modified again and finally handed to the driver. When the driver has transmitted the frame (successfully or not) it reports the status of the transmission to mac80211 including the skb the driver was given. At that point, things go different depending on circumstances. If no monitor interfaces are present, mac80211 simply orphans the skb and destroys it. If there are monitor interfaces, it pushes some data into the skb (the radiotap transmit status) and hands clones of the skb to netif_rx() for each monitor interface, or the skb itself for the last interface in the list. All this is in net/mac80211/main.c:ieee80211_tx_status. Now, the thing is that the skb truesize bug ONLY occurs when the last part here is done when a radiotap monitor interface is present, if you add dev_kfree_skb(skb); return; in that function somewhere before the skb_orphan() call it never happens. Hence, I'm confused. Since I only have a single monitor interface when this happens, it can't be due to af_packet either, afaict. Can anyone help me diagnose this? johannes
Hm, unrelated to this... But I am wondering what happens if the driver adds a device header to the skb. Is that header then also passed up netif_rx()? This doesn't happen for b43, as we use the DMA fragmentation to transmit the header, Seems the skb->destructor messes it up. -- Greetings Michael. --
Hmm. I thought we said that it was supposed to be removed again by the hardware before TX status reporting. That's what most drivers seem to do anyway. I'm considering turning off this transmit status reporting for now but Actually, it seems to be outside of mac80211, I put in a WARN_ON() and got this: Badness at include/linux/skbuff.h:392 NIP: c026ea14 LR: c0273d54 CTR: c026e9e4 REGS: edfc7c00 TRAP: 0700 Not tainted (2.6.25-wl-06841-g6b3d5c6-dirty) MSR: 00029032 <EE,ME,IR,DR> CR: 82022444 XER: 00000000 TASK =3D edf50e20[3453] 'tcpdump' THREAD: edfc6000 GPR00: 00000001 edfc7cb0 edf50e20 edfd7700 edfd7700 00000002 edfc7e75 03230= 306=20 GPR08: 02000100 00000168 4dff0200 00000150 22022442 100a6290 100783f8 10078= e18=20 GPR16: 10078e14 10078e10 100a0000 00000000 00000000 bfe2c9d2 1004d320 bfe2c= 4b0=20 GPR24: 10165070 edfd7724 00000060 00000020 ed8157f0 edfd7700 ed8157f0 edfd7= 700=20 NIP [c026ea14] sock_rfree+0x30/0x94 LR [c0273d54] skb_release_all+0x98/0x128 Call Trace: [edfc7cb0] [10078e10] 0x10078e10 (unreliable) [edfc7cc0] [c0273d54] skb_release_all+0x98/0x128 [edfc7cd0] [c0273034] __kfree_skb+0x18/0xc8 [edfc7ce0] [c02760d0] skb_free_datagram+0x1c/0x54 [edfc7cf0] [f264d068] packet_recvmsg+0x170/0x1e8 [af_packet] [edfc7d40] [c026b69c] sock_recvmsg+0xb8/0xf0 [edfc7e30] [c026b9d0] sys_recvfrom+0x94/0x100 [edfc7f00] [c026ca08] sys_socketcall+0x114/0x1dc [edfc7f40] [c00124cc] ret_from_syscall+0x0/0x38 johannes
From: Johannes Berg <johannes@sipsolutions.net> You're just seeing who freed it last here. It could have had it's ->truesize put into an illegal state elsewhere. --
Yes, I know, but it doesn't come from my skb_orphan() call. Hence, I just =EF=BB=BFnetif_rx() the packet which makes it go onto the input_pkt_qu= eue and then to netif_receive_skb() which gives it to af_packet and all others should ignore it since I set =EF=BB=BFPACKET_OTHERHOST. johannes
From: Johannes Berg <johannes@sipsolutions.net> I looked at the mac80211 code, the problem is the skb_push() you guys do in this situation. Things like loopback, which also orphan then reinject, don't trigger this problem because the re-input path trims things, never adds. The good news is that this is easy to fix. Since you've orphaned the SKB, simply adjust skb->truesize as you do pushes. Like this: mac80211: Adjust truesize in ieee80211_tx_status() when reinjecting. Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/net/mac80211/main.c b/net/mac80211/main.c index 9ad4e36..de2e904 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -1485,6 +1485,9 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb, rthdr = (struct ieee80211_tx_status_rtap_hdr*) skb_push(skb, sizeof(*rthdr)); + /* This is safe because the buffer has been orphaned. */ + skb->truesize += sizeof(*rthdr); + memset(rthdr, 0, sizeof(*rthdr)); rthdr->hdr.it_len = cpu_to_le16(sizeof(*rthdr)); rthdr->hdr.it_present =
Hmm. The disconnect between truesize and skb->len+sizeof(*skb) was usually 17 or 19 bytes and sizeof(*rthdr) is only 11. On the other hand, I don't see where the other bytes should be coming from. I'll give this a try, thanks. johannes
From: Johannes Berg <johannes@sipsolutions.net> Grrr, I bet it's coming from a combination of the skb_set_mac_header(skb, 0); call done by mac80211 and the skb_push() calls in net/packet/af_packet.c davem@sunset:~/src/GIT/net-2.6$ egrep skb_push net/packet/af_packet.c skb_push(skb, skb->data - skb_mac_header(skb)); skb_push(skb, skb->data - skb_mac_header(skb)); skb_push(skb, skb->data - skb_mac_header(skb)); davem@sunset:~/src/GIT/net-2.6$ --
Even when I explicitly set truesize (rather than adjusting it as you But mac80211 does set_mac_header(0) so this should just push zero bytes, no? johannes
From: Johannes Berg <johannes@sipsolutions.net> Right you are. So, I wonder what's causing the problem... Could you "remember" the length and truesize at the skb_orphan() point in mac80211, right after the skb_push(), then in the truesize warning, print those "remembered" values as well as the current ones. --
I was just playing with af_packet and added some debugging there that prints out the len of all packets it gets (for a certain ifidx) That's confusing me even more now. I get [ 7650.792004] packet_recv eda9e8c0 (len=3D137) [ 7650.792015] snaplen(eda9e8c0)=3D137 [ 7650.792027] free eda9e8c0, len =3D 137 [ 7650.792031] new skb: eda9e540 [ 7650.792039] packet_recv eda9e8c0 (len=3D137) [ 7650.792044] snaplen(eda9e8c0)=3D137 [ 7650.792048] new skb: eda9e8c0 [ 7650.819464] packet_recv d1f4e9a0 (len=3D124) [ 7650.819478] snaplen(d1f4e9a0)=3D124 [ 7650.819489] free d1f4e9a0, len =3D 124 [ 7650.819493] new skb: d1f4e8c0 [ 7650.819502] packet_recv d1f4e9a0 (len=3D124) [ 7650.819507] snaplen(d1f4e9a0)=3D124 [ 7650.819511] new skb: d1f4e9a0 [ 7651.215631] packet_recv e9ecc2a0 (len=3D376) [ 7651.215645] snaplen(e9ecc2a0)=3D376 [ 7651.215657] free e9ecc2a0, len =3D 376 [ 7651.215662] new skb: ede04b60 [ 7651.215671] packet_recv e9ecc2a0 (len=3D376) [ 7651.215675] snaplen(e9ecc2a0)=3D376 [ 7651.215680] new skb: e9ecc2a0 [ 7651.760751] SKB BUG: Invalid truesize (528) len=3D357, sizeof(sk_buff)= =3D176 528-176 is 352 which doesn't occur in that list... Maybe I should print it in mac80211. johannes
skb->truesize should always account the skb->head area in its entirety so we should never need to adjust it when pushing or pulling. So I suggest we find the place that expanded the head area and make the adjustment there. Alternative we could adjust it right after the orphan call if the expansion occurs where we can't adjust the truesize. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
From: Herbert Xu <herbert@gondor.apana.org.au>
That makes more sense, good catch Herbert.
I guess it's the pskb_expand_head() calls done by net/mac80211/tx.c
I suspect we'll need to orphan early in order to accomodate these
adjustments, otherwise socket memory buffer allocations will
be corrupted.
Once that is cured, I think we can detect this better, by adding a
carefully constructed assertion to pskb_expand_head(). Basically, the
idea is, if "nhead" or "ntail" are non-zero, and there is a socket
still attached to the SKB, print a warning message.
Something like:
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4fe605f..9bfca08 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -699,6 +699,12 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
if (skb_shared(skb))
BUG();
+ if (unlikely((nhead || ntail) && skb->sk)) {
+ printk(KERN_ERR "SKB BUG: Illegal pskb expand (%d:%d) "
+ "with socket attached\n",
+ nhead, ntail);
+ }
+
size = SKB_DATA_ALIGN(size);
data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
--
Ok I think I'm starting to understand this a little better. However, shouldn't this function update skb->truesize so if the skb is later attached to a different socket again it has the right size? johannes
From: Johannes Berg <johannes@sipsolutions.net> Relax :-) We certainly could check that there is no socket attached here, and make the truesize adjustment right at this spot. It just never happened before in practice in a way that matters. That's why we have the truesize assertion, to discover situations like this and thus be able to fix it. --
Except unfortunately the truesize assertion is rather useless since you have no idea where it comes from. FWIW, some caller that does the adjustment must be going wrong, whenever I start vpnc I get a single one like this: [ 162.108556] SKB BUG: Invalid truesize (408) size=3D432, sizeof(sk_buff)=3D176 This is again without the patch to pskb_expand_head that did the truesize adjustment, I only put in a WARN_ON (similar to the code you had above but is, I think, more useful since it has a stack dump and other useful info) Right now I think I'm too lazy to dig into where this happens. I don't hit the warning in pskb_expand_head so it must be one of the other 20-odd places where truesize is adjusted. Maybe I'll just make each of them print out the info. johannes
I'm confused now. I added this patch:
--- everything.orig/net/core/skbuff.c 2008-05-01 13:21:52.000000000 +0200
+++ everything/net/core/skbuff.c 2008-05-01 13:29:57.000000000 +0200
@@ -683,6 +683,14 @@ int pskb_expand_head(struct sk_buff *skb
if (!data)
goto nodata;
=20
+ if (unlikely((nhead || ntail) && skb->sk)) {
+ printk(KERN_ERR "SKB BUG: Illegal pskb expand (%d:%d) "
+ "with socket attached\n",
+ nhead, ntail);
+ dump_stack();
+ } else
+ skb->truesize =3D size + sizeof(struct sk_buff);
+
/* Copy only real data... and, alas, header. This should be
* optimized for the cases when header is void. */
#ifdef NET_SKBUFF_DATA_USES_OFFSET
--- everything.orig/net/mac80211/tx.c 2008-05-01 13:01:09.000000000 +0200
+++ everything/net/mac80211/tx.c 2008-05-01 13:16:50.000000000 +0200
@@ -1279,6 +1279,8 @@ int ieee80211_master_start_xmit(struct s
int headroom;
int ret;
=20
+ skb_orphan(skb);
+
if (info->flags & IEEE80211_TX_CTL_READY_FOR_TX) {
/*
* We set the IEEE80211_TX_CTL_READY_FOR_TX bit in all skbs
@@ -1581,6 +1583,7 @@ int ieee80211_subif_start_xmit(struct sk
* us broadcast frames. */
=20
if (head_need > 0 || skb_cloned(skb)) {
+ skb_orphan(skb);
#if 0
printk(KERN_DEBUG "%s: need to reallocate buffer for %d bytes "
"of headroom\n", dev->name, head_need);
and the assertion never triggers, however I get a number of bugs like this:
=EF=BB=BFSKB BUG: Invalid truesize (4294963740) len=3D44, sizeof(sk_buff)=
=3D176, skb=3D0xeecb6620
and definitely cannot explain that number (-3556).
johannes
From: David Miller <davem@davemloft.net> I think the mac80211 encryption paths on TX could be doing this as well, just something I noticed meanwhile. --
Indeed. But then why did we never see this bug w/o monitor interfaces and this reinjection? johannes
The debugging only catches it if the expanded area actually gets used, e.g., by skb_push. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
I'm confused. The area should be used say with encryption when it's actually necessary. Maybe there's always enough headroom for some reason now? On another note, why is this truesize mismatch a bug anyway? I mean, the field could just be called "socket_charged_size" and simply be required to have the same value throughout the skb lifetime, the slight mismatch between charged bytes and actually used bytes wouldn't usually matter too much, would it? johannes
From: Johannes Berg <johannes@sipsolutions.net> That's possible. There are paths, that, although they could take advantage of the fact that all of the non-header packet data is going to come from Sure the name could change, but the problem wouldn't. We can't update skb->truesize during arbitray skb->data reallocations, because it could corrupt the socket accounting. On the other hand, if we provide ways for users to subvert the socket buffer limits, we might as well not try to limit anything. Take a look at some ethernet drivers that implement TSO in a way that requires munging the IP headers for whatever reason. If they need to COW the packet data in order to modify it, they always do this with pskb_expand_head(skb, 0, 0, GFP_*) exactly so that they don't modify the SKB data size, and exactly so that the skb->truesize value stays accurate. Try to find out exactly what's going wrong here, you seem to be close. Once we know the precise issue we can talk about real changes to make this easier to cope with and debug in the future. --
Well, not exactly, since we'd only do that (at least in mac80211) when packets are about to be sent to the hardware so they wouldn't live much Right, but we might actually need more space. Say you have a device that requires 82 bytes headroom (yes, there are such devices) for their own transmit header. Then you need maybe up to 30 bytes of 802.11 header plus 8 byte ICV, so minus the 14 ethernet header that we remove we now need well over 100 bytes headroom. On the other hand, not even accounting the actual data buffer (you proposed to skb_orphan the skb early) seems wrong as well. Worse yet, the needed transmit header headroom is variable depending on devices. One of the worst devices is the Broadcom one with 82 header and nowadays actually DMAs this header from a separate memory location, so there this won't happen, but can we guarantee that all devices are programmable I'll try. I don't really see myself being that close ;) johannes
The worst case is probably prism54 usb devices which by itself need 76 bytes headroom for the USB buffer, and then when we say run mesh on top of it we'll need a total of 122 bytes. Needless to say, it cannot do s/g operation. The question is: how do we handle that? Do we reallocate the buffer in the driver? That is well possible but makes it rather inconvenient for driver authors. Also, mac80211 will still need 46 bytes of headroom and 12 bytes of tailroom in the worst case (so far, HT might require four more.) If we skb_orphan() the skb right away we have essentially removed all socket memory accounting, so that's pretty pointless. Should we increase the LL_MAX_HEADER constant to 40 (no mesh networking) or 46 for when 802.11 (with mesh networking) is configured into the kernel? Most people probably don't run an IPIP tunnel over wireless yet configure them in (especially distros) so that might be why we never saw the problem before. johannes
From: Johannes Berg <johannes@sipsolutions.net> Date: Sat, 03 May 2008 16:25:20 +0200 [ Ok, I'm going through all of this, we'll solve the problems one at Thanks for these numbers. Based upon them I will adjust LL_MAX_HEADER as follows: net: Set LL_MAX_HEADER properly for wireless. Wireless networking, particularly with MESH enabled, has quite strong requirements for link-layer header space. Based upon some numbers and descriptions from Johannes Berg we use 96 (same as AX25) for plain wireless, and with mesh enabled we use 128. In the process, simplify the cpp conditional logic here by ordering the cases by those needing the most space down to those needing the least case. Signed-off-by: David S. Miller <davem@davemloft.net> --- include/linux/netdevice.h | 16 +++++++++------- 1 files changed, 9 insertions(+), 7 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7469017..a3fb57f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -93,14 +93,16 @@ struct wireless_dev; * used. */ -#if !defined(CONFIG_AX25) && !defined(CONFIG_AX25_MODULE) && !defined(CONFIG_TR) -#define LL_MAX_HEADER 32 +#if defined(CONFIG_WLAN_80211) || defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE) +# if defined(CONFIG_MAC80211_MESH) +# define LL_MAX_HEADER 128 +# else +# define LL_MAX_HEADER 96 +# endif +#elif defined(CONFIG_TR) +# define LL_MAX_HEADER 48 #else -#if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE) -#define LL_MAX_HEADER 96 -#else -#define LL_MAX_HEADER 48 -#endif +# define LL_MAX_HEADER 32 #endif #if !defined(CONFIG_NET_IPIP) && !defined(CONFIG_NET_IPIP_MODULE) && \ -- 1.5.5.1.57.g5909c --
Is WLAN_80211 really the right symbol here? It applies to more than just mac80211, and I doubt if full MAC devices need higher values. It seems like MAC80211 would be the right symbol to add. John -- John W. Linville linville@tuxdriver.com --
Most fullmac devices also need some space, whether they do things that way or not is a completely different thing, the ieee80211 stack for example copies every packet into a new skb anyway. johannes
On Tue, May 13, 2008 at 11:59 PM, Johannes Berg It's really depends on when is the 802.3 to 802.11 header translation made it doesn't matter whether it's fullmac or softmac. I would say that WLAN is OK. Tomas --
True, some devices take 802.3 frames and do the translation and everything themselves. I dunno. I guess it would make sense to make it mac80211 anyway because ieee80211 doesn't care and the other drivers are probably handling this in some other way anyway (if maybe even by using different dma descriptors, or just not caring because transfer speeds are low enough) johannes
From: Johannes Berg <johannes@sipsolutions.net> We should fix ieee80211 (it's something I planned to look into today in fact), and the drivers that copy or do things incorrectly. It doesn't make sense to make this decision based upon which 80211 stack or which drivers are enabled. All of them in essence want this extra space, so let's give it to them. --
Why don't we update the socket allocation when doing pskb_expand_head()? We need more space though. Should we then just increase the built-in headroom? johannes
From: Johannes Berg <johannes@sipsolutions.net> The socket locked state at this time is variable and unknown. The socket must be locked in order to modify these values. And such locks cannot be taken, for example, from HW interrupt I simply don't know what to suggest at this point, that's why we are having this discussion :-) --
I'm still not sure about the dependencies between LL_MAX_HEADER,
dev->hard_header_len and similar. Why, for example, does ipip set it to
LL_MAX_HEADER + sizeof(struct iphdr)? Because it doesn't know better
since the packets it creates could be routed anywhere?
Could mac80211 announce it needs a very long hard_header_len (say 48 (or
54) bytes)? Am I right in thinking that then we'd have to increase
LL_MAX_HEADER as well? I haven't found a check somewhere that warns you
if you set dev->hard_header_len > LL_MAX_HEADER, should there be one?
If I increase dev->hard_header_len, will that have any negative impact
on the caching since I'm still just using regular ethernet headers?
As far as I understand we have a few options:
(a) go along with it as we do now, use pskb_expand_head, just call
skb_orphan before. I assume this has a number of requirements just
like sock size accounting would have, does this work from within a
hard_start_xmit path? I haven't seen any problems with it so far
anyway.
(b) clone the skb and free the original. pretty much equivalent
(c) increase hard_header_len/LL_MAX_HEADER constants to 48 (54).
Options (a) and (b) make the accounting pretty useless since that would
drop the charge to the socket quite early. (c) doesn't seem to work, I
tried =EF=BB=BFjust increasing LL_MAX_HEADER doesn't seem to help although
MAX_TCP_HEADER suggests it should be getting enough headroom then.
Ideally, we'd have enough headroom in the skb to start with, since right
now we're apparently reallocating a lot, especially encrypted frames.
Not that I understand why we don't get a truesize bug (without the
monitors) when we do that.
With smart hardware like b43 we could even think about putting the
802.11 header stuff into a separate buffer and have the hardware to
gather-dma but there are so many dumb usb devices that this won't help
much anyway.
johannes
The difference between LL_MAX_HEADER and hard_header_len is that the former is a system-wide hint of how big the header can be and the latter is a device-specific value. In other words, we use the former when we don't know where we create a packet since we don't know where it'll end up. As such we allocate a generous amount of header space so that hopefully nobody will have to expand the header later. However, it would be OK for someone to expand it but obviously it this happened for the majority of your traffic then you've done something wrong since expanding it wastes CPU resources by copying the data. The value of hard_header_len on the other hand is used for two purposes at least. First of all it is a hint of how much free header space there should be in a packet before it goes into a device (but don't rely on it). More importantly it's the length of the hardware header in the packet. The definition of the hardware header varies with each type of device. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Ok, thanks for the explanation. The latter probably means that I cannot use hard_header_len since mac80211 really needs devices to show up as ethernet devices. However, it wouldn't really help anyway if I cannot rely on it being available. So, what is mac80211 supposed to do? It needs up to 54 bytes of available headroom (for an encrypted mesh packet which currently can't really happen, but anyway) yet it cannot pskb_expand_head() either. Cloning each packet seems even more expensive, and just like skb_orphan subverts the purpose of the socket accounting. johannes
If all/the majority of your packets need the space then put it in LL_MAX_HEADER. In any case, you should always expand the packet if necessary in your output routine since LL_MAX_HEADER is just a hint. Yes cloning is expensive compared to not having to do it, but as long as this is only done for the exception then it's irrelevant. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Yes, wireless always needs at least 24 bytes, but more likely 34 (encryption+QoS). However, I just increased LL_MAX_HEADER to 54 and that What's wrong with, instead, doing skb_orphan() and then pskb_expand_head()? That seems to have the same effect. johannes
If you packet sticks around for long enough then this skews the accounting. In any case, thinking too much about optimising this part is a waste of time because we should be thinking about having enough head room in the packet so that we don't have to expand it in the first place except for the odd packet. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
I stuck a WARN_ON((nhead || ntail) && skb->sk) into pskb_expand_head (which never triggered except from mac80211). And mac80211 has code that calculates the required header length and only calls pskb_expand_head() from that place when it needs more header space (or the skb is cloned. I Very true. How about tail length? :) johannes
I just re-did the test, and I definitely need 29 more bytes on, for example, the IPv6 autodiscovery packets and ICMP packets generated with ping(8). Some of them even need additional tailroom. johannes
Could you find out the code path that created the skb in the kernel so we can see why there isn't enough head room? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Yeah I was just considering putting a stacktrace into the skb when it's allocated. Haven't done, lunchtime now. johannes
http://johannes.sipsolutions.net/patches/kernel/all/2008-05-04-11%3a23/027-= skb-alloc-stackdump.patch will report results after kernel rebuild. johannes
7-skb-alloc-stackdump.patch This was broken when cloning, this one works: http://johannes.sipsolutions.net/patches/kernel/all/2008-05-04-12%3a19/027-= skb-alloc-stackdump.patch Now I see the problem. I increased the LL_MAX_HEADER constant, but all code uses dev->hard_header_len to allocate the headroom (via LL_RESERVED_SPACE), e.g. packet.c: packet_sendmsg_spkt: skb =3D sock_wmalloc(sk, len + LL_RESERVED_SPACE(dev), 0, GFP_KERNE= L); =20 =20 [...] /* FIXME: Save some space for broken drivers that write a * hard header at transmission time by themselves. PPP is the * notable one here. This should really be fixed at the driver leve= l. */ skb_reserve(skb, LL_RESERVED_SPACE(dev)); This one even complains about "broken drivers" (like PPP, but wireless code behaves like this too). This is getting really really really frustrating. All kinds of comments all over tell you how this all is wrong but NEVER actually tell you how to do it correctly! As far as I understand, I cannot change dev->hard_header_len because I want/need an ethernet header and not more. But then I don't get enough headroom. Oddly enough, I even get a warning from tcp_connect() although that actually does /* Reserve space for headers. */ skb_reserve(buff, MAX_TCP_HEADER); which should definitely be sufficient. Maybe the packet gets cloned somewhere? On the other hand, IPv4 raw's =EF=BB=BFraw_send_hdrinc reserves LL_RESERVED_SPACE(rt->u.dst.dev) like af_packet.... johannes
Right hard_header_len is being overloaded with two meanings. On the one hand it's used to indicate the length of the header managed by the hardware header cache, on the other hand we use it to indicate the head room for those devices that don't have a cache. Since your device wants both, it fails miserably. There are three ways out of this. You could have your own cache (I haven't looked at your headers so I don't know whether that makes sense), have no cache at all, or add a new field that gets added to hard_header_len for the purposes of calculating head room. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
In fact, I should probably add another field for the needed tail length too since we need that as well. johannes
This doesn't really make all that much sense since 802.11 has not only variable length headers but also needs a fixed BSSID in the header and That wouldn't be good either I think since we can actually use the ethernet header quite well. johannes
Actually this is not as bad as it sounds. Since you're constructing the hardware header anyway, it may turn out to be easier to tack on the Ethernet header while you're at it. Please note that the hardware cache is not the same as the neighbour cache. Just because you lose the hardware cache doesn't mean that you'd lose the neighbour (ARP) cache. However, this would mean that you'd no longer have an Ethernet device which is both good and bad. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
We actuall remove the ethernet header and rewrite it to a hardware Yeah. I don't think that'll go down well with userspace. johannes
I tried the last option, patch below, didn't quite work out yet, but I
got:
[ 256.241496] KERNEL: assertion (!atomic_read(&sk->sk_wmem_alloc))
failed at net/packet/af_packet.c (225)
any ideas about that?
johannes
---
include/linux/netdevice.h | 10 ++++++++--
net/core/netpoll.c | 2 +-
net/econet/af_econet.c | 2 +-
net/ipv4/arp.c | 2 +-
net/ipv4/igmp.c | 4 ++--
net/ipv4/ip_forward.c | 4 +++-
net/ipv4/ipconfig.c | 6 +++---
net/ipv4/ipip.c | 1 +
net/ipv4/ipmr.c | 1 +
net/ipv4/ipvs/ip_vs_xmit.c | 1 +
net/ipv4/raw.c | 10 ++++------
net/ipv6/ip6_output.c | 3 ++-
net/ipv6/mcast.c | 4 ++--
net/ipv6/ndisc.c | 4 ++--
net/ipv6/raw.c | 10 ++++------
net/ipv6/sit.c | 2 ++
net/packet/af_packet.c | 2 +-
net/xfrm/xfrm_output.c | 11 ++++++-----
18 files changed, 45 insertions(+), 34 deletions(-)
--- everything.orig/include/linux/netdevice.h 2008-05-04 14:48:47.000000000 +0200
+++ everything/include/linux/netdevice.h 2008-05-04 15:34:04.000000000 +0200
@@ -254,11 +254,16 @@ struct hh_cache
*
* We could use other alignment values, but we must maintain the
* relationship HH alignment <= LL alignment.
+ *
+ * LL_ALLOCATED_SPACE also takes into account the tailroom the device
+ * may need.
*/
#define LL_RESERVED_SPACE(dev) \
- (((dev)->hard_header_len&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+ ((((dev)->hard_header_len+(dev)->needed_headroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
#define LL_RESERVED_SPACE_EXTRA(dev,extra) \
- ((((dev)->hard_header_len+extra)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+ ((((dev)->hard_header_len+(dev)->needed_headroom+(extra))&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+#define LL_ALLOCATED_SPACE(dev) \
+ ((((dev)->hard_header_len+(dev)->needed_headroom+(dev)->needed_tailroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
struct header_ops {
int (*create) ...Except for that, the patch works fine with mac80211 setting the new netdev fields. I still get lots of reallocations though because I get a lot of cloned packets, not sure why. johannes
Oddly, I haven't been able to reproduce this, so I'll clean up the patches now. If I have a device that needs 100 bytes headroom (not unheard of), there will of course still be packets that don't have enough headroom. Should we skb_orphan them? johannes
From: Johannes Berg <johannes@sipsolutions.net> This is exactly what happens if you change skb->truesize when a socket is still attached to the skb. --
Ok, that makes sense. Then it also makes sense I never got that warning again, I had left one of my test patches in too long. johannes
From: Herbert Xu <herbert@gondor.apana.org.au> I know what causes this problem, things like ARP. They don't use LL_MAX_HEADER, and instead go: skb = alloc_skb(arp_hdr_len(dev) + LL_RESERVED_SPACE(dev), GFP_ATOMIC); because they are reasonably sure what exact device they are sending out of :-) As mentioned elsewhere, there is a disconnect between how some of these values are used. But I'm pretty sure I know why some of them are used this way. --
Hmm. I wonder if the reason might be that the skb itself is long enough, but has so much tailroom that it doesn't have enough headroom, but when we reallocate it to have more headroom we actually later don't use the extra tailroom. That would explain why I don't get truesize bugs although I do get a warning (from the modified pskb_expand_head()) that I mustn't reallocate the buffer. johannes
Yes that would be a meaningful improvement although we'd need to audit/test this to make sure that we don't spam people's logs with it. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
It does spam the log. A lot. And I don't know why, from this discussion I only thought that it shouldn't. johannes
This was a stupid mistake, if you do it correctly it actually works and
so far has only triggered a single warning on my system:
[ 217.507048] SKB BUG: Invalid truesize (4294964120) size=432, sizeof(sk_buff)=176
that was with my patch though to update skb->truesize during !skb->sk
pskb_expand_head() calls.
johannes
---
include/linux/skbuff.h | 8 ++++++--
net/core/skbuff.c | 10 ++++++++--
2 files changed, 14 insertions(+), 4 deletions(-)
--- everything.orig/include/linux/skbuff.h 2008-05-03 15:47:00.000000000 +0200
+++ everything/include/linux/skbuff.h 2008-05-04 00:30:34.000000000 +0200
@@ -387,9 +387,13 @@ extern void skb_truesize_bug(struc
static inline void skb_truesize_check(struct sk_buff *skb)
{
- int len = sizeof(struct sk_buff) + skb->len;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+ int len = sizeof(struct sk_buff) + skb->end;
+#else
+ int len = sizeof(struct sk_buff) + (skb->end - skb->head);
+#endif
- if (unlikely((int)skb->truesize < len))
+ if (unlikely((int)skb->truesize != len))
skb_truesize_bug(skb);
}
--- everything.orig/net/core/skbuff.c 2008-05-03 16:29:23.000000000 +0200
+++ everything/net/core/skbuff.c 2008-05-04 00:31:32.000000000 +0200
@@ -151,9 +151,15 @@ void skb_under_panic(struct sk_buff *skb
void skb_truesize_bug(struct sk_buff *skb)
{
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+ int len = sizeof(struct sk_buff) + skb->end;
+#else
+ int len = sizeof(struct sk_buff) + (skb->end - skb->head);
+#endif
+
printk(KERN_ERR "SKB BUG: Invalid truesize (%u) "
- "len=%u, sizeof(sk_buff)=%Zd\n",
- skb->truesize, skb->len, sizeof(struct sk_buff));
+ "size=%u, sizeof(sk_buff)=%Zd\n",
+ skb->truesize, len, sizeof(struct sk_buff));
}
EXPORT_SYMBOL(skb_truesize_bug);
--
RnJvbTogSm9oYW5uZXMgQmVyZyA8am9oYW5uZXNAc2lwc29sdXRpb25zLm5ldD4NCkRhdGU6IFN1 biwgMDQgTWF5IDIwMDggMDA6NTY6MjMgKzAyMDANCg0KPiBUaGlzIHdhcyBhIHN0dXBpZCBtaXN0 YWtlLCBpZiB5b3UgZG8gaXQgY29ycmVjdGx5IGl0IGFjdHVhbGx5IHdvcmtzIGFuZA0KPiBzbyBm YXIgaGFzIG9ubHkgdHJpZ2dlcmVkIGEgc2luZ2xlIHdhcm5pbmcgb24gbXkgc3lzdGVtOg0KPiAN Cj4gWyAgMjE3LjUwNzA0OF0gU0tCIEJVRzogSW52YWxpZCB0cnVlc2l6ZSAoNDI5NDk2NDEyMCkg c2l6ZT00MzIsIHNpemVvZihza19idWZmKT0xNzYNCj4g77u/DQo+IHRoYXQgd2FzIHdpdGggbXkg cGF0Y2ggdGhvdWdoIHRvIHVwZGF0ZSBza2ItPnRydWVzaXplIGR1cmluZyAhc2tiLT5zaw0KPiBw c2tiX2V4cGFuZF9oZWFkKCkgY2FsbHMuDQoNClRoYW5rcyBhIGxvdCBmb3IgdHJhY2tpbmcgdGhp cyBkb3duIEpvaGFubmVzLg0K --
No, no, I haven't. The only thing that was a stupid mistake was my change to check the actual allocated skb size in the truesize bug rather than do the < len check... That works, when you do it as in that patch, but the mac80211 stuff still stands (along with all the questions I posted elsewhere in this thread) johannes
This could actually work :) I was worried about tunnelling doing a genuine expansion on such a packet but it turns that it does a skb_clone first. So perhaps we should just require that if a packet is to be expanded while attached to a socket then it must be cloned first. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
't WARN_ON()? Not sure. We could try to analyse the packet header and see if it's 802.11, or put a length into skb->cb (after my rework to put things into skb->cb we have a single byte free so we could check skb->len % 256) johannes
Regardless of the fact that I'm pretty sure that we have bugs with respect to pskb_expand_head in mac80211 that for some reason never show up unless you add monitors, this re-injection of frames is quite a hack. For one, it means that on a monitor interface you see every outgoing frame, even if it wasn't sent on that interface. That's actually quite nice, especially for debugging, but I think we can already see all transmitted packets on the master interface. Secondly, it means that actually injected packets show up twice, first via dev_queue_xmit_nit and then via the status reporting. Also, if we ever want to move towards an API where the driver need not give a TX status report for every frame (some hardware cannot do this) then this is at best a hack since we do not know the exact status for a frame. On the other hand, hostapd requires knowing whether a specific frame was acknowledged. Does anyone have any ideas how to implement such a status query for a frame that was sent? What hostapd does is open a raw socket and simply send a frame, but it needs to know whether that frame was acknowledged so currently it also listens on a monitor and checks all outgoing frames, which is suboptimal anyway. The only thing I could so far think of is like the socket timestamps but that seems rather bad to do for such a special case. johannes
