Hi,
Whenever you run a monitor interface in mac80211, you can see lots of
truesize bugs:SKB BUG: Invalid truesize (464) len=3D307, sizeof(sk_buff)=3D176
It appears to be caused by mac80211's re-injection of the transmitted
frame. For those not familiar, here's what happens:When a frame comes in on say wlan0's hard_start_xmit(), it is prepared
for transmission by the code there (802.11 headers added etc.) and then
scheduled to the master interface. Once it arrives on the master
(wmaster0) interface's hard_start_xmit(), it is modified again and
finally handed to the driver.When the driver has transmitted the frame (successfully or not) it
reports the status of the transmission to mac80211 including the skb the
driver was given. At that point, things go different depending on
circumstances.If no monitor interfaces are present, mac80211 simply orphans the skb
and destroys it. If there are monitor interfaces, it pushes some data
into the skb (the radiotap transmit status) and hands clones of the skb
to netif_rx() for each monitor interface, or the skb itself for the last
interface in the list.All this is in net/mac80211/main.c:ieee80211_tx_status.
Now, the thing is that the skb truesize bug ONLY occurs when the last
part here is done when a radiotap monitor interface is present, if you
adddev_kfree_skb(skb);
return;in that function somewhere before the skb_orphan() call it never
happens. Hence, I'm confused. Since I only have a single monitor
interface when this happens, it can't be due to af_packet either,
afaict.Can anyone help me diagnose this?
johannes
Regardless of the fact that I'm pretty sure that we have bugs with
respect to pskb_expand_head in mac80211 that for some reason never show
up unless you add monitors, this re-injection of frames is quite a hack.For one, it means that on a monitor interface you see every outgoing
frame, even if it wasn't sent on that interface. That's actually quite
nice, especially for debugging, but I think we can already see all
transmitted packets on the master interface.Secondly, it means that actually injected packets show up twice, first
via dev_queue_xmit_nit and then via the status reporting.Also, if we ever want to move towards an API where the driver need not
give a TX status report for every frame (some hardware cannot do this)
then this is at best a hack since we do not know the exact status for a
frame.On the other hand, hostapd requires knowing whether a specific frame was
acknowledged.Does anyone have any ideas how to implement such a status query for a
frame that was sent? What hostapd does is open a raw socket and simply
send a frame, but it needs to know whether that frame was acknowledged
so currently it also listens on a monitor and checks all outgoing
frames, which is suboptimal anyway.The only thing I could so far think of is like the socket timestamps but
that seems rather bad to do for such a special case.johannes
Hm, unrelated to this...
But I am wondering what happens if the driver adds a device header to the skb.
Is that header then also passed up netif_rx()?
This doesn't happen for b43, as we use the DMA fragmentation to transmit the header,Seems the skb->destructor messes it up.
--
Greetings Michael.
--
Hmm. I thought we said that it was supposed to be removed again by the
hardware before TX status reporting. That's what most drivers seem to do
anyway.I'm considering turning off this transmit status reporting for now but
Actually, it seems to be outside of mac80211, I put in a WARN_ON() and
got this:Badness at include/linux/skbuff.h:392
NIP: c026ea14 LR: c0273d54 CTR: c026e9e4
REGS: edfc7c00 TRAP: 0700 Not tainted (2.6.25-wl-06841-g6b3d5c6-dirty)
MSR: 00029032 <EE,ME,IR,DR> CR: 82022444 XER: 00000000
TASK =3D edf50e20[3453] 'tcpdump' THREAD: edfc6000
GPR00: 00000001 edfc7cb0 edf50e20 edfd7700 edfd7700 00000002 edfc7e75 03230=
306=20
GPR08: 02000100 00000168 4dff0200 00000150 22022442 100a6290 100783f8 10078=
e18=20
GPR16: 10078e14 10078e10 100a0000 00000000 00000000 bfe2c9d2 1004d320 bfe2c=
4b0=20
GPR24: 10165070 edfd7724 00000060 00000020 ed8157f0 edfd7700 ed8157f0 edfd7=
700=20
NIP [c026ea14] sock_rfree+0x30/0x94
LR [c0273d54] skb_release_all+0x98/0x128
Call Trace:
[edfc7cb0] [10078e10] 0x10078e10 (unreliable)
[edfc7cc0] [c0273d54] skb_release_all+0x98/0x128
[edfc7cd0] [c0273034] __kfree_skb+0x18/0xc8
[edfc7ce0] [c02760d0] skb_free_datagram+0x1c/0x54
[edfc7cf0] [f264d068] packet_recvmsg+0x170/0x1e8 [af_packet]
[edfc7d40] [c026b69c] sock_recvmsg+0xb8/0xf0
[edfc7e30] [c026b9d0] sys_recvfrom+0x94/0x100
[edfc7f00] [c026ca08] sys_socketcall+0x114/0x1dc
[edfc7f40] [c00124cc] ret_from_syscall+0x0/0x38johannes
't WARN_ON()?
Not sure. We could try to analyse the packet header and see if it's
802.11, or put a length into skb->cb (after my rework to put things into
skb->cb we have a single byte free so we could check skb->len % 256)johannes
From: Johannes Berg <johannes@sipsolutions.net>
You're just seeing who freed it last here.
It could have had it's ->truesize put into an illegal state
elsewhere.
--
Yes, I know, but it doesn't come from my skb_orphan() call. Hence, I
just =EF=BB=BFnetif_rx() the packet which makes it go onto the input_pkt_qu=
eue
and then to netif_receive_skb() which gives it to af_packet and all
others should ignore it since I set =EF=BB=BFPACKET_OTHERHOST.johannes
From: Johannes Berg <johannes@sipsolutions.net>
I looked at the mac80211 code, the problem is the skb_push() you
guys do in this situation.Things like loopback, which also orphan then reinject, don't trigger
this problem because the re-input path trims things, never adds.The good news is that this is easy to fix.
Since you've orphaned the SKB, simply adjust skb->truesize as you
do pushes. Like this:mac80211: Adjust truesize in ieee80211_tx_status() when reinjecting.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 9ad4e36..de2e904 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1485,6 +1485,9 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb,
rthdr = (struct ieee80211_tx_status_rtap_hdr*)
skb_push(skb, sizeof(*rthdr));+ /* This is safe because the buffer has been orphaned. */
+ skb->truesize += sizeof(*rthdr);
+
memset(rthdr, 0, sizeof(*rthdr));
rthdr->hdr.it_len = cpu_to_le16(sizeof(*rthdr));
rthdr->hdr.it_present =
skb->truesize should always account the skb->head area in its
entirety so we should never need to adjust it when pushing or
pulling. So I suggest we find the place that expanded the head
area and make the adjustment there. Alternative we could adjust
it right after the orphan call if the expansion occurs where we
can't adjust the truesize.Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
From: Herbert Xu <herbert@gondor.apana.org.au>
That makes more sense, good catch Herbert.
I guess it's the pskb_expand_head() calls done by net/mac80211/tx.c
I suspect we'll need to orphan early in order to accomodate these
adjustments, otherwise socket memory buffer allocations will
be corrupted.Once that is cured, I think we can detect this better, by adding a
carefully constructed assertion to pskb_expand_head(). Basically, the
idea is, if "nhead" or "ntail" are non-zero, and there is a socket
still attached to the SKB, print a warning message.Something like:
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4fe605f..9bfca08 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -699,6 +699,12 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
if (skb_shared(skb))
BUG();+ if (unlikely((nhead || ntail) && skb->sk)) {
+ printk(KERN_ERR "SKB BUG: Illegal pskb expand (%d:%d) "
+ "with socket attached\n",
+ nhead, ntail);
+ }
+
size = SKB_DATA_ALIGN(size);data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
--
This could actually work :) I was worried about tunnelling doing
a genuine expansion on such a packet but it turns that it does a
skb_clone first. So perhaps we should just require that if a
packet is to be expanded while attached to a socket then it must
be cloned first.Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
From: David Miller <davem@davemloft.net>
I think the mac80211 encryption paths on TX could be doing this as
well, just something I noticed meanwhile.--
Indeed. But then why did we never see this bug w/o monitor interfaces
and this reinjection?johannes
The debugging only catches it if the expanded area actually
gets used, e.g., by skb_push.Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
Why, btw? It's not too hard to check the allocated size, no?
johannes
Yes that would be a meaningful improvement although we'd need to
audit/test this to make sure that we don't spam people's logs
with it.Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
It does spam the log. A lot. And I don't know why, from this discussion
I only thought that it shouldn't.johannes
This was a stupid mistake, if you do it correctly it actually works and
so far has only triggered a single warning on my system:[ 217.507048] SKB BUG: Invalid truesize (4294964120) size=432, sizeof(sk_buff)=176
that was with my patch though to update skb->truesize during !skb->sk
pskb_expand_head() calls.johannes
---
include/linux/skbuff.h | 8 ++++++--
net/core/skbuff.c | 10 ++++++++--
2 files changed, 14 insertions(+), 4 deletions(-)--- everything.orig/include/linux/skbuff.h 2008-05-03 15:47:00.000000000 +0200
+++ everything/include/linux/skbuff.h 2008-05-04 00:30:34.000000000 +0200
@@ -387,9 +387,13 @@ extern void skb_truesize_bug(strucstatic inline void skb_truesize_check(struct sk_buff *skb)
{
- int len = sizeof(struct sk_buff) + skb->len;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+ int len = sizeof(struct sk_buff) + skb->end;
+#else
+ int len = sizeof(struct sk_buff) + (skb->end - skb->head);
+#endif- if (unlikely((int)skb->truesize < len))
+ if (unlikely((int)skb->truesize != len))
skb_truesize_bug(skb);
}--- everything.orig/net/core/skbuff.c 2008-05-03 16:29:23.000000000 +0200
+++ everything/net/core/skbuff.c 2008-05-04 00:31:32.000000000 +0200
@@ -151,9 +151,15 @@ void skb_under_panic(struct sk_buff *skbvoid skb_truesize_bug(struct sk_buff *skb)
{
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+ int len = sizeof(struct sk_buff) + skb->end;
+#else
+ int len = sizeof(struct sk_buff) + (skb->end - skb->head);
+#endif
+
printk(KERN_ERR "SKB BUG: Invalid truesize (%u) "
- "len=%u, sizeof(sk_buff)=%Zd\n",
- skb->truesize, skb->len, sizeof(struct sk_buff));
+ "size=%u, sizeof(sk_buff)=%Zd\n",
+ skb->truesize, len, sizeof(struct sk_buff));
}
EXPORT_SYMBOL(skb_truesize_bug);--
RnJvbTogSm9oYW5uZXMgQmVyZyA8am9oYW5uZXNAc2lwc29sdXRpb25zLm5ldD4NCkRhdGU6IFN1
biwgMDQgTWF5IDIwMDggMDA6NTY6MjMgKzAyMDANCg0KPiBUaGlzIHdhcyBhIHN0dXBpZCBtaXN0
YWtlLCBpZiB5b3UgZG8gaXQgY29ycmVjdGx5IGl0IGFjdHVhbGx5IHdvcmtzIGFuZA0KPiBzbyBm
YXIgaGFzIG9ubHkgdHJpZ2dlcmVkIGEgc2luZ2xlIHdhcm5pbmcgb24gbXkgc3lzdGVtOg0KPiAN
Cj4gWyAgMjE3LjUwNzA0OF0gU0tCIEJVRzogSW52YWxpZCB0cnVlc2l6ZSAoNDI5NDk2NDEyMCkg
c2l6ZT00MzIsIHNpemVvZihza19idWZmKT0xNzYNCj4g77u/DQo+IHRoYXQgd2FzIHdpdGggbXkg
cGF0Y2ggdGhvdWdoIHRvIHVwZGF0ZSBza2ItPnRydWVzaXplIGR1cmluZyAhc2tiLT5zaw0KPiBw
c2tiX2V4cGFuZF9oZWFkKCkgY2FsbHMuDQoNClRoYW5rcyBhIGxvdCBmb3IgdHJhY2tpbmcgdGhp
cyBkb3duIEpvaGFubmVzLg0K
--
No, no, I haven't. The only thing that was a stupid mistake was my
change to check the actual allocated skb size in the truesize bug rather
than do the < len check... That works, when you do it as in that patch,
but the mac80211 stuff still stands (along with all the questions I
posted elsewhere in this thread)johannes
I'm confused. The area should be used say with encryption when it's
actually necessary. Maybe there's always enough headroom for some reason
now?On another note, why is this truesize mismatch a bug anyway? I mean, the
field could just be called "socket_charged_size" and simply be required
to have the same value throughout the skb lifetime, the slight mismatch
between charged bytes and actually used bytes wouldn't usually matter
too much, would it?johannes
From: Johannes Berg <johannes@sipsolutions.net>
That's possible.
There are paths, that, although they could take advantage of the
fact that all of the non-header packet data is going to come fromSure the name could change, but the problem wouldn't.
We can't update skb->truesize during arbitray skb->data reallocations,
because it could corrupt the socket accounting.On the other hand, if we provide ways for users to subvert the socket
buffer limits, we might as well not try to limit anything.Take a look at some ethernet drivers that implement TSO in a way that
requires munging the IP headers for whatever reason. If they need to
COW the packet data in order to modify it, they always do this with
pskb_expand_head(skb, 0, 0, GFP_*) exactly so that they don't modify
the SKB data size, and exactly so that the skb->truesize value stays
accurate.Try to find out exactly what's going wrong here, you seem to be
close. Once we know the precise issue we can talk about real
changes to make this easier to cope with and debug in the future.
--
Why don't we update the socket allocation when doing pskb_expand_head()?
We need more space though. Should we then just increase the built-in
headroom?johannes
From: Johannes Berg <johannes@sipsolutions.net>
The socket locked state at this time is variable and unknown.
The socket must be locked in order to modify these values.
And such locks cannot be taken, for example, from HW interruptI simply don't know what to suggest at this point, that's why
we are having this discussion :-)
--
I'm still not sure about the dependencies between LL_MAX_HEADER,
dev->hard_header_len and similar. Why, for example, does ipip set it to
LL_MAX_HEADER + sizeof(struct iphdr)? Because it doesn't know better
since the packets it creates could be routed anywhere?Could mac80211 announce it needs a very long hard_header_len (say 48 (or
54) bytes)? Am I right in thinking that then we'd have to increase
LL_MAX_HEADER as well? I haven't found a check somewhere that warns you
if you set dev->hard_header_len > LL_MAX_HEADER, should there be one?If I increase dev->hard_header_len, will that have any negative impact
on the caching since I'm still just using regular ethernet headers?As far as I understand we have a few options:
(a) go along with it as we do now, use pskb_expand_head, just call
skb_orphan before. I assume this has a number of requirements just
like sock size accounting would have, does this work from within a
hard_start_xmit path? I haven't seen any problems with it so far
anyway.
(b) clone the skb and free the original. pretty much equivalent
(c) increase hard_header_len/LL_MAX_HEADER constants to 48 (54).Options (a) and (b) make the accounting pretty useless since that would
drop the charge to the socket quite early. (c) doesn't seem to work, I
tried =EF=BB=BFjust increasing LL_MAX_HEADER doesn't seem to help although
MAX_TCP_HEADER suggests it should be getting enough headroom then.Ideally, we'd have enough headroom in the skb to start with, since right
now we're apparently reallocating a lot, especially encrypted frames.
Not that I understand why we don't get a truesize bug (without the
monitors) when we do that.With smart hardware like b43 we could even think about putting the
802.11 header stuff into a separate buffer and have the hardware to
gather-dma but there are so many dumb usb devices that this won't help
much anyway.johannes
Hmm. I wonder if the reason might be that the skb itself is long enough,
but has so much tailroom that it doesn't have enough headroom, but when
we reallocate it to have more headroom we actually later don't use the
extra tailroom. That would explain why I don't get truesize bugs
although I do get a warning (from the modified pskb_expand_head()) that
I mustn't reallocate the buffer.johannes
The difference between LL_MAX_HEADER and hard_header_len is that
the former is a system-wide hint of how big the header can be
and the latter is a device-specific value.In other words, we use the former when we don't know where we
create a packet since we don't know where it'll end up. As such
we allocate a generous amount of header space so that hopefully
nobody will have to expand the header later. However, it would
be OK for someone to expand it but obviously it this happened
for the majority of your traffic then you've done something wrong
since expanding it wastes CPU resources by copying the data.The value of hard_header_len on the other hand is used for two
purposes at least. First of all it is a hint of how much free
header space there should be in a packet before it goes into a
device (but don't rely on it). More importantly it's the length
of the hardware header in the packet. The definition of the
hardware header varies with each type of device.Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
Ok, thanks for the explanation. The latter probably means that I cannot
use hard_header_len since mac80211 really needs devices to show up as
ethernet devices. However, it wouldn't really help anyway if I cannot
rely on it being available.So, what is mac80211 supposed to do? It needs up to 54 bytes of
available headroom (for an encrypted mesh packet which currently can't
really happen, but anyway) yet it cannot pskb_expand_head() either.
Cloning each packet seems even more expensive, and just like skb_orphan
subverts the purpose of the socket accounting.johannes
If all/the majority of your packets need the space then put it in
LL_MAX_HEADER. In any case, you should always expand the packet
if necessary in your output routine since LL_MAX_HEADER is just a
hint. Yes cloning is expensive compared to not having to do it,
but as long as this is only done for the exception then it's
irrelevant.Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
Yes, wireless always needs at least 24 bytes, but more likely 34
(encryption+QoS). However, I just increased LL_MAX_HEADER to 54 and thatWhat's wrong with, instead, doing skb_orphan() and then
pskb_expand_head()? That seems to have the same effect.johannes
If you packet sticks around for long enough then this skews the
accounting. In any case, thinking too much about optimising this
part is a waste of time because we should be thinking about having
enough head room in the packet so that we don't have to expand
it in the first place except for the odd packet.Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
From: Herbert Xu <herbert@gondor.apana.org.au>
I know what causes this problem, things like ARP.
They don't use LL_MAX_HEADER, and instead go:
skb = alloc_skb(arp_hdr_len(dev) + LL_RESERVED_SPACE(dev), GFP_ATOMIC);
because they are reasonably sure what exact device they
are sending out of :-)As mentioned elsewhere, there is a disconnect between how some
of these values are used. But I'm pretty sure I know why some
of them are used this way.
--
I stuck a WARN_ON((nhead || ntail) && skb->sk) into pskb_expand_head
(which never triggered except from mac80211). And mac80211 has code that
calculates the required header length and only calls pskb_expand_head()
from that place when it needs more header space (or the skb is cloned. IVery true. How about tail length? :)
johannes
I just re-did the test, and I definitely need 29 more bytes on, for
example, the IPv6 autodiscovery packets and ICMP packets generated with
ping(8). Some of them even need additional tailroom.johannes
Could you find out the code path that created the skb in the
kernel so we can see why there isn't enough head room?Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
Yeah I was just considering putting a stacktrace into the skb when it's
allocated. Haven't done, lunchtime now.johannes
http://johannes.sipsolutions.net/patches/kernel/all/2008-05-04-11%3a23/0...
skb-alloc-stackdump.patchwill report results after kernel rebuild.
johannes
7-skb-alloc-stackdump.patch
This was broken when cloning, this one works:
http://johannes.sipsolutions.net/patches/kernel/all/2008-05-04-12%3a19/0...
skb-alloc-stackdump.patchNow I see the problem. I increased the LL_MAX_HEADER constant, but all
code uses dev->hard_header_len to allocate the headroom (via
LL_RESERVED_SPACE), e.g. packet.c:packet_sendmsg_spkt:
skb =3D sock_wmalloc(sk, len + LL_RESERVED_SPACE(dev), 0, GFP_KERNE=
L); =20
=20
[...]
/* FIXME: Save some space for broken drivers that write a
* hard header at transmission time by themselves. PPP is the
* notable one here. This should really be fixed at the driver leve=
l.
*/
skb_reserve(skb, LL_RESERVED_SPACE(dev));This one even complains about "broken drivers" (like PPP, but wireless
code behaves like this too). This is getting really really really
frustrating. All kinds of comments all over tell you how this all is
wrong but NEVER actually tell you how to do it correctly!As far as I understand, I cannot change dev->hard_header_len because I
want/need an ethernet header and not more. But then I don't get enough
headroom.Oddly enough, I even get a warning from tcp_connect() although that
actually does
/* Reserve space for headers. */
skb_reserve(buff, MAX_TCP_HEADER);which should definitely be sufficient. Maybe the packet gets cloned
somewhere? On the other hand, IPv4 raw's =EF=BB=BFraw_send_hdrinc reserves
LL_RESERVED_SPACE(rt->u.dst.dev) like af_packet....johannes
Right hard_header_len is being overloaded with two meanings.
On the one hand it's used to indicate the length of the header
managed by the hardware header cache, on the other hand we use
it to indicate the head room for those devices that don't have
a cache.Since your device wants both, it fails miserably.
There are three ways out of this. You could have your own cache
(I haven't looked at your headers so I don't know whether that
makes sense), have no cache at all, or add a new field that gets
added to hard_header_len for the purposes of calculating head
room.Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
I tried the last option, patch below, didn't quite work out yet, but I
got:[ 256.241496] KERNEL: assertion (!atomic_read(&sk->sk_wmem_alloc))
failed at net/packet/af_packet.c (225)any ideas about that?
johannes
---
include/linux/netdevice.h | 10 ++++++++--
net/core/netpoll.c | 2 +-
net/econet/af_econet.c | 2 +-
net/ipv4/arp.c | 2 +-
net/ipv4/igmp.c | 4 ++--
net/ipv4/ip_forward.c | 4 +++-
net/ipv4/ipconfig.c | 6 +++---
net/ipv4/ipip.c | 1 +
net/ipv4/ipmr.c | 1 +
net/ipv4/ipvs/ip_vs_xmit.c | 1 +
net/ipv4/raw.c | 10 ++++------
net/ipv6/ip6_output.c | 3 ++-
net/ipv6/mcast.c | 4 ++--
net/ipv6/ndisc.c | 4 ++--
net/ipv6/raw.c | 10 ++++------
net/ipv6/sit.c | 2 ++
net/packet/af_packet.c | 2 +-
net/xfrm/xfrm_output.c | 11 ++++++-----
18 files changed, 45 insertions(+), 34 deletions(-)--- everything.orig/include/linux/netdevice.h 2008-05-04 14:48:47.000000000 +0200
+++ everything/include/linux/netdevice.h 2008-05-04 15:34:04.000000000 +0200
@@ -254,11 +254,16 @@ struct hh_cache
*
* We could use other alignment values, but we must maintain the
* relationship HH alignment <= LL alignment.
+ *
+ * LL_ALLOCATED_SPACE also takes into account the tailroom the device
+ * may need.
*/
#define LL_RESERVED_SPACE(dev) \
- (((dev)->hard_header_len&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+ ((((dev)->hard_header_len+(dev)->needed_headroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
#define LL_RESERVED_SPACE_EXTRA(dev,extra) \
- ((((dev)->hard_header_len+extra)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+ ((((dev)->hard_header_len+(dev)->needed_headroom+(extra))&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+#define LL_ALLOCATED_SPACE(dev) \
+ ((((dev)->hard_header_len+(dev)->needed_headroom+(dev)->needed_tailroom)&~(HH_DATA_MOD - 1)) + HH_D...
From: Johannes Berg <johannes@sipsolutions.net>
This is exactly what happens if you change skb->truesize
when a socket is still attached to the skb.
--
Ok, that makes sense. Then it also makes sense I never got that warning
again, I had left one of my test patches in too long.johannes
Except for that, the patch works fine with mac80211 setting the new
netdev fields. I still get lots of reallocations though because I get a
lot of cloned packets, not sure why.johannes
Oddly, I haven't been able to reproduce this, so I'll clean up the
patches now.If I have a device that needs 100 bytes headroom (not unheard of),
there will of course still be packets that don't have enough headroom.
Should we skb_orphan them?johannes
This doesn't really make all that much sense since 802.11 has not only
variable length headers but also needs a fixed BSSID in the header andThat wouldn't be good either I think since we can actually use the
ethernet header quite well.johannes
Actually this is not as bad as it sounds. Since you're constructing
the hardware header anyway, it may turn out to be easier to tack on
the Ethernet header while you're at it.Please note that the hardware cache is not the same as the neighbour
cache. Just because you lose the hardware cache doesn't mean that
you'd lose the neighbour (ARP) cache.However, this would mean that you'd no longer have an Ethernet device
which is both good and bad.Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
We actuall remove the ethernet header and rewrite it to a hardware
Yeah. I don't think that'll go down well with userspace.
johannes
In fact, I should probably add another field for the needed tail length
too since we need that as well.johannes
Well, not exactly, since we'd only do that (at least in mac80211) when
packets are about to be sent to the hardware so they wouldn't live muchRight, but we might actually need more space. Say you have a device that
requires 82 bytes headroom (yes, there are such devices) for their own
transmit header. Then you need maybe up to 30 bytes of 802.11 header
plus 8 byte ICV, so minus the 14 ethernet header that we remove we now
need well over 100 bytes headroom. On the other hand, not even
accounting the actual data buffer (you proposed to skb_orphan the skb
early) seems wrong as well. Worse yet, the needed transmit header
headroom is variable depending on devices.One of the worst devices is the Broadcom one with 82 header and nowadays
actually DMAs this header from a separate memory location, so there this
won't happen, but can we guarantee that all devices are programmableI'll try. I don't really see myself being that close ;)
johannes
The worst case is probably prism54 usb devices which by itself need 76
bytes headroom for the USB buffer, and then when we say run mesh on top
of it we'll need a total of 122 bytes. Needless to say, it cannot do s/g
operation.The question is: how do we handle that? Do we reallocate the buffer in
the driver? That is well possible but makes it rather inconvenient for
driver authors. Also, mac80211 will still need 46 bytes of headroom and
12 bytes of tailroom in the worst case (so far, HT might require four
more.) If we skb_orphan() the skb right away we have essentially removed
all socket memory accounting, so that's pretty pointless.Should we increase the LL_MAX_HEADER constant to 40 (no mesh networking)
or 46 for when 802.11 (with mesh networking) is configured into the
kernel? Most people probably don't run an IPIP tunnel over wireless yet
configure them in (especially distros) so that might be why we never saw
the problem before.johannes
From: Johannes Berg <johannes@sipsolutions.net>
Date: Sat, 03 May 2008 16:25:20 +0200[ Ok, I'm going through all of this, we'll solve the problems one at
Thanks for these numbers.
Based upon them I will adjust LL_MAX_HEADER as follows:
net: Set LL_MAX_HEADER properly for wireless.
Wireless networking, particularly with MESH enabled, has
quite strong requirements for link-layer header space.Based upon some numbers and descriptions from Johannes Berg
we use 96 (same as AX25) for plain wireless, and with
mesh enabled we use 128.In the process, simplify the cpp conditional logic here by
ordering the cases by those needing the most space down
to those needing the least case.Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/linux/netdevice.h | 16 +++++++++-------
1 files changed, 9 insertions(+), 7 deletions(-)diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7469017..a3fb57f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -93,14 +93,16 @@ struct wireless_dev;
* used.
*/-#if !defined(CONFIG_AX25) && !defined(CONFIG_AX25_MODULE) && !defined(CONFIG_TR)
-#define LL_MAX_HEADER 32
+#if defined(CONFIG_WLAN_80211) || defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
+# if defined(CONFIG_MAC80211_MESH)
+# define LL_MAX_HEADER 128
+# else
+# define LL_MAX_HEADER 96
+# endif
+#elif defined(CONFIG_TR)
+# define LL_MAX_HEADER 48
#else
-#if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
-#define LL_MAX_HEADER 96
-#else
-#define LL_MAX_HEADER 48
-#endif
+# define LL_MAX_HEADER 32
#endif#if !defined(CONFIG_NET_IPIP) && !defined(CONFIG_NET_IPIP_MODULE) && \
--
1.5.5.1.57.g5909c--
Is WLAN_80211 really the right symbol here? It applies to more than
just mac80211, and I doubt if full MAC devices need higher values.
It seems like MAC80211 would be the right symbol to add.John
--
John W. Linville
linville@tuxdriver.com
--
Most fullmac devices also need some space, whether they do things that
way or not is a completely different thing, the ieee80211 stack for
example copies every packet into a new skb anyway.johannes
On Tue, May 13, 2008 at 11:59 PM, Johannes Berg
It's really depends on when is the 802.3 to 802.11 header translation
made it doesn't matter whether it's fullmac or softmac. I would say
that WLAN is OK.Tomas
--
True, some devices take 802.3 frames and do the translation and
everything themselves. I dunno. I guess it would make sense to make it
mac80211 anyway because ieee80211 doesn't care and the other drivers are
probably handling this in some other way anyway (if maybe even by using
different dma descriptors, or just not caring because transfer speeds
are low enough)johannes
From: Johannes Berg <johannes@sipsolutions.net>
We should fix ieee80211 (it's something I planned to look into today
in fact), and the drivers that copy or do things incorrectly.It doesn't make sense to make this decision based upon which 80211
stack or which drivers are enabled. All of them in essence want this
extra space, so let's give it to them.
--
I'm confused now. I added this patch:
--- everything.orig/net/core/skbuff.c 2008-05-01 13:21:52.000000000 +0200
+++ everything/net/core/skbuff.c 2008-05-01 13:29:57.000000000 +0200
@@ -683,6 +683,14 @@ int pskb_expand_head(struct sk_buff *skb
if (!data)
goto nodata;
=20
+ if (unlikely((nhead || ntail) && skb->sk)) {
+ printk(KERN_ERR "SKB BUG: Illegal pskb expand (%d:%d) "
+ "with socket attached\n",
+ nhead, ntail);
+ dump_stack();
+ } else
+ skb->truesize =3D size + sizeof(struct sk_buff);
+
/* Copy only real data... and, alas, header. This should be
* optimized for the cases when header is void. */
#ifdef NET_SKBUFF_DATA_USES_OFFSET
--- everything.orig/net/mac80211/tx.c 2008-05-01 13:01:09.000000000 +0200
+++ everything/net/mac80211/tx.c 2008-05-01 13:16:50.000000000 +0200
@@ -1279,6 +1279,8 @@ int ieee80211_master_start_xmit(struct s
int headroom;
int ret;
=20
+ skb_orphan(skb);
+
if (info->flags & IEEE80211_TX_CTL_READY_FOR_TX) {
/*
* We set the IEEE80211_TX_CTL_READY_FOR_TX bit in all skbs
@@ -1581,6 +1583,7 @@ int ieee80211_subif_start_xmit(struct sk
* us broadcast frames. */
=20
if (head_need > 0 || skb_cloned(skb)) {
+ skb_orphan(skb);
#if 0
printk(KERN_DEBUG "%s: need to reallocate buffer for %d bytes "
"of headroom\n", dev->name, head_need);and the assertion never triggers, however I get a number of bugs like this:
=EF=BB=BFSKB BUG: Invalid truesize (4294963740) len=3D44, sizeof(sk_buff)=
=3D176, skb=3D0xeecb6620and definitely cannot explain that number (-3556).
johannes
Also this could be a WARN_ON since here we really want to know who
called it.johannes
Ok I think I'm starting to understand this a little better. However,
shouldn't this function update skb->truesize so if the skb is later
attached to a different socket again it has the right size?johannes
Judging from some of the callers, the caller should. Why?!
johannes
From: Johannes Berg <johannes@sipsolutions.net>
Relax :-)
We certainly could check that there is no socket attached here,
and make the truesize adjustment right at this spot.It just never happened before in practice in a way that matters.
That's why we have the truesize assertion, to discover situations
like this and thus be able to fix it.
--
Except unfortunately the truesize assertion is rather useless since you
have no idea where it comes from. FWIW, some caller that does the
adjustment must be going wrong, whenever I start vpnc I get a single one
like this:[ 162.108556] SKB BUG: Invalid truesize (408) size=3D432,
sizeof(sk_buff)=3D176This is again without the patch to pskb_expand_head that did the
truesize adjustment, I only put in a WARN_ON (similar to the code you
had above but is, I think, more useful since it has a stack dump and
other useful info)Right now I think I'm too lazy to dig into where this happens. I don't
hit the warning in pskb_expand_head so it must be one of the other
20-odd places where truesize is adjusted. Maybe I'll just make each of
them print out the info.johannes
Hmm. The disconnect between truesize and skb->len+sizeof(*skb) was
usually 17 or 19 bytes and sizeof(*rthdr) is only 11. On the other hand,
I don't see where the other bytes should be coming from. I'll give this
a try, thanks.johannes
From: Johannes Berg <johannes@sipsolutions.net>
Grrr, I bet it's coming from a combination of the
skb_set_mac_header(skb, 0); call done by mac80211 and the skb_push()
calls in net/packet/af_packet.cdavem@sunset:~/src/GIT/net-2.6$ egrep skb_push net/packet/af_packet.c
skb_push(skb, skb->data - skb_mac_header(skb));
skb_push(skb, skb->data - skb_mac_header(skb));
skb_push(skb, skb->data - skb_mac_header(skb));
davem@sunset:~/src/GIT/net-2.6$
--
Even when I explicitly set truesize (rather than adjusting it as you
But mac80211 does set_mac_header(0) so this should just push zero bytes,
no?johannes
From: Johannes Berg <johannes@sipsolutions.net>
Right you are.
So, I wonder what's causing the problem... Could you "remember" the
length and truesize at the skb_orphan() point in mac80211, right
after the skb_push(), then in the truesize warning, print those
"remembered" values as well as the current ones.--
I was just playing with af_packet and added some debugging there that
prints out the len of all packets it gets (for a certain ifidx)That's confusing me even more now. I get
[ 7650.792004] packet_recv eda9e8c0 (len=3D137)
[ 7650.792015] snaplen(eda9e8c0)=3D137
[ 7650.792027] free eda9e8c0, len =3D 137
[ 7650.792031] new skb: eda9e540
[ 7650.792039] packet_recv eda9e8c0 (len=3D137)
[ 7650.792044] snaplen(eda9e8c0)=3D137
[ 7650.792048] new skb: eda9e8c0
[ 7650.819464] packet_recv d1f4e9a0 (len=3D124)
[ 7650.819478] snaplen(d1f4e9a0)=3D124
[ 7650.819489] free d1f4e9a0, len =3D 124
[ 7650.819493] new skb: d1f4e8c0
[ 7650.819502] packet_recv d1f4e9a0 (len=3D124)
[ 7650.819507] snaplen(d1f4e9a0)=3D124
[ 7650.819511] new skb: d1f4e9a0
[ 7651.215631] packet_recv e9ecc2a0 (len=3D376)
[ 7651.215645] snaplen(e9ecc2a0)=3D376
[ 7651.215657] free e9ecc2a0, len =3D 376
[ 7651.215662] new skb: ede04b60
[ 7651.215671] packet_recv e9ecc2a0 (len=3D376)
[ 7651.215675] snaplen(e9ecc2a0)=3D376
[ 7651.215680] new skb: e9ecc2a0[ 7651.760751] SKB BUG: Invalid truesize (528) len=3D357, sizeof(sk_buff)=
=3D176528-176 is 352 which doesn't occur in that list... Maybe I should print
it in mac80211.johannes
| Greg KH | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg KH | [GIT PATCH] driver core patches against 2.6.24 |
| Andrew Morton | Re: 2.6.23-rc6-mm1 |
| Luciano Rocha | usb hdd problems with 2.6.27.2 |
git: | |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
| Andrew Morton | Re: [BUG] New Kernel Bugs |
| David Miller | [GIT]: Networking |
| Jarek Poplawski | [PATCH take 2] pkt_sched: Protect gen estimators under est_lock. |
