linux-netdev mailing list

From Subject Date
Adam Langley
[RFC] tcp: Add (limited) SYNACK payload support
This patch implements the draft spec: http://www.ietf.org/internet-drafts/draft-agl-tcpm-sadata-01.txt At the moment, this is just an [RFC] patch because an option number hasn't been assigned by the IETF yet. It allows listening sockets to be configured with a small (<= 64 bytes) payload that is included in SYN/ACK packets elicited by SYN packets that include a special option. See the draft linked to above for motivations. Additionally, the listening socket can request that the kernel ...
Aug 12, 1:56 pm 2008
Scott Wood Aug 12, 1:10 pm 2008
Manuel Lauss
[PATCH] smc91x: allow platform data to configure LEDs.
Add another field to smc91x_platdata to configure LEDs. Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net> --- drivers/net/smc91x.c | 7 ++++++- drivers/net/smc91x.h | 5 ++++- include/linux/smc91x.h | 17 +++++++++++++++++ 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/drivers/net/smc91x.c b/drivers/net/smc91x.c index 2040965..a591508 100644 --- a/drivers/net/smc91x.c +++ b/drivers/net/smc91x.c @@ -1520,7 +1520,7 @@ smc_open(struct net_device *dev) ...
Aug 12, 11:49 am 2008
David Witbrodt
Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
I have nothing but gratitude for anyone who has any suggestions whatsoever! I do _want_ to do both of those things... but you are right that no one should try to do them both at the same time. BTW, the bisect data was from my first post (trying to _find_ the problem) about 8 days ago. Since that time, I have been _assuming_ I located the cause of the problem, and had not thought about it again... until today. Unfortunately, this kernel stuff can get very deep. Just finding the commit, where ...
Aug 12, 10:29 am 2008
Ray Lee
Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
Hard data first, and then there will be plenty of time for blame later. Not that there's any real blame for anyone here, we just have a His commit may have uncovered a latent problem somewhere else, that happens often. But if the commit really is the trouble one, then two things happen: It's rc3 or rc4 now, so we just revert the damn thing, and then (secondly) he works with you (by adding debugging or whatever) to figure out where the problem actually is. The point I'm trying to make ...
Aug 12, 10:38 am 2008
Ray Lee
Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
Heh. Can I offer a suggestion here? You're trying to do two things at once -- finding where the problem is, and also trying to understand the problem at the same time. Speaking just for myself, I try to Git should have printed out "<SHA1> is first bad commit" Did you see that? If not, you stopped the process too soon. Viewing the history with gitk, though, it seems you fingered the right commit. Which leads Or this :-). Can you try reverting that commit against the top of the latest ...
Aug 12, 9:03 am 2008
David Witbrodt Aug 12, 8:17 am 2008
Matthew Wilcox
[PATCH] Don't take the mdio_lock in atl1e_probe
Lockdep warns about the mdio_lock taken with interrupts enabled then later taken from interrupt context. Initially, I considered changing these to spin_lock_irq/spin_unlock_irq, but then I looked at atl1e_phy_init() and saw that it calls msleep(). Sleeping while holding a spinlock is not allowed either. In the probe path, we haven't registered the interrupt handler, so it can't poke at this card yet. It's before we call register_netdev(), so I don't think any other threads can reach this ...
Aug 12, 6:13 am 2008
Peter Zijlstra
Re: [PATCH 05/30] mm: slb: add knowledge of reserve pages
Could do I guess. Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c +++ linux-2.6/mm/slub.c @@ -1543,7 +1543,7 @@ load_freelist: if (unlikely(!object)) goto another_slab; if (unlikely(PageSlubDebug(c->page) || c->reserve)) - goto debug; + goto slow_path; c->freelist = object[c->offset]; c->page->inuse = c->page->objects; @@ -1586,11 +1586,21 @@ grow_slab: goto load_freelist; } return ...
Aug 12, 3:23 am 2008
Neil Brown
Re: [PATCH 05/30] mm: slb: add knowledge of reserve pages
I see.... a little. I'm trying to avoid understanding slub too deeply, I don't want to use up valuable brain cell :-) Would we be justified in changing the label from 'debug:' to 'slow_path:' or something? And if it is just c->reserve, should we avoid the call to alloc_debug_processing? Thanks, --
Aug 12, 2:35 am 2008
Neil Brown
Re: [PATCH 02/30] mm: gfp_to_alloc_flags()
Oh yes, obvious when you explain it, thanks. cat << END >> Changelog As the test - if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE))) - && !in_interrupt()) { - if (!(gfp_mask & __GFP_NOMEMALLOC)) { has been replaced with a slightly stronger + if (alloc_flags & ALLOC_NO_WATERMARKS) { we need to ensure we don't recurse when PF_MEMALLOC is set END ?? Thanks, NeilBrown --
Aug 12, 2:33 am 2008
Neil Brown
Re: [PATCH 12/30] mm: memory reserve management
Two comments to be precise. 1/ __kmalloc_reserve attempts a __GFP_NOMEMALLOC allocation, and then if that fails, ___kmalloc_reserve immediately tries again. Is that pointless? Should the second one be removed? 2/ mem_reserve_kmalloc_charge appears to assume that the 'mem_reserve' has been 'connected' and so is active. While callers probably only set GFP_MEMALLOC in cases where the mem_reserve is connected, ALLOC_NO_WATERMARKS could get via PF_MEMALLOC so we could end ...
Aug 12, 12:46 am 2008
Peter Zijlstra
Re: [PATCH 12/30] mm: memory reserve management
Pretty pointless yes, except that it made ___kmalloc_reserve a nicer function to read, and as its an utter slow path I couldn't be arsed to Hmm, that would be __mem_reserve_charge() then, because the callers Uhmm,. good point. Let me ponder this while I go for breakfast ;-) --
Aug 12, 1:12 am 2008
Rusty Russell
[PATCH 2/2] tun: fallback if skb_alloc() fails on big packets
skb_alloc produces linear packets (using kmalloc()). That can fail, so we should fall back to making paged skbs. My original version of this patch always allocated paged skbs for big packets. But that made performance drop from 8.4 seconds to 8.8 seconds on 1G lguest->Host TCP xmit. So now we only do that as a fallback. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> diff -r ffcd4a3f63a8 drivers/net/tun.c --- a/drivers/net/tun.c Wed Aug 06 16:19:36 2008 +1000 +++ ...
Aug 11, 11:25 pm 2008
Herbert Xu
Re: [PATCH 2/2] tun: fallback if skb_alloc() fails on bi ...
I'm not sure that this is really a good idea. If anything then tries to expand the head of this skb, they may fail and be forced to drop the packet. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Aug 12, 3:14 am 2008
Rusty Russell
[PATCH 1/2] net: skb_copy_datagram_from_iovec()
There's an skb_copy_datagram_iovec() to copy out of a paged skb, but nothing the other way around (because we don't do that). We want to allocate big skbs in tun.c, so let's add the function. It's a carbon copy of skb_copy_datagram_iovec() with enough changes to be annoying. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> --- include/linux/skbuff.h | 4 ++ net/core/datagram.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 91 insertions(+) diff -r ...
Aug 11, 11:24 pm 2008
Peter Zijlstra
Re: [PATCH 12/30] mm: memory reserve management
against __mem_reserve_charge(), granted, the race would be minimal at weirdness in my brain when I wrote that I guess, shall amend! --
Aug 12, 1:10 am 2008
Neil Brown
Re: [PATCH 12/30] mm: memory reserve management
This looks quite different to last time I looked at the code (I think). You now have a more structured "kmalloc_reserve" interface which returns a flag to say if the allocation was from an emergency pool. I think this will be a distinct improvement at the call sites, though I I cannot figure out why the spinlock is being used to protect updates to 'limit'. As far as I can see, mem_reserve_mutex already protects all those updates. Why not just if (emerg) *emerg = 1. I can't ...
Aug 11, 11:23 pm 2008
Neil Brown
Re: [PATCH 05/30] mm: slb: add knowledge of reserve pages
This looks suspiciously like debugging code that you have left in. This sort of thing always worries me. It is a per-cpu data structure so you won't get SMP races corrupting fields. But you do get read-modify-write in place of simple updates. I guess it's not a problem.. But it worries me :-) NeilBrown --
Aug 11, 10:35 pm 2008
Peter Zijlstra
Re: [PATCH 05/30] mm: slb: add knowledge of reserve pages
Its not, we need to force slub into the debug slow path when we have a reserve page, otherwise we cannot do the permission check on each Right,.. do people prefer I just add another int? --
Aug 12, 12:22 am 2008
Neil Brown
Re: [PATCH 02/30] mm: gfp_to_alloc_flags()
This patch all looks "obviously correct" and a nice factorisation of I don't remember seeing it before (though my memory is imperfect) and it doesn't seem to fit with the rest of the patch (except spatially). There is a test above for PF_MEMALLOC which will result in a "goto" somewhere else unless "in_interrupt()". There is immediately above a test for "!wait". So the only way this test can fire is when in_interrupt and wait. But if that happens, then the might_sleep_if(wait) at the top ...
Aug 11, 10:01 pm 2008
Peter Zijlstra
Re: [PATCH 02/30] mm: gfp_to_alloc_flags()
Ok, so the old code did: if (((p->flags & PF_MEMALLOC) || ...) && !in_interrupt) { .... goto nopage; } which avoid anything that has PF_MEMALLOC set from entering into direct reclaim, right? Now, the new code reads: if (alloc_flags & ALLOC_NO_WATERMARK) { } Which might be false, even though we have PF_MEMALLOC set - __GFP_NOMEMALLOC comes to mind. So we have to stop that recursion from happening. so we add: if (p->flags & PF_MEMALLOC) goto ...
Aug 12, 12:33 am 2008
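The point made in the gfp_to_alloc_flags() exchange above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the bit values are invented, `model_alloc_flags()`, `old_skips_reclaim()` and `new_skips_reclaim()` are hypothetical helpers rather than kernel functions, and the `in_interrupt()` and `TIF_MEMDIE` parts of the original test are left out. It shows that a test on ALLOC_NO_WATERMARKS alone can be false for a PF_MEMALLOC task that passed __GFP_NOMEMALLOC, which is why an explicit PF_MEMALLOC check is still needed to stop recursion into direct reclaim.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real definitions differ. */
#define PF_MEMALLOC         0x1u /* task is itself freeing memory */
#define GFP_NOMEMALLOC      0x2u /* stands in for __GFP_NOMEMALLOC */
#define ALLOC_NO_WATERMARKS 0x4u

/* Hypothetical model: watermarks are ignored only when the task has
 * PF_MEMALLOC and the caller did not ask for __GFP_NOMEMALLOC. */
unsigned int model_alloc_flags(unsigned int task_flags, unsigned int gfp_mask)
{
    unsigned int alloc_flags = 0;

    if ((task_flags & PF_MEMALLOC) && !(gfp_mask & GFP_NOMEMALLOC))
        alloc_flags |= ALLOC_NO_WATERMARKS;
    return alloc_flags;
}

/* Old-style test: every PF_MEMALLOC task is kept out of direct reclaim. */
bool old_skips_reclaim(unsigned int task_flags)
{
    return (task_flags & PF_MEMALLOC) != 0;
}

/* New-style test. Without the guard, a PF_MEMALLOC task that passed
 * __GFP_NOMEMALLOC would fall through into direct reclaim. */
bool new_skips_reclaim(unsigned int task_flags, unsigned int gfp_mask,
                       bool with_guard)
{
    if (model_alloc_flags(task_flags, gfp_mask) & ALLOC_NO_WATERMARKS)
        return true;
    if (with_guard && (task_flags & PF_MEMALLOC))
        return true; /* the extra check that stops the recursion */
    return false;
}
```

With these definitions, `new_skips_reclaim(PF_MEMALLOC, GFP_NOMEMALLOC, false)` is false while `old_skips_reclaim(PF_MEMALLOC)` is true; passing `with_guard = true` restores the old behaviour for that case.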
Pravin Bathija
[Qdisc] Prioritizing traffic per vlan using qdisc.
Using Linux 2.6.23 I want to create multiple Vlans for voice, data, video etc. The requirement is to prioritize each vlan on the ingress and egress. I am looking at qdisc to do this, however without success. Could you please advise if there is a way to do this using qdisc tc (traffic control) commands? If not - is there another way to do this in Linux? Any help would be greatly appreciated. Thanks, Pravin --
Aug 11, 5:48 pm 2008
Pravin Bathija
RE: [Qdisc] Prioritizing traffic per vlan using qdisc.
For example if I create 3 vlans on interface 0 : eth0.1, eth0.2 and eth0.3 I want to assign them priorities of 1, 2 and 3 respectively (1 being the highest and 3 the lowest) -----Original Message----- From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Pravin Bathija Sent: Monday, August 11, 2008 5:48 PM To: netdev@vger.kernel.org Subject: [Qdisc] Prioritizing traffic per vlan using qdisc. Using Linux 2.6.23 I want to create multiple Vlans for voice, ...
Aug 11, 5:56 pm 2008
Stephen Hemminger
Re: [Qdisc] Prioritizing traffic per vlan using qdisc.
On Mon, 11 Aug 2008 17:48:22 -0700 For 2.6.25, an addition to the tc meta match to allow matching on vlan tag was added. Using this it is possible to use meta match to convert tag -> flowid then use the flowid with a priority (or htb) queue. --
Aug 11, 6:17 pm 2008
Simon Horman
Re: [RFC,PATCH] ipvs: Fix race condition in lblb and lbl ...
Is there a pathological case here if sysctl_ip_vs_lblc_expiration is set to be very short and we happen to hit ip_vs_lblc_full_check()? To be honest I think that I like the reference count approach best, as it seems safe and simple. Is it really going to be horrible for performance? If so, I wonder if a workable solution would be to provide a more fine-grained lock on tbl. Something like the way that ct_read_lock/unlock() works. --
Aug 11, 7:10 pm 2008
Sven Wegener
Re: [RFC,PATCH] ipvs: Fix race condition in lblb and lbl ...
Also possible. But I guess I was thinking too complicated last night. What I was after with the "protect the whole ip_vs_lblc_schedule() with write_lock()ing the lock" was also to simply prevent someone adding duplicate entries. If we just extend the read_lock() region to cover the whole usage of the entry and do an additional duplicate check during inserting the entry under write_lock(), we fix the issue and also fix the race that someone may add duplicate entries. We have a bit ...
Aug 11, 9:27 pm 2008
Sven Wegener
Re: [RFC,PATCH] ipvs: Fix race condition in lblb and lbl ...
I wondered if this whole thing can ever be totally race condition free, without changing how destinations are purged from the trash. Initial version of the patch below. Basically it pushes the locking up into ip_vs_lblc_schedule() and rearranges the code to be safe that we have a valid destination. diff --git a/net/ipv4/ipvs/ip_vs_lblc.c b/net/ipv4/ipvs/ip_vs_lblc.c index 7a6a319..67f7b04 100644 --- a/net/ipv4/ipvs/ip_vs_lblc.c +++ b/net/ipv4/ipvs/ip_vs_lblc.c @@ -123,31 +123,6 @@ ...
Aug 12, 6:07 am 2008
Ingo Oeser
Re: [NET-NEXT PATCH 3/3] e1000e: add support for new 825 ...
Hi Jeff, also in this patch, you have unresolved merge conflicts. Please see my other email on hints on resolving this. Best Regards Ingo Oeser --
Aug 12, 9:39 am 2008
Ingo Oeser
Re: [PATCH 3/3] e1000e: add support for new 82574L part
Hi Jeff, you have unresolved merge conflicts in your patch, please resolve them first. Just search in your editor for these marks in the patch you sent and resolve these in your repository. Best Regards Ingo Oeser --
Aug 12, 9:37 am 2008
David Miller
Re: [GIT] Please pull updates for IPVS
From: Sven Wegener <sven.wegener@stealer.net> Pulled, thanks a lot Sven. --
Aug 11, 6:10 pm 2008
David Miller
Re: [PATCH] pkt_sched: Add BH protection for qdisc_stab_lock.
From: Jarek Poplawski <jarkao2@gmail.com> Applied, thanks Jarek. --
Aug 11, 6:11 pm 2008
David Miller
Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl ...
From: Jarek Poplawski <jarkao2@gmail.com> The qdisc pointer traverses to the softirq handler, which can be run in a process context (via ksoftirqd), and this pointer gets there I didn't see it possible to keep scheduling the netdev_queues, as the qdiscs can be shared with multiple queues. Qdisc "are we running?" and other state pieces are now inside of the Qdisc itself. And all of the qdisc_run() and netif_schedule logic is, as a result, Qdisc centric. The synchronization object is the ...
Aug 12, 1:15 am 2008
Jarek Poplawski
Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl ...
On Mon, Aug 11, 2008 at 10:40:47PM -0700, David Miller wrote: Of course I've to miss something, but I still don't get it: after synchronize_rcu() in dev_deactivate() we are sure anyone in dev_queue_xmit() rcu block has to see the change to noop_qdisc(), so it can only lose packets and not really enqueue(). IMHO the only problem is this __netif_schedule(), which could be done with dev_queues instead of Qdiscs with proper dereferencing there. (BTW, I think we need rcu_read_lock() instead of the ...
Aug 12, 12:00 am 2008
Jarek Poplawski
Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl ...
On Tue, Aug 12, 2008 at 01:15:10AM -0700, David Miller wrote: If you mean net_tx_action() this looks like we would get a root lock of a current qdisc, just like seen in dev_queue_xmit() at the moment, so I'm still looking for a clue, what could be wrong with this... Jarek P. --
Aug 12, 3:38 am 2008
Jarek Poplawski
Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl ...
Could you explain this more? I've thought this synchronize_rcu() is just to prevent this (and what these comments talk about?): void dev_deactivate(struct net_device *dev) { bool running; netdev_for_each_tx_queue(dev, dev_deactivate_queue, &noop_qdisc); dev_deactivate_queue(dev, &dev->rx_queue, &noop_qdisc); dev_watchdog_down(dev); /* Wait for outstanding qdisc-less dev_queue_xmit calls. */ synchronize_rcu(); do { ...
Aug 11, 10:20 pm 2008
Jarek Poplawski
[PATCH take 2] pkt_sched: Protect gen estimators under e ...
So, since it's currently impossible, here is an alternative solution. Jarek P. ------------> pkt_sched: Protect gen estimators under est_lock. gen_kill_estimator() required rtnl_lock() protection, but since it is moved to an RCU callback __qdisc_destroy() let's use est_lock instead. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> --- net/core/gen_estimator.c | 9 +++++---- 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/net/core/gen_estimator.c ...
Aug 12, 3:02 pm 2008
David Miller
Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl ...
From: Jarek Poplawski <jarkao2@gmail.com> We can't do this. And at a minimum, the final ->reset() must occur in the RCU callback, otherwise asynchronous threads of execution could queue packets into this dying qdisc and such packets would leak forever. --
Aug 11, 6:12 pm 2008
David Miller
Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl ...
From: Jarek Poplawski <jarkao2@gmail.com> Those comments are out of date and I need to update them. In fact this whole loop is now largely pointless. The rcu_dereference() on dev_queue->qdisc happens before the QDISC_RUNNING bit is set. We no longer resample the qdisc under any kind of lock. Because we no longer have a top-level lock that synchronizes the setting of dev_queue->qdisc Rather, the lock we use for calling ->enqueue() and ->dequeue() is inside of the root qdisc ...
Aug 11, 10:40 pm 2008
Andi Kleen
Re: tbench regression on each kernel release from 2.6.22 ...
Wouldn't surprise me. Have you considered doing profiles? e.g. just oprofiling the benchmark on the different kernels and see if there's some obvious difference in the CPU consumers? -Andi --
Aug 12, 12:11 am 2008
Christoph Lameter
Re: tbench regression on each kernel release from 2.6.22 ...
If I get the time I will try to do that. Another way to understand why we are accepting the regressions here may be that we give more consideration to real time issues and deterministic performance these days. Hardware speed gains compensate for the additional bloat? (I ran the old kernels on cutting edge hardware after all). --
Aug 12, 11:57 am 2008
Ilpo Järvinen
Re: tbench regression on each kernel release from 2.6.22 ...
...IIRC, somebody in the past did even bisect his (probably netperf) 2.6.24-25 regression to some scheduler change (obviously it might or might not be related to this case of yours)... -- i. --
Aug 12, 1:13 am 2008
David Miller
Re: net-next-2.6 [PATCH 1/1] skbuff: Small NiT
From: Gerrit Renker <gerrit@erg.abdn.ac.uk> Applied, thanks Gerrit. --
Aug 11, 6:17 pm 2008
Vegard Nossum
Re: latest -git: kernel hangs when pulling the plug on 8139too
Oops, I spoke a bit too soon. Nope. It tries to take a lock that is already held. Instead: How can it be solved? Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 --
Aug 12, 1:54 pm 2008
Vegard Nossum
Re: latest -git: kernel hangs when pulling the plug on 8139too
Actually, I just forgot earlyprintk=...,keep. Now I modified the printk() to use early_printk() if oops_in_progress is set. This does actually give some result, but again the text is garbled, this time for real, and I don't understand why. But this is what I get on the serial console: <0>BUG: NMI Watchdog detected LOCKUP on CPU1, ip c010ae8c, registers: Pid: 0, comm: swapper Not tainted (2.6.27-rc2-00325-g796aade-dirty #3) EIP: 0060:[<c010ae8c>] EFLAGS: 00000246 CPU: 1 EIP is at ...
Aug 12, 1:46 pm 2008
Vegard Nossum
Re: latest -git: kernel hangs when pulling the plug on 8139too
Now I've tried to use kdump to catch the panic, but it doesn't help :-( At boot, I have this: Reserving 64MB of memory at 16MB for crashkernel (System RAM: 1023MB) Loading the dump-capture kernel succeeds: # build/sbin/kexec -p --initrd=/boot/initrd-2.6.23.8-34.fc7.img --append="ro root=/dev/VolGroup00/LogVol00 rhgb console=tty0 console=ttyS0,115200 nmi_watchdog=1 panic=30 sysrq_always_enabled maxcpus=1 irqpoll reset_devices 3" /boot/testing/bzImage ...but after pulling the ...
Aug 12, 10:20 am 2008
Vegard Nossum
Re: latest -git: kernel hangs when pulling the plug on 8139too
It turns out that we're not getting as far as the "panic:" line in panic(). So I tried something new: Running a bash busy loop while unplugging the cable: $ while true; do echo p > /proc/sysrq-trigger; done And to my great surprise, the kernel doesn't reboot. But I can't use it either. It's simply printing the same message to ttyS0 over and over: SysRq : Show Regs It is also occasionally garbled, like this: SysRq : ow Regs ... Sys : Show Regs ... ...
Aug 12, 12:02 pm 2008
John W. Linville
Re: ath9k build failure
My bad...I'll fix it up w/ the next batch of fixes... John -- John W. Linville linville@tuxdriver.com --
Aug 12, 6:21 am 2008
Stephen Rothwell
Re: ath9k build failure
Hi Luis, On Mon, 11 Aug 2008 09:46:57 -0700 "Luis R. Rodriguez" <lrodriguez@atheros.= That is only the first application of the patch, my patch is a fix of a second instance of the same code. See Adrian's post. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/
Aug 11, 8:33 pm 2008
David Miller
Re: LRO restructuring?
From: Herbert Xu <herbert@gondor.apana.org.au> And the checksums :-) As an intermediate node we don't want to touch the checksum. The length and the checksum is two u16 values, which would be able to fit in a single 32-bit descriptor or something like that. --
Aug 11, 5:54 pm 2008
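David's remark above that the length and the checksum are two u16 values that fit in one 32-bit word can be illustrated as follows. The layout (length in the upper half, checksum in the lower half) and the names `lro_pack()`, `lro_len()` and `lro_csum()` are assumptions made for this sketch; the thread does not specify a descriptor format.

```c
#include <stdint.h>

/* Assumed layout: bits 31..16 hold the length, bits 15..0 the checksum. */
uint32_t lro_pack(uint16_t len, uint16_t csum)
{
    return ((uint32_t)len << 16) | csum;
}

/* Recover the original segment length from the descriptor. */
uint16_t lro_len(uint32_t desc)
{
    return (uint16_t)(desc >> 16);
}

/* Recover the original, untouched checksum from the descriptor. */
uint16_t lro_csum(uint32_t desc)
{
    return (uint16_t)(desc & 0xffffu);
}
```

For example, `lro_pack(1500, 0xbeef)` yields `0x05dcbeef`, and both fields come back unchanged from the accessors, so the checksum can be restored exactly on output.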
David Miller
Re: LRO restructuring?
From: Rick Jones <rick.jones2@hp.com> IP header is a little different, intermediate nodes should verify it (and we do adjust it when decrementing TTL). --
Aug 11, 6:39 pm 2008
Herbert Xu
Re: LRO restructuring?
You don't have to save the whole thing, just save enough so we can easily/exactly reconstruct it on output, i.e., save the lengths. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Aug 11, 5:50 pm 2008
Herbert Xu
Re: LRO restructuring?
Yep. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Aug 11, 6:00 pm 2008
Rick Jones
Re: LRO restructuring?
Even if it was verified I think you want to keep the checksums from the header. Since an intermediate device isn't supposed to be peeking at the TCP part anyway, it wouldn't do to drop the segment ourselves, pass it along to be dropped by the ultimate receiver. And if there is something amiss in the verification or the regeneration, we don't want to introduce silent data corruption. Likely that also goes for the IP header checksum... rick jones --
Aug 11, 6:30 pm 2008
Herbert Xu
Re: LRO restructuring?
Well I wasn't suggesting that it be dropped, but simply skip LRO if the inbound packet fails the checksum check. But yeah, it's only two bytes so we might as well always have it. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Aug 11, 6:53 pm 2008
Andrew Gallatin
Re: LRO restructuring?
Indeed. Nor should they change lengths, or anything else. Everything about this "inexact" forwarding is illegal as hell. However, you have to admit that it is an interesting hack :) Drew --
Aug 12, 4:50 am 2008
Herbert Xu
Re: csum offload and af_packet
Oh I totally agree that there are lots of scenarios where you want to have an unmolested guest image. My point was that if you're going to touch the guest kernel anyway you might as well fix the guest user-space instead. This is also why I've argued that the default should be to disable TX checksums until the guest enables it so that old guests that know nothing about this can continue to work. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} ...
Aug 12, 4:37 pm 2008
David Miller
Re: csum offload and af_packet
From: Herbert Xu <herbert@gondor.apana.org.au> Unfortunately the censored version doesn't allow you to get at the link level headers, which is what at least some of these applications want. --
Aug 11, 5:51 pm 2008
Rusty Russell
Re: csum offload and af_packet
Then should we insist the user set PACKET_AUXDATA? Even then, the format of that cmsg will have to be enhanced as we change kernel internals. Which is probably why you *don't* get to see the bare details: you get a flag saying "oh, I know the checksum is bad". Without the csum_start/csum_offset fields you can't even calculate what it will be. The dhcp client thing is a symptom which can be fixed, but are we doing the right thing? (Tho for lguest this is a new problem with the current ...
Aug 11, 7:27 pm 2008
Herbert Xu
Re: csum offload and af_packet
Yes that is certainly true for the DHCP server, but Rusty was complaining about the DHCP client which certainly does not need the LL headers on reception (although using it is definitely more convenient since you need it on transmit anyway). I agree that handling this in AF_PACKET is certainly possible, and for that matter not extremely difficult. However, my point is that doing this for the purposes of virtualisation is completely pointless. The only time you need this is when you have ...
Aug 11, 5:58 pm 2008
Ingo Oeser
Re: csum offload and af_packet
Are you talking about modifying the KVM client image? There may be reasons, why this is impossible or at least highly undesired. Before virtualisation developers of embedded stuff need to seal away their whole machines for their development and test environment together with sample hardware. Now using virtualisation they can at least virtualize their development and test environment, save lots of cost and not risking that their sealed away hardware will not start again, when they need to ...
Aug 12, 9:17 am 2008
Herbert Xu
Re: csum offload and af_packet
I disagree. If you're using AF_PACKET you're asking to see the bare details. If you want to see the censored version you can It's not about disabling it, it's about enabling it dynamically once guest user-space is sure that *it* can handle this. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Aug 11, 5:32 pm 2008
Parag Warudkar
Re: [Bugme-new] [Bug 11297] New: OOPS in rt6_fill_node
Grrr. It looks like I was bitten by the infamous Netlink "No buffer space available" error which I somehow overlooked. Applying your patch to a kernel which boots without the Netlink buffer space error shows the right output. Closing. Thanks. --
Aug 11, 6:06 pm 2008
Eugene Teo
Re: [Bugme-new] [Bug 11297] New: OOPS in rt6_fill_node
Hmm, I tried it on an older kernel that doesn't have Yoshfuji-san's ipv6_get_saddr() changes, and it should display the output with the loopback MAC address instead of ethX MAC address. Correct me if I am wrong. Thanks, Eugene --
Aug 11, 6:10 pm 2008
Brian Haley
Re: [Bugme-new] [Bug 11297] New: OOPS in rt6_fill_node
# ip -f inet6 route get fec0::1 unreachable fec0::1 dev lo table unspec proto none src 2001:1890:1109:a10:218:feff:fe7f:49c8 metric -1 error -101 hoplimit 255 And if I down eth0 I get: # ip -f inet6 route get fec0::1 unreachable fec0::1 dev lo table unspec proto none src ::1 metric -1 error -101 hoplimit 255 This is 2.6.27-rc2, like I said, I'm building a 2.6.24 kernel now. -Brian --
Aug 11, 5:41 pm 2008
Eugene Teo
Re: [Bugme-new] [Bug 11297] New: OOPS in rt6_fill_node
Evidence of me still sleepy. Not the MAC address but the ipv6 address... Eugene --
Aug 11, 6:28 pm 2008
Brian Haley
Re: [Bugme-new] [Bug 11297] New: OOPS in rt6_fill_node
Just an fyi I think part of the confusion was the output I posted: # ip -f inet6 route get fec0::1 unreachable fec0::1 dev lo table unspec proto none src 2001:1890:1109:a10:218:feff:fe7f:49c8 metric -1 error -101 hoplimit 255 On my system I have a global address on eth0, so that's printed in my output. Others don't have a global, so see ::1, which is expected. I see the same behavior on my Debian Lenny 2.6.18 box as 2.6.27, so my patch doesn't seem to have changed ...
Aug 11, 7:06 pm 2008
Parag Warudkar
Re: [Bugme-new] [Bug 11297] New: OOPS in rt6_fill_node
With ipv6 not loaded it returns not supported or something similar - correct of course. What output do you see with your patch? Parag --
Aug 11, 5:01 pm 2008
David Miller
Re: [Bugme-new] [Bug 11297] New: OOPS in rt6_fill_node
From: "Eugene Teo" <eugeneteo@kernel.sg> Hmmm... from what I understand so far based upon Parag's most recent reply, Brian's patch should be OK. Does everyone else agree? --
Aug 11, 6:40 pm 2008
Stephen Hemminger
Re: [PATCH] sky2: Fix suspend/hibernation/shutdown regre ...
On Sun, 10 Aug 2008 19:30:28 +0200 Yes, that's better. Acked-by: Stephen Hemminger <shemminger@vyatta.com> --
Aug 11, 7:57 pm 2008
John Gumb
RE: OOPS, ip -f inet6 route get fec0::1, linux-2.6.26,ip ...
Folks I've enclosed patch from Eugene just so we all know which patch we're talking about. It 'works' according to the following definition: a) Fixed OOPS b) runs overnight in our test network. This run doesn't do much specific ipv6 testing - but clearly what's there is catching stuff :-; cheers John -----Original Message----- From: Eugene Teo [mailto:eugeneteo@kernel.sg] Sent: 11 August 2008 12:04 To: Brian Haley Cc: Alexey Dobriyan; John Gumb; ...
Aug 12, 2:11 am 2008
Eugene Teo
Re: OOPS, ip -f inet6 route get fec0::1, linux-2.6.26, i ...
With the patch I posted, this is the behaviour I get: $ ip -f inet6 route get fec0::1 unreachable fec0::1 dev lo table unspec proto none src fe80::214:4fff:fe0f:7332 metric -1 error -101 hoplimit 255 John emailed me that he will be testing this patch. I have not tested Thanks for letting me know. I wasn't familiar, but I welcome the hint. Thanks, Eugene --
Aug 11, 5:13 pm 2008
Eugene Teo
Re: OOPS, ip -f inet6 route get fec0::1, linux-2.6.26, i ...
Ok, so there's a mistake in my patch. It should return the loopback MAC address instead. I'm wondering if the fix should be related to initialising rt6i_idev in addrconf_init routine like in the upstream commit: c62dba9011b93fd88fde929848582b2a98309878. The code changed quite a lot. Eugene --
Aug 11, 5:41 pm 2008
Eugene Teo
Re: OOPS, ip -f inet6 route get fec0::1, linux-2.6.26, i ...
So I checked, it's initialised in ip6_route_init(). Eugene --
Aug 11, 6:40 pm 2008
Brian Haley
Re: OOPS, ip -f inet6 route get fec0::1, linux-2.6.26, i ...
Acked-by: Brian Haley <brian.haley@hp.com> But Yoshfuji might have another opinion since he did the work to remove ipv6_get_saddr() in the first place. -Brian --
Aug 11, 5:41 pm 2008
Marc Haber
Re: Need help with MCS7830 driver and 802.1q VLAN Tagging
Hi Ben, thanks for your quick answer. It'll be until next week when I'll be near a VLAN-able setup and the MCS device again, but I'll report back. Greetings Marc -- ----------------------------------------------------------------------------- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Mannheim, Germany | lose things." Winona Ryder | Fon: *49 621 72739834 Nordisch by Nature | How to make an American Quilt | Fax: *49 3221 2323190 --
Aug 12, 5:01 am 2008
Herbert Xu
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
Right, well at least VIA could still use this and it wouldn't hurt others that much. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Aug 11, 5:52 pm 2008
Herbert Xu
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
I'll push this one along. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Aug 12, 4:40 pm 2008
Wolfgang Walter
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
* Works fine, machine has been up for 61 minutes. * Performance: Routing performance over esp-tunnels seems unchanged here compared to 2.6.25 (this was also the case with the "kernel_fpu_begin" patch). tcrypt mode=200 shows exactly the same performance penalty compared to 2.6.25 as the "kernel_fpu_begin" patch. But I think this is the right way to go with 2.6.26 and probably 2.6.27. And I'm not sure if tcrypt really shows the whole story for 2.6.25: a) does it measure the costs of the ...
Aug 12, 4:43 am 2008
Herbert Xu
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
On Mon, Aug 11, 2008 at 01:19:01PM -0700, Suresh Siddha wrote: Yes, disabling preemption is the real killer. This is just a quick band-aid. Longer term we should add a task flag that indicates the task is currently doing kernel FPU, which will tell the scheduler to clear TS the next time it's run. That way we won't need to disable preemption or pollute the user task's FPU used state. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 ...
Aug 11, 5:39 pm 2008
Wolfgang Walter
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
Yes, I'll test that tomorrow. Regards, -- Wolfgang Walter Studentenwerk München Anstalt des öffentlichen Rechts Leiter EDV Leopoldstraße 15 80802 München Tel: +49 89 38196-276 Fax: +49 89 38196-144 wolfgang.walter@stwm.de http://www.studentenwerk-muenchen.de/ --
Aug 11, 5:38 pm 2008
Suresh Siddha
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
Ok. In real-world usage, where we use these instructions from both process and softirq context, we will probably not see much penalty, as the process context's first access will always end up doing a full FP restore (and also we kick in the context switch FP optimization, which will As there are no further objections, Herbert/Ingo, I'm not sure which of you will push this patch to Linus' tree and the 2.6.26 -stable tree as well. thanks, suresh --
Aug 12, 11:28 am 2008
H. Peter Anvin
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
That's not sufficient, though, because you have to track all the state and how it relates to everything. You now have to track both the userspace FPU state and the potential kernel FPU state. The VIA instructions are special (in the short bus to school sense) in that they use a mechanism intended to protect specific state to protect -- exactly nothing. -hpa --
Aug 11, 5:42 pm 2008
Herbert Xu
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
Sorry, the kernel TS state is what I meant. I'm definitely not advocating the saving of the kernel FPU state. This is only for things like the VIA (which also exists for other processors, see the xor SSE stuff in include/asm-x86). Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Aug 11, 5:46 pm 2008
H. Peter Anvin
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
No, there you are actually using the FPU state (which includes the SSE state.) -hpa --
Aug 11, 5:48 pm 2008
Herbert Xu
Re: Kernel oops with 2.6.26, padlock and ipsec: probably ...
That's not surprising since tcrypt runs with BH off so it'll do pretty much the same thing as before. This also shows that reading CR0 doesn't impose any extra overhead compared to what was done in kernel_fpu_begin. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
Aug 12, 5:02 am 2008
John Patrick Poet
Re: Realtek 8111C transmit timed out
When one of these "hiccups" happens while I am streaming video from my Myth backend to my Myth frontend, it kills the playback, and I have to restart it. It takes 5 to 10 seconds before I can restart the playback, but it does work without having to take any special action on my part. So, to answer your question, yes, traffic does flow afterwards. John --
Aug 12, 2:44 pm 2008
David Madsen
Re: Realtek 8111C transmit timed out
I also have a Realtek GigE card that was quite stable running on 2.6.24. I recently updated my kernel briefly to 2.6.25.10 then ultimately to 2.6.26.2 and started seeing similar timeouts in both kernel versions. My configuration didn't change much between the kernels, but I do remember enabling MSI when I rebuilt the kernel. I have not yet had a chance to disable MSI to see if that fixes the timeouts but I thought I'd post what info I have in case that might steer the debug in the right ...
Aug 11, 10:42 pm 2008
Francois Romieu
Re: Realtek 8111C transmit timed out
David Madsen <david.madsen@gmail.com> : Please note that the kernel complains more loudly about the watchdog than it used to. Does it allow traffic to flow afterwards? -- Ueimor --
Aug 12, 1:18 pm 2008
Jarek Poplawski
Re: [BUG] NULL pointer dereference in skb_dequeue
Of course, I've considered here only re-reading with a separate rcu_dereference(). BTW, in "our" code we can't have a NULL dereference: in the "worst" case it points to a noop_qdisc, which is a static As a matter of fact I wonder if it's 100% safe even without ksoftirqd or PREEMPT_RCU? Considering that such a softirq handler would be triggered after rcu_read_unlock_bh(), and maybe after some additional hard or soft irq handlers, isn't it possible some RCU reclaiming code I don't think so. I ...
Aug 12, 2:15 pm 2008
Paul E. McKenney
Re: [BUG] NULL pointer dereference in skb_dequeue
The usual problem with re-reading in a separate read-side critical section is that someone might have removed/destroyed it in the meantime. Consider the following example:

Task 0:

	rcu_read_lock();
	p = rcu_dereference(global_pointer);
	if (p == NULL) {
		rcu_read_unlock();
		goto somewhere_else;
	}
	do_something_with(p);
	rcu_read_unlock();

	do_some_unrelated_stuff();

	rcu_read_lock();
	do_something_else_with(p); /* BUG!!! */
	rcu_read_unlock();
somewhere_else:

Task ...
Aug 12, 1:18 pm 2008
Jarek Poplawski
Re: [BUG] NULL pointer dereference in skb_dequeue
I understand this similarly (but I'm still trying to find out what's wrong with reading this again in a separate read-side section). David gave some additional explanations (which BTW don't look to me like very "orthodox" RCU) in this thread: http://marc.info/?l=linux-netdev&m=121851847805942&w=2 Thanks, Jarek P. --
Aug 12, 11:09 am 2008
Jarek Poplawski
Re: [BUG] NULL pointer dereference in skb_dequeue
Sure, but I'm concerned here with pure RCU reading: From net/sched/sch_generic.c:

void __qdisc_run(struct Qdisc *q)
{
	unsigned long start_time = jiffies;

	while (qdisc_restart(q)) {
		/*
		 * Postpone processing if
		 * 1. another process needs the CPU;
		 * 2. we've been doing it for too long.
		 */
		if (need_resched() || jiffies != start_time) {
		...
Aug 11, 11:36 pm 2008
Paul E. McKenney
Re: [BUG] NULL pointer dereference in skb_dequeue
If I understand this code, one way to handle it would be to increment q->refcnt before passing to netif_schedule(), then decrementing it (within an RCU read-side critical section) in the softirq handler. There are probably other ways to handle this as well. --
Aug 12, 6:42 am 2008
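The suggestion in the message above (take a reference before handing the qdisc to netif_schedule(), drop it in the softirq handler) can be illustrated with a small userspace sketch. This is not kernel code and not the patch under discussion: `fake_qdisc`, `fake_schedule()`, `fake_softirq()` and the counters are hypothetical stand-ins, C11 atomics replace the kernel's reference counting, and the RCU read-side critical section mentioned in the message is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct Qdisc; only the refcount matters here. */
struct fake_qdisc {
	atomic_int refcnt;
};

static int freed_count;            /* number of objects actually freed */
static int handled_count;          /* number of deferred runs that completed */
static struct fake_qdisc *pending; /* one-slot stand-in for the softirq queue */

static struct fake_qdisc *fake_qdisc_new(void)
{
	struct fake_qdisc *q = calloc(1, sizeof(*q));

	if (q)
		atomic_init(&q->refcnt, 1); /* the creator's reference */
	return q;
}

static void fake_qdisc_put(struct fake_qdisc *q)
{
	/* atomic_fetch_sub() returns the old value: 1 means this was the last reference. */
	if (atomic_fetch_sub(&q->refcnt, 1) == 1) {
		free(q);
		freed_count++;
	}
}

/* Stand-in for netif_schedule(): pin the object before deferring work on it. */
static void fake_schedule(struct fake_qdisc *q)
{
	assert(q != NULL);
	atomic_fetch_add(&q->refcnt, 1);
	pending = q;
}

/* Stand-in for the softirq handler: use the object, then drop the pin. */
static void fake_softirq(void)
{
	struct fake_qdisc *q = pending;

	pending = NULL;
	if (q == NULL)
		return;
	handled_count++;   /* q is still valid here because of the extra reference */
	fake_qdisc_put(q);
}
```

In this sketch the owner may call fake_qdisc_put() between fake_schedule() and fake_softirq() without freeing the object; the free happens only when the deferred handler drops the last reference.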
Paul E. McKenney
Re: [BUG] NULL pointer dereference in skb_dequeue
OK, in that case you would not need the NULL check and goto, but Good point -- even if it were impossible in the current implementation, RCU is certainly within its rights to do the kfreeing in between. So Fair enough! Thanx, Paul --
Aug 12, 3:33 pm 2008
Divy Le Ray
Re: [RFC][PATCH 1/1] cxgb3i: cxgb3 iSCSI initiator
Hi Dave, iSCSI PDUs might span multiple TCP segments, it is unclear to me how to do placement without keeping some state of the transactions. In any case, such a stateless solution is not yet designed, whereas accelerated iSCSI is available now, from us and other companies. The accelerated iSCSI streams benefit from the performance TOE provides, outlined in the following third party ...
Aug 12, 2:57 pm 2008
David Miller
Re: [RFC][PATCH 1/1] cxgb3i: cxgb3 iSCSI initiator
From: Divy Le Ray <divy@chelsio.com> So, WHAT?! There are TOE pieces of crap out there too. It's strictly not our problem. Like Herbert said, this is the TOE discussion all over again. The results will be the same, and as per our decisions wrt. TOE, history speaks for itself. --
Aug 12, 3:02 pm 2008
David Miller
Re: [RFC][PATCH 1/1] cxgb3i: cxgb3 iSCSI initiator
From: Divy Le Ray <divy@chelsio.com> You keep a flow table with buffer IDs and offsets. The S2IO guys did something similar for one of their initial LRO implementations. It's still strictly stateless, and best-effort. Entries can fall out of the flow cache which makes upcoming data use new buffers and offsets. But these are the kinds of tricks you hardware folks should be more than adequately able to design, rather than me. :-)
Aug 12, 3:01 pm 2008
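The flow table described in the message above (buffer IDs and offsets, best-effort, entries may fall out of the cache) might look roughly like the following userspace sketch. Everything in it is an assumption made for illustration: the names `flow_table`, `flow_place()` and `FLOW_SLOTS`, the direct-mapped layout, and the use of a plain integer `flow_id` as the key. It does not describe the S2IO LRO code or any real hardware design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FLOW_SLOTS 4 /* deliberately tiny so that eviction is easy to observe */

struct flow_entry {
	bool     valid;
	uint32_t flow_id;   /* hypothetical key, e.g. a hash of the connection tuple */
	uint32_t buffer_id; /* buffer currently receiving this flow's payload */
	uint32_t offset;    /* next write offset within that buffer */
};

struct flow_table {
	struct flow_entry slot[FLOW_SLOTS];
	uint32_t next_buffer_id; /* source of fresh buffer IDs */
};

struct placement {
	uint32_t buffer_id;
	uint32_t offset;
};

static void flow_table_init(struct flow_table *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Decide where len bytes of payload for flow_id should be placed.
 * The table is only a cache: on a miss, or when the slot belongs to a
 * different flow, the old entry is dropped and placement restarts in a
 * new buffer at offset 0. Nothing is ever guaranteed to stay cached.
 */
static struct placement flow_place(struct flow_table *t, uint32_t flow_id,
				   uint32_t len)
{
	struct flow_entry *e;
	struct placement p;

	assert(t != NULL);
	e = &t->slot[flow_id % FLOW_SLOTS];

	if (!e->valid || e->flow_id != flow_id) {
		e->valid = true;
		e->flow_id = flow_id;
		e->buffer_id = t->next_buffer_id++;
		e->offset = 0;
	}

	p.buffer_id = e->buffer_id;
	p.offset = e->offset;
	e->offset += len;
	return p;
}
```

Consecutive segments of one flow land back to back in the same buffer as long as the entry survives. Once a colliding flow evicts it, the next segment of the original flow simply starts a new buffer, which is the best-effort behaviour the message describes.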
David Miller
Re: [RFC][PATCH 1/1] cxgb3i: cxgb3 iSCSI initiator
From: Steve Wise <swise@opengridcomputing.com> When I say shape I mean apply any packet scheduler, any netfilter module, and any other feature we support. --
Aug 11, 5:22 pm 2008
Divy Le Ray
Re: [RFC][PATCH 1/1] cxgb3i: cxgb3 iSCSI initiator
Well, there is demand for accelerated iSCSI out there, which is the driving Herbert requested some benchmark numbers, I consequently obliged. Cheers, Divy --
Aug 12, 3:21 pm 2008
Jarek Poplawski
Re: NMI lockup, 2.6.26 release
On Tue, Aug 12, 2008 at 02:31:40PM +0300, Denys Fedoryshchenko wrote: Great! I didn't expect it would be so easy with this strange problem. So, it looks like hrtimers could break, probably after some overscheduling. The only problem with this is to find some reasonable limit which is both safe and doesn't harm resolution too much for others. IMHO this second patch with 1 jiffy watchdog resolution looks reasonable and should be acceptable, but it would be nice to check if we can go lower. ...
Aug 12, 5:40 am 2008
Denys Fedoryshchenko
Re: NMI lockup, 2.6.26 release
With the second patch it works fine, 9 days of uptime now --
Aug 12, 4:31 am 2008
Martin Michlmayr
Re: [PATCH 2/2] myri10ge: use ioremap_wc
In that case, let's add Lennert to the CC line. -- Martin Michlmayr http://www.cyrius.com/ --
Aug 12, 2:14 am 2008
Dâniel
Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragilit ...
On Tue, 12 Aug 2008 01:18:22 -0700 (PDT) Sorry for my ignorance (I'm just a user), but if the problem is not with Linux, why did this problem appear only with the 2.6.25 kernel? I mean, with 2.6.24 and before I never had stalled connections. Just a coincidence? Or has something changed in 2.6.25 which caused this? Thank you! -- --
Aug 12, 10:43 am 2008
David Miller
Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragilit ...
From: Thomas Jarosch <thomas.jarosch@intra2net.com> We had the same situation with ECN and window scaling, and my proposal is the same as how we handled those situations involving broken middleware boxes. --
Aug 12, 1:18 am 2008
Ilpo Järvinen Aug 12, 10:52 am 2008
Thomas Jarosch
Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragilit ...
David, I agree with you, though I'm not sure about the end user experience: The kernel is an early adopter of FRTO and will be bitten by bugs of other TCP implementations like we've experienced. I guess most affected users just see stalled or slow connections and won't have the time or knowledge to debug this. A proper warning could help them and the kernel developers to get this issue solved as quickly as possible. We called the hotline of the ISP several times and they always ...
Aug 12, 12:46 am 2008
Matti Linnanvuori
[patch v1.2.34] WAN: merge driver retina
From: Matti Linnanvuori <matti.linnanvuori@ascom.com> Retina G.703 and G.SHDSL PCI card driver. Signed-off-by: Matti Linnanvuori <matti.linnanvuori@ascom.com> --- I am sending a patch against 'retina' branch of jgarzik/netdev-2.6.git. This patch fixes the netif_tx_unlock bug in the previous patch. --- --- linux/drivers/net/wan/retina.c 2008-08-08 08:36:33.998260400 +0300 +++ linux-next/drivers/net/wan/retina.c 2008-08-12 12:37:59.477928300 +0300 @@ -40,12 +40,11 @@ #define ...
Aug 12, 2:46 am 2008