Then try lowering gc_elasticity to 3, or even 2:
echo 3 >/proc/sys/net/ipv4/route/gc_elasticity
--
Here is the output from rtstat and dmesg with the patch you supplied.
Kup /config # rtstat -i60 -c60
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
        |        |     tot|      mc|     ute|        |  an_dst|  an_src|        |    _tot|     _mc|        |      ed|    miss| verflow| _search|t_search|
  103266|   69986|   11574|       0|      54|       0|       0|       0|    1982|     634|       0|   10984|   10980|       0|       0|  191808|    5254|
  124787|   45125|    6019|       0|      28|       0|       0|       0|     807|     230|       0|    6277|    6274|       0|       0|  128922|    2518|
  120270|   45588|    6288|       0|      30|       0|       0|       0|     883|     214|       0|    6532|    6529|       0|       0|  125651|    2743|
  122253|   46522|    6582|       0|      27|       0|       0|       0|     897|     213|       0|    6822|    6819|       0|       0|  124927|    2761|

[ 102.534363] dst_total: 120397 delayed: 12 work_perf: 0 expires: 27999 elapsed: 1 us
[ 130.530240] dst_total: 124277 delayed: 12 work_perf: 0 expires: 32998 elapsed: 2 us
[ 163.523240] dst_total: 110006 delayed: 12 work_perf: 0 expires: 39000 elapsed: 1 us
[ 202.519402] dst_total: 130453 delayed: 12 work_perf: 0 expires: 45998 elapsed: 1 us
[ 248.511220] dst_total: 110637 delayed: 12 work_perf: 0 expires: 52600 elapsed: 2 us
[ 301.102445] dst_total: 129366 delayed: 12 work_perf: 0 expires: 60696 elapsed: 6 us

After a while:
Kup /config # rtstat -i300 -c60
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_to...
Very interesting... you can see 101588 dst are *delayed* in dst_garbage,
but apparently never freed.

Something is wrong, but the count seems stable. It must be some kind of
event, admin driven or something... It might be the "ip route flush cache"
that is scheduled after around 600 seconds of uptime, then again
secret_interval seconds later...

Typically, when an "ip route flush cache" is done (manually or triggered
by the secret_interval timer), entries with refcount > 0 are put into
dst_garbage.

Then, when some traffic occurs on the flows involved, the IP stack should
decrement the refcount so that the next dst_garbage round can free the
deleted entries. Normal TCP connections do this correctly. On your
machine, nothing.

Possibly idle TCP sessions?
Maybe some netfilter problem?
Please tell us more about your machine ;)
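
To make the mechanism described above concrete, here is a minimal userspace sketch in C of a garbage list whose entries can only be freed once their reference count drops to zero. It is a toy model, not kernel code: struct toy_dst, toy_gc_pass() and toy_dst_release() are hypothetical names invented for this illustration, and the real dst_garbage handling in net/core/dst.c is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dst entry; NOT the kernel's struct dst_entry. */
struct toy_dst {
    int refcnt;  /* references still held by users of the route */
    int freed;   /* set once a gc pass has released the entry */
};

/*
 * One garbage-collection pass over the delayed ("garbage") entries.
 * Entries whose refcount has reached zero are freed; the others stay
 * delayed. Returns how many entries are still delayed after the pass.
 */
int toy_gc_pass(struct toy_dst *garbage, size_t n)
{
    int delayed = 0;

    assert(garbage != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (garbage[i].freed)
            continue;            /* already released by an earlier pass */
        if (garbage[i].refcnt > 0)
            delayed++;           /* someone still holds it: try again later */
        else
            garbage[i].freed = 1;
    }
    return delayed;
}

/* A well-behaved user drops its reference when it is done with the route. */
void toy_dst_release(struct toy_dst *d)
{
    assert(d != NULL);
    if (d->refcnt > 0)
        d->refcnt--;
}
```

In this model, an entry whose holder never calls toy_dst_release() stays delayed on every later pass, which is the pattern a constant "delayed" count would suggest.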
--
From: Eric Dumazet <dada1@cosmosbay.com>
I'm beginning to suspect it's IFB and shaping somehow.
If those variables could be eliminated by eliminating them
from the configuration, it would help diagnose this a lot.
--
Just to make sure that 2.6.24.3 is stable and that this is a regression, I am
supplying output from it.
Do you want me to submit a summary to bugzilla and the regression list as well?

In short, IMHO 2.6.25 has major routing issues that have to be fixed
before release. TRIE is crashing, and even with HASH there is a leak. I am
trying my best to bisect it, but this is a major router and I cannot take much
risk with it, so I wish I could simulate it in my home mini-lab. I still cannot
even get a proper switch (Lebanon is a difficult country for IT).

Kup ~ # uname -a
Linux Kup 2.6.24.3-build-0023 #3 SMP Sat Mar 8 13:01:35 EET 2008 i686 unknown
Kup ~ # rtstat -i60 -c6000
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
        |        |     tot|      mc|     ute|        |  an_dst|  an_src|        |    _tot|     _mc|        |      ed|    miss| verflow| _search|t_search|
   54750|    4430|    1128|       0|      12|       0|       0|       0|     263|     190|       0|     709|     708|       0|       0|    3545|     313|
   92913|    8829|    1211|       0|       1|       0|       0|       0|     343|     163|       0|    1375|    1373|       0|       0|   12545|     724|
  115323|    8232|     906|       0|       0|       0|       0|       0|     299|     128|       0|    1035|    1033|       0|       0|   18069|     813|
  128985|    8650|     839|       0|       0|       0|       0|       0|     289|     115|       0|     954|     952|       0|       0|   22515|     845|
  116682|    8911|     861|       0|       0|       0|       0|       0|     288|     117|       0|     978|     976|       0|       0|   23433|     775|
   99969|    9164|     889|       0|       0|       0|       0|       0|     280|     113|       0|    1002| 1...
Maybe you are a little bit too fast for "ip route flush cache" :)

It used to work like this: schedule a timer to start a flush in about 2
seconds. A flush meant: scan the whole table and delete all entries.

On machines with 4 million dst entries, this was taking too much time
and eventually crashing.

On recent kernels, each rtable entry has a special field named rt_genid,
so that "ip route flush cache" doesn't have to scan the whole table, but
only changes the global genid. rtable entries will be deleted later,
when their rt_genid is found to be different from the global genid.

Please try the patch that was suggested yesterday, as it is probably the
cure your router needs.

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdif...
Thank you
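
The generation-id idea described above can be illustrated with a small userspace sketch in C. This is only a toy model under stated assumptions: struct toy_rt, toy_global_genid, toy_flush(), toy_insert() and toy_lookup() are hypothetical names made up for the example, and the kernel's actual route cache code differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry; NOT the kernel's struct rtable. */
struct toy_rt {
    bool in_use;
    unsigned int genid;  /* generation in which the entry was created */
};

/* Global generation counter (assumed name, for illustration only). */
unsigned int toy_global_genid;

/* "Flush" is O(1): bump the generation instead of scanning the table. */
void toy_flush(void)
{
    toy_global_genid++;
}

/* New entries are stamped with the current generation. */
void toy_insert(struct toy_rt *rt)
{
    assert(rt != NULL);
    rt->in_use = true;
    rt->genid = toy_global_genid;
}

/*
 * Lookup treats an entry from an older generation as a miss and
 * deletes it lazily, at the moment it is found to be stale.
 */
bool toy_lookup(struct toy_rt *rt)
{
    assert(rt != NULL);
    if (!rt->in_use)
        return false;
    if (rt->genid != toy_global_genid) {
        rt->in_use = false;  /* lazy deletion of the stale entry */
        return false;
    }
    return true;
}
```

The trade-off in this sketch is that a flush costs almost nothing, while stale entries keep occupying memory until something touches them or a later cleanup finds them.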
--
Already patched and tested, it doesn't change anything.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
--
We still leak dsts somewhere.
You could try git bisect, or try to patch net/core/dst.c so that
dst_gc_task() (line 83) displays
route information for, say, the first 10 entries found in the dst_busy_list
(refcnt, interface, source IP, dest IP, things like that) that could
--
I cooked a patch (untested) to implement this idea :
It should display lines similar to /proc/net/rt_cache (reusing the same
helper function)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
include/net/dst.h | 2 +
net/core/dst.c | 7 ++++++
net/ipv4/route.c | 47 ++++++++++++++++++++++++++------------------
3 files changed, 37 insertions(+), 19 deletions(-)
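
The patch body is not included in this thread. As a rough idea of what "display the first 10 busy entries" could look like, here is a userspace sketch in C. Everything in it is an assumption for illustration: struct toy_busy, its fields and toy_dump_busy() are hypothetical, the output format is invented, and the real patch reuses the /proc/net/rt_cache helper instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical busy-list node; NOT the kernel's struct dst_entry. */
struct toy_busy {
    int refcnt;             /* references still held */
    const char *dev;        /* interface name */
    const char *src;        /* source IP as text */
    const char *dst;        /* destination IP as text */
    struct toy_busy *next;  /* next entry on the busy list */
};

/*
 * Print at most `max` entries from the list, one per line, and
 * return how many were printed. Limiting the count keeps the log
 * readable when the list holds hundreds of thousands of entries.
 */
int toy_dump_busy(const struct toy_busy *head, int max, FILE *out)
{
    int shown = 0;

    assert(out != NULL);
    for (const struct toy_busy *d = head; d != NULL && shown < max; d = d->next) {
        fprintf(out, "refcnt=%d dev=%s src=%s dst=%s\n",
                d->refcnt, d->dev, d->src, d->dst);
        shown++;
    }
    return shown;
}
```

Seeing which interface and addresses the stuck entries belong to is what would narrow the search to a particular code path.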
On Fri, 28 Mar 2008 16:57:55 +0100
I wonder how much the route cache really helps when it grows so large?
Robert Olsson had suggested that turning it off when routing would help.
Perhaps the route cache is only really useful for local destinations? If the
cost of maintaining the route cache exceeds the cost of just using the existing
route table, there is no value to having a route cache.
--
It seems that either the patch changed something (but it only adds debug output,
which is strange), or something was fixed between 2.6.25-rc7-git1 and
2.6.25-rc7-git3. LC-trie is working fine, and with HASH I cannot see any leaks either.

I will have to wait 5-6 hours to make sure. If I do not see the bug again after
that time, I will try to run the kernel with just the default debug, like
before.

If required, I can test performance and CPU load with and without the routing
cache on a real workload. Of course, it would be better to have synthetic tests before that.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
--
Yes, the fix is the patch we mentioned yesterday, and you told us you tried it :(
commit 7c0ecc4c4f8fd90988aab8a95297b9c0038b6160
[ICMP]: Dst entry leak in icmp_send host re-lookup code (v2).
Commit 8b7817f3a959ed99d7443afc12f78a7e1fcc2063 ([IPSEC]: Add ICMP host
relookup support) introduced some dst leaks on error paths: the rt
pointer can be forgotten to be put. Fix it by going to a proper label.

Found after net namespace's lo refused to unregister :) Many thanks to
Den for valuable help during debugging.

Herbert pointed out, that xfrm_lookup() will put the rtable in case
of error itself, so the first goto fix is redundant.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
--
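
The kind of error-path leak described in the commit message above, and the "go to a proper label" fix, can be sketched in plain userspace C. This is not the icmp_send() code: struct toy_route, toy_hold(), toy_put(), toy_send_leaky() and toy_send_fixed() are hypothetical names used only to show the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object; NOT the kernel's struct rtable. */
struct toy_route {
    int refcnt;
};

void toy_hold(struct toy_route *rt)
{
    assert(rt != NULL);
    rt->refcnt++;
}

void toy_put(struct toy_route *rt)
{
    assert(rt != NULL && rt->refcnt > 0);
    rt->refcnt--;
}

/* Leaky variant: the early return on error skips the put. */
int toy_send_leaky(struct toy_route *rt, int fail)
{
    toy_hold(rt);
    if (fail)
        return -1;   /* BUG: the reference taken above is never dropped */
    toy_put(rt);
    return 0;
}

/* Fixed variant: every exit path goes through the label that puts. */
int toy_send_fixed(struct toy_route *rt, int fail)
{
    int err = 0;

    toy_hold(rt);
    if (fail) {
        err = -1;
        goto out_put;
    }
    /* ... normal work would happen here ... */
out_put:
    toy_put(rt);
    return err;
}
```

Each failed call to the leaky variant leaves one reference behind, so an entry that later lands on a garbage list with that reference still held can never be freed. The converse mistake, putting a reference that a callee has already dropped on error, is the redundancy the commit message mentions for xfrm_lookup().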
From: Eric Dumazet <dada1@cosmosbay.com>
Denys please be more careful in the future :-(
So much time and effort got wasted because of this.
--
I checked the whole night; shame on me. It seems this patch did cure the problem.
Probably it was rejected, and I didn't notice. I will be more careful, sorry
again.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
--
I will try to do so.
I also have a bit more information, not sure if it is useful. I took a risk and
am now running one more loaded router, without NAT this time. It has significantly
less load, but maybe I will catch something here too.

Here is the dmesg I got:
[ 23.280155] dst_total: 4 delayed: 1 work_perf: 0 expires: 600 elapsed: 1 us
[ 23.888719] dst_total: 5 delayed: 1 work_perf: 0 expires: 1600 elapsed: 1 us
[ 25.489486] dst_total: 11 delayed: 0 work_perf: 1 expires: 4294967295 elapsed: 2 us
[ 67.187254] dst_total: 23980 delayed: 1 work_perf: 0 expires: 600 elapsed: 2 us
[ 67.807126] dst_total: 24452 delayed: 1 work_perf: 0 expires: 1600 elapsed: 3 us
[ 69.453570] dst_total: 25103 delayed: 0 work_perf: 2 expires: 4294967295
[ 278.911357] dst_total: 16855 delayed: 2 work_perf: 0 expires: 600 elapsed: 2 us
[ 279.530432] dst_total: 16866 delayed: 2 work_perf: 0 expires: 1600 elapsed: 1 us
[ 281.197568] dst_total: 16901 delayed: 2 work_perf: 0 expires: 3100 elapsed: 1 us
[ 284.425797] dst_total: 16981 delayed: 2 work_perf: 0 expires: 4981 elapsed: 1 us
[ 289.665137] dst_total: 17067 delayed: 2 work_perf: 0 expires: 8000 elapsed: 1 us
[ 297.960978] dst_total: 17219 delayed: 2 work_perf: 0 expires: 11000 elapsed: 1 us
[ 309.379867] dst_total: 17426 delayed: 2 work_perf: 0 expires: 14100 elapsed: 2 us
[ 323.972039] dst_total: 17629 delayed: 2 work_perf: 0 expires: 18196 elapsed: 2 us
[ 342.831626] dst_total: 13563 delayed: 2 work_perf: 0 expires: 23000 elapsed: 2 us
[ 366.592260] dst_total: 13830 delayed: 2 work_perf: 0 expires: 28000 elapsed: 2 us
[ 395.753299] dst_total: 14142 delayed: 2 work_perf: 0 expires: 33000 elapsed: 2 us
[ 429.952513] dst_total: 13156 delayed: 3 work_perf: 0 expires: 600 elapsed: 2 us
[ 430.565783] dst_total: 13164 delayed: 3 work_perf: 0 expires: 1600 elapsed: 1 us
[ 432.267868] dst_total: 13184 delayed: 3 work_perf: 0 expires: 3100 elapsed: 1 us
[ 435.457375] dst_total: 13220 delayed: 3 work_perf: 0 expi...
After patching, without ifb and "shapers":
[11807.126790] dst_total: 807311 delayed: 785973 work_perf: 0 expires: 45928 elapsed: 71977 us
[11853.118939] dst_total: 810625 delayed: 785973 work_perf: 0 expires: 52928 elapsed: 72092 us
[11906.110627] dst_total: 814329 delayed: 785973 work_perf: 0 expires: 59929 elapsed: 71415 us
[11966.101684] dst_total: 818621 delayed: 785973 work_perf: 0 expires: 67929 elapsed: 71566 us
[12034.092650] dst_total: 823641 delayed: 785973 work_perf: 0 expires: 76927 elapsed: 72856 us
[12111.080909] dst_total: 829008 delayed: 785973 work_perf: 0 expires: 85927 elapsed: 73175 us
[12197.066708] dst_total: 835397 delayed: 785973 work_perf: 0 expires: 94928 elapsed: 72001 us
[12292.054049] dst_total: 842092 delayed: 785972 work_perf: 1 expires: 104927 elapsed: 73341 us
[12397.038451] dst_total: 849712 delayed: 785972 work_perf: 0 expires: 115926 elapsed: 74072 us
[12513.032054] dst_total: 858081 delayed: 785972 work_perf: 0 expires: 119915 elapsed: 77027 us
[12633.016029] dst_total: 866064 delayed: 785972 work_perf: 0 expires: 119913 elapsed: 87431 us
[12752.987981] dst_total: 874067 delayed: 785972 work_perf: 0 expires: 119923 elapsed: 76570 us
[12872.981870] dst_total: 882006 delayed: 785972 work_perf: 0 expires: 119911 elapsed: 88655 us
[12992.950062] dst_total: 889918 delayed: 785972 work_perf: 0 expires: 119924 elapsed: 76035 us

rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
        |        |     tot|      mc|     ute|        |  an_dst|  an_src|        |    _tot|     _mc|        |      ed|    miss| verflow| _search|t_search|
  896238|135781150|112584149|       0|  475839|    1708|       0|      18| 3234260| 2340122|       0|115324498|115284650|   35962|       0|...
After removing shaping and ifb, the problem probably remains. I will try to apply
the patch mentioned by Denis Lunev.

Netfilter is kind of difficult to remove completely: the traffic pattern
would change a lot, so any conclusion that it was the cause of the problem
could be false. I am using NOTRACK, SNAT, DNAT and, of course, filtering.
Nothing "special".

rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
        |        |     tot|      mc|     ute|        |  an_dst|  an_src|        |    _tot|     _mc|        |      ed|    miss| verflow| _search|t_search|
  122438|25068014| 2464002|       0|    8579|     105|       0|       3|  436160|  149272|       0| 2560231| 2557567|      54|       0|52890998| 1164822|

[ 355.594112] dst_total: 115387 delayed: 1 work_perf: 0 expires: 68196 elapsed: 0 us
[ 423.780775] dst_total: 126186 delayed: 1 work_perf: 0 expires: 76999 elapsed: 0 us
[ 500.768537] dst_total: 122085 delayed: 1 work_perf: 0 expires: 86000 elapsed: 0 us
[ 586.755462] dst_total: 121715 delayed: 1 work_perf: 0 expires: 95000 elapsed: 1 us
[ 681.741665] dst_total: 129798 delayed: 1 work_perf: 0 expires: 104999 elapsed: 0 us
[ 786.732216] dst_total: 116243 delayed: 74333 work_perf: 1 expires: 600 elapsed: 7076 us
[ 787.338620] dst_total: 116912 delayed: 74333 work_perf: 0 expires: 1600 elapsed: 6962 us
[ 788.945653] dst_total: 118565 delayed: 74333 work_perf: 0 expires: 3100 elapsed: 7246 us
[ 792.052890] dst_total: 121611 delayed: 74333 work_perf: 0 expires: 5672 elapsed: 6954 us
[ 797.730354] dst_total: 126794 delayed: 74333 work_perf: 0 expires: 7993 elapsed: 7276 us
[ 805.729136] dst_total: 115686 delayed: 74333 work_perf: 0 expires: 10993 elapsed: 7270 us
[ 816.727732] dst_total: 126402 delayed: 74333 work_pe...
Could you check the patch sent yesterday by Pavel under the name
[PATCH][ICMP]: Dst entry leak in icmp_send host re-lookup code (v2).

It can fit the case; massive DST leakage is possible here.
Regards,
Den
--
One more idea before leaving :)
If rt_cache_entries is still increasing, we might have a dst leak somewhere:
gc tries to evict entries that have a non-null refcount -> they are put
in dst_garbage.list

The following patch will show us how dst_garbage behaves.
( printk(KERN_DEBUG "dst_total: %d delayed: %d work_perf: %d" ...)

diff --git a/net/core/dst.c b/net/core/dst.c
index 7deef48..e634e5f 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -31,6 +31,8 @@
  * 3) This list is guarded by a mutex,
  *    so that the gc_task and dst_dev_event() can be synchronized.
  */
+#undef RT_CACHE_DEBUG
+#define RT_CACHE_DEBUG 2
 #if RT_CACHE_DEBUG >= 2
 static atomic_t dst_total = ATOMIC_INIT(0);
 #endif
--
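
For readers who want to post-process the debug lines this patch produces, here is a small userspace sketch in C that extracts the three counters from a "dst_total: ... delayed: ... work_perf: ..." line. The field names come from the printk format quoted above; struct gc_sample, parse_gc_line() and delayed_never_drains() are hypothetical helpers, and the "never drains" check is only an assumed heuristic, not something the thread defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Counters taken from one debug line (hypothetical container). */
struct gc_sample {
    long dst_total;
    long delayed;
    long work_perf;
};

/*
 * Parse one dmesg line. Returns 1 and fills *s if the line contains
 * the three counters, 0 otherwise. Any "[timestamp]" prefix and any
 * trailing fields (expires, elapsed) are ignored.
 */
int parse_gc_line(const char *line, struct gc_sample *s)
{
    const char *p;

    assert(line != NULL && s != NULL);
    p = strstr(line, "dst_total:");
    if (p == NULL)
        return 0;
    return sscanf(p, "dst_total: %ld delayed: %ld work_perf: %ld",
                  &s->dst_total, &s->delayed, &s->work_perf) == 3;
}

/*
 * Assumed heuristic: report 1 when there are at least two samples and
 * every one of them keeps at least `min_delayed` entries delayed, i.e.
 * the garbage list never drains between gc rounds.
 */
int delayed_never_drains(const struct gc_sample *s, size_t n, long min_delayed)
{
    assert(s != NULL || n == 0);
    if (n < 2)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].delayed < min_delayed)
            return 0;
    }
    return 1;
}
```

Applied to the logs in this thread, a delayed count that sits at several hundred thousand across many rounds while work_perf stays near zero is the signature that pointed to a reference leak.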
