RE: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?

To: J. Bruce Fields <bfields@...>
Cc: Jeff Layton <jlayton@...>, <linux-kernel@...>, <linux-nfs@...>, Neil Brown <neilb@...>
Date: Thursday, June 19, 2008 - 11:53 am

The kernel that we were really seeing the problem with was 2.6.25.4, but
I think we may have figured out the size-4096 problem.  It was probably a
mistake on my part, but it is important for NFS users to see it so
they don't make the same mistake.  I had found some performance tuning
guides and tried some of the suggestions.  The changes did seem to help
with some things, but of course I never got to run a check under full
load (800+ clients).  One suggestion was to change the tcp_reordering
tunable under /proc/sys/net/ipv4 from the default of 3 to 127.  We think
that this was actually causing the issue.  I traced back through all of
the changes, and when I set this tunable back to the default of 3, it
immediately fixed the size-4096 hell.  It appears that the reordering
just eats into memory, especially under high demand.  I guess that makes
sense if we are actually buffering up packets for reordering while
slamming the box with thousands of requests per minute.
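
For anyone who wants to check their own servers for the same mistake, here
is a rough sketch of how the two values discussed above could be read.  It
is only an illustration: the function names are made up for this example,
the default of 3 is the one quoted in this message, and the size-4096 cache
name assumes a SLAB-based kernel of this vintage (other allocators and
newer kernels name their caches differently).

```python
from pathlib import Path

# Default quoted in this message; treat it as an assumption on other kernels.
DEFAULT_TCP_REORDERING = 3


def read_int_tunable(path):
    """Return the integer stored in a /proc/sys style file."""
    return int(Path(path).read_text().split()[0])


def slab_usage(slabinfo_text, cache_name="size-4096"):
    """Return (active_objs, num_objs, objsize) for one slab cache.

    slabinfo_text is the contents of /proc/slabinfo; the first three
    numeric columns after the cache name are read.  Returns None if the
    cache is not listed.
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if fields and fields[0] == cache_name:
            return int(fields[1]), int(fields[2]), int(fields[3])
    return None


def differs_from_default(value, default=DEFAULT_TCP_REORDERING):
    """True if the tunable has been changed from the expected default."""
    return value != default
```

On a live box this would be called as
read_int_tunable("/proc/sys/net/ipv4/tcp_reordering") and
slab_usage(Path("/proc/slabinfo").read_text()); reading /proc/slabinfo
usually requires root.  Sampling the size-4096 numbers before and after
changing the tunable is one way to see whether the setting is really what
is eating the memory.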

We still have other performance issues now, but they appear to be more of
a bottleneck: the nodes do not appear to be backing off when the servers
are becoming congested.



We were estimating that somewhere between 40 and 50 threads was the
cutoff for being able to service all of the (current) requests at once.
I haven't ramped back up to that level yet.  I wasn't comfortable letting
it all hang back out, just in case we get into that hellish mode again.
It can be a pain to get into those systems once they are overloaded
(even over serial, the login can sometimes just time out).  We actually
had to bring a second option online to help alleviate some of the
backed-up congestion, because the servers couldn't handle the workload.




Ah, OK.  That makes sense.

--

Messages in current thread:
CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, J. Bruce Fields, (Wed Jun 11, 3:52 pm)
Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, Jeff Layton, (Wed Jun 11, 4:09 pm)
Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, J. Bruce Fields, (Wed Jun 11, 4:57 pm)
RE: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, Weathers, Norman R., (Wed Jun 11, 6:46 pm)
Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, J. Bruce Fields, (Wed Jun 11, 6:54 pm)
RE: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, Weathers, Norman R., (Thu Jun 12, 3:54 pm)
Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, J. Bruce Fields, (Fri Jun 13, 4:15 pm)
RE: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, Weathers, Norman R., (Fri Jun 13, 5:53 pm)
Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, J. Bruce Fields, (Fri Jun 13, 6:04 pm)
RE: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, Weathers, Norman R., (Fri Jun 13, 6:53 pm)
Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, J. Bruce Fields, (Mon Jun 16, 1:43 pm)
RE: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, Weathers, Norman R., (Thu Jun 19, 11:53 am)
Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?, J. Bruce Fields, (Thu Jun 19, 2:46 pm)