> I have a dual quad-core Xeon system running software
> (http://www.unidata.ucar.edu/software/ldm) that relays and processes
> weather data through RPC calls, keeping a queue of data in a
> memory-mapped file. Up until 2.6.26 the system ran just fine (for example,
> on 2.6.25.17). But starting with 2.6.26 through 2.6.27.2 the system runs
> into a problem after approximately 24 hours. The symptom is that the
> processing slows down to a crawl. Using "top" I can see that the System
> time is up over 90%, with almost no User and Wait time. If I stop and
> restart the software, most of the time it gets better - but sometimes it
> takes a reboot to fix the problem. I have an identical system that does
> just processing and ingesting data from remote systems, and it does not
> have this problem. I have tried a number of different kernel
> configurations, but they all show the same problem.
>
> I suspect a problem with SUNRPC. I notice that there were a large
> number of SUNRPC patches in 2.6.26. I am looking for suggestions on how
> to pin down which patches are causing the problem. Are there ways to
> figure out where in the kernel the time is being spent? I am willing to work
> on isolating the problem, but I need some suggestions on the best way to
> do it given the large number of SUNRPC patches in 2.6.26 and the fact
> that each experiment takes a day.