> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Eric Dumazet
>
> On Wed, 5 Sep 2007 09:37:50 -0400
> "Fortier,Vincent [Montreal]" <Vincent.Fortier1@EC.GC.CA> wrote:
>
> >
> > Hi all,
> >
> > We are testing new hardware and planning a switch from our old RedHat
> > 7.3 to Debian Etch 4.0 for our radar forecast analysis systems. We
> > found out that our main IPC dispatcher software module would use 100%
> > of a CPU all the time and that the IPC queues would fill up quickly on
> > a 2.6 kernel. We first thought that it was a compatibility problem
> > between the 2.4 and 2.6 IPC calls and our radar analysis software,
> > but after a lot of work we were able to test a 2.4 kernel on that
> > same hardware and got the exact same problem.
> >
> > So curiously, on our current systems (see SYSTEM 1 below) our IPC
> > dispatcher module works like a charm and the queues stay near 0. On
> > our test systems, which are far more powerful (see SYSTEM 2), our
> > IPC dispatcher module queue fills up rapidly (depending on the msgmnb
> > queue size it will simply take a bit longer to fill).
> >
> > We have tested both our already compiled binaries from rh73 (built
> > with gcc 2.9) and a version of the modules recompiled on a Debian
> > Sarge system, and got the exact same problem both on Debian Sarge 3.1
> > (running a 2.4 or 2.6 kernel) and on an Etch 64-bit system (using the
> > 32-bit compat layer) with a 2.6 kernel. In all cases the queues would
> > simply fill up.
> >
> > After stracing the module I noticed that the time needed to handle
> > the signal and IPC calls is much lower on the new system, hence I
> > don't see why the dispatcher queue fills up like that.
> >
> > Has anyone experienced something similar? Could this be a kernel
> > issue, a hardware issue, or a kernel option? Might this be related
> > to libc?
> >
> > Help / clues very much appreciated.
>
> Hi Vincent
>
> top shows that something is eating CPU cycles in user mode on
> your new platform, while the old platform consumes cycles both in
> user and system land.
>
> This might be related to some programming error, maybe some
> spinlock in user mode or bad multi-threading synchronization,
> or scheduling assumptions that break because of the quad-core
> CPUs of your new machine.
Actually, could this be worth trying (adding Ingo in CC):
http://lkml.org/lkml/2007/9/5/75
> So the thread that is supposed to consume IPC messages is not
> scheduled in time, because it is starved of CPU. (Beware: the four
> cores of each CPU compete for resources.)
>
> You could issue "ps auxm" to check which threads are spinning
> in user mode and try to trace them?
Indeed, in this specific test case I had 5 stuck processes, each using
100% of a CPU... although 3 other cores were still available, so there
is (I believe) no reason why it should have starved that much.
Anyhow, that did not happen in any of the other tests I ran during the
past 3 weeks (only in this one).
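
The user versus system split that "ps auxm" and top report per thread can also be read straight from /proc. The following is only a minimal sketch, assuming a Linux kernel that exposes /proc/<pid>/task/<tid>/stat (2.6-era and later) and the field order documented in proc(5); the function names and the sample line are hypothetical and are not part of the dispatcher software discussed here.

```python
import os


def parse_stat_line(line):
    """Parse one /proc/<pid>/task/<tid>/stat line.

    Returns (tid, comm, utime_ticks, stime_ticks).
    """
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    left, right = line.rsplit(")", 1)
    tid_text, comm = left.split("(", 1)
    fields = right.split()
    # fields[0] is field 3 of the stat line (state), so field 14 (utime)
    # and field 15 (stime) sit at indexes 11 and 12.
    return int(tid_text), comm, int(fields[11]), int(fields[12])


def user_fraction(utime, stime):
    """Share of CPU time spent in user mode, or None if no time was used."""
    total = utime + stime
    return utime / total if total else None


def thread_cpu_times(pid):
    """Return a list of parsed stat tuples, one per thread of `pid`."""
    task_dir = f"/proc/{pid}/task"
    results = []
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as handle:
                results.append(parse_stat_line(handle.read()))
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return results


if __name__ == "__main__":
    # Hypothetical sample line: a thread that burned 5000 ticks in user
    # mode and only 20 in the kernel, the pattern of a user-space spin.
    sample = "1234 (dispatcher) R 1 1234 1234 0 -1 4194304 100 0 0 0 5000 20 0 0"
    tid, comm, utime, stime = parse_stat_line(sample)
    print(tid, comm, user_fraction(utime, stime))
```

Calling thread_cpu_times() twice on the dispatcher's PID a few seconds apart and comparing the tick counts would show which threads accumulate almost only user time, which is what a busy loop in user space looks like.
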
I restarted this test (again using a 2.4.35.1 kernel), made sure no
processes were stuck :), and grabbed the ps aux + ipcs -q info (attached).
Again, ipcs -q shows that the queue is getting full, compared to
SYSTEM 1, which always has a queue of 0.
Note: Also attached is a top.txt file showing that the dispatcher uses 100%
of a CPU on SYSTEM 2. This never occurs on SYSTEM 1.
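
To watch how close a queue is to its limit over time, the ipcs -q listing can be parsed and compared against msgmnb. The sketch below is an illustration only: it assumes the six-column Linux layout (key, msqid, owner, perms, used-bytes, messages), the sample text is made up rather than taken from the attached files, and the 16384-byte limit is an assumed value that should be replaced by the real kernel.msgmnb setting. The function names are hypothetical.

```python
def parse_ipcs_q(text):
    """Extract message-queue rows from the text printed by `ipcs -q`."""
    queues = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have six columns and start with a hexadecimal key;
        # this skips the banner, the column header and blank lines.
        if len(parts) != 6 or not parts[0].startswith("0x"):
            continue
        key, msqid, owner, perms, used_bytes, messages = parts
        queues.append({
            "key": key,
            "msqid": int(msqid),
            "owner": owner,
            "perms": perms,
            "used_bytes": int(used_bytes),
            "messages": int(messages),
        })
    return queues


def fill_ratio(queue, msgmnb):
    """Fraction of the per-queue byte limit (msgmnb) currently in use."""
    return queue["used_bytes"] / msgmnb


# Hypothetical output, NOT taken from the attachments in this thread.
SAMPLE = """\
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x00001234 0          radar      666        0            0
0x00005678 32769      radar      666        16320        510
"""

if __name__ == "__main__":
    ASSUMED_MSGMNB = 16384  # assumed limit; use the real kernel.msgmnb value
    for q in parse_ipcs_q(SAMPLE):
        print(q["msqid"], q["messages"], f"{fill_ratio(q, ASSUMED_MSGMNB):.0%}")
```

Sampling this once per second on both systems would show whether used-bytes climbs steadily on SYSTEM 2 (the consumer never catches up) or only spikes occasionally.
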
> Eric
PS: thanks for replying.
- vin
> > SYSTEM INFORMATION:
> >
> > SYSTEM 1:
> > ---------
> > HPDL580 G2
> > Quad Intel Xeon 1.90GHz
> > 4G ram
> > DRBD disks on dual-gigabit adapter
> > OS: RedHat 7.3 / kernel: 2.4.33 / libc: 2.2.5 / gcc 2.96
> >=20
> > SYSTEM 2:
> > ---------
> > Dell PE2950
> > Dual Intel Quad-Core 2.66GHz
> > 16G ram
> > local 300G 15000 RPM SCSI.
> > OS1: Debian Etch 4.0 / kernels 2.6.18 -> 2.6.22 / libc 2.3.6 / gcc 4.1.2
> > OS2: Debian Sarge 3.1 / kernels 2.4.35, 2.6.18 -> 2.6.22 / libc 2.3.2 / gcc 3.3.5