How do I analyze a soft lockup?

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: <linux-kernel@...>
Date: Wednesday, March 7, 2007 - 10:02 am

Hi there,

Environment:

Kernel is 2.6.16.27, arch x86_64 on a Dual Core AMD64 machine with
4 GB of RAM. Also involved is an Areca 1100 SATA RAID controller
with the drives from the Tekram website. 

Problem: 

We get customer reports that a system stops with the following kernel
messages (as they are submitted to us, we're not able yet to reproduce
the problem in our laboratory :-( )

------------------8<-snip--------------------------------------------

BUG: soft lockup detected on CPU#1!

Call Trace:
  <IRQ>  [<ffffffff8024cd64>] softlockup_tick+0xd3/0xe5
  [<ffffffff8023700d>] update_process_times+0x42/0x68
  [<ffffffff80215925>] smp_local_timer_interrupt+0x31/0x54
  [<ffffffff80215997>] smp_apic_timer_interrupt+0x4f/0x66
  [<ffffffff8020a266>] apic_timer_interrupt+0x66/0x70
  <EOI>  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff8025d643>] __handle_mm_fault+0x256/0x2d9
  [<ffffffff8025d640>] __handle_mm_fault+0x253/0x2d9
  [<ffffffff8021d5c2>] do_page_fault+0x23f/0x572
  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff805c27dd>] error_exit+0x0/0x84
  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff80373550>] __put_user_4+0x20/0x30
  [<ffffffff802272cb>] schedule_tail+0x81/0x86
  [<ffffffff8020958c>] ret_from_fork+0xc/0x25

BUG: soft lockup detected on CPU#1!

Call Trace:
  <IRQ>  [<ffffffff8024cd64>] softlockup_tick+0xd3/0xe5
  [<ffffffff8023700d>] update_process_times+0x42/0x68
  [<ffffffff80215925>] smp_local_timer_interrupt+0x31/0x54
  [<ffffffff80215997>] smp_apic_timer_interrupt+0x4f/0x66
  [<ffffffff8020a266>] apic_timer_interrupt+0x66/0x70
  <EOI>  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff8025d643>] __handle_mm_fault+0x256/0x2d9
  [<ffffffff8025d640>] __handle_mm_fault+0x253/0x2d9
  [<ffffffff8021d5c2>] do_page_fault+0x23f/0x572
  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff8021d8f2>] do_page_fault+0x56f/0x572
  [<ffffffff805c27dd>] error_exit+0x0/0x84
  [<ffffffff8020969e>] system_call+0x7e/0x83
  [<ffffffff80373550>] __put_user_4+0x20/0x30
  [<ffffffff802272cb>] schedule_tail+0x81/0x86
  [<ffffffff8020958c>] ret_from_fork+0xc/0x25

----------------8<-snip------------------------------------------

The first thing that makes me very suspicious or curious is that the
call stack shows "__handle_mm_fault" twice, but looking at the source
of that function I don't see any recursion that would explain the 
double line. 

Is there any idea where I can start digging around? 

TIA
Rainer
-- 
Rainer König, Diplom-Infomatiker (FH), Augsubrg, Germany
-
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
How do I analyze a soft lockup? , Rainer Koenig, (Wed Mar 7, 10:02 am)
Re: How do I analyze a soft lockup?, Arjan van de Ven, (Wed Mar 7, 11:12 am)