Re: MPTSAS problems in 2.6.26-rc2-mm1

To: <linux-scsi@...>
Cc: Andrew Morton <akpm@...>, linux kernel mailing list <linux-kernel@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Wednesday, May 28, 2008 - 5:41 am

I have 2.6.26-rc2-mm1 with Rik's splitvm patches applied. While booting the
kernel, I ran into the following. A quick search of my email did not turn up
anyone else reporting this problem. Lockdep seems to be complaining about the
same lock being taken twice at the same location (the message is a bit
confusing to me).

=============================================
[ INFO: possible recursive locking detected ]
2.6.26-rc2-mm1 #2
---------------------------------------------
insmod/1072 is trying to acquire lock:
(&cls->mutex){--..}, at: [<ffffffff803eb2cc>] device_add+0x46e/0x5d7

but task is already holding lock:
(&cls->mutex){--..}, at: [<ffffffff803eb2cc>] device_add+0x46e/0x5d7

other info that might help us debug this:
3 locks held by insmod/1072:
#0: (&ioc->sas_discovery_mutex){--..}, at: [<ffffffffa003e7e9>] mptsas_probe+0x3a1/0x442 [mptsas]
#1: (&shost->scan_mutex){--..}, at: [<ffffffff80466cbd>] scsi_scan_target+0x71/0xb0
#2: (&cls->mutex){--..}, at: [<ffffffff803eb2cc>] device_add+0x46e/0x5d7

stack backtrace:
Pid: 1072, comm: insmod Not tainted 2.6.26-rc2-mm1 #2

Call Trace:
[<ffffffff80254545>] __lock_acquire+0x911/0xc4c
[<ffffffff80254835>] ? __lock_acquire+0xc01/0xc4c
[<ffffffff803eb2cc>] ? device_add+0x46e/0x5d7
[<ffffffff8025490e>] lock_acquire+0x8e/0xb2
[<ffffffff803eb2cc>] ? device_add+0x46e/0x5d7
[<ffffffff8059e067>] mutex_lock_nested+0xf2/0x27f
[<ffffffff803eb2cc>] ? device_add+0x46e/0x5d7
[<ffffffff8059f80b>] ? _spin_unlock+0x26/0x2a
[<ffffffff803eb2cc>] device_add+0x46e/0x5d7
[<ffffffff803eb44e>] device_register+0x19/0x1d
[<ffffffff803eb531>] device_create+0xdf/0x110
[<ffffffff80253794>] ? trace_hardirqs_on_caller+0xf9/0x124
[<ffffffff802537cc>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8059df73>] ? mutex_unlock+0x9/0xb
[<ffffffff803eed13>] ? kobj_map+0x119/0x12e
[<ffffffff802a4488>] ? e...

To: <balbir@...>
Cc: <linux-scsi@...>, Andrew Morton <akpm@...>, linux kernel mailing list <linux-kernel@...>
Date: Wednesday, May 28, 2008 - 5:45 am

Looks like device_add() is recursing, most likely with a different device.
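
A rough way to see why the report names the same lock twice: lockdep tracks
lock classes rather than individual lock instances, so two different
cls->mutex instances initialised at the same place look identical to it. The
sketch below is a toy Python model of that check, not kernel code. ToyLock,
ToyChecker, nested_device_add and the key strings are all made-up names, and
it assumes the nested device_add() really does involve a second device in a
different class.

```python
"""Toy model of a class-based lock checker (not the kernel's lockdep code)."""


class ToyLock:
    """A lock instance; `key` names its lock class, not the instance."""

    def __init__(self, name, key):
        self.name = name  # instance name, used only for readability
        self.key = key    # hypothetical class key shared by same-site locks


class ToyChecker:
    """Tracks held locks and flags same-class nesting, blind to instances."""

    def __init__(self):
        self.held = []

    def acquire(self, lock):
        # Compare lock classes (keys), never the lock objects themselves.
        recursive = any(h.key == lock.key for h in self.held)
        self.held.append(lock)
        return recursive

    def release(self, lock):
        self.held.remove(lock)


def nested_device_add(outer, inner):
    """Take `outer`, then `inner` while `outer` is held; return the flag."""
    checker = ToyChecker()
    checker.acquire(outer)
    flagged = checker.acquire(inner)
    checker.release(inner)
    checker.release(outer)
    return flagged


if __name__ == "__main__":
    # Two distinct (hypothetical) class mutexes that share one class key.
    a = ToyLock("class_a.mutex", key="cls->mutex")
    b = ToyLock("class_b.mutex", key="cls->mutex")
    print("same key flagged:", nested_device_add(a, b))

    # The same nesting with a separate key for the inner lock.
    c = ToyLock("class_c.mutex", key="other-key")
    print("separate key flagged:", nested_device_add(a, c))
```

In this model the nesting is flagged only when both locks carry the same key,
even though they are different objects. That is only an illustration of the
class-versus-instance distinction; it says nothing about how the kernel
should resolve this particular report.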

--

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <balbir@...>, <linux-scsi@...>, Andrew Morton <akpm@...>, linux kernel mailing list <linux-kernel@...>, Greg KH <greg@...>
Date: Wednesday, May 28, 2008 - 10:47 am

This is another instance of a problem caused by the semaphore-to-mutex
conversion in struct class.

There's another thread on this here:

http://marc.info/?t=121074904600001

James

--

To: James Bottomley <James.Bottomley@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <linux-scsi@...>, Andrew Morton <akpm@...>, linux kernel mailing list <linux-kernel@...>, Greg KH <greg@...>
Date: Wednesday, May 28, 2008 - 10:56 am

Thanks for the pointer; it does indeed look like a variant of the same problem.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--