Re: [ANNOUNCE] 2.6.33.1-rt11 - BUG?

Previous thread: none

Next thread: [PATCH] readahead even for FMODE_RANDOM by Jens Axboe on Thursday, April 1, 2010 - 11:31 am. (6 messages)
From: Thomas Gleixner
Date: Thursday, April 1, 2010 - 11:23 am

No, it traces back to a call to lock_tx_qs(), which is a spinlock in
mainline and gets converted to a "sleeping" spinlock in -RT. That
means it can't be called with interrupts disabled. But the code in
adjust_link() does exactly that.

Does the patch below fix it?

Thanks,

	tglx
---
Subject: net-gianfar-fix-rt-splat.patch
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 01 Apr 2010 20:20:57 +0200

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/net/gianfar.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-tip/drivers/net/gianfar.c
===================================================================
--- linux-2.6-tip.orig/drivers/net/gianfar.c
+++ linux-2.6-tip/drivers/net/gianfar.c
@@ -2717,7 +2717,7 @@ static void adjust_link(struct net_devic
 	struct phy_device *phydev = priv->phydev;
 	int new_state = 0;
 
-	local_irq_save(flags);
+	local_irq_save_nort(flags);
 	lock_tx_qs(priv);
 
 	if (phydev->link) {
@@ -2785,7 +2785,7 @@ static void adjust_link(struct net_devic
 	if (new_state && netif_msg_link(priv))
 		phy_print_status(phydev);
 	unlock_tx_qs(priv);
-	local_irq_restore(flags);
+	local_irq_restore_nort(flags);
 }
 
 /* Update the hash table based on the current list of multicast
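
The reasoning behind the patch above can be sketched with a small
userspace model. This is not kernel code: every name prefixed with
model_ is hypothetical, MODEL_PREEMPT_RT is an invented switch, and the
macros only imitate the behaviour described in this thread, assuming
that on -RT the _nort variants record the flags but leave interrupts
enabled, while on a non-RT build they behave like the plain calls.

```c
/* Userspace model only; none of this is the real kernel API. */
#include <stdbool.h>

/* Hypothetical switch: 1 models an -RT build, 0 models mainline. */
#ifndef MODEL_PREEMPT_RT
#define MODEL_PREEMPT_RT 1
#endif

/* Stands in for the CPU's interrupt-enable state. */
bool model_irqs_disabled;

/* Model of local_irq_save()/local_irq_restore(): always disable. */
#define model_local_irq_save(flags) \
	do { (flags) = model_irqs_disabled; model_irqs_disabled = true; } while (0)
#define model_local_irq_restore(flags) \
	do { model_irqs_disabled = (flags); } while (0)

#if MODEL_PREEMPT_RT
/* Assumed -RT behaviour: save the flags, do not disable interrupts. */
#define model_local_irq_save_nort(flags) \
	do { (flags) = model_irqs_disabled; } while (0)
#define model_local_irq_restore_nort(flags) \
	do { (void)(flags); } while (0)
#else
/* Non-RT: identical to the plain variants. */
#define model_local_irq_save_nort(flags)    model_local_irq_save(flags)
#define model_local_irq_restore_nort(flags) model_local_irq_restore(flags)
#endif

/* A "sleeping" lock may only be taken with interrupts enabled. */
bool model_sleeping_lock_ok(void)
{
	return !model_irqs_disabled;
}

/* Shape of adjust_link() before the patch: lock taken with irqs off. */
bool adjust_link_old(void)
{
	bool flags, ok;

	model_local_irq_save(flags);
	ok = model_sleeping_lock_ok();	/* models lock_tx_qs() on -RT */
	model_local_irq_restore(flags);
	return ok;
}

/* Shape of adjust_link() after the patch: _nort variants. */
bool adjust_link_patched(void)
{
	bool flags, ok;

	model_local_irq_save_nort(flags);
	ok = model_sleeping_lock_ok();	/* models lock_tx_qs() on -RT */
	model_local_irq_restore_nort(flags);
	return ok;
}
```

In this model, adjust_link_old() reports an invalid context when
MODEL_PREEMPT_RT is 1, while adjust_link_patched() does not. With
MODEL_PREEMPT_RT set to 0 both functions run with interrupts disabled,
which is fine for a mainline (non-sleeping) spinlock but would need a
different lock model than model_sleeping_lock_ok().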




--

From: Xianghua Xiao
Date: Thursday, April 1, 2010 - 11:46 am

That fixed it. Thanks!
However, I'm now seeing two more similar rtmutex:684 BUGs in dmesg.
They come from my own drivers and I'm tracking them down.

Xianghua


--

From: Xianghua Xiao
Date: Thursday, April 1, 2010 - 1:44 pm

Here is the new dmesg output:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:684
pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 1, pid: 5770, name: insmod
Call Trace:
[ce935dc0] [c00096cc] show_stack+0x6c/0x1a4 (unreliable)
[ce935df0] [c001f928] __might_sleep+0x104/0x108
[ce935e00] [c03cb414] rt_spin_lock+0xa0/0xa4
[ce935e10] [c00a4098] kmem_cache_alloc+0x50/0x17c
[ce935e40] [c0073570] irq_to_desc_alloc_node+0x104/0x5ec
[ce935e60] [c00064e0] irq_setup_virq+0x30/0xa8
[ce935e80] [c000665c] irq_create_mapping+0x104/0x168
[ce935ea0] [d1f69bc4] dma_init+0x118/0x1f0 [ipc]
[ce935ee0] [d1f75018] ipc_init+0x18/0x140 [ipc]
[ce935ef0] [c00038e0] do_one_initcall+0x54/0x210
[ce935f20] [c005e424] sys_init_module+0x120/0x240
[ce935f40] [c00139d4] ret_from_syscall+0x0/0x38

I traced the path from ipc_init to irq_to_desc_alloc_node and found no
code of my own that disables interrupts.

However, irq_to_desc_alloc_node (kernel/irq/handler.c) itself calls
raw_spin_lock_irqsave(). I'm not sure whether holding that raw spinlock
with interrupts disabled is what causes the trouble at kmem_cache_alloc
once rt11 is applied. I'm still checking on that.

thanks,
xianghua
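
The suspicion above can be illustrated with a small userspace model of
the context check that produced the trace. It is a sketch, not kernel
code: all model_ names are hypothetical, and it assumes (as the trace
suggests) that the allocation runs while the raw spinlock is held and
that the allocator takes a sleeping lock on -RT.

```c
/* Userspace model only; none of this is the real kernel API. */
#include <stdbool.h>

/* The two values printed in the BUG line of the trace. */
struct model_ctx {
	int  preempt_count;	/* > 0 models in_atomic() == 1 */
	bool irqs_disabled;	/* models irqs_disabled() == 1 */
};

/* Model of raw_spin_lock_irqsave(): stays a spinning lock on -RT. */
void model_raw_lock_irqsave(struct model_ctx *c, bool *flags)
{
	*flags = c->irqs_disabled;
	c->irqs_disabled = true;
	c->preempt_count++;
}

void model_raw_unlock_irqrestore(struct model_ctx *c, bool flags)
{
	c->preempt_count--;
	c->irqs_disabled = flags;
}

/* True when "sleeping function called from invalid context" would fire. */
bool model_would_splat(const struct model_ctx *c)
{
	return c->preempt_count > 0 || c->irqs_disabled;
}

/* Allocation attempted while the raw lock is held. */
bool alloc_under_raw_lock(void)
{
	struct model_ctx c = { 0, false };
	bool flags, splat;

	model_raw_lock_irqsave(&c, &flags);
	splat = model_would_splat(&c);	/* models rt_spin_lock() in the allocator */
	model_raw_unlock_irqrestore(&c, flags);
	return splat;
}

/* The same allocation attempted with no raw lock held. */
bool alloc_outside_raw_lock(void)
{
	struct model_ctx c = { 0, false };

	return model_would_splat(&c);
}
```

In this model only alloc_under_raw_lock() reports the invalid context,
which matches the in_atomic(): 1, irqs_disabled(): 1 values in the
dmesg output even though the driver itself never disables interrupts.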

--

From: Thomas Gleixner
Date: Thursday, April 1, 2010 - 2:07 pm

On Thu, 1 Apr 2010, Xianghua Xiao wrote:


Can you please disable CONFIG_SPARSE_IRQ?

Thanks,

	tglx
--

From: Xianghua Xiao
Date: Thursday, April 1, 2010 - 2:54 pm

After disabling CONFIG_SPARSE_IRQ, the BUG output disappeared.
Thanks a lot!
Xianghua
--
