Re: [PATCH 2/2] atl1: zero out CMB and SBM in atl1_free_ring_resources

Previous thread: [PATCH] net/cxgb3: remove undefined operations by Andreas Schwab on Saturday, September 11, 2010 - 4:12 am. (2 messages)

Next thread: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device. by Michael S. Tsirkin on Sunday, September 12, 2010 - 6:37 am. (1 message)
From: Luca Tettamanti
Date: Saturday, September 11, 2010 - 6:12 am

Hello,
I'm having troubles with atl1 driver (notebook is an Asus F3Sa): during
the resume from S3 the machine either locks up or reboots.
However I was also able to get a (partial) stack trace involving
atl1_resume (the rest of the trace scrolled off screen) and a message
about a recursive fault, followed by a hard lock up.
If I rmmod atl1 before suspending, the machine comes back fine; I don't
known whether this is a regression or not, until recently the radeon
driver didn't survive S3 so I didn't even try suspending to RAM...
I'm currently running -git current (2.6.36-rc3-00396-gbe6200a), but I've
gone back till 2.6.32 without any succcess.
I've no idea on how to debug this, the machine does not have a serial
port (and netconsole obviously does not help)... help :)

Luca
--

From: Jarek Poplawski
Date: Saturday, September 11, 2010 - 1:47 pm

Here is an example of kernel photography with an interesting hint
by Andrew Morton: 
https://bugzilla.kernel.org/show_bug.cgi?id=16626


Jarek P.
--

From: Luca Tettamanti
Date: Tuesday, September 14, 2010 - 6:59 am

On Sat, Sep 11, 2010 at 10:47 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:

Thanks for the tip, however the stack trace is extremely elusive: the
machine keeps resetting.
Anyway I've managed to track it down the old way (sticking printk
everywhere and commenting out random pieces of code :P)

Luca
--

From: Luca Tettamanti
Date: Tuesday, September 14, 2010 - 7:22 am

Hello,
I've managed to track down the resume bug in the driver, I'm leaving
the full quote for reference:


The bug is in atl1_resume, when it clears the interrupt status:
adapter->cmb.cmb->int_stats = 0

adapter->cmb.cmb is initialized (atl1_setup_ring_resources) only when
the interface is brought up so if the interface is not in use (which
is very common for me, since I'm usually on wifi) on resume the kernel
will try to dereference a NULL pointer.
Worse still, the memory pointed by adapter->cmb.cmb is freed by
atl1_free_ring_resources when the interface is brought down, leaving a
stale pointer which is then used in the resume path.
Btw, the same applies to adapter->smb.smb, though it's not used on resume.

The bug should be solved by moving the assignment under "if
(netif_running(netdev))" (assuming netif_running does what I think it
does :P); it also probably a good idea to reset adapter->cmb.cmb and
adapter->smb.smb when the memory is freed anyway.
My notebook has survived a few suspend&resume cycles with and without
wired networking, so far so good :)
If you agree on the proposed fixes I'll send the patches.

Luca
--

From: Luca Tettamanti
Date: Thursday, September 16, 2010 - 12:29 pm

Ok, I'll be AFK till next monday so here's the patches.
I think the first one is suitable for -stable.

Luca
--

From: Luca Tettamanti
Date: Thursday, September 16, 2010 - 12:29 pm

adapter->cmb.cmb is initialized when the device is opened and freed when
it's closed. Accessing it unconditionally during resume results either
in a crash (NULL pointer dereference, when the interface has not been
opened yet) or data corruption (when the interface has been used and
brought down adapter->cmb.cmb points to a deallocated memory area).

Cc: stable@kernel.org
---
 drivers/net/atlx/atl1.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index 63b9ba0..bbd6e30 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -2847,10 +2847,11 @@ static int atl1_resume(struct pci_dev *pdev)
 	pci_enable_wake(pdev, PCI_D3cold, 0);
 
 	atl1_reset_hw(&adapter->hw);
-	adapter->cmb.cmb->int_stats = 0;
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		adapter->cmb.cmb->int_stats = 0;
 		atl1_up(adapter);
+	}
 	netif_device_attach(netdev);
 
 	return 0;
-- 
1.7.1

--

From: Chris Snook
Date: Thursday, September 16, 2010 - 1:13 pm

Thanks.  I confirmed that the other atlx drivers (which I have been
reminded I need to finish merging into a single driver) do not have
this bug.

Acked-by: Chris Snook <chris.snook@gmail.com>
--

From: Greg KH
Date: Thursday, September 16, 2010 - 2:13 pm

You do realize you need to sign-off on a patch you create, right?

thanks,

greg k-h
--

From: David Miller
Date: Thursday, September 16, 2010 - 9:47 pm

From: Greg KH <greg@kroah.com>

Right.

Luca please formally resubmit these patches with a proper signoff
and all of the accumulated ACKs so far.
--

From: Luca Tettamanti
Date: Thursday, September 16, 2010 - 12:29 pm

They are allocated in atl1_setup_ring_resources, zero out the pointers
in atl1_free_ring_resources (like the other resources).
---
 drivers/net/atlx/atl1.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index bbd6e30..c73be28 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -1251,6 +1251,12 @@ static void atl1_free_ring_resources(struct atl1_adapter *adapter)
 
 	rrd_ring->desc = NULL;
 	rrd_ring->dma = 0;
+
+	adapter->cmb.dma = 0;
+	adapter->cmb.cmb = NULL;
+
+	adapter->smb.dma = 0;
+	adapter->smb.smb = NULL;
 }
 
 static void atl1_setup_mac_ctrl(struct atl1_adapter *adapter)
-- 
1.7.1

--

From: Chris Snook
Date: Thursday, September 16, 2010 - 1:15 pm

Acked-by: Chris Snook <chris.snook@gmail.com>
--

From: Luca Tettamanti
Date: Wednesday, September 22, 2010 - 12:34 pm

Ping?

Luca
--

From: David Miller
Date: Wednesday, September 22, 2010 - 1:23 pm

From: Luca Tettamanti <kronos.it@gmail.com>

For some reason these resubmissions make it to netdev nor patchwork,
could you please try sending these two patches again?

Thanks!
--

From: Luca Tettamanti
Date: Wednesday, September 22, 2010 - 1:41 pm

adapter->cmb.cmb is initialized when the device is opened and freed when
it's closed. Accessing it unconditionally during resume results either
in a crash (NULL pointer dereference, when the interface has not been
opened yet) or data corruption (when the interface has been used and
brought down adapter->cmb.cmb points to a deallocated memory area).

Cc: stable@kernel.org
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
---
This time with mutt :)

 drivers/net/atlx/atl1.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index 63b9ba0..bbd6e30 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -2847,10 +2847,11 @@ static int atl1_resume(struct pci_dev *pdev)
 	pci_enable_wake(pdev, PCI_D3cold, 0);
 
 	atl1_reset_hw(&adapter->hw);
-	adapter->cmb.cmb->int_stats = 0;
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		adapter->cmb.cmb->int_stats = 0;
 		atl1_up(adapter);
+	}
 	netif_device_attach(netdev);
 
 	return 0;
-- 
1.7.1
--

From: David Miller
Date: Wednesday, September 22, 2010 - 1:53 pm

From: Luca Tettamanti <kronos.it@gmail.com>

Applied.
--

From: Luca Tettamanti
Date: Wednesday, September 22, 2010 - 1:42 pm

They are allocated in atl1_setup_ring_resources, zero out the pointers
in atl1_free_ring_resources (like the other resources).

Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
---
 drivers/net/atlx/atl1.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index bbd6e30..c73be28 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -1251,6 +1251,12 @@ static void atl1_free_ring_resources(struct atl1_adapter *adapter)
 
 	rrd_ring->desc = NULL;
 	rrd_ring->dma = 0;
+
+	adapter->cmb.dma = 0;
+	adapter->cmb.cmb = NULL;
+
+	adapter->smb.dma = 0;
+	adapter->smb.smb = NULL;
 }
 
 static void atl1_setup_mac_ctrl(struct atl1_adapter *adapter)
-- 
1.7.1
--

From: David Miller
Date: Wednesday, September 22, 2010 - 1:53 pm

From: Luca Tettamanti <kronos.it@gmail.com>

Applied.
--

Previous thread: [PATCH] net/cxgb3: remove undefined operations by Andreas Schwab on Saturday, September 11, 2010 - 4:12 am. (2 messages)

Next thread: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device. by Michael S. Tsirkin on Sunday, September 12, 2010 - 6:37 am. (1 message)