hda_intel (sigmatel) defunct in mmotm 2008-09-13-03-09

Previous thread: [fbdev] Lockdep error by Dmitry Baryshkov on Monday, September 15, 2008 - 5:21 am. (3 messages)

Next thread: Re: Turning off camera also kills card reader on EeePC 900 by Sitsofe Wheeler on Monday, September 15, 2008 - 8:02 am. (2 messages)
From: Jiri Slaby
Date: Monday, September 15, 2008 - 7:00 am

Hi,

I've found my sound defunct in mmotm 2008-09-13-03-09 (opposing to
2008-09-10-19-39).

My debug shows:
HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
HDA Intel 0000:00:1b.0: setting latency timer to 64
azx_codec_create (1232): t=0, max=4, def=3, mask=4, probe_mask=ffffffff
snd_hda_codec_new A: ffff88007b6988f8
patch_stac927x A: 0
stac92xx_parse_auto_config A
stac92xx_parse_auto_config B
stac92xx_parse_auto_config E
stac92xx_parse_auto_config F
stac92xx_parse_auto_config G
stac92xx_parse_auto_config H
stac92xx_parse_auto_config L: 0
stac92xx_parse_auto_config M: 0
stac92xx_parse_auto_config O
stac92xx_parse_auto_config P
stac92xx_parse_auto_config Q
stac92xx_auto_create_spdif_mux_ctls: num_cons=5
patch_stac927x B: -22
snd_hda_codec_new E: -22
azx_codec_create (1239): c=2, err=-22
azx_codec_create (1251): cod=0, acod=0
hda-intel: no codecs initialized
HDA Intel 0000:00:1b.0: PCI INT A disabled


The problem lays in the newly added function:
stac92xx_auto_create_spdif_mux_ctls
The count (=5) is out of bounds.




The device is:
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller
(rev 02)
        Subsystem: Intel Corporation Optiplex 755
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 22
        Region 0: Memory at ffa70000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Count=1/1
Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v1) Root Complex ...
From: Matthew Ranostay
Date: Monday, September 15, 2008 - 7:16 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Jiri,

Yeah the problem is my bounds checking isn't correct, submitting a patch upstream shortly.

Thanks,

Matt Ranostay


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjObi8ACgkQ7s2wy7nhBHW4FgCdEHD/SGJYCArw8pIdsdFiXruT
piIAn13101Q+dkQnP6bcrLzSqRUUsNw4
=61nc
-----END PGP SIGNATURE-----
--

From: Matthew Ranostay
Date: Tuesday, September 16, 2008 - 7:43 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bug fix patch at:

http://mailman.alsa-project.org/pipermail/alsa-devel/2008-September/010767.html

Thanks,

Matt Ranostay



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjPxgAACgkQ7s2wy7nhBHULRQCeM0uqdR/rIPRHl+YhMKuZqIg7
NxEAoKQDzb2zw05UyY+i48Y2G65KZkoA
=CQnT
-----END PGP SIGNATURE-----
--

From: Jiri Slaby
Date: Wednesday, September 17, 2008 - 1:58 am

It works, thanks.

Another problem is when I switch IEC958 in alsamixer, I get:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffffa0057d8a>] snd_hda_spdif_out_switch_put+0xca/0x170 [snd_hda_intel]
PGD 7a3ea067 PUD 7a378067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /sys/devices/virtual/net/tun0/type
Dumping ftrace buffer:
   (ftrace buffer empty)
CPU 0
Modules linked in: arc4 ecb cryptomgr aead crypto_blkcipher crypto_algapi ath5k
mac80211 hid_microsoft led_class usbhid rtc_cmos snd_hda_intel hid ohci1394
ieee1394 evdev cfg80211 ff_memless [last unloaded: freq_table]
Pid: 3831, comm: alsamixer Tainted: G        W 2.6.27-rc6-mm1_64 #452
RIP: 0010:[<ffffffffa0057d8a>]  [<ffffffffa0057d8a>]
snd_hda_spdif_out_switch_put+0xca/0x170 [snd_hda_intel]
RSP: 0018:ffff88007a3d5c68  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000f
RDX: ffff88007b410be0 RSI: 0000000000070d1e RDI: ffff88007bda9bb0
RBP: ffff88007a3d5ca8 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff8049083f R11: ffff880078cd4240 R12: ffff88007bd2f398
R13: 0000000000000001 R14: 0000000000000001 R15: 000000000000001e
FS:  00007fb6b40476f0(0000) GS:ffffffff806f2400(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007a3cc000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process alsamixer (pid: 3831, threadinfo ffff88007a3d4000, task ffff88007a35b300)
Stack:  0000000101e33400 ffff88007bd2f568 001e88007a3d5cc8 ffff88007b51d1f0
 ffff880078cd4240 ffff88007bd2d5a0 ffff88007bd2d708 ffff88007b7720c8
 ffff88007a3d5cc8 ffffffff80493097 ffff88007b51d1f0 ffff880078cd4240
Call Trace:
 [<ffffffff80493097>] slave_put_val+0x37/0xc0
 [<ffffffff804933bc>] slave_put+0x5c/0x70
 [<ffffffff8048fdf9>] snd_ctl_elem_write+0x119/0x160
 [<ffffffff804908a5>] snd_ctl_ioctl+0x295/0x940
 ...
From: Matthew Ranostay
Date: Wednesday, September 17, 2008 - 7:09 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Ok, the problem was that codec->slave_dig_outs getting a dereference when NULL.
This has been already fixed in Takashi's sound-2.6.git tree, with the commit id
06efc354f735d5dbbdf4653fc6a8acd489ac5a18.

Thanks,

Matt Ranostay

- ---

# git log pci/hda/hda_codec.c
...
commit 06efc354f735d5dbbdf4653fc6a8acd489ac5a18
Author: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Date:   Sat Sep 13 16:44:29 2008 +0200

    ALSA: hda: fix oopses in snd-hda-intel after digital slave support additions

    Many places fail to check if codec has slave_dig_outs entries (the most common
    case is not having any entry), leading to various possible oopses in hda_codec
    code.

    Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
...
# git diff af41e1693fdaa7a3e25cc8dd1ff3180f83487e5a 06efc354f735d5dbbdf4653fc6a8acd489ac5a18 pci/hda/hda_codec.c

diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
index c4fc0d4..0707a8e 100644
- --- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -1457,14 +1457,15 @@ static int snd_hda_spdif_default_put(struct snd_kcontrol *kcontrol,
 					  AC_VERB_SET_DIGI_CONVERT_2,
 					  val >> 8);

- -		for (d = codec->slave_dig_outs; *d; d++) {
- -			snd_hda_codec_write_cache(codec, *d, 0,
+		if (codec->slave_dig_outs)
+			for (d = codec->slave_dig_outs; *d; d++) {
+				snd_hda_codec_write_cache(codec, *d, 0,
 					  AC_VERB_SET_DIGI_CONVERT_1,
 					  val & 0xff);
- -			snd_hda_codec_write_cache(codec, *d, 0,
+				snd_hda_codec_write_cache(codec, *d, 0,
 					  AC_VERB_SET_DIGI_CONVERT_2,
 					  val >> 8);
- -		}
+			}
 	}

 	mutex_unlock(&codec->spdif_mutex);
@@ -1502,8 +1503,9 @@ static int snd_hda_spdif_out_switch_put(struct snd_kcontrol *kcontrol,
 					  AC_VERB_SET_DIGI_CONVERT_1,
 					  val & 0xff);

- -		for (d = codec->slave_dig_outs; *d; d++)
- -			snd_hda_codec_write_cache(codec, *d, ...
From: Jiri Slaby
Date: Wednesday, September 24, 2008 - 2:42 am

ACKing this. Thanks.
--

From: Jiri Slaby
Date: Wednesday, September 17, 2008 - 9:46 am

Hi,

I've found suspend on my desktop defunct in mmotm 2008-09-13-03-09 (opposing to
2008-09-10-19-39).

Dmesg says:

uhci_hcd 0000:00:1d.1: PCI INT B disabled
pci_legacy_suspend(): usb_hcd_pci_suspend+0x0/0x3e0 returns -16
pm_op(): pci_pm_suspend+0x0/0xd0 returns -16
PM: Device 0000:00:1d.0 failed to suspend: error -16
PM: Some devices failed to suspend

$ ls -l /sys/bus/pci/devices/0000:00:1d.1/usb7/
-rw-r--r-- 1 root root  4096 17. zář 18.43 authorized
-rw-r--r-- 1 root root  4096 17. zář 18.43 authorized_default
-r--r--r-- 1 root root  4096 17. zář 10.51 bcdDevice
-rw-r--r-- 1 root root  4096 17. zář 10.51 bConfigurationValue
-r--r--r-- 1 root root  4096 17. zář 10.51 bDeviceClass
-r--r--r-- 1 root root  4096 17. zář 10.51 bDeviceProtocol
-r--r--r-- 1 root root  4096 17. zář 10.51 bDeviceSubClass
-r--r--r-- 1 root root  4096 17. zář 10.51 bmAttributes
-r--r--r-- 1 root root  4096 17. zář 18.43 bMaxPacketSize0
-r--r--r-- 1 root root  4096 17. zář 10.51 bMaxPower
-r--r--r-- 1 root root  4096 17. zář 10.51 bNumConfigurations
-r--r--r-- 1 root root  4096 17. zář 10.51 bNumInterfaces
-r--r--r-- 1 root root  4096 17. zář 18.43 busnum
-r--r--r-- 1 root root  4096 17. zář 10.51 configuration
-r--r--r-- 1 root root 65553 17. zář 18.43 descriptors
-r--r--r-- 1 root root  4096 17. zář 18.43 dev
-r--r--r-- 1 root root  4096 17. zář 10.51 devnum
lrwxrwxrwx 1 root root     0 17. zář 10.51 driver -> ../../../../bus/usb/drivers/usb
lrwxrwxrwx 1 root root     0 17. zář 18.43 ep_00 -> usb_endpoint/usbdev7.1_ep00
-r--r--r-- 1 root root  4096 17. zář 10.51 idProduct
-r--r--r-- 1 root root  4096 17. zář 10.51 idVendor
-r--r--r-- 1 root root  4096 17. zář 18.43 manufacturer
-r--r--r-- 1 root root  4096 17. zář 10.51 maxchild
drwxr-xr-x 2 root root     0 17. zář 18.43 power
-r--r--r-- 1 root root  4096 17. zář 18.43 product
-r--r--r-- 1 root root  4096 17. zář 18.43 quirks
-r--r--r-- 1 root root  4096 17. zář 10.51 serial
-r--r--r-- 1 root root  4096 17. zář 10.51 speed
lrwxrwxrwx 1 ...
From: Alan Stern
Date: Thursday, September 18, 2008 - 2:14 pm

The reason for this error is that the root hub wasn't already 
suspended.  Was there a USB device plugged into that controller?

Two days ago I ran 2.6.27-rc6 plus gregkh-all-2.6.27-rc6.patch from

http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/

on my laptop, and it suspended with no problem.  Can you try the same 
kernel?

If you still run into problems, post more of the dmesg log (with 
CONFIG_USB_DEBUG enabled, of course!).

Alan Stern

--

From: Jiri Slaby
Date: Wednesday, September 24, 2008 - 3:55 am

No:
/:  Bus 08.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/:  Bus 07.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M <----
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
    |__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/3p, 12M
        |__ Port 1: Dev 3, If 0, Class=HID, Driver=usbhid, 12M
        |__ Port 1: Dev 3, If 1, Class=HID, Driver=usbhid, 12M
        |__ Port 2: Dev 4, If 0, Class=HID, Driver=usbhid, 1.5M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci_hcd/6p, 480M

Hm, it doesn't work. 2 dmesgs from mmotm 2008-09-10-19-39 (without usb debug)
[ok] and -rc6+gkh-all (with usb debug) [fail]:
http://decibel.fi.muni.cz/~xslaby/dmesg_fail.hcd
http://decibel.fi.muni.cz/~xslaby/dmesg_ok.hcd

--

From: Alan Stern
Date: Wednesday, September 24, 2008 - 7:49 am

In fact neither of those logs includes USB debugging messages.  Maybe
you omitted a mkinitrd step.

Alan Stern

--

From: Jiri Slaby
Date: Wednesday, September 24, 2008 - 9:04 am

I don't think so. I don't use neither vanilla nor initrd so this must be the
kernel. Anyway I recompiled with .config linked in and did
# dmesg >dmesg_config.hcd
# zcat /proc/config.gz >>dmesg_config.hcd
in the built kernel. The file is here:
http://decibel.fi.muni.cz/~xslaby/dmesg_config.hcd
--

From: Alan Stern
Date: Wednesday, September 24, 2008 - 9:38 am

Still no debugging messages.  It's because you have 
CONFIG_DYNAMIC_PRINTK_DEBUG enabled.

Alan Stern

--

From: Greg KH
Date: Wednesday, September 24, 2008 - 11:14 am

If you have that enabled, you can turn on debugging "on the fly" by
writing to the debugfs file as described in the documentation for that
option.

thanks,

greg k-h
--

From: Jiri Slaby
Date: Wednesday, September 24, 2008 - 11:19 am

I see, I need to boot to that kernel anyway, with dynamic_printk param this time.

Thanks.
--

From: Jiri Slaby
Date: Wednesday, September 24, 2008 - 11:54 am

It didn't work as I expected, nevermind, here comes updated dmesg:
http://decibel.fi.muni.cz/~xslaby/dmesg_fail.hcd

I seem to overlooked the pci identification before. There are my mouse and kbd
connected to that hub behind another hub.
Attaching lsusb -tv again:
/:  Bus 08.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/:  Bus 07.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
    |__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/3p, 12M
        |__ Port 1: Dev 3, If 0, Class=HID, Driver=usbhid, 12M
        |__ Port 1: Dev 3, If 1, Class=HID, Driver=usbhid, 12M
        |__ Port 2: Dev 4, If 0, Class=HID, Driver=usbhid, 1.5M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci_hcd/6p, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci_hcd/6p, 480M
Full lsusb -vvv version at:
http://decibel.fi.muni.cz/~xslaby/lsusb.hcd
--

From: Alan Stern
Date: Wednesday, September 24, 2008 - 2:29 pm

Got it.  I found the problem, and there is a patch below for you to 
test.

Rafael, since the type now has a complete pm_ops structure, is there
any reason we shouldn't use its suspend/resume methods along with
the bus's and the class's?  When writing the USB updates I assumed that
would be the case -- and as a result we now have a regression which
this patch should fix.

(Also, is there any reason why the pm_dev_dbg() shouldn't go inside 
pm_op()?  It precedes every call of that routine.)

If everyone is okay with it, I'll submit the patch with a proper 
Changelog and Signed-off-by.

Alan Stern



Index: usb-2.6/drivers/base/power/main.c
===================================================================
--- usb-2.6.orig/drivers/base/power/main.c
+++ usb-2.6/drivers/base/power/main.c
@@ -107,7 +107,7 @@ void device_pm_remove(struct device *dev
 }
 
 /**
- *	pm_op - execute the PM operation appropiate for given PM event
+ *	pm_op - execute the PM operation appropriate for given PM event
  *	@dev:	Device.
  *	@ops:	PM operations to choose from.
  *	@state:	PM transition of the system being carried out.
@@ -166,7 +166,7 @@ static int pm_op(struct device *dev, str
 }
 
 /**
- *	pm_noirq_op - execute the PM operation appropiate for given PM event
+ *	pm_noirq_op - execute the PM operation appropriate for given PM event
  *	@dev:	Device.
  *	@ops:	PM operations to choose from.
  *	@state: PM transition of the system being carried out.
@@ -363,6 +363,13 @@ static int resume_device(struct device *
 			goto End;
 	}
 
+	if (dev->type && dev->type->pm) {
+		pm_dev_dbg(dev, state, "type ");
+		error = pm_op(dev, dev->type->pm, state);
+		if (error)
+			goto End;
+	}
+
 	if (dev->class) {
 		if (dev->class->pm) {
 			pm_dev_dbg(dev, state, "class ");
@@ -596,6 +603,13 @@ static int suspend_device(struct device 
 			goto End;
 	}
 
+	if (dev->type && dev->type->pm) {
+		pm_dev_dbg(dev, state, "type ");
+		error = pm_op(dev, dev->type->pm, state);
+		if ...
From: Jiri Slaby
Date: Thursday, September 25, 2008 - 1:51 pm

Confirming, it works now!
--

From: Alan Stern
Date: Thursday, September 25, 2008 - 2:07 pm

This patch (as1141) adds code to use the device type's pm_op methods,
if they are defined.  It fixes a regression in the USB PM code; the
various suspend and resume methods are defined in the device type
rather than in the bus, because USB devices have to be handled
differently from USB interfaces.  Without the patch, those methods
never get called.

The patch also fixes a couple of spelling errors.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Jiri Slaby <jirislaby@gmail.com>

---

This should be merged before 2.6.27 is released, if possible.  
Otherwise people will find their systems refuse to suspend when any USB
devices are attached.



Index: usb-2.6/drivers/base/power/main.c
===================================================================
--- usb-2.6.orig/drivers/base/power/main.c
+++ usb-2.6/drivers/base/power/main.c
@@ -107,7 +107,7 @@ void device_pm_remove(struct device *dev
 }
 
 /**
- *	pm_op - execute the PM operation appropiate for given PM event
+ *	pm_op - execute the PM operation appropriate for given PM event
  *	@dev:	Device.
  *	@ops:	PM operations to choose from.
  *	@state:	PM transition of the system being carried out.
@@ -166,7 +166,7 @@ static int pm_op(struct device *dev, str
 }
 
 /**
- *	pm_noirq_op - execute the PM operation appropiate for given PM event
+ *	pm_noirq_op - execute the PM operation appropriate for given PM event
  *	@dev:	Device.
  *	@ops:	PM operations to choose from.
  *	@state: PM transition of the system being carried out.
@@ -363,6 +363,13 @@ static int resume_device(struct device *
 			goto End;
 	}
 
+	if (dev->type && dev->type->pm) {
+		pm_dev_dbg(dev, state, "type ");
+		error = pm_op(dev, dev->type->pm, state);
+		if (error)
+			goto End;
+	}
+
 	if (dev->class) {
 		if (dev->class->pm) {
 			pm_dev_dbg(dev, state, "class ");
@@ -596,6 +603,13 @@ static int suspend_device(struct device 
 			goto End;
 	}
 
+	if (dev->type && dev->type->pm) {
+		pm_dev_dbg(dev, state, "type ...
From: Rafael J. Wysocki
Date: Thursday, September 25, 2008 - 2:27 pm

Hm, these changes are not needed in the current mainline, so there's a patch
in -next that removes the code added by this patch.



--

From: Greg KH
Date: Thursday, September 25, 2008 - 3:06 pm

I agree, I think we would have seen more bugs if mainline can't suspend
with a USB device attached, right?

confused,

greg k-h
--

From: Alan Stern
Date: Friday, September 26, 2008 - 6:56 am

Okay, I was confused too.  Looking more closely, it's apparent that 
mainline is okay and the problem was introduced by Hannes Reinecke's 

driver-core-remove-suspend-resume-callbacks-for-device-type.patch

which states that the suspend/resume callbacks in struct device_type
are unused.  It may be true that the legacy suspend/resume methods are
unused, but the new pm_ops methods definitely are used.

Therefore part or all of Hannes patch should be reverted.  And the 
mainline is okay as it stands.

Alan Stern

--

From: Greg KH
Date: Friday, September 26, 2008 - 10:59 am

Ah, ick.

Hannes, care to respin this patch based on this information?

thanks,

greg k-h
--

From: Pavel Machek
Date: Monday, October 6, 2008 - 8:11 am

I guess Hannes patch shoud simply be dropped....

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

Previous thread: [fbdev] Lockdep error by Dmitry Baryshkov on Monday, September 15, 2008 - 5:21 am. (3 messages)

Next thread: Re: Turning off camera also kills card reader on EeePC 900 by Sitsofe Wheeler on Monday, September 15, 2008 - 8:02 am. (2 messages)