Re: 2.6.27-rc4: 90% system time because of khubd, unable to reboot

From: Andrey Borzenkov
Date: Friday, August 22, 2008 - 9:26 am

2.6.26-rc3 is OK

top - 20:11:08 up 5 min,  1 user,  load average: 7.05, 4.50, 1.93
Tasks: 103 total,   3 running, 100 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.3%us, 91.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    494172k total,   278192k used,   215980k free,    18264k buffers
Swap:   500432k total,        0k used,   500432k free,   159376k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  851 root      15  -5     0    0    0 R 65.6  0.0   3:21.16 khubd
 3847 bor       20   0 97960  26m  20m R 32.3  5.6   0:06.27 kontact

I am unable to reboot - khubd is apparently spinning on CPU preventing
it. System is using ohci_hcd.

00:00.0 Host bridge [0600]: ALi Corporation M1644/M1644T Northbridge+Trident [10b9:1644] (rev 01)
        Flags: bus master, medium devsel, latency 0
        Memory at f0000000 (32-bit, prefetchable) [size=64M]
        Capabilities: <access denied>
        Kernel driver in use: agpgart-ali
        Kernel modules: ali-agp

00:01.0 PCI bridge [0604]: ALi Corporation PCI to AGP Controller [10b9:5247] (prog-if 00 [Normal decode])
        Flags: bus master, slow devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Memory behind bridge: f7f00000-fdffffff
        Prefetchable memory behind bridge: 48000000-480fffff

00:02.0 USB Controller [0c03]: ALi Corporation USB 1.1 Controller [10b9:5237] (rev 03) (prog-if 10 [OHCI])
        Subsystem: Toshiba America Info Systems Device [1179:0004]
        Flags: bus master, medium devsel, latency 64, IRQ 11
        Memory at f7eff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: ohci_hcd
        Kernel modules: ohci-hcd

00:04.0 IDE interface [0101]: ALi Corporation M5229 IDE [10b9:5229] (rev c3) (prog-if f0)
        Subsystem: Toshiba America Info Systems Device [1179:0004]
        Flags: bus master, medium devsel, latency 64, IRQ 255
        [virtual] Memory at ...
From: Andrey Borzenkov
Date: Friday, August 22, 2008 - 9:53 am

reverting 38b375d9610e2467cb793a84d17c6f65e44cdb39 fixed it
From: Rafael J. Wysocki
Date: Friday, August 22, 2008 - 10:04 am

... that is:

commit 38b375d9610e2467cb793a84d17c6f65e44cdb39
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Mon Jul 21 09:56:26 2008 -0400

    USB: OHCI: fix system hang caused by earlier patch

    Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
    Tested by: Andrey Borzenkov <arvidjaar@mail.ru>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

so it apparently used to work for you at that time.  What gives?

Rafael
--

From: Andrey Borzenkov
Date: Friday, August 22, 2008 - 10:09 am

Well, you should not commit a fix without committing the code that has been
fixed first :)
From: Alan Stern
Date: Friday, August 22, 2008 - 10:39 am

Actually the code to be fixed _was_ committed first -- but then it was 
reverted before the fix was accepted, so the fix was merged without it.

My advice is not to worry about it.  That code has been sent once again
to Linus -- it's not merged yet but presumably it will be soon.  
Certainly before 2.6.27-rc5 appears.

On the other hand, I still have to wonder how the fix could have caused
your problem without the original patch in place.  The fix itself
should have been totally innocuous.

Alan Stern

--

From: Andrey Borzenkov
Date: Friday, August 22, 2008 - 10:57 am

It looks even funnier. Right now I am running with commits
38b375d9610e2467cb793a84d17c6f65e44cdb39 *and*
e872154921a6b5256a3c412dd69158ac0b135176 reverted. I.e. this should be
the state which hopelessly failed in 2.6.26-rc. It seems to be doing
quite well now in 2.6.27-rc.

"git revert e872154921a6b5256a3c412dd69158ac0b135176" gives me this
one liner patch:

commit f3cf9ad86ee76077d1c6be9af7d197aa13ccdff9
Author: Andrey Borzenkov <arvidjaar@mail.ru>
Date:   Fri Aug 22 21:15:26 2008 +0400

    Revert "USB: don't explicitly reenable root-hub status interrupts"

    This reverts commit e872154921a6b5256a3c412dd69158ac0b135176.

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 107e1d2..d30f822 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3086,6 +3086,11 @@ static void hub_events(void)
                if (!hdev->parent && !hub->busy_bits[0])
                        usb_enable_root_hub_irq(hdev->bus);

+               /* If this is a root hub, tell the HCD it's okay to
+                * re-enable port-change interrupts now. */
+               if (!hdev->parent && !hub->busy_bits[0])
+                       usb_enable_root_hub_irq(hdev->bus);
+
 loop_autopm:
                /* Allow autosuspend if we're not going to run again */
                if (list_empty(&hub->event_list))

Either my git tree is completely botched or most parts were already reverted
before.

So the problem seems to have cured itself between 2.6.26 and 2.6.27?
From: Alan Stern
Date: Friday, August 22, 2008 - 11:25 am

_Something_ is completely botched.  e872154 is much bigger than what 
you quoted above.

The commit you really want to revert is
09ca8adbe9f724a7e96f512c0039c4c4a1c5dcc0.

Alan Stern

--
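
One way to catch a mispasted hash like the one discussed above is to compare the files a commit touched with the files its revert touches; a revert that is much smaller than the original is a sign that something is off. The following is only a minimal sketch in a throwaway repository. The file names (hub.c, ohci.c) and commit messages are hypothetical stand-ins, not the real kernel commits.

```shell
#!/bin/sh
# Toy demonstration in a scratch repository (NOT the kernel tree).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Base state with two files (hypothetical names).
printf 'a\n' > hub.c
printf 'x\n' > ohci.c
git add hub.c ohci.c
git commit -q -m "base"

# A commit touching both files, standing in for the "much bigger" commit.
printf 'a2\n' > hub.c
printf 'x2\n' > ohci.c
git commit -q -a -m "big change"
big=$(git rev-parse HEAD)

# Revert it; --no-edit keeps the default "Revert ..." message.
git revert --no-edit "$big" >/dev/null

# List the files changed by the original commit and by its revert.
orig_files=$(git diff-tree --no-commit-id --name-only -r "$big" | sort)
revert_files=$(git diff-tree --no-commit-id --name-only -r HEAD | sort)

if [ "$orig_files" = "$revert_files" ]; then
    echo "revert touches the same files as the original commit"
else
    echo "mismatch: check that the right commit was reverted"
fi
```

In a real tree the same check would amount to running git diff-tree --no-commit-id --name-only -r on the hash in question and on the revert commit, then comparing the two lists.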

From: Andrey Borzenkov
Date: Friday, August 22, 2008 - 11:27 pm

Sure. Mouse slipped doing copy'n'paste :)

If you are still interested in this strange effect of lone 38b375d9,
I could run some tests; just tell me what is needed.
From: Alan Stern
Date: Saturday, August 23, 2008 - 11:30 am

I'm not really concerned with theoretical intermediate states.  So long
as your system is okay with the final state and the actual intermediate 
versions of the kernel (other than the 2.6.26-rc form which we already 
know causes problems), then I'm happy.

Alan Stern

--

From: Rafael J. Wysocki
Date: Friday, August 22, 2008 - 11:29 am

Well, such things happened in the past.

I won't add this to the list of regressions for now, but please monitor the
status of -rc5 and let me know if the problem reappears in there.

Thanks,
Rafael

--