Not sure how I should interpret this difference between dmesg output for 2.6.26 and 2.6.27-rc1 (after 'grep -i usb'): usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb USB Universal Host Controller Interface driver v3.0 uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 1 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 2 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 3 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 4 usb usb4: configuration #1 chosen from 1 choice hub 4-0:1.0: USB hub found +usb 3-1: new low speed USB device using uhci_hcd and address 2 ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 5 ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb5: configuration #1 chosen from 1 choice hub 5-0:1.0: USB hub found -usb 3-1: new low speed USB device using uhci_hcd and address 2 +usb 3-1: device not accepting address 2, error -71 +hub 3-0:1.0: unable to enumerate USB device on port 1 +usb 3-1: new low speed USB device using uhci_hcd and address 4 usb 3-1: configuration #1 chosen from 1 choice usb 4-1: new low speed USB device using uhci_hcd and address 2 usb 4-1: configuration #1 chosen from 1 choice usbcore: registered new interface driver hiddev input: USB Compliant Keyboard as /class/input/input0 input: USB HID v1.10 Keyboard [USB Compliant Keyboard] on usb-0000:00:1d.2-1 input: USB Compliant Keyboard as /class/input/input1 input: USB HID v1.10 Device [USB Compliant Keyboard] on usb-0000:00:1d.2-1 input: Logitech USB Receiver as /class/input/input2 input: USB HID v1.10 Mouse [Logitech USB Receiver] on ...
Hi, what is this device attached to this port ? Do you have the device attached before turning on the machine ? Does it happen if you boot without that device attached and attach it afterwards ? Can you please enable CONFIG_USB_DEBUG and resend dmesg output ? thanks, -- balbi --
Eh, that info is already trivially available from my original mail: uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 3 [...] input: USB Compliant Keyboard as /class/input/input0 input: USB HID v1.10 Keyboard [USB Compliant Keyboard] on usb-0000:00:1d.2-1 input: USB Compliant Keyboard as /class/input/input1 Yes, obviously the keyboard is attached during boot (and has been connected No idea. I could try this I guess if it will really provide useful info. Will include that in my reply to Alan's mail. Cheers, FJP --
I think Alan's reply is already good enough. It's basically port handover. -- balbi --
The difference is most likely meaningless; it is probably caused a No. It is caused by the EHCI controller being initialized. When an EHCI controller is initialized, it switches all the USB connections from its companion controllers to itself. Hence any device which _was_ connected to the UHCI controller will _now_ be connected to the EHCI controller. Consequently the UHCI controller is unable to communicate with the device and cannot enumerate it. After a short time, the EHCI controller driver realizes that the device can't run at high speed and switches it back to the UHCI controller. At that point it gets enumerated for the second time, successfully. The difference must have been that under 2.6.26 the EHCI controller was initialized _before_ the 3-1 device was detected, so there was no aborted attempt at enumeration. If you want to prevent all errors of this sort, all you have to do is insure that ehci-hcd is loaded before either uhci-hcd or ohci-hcd during system startup. Alan Stern --
Hi Alan, Meaningless, but annoying. Especially as such "errors" get through to the console when you boot with the 'quiet' boot parameter and thus draw more Thanks for the explanation. Attached boot messages (for -rc2) with usb debug enabled as suggested by Felipe Balbi. Messages have been extracted from kern.log (dmesg had already overflowed). If I filter out messages related to bus 3 (PCI 1d.2) what I see is: [...] ! usb 3-1: new low speed USB device using uhci_hcd and address 2 [ EHCI Host Controller gets initialized here for bus 5 ] ! usb 3-1: uhci_result_common: failed with status 440000 ! usb 3-1: device not accepting address 2, error -71 ! hub 3-0:1.0: unable to enumerate USB device on port 1 ! hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0002 ! hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0002 ! uhci_hcd 0000:00:1d.2: port 1 portsc 01a3,00 ! hub 3-0:1.0: port 1, status 0301, change 0001, 1.5 Mb/s ! hub 3-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x301 ! usb 3-1: new low speed USB device using uhci_hcd and address 4 [...] I guess that does match your description. I do find it a bit strange though that EHCI is allowed to grab bus 3 when Hmmm. Also not sure that I'm ready to agree with this conclusion. Shouldn't the kernel itself be smart enough to prevent error messages in apparently predictable and probably fairly common scenarios? To me this seems a lot like an issue discussed during the .25 cycle in http://lkml.org/lkml/2008/5/2/256 which was eventually fixed by you with 3a31155c Alan Stern, "USB: EHCI: suppress unwanted error messages" Cheers, FJP
Here's how it works. An EHCI controller contains a bank of switches, one for each port. By default, the switches are set so that each port connects to the companion UHCI (or OHCI) controller; that way you get USB-1.1 functionality even if ehci-hcd isn't loaded. But when the driver loads, it resets the switches so that the ports connect to the EHCI controller. There is no way for the driver to tell which ports have devices attached and which don't, so it has to reset all the switches. Thus, any device which was connected to the UHCI controller is now connected to the EHCI controller. As far as uhci-hcd is concerned, it appears that all the devices were suddenly plugged. As each device is enumerated, ehci-hcd determines whether it can run at high speed. If not, the corresponding switch is set so the device It follows directly from the description above. If ehci-hcd is loaded first then all the switches will be reset before any device has a chance to register the UHCI driver. Hence uhci-hcd will never see them It's somewhat difficult to synchronize activities between two different drivers, especially when they can be in separate modules (so that one might be present in memory and the other not). As for whether these messages really _should_ be suppressed... That depends on the circumstances. In your case, yes. But suppose for some reason ehci-hcd was loaded much later, at a time when the devices connected to the UHCI controller were already in use. In that case it seems reasonable to log some error messages when the devices stop That didn't involve cooperation between ehci-hcd and uhci-hcd. Alan Stern --
Urgh... What happens if it contains a mounted filesystem? Seems quite dangerous... BYtE, Diego. --
It still enumerating, you'll never have a mounted fs at this point :-) -- balbi --
It's no more dangerous than somebody unplugging the USB cable by mistake. :-) Alan Stern --
Sorry for the delay since the last message, but I needed to digest this a=20 bit and also did not see any point in degenerating the (so far very=20 useful) discussion into a yes-no-yes-no flamefest. Below some comments and an extra what looks to be real regression. Thanks a lot for the explanation Alan. I get the general idea and it all=20 sounds somewhat logical if you accept the fact that EHCI can be loaded at=20 any random time after [UO]HCI as a given, but _that_ still seems to me=20 (admittedly a relative outsider and not hindered by any actual technical=20 knowledge ;-) like something that is fundamentally broken in this=20 sequence. It also seems to be fragile in practice. I have now had two occasions=20 since your last mail where my system would come up with a dead USB=20 keyboard and it looks like this issue is the root cause. Attached a full diff between dmesg from two consecutive boots: first=20 without keyboard; after reboot the keyboard is detected. The actual=20 difference is fairly small and clearly shows that usb 3-1 is not handed=20 off correctly, probably due to a small difference in timing. I still feel it should not be up to individual users to need to "force"=20 something like this by manually messing with their initramfs or /etc/modules. If loading EHCI first is the right thing to do (and it seems= =20 to me like it is) then the kernel itself should ensure that that's what=20 =46rom an end-user PoV (which basically I am) I personally actually don't=20 think it is reasonable to have _any_ error messages in situations that=20 are expected and part of a "normal" boot sequence. For me, error messages=20 always indicate that something is wrong or broken and needs to be fixed=20 and followed up on. So, if this driver hand-off is really necessary,=20 expected and safe, it should be done with only informational messages,=20 not errors. Even in the case where ehci-hcd is loaded much later I don't think error=20 messages would be right. At least, ...
The arrangement certainly isn't perfect. Partly it's an historical artifact, arising from the way USB 2.0 controller hardware was "designed" to work with existing USB 1.1 devices. (I put "designed" in quotes because that's just what they didn't do -- they came up with a separate chip to handle the high-speed connections and left the It isn't any more fragile than unplugging the USB cable and then plugging it back in. If your system can't handle that sort of thing then something else is wrong. I.e., you've run across a bug, not a I can't tell exactly what's going on because your usbcore module wasn't built with CONFIG_USB_DEBUG enabled. Have you experimented with unloading and reloading uhci-hcd and ehci-hcd by hand (over the network if your only keyboard is USB)? If you remove both and then load uhci-hcd first followed by ehci-hcd, does The kernel has very little control over the order in which modules are loaded, partly because loading is carried out by programs like udev running in userspace and partly because there can be multiple threads sending out device-discovery messages in parallel. With UHCI and EHCI things are made even worse by the fact that UHCI is always discovered first. The EHCI spec requires that the companion controllers have the lowest PCI function numbers and the EHCI controller has the highest. You can see this in your log, where 1d.0 through 1d.3 are UHCI devices and 1d.7 is EHCI. Since PCI devices are probed in order of function number, the natural result is that uhci-hcd Well, that's a problem. The kernel _can't_ make that guarantee, not once some USB devices have been set up. So according to your reasoning, ehci-hcd shouldn't be allowed to load if uhci-hcd is already loaded! Can you suggest a reasonable method for suppressing the unwanted error messages? Maybe I'm too close to the problem, but nothing occurs to me. Part of the problem is that these errors could occur at any point during the life cycle of a USB ...
The fragile part IMO is that the kernel currently allows the loading of ehci to interrupt the initialization of uhci/ohci and *that* is what is causing the errors. I have run some tests loading ehci and uhci manually and when they are done separately (i.e. with a little delay between the two) there are no errors at all! If uhci is loaded first, you only get a nice, clean "USB disconnect" message (for devices already detected by uhci) when ehci is loaded. If ehci is loaded first the low-speed devices are only detected after uhci is loaded as well. The *only* time you get the "device not accepting address" and "unable to enumerate" errors is when you allow the ehci initialization to interrupt the uhci initialization. IMO that cannot be classified anything other than a bug. See also the attachments with dmesg output: - modprobe_uhci-ehci: uhci first; ehci later - modprobe_ehci-uhci: ehci first; uhci later Two problems: - CONFIG_USB_DEBUG causes such a huge load of output that it is totally unacceptable to have that enabled permanently for a running system - I cannot reproduce this issue on demand, even though I've tried with various delays between loading uhci and ehci Possibly with the new patches from Greg KH [1] it would be possible to disable USB debugging automatically when system boot is completed, but I'd have to build a kernel with those and wait for the problem to happen again. What I can see in the logs I do have is that in the error case for some reason a "reset low speed USB device" is triggered instead of either an "enumeration failure" or a "USB disconnect", which are what I normally see. As mentioned before, this seems to indicate to me a subtle timing difference between the boots and IMO confirms the danger of allowing the initialization of ehci to interrupt an ongoing initialization of uhci. My guess is that this "reset" is insufficient to cause the bus to be properly rescanned when ehci hands it back to uhci. I also guess ...
It doesn't interrupt just the initialization of UHCI/OHCI -- it interrupts all their activities. And I wouldn't say the kernel "allows" this to happen. More like it You have to be careful when making statements like that. All you know is that no error messages showed up _in the log_. But I'll bet there were plenty of errors. In your attached logs you tested with a keyboard and wireless mouse receiver. The USB HID driver is careful not to send error messages to the log as soon as it encounters I/O errors; instead it retries at various increasing intervals for a while before giving up. Long enough for the disconnect notification to be received. If you would use usbmon (see Documentation/usb/usbmon.txt) to record what those HID drivers actually send and receive when ehci-hcd is More accurately, you get exactly the same sequence of events as if you had unplugged the USB cable. Sometimes it's a nice clean disconnect message, and sometimes that message is preceded by a string of error messages. It depends on how the device is being used at the time. Do you have a USB flash drive? Try plugging that in and running That's hardly a surprise -- those messages are emitted by the device initialization code. So what you're saying is circular: The only time you get initialization errors is when you interrupt device It shouldn't do that. Almost all of the extra output occurs when devices are plugged in or unplugged; during normal operation there should be very few extra messages. (Depending on what USB drivers you There should always be a USB disconnect, unless the interruption Offhand I can't think what time that would be. Which is why having a debugging log would be a big help. Unlike those error messages you Linus made a similar suggestion (regarding a different problem) some time within the last year. After looking into the matter he realized that it is very difficult, effectively impossible, to tell at the USB level whether a ...
Just replying to one comment now, the rest will need more thought :-) Here's a snippet of what I got when I enabled USB debug recently: [...] Aug 6 22:06:23 faramir kernel: hub 2-0:1.0: hub_resume Aug 6 22:06:23 faramir kernel: hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000 Aug 6 22:06:23 faramir kernel: usb usb1: usb auto-resume Aug 6 22:06:23 faramir kernel: usb usb1: wakeup_rh Aug 6 22:06:23 faramir kernel: hub 1-0:1.0: hub_resume Aug 6 22:06:23 faramir kernel: hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000 Aug 6 22:06:24 faramir kernel: usb usb2: suspend_rh (auto-stop) Aug 6 22:06:24 faramir kernel: usb usb1: suspend_rh (auto-stop) Aug 6 22:06:25 faramir kernel: hub 5-0:1.0: hub_suspend Aug 6 22:06:25 faramir kernel: usb usb5: bus auto-suspend Aug 6 22:06:25 faramir kernel: ehci_hcd 0000:00:1d.7: suspend root hub Aug 6 22:06:25 faramir kernel: hub 2-0:1.0: hub_suspend Aug 6 22:06:25 faramir kernel: usb usb2: bus auto-suspend Aug 6 22:06:25 faramir kernel: usb usb2: suspend_rh Aug 6 22:06:25 faramir kernel: hub 1-0:1.0: hub_suspend Aug 6 22:06:25 faramir kernel: usb usb1: bus auto-suspend Aug 6 22:06:25 faramir kernel: usb usb1: suspend_rh Aug 6 22:06:28 faramir kernel: usb usb5: usb auto-resume Aug 6 22:06:28 faramir kernel: ehci_hcd 0000:00:1d.7: resume root hub Aug 6 22:06:28 faramir kernel: hub 5-0:1.0: hub_resume Aug 6 22:06:28 faramir kernel: hub 5-0:1.0: state 7 ports 8 chg 0000 evt fe00 Aug 6 22:06:28 faramir kernel: usb usb2: usb auto-resume Aug 6 22:06:28 faramir kernel: usb usb2: wakeup_rh Aug 6 22:06:28 faramir kernel: hub 2-0:1.0: hub_resume Aug 6 22:06:28 faramir kernel: usb usb1: usb auto-resume Aug 6 22:06:28 faramir kernel: usb usb1: wakeup_rh Aug 6 22:06:28 faramir kernel: hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000 Aug 6 22:06:28 faramir kernel: hub 1-0:1.0: hub_resume Aug 6 22:06:28 faramir kernel: hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000 Aug 6 22:06:29 faramir kernel: usb usb2: suspend_rh (auto-stop) Aug 6 ...
Maybe you haven't enabled anything special, but something sure is running. It appears to be probing your USB devices every 5 seconds. You might be able to find out what it is by adding printk(KERN_INFO "usbdev open by %s\n", current->comm); Turning off CONFIG_USB_SUSPEND would get rid of most but not all of those messages above. Alan Stern --
Thanks a lot. That identified the culprit: brltty (which I only have installed for some very rare testing with the Debian Installer). I'll file a bug in the Debian BTS as I know the Debian maintainer has very good contacts with the upstream developers. --
Here's the promised patch. Be warned, I haven't tested it at all.
Alan Stern
Index: usb-2.6/drivers/usb/core/hcd.h
===================================================================
--- usb-2.6.orig/drivers/usb/core/hcd.h
+++ usb-2.6/drivers/usb/core/hcd.h
@@ -486,4 +486,7 @@ static inline void usbmon_urb_complete(s
*/
extern struct rw_semaphore ehci_cf_port_reset_rwsem;
+/* This is also for use by khubd and ehci-hcd. */
+extern void usb_wait_for_khubd_idle(void);
+
#endif /* __KERNEL__ */
Index: usb-2.6/drivers/usb/core/hub.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/hub.c
+++ usb-2.6/drivers/usb/core/hub.c
@@ -3097,6 +3097,16 @@ loop:
} /* end while (1) */
}
+/* Let ehci-hcd know when khubd is idle */
+static int khubd_active;
+static DECLARE_WAIT_QUEUE_HEAD(khubd_wqh);
+
+void usb_wait_for_khubd_idle(void)
+{
+ wait_event(khubd_wqh, !khubd_active);
+}
+EXPORT_SYMBOL_GPL(usb_wait_for_khubd_idle);
+
static int hub_thread(void *__unused)
{
/* khubd needs to be freezable to avoid intefering with USB-PERSIST
@@ -3107,7 +3117,11 @@ static int hub_thread(void *__unused)
set_freezable();
do {
+ khubd_active = 1;
hub_events();
+ khubd_active = 0;
+ wake_up_all(&khubd_wqh);
+
wait_event_freezable(khubd_wait,
!list_empty(&hub_event_list) ||
kthread_should_stop());
Index: usb-2.6/drivers/usb/host/ehci-hcd.c
===================================================================
--- usb-2.6.orig/drivers/usb/host/ehci-hcd.c
+++ usb-2.6/drivers/usb/host/ehci-hcd.c
@@ -604,7 +604,13 @@ static int ehci_run (struct usb_hcd *hcd
* guarantees that no resets are in progress. After we set CF,
* a short delay lets the hardware catch up; new resets shouldn't
* be started before the port switching actions could complete.
+ *
+ * In an attempt to avoid unwanted initialization error messages
+ * for devices attached to companion controllers, we will ...The patch seems to do exactly what I was looking for! Details further down, but first some other comments. My first boot with your patch was with brltty still enabled. The log showed the right result, but I wanted to get a cleaner log without all the USB suspend/resume noise caused by brltty. So I disabled brltty's init script and rebooted. This gave a totally unexpected result: ehci now got loaded and initialized _before_ uhci which of course avoids the problem altogether (see attached log: dmesg_ehci-first). So apparently brltty's init script was responsible not only for the suspend/resume noise, but also influenced the load order of the two USB drivers. And apparently when things are left alone ehci _will_ be loaded before uhci, but that does not match what you described in earlier mails as it means the PCI bus order isn't followed. Confusing... Anyway, this meant that to test your patch I had to try to get uhci to load first again. This turned out to be simple: just adding uhci_hcd in /etc/modules did the trick. The result is in dmesg_uhci-first. That log shows what was desired, at least 3-1 and 4-1 get disconnected cleanly and I get no error messages to the console. Yay! I added two extra printk's around the usb_wait_for_khubd_idle call to show the exact effect of the wait (look for "khubd_idle" in two messages). These seem to show what was wanted: both keyboard and mouse are detected during the wait. But it also shows a few events on bus 3 and 4 that happen after I get khubd_idle that seem to indicate the probe on that bus is not completely finished and that could maybe still cause "problems". But maybe I'm just reading it wrong and those events are part of the lead up to the disconnects for the hand-off. Anyway, I'm quite happy with the result, so: Tested-by: Frans Pop <elendil@planet.nl> Last note FYI. I had another case of "dead keyboard" this morning, again with the "reset low speed USB device" message in the log. ...
It _is_ confusing. There's so much going on at boot time, with tons of processes all trying to run at once... Small changes can have unexpectedly large effects. What I said before was that the UHCI controllers are discovered and registered before the EHCI controller. That's different from the order in which the drivers are loaded, however; driver loading is managed by the hotplug infrastructure in userspace, which is outside the kernel's control. If your drivers had been built into the kernel instead of as separate modules, then they would be probed in order of discovery -- which would mean ehci-hcd would _always_ come after uhci-hcd, not what you want. (Actually even this might not be true any more. There are patches in development which make certain parts of the kernel initialization, including driver probing, run in parallel threads rather than Both statements are correct. The initialization of the devices on buses 3 and 4 _was_ complete. But the drivers were loaded later, so driver probing and/or normal driver operation were in progress when ehci-hcd interrupted things. Initialization != probing. Remember what I said earlier, that EHCI initialization would interfere with _any_ ongoing operations on the companion controllers? Now you're I still regard this more as a band-aid than anything else. In fact, it should (in theory) be _safer_ to interrupt things during device initialization than later, after the driver has started up. So I'm not at all certain the patch should be merged. Tomorrow I'll try asking for comments from other people on the mailing list... Alan Stern --
No, that did go in for one kernel release and then ripped out, it turned out to not save any time and just cause lots more problems. So don't worry about this. thanks, greg k-h --
I got lucky on the second boot with -rc5 and USB_DEBUG enabled. Attached the output of dmesg. After this boot the keyboard was dead and the telltale "reset low speed USB device" message is present. Cheers, FJP
Okay, now I can tell what happened. It's kind of amusing. You ran across a side effect of some new code I added fairly recently. The new code was supposed to make device operation _more_ reliable! But in your case it backfired. :-( Here's the story. As you'd expect, ehci-hcd initialization interrupted the keyboard probing procedure. The interruption occurred just before the point where the keyboard was configured. As a result the configuration was not installed, but this isn't a fatal error and the kernel continued normally. It turns out that the EHCI init had finished and the keyboard's connection was handed back to the UHCI controller before the hub driver got around to noticing the disconnection. Here's where the new code came into play. The idea is to prevent transient errors from causing lasting effects -- if the hub driver sees that a device _was_ disconnected but after a hundred ms or so the same device appears to be plugged in to the same port as before, it resets the device and otherwise leaves it alone. In particular it does not treat the event as a regular disconnection. That's exactly what happened to you: EHCI init caused the keyboard to be disconnected from the UHCI controller, and it was reconnected again by the time the hub driver looked at it. That's why there was no "USB disconnect" message and there was a "reset low speed USB device" instead. The end effect was that the _same_ instantiation of the keyboard device remained present, and since it had never been configured, of course it didn't work. This odd scenario might provide a valid reason for accepting the new patch. Or on the other hand, it might provide a reason for removing the "ignore transient disconnect" changes. At this point I'm not sure what's best. Alan Stern --
