Hi,
I've just started testing 2.6.26 on my desktop machine and have found that I get
a long pause (20 seconds or so) as the system boots. By pause I mean that the
boot messages that normally scroll by stop doing so for, as I say, about 20
seconds.I've done a bisect and arrived at:
b0ed43360fdca227048d88a08290365cb681c1a8 is first bad commit
commit b0ed43360fdca227048d88a08290365cb681c1a8
Author: Hannes Reinecke <hare@suse.de>
Date: Tue Mar 18 14:32:28 2008 +0100[SCSI] add scsi_host and scsi_target to scsi_bus
This patch implements scsi_host and scsi_target device types
and adds both to the scsi_bus.Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
46e4fd28cc537ba700eb4f5652abf043fed4d76d M driversThe config file for that kernel is attached as are the kernel log for booting
that kernel and the bisect log.I've also noticed that it takes much longer for icons to appear on my desktop
when I plug a usb storage device in. (The icons are shown/hidden by a simple
sctript that is run via udev when the device is attached/detached.)Let me know of any way I can help solve this.
Please cc me to any reply as I'm not subscribed.
Chris
--
Beauty is in the eye of the beerholder.
cc's added.
Rafael, we might need to track this as a post-2.6.26 regression.
--
We're tracking it as http://bugzilla.kernel.org/show_bug.cgi?id=10861 .
Thanks,
Rafael
--
Thanks for the reply, Andrew. I was beginning to think that I'd committed an
LKML sin :-)Yes, it is about the same. On 2.6.25.4 (and earlier), the icons appear within a
second or two, but with 2.6.26-rc2 or later, it takes about 20 seconds. I've
now built and installed -rc5 and am seeing the same behaviour.I've also built -rc5 on my laptop. On that my kernel uses the older IDE drivers
rather than libata. The boot proceeds with no pause, but I do see the same--
Beauty is in the eye of the beerholder.
--
It already is ... as I asked in the actual bug (but no-one seems to have
noticed, so I assume the bug entry isn't set up correctly). Does this
still persist with 2.6.26-rc5? The reason for asking is that it looks
very much like a silent manifestation of our scsi_bus_uevent() problem
that got fixed in 2.6.26-rc5.James
--
Yes, I still get the boot pause and the delay in desktop icons appearing when a
usb-storage device is plugged in with -rc5. A boot log is attached. Is there
some sort of verbose scsi boot messaging that I can switch on? I've looked but
I'm damned if I can find anything in 'make menuconfig'.--
Beauty is in the eye of the beerholder.
SCSI_CONSTANTS and SCSI_LOGGING might help.
Both live in the SCSI options menu. Read the Kconfig help on how
to use SCSI_LOGGING.Can you please also enable CONFIG_PRINTK_TIME ?
Using the timestamps in /var/log/messages is ok but not the most
reliable place to get somewhat accurate time stamps of events.hth,
grant
--
I've enabled CONFIG_PRINTK_TIME and will look into the SCSI options.
--
Beauty is in the eye of the beerholder.
--
The bug reporter has no Bugzilla account, so I couldn't add him to the CC list
of the bug.Thanks,
Rafael
--
I do have a bugzilla account, but for some reason I couldn't log in to it to
report this bug, so I mailed LKML to get the ball rolling. I can now log in so
I've added myself to the CC list.--
Beauty is in the eye of the beerholder.
--
Well you did - you reported a bug.
<watches wodim wibble for five minutes then report "CD/DVD-Recorder
OK that makes sense. ata and usb use the scsi layer. IDE does not.
--
It goes much better when the CD is inserted upside up.
--
So I can close this regression if I send you a pack of CD ring labels
with "other way up" on them for you to affix to the underside of your
CDs ... ?James
--
Would prefer double-sided CDs.
Is this 20-second-delay thing known-about/expected/etc?
--
I just discovered my budget only runs to a permanent marker for you to
If it's not the scsi_bus_uevent problem, then no, it isn't.
The odd thing is that it occurs in the middle of USB initialisation from
Note the jump from 19 to 44 seconds.
I'm a bit baffled as to how that could be SCSI related, but I suppose it
could be udev related somehow. Perhaps turning on driver core debugging
might help (that's CONFIG_DEBUG DRIVER ... it depends on
CONFIG_DEBUG_KERNEL for activation). Hopefully the verbose kernel logs
from here might tell us what's going on in the lacunae.James
--
--Boundary-00=_NG3SIU63+Zdv2JV
Content-Type: text/plain;
charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inlineBecause of this oddity, I've also built 2.6.26-rc5-git2 with verbose USB
debugging on. A boot log from that kernel is attached. You will see lots of
ETIME errors from usb. A log from a similarly verbose 2.6.25.5 kernel does notTriggered by James' bafflement and for my own peace of mind, I've run another
bisect session. This time I started with v2.6.26 as good and v2.6.26-rc2 as bad.
It highlighted the same patch as the "culprit".--
Beauty is in the eye of the beerholder.
In fact the log shows that the 30-second delay occurs in the middle of
ohci1394 (Firewire) initialization, not USB initialization:Jun 8 05:26:35 upstairs kernel: ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[22] MMIO=[fbfff000-fbfff7ff] Max Packet=[2048] IR/IT contexts=[4/8]
Jun 8 05:26:35 upstairs kernel: ohci_hcd 0000:02:0a.0: urb f71cb680 path 2 ep1in 5e160000 cc 5 --> status -62
Jun 8 05:26:35 upstairs kernel: hub 1-2:1.0: transfer --> -62
Jun 8 05:27:08 upstairs kernel: driver: '0000:02:0b.0': driver_bound: bound to device 'ohci1394'
Jun 8 05:27:08 upstairs kernel: bus: 'pci': really_probe: bound device 0000:02:0b.0 to driver ohci1394Those -62 errors from ohci_hcd are red herrings. To test this, try
booting with "nousb".Alan Stern
--
Ah sorry, that's a legacy of when I had considerably less ram than 2GB. I only
use firewire on the rare occasion that I want to grab some video from my camera
and I use the parallel port even less frequently. So one of the last thingsWhen I was bisecting, I assessed good or bad based on whether I experienced the
long pause during boot up. You may have seen my reply to James a short while
ago that I have bisected again and arrived at the same patch.Thanks,
Chris
--
Beauty is in the eye of the beerholder.
--
On an unrelated note: If the driver was automatically loaded in a
hotplug-like fashion (based on PCI IDs and module aliases), you could
prevent it with "blacklist ohci1394" in one of the modprobe
configuration files.
--
Stefan Richter
-=====-==--- -==- -=-=-
http://arcgraph.de/sr/
--
It's also worth noting that the first 17:25:44 message is highly
suspicious. It is the initial output of the ohci-hcd driver,
indicating that the udev/hotplug system had just loaded that driver.
This doesn't seem to have anything to do with the SCSI stack.Whatever caused the delay, it also managed to pause the other hotplug
task which was in the midst of loading uhci-hcd.Alan Stern
--
Sorry folks, I'm unexpectedly away from the computers that these
"problems" occur on.I should be back at home tomorrow and will answer you all as quickly
as possible then.Thanks,
--
Beauty is in the eye of the beerholder.
--
I'm starting to think this might be a distribution problem. I tried
booting my laptop on your config (only system I have with ata_piix)
without an initrd (bit painful for fedora 9) and I didn't see any slow
downs (in fact it was a lot faster than previously; makes note to file
bug with redhat about their initramfs).The one thing the patch you identified does is to add the SCSI target to
the BUS and type model for the generic device. What this does is to add
extra uevents for the target. If your hotplug handlers aren't set up to
expect this, they could be delaying the system progress.Actually, that's not really necessary. Bisection is taking you back
through kernel history, so unless the bug has two separate causes, it's
pretty likely to home in on the same section of code.James
--
[Apologies to copy recipients - I accidentally replied to only James earlier
this evening. I've enclosed the text of my reply in square brackets at the
relevant places below, followed, where appropriate, by my findings from
tonight's investigations][OK, I'll start investigating this line. My desktop system was originally (the
now lapsed) Peanut Linux, which was a Slackware derivative. But I've been
running it for 3 or 4 years, upgrading and adding packages along the way. My
udev setup is based on that in a fairly recent Slackware, although I can't now
recall precisely which version.]For the time being I'll try and isolate what it is that causes the delay in
icons appearing on my desktop when I plug in a usb-storage device. The script
that shows the icon is invoked by adding the script name to udev's run list.
Here is a typical rule from /etc/udev/rules.d/20-local.rules:ACTION=="add", KERNEL=="sd[a-z][0-9]", SYSFS{idProduct}=="6545",
SYSFS{idVendor}=="0930", SYMLINK+="usbstickc%n", MODE="0666",
RUN+="/usr/bin/udevshowicon.sh"I've added a couple of calls to logger (from util-linux-2.12r) to the script.
What this has shown is that, when I plug in a usb-storage device, there seems
to be a delay between the usb/scsi/usb-storage drivers setting up the device
and my icon-showing script being run. Here is the snippet from the kernel log
file:Jun 10 20:27:45 upstairs kernel: [ 3200.592107] hub 3-0:1.0: state 7 ports 5 chg
0000 evt 0004
Jun 10 20:27:45 upstairs kernel: [ 3200.592107] ehci_hcd 0000:02:0a.2: GetStatus
port 2 status 001803 POWER sig=j CSC CONNECT
Jun 10 20:27:45 upstairs kernel: [ 3200.592107] hub 3-0:1.0: port 2, status
0501, change 0001, 480 Mb/s
Jun 10 20:27:45 upstairs kernel: [ 3200.695130] hub 3-0:1.0: debounce: port 2:
total 100ms stable 100ms status 0x501
Jun 10 20:27:45 upstairs kernel: [ 3200.746270] ehci_hcd 0000:02:0a.2: port 2
high speed
Jun 10 20:27:45 upstairs kernel: [ 3200.746279] ehci_hcd 0000:02:0a.2: GetStatus
port...
Hi everyone,
I've done that poking around and found something that to me looks interesting.
Before plugging in a usb stick, I changed udev's log_priority to "debug" (using
udevcontrol). Attached is the snippet from /var/log/messages that shows all the
diagnostics emitted by udev when the stick was plugged in. (This is with
2.6.26-rc5-git5). You will see that there are several instances of a message
about udev unsuccesfully waiting for a file ioerr_cnt to appear deep down in
the guts of /sys. Eventually, it does appear, but only after 21 seconds have
elapsed. On 2.6.25.6, the file appears almost as soon as udev starts looking
for it. The log extract showing this is also attached as is an extract from the
kernel log for 2.6.26-rc5-git5.By the way, the main udev-rules file that is installed is the one for slackware
that comes with the udev package from kernel.org. So the file these tests were
run with are that from udev-122, which I upgraded to earlier this evening (more
in hope than expectation :-)--
Beauty is in the eye of the beerholder.
This doesn't look good. It looks like a udev rule in your system is
looking for a file that will never show up. That's a short way to a
long delay time :)Try commenting out the rule that does this and see if things are fixed.
thanks,
greg k-h
--
Yes, that's it. It's this rule, which appears as the second rule in the
slackware rules file that's distributed with udev (and it's been there since
udev-97 in August 2006):ACTION=="add", SUBSYSTEM=="scsi", WAIT_FOR_SYSFS="ioerr_cnt"
Commenting that rule out eliminates the pause in the boot and in the appearance
of icons on my desktop.Thanks,
--
Beauty is in the eye of the beerholder.
--
FWIW, Gentoo's udev 119 has this:
$ grep -r ioerr_cnt /etc/udev/
/etc/udev/rules.d/05-udev-early.rules:ACTION=="add",
KERNEL=="[0-9]*:[0-9]*", SUBSYSTEM=="scsi", WAIT_FOR_SYSFS="ioerr_cnt"Whatever this rule is good for... Maybe the pattern should be
"[0-9]+(:[0-9]+){3}" to match logical units only.
--
Stefan Richter
-=====-==--- -==- -=-==
http://arcgraph.de/sr/
--
Actually no; looks like these are just glob(7) patterns. Then the
pattern might already be OK since host devices' and target devices'
names don't start with decimal numbers.
--
Stefan Richter
-=====-==--- -==- -=-==
http://arcgraph.de/sr/
--
For Slackware 12.0.0, the rule is:
ACTION=="add", KERNEL=="[0-9]*:[0-9]*", SUBSYSTEM=="scsi",
WAIT_FOR_SYSFS="ioerr_cnt"Using that rule instaed of the old one in the Slackware rules file included with
udev appears to result in the same behaviour as with the old rule and kernel <
2.6.26.Unless someone warns me otherwise, I think I'll go with that on my (homebrewed)
system, so thanks to everyone for your help.Chris
--
Beauty is in the eye of the beerholder.
--
It's called a workaround, not a fix.
--
Stefan Richter
-=====-==--- -==- -=-==
http://arcgraph.de/sr/
--
Actually, there's something seriously wrong here: ioerr_cnt is a
property of the device, not of the host (as in it will never appear
under .../host4 but under .../host4/targetX:X:X/4:X:X:X/ioerr_cnt).
Perhaps an investigation of why udevd-event is looking under host4/ is
in order.James
--
I blame a badly written udev rule :)
thanks,
greg k-h
--
So the change which commit b0ed43360fdca227048d88a08290365cb681c1a8
"[SCSI] add scsi_host and scsi_target to scsi_bus" introduced is leading
the udev scripts onto a wrong trail?(This commit was the one which Chris found by bisection, see
http://lkml.org/lkml/2008/6/1/2)
--
Stefan Richter
-=====-==--- -==- -=-==
http://arcgraph.de/sr/
--
Yes, I think so ... what it does is make us get bus events for the
target as well as the lun, whereas without it we only get them for the
lun. I think something is misparsing the new target event and this is
where the trouble is coming from.James
--
Hi James,
I assume that since we now the source of this problem, you can create it at
will. I only mention because if you need me to test a patch, it will have to be
before Tuesday, June 17 - I go on holiday (vacation) on that day.Thanks
--
Beauty is in the eye of the beerholder.
--
Yes, we can; thanks!
The problem though is that it's in the udev scripts, so there's nothing
we can fix in the kernel, we now need to persuade the various distros to
fix it (if they're actually broken), so it might take a while.James
--
...
You can eliminate this 5-second delay by adding
options usb-storage delay_use=0
to /etc/modprobe.conf. Of course, this will make the other delays just
stand out that much more sharply...Alan Stern
--
What distro is this being seen on? I know that openSUSE 11.0 betas did
have a bug in their init scripts that was causing a long delay, much
like this one. But it has been fixed in the later releases
(hopefully...)thanks,
greg k-h
--
As I repied to James a few minutes ago, my desktop system was originally (the
now lapsed) Peanut Linux, which was a Slackware derivative. But I've been
running it for 3 or 4 years, upgrading and adding packages along the way. My
udev setup is based on that in a fairly recent Slackware, although I can't now
recall precisely which version.Thanks
--
Beauty is in the eye of the beerholder.
--
OK, here's a boot log with driver core debugging turned on. Hope it helps!
(Oh, and thanks for adding the word lacunae to my vocabulary. I'd never com=
e=20
across it before.)=2D-=20
Beauty is in the eye of the beerholder.
| Artem Bityutskiy | [PATCH 12/44 take 2] [UBI] allocation unit implementation |
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
| Jeff Garzik | Re: [RFC] Heads up on sys_fallocate() |
| Christoph Hellwig | pcmcia ioctl removal |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | [GIT]: Networking |
| David Miller | Re: [BUG] New Kernel Bugs |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
