"It turns out that USB devices suck when it comes to powermanagement issues :(" lamented Greg KH in posting some patches to handle USB autosuspend problems. He noted that the patches were intended for inclusion in the upcoming 2.6.23 kernel, "a number of patches have been submitted near the end of this kernel release cycle that add new device ids to the quirk table in the kernel to disable autosuspend for specific devices. However, a number of developers are very worried that even with the testing that has been done, once 2.6.23 is released, we are going to get a whole raft of angry users when their devices break in nasty ways." He proved an example, "it seems that almost 2/3 of all USB printers just can not handle autosuspend. And there's a _lot_ of USB printers out there..."
Later in the discussion, Linux creator Linus Torvalds commented, "in general, I think the USB blacklist/whitelists are generally a sign of some deeper bug." He went on to point out a number of quirks in the USB layer that need to be addressed, adding:
"We used to have a lot of those things due to simply incorrect SCSI probing, causing devices to lock up because Linux probed them with bad or unexpected modepages etc. I suspect we still have old blacklist entries from those days that just never got cleaned up, because nobody ever dared remove the blacklist entry.
"We should strive to make the default behaviour be so safe that we never need a black-list (or a whitelist), and basically consider blacklists to be not a way to 'fix up a device', but a way to avoid some really serious AND *RARE* error."
From: Linus Torvalds [email blocked]
Subject: Re: [GIT PATCH] USB autosuspend fixes for 2.6.23-rc6
Date: Thu, 13 Sep 2007 09:43:13 -0700 (PDT)
On Thu, 13 Sep 2007, Alan Stern wrote:
>
> But mainly it's a question of maintenance and modification. Kernel
> developers don't really enjoy maintaining black- or whitelists of
> devices, together with all the work involved in sorting through the
> issues when somebody posts an email saying "My device doesn't work!".
Yeah.
In general, I think the USB blacklist/whitelists are generally a sign of
some deeper bug.
We used to have a lot of those things due to simply incorrect SCSI
probing, causing devices to lock up because Linux probed them with bad or
unexpected modepages etc. I suspect we still have old blacklist entries
from those days that just never got cleaned up, because nobody ever dared
remove the blacklist entry.
We should strive to make the default behaviour be so safe that we never
need a black-list (or a whitelist), and basically consider blacklists to
be not a way to "fix up a device", but a way to avoid some really serious
AND *RARE* error.
The moment you have lots of devices having the same blacklist entry,
that's a sign that the blacklist is wrong, and the subsystem itself is
likely doing something bad!
So, I would seriously suggest:
- look at USB quirks that have more than ten entries (and the entries
aren't just the exact same device in various guises)
- start considering that feature to be something that is known broken,
and shouldn't be done AT ALL by default.
- have some way to enable some extension on a device-by-device basis from
the /sysfs interface, and then users can enable those things on their
own with a graphical interface or something (or using whitelists in
user space saying "ok, this device can actually do this")
- REMOVE THE DAMN QUIRK
It looks like the current situation now is that the latest autosuspend
patches did basically everything but the last point.
Btw, this is in no way just an AUTOSUSPEND issue. The USB layer has a
*lot* of these quirks. They are often called "UNUSUAL_DEV()", but the
thing is, some of those things seem to be so usual that the naming is
dubious, and thus calling it a "quirk" or "unusual" is pretty dubious too.
For example, why do we have that US_FL_MAX_SECTORS_64 at all? The fact
that some USB device is broken with more than 64 sectors would seem to
indicate that Windows *never* does more than 64 sectors, and that in turn
means that pretty much *no* devices have ever been tested with anything
bigger.
So why not make the 64 sector limit be the default? Get rid of the quirk:
we already allow people to override it in /sys if they really want to, but
realistically, it's probably not going to make any difference what-so-ever
for *any* normal load. So we seem to have a quirk that really doesn't buy
us anything but headache.
Other quirks worth looking at (but likely unfixable) are:
- US_FL_IGNORE_RESIDUE:
Does this really matter? Can we not just always do the
US_FL_IGNORE_RESIDUE thing? Windows must not be doing what we're
doing.
- US_FL_FIX_CAPACITY:
This is a generic SCSI issue, not a USB one, and maybe there are
better solutions to it. Are we perhaps doing something wrong? Is
there some patterns we haven't seen? Why do we need this, when
presumably Windows does not?
- US_FL_SINGLE_LUN:
At least a few of these seem to indicate that the real problem
could be detected dynamically ("device reports Sub=ff") rather
than with a quirk. Quirks are unmaintainable (and change), but
noticing when devices return impossible values and going into a
"safe mode" is just defensive programming.
> Maybe you're concerned about propagating updates as painlessly as
> possible -- if the whitelist is in the kernel then every kernel release
> would include an update. But in userspace it's possible to do updates
> even more quickly and painlessly. For example, there could be a
> network server available for both interactive lookups and automatic
> queries from HAL.
For a lot of these things, you probably do not need a whitelist *at*all*!
IOW, just default to something safe (the 64 sector example), and then
perhaps allow people to explicitly play with their settings in a hardware
manager. People actually tend to *like* being able to tweak meaningless
things, and it makes them feel in control. So you'd have the Gentoo people
who want to optimize their iPod access times by 0.2% by raising the
maximum sector number - good for them! They'll feel empowered, and if it
stops working, they know it was because of something *they* did.
So at least in some cases, I think we should "default to stupid, but give
users rope".
Linus
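The first step Torvalds suggests, finding quirk flags carried by more than ten distinct devices, can be sketched in a few lines. This is only an illustration: the table below is made up, real usb-storage entries are declared with UNUSUAL_DEV() macros in the kernel source, and treating "same vendor and product ID" as "the exact same device in various guises" is a crude assumption.

```python
from collections import Counter

# Hypothetical, simplified quirk table: (vendor_id, product_id, flags).
# The IDs are invented for illustration; only the flag names come from the thread.
QUIRK_TABLE = [
    (0x1000 + i, 0x0001, {"US_FL_IGNORE_RESIDUE"}) for i in range(12)
] + [
    (0x2000, 0x0001, {"US_FL_SINGLE_LUN"}),
    (0x2001, 0x0002, {"US_FL_SINGLE_LUN", "US_FL_FIX_CAPACITY"}),
]


def overused_flags(table, threshold=10):
    """Return the flags that appear on more than `threshold` distinct devices."""
    counts = Counter()
    seen = set()
    for vendor, product, flags in table:
        # Count each (vendor, product) pair once, as a rough stand-in for
        # "the exact same device in various guises".
        if (vendor, product) in seen:
            continue
        seen.add((vendor, product))
        counts.update(flags)
    return sorted(flag for flag, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # Flags listed here are candidates for becoming the default behaviour.
    print(overused_flags(QUIRK_TABLE))
```

A flag that shows up in this list is, by Torvalds's argument, less a device quirk than a sign that the default behaviour is wrong.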
From: Alan Stern [email blocked]
Subject: Re: [GIT PATCH] USB autosuspend fixes for 2.6.23-rc6
Date: Thu, 13 Sep 2007 15:13:46 -0400 (EDT)
On Thu, 13 Sep 2007, Linus Torvalds wrote:
> In general, I think the USB blacklist/whitelists are generally a sign of
> some deeper bug.
>
> We used to have a lot of those things due to simply incorrect SCSI
> probing, causing devices to lock up because Linux probed them with bad or
> unexpected modepages etc. I suspect we still have old blacklist entries
> from those days that just never got cleaned up, because nobody ever dared
> remove the blacklist entry.
I don't just suspect -- I know for a fact that we do. Partly because
of laziness and partly because of not being able to verify that an
entry is no longer needed.
> We should strive to make the default behaviour be so safe that we never
> need a black-list (or a whitelist), and basically consider blacklists to
> be not a way to "fix up a device", but a way to avoid some really serious
> AND *RARE* error.
In general I agree. However there are some problems for which nobody
has been able to come up with another solution. See below.
> The moment you have lots of devices having the same blacklist entry,
> that's a sign that the blacklist is wrong, and the subsystem itself is
> likely doing something bad!
>
> So, I would seriously suggest:
> - look at USB quirks that have more than ten entries (and the entries
> aren't just the exact same device in various guises)
> - start considering that feature to be something that is known broken,
> and shouldn't be done AT ALL by default.
> - have some way to enable some extension on a device-by-device basis from
> the /sysfs interface, and then users can enable those things on their
> own with a graphical interface or something (or using whitelists in
> user space saying "ok, this device can actually do this")
> - REMOVE THE DAMN QUIRK
>
> It looks like the current situation now is that the latest autosuspend
> patches did basically everything but the last point.
Yes, none of those USB_QUIRK_NO_AUTOSUSPEND entries are needed any
more. They can all be removed and handed over to the HAL people as a
starting point.
> Btw, this is in no way just an AUTOSUSPEND issue. The USB layer has a
> *lot* of these quirks. They are often called "UNUSUAL_DEV()", but the
> thing is, some of those things seem to be so usual that the naming is
> dubious, and thus calling it a "quirk" or "unusual" is pretty dubious too.
You have concentrated your attention on the list for usb-storage,
but the usbhid driver also has an impressively long quirks list.
> For example, why do we have that US_FL_MAX_SECTORS_64 at all? The fact
> that some USB device is broken with more than 64 sectors would seem to
> indicate that Windows *never* does more than 64 sectors, and that in turn
> means that pretty much *no* devices have ever been tested with anything
> bigger.
>
> So why not make the 64 sector limit be the default? Get rid of the quirk:
> we already allow people to override it in /sys if they really want to, but
> realistically, it's probably not going to make any difference what-so-ever
> for *any* normal load. So we seem to have a quirk that really doesn't buy
> us anything but headache.
That's true now, but it wasn't always. Until the last year or so,
cdrecord wouldn't work properly with USB CD drives having a 64-sector
limit unless the user added a particular command-line argument.
In fact, setting max_sectors down to 64 is probably overkill -- 120
ought to be enough. But there may have been one or two oddball devices
that really did have a 32-KB limit, and better safe than sorry. At one
point an engineer from Genesys said their devices did, although they do
seem to work perfectly well with 64-KB transfers (and that's what
Windows gives them).
> Other quirks worth looking at (but likely unfixable) are:
> - US_FL_IGNORE_RESIDUE:
> Does this really matter? Can we not just always do the
> US_FL_IGNORE_RESIDUE thing? Windows must not be doing what we're
> doing.
Windows does indeed ignore the residue field, as far as I can tell.
But this is a rather tricky thing. The USB mass-storage spec
specifically says that one way a device can signal a short transfer is
to pad the data with 0s to the requested length and then set the
residue to indicate how much of the data is valid. If we ignore the
residue then we run a risk of misinterpreting the 0s as valid data.
Now in practice this doesn't matter much because short transfers of
block data (READ_10) generally involve other errors that would show up
anyway, and for non-block data (MODE SENSE) the padding probably
wouldn't matter. Still it seems like a dangerous sort of thing to do,
which is why I have resisted it.
(And by the way, there _definitely_ are devices which use this
signalling method. In fact, Linux contains a driver that does it.)
> - US_FL_FIX_CAPACITY:
> This is a generic SCSI issue, not a USB one, and maybe there are
> better solutions to it. Are we perhaps doing something wrong? Is
> there some patterns we haven't seen? Why do we need this, when
> presumably Windows does not?
This is another hard case. No, we aren't doing anything wrong. If
there are any patterns we haven't seen, we aren't aware of them. :-)
You might think that if a device claims to have an odd number of
sectors then it must be wrong, but this turns out not to be true.
Why doesn't Windows need this? For all we know, it does. Has anybody
ever tried forcing Windows to read the sector beyond the end of one of
these buggy devices?
For one reason or another, Linux supports filesystems/partitioning
schemes which do need to access the last sector (EFI GUID, md, maybe
others). Some devices are so buggy that trying to read the nonexistent
"last" sector causes them to lock up, requiring a power cycle.
Obviously we can't probe for this sort of behavior. (There was one
report of a device which _could_ read its last sector correctly, but
only if the transfer was exactly 1 sector long! Attempts to read two
sectors starting from the second-to-last sector would cause it to
crash.)
There's a straightforward solution: Never try to use the last sector --
in effect, assume every device has the FIX_CAPACITY flag set. Doing
this universally could cause data loss, however, so again I have been
opposed to it.
> - US_FL_SINGLE_LUN:
> At least a few of these seem to indicate that the real problem
> could be detected dynamically ("device reports Sub=ff") rather
> than with a quirk. Quirks are unmaintainable (and change), but
> noticing when devices return impossible values and going into a
> "safe mode" is just defensive programming.
This is almost certainly a case where lots of the entries are no longer
needed. But it isn't easy to tell which ones can safely be removed.
The problem here isn't that the device reports impossible values --
what happens is that it responds to commands for any LUN, causing the
kernel to think it is really multiple devices.
> For a lot of these things, you probably do not need a whitelist *at*all*!
>
> IOW, just default to something safe (the 64 sector example), and then
> perhaps allow people to explicitly play with their settings in a hardware
> manager. People actually tend to *like* being able to tweak meaningless
> things, and it makes them feel in control. So you'd have the Gentoo people
> who want to optimize their iPod access times by 0.2% by raising the
> maximum sector number - good for them! They'll feel empowered, and if it
> stops working, they know it was because of something *they* did.
>
> So at least in some cases, I think we should "default to stupid, but give
> users rope".
This will work in some cases, but not in others. In particular, it
won't work when the values have to be known at device-detection time.
Once the sysfs files have been set up and the user can put in the
correct values, it's already too late.
Still, I agree that we can get rid of MAX_SECTORS_64. Similarly,
FIX_INQUIRY shouldn't be needed, since any device which does need it
would fail to work under Windows. However the really popular flags are
the ones which would be hardest to remove.
Alan Stern
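Stern's concern about the residue field can be made concrete with a small sketch. It assumes the signalling method he describes: the device pads a short transfer with zeros up to the requested length and reports in the residue how many trailing bytes are not valid. The function name and parameters are hypothetical and do not correspond to kernel code.

```python
def valid_payload(buffer: bytes, requested: int, residue: int,
                  ignore_residue: bool = False) -> bytes:
    """Return the bytes the host should treat as valid data."""
    if len(buffer) != requested:
        raise ValueError("this sketch assumes the device padded to the requested length")
    if ignore_residue:
        # Behaviour with the residue ignored: the padding is taken as data.
        return buffer
    if not 0 <= residue <= requested:
        # A garbage residue from a buggy device would land here.
        raise ValueError("residue out of range")
    return buffer[:requested - residue]


if __name__ == "__main__":
    padded = b"DATA" + b"\x00" * 4          # 4 valid bytes, 4 bytes of padding
    print(valid_payload(padded, 8, 4))       # only the valid bytes
    print(valid_payload(padded, 8, 4, ignore_residue=True))  # padding included
```

With the residue honoured, the host sees only the four valid bytes; with it ignored, the four zero bytes of padding are indistinguishable from real data, which is the risk Stern describes.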
From: Pete Zaitcev [email blocked]
Subject: Re: [GIT PATCH] USB autosuspend fixes for 2.6.23-rc6
Date: Thu, 13 Sep 2007 12:26:47 -0700
On Thu, 13 Sep 2007 09:43:13 -0700 (PDT), Linus Torvalds [email blocked] wrote:
> So why not make the 64 sector limit be the default? Get rid of the quirk:
> we already allow people to override it in /sys if they really want to, but
> realistically, it's probably not going to make any difference what-so-ever
> for *any* normal load. So we seem to have a quirk that really doesn't buy
> us anything but headache.
Well, ub does that today. And there is a measurable performance differential
with usb-storage when driving rotating discs, or so I heard.
> Other quirks worth looking at (but likely unfixable) are:
> - US_FL_IGNORE_RESIDUE:
> Does this really matter? Can we not just always do the
> US_FL_IGNORE_RESIDUE thing? Windows must not be doing what we're
> doing.
I'm afraid this is valuable. However, a number of devices only return
garbage as residue if the transfer length is greater than 32KB. Limiting
that would trim this blacklist, I think. The vast majority of devices
work correctly in this regard, and ub checks the residue without any
blacklist.
> - US_FL_FIX_CAPACITY:
> This is a generic SCSI issue, not a USB one, and maybe there are
> better solutions to it. Are we perhaps doing something wrong? Is
> there some patterns we haven't seen? Why do we need this, when
> presumably Windows does not?
It has something to do with the way our partition detection works. Linux
tends to rely on the reported device size. Windows reads the first block
and then goes further based on its contents. If we exterminate partitioning
code which uses the reported device size for autodetection, then this
problem will fix itself.
> - US_FL_SINGLE_LUN:
> At least a few of these seem to indicate that the real problem
> could be detected dynamically ("device reports Sub=ff") rather
> than with a quirk. Quirks are unmaintainable (and change), but
> noticing when devices return impossible values and going into a
> "safe mode" is just defensive programming.
This is being worked upon. The recent change for floppies eliminated
a big number of those.
-- Pete
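The trade-off around US_FL_FIX_CAPACITY that Stern and Zaitcev discuss comes down to simple arithmetic on the reported device size. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the buggy device is assumed to report one sector more than it really has.

```python
def usable_sectors(reported: int, fix_capacity: bool) -> int:
    """Number of sectors the host will use, given the device's reported size."""
    return reported - 1 if fix_capacity else reported


def last_usable_sector(reported: int, fix_capacity: bool) -> int:
    """Zero-based index of the last sector the host may touch."""
    return usable_sectors(reported, fix_capacity) - 1


if __name__ == "__main__":
    # Assumed buggy device: 1000 real sectors, but it reports 1001.
    print(last_usable_sector(1001, fix_capacity=False))  # index 1000 does not exist
    print(last_usable_sector(1001, fix_capacity=True))   # index 999, the real last sector

    # Honest device: 1000 real sectors, reported correctly.
    print(last_usable_sector(1000, fix_capacity=True))   # index 998, sector 999 is hidden
```

Applying the correction to every device protects the buggy ones from a read past the end, but on an honest device it hides the real last sector, which is where the data loss Stern worries about would come from for layouts that store metadata there.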
From: Adrian Bunk [email blocked]
Subject: Re: [GIT PATCH] USB autosuspend fixes for 2.6.23-rc6
Date: Thu, 13 Sep 2007 22:19:10 +0200
On Thu, Sep 13, 2007 at 12:07:15PM -0400, Alan Stern wrote:
> On Thu, 13 Sep 2007, Adrian Bunk wrote:
>
> > > > It is a good thing if userspace can add currently missing devices to
> > > > whitelists, but the whitelist itself should be in the kernel.
> > >
> > > It's not clear that this sort of approach will turn out to be workable.
> > > Whitelists/blacklists do okay in the kernel when they refer to a
> > > relatively small subset of devices. However in this case I have the
> > > impression that we're talking about roughly a 50/50 split. Keeping an
> > > in-kernel list with even 10% of all existing USB devices simply isn't
> > > feasible.
> >
> > What about this is not feasible?
> >
> > The amount of work for maintaining the list is the same:
> >
> > No matter whether it's in-kernel or in the userspace, you need a list of
> > working devices in some machine readable format.
> >
> > Whether this gets used by the kernel, by userspace, or both, shouldn't
> > make any difference.
> >
> > Kernel image size can be a problem in some cases, but an in-kernel list
> > doesn't have to be mandatory but could be made selectable in kconfig.
>
> Well, size is one problem I had in mind. There are a _lot_ of USB
> devices in existence.
Kernel image size matters much for some uses, but not for all.
> But mainly it's a question of maintenance and modification. Kernel
> developers don't really enjoy maintaining black- or whitelists of
> devices, together with all the work involved in sorting through the
> issues when somebody posts an email saying "My device doesn't work!".
>
> Also, modifying device lists in the kernel tends to be a slow process,
> involving at least one kernel release cycle. It's much easier for
> people to maintain userspace databases. Now I realize you proposed
> there be a userspace interface for modifying the kernel's whitelist --
> but if you're going to do that, why not put the entire whitelist in
> userspace to begin with?
No matter whether the list is in userspace or in the kernel, maintaining
it is exactly the same job of adding entries into some machine readable
list (no matter whether it's a C struct or a CSV list that later gets
converted).
This could be the kernel developers or some external sf project that
gets synced with the kernel during each merge window.
> Maybe you're concerned about propagating updates as painlessly as
> possible -- if the whitelist is in the kernel then every kernel release
> would include an update. But in userspace it's possible to do updates
> even more quickly and painlessly. For example, there could be a
> network server available for both interactive lookups and automatic
> queries from HAL.
>...
No, what I'm concerned about is that this would require userspace for
something that is completely in-kernel.
> Alan Stern
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
From: Linus Torvalds [email blocked]
Subject: Re: [GIT PATCH] USB autosuspend fixes for 2.6.23-rc6
Date: Thu, 13 Sep 2007 13:44:23 -0700 (PDT)
On Thu, 13 Sep 2007, Adrian Bunk wrote:
>
> No, what I'm concerned about is that this would require userspace for
> something that is completely in-kernel.
If done right (and autosuspend now is), there is no "required" userspace.
If you want autosuspend, you just say so. The kernel doesn't do it by
default. This is not about "user space required" - it's about "user space
can ask for it if it wants to".
Notice? There doesn't even have to be any blacklists/whitelists at all. It
really can be just an application that allows the user to check or uncheck
the capability (with a warning saying something like: "Some USB devices
may disconnect when suspended - if this affects you, uncheck this").
That's why the kernel shouldn't set policy. It's a *good* thing to just
expose the capabilities, but not necessarily use them!
Linus
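The "user space can ask for it if it wants to" model can be sketched as a small helper that flips a per-device sysfs attribute. This is a minimal sketch, not a definitive tool: it assumes the attribute is `power/control` accepting "auto" and "on", as on recent kernels, while kernels from the 2.6.23 era are believed to have used a differently named attribute (`power/level`). The function name and the device path in the example are hypothetical.

```python
from pathlib import Path


def set_autosuspend(device_dir, enabled: bool) -> str:
    """Opt one USB device in or out of autosuspend; return the value written."""
    # Assumption: the per-device attribute is power/control ("auto" or "on").
    control = Path(device_dir) / "power" / "control"
    value = "auto" if enabled else "on"
    control.write_text(value)
    return value


if __name__ == "__main__":
    # Hypothetical device path; writing under /sys normally requires root.
    set_autosuspend("/sys/bus/usb/devices/1-1", enabled=True)
```

A graphical hardware manager of the kind Torvalds describes would amount to a checkbox wired to a call like this, with no whitelist or blacklist involved.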
the problems of default permit anyone?
a blacklist usb list will always be a catch-up system, similar to anti-virus databases.
hell, its like asking for a visit from murphy. as in, it sounds like some devices have x number of switches, that if put into pattern y will break said device.
someone needs to clue-bat the engineers that worked on these devices...
but i have a feeling that this is a symptom of windowitis, the "it can be fixed in the driver" mentality.
good luck to the devs on this.
Hmpf...
I think this shows the huge difference between Windoof and Linux.
Windows : Driver comes from the producer / Windows specific tested driver by Microsoft
Linux : All purpose driver , sometimes tested, sometimes set as default as chip is same
(problems may be there too, for example the DELL things with emu10k1... :-) )
The white/blacklisting thing was a general-purpose workaround, but it is clearly not a good one...
But how do you want to solve it?!? I think this is a very tricky thing as changing could break things.
And if I think how many drivers are all purpose drivers, I think this is nearly impossible
I worked at Lexmark doing firmware for 5 years.
I didn't do the USB device stuff for our printers, but did do the USB host side of the print server line. Linus is way off here ... if you want to work with any random cheaply produced Asian printer, you're going to have to deal with its horrible quirky behavior. More often than not, working with them requires an "emulate windows" strategy instead of "follow the USB spec". The ones that ship with their own driver are particularly poor. This situation is especially bad in the inkjet market, where they are shaving pennies off of every printer.
No, Linus is quite correct here.
Linus is choosing the correct option of a default behaviour that works for all devices, regardless of their bugs or non-bugs.
This is rumoured to be the same method as is used by another popular O/S from the USA, which uses software whitelists to specifically enable autosuspend only on devices that are known to be okay.
And to avoid any misunderstanding here, this is not "suspend/resume" support. This is "autosuspend", which just means telling the device to switch to an ultra-low power state after a few seconds of inactivity (think, spinning down an idle USB drive).
Cheers
Compile flags are the worst
Linus wants to give users rope, but doesn't. Too many compile-time flags should be run-time options, like CONFIG_USB_SUSPEND, which broke practically everything USB in 2.6.20. (Just scan distro forums.) Recompiling the kernel to change one flag is not "user empowerment."
That, and a whitelist/blacklist solution, is best practice for this case. Ideally the whitelist/blacklists would be user editable, too. I'm really tired of stuff getting "compiled in" that doesn't need to be.
If devices are not USB spec compliant, fine, that's market reality, and Linux should deal with it, not dreams of beautiful streamlined code.