> You agree that drivers need to block various activities during suspend.Yes. The main thing I see here is that here is nothing common among drivers in what a "request" is and how it's processed. For example, for block drivers, it's actually fairly simple to just stop processing the request queue and wait for pending ones to complete. For network, in a similar vein, we can mostly just tell the network stack to stop sending us packets. That's what I call the "main path". This is often the trivial part to deal with, mostly because for a whole lot of drivers, it can be done via a couple of helpers in the subsystem that the driver provides a service too, via a helper, asking that subsystem to stop calling into the said driver (the asking should be done by the driver itself of course, for ordering reasons). We have some helpers, but I think not enough, and that's where we should focus imho. For example, I added fb_set_suspend() so that fbdev's can request fbcon to stop accessing them (it doesn't solve the problem of userland mmap's, that will have to be done, if we want to do it, in a more sneaky way, using VM tricks, but the DRM nowadays has the infrastructure to do it). But that's only the "main" path. Aside for that, almost all drivers also have sideband "request" input and some driver don't actually live behind a subsystem. That ranges from ioctl, to direct read/write on a char dev from userland. I think many of those cases can fairly well deal with just taking a PM semaphore, that's how I did for a couple of things in the past, provided that the request path isn't deadlocking with the semaphore held because of the system suspending of course. But in a whole lot of cases, it's, I beleive, perfectly kosher to just return an error. You're trying to capture frame from your camera while the machine is suspended ? error. At worst, your capture app will be unhappy when you wakeup, nothing terrible and totally fixable in userland if it's a problem. In some cases, we could use a little bit more help from the subsystem. Network for example, could have some explicit knowledge of the suspend state, and in addition to stopping the queue would also stop calling into things like change_mtu or set_multicast, provided it's agreed that the driver will account for those changes on resume (the actual MTU values or multicast lists are still updated in the netdev). I think there's no "normal" scenario, each driver or family of drivers will do things very differently. Yes. In fact, the best is to have parallel suspend/resume of drivers and asynchronous resume but that's out of topic :-) (For the record, I did some bits like ADB resume like that, that is asynchronous, to speed up wakeup time). There are two things I believe. There's a generic issue with usermode helpers that make no sense to call between pre-suspend and post-resume, and there's the specific issue of adding/removing devices. I believe that "bus" drivers such as USB should indeed get a first round of notifications to tell them to stop performing bus plug/unplug operations (it's debatable whether we want to keep unplug going provided we can stack up the usermode events and re-send them later though, but let's say no for the sake of simplicity). I'm not totally sure about that. I like some of it, but I think it's fairly different conceptually from the freezer (and the implementation could be as trivial as a single system wide wait queue). Basically it has a very big difference to the current freezer, and I like that, which is that we don't have some 3rd party trying to find out what to freeze and what not (the freezer), but instead, we have explicitely drivers or kernel threads sending -themselves- to the "icebox" when they think it's a good idea. Think of it as lazy freezing -> you only freeze lazy tasks that are trying to do something that cannot be done because of suspend. I think it's a fairly significant change from the current freezer and I also think it's a very good idea. The more I think about it, the more I like it, in the sense that it's a simple drop-in that you could put in a lot of the ioctl path of drivers to just block tasks that are trying to call in while suspending, and could be used selectively by things like the USB hub threads. I don't think suspend has to be -that- transparent (though there is some debate on whether it should be if we're gonna do some kind of fast "light" suspend for things like OLPC) but overall, I agree that a bind operation on sysfs should probably block until resume, and it does make sense to have the logic to do that in sysfs itself. It could perfectly use the above icebox thingy you came up with. Registration is fine, binding/ubinding is not (no problem putting the driver on/off a list, though of course, you can't unregister a bound driver without unbinding, it's prefectly find to unregister an unbound driver). That's where the meat is. I think that should indeed be handled within the driver core. Though I tend to think our driver core is a bit of a locking mess at the moment :-) (Who says it could use more RCU-like constructs ?). Regarding unbinding, that's debatable, it might be perfectly allright to unbind while suspended, though then, there is the question of a driver being later on bound to a piece of HW that is suspended (though I suppose that could happen today... some machines have their firmware leave some devices off at boot). As a general matter, If we have those pre-suspend/post-resume notifiers and we adapt the few (there isn't that much) bus drivers so that they stop all probe/unprobe operations before the suspend dance starts, we avoid a lot of that problem. At this point, it becomes fair enough for bind/unbind, in the core, to return an error (and maybe even stack trace in dmesg to catch the culprits) while suspend in in progress. The only remaining annoying case is manual bind/unbind via sysfs which is a good candidate for the above described icebox. Yes, unless kernel threads explicitely decide to stop themselves (for example, khubd is a good candidate for that). Again, not a 3rd party trying to decide what to freeze and what not, but the drivers or kernel threads themselves deciding it. We can either stop it at the sysfs write level, or we can have the workqueue task reschedule itself later until we are resumed. In fact, worqueue items being what they are (queue items), we could imagine having a special list where they enqueue themselves to rescheduled after resume. Don't get me wrong, I never said we don't need generic infrastructure and utilities, such as your proposed icebox scheme, or some of those workqueue bits, helpers in subsystems, etc... I just think that the freezer approach, as it is, is backward. We can't have a 3rd party try to discriminate what to freeze and what not, it will always get something wrong, and in some cases with the wrong timing or ordering. Cheers, Ben. -
| Greg Kroah-Hartman | [PATCH 005/196] Chinese: add translation of SubmittingDrivers |
| Nick Piggin | [patch] my mmu notifier sample driver |
| Sean | Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching |
| Arjan van de Ven | [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in |
git: | |
| Antonio Almeida | HTB accuracy for high speed |
| Gerrit Renker | [PATCH 0/37] dccp: Feature negotiation - last call for comments |
| Jens Axboe | Re: [BUG] New Kernel Bugs |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
