Hi, The first of the following three patches introduces notifier chaing that can be used by subsystems to prefore some hibernation-related or suspend-related operations independent of device drivers' .suspend() and .resume() callbacks. The next two patches use this mechanism for disabling/enabling the usermode helper and firmware requesting functionality before/after (respectively) the hibernation or suspend. Comments welcome. Greetings, Rafael -
From: Rafael J. Wysocki <rjw@sisk.pl>
Use a hibernation and suspend notifier to disable the user mode helper before
a hibernation/suspend and enable it after the operation.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
kernel/kmod.c | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc3/kernel/kmod.c
===================================================================
--- linux-2.6.22-rc3.orig/kernel/kmod.c 2007-05-27 18:56:44.000000000 +0200
+++ linux-2.6.22-rc3/kernel/kmod.c 2007-05-27 19:43:35.000000000 +0200
@@ -33,6 +33,8 @@
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/resource.h>
+#include <linux/notifier.h>
+#include <linux/suspend.h>
#include <asm/uaccess.h>
extern int max_threads;
@@ -46,6 +48,14 @@ static struct workqueue_struct *khelper_
*/
char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
+/*
+ * If set, both call_usermodehelper_keys() and call_usermodehelper_pipe() exit
+ * immediately returning -EBUSY. Used for preventing user land processes from
+ * being created after the user land has been frozen during a system-wide
+ * hibernation or suspend operation.
+ */
+static int usermodehelper_disabled;
+
/**
* request_module - try to load a kernel module
* @fmt: printf style format string for the name of the module
@@ -251,6 +261,22 @@ static void __call_usermodehelper(struct
complete(sub_info->complete);
}
+static int usermodehelper_pm_callback(struct notifier_block *nfb,
+ unsigned long action,
+ void *ignored)
+{
+ switch (action) {
+ case PM_PRE_FREEZE:
+ usermodehelper_disabled = 1;
+ return NOTIFY_OK;
+ case PM_POST_THAW:
+ usermodehelper_disabled = 0;
+ return NOTIFY_OK;
+ }
+
+ return NOTIFY_DONE;
+}
+
/**
* call_usermodehelper_keys - start a usermode application
* @path: pathname for the application
@@ -276,7 +302,7 @@ int call_usermodehelper_keys(char *path,
struct subprocess_info *sub_info;
...Seems ok to me. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
From: Rafael J. Wysocki <rjw@sisk.pl>
Make it possible to register hibernation and suspend notifiers, so that
subsystems can perform hibernation-related or suspend-related operations that
should not be carried out by device drivers' .suspend() and .resume() routines.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Documentation/power/notifiers.txt | 76 ++++++++++++++++++++++++++++++++++++++
include/linux/notifier.h | 13 ++++++
include/linux/suspend.h | 25 +++++++++++-
kernel/power/Makefile | 2 -
kernel/power/disk.c | 31 +++++++++++++--
kernel/power/main.c | 12 ++++++
kernel/power/notifier.c | 51 +++++++++++++++++++++++++
kernel/power/power.h | 4 ++
kernel/power/user.c | 11 ++++-
9 files changed, 214 insertions(+), 11 deletions(-)
Index: linux-2.6.22-rc3/kernel/power/notifier.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.22-rc3/kernel/power/notifier.c 2007-05-27 14:43:14.000000000 +0200
@@ -0,0 +1,51 @@
+/*
+ * linux/kernel/power/notifier.c
+ *
+ * This file contains functions used for registering and calling hibernation and
+ * suspend (PM) notifiers that can be used by subsystems for carrying out some
+ * special hibernation-related and/or suspend-related operations.
+ *
+ * Copyright (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * This file is released under the GPLv2.
+ *
+ */
+
+#include <linux/smp.h>
+#include <linux/notifier.h>
+#include <linux/module.h>
+#include <linux/suspend.h>
+
+static DEFINE_MUTEX(pm_notifier_lock);
+
+static RAW_NOTIFIER_HEAD(pm_chain);
+
+int register_pm_notifier(struct notifier_block *nb)
+{
+ int ret;
+ mutex_lock(&pm_notifier_lock);
+ ret = raw_notifier_chain_register(&pm_chain, nb);
+ mutex_unlock(&pm_notifier_lock);
+ return ret;
+}
+EXPORT_SYMBOL(register_pm_notifier);
+
+void ...Hi. Quick read only, but it looks good to me. Nigel
Does this one need to be different from POST_THAW? I do not see the need for the other chains. Notice that we do not _want_ to have too many of them, because changing anything in the hibernation will become impossible with 10 chains having intimate details of suspend sequence. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Allocating memory before the hibernation, mainly. Anything done after freezing Do you mean POST_RESTORE? Well, maybe not. Yesterday I thought it would be I think the two suspend-specific might be handy. What about: PM_PRE_FREEZE PM_HIBERNATION_PREPARE PM_POST_HIBERNATION (handling failed snapshots too) PM_POST_THAW plus PM_RESTORE_PREPARE for the restore and PM_SUSPEND_PREPARE PM_POST_SUSPEND (handling failed suspends too) for the suspend code path? Rafael -
Is there any particular reason you chose to use a RAW_NOTIFIER_HEAD with an explicit mutex instead of using a BLOCKING_NOTIFIER_HEAD? Alan Stern -
Hmm, not really. I based it on the CPU hotplug notifiers, actually. I'll see if I can use BLOCKING_NOTIFIER_HEAD. Greetings, Rafael -
Yes, this makes the patch simpler.
I have also followed the Pavel's suggestion to limit the nubmer of events.
Please have a look at the updated version of the patch (appended).
Greetings,
Rafael
---
From: Rafael J. Wysocki <rjw@sisk.pl>
Make it possible to register hibernation and suspend notifiers, so that
subsystems can perform hibernation-related or suspend-related operations that
should not be carried out by device drivers' .suspend() and .resume() routines.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Documentation/power/notifiers.txt | 57 ++++++++++++++++++++++++++++++++++++++
include/linux/notifier.h | 8 +++++
include/linux/suspend.h | 37 ++++++++++++++++++++++--
kernel/power/disk.c | 24 ++++++++++++----
kernel/power/main.c | 14 +++++++++
kernel/power/power.h | 10 ++++++
kernel/power/user.c | 11 +++++--
7 files changed, 150 insertions(+), 11 deletions(-)
Index: linux-2.6.22-rc3/include/linux/suspend.h
===================================================================
--- linux-2.6.22-rc3.orig/include/linux/suspend.h 2007-05-27 19:57:21.000000000 +0200
+++ linux-2.6.22-rc3/include/linux/suspend.h 2007-05-29 22:52:24.000000000 +0200
@@ -54,7 +54,8 @@ struct hibernation_ops {
void (*restore_cleanup)(void);
};
-#if defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND)
+#ifdef CONFIG_PM
+#ifdef CONFIG_SOFTWARE_SUSPEND
/* kernel/power/snapshot.c */
extern void __register_nosave_region(unsigned long b, unsigned long e, int km);
static inline void register_nosave_region(unsigned long b, unsigned long e)
@@ -72,7 +73,7 @@ extern unsigned long get_safe_page(gfp_t
extern void hibernation_set_ops(struct hibernation_ops *ops);
extern int hibernate(void);
-#else
+#else /* CONFIG_SOFTWARE_SUSPEND */
static inline void register_nosave_region(unsigned long b, unsigned long e) {}
static inline void register_nosave_region_late(unsigned long b, ...Hmm, looks like bad idea if we are going to remove freezer from Is not PRE_FREEZE enough? We can allocate memory for drivers there, too... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Hi, We need PM_PRE_FREEZE anyway and it's a different question whether or not it'll be used for suspend (STR) too. The timing is not the best one, but so far the freezer is in the suspend code Well, there is a reason for not doing this. Namely, if the memory if freed on PM_POST_HIBERNATION after the image has been created, we can use it for saving the image (and speed up the saving). Besides, if the freezer is dropped from the suspend code, the notifiers will be useful to it anyway IMO, and PRE_FREEZE won't make sense in that case. I think the rule should be: If you need to do something _before_ tasks are frozen, do it in PM_PRE_FREEZE, but if you can do that after the tasks have been frozen, do it on PM_HIBERNATION_PREPARE (or PM_SUSPEND_PREPARE in the suspend case). Greetings, Rafael -
OTOH, having considered it for a while, I think that for now I can add just PM_PRE_FREEZE and PM_POST_THAW, as I don't have any user for the other ones. The other events may be added in the future if need be (along with some users). I'll post revised patches in a new thread. Greetings, Rafael -
Hi. I haven't been giving this much attention, so forgive me if I'm about to ask a silly question... which notifiers would you see the avenrun saving/restoring using? Regards, Nigel
Hi, I'm not sure what you mean. Could you please clarify? Greetings, Rafael -
Remember that patch about saving/restoring "load average"? I guess Nigel asks where to hook it. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Hi, Ah, okay, thanks. Hmm, that would be PM_PRE_HIBERNATION, I think. Rafael -
Yes please. (Hmm, but you may create 4 of them PM_PRE_HIBERNATION, PM_PRE_SUSPEND, PM_POST_HIBERNATION, PM_POST_SUSPEND -- I'd call all of them before freeze and after thaw, but if we drop freezer for suspend, we won't have to rename anything.) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Yes, that's a good idea. :-) Greetings, Rafael -
From: Rafael J. Wysocki <rjw@sisk.pl>
Use a hibernation and suspend notifier to disable the firmware requesting
mechanism before a hibernation/suspend and enable it after the operation.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
drivers/base/firmware_class.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
Index: linux-2.6.22-rc3/drivers/base/firmware_class.c
===================================================================
--- linux-2.6.22-rc3.orig/drivers/base/firmware_class.c 2007-05-27 19:43:03.000000000 +0200
+++ linux-2.6.22-rc3/drivers/base/firmware_class.c 2007-05-27 19:44:06.000000000 +0200
@@ -17,6 +17,8 @@
#include <linux/bitops.h>
#include <linux/mutex.h>
#include <linux/kthread.h>
+#include <linux/notifier.h>
+#include <linux/suspend.h>
#include <linux/firmware.h>
#include "base.h"
@@ -35,6 +37,9 @@ enum {
static int loading_timeout = 60; /* In seconds */
+/* If set, _request_firmware() will exit immediately returning -EBUSY */
+static int requesting_firmware_disabled;
+
/* fw_lock could be moved to 'struct firmware_priv' but since it is just
* guarding for corner cases a global lock should be OK */
static DEFINE_MUTEX(fw_lock);
@@ -397,6 +402,9 @@ _request_firmware(const struct firmware
struct firmware *firmware;
int retval;
+ if (requesting_firmware_disabled)
+ return -EBUSY;
+
if (!firmware_p)
return -EINVAL;
@@ -568,6 +576,26 @@ request_firmware_nowait(
return 0;
}
+static int firmware_class_pm_callback(struct notifier_block *nfb,
+ unsigned long action,
+ void *ignored)
+{
+ switch (action) {
+ case PM_PRE_FREEZE:
+ requesting_firmware_disabled = 1;
+ return NOTIFY_OK;
+ case PM_POST_THAW:
+ requesting_firmware_disabled = 0;
+ return NOTIFY_OK;
+ }
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block firmware_class_pm_nb = {
+ .notifier_call = firmware_class_pm_callback,
+};
+
static int __init
firmware_class_init(void)
{
@@ ...I don't like this approach, as I feel that the firmware loading interface should be able to detect if a firmware load request is not being handled, due to absence of userspace / hotplug handler presence. Other circumstances in which this can be a problem is during bootup when request_firmware() calls can be made before userspace is up and init has run (even in the presence of an initramfs). (Slightly OT: A particularly nasty race is when an initramfs userspace is present, but firmware loading cannot occur because init has not run, so proc hasn't been mounted, so a hotplug event handler cannot be registered, despite the fact that the firmware is sitting on the ramdisk mounted correctly...) In short, a more general solution would be preferred, and preferably one which allows firmware loading to *actually* occur once userspace has actually turned up and registered a handler :) Michael-Luke Jones -
In principle, I agree. In practice, though, I don't know how to make this I agree, but well ... ;-) Greetings, Rafael -
I don't think that's possible. If hotplug handler needs /dev/foo, but /dev/foo is not available, it will just block waiting there. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
I believe that the current functionality is that /sbin/hotplug is the default. I think that if there is one in initramfs, it'll get called as a usermode helper, even before init has run. (If it needs /proc /sys or /dev, setting it up properly is its problem, but not an insurmountable one.) I remember discussion of this a while back, and that it was indeed intentional. I dunno if anybody's actually tried it. Rob -
This avoids the problem of .resume methods calling userspace while userspace is frozen and a resulting hang, but does it actually result in the drivers beginning to work again? If we remove process freezing in STR, this should just work[1] without the need to complicate things. On the other hand, if we don't want to support these functions in the suspend and resume methods we could just audit the kernel and remove them all. -- Matthew Garrett | mjg59@srcf.ucam.org -
Well, this was acutally invented before you've decided to remove the freezing of tasks from the suspend code path (which I think is a mistake, but that's only my personal opinion, so it doesn't matter very much ;-)) and I regard it Under the (optimistic, IMO) assumption that the relevant user space task won't block on I/O with a suspended device involved or something like this. BTW, I know of two subsystems that want their kernel threads to be frozen for synchronization purposes. Please see these messages: 1) https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012592.html (plus follow up) 2) http://marc.info/?l=linux-kernel&m=117919066830575&w=2 Your patch breaks them and I suspect there are more cases like these. Besides, there's the hibernation that needs to freeze tasks for another reason, so it needs some way to ensure that drivers won't request firmware while Yes, I think we should do this. Greetings, Rafael -
Yeah, I forgot the footnote that was going to mention that. Clearly there are issues if this is on some block device that hasn't been I'm not entirely sold on this. The issue is that there's the possibility of races during suspend/resume? It sounds like that should be implemented in the driver, rather than depending on a side-effect of process freezing. Otherwise there's no way of selectively suspending I agree that that's an issue right now, but I think we should see if this is repairable without just breaking those drivers. One option would be to defer resuming them until userspace is alive again - that would be no worse than the suspend to RAM case without process freezing. -- Matthew Garrett | mjg59@srcf.ucam.org -
Or worse. Suppose, for example, that the device which needs the firmware uploaded is your network adapter and the firmware file is on NFS. I don't think there's a solution to the "requesting firmware from .resume()" problem other than copying the firmware into memory _before_ the suspend That's true, but we've been using the freezer for so long that at least some drivers started to rely on it being used. That's the reason, IMO, why we can't just drop the freezer right now. It can be _replaced_ by some other synchronization mechanisms, but not just dropped. Moreover, I think that to implement such mechanisms within device drivers we should first stop using the same .suspend() and .resume() routines for both hibernation and suspend. This is the plan right now (ie. to stop using .suspend() and .resume() for hibernation) and when we're done with that (yes, it'll take some time), then we Please consider the example with a firmware on NFS. :-) I think the _solution_ would be to fix the drivers. The other approaches are just workarounds, including the $subject patch. Greetings, Rafael -
I can't speak for the second example, but there's a good reason the first example works this way. It's not a matter of races; the problem is that the kernel thread's job is to selectively suspend and resume devices. We don't want it doing this while a system sleep is in progress; it would (and in fact has, before the thread was made freezable) cause the sleep transition to abort. Handling it entirely from within the drivers is possible in theory. Unfortunately the design of the PM core has not leant itself to such an approach. Using separate callbacks for hibernation vs. STR will help, as will Raphael's notifiers. Alan Stern -
How does this work on PPC or APM systems? -- Matthew Garrett | mjg59@srcf.ucam.org -
For hibernation it behaves the same as on other types of systems. For STR it generally works okay. There was one report of suspends aborting, and it looked like this was caused by selective resumes originating from userspace. This seemed to be unrelated to the kernel threads; apparently some program was running while the STR was in progress, and causing the problem. For example, the lsusb program will do a selective resume on every USB device as it scans through them all. However that's just a guess, we haven't fully resolved that bug report. The theoretical answer is that it behaves the way we want. The kernel thread does selective resumes in response to device requests. If such a request comes in while the system is asleep it will awaken the system; so it's only logical that a request coming in while the system is in the process of going to sleep should abort the suspend. Alan Stern -
Ok, I guess I'm still not clear on this :) If it doesn't cause major problems on Powermac or APM systems, why is freezing the thread beneficial on ACPI systems? -- Matthew Garrett | mjg59@srcf.ucam.org -
Isn't it true, even on PPC and APM systems, that tasks are frozen for hibernation? The difference is that they aren't frozen for STR. Hence the answer to your question: Not freezing the thread would work okay on ACPI systems for STR. (In fact, not freezing anything would probably work okay for STR.) But it doesn't work with hibernation. Alan Stern -
I'd say that it shows ppc being broken. User wanted to suspend the system, and now unrelated task did lsusb... and system will not sleep. AFAICT it is DoS issue -- if one of your users keeps doing lsusb, root will not be able to suspend the system. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
This is a matter of one's philosophy. In suspend-to-RAM, should tasks be frozen or should I/O queues be frozen? With the USB subsystem I have followed the approach taken by the PM core, which is that tasks are frozen. But one can -- and Linus has on at least one occasion -- make a good case that tasks should be left running while only I/O is frozen. This would require the subsystem to distinguish between a selective device suspend and a system-wide suspend-to-RAM, so that selective resume could be enabled on demand in one case but not the other. It's quite doable in principle -- it's just not the technique I used. Alan Stern -
In fact that makes a heck of a lot more sense to me from the conceptual point of view. From the hardware perspective, the task of preparing to enter true suspend states (STR, or suspend for ACPI; embedded systems have more options) focusses on what I/O signals are disabled. Once the relevant I/O signals are first idled, then disabled, the CPU can do whatever it likes. Whether it runs or not is purely a workload decision... Remember too that not all systems suffer from the constraints that ACPI decrees. In particular, it's not uncommon that some parts of the system be active in certain suspend states. The whole point is to turn off as much of the system as possible, especially the high power portions, while letting work proceed. Turning off some clocks and peripherals doesn't need to imply turning them all off, or disabling DMA ... and should not need to be triggered by a user (or userspace tool) explicitly saying Exactly. "Selective suspend" of parts of the system is a far more general model. It fits well with runtime power management, degrades smoothly to states where memory goes into self-refresh (maybe the system idle loop when NO_HZ is being effective) or even hibernation (as discussed elsewhere). -
One can think of suspend-to-RAM as nothing more than a super-duper selective suspend of all devices (including the CPU!). This illustrates the relationship between suspend-to-RAM and runtime PM. However there is one important distinction, the DoS issue that Pavel raised. With true suspend-to-RAM, autoresume on demand is not enabled -- only on remote wakeup. With runtime PM, both are enabled. Elegant though it is, the "freeze I/O" approach suffers from implementation awkwardness. Ideally individual drivers shouldn't have to worry about it, but the subsystems certainly will. Consider as an example the class of char device drivers. The only subsystem they have in common is VFS. It would then be necessary for every entry point into VFS to check whether a suspend-to-RAM is in progress, and if it is, block until the suspend is over. Each one of those entry points is on a hot path, so adding these checks is the sort of thing one would like to avoid. By freezing tasks we can eliminate (most) I/O requests at the source with a single, relatively small amount of code (i.e., the refrigerator). Freezing I/O, on the other hand, would involve many checks dispersed throughout a large number of source files -- checks that would have to execute even when a suspend wasn't in progress. Alan Stern -
I guess we need to do that. Random user should not be able to prevent machine from sleeping. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Just to be clear about this, let's agree that we're talking about suspend-to-RAM here, not hibernation. It boils down to whether we want to freeze user tasks. As I recall, Linus said that he didn't have any big objection to freezing user threads; he was much more concerned about freezing kernel threads. Thanks to Raphael's new notifier chains this will no longer be an issue, since kernel threads will be able to stop themselves when they receive a suspend notification. There may remain some obscure difficulties in discerning whether a particular thread should be classified as user or kernel, but let's ignore them. Even if we don't actively freeze user threads, approximately the same effect can be achieved in the following way: Change the main kernel entry points so that any thread performing a system call during a suspend will get frozen until the suspend is over. Threads that run entirely in userspace will continue doing useful work as before, and kernel threads won't be affected at all. (Not that I think it's necessary to do this; it's just a way to avoid freezing user tasks until they need it.) One way or another, freezing user tasks should not be a big deal. After all, once the suspend is complete eveything will effectively be frozen anyway. I suppose there might be issues involving tasks which need to run in order to complete the suspend -- IMO any such issues should be handled by carrying out the necessary actions before the point where we now start up the freezer. The alternative is to have drivers take over the burden. I don't like this at all. The most obvious disadvantage is that the necessary checks would have to be duplicated many many times and spread out over lots of drivers. It's also harder to handle these things at the driver level. Suppose a driver gets an I/O request while a suspend is underway. What should it do? Return an error? Block until the suspend is over? Both approaches have their difficulties: Returning an ...
Blocking would be possible option. I agree it is tricky to implement... it may also be useful for a harddrive: "I'm riding a horse at 40kph now, so you'll kill the harddrive if you access it; just freeze everyone until we are at the other end of meadow". ...hmm, but this seems to be blockdevice specific, and I can't think Agreed. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Hi. Suspend-to-ram code paths shouldn't assume userspace is unfrozen anyway. Doesn't [u]swsusp have a code path like Suspend2 where we can suspend to ram after writing the hibernation image? In that case, it will still be possible that we seek to enter and leave S3 with processes frozen. Apologies if anyone has already mentioned this - I'm just starting to play catchup. Regards, Nigel
No one has and that's a valid point, I think. :-) Greetings, Rafael -
What exactly is the problem we see here? The timeout of the firmware loader? What goes wrong with frozen userspace, usually there is only a netlink message sent from the kernel, which should be received and handled just fine when userspace is running again. Thanks, Kay -
Users report the timeout as a problem and it's not that straightforward to figure out what happens. Still, I agree it's much better if drivers don't use request_firmware() in their .resume() routines at all, because they shouldn't rely upon user land at this point. Greetings, Rafael -
Driver calls request_firmware in the resume method. The userspace helper can't be run because it's been frozen, so the firmware never gets loaded and the call times out. The driver then fails to resume. While all this is happening, the rest of the kernel is blocking on that resume method. The firmware can be loaded once userspace has been started again, but by that time the driver has given up. -- Matthew Garrett | mjg59@srcf.ucam.org -
Seems, that's just the broken synchronous firmware loading interface with the useless timeout handling. The nowait version of the same loader doesn't time out, and should not have that problem. The sync version should be removed from the kernel, it just causes all sorts of problems since it exists. Userspace should handle the async request just fine when it comes back running, regardless of the time it was submitted. Kay -
Okay, so the solution is to convert the drivers to use request_firmware_nowait() instead of request_firmware() in their .resume() routines. Greetings, Rafael -
[added Rob Landley to CC] I think asynchronous loading should be made the default behaviour. However, we need to think and document how to do firmware loading. Firmware loading can be done at two different times: Device Driver Load First use of Device Driver For a network device, this correlates to when the driver is first loaded into memory or at 'ifconfig up' respectively. At device driver load, firmware loading must be asynchronous. This is because device driver init can occur before userspace runs and registers a hotplug/uevent event handler. If device use is attempted before firmware loads, a -ENOFIRMWARE call would be great so that userspace and thus the user can be clearly informed what the problem is. However, at 'first use' firmware loading, the synchronous interface should remain. 'ifconfig up' either works or it doesn't, and as I see it, has to just hang around until firmware turns up. One more thing, it seems that the asynchronous firmware loading thread just spawns a _request_firmware() call which then times out at 60 seconds. I think, if the first request fails it spawns another. 60 seconds is *far* too long for this type of thing, and this was set at 10 seconds before the last two kernel releases (which is even a bit slow itself). Unfortunately, this appears to a case of quite senior kernel developers pushing a bodge upstream rather than fixing the underlying issue :( [1] [2] Documentation for how hotplug/uevent handlers should cope with these types of firmware loading is also *strongly* requested, in order for lightweight but fully functional implementations to be made. Documentation > Reference Implementation :) Michael-Luke [1] http://tinyurl.com/2colng (git.kernel.org) [2] http://tinyurl.com/224h54 (redhat bugzilla) -
Why would a driver create an interface before it has the needed What kind of network driver does create an interface for a non-functioning device? That sounds like a bug on its own. If a driver binds to a device, it should just have the firmware already loaded, and not wait until its used. What's the reason for such a behavior, to let a driver pretend it can handle a device, but it doesn't The underlying issue are the driver authors, that's not so easy to fix. :) Well, 10 seconds are just to short for userspace to react on some setups, from tiny boxes which are busy, to 512 CPU boxes enumerating thousands of devices, all had problems here. Any timeout for a firmware-request is just a broken concept, the request should wait forever, to be fulfilled or canceled from userspace when it's ready. Kay -
Sorry, I know this maybe be unintentional, but comments like this make me somewhat angry. If there is no decent documentation as to how to do it the right way What I wrote above is especially true when the in-kernel APIs themselves do things the wrong way (tm) themselves. Thus, even more thought is required to work around this imperfect behaviour in a sane manner. And without documentation, every single device driver author has to go through this thought process themselves. Unsurprisingly, they often get it wrong. But as there's no decent documentation to do it right, it's *not* their fault. I'd suggest it's more the fault of the core kernel devs who fail to fix up a questionable firmware loading interface, then fail to document how to work around its shortcomings. Again, apologies if this sounds angry, I don't want to start an argument. But as someone just starting out here, this kind of thing can be very frustrating, as even with the best will in the world, achieving the right way (tm) is basically impossible if those in the know about what the right way (tm) is fail to share this information. Michael-Luke Jones -
It was the response to "pushing a bodge upstream", while we just try to make things working. :) We are fighting that broken firmware-loader model for years, but every driver author seems to have its own idea how that should work in theory, but they are not even willing to switch to the existing version that is In our development model, users of an interface are expected to improve it, because nobody else really knows what's needed. That unfortunately didn't happen so far. We get this kind of conversation every few months since a few years now. I wonder, if we will see code, or at least come up with a better idea this time. :) Kay -
Thanks for responding :) The fact that no-one is told to use the new right way (tm) in any available documentation does not help matters improve. One issue is that the 'asynchronous' interface seems to rely on the same 'dodgy' timeout mechanism as the original request_firmware() call... Looks like the lack of a maintainer should be the opportunity for a rework of this API. Michael-Luke Jones -
Now a technical rather than emotional response... A valid point. But there should be some kind of error notification if firmware loading hasn't happened correctly rather than a permanent asynchronous wait in which the interface fails to turn up. Possibly a kernel information printk or something, which does not exist at the Unclear. My point was that when ifconfig up exits, the interface should be up, not asynchronously waiting for firmware to be loaded, then taken up in the background. Thus, firmware loading in this case Basically, you have a device which can carry out different functions depending on the firmware loaded into it. Driver A is specific to this device, and loads the firmware. Driver B uses functions exported by Driver A to carry out one particular function of the device. Driver C uses the same functions to carry out a totally different function on the same device, but with different firmware loaded. Add in multiple devices handled by Driver A, all with different functionality, and sometimes with combinations of functionality that can coexist, and you see that when Driver A loads it cannot possibly know which firmware to load, but must wait for other Drivers to turn up and be put into use. Thus it 'pretends' to handle all the devices until it's forced to make a choice. Absolutely. I said this in an earlier email and suggested rejecting this patchset on the grounds that it was another bodge covering over Michael-Luke Jones -
If it's spawning a new usermode helper process, then figuring out when to give up and exit is that process's job. If it _can't_ exec usermode helper, then that should fail immediately. Where does the timeout come in? Rob -
I'd love to write up documentation on this if anybody can tell me what it And for block devices ala: https://bugs.launchpad.net/ubuntu/dapper/+source/initramfs-tools/+bug/74004 https://issues.rpath.com/browse/RPL-1350 The first use could be "mount" or open of the block device under /dev. So now you have the mount and open syscalls returning -ENOFIRMWARE. This is not a Userspace doesn't have to register a hotplug handler: /sys/hotplug is the default and if there is one in initramfs it should get called. You shouldn't have to wait for init to run for this to be the case. I believe the initramfs cpio archive used to get extracted just before the attempt to exec /init out of it, and that it was moved to much earlier in the boot process so that firmware loading could be done out of it for statically linked drivers. Look at do_basic_setup() in init/main.c: notice how usermodehelper_init() gets called right before driver_init(). Ask yourself: "why did it do that"? Notice how rest_init() forks the first kernel thread (PID 1) to run kernel_init(), and then becomes PID 0 (the idle task). So from kernel_init() it's ok to spawn all the new tasks we want because they can't take any Because of course every userspace utility in the world will immediately be rewritten to provide clear and informative error messages for this race condition corner case. Happy to. I'm just trying to figure out what the correct behavior is so I can document it. Rob -
Well, chapter on firmware & suspend/hibernation should say something about either loading the firmware during early suspend, or just keeping firmware in ram from boot... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
You'll just get deadlock at different level (and more rare). Imagine disk with its firmware on NFS and NFS with its firmware on disk. (Or maybe firmware loader doing find /, including both disk and NFS). Just don't call request_firmware_* from .resume(). -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Yeah, I've just given the very same example in another message. :-) I believe the only _solution_ would be to load the firmware into memory before the suspend and use the in-RAM copy to upload it into the device in .resume(). A PM_PRE_FREEZE notifier could be used for that just fine, BTW. Greetings, Rafael -
A driver for a bootup-critical device like this should just never release the firmware after the first load. There is absolutely no point in doing that. Kay -
It does not have to be _bootup-critical_ device. Problem is any device that might be used by userspace firmware loader. And that is _any_ device. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Bogus argument: is a USB-Ethernet device which needs firmware loading boot-up critical? Not on the surface, but if the device loads root over this device, it suddenly is. This functionality should also be written into the firmware-class (and this fact *is* acknowledged in the sparse documentation). Michael-Luke Jones -
Feel free to submit documentation improvements. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Releasing loaded firmware for anything that needs to handle situations Yeah, but these are just words, nothing people could use. Unfortunately the author of this document died a few years ago. Either the whole idea of userspace firmware-loading should be considered as a problem impossible to do right because of its unsolvable side effects. Or at least releasing loaded firmware should be the exception for drivers which can be sure, that the firmware is not needed in situations we try to work around here. Kay -
I don't believe that it is impossible. What is needed is some thought, and maybe a loose spec. We need to think about safe ways of solving a simple problem: what to do when userspace is unable to respond to a kernel uevent request. We now have a timeout, whether it be handled synchronously or in the background. I don't think either solution is good enough. [RFC] Possible Plan: 1) request_firmware() should stick around unmodified but be marked __deprecated. 2) request_firmware_nowait() as it stands should probably be removed. After all, it only has 2 in-kernel users at the moment. 3) uevent interface should be notified when userspace handlers are / are not available so that events can be queued to be handled when userspace does turn up and re-register a hotplug event handler. 4) request_firmware_async() introduced: asynchronous firmware loading interface built on basis of 3, such that problems at early boot and suspend / resume are avoided. Also request_firmware_async() taught to retain firmware in memory by default such that subsequent calls do not require a call to userspace if userspace is unavailable. 6) Documentation on correct firmware loading behaviour for driver authors written, together with migration notes for existing users of request_firmware(). 7) Existing uses of request_firmware() converted. *Grumble ahead* Unfortunately, I don't properly understand the uevent interface, so some of the above may be inaccurate. This is *not* my fault as there is no decent documentation (and btw, telling me to write the documentation is not the answer to that problem). Michael-Luke -
We should probably leave the whole firmware class alone, add something Uevents are queued by the netlink socket buffer. If userspace can't receice them (udevd) while it's frozen, they will just be handled (in the right order) when userspace runs again. We can queue up thousands of The event emission in the kernel and the userspace side is probably fine. Two years ago, benh already run into exactly the same suspend Sure, we always need at least one working example before we can even tell what's the right way to do it. :) Kay -
Translated: "I don't know what I'm talking about, but I still like to flame people. I'm too lazy to figure out how it currently works, but I'll just state it is not my fault". Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Figuring out how a userspace/kernelspace interface works should not rely on having to read kernelspace code. Unfortunately, in the case of hotplug / uevents, there is no such documentation. Thus, what kernelspace / userspace interactions actually are and what they should be is unclear, leading to confusion over corner cases, such as this one. Michael-Luke Jones -
WARN_ON()? This should not happen, and if it does, we'll end up with non-working device. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
You're using the PM_PRE_FREEZE and PM_POST_THAW notifiers for both this and the userspace helper change. Is it your intention that drivers should continue to request these services but encounter an error if the request occurs at the wrong time? Or do you expect drivers to use the notifier chains to know when they shouldn't make any requests? In the second case you may have a problem, because there's no specification about the order in which the notifiers are sent. The service may get disabled before the driver learns it isn't available, or the driver may think the service is once again available before it gets enabled. Alan Stern -
In fact, I'd like drivers to use notifiers to actually load the firmware into memory before hibernation/suspend. Namely, if there's PM_PRE_FREEZE, the driver calls request_firmware() from within the notifier and saves the firmware in memory for future use, if need be. Later, when PM_POST_THAW comes, the memory holding the firmware is released. Unfortunately there are drivers that call request_firmware() directly from .resume() which blocks until timeout expires and fails anyway. I just wanted Yes. Greetings, Rafael -
Stupid question time. Wouldn't it just be easier to have request_firmware() keep a copy of the firmware once it's been loaded? We're not talking about a lot of memory that would be wasted, and that way no drivers have to be changed. Ray -
Actually, I like this idea. Firmware problems magically disappear. ...unless someone uses x86 emulator in userland to POST graphics card. You can't 'cache' that. But that's separate problem. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Hi, This is the second revision of the patches that introduce hibernation and suspend notifiers. Generally, I have followed the Alan's suggestion to use a blocking notifier chain and the Pavel's suggestion to limit the number of events. Also, I've dropped the patch to disable the requesting of firmware. Comments welcome. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth -
From: Rafael J. Wysocki <rjw@sisk.pl>
Use a hibernation and suspend notifier to disable the user mode helper before
a hibernation/suspend and enable it after the operation.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
kernel/kmod.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
Index: linux-2.6.22-rc3/kernel/kmod.c
===================================================================
--- linux-2.6.22-rc3.orig/kernel/kmod.c 2007-05-31 00:00:37.000000000 +0200
+++ linux-2.6.22-rc3/kernel/kmod.c 2007-06-02 00:01:47.000000000 +0200
@@ -33,6 +33,8 @@
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/resource.h>
+#include <linux/notifier.h>
+#include <linux/suspend.h>
#include <asm/uaccess.h>
extern int max_threads;
@@ -46,6 +48,14 @@ static struct workqueue_struct *khelper_
*/
char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
+/*
+ * If set, both call_usermodehelper_keys() and call_usermodehelper_pipe() exit
+ * immediately returning -EBUSY. Used for preventing user land processes from
+ * being created after the user land has been frozen during a system-wide
+ * hibernation or suspend operation.
+ */
+static int usermodehelper_disabled;
+
/**
* request_module - try to load a kernel module
* @fmt: printf style format string for the name of the module
@@ -251,6 +261,24 @@ static void __call_usermodehelper(struct
complete(sub_info->complete);
}
+static int usermodehelper_pm_callback(struct notifier_block *nfb,
+ unsigned long action,
+ void *ignored)
+{
+ switch (action) {
+ case PM_HIBERNATION_PREPARE:
+ case PM_SUSPEND_PREPARE:
+ usermodehelper_disabled = 1;
+ return NOTIFY_OK;
+ case PM_POST_HIBERNATION:
+ case PM_POST_SUSPEND:
+ usermodehelper_disabled = 0;
+ return NOTIFY_OK;
+ }
+
+ return NOTIFY_DONE;
+}
+
/**
* call_usermodehelper_keys - start a usermode application
* @path: pathname for the application
@@ -276,7 +304,7 @@ int ...ACK. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
From: Rafael J. Wysocki <rjw@sisk.pl>
Make it possible to register hibernation and suspend notifiers, so that
subsystems can perform hibernation-related or suspend-related operations that
should not be carried out by device drivers' .suspend() and .resume() routines.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
Documentation/power/notifiers.txt | 50 ++++++++++++++++++++++++++++++++++++++
include/linux/notifier.h | 6 ++++
include/linux/suspend.h | 37 +++++++++++++++++++++++++---
kernel/power/disk.c | 16 +++++++++---
kernel/power/main.c | 9 ++++++
kernel/power/power.h | 10 +++++++
kernel/power/user.c | 11 ++++++--
7 files changed, 129 insertions(+), 10 deletions(-)
Index: linux-2.6.22-rc3/include/linux/suspend.h
===================================================================
--- linux-2.6.22-rc3.orig/include/linux/suspend.h 2007-05-31 00:00:38.000000000 +0200
+++ linux-2.6.22-rc3/include/linux/suspend.h 2007-06-01 22:55:01.000000000 +0200
@@ -54,7 +54,8 @@ struct hibernation_ops {
void (*restore_cleanup)(void);
};
-#if defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND)
+#ifdef CONFIG_PM
+#ifdef CONFIG_SOFTWARE_SUSPEND
/* kernel/power/snapshot.c */
extern void __register_nosave_region(unsigned long b, unsigned long e, int km);
static inline void register_nosave_region(unsigned long b, unsigned long e)
@@ -72,7 +73,7 @@ extern unsigned long get_safe_page(gfp_t
extern void hibernation_set_ops(struct hibernation_ops *ops);
extern int hibernate(void);
-#else
+#else /* CONFIG_SOFTWARE_SUSPEND */
static inline void register_nosave_region(unsigned long b, unsigned long e) {}
static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
static inline int swsusp_page_is_forbidden(struct page *p) { return 0; }
@@ -81,7 +82,7 @@ static inline void swsusp_unset_page_fre
static inline void hibernation_set_ops(struct hibernation_ops ...One more question: what will we call in suspend-to-both case? I do not think we can call SUSPEND_PREPARE with frozen tasks, so just one call Ack. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Thanks! Rafael -- "Premature optimization is the root of all evil." - Donald Knuth -
(Should it be mentioned in docuementation?) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
Eventually, yes. Still, for now, the STR code in user.c is outdated and needs fixing. I'll take care of it later this week. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth -
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Lo |
