Re: [RFC 09/15] PM / Hibernate: user, implement user_ops writer

Previous thread: [RFC 12/15] PM / Hibernate: split snapshot_read_next by Jiri Slaby on Tuesday, March 23, 2010 - 9:17 am. (2 messages)

Next thread: [RFC 05/15] PM / Hibernate: group swap ops by Jiri Slaby on Tuesday, March 23, 2010 - 9:17 am. (2 messages)
From: Jiri Slaby
Date: Tuesday, March 23, 2010 - 9:17 am

Switch /dev/snapshot writer to sws_module_ops approach so that we
can transparently rewrite the rest of the snapshot from pages pulling
to their pushing through layers.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Nigel Cunningham <ncunningham@crca.org.au>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
---
 kernel/power/user.c |  113 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 105 insertions(+), 8 deletions(-)

diff --git a/kernel/power/user.c b/kernel/power/user.c
index 20bf34c..748567d 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -66,15 +66,82 @@ static struct snapshot_data {
 
 atomic_t snapshot_device_available = ATOMIC_INIT(1);
 
+static void *to_do_buf;
+static struct workqueue_struct *suspend_worker;
+static DECLARE_WAIT_QUEUE_HEAD(to_do_wait);
+static DECLARE_WAIT_QUEUE_HEAD(to_do_done);
+static DECLARE_BITMAP(to_do_flags, 10);
+
+#define TODO_WORK	1
+#define TODO_FINISH	2
+#define TODO_CLOSED	3
+#define TODO_ERROR	4
+
 static unsigned long user_storage_available(void)
 {
 	return ~0UL; /* we have no idea, maybe we will fail later */
 }
 
+static int get_user_writer(void)
+{
+	return 0;
+}
+
+static int user_write_page(void *buf, struct bio **bio_chain)
+{
+	int err = 0;
+
+	if (test_bit(TODO_CLOSED, to_do_flags))
+		return -EIO;
+
+	to_do_buf = buf;
+	wmb();
+	set_bit(TODO_WORK, to_do_flags);
+	wake_up_interruptible(&to_do_wait);
+
+	wait_event(to_do_done, !test_bit(TODO_WORK, to_do_flags) ||
+			(err = test_bit(TODO_CLOSED, to_do_flags)));
+
+	return err ? -EIO : 0;
+}
+
+static int put_user_writer(unsigned int flags, int error)
+{
+	int err = 0;
+
+	if (error)
+		set_bit(TODO_ERROR, to_do_flags);
+	set_bit(TODO_FINISH, to_do_flags);
+	wake_up_interruptible(&to_do_wait);
+
+	wait_event(to_do_done, !test_bit(TODO_FINISH, to_do_flags) ||
+			(err = test_bit(TODO_CLOSED, to_do_flags)));
+
+	if (!error && err)
+		error = -EIO;
+
+	return error;
+}
+
 struct sws_module_ops user_ops = {
 ...
From: Pavel Machek
Date: Wednesday, March 24, 2010 - 1:42 pm

Uhuh, open-coded barriers... these need to be commented, and I guess
you just should not play this kind of trickery.

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Jiri Slaby
Date: Wednesday, March 24, 2010 - 2:40 pm

It's just to ensure the to_do_buf store is not reordered with the
set_bit. I wanted to avoid locks as too heavy tools here.

thanks,
-- 
js
--

From: Pavel Machek
Date: Thursday, March 25, 2010 - 2:36 pm

Locks are the only sane choice here. Open coding them is not an option.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Rafael J. Wysocki
Date: Thursday, March 25, 2010 - 3:14 pm

No, please use them, at least in a prototype version.

We can always optimize things out later, but doing optimizations upfront
doesn't really work well from my experience.

So, if you'd use a lock somewhere, please use it, or maybe use a completion if
that fits the design better.  In the majority of cases it's not as heavy wieght
as it seems at first sight.

Rafael
--

From: Jiri Slaby
Date: Friday, March 26, 2010 - 2:34 am

That's it, I don't think a lock is appropriate here (I didn't even think
of that) -- I don't know what to lock (OK, I see it, but it's not that
clear). There is no potential for race per se, I only need to disable
reordering (which locks do as a side-effect). I need the steps to be
done in the A-B order where there is a barrier appropriate. Here, A is
store to to_do_buf, B is set_bit. It's I set to_do_buf, flag that it may
be used, the consumer will see the flag and use to_do_buf, in this order.

Above that if I introduce locks the wait_event on the other side will
grow into an unreadable mess. I would need to hold a lock when checking
the condition and hold it until I reach to_do_buf use, but also unlock
it on all paths that do not reach that point. Yeah, it's indeed doable,
but I don't think, it will improve things.

I also don't think completion is appropriate here, as I have a condition
to check for and it differs over wake_up sites.

thanks,
-- 
js
--

From: Rafael J. Wysocki
Date: Friday, March 26, 2010 - 3:04 pm

OK

I have some other comments to this patch, but I'd like to understand what
really happens here, since the changelog is not too verbose.

Please explain the design here.

Thanks,
Rafael
--

From: Jiri Slaby
Date: Monday, March 29, 2010 - 8:33 am

Yeah, my bad.

What I want to achieve is to revert the behaviour of current
implementation to the one used in toi. Now, it is that when writing a
page to the swap, the hibernation does snapshot_read_page(page) with
swap_write_page(page) in a loop. This is OK until one needs to do
anything with the page.

When compression or encryption (or both) are needed to be done, this
scenario does not work well because page returned by snapshot_read_page,
after going through the crypto layers, is no longer PAGE_SIZE big.

Probably the easiest solution is to revert it as noted above: a page is
taken from snapshot (with patches I have here the snapshot layer is only
told to "store next page" without returning a page to the caller), fed
through crypto layers as needed and finally given to chunk writer which
assembles PAGE_SIZE blocks from the chunks. Then whole pages of
compressed/encrypted data are given to user or in-kernel block io by
hibernate_io_ops->write_page. In-kernel .write_page simply calls
swap_write_page (which in turn hib_bio_write_page while storing swap
sector entries).

User .write_page, with buf as a parameter, does the following:
* to_do_buf = buf
* set WORK bit
* wake_up .read (below)
* wait_for WORK bit clear

Writer in this case is a user process performing fops->read. .read does
the following:
* wait_for WORK bit
* copy_to_user to_do_buf
* clear WORK bit
* wake_up .write_page


I need the barrier to ensure the "to_do_buf = buf" assignment is not
reordered with the "set WORK bit" in .write_page. Otherwise .read will
see WORK bit set, but have to_do_buf not updated. I certainly can lock
the two operation together, but (a) I see no point in doing so; (b) it
makes the code worse (I need to unlock and relock in wait_event and
unlock on all fail paths).


The very similar happens for fops->write, ergo hibernate_io_ops->read_page.

thanks,
-- 
js
--

From: Pavel Machek
Date: Saturday, March 27, 2010 - 12:02 am

Could you just use same locking user.c uses? It does not really do any
magic...
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

Previous thread: [RFC 12/15] PM / Hibernate: split snapshot_read_next by Jiri Slaby on Tuesday, March 23, 2010 - 9:17 am. (2 messages)

Next thread: [RFC 05/15] PM / Hibernate: group swap ops by Jiri Slaby on Tuesday, March 23, 2010 - 9:17 am. (2 messages)