The following patches on top of net-next fix issues related to write logging in vhost. They fix all logging issues known to me: migration now works for me under stress in both the TX and RX directions.

Rusty's going on vacation, and I am guessing he won't have time to review this. Gleb, Juan, Herbert, could one of you review this patchset please?

There's also the send queue full issue reported by Sridhar Samudrala, for which I'm testing various fixes. That patch is contained to vhost/net, so there's no conflict; it will be posted separately.

Michael S. Tsirkin (3):
vhost: logging thinko fix
vhost: initialize log eventfd context pointer
vhost: fix get_user_pages_fast error handling

drivers/vhost/vhost.c | 14 +++++++++-----
1 files changed, 9 insertions(+), 5 deletions(-)
--
vhost was doing some complex math to get the
offset to log at, and got it wrong by a couple of bytes,
while in fact it's simple: get the address where we write,
subtract the start of the buffer, add the log base.
Do it this way.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/vhost/vhost.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6eb1525..c767279 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1004,10 +1004,12 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
if (unlikely(vq->log_used)) {
/* Make sure data is seen before log. */
smp_wmb();
- log_write(vq->log_base, vq->log_addr + sizeof *vq->used->ring *
- (vq->last_used_idx % vq->num),
- sizeof *vq->used->ring);
- log_write(vq->log_base, vq->log_addr, sizeof *vq->used->ring);
+ log_write(vq->log_base,
+ vq->log_addr + ((void *)used - (void *)vq->used),
+ sizeof *used);
+ log_write(vq->log_base,
+ vq->log_addr + offsetof(struct vring_used, idx),
+ sizeof vq->used->idx);
if (vq->log_ctx)
eventfd_signal(vq->log_ctx, 1);
}
--
1.7.0.18.g0d53a5
--
Once here, can we add a comment explaining _what_ we are trying to write to the log? Michael explains that it is the used element and the index,
--
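As an illustration of what the two log_write() calls in the patch cover, here is a minimal userspace sketch of the offset arithmetic. The struct and function names are hypothetical stand-ins, not the kernel's; the layout assumed is a 16-bit flags field, a 16-bit idx field, then an array of 8-byte used elements, and the ring is given a fixed size only to keep the example small.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the used ring; not the kernel definitions. */
struct used_elem {
    uint32_t id;
    uint32_t len;
};

struct used_ring {
    uint16_t flags;
    uint16_t idx;
    struct used_elem ring[4]; /* fixed size for illustration only */
};

/* New approach: address written, minus start of the ring, plus log address. */
static uint64_t used_elem_log_addr(uint64_t log_addr,
                                   const struct used_ring *used,
                                   const struct used_elem *elem)
{
    return log_addr + (uint64_t)((const char *)elem - (const char *)used);
}

/* The index is logged at its own offset inside the ring header. */
static uint64_t used_idx_log_addr(uint64_t log_addr)
{
    return log_addr + offsetof(struct used_ring, idx);
}

/* Shape of the removed expression: element size times slot, no header. */
static uint64_t old_used_elem_log_addr(uint64_t log_addr, unsigned int slot)
{
    return log_addr + sizeof(struct used_elem) * slot;
}
```

Under these assumptions, slot 3 lands at log_addr + 28 with the pointer-difference form and at log_addr + 24 with the removed form. The 4-byte gap is the flags/idx header, which would be consistent with the "couple of bytes" mentioned in the commit message.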
get_user_pages_fast returns number of pages on success, negative value on failure, but never 0. Fix vhost code to match this logic.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/vhost/vhost.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d4f8fdf..d003504 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -646,8 +646,9 @@ static int set_bit_to_user(int nr, void __user *addr)
int bit = nr + (log % PAGE_SIZE) * 8;
int r;
r = get_user_pages_fast(log, 1, 1, &page);
- if (r)
+ if (r < 0)
return r;
+ BUG_ON(r != 1);
base = kmap_atomic(page, KM_USER0);
set_bit(bit, base);
kunmap_atomic(base, KM_USER0);
--
1.7.0.18.g0d53a5
--
I think no. get_user_pages_fast always returns number of pages pinned (in this case always 1) or an error (< 0). --
Just for the record I'm generally not interested in vhost patches. If it's a specific network one that will be merged via the networking tree, yes please CC: me. But if it's a bunch of changes to vhost.c and other pieces of infrastructure, feel free to leave me out of it. It just clutters my already overflowing inbox. Thanks. --
Dave, so while Rusty's on vacation, what's the best way to get vhost infrastructure fixes in? Are you ok with getting pull requests and merging them into net-next? That should keep the clutter in your inbox to the minimum. Of course network changes would still go the usual way. -- MST --
From: "Michael S. Tsirkin" <mst@redhat.com>

Well, who is providing oversight of vhost work while he's gone? Has he, implicitly or explicitly, appointed a maintainer while he's away?
--
My plan was to get peer review of the patches before merging. Implicitly, I guess. He said "if there's an issue Michael Tsirkin is the best person to resolve it", this was wrt merging his virtio&lguest tree. He didn't mention vhost, I wrote all of vhost though, there shouldn't be an issue with that. -- MST --
From: "Michael S. Tsirkin" <mst@redhat.com>

That's good enough for me. Feel free to set up a tree for me to pull from.
--
It can return 0 if you ask for 0 pages :)

From the comment:

 * Returns number of pages pinned. This may be fewer than the number
 * requested. If nr_pages is 0 or negative, returns 0. If no pages
 * were pinned, returns -errno.
 */

I agree that code was wrong, but the BUG_ON() is not necessary IMHO. The important bit is the change in the comparison.
--
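To make the difference between the two comparisons concrete, here is a small userspace sketch. pin_pages_stub() is a hypothetical stand-in that only mimics the return convention quoted from the comment above (count pinned, 0 for a non-positive request, -errno when nothing is pinned); it is not the kernel function and pins nothing.

```c
#include <errno.h>

/* Hypothetical stub mimicking the quoted return convention only. */
static int pin_pages_stub(int nr_pages, int simulate_fault)
{
    if (nr_pages <= 0)
        return 0;            /* nothing requested */
    if (simulate_fault)
        return -EFAULT;      /* no pages pinned */
    return nr_pages;         /* all requested pages pinned */
}

/* Old check: any non-zero value, including the success value 1, bails out. */
static int pin_one_old(int simulate_fault)
{
    int r = pin_pages_stub(1, simulate_fault);

    if (r)
        return r;
    return 0;
}

/* Fixed check: only negative values are treated as errors. */
static int pin_one_fixed(int simulate_fault)
{
    int r = pin_pages_stub(1, simulate_fault);

    if (r < 0)
        return r;
    /* One page was requested, so r is expected to be 1 at this point;
     * the patch asserts that with BUG_ON(r != 1). */
    return 0;
}
```

With this stub, the old check returns 1 on a successful pin and so never reaches the code that would set the bit, while the fixed check returns 0 on success and passes -EFAULT through on failure.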
vq log eventfd context pointer needs to be initialized, otherwise operation may fail or oops if log is enabled but log eventfd not set by userspace.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/vhost/vhost.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c767279..d4f8fdf 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -121,6 +121,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->kick = NULL;
vq->call_ctx = NULL;
vq->call = NULL;
+ vq->log_ctx = NULL;
}
long vhost_dev_init(struct vhost_dev *dev,
--
1.7.0.18.g0d53a5
--
Reviewed-by: Juan Quintela <quintela@redhat.com>

When log_ctx for device is created, it is copied to the vq. This reset --
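A minimal sketch of why the pointer needs resetting, using hypothetical stand-in types rather than the kernel structures: the logging path in the first patch signals the eventfd only when log_ctx is non-NULL, so a stale or uninitialized value would pass that check.

```c
#include <stddef.h>

/* Opaque hypothetical stand-in for an eventfd context. */
struct eventfd_ctx_stub;

/* Hypothetical, simplified virtqueue with only the fields used here. */
struct vq_stub {
    struct eventfd_ctx_stub *call_ctx;
    struct eventfd_ctx_stub *log_ctx;
    int log_used;
};

/* Reset clears every context pointer, including log_ctx. */
static void vq_reset_stub(struct vq_stub *vq)
{
    vq->call_ctx = NULL;
    vq->log_ctx = NULL;
}

/* Mirrors the guard in the logging path: signal only with a valid context. */
static int vq_would_signal_log(const struct vq_stub *vq)
{
    return vq->log_used && vq->log_ctx != NULL;
}
```

If reset left log_ctx untouched, vq_would_signal_log() could report true for a queue whose log eventfd was never set by userspace, which is the failure the commit message describes.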
