Re: [PATCH v2 1/4] Defer skb allocation -- add destroy buffers function for virtio

Previous thread: [PATCH 2.6.32-rc6] drivers/net: ks8851_mll ethernet network driver -resubmit by Choi, David on Thursday, November 19, 2009 - 6:34 pm. (2 messages)

Next thread: [PATCH 1/1] Defer skb allocation for both mergeable buffers and big packets in virtio_net by Shirley Ma on Thursday, November 19, 2009 - 11:15 pm. (20 messages)
From: Shirley Ma
Date: Thursday, November 19, 2009 - 11:09 pm

Guest virtio_net receives packets from its pre-allocated vring 
buffers, then it delivers these packets to upper layer protocols
as skb buffs. So it's not necessary to pre-allocate skb for each
mergable buffer, then frees it when it's useless. 

This patch has deferred skb allocation when receiving packets for
both big packets and mergeable buffers. It reduces skb pre-allocations 
and skb_frees.

Based on Mickael & Avi's suggestion. A destroy function has been created
to push virtio free buffs to vring for unused pages, and used page private
to maintain page list.

I didn't touch small packet skb allocation to avoid extra copies for small
packets.

This patch has tested and measured against 2.6.32-rc5 git. It is built again
 2.6.32-rc7 kernel. Tests have been done for small packets, big packets and
mergeable buffers.

The single netperf TCP_STREAM performance improved for host to guest. 
It also reduces UDP packets drop rate.

The netperf laptop results were:

mtu=1500
netperf -H xxx -l 120

		w/o patch	w/i patch (two runs)	
guest to host:  3336.84Mb/s   3730.14Mb/s ~ 3582.88Mb/s

host to guest:  3165.10Mb/s   3370.39Mb/s ~ 3407.96Mb/s

--

From: Rusty Russell
Date: Sunday, November 22, 2009 - 5:53 pm

Nice!

Is this using mergeable_rx_bufs?  Or just big_packets?

I'd like to drop big packet support from our driver, but I don't know
how many kvm hosts will not offer mergable rx bufs yet.  Anthony?

Thanks,
Rusty.
--

From: Mark McLoughlin
Date: Monday, November 23, 2009 - 1:51 am

Mergeable rx bufs were first added in kvm-80 and qemu-0.10.0

So e.g., it's in Fedora since Fedora 11 and RHEL since 5.4

Cheers,
Mark.

--

From: Michael S. Tsirkin
Date: Tuesday, December 8, 2009 - 5:21 am

One issue with mergeable buffers is that they can work well with zero
copy RX, but only if hardware merges RX buffers in the same way virtio
does.

However, I suspect that most hardware can not do this, and simply
scatters data to a buffer you give it and then reports completion.
For such simple hardware, RX can work without extra data copies when guest
uses big packets (just scatter data into guest buffers), but when guest
tries to use mergeable buffers, host will have to merge them and post a
single big packet to hardware.  Thus host will be able to only post
about 16 buffers to hardware, and performance will suffer.


-- 
MST
--

From: Shirley Ma
Date: Friday, December 11, 2009 - 5:28 am

This is a patch-set for deferring skb allocation based on Rusty and
Michael's inputs.

Guest virtio_net receives packets from its pre-allocated vring 
buffers, then it delivers these packets to upper layer protocols
as skb buffs. So it's not necessary to pre-allocate skb for each
mergable buffer, then frees it when it's useless. 

This patch has deferred skb allocation when receiving packets for
both big packets and mergeable buffers. It reduces skb pre-allocations 
and skb_frees.

A destroy function has been created to push virtio free buffs to vring
for unused pages, and used page private to maintain page list.

This patch has tested and measured against 2.6.32-rc7 git. It is built
again 2.6.32 kernel. Tests have been done for small packets, big packets
and mergeable buffers.

The single netperf TCP_STREAM performance improved for host to guest. 
It also reduces UDP packets drop rate.

Thanks
Shirley

--

From: Shirley Ma
Date: Friday, December 11, 2009 - 5:33 am

Signed-off-by: <xma@us.ibm.com>
-------------

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c708ecc..bb5eb7b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -107,6 +107,16 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
 	return p;
 }
 
+static void virtio_net_free_pages(void *buf)
+{
+	struct page *page, *next;
+
+	for (page = buf; page; page = next) {
+		next = (struct page *)page->private;
+		__free_pages(page, 0);
+	}
+}
+
 static void skb_xmit_done(struct virtqueue *svq)
 {
 	struct virtnet_info *vi = svq->vdev->priv;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fbd2ecd..db83465 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -334,6 +334,29 @@ static bool vring_enable_cb(struct virtqueue *_vq)
 	return true;
 }
 
+static int vring_destroy_bufs(struct virtqueue *_vq, void (*destroy)(void *))
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	void *buf;
+	unsigned int i;
+	int freed = 0;
+
+	START_USE(vq);
+
+	for (i = 0; i < vq->vring.num; i++) {
+		if (vq->data[i]) {
+			/* detach_buf clears data, so grab it now. */
+			buf = vq->data[i];
+			detach_buf(vq, i);
+			destroy(buf);
+			freed++;
+		}
+	}
+
+	END_USE(vq);
+	return freed;
+}
+
 irqreturn_t vring_interrupt(int irq, void *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -360,6 +383,7 @@ static struct virtqueue_ops vring_vq_ops = {
 	.kick = vring_kick,
 	.disable_cb = vring_disable_cb,
 	.enable_cb = vring_enable_cb,
+	.destroy_bufs = vring_destroy_bufs,
 };
 
 struct virtqueue *vring_new_virtqueue(unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 057a2e0..e6d7d77 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -71,6 +71,7 @@ struct virtqueue_ops {
 
 	void (*disable_cb)(struct virtqueue *vq);
 	bool (*enable_cb)(struct virtqueue *vq);
+	int (*destroy_bufs)(struct virtqueue ...
From: Michael S. Tsirkin
Date: Sunday, December 13, 2009 - 3:26 am

This is not the best way to split the patch,
as I commented separately: this adds static function
but no way to see whether it does what it should do.
If you think about it, free should always go into same patch


Hmm, you are calling detach on a buffer which was not used.
It's not safe to call this while host might use buffers, is it?



Typo above.  Please document the new field.
Also please document assumptions about usage.

I think callback should get struct virtqueue *vq, and decrement num.
And destroy_bufs should return void, which is much more natural for a
destructor.


That said - do we have to use a callback?
I think destroy_buf which returns data pointer,
and which we call repeatedly until we get NULL
or error, would be an a better, more flexible API.
--

From: Shirley Ma
Date: Monday, December 14, 2009 - 1:08 pm

Hello Michael,

I agree with the comments (will have two patches instead of 4 based on
Rusty's comments) except below one.


The reason to use this is because in virtio_net remove, it has
BUG_ON(vi->num != 0), which will be consistent with small skb packet. If
we use NULL, error then we lose the track for vi->num, since we don't
know how many buffers have been passed to ULPs or still unused.

Thanks
Shirley

--

From: Michael S. Tsirkin
Date: Monday, December 14, 2009 - 1:22 pm

I dont insist, but my idea was

for (;;) {
	b = vq->destroy(vq);
	if (!b)
		break;
	--vi->num;
	put_page(b);
}

so we do not have to lose track of the counter.

-- 
MST
--

From: Shirley Ma
Date: Monday, December 14, 2009 - 4:22 pm

Hello Michael,


That's not a bad idea, but to do this, then this fuction will be
specific for virtio_net device (vi is virtnet_info) only in a generic
virtqueue API.

Thanks
Shirley

--

From: Michael S. Tsirkin
Date: Tuesday, December 15, 2009 - 3:57 am

No, this code would be in virtio net.
destroy would simply be the virtqueue API that returns
data pointer.

-- 
MST
--

From: Rusty Russell
Date: Tuesday, December 15, 2009 - 3:36 pm

In this case it should be called "get_unused_buf" or something.  But I like
Shirley's approach here; destroy (with callback) accurately reflects the only
time this can be validly used.

Cheers,
Rusty.
--

From: Michael S. Tsirkin
Date: Tuesday, December 15, 2009 - 3:40 pm

I guess the actual requirement is that device must be
inactive.

As I said this is fine with me as well.
But I think the callback should get vq pointer besides the
data pointer, so that it can e.g. find the device if it needs to.
In case of virtio net this makes it possible
to decrement the outstanding skb counter in the callback.
Makes sense?

-- 
MST
--

From: Rusty Russell
Date: Tuesday, December 15, 2009 - 10:04 pm

Technically, the vq has to be inactive.  (This distinction may matter for

Sure.  I don't really mind either way, and I'm warming to the name
detach_buf :)

Cheers,
Rusty.
--

From: Rusty Russell
Date: Sunday, December 13, 2009 - 8:25 pm

Hi Shirley,

   These patches look quite close.  More review to follow :)

   This title needs revision.  It should start with virtio: (all the virtio
patches do, for easy identification after merge), eg:

	Subject: virtio: Add destroy callback for unused buffers

It also needs a commit message which explains the problem and the solution.
Something like this:

	There's currently no way for a virtio driver to ask for unused
	buffers, so it has to keep a list itself to reclaim them at shutdown.
	This is redundant, since virtio_ring stores that information.  So

This belongs in one of the future patches: it will cause an unused warning

unsigned int for return and freed counter?  Means changing prototype, but

You could simplify this a bit, since the queue must be inactive:

	destroy(vq->data[i]);
	detach_buf(vq, i);

Perhaps add a:
	/* That should have freed everything. */

destory typo :)

Cheers,
Rusty.
--

From: Shirley Ma
Date: Monday, December 14, 2009 - 3:09 pm

Thanks Rusty.

I agree with all these comments, will work on them.

Shirley

--

From: Shirley Ma
Date: Friday, December 11, 2009 - 5:43 am

Signed-off-by: Shirley Ma <xma@us.ibm.com>
--------------

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bb5eb7b..100b4b9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -80,29 +80,25 @@ static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
 	return (struct skb_vnet_hdr *)skb->cb;
 }
 
-static void give_a_page(struct virtnet_info *vi, struct page *page)
+static void give_pages(struct virtnet_info *vi, struct page *page)
 {
-	page->private = (unsigned long)vi->pages;
-	vi->pages = page;
-}
+	struct page *end;
 
-static void trim_pages(struct virtnet_info *vi, struct sk_buff *skb)
-{
-	unsigned int i;
-
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-		give_a_page(vi, skb_shinfo(skb)->frags[i].page);
-	skb_shinfo(skb)->nr_frags = 0;
-	skb->data_len = 0;
+	/* Find end of list, sew whole thing into vi->pages. */
+	for (end = page; end->private; end = (struct page *)end->private);
+	end->private = (unsigned long)vi->pages;
+	vi->pages = page;
 }
 
 static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
 {
 	struct page *p = vi->pages;
 
-	if (p)
+	if (p) {
 		vi->pages = (struct page *)p->private;
-	else
+		/* use private to chain pages for big packets */
+		p->private = 0;
+	} else
 		p = alloc_page(gfp_mask);
 	return p;
 }
@@ -128,6 +124,85 @@ static void skb_xmit_done(struct virtqueue *svq)
 	netif_wake_queue(vi->dev);
 }
 
+static int skb_set_frag(struct sk_buff *skb, struct page *page,
+				int offset, int len)
+{
+	int i = skb_shinfo(skb)->nr_frags;
+	skb_frag_t *f;
+
+	i = skb_shinfo(skb)->nr_frags;
+	f = &skb_shinfo(skb)->frags[i];
+	f->page = page;
+	f->page_offset = offset;
+
+	if (len > PAGE_SIZE - f->page_offset)
+		f->size = PAGE_SIZE - f->page_offset;
+	else
+		f->size = len;
+
+	skb_shinfo(skb)->nr_frags++;
+	skb->data_len += f->size;
+	skb->len += f->size;
+
+	len -= f->size;
+	return len;
+}
+
+static struct sk_buff *skb_goodcopy(struct ...
From: Michael S. Tsirkin
Date: Sunday, December 13, 2009 - 4:24 am

Hmm, this scans the whole list each time.
OTOH, the caller probably can easily get list tail as well as head.
If we ask caller to give us list tail, and chain them at head, then
1. we won't have to scan the list each time

So this comment does not explain why this = 0 is here.
clearly = 0 does not chain anything.
Please add a bigger comment.

I think you also want to extend the comment at top of
file, where datastructure is, that explains the deferred

Maybe it's better not to prefix functions with skb_, one



Use min for clarity instead of opencoded if.
This will make it obvious that len won't ever become

I know you got this name from GOOD_COPY_LEN, but it's not
very good for a function :) and skb_ prefix is also confusing.

Comments about splitting patches  apply here as well.
No way to understand what this should do and whether it


I think reader will still wonder about is "why does it
need to be 16 byte aligned?". And this is what

So you are overriding *len here? why bother calculating it
then?

Also - this does *not* always copy all of data, and *page
tells us whether it did a copy or not? This is pretty confusing,
as APIs go. Also, if we have scatter/gather anyway,
why do we bother copying the head?
Also, before skb_set_frag skb is linear, right?
So in fact you do not need generic skb_set_frag,
you can just put stuff in the first fragment.
For example, pass the fragment number to skb_set_frag,
--

From: Shirley Ma
Date: Monday, December 14, 2009 - 2:23 pm

I could use page private to point to a list_head which will have a head
and a tail, but it will induce more alloc, and free, when this page is


If receiving buffer in mergeable buf and big packets, the packet is
small, then there is no scatter/gather, we can release the page for new
receiving, that was the reason to copy skb head. *page will be only used
by big packets path to indicate whether/where to start next skb frag if

You meant to reuse skb_put_frags() in ipoib_cm.c?

Thanks
Shirley

--

From: Michael S. Tsirkin
Date: Tuesday, December 15, 2009 - 4:21 am

Yes, we don't want that. But I think caller already has
list tail available because he built up the list,
so it should be possible to chain our pages
at head: head -> new pages -> old pages.

Is this call a rare one? Maybe we do not need
to optimize this list scan, but if so let's
add a comment explaining why it does not matter.

If we are going to change data structures,
I think we should replace the linked list
simply with an array acting as a circular buffer.
But I am not asking you to implement it as

I guess the main complaint is that if a function
has copy in the name, one expects it to copy data.
Maybe split it up to functions that copy

I just mean we can pass fragment number to skb_set_frag.
In your code nr_fragments is always 0, right?

-- 
MST
--

From: Rusty Russell
Date: Sunday, December 13, 2009 - 11:54 pm

I don't think there's a good way of splitting this change across multiple
patches.  And I don't think this patch will compile; I don't think we can
get rid of trim_pages yet.

We *could* first split the receive paths into multiple parts as you have
(eg. add_recvbuf_big etc), then actually rewrite them, but that's too much
work to refactor the code twice.  

So just roll all the driver changes into one patch; so you will have two
patches: one which creates the destroy_bufs API for virtio, and one which

This actually allocates an skb, and transfers the first page to it (via copy

Perhaps you should return NULL here as an error if *len was < hdr_len?

Thanks,
Rusty.
--

From: Shirley Ma
Date: Monday, December 14, 2009 - 3:10 pm

Thanks Rusty, agree with you, working on them.

Shirley

--

From: Shirley Ma
Date: Friday, December 11, 2009 - 5:46 am

Signed-off-by: Shirley Ma <xma@us.ibm.com>
-------------

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 100b4b9..dde8060 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -203,6 +203,73 @@ static struct sk_buff *skb_goodcopy(struct virtnet_info *vi, struct page **page,
 	return skb;
 }
 
+static struct sk_buff *receive_big(struct virtnet_info *vi, struct page *page,
+				   unsigned int len)
+{
+	struct sk_buff *skb;
+
+	skb = skb_goodcopy(vi, &page, &len);
+	if (unlikely(!skb))
+		return NULL;
+
+	while (len > 0) {
+		len = skb_set_frag(skb, page, 0, len);
+		page = (struct page *)page->private;
+	}
+
+	if (page)
+		give_pages(vi, page);
+
+	return skb;
+}
+
+static struct sk_buff *receive_mergeable(struct virtnet_info *vi,
+					 struct page *page, unsigned int len)
+{
+	struct sk_buff *skb;
+	struct skb_vnet_hdr *hdr;
+	int num_buf, i;
+
+	if (len > PAGE_SIZE)
+		len = PAGE_SIZE;
+
+	skb = skb_goodcopy(vi, &page, &len);
+
+	if (unlikely(!skb))
+		return NULL;
+
+	hdr = skb_vnet_hdr(skb);
+	num_buf = hdr->mhdr.num_buffers;
+	while (--num_buf) {
+		struct page *page;
+
+		i = skb_shinfo(skb)->nr_frags;
+		if (i >= MAX_SKB_FRAGS) {
+			pr_debug("%s: packet too long %d\n", skb->dev->name,
+				 len);
+			skb->dev->stats.rx_length_errors++;
+			return skb;
+		}
+
+		page = vi->rvq->vq_ops->get_buf(vi->rvq, &len);
+		if (!page) {
+			pr_debug("%s: rx error: %d buffers missing\n",
+				 skb->dev->name, hdr->mhdr.num_buffers);
+			skb->dev->stats.rx_length_errors++;
+			return skb;
+		}
+
+		if (len > PAGE_SIZE)
+			len = PAGE_SIZE;
+
+		skb_set_frag(skb, page, 0, len);
+
+		vi->num--;
+	}
+
+	return skb;
+}
+
 static void receive_skb(struct net_device *dev, struct sk_buff *skb,
 			unsigned len)
 {
@@ -356,6 +423,103 @@ drop:
 	dev_kfree_skb(skb);
 }
 
+static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp, bool *oom)
+{
+	struct sk_buff *skb;
+	struct skb_vnet_hdr ...
From: Michael S. Tsirkin
Date: Sunday, December 13, 2009 - 4:43 am

Interesting. I think skb_goodcopy will sometimes

don't put empty line here. if below is part of same logical block as


Local variable shadows a parameter.
It seems gcc will let you get away with a warning,


If this happens, we have corrupted memory already.
We do need this check, but please put is before you increment

This will propagate the error up the stack and corrupt

Here, skb is some random part of packet, don't propagate




MAX_SKB_FRAGS + 2 will be more readable.
Also, create a macro for this constant and document

Again, pls explain *why* do we want 16 byte alignment.
Also this code seems duplicated?
Please put structs at top of file where they

I prefer --i.
Also, total is just a constant.
So simply MAX_SKB_FRAGS + 1 will be clearer.
Why do we scan last to first?

space around - .
All the if (i == 1) handling on exit is really hard to grok.
How about moving common code out of this loop
into a function, and then you can
	for (i = total - 1; i > 1; i--) {
		handle(i);
	}
	handle(1);
	handle(0);


do we really need *oom here and below?

--

From: Shirley Ma
Date: Monday, December 14, 2009 - 3:08 pm

It is before increase for mergeable buffer case. Only one page(one frag)

I just copied the code from original code. There might not be a problem
This is for small packet. I didn't change that code since it will

We use page private to maintain next page, here there is no scan last to
first, just add the new page in list head instead of list tail, which

Ok.

--

From: Shirley Ma
Date: Monday, December 14, 2009 - 5:37 pm

Hello Michael,


I think I can remove skb link for small packet as well by adding
kfree_skb() in virtio_net_free_bufs for small packet.

Thanks
Shirley

--

From: Michael S. Tsirkin
Date: Tuesday, December 15, 2009 - 4:33 am

Hmm. Yes, I see, it is here:
+       if (*len) {
+               *len = skb_set_frag(skb, *page, offset, *len);
+               *page = (struct page *)(*page)->private;
+       } else {
+               give_pages(vi, *page);
+               *page = NULL;
+       }

So what I would suggest is, have function
that just copies part of skb, and have
caller open-code allocating the skb and free up

What I am asking is why do we add skb in vi->recv.



I mean the for loop: can it be for(i = 0, ..., ++i) just as well?
--

From: Shirley Ma
Date: Tuesday, December 15, 2009 - 9:25 am

Are asking why reverse order for new page to sg? The reason is we link
the new page in first, and only maintain the first pointer. So the most
recent new page should be related to sg[0], if we put the new page in
the last, then we need to travel the page list to get last pointer. Am I
missing your point here?

Thanks
Shirley 

--

From: Michael S. Tsirkin
Date: Tuesday, December 15, 2009 - 9:39 am

No, that was what I was looking for.

--

From: Shirley Ma
Date: Tuesday, December 15, 2009 - 11:42 am

I submit one split patch for review to make sure that's the right format.
I copied Rusty's comment for the commit message, and change destroy
to detach since we destroy the buffers in caller. This patch is built
against Dave's net-next tree.

There's currently no way for a virtio driver to ask for unused
buffers, so it has to keep a list itself to reclaim them at shutdown.
This is redundant, since virtio_ring stores that information.  So
add a new hook to do this: virtio_net will be the first user.

Signed-off-by: Shirley Ma <xma@us.ibm.com> 
---
 drivers/virtio/virtio_ring.c |   24 ++++++++++++++++++++++++
 include/linux/virtio.h       |    1 +
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fbd2ecd..f847bc3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -334,6 +334,29 @@ static bool vring_enable_cb(struct virtqueue *_vq)
 	return true;
 }
 
+/* This function is used to return vring unused buffers to caller for free */
+static void *vring_detach_bufs(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int i;
+
+	START_USE(vq);
+
+	for (i = 0; i < vq->vring.num; ++i) {
+		if (vq->data[i]) {
+			/* detach_buf clears data, so grab it now. */
+			detach_buf(vq, i);
+			END_USE(vq);
+			return vq->data[i];
+		}
+	}
+	/* That should have freed everything. */
+	BUG_ON(vq->num_free != vq->vring.num);
+
+	END_USE(vq);
+	return NULL;
+}
+
 irqreturn_t vring_interrupt(int irq, void *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -360,6 +383,7 @@ static struct virtqueue_ops vring_vq_ops = {
 	.kick = vring_kick,
 	.disable_cb = vring_disable_cb,
 	.enable_cb = vring_enable_cb,
+	.detach_bufs = vring_detach_bufs,
 };
 
 struct virtqueue *vring_new_virtqueue(unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 057a2e0..d7da456 100644
--- a/include/linux/virtio.h
+++ ...
From: Michael S. Tsirkin
Date: Tuesday, December 15, 2009 - 11:47 am

Almost :) text not intended for git commit logs like the above

This is a single statement loop, so you do not need {}
around it. Or even better:
	for (i = 0; i < vq->vring.num; ++i) {
		if (!vq->data[i])
			continue;
		...
	}




Please add documentation in virtio.h
--

From: Shirley Ma
Date: Tuesday, December 15, 2009 - 12:08 pm

Thanks Michael, will fix them all.

Shirley

--

From: Shirley Ma
Date: Tuesday, December 15, 2009 - 12:14 pm

Hello Michael,


Nope, I changed the destroy to detach and return the buffers without
destroying them within the call. I thought it might be useful in some
other case. 

Maybe I should put destroy call back?

Thanks
Shirley

--

From: Michael S. Tsirkin
Date: Tuesday, December 15, 2009 - 2:14 pm

No I think it's good as is, we do not need a callback.
I was simply saying that detach_buf sets data to NULL,
so return vq->data[i] after detach does not make sense.
You need to save data as comment below says.c

-- 
MST
--

From: Shirley Ma
Date: Friday, December 11, 2009 - 5:49 am

Signed-off-by: Shirley Ma <xma@us.ibm.com>
-------------

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dde8060..b919169 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -270,99 +270,44 @@ static struct sk_buff *receive_mergeable(struct virtnet_info *vi,
 	return skb;
 }
 
-static void receive_skb(struct net_device *dev, struct sk_buff *skb,
-			unsigned len)
+static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
-	struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
-	int err;
-	int i;
+	struct sk_buff *skb = (struct sk_buff *)buf;
+	struct page *page = (struct page *)buf;
+	struct skb_vnet_hdr *hdr;
 
 	if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
 		pr_debug("%s: short packet %i\n", dev->name, len);
 		dev->stats.rx_length_errors++;
-		goto drop;
-	}
-
-	if (vi->mergeable_rx_bufs) {
-		unsigned int copy;
-		char *p = page_address(skb_shinfo(skb)->frags[0].page);
-
-		if (len > PAGE_SIZE)
-			len = PAGE_SIZE;
-		len -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
-
-		memcpy(&hdr->mhdr, p, sizeof(hdr->mhdr));
-		p += sizeof(hdr->mhdr);
-
-		copy = len;
-		if (copy > skb_tailroom(skb))
-			copy = skb_tailroom(skb);
-
-		memcpy(skb_put(skb, copy), p, copy);
-
-		len -= copy;
-
-		if (!len) {
-			give_pages(vi, skb_shinfo(skb)->frags[0].page);
-			skb_shinfo(skb)->nr_frags--;
+		if (vi->mergeable_rx_bufs || vi->big_packets) {
+			give_pages(vi, page);
+			return;
 		} else {
-			skb_shinfo(skb)->frags[0].page_offset +=
-				sizeof(hdr->mhdr) + copy;
-			skb_shinfo(skb)->frags[0].size = len;
-			skb->data_len += len;
-			skb->len += len;
-		}
-
-		while (--hdr->mhdr.num_buffers) {
-			struct sk_buff *nskb;
-
-			i = skb_shinfo(skb)->nr_frags;
-			if (i >= MAX_SKB_FRAGS) {
-				pr_debug("%s: packet too long %d\n", dev->name,
-					 len);
-				dev->stats.rx_length_errors++;
-				goto drop;
-			}
-
-			nskb = ...
From: Michael S. Tsirkin
Date: Sunday, December 13, 2009 - 4:08 am

Do not cast away void*.
This initialization above looks very strange: in
fact only one of skb, page makes sense.
So I think you should either get rid of both page and
skb variables (routines such as give_pages get page *
so they will accept void* just as happily) or
move initialization or even use of these variables to

So above you override skb that you initialized
at the start of function. It would be better
to do in the 3'd case:
		skb = buf;
as well. And then it would be clear why
" Only for mergeable and big" comment below



This looks like double free to me: should not you remove code that does
--

From: Shirley Ma
Date: Tuesday, December 15, 2009 - 1:43 am

The label is used when checking buf len, but I will remove drop in the

Nope, this is not a double free, the queue recv skb is still there for
small packet size. But I will remove it in the update patch.

Thanks
Shirley

--

From: Michael S. Tsirkin
Date: Sunday, December 13, 2009 - 3:19 am

Shirley, some advice on packaging patches
that I hope will be helpful:

You did try to split up the patch logically,
and it's better than a single huge patch, but it
can be better.  For example, you add static functions
in one patch and use them in another patch,
this works well for new APIs which are documented
so you can understand from the documentation
what function should do, but not for internal, static functions:
one ends up looking at usage, going back to implementation,
back to usage, each time switching between patches.

One idea on how to split up the patch set better:
- add new "destroy" API and supply documentation
- switch current implementation over to use destroy API
- split current implementation into subfunctions
  handling mergeable/big cases
- convert functions to use deferred allocation
  [would be nice to convert mergeable/big separately,
   but I am not sure this is possible while keeping
   patchset bisectable]

Some suggestions on formatting:
- keep patch names short, and prefix with module name,
  not patchset name, so that git summaries look nicer. E.g.
  Defer skb allocation -- add destroy buffers function for virtio
  Would be better "virtio: add destroy buffers function".
- please supply commit message with some explanation
  and motivation that will help someone looking
  at git history, without background from mailing list.
  E.g.
  "Add "destroy" vq API that returns all posted
   buffers back to caller. This makes it possible
   to avoid tracking buffers in callers to free
   them on vq teardown. Will be used by deferred
   skb allocation patch.".
- Use "---" to separate description from text,
  and generally make patch acceptable to git am.
  It is a good idea to use git to generate patches,
  for example with git format-patch.

E.g. the above should go into commit message for the virtio net

I think you need to base your patch on Dave's net-next,
it's too late to put it in 2.6.32, or even 2.6.33.
It also should probably go in ...
From: Shirley Ma
Date: Monday, December 14, 2009 - 12:59 pm

Ok, I will run Dave's net-next tree.

Thanks
Shirley

--

Previous thread: [PATCH 2.6.32-rc6] drivers/net: ks8851_mll ethernet network driver -resubmit by Choi, David on Thursday, November 19, 2009 - 6:34 pm. (2 messages)

Next thread: [PATCH 1/1] Defer skb allocation for both mergeable buffers and big packets in virtio_net by Shirley Ma on Thursday, November 19, 2009 - 11:15 pm. (20 messages)