| From | Subject | Date |
|---|---|---|
| David Miller | Re: [PATCH] net/ipv4, linux-2.6.30.4
From: Daniel Slot <slot.daniel@gmail.com>
We already have code in the stack which tries to detect packet
reordering with a high level of sophistication.
--
| Aug 12, 2:55 pm 2009 |
| Gregory Haskins | AlacrityVM numbers updated for 31-rc4
I re-ran the numbers on 10GE against the actual alacrityvm v0.1 release
available in git on kernel.org.
I tried to include the newly announced "vhost" driver (Michael Tsirkin)
for virtio acceleration, but ran into issues getting the patches to apply.
For now, this includes native, virtio-u (virtio-userspace), and venet
all running on 31-rc4. If I can resolve the issue with Michael's
patches, I will add "virtio-k" (virtio-kernel) to the mix as well. For
now, here are the results for ...
| Aug 12, 1:43 pm 2009 |
| Javier Guerra | Re: AlacrityVM numbers updated for 31-rc4
On Wed, Aug 12, 2009 at 3:43 PM, Gregory
pseudo-3D charts are just wrong
(http://www.chuckchakrapani.com/articles/PDF/94070347.pdf)
--
Javier
--
| Aug 12, 2:33 pm 2009 |
| Michael S. Zick | Re: AlacrityVM numbers updated for 31-rc4
Nice quote.
Even 15 years after it was published, those first two graphs
are prime "bad examples".
The second two could stand some improvement also - like put
the legend and numbers *on* each solid bar.
@J.G. - I think you are fighting an uphill battle here against
human nature. People tend to go with their idea of "pretty".
Mike
--
| Aug 12, 3:05 pm 2009 |
| Anthony Liguori | Re: AlacrityVM numbers updated for 31-rc4
Just FYI, the numbers quoted are wrong for virtio-u. Greg's machine
didn't have high res timers enabled in the kernel. He'll post newer
numbers later but they're much better than these (venet is still ahead
though).
Regards,
Anthony Liguori
--
| Aug 12, 4:05 pm 2009 |
| Gregory Haskins | Re: AlacrityVM numbers updated for 31-rc4
Anthony is correct. The new numbers after fixing the HRT clock issue are:
virtio-u: 2670Mb/s, 266us rtt (3764 tps udp-rr)
I will update the charts later tonight.
Sorry for the confusion.
-Greg
| Aug 12, 4:21 pm 2009 |
| Dan Smith | [PATCH] c/r: Add AF_UNIX support (v10)
This patch adds basic checkpoint/restart support for AF_UNIX sockets. It
has been tested with a single and multiple processes, and with data inflight
at the time of checkpoint. It supports socketpair()s, path-based, and
abstract sockets.
Changes in v10:
- Moved header structure definitions back to checkpoint_hdr.h
- Moved AF_UNIX checkpoint/restart code to net/unix/checkpoint.c
- Make sock_unix_*() functions only compile if CONFIG_UNIX=y
- Add TODO for CONFIG_UNIX=m case
Changes ...
| Aug 12, 1:02 pm 2009 |
| Daniel Slot | [PATCH] net/ipv4, linux-2.6.30.4
RFC 4653 specifies Non-Congestion Robustness (NCR) for TCP.
In the absence of explicit congestion notification from the network,
TCP uses loss as an indication of congestion.
One of the ways TCP detects loss is using the arrival of three
duplicate acknowledgments.
However, this heuristic is not always correct, notably in the case
when network paths reorder segments (for whatever reason), resulting
in degraded performance.
TCP-NCR is designed to mitigate this degraded performance by
increasing ...
| Aug 12, 11:59 am 2009 |
| Daniel Slot | [PATCH] net/ipv4, linux-2.6.30.4
RFC 4653 specifies Non-Congestion Robustness (NCR) for TCP.
In the absence of explicit congestion notification from the network,
TCP uses loss as an indication of congestion.
One of the ways TCP detects loss is using the arrival of three
duplicate acknowledgments.
However, this heuristic is not always correct, notably in the case
when network paths reorder segments (for whatever reason), resulting
in degraded performance.
TCP-NCR is designed to mitigate this degraded performance by
increasing ...
| Aug 12, 11:50 am 2009 |
| Stephen Hemminger | Re: [PATCH] net/ipv4, linux-2.6.30.4
On Wed, 12 Aug 2009 20:50:59 +0200
Patch has funny indentation and awkward naming for socket elements.
Your style needs to match existing code.
What is the usage model for this? I expect that some user who wants
to enable this would be stuck somewhere with a lossy network and
would want to enable it. Or is it something only researchers will
want to play with?
It would be easier to use a sysctl value for this because otherwise
each application has to be changed to select the socket ...
| Aug 12, 12:02 pm 2009 |
| Daniel Slot | Re: [PATCH] net/ipv4, linux-2.6.30.4
I already tried to adapt the style to existing code.
Would be nice if you could give me a hint about what is awkward.
I'm quite new to the kernel hacking area and sorry for all obvious errors.
The usage is imho interesting in research domains.
As it is an implementation of an ietf RFC, there might be some
researchers who can use it.
It is part of my master thesis and I use it for measurements and comparisons.
I tried to avoid a sysctl value for this.
I didn't want this algorithm to be used by ...
| Aug 12, 12:27 pm 2009 |
| Eilon Greenstein | [net-next 33/36] bnx2x: Beautify bnx2x_dump.h
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_dump.h | 890 +++++++++++++++++++++++-----------------------
1 files changed, 449 insertions(+), 441 deletions(-)
diff --git a/drivers/net/bnx2x_dump.h b/drivers/net/bnx2x_dump.h
index 78c6b03..3bb9a91 100644
--- a/drivers/net/bnx2x_dump.h
+++ b/drivers/net/bnx2x_dump.h
@@ -13,31 +13,35 @@
* The signature is time stamp, diag version and grc_dump version
...
| Aug 12, 11:24 am 2009 |
| Eilon Greenstein | [net-next 34/36] bnx2x: Removing unused definitions
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_reg.h | 811 -----------------------------------------------
1 files changed, 0 insertions(+), 811 deletions(-)
diff --git a/drivers/net/bnx2x_reg.h b/drivers/net/bnx2x_reg.h
index 1e6f5aa..95ebf3f 100644
--- a/drivers/net/bnx2x_reg.h
+++ b/drivers/net/bnx2x_reg.h
@@ -190,12 +190,6 @@
_(0..15) stands for the connection type (one of 16). */
#define ...
| Aug 12, 11:24 am 2009 |
| Eilon Greenstein | [net-next 32/36] bnx2x: Re-factor the initialization code
Moving the code to a more logical place and beautifying it. No real change in
behavior.
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x.h | 28 +++-
drivers/net/bnx2x_init.h | 320 ++++++--------------------------
drivers/net/bnx2x_init_ops.h | 419 ++++++++++++++++++++++++------------------
drivers/net/bnx2x_main.c | 79 ++++++--
drivers/net/bnx2x_reg.h | 113 +++++++++++
5 ...
| Aug 12, 11:24 am 2009 |
| Eilon Greenstein | [net-next 35/36] bnx2x: Whitespaces and comments
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x.h | 38 ++++++++-------
drivers/net/bnx2x_link.c | 115 +++++++++++++++++++++++-----------------------
drivers/net/bnx2x_main.c | 80 ++++++++++++++++----------------
drivers/net/bnx2x_reg.h | 3 +-
4 files changed, 121 insertions(+), 115 deletions(-)
diff --git a/drivers/net/bnx2x.h b/drivers/net/bnx2x.h
index 97bc5e0..bbf8422 100644
--- a/drivers/net/bnx2x.h
+++ b/drivers/net/bnx2x.h
@@ -314,9 ...
| Aug 12, 11:24 am 2009 |
| Eilon Greenstein | [net-next 29/36] bnx2x: Using macro for phy address
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 108 ++++++++++++---------------------------------
drivers/net/bnx2x_link.h | 12 +++--
drivers/net/bnx2x_main.c | 8 +---
3 files changed, 39 insertions(+), 89 deletions(-)
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index 74f4d10..c2b0010 100644
--- a/drivers/net/bnx2x_link.c
+++ b/drivers/net/bnx2x_link.c
@@ -1547,10 +1547,7 @@ static u8 bnx2x_ext_phy_resove_fc(struct ...
| Aug 12, 11:24 am 2009 |
| Eilon Greenstein | [net-next 23/36] bnx2x: Remove the init_dmae field from bp
Moved the dmae_command from the heap to the stack. This will save 56
bytes per bnx2x structure. As a side benefit, we can also reduce the
time the dmae_mutex is held. This is because we do not need to hold
this mutex when setting up the dmae command. The memory where the dmae
command is stored is not a shared resource and does not need to be
protected.
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x.h | ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 36/36] bnx2x: update version to 1.52.1
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index 54e3ef9..037d862 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -56,8 +56,8 @@
#include "bnx2x_init_ops.h"
#include "bnx2x_dump.h"
-#define DRV_MODULE_VERSION "1.48.114-1"
-#define DRV_MODULE_RELDATE "2009/07/29"
+#define ...
| Aug 12, 11:24 am 2009 |
| Eilon Greenstein | [net-next 28/36] bnx2x: Re-arrange the link structures f ...
Change ieee_fc to u16 instead of u32 and re-arrange the link parameters
structures
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 6 +++---
drivers/net/bnx2x_link.h | 31 +++++++++++++++++++------------
2 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index c163c42..74f4d10 100644
--- a/drivers/net/bnx2x_link.c
+++ ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 31/36] bnx2x: Using PCI_DEVICE macro
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index a409767..23b9c7d 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -138,12 +138,9 @@ static struct {
static const struct pci_device_id bnx2x_pci_tbl[] = {
- { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_NX2_57710,
- PCI_ANY_ID, PCI_ANY_ID, 0, 0, ...
| Aug 12, 11:24 am 2009 |
| Eilon Greenstein | [net-next 27/36] bnx2x: Missing smp_wmb for statistics s ...
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index 656ee97..bcf8362 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -4411,6 +4411,9 @@ static void bnx2x_stats_handle(struct bnx2x *bp, enum bnx2x_stats_event event)
...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 30/36] bnx2x: Adding explicit casting
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index bdf7228..a409767 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -2926,7 +2926,7 @@ static inline void bnx2x_attn_int_deasserted0(struct bnx2x *bp, u32 attn)
REG_WR(bp, reg_offset, val);
BNX2X_ERR("FATAL HW block attention set0 0x%x\n",
- ...
| Aug 12, 11:24 am 2009 |
| Eilon Greenstein | [net-next 26/36] bnx2x: Remove SGMII configuration when ...
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 13 ++++++++++---
1 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index b81a057..c163c42 100644
--- a/drivers/net/bnx2x_link.c
+++ b/drivers/net/bnx2x_link.c
@@ -1276,14 +1276,14 @@ static void bnx2x_program_serdes(struct link_params *params,
struct bnx2x *bp = params->bp;
u16 ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 25/36] bnx2x: Keep only one HW path active
Disable bmac access while working with emac and keep the single lane SerDes in
reset while working with 4 lanes XGXS
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index dc3b69e..b81a057 100644
--- a/drivers/net/bnx2x_link.c
+++ b/drivers/net/bnx2x_link.c
@@ -397,7 +397,8 @@ ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 24/36] bnx2x: Check unzip return code
Without this check, when running out of memory, we will see PSOD's in
bnx2x_init_fill() when doing a memset(). This is because at that time,
bp->gunzip_buf is not pointing to a valid allocated space.
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index cfcc4ee..656ee97 100644
--- ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 22/36] bnx2x: Updating regdump_len at drvinfo
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 69 ++++++++++++++++++++++------------------------
1 files changed, 33 insertions(+), 36 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index 347036f..dd9a77f 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -8946,50 +8946,15 @@ static int bnx2x_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
return 0;
}
-#define ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 13/36] bnx2x: Removing old PHY FW upgrade code
This code should not have resided in the driver. Now that we have a new
interface, this logic can reside in the application that wishes to upgrade the
PHY FW
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 427 ----------------------------------------------
drivers/net/bnx2x_link.h | 2 -
2 files changed, 0 insertions(+), 429 deletions(-)
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index 98e3e8f..dc3b69e 100644
--- ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 19/36] bnx2x: Stop loading if error condition ...
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x.h | 1 +
drivers/net/bnx2x_main.c | 8 ++++++++
2 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/drivers/net/bnx2x.h b/drivers/net/bnx2x.h
index 004f4a8..633acca 100644
--- a/drivers/net/bnx2x.h
+++ b/drivers/net/bnx2x.h
@@ -89,6 +89,7 @@
} while (0)
#else
#define bnx2x_panic() do { \
+ bp->panic = 1; \
BNX2X_ERR("driver assert\n"); ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 18/36] bnx2x: Calling pci_set_drvdata earlier
In case of error, bnx2x_init_dev calls pci_set_drvdata(pdev, NULL)
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index e9e4349..9eea52d 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -11884,14 +11884,14 @@ static int __devinit bnx2x_init_one(struct ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 21/36] bnx2x: Move printing of version from pr ...
Move printing of version from probe to the init function
Rather than checking if this is the first module probe call to print
the version of the driver only once, the statement is moved to the init
function of the module where init is only called once
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/bnx2x_main.c ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 20/36] bnx2x: Combine get_pcie_width and get_p ...
The functions bnx2x_get_pcie_width() and bnx2x_get_pcie_speed() were
combined into bnx2x_get_pcie_width_speed() so that there is only
1 PCI read to PCICFG_OFFSET + PCICFG_LINK_CONTROL rather than 2 reads.
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 34 ++++++++++++++++------------------
1 files changed, 16 insertions(+), 18 deletions(-)
diff --git a/drivers/net/bnx2x_main.c ...
| Aug 12, 11:23 am 2009 |
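The idea behind patch 20/36, reading the link register once and decoding both fields from the same value, can be sketched in plain C. This is only an illustration: the mask and shift values below are assumptions, not copied from the patch, and `decode_pcie_link` is a hypothetical name.

```c
#include <stdint.h>

/* Assumed field layout of the 32-bit value read from the link register.
 * These masks and shifts are illustrative, not taken from the patch. */
#define LINK_SPEED_MASK  0x000f0000u
#define LINK_SPEED_SHIFT 16
#define LINK_WIDTH_MASK  0x01f00000u
#define LINK_WIDTH_SHIFT 20

struct pcie_link {
    uint32_t width; /* number of lanes */
    uint32_t speed; /* link speed generation code */
};

/* Decode both fields from a single register value, so the caller
 * needs only one configuration-space read instead of two. */
struct pcie_link decode_pcie_link(uint32_t reg)
{
    struct pcie_link link;

    link.width = (reg & LINK_WIDTH_MASK) >> LINK_WIDTH_SHIFT;
    link.speed = (reg & LINK_SPEED_MASK) >> LINK_SPEED_SHIFT;
    return link;
}
```

A caller would perform the one register read, pass the raw value in, and use both `width` and `speed` from the returned struct.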
| Eilon Greenstein | [net-next 17/36] bnx2x: Configurable pause scheme
When a given ring is running out of space, the FW can send pause towards the
network. When working with multi-queues, one queue running out of space
can block all other queues. The preferred scheme is to send pause frames only
when running out of the shared internal chip buffers; if a given queue cannot
place a packet on the host, it will drop it. Since some users might want to work
in drop-less mode, allow changing the behavior through a module parameter.
Signed-off-by: Eilon ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 16/36] bnx2x: Adding Likely directive
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index 594168a..c4427ef 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -1612,7 +1612,8 @@ static int bnx2x_rx_int(struct bnx2x_fastpath *fp, int budget)
skb = new_skb;
- } else if ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 15/36] bnx2x: Prefetch the page containing the ...
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index e6fde5e..594168a 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -1497,6 +1497,13 @@ static int bnx2x_rx_int(struct bnx2x_fastpath *fp, int budget)
bd_prod = RX_BD(bd_prod);
bd_cons = ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 14/36] bnx2x: Reporting host statistics to man ...
This is required for NCSI statistics
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x.h | 1 +
drivers/net/bnx2x_main.c | 228 +++++++++++++++++++++++++++++++++++-----------
2 files changed, 176 insertions(+), 53 deletions(-)
diff --git a/drivers/net/bnx2x.h b/drivers/net/bnx2x.h
index 903c89d..1d0b727 100644
--- a/drivers/net/bnx2x.h
+++ b/drivers/net/bnx2x.h
@@ -777,6 +777,7 @@ struct bnx2x_slowpath {
struct nig_stats nig_stats;
struct ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 12/36] bnx2x: Supporting PHY FW upgrade
There are 3 operations that the driver needs to support to allow applications to
access the PHY FW (on top of the MDC/MDIO access). Since those are essentially
nvram access commands, adding them to the ethtool -E interface.
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 21 ++++++-------
drivers/net/bnx2x_link.h | 4 ++
drivers/net/bnx2x_main.c | 72 ++++++++++++++++++++++++++++++++++-----------
3 files changed, 67 insertions(+), 30 ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 11/36] bnx2x: MDC/MDIO CL45 IOCTLs
As suggested by Ben Hutchings <bhutchings@solarflare.com>, using the MDC/MDIO
IOCTL
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/Kconfig | 1 +
drivers/net/bnx2x.h | 3 +
drivers/net/bnx2x_main.c | 121 ++++++++++++++++++++++++++++++++--------------
3 files changed, 88 insertions(+), 37 deletions(-)
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 9948fa2..29935a9 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2722,6 ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 10/36] bnx2x: Adding XAUI CL73 autoneg support
Adding CL73 support to the built in PHY in the 5771x device. Also supporting
fallbacks to CL73 if the link partner does not respond.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 197 +++++++++++++++++++++++++++++++++++++++-------
drivers/net/bnx2x_reg.h | 13 +++
2 files changed, 180 insertions(+), 30 deletions(-)
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 04/36] bnx2x: Supporting Device Control Channel
In multi-function mode, the FW can receive special management control commands
to set the Min/Max BW and the function link state
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x.h | 6 +
drivers/net/bnx2x_hsi.h | 34 +++++-
drivers/net/bnx2x_main.c | 325 +++++++++++++++++++++++++++++++++-------------
3 files changed, 268 insertions(+), 97 deletions(-)
diff --git a/drivers/net/bnx2x.h b/drivers/net/bnx2x.h
index 16ccba8..5864ae2 100644
--- ...
| Aug 12, 11:22 am 2009 |
| Eilon Greenstein | [net-next 06/36] bnx2x: BCM8481 LED4 instead of LASI
The BCM8481 does not generate LASI interrupt for 10M, 100M and 1G link, so we
are using LED4 output as the interrupt input to the 57711. This requires some
adaptation in the link interrupt routines
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 498 ++++++++++++++++++++++++++++++++++++++++------
drivers/net/bnx2x_reg.h | 32 +++
2 files changed, 468 insertions(+), 62 deletions(-)
diff --git ...
| Aug 12, 11:22 am 2009 |
| Eilon Greenstein | [net-next 09/36] bnx2x: BCM8727 FW load
The BCM8727 is a dual port PHY. The FW must be loaded in a given order on all
designs - including those which swapped the ports (calling port number zero the
second port)
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 16 +++++++++-------
1 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index aee9fff..db4f3c0 100644
--- ...
| Aug 12, 11:23 am 2009 |
| Eilon Greenstein | [net-next 07/36] bnx2x: Reading the FW version of the BC ...
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 118 ++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 113 insertions(+), 5 deletions(-)
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index 9e1f19a..c925249 100644
--- a/drivers/net/bnx2x_link.c
+++ b/drivers/net/bnx2x_link.c
@@ -2046,6 +2046,111 @@ static void bnx2x_save_bcm_spirom_ver(struct bnx2x *bp, u8 port,
...
| Aug 12, 11:22 am 2009 |
| Eilon Greenstein | [net-next 08/36] bnx2x: get_ext_phy_fw_version returns N ...
To avoid confusion, if the PHY does not have a FW (and so, no FW version) make
sure that the string is NULL.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_link.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index c925249..aee9fff 100644
--- a/drivers/net/bnx2x_link.c
+++ b/drivers/net/bnx2x_link.c
@@ -5393,7 +5393,7 @@ u8 ...
| Aug 12, 11:22 am 2009 |
| Eilon Greenstein | [net-next 05/36] bnx2x: Advertize flow control normally ...
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x_main.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
index 442ba61..47b687b 100644
--- a/drivers/net/bnx2x_main.c
+++ b/drivers/net/bnx2x_main.c
@@ -2148,9 +2148,7 @@ static u8 bnx2x_initial_phy_init(struct bnx2x *bp, int load_mode)
/* Initialize link parameters structure variables */
/* It is recommended to turn off RX FC for ...
| Aug 12, 11:22 am 2009 |
| Eilon Greenstein | [net-next 02/36] bnx2x: Using the new FW
The new FW improves the packets per second rate. It required a lot of change in
the FW which implies many changes in the driver to support it. It is now also
possible for the driver to use a separate MSI-X vector for Rx and Tx - this also
adds some to the complexity of this change.
All things said - after this patch, practically all performance metrics show
improvement.
Though Vladislav Zolotarov is not signed on this patch, he did most of the job
and deserves credit for that.
Signed-off-by: ...
| Aug 12, 11:20 am 2009 |
| Eilon Greenstein | [net-next 00/36] bnx2x patch series
Hi Dave,
Here is a patch series for the bnx2x. This patch series also replaces the
FW, so it contains two big blobs - the new FW and the removal of the old one.
Those patches do not contain anything but the ihex - the actual change to
the driver is in patch number 2, which is small enough to fit the mailing list.
For those who wish to see all the patches, including the ihex, I also
updated http://linux.broadcom.com/eilong/ to contain this patch series.
Please consider applying to ...
| Aug 12, 11:19 am 2009 |
| David Miller | Re: [net-next 00/36] bnx2x patch series
Come on... 36 patches? :-/
Please trickle changes in, don't send patch bombs. I think
I've told you this not once, but several times. But I keep
seeing these sizable patch sets.
I frankly don't care that it might not mesh well with how you code up
and validate changes internally, because it absolutely does NOT work
well for how bugs really get found and fixed upstream.
If you trickle changes in, the guilty change is obvious to spot and it
gets found before you do more development that ...
| Aug 12, 2:50 pm 2009 |
| Jens Rosenboom | [PATCH] ipv6: Log the explicit address that triggered DA ...
If an interface has multiple addresses, the current message for DAD
failure isn't really helpful, so this patch adds the address itself to
the printk.
Signed-off-by: Jens Rosenboom <jens@mcbone.net>
---
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 43b3c9f..01a4b25 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1403,8 +1403,8 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp)
struct inet6_dev *idev = ifp->idev;
if ...
| Aug 12, 7:58 am 2009 |
| Jens Rosenboom | [RFC] ipv6: Change %pI6 format to output compacted addresses?
Currently the output looks like 2001:0db8:0000:0000:0000:0000:0000:0001
which might be compacted to 2001:db8::1. The code to do this could be
adapted from inet_ntop in glibc, which would add about 80 lines to
lib/vsprintf.c. How do you guys value the tradeoff between more readable
logging and increased kernel size?
This was already mentioned in
http://kerneltrap.org/mailarchive/linux-netdev/2008/11/25/4231684 but
no one seems to have taken it up.
--
| Aug 12, 8:39 am 2009 |
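The compaction discussed in this RFC can be observed from userspace, where `inet_ntop()` already emits the short form. The sketch below only illustrates the desired output; it is not the proposed `lib/vsprintf.c` change, and `compact_ipv6` is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a textual IPv6 address and rewrite it in
 * the compacted form produced by inet_ntop().
 * Returns 0 on success, -1 if the input does not parse or the output
 * buffer is too small. */
int compact_ipv6(const char *text, char *out, size_t out_len)
{
    struct in6_addr addr;

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;
    if (inet_ntop(AF_INET6, &addr, out, (socklen_t)out_len) == NULL)
        return -1;
    return 0;
}
```

Passing the fully expanded address from the message above with a buffer of `INET6_ADDRSTRLEN` bytes should yield the compacted `2001:db8::1` form.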
| Dan Smith | [PATCH] c/r: Add AF_UNIX support (v9)
This patch adds basic checkpoint/restart support for AF_UNIX sockets. It
has been tested with a single and multiple processes, and with data inflight
at the time of checkpoint. It supports socketpair()s, path-based, and
abstract sockets.
Changes in v9:
- Fix double-free of skb's in the list and target holding queue in the
error path of sock_copy_buffers()
- Adjust use of ckpt_read_string() to match new signature
Changes in v8:
- Fix stale dev_alloc_skb() from before the ...
| Aug 12, 8:12 am 2009 |
| Shin Hong | net/unix : possible race bug at unix_create1()
Hi. I am reporting a possible race bug at unix_create1()
in net/unix/af_unix.c of Linux 2.6.30.4.
Concurrent executions of the unix_create1() function
in two different threads may result in a race condition
when unix_nr_socks + 1 == 2 * get_max_files().
It is possible that neither thread passes the if-condition
check if both atomic_inc() operations are executed
before either check.
It seems that it would be better to combine the two
atomic operations into one atomic_inc_return().
Please examine the code and ...
| Aug 12, 8:00 am 2009 |
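The suggestion in this report, using the value returned by the increment itself instead of a separate read, can be sketched in userspace with C11 atomics. This is a minimal illustration under assumptions, not the kernel code: `nr_socks`, `sock_count_reserve` and `sock_count_release` are hypothetical names, and `atomic_fetch_add` stands in for the kernel's `atomic_inc_return()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global socket counter. */
static atomic_long nr_socks;

/* Try to account for one more socket under `limit`.
 * atomic_fetch_add returns the previous value, so old + 1 is exactly
 * the count this caller produced. No other thread's increment can slip
 * in between the increment and the comparison. */
bool sock_count_reserve(long limit)
{
    long mine = atomic_fetch_add(&nr_socks, 1) + 1;

    if (mine > limit) {
        /* Over the limit: undo our own increment and fail. */
        atomic_fetch_sub(&nr_socks, 1);
        return false;
    }
    return true;
}

/* Drop one previously reserved slot. */
void sock_count_release(void)
{
    atomic_fetch_sub(&nr_socks, 1);
}
```

With a separate increment and read, two threads at `limit - 1` can both observe `limit + 1` and both fail; here one of them is guaranteed to see `limit` and succeed.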
| Shin Hong | a question on packet_sock struct
Hi. I have a question on packet_sock struct
defined in net/packet/af_packet.c of Linux 2.6.30.4.
Is it necessary to hold a packet_sock's bind_lock
before accessing its ifindex field?
According to the definition of packet_sock struct,
it seems that an access to ifindex field should be
synchronized by the bind_lock.
However, according to its usage in the code,
ifindex accesses are not consistently protected
by the bind_lock.
Thank you
Sincerely
Shin Hong
--
| Aug 11, 11:43 pm 2009 |
| Rusty Russell | Re: Page allocation failures in guest
Subject: virtio: net refill on out-of-memory
If we run out of memory, use keventd to fill the buffer. There's a
report of this happening: "Page allocation failures in guest",
Message-ID: <20090713115158.0a4892b0@mjolnir.ossman.eu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -71,6 +71,9 @@ struct virtnet_info
struct sk_buff_head recv;
struct ...
| Aug 11, 10:31 pm 2009 |
| Avi Kivity | Re: Page allocation failures in guest
schedule_delayed_work()?
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
--
| Aug 11, 10:41 pm 2009 |
| Rusty Russell | Re: Page allocation failures in guest
Hmm, might as well, although this is v. unlikely to happen.
Thanks,
Rusty.
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,7 +72,7 @@ struct virtnet_info
struct sk_buff_head send;
/* Work struct for refilling if we run low on memory. */
- struct work_struct refill;
+ struct delayed_work refill;
/* Chain pages by the private ptr. */
struct page *pages;
@@ -402,19 +402,16 @@ static void ...
| Aug 11, 11:56 pm 2009 |
| Gregory Haskins | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
Only a quick review for now. Will look closer later.
This seems odd. If you have the flush to act as a sync-barrier, why do
you also need rcu_dereference(sock)? At first blush, it seems
| Aug 11, 5:06 pm 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
It inserts memory barriers on architectures that require them
(currently only the Alpha), and, more importantly, documents
I don't think so. sync-barrier has nothing to do with it as it comes
--
| Aug 12, 2:02 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
Good idea. Thanks!
--
MST
--
| Aug 12, 3:52 am 2009 |
| Gregory Haskins | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
Is it? I don't see where RCU actually enters the equation if you are
not bracketing the dereference with rcu_read_lock/unlock. I'm sure you
have this correct, it's just that it goes against my understanding of how
to use RCU properly so I am trying to understand what you did. I'm
I still am not seeing it. Even rcupdate.h says:
/**
* rcu_dereference - fetch an RCU-protected pointer in an
* RCU read-side critical section. This pointer may later
* be safely dereferenced.
note ...
| Aug 12, 6:01 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
Here's a thesis on what rcu_dereference does (besides documentation):
reader does this
A: sock = n->sock
B: use *sock
Say writer does this:
C: newsock = allocate socket
D: initialize(newsock)
E: n->sock = newsock
F: flush
On Alpha, reads could be reordered. So, on smp, command A could get
data from point F, and command B - from point D (uninitialized, from
cache). IOW, you get fresh pointer but stale data.
Heh, if readers are lockless and writer does ...
| Aug 12, 6:25 am 2009 |
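The "fresh pointer but stale data" hazard in steps A through F above can be sketched in portable userspace C. This is only an illustration under stated assumptions: it uses C11 atomics rather than the kernel's `rcu_assign_pointer()`/`rcu_dereference()`, it uses an acquire load (stronger than the dependency ordering the kernel primitive relies on), and the names `fake_sock`, `fake_net`, `publish_sock` and `read_sock_fd` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket object discussed in the thread. */
struct fake_sock {
    int initialized;
    int fd;
};

/* Hypothetical stand-in for the structure that holds the shared pointer. */
struct fake_net {
    _Atomic(struct fake_sock *) sock;
};

/* Writer, steps C-E: initialize the object fully, then publish it.
 * The release store orders the initialization before the pointer store,
 * which is roughly the job rcu_assign_pointer() does in the kernel. */
static void publish_sock(struct fake_net *n, struct fake_sock *s, int fd)
{
    s->fd = fd;
    s->initialized = 1;
    atomic_store_explicit(&n->sock, s, memory_order_release);
}

/* Reader, steps A-B: fetch the pointer, then use the object.
 * The acquire load guarantees that reads through the pointer observe the
 * writer's initialization, so a fresh pointer never exposes stale data. */
static int read_sock_fd(struct fake_net *n)
{
    struct fake_sock *s = atomic_load_explicit(&n->sock, memory_order_acquire);

    if (s == NULL)
        return -1;                  /* nothing published yet */
    return s->initialized ? s->fd : -2;
}
```

With a plain load and store instead of the release/acquire pair, a weakly ordered CPU would be allowed to let the reader see the new pointer before the initialized fields, which is the reordering described for Alpha.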
| Gregory Haskins | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
Yes, that is understood. Perhaps you should just use a normal barrier,
however. (Or at least a comment that says "I am just using this for its
More correctly: it "smells like" RCU, but it's not. ;) It's rcu-like,
but you are not really using the rcu facilities. I think anyone that
knows RCU and reads your code will likely be scratching their heads as well.
It's probably not a big deal, as I understand your code now. Just a
suggestion to help clarify it.
Regards,
-Greg
| Aug 12, 6:41 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
OK, I'll add some comments about that.
Thanks for the review!
--
MST
--
| Aug 12, 6:47 am 2009 |
| Paul E. McKenney | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
If you are using call_rcu(), synchronize_rcu(), or one of the
similar primitives, then you absolutely need rcu_read_lock() and
rcu_read_unlock(), or one of the similar pairs of primitives.
If you -don't- use rcu_read_lock(), then you are pretty much restricted
to adding data, but never removing it.
Make sense? ;-)
Thanx, Paul
--
| Aug 12, 7:11 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
Since I only access data from a workqueue, I replaced synchronize_rcu
with workqueue flush. That's why I don't need rcu_read_lock.
--
MST
--
| Aug 12, 7:15 am 2009 |
| Paul E. McKenney | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
Well, you -do- need -something- that takes on the role of rcu_read_lock(),
and in your case you in fact actually do. Your equivalent of
rcu_read_lock() is the beginning of execution of a workqueue item, and
the equivalent of rcu_read_unlock() is the end of execution of that same
workqueue item. Implicit, but no less real.
If a couple more uses like this show up, I might need to add this to
Documentation/RCU. ;-)
Thanx, Paul
--
| Aug 12, 8:26 am 2009 |
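Paul's point, that the start and end of a work item act as the implicit read-side markers while the flush stands in for `synchronize_rcu()`, can be modelled with a toy, single-threaded queue. This is a sketch of the ordering only, not of the kernel workqueue API: every name here (`toy_sock`, `queue_work_item`, `flush_work_items`, `replace_sock`) is hypothetical, and real work items run concurrently with the writer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_WORK 8

struct toy_sock { int id; };

/* Shared pointer read by work items; hypothetical stand-in for n->sock. */
static struct toy_sock *current_sock;

/* A tiny FIFO of pending work items. Each item runs to completion,
 * so its start and end bracket the read-side critical section. */
typedef void (*work_fn)(void);
static work_fn pending[MAX_WORK];
static size_t n_pending;

static int last_seen_id;    /* what the most recent work item observed */

static void rx_work(void)
{
    /* implicit "rcu_read_lock()": the work item begins */
    struct toy_sock *s = current_sock;
    last_seen_id = s ? s->id : -1;
    /* implicit "rcu_read_unlock()": the work item ends */
}

static int queue_work_item(work_fn fn)
{
    if (n_pending == MAX_WORK)
        return -1;
    pending[n_pending++] = fn;
    return 0;
}

/* Run every queued item to completion. This plays the role of the
 * workqueue flush that replaces synchronize_rcu() in the thread. */
static void flush_work_items(void)
{
    for (size_t i = 0; i < n_pending; i++)
        pending[i]();
    n_pending = 0;
}

/* Writer: publish the new socket, wait until no work item can still be
 * using the old one, and only then free it. */
static void replace_sock(struct toy_sock *new_sock)
{
    struct toy_sock *old = current_sock;

    current_sock = new_sock;
    flush_work_items();     /* "grace period" */
    free(old);
}
```

The key property is the order inside `replace_sock()`: the old object is freed only after the flush returns, so no work item that could have fetched the old pointer is still running.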
| Michael S. Tsirkin | Aug 12, 8:51 am 2009 | |
| Paul E. McKenney | Re: [PATCHv2 2/2] vhost_net: a kernel-level virtio server
And I idly wonder if this approach could replace SRCU. Probably not
for protecting the CPU-hotplug notifier chains, but worth some thought.
Thanx, Paul
--
| Aug 12, 9:06 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
1. use a dedicated network interface with SRIOV, program mac to match
that of guest (for testing, you can set promisc mode, but that is
bad for performance)
2. disable tso,gso,lro with ethtool
3. add vhost=ethX
--
MST
--
| Aug 12, 12:16 am 2009 |
| Gregory Haskins | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
Are you saying SRIOV is a requirement, and I can either program the
SRIOV adapter with a mac or use promisc? Or are you saying I can use
Out of curiosity, wouldn't you only need to disable LRO on the adapter,
since the other two (IIUC) are transmit path and are therefore
You mean via "ip link" I assume?
Regards,
-Greg
| Aug 12, 4:56 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
SRIOV is not a requirement. And you can also use a dedicated
No, that's a new flag for virtio in qemu:
--
| Aug 12, 5:05 am 2009 |
| Gregory Haskins | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
Makes sense. Got it.
I was going to add guest-to-guest to the test matrix, but I assume that
is not supported with vhost unless you have something like a VEPA
enabled bridge?
Ah, ok. Even better.
Thanks!
-Greg
| Aug 12, 5:41 am 2009 |
| Arnd Bergmann | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
If I understand it correctly, you can at least connect a veth pair
to a bridge, right? Something like
veth0 - veth1 - vhost - guest 1
eth0 - br0-|
veth2 - veth3 - vhost - guest 2
It's a bit more complicated than it needs to be, but should work fine.
Arnd <><
--
| Aug 12, 5:52 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
Presumably you mean on the same host? There were also some patches to
enable local guest to guest for macvlan, that would be a nice
software-only solution. For back to back, I just tried over veth, seems
--
| Aug 12, 6:04 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
Heh, you don't need a bridge in this picture:
guest 1 - vhost - veth0 - veth1 - vhost guest 2
--
| Aug 12, 6:06 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
Oh, hopefully macvlan will soon allow that.
--
MST
--
| Aug 12, 6:42 am 2009 |
| Arnd Bergmann | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
Sure, but the setup I described is the one that I would expect
to see in practice because it gives you external connectivity.
Measuring two guests communicating over a veth pair is
interesting for finding the bottlenecks, but of little
practical relevance.
Arnd <><
--
| Aug 12, 6:40 am 2009 |
| Gregory Haskins | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
Yeah, this would be the config I would be interested in.
Regards,
-Greg
| Aug 12, 6:51 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
Hmm, this wouldn't be the config to use for the benchmark though: there
are just too many variables. If you want both guest to guest and guest
to host, create 2 nics in the guest.
Here's one way to do this:
-net nic,model=virtio,vlan=0 -net user,vlan=0
-net nic,vlan=1,model=virtio,vhost=veth0
-redir tcp:8022::22
-net nic,model=virtio,vlan=0 -net user,vlan=0
-net nic,vlan=1,model=virtio,vhost=veth1
-redir tcp:8023::22
In guests, for simplicity, configure eth1 and eth0
to use ...
| Aug 12, 7:02 am 2009 |
| Gregory Haskins | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
I can try to do a few variations, but what I am interested in is
performance in a real-world L2 configuration. This would generally mean
all hosts (virtual or physical) in the same L2 domain.
If I get a chance, though, I will try to also wire them up in isolation
as another data point.
Regards,
-Greg
| Aug 12, 9:13 am 2009 |
| Michael S. Tsirkin | Re: [PATCHv2 0/2] vhost: a kernel-level virtio server
Or patch macvlan to support guest to guest:
http://markmail.org/message/sjy74g57qsvdo2wh
That patch needs to be updated to support guest to guest multicast,
but it seems functional enough for your purposes.
--
MST
--
| Aug 12, 9:37 am 2009 |
| David Miller | Re: pull request: wireless-2.6 2009-08-11
From: "John W. Linville" <linville@tuxdriver.com>
Ok, I'll take care of that.
--
| Aug 12, 2:52 pm 2009 |
| John W. Linville | Re: pull request: wireless-2.6 2009-08-11
Dave,
When you pull this, could you also revert 57921c31 ("libertas: Read
buffer overflow"). It has been shown to create a new problem. There
is work towards a solution to that one, but it isn't a simple clean-up.
If you would prefer, I can include the revert with another pull
request or even regenerate this one. Just let me know if that is
what you prefer.
Thanks,
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we ...
| Aug 12, 11:24 am 2009 |
| Bob Dunlop | Re: [PATCH] libertas: name the network device wlan%d
Well I've been applying the equivalent of this patch privately since
we started using the libertas driver. We build systems with one or two
wired Ethernets and then an optional wireless module.
A fixed name wlan0 is a lot easier than explaining to a user that the
interface might be eth1 or eth2 depending on which model they have, or
that eth1 might be wired or wireless. It also simplifies scripts and
configuration file handling.
I'm sure there are many other solutions for big systems but ...
| Aug 12, 12:56 am 2009 |
| Daniel Mack | Re: [PATCH] libertas: name the network device wlan%d
Yes, our story here is very similar :)
--
| Aug 12, 1:55 am 2009 |
| Dan Williams | Re: [PATCH] libertas: name the network device wlan%d
I don't care either way, it's completely historical. Most of the
fullmac drivers used 'eth' back when. Might want to get buy-in from the
OLPC crew since they probably have the most deployed units using
libertas (cc-ed Daniel Drake).
Daniel, is it a problem for you guys if the libertas wifi interface name
went from 'eth' -> 'wlan' ? Mesh name would be unchanged.
Dan
--
| Aug 12, 9:31 am 2009 |
| Stephen Hemminger | Re: [PATCH] Fix Warnings from net/netlink/genetlink.c
On Tue, 11 Aug 2009 16:57:41 -0700
Agreed, and the line numbers are off.
--
--
| Aug 11, 8:24 pm 2009 |
| Stephen Rothwell | Re: [PATCH] Fix Warnings from net/netlink/genetlink.c
Hi all,
In the -next tree, it looks like this:
int genl_register_mc_group(struct genl_family *family,
struct genl_multicast_group *grp)
{
int id;
unsigned long *new_groups;
int err;
BUG_ON(grp->name[0] == '\0');
genl_lock();
/* special-case our own group */
if (grp == &notify_grp)
id = GENL_ID_CTRL;
else
id = find_first_zero_bit(mc_groups,
mc_groups_longs * BITS_PER_LONG);
if (id >= mc_groups_longs * BITS_PER_LONG) {
size_t nlen = ...
| Aug 11, 8:50 pm 2009 |
| Marcel Holtmann | Re: [PATCH] Fix Warnings from net/netlink/genetlink.c
it would have been nice if the patch actually indicates that it is for
-next since otherwise just shutting up a compiler warning is a bad idea
I prefer we add a err = 0 in the if (family->netnsok) { block instead of
just globally setting it to a value.
Regards
Marcel
--
| Aug 11, 9:03 pm 2009 |
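Marcel's preference for setting `err = 0` inside the branch, rather than at the declaration, can be illustrated with a small standalone function. This is a hypothetical sketch, not the genetlink code: `register_group`, `netns_ok` and `fail_legacy` are made-up names chosen for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical two-path registration helper; names are illustrative only. */
static int register_group(int netns_ok, int fail_legacy)
{
    int err;    /* deliberately not initialized at the declaration */

    if (netns_ok) {
        /* This path makes no call that can fail, so success is stated
         * right here, where the reader can see why it is zero. */
        err = 0;
    } else {
        /* This path takes its result from the operation it performs. */
        err = fail_legacy ? -ENOMEM : 0;
    }
    return err;
}
```

Writing `int err = 0;` at the top would also silence the warning, but it would hide any future path that forgets to assign a result. Leaving the declaration uninitialized keeps the compiler's uninitialized-variable diagnostics useful.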
| Rusty Russell | Re: [PATCH] fix memory leak in virtio_net
Nope, kfree_skb() frees the frags.
It needs to, otherwise we leak on every received packet!
Cheers,
Rusty.
--
| Aug 12, 5:41 am 2009 |
| Serge E. Hallyn | Re: module loading permissions and request_module permis ...
Right, so taking a more extreme example, the request_module() in
search_binary_handler... requiring CAP_SYS_MODULE there would mean
you'd have to be privileged to be the first to execute say a
binfmt_misc.
The actual modules are to be protected by protecting /lib/modules
and /sbin/modprobe themselves. So long as those are properly
protected, the ability to cause a call to __request_module() at most
takes up more memory.
So what you say seems to make sense.
-serge
--
| Aug 12, 4:48 pm 2009 |
| Arnd Bergmann | Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
We discussed this before, and I still think this could be directly derived
from struct virtqueue, in the same way that vring_virtqueue is derived from
struct virtqueue. That would make it possible for simple device drivers
to use the same driver in both host and guest, similar to how Ira Snyder
used virtqueues to make virtio_net run between two hosts running the
same code [1].
Ideally, I guess you should be able to even make virtio_net work in the
host if you do that, but that could bring ...
| Aug 12, 10:03 am 2009 |
| Ira W. Snyder | Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
I have no comments about the vhost code itself, I haven't reviewed it.
It might be interesting to try using a virtio-net in the host kernel to
communicate with the virtio-net running in the guest kernel. The lack of
a management interface is the biggest problem you will face (setting MAC
addresses, negotiating features, etc. doesn't work intuitively). Getting
the network interfaces talking is relatively easy.
Ira
--
| Aug 12, 10:19 am 2009 |
| Michael S. Tsirkin | Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
I prefer keeping it simple. Much of abstraction in virtio is due to the
fact that it needs to work on top of different hardware emulations:
lguest,kvm, possibly others in the future. vhost is always working on
I don't think so. For example, there's a callback field that gets
invoked in guest when buffers are consumed. It could be overloaded to
mean "buffers are available" in host but you never handle both
As I pointed out earlier, most code in virtio net is asymmetrical: guest
provides ...
| Aug 12, 10:21 am 2009 |
| Michael S. Tsirkin | Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
That was one of the reasons I decided to move most of code out to
userspace. My kernel driver only handles datapath,
Tried this, but
- guest memory isn't pinned, so copy_to_user
to access it, errors need to be handled in a sane way
- used/available roles are reversed
- kick/interrupt roles are reversed
So most of the code then looks like
if (host) {
} else {
}
return
The only common part is walking the descriptor list,
but that's like 10 lines of code.
At which point ...
| Aug 12, 10:31 am 2009 |
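For a sense of scale, the shared "walking the descriptor list" part mentioned above might look roughly like the following. It is a simplified userspace sketch, not the vhost or virtio_ring code: `struct desc` mirrors the layout of `struct vring_desc`, `DESC_F_NEXT` mirrors `VRING_DESC_F_NEXT`, and `chain_len` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_F_NEXT 1u   /* mirrors VRING_DESC_F_NEXT from the virtio ring ABI */

/* Simplified descriptor, laid out like struct vring_desc. */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Walk the chain starting at `head` and return its total byte length,
 * or -1 if the chain leaves the table or loops back on itself. */
static long chain_len(const struct desc *table, size_t n, uint16_t head)
{
    long total = 0;
    size_t seen = 0;
    uint16_t i = head;

    for (;;) {
        if (i >= n || seen++ >= n)
            return -1;              /* out of range or cyclic chain */
        total += table[i].len;
        if (!(table[i].flags & DESC_F_NEXT))
            return total;           /* last descriptor in the chain */
        i = table[i].next;
    }
}
```

The bound on `seen` matters on the host side in particular, because the descriptor table lives in guest memory and cannot be trusted to be well formed.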
| Ira W. Snyder | Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
Ok, that makes sense. Let me see if I understand the concept of the
driver. Here's a picture of what makes sense to me:
guest system
---------------------------------
| userspace applications |
---------------------------------
| kernel network stack |
---------------------------------
| virtio-net |
---------------------------------
| transport (virtio-ring, etc.) |
---------------------------------
|
...
| Aug 12, 10:48 am 2009 |
| Arnd Bergmann | Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
Well, that was my point: virtio can already work on a number of abstractions,
The trick is to swap the virtqueues instead. virtio-net is actually
mostly symmetric in just the same way that the physical wires on a
twisted pair ethernet are symmetric (I like how that analogy fits).
virtio_net kicks the transmit virtqueue when it has data and
it kicks the receive queue when it has empty buffers to fill,
and it has callbacks when the two are done. You can do the
same in both the guest and the ...
| Aug 12, 10:59 am 2009 |
| Anthony Liguori | Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
Actually, vhost may not always be limited to real hardware.
We may one day use vhost as the basis of a driver domain. There's quite
a lot of interest in this for networking.
At any rate, I'd like to see performance results before we consider
trying to reuse virtio code.
Regards,
Anthony Liguori
--
| Aug 12, 12:22 pm 2009 |
| Anthony Liguori | Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
It's already been done between two guests. See
http://article.gmane.org/gmane.linux.kernel.virtualization/5423
Regards,
Anthony Liguori
--
| Aug 12, 12:27 pm 2009 |
| Paul E. McKenney | Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
Much better -- a couple of documentation nits below.
How about something like "Therefore the beginning of workqueue
execution acts as rcu_read_lock() and the end of workqueue execution
acts as rcu_read_unlock()"?
It would also be good to add comments to the workqueue functions
themselves saying that they act as read-side critical sections for
your kind of RCU.
--
| Aug 12, 12:58 pm 2009 |
| Paul Moore | Re: [RFC PATCH v2 2/2] selinux: Support for the new TUN ...
Thanks, I added both acks.
--
paul moore
linux @ hp
--
| Aug 12, 7:59 am 2009 |
| Serge E. Hallyn | Aug 12, 12:28 pm 2009 | |
| Paul Moore | Re: [RFC PATCH v2 1/2] lsm: Add hooks to the TUN driver
Thanks.
--
paul moore
linux @ hp
--
| Aug 12, 12:43 pm 2009 |
| Serge E. Hallyn | Re: [RFC PATCH v2 2/2] selinux: Support for the new TUN ...
IIUC it is possible for multiple processes to attach to the same
tun device. Will it get confusing/incorrect to have each attach
--
| Aug 12, 3:14 pm 2009 |
| Paul Moore | Re: [RFC PATCH v2 2/2] selinux: Support for the new TUN ...
I may be reading the code wrong, but in drivers/net/tun.c:tun_attach() the
code checks to see if the TUN device is already in use and if it is then the
attach fails with -EBUSY (check where the tun_device->tfile is examined). I
believe this should ensure that only one process at a time has access to the
TUN device so we shouldn't have to worry about a TUN socket getting relabeled
while it is currently in use. As far as persistent TUN devices getting
relabeled when a new process ...
| Aug 12, 3:55 pm 2009 |
| Serge E. Hallyn | Re: [RFC PATCH v2 2/2] selinux: Support for the new TUN ...
Ah yes, you're right - I saw the check for (ifr->ifr_flags & IFF_TUN_EXCL) in
Ok, thanks. To my untrained eye the class addition looks right too, so
with the trivial change:
Acked-by: Serge Hallyn <serue@us.ibm.com>
thanks,
-serge
--
| Aug 12, 4:07 pm 2009 |
| Oren Laadan | Re: [PATCH 5/5] c/r: Add AF_UNIX support (v8)
Dan,
I just noticed that this message wasn't posted last night.
So hitting "send" now ... sorry about that.
-----
Before pulling this one, I took a quick look at this patch, and
I saw that it still uses skb_morph despite the changelog and my
memory...
Can you please verify that this is the latest ?
Also, while trying to pull it, I'd like to ask for three cosmetic
changes, if it isn't too much -
1) Move 'struct ckpt_hdr_socket' et-al to checkpoint_hdr.h
2) Move everything that is ...
| Aug 12, 8:29 am 2009 |
| Dan Smith | Re: [PATCH 5/5] c/r: Add AF_UNIX support (v8)
OL> Before pulling this one, I took a quick look at this patch, and I
OL> saw that it still uses skb_morph despite the changelog and my
OL> memory...
That's correct. We've been through several ways of allocating the
skb's, so it's definitely confusing. We're back to skb_morph()
because I'm pre-allocating them for lock safety when traversing the
queues. The only thing that is allocated is the actual skb structure
itself; the buffers are still shared like with skb_clone().
OL> 1) Move ...
| Aug 12, 8:36 am 2009 |
| Oren Laadan | Aug 12, 12:19 pm 2009 | |
| Dmitry Eremin-Solenikov | Re: [PATCH 1/2] mac802154: add a software MAC 802.15.4 i ...
Currently we do all the work from special worker threads, so it's
possible for this callback to sleep. The error isn't yet propagated to
We were using master netdevices for several purposes:
1) ip link add link mwpanX type wpan, so that we have out-of-box support
for radio additions. That's a really nice thing to have.
2) for SOCK_RAW implementation that can be used to send raw packets
over-the-air/receive raw packets. I think we can use af_packet for
this, but I'm still not sure ...
| Aug 12, 6:06 am 2009 |
| Johannes Berg | Re: [PATCH 1/2] mac802154: add a software MAC 802.15.4 i ...
-----BEGIN PGP ...
| Aug 12, 6:13 am 2009 |
| Dmitry Eremin-Solenikov | Re: [PATCH 1/2] mac802154: add a software MAC 802.15.4 i ...
Hmmm. Really weird. Then, if we want to pass data from socket layer to
MAC layer, we should place data in skb->data and not in skb->cb (like
Nice idea. Thanks a lot!
--
With best wishes
Dmitry
--
| Aug 12, 1:46 pm 2009 |
| Brandeburg, Jesse | RE: Receive side performance issue with multi-10-GigE and NUMA
bill, I recently helped Jesse Barnes push a patch that addresses this kind
of issue on CoreI7, the root cause was the numa_node variable was
initialized based on slot on AMD systems, but needed to be set to -1 by
default on systems with a uniform IOH to slot architecture.
here is the commit ID:
http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=commit;h=3c38d674be519109696746192943a6d524019f7f
I'm not sure it is in linus' tree yet, this link is to net-next
Maybe see if it ...
| Aug 11, 5:02 pm 2009 |
| Bill Fink | Re: Receive side performance issue with multi-10-GigE and NUMA
I did have NUMA enabled, and memory was configured as independent
rather than interleaved.
Based on all the discussions, it seemed a good possibility that the
BIOS was broken. Today a colleague checked the SuperMicro site, and
discovered and installed a newer version of the BIOS. Things seem
better now, but not totally correct.
There are now NUMA nodes 0 and 1 instead of 0 and 2, and the CPUs
for node 0 are 0 through 3 while the CPUs for node 1 are 4 through 7
(previously the even CPUs ...
| Aug 11, 9:30 pm 2009 |
| Bill Fink | Re: Receive side performance issue with multi-10-GigE and NUMA
It's worth a shot.
Hopefully I can get a chance to build a new kernel tomorrow to check
out some of the suggestions, like this one, the setting of ACPI_DEBUG,
and the new ftrace module for checking NUMA affinity of skbs.
-Thanks
-Bill
--
| Aug 11, 9:38 pm 2009 |
| Andi Kleen | Re: Receive side performance issue with multi-10-GigE and NUMA
That might be ok, depending on how the APICs are configured.
Of course you should have the same number of CPUs on the different
Most likely you need the appended patch from linux-next.
It should probably be in .31, but I can't see it in Linus' tree, only in -next.
Jesse?
No.
-Andi
commit eaf2f454cc9a76dbe1890af6269e60fe9978a3a5
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Fri Jul 10 14:04:30 2009 -0700
x86/PCI: initialize PCI bus node numbers early
...
| Aug 12, 12:21 am 2009 |
| Jesse Barnes | Re: Receive side performance issue with multi-10-GigE and NUMA
On Wed, 12 Aug 2009 00:38:24 -0400
It's a fairly significant change so I wasn't planning on sending it to
Linus for 2.6.31. If you think it *should* go into 2.6.31 (and stable
for that matter), please let me know soon.
Thanks,
--
Jesse Barnes, Intel Open Source Technology Center
--
| Aug 12, 9:00 am 2009 |
| David Miller | Re: Receive side performance issue with multi-10-GigE and NUMA
From: Bill Fink <billfink@mindspring.com>
This, unfortunately, won't be comprehensive. You'd also need to
kludge the NUMA node used for allocation of the skb->data buffer via
the netdev_alloc_skb() calls in myri10ge_rx_done() and friends.
This could possibly account for why, with your kludge, you still
were only getting 56.4703 Gbps
--
| Aug 12, 4:29 pm 2009 |
| Phil Sutter | Re: [PATCH] korina: Read buffer overflow
Hi,
Obviously, I took the chance to mess things up again. These three
patches were accidentally written on top of the linux-mips tree, right
before Ralf pulled from Linus. So they do not apply cleanly to the
netdev tree, and even worse the last one is completely useless since
its changes have already been implemented.
I will follow up to this email with an updated series of the two
remaining, valid patches. Sorry for the inconvenience.
Greetings, Phil
--
| Aug 12, 3:15 pm 2009 |
| Phil Sutter | [PATCH 1/2] korina: fix printk formatting, add final info line
The macro DRV_NAME contains "korina", the field dev->name points to the
actual interface name. So messages were formerly prefixed with
'korinaeth2:' (on my system).
Signed-off-by: Phil Sutter <n0-1@freewrt.org>
---
drivers/net/korina.c | 32 +++++++++++++++++---------------
1 files changed, 17 insertions(+), 15 deletions(-)
diff --git a/drivers/net/korina.c b/drivers/net/korina.c
index b4cf602..6df9d25 100644
--- a/drivers/net/korina.c
+++ b/drivers/net/korina.c
@@ -338,7 +338,7 @@ ...
| Aug 12, 3:22 pm 2009 |
| Phil Sutter | [PATCH 2/2] korina: add error-handling to korina_alloc_ring
This also avoids a potential buffer overflow in case the very first
receive descriptor fails to allocate, as an index of -1 would be used
afterwards. Kudos to Roel Kluin for pointing this out and providing an
initial patch.
Signed-off-by: Phil Sutter <n0-1@freewrt.org>
---
drivers/net/korina.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/net/korina.c b/drivers/net/korina.c
index 6df9d25..51ca54c 100644
--- a/drivers/net/korina.c
+++ ...
| Aug 12, 3:52 pm 2009 |
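The failure mode described in this changelog, where code after the allocation loop touches index `i - 1` even though the very first allocation failed, can be shown with a small standalone ring. This is a hypothetical sketch and not the korina driver: `rx_ring`, `alloc_buf` and `ring_alloc` are illustrative names, and `fail_at` exists only to force a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define RING_SIZE 4

struct rx_ring {
    void *buf[RING_SIZE];
    int   last;             /* index of the descriptor that ends the ring */
};

/* Hypothetical allocator hook so a failure can be forced for illustration. */
static void *alloc_buf(int idx, int fail_at)
{
    return idx == fail_at ? NULL : malloc(64);
}

/* Fill the ring. On failure, release what was allocated and report the
 * error instead of falling through and using index i - 1, which would be
 * -1 if the very first allocation failed. */
static int ring_alloc(struct rx_ring *r, int fail_at)
{
    int i;

    for (i = 0; i < RING_SIZE; i++) {
        r->buf[i] = alloc_buf(i, fail_at);
        if (r->buf[i] == NULL)
            goto err_free;
    }
    r->last = RING_SIZE - 1;
    return 0;

err_free:
    while (--i >= 0) {
        free(r->buf[i]);
        r->buf[i] = NULL;
    }
    r->last = -1;
    return -ENOMEM;
}
```

Returning an error code lets the caller abort the open path cleanly, rather than continuing with a partially populated ring.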
| Dave Jones | Re: 8139cp dma-debug warning.
On Thu, Aug 06, 2009 at 05:57:02PM -0400, Dave Jones wrote:
> I'm chasing yet another dma-debug warning where we're unmapping a different
> size to what we mapped.
>
> > WARNING: at lib/dma-debug.c:803 check_unmap+0x1f5/0x509() (Not tainted)
> > Hardware name:
> > 8139cp 0000:00:03.0: DMA-API: device driver frees DMA memory with different
> > size [device address=0x000000001e9f8852] [map size=1536 bytes] [unmap size=1538
> > bytes]
> > Modules linked in: ipv6 dm_multipath ...
| Aug 12, 10:13 am 2009 |
| David Miller | Re: [PATCH] net: Fix spinlock use in alloc_netdev_mq()
From: Jiri Pirko <jpirko@redhat.com>
Well, because of those potential late dev->type settings we
can't do things this way. And I believe those in fact do happen.
So I'm tossing this patch, I wouldn't have applied it to net-2.6
anyways, as it's net-next-2.6 material :-)
--
| Aug 12, 4:44 pm 2009 |
| Vlad Yasevich | Re: WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruc ...
BTW, I've seen the same issue in 2.6.28 and 2.6.29 while doing a bunch
of NFS-over-UDP testing. I've seen the issue reported in 2.6.27 as well,
but it went by ignored. It's not easy to reproduce as it seems like it
requires quite a bit of traffic over multiple interfaces.
I've been looking at this for a while and haven't caught the bugger.
Here is the stack trace from 2.6.28:
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086015] ------------[ cut here ]------------
May 13 16:17:38 ...
| Aug 12, 1:00 pm 2009 |
| David Miller | Re: [PATCH] pppoe: fix race at init time
From: Cyrill Gorcunov <gorcunov@gmail.com>
Still no feedback on this one, but it looks totally correct to me.
So I've applied it to net-next-2.6 so that it doesn't get lost and if
it turns out we need it to actually fix a user reported bug we can
toss it into net-2.6 too.
--
| Aug 12, 4:40 pm 2009 |
| Rém ... | Re: [PATCH] Phonet: sockets list through proc_fs
I simply did not think of the sole pn_sock_seq_fops as "so many things"... But
I can change it if you think it would be better.
--
Rémi Denis-Courmont
Nokia Devices R&D, Maemo Software, Helsinki
--
| Aug 12, 4:02 am 2009 |
| David Miller | Re: [PATCH] Phonet: sockets list through proc_fs
From: "Rémi Denis-Courmont" <remi.denis-courmont@nokia.com>
Maybe I'm exaggerating, but in any event the less you export from
a file the cleaner it tends to be.
--
| Aug 12, 11:06 am 2009 |
| Arnd Bergmann | Re: [PATCH][RFC] net/bridge: add basic VEPA support
Right, that question is still open, and I don't see it as very important
Not yet, but I guess it comes as a natural extension when I fix
multicast/broadcast delivery from the reflective relay for VEPA.
The logic that I would use there is:
broadcast from a downstream port:
if (bridge_mode(source_port)) {
forward_to_upstream(frame);
for_each_downstream(port) {
/* deliver to all bridge ports except self, do
not deliver to any VEPA ...
| Aug 12, 6:19 am 2009 |
| Fischer, Anna | RE: [PATCH][RFC] net/bridge: add basic VEPA support
Yes, for the basic VEPA this is not important. For MultiChannel VEPA, it
would be nice if a macvlan device could operate as VEPA and as a typical
VEB (VEB = traditional bridge but no learning).
Basically, what we would need to be able to support is running a VEB and
a VEPA simultaneously on the same uplink port (e.g. the physical device).
A new component (called the S-Component) would then multiplex frames
to the VEB or the VEPA based on a tagging scheme.
I could see this potentially ...
| Aug 12, 7:32 am 2009 |
| Arnd Bergmann | Re: [PATCH][RFC] net/bridge: add basic VEPA support
Right, this would be a logical extension in that scenario. I would imagine
that in many scenarios running a VEB also means that you want to use
the advanced ebtables/iptables filtering of the bridge subsystem, but
if all guests trust each other, using macvlan to bridge between them
You can of course do that by adding one port of the S-component to
a port of a bridge, and using another port of the S-component to
create macvlan devices, or you could have multiple ports of the
S-component each ...
| Aug 12, 9:27 am 2009 |
| previous day | today | next day |
|---|---|---|
| August 11, 2009 | August 12, 2009 | August 13, 2009 |
