Re: [PATCH linux-next v3 1/2] irq: Add CPU mask affinity hint

Previous thread: Re: [PATCH v2] [RFC] ehci: Disable wake on overcurrent (WKOC_E) and disconnect (WKDISC_E) by Alan Stern on Thursday, April 29, 2010 - 9:16 am. (1 message)

Next thread: Re: [PATCH 1/8] PM: Add suspend block api. by Alan Stern on Thursday, April 29, 2010 - 8:41 am. (1 message)
From: Peter P Waskiewicz Jr
Date: Friday, April 30, 2010 - 1:23 pm

This patch adds a cpumask affinity hint to the irq_desc
structure, along with a registration function and a read-only
proc entry for each interrupt.

This affinity_hint handle for each interrupt can be used by
underlying drivers that need a better mechanism to control
interrupt affinity.  The underlying driver can register a
cpumask for the interrupt, which will allow the driver to
provide the CPU mask for the interrupt to anything that
requests it.  The intent is to extend the userspace daemon,
irqbalance, to help hint to it a preferred CPU mask to balance
the interrupt into.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---

 include/linux/interrupt.h |    7 +++++++
 include/linux/irq.h       |    1 +
 kernel/irq/manage.c       |   19 +++++++++++++++++++
 kernel/irq/proc.c         |   42 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 75f3f00..4d1df9b 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -209,6 +209,8 @@ extern int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask);
 extern int irq_can_set_affinity(unsigned int irq);
 extern int irq_select_affinity(unsigned int irq);
 
+extern int irq_register_affinity_hint(unsigned int irq,
+                                      const struct cpumask *m);
 #else /* CONFIG_SMP */
 
 static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
@@ -223,6 +225,11 @@ static inline int irq_can_set_affinity(unsigned int irq)
 
 static inline int irq_select_affinity(unsigned int irq)  { return 0; }
 
+static inline int irq_register_affinity_hint(unsigned int irq,
+                                             const struct cpumask *m)
+{
+	return -EINVAL;
+}
 #endif /* CONFIG_SMP && CONFIG_GENERIC_HARDIRQS */
 
 #ifdef CONFIG_GENERIC_HARDIRQS
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 707ab12..83b16d7 ...
From: Peter P Waskiewicz Jr
Date: Friday, April 30, 2010 - 1:24 pm

This patch uses the new IRQ affinity_hint callback mechanism.
It serves purely as an example of how a low-level driver can
utilize this new interface.

An official ixgbe patch will be pushed through netdev once the
IRQ patches have been accepted and merged.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---

 drivers/net/ixgbe/ixgbe.h      |    2 ++
 drivers/net/ixgbe/ixgbe_main.c |   21 ++++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index 79c35ae..c220b9f 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -32,6 +32,7 @@
 #include <linux/pci.h>
 #include <linux/netdevice.h>
 #include <linux/aer.h>
+#include <linux/cpumask.h>
 
 #include "ixgbe_type.h"
 #include "ixgbe_common.h"
@@ -236,6 +237,7 @@ struct ixgbe_q_vector {
 	u8 tx_itr;
 	u8 rx_itr;
 	u32 eitr;
+	cpumask_var_t affinity_mask;
 };
 
 /* Helper macros to switch between ints/sec and what the register uses.
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 1b1419c..8f84bb8 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1083,6 +1083,16 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 			q_vector->eitr = adapter->rx_eitr_param;
 
 		ixgbe_write_eitr(q_vector);
+
+		/*
+		 * Allocate the affinity_hint cpumask, assign the mask for
+		 * this vector, and register our affinity_hint for this irq.
+		 */
+		if (!alloc_cpumask_var(&q_vector->affinity_mask, GFP_KERNEL))
+			return;
+		cpumask_set_cpu(v_idx, q_vector->affinity_mask);
+		irq_register_affinity_hint(adapter->msix_entries[v_idx].vector,
+		                           q_vector->affinity_mask);
 	}
 
 	if (adapter->hw.mac.type == ixgbe_mac_82598EB)
@@ -3218,7 +3228,7 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 rxctrl;
 	u32 txdctl;
-	int i, j;
+	int i, j, ...
From: Thomas Gleixner
Date: Friday, April 30, 2010 - 1:24 pm

Peter,


One last nitpick. 

Can you please rename to irq_set_affinity_hint() ?

Otherwise, I'm happy with the outcome. Thanks for your patience!

Thanks,

	tglx
--

From: Peter P Waskiewicz Jr
Date: Friday, April 30, 2010 - 1:30 pm

Not a problem.  It's a better name anyways now that we've dropped the 

Thanks for your help getting this pulled together.  This is a big step for 
our performance tuning goals.

Cheers,
-PJ
--

From: Ben Hutchings
Date: Friday, April 30, 2010 - 2:02 pm

Is it possible for irq_to_desc(irq) to be NULL?  This function already
[...]

I don't think this should be returning -EINVAL if the affinity hint is
missing.  That means 'invalid argument', but there is nothing invalid
about trying to read() the corresponding file.  The file should simply
be empty if there is no hint.  (Actually it might be better if it didn't
appear at all, but that would be a pain to implement.)

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--

From: Thomas Gleixner
Date: Friday, April 30, 2010 - 2:17 pm

Oh come on. Driver writers get everything wrong and not checking on an

I agree that -EINVAL is not really a good match.

How about just returning CPU_MASK_ALL if desc->affinity_hint is not
set ?

Thanks,

	tglx
--

From: Peter P Waskiewicz Jr
Date: Friday, April 30, 2010 - 2:18 pm

That seems reasonable to me.

cheers,
-PJ
--

Previous thread: Re: [PATCH v2] [RFC] ehci: Disable wake on overcurrent (WKOC_E) and disconnect (WKDISC_E) by Alan Stern on Thursday, April 29, 2010 - 9:16 am. (1 message)

Next thread: Re: [PATCH 1/8] PM: Add suspend block api. by Alan Stern on Thursday, April 29, 2010 - 8:41 am. (1 message)