Re: [PATCH] perf: Store active software events in a hashlist

Previous thread: Re: [PATCH 23/24] video/matrox: fix dangling pointers by James Simmons on Monday, April 5, 2010 - 7:04 am. (1 message)

Next thread: [PATCH 1/2] mfd: Initialise WM831x IRQ masks on chip even if interrupts not in use by Mark Brown on Monday, April 5, 2010 - 8:14 am. (4 messages)
From: Frederic Weisbecker
Date: Monday, April 5, 2010 - 7:08 am

Each time a software event triggers, we need to walk through
the entire list of events from the current cpu and task contexts
to retrieve a running perf event that matches.
We also need to check a matching perf event is actually counting.

This walk is wasteful and makes the event fast path scaling
down with a growing number of events running on the same
contexts.

To solve this, we store the running perf events in a hashlist to
get an immediate access to them against their type:event_id when
they trigger.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/perf_event.h |   12 +++
 kernel/perf_event.c        |  226 +++++++++++++++++++++++++++++++------------
 2 files changed, 175 insertions(+), 63 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6e96cc8..03e87c4 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -589,6 +589,14 @@ enum perf_group_flag {
 	PERF_GROUP_SOFTWARE = 0x1,
 };
 
+#define SWEVENT_HLIST_BITS	8
+#define SWEVENT_HLIST_SIZE	((1 << (SWEVENT_HLIST_BITS + 1)) - 1)
+
+struct swevent_hlist {
+	struct hlist_head	heads[SWEVENT_HLIST_SIZE];
+	struct rcu_head		rcu_head;
+};
+
 /**
  * struct perf_event - performance event kernel representation:
  */
@@ -597,6 +605,7 @@ struct perf_event {
 	struct list_head		group_entry;
 	struct list_head		event_entry;
 	struct list_head		sibling_list;
+	struct hlist_node		hlist_entry;
 	int				nr_siblings;
 	int				group_flags;
 	struct perf_event		*group_leader;
@@ -744,6 +753,9 @@ struct perf_cpu_context {
 	int				active_oncpu;
 	int				max_pertask;
 	int				exclusive;
+	struct swevent_hlist		*swevent_hlist;
+	struct mutex			hlist_mutex;
+	struct kref			hlist_refcount;
 
 	/*
 	 * Recursion avoidance:
diff --git a/kernel/perf_event.c ...
From: Peter Zijlstra
Subject:
Date: Tuesday, April 6, 2010 - 8:27 am

So we have a hash-table per-cpu, each event takes a ref on the hash
table, when the thing is empty we free it.

When the event->cpu == -1 (all cpus) we take a ref on all possible cpu's
hash-table (should be online I figure, but that requires adding a
hotplug handler).

Then on event enable/disable we actually add the event to the hash-table
belonging to the cpu the event/task gets scheduled on, since each event
can only ever be active on one cpu.

Right?

So looks good, altough I think we want to do that online/hotplug thing.

--

From: Peter Zijlstra
Date: Wednesday, April 7, 2010 - 2:04 am

Alternatively, you can simply but the hash table into the per-cpu
structure and not allocate it, its only a single page (half a page if
you use 32bit or actually use 8 bits.


--

From: Frederic Weisbecker
Date: Wednesday, April 7, 2010 - 4:58 am

As you prefer. This would indeed make it more simple, but that would also
make these pages unused most of the time.

--

From: Peter Zijlstra
Date: Wednesday, April 7, 2010 - 5:09 am

True I suppose,.. have a go at that hotplug stuff, the only thing I
worried about was that refcount stuff getting rather ugly.

--

From: Frederic Weisbecker
Date: Thursday, April 8, 2010 - 12:39 pm

Each time a software event triggers, we need to walk through
the entire list of events from the current cpu and task contexts
to retrieve a running perf event that matches.
We also need to check a matching perf event is actually counting.

This walk is wasteful and makes the event fast path scaling
down with a growing number of events running on the same
contexts.

To solve this, we store the running perf events in a hashlist to
get an immediate access to them against their type:event_id when
they trigger.

v2: - Fix SWEVENT_HLIST_SIZE definition (and re-learn some basic
      maths along the way)
    - Only allocate hlist for online cpus, but keep track of the
      refcount on offline possible cpus too, so that we allocate it
      if needed when it becomes online.
    - Drop the kref use as it's not adapted to our tricks anymore.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/perf_event.h |   12 ++
 kernel/perf_event.c        |  251 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 200 insertions(+), 63 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6e96cc8..bf896d0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -589,6 +589,14 @@ enum perf_group_flag {
 	PERF_GROUP_SOFTWARE = 0x1,
 };
 
+#define SWEVENT_HLIST_BITS	8
+#define SWEVENT_HLIST_SIZE	(1 << SWEVENT_HLIST_BITS)
+
+struct swevent_hlist {
+	struct hlist_head	heads[SWEVENT_HLIST_SIZE];
+	struct rcu_head		rcu_head;
+};
+
 /**
  * struct perf_event - performance event kernel representation:
  */
@@ -597,6 +605,7 @@ struct perf_event {
 	struct list_head		group_entry;
 	struct list_head		event_entry;
 	struct list_head		sibling_list;
+	struct hlist_node		hlist_entry;
 	int				nr_siblings;
 	int				group_flags;
 	struct ...
From: Eric Dumazet
Date: Thursday, April 8, 2010 - 1:04 pm

Le jeudi 08 avril 2010 à 21:39 +0200, Frederic Weisbecker a écrit :






--

From: Frederic Weisbecker
Date: Thursday, April 8, 2010 - 1:11 pm

From: Frederic Weisbecker
Date: Wednesday, April 7, 2010 - 4:56 am

That would let us allocate on online cpus instead of possibles? Yeah right.

--

From: Peter Zijlstra
Date: Wednesday, April 7, 2010 - 4:58 am

Right, so if you want to go this route (and not simply embed it in the
percpu data), the complication I thought of is that you want the
refcount on offline cpus but not the allocation, since its very hard to
reconstruct the number of events that had event->cpu == -1.

--

From: Frederic Weisbecker
Date: Wednesday, April 7, 2010 - 5:06 am

Ok, right I can do that.

--

From: Peter Zijlstra
Date: Wednesday, April 7, 2010 - 2:02 am

That seems to result in 9 bits worth, doesn't it?


--

From: Frederic Weisbecker
Date: Wednesday, April 7, 2010 - 4:52 am

Oh right, I was confused, will zapp the + 1.

--

From: Peter Zijlstra
Date: Wednesday, April 7, 2010 - 4:56 am

Also, what you're computing seems to be a mask, not a size

So with BITS = 8, you want
SIZE = 1<<BITS = 256
MASK = SIZE-1 = 255

right? 

--

From: Frederic Weisbecker
Date: Wednesday, April 7, 2010 - 5:09 am

Oh right... damn I suck in maths...

--

Previous thread: Re: [PATCH 23/24] video/matrox: fix dangling pointers by James Simmons on Monday, April 5, 2010 - 7:04 am. (1 message)

Next thread: [PATCH 1/2] mfd: Initialise WM831x IRQ masks on chip even if interrupts not in use by Mark Brown on Monday, April 5, 2010 - 8:14 am. (4 messages)