Re: can't flush tlb on e500

From: Saito Hideo
Date: Wednesday, May 20, 2009 - 3:12 am

It seems that the powerpc code for e500 in linux-2.6.29 has a
regression, because I can't run hackbench on my platform. I
apologize if this has already been reported.

For example, the following error message is printed. However, the
command completes successfully if the number given to hackbench is small.

% ./hackbench 20
SERVER: read (error: Bad file descriptor)

% ./hackbench 4
%

I think that the TLB should be flushed before mm->context.id is set
to MMU_NO_CONTEXT.

--- arch/powerpc/mm/mmu_context_nohash.c.orig	2009-03-24 08:12:14.000000000 +0900
+++ arch/powerpc/mm/mmu_context_nohash.c	2009-05-20 18:33:53.000000000 +0900
@@ -122,22 +122,22 @@ static unsigned int steal_context_up(uns
 	struct mm_struct *mm;
 	int cpu = smp_processor_id();

 	/* Pick up the victim mm */
 	mm = context_mm[id];

 	pr_debug("[%d] steal context %d from mm @%p\n", cpu, id, mm);

-	/* Mark this mm has having no context anymore */
-	mm->context.id = MMU_NO_CONTEXT;
-
 	/* Flush the TLB for that context */
 	local_flush_tlb_mm(mm);

+	/* Mark this mm has having no context anymore */
+	mm->context.id = MMU_NO_CONTEXT;
+
 	/* XXX This clear should ultimately be part of local_flush_tlb_mm */
 	__clear_bit(id, stale_map[cpu]);

 	return id;
 }

 #ifdef DEBUG_MAP_CONSISTENCY
 static void context_check_map(void)
--

From: Hideo Saito
Date: Thursday, May 21, 2009 - 5:57 pm

How about the following changes? All TLB entries are flushed repeatedly
when the number of processes overflows the table that maps the contexts.
I think that the table should be initialized again, because _tlbil_pid()
flushes all TLB entries and the tlbilx instruction isn't supported on
the E500 (MPC8548).

--- arch/powerpc/mm/mmu_context_nohash.c.orig	2009-03-24 08:12:14.000000000 +0900
+++ arch/powerpc/mm/mmu_context_nohash.c	2009-05-21 16:35:09.000000000 +0900
@@ -107,39 +107,69 @@ static unsigned int steal_context_smp(un
 	 */
 	spin_unlock(&context_lock);
 	cpu_relax();
 	spin_lock(&context_lock);
 	goto again;
 }
 #endif  /* CONFIG_SMP */
 
+/*
+ * The whole TLB was just flushed, so drop every context assignment
+ */
+static void flush_all_context(int cpu)
+{
+	struct mm_struct *mm;
+	int n;
+
+	for (n = first_context; n <= last_context; n++) {
+
+		mm = context_mm[n];
+		if (mm == NULL || mm->context.id == MMU_NO_CONTEXT)
+			continue;
+
+		WARN_ON(mm->context.active != 0);
+
+		mm->context.id = MMU_NO_CONTEXT;
+	}
+	memset(stale_map[cpu], 0, CTX_MAP_SIZE);
+	memset(context_map, 0, CTX_MAP_SIZE);
+	context_map[0] = (1 << first_context) - 1;
+	nr_free_contexts = last_context - first_context + 1;
+}
+
 /* Note that this will also be called on SMP if all other CPUs are
  * offlined, which means that it may be called for cpu != 0. For
  * this to work, we somewhat assume that CPUs that are onlined
  * come up with a fully clean TLB (or are cleaned when offlined)
  */
 static unsigned int steal_context_up(unsigned int id)
 {
 	struct mm_struct *mm;
 	int cpu = smp_processor_id();
 
 	/* Pick up the victim mm */
 	mm = context_mm[id];
 
 	pr_debug("[%d] steal context %d from mm @%p\n", cpu, id, mm);
 
-	/* Mark this mm has having no context anymore */
-	mm->context.id = MMU_NO_CONTEXT;
-
 	/* Flush the TLB for that context */
 	local_flush_tlb_mm(mm);
 
+#ifdef CONFIG_FSL_BOOKE
+	flush_all_context(cpu);
+	__set_bit(id, context_map);
+	nr_free_contexts--;
+#else
+	/* Mark this mm has having no ...
--

From: Benjamin Herrenschmidt
Date: Friday, May 22, 2009 - 2:20 am

tlbilx isn't supported in this version of the CPU, but it is in
others, so we need to be a bit more careful.

It's true that the current code will mark one context as stale but will
end up flushing them all from the TLB. This does have the
disadvantage that a subsequent context switch might go on to steal and
flush again even though the TLB is in fact already empty.

Your approach seems a bit hackish in the implementation but makes
some sense in effectively disconnecting all context numbers when the
TLB is flushed. It is, however, not necessarily usable as-is on SMP.

One thing I would try to do in the future, for example, is to have a
lockless scenario when a context is already assigned. Your approach
would cause, under pressure, contexts to be de-assigned a bit too
aggressively. There are also other reasons why I wouldn't be so eager
to recycle PIDs.

Maybe a better option is to keep separate maps for assigned contexts
and "dirty" contexts, i.e. the current map means assigned, and a
context that is switch_mm'ed is marked dirty. We could maybe use that to
make smarter decisions as to when to re-assign a PID and when to use one
that we know has been flushed out and not re-dirtied yet.
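
To make the two-map idea concrete, here is a minimal user-space sketch.
It is not kernel code: the names (assigned_map, dirty_map, ctx_get and
so on) and the 8-context size are assumptions made up for illustration,
and the caller is assumed to do the actual TLB flush and to detach the
victim mm itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CTX 8u                 /* toy number of contexts (hypothetical) */

static unsigned int assigned_map; /* bit n set: context n belongs to some mm */
static unsigned int dirty_map;    /* bit n set: context n may have TLB entries */

/* A context was switched in, so its translations may now enter the TLB. */
static void ctx_mark_dirty(unsigned int id)
{
	dirty_map |= 1u << id;
}

/* The whole TLB was flushed, so no context has live entries anymore. */
static void ctx_all_flushed(void)
{
	dirty_map = 0;
}

/*
 * Pick a context for a new mm. *needs_flush tells the caller whether the
 * TLB must be flushed for that context before it is reused.
 */
static unsigned int ctx_get(bool *needs_flush)
{
	unsigned int id;

	/* 1. Prefer a context nobody owns. */
	for (id = 0; id < NR_CTX; id++) {
		if (!(assigned_map & (1u << id))) {
			*needs_flush = (dirty_map >> id) & 1u;
			goto found;
		}
	}
	/* 2. Otherwise steal one that is known to be clean: no flush needed. */
	for (id = 0; id < NR_CTX; id++) {
		if (!(dirty_map & (1u << id))) {
			*needs_flush = false;
			goto found;
		}
	}
	/* 3. Everything is assigned and dirty: steal the first one and flush. */
	id = 0;
	*needs_flush = true;
found:
	assigned_map |= 1u << id;
	if (*needs_flush)
		dirty_map &= ~(1u << id); /* the caller flushes this context now */
	return id;
}
```

In this sketch, once every context is assigned and dirty, ctx_get()
reports that a flush is needed. After ctx_all_flushed(), further steals
pick clean contexts and skip the flush until they are marked dirty again.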

It would also make some sense to have flush_tlb_mm() clear the
dirty map in that case. In fact, there's a patch coming from
Dave Kleikamp that implements lazy flushing by effectively just
recycling PIDs for flush_tlb_mm() and only doing a flush when running out.
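
As a rough illustration of what lazy flushing by PID recycling could
look like, here is a small user-space model. It is only a guess at the
general idea and not the patch mentioned above. All names (lazy_mm,
pid_alloc, lazy_flush_mm, full_flushes) and the 4-PID size are
hypothetical.

```c
#include <assert.h>

#define NR_PID 4u                 /* toy number of hardware PIDs (hypothetical) */
#define NO_PID NR_PID             /* marks "no PID assigned" */

struct lazy_mm {
	unsigned int pid;
};

static unsigned int in_use_map;   /* PIDs currently owned by an mm */
static unsigned int stale_map;    /* released PIDs that were never flushed */
static unsigned int full_flushes; /* counts the real, expensive TLB flushes */

/* Hand out a PID that has no stale entries, flushing only when none is left. */
static unsigned int pid_alloc(void)
{
	unsigned int pid;

	for (;;) {
		for (pid = 0; pid < NR_PID; pid++) {
			unsigned int bit = 1u << pid;

			if (!((in_use_map | stale_map) & bit)) {
				in_use_map |= bit;
				return pid;
			}
		}
		if (!stale_map)
			return NO_PID; /* all PIDs are owned; stealing is not modelled */

		/* Out of fresh PIDs: one real flush makes every stale PID reusable. */
		full_flushes++;
		stale_map = 0;
	}
}

/* Lazy variant of flush_tlb_mm(): move the mm to a new PID, leave the TLB alone. */
static void lazy_flush_mm(struct lazy_mm *mm)
{
	unsigned int old = mm->pid;

	if (old == NO_PID)
		return;

	in_use_map &= ~(1u << old);
	stale_map |= 1u << old;   /* old entries stay in the TLB but can never match */
	mm->pid = pid_alloc();
}
```

With four PIDs, one mm can be "flushed" three times in this model
without touching the TLB. The fourth call runs out of fresh PIDs and
triggers the single real flush.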

Cheers,

--

From: Benjamin Herrenschmidt
Date: Friday, May 22, 2009 - 2:27 am

You are right, this definitely looks like a bug on platforms that have
HW support for the tlbilx instruction (and thus care about the PID for
flushing), which afaik is only the case for recent Freescale chips.

Have you verified that this change fixes your problem?

Can you re-submit to the linuxppc-dev@ozlabs.org mailing list, along with
a proper changeset comment and a Signed-off-by: line?

Cheers,

--

From: Benjamin Herrenschmidt
Date: Friday, May 22, 2009 - 2:42 am

In fact, you are doubly right in that it also happens on other platforms,
because local_flush_tlb_mm() will check if the PID is MMU_NO_CONTEXT
regardless of whether tlbilx is supported... oops
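
The ordering problem can be reproduced in a toy user-space model. This
is only a sketch: toy_mm, toy_tlb and the steal_* helpers are invented
for illustration, and toy_flush_tlb_mm() merely imitates the
MMU_NO_CONTEXT check described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_CONTEXT ((unsigned int)-1) /* stand-in for MMU_NO_CONTEXT */
#define NR_ENTRIES 8u                 /* toy TLB size (hypothetical) */

struct toy_mm {
	unsigned int id;
};

struct toy_tlb_entry {
	bool valid;
	unsigned int pid;
};

static struct toy_tlb_entry toy_tlb[NR_ENTRIES];

/* Fill the toy TLB with entries that all belong to one PID. */
static void toy_tlb_fill(unsigned int pid)
{
	size_t i;

	for (i = 0; i < NR_ENTRIES; i++) {
		toy_tlb[i].valid = true;
		toy_tlb[i].pid = pid;
	}
}

/* Count the valid entries still tagged with the given PID. */
static size_t toy_tlb_count(unsigned int pid)
{
	size_t i, n = 0;

	for (i = 0; i < NR_ENTRIES; i++)
		if (toy_tlb[i].valid && toy_tlb[i].pid == pid)
			n++;
	return n;
}

/* Imitates the check: an mm without a context is not flushed at all. */
static void toy_flush_tlb_mm(struct toy_mm *mm)
{
	size_t i;

	if (mm->id == NO_CONTEXT)
		return;
	for (i = 0; i < NR_ENTRIES; i++)
		if (toy_tlb[i].valid && toy_tlb[i].pid == mm->id)
			toy_tlb[i].valid = false;
}

/* Old order: the id is cleared first, so the flush silently does nothing. */
static size_t steal_buggy(struct toy_mm *victim)
{
	unsigned int id = victim->id;

	victim->id = NO_CONTEXT;
	toy_flush_tlb_mm(victim);
	return toy_tlb_count(id); /* stale entries left behind */
}

/* Patched order: flush while the id is still valid, then clear it. */
static size_t steal_fixed(struct toy_mm *victim)
{
	unsigned int id = victim->id;

	toy_flush_tlb_mm(victim);
	victim->id = NO_CONTEXT;
	return toy_tlb_count(id);
}
```

In this model steal_buggy() leaves every entry of the victim in the TLB,
where the next owner of the same context number would see them, while
steal_fixed() leaves none.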

Looks like I only ran my context torture test with CONFIG_SMP enabled.

--
