Re: x86: disable preemption in delay_tsc()

Previous thread: Re: [NET]: rt_check_expire() can take a long time, add a cond_resched() by Arjan van de Ven on Thursday, November 15, 2007 - 11:38 pm. (8 messages)

Next thread: [PATCH] Clustering indirect blocks in Ext3 by Abhishek Rai on Friday, November 16, 2007 - 1:02 am. (19 messages)
To: Linux Kernel Mailing List <linux-kernel@...>
Cc: <akpm@...>, <mingo@...>
Date: Thursday, November 15, 2007 - 11:41 pm

On Thu, 15 Nov 2007 04:00:47 GMT

this worries me.. this appears to effectively disable preemption during
udelay() and mdelay() loops... which are very obvious latency inducers.

Now you can argue that if you're preemptible you should have used
msleep() and co, and I'll totally buy that.

Maybe we should just check if we're still on the same cpu or something,
or have a cheap way to pin a process to a cpu.... but both are longer
term solutions.

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
-

To: Arjan van de Ven <arjan@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, <akpm@...>, <mingo@...>
Date: Friday, November 16, 2007 - 4:08 am

You can use preemption notifiers to get a callback when you are
preempted. Not sure what you'd to with that callback, though.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

To: Avi Kivity <avi@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, <akpm@...>
Date: Friday, November 16, 2007 - 4:36 am

but that should not be needed in this case. Why doesnt the TSC using
delay loop simply poll the CPU it is on and fix up the TSC?

Ingo
-

To: Avi Kivity <avi@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, <akpm@...>
Date: Friday, November 16, 2007 - 4:47 am

something like the patch below.

Ingo

--------------->
Subject: x86: make delay_tsc() preemptible again
From: Ingo Molnar <mingo@elte.hu>

make delay_tsc() preemptible again.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/lib/delay_32.c | 28 +++++++++++++++++++++++-----
arch/x86/lib/delay_64.c | 30 ++++++++++++++++++++++++------
2 files changed, 47 insertions(+), 11 deletions(-)

Index: linux/arch/x86/lib/delay_32.c
===================================================================
--- linux.orig/arch/x86/lib/delay_32.c
+++ linux/arch/x86/lib/delay_32.c
@@ -38,17 +38,35 @@ static void delay_loop(unsigned long loo
:"0" (loops));
}

-/* TSC based delay: */
+/*
+ * TSC based delay:
+ *
+ * We are careful about preemption as TSC's are per-CPU.
+ */
static void delay_tsc(unsigned long loops)
{
- unsigned long bclock, now;
+ unsigned long prev, now;
+ long left = loops;
+ int prev_cpu, cpu;

- preempt_disable(); /* TSC's are per-cpu */
- rdtscl(bclock);
+ preempt_disable();
+ rdtscl(prev);
do {
+ prev_cpu = smp_processor_id();
rep_nop();
+ preempt_enable();
+
+ preempt_disable();
+ cpu = smp_processor_id();
rdtscl(now);
- } while ((now-bclock) < loops);
+ /*
+ * If we preempted we skip this small amount of time:
+ */
+ if (prev_cpu != cpu)
+ prev = now;
+ left -= now - prev;
+ prev = now;
+ } while (left > 0);
preempt_enable();
}

Index: linux/arch/x86/lib/delay_64.c
===================================================================
--- linux.orig/arch/x86/lib/delay_64.c
+++ linux/arch/x86/lib/delay_64.c
@@ -26,17 +26,35 @@ int read_current_timer(unsigned long *ti
return 0;
}

+/*
+ * TSC based delay:
+ *
+ * We are careful about preemption as TSC's are per-CPU.
+ */
void __delay(unsigned long loops)
{
- unsigned bclock, now;
+ unsigned long prev, now;
+ long left = loops;
+ int prev_cpu, cpu;

- preempt_disable(); /* TSC's are pre-cpu */
- rdtscl(bcl...

To: Ingo Molnar <mingo@...>
Cc: Avi Kivity <avi@...>, Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, <akpm@...>
Date: Friday, November 16, 2007 - 5:39 am

Otherwise, looks like a very nice patch :-)

-

To: Peter Zijlstra <peterz@...>
Cc: Avi Kivity <avi@...>, Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, <akpm@...>
Date: Friday, November 16, 2007 - 6:50 am

yeah, fixed.

updated patch below.

Ingo

---------------->
Subject: x86: make delay_tsc() preemptible again
From: Ingo Molnar <mingo@elte.hu>

make delay_tsc() preemptible again.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/lib/delay_32.c | 27 ++++++++++++++++++++++-----
arch/x86/lib/delay_64.c | 29 +++++++++++++++++++++++------
2 files changed, 45 insertions(+), 11 deletions(-)

Index: linux/arch/x86/lib/delay_32.c
===================================================================
--- linux.orig/arch/x86/lib/delay_32.c
+++ linux/arch/x86/lib/delay_32.c
@@ -38,17 +38,34 @@ static void delay_loop(unsigned long loo
:"0" (loops));
}

-/* TSC based delay: */
+/*
+ * TSC based delay:
+ *
+ * We are careful about preemption as TSC's are per-CPU.
+ */
static void delay_tsc(unsigned long loops)
{
- unsigned long bclock, now;
+ unsigned long prev, now;
+ long left = loops;
+ int prev_cpu, cpu;

- preempt_disable(); /* TSC's are per-cpu */
- rdtscl(bclock);
+ preempt_disable();
+ rdtscl(prev);
do {
+ prev_cpu = smp_processor_id();
+ preempt_enable();
rep_nop();
+ preempt_disable();
+ cpu = smp_processor_id();
rdtscl(now);
- } while ((now-bclock) < loops);
+ /*
+ * If we migrated we skip this small amount of time:
+ */
+ if (prev_cpu != cpu)
+ prev = now;
+ left -= now - prev;
+ prev = now;
+ } while (left > 0);
preempt_enable();
}

Index: linux/arch/x86/lib/delay_64.c
===================================================================
--- linux.orig/arch/x86/lib/delay_64.c
+++ linux/arch/x86/lib/delay_64.c
@@ -26,17 +26,34 @@ int read_current_timer(unsigned long *ti
return 0;
}

+/*
+ * TSC based delay:
+ *
+ * We are careful about preemption as TSC's are per-CPU.
+ */
void __delay(unsigned long loops)
{
- unsigned bclock, now;
+ unsigned long prev, now;
+ long left = loops;
+ int prev_cpu, cpu;...

To: Arjan van de Ven <arjan@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, <mingo@...>
Date: Thursday, November 15, 2007 - 11:52 pm

Yes, we can do better.

But this bug can cause very rare failures in probably a large number of
device drivers on a minorty of machines. Ugly. So I felt it best to
plug it fast while people think about more sophisticated fixes.

-

To: Andrew Morton <akpm@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Friday, November 16, 2007 - 2:13 am

how about using usleep() transparently if high-res timers are active and
we have !preempt_count()? That would be a sufficient solution and would
avoid all the calibration and per-cpu-ness problems.

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Friday, November 16, 2007 - 3:08 am

It sounds like it would work OK. What is the setup cost for a usleep? I'd
have thought that code which does something like

while (i++ < 1000) {
foo();
udelay(1);
}

would take qiute a bit longer with such a change?

-

To: Andrew Morton <akpm@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Friday, November 16, 2007 - 3:17 am

full roundtrip cost ought to be below 10 usecs, depending on the system.
There's no problem doing a non-preemptible udelay up to 10 usecs and we
could use usleep above that.

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Friday, November 16, 2007 - 3:26 am

Ow. So the above timeout would take 10x longer. That probably won't break
anything, but quite a few drivers do udelay(1) for post-IO settling times

Yup, with a few smarts in there we could work out which is the best to use,
and also compensate for the setup costs.

It doesn't sound very 2.6.24ish though.

As a quicky things perhaps we could only do the preempt_disable()/preempt_enable()
if the TSCs are unsynced? Do we reliably know that? I guess not..
-

Previous thread: Re: [NET]: rt_check_expire() can take a long time, add a cond_resched() by Arjan van de Ven on Thursday, November 15, 2007 - 11:38 pm. (8 messages)

Next thread: [PATCH] Clustering indirect blocks in Ext3 by Abhishek Rai on Friday, November 16, 2007 - 1:02 am. (19 messages)