Re: Clocksource tsc unstable (delta = -8589361717 ns) in current git

Previous thread: Reply Needed by Dr Carl Lee on Tuesday, October 26, 2010 - 5:59 am. (1 message)

Next thread: wrld_lotto_wrld by MMS INTERNATIONAL on Tuesday, October 26, 2010 - 4:49 am. (1 message)
From: Markus Trippelsdorf
Date: Tuesday, October 26, 2010 - 4:20 am

On my AMD system the following kernel messages are sometimes printed
after several hours of uptime:
Clocksource tsc unstable (delta = -8589361717 ns)
Switching to clocksource hpet

This happened several times already:
# grep "Clocksource" kernel.*
kernel.log:Oct 24 18:31:17 arch kernel: Clocksource tsc unstable (delta = -8591703466 ns)
kernel.log:Oct 26 12:54:38 arch kernel: Clocksource tsc unstable (delta = -8589361717 ns)
kernel.log.1:Oct 22 21:43:51 arch kernel: Clocksource tsc unstable (delta = -25772316650 ns)

The computer is nearly unusable afterwards (second long delays, that
seem to grow larger and larger) and only a prompt reboot can fix the
problem.
I'm running the latest git kernel. Version 2.6.36 does not show these
symptoms.

Dmesg and kernel config are attached.

-- 
Markus
From: Borislav Petkov
Date: Tuesday, October 26, 2010 - 6:18 am

Do you have any warnings/oopses before that, can we have the whole dmesg


otherwise, I don't see anything strange in your dmesg. Unless tglx has a
better idea, I'd ask you to bisect it. 5618 changesets shouldn't be that
much but I don't know, if the issue appears every several hours it could
still be tedious.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--

From: Markus Trippelsdorf
Date: Tuesday, October 26, 2010 - 6:58 am

That would be a several week long process, as the issue appears ~ every
2 days here on this machine running 24/7.

-- 
Markus
--

From: Markus Trippelsdorf
Date: Tuesday, October 26, 2010 - 7:05 am

This time I will hopefully hit the right key in mutt...
-- 
Markus
From: Peter Zijlstra
Date: Tuesday, October 26, 2010 - 7:38 am

constant_tsc only means the TSC is not affected by cpu freq changes.

It has zero relevance to cross-cpu sync of the TSC. The clocksource
stuff requires your TSC to be synchronous over all your CPUs.
--

From: Thomas Gleixner
Date: Tuesday, October 26, 2010 - 8:46 am

That and not affected by C-States either.

Thanks,

	tglx
--

From: Borislav Petkov
Date: Tuesday, October 26, 2010 - 9:41 am

From: Thomas Gleixner <tglx@linutronix.de>

Well, F10h TSC is P- and C-state invariant - I've never seen any systems
not passing the TSC sync check at boot so far. And if there are still
systems where this is not the case or you see the TSC not behaving, make
sure to tell me about it.

Besides, the TSC calibration with PIT seems to function ok, and the
"Detected " line shows reasonable core frequency. FWIW, I have a similar
configuration here and will run latest -git on it too to check.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--

From: Thomas Gleixner
Date: Tuesday, October 26, 2010 - 8:48 am

There is only a single commit in that area post 2.6.36:

      8af3c153baf95374eff20a37f00c59a295b52756 

But I have a hard time how this should make this happen. John ?

Thanks,

	tglx
--

From: Markus Trippelsdorf
Date: Tuesday, October 26, 2010 - 10:56 am

There is also: acf01734b1747b1ec4be6f159aff579ea5f7f8e2
but I'm not sure if this commit is really relevant.

-- 
Markus
--

From: Borislav Petkov
Date: Tuesday, October 26, 2010 - 11:19 am

This shouldn't be at fault since your detected core frequency printed
after the TSC PIT calibration looks fine:

Oct 24 10:19:05 arch kernel: Fast TSC calibration using PIT
Oct 24 10:19:05 arch kernel: Detected 3211.206 MHz processor.

The changeset above removes the recalibration which was only required by
very early F10h's (and yours is not).

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--

From: john stultz
Date: Tuesday, October 26, 2010 - 12:18 pm

Yea, that one doesn't look connected to me.

There have been a few cases that I've seen where we can get false
positives for bad TSCs due to the watchdog clocksource having problems
(or the clocksource watchdog thread getting delayed for such a long time
the watchdog clocksource wraps and we then can't validly compare the two
be on the lookout for hpet related changes. 

Maybe does reverting 995bd3bb5c78f3ff71339803c0b8337ed36d64fb hide the
issue?

thanks
-john



--

From: Markus Trippelsdorf
Date: Wednesday, October 27, 2010 - 7:26 am

I think the remark above points in the right direction, because the
mysterious slowness happened again today (and this time on a vanilla git
kernel). But this time no "Clocksource tsc unstable" message appeared in
the kernel log. My system recovered by itself and after 2 minutes of
slowness it was usable again. (The symptoms were the same as desribed in
my first mail: Second long delays when typing and switching from X to
the console took many seconds.)

Is there anything I can do to pinpoint the cause of this slowdown should
it happen again? (Maybe a perf timechart?)

-- 
Markus
--

From: markus
Date: Wednesday, October 27, 2010 - 11:26 am

During my search for a reliable testcase I found out that kvm guests also
showed random hangs (that didn't happen in v2.6.36) here. These hangs last a
few seconds each, during which the mouse is not movable at all.

So I ran git-bisect with this testcase and the result of the bisection is:

34f971f6f7988be4d014eec3e3526bee6d007ffa is the first bad commit
commit 34f971f6f7988be4d014eec3e3526bee6d007ffa
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Wed Sep 22 13:53:15 2010 +0200

    sched: Create special class for stop/migrate work

    In order to separate the stop/migrate work thread from the SCHED_FIFO
    implementation, create a special class for it that is of higher priority than
    SCHED_FIFO itself.

    This currently solves a problem where cpu-hotplug consumes so much cpu-time
    that the SCHED_FIFO class gets throttled, but has the bandwidth replenishment
    timer pending on the now dead cpu.

    It is also required for when we add the planned deadline scheduling class above
    SCHED_FIFO, as the stop/migrate thread still needs to transcent those tasks.

    Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <1285165776.2275.1022.camel@laptop>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

Reverting the commit solves the kvm hang issue.
(If this issue is related to my original tsc problem is of course open for
debate, but I have a strong hunch it is.)
-- 
Markus
--

From: Peter Zijlstra
Date: Wednesday, October 27, 2010 - 12:36 pm

Too weird,.. what does the hang look like?

Can you generate a sysrq-t dump? The thing I'm looking for is the
migration/# thread being runnable but not being current.

How can I reproduce this?
--

From: Markus Trippelsdorf
Date: Wednesday, October 27, 2010 - 12:42 pm

It's quite easy to reproduce on my setup. I simply run:

qemu-kvm -net nic,vlan=0,model=virtio -net user -drive file=ubuntu,if=virtio,boot=on -smp 2 -m 512

and then move the mousepointer in circles in the qemu window. It will stop and
freeze after a few seconds.
-- 
Markus
--

From: Marcelo Tosatti
Date: Wednesday, October 27, 2010 - 12:58 pm

I can reproduce it reliably here (requires kvm-autotest setup, its
difficult to reproduce manually).

INFO: task qemu:1872 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
qemu          D ffff88007f44f1c0     0  1872      1 0x00000084
ffff88006522dbb8 0000000000000086 ffff8800ffffffff ffff88006522dfd8
0000000000013580 ffff880078dd0000 ffff880078dd0380 ffff880078dd0378
ffff88006522c000 ffff88006522dfd8 0000000000013580 0000000000013580
Call Trace:
[<ffffffff8143c919>] ? _raw_spin_unlock_irq+0x12/0x3c
[<ffffffff8143b111>] schedule_timeout+0x27/0xc0
[<ffffffff8143fc50>] ? sub_preempt_count+0xe/0xab
[<ffffffff810414e0>] ? get_parent_ip+0x11/0x41
[<ffffffff810414e0>] ? get_parent_ip+0x11/0x41
[<ffffffff8143fcd9>] ? sub_preempt_count+0x97/0xab
[<ffffffff8143a6e8>] wait_for_common+0xa3/0x110
[<ffffffff81046f02>] ? default_wake_function+0x0/0x14
[<ffffffff81039e92>] ? synchronize_sched_expedited_cpu_stop+0x0/0x10
[<ffffffff8143a80d>] wait_for_completion+0x1d/0x1f
[<ffffffff81091d90>] __stop_cpus+0xf5/0x112
[<ffffffff81091ded>] try_stop_cpus+0x40/0x59
[<ffffffff81039e92>] ? synchronize_sched_expedited_cpu_stop+0x0/0x10
[<ffffffff81040f50>] synchronize_sched_expedited+0x7e/0x98
[<ffffffff81040ed2>] ? synchronize_sched_expedited+0x0/0x98
[<ffffffff8106b707>] __synchronize_srcu+0x31/0x70
[<ffffffff8106b75b>] synchronize_srcu_expedited+0x15/0x17
[<ffffffffa0206b5c>] kvm_vm_ioctl_get_dirty_log+0x104/0x19b [kvm]
[<ffffffff8120c3e4>] ? rb_insert_color+0x68/0xe5
[<ffffffffa01f6e0b>] kvm_vm_ioctl+0x239/0x353 [kvm]
[<ffffffff8143c8fc>] ? _raw_spin_unlock_irqrestore+0x37/0x42
[<ffffffff8106affc>] ? __hrtimer_start_range_ns+0x329/0x33b
[<ffffffff8143c6ae>] ? _raw_spin_lock_irqsave+0x2a/0x46
[<ffffffff81117563>] do_vfs_ioctl+0x46a/0x4b9
[<ffffffff8143c8fc>] ? _raw_spin_unlock_irqrestore+0x37/0x42
[<ffffffff81117608>] sys_ioctl+0x56/0x79
[<ffffffff8143d48a>] ? do_device_not_available+0xe/0x10
[<ffffffff81009cf2>] ...
From: Heiko Carstens
Date: Tuesday, November 9, 2010 - 5:58 am

I think there is a bug in pick_next_task_stop() in sched_stopclass.c:

If a stop-task scheduling class task (well... the migration thread ;) sets
its state to TASK_INTERRUPTIBLE and then gets preempted it will never
scheduled again, because pick_next_task_stop() ignores all tasks with a
state != TASK_RUNNING:

static struct task_struct *pick_next_task_stop(struct rq *rq)
{
	struct task_struct *stop = rq->stop;

	if (stop && stop->state == TASK_RUNNING)
		return stop;

	return NULL;
}

At least I have two dumps of machines where all cpus but one are running
the migration thread. Only the last one runs usual user space processes but
================================================================
STACK TRACE FOR TASK: 0x7e50cd40 (migration/9)

 STACK:
 0 schedule+1204 [0x55f754]
 1 preempt_schedule+102 [0x56031e]
 2 _raw_spin_unlock_irq+118 [0x563fa6]
 3 cpu_stopper_thread+144 [0x1ad854]
 4 kthread+166 [0x1685aa]
 5 kernel_thread_starter+6 [0x106bea]
================================================================

struct task_struct {
        state = 0x1  /* TASK_INTERRUPTIBLE */
...

I would guess something like the below would probably fix it.
Does that make any sense or did I miss something obvious?

diff --git a/kernel/sched_stoptask.c b/kernel/sched_stoptask.c
index 45bddc0..d5cf344 100644
--- a/kernel/sched_stoptask.c
+++ b/kernel/sched_stoptask.c
@@ -26,7 +26,8 @@ static struct task_struct *pick_next_task_stop(struct rq *rq)
 {
 	struct task_struct *stop = rq->stop;
 
-	if (stop && stop->state == TASK_RUNNING)
+	/* got preempted if on_rq == 1 -- ignore */
+	if (stop && ((stop->state == TASK_RUNNING) || stop->se.on_rq))
 		return stop;
 
 	return NULL;
--

From: Markus Trippelsdorf
Date: Tuesday, November 9, 2010 - 6:21 am

You've missed the fact that Peter already has a patch that fixes the
problem, but he never bothered to post it in this thread.

http://article.gmane.org/gmane.linux.kernel/1058018

-- 
Markus
--

From: Heiko Carstens
Date: Tuesday, November 9, 2010 - 6:28 am

Ah, great... wasted my time for something that's already fixed. ;)

Thanks for the pointer!
--

From: Peter Zijlstra
Date: Tuesday, November 9, 2010 - 6:45 am

Right, forgot about that, sorry.

Still Heiko has a good point, I've queued the below after I realized
that PREEMPT_ACTIVE would still require two tests and the on_rq bit is
actually sufficient.

---
Subject: sched: Fix runnable condition for stoptask
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue Nov 09 14:36:52 CET 2010

Heiko reported that the TASK_RUNNING check is not sufficient for
CONFIG_PREEMPT=y since we can get preempted with !TASK_RUNNING.

He suggested adding a ->se.on_rq test to the existing TASK_RUNNING
one, however TASK_RUNNING will always have ->se.on_rq, so we might as
well reduce that to a single test.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
---
 kernel/sched_stoptask.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/sched_stoptask.c
===================================================================
--- linux-2.6.orig/kernel/sched_stoptask.c
+++ linux-2.6/kernel/sched_stoptask.c
@@ -26,7 +26,7 @@ static struct task_struct *pick_next_tas
 {
 	struct task_struct *stop = rq->stop;
 
-	if (stop && stop->state == TASK_RUNNING)
+	if (stop && stop->se.on_rq)
 		return stop;
 
 	return NULL;

--

From: Peter Zijlstra
Date: Tuesday, November 9, 2010 - 6:33 am

I'd write it as a PREEMPT_ACTIVE check, but yes I think you're right.

That said, I also have the below patch queued which should fix up the
wakeup-preemption.


---
Subject: sched: Fixup cross sched_class wakeup preemption
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Sun Oct 31 12:37:04 CET 2010

Instead of dealing with sched classes inside each check_preempt_curr()
implementation, pull out this logic into the generic wakeup preemption
path.

This fixes a hang in KVM (and others) where we are waiting for the
stop machine thread to run..

Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1288891946.2039.31.camel@laptop>
---
 kernel/sched.c          |   38 +++++++++++++++++++++++++++-----------
 kernel/sched_fair.c     |    6 ------
 kernel/sched_stoptask.c |    2 +-
 3 files changed, 28 insertions(+), 18 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -560,18 +560,8 @@ struct rq {
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 
-static inline
-void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
-{
-	rq->curr->sched_class->check_preempt_curr(rq, p, flags);
 
-	/*
-	 * A queue event has occurred, and we're going to schedule.  In
-	 * this case, we can save a useless back to back clock update.
-	 */
-	if (test_tsk_need_resched(p))
-		rq->skip_clock_update = 1;
-}
+static void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags);
 
 static inline int cpu_of(struct rq *rq)
 {
@@ -9402,4 +9392,30 @@ void synchronize_sched_expedited(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
 
+static void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
+{
+	const struct sched_class *class;
+
+	if (p->sched_class == ...
From: Markus Trippelsdorf
Date: Tuesday, November 9, 2010 - 6:07 am

You missed the fact that Peter already has a patch that fixes the
problem, but never bothered to post it in this thread.

http://article.gmane.org/gmane.linux.kernel/1058018

-- 
Markus
--

From: Borislav Petkov
Date: Friday, October 29, 2010 - 1:18 am

Well, what do you know, the temporary "slowdown" triggered on my setup
too. After waiting a minute maybe, the machine recovered and guess what
appeared in dmesg:

[44729.650859] Clocksource tsc unstable (delta = -8615263009 ns)
[44729.656229] Switching to clocksource hpet

I've reverted the changeset above and will run without it to check
whether it makes any difference.

-- 
Regards/Gruss,
    Boris.
--

From: Markus Trippelsdorf
Date: Friday, October 29, 2010 - 1:30 am

After further testing I've found out that the tsc- and the kvm-issue are
unrelated. IOW I get these "slowdowns" even with 34f971f6f7988be4
reverted. 


Because with 995bd3bb5c78f reverted I never had a single "slowdown" here.
-- 
Markus
--

From: Borislav Petkov
Date: Friday, October 29, 2010 - 3:27 am

hmm, I'll run with 34f971f6f798 reverted for a while just to confirm

That's strange, when we tested this one everything seemed fine so I
guess this is one of those bugs which appear later, just as if out of
nowhere.

Thomas, could it be that SMI fires in between the HPET write and
subsequent read:

	hpet_writel(cnt, HPET_Tn_CMP(timer));

	...

	res = (s32)(cnt - hpet_readl(HPET_COUNTER));

causing the -ETIME and thus a wait for HPET wraparound? My machine at
least does SMI-initiated C1E so it could very well be the problem.

Markus, can you verify this on your system by compiling x86info from
git://git.choralone.org/git/x86info and doing

./lsmsr Int

as root?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--

From: Markus Trippelsdorf
Date: Friday, October 29, 2010 - 4:34 am

x86info v1.28beta.  Dave Jones 2001-2010
Feedback to <davej@redhat.com>.

Found 4 CPUs
--------------------------------------------------------------------------
CPU #1
EFamily: 1 EModel: 0 Family: 15 Model: 4 Stepping: 2
CPU Model: Quad-Core Opteron/Phenom II (RB-C2)
Processor name string: AMD Phenom(tm) II X4 955 Processor
Monitor/Mwait: min/max line size 64/64, ecx bit 0 support, enumeration extension
SVM: revision 1, 64 ASIDs, np, lbrVirt, SVMLock, NRIPSave
Address Size: 48 bits virtual, 48 bits physical
The physical package has 4 of 4 possible cores implemented.
--------------------------------------------------------------------------
CPU #2
EFamily: 1 EModel: 0 Family: 15 Model: 4 Stepping: 2
CPU Model: Quad-Core Opteron/Phenom II (RB-C2)
Processor name string: AMD Phenom(tm) II X4 955 Processor
Monitor/Mwait: min/max line size 64/64, ecx bit 0 support, enumeration extension
SVM: revision 1, 64 ASIDs, np, lbrVirt, SVMLock, NRIPSave
Address Size: 48 bits virtual, 48 bits physical
The physical package has 4 of 4 possible cores implemented.
--------------------------------------------------------------------------
CPU #3
EFamily: 1 EModel: 0 Family: 15 Model: 4 Stepping: 2
CPU Model: Quad-Core Opteron/Phenom II (RB-C2)
Processor name string: AMD Phenom(tm) II X4 955 Processor
Monitor/Mwait: min/max line size 64/64, ecx bit 0 support, enumeration extension
SVM: revision 1, 64 ASIDs, np, lbrVirt, SVMLock, NRIPSave
Address Size: 48 bits virtual, 48 bits physical
The physical package has 4 of 4 possible cores implemented.
--------------------------------------------------------------------------
CPU #4
EFamily: 1 EModel: 0 Family: 15 Model: 4 Stepping: 2
CPU Model: Quad-Core Opteron/Phenom II (RB-C2)
Processor name string: AMD Phenom(tm) II X4 955 Processor
Monitor/Mwait: min/max line size 64/64, ecx bit 0 support, enumeration extension
SVM: revision 1, 64 ASIDs, np, lbrVirt, SVMLock, NRIPSave
Address Size: 48 bits virtual, 48 bits physical
The physical package has 4 ...
From: Borislav Petkov
Date: Friday, October 29, 2010 - 4:39 am

well, x86info doesn't help - there's a lsmsr tool which is being built
too but no need, I can infer the info from your revision: your CPU is
RB-C2 which does SMI initiated C1E.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--

From: Markus Trippelsdorf
Date: Friday, October 29, 2010 - 4:54 am

Sorry. Here is the correct output:

IntPendingMessage    = 0x0000000008c200b0

-- 
Markus
--

From: Borislav Petkov
Date: Friday, October 29, 2010 - 6:13 am

Yep, bit 27 is set, SmiOnCmpHalt.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--

From: Thomas Gleixner
Date: Friday, October 29, 2010 - 5:14 am

Well, even if a SMI fires, then the counter will be ahead of cnt and
we get -ETIME. The upper layer of clockevents/timers will then
reprogram HPET. So that's not an issue.

The only problem which might hit us is when our assumption of 8 HPET
cycles being enough to transfer the new match value into the real
match register is wrong.

Thanks,

	tglx
--

From: Borislav Petkov
Date: Friday, October 29, 2010 - 10:00 am

So Markus, can you try with 995bd3bb5c78, but by increasing the value
to, say 16 (I don't know what's a good value here, let's double the old
one). Simply change the line

	return res < 8 ? -ETIME : 0;

to

	return res < 16 ? -ETIME : 0;


in <arch/x86/kernel/hpet.c:hpet_next_event()>. I'll do that too on the
machine here when I get around to it.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--

From: Markus Trippelsdorf
Date: Friday, October 29, 2010 - 10:26 am

Yes, I already did this, although I'm running
return res < 12 ? -ETIME : 0; 
at the moment.

-- 
Markus
--

From: Thomas Gleixner
Date: Monday, November 1, 2010 - 11:05 am

Any update on this ?

Thanks,

	tglx
--

From: Markus Trippelsdorf
Date: Monday, November 1, 2010 - 11:45 am

Yes, it runs stable thus far. No "slowdowns", no problems.
-- 
Markus
--

From: Thomas Gleixner
Date: Tuesday, November 2, 2010 - 8:26 am

Ok, I'm waiting for Boris to get us the confirmation from HW folks.

Thanks,

	tglx
--

From: Borislav Petkov
Date: Friday, November 5, 2010 - 9:09 am

Ok, I don't have much but it could be something. We could use the
minimum tick u16 value in the ACPI HPET table, offset 53:

Main Counter Minimum   2 53 Unit: Clock tick
Clock_tick in Periodic      The minimum clock ticks can be set without lost
Mode                        interrupts while the counter is programmed to operate in
*Note#3                     periodic mode

...

* Note #3: This field is written by BIOS and may be chipset and/or
platform dependent. This indicates the minimum value that must be used
for any counter programmed in periodic mode to avoid lost interrupts.
For any counter x that has been configured for periodic mode, the number
can be programmed in any Tx_Compare Register must be greater than P,
where P = (Minimum Period) / (Main counter period) in order to avoid
lost interrupts.

However, we don't know (yet) whether this can be used in the
non-periodic mode too. My gut feeling says yes but I wouldn't trust it.

We went and collected that value a bunch of systems and it looks like it
would need more massaging (read: capping) since some BIOSen simply write
crap in it. It ranges from

- 0x37ee on old nVidia and Intel boards (this is definitely crap, I
can't imagine a minimum ticks value of 14318 for a HPET but who knows)

- 0x1000 on a HP machine (also fishy)

- 0x10, 0x14 on current SBxxx boards

- 0x80 on newer Intel boards

and it looks like 0x80 is also the default value which we can use as a
capping value. Btw, I've been running the following on the machine in
question for a while now and no lockups so far.

Opinions?

---
 arch/x86/include/asm/hpet.h |    1 +
 arch/x86/kernel/acpi/boot.c |   12 +++++++++---
 arch/x86/kernel/hpet.c      |    3 ++-
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 2c392d6..01d9480 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -68,6 +68,7 @@ extern unsigned long force_hpet_address;
 extern ...
From: Markus Trippelsdorf
Date: Friday, November 5, 2010 - 9:42 am

"min tick: 20" on my machine. Given that 12 is running stable here, maybe

Wouldn't:

+u16                                  hpet_min_tick;


-- 
Markus
--

From: Markus Trippelsdorf
Date: Friday, November 5, 2010 - 10:45 am

Please scratch that.
-- 
Markus
--

From: Markus Trippelsdorf
Date: Friday, November 5, 2010 - 2:27 pm

Err, 20==0x14 so it's not that much difference and it's running fine here
also.

Feel free to add:
    Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>

Thanks.

-- 
Markus
--

From: Borislav Petkov
Date: Friday, November 5, 2010 - 2:32 pm

Thanks Markus,

actually your board is not what concerns me, 20 ticks is still ok, more
or less, but there are other machines which contain absurd values in
there like 0x37ee or 0x1000 (a Broadcom chipset). We'll need to give a
change like that a good run before we can be absolutely sure it doesn't
break any machines.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--

From: Thomas Gleixner
Date: Tuesday, November 9, 2010 - 7:02 am

If the ACPI entry is known to be flaky, shouldn't we simply err out on
the safe side and use 128 ticks in any case, which is not a really big
deal.

Thanks,

	tglx
--

From: Borislav Petkov
Date: Tuesday, November 9, 2010 - 7:39 am

Yep, this is what my proposed fix does. I set it by default to 0x80 and
the hpet detection code in acpi_parse_hpet() overrides it if it is less
than that (and obviously a sensible value written by the BIOS).

Otherwise it issues a warning. Come to think of it, we shouldn't be
issuing a warning because this'll scream on a very high number of
systems, IMHO, especially older boards. Instead, we should issue it in
dmesg during boot just in case.

The other concern I have is whether min tick of 128 would work for _all_
possible HPET implementations - I don't know whether there are some very
b0rked incarnations which delay HPET accesses to more than 128 cycles.
Kinda hard to say.

So, to be more specific, here's what I have in mind:

--
From: Borislav Petkov <borislav.petkov@amd.com>
Date: Thu, 4 Nov 2010 00:01:49 +0100
Subject: [PATCH] HPET: Fix HPET readout for small deltas

Some HPET implementations might require longer delay when accessing the
HPET than what 995bd3bb5c78f3ff71339803c0b8337ed36d64fb established (8
cycles). Generally, the proper value should be programmed by BIOS and
written into the ACPI HPET table as the main counter minimum tick in
periodic mode (offset 53).

We assume that value as the minimum value a delta can be in order to
reprogram the HPET successfully. For BIOSen which contain crap, we fall
back to a default value of 128 cycles which should be sensible on all
more or less sane HPET implementations.

LKML-Reference: <20101026112052.GA1672@arch.trippelsdorf.de>
Not-Yet-Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/include/asm/hpet.h |    1 +
 arch/x86/kernel/acpi/boot.c |   10 +++++++---
 arch/x86/kernel/hpet.c      |    3 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 2c392d6..01d9480 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -68,6 +68,7 @@ extern unsigned long force_hpet_address;
 extern u8 ...
From: Markus Trippelsdorf
Date: Wednesday, November 17, 2010 - 4:21 pm

Any update on this issue?

Borislav's patch solves the mysterious "slowdown" problem and is running
without problems for the last two weeks here. 

(And rc2 is already out. So maybe it's time to push this to Linus so
that more people have a chance to test it?)
-- 
Markus
--

From: Borislav Petkov
Date: Wednesday, November 24, 2010 - 4:25 am

Yep, we definitely need some kind of fix for this. It just triggered on
me on -rc3 (system became sluggish, everything advanced only after a
keypress along with the already familiar clocksource unstable message in
dmesg).

Thomas, want me to rebase the patch ontop of tip:master?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--

From: Thomas Gleixner
Date: Friday, October 29, 2010 - 10:34 am

Boris,


can you please check with your hardware folks? If we can't get some
affirmative info about that then we'd be forced to revert that commit
(if it turns out to be the culprit one). That'd be a pity as it would
bring back the full IO access penalty to everyone.

Thanks,

	tglx
--

From: Mike Galbraith
Date: Tuesday, October 26, 2010 - 8:24 pm

Turns out that patch is still racy btw.  Very hard to trigger, but you
should remove it for safety and test result integrity anyway.

	-Mike

--

Previous thread: Reply Needed by Dr Carl Lee on Tuesday, October 26, 2010 - 5:59 am. (1 message)

Next thread: wrld_lotto_wrld by MMS INTERNATIONAL on Tuesday, October 26, 2010 - 4:49 am. (1 message)