Re: [RFC] Exposing TSC "reliability" to userland

From: Dan Magenheimer
Date: Monday, May 3, 2010 - 1:21 pm

In a patch posted late last year by Venki: 

http://lkml.org/lkml/2009/12/17/360

it was noted that some systems that specify the "Invariant TSC"
bit in CPUID (on recent processors) are sadly not guaranteed to
have synchronized TSCs.  As a result, Ingo's check_tsc_warp() is
executed; if the warp test passes, the kernel uses TSC
as clocksource and, if it doesn't pass, the kernel marks
the TSC as unstable and chooses a different clocksource.

Whether the kernel deems TSC to be reliable or not is a very
useful piece of information to userland, e.g. to certain
enterprise apps such as the Oracle DB, some JVMs, etc.  If
TSC IS reliable, rdtsc can be used by many of these
enterprise applications in many situations in place of a
gettimeofday call.  Rdtsc can be much faster even than
a vsyscall and it is certainly much much faster when,
for one reason or another, vsyscall is not enabled.
This can make a huge performance difference in real
benchmarks when timestamps are frequently taken (a 10%
benchmark performance improvement was measured using
rdtsc vs the gettimeofday syscall).
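
A minimal sketch of how the per-timestamp cost of the two approaches
might be compared on a given machine. The helper names (read_tsc,
mono_usec, compare_timestamp_cost) are illustrative and not from any
posted patch, and the non-x86 fallback is an assumption added only so
the code builds elsewhere. It measures call overhead only and says
nothing about whether the TSC is synchronized across CPUs.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Raw cycle counter. Only usable as a timestamp if the TSC is
 * synchronized across CPUs and ticks at a constant rate. */
static inline uint64_t read_tsc(void) { return __rdtsc(); }
#else
/* Assumption: on non-x86 builds, substitute a monotonic clock read
 * so the sketch still compiles; this is NOT a TSC. */
static inline uint64_t read_tsc(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Monotonic time in microseconds, used only to time the loops. */
static uint64_t mono_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000ull + (uint64_t)ts.tv_nsec / 1000;
}

/* Take n timestamps with each method and report the elapsed time of
 * each loop in microseconds. */
static void compare_timestamp_cost(unsigned long n,
                                   uint64_t *tsc_us, uint64_t *gtod_us)
{
    volatile uint64_t sink = 0; /* keeps the loops from being optimized away */
    struct timeval tv;
    uint64_t start;

    start = mono_usec();
    for (unsigned long i = 0; i < n; i++)
        sink += read_tsc();
    *tsc_us = mono_usec() - start;

    start = mono_usec();
    for (unsigned long i = 0; i < n; i++) {
        gettimeofday(&tv, NULL);
        sink += (uint64_t)tv.tv_usec;
    }
    *gtod_us = mono_usec() - start;
}
```

Calling compare_timestamp_cost(10000000, &tsc_us, &gtod_us) and
printing both values gives a rough comparison; the result depends
heavily on whether gettimeofday takes the vsyscall path or a real
syscall on that system.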

Running a warp test in userland is not nearly as accurate
as the warp test run by the kernel.  So it makes sense to expose
the results of the kernel warp test to userland, maybe
through /sys/devices/system/clocksource/tsc_reliable.
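
A sketch of how an application might consume such a file, if it
existed. The path is only the one proposed above, and the file format
(a single decimal number, nonzero meaning "reliable") is purely an
assumption for illustration; the function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Proposed location only; no kernel is known to provide this file. */
#define TSC_RELIABLE_PATH "/sys/devices/system/clocksource/tsc_reliable"

/* Parse the assumed file content: a decimal integer, nonzero = reliable.
 * Returns 1 (reliable), 0 (not reliable) or -1 (unparseable). */
static int parse_tsc_flag(const char *text)
{
    char *end;
    long v = strtol(text, &end, 10);

    if (end == text)
        return -1; /* no digits found */
    return v != 0;
}

/* Read the flag from path. Returns 1, 0, or -1 if the file is missing
 * or malformed; callers should treat -1 the same as "not reliable". */
static int tsc_reliable_from_file(const char *path)
{
    char buf[16];
    int flag = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fgets(buf, sizeof buf, f))
        flag = parse_tsc_flag(buf);
    fclose(f);
    return flag;
}
```

An application would call tsc_reliable_from_file(TSC_RELIABLE_PATH)
once at startup and fall back to gettimeofday unless the result is 1.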

Comments?
--

From: Venkatesh Pallipadi
Date: Tuesday, May 4, 2010 - 4:16 pm

On Mon, May 3, 2010 at 1:21 PM, Dan Magenheimer

[ Sorry if this is a duplicate. I had messed up my mail client format setting ]

One option is to remove tsc from
/sys/devices/system/clocksource/clocksource*/available_clocksource
when it is detected as unstable.

That should already be happening with NOHZ or HIGHRES selected. But
it should be simple to add some code to do this always.
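
From the userland side, this option could be checked roughly as in
the sketch below. It assumes the list is a single line of
whitespace-separated names and that the concrete file is
clocksource0/available_clocksource; the function names are
hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumed concrete instance of the clocksource* path mentioned above. */
#define AVAILABLE_PATH \
    "/sys/devices/system/clocksource/clocksource0/available_clocksource"

/* Return 1 if name appears as a whole token in a whitespace-separated
 * list, 0 otherwise. A prefix match such as "tsc-early" does not count. */
static int list_has_clocksource(const char *list, const char *name)
{
    size_t want = strlen(name);
    const char *p = list;

    while (*p) {
        size_t len;

        p += strspn(p, " \t\n");    /* skip separators */
        len = strcspn(p, " \t\n");  /* length of the current token */
        if (len > 0 && len == want && strncmp(p, name, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Return 1 if "tsc" is listed in the file at path, 0 if it is not,
 * and -1 if the file cannot be read. */
static int tsc_listed_in_file(const char *path)
{
    char buf[256];
    int found = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fgets(buf, sizeof buf, f))
        found = list_has_clocksource(buf, "tsc");
    fclose(f);
    return found;
}
```

With this approach an application would treat a result other than 1
from tsc_listed_in_file(AVAILABLE_PATH) as "do not rely on rdtsc".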

Would that work?

Thanks,
Venki
--
