> * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> > * Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
> > > On Fri, 2008-11-07 at 14:18 -0500, Mathieu Desnoyers wrote:
> > > > * Steven Rostedt (rostedt@goodmis.org) wrote:
> > > > >
> > > > > On Fri, 7 Nov 2008, Mathieu Desnoyers wrote:
> > > > > >
> > > > > > __m_cnt_hi
> > > > > > is read before
> > > > > > mmio cnt_lo read
> > > > > >
> > > > > > for the detailed reasons explained in my previous discussion with
> > > > > > Nicolas here :
> > > > > >
> > > > > > http://lkml.org/lkml/2008/10/21/1
> > > > > >
> > > > > > I use smp_rmb() to do this on SMP systems (hrm, actually, an rmb() could
> > > > > > be required so it also works safely on UP systems wrt interrupts).
> > > > >
> > > > > smp_rmb turns into a compiler barrier on UP and should prevent the below
> > > > > description.
> > > > >
> > > >
> > > > Ah, right, preserving program order on UP should be enough. smp_rmb()
> > > > then.
> > >
> > >
> > > I'm not quite sure I'm following here. Is this a global hardware clock
> > > you're reading from multiple cpus, if so, are you sure smp_rmb() will
> > > indeed be enough to sync the read?
> > >
> > > (In which case the smp_wmb() is provided by the hardware increasing the
> > > clock?)
> > >
> > > If these are per-cpu clocks then even in the smp case we'd be good with
> > > a plain barrier() because you'd only ever want to read your own cpu's
> > > clock (and have a separate __m_cnt_hi per cpu).
> > >
> > > Or am I totally missing out on something?
> > >
> >
> > This is the global hardware clock scenario.
> >
> > We have to order an uncached mmio read wrt a cached variable read/write.
> > The uncached mmio read vs smp_rmb() barrier (e.g. lfence instruction)
> > should be ensured by program order because the read will skip the cache
> > and go directly to the bus. Luckily we only do an mmio read and no mmio
> > write, so mmiowb() is not required.
> >
> > You might be right in that it could require more barriers.
> >
> > Given adequate program order, we can assume that the mmio read will
> > happen "on the spot", but that the cached read may be delayed.
> >
> > What we want is :
> >
> > readl(io_addr)
> > read __m_cnt_hi
> > write __m_cnt_hi
> >
> > With the two reads in the correct order. If we consider two consecutive
> > executions on the same CPU :
> >
> > readl(io_addr)
> > read __m_cnt_hi
> > write __m_cnt_hi
> >
> > readl(io_addr)
> > read __m_cnt_hi
> > write __m_cnt_hi
> >
> > We might have to order the read/write pair wrt the following readl, such
> > as :
> >
> > smp_rmb(); /* Wait for all cached memory reads to complete */
> > readl(io_addr);
> > barrier(); /* Make sure the compiler leaves mmio read before cached read */
> > read __m_cnt_hi
> > write __m_cnt_hi
> >
> > smp_rmb(); /* Wait for all cached memory reads to complete */
> > readl(io_addr);
> > barrier(); /* Make sure the compiler leaves mmio read before cached read */
> > read __m_cnt_hi
> > write __m_cnt_hi
> >
> > Would that make more sense ?
> >
>
> Oh, actually, I got things reversed in this email : the readl(io_addr)
> must be done _after_ the __m_cnt_hi read.
>
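
For reference, a minimal userspace sketch of the corrected ordering: read
the cached high word first, then a read barrier, then the counter. This is
an illustration under assumptions, not the code under review. It assumes
the high word is maintained by a half-period flip (top bit of the cached
word tracks the top bit of the last low word seen). The names
fake_hw_counter, read_hw_counter() and extended_count() are hypothetical,
an atomic load stands in for readl(), and a C11 acquire fence stands in
for smp_rmb(). It says nothing about whether smp_rmb() is sufficient
against a real uncached mmio read.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware counter normally read via readl(). */
static _Atomic uint32_t fake_hw_counter;

/* Cached high word; its top bit mirrors the top bit of the low word last seen. */
static _Atomic uint32_t m_cnt_hi;

/* Hypothetical helper: a plain atomic load replaces the uncached mmio read. */
static uint32_t read_hw_counter(void)
{
	return atomic_load_explicit(&fake_hw_counter, memory_order_relaxed);
}

/* Returns a 63-bit count built from the cached high word and the counter. */
uint64_t extended_count(void)
{
	/* 1. Read the cached high word first. */
	uint32_t hi = atomic_load_explicit(&m_cnt_hi, memory_order_relaxed);

	/* 2. Stand-in for smp_rmb(): order the load above before the load below. */
	atomic_thread_fence(memory_order_acquire);

	/* 3. Only then sample the counter. */
	uint32_t lo = read_hw_counter();

	/*
	 * 4. If the top bits differ, the low word crossed a half period since
	 *    the cached word was last updated: flip the state bit, and carry
	 *    into the real high bits when the low word wrapped to zero.
	 */
	if ((hi ^ lo) & 0x80000000u) {
		hi = (hi ^ 0x80000000u) + (hi >> 31);
		atomic_store_explicit(&m_cnt_hi, hi, memory_order_relaxed);
	}

	/* The top bit of hi is bookkeeping only, so mask it out of the result. */
	return (((uint64_t)hi << 32) | lo) & 0x7fffffffffffffffULL;
}
```

In this sketch extended_count() has to be called at least once per half
period of the low word, otherwise a flip is missed and the result goes
backwards.
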
> Therefore, two consecutive executions would look like :
>
> barrier(); /* Make sure the compiler does not reorder __m_cnt_hi and
> previous mmio read. */
> read __m_cnt_hi
> smp_rmb(); /* Wait for all cached memory reads to complete */