As recent discussions[1], and bugs[2] have shown, there is a great deal of confusion about the expected behavior of atomic_read(), compounded by the fact that it is not the same on all architectures. Since users expect calls to atomic_read() to actually perform a read, it is not desirable to allow the compiler to optimize this away. Requiring the use of barrier() in this case is inefficient, since we only want to re-load the atomic_t variable, not everything else in scope. This patchset makes the behavior of atomic_read uniform by removing the volatile keyword from all atomic_t and atomic64_t definitions that currently have it, and instead explicitly casts the variable as volatile in atomic_read(). This leaves little room for creative optimization by the compiler, and is in keeping with the principles behind "volatile considered harmful". Busy-waiters should still use cpu_relax(), but fast paths may be able to reduce their use of barrier() between some atomic_read() calls. -- Chris 1) http://lkml.org/lkml/2007/7/1/52 2) http://lkml.org/lkml/2007/8/8/122 -
Just an idea: since all architectures already include asm-generic/atomic.h, why not move the definitions of atomic_t and atomic64_t, as well as anything that does not involve architecture specific inline assembly into the generic header? Arnd <>< -
a) chicken and egg: asm-generic/atomic.h depends on definitions in asm/atomic.h If you can find a way to reshuffle the code and make it simpler, I personally am all for it. I'm skeptical that you'll get much to show for the effort. b) The definitions aren't precisely identical between all architectures, so it would be a mess of special cases, which gets us right back to where we are now. -- Chris -
I guess it could be done using more macros or new headers, but I don't see Why are they not identical? Anything beyond the 32/64 bit difference should be the same afaics, or it might cause more bugs. Arnd <>< -
volatile is generally harmful even in atomic_read(). Barriers control visibility and AFAICT things are fine. -
But barriers force a flush of *everything* in scope, which we generally don't want. On the other hand, we pretty much always want to flush atomic_* operations. One way or another, we should be restricting the volatile behavior to the thing that needs it. On most architectures, this patch set just moves that from the declaration, where it is considered harmful, to the use, where it is considered an occasional necessary evil. See the resubmitted patchset, which also puts a cast in the atomic[64]_set operations. -- Chris -
Then we would need atomic_read() and atomic_read_volatile() atomic_read_volatile() would imply an object sized memory barrier before and after? -
Frankly, I don't see the need for this series myself either. Personal opinion (others may differ), but I consider "volatile" to be a sad / unfortunate wart in C (numerous threads on this list and on the gcc lists/bugzilla over the years stand testimony to this) and if we _can_ steer clear of it, then why not -- why use this ill-defined primitive whose implementation has often differed over compiler versions and platforms? Granted, barrier() _is_ heavy-handed in that it makes the optimizer forget _everything_, but then somebody did post a forget() macro on this thread itself ... [ BTW, why do we want the compiler to not optimize atomic_read()'s in the first place? Atomic ops guarantee atomicity, which has nothing to do with "volatility" -- users that expect "volatility" from atomic ops are the ones who must be fixed instead, IMHO. ] -
Interactions between mainline code and interrupt/NMI handlers on the same CPU (for example, when both are using per-CPU variables. See examples previously posted in this thread, or look at the rcu_read_lock() and rcu_read_unlock() implementations in http://lkml.org/lkml/2007/8/7/280. Thanx, Paul -
LDD3 says on page 125: "The following operations are defined for the type [atomic_t] and are guaranteed to be atomic with respect to all processors of an SMP computer." Doesn't "atomic WRT all processors" require volatility? -- Stefan Richter -=====-=-=== =--- -==== http://arcgraph.de/sr/ -
Not at all. We also require this to be atomic without any hint of volatility. extern int foo; foo = 4; Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
No, it definitely doesn't. Why should it?
"Atomic w.r.t. all processors" is just your normal, simple "atomicity"
for SMP systems (ensure that that object is modified / set / replaced
in main memory atomically) and has nothing to do with "volatile"
behaviour.
"Volatile behaviour" itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms), but it is /expected/ to mean something like: "ensure that
every such access actually goes all the way to memory, and is not
re-ordered w.r.t. to other accesses, as far as the compiler can take
care of these". The last "as far as compiler can take care" disclaimer
comes about due to CPUs doing their own re-ordering nowadays.
For example (say on i386):
(A)
$ cat tp1.c
int a;
void func(void)
{
a = 10;
a = 20;
}
$ gcc -Os -S tp1.c
$ cat tp1.s
...
movl $20, a
...
(B)
$ cat tp2.c
volatile int a;
void func(void)
{
a = 10;
a = 20;
}
$ gcc -Os -S tp2.c
$ cat tp2.s
...
movl $10, a
movl $20, a
...
(C)
$ cat tp3.c
int a;
void func(void)
{
*(volatile int *)&a = 10;
*(volatile int *)&a = 20;
}
$ gcc -Os -S tp3.c
$ cat tp3.s
...
movl $10, a
movl $20, a
...
In (A) the compiler optimized "a = 10;" away, but the actual store
of the final value "20" to "a" was still "atomic". (B) and (C) also
exhibit "volatile" behaviour apart from the "atomicity".
But as others replied, it seems some callers out there depend upon
atomic ops exhibiting "volatile" behaviour as well, so that answers
my initial question, actually. I haven't looked at the code Paul
pointed me at, but I wonder if that "forget(x)" macro would help
those cases. I'd wish to avoid the "volatile" primitive, personally.
Satyam
-
So, looking at load instead of store, understand I correctly that in
your opinion
int b;
b = atomic_read(&a);
if (b)
do_something_time_consuming();
b = atomic_read(&a);
if (b)
do_something_more();
should be changed to explicitly forget(&a) after
do_something_time_consuming?
If so, how about the following:
static inline void A(atomic_t *a)
{
int b = atomic_read(a);
if (b)
do_something_time_consuming();
}
static inline void B(atomic_t *a)
{
int b = atomic_read(a);
if (b)
do_something_more();
}
static void C(atomic_t *a)
{
A(a);
B(b);
}
Would this need forget(a) after A(a)?
(Is the latter actually answered in C99 or is it compiler-dependent?)
--
Stefan Richter
-=====-=-=== =--- -====
http://arcgraph.de/sr/
-
/* ^ typo */ -- Stefan Richter -=====-=-=== =--- -==== http://arcgraph.de/sr/ -
No, I'd actually prefer something like what Christoph Lameter suggested,
i.e. users (such as above) who want "volatile"-like behaviour from atomic
ops can use alternative functions. How about something like:
#define atomic_read_volatile(v) \
({ \
forget(&(v)->counter); \
((v)->counter); \
})
Or possibly, implement these "volatile" atomic ops variants in inline asm
like the patch that Sebastian Siewior has submitted on another thread just
a while back.
Of course, if we find there are more callers in the kernel who want the
volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions to
somethine else, say "atomic_read_nonvolatile" for all I care.
-
Wouldn't the above "forget" the value, throw it away, then forget Given that you are advocating a change (please keep in mind that atomic_read() and atomic_set() had volatile semantics on almost all platforms), care to give some example where these historical volatile Do we really need another set of APIs? Can you give even one example where the pre-existing volatile semantics are causing enough of a problem to justify adding yet more atomic_*() APIs? Thanx, Paul -
Let's turn this around. Can you give a single example where the volatile semantics is needed in a legitimate way? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Sorry, but you are the one advocating for the change. Nice try, though! ;-) Thanx, Paul -
Not for i386 and x86_64 -- those have atomic ops without any "volatile" semantics (currently as per existing definitions). -
I claim unit volumes with arm, and the majority of the architectures, but I cannot deny the popularity of i386 and x86_64 with many developers. ;-) However, I am not aware of code in the kernel that would benefit from the compiler coalescing multiple atomic_set() and atomic_read() invocations, thus I don't see the downside to volatility in this case. Are there some performance-critical code fragments that I am missing? Thanx, Paul -
Hmm, does arm really need that "volatile int counter;"? Hopefully RMK will I don't know, and yes, code with multiple atomic_set's and atomic_read's getting optimized or coalesced does sound strange to start with. Anyway, I'm not against "volatile semantics" per se. As replied elsewhere, I do appreciate the motivation behind this series (to _avoid_ gotchas, not to fix existing ones). Just that I'd like to avoid using "volatile", for aforementioned reasons, especially given that there are perfectly reasonable alternatives to achieve the same desired behaviour. Satyam -
Accessing H/W registers? But apart from that... David -
Communicating between process context and interrupt/NMI handlers using per-CPU variables. Thanx, Paul -
Remeber we're talking about atomic_read/atomic_set. Please cite the actual file/function name you have in mind. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Yep, we are indeed talking about atomic_read()/atomic_set(). We have been through this issue already in this thread. Thanx, Paul -
Sorry, but I must've missed it. Could you cite the file or function for my benefit? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
I might summarize the thread if there is interest, but I am not able to do so right this minute. Thanx, Paul -
Thanks. But I don't need a summary of the thread, I'm asking for an extant code snippet in our kernel that benefits from the volatile change and is not part of a busy-wait. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Sorry, can't help you there. I really do believe that the information you need (as opposed to the specific item you are asking for) really has been put forth in this thread. Thanx, Paul -
That only leads me to believe that such a code snippet simply does not exist. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Hi Paul,
Nope, I don't think so. I wrote the following trivial testcases:
[ See especially tp4.c and tp4.s (last example). ]
==============================================================================
$ cat tp1.c # Using volatile access casts
#define atomic_read(a) (*(volatile int *)&a)
int a;
void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==============================================================================
$ gcc -Os -S tp1.c; cat tp1.s
func:
pushl %ebp
movl %esp, %ebp
movl $0, a
.L2:
movl a, %eax
testl %eax, %eax
jne .L2
popl %ebp
ret
==============================================================================
$ cat tp2.c # Using nothing; gcc will optimize the whole loop away
#define forget(x)
#define atomic_read(a) \
({ \
forget(&(a)); \
(a); \
})
int a;
void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==============================================================================
$ gcc -Os -S tp2.c; cat tp2.s
func:
pushl %ebp
movl %esp, %ebp
popl %ebp
movl $0, a
ret
==============================================================================
$ cat tp3.c # Using a full memory clobber barrier
#define forget(x) asm volatile ("":::"memory")
#define atomic_read(a) \
({ \
forget(&(a)); \
(a); \
})
int a;
void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==============================================================================
$ gcc -Os -S tp3.c; cat tp3.s
func:
pushl %ebp
movl %esp, %ebp
movl $0, a
.L2:
cmpl $0, a
jne .L2
popl %ebp
ret
==============================================================================
$ cat tp4.c # Using a forget(var) macro
#define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a))
#define atomic_read(a) \
({ \
forget(a); \
(a); \
})
int a;
void func(void)
{
a = 0;
while ...Right. I should have said "wouldn't the compiler be within its rights to forget the value, throw it away, then forget that it forgot it". The value coming out of the #define above is an unadorned ((v)->counter), The trick is to have a sufficiently complicated expression to force the compiler to run out of registers. If the value is non-volatile, it will refetch it (and expect it not to have changed, possibly being Well, if the second set doesn't care, they should be OK with the volatile behavior in this case. Thanx, Paul -
You can use -ffixed-XXX to keep the testcase simple. Segher -
But since there currently is only one such API, and there are users expecting the stronger behaviour, the only sane thing to do is let the API provide that behaviour. You can always add a new API with weaker behaviour later, and move users that are okay with it over to that new API. Segher -
You can *expect* whatever you want, but this isn't in line with reality at all. volatile _does not_ make accesses go all the way to memory. volatile _does not_ prevent reordering wrt other accesses. What volatile does are a) never optimise away a read (or write) to the object, since the data can change in ways the compiler cannot see; and b) never move stores to the object across a sequence point. This does not mean other accesses cannot be reordered wrt the volatile access. If the abstract machine would do an access to a volatile- qualified object, the generated machine code will do that access too. But, for example, it can still be optimised away by the compiler, if it can prove it is allowed to. If you want stuff to go all the way to memory, you need some architecture-specific flush sequence; to make a store globally visible before another store, you need mb(); before some following read, you need mb(); to prevent reordering you need a barrier. Segher -
^
(volatile)
As (now) indicated above, I had meant multiple volatile accesses to
the same object, obviously.
BTW:
#define atomic_read(a) (*(volatile int *)&(a))
#define atomic_set(a,i) (*(volatile int *)&(a) = (i))
int a;
void func(void)
{
int b;
b = atomic_read(a);
atomic_set(a, 20);
b = atomic_read(a);
}
gives:
func:
pushl %ebp
movl a, %eax
movl %esp, %ebp
movl $20, a
movl a, %eax
popl %ebp
ret
Sure, which explains the "as far as the compiler can take care" bit.
Poor phrase / choice of words, probably.
-
Yes, accesses to volatile objects are never reordered with Of course. It is executed by the abstract machine, so it will be executed by the actual machine. On the other hand, try b = 0; if (b) b = atomic_read(a); or similar. Segher -
Yup, obviously. Volatile accesses (or any access to volatile objects), or even "__volatile__ asms" (which gcc normally promises never to elid) can always be optimized for cases such as these where the compiler can trivially determine that the code in question is not reachable. -
Atomic variables are "volatile" in the sense that they are liable to be changed at any time by mechanisms that are outside the knowledge of the C compiler, namely, other CPUs, or this CPU executing an interrupt routine. In the kernel we use atomic variables in precisely those situations where a variable is potentially accessed concurrently by multiple CPUs, and where each CPU needs to see updates done by other CPUs in a timely fashion. That is what they are for. Therefore the compiler must not cache values of atomic variables in registers; each atomic_read must result in a load and each atomic_set must result in a store. Anything else will just lead to subtle bugs. I have no strong opinion about whether or not the best way to achieve this is through the use of the "volatile" C keyword. Segher's idea of using asm instead seems like a good one to me. Paul. -
This may have been the intend. However, today the visibility is controlled using barriers. And we have barriers that we use with atomic operations. Having volatile be the default just lead to confusion. Atomic read should just read with no extras. Extras can be added by using variants like atomic_read_volatile or so. -
Those barriers are for when we need ordering between atomic variables and other memory locations. An atomic variable by itself doesn't and shouldn't need any barriers for other CPUs to be able to see what's happening to it. Paul. -
It does not need any barriers. As soon as one cpu acquires the cacheline for write it will be invalidated in the caches of the others. So the other cpu will have to refetch. No need for volatile. The issue here may be that the compiler has fetched the atomic variable earlier and put it into a register. However, that prefetching is limited because it cannot cross functions calls etc. The only problem could be loops where the compiler does not refetch the variable since it assumes that it does not change and there are no function calls in the body of the loop. But AFAIK these loops need cpu_relax and other measures anyways to avoid bad effects from busy waiting. -
Seems to me that we face greater chance of confusion without the volatile than with, particularly as compiler optimizations become more aggressive. Yes, we could simply disable optimization, but optimization can be quite helpful. Thanx, Paul -
A volatile default would disable optimizations for atomic_read. atomic_read without volatile would allow for full optimization by the compiler. Seems that this is what one wants in many cases. -
The volatile cast should not disable all that many optimizations, for example, it is much less hurtful than barrier(). Furthermore, the main optimizations disabled (pulling atomic_read() and atomic_set() out of loops) really do need to be disabled. Thanx, Paul -
In many cases you do not need a barrier. Having volatile there *will* impact optimization because the compiler cannot use a register that may contain the value that was fetched earlier. And the compiler cannot choose freely when to fetch the value. The order of memory accesses are fixed if you use volatile. If the variable is not volatile then the compiler can arrange memory accesses any way they fit and thus generate better code. -
Understood. My point is not that the impact is precisely zero, but rather that the impact on optimization is much less hurtful than the problems that could arise otherwise, particularly as compilers become more aggressive in their optimizations. Thanx, Paul -
The problems arise because barriers are not used as required. Volatile has wishy washy semantics and somehow marries memory barriers with data access. It is clearer to separate the two. Conceptual cleanness usually translates into better code. If one really wants the volatile then lets make it explicit and use atomic_read_volatile() -
Completely agreed, again. To summarize again (had done so about ~100 mails
earlier in this thread too :-) ...
atomic_{read,set}_volatile() -- guarantees volatility also along with
atomicity (the two _are_ different concepts after all, irrespective of
whether callsites normally want one with the other or not)
atomic_{read,set}_nonvolatile() -- only guarantees atomicity, compiler
free to elid / coalesce / optimize such accesses, can keep the object
in question cached in a local register, leads to smaller text, etc.
As to which one should be the default atomic_read() is a question of
whether majority of callsites (more weightage to important / hot
codepaths, lesser to obscure callsites) want a particular behaviour.
Do we have a consensus here? (hoping against hope, probably :-)
[ This thread has gotten completely out of hand ... for my mail client
alpine as well, it now seems. Reminds of that 1000+ GPLv3 fest :-) ]
-
There are indeed architectures where you can cause gcc to emit memory barriers in response to volatile. I am assuming that we are -not- making gcc do this. Given this, then volatiles and memory barrier instructions are orthogonal -- one controls the compiler, the other controls the CPU. Thanx, Paul -
Name one such case. An atomic_read should do a load from memory. If the programmer puts an atomic_read() in the code then the compiler should emit a load for it, not re-use a value returned by a previous atomic_read. I do not believe it would ever be useful for the compiler to collapse two atomic_read statements into a single load. Paul. -
See sk_stream_mem_schedule in net/core/stream.c:
/* Under limit. */
if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
if (*sk->sk_prot->memory_pressure)
*sk->sk_prot->memory_pressure = 0;
return 1;
}
/* Over hard limit. */
if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
sk->sk_prot->enter_memory_pressure();
goto suppress_allocation;
}
We don't need to reload sk->sk_prot->memory_allocated here.
Now where is your example again?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
Are you sure? How do you know some other CPU hasn't changed the value in between? Paul. -
Yes I'm sure, because we don't care if others have increased the reservation. Note that even if we did we'd be using barriers so volatile won't do us any good here. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
If the load-coalescing is important to performance, why not load into a local variable? Thanx, Paul -
But others can also reduce the reservation. Also, the code sets and clears *sk->sk_prot->memory_pressure nonatomically with respect to the reads of sk->sk_prot->memory_allocated, so in fact the code doesn't guarantee any particular relationship between the two. That code looks like a beautiful example of buggy, racy code where someone has sprinkled magic fix-the-races dust (otherwise known as atomic_t) around in a vain attempt to fix the races. That's assuming that all that stuff actually performs any useful purpose, of course, and that there isn't some lock held by the callers. In the latter case it is pointless using atomic_t. Paul. -
Yes others can reduce the reservation, but the point of this is that the code doesn't care. We'll either see the value before or after the reduction and in either case we'll do something sensible. The worst that can happen is when we're just below the hard limit and multiple CPUs fail to allocate but that's not really a problem because if the machine is making progress at all then we will eventually scale back and allow these allocations to succeed. As to the non-atomic operation on memory_pressue, that's OK because we only ever assign values to it and never do other operations such as += or -=. Remember that int/long assignments must be atomic or Linux won't run on your architecture. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
The cpu knows because the cacheline was not invalidated. -
Crap my statement above is wrong..... We do not care that the value was changed otherwise we would have put a barrier in there. -
I can't speak for this particular case, but there could be similar code examples elsewhere, where we do the atomic ops on an atomic_t object inside a higher-level locking scheme that would take care of the kind of problem you're referring to here. It would be useful for such or similar code if the compiler kept the value of that atomic object in a register. -
We might not be using atomic_t (and ops) if we already have a higher-level locking scheme, actually. So as Herbert mentioned, such cases might just not care. [ Too much of this thread, too little sleep, sorry! ] Anyway, the problem, of course, is that this conversion to a stronger / safer-by-default behaviour doesn't happen with zero cost to performance. Converting atomic ops to "volatile" behaviour did add ~2K to kernel text for archs such as i386 (possibly to important codepaths) that didn't have those semantics already so it would be constructive to actually look at those differences and see if there were really any heisenbugs that got rectified. Or if there were legitimate optimizations that got wrongly disabled. Onus lies on those proposing the modifications, I'd say ;-) -
The uses of atomic_read where one might want it to allow caching of the result seem to me to fall into 3 categories: 1. Places that are buggy because of a race arising from the way it's used. 2. Places where there is a race but it doesn't matter because we're doing some clever trick. 3. Places where there is some locking in place that eliminates any potential race. In case 1, adding volatile won't solve the race, of course, but it's hard to argue that we shouldn't do something because it will slow down buggy code. Case 2 is hopefully pretty rare and accompanied by large comment blocks, and in those cases caching the result of atomic_read explicitly in a local variable would probably make the code clearer. And in case 3 there is no reason to use atomic_t at all; we might as well just use an int. So I don't see any good reason to make the atomic API more complex by having "volatile" and "non-volatile" versions of atomic_read. It should just have the "volatile" behaviour. Paul. -
Since adding volatile doesn't help any of the 3 cases, and takes away optimisations from both 2 and 3, I wonder what is the point of the addition after all? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Note that I said these are the cases _where one might want to allow caching_, so of course adding volatile doesn't help _these_ cases. There are of course other cases where one definitely doesn't want to allow the compiler to cache the value, such as when polling an atomic variable waiting for another CPU to change it, and from my inspection so far these cases seem to be the majority. The reasons for having "volatile" behaviour of atomic_read (whether or not that is achieved by use of the "volatile" C keyword) are - It matches the normal expectation based on the name "atomic_read" - It matches the behaviour of the other atomic_* primitives - It avoids bugs in the cases where "volatile" behaviour is required To my mind these outweigh the small benefit for some code of the non-volatile (caching-allowed) behaviour. In fact it's pretty minor either way, and since x86[-64] has this behaviour, one can expect the potential bugs in generic code to have mostly been found, although perhaps not all of them since x86[-64] has less aggressive reordering of memory accesses and fewer registers in which to cache things than some other architectures. Paul. -
We've been through that already. If it's a busy-wait it should use cpu_relax. If it's scheduling away that already forces the compiler to reread anyway. Can't argue since you left out what those expectations Do you (or anyone else for that matter) have an example of this? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
We use atomic_t for data that is concurrently locklessly written and read at arbitrary times. My naive expectation as driver author (driver maintainer) is that all atomic_t accessors, including atomic_read, (and The only code I somewhat know, the ieee1394 subsystem, was perhaps authored and is currently maintained with the expectation that each occurrence of atomic_read actually results in a load operation, i.e. is not optimized away. This means all atomic_t (bus generation, packet and buffer refcounts, and some other state variables)* and likewise all atomic bitops in that subsystem. If that assumption is wrong, then what is the API or language primitive to force a load operation to occur? *) Interesting what a quick LXR session in search for all atomic_t usages in 'my' subsystem brings to light. I now noticed an apparently unused struct member in the bitrotting pcilynx driver, and more importantly, a pairing of two atomic_t variables in raw1394 that should be audited for race conditions and for possible replacement by plain int. -- Stefan Richter -=====-=-=== =--- =---- http://arcgraph.de/sr/ -
Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
What do I have to look for? atomic_read after another read or write access to the same variable, in the same function scope? Or in the sum of scopes of functions that could be merged by function inlining? One example was discussed here earlier: The for (;;) loop in nodemgr_host_thread. There an msleep_interruptible implicitly acted as barrier (at the moment because it's in a different translation unit; if it were the same, then because it hopefully has own barriers). So that happens to work, although such an implicit barrier is bad style: Better enforce the desired behaviour (== guaranteed load operation) *explicitly*. -- Stefan Richter -=====-=-=== =--- =---- http://arcgraph.de/sr/ -
PS: Just to clarify, I'm not speaking for the volatile modifier. I'm
not speaking for any particular implementation of atomic_t and its
accessors at all. All I am saying is that
- we use atomically accessed data types because we concurrently but
locklessly access this data,
- hence a read access to this data that could be optimized away
makes *no sense at all*.
The only sensible read accessor to an atomic datatype is a read accessor
that will not be optimized away.
So, the architecture guys can implement atomic_read however they want
--- as long as it cannot be optimized away.*
PPS: If somebody has code where he can afford to let the compiler
coalesce atomic_read with a previous access to the same data, i.e.
doesn't need and doesn't want all guarantees that the atomic_read API
makes (or IMO should make), then he can replace the atomic_read by a
local temporary variable.
*) Exceptions:
if (known_to_be_false)
read_access(a);
and the like.
--
Stefan Richter
-=====-=-=== =--- =----
http://arcgraph.de/sr/
-
No sane compiler can optimise away an atomic_read per se. That's only possible if there's a preceding atomic_set or atomic_read, with no barriers in the middle. If that's the case, then one has to conclude that doing away with the second read is acceptable, as otherwise a memory (or at least a compiler) barrier should have been used. In fact, volatile doesn't guarantee that the memory gets read anyway. You might be reading some stale value out of the cache. Granted this doesn't happen on x86 but when you're coding for the kernel you can't make such assumptions. So the point here is that if you don't mind getting a stale value from the CPU cache when doing an atomic_read, then surely you won't mind getting a stale value from the compiler They can implement it however they want as long as it stays atomic. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
The compiler can also reorder non-volatile accesses. For an example patch that cares about this, please see: http://lkml.org/lkml/2007/8/7/280 This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and rcu_read_unlock() to ensure that accesses aren't reordered with respect Absolutely disagree. An interrupt/NMI/SMI handler running on the CPU will see the same value (whether in cache or in store buffer) that the mainline code will see. In this case, we don't care about CPU misordering, only about compiler misordering. It is easy to see other uses that combine communication with handlers on the current CPU with communication among CPUs -- again, see prior messages in Precisely. And volatility is a key property of "atomic". Let's please not throw it away. Thanx, Paul -
Good, finally we have some code to discuss (even though it's not actually in the kernel yet). First of all, I think this illustrates that what you want here has nothing to do with atomic ops. The ORDERED_WRT_IRQ macro occurs a lot more times in your patch than atomic reads/sets. So *assuming* that it was necessary at all, then having an ordered variant of the atomic_read/atomic_set ops could do just as well. However, I still don't know which atomic_read/atomic_set in your patch would be broken if there were no volatile. Could you please point them out? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Indeed. If I could trust atomic_read()/atomic_set() to cause the compiler to maintain ordering, then I could just use them instead of having to create an ORDERED_WRT_IRQ(). (Or ACCESS_ONCE(), as it is called in a Suppose I tried replacing the ORDERED_WRT_IRQ() calls with atomic_read() and atomic_set(). Starting with __rcu_read_lock(): o If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++" was ordered by the compiler after "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then suppose an NMI/SMI happened after the rcu_read_lock_nesting but before the rcu_flipctr. Then if there was an rcu_read_lock() in the SMI/NMI handler (which is perfectly legal), the nested rcu_read_lock() would believe that it could take the then-clause of the enclosing "if" statement. But because the rcu_flipctr per-CPU variable had not yet been incremented, an RCU updater would be within its rights to assume that there were no RCU reads in progress, thus possibly yanking a data structure out from under the reader in the SMI/NMI function. Fatal outcome. Note that only one CPU is involved here because these are all either per-CPU or per-task variables. o If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1" was ordered by the compiler to follow the "ORDERED_WRT_IRQ(me->rcu_flipctr_idx) = idx", and an NMI/SMI happened between the two, then an __rcu_read_lock() in the NMI/SMI would incorrectly take the "else" clause of the enclosing "if" statement. If some other CPU flipped the rcu_ctrlblk.completed in the meantime, then the __rcu_read_lock() would (correctly) write the new value into rcu_flipctr_idx. Well and good so far. But the problem arises in __rcu_read_unlock(), which then decrements the wrong counter. Depending on exactly how subsequent events played out, this could result in either prematurely ending grace periods or never-ending grace periods, both of which are fatal outcomes. And the following are not needed in the current ...
Hmm, I never quite got what all this interrupt/NMI/SMI handling and RCU business you mentioned earlier was all about, but now that you've +#define WHATEVER(x) (*(volatile typeof(x) *)&(x)) I suppose one could want volatile access semantics for stuff that's a bit-field too, no? Also, this gives *zero* "re-ordering" guarantees that your code wants as you've explained it below) -- neither w.r.t. CPU re-ordering (which probably you don't care about) *nor* w.r.t. compiler re-ordering Ok, so you don't care about CPU re-ordering. Still, I should let you know that your ORDERED_WRT_IRQ() -- bad name, btw -- is still buggy. What you want is a full compiler optimization barrier(). [ Your code probably works now, and emits correct code, but that's just because of gcc did what it did. Nothing in any standard, or in any documented behaviour of gcc, or anything about the real It's not about interrupt/SMI/NMI handlers at all! What you clearly want, simply put, is that a certain stream of C statements must be emitted by the compiler _as they are_ with no re-ordering optimizations! You must *definitely* use barrier(), IMHO. Satyam -
One could, but this is not supported in general. So if you want that, you need to use the usual bit-mask tricks and (for setting) atomic You are correct about CPU re-ordering (and about the fact that this example doesn't care about it), but not about compiler re-ordering. The compiler is prohibited from moving a volatile access across a sequence point. One example of a sequence point is a statement boundary. Because all of the volatile accesses in this code are separated by statement Really? Why doesn't the prohibition against moving volatile accesses Almost. I don't care about most of the operations, only about the loads and stores marked volatile. Again, although the compiler is free to reorder volatile accesses that occur -within- a single statement, it is prohibited by the standard from moving volatile accesses from one statement to another. Therefore, this code can legitimately use volatile. Or am I missing something subtle? Thanx, Paul -
Yes, you're right, and I believe precisely this was discussed elsewhere
as well today.
But I'd call attention to what Herbert mentioned there. You're using
ORDERED_WRT_IRQ() on stuff that is _not_ defined to be an atomic_t at all:
* Member "completed" of struct rcu_ctrlblk is a long.
* Per-cpu variable rcu_flipctr is an array of ints.
* Members "rcu_read_lock_nesting" and "rcu_flipctr_idx" of
struct task_struct are ints.
So are you saying you're "having to use" this volatile-access macro
because you *couldn't* declare all the above as atomic_t and thus just
expect the right thing to happen by using the atomic ops API by default,
because it lacks volatile access semantics (on x86)?
If so, then I wonder if using the volatile access cast is really the
best way to achieve (at least in terms of code clarity) the kind of
re-ordering guarantees it wants there. (there could be alternative
solutions, such as using barrier(), or that at bottom of this mail)
What I mean is this: If you switch to atomic_t, and x86 switched to
make atomic_t have "volatile" semantics by default, the statements
would be simply a string of: atomic_inc(), atomic_add(), atomic_set(),
and atomic_read() statements, and nothing in there that clearly makes
it *explicit* that the code is correct (and not buggy) simply because
of the re-ordering guarantees that the C "volatile" type-qualifier
keyword gives us as per the standard. But now we're firmly in
True, *(volatile foo *)& _will_ work for this case.
But multiple calls to barrier() (granted, would invalidate all other
optimizations also) would work as well, would it not?
[ Interestingly, if you declared all those objects mentioned earlier as
atomic_t, and x86(-64) switched to an __asm__ __volatile__ based variant
for atomic_{read,set}_volatile(), the bugs you want to avoid would still
be there. "volatile" the C language type-qualifier does have compiler
re-ordering semantics you mentioned earlier, but the "volatile" that
So don't ...In any case, given Linus's note, it appears that atomic_read() and atomic_set() won't consistently have volatile semantics, at least not while the compiler generates such ugly code for volatile accesses. So I will continue with my current approach. In any case, I will not be using atomic_inc() or atomic_add() in this code, as doing so would more than double the overhead, even on machines Well, that certainly would be a point in favor of "volatile" over inline The local_irq_save/restore are something like 30% of the overhead of these two functions, so will be looking hard at getting rid of them. Doing so allows the scheduling-clock interrupt to get into the mix, and also allows preemption. The goal would be to find some trick that suppresses preemption, fends off the grace-period-computation code invoked from the the scheduling-clock interrupt, and otherwise keeps Or that needs to resolve similar races with IRQs without disabling them. One reason to avoid disabling IRQs is to avoid degrading scheduling latency. In any case, I do agree that the amount of code that must worry about this is quite small at the moment. I believe that it will become more common, but would imagine that this belief might not Given some low-level details of the current implementation, I could imagine manipulating rcu_read_lock_nesting on entry to and exit from all NMI/SMI handlers, but would like to avoid that kind of architecture Agreed! Thanx, Paul -
I still don't agree with the underlying assumption that everybody (or lots of kernel code) treats atomic accesses as volatile. Nobody that does has managed to explain my logic problem either: loads and stores to long and ptr have always been considered to be atomic, test_bit is atomic; so why are another special subclass of atomic loads and stores? (and yes, it is perfectly legitimate to want a non-volatile read for a data type that you also want to do atomic RMW operations on) Why are people making these undocumented and just plain false assumptions about atomic_t? If they're using lockless code (ie. which they must be if using atomics), then they actually need to be thinking much harder about memory ordering issues. If that is too It isn't, though (at least not since i386 and x86-64 don't have it). _Adding_ it is trivial, and can be done any time. Throwing it away (ie. making the API weaker) is _hard_. So let's not add it without really good reasons. It most definitely results in worse code generation in practice. I don't know why people would assume volatile of atomics. AFAIK, most of the documentation is pretty clear that all the atomic stuff can be reordered etc. except for those that modify and return a value. It isn't even intuitive: `*lp = value` is like the most fundamental atomic operation in Linux. -- SUSE Labs, Novell Inc. -
Well, it has only been false since December 2006. Prior to that Indeed. I believe that most uses of atomic_read other than in polling loops or debug printk statements are actually racy. In some cases the race doesn't seem to matter, but I'm sure there are cases where it Why use locks when you can just sprinkle magic fix-the-races dust (aka Conceptually it is, because atomic_t is specifically for variables which are liable to be modified by other CPUs, and volatile _means_ "liable to be changed by mechanisms outside the knowledge of the Well, in one sense it's not that hard - Linus did it just 8 months ago 0.0008% increase in kernel text size on powerpc according to my By making something an atomic_t you're saying "other CPUs are going to be modifying this, so treat it specially". It's reasonable to assume Volatility isn't primarily about reordering (though as Linus says it does restrict reordering to some extent). Paul. -
Hmm, although I don't think it has ever been guaranteed by the API documentation (concede documentation is often not treated as the authoritative source here, but for atomic it is actually very good and obviously indispensable as the memory ordering I agree with your skepticism of a lot of lockless code. But I think a lot of the more subtle race problems will not be fixed with volatile. The big, dumb infinite loop bugs would be fixed, but they're pretty Usually that is the case, yes. But also most of the time we don't care that it has been changed and don't mind it being reordered or eliminated. One of the only places we really care about that at all is for Well it would have been harder if the documentation also guaranteed that atomic_read/atomic_set was ordered. Or it would have been harder I don't think you're making a bad choice by keeping it volatile on powerpc and waiting for others to shake out more of the bugs. You But I don't actually know what that "special treatment" is. Well actually, I do know that operations will never result in a partial modification being exposed. I also know that the operators that do not modify and return are not guaranteed to have any sort of ordering constraints. -- SUSE Labs, Novell Inc. -
Which documentation is there?
For driver authors, there is LDD3. It doesn't specifically cover
effects of optimization on accesses to atomic_t.
For architecture port authors, there is Documentation/atomic_ops.txt.
Driver authors also can learn something from that document, as it
indirectly documents the atomic_t and bitops APIs.
Prompted by this thread, I reread this document, and indeed, the
sentence "Unlike the above routines, it is required that explicit memory
barriers are performed before and after [atomic_{inc,dec}_return]"
indicates that atomic_read (one of the "above routines") is very
different from all other atomic_t accessors that return values.
This is strange. Why is it that atomic_read stands out that way? IMO
this API imbalance is quite unexpected by many people. Wouldn't it be
beneficial to change the atomic_read API to behave the same like all
other atomic_t accessors that return values?
OK, it is also different from the other accessors that return data in so
far as it doesn't modify the data. But as driver "author", i.e. user of
the API, I can't see much use of an atomic_read that can be reordered
and, more importantly, can be optimized away by the compiler. Sure, now
that I learned of these properties I can start to audit code and insert
barriers where I believe they are needed, but this simply means that
almost all occurrences of atomic_read will get barriers (unless there
already are implicit but more or less obvious barriers like msleep).
--
Stefan Richter
-=====-=-=== =--- =---=
http://arcgraph.de/sr/
-
"Semantics and Behavior of Atomic and Bitmask Operations" is pretty direct :) Sure, it says that it's for arch maintainers, but there is no It is very consistent and well defined. Operations which both modify the data _and_ return something are defined to have full barriers before and after. What do you want to add to the other atomic accessors? Full memory barriers? Only compiler barriers? It's quite likely that if you think some barriers will fix bugs, then there are other bugs lurking there anyway. Just use spinlocks if you're not absolutely clear about potential It will return to you an atomic snapshot of the data (loaded from memory at some point since the last compiler barrier). All you have to be aware of compiler barriers and the Linux SMP memory ordering You might find that these places that appear to need barriers are buggy for other reasons anyway. Can you point to some in-tree code we can have a look at? -- SUSE Labs, Novell Inc. -
I fully agree with this. As Paul Mackerras mentioned elsewhere, a lot of authors sprinkle atomic_t in code thinking they're somehow done with *locking*. This is sad, and I wonder if it's time for a Such code was mentioned elsewhere (query nodemgr_host_thread in cscope) that managed to escape the requirement for a barrier only because of some completely un-obvious compilation-unit-scope thing. But I find such an non-explicit barrier quite bad taste. Stefan, do consider plunking an explicit call to barrier() there. Satyam -
It is very obvious. msleep calls schedule() (ie. sleeps), which is always a barrier. The "unobvious" thing is that you wanted to know how the compiler knows a function is a barrier -- answer is that if it does not *know* it is not a barrier, it must assume it is a barrier. If the whole msleep call chain including the scheduler were defined static in the current compilation unit, then it would still be a barrier because it would actually be able to see the barriers in schedule(void), if nothing else. -
Probably you didn't mean that, but no, schedule() is not barrier because True, that's clearly what happens here. But are you're definitely joking that this is "obvious" in terms of code-clarity, right? Just 5 minutes back you mentioned elsewhere you like seeing lots of explicit calls to barrier() (with comments, no less, hmm? :-) -
Where did I say it is a barrier because it sleeps? It is always a barrier because, at the lowest level, schedule() (and thus anything that sleeps) is defined to always be a barrier. Regardless of whatever obscure means the compiler might need to infer the barrier. In other words, you can ignore those obscure details because schedule() is No. If you accept that barrier() is implemented correctly, and you know that sleeping is defined to be a barrier, then its perfectly clear. You don't have to know how the compiler "knows" that some function contains Sure, but there are well known primitives which contain barriers, and trivial recognisable code sequences for which you don't need comments. waiting-loops using sleeps or cpu_relax() are prime examples. -
"It is always a barrier because, at the lowest level, anything that sleeps I didn't quite understand what you said here, so I'll tell what I think: * foo() is a compiler barrier if the definition of foo() is invisible to the compiler at a callsite. * foo() is also a compiler barrier if the definition of foo() includes a barrier, and it is inlined at the callsite. If the above is wrong, or if there's something else at play as well, Curiously, that's the second time you've said "sleeping is defined to be a (compiler) barrier". How does the compiler even know if foo() is a function that "sleeps"? Do compilers have some notion of "sleeping" to ensure they automatically assume a compiler barrier whenever such a function is called? Or are you saying that the compiler can see the barrier() inside said function ... nopes, you're saying quite the I think I do, why not? Would appreciate if you could elaborate on this. Satyam -
You're getting too worried about the compiler implementation. Start If a function is not completely visible to the compiler (so it can't determine whether a barrier could be in it or not), then it must always assume it will contain a barrier so it always does the right thing. -
Yup, that's what I'd said just a few sentences above, as you can see. I was actually asking for "elaboration" on "how a compiler determines that function foo() (say foo == schedule), even when it cannot see that it has a barrier(), as you'd mentioned, is a 'sleeping' function" actually, and whether compilers have a "notion of sleep to automatically assume a compiler barrier whenever such a sleeping function foo() is called". But I think you've already qualified the discussion to this kernel, so okay, I shouldn't nit-pick anymore. -
"Indirect", "pretty direct"... It's subjective. Yes, but unlike these, atomic_read returns a value. Without me (the API user) providing extra barriers, that value may become something else whenever someone touches code in the vicinity of You are right, atomic_read is not only different from accessors that don't retunr values, it is also different from all other accessors that return values (because they all also modify the value). There is just no actual API documentation, which contributes to the issue that some people (or at least one: me) learn a little bit late how special A lot of different though related issues are discussed in this thread, but I personally am only occupied by one particular thing: What kind of Probably good advice, like generally if driver guys consider lockless OK, that's what I slowly realized during this discussion, and I I could, or could not, if I were through with auditing the code. I remembered one case and posted it (nodemgr_host_thread) which was safe because msleep_interruptible provided the necessary barrier there, and this implicit barrier is not in danger to be removed by future patches. -- Stefan Richter -=====-=-=== =--- =---= http://arcgraph.de/sr/ -
PS, just in case anybody holds his breath for more example code from me, I don't plan to continue with an actual audit of the drivers I maintain. It's an important issue, but my current time budget will restrict me to look at it ad hoc, per case. (Open bugs have higher priority than potential bugs.) -- Stefan Richter -=====-=-=== =--- =---= http://arcgraph.de/sr/ -
Note, LDD3 page 238 says: "It is worth noting that most of the other kernel primitives dealing with synchronization, such as spinlock and atomic_t operations, also function as memory barriers." I don't know about Linux 2.6.10 against which LDD3 was written, but currently only _some_ atomic_t operations function as memory barriers. Besides, judging from some posts in this thread, saying that atomic_t operations dealt with synchronization may not be entirely precise. -- Stefan Richter -=====-=-=== =--- =--=- http://arcgraph.de/sr/ -
atomic_t is often used as the basis for implementing more sophisticated synchronization mechanisms, such as rwlocks. Whether or not they are designed for that purpose, the atomic_* operations are de facto synchronization primitives. -- Chris -
...which is undefined behaviour in C (and GCC) when that data is declared volatile, which is a good argument against implementing atomics that way in itself. Segher -
No, that particular argument is bogus, because there is a cache coherency protocol operating to keep the CPU cache coherent with stores from other CPUs, but there isn't any such protocol (nor should there be) for a register used as a "cache". (Linux requires SMP systems to keep any CPU caches coherent as far as accesses by other CPUs are concerned. It doesn't support any SMP systems that are not cache-coherent as far as CPU accesses are concerned. It does support systems with non-cache-coherent DMA.) Paul. -
Hmm, it's not bad style at all. Let's assume that everything is in the same scope. Such a loop must either call a function that busy-waits, which should always have a cpu_relax or something equivalent, or it'll call a function that schedules away which immediately invalidates any values the compiler might have cached for the atomic_read. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
A whole bunch of atomic_read uses will be broken without the volatile modifier once we start removing barriers that aren't needed if volatile behavior is guaranteed. barrier() clobbers all your registers. volatile atomic_read() only clobbers one register, and more often than not it's a register you wanted to clobber anyway. -- Chris -
Could you please cite the file/function names so we can see whether removing the barrier makes sense? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
At a glance, several architectures' implementations of smp_call_function() have one or more legitimate atomic_read() busy-waits that shouldn't be using CPU-relax. Some of them do work in the loop. I'm sure there are plenty more examples that various maintainers could find in their own code. -- Chris -
Care to name one so we can discuss it? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
sh looks like the only one there that would be broken (and that's only because they don't have a cpu_relax there, but it should be added anyway). Sure, if we removed volatile from other architectures, it would be wise to audit arch code because arch maintainers do sometimes make assumptions about their implementation details... however we can assume most generic code is safe without volatile. -- SUSE Labs, Novell Inc. -
There are some in arch-specific code, for example line 1073 of arch/mips/kernel/smtc.c. On mips, cpu_relax() is just barrier(), so the empty loop body is ok provided that atomic_read actually does the load each time around the loop. Paul. -
A barrier() is all you need to force the compiler to reread the value. The people advocating volatile in this thread are talking about code that doesn't use barrier()/cpu_relax(). Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Did you look at it? Here it is: /* Someone else is initializing in parallel - let 'em finish */ while (atomic_read(&idle_hook_initialized) < 1000) ; Paul. -
Honestly, this thread is suffering from HUGE communication gaps. What Herbert (obviously) meant there was that "this loop could've been okay _without_ using volatile-semantics-atomic_read() also, if only it used cpu_relax()". That does work, because cpu_relax() is _at least_ barrier() on all archs (on some it also emits some arch-dependent "pause" kind of instruction). Now, saying that "MIPS does not have such an instruction so I won't use cpu_relax() for arch-dependent-busy-while-loops in arch/mips/" sounds like a wrong argument, because: tomorrow, such arch's _may_ introduce such an instruction, so naturally, at that time we'd change cpu_relax() appropriately (in reality, we would actually *re-define* cpu_relax() and ensure that the correct version gets pulled in depending on whether the callsite code is legacy or only for the newer such CPUs of said arch, whatever), but loops such as this would remain un-changed, because they never used cpu_relax()! OTOH an argument that said the following would've made a stronger case: "I don't want to use cpu_relax() because that's a full memory clobber barrier() and I have loop-invariants / other variables around in that code that I *don't* want the compiler to forget just because it used cpu_relax(), and hence I will not use cpu_relax() but instead make my atomic_read() itself have "volatility" semantics. Not just that, but I will introduce a cpu_relax_no_barrier() on MIPS, that would be a no-op #define for now, but which may not be so forever, and continue to use that in such busy loops." In general, please read the thread-summary I've tried to do at: http://lkml.org/lkml/2007/8/17/25 Feel free to continue / comment / correct stuff from there, there's too much confusion and circular-arguments happening on this thread otherwise. [ I might've made an incorrect statement there about "volatile" w.r.t. cache on non-x86 archs, I think. ] Satyam -
I looked around a bit by using some command lines and ended up wondering if these are equal to busy-wait case (and should be fixed) or not: ./drivers/telephony/ixj.c 6674: while (atomic_read(&j->DSPWrite) > 0) 6675- atomic_dec(&j->DSPWrite); ...besides that, there are couple of more similar cases in the same file (with braces)... -- i. -
Generally, ixj.c has several occurrences of couples of atomic write and atomic read which potentially do not do what the author wanted. -- Stefan Richter -=====-=-=== =--- =---- http://arcgraph.de/sr/ -
atomic_dec() already has volatile behavior everywhere, so this is semantically okay, but this code (and any like it) should be calling cpu_relax() each iteration through the loop, unless there's a compelling reason not to. I'll allow that for some hardware drivers (possibly this one) such a compelling reason may exist, but hardware-independent core subsystems probably have no excuse. If the maintainer of this code doesn't see a compelling reason to add cpu_relax() in this loop, then it should be patched. -- Chris -
No it does not have any volatile semantics. atomic_dec() can be reordered at will by the compiler within the current basic unit if you do not add a barrier. -
Yep. Or you can use atomic_dec_return() instead of using a barrier. Thanx, Paul -
Or you could use smp_mb__{before,after}_atomic_dec.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
Yep. That would be an example of a barrier, either in the atomic_dec() itself or in the smp_mb...(). Thanx, Paul -
Volatile doesn't mean it can't be reordered; volatile means the accesses can't be eliminated. Paul. -
It also does limit re-ordering. Of course, since *normal* accesses aren't necessarily limited wrt re-ordering, the question then becomes one of "with regard to *what* does it limit re-ordering?". A C compiler that re-orders two different volatile accesses that have a sequence point in between them is pretty clearly a buggy compiler. So at a minimum, it limits re-ordering wrt other volatiles (assuming sequence points exists). It also means that the compiler cannot move it speculatively across conditionals, but other than that it's starting to get fuzzy. In general, I'd *much* rather we used barriers. Anything that "depends" on volatile is pretty much set up to be buggy. But I'm certainly also willing to have that volatile inside "atomic_read/atomic_set()" if it avoids code that would otherwise break - ie if it hides a bug. Linus -
The cost of doing so seems to me to be well down in the noise - 44 bytes of extra kernel text on a ppc64 G5 config, and I don't believe the extra few cycles for the occasional extra load would be measurable (they should all hit in the L1 dcache). I don't mind if x86[-64] have atomic_read/set be nonvolatile and find all the missing barriers, but for now on powerpc, I think that not having to find those missing barriers is worth the 0.00076% increase in kernel text size. Paul. -
BTW, the sort of missing barriers that triggered this thread aren't that subtle. It'll result in a simple lock-up if the loop condition holds upon entry. At which point it's fairly straightforward to find the culprit. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Not necessarily. A barrier-less buggy code such as below: atomic_set(&v, 0); ... /* some initial code */ while (atomic_read(&v)) ; ... /* code that MUST NOT be executed unless v becomes non-zero */ (where v->counter is has no volatile access semantics) could be generated by the compiler to simply *elid* or *do away* with the loop itself, thereby making the: "/* code that MUST NOT be executed unless v becomes non-zero */" to be executed even when v is zero! That is subtle indeed, and causes no hard lockups. Granted, the above IS buggy code. But, the stated objective is to avoid heisenbugs. And we have driver / subsystem maintainers such as Stefan coming up and admitting that often a lot of code that's written to use atomic_read() does assume the read will not be elided by the compiler. See, I agree, "volatility" semantics != what we often want. However, if what we want is compiler barrier, for only the object under consideration, "volatility" semantics aren't really "nonsensical" or anything. -
Then I presume you mean
while (!atomic_read(&v))
;
Which is just the same old infinite loop bug solved with cpu_relax().
These are pretty trivial to audit and fix, and also to debug, I would
Anyway, why are you making up code snippets that are buggy in other
ways in order to support this assertion being made that lots of kernel
code supposedly depends on volatile semantics. Just reference the
So these are broken on i386 and x86-64?
Are they definitely safe on SMP and weakly ordered machines with
just a simple compiler barrier there? Because I would not be
surprised if there are a lot of developers who don't really know
what to assume when it comes to memory ordering issues.
This is not a dig at driver writers: we still have memory ordering
problems in the VM too (and probably most of the subtle bugs in
lockless VM code are memory ordering ones). Let's not make up a
false sense of security and hope that sprinkling volatile around
will allow people to write bug-free lockless code. If a writer
can't be bothered reading API documentation and learning the Linux
memory model, they can still be productive writing safely locked
code.
--
SUSE Labs, Novell Inc.
-
Because the point is *not* about existing bugs in kernel code. At some
point Chris Snook (who started this thread) did write that "If I knew
of the existing bugs in the kernel, I would be sending patches for them,
not this series" or something to that effect.
The point is about *author expecations*. If people do expect atomic_read()
(or a variant thereof) to have volatile semantics, why not give them such
a variant?
And by the way, the point is *also* about the fact that cpu_relax(), as
of today, implies a full memory clobber, which is not what a lot of such
^^^^^^^^^^^^^
(so it's about compiler barrier expectations only, though I fully agree
that those who're using atomic_t as if it were some magic thing that lets
Possibly, but the point is not about existing bugs, as mentioned above.
Some such bugs have been found nonetheless -- reminds me, can somebody
please apply http://www.gossamer-threads.com/lists/linux/kernel/810674 ?
Satyam
-
Because they should be thinking about them in terms of barriers, over which the compiler / CPU is not to reorder accesses or cache memory operations, rather than "special" "volatile" accesses. Linux's whole memory ordering and locking model is completely geared around the That's not the point, because as I also mentioned, the logical extention to Linux's barrier API to handle this is the order(x) macro. Again, not special volatile accessors. -
This is obviously just a taste thing. Whether to have that forget(x) barrier as something author should explicitly sprinkle appropriately in appropriate places in the code by himself or use a primitive that includes it itself. I'm not saying "taste matters aren't important" (they are), but I'm really That's definitely the point, why not. This is why "barrier()", being Sure, that forget(x) macro _is_ proposed to be made part of the generic API. Doesn't explain why not to define/use primitives that has volatility semantics in itself, though (taste matters apart). -
That's not obviously just taste to me. Not when the primitive has many (perhaps, the majority) of uses that do not require said barriers. And this is not solely about the code generation (which, as Paul says, is relatively minor even on x86). I prefer people to think explicitly That is _not_ the point (of why a volatile atomic_read is good) because there has already been an alternative posted that better conforms with Linux barrier API and is much more widely useful and more usable. If you are so worried about barrier() being too heavyweight, then you're off to a poor start by wanting to If you follow the discussion.... You were thinking of a reason why the semantics *should* be changed or added, and I was rebutting your argument that it must be used when a full barrier() is too heavy (ie. by pointing out that order() has superior semantics anyway). Why do I keep repeating the same things? I'll not continue bloating this thread until a new valid point comes up... -
See, you do *require* people to have to repeat the same things to you! As has been written about enough times already, and if you followed the discussion on this thread, I am *not* proposing that atomic_read()'s semantics be changed to have any extra barriers. What is proposed is a different atomic_read_xxx() variant thereof, that those can use who do want that. Now whether to have a kind of barrier ("volatile", whatever) in the atomic_read_xxx() itself, or whether to make the code writer himself to explicitly write the order(x) appropriately in appropriate places in the Again, you're requiring me to repeat things that were already made evident on this thread (if you follow it). This _is_ the point, because a lot of loops out there (too many of them, I WILL NOT bother citing file_name:line_number) end up having to use a barrier just because they're using a loop-exit-condition that depends on a value returned by atomic_read(). It would be good for them if they used an atomic_read_xxx() primitive that gave these "volatility" semantics Whether that alternative (explicitly using forget(x), or wrappers thereof, such as the "order_atomic" you proposed) is better than other alternatives (such as atomic_read_xxx() which includes the volatility behaviour in itself) is still open, and precisely what we started discussing just one mail back. I don't think so. Repeating myself, for the N'th time, NO, I DON'T want to make atomic_read Amazing. Either you have reading comprehension problems, or else, please try reading this thread (or at least this sub-thread) again. I don't want _you_ blaming _me_ for having to repeat things to you all over again. -
Indeed. I think the important issues are: - "volatile" itself is simply a badly/weakly defined issue. The semantics of it as far as the compiler is concerned are really not very good, and in practice tends to boil down to "I will generate so bad code that nobody can accuse me of optimizing anything away". - "volatile" - regardless of how well or badly defined it is - is purely a compiler thing. It has absolutely no meaning for the CPU itself, so it at no point implies any CPU barriers. As a result, even if the compiler generates crap code and doesn't re-order anything, there's nothing that says what the CPU will do. - in other words, the *only* possible meaning for "volatile" is a purely single-CPU meaning. And if you only have a single CPU involved in the process, the "volatile" is by definition pointless (because even without a volatile, the compiler is required to make the C code appear consistent as far as a single CPU is concerned). So, let's take the example *buggy* code where we use "volatile" to wait for other CPU's: atomic_set(&var, 0); while (!atomic_read(&var)) /* nothing */; which generates an endless loop if we don't have atomic_read() imply volatile. The point here is that it's buggy whether the volatile is there or not! Exactly because the user expects multi-processing behaviour, but "volatile" doesn't actually give any real guarantees about it. Another CPU may have done: external_ptr = kmalloc(..); /* Setup is now complete, inform the waiter */ atomic_inc(&var); but the fact is, since the other CPU isn't serialized in any way, the "while-loop" (even in the presense of "volatile") doesn't actually work right! Whatever the "atomic_read()" was waiting for may not have completed, because we have no barriers! So if "volatile" makes a difference, it is invariably a sign of a bug in serialization (the one exception is for IO - we use "volatile" to avoid having to use ...
I assume you mean "except for IO-related code and 'random' values like jiffies" as you mention later on? I assume other values set in What about reading values modified in interrupt handlers, as in your "random" case? Or is this a bug where the user of atomic_read() is invalidly expecting a read each time it is called? Chris -
the interrupt handler case is an SMP case since you do not know beforehand what cpu your interrupt handler will run on. -
With the exception of per-CPU variables, yes. Thanx, Paul -
Yes. There *are* valid uses for "volatile", but they have remained the same for the last few years: - "jiffies" - internal per-architecture IO implementations that can do them as normal I don't really see any valid case. I can imagine that you have your own "jiffy" counter in a driver, but what's the point, really? I'd suggest not Quite frankly, the biggest reason for using "volatile" on jiffies was really historic. So even the "random" case is not really a very strong one. You'll notice that anybody who is actually careful will be using sequence locks for the jiffy accesses, if only because the *full* jiffy count is actually a 64-bit value, and so you cannot get it atomically on a 32-bit architecture even on a single CPU (ie a timer interrupt might happen in between reading the low and the high word, so "volatile" is only used for the low 32 bits). So even for jiffies, we actually have: extern u64 __jiffy_data jiffies_64; extern unsigned long volatile __jiffy_data jiffies; where the *real* jiffies is not volatile: the volatile one is using linker tricks to alias the low 32 bits: - arch/i386/kernel/vmlinux.lds.S: ... jiffies = jiffies_64; ... and the only reason we do all these games is (a) it works and (b) it's legacy. Note how I do *not* say "(c) it's a good idea". Linus -
What about barrier removal? With consistent semantics we could optimize a fair amount of code. Whether or not that constitutes "premature" optimization is open to debate, but there's no question we could reduce our register wiping in some places. -- Chris -
If you've been reading all of Linus's emails you should be thinking about adding memory barriers, and not removing compiler barriers. He's just told you that code of the kind while (!atomic_read(cond)) ; do_something() probably needs a memory barrier (not just compiler) so that do_something() doesn't see stale cache content that occured before cond flipped. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Such code generally doesn't care precisely when it gets the update, just that the update is atomic, and it doesn't loop forever. Regardless, I'm convinced we just need to do it all in assembly. -- Chris -
Yes, it _does_ care that it gets the update _at all_, and preferably So do you want "volatile asm" or "plain asm", for atomic_read()? The asm version has two ways to go about it too... Segher -
Why do people think that barriers are expensive? They really aren't. Especially the regular compiler barrier is basically zero cost. Any reasonable compiler will just flush the stuff it holds in registers that isn't already automatic local variables, and for regular kernel code, that tends to basically be nothing at all. Ie a "barrier()" is likely _cheaper_ than the code generation downside from using "volatile". Linus -
From: Linus Torvalds <torvalds@linux-foundation.org> Assuming GCC were ever better about the code generation badness with volatile that has been discussed here, I much prefer we tell GCC "this memory piece changed" rather than "every piece of memory has changed" which is what the barrier() does. I happened to have been scanning a lot of assembler lately to track down a gcc-4.2 miscompilation on sparc64, and the barriers do hurt quite a bit in some places. Instead of keeping unrelated variables around cached in local registers, it reloads everything. -
Moore's law is definitely working against us here. Register counts, pipeline depths, core counts, and clock multipliers are all increasing in the long run. At some point in the future, barrier() will be universally regarded as a hammer too big for most purposes. Whether or not removing it now constitutes premature optimization is arguable, but I think we should allow such optimization to happen (or not happen) in architecture-dependent code, and provide a consistent API that doesn't require the use of such things in arch-independent code where it might turn into a totally superfluous performance killer depending on what hardware it gets compiled for. -- Chris -
You can't just remove it, it is needed in some places; you want to replace it in most places with a more fine-grained "compiler barrier", Explicit barrier()s won't be too hard to replace -- but what to do about the implicit barrier()s in rmb() etc. etc. -- *those* will be hard to get rid of, if only because it is hard enough to teach driver authors about how to use those primitives *already*. It is far from clear what a good interface like that would look like, anyway. Probably we should first start experimenting with a forget()-style micro-barrier (but please, find a better name), and see if a nice usage pattern shows up that can be turned into an API. Segher -
I do agree, and the important point to note is that the benefits of a /lighter/ compiler barrier, such as what David referred to above, _can_ be had without having to do anything with the "volatile" keyword at all. And such a primitive has already been mentioned/proposed on this thread. But this is all tangential to the core question at hand -- whether to have implicit (compiler, possibly "light-weight" of the kind referred above) barrier semantics in atomic ops that do not have them, or not. I was lately looking in the kernel for _actual_ code that uses atomic_t and benefits from the lack of any implicit barrier, with the compiler being free to cache the atomic_t in a register. Now that often does _not_ happen, because all other ops (implemented in asm with LOCK prefix on x86) _must_ therefore constrain the atomic_t to memory anyway. So typically all atomic ops code sequences end up operating on memory. Then I did locate sched.c:select_nohz_load_balancer() -- it repeatedly references the same atomic_t object, and the code that I saw generated (with CC_OPTIMIZE_FOR_SIZE=y) did cache it in a register for a sequence of instructions. It uses atomic_cmpxchg, thereby not requiring explicit memory barriers anywhere in the code, and is an example of an atomic_t user that is safe, and yet benefits from its memory loads/stores being elided/coalesced by the compiler. # at this point, %%eax holds num_online_cpus() and # %%ebx holds cpus_weight(nohz.cpu_mask) # the variable "cpu" is in %esi 0xc1018e1d: cmp %eax,%ebx # if No.A. 0xc1018e1f: mov 0xc134d900,%eax # first atomic_read() 0xc1018e24: jne 0xc1018e36 0xc1018e26: cmp %esi,%eax # if No.B. 0xc1018e28: jne 0xc1018e80 # returns with 0 0xc1018e2a: movl $0xffffffff,0xc134d900 # atomic_set(-1), and ... 0xc1018e34: jmp 0xc1018e80 # ... returns with 0 0xc1018e36: cmp $0xffffffff,%eax # if No.C. (NOTE!) 0xc1018e39: jne 0xc1018e46 0xc1018e3b: ...
Note that "barrier()" is purely a compiler barrier. It has zero impact on the CPU pipeline itself, and also has zero impact on anything that gcc knows isn't visible in memory (ie local variables that don't have their address taken), so barrier() really is pretty cheap. Now, it's possible that gcc messes up in some circumstances, and that the memory clobber will cause gcc to also do things like flush local registers unnecessarily to their stack slots, but quite frankly, if that happens, it's a gcc problem, and I also have to say that I've not seen that myself. So in a very real sense, "barrier()" will just make sure that there is a stronger sequence point for the compiler where things are stable. In most cases it has absolutely zero performance impact - apart from the -intended- impact of making sure that the compiler doesn't re-order or cache stuff around it. And sure, we could make it more finegrained, and also introduce a per-variable barrier, but the fact is, people _already_ have problems with thinking about these kinds of things, and adding new abstraction issues with subtle semantics is the last thing we want. So I really think you'd want to show a real example of real code that actually gets noticeably slower or bigger. In removing "volatile", we have shown that. It may not have made a big difference on powerpc, but it makes a real difference on x86 - and more importantly, it removes something that people clearly don't know how it works, and incorrectly expect to just fix bugs. [ There are *other* barriers - the ones that actually add memory barriers to the CPU - that really can be quite expensive. The good news is that the expense is going down rather than up: both Intel and AMD are not only removing the need for some of them (ie "smp_rmb()" will become a compiler-only barrier), but we're _also_ seeing the whole "pipeline flush" approach go away, and be replaced by the CPU itself actually being better - so even the ...
Why is all this fixation on "volatile"? I don't think people want "volatile" keyword per se, they want atomic_read(&x) to _always_ compile into an memory-accessing instruction, not register access. -- vda -
On Sun, 9 Sep 2007 19:02:54 +0100 and ... why is that? is there any valid, non-buggy code sequence that makes that a reasonable requirement? -
Well, if you insist on having it again:
Waiting for atomic value to be zero:
while (atomic_read(&x))
continue;
gcc may happily convert it into:
reg = atomic_read(&x);
while (reg)
continue;
Expecting every driver writer to remember that atomic_read is not in fact
a "read from memory" is naive. That won't happen. Face it, majority of
driver authors are a bit less talented than Ingo Molnar or Arjan van de Ven ;)
The name of the macro is saying that it's a read.
We are confusing users here.
It's doubly confusing that cpy_relax(), which says _nothing_ about barriers
in its name, is actually a barrier you need to insert here.
--
vda
-
For driver authors who're too busy to learn the intricacies of atomic operations, we have the plain old spin lock which then lets you use normal data structures such as u32 safely. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
Bzzt. Even if you fixed gcc to actually convert it to a busy loop on a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that does the conversion, it may be that the CPU does the caching of the memory value. GCC has no mechanism to do cache-flushes or memory- barriers except through our custom inline assembly. Also, you probably want a cpu_relax() in there somewhere to avoid overheating the CPU. Thirdly, on a large system it may take some arbitrarily large amount of time for cache-propagation to update the value of the variable in your local CPU cache. Finally, if atomics are based on based on spinlock+interrupt-disable then you will sit in a tight busy- loop of spin_lock_irqsave()->spin_unlock_irqrestore(). Depending on your system's internal model this may practically lock up your core because the spin_lock() will take the cacheline for exclusive access and doing that in a loop can prevent any other CPU from doing any operation on it! Since your IRQs are disabled you even have a very small window that an IRQ will come along and free it up long enough for the update to take place. is *completely* buggy because you could very easily have 4 CPUs doing this on an atomic variable with a value of 1 and end up with it at negative 3 by the time you are done. Moreover all the alternatives are also buggy, with the sole exception of this rather obvious- You simply CANNOT use an atomic_t as your sole synchronizing primitive, it doesn't work! You virtually ALWAYS want to use an atomic_t in the following types of situations: (A) As an object refcount. The value is never read except as part of an atomic_dec_return(). Why aren't you using "struct kref"? (B) As an atomic value counter (number of processes, for example). Just "reading" the value is racy anyways, if you want to enforce a limit or something then use atomic_inc_return(), check the result, and use atomic_dec() if it's too big. If you just want to ...
CPU can cache the value all right, but it cannot use that cached value
*forever*, it has to react to invalidate cycles on the shared bus
and re-fetch new data.
IOW: atomic_read(&x) which compiles down to memory accessor
Yes, but "arbitrarily large amount of time" is actually measured
Yes, but
1. CPU shouldn't overheat (in a sense that it gets damaged),
it will only use more power than needed.
2. cpu_relax() just throttles down my CPU, so it's performance
optimization only. Wait, it isn't, it's a barrier too.
Wow, "cpu_relax" is a barrier? How am I supposed to know
that without reading lkml flamewars and/or header files?
Let's try reading headers. asm-x86_64/processor.h:
#define cpu_relax() rep_nop()
So, is it a barrier? No clue yet.
/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
static inline void rep_nop(void)
{
__asm__ __volatile__("rep;nop": : :"memory");
}
Comment explicitly says that it is "a good thing" (doesn't say
that it is mandatory) and says NOTHING about barriers!
Barrier-ness is not mentioned and is hidden in "memory" clobber.
Do you think it's obvious enough for average driver writer?
I think not, especially that it's unlikely for him to even start
suspecting that it is a memory barrier based on the "cpu_relax"
You are basically trying to educate me how to use atomic properly.
You don't need to do it, as I am (currently) not a driver author.
I am saying that people who are already using atomic_read()
(and who unfortunately did not read your explanation above)
will still sometimes use atomic_read() as a way to read atomic value
*from memory*, and will create nasty heisenbugs for you to debug.
--
vda
-
static inline int
qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
{
int return_status = QLA_SUCCESS;
unsigned long loop_timeout ;
scsi_qla_host_t *pha = to_qla_parent(ha);
/* wait for 5 min at the max for loop to be ready */
loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);
while ((!atomic_read(&pha->loop_down_timer) &&
atomic_read(&pha->loop_state) == LOOP_DOWN) ||
atomic_read(&pha->loop_state) != LOOP_READY) {
if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
return_status = QLA_FUNCTION_FAILED;
break;
}
msleep(1000);
if (time_after_eq(jiffies, loop_timeout)) {
return_status = QLA_FUNCTION_FAILED;
break;
}
}
return (return_status);
}
Is above correct or buggy? Correct, because msleep is a barrier.
Is it obvious? No.
static void
qla2x00_rst_aen(scsi_qla_host_t *ha)
{
if (ha->flags.online && !ha->flags.reset_active &&
!atomic_read(&ha->loop_down_timer) &&
!(test_bit(ABORT_ISP_ACTIVE, &ha->dpc_flags))) {
do {
clear_bit(RESET_MARKER_NEEDED, &ha->dpc_flags);
/*
* Issue marker command only when we are going to start
* the I/O.
*/
ha->marker_needed = 1;
} while (!atomic_read(&ha->loop_down_timer) &&
(test_bit(RESET_MARKER_NEEDED, &ha->dpc_flags)));
}
}
Is above correct? I honestly don't know. Correct, because set_bit is
a barrier on _all _memory_? Will it break if set_bit will be changed
to be a barrier only on its operand? Probably yes.
drivers/kvm/kvm_main.c
while (atomic_read(&completed) != needed) {
cpu_relax();
...It's *buggy*. But it has nothing to do with any msleep() in the loop, or anything else. And more importantly, it would be equally buggy even *with* a "volatile" atomic_read(). Why is this so hard for people to understand? You're all acting like morons. The reason it is buggy has absolutely nothing to do with whether the read is done or not, it has to do with the fact that the CPU may re-order the reads *regardless* of whether the read is done in some specific order by the compiler ot not! In effect, there is zero ordering between all those three reads, and if you don't have memory barriers (or a lock or other serialization), that code is buggy. So stop this idiotic discussion thread already. The above kind of code needs memory barriers to be non-buggy. The whole "volatile or not" discussion is totally idiotic, and pointless, and anybody who doesn't understand that by now needs to just shut up and think about it more, rather than make this discussion drag out even further. The fact is, "volatile" *only* makes things worse. It generates worse code, and never fixes any real bugs. This is a *fact*. Linus -
I am not saying that this code is okay, this isn't the point. (The code is in fact awful for several more reasons). My point is that people are confused as to what atomic_read() exactly means, and this is bad. Same for cpu_relax(). First one says "read", and second one doesn't say "barrier". This is real code from current kernel which demonstrates this: "I don't know that cpu_relax() is a barrier already": drivers/kvm/kvm_main.c
Q&A: Q: When is it OK to use atomic_read()? A: You are asking the question, so never. Q: But I need to check the value of the atomic at this point in time... A: Your code is buggy if it needs to do that on an atomic_t for anything other than debugging or optimization. Use either atomic_*_return() or a lock and some normal integers. Q: "So why can't the atomic_read DTRT magically?" A: Because "the right thing" depends on the situation and is usually best done with something other than atomic_t. If somebody can post some non-buggy code which is correctly using atomic_read() *and* depends on the compiler generating extra nonsensical loads due to "volatile" then the issue *might* be reconsidered. This also includes samples of code which uses atomic_read() and needs memory barriers (so that we can fix the buggy code, not so we can change atomic_read()). So far the only code samples anybody has posted are buggy regardless of whether or not the value and/or accessors are flagged "volatile" or not. And hey, maybe the volatile ops *should* be implemented in inline ASM for future- proof-ness, but that's a separate issue. Cheers, Kyle Moffett -
Yes, lets just drop the volatiles now! We need a patch that gets rid of them.... Volunteers? -
From: Chris Snook <csnook@redhat.com>
Unambiguously document the fact that atomic_read() and atomic_set()
do not imply any ordering or memory access, and that callers are
obligated to explicitly invoke barriers as needed to ensure that
changes to atomic variables are visible in all contexts that need
to see them.
Signed-off-by: Chris Snook <csnook@redhat.com>
--- a/Documentation/atomic_ops.txt 2007-07-08 19:32:17.000000000 -0400
+++ b/Documentation/atomic_ops.txt 2007-09-10 19:02:50.000000000 -0400
@@ -12,7 +12,11 @@
C integer type will fail. Something like the following should
suffice:
- typedef struct { volatile int counter; } atomic_t;
+ typedef struct { int counter; } atomic_t;
+
+ Historically, counter has been declared volatile. This is now
+discouraged. See Documentation/volatile-considered-harmful.txt for the
+complete rationale.
The first operations to implement for atomic_t's are the
initializers and plain reads.
@@ -42,6 +46,22 @@
which simply reads the current value of the counter.
+*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
+
+Some architectures may choose to use the volatile keyword, barriers, or
+inline assembly to guarantee some degree of immediacy for atomic_read()
+and atomic_set(). This is not uniformly guaranteed, and may change in
+the future, so all users of atomic_t should treat atomic_read() and
+atomic_set() as simple C assignment statements that may be reordered or
+optimized away entirely by the compiler or processor, and explicitly
+invoke the appropriate compiler and/or memory barrier for each use case.
+Failure to do so will result in code that may suddenly break when used with
+different architectures or compiler optimizations, or even changes in
+unrelated code which changes how the compiler optimizes the section
+accessing atomic_t variables.
+
+*** YOU HAVE BEEN WARNED! ***
+
Now, we move onto the actual atomic operation interfaces.
void atomic_add(int i, atomic_t *v);
-
Acked-by: Christoph Lameter <clameter@sgi.com> -
On Mon, 10 Sep 2007 11:56:29 +0100 and this I would say is buggy code all the way. Not from a pure C level semantics, but from a "busy waiting is buggy" semantics level and a "I'm inventing my own locking" semantics level. -
After inspecting arch/*, I cannot agree with you. Otherwise almost all major architectures use "conceptually buggy busy-waiting": arch/alpha arch/i386 arch/ia64 arch/m32r arch/mips arch/parisc arch/powerpc arch/sh arch/sparc64 arch/um arch/x86_64 All of the above contain busy-waiting on atomic_read. Including these loops without barriers: arch/mips/kernel/smtc.c while (atomic_read(&idle_hook_initialized) < 1000) ; arch/mips/sgi-ip27/ip27-nmi.c while (atomic_read(&nmied_cpus) != num_online_cpus()); [Well maybe num_online_cpus() is a barrier, I didn't check] arch/sh/kernel/smp.c if (wait) while (atomic_read(&smp_fn_call.finished) != (nr_cpus - 1)); Bugs? -- vda -
On Mon, 10 Sep 2007 15:38:23 +0100 the arch/ people obviously are allowed to do their own locking stuff... BECAUSE THEY HAVE TO IMPLEMENT THAT! the arch maintainers know EXACTLY how their hw behaves (well, we hope) so they tend to be the exception to many rules in the kernel.... -
The ieee1394 and firewire subsystems have open, undiagnosed bugs, also on i386 and x86-64. But whether there is any bug because of wrong assumptions about atomic_read among them, I don't know. I don't know which assumptions the authors made, I only know that I wasn't aware of ...or, if there is none, the implementation specification (as in case of the atomic ops), or, if there is none, the implementation (as in case of Provided they are aware that they might not have the full picture of the lockless primitives. :-) -- Stefan Richter -=====-=-=== =--- =---= http://arcgraph.de/sr/ -
This is actually really well-defined in C, not fuzzy at all. "Volatile accesses" are a side effect, and no side effects can be reordered with respect to sequence points. The side effects that matter in the kernel environment are: 1) accessing a volatile object; 2) modifying an object; 3) volatile asm(); 4) calling a function that does any of these. We certainly should avoid volatile whenever possible, but "because it's fuzzy wrt reordering" is not a reason -- all alternatives have exactly the same issues. Segher -
"volatile" has nothing to do with reordering. atomic_dec() writes to memory, so it _does_ have "volatile semantics", implicitly, as long as the compiler cannot optimise the atomic variable away completely -- any store counts as a side effect. Segher -
If you're talking of "volatile" the type-qualifier keyword, then http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows I don't think an atomic_dec() implemented as an inline "asm volatile" or one that uses a "forget" macro would have the same re-ordering guarantees as an atomic_dec() that uses a volatile access cast. -
I'm not sure what in that mail you mean, but anyway... Yes, of course, the fact that "volatile" creates a side effect prevents certain things from being reordered wrt the atomic_dec(); but the atomic_dec() has a side effect *already* so the volatile The "asm volatile" implementation does have exactly the same reordering guarantees as the "volatile cast" thing, if that is implemented by GCC in the "obvious" way. Even a "plain" asm() will do the same. Segher -
That's precisely what that sub-thread (read down to the last mail there, and not the first mail only) shows. So yes, "volatile" does have something to do with re-ordering (as guaranteed by the C Read the relevant GCC documentation. [ of course, if the (latest) GCC documentation is *yet again* wrong, then alright, not much I can do about it, is there. ] -
"asm volatile" creates a side effect. Side effects aren't allowed to be reordered wrt sequence points. This is exactly There was (and is) nothing wrong about the "+m" documentation, if that is what you are talking about. It could be extended now, to allow "+m" -- but that takes more than just "fixing" the documentation. Segher -
No, the code in that sub-thread I earlier pointed you at *WAS* written such that there was a sequence point after all the uses of that volatile access cast macro, and _therefore_ we were safe from re-ordering (behaviour guaranteed by the C standard). But you seem to be missing the simple and basic fact that: (something_that_has_side_effects || statement) != something_that_is_a_sequence_point Now, one cannot fantasize that "volatile asms" are also sequence points. In fact such an argument would be sadly mistaken, because "sequence points" are defined by the C standard and it'd be horribly wrong to No, you didn't read: http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html Read the bit about the need for artificial dependencies, and the example given there: asm volatile("mtfsf 255,%0" : : "f" (fpenv)); sum = x + y; The docs explicitly say the addition can be moved before the "volatile asm". Hopefully, as you know, (x + y) is an C "expression" and hence a "sequence point" as defined by the standard. So the "volatile asm" should've happened before it, right? Wrong. I know there is also stuff written about "side-effects" there which _could_ give the same semantic w.r.t. sequence points as the volatile access casts, but hey, it's GCC's own documentation, you obviously can't find fault with _me_ if there's wrong stuff written in there. Say that to GCC ... See, "volatile" C keyword, for all it's ill-definition and dodgy semantics, is still at least given somewhat of a treatment in the C standard (whose quality is ... ummm, sadly not always good and clear, but unsurprisingly, still about 5,482 orders-of-magnitude times better than GCC docs). Semantics of "volatile" as applies to inline No, there was (and is) _everything_ wrong about the "+" documentation as applies to memory-constrained operands. I don't give a whit if it's some workaround in their gimplifier, or the other, that makes it possible to use "+m" (like the current kernel code does). The ...
That's nonsense. GCC can extend the C standard any way they bloody well please -- witness the fact that they added an The _end of a full expression_ is a sequence point, not every expression. And that is irrelevant here anyway. It is perfectly fine to compute x+y any time before the assignment; the C compiler is allowed to compute it _after_ the assignment even, if it could figure out how ;-) If you find any problems/shortcomings in the GCC documentation, please file a PR, don't go whine on some unrelated mailing lists. The documentation simply doesn't say "+m" is allowed. The code to allow it was added for the benefit of people who do not read the documentation. Documentation for "+m" might get added later if it is decided this [the code, not the documentation] is a sane thing If you're talking details, you better get them right. Handwaving is fine with me as long as you're not purporting you're not. And I simply cannot stand false assertions. You can always ignore me if _you_ take issue with _that_ :-) Segher -
LOTS there, which obviously isn't correct, but which I'll reply to later, easier stuff first. (call this "handwaving" if you want, but don't worry, Huh? "If the (current) documentation doesn't match up with the (current) code, then _at least one_ of them has to be (as of current) wrong." I wonder how could you even try to disagree with that. And I didn't go whining about this ... you asked me. (I think I'd said something to the effect of GCC docs are often wrong, which is true, but probably you feel saying that is "not allowed" on non-gcc lists?) As for the "PR" you're requesting me to file with GCC for this, that gcc-patches@ thread did precisely that and more (submitted a patch to said documentation -- and no, saying "documentation might get added later" is totally bogus and nonsensical -- documentation exists to document current behaviour, not past). But come on, this is wholly petty. I wouldn't have replied, really, if you weren't so provoking. -
Easy. The GCC documentation you're referring to is the user's manual. See the blurb on the first page: "This manual documents how to use the GNU compilers, as well as their features and incompatibilities, and how to report bugs. It corresponds to GCC version 4.3.0. The internals of the GNU compilers, including how to port them to new targets and some information about how to write front ends for new languages, are documented in a separate manual." _How to use_. This documentation doesn't describe in minute detail everything the compiler does (see the source code for that -- no, it isn't described in the internals manual either). If it doesn't tell you how to use "+m", and even tells you _not_ to use it, maybe that is what it means to say? It doesn't mean "+m" doesn't actually do something. It also doesn't mean it does what you think it should do. It might do just that of course. But treating No need to guess at what you said, even if you managed to delete your own mail already, there are plenty of free web-based archives and that to me reads as complaining that the ISO C standard "isn't very good" and that the GCC documentation is 10**5482 times worse even. Which of course is hyperbole and cannot be true. It also isn't helpful in any way or form for anyone on this list. I call Yes, documentation of that size often has shortcomings. No surprise there. However, great effort is made to make it better documentation, and especially to keep it up to date; if you find any errors or omissions, please report them. There are many ways how to do that, You're allowed to say whatever you want. Let's have a quote again I read that as a friendly request, not a prohibition. Well maybe "Problem report", a bugzilla ticket. Sorry for using terminology Actually not -- PRs make sure issues aren't forgotten (although they might gather dust, sure). But yes, submitting patches is a When code like you want to write becomes a supported feature, that will be ...
[ LOL, you _are_ shockingly petty! ]
Wow, now that's a nice "disclaimer". By your (poor) standards of writing
documentation, one can as well write any factually incorrect stuff that
Oh, really? Considering how much is (left out of being) documented, often
one would reasonably have to experimentally see (with testcases) how the
compiler behaves for some given code. Well, at least _I_ do it often
(several others on this list do as well), and I think there's everything
smart about it rather than having to read gcc sources -- I'd be surprised
(unless you have infinite free time on your hands, which does look like
teh case actually) if someone actually prefers reading gcc sources first
to know what/how gcc does something for some given code, rather than
simply write it out, compile and look the generated code (saves time for
Try _reading_ what I said there, for a change, dude. I'd originally only
said "unless GCC's docs is yet again wrong" ... then _you_ asked me what,
after which this discussion began and I wrote the above [which I fully
agree with -- so what if I used hyperbole in my sentence (yup, that was
intended, and obviously, exaggeration), am I not even allowed to do that?
Man, you're a Nazi or what ...] I didn't go whining about on my own as
you'd had earlier suggested, until _you_ asked me.
[ Ick, I somehow managed to reply this ... this is such a ...
^^^^^^^^^^^^^^^^^^^^^^
What crap. It is _perfectly reasonable_ to expect (current) documentation
to keep up with (current) code behaviour. In fact trying to justify such
a state is completely bogus and nonsensical.
Satyam
-
Stores can be reordered. Only x86 has (mostly) implicit write ordering. So no atomic_dec has no volatile semantics and may be reordered on a variety of processors. Writes to memory may not follow code order on several processors. -
The one exception to this being the case where process-level code is communicating to an interrupt handler running on that same CPU -- on all CPUs that I am aware of, a given CPU always sees its own writes in order. Thanx, Paul -
Yes but that is due to the code path effectively continuing in the interrupt handler. The cpu makes sure that op codes being executed always see memory in a consistent way. The basic ordering problem with out of order writes is therefore coming from other processors concurrently executing code and holding variables in registers that are modified elsewhere. The only solution that I know of are one or the other form of barrier. -
So we are agreed then -- volatile accesses may be of some assistance when interacting with interrupt handlers running on the same CPU (presumably when using per-CPU variables), but are generally useless when sharing variables among CPUs. Correct? Thanx, Paul -
Read again: I said the C "volatile" construct has nothing to do The _compiler_ isn't allowed to reorder things here. Yes, of course you do need stronger barriers for many purposes, volatile isn't all that useful you know. Segher -
Shouldn't it be just re-written without the loop: if ((tmp = atomic_read(&j->DSPWrite)) > 0) atomic_sub(&j->DSPWrite, tmp); Has all the same bugs, but runs much faster :-) -Tony -
atomic_dec() already has volatile behavior everywhere, so this is semantically okay, but this code (and any like it) should be calling cpu_relax() each iteration through the loop, unless there's a compelling reason not to. I'll allow that for some hardware drivers (possibly this one) such a compelling reason may exist, but hardware-independent core subsystems probably have no excuse. If the maintainer of this code doesn't see a compelling reason not to add cpu_relax() in this loop, then it should be patched. -- Chris -
In 2 + 3 you may increment the atomic variable in some places. The value of the atomic variable may not matter because you only do optimizations. Checking a atomic_t for a definite state has to involve either some side conditions (lock only taken if refcount is <= 0 or so) or done If you want to make it less complex then drop volatile which causes weird side effects without solving any problems as you just pointed out. -
The other set of problems are communication between process context and interrupt/NMI handlers. Volatile does help here. And the performance impact of volatile is pretty near zero, so why have the non-volatile variant? Thanx, Paul -
If there is a higher-level locking scheme then there is no point to using atomic_t variables. Atomic_t is specifically for the situation where multiple CPUs are updating a variable without locking. Paul. -
And don't forget about the case where it is an I/O device that is updating the memory (in buffer descriptors or similar). The driver needs to do a "volatile" atomic read to get at the most recent version of that data, which can be important for optimising latency (or throughput even). There is no other way the kernel can get that info -- doing an MMIO read is way way too expensive. Segher -
An atomic_read() implemented as a "normal" C variable read would allow that read to be combined with another "normal" read from that variable. This could perhaps be marginally useful, although I'd bet you cannot see it unless counting cycles on a simulator or counting bits in the binary size. With an asm() implementation, the compiler can not do this; with a "volatile" implementation (either volatile variable or volatile-cast), this invokes undefined behaviour (in both C and GCC). Segher -
A "timely" fashion? One cannot rely on something like that when coding. The visibility of updates is insured by barriers and not by some fuzzy notion of "timeliness". -
But here you do have some notion of time: while (atomic_read(&x)) continue; "continue when other CPU(s) decrement it down to zero". If "read" includes an insn which accesses RAM, you will see "new" value sometime after other CPU decrements it. "Sometime after" is on the order of nanoseconds here. It is a valid concept of time, right? The whole confusion is about whether atomic_read implies "read from RAM" or not. I am in a camp which thinks it does. You are in an opposite one. We just need a less ambiguous name. What about this: /** * atomic_read - read atomic variable * @v: pointer of type atomic_t * * Atomically reads the value of @v. * No compiler barrier implied. */ #define atomic_read(v) ((v)->counter) +/** + * atomic_read_uncached - read atomic variable from memory + * @v: pointer of type atomic_t + * + * Atomically reads the value of @v. This is guaranteed to emit an insn + * which accesses memory, atomically. No ordering guarantees! + */ +#define atomic_read_uncached(v) asm_or_volatile_ptr_magic(v) I was thinking of s/atomic_read/atomic_get/ too, but it implies "taking" atomic a-la get_cpu()... -- vda -
I'm curious about one minor tangential point. Why, instead of: b = *(volatile int *)&a; why can't this just be expressed as: b = (volatile int)a; Isn't it the contents of a that's volatile, i.e. it's value can change invisibly to the compiler, and that's why you want to force a read from memory? Why do you need the "*(volatile int *)&" construct? -Bill -
Hi Bill, "b = (volatile int)a;" doesn't help us because a cast to a qualified type has the same effect as a cast to an unqualified version of that type, as mentioned in 6.5.4:4 (footnote 86) of the standard. Note that "volatile" is a type-qualifier, not a type itself, so a cast of the _object_ itself to a qualified-type i.e. (volatile int) would not make the access itself volatile-qualified. To serve our purposes, it is necessary for us to take the address of this (non-volatile) object, cast the resulting _pointer_ to the corresponding volatile-qualified pointer-type, and then dereference it. This makes that particular _access_ be volatile-qualified, without the object itself being such. Also note that the (dereferenced) result is also a valid lvalue and hence can be used in "*(volatile int *)&a = b;" kind of construction (which we use for the atomic_set case). Satyam -
Here, I should obviously admit that the semantics of *(volatile int *)& aren't any neater or well-defined in the _language standard_ at all. The standard does say (verbatim) "precisely what constitutes as access to object of volatile-qualified type is implementation-defined", but GCC does help us out here by doing the right thing. Accessing the non-volatile object there using the volatile-qualified pointer-type cast makes GCC treat the object stored at that memory address itself as if it were a volatile object, thus making the access end up having what we're calling as "volatility" semantics here. Honestly, given such confusion, and the propensity of the "volatile" type-qualifier keyword to be ill-defined (or at least poorly understood, often inconsistently implemented), I'd (again) express my opinion that it would be best to avoid its usage, given other alternatives do exist. -
[ Bill tells me in private communication he gets this already, but I think it's more complicated than the shoddy explanation I'd made earlier so would wish to make this clearer in detail one last time, for the benefit of others listening in or reading the archives. ] Casts don't produce lvalues, and the cast ((volatile int)a) does not produce the object-int-a-qualified-as-"volatile" -- in fact, the result of the above cast is whatever is the _value_ of "int a", with the access to that object having _already_ taken place, as per the actual type-qualification of the object (that was originally declared as being _non-volatile_, in fact). Hence, defining atomic_read() as: #define atomic_read(v) ((volatile int)((v)->counter)) would be buggy and not give "volatility" semantics at all, unless the "counter" object itself isn't volatile-qualified already (which it isn't). The result of the cast itself being the _value_ of the int object, and not the object itself (i.e., not an lvalue), is thereby independent of type-qualification in that cast itself (it just wouldn't make any difference), hence the "cast to a qualified type has the same effect as a cast to an unqualified version of that type" bit in section 6.5.4:4 Dereferencing using the *(pointer-type-cast)& construct, OTOH, serves us well: #define atomic_read(v) (*(volatile int *)&(v)->counter) Firstly, note that the cast here being (volatile int *) and not (int * volatile) qualifies the type of the _object_ being pointed to by the pointer in question as being volatile-qualified, and not the pointer itself (6.2.5:27 of the standard, and 6.3.2.3:2 allows us to convert from a pointer-to-non-volatile-qualified-int to a pointer-to- volatile-qualified-int, which suits us just fine) -- but note that the _access_ to that address itself has not yet occurred. _After_ specifying the memory address as containing a volatile-qualified- int-type object, (and GCC co-operates as mentioned below), we proceed ...
Where do you get that idea? GCC manual, section 6.1, "When is a Volatile Object Accessed?" doesn't say anything of the Yeah. Or we can have an email thread like this every time someone proposes a patch that uses an atomic variable ;-) Segher -
True, "implementation-defined" as per the C standard _is_ supposed to mean "unspecified behaviour where each implementation documents how the choice is made". So ok, probably GCC isn't "documenting" this implementation-defined behaviour which it is supposed to, but can't really fault them much for this, probably. -
That doesn't prove anything. Experiments can only disprove GCC _is_ documenting this, namely in this section 6.1. It doesn't mention volatile-casted stuff. Draw your own conclusions. Segher -
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ... exactly. So that's why I said "GCC isn't documenting _this_". Man, try _reading_ mails before replying to them ... -
There is no such thing as "volatile-qualified access" defined anywhere; there only is the concept of a "volatile-qualified There is a quite convincing argument that such an access _is_ an access to a volatile object; see GCC PR21568 comment #9. This probably isn't the last word on the matter though... Segher -
I find this argument completely convincing and retract the contrary argument that I've made many times in this forum and others. You learn something new every day. Just in case it wasn't clear: int i; *(volatile int *)&i=2; In this case, there *is* an access to a volatile object. This is the end result of the the standard's definition of what it means to apply the 'volatile int *' cast to '&i' and then apply the '*' operator to the result and use it as an lvalue. C does not define the type of an object by how it is defined but by how it is accessed! DS -
[ Your mailer drops Cc: lists, munges headers, does all sorts of badness. Please fix that. ] True, see my last mail in this sub-thread that explains precisely this :-) Satyam -
Sure, "volatile-qualified access" was not some standard term I used there. Just something to mean "an access that would make the compiler treat the object at that memory as if it were an object with a volatile-qualified type". Now the second wording *IS* technically correct, but come on, it's 24 words long whereas the original one was 3 -- and hopefully anybody reading the shorter phrase *would* have known anyway what was meant, without having to be pedantic about it :-) Satyam -
Well you were talking pretty formal (and detailed) stuff, so IMHO it's good to have that exactly correct. Sure it's nicer to use small words most of the time :-) Segher -
Atomic operations only require exclusive access to the cacheline while the value is modified. -
