Home All Groups Group Topic Archive Search About

The Interlocked on the Edge of Forever

Author
30 Apr 2007 8:13 PM
Chris Mullins [MVP]

I've got a quick question that's been bugging me for a long, long  time:

Let's say I've got a member variable (in a heavily threaded app):
private int _firstTime = 0;

To make changes to this is easy enough:
    InterlockedExchange(ref _firstTime, 1);

.... or I can use the InterlockedCompareExchange:
    int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0);

No problems here.

.... but, if I just want to read the value, I CANNOT do:

    int myValue = _firstTime;
or
    if (_fistTime==0)

Doing these requires a memory barrier of some type (volatile variable,
monitor, etc) to have any degree of reliability.

Now, reading the value out using InterlockedCompareExchange always seemed a
bit silly to me. I don't want to always write:

int  myValue = InterlockedCompareExchange(ref _firstTime, Int32.MinValue,
Int32.MinValue);

I've never found (although not for lack of looking):
int myValue = InterlockedRead(ref _firstTime);

Of the code that I write, reading the value, and branching based on it's
value, is much more common than writing to the value.

I have, in the past, used volitile variables for this - but I've stopped due
to the various issues regarding volatiles (non atomic reads & writes).

Is there a concensus on the best way to do this? It seems like all the
solutions are.... incomplete.

Essentially I'm looking for a volitile varaiable that is atomically read and
written. Unfortuantly, these don't exist in any language I've yet used.

--
Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP
http://www.coversant.com/blogs/cmullins
Author
30 Apr 2007 10:04 PM
Jon Skeet [C# MVP]
Chris Mullins [MVP] <cmull***@yahoo.com> wrote:

<snip>

> Essentially I'm looking for a volitile varaiable that is atomically
> read and written. Unfortuantly, these don't exist in any language
> I've yet used.

I'm not sure what you're after that a simple volatile *wouldn't* give
you here, given that *any* Int32 variable will be accessed atomically,
whether it's volatile or not. You can't do atomic increments without
something like Interlocked, but simple reads and writes are atomic.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Are all your drivers up to date? click for free checkup

Author
1 May 2007 1:16 AM
Peter Duniho
On Mon, 30 Apr 2007 13:13:47 -0700, Chris Mullins [MVP] 
<cmull***@yahoo.com> wrote:

> [...]
> Of the code that I write, reading the value, and branching based on it's
> value, is much more common than writing to the value.
>
> I have, in the past, used volitile variables for this - but I've stopped 
> due to the various issues regarding volatiles (non atomic reads & 
> writes).

Jon's reply matches what I believed to be true.  Can you explain why it is 
you believe that a non-atomic read or write is possible for a 32-bit 
value?  If that's true, it's a pretty significant error in the way I've 
approached multi-threaded programming.  Which is not out of the question 
by any means, but if that's the case I really want to know about it and 
know why it's the case.

Pete
Author
1 May 2007 4:09 AM
Barry Kelly
Chris Mullins [MVP] wrote:

> I've got a quick question that's been bugging me for a long, long  time:
>
> Let's say I've got a member variable (in a heavily threaded app):
> private int _firstTime = 0;

> Doing these requires a memory barrier of some type (volatile variable,
> monitor, etc) to have any degree of reliability.

Yes.

> I've never found (although not for lack of looking):
> int myValue = InterlockedRead(ref _firstTime);

Perhaps Thread.VolatileRead() is what you're looking for?

> I have, in the past, used volitile variables for this - but I've stopped due
> to the various issues regarding volatiles (non atomic reads & writes).

Can you expand on your objection to volatile variables?

> Is there a concensus on the best way to do this? It seems like all the
> solutions are.... incomplete.
>
> Essentially I'm looking for a volitile varaiable that is atomically read and
> written. Unfortuantly, these don't exist in any language I've yet used.

I'm not aware of any circumstances where normally aligned volatile
variables of the machine word size or smaller are read or written
non-atomically on the CLR. Can you explain more?

-- Barry

Author
5 May 2007 10:07 PM
Holger Grund
Show quote Hide quote
"Chris Mullins [MVP]" <cmull***@yahoo.com> wrote
> I've got a quick question that's been bugging me for a long, long  time:
>
> Let's say I've got a member variable (in a heavily threaded app):
> private int _firstTime = 0;
>
> To make changes to this is easy enough:
>    InterlockedExchange(ref _firstTime, 1);
>
> ... or I can use the InterlockedCompareExchange:
>    int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0);
>
> No problems here.
>
> ... but, if I just want to read the value, I CANNOT do:
>
>    int myValue = _firstTime;
> or
>    if (_fistTime==0)
>
Why?

> Doing these requires a memory barrier of some type (volatile variable,
> monitor, etc) to have any degree of reliability.
>
I don't see the need for a memory barrier, unless you need memory
ordering guarantees. Volatile access would make sense since you really want
to read from memory. However, this only affects how the Jitter generates
code.

> Now, reading the value out using InterlockedCompareExchange always seemed
> a bit silly to me. I don't want to always write:
>
> int  myValue = InterlockedCompareExchange(ref _firstTime, Int32.MinValue,
> Int32.MinValue);

This looks quite pointless. If you don't need memory ordering guarantees a
volatile read is good enough. I don't find the spec now, but I suspect
InterlockedCompareExchange has probably both acquire and release
semantics.

>
> I've never found (although not for lack of looking):
> int myValue = InterlockedRead(ref _firstTime);
>
This is just a memory fence plus a volatile read. And it's not clear that
you
actually need the former.

> Of the code that I write, reading the value, and branching based on it's
> value, is much more common than writing to the value.
>
Again volatile read is what you want.

> I have, in the past, used volitile variables for this - but I've stopped
> due to the various issues regarding volatiles (non atomic reads & writes).
>
What makes you believe that either reads or writes are non-atomic?
They certainly are at CLR instruction level.

> Is there a concensus on the best way to do this? It seems like all the
> solutions are.... incomplete.

For a one-time init thing you will probably need atomic compare
exchange for acquiring the construction lock and release semantics when
signaling successful construction.

If you need to wait for construction you'd probably have three states:
- uninitialized
- under_construction
- constructed

Checking whether the shared object is constructed is done by a volatile
read from the guard variable. If the result is "constructed" you're done.
Just a few cycles - maybe satisified from cache.

If not, use atomic compare exchange to acquire the "under-construction"
lock. If an other thread was faster wait. If not construct object and
use store with release semantics to signal object constructed.

I'm not sure if there is a really efficient Wait method (in this particular
context) in the CLR.

Anyway, the standard case (object already constructed) is very fast.
>
> Essentially I'm looking for a volitile varaiable that is atomically read
> and written. Unfortuantly, these don't exist in any language I've yet
> used.
>
I think the CLI standard requires that to a certain extent. That said, the
text is vague. Bottom line I think you can rely on reads or writes being
volatile, but not more complex operations that both read and write,
e.g.:

volatile int i;
do_something_with(i); // ok
i = some_value(); // ok
i++; // bad no guarantees here

But one might argue that this is really a language thing.

-hg
Author
6 May 2007 5:38 AM
William Stacey [C# MVP]
Show quote Hide quote
| To make changes to this is easy enough:
|    InterlockedExchange(ref _firstTime, 1);
|
| ... or I can use the InterlockedCompareExchange:
|    int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0);
|
| No problems here.
|
| ... but, if I just want to read the value, I CANNOT do:
|
|    int myValue = _firstTime;
| or
|    if (_fistTime==0)
|
| Doing these requires a memory barrier of some type (volatile variable,
| monitor, etc) to have any degree of reliability.

you can.  The Interlocked write ensures this across all cpus and cache.  Any
thread reading the var after the Interlocked write will see the correct var
so it is a memory barrier without a volatile modifier.  All locks are
ultimatley built on interlocked at the lowest level to get this behavior.
So for reads, you can just go "if (var == 2){}".  Naturally, this works for
a single var.  If you have multiple invariants you need to protect, you
probabaly need a lock.
Author
6 May 2007 6:49 AM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> | Doing these requires a memory barrier of some type (volatile variable,
> | monitor, etc) to have any degree of reliability.
>
> you can.  The Interlocked write ensures this across all cpus and cache.  Any
> thread reading the var after the Interlocked write will see the correct var
> so it is a memory barrier without a volatile modifier.

But surely that's only guaranteed if the read *actually* happens after
the Interlocked write occurs. If the variable isn't volatile, there's
nothing to stop the reading thread from having reordered the reads. For
instance:

int x=0, y=0;

....

Thread 1:
Interlocked.Increment (ref x, 1);
Interlocked.Increment (ref y, 1);



Thread 2:
int a = x;
int b = y;

The JIT *could* reorder the reads of x and y in thread 2, such that you
get a=0, b=1, despite the fact that x is incremented before y. That
wouldn't be the case with volatile variables.


At least, that's how I understand it.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
6 May 2007 7:27 PM
William Stacey [C# MVP]
| But surely that's only guaranteed if the read *actually* happens after
| the Interlocked write occurs. If the variable isn't volatile, there's
| nothing to stop the reading thread from having reordered the reads. For
| instance:

Hi Jon. Not sure I totally follow you here.  If Interlocked x happens, a
reader will never see x==0 (until it wraps). So a read on x or y is volatile
using an interlocked. Naturally, in either method, a reader could read x and
y after x was incremented, but before y was incremented because there is no
lock here. But this is a different problem (i.e race).  If we need both vars
to be atomic in respect to each other and handled as an invariant "set",
then we need a lock of some type protecting the set.  As I said, maybe I am
not following your intent and will slap my head.
--wjs
Author
6 May 2007 10:46 PM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> | But surely that's only guaranteed if the read *actually* happens after
> | the Interlocked write occurs. If the variable isn't volatile, there's
> | nothing to stop the reading thread from having reordered the reads. For
> | instance:
>
> Hi Jon. Not sure I totally follow you here.  If Interlocked x happens, a
> reader will never see x==0 (until it wraps). So a read on x or y is volatile
> using an interlocked. Naturally, in either method, a reader could read x and
> y after x was incremented, but before y was incremented because there is no
> lock here.

I still believe that it can read y before either of them are
incremented, and x after both of them are incremented. There's nothing
to stop the JIT from reordering the reads to read y before x, because
they're not volatile. I wouldn't be surprised to learn that the
interlocked operation makes sure that when it *does* try to read x and
y, those values are really the latest ones, but I don't think there's
any guarantee about the order of those reads unless at least one of
them is volatile. (It's nearly midnight, so I can't be bothered to work
out which it should be - y, I think.)

I definitely *wasn't* talking about the atomicity of the two operations
- just the ordering.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
6 May 2007 11:06 PM
William Stacey [C# MVP]
So are you saying the JIT can reorder these two lines:
int a = x;
int b = y;

Like so:
int b = y;
int a = x;

? tia

--
William Stacey [C# MVP]



Show quoteHide quote
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MPG.20a86b07bb54e61e7d@msnews.microsoft.com...
| William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
| > | But surely that's only guaranteed if the read *actually* happens after
| > | the Interlocked write occurs. If the variable isn't volatile, there's
| > | nothing to stop the reading thread from having reordered the reads.
For
| > | instance:
| >
| > Hi Jon. Not sure I totally follow you here.  If Interlocked x happens, a
| > reader will never see x==0 (until it wraps). So a read on x or y is
volatile
| > using an interlocked. Naturally, in either method, a reader could read x
and
| > y after x was incremented, but before y was incremented because there is
no
| > lock here.
|
| I still believe that it can read y before either of them are
| incremented, and x after both of them are incremented. There's nothing
| to stop the JIT from reordering the reads to read y before x, because
| they're not volatile. I wouldn't be surprised to learn that the
| interlocked operation makes sure that when it *does* try to read x and
| y, those values are really the latest ones, but I don't think there's
| any guarantee about the order of those reads unless at least one of
| them is volatile. (It's nearly midnight, so I can't be bothered to work
| out which it should be - y, I think.)
|
| I definitely *wasn't* talking about the atomicity of the two operations
| - just the ordering.
|
| --
| Jon Skeet - <sk***@pobox.com>
| http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
| If replying to the group, please do not mail me too
Author
7 May 2007 5:51 AM
Barry Kelly
William Stacey [C# MVP] wrote:

> So are you saying the JIT can reorder these two lines:
> int a = x;
> int b = y;
>
> Like so:
> int b = y;
> int a = x;

Yes, if x and y are not volatile locations. The language of the spec is
that reads and writes to volatile locations are the only observable side
effects of memory access; memory accesses may be arbitrarily reordered
if they aren't volatile. For example, if the value of y has been loaded
to a register because it was used somewhere before the line 'int a =
x;', then the value of y used in the line 'int b = y;' may predate the
actual in-memory value of x, because the value may be used from the
register, rather than freshly loading it from memory. So, volatile can
inhibit these kinds of optimizations (avoiding redundant loads).

There's more to it than that, volatile reads have acquire semantics,
volatile writes release semantics. That has stronger guarantees than
just inhibiting optimization, as it affects e.g. write buffer reordering
in some processors.

-- Barry

Author
7 May 2007 6:00 PM
Holger Grund
"Barry Kelly" <barry.j.ke***@gmail.com> wrote

> There's more to it than that, volatile reads have acquire semantics,
> volatile writes release semantics. That has stronger guarantees than
> just inhibiting optimization, as it affects e.g. write buffer reordering
> in some processors.

I see that in CLI spec, but have you actually verified it? It is not
what I would expect and not what I have seen.

Oddly, I see that the Jitter even removes dead loads altogher.

Anyway, given the rationale in the CLI spec (hardware
register access), I don't see how this could possibly work in
a reasonably cheap way for the P4 memory model.

-hg
Author
7 May 2007 7:09 AM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> So are you saying the JIT can reorder these two lines:
> int a = x;
> int b = y;
>
> Like so:
> int b = y;
> int a = x;

Exactly, so long as there aren't any memory barriers in the way. Memory
barriers (which include operations with volatile locations) stop the
JIT from moving memory operations beyond the barrier.

It's unfortunate that the CLI spec isn't nearly as clear as it might be
about all this, but Vance Morrison wrote a great MSDN article about the
..NET 2.0 model (which is stricter than the CLI spec):
http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
7 May 2007 3:57 PM
William Stacey [C# MVP]
Here is some more food for the party:
http://msdn2.microsoft.com/en-us/library/ms686355.aspx

Here he states the compiler will not re-order volatile variable access,
However these operations still *could be re-ordered by the processor.  In
fact, he shows correcting this issue with an Interlocked operation.  This
article also state that Interlocked functions *ensure appropriat barriers
for memory ordering.  So (at least for this example) it would seem
Interlocked provides as good or better symatics as the volatile example (and
also prevents CPU re-order).  However, I am not sure any this matters for
this example, as in both cases you still have the race issue.  R1 can read
x, then W1 writes x and y, then R1 reads y - or various combinations of
same. So some form of critical section around both x and y would seem to be
the proper behavior, unless we don't care about x or y in respect to each
other (i.e. simple counters that are not treated as a set).

--
William Stacey [C# MVP]
PCR concurrency library: www.codeplex.com/pcr
PSH Scripts Project www.codeplex.com/psobject


Show quoteHide quote
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MPG.20a8e1115b0c872180@msnews.microsoft.com...
| William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
| > So are you saying the JIT can reorder these two lines:
| > int a = x;
| > int b = y;
| >
| > Like so:
| > int b = y;
| > int a = x;
|
| Exactly, so long as there aren't any memory barriers in the way. Memory
| barriers (which include operations with volatile locations) stop the
| JIT from moving memory operations beyond the barrier.
|
| It's unfortunate that the CLI spec isn't nearly as clear as it might be
| about all this, but Vance Morrison wrote a great MSDN article about the
| .NET 2.0 model (which is stricter than the CLI spec):
| http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/
|
| --
| Jon Skeet - <sk***@pobox.com>
| http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
| If replying to the group, please do not mail me too
Author
7 May 2007 5:26 PM
Barry Kelly
William Stacey [C# MVP] wrote:

> Here is some more food for the party:
> http://msdn2.microsoft.com/en-us/library/ms686355.aspx
>
> Here he states the compiler will not re-order volatile variable access,
> However these operations still *could be re-ordered by the processor.

Volatile read has acquire semantics. Loads won't move before an acquire,
similarly, writes won't move after a release. These should be
implemented by CPU instructions, as explained in the article you linked
to, and aren't restricted to just compiler optimizations. This means
that volatile read on .NET is sufficient, because the CPU won't reorder
such that the second load moves before the first load, because of the
acquire semantics (aka read fence).

>  In
> fact, he shows correcting this issue with an Interlocked operation.  This
> article also state that Interlocked functions *ensure appropriat barriers
> for memory ordering.

The have full fence semantics (except the Interlocked*Acquire and
Interlocked*Release).

>  So (at least for this example) it would seem
> Interlocked provides as good or better symatics as the volatile example (and
> also prevents CPU re-order).

Volatile also prevents CPU re-order, but asymmetrically: read causes
acquire, write causes release. The write side release means that e.g.
"publishing" a new instance to a static volatile variable won't have a
problem such that some fields in the instance aren't fully retired
before other threads could read the instance reference - writes won't
move ahead of the release / write fence.

-- Barry

Author
8 May 2007 1:05 AM
William Stacey [C# MVP]
Thanks Barry.  My assertion has been that Interlocked works here, not that
volatile does not.  One thing outstanding, however, is that little "blurb"
about the cpu being able to reordering reads even with volatile.  That is a
bit unsettling.

--
William Stacey [C# MVP]


Show quoteHide quote
"Barry Kelly" <barry.j.ke***@gmail.com> wrote in message
news:t4nu339s34tpgovpblna8ilkf2rb2r194e@4ax.com...
| William Stacey [C# MVP] wrote:
|
| > Here is some more food for the party:
| > http://msdn2.microsoft.com/en-us/library/ms686355.aspx
| >
| > Here he states the compiler will not re-order volatile variable access,
| > However these operations still *could be re-ordered by the processor.
|
| Volatile read has acquire semantics. Loads won't move before an acquire,
| similarly, writes won't move after a release. These should be
| implemented by CPU instructions, as explained in the article you linked
| to, and aren't restricted to just compiler optimizations. This means
| that volatile read on .NET is sufficient, because the CPU won't reorder
| such that the second load moves before the first load, because of the
| acquire semantics (aka read fence).
|
| >  In
| > fact, he shows correcting this issue with an Interlocked operation.
This
| > article also state that Interlocked functions *ensure appropriat
barriers
| > for memory ordering.
|
| The have full fence semantics (except the Interlocked*Acquire and
| Interlocked*Release).
|
| >  So (at least for this example) it would seem
| > Interlocked provides as good or better symatics as the volatile example
(and
| > also prevents CPU re-order).
|
| Volatile also prevents CPU re-order, but asymmetrically: read causes
| acquire, write causes release. The write side release means that e.g.
| "publishing" a new instance to a static volatile variable won't have a
| problem such that some fields in the instance aren't fully retired
| before other threads could read the instance reference - writes won't
| move ahead of the release / write fence.
|
| -- Barry
|
| --
| http://barrkel.blogspot.com/
Author
8 May 2007 1:15 AM
William Stacey [C# MVP]
After re-reading, I think I found it.  He was talking about VS2005
volatile -vs- VS2003 volatile - they seem to have implemented them better in
2005 using proper acquire and release semantics.  That is info I never saw
before.  The volatile code he shows works correctly with vs2005 compiler,
but may not with 2003 compiler, hence the alternative Interlocked example
given.  Makes more sense now.

--
William Stacey [C# MVP]


Show quoteHide quote
"William Stacey [C# MVP]" <william.sta***@gmail.com> wrote in message
news:epyqE0QkHHA.4516@TK2MSFTNGP03.phx.gbl...
| Thanks Barry.  My assertion has been that Interlocked works here, not that
| volatile does not.  One thing outstanding, however, is that little "blurb"
| about the cpu being able to reordering reads even with volatile.  That is
a
| bit unsettling.
|
| --
| William Stacey [C# MVP]
|
|
| "Barry Kelly" <barry.j.ke***@gmail.com> wrote in message
| news:t4nu339s34tpgovpblna8ilkf2rb2r194e@4ax.com...
|| William Stacey [C# MVP] wrote:
||
|| > Here is some more food for the party:
|| > http://msdn2.microsoft.com/en-us/library/ms686355.aspx
|| >
|| > Here he states the compiler will not re-order volatile variable access,
|| > However these operations still *could be re-ordered by the processor.
||
|| Volatile read has acquire semantics. Loads won't move before an acquire,
|| similarly, writes won't move after a release. These should be
|| implemented by CPU instructions, as explained in the article you linked
|| to, and aren't restricted to just compiler optimizations. This means
|| that volatile read on .NET is sufficient, because the CPU won't reorder
|| such that the second load moves before the first load, because of the
|| acquire semantics (aka read fence).
||
|| >  In
|| > fact, he shows correcting this issue with an Interlocked operation.
| This
|| > article also state that Interlocked functions *ensure appropriat
| barriers
|| > for memory ordering.
||
|| The have full fence semantics (except the Interlocked*Acquire and
|| Interlocked*Release).
||
|| >  So (at least for this example) it would seem
|| > Interlocked provides as good or better symatics as the volatile example
| (and
|| > also prevents CPU re-order).
||
|| Volatile also prevents CPU re-order, but asymmetrically: read causes
|| acquire, write causes release. The write side release means that e.g.
|| "publishing" a new instance to a static volatile variable won't have a
|| problem such that some fields in the instance aren't fully retired
|| before other threads could read the instance reference - writes won't
|| move ahead of the release / write fence.
||
|| -- Barry
||
|| --
|| http://barrkel.blogspot.com/
|
|
Author
7 May 2007 6:16 PM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> Here is some more food for the party:
> http://msdn2.microsoft.com/en-us/library/ms686355.aspx
>
> Here he states the compiler will not re-order volatile variable access,
> However these operations still *could be re-ordered by the processor.

Hmm. That sounds like it's a broken implementation then. Basically, if
the .NET CLR executes code in a way which violates the spec, the
implementation is flawed. If the processor could reorder things, the
CLR should make sure that that is invisible to the user, beyond what is
possible within the CLI spec.

I'd be interested to hear Joe Duffy's opinion on that part of the
article.

> In  fact, he shows correcting this issue with an Interlocked operation.  This
> article also state that Interlocked functions *ensure appropriat barriers
> for memory ordering.  So (at least for this example) it would seem
> Interlocked provides as good or better symatics as the volatile example (and
> also prevents CPU re-order).

No - because only the thread which *calls* the Interlocked method knows
that interlocked is involved, whereas with volatile both the reading
thread *and* the writing thread know to use memory barriers. Of course,
if you use Interlocked in both threads, to both read *and* write the
value, then everything will be okay.

> However, I am not sure any this matters for
> this example, as in both cases you still have the race issue.  R1 can read
> x, then W1 writes x and y, then R1 reads y - or various combinations of
> same. So some form of critical section around both x and y would seem to be
> the proper behavior, unless we don't care about x or y in respect to each
> other (i.e. simple counters that are not treated as a set).

If we care about atomicity, there needs to be some kind of locking.
If we only care that the change to y is seen after the change to x
(i.e. you can't see x=0, y=1) then volatile will do the job but I don't
believe that changing the variable with Interlocked and then reading it
directly in another thread is guaranteed to work.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
8 May 2007 12:43 AM
William Stacey [C# MVP]
| Hmm. That sounds like it's a broken implementation then. Basically, if
| the .NET CLR executes code in a way which violates the spec, the
| implementation is flawed. If the processor could reorder things, the
| CLR should make sure that that is invisible to the user, beyond what is
| possible within the CLI spec.
|
| I'd be interested to hear Joe Duffy's opinion on that part of the
| article.

I would like to see Joe comment on that as well.  If that article is
correct, it sounds like "volatile" may not work as people expect it to be
working.

| No - because only the thread which *calls* the Interlocked method knows
| that interlocked is involved, whereas with volatile both the reading
| thread *and* the writing thread know to use memory barriers. Of course,
| if you use Interlocked in both threads, to both read *and* write the
| value, then everything will be okay.

The reader does not need to know as it gets the read fence automatically (at
the hardware layer) because the Interlocked buts up a *full memory barrier
fence so reads can not be reordered at the instruction level.  vista has
some other versions with seperate read aquire/write aquire methods, but the
ones in the framework use the full fence.  I am not sure how volatile is
implemented under the covers, but it would not suprise me if it also uses an
interlocked operation.

| If we care about atomicity, there needs to be some kind of locking.
| If we only care that the change to y is seen after the change to x
| (i.e. you can't see x=0, y=1) then volatile will do the job but I don't
| believe that changing the variable with Interlocked and then reading it
| directly in another thread is guaranteed to work.

hmm.  We have seem to cover the same ground.  Everything I have seen (more
then that link) says it does.  I would be interested if you find something
to contrary as that would be good info.  Also, most (if not all) locks are
based on Interlocked in their guts, so if it does not work, then everything
in the history of the world is broke. Ok, that is bit drastic...  tia

--
William Stacey [C# MVP]
Author
8 May 2007 1:16 AM
William Stacey [C# MVP]
|| I'd be interested to hear Joe Duffy's opinion on that part of the
|| article.

I posted correction in thread above.  It seems the issue with volatile was
with vs2003, not vs2005.

--wjs
Author
8 May 2007 5:44 AM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
>
> | Hmm. That sounds like it's a broken implementation then. Basically, if
> | the .NET CLR executes code in a way which violates the spec, the
> | implementation is flawed. If the processor could reorder things, the
> | CLR should make sure that that is invisible to the user, beyond what is
> | possible within the CLI spec.
> |
> | I'd be interested to hear Joe Duffy's opinion on that part of the
> | article.
>
> I would like to see Joe comment on that as well.  If that article is
> correct, it sounds like "volatile" may not work as people expect it to be
> working.

Well, not just how people expect it to work, but how it's *specified*
to work.

I've now seen your post about it basically being a potential bug in the
..NET 1.1 CLR.

> | No - because only the thread which *calls* the Interlocked method knows
> | that interlocked is involved, whereas with volatile both the reading
> | thread *and* the writing thread know to use memory barriers. Of course,
> | if you use Interlocked in both threads, to both read *and* write the
> | value, then everything will be okay.
>
> The reader does not need to know as it gets the read fence automatically (at
> the hardware layer) because the Interlocked buts up a *full memory barrier
> fence so reads can not be reordered at the instruction level.

But the read could *already* have been reordered by the JIT, ages
before this occurs. How could the JIT know whether a variable would
*ever* be used with Interlocked, and avoid reordering?

>  vista has
> some other versions with seperate read aquire/write aquire methods, but the
> ones in the framework use the full fence.  I am not sure how volatile is
> implemented under the covers, but it would not suprise me if it also uses an
> interlocked operation.

But there's more to it than that - there's the restriction on JIT
reordering, and *that* can't be known about ahead of time.

> | If we care about atomicity, there needs to be some kind of locking.
> | If we only care that the change to y is seen after the change to x
> | (i.e. you can't see x=0, y=1) then volatile will do the job but I don't
> | believe that changing the variable with Interlocked and then reading it
> | directly in another thread is guaranteed to work.
>
> hmm.  We have seem to cover the same ground.  Everything I have seen (more
> then that link) says it does.  I would be interested if you find something
> to contrary as that would be good info.  Also, most (if not all) locks are
> based on Interlocked in their guts, so if it does not work, then everything
> in the history of the world is broke. Ok, that is bit drastic...  tia

I'm not saying that Interlocked doesn't work. I'm saying that it only
works if you use it consistently across all threads.

I'll try to get round to mailing Joe about it...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
8 May 2007 8:44 PM
William Stacey [C# MVP]
| But the read could *already* have been reordered by the JIT, ages
| before this occurs. How could the JIT know whether a variable would
| *ever* be used with Interlocked, and avoid reordering?

I assume it knows the same way it knows a var is volatile - by inspecting
the code as it compiles and keeping some state if it sees a var wrapped with
interlocked or volatile, it does not reordering. Not sure.  I am not sure
when and where is actually does these optimizations.  I think I read
something about tight local loops, but not sure.  It would be interesting to
see a good article on the subject.

| I'll try to get round to mailing Joe about it...

Thanks.  That would be great to see what he says.

--
William Stacey [C# MVP]
PCR concurrency library: www.codeplex.com/pcr
PSH Scripts Project www.codeplex.com/psobject
Author
8 May 2007 8:58 PM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> | But the read could *already* have been reordered by the JIT, ages
> | before this occurs. How could the JIT know whether a variable would
> | *ever* be used with Interlocked, and avoid reordering?
>
> I assume it knows the same way it knows a var is volatile - by inspecting
> the code as it compiles and keeping some state if it sees a var wrapped with
> interlocked or volatile, it does not reordering. Not sure.  I am not sure
> when and where is actually does these optimizations.  I think I read
> something about tight local loops, but not sure.  It would be interesting to
> see a good article on the subject.

When the JIT has to create native code for one method, it might not
even have loaded all the assemblies which will manipulate the same
variables as that method does. I can't see that it could *possibly*
know whether or not some code, somewhere, is going to use it with the
Interlocked class.

Compare that with finding out whether or not the variable is volatile -
you look in a single place, and that's it.

> | I'll try to get round to mailing Joe about it...
>
> Thanks.  That would be great to see what he says.

Yeah. This relies on me finding the time, of course :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
9 May 2007 10:47 AM
Jon Skeet [C# MVP]
Jon Skeet [C# MVP] <sk***@pobox.com> wrote:

<snip>

> > | I'll try to get round to mailing Joe about it...
> >
> > Thanks.  That would be great to see what he says.
>
> Yeah. This relies on me finding the time, of course :)

I've found the time:

http://msmvps.com/blogs/jon.skeet/archive/2007/05/09/non-volatile-
reads-and-interlocked-and-how-they-interact.aspx

All comments welcome. I'm sorry that it's clearly written from my
viewpoint rather than that of an impartial observer - I hope I haven't
misrepresented any views. I'm happy to edit where appropriate.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
10 May 2007 9:06 PM
William Stacey [C# MVP]
"Ecma-335:2006
....
12.6.5

5. Explicit atomic operations... These operations (e.g. Increment,
Decrement, Exchange, and CompareExchange) perform implicit acquire/release
operations. ..."

So by my read, this contract would require the CLI to respect the same
optimization rules as volatile.  System.Thread.MemoryBarrier would seem to
have to respect proper order too even if the vars are not decorated with
volatile. Do you read it different?  Thanks.

--
William Stacey [C# MVP]

Show quoteHide quote
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MPG.20abb72ef1c3a592b6@msnews.microsoft.com...
| Jon Skeet [C# MVP] <sk***@pobox.com> wrote:
|
| <snip>
|
| > > | I'll try to get round to mailing Joe about it...
| > >
| > > Thanks.  That would be great to see what he says.
| >
| > Yeah. This relies on me finding the time, of course :)
|
| I've found the time:
|
| http://msmvps.com/blogs/jon.skeet/archive/2007/05/09/non-volatile-
| reads-and-interlocked-and-how-they-interact.aspx
|
| All comments welcome. I'm sorry that it's clearly written from my
| viewpoint rather than that of an impartial observer - I hope I haven't
| misrepresented any views. I'm happy to edit where appropriate.
|
| --
| Jon Skeet - <sk***@pobox.com>
| http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
| If replying to the group, please do not mail me too
Author
10 May 2007 9:31 PM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> "Ecma-335:2006
> ...
> 12.6.5
>
> 5. Explicit atomic operations... These operations (e.g. Increment,
> Decrement, Exchange, and CompareExchange) perform implicit acquire/release
> operations. ..."
>
> So by my read, this contract would require the CLI to respect the same
> optimization rules as volatile.  System.Thread.MemoryBarrier would seem to
> have to respect proper order too even if the vars are not decorated with
> volatile. Do you read it different?  Thanks.

The operations on Interlocked *do* perform implicit acquire/release
operations - but the *reading* operations don't. That's the problem.

To put it an alternative way, it is (in some ways - not all) a bit like
acquiring a lock while writing a value, but not acquiring the same lock
when reading. Yes, you've made sure that the data is available to be
read with the memory barrier on the writing thread, but you haven't
made sure that the reading thread actually performs the read when you
expect it to.

Basically, both threads need a memory barrier in order to be effective.
Using Interlocked puts a memory barrier on the calling thread, but
that's all - IMO :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
11 May 2007 2:17 AM
William Stacey [C# MVP]
| Basically, both threads need a memory barrier in order to be effective.
| Using Interlocked puts a memory barrier on the calling thread, but
| that's all - IMO :)

Hi Jon.  Two issues here I think:
1) Interlocked and barrier.  Interlocked does provide a full memory barrier.
At least from the docs, this is clear.
2) Does it prevent reads from being reordered by the CLI?  This is not as
clear.  It seems that if it implicitly expresses volatile read/write as it
claims in ECMA, then it would also prevent such optimizations. Same way you
would expect Thread.MemoryBarrier() to do.  IIRC, these CLI optimization
rules must be very localized and are very strict.  They can't just do them
all over the place.  They can only do them them when they *know there can be
no side effects.

http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx
"We've spent a lot of time thinking about what the correct memory model for
the CLR should be.  If I had to guess, we're going to switch from the ECMA
model to the following model.  I think that we will try to persuade other
CLI implementations to adopt this same model, and that we will try to change
the ECMA specification to reflect this.



  1.. Memory ordering only applies to locations which can be globally
visible or locations that are marked volatile.  Any locals that are not
address exposed can be optimized without using memory ordering as a
constraint since these locations cannot be touched by multiple threads in
parallel.
  2.. Non-volatile loads can be reordered freely.
  3.. Every store (regardless of volatile marking) is considered a release.
  4.. Volatile loads are considered acquire.
  5.. Device oriented software may need special programmer care.  Volatile
stores are still required for any access of device memory.  This is
typically not a concern for the managed developer."
This was wrote back in May 2003, and don't know if this made it into 2.0 or
latter or at all.  But he seems to touch on this very issue we are talking
about.  Order changes can only be applied to locals that are not address
exposed and do not have some volatile symatics around them.  As interlocked
and MemoryBarrier has implicit volatile contract, I would assume CLI
inspection would have to honor those same as with volatile keyword.  Maybe
not.  Would be nice to get a clearer writing on this.

--
William Stacey [C# MVP]
Author
11 May 2007 3:19 AM
Barry Kelly
William Stacey [C# MVP] wrote:

I think we're starting to lose some context. Here's your original reply
to Chris:

William wrote:
> Chris wrote:
> | ... but, if I just want to read the value, I CANNOT do:
> |
> |    int myValue = _firstTime;
> | or
> |    if (_fistTime==0)
> |
> | Doing these requires a memory barrier of some type (volatile variable,
> | monitor, etc) to have any degree of reliability.
>
> you can.  The Interlocked write ensures this across all cpus and cache.

This is the basic point that Jon (and I, and the standard, and Chris
Brumme's post that you quote) disagree with you on.

An interlocked operation is a full fence (read and write barrier, both
acquire and release) on the thread that it was *executed* on. It doesn't
affect reads from *other* threads.

> | Basically, both threads need a memory barrier in order to be effective.
> | Using Interlocked puts a memory barrier on the calling thread, but
> | that's all - IMO :)
>
> Hi Jon.  Two issues here I think:
> 1) Interlocked and barrier.  Interlocked does provide a full memory barrier.
> At least from the docs, this is clear.
> 2) Does it prevent reads from being reordered by the CLI?  This is not as
> clear.

There is no way it could perform its advertised purpose if it didn't
prevent related reads moving before it. The CLI needs to pass the
location of the target to the interlocked operation, so it can't have
related values cached in (e.g.) a register or it would be plainly faulty
with only a *single* thread of operation. Since that discounts compiler
optimizations that would move related reads before the interlocked
operation, the only thing left is CPU semantics - and that's what the
Interlocked operation guarantees, it guarantees a full fence for the
calling thread.

Show quoteHide quote
> http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx :
> [extra quoted for clarity -- Barry]
> > "We've spent a lot of time thinking about what the correct memory model for
> > the CLR should be.  If I had to guess, we're going to switch from the ECMA
> > model to the following model.  I think that we will try to persuade other
> > CLI implementations to adopt this same model, and that we will try to change
> > the ECMA specification to reflect this.
> >
> >   1.. Memory ordering only applies to locations which can be globally
> > visible or locations that are marked volatile.  Any locals that are not
> > address exposed can be optimized without using memory ordering as a
> > constraint since these locations cannot be touched by multiple threads in
> > parallel.
> >   2.. Non-volatile loads can be reordered freely.
> >   3.. Every store (regardless of volatile marking) is considered a release.
> >   4.. Volatile loads are considered acquire.
> >   5.. Device oriented software may need special programmer care.  Volatile
> > stores are still required for any access of device memory.  This is
> > typically not a concern for the managed developer."

> This was wrote back in May 2003, and don't know if this made it into 2.0 or
> latter or at all.  But he seems to touch on this very issue we are talking
> about.

Interlocked operations pass the address (reference to the location), so
they are "address exposed" and therefore the read can't move before the
interlocked operation. Volatile semantics aren't required for this; in
fact, not even threads are required for this. An optimization that moved
the read before the interlocked call (or any other call by reference) is
wrong in every case.

> Order changes can only be applied to locals that are not address
> exposed and do not have some volatile symatics around them.

That's not what Chris Brumme wrote. Instead, he writes:

> Chris Brumme (in post above) wrote:
> >   1.. Memory ordering only applies to locations which can be globally
> > visible or locations that are marked volatile.  Any locals that are not
> > address exposed can be optimized without using memory ordering as a
> > constraint since these locations cannot be touched by multiple threads in
> > parallel.

That is to say, order changes are irrelevant (and in fact, can't even be
detected, if the compiler has no bugs) for locals that are not address
exposed, since only a single thread of execution can ever see their
values. That is to say, he says almost exactly the *opposite* of what
you said - that order changes can *only* be detected, are *only*
relevant, for locations that are globally visible, address exposed or
marked volatile.

> As interlocked
> and MemoryBarrier has implicit volatile contract, I would assume CLI
> inspection would have to honor those same as with volatile keyword.  Maybe
> not.  Would be nice to get a clearer writing on this.

All this is irrelevant to your original point at the root of the thread
though, in that, contrary to what you claim, a memory barrier is still
required for reads from threads *other* than the one that performed the
interlocked operation, if you're going to guarantee that you read what's
really there and not some stale value.

-- Barry

Author
11 May 2007 5:14 AM
William Stacey [C# MVP]
| All this is irrelevant to your original point at the root of the thread
| though, in that, contrary to what you claim, a memory barrier is still
| required for reads from threads *other* than the one that performed the
| interlocked operation, if you're going to guarantee that you read what's
| really there and not some stale value.

Hi Barry.  I address this part first as it is key to everything else.  There
*is a full fence on interlocked methods (save the newer win32 ones like
InterlockedIncrementAcquire that provide acquire and release semantics).  It
is *gaureenteed* by the OS and hw.  It is documented in many places (see
some below). There can never be stale data, as hw ensures this across all
cpus and caches.  After an Interlock completes, any and all threads will see
only the last data.  This is one reason why it is a relatively expensive
call.  Where is it documented there is not a full barrier?

Here is some linq:
http://msdn2.microsoft.com/en-us/library/ms684122.aspx
http://www.gamasutra.com/features/20060630/paquet_01.shtml
http://blogs.msdn.com/oldnewthing/archive/2004/05/28/143769.aspx
http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx

--
William Stacey [C# MVP]
Author
11 May 2007 6:41 AM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
Show quoteHide quote
> | All this is irrelevant to your original point at the root of the thread
> | though, in that, contrary to what you claim, a memory barrier is still
> | required for reads from threads *other* than the one that performed the
> | interlocked operation, if you're going to guarantee that you read what's
> | really there and not some stale value.
>
> Hi Barry.  I address this part first as it is key to everything else.  There
> *is a full fence on interlocked methods (save the newer win32 ones like
> InterlockedIncrementAcquire that provide acquire and release semantics).  It
> is *gaureenteed* by the OS and hw.  It is documented in many places (see
> some below). There can never be stale data, as hw ensures this across all
> cpus and caches.  After an Interlock completes, any and all threads will see
> only the last data.  This is one reason why it is a relatively expensive
> call.  Where is it documented there is not a full barrier?

There *is* a full barrier, *within that thread*. Not within all others.

Any thread which actually *performs* a read after the Interlock
completes will indeed see the new value. The point is that with
reordering in the reading thread, there's no guarantee that it will
make any attempt to read the value after the Interlock completes - it
could have read it before the Interlocked operation executes, even if a
value is later read which looks like it should have been read
beforehand. Suppose the source code is:

Thread 1             
int b = y;
int a = x;

Thread 2
Interlocked.Increment (ref x);
Interlocked.Increment (ref y);

The first thread's code can be reordered by the JIT, and then executed
as:

Thread 1                   Thread 2
int a = x;
                           Interlocked.Increment (ref x);
                           Interlocked.Increment (ref y);
int b = y;

Note the reordering. The guarantee that threads reading after
Interlocked.Increment see the new values is still intact - it's just
that x has been read before y, which appears to be wrong according to
the original source code.

This is because there aren't any memory barriers in the reading thread.


Could you point to where in the ECMA spec it says anything about a
memory barrier on one thread affecting what is possible in terms of
reordering on another thread?

It's important to understand that this *isn't* just a case of hardware
caches and memory fences etc - the JIT itself can reorder the reads,
which is what I've been trying to show above.

I don't see anything in those pages to prevent the JIT from reordering
the reads on the reading thread. Half of them aren't even about .NET,
which means they won't be addressing the JIT-reordering issue at all.

I haven't heard back from Joe yet - I might mail Chris Brumme to see if
he'd care to comment.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
11 May 2007 3:51 PM
William Stacey [C# MVP]
| There *is* a full barrier, *within that thread*. Not within all others.

What would be the point of a full barrier if it did not work across threads?
You would not need it for a single thread.  The full barrier ensures the cpu
can not reorder reads and all reads will see the last write.  But, I think
your talking about reorder by the CLI as below.


Show quoteHide quote
| Any thread which actually *performs* a read after the Interlock
| completes will indeed see the new value. The point is that with
| reordering in the reading thread, there's no guarantee that it will
| make any attempt to read the value after the Interlock completes - it
| could have read it before the Interlocked operation executes, even if a
| value is later read which looks like it should have been read
| beforehand. Suppose the source code is:
|
| Thread 1
| int b = y;
| int a = x;
|
| Thread 2
| Interlocked.Increment (ref x);
| Interlocked.Increment (ref y);
|
| The first thread's code can be reordered by the JIT, and then executed
| as:
|
| Thread 1                   Thread 2
| int a = x;
|                           Interlocked.Increment (ref x);
|                           Interlocked.Increment (ref y);
| int b = y;
|
| Note the reordering. The guarantee that threads reading after
| Interlocked.Increment see the new values is still intact - it's just
| that x has been read before y, which appears to be wrong according to
| the original source code.
|
| This is because there aren't any memory barriers in the reading thread.

I understand your point.  I am playing devils advocate with myself trying to
find proof either way.  But, by my read (reading thru the sparse lines) the
JIT is *not allowed to make this optimization here.  It can only reorder
when:
1) There is no volatile
2) The var is not address exposed (e.g. local).
3) There is no other explicit or implicit memory barriers protecting the var
(i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc)

So this would fail test #3 and not allow reorder by JIT.  The ecma I posted
hints at this, but is far from detailed on this.  Chris B talks more about
this, but don't know when/if he implemented it in the CLR.  Naturally, I
could be nuts.  It would not be the first time.  Thank you.

--
William Stacey [C# MVP]
Author
11 May 2007 7:31 PM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> | There *is* a full barrier, *within that thread*. Not within all others.
>
> What would be the point of a full barrier if it did not work across threads?

The same point as locking has. Both threads need to try to acquire a
lock before it becomes particularly useful, but I don't think you'd
argue that it *isn't* useful.

> You would not need it for a single thread.  The full barrier ensures the cpu
> can not reorder reads and all reads will see the last write.

.... if they perform a genuine read *after* that point, yes.

> But, I think your talking about reorder by the CLI as below.

Yes - although from the memory model point of view (which doesn't know
about CPU fences) it's irrelevant. The memory model makes certain
guarantees without saying whether the optimisations which can be
performed are due to the JIT or the CPU.

<snip>

Show quoteHide quote
> | Note the reordering. The guarantee that threads reading after
> | Interlocked.Increment see the new values is still intact - it's just
> | that x has been read before y, which appears to be wrong according to
> | the original source code.
> |
> | This is because there aren't any memory barriers in the reading thread.
>
> I understand your point.  I am playing devils advocate with myself trying to
> find proof either way.  But, by my read (reading thru the sparse lines) the
> JIT is *not allowed to make this optimization here.  It can only reorder
> when:
> 1) There is no volatile
> 2) The var is not address exposed (e.g. local).
> 3) There is no other explicit or implicit memory barriers protecting the var
> (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc)

You seem to be treating these as an "and" whereas it's not - otherwise
you could never have reordering of *any* non-local reads.

I agree with 1. I believe you're misreading the spec on point 2 - it's
not a condition, it's just that *any* reads/writes can be reordered.

> So this would fail test #3 and not allow reorder by JIT.  The ecma I posted
> hints at this, but is far from detailed on this.  Chris B talks more about
> this, but don't know when/if he implemented it in the CLR.  Naturally, I
> could be nuts.  It would not be the first time.  Thank you.

I don't see where it hints at this at all. Out of the 5 rules you
mentioned in the spec, the important one is this: "Non-volatile loads
can be reordered freely."

The statement:

int a = x;

is *not* a volatile load when x isn't a volatile variable. Just because
some other thread may or may not perform an interlocked operation
doesn't make loading it volatile. I think that's the crux of our
disagreement.

So, having reduced it to this (if you agree with me on that much) can
you provide any indication that accessing a variable at any point on
any thread with an Interlocked operation makes every load of that
variable on every thread count as being volatile? I haven't seen any
evidence of that.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
11 May 2007 9:26 PM
Barry Kelly
William Stacey [C# MVP] wrote:

> | There *is* a full barrier, *within that thread*. Not within all others.
>
> What would be the point of a full barrier if it did not work across threads?

A full barrier is a combination of both a read barrier and a write
barrier.

A write barrier makes sure that all pending writes are retired, and is
useful just before publishing a value, so that when it is read on a
different thread / CPU, there are no pending writes that could catch any
readers by surprise if they were to end up being retired a little later
(out of order). The hardware-level concept of a write barrier affects
only the CPU executing the write barrier, ensuring that all pending
writes are retired before the publishing the final value. It's an
ordering primitive for that CPU. Yes it does affect other threads, but
only in so far as what they can observe from memory, in conjunction with
the *other* half of the equation - a read barrier.

A read barrier makes sure that no reads / prefetches / what have you
have snuck before retrieve a particular value from a location, a value
that you expect to have been published by another thread. The read
barrier only affects the CPU performing the read. It's directly
complementary to a write barrier on the publishing thread, it ensures
that you read what you think you read, and not what the CPU inferred you
we're going to indirectly read by (e.g.) sneaking ahead and peeking at
the instruction stream.

Read and write barriers are complementary. Write barriers stop writes
moving forward in time, while read barriers stop reads moving backwards
in time. If you don't have balanced write and read barriers on threads
that are communicating via shared memory without locks, then their reads
and writes will overlap, and consequently linear, intuitive, von Neumann
expectations of invariants will be broken [unless of course, a finite
state machine model for all possible combinations have been worked out
and tested, see e.g. Cliff Click's work on a lock-free hash table for
Java:

http://blogs.azulsystems.com/cliff/2007/05/hardware_visibi.html

..]

Again, a read barrier only affects the thread / CPU that it's executed
on. It doesn't affect any writers on other threads. And read barriers of
some kind (even the implicit ones with 'lock') aren't optional even when
the writing thread is using volatile / interlocked / write barriers /
other mechanisms.

> You would not need it for a single thread.

Barriers aren't needed for single threads because then there's only one
thread context, one CPU, and the OS would have responsibility for any
read / write barriers before context switching into / out of your
application's thread of control, since there's no concurrency possible
with only a single thread.

> The full barrier ensures the cpu
> can not reorder reads and all reads will see the last write.  But, I think
> your talking about reorder by the CLI as below.

Memory barriers / fences only affect ordering of reads and writes of the
thread / CPU they were executed on, though.


> I understand your point.  I am playing devils advocate with myself trying to
> find proof either way.  But, by my read (reading thru the sparse lines) the
> JIT is *not allowed to make this optimization here.  It can only reorder
> when:
> 1) There is no volatile
> 2) The var is not address exposed (e.g. local).
>> 3) There is no other explicit or implicit memory barriers protecting the var
> (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc)

I believe you're misreading something big. Quoting from Chris Brumme's
post:

http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx:
>    1.  Memory ordering only applies to locations which
> can be globally visible or locations that are marked
> volatile.  Any locals that are not address exposed can be
> optimized without using memory ordering as a constraint
> since these locations cannot be touched by multiple threads
> in parallel.

This implies that loads and stores can be reordered freely when its
non-exposed locals being dealt with. ("Reordered freely" is always
subject to correctness on a single-threaded basis, of course, so there
are limits to this reordering.)

>    2. Non-volatile loads can be reordered freely.

Now here's something that you seem to have missed: any and all reads
(loads) that aren't volatile can be reordered *freely*, even though the
location being loaded *isn't* local.

>    3. Every store (regardless of volatile marking) is considered a release.

This was an upgrade in the CLR's memory model over the ECMA memory
model, IIRC, though Intel x86 architecture keeps it ok, but it did need
specifying for IA64. It implies that a write barrier isn't required when
publishing a value, because the CLR (but not ECMA) will handle that
situation automatically.

>    4. Volatile loads are considered acquire.

And by contrast, from point 2, non-volatile loads are not considered
acquire, so they can freely move backwards in time. This is what makes
your original post to the thread incorrect.

>    5. Device oriented software may need special programmer care.

I've little to say about devices in CLR :)

> So this would fail test #3 and not allow reorder by JIT.

I believe the rules you write are mistaken and not supported by the
standards or links you post, and in any case, interlocked operations
only affect a *single* thread, not all threads.

-- Barry

Author
11 May 2007 7:50 PM
Ben Voigt
> cpus and caches.  After an Interlock completes, any and all threads will
> see
> only the last data.  This is one reason why it is a relatively expensive
> call.  Where is it documented there is not a full barrier?

It is a full memory barrier.  That means that the new value is actually in
RAM, coherent across all processors (some architectures might choose write
snooping instead of invalidating caches in other cores) before the
Interlocked call returns.  This falls far short of forcing other threads to
"see the last data", because another thread could have cached the value in a
CPU register (hoisting a complex expression outside a loop, anyone?).
Access to the memory address is guaranteed to yield the new value, but only
volatile variables result in memory being addressed on every use of the
variable name.  Otherwise the compiler can perform alias analysis, determine
that the reading thread can't change the variable in the interim, and thus
uses the cached value.
Author
11 May 2007 7:13 PM
Jon Skeet [C# MVP]
William Stacey [C# MVP] <william.sta***@gmail.com> wrote:

<snip>

>   2.. Non-volatile loads can be reordered freely.

And that's the absolutely key one. As far as the reading thread is
concerned, it's performing non-volatile loads. It doesn't know that an
Interlocked operation will occur anywhere else, and it's not loading a
volatile variable - therefore it's a non-volatile load, and the
operations can be reordered as shown in the post I've just made.

<snip>

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
11 May 2007 7:59 PM
Chris Mullins [MVP]
There's quite a bit of dicusssion going on with regard to memory models and
interlocked writes and standard reads.

I figured I would write a little program and see if I can make the problem
show up.
The sample app I wrote below does, every once in a while, go "Boom".

This means the memory model is indeed reordering the reads of test and
test2.

My environment for running this test is:
Dual-Core AMD Athlon
Windows XP x64
Compiled in Release Mode, using "Any CPU"
Running without a debugger attached.

I've run the test using "Any CPU", "x86" and "x64" as specific compile
targets. Same result each time.

Sometimes it runs for a long time without failing, other times I get a ton
of failures. For example:
....
Boom : 287884447, 287884446
Boom : 287885420, 287885419
Boom : 287885780, 287885779
Boom : 287886016, 287886015
Boom : 287887440, 287887439
Boom : 287887798, 287887797



using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;

namespace ConsoleApplication4
{
    class Program
    {
        static long test = 0;
        static long test2 = 0;

        static void Main(string[] args)
        {
            Thread t1 = new Thread(ReadThreadProc);
            Thread t2 = new Thread(ReadThreadProc);
            Thread t3 = new Thread(WriteThreadProc);

            t1.IsBackground = true;
            t2.IsBackground = true;
            t3.IsBackground = true;

            t1.Start();
            t2.Start();
            t3.Start();

            Console.WriteLine("Press any key to exit");
            Console.ReadLine();

        }

        private static void WriteThreadProc()
        {
            while (true)
            {
                Interlocked.Increment(ref test);
                Interlocked.Increment(ref test2);
            }
        }

        private static void ReadThreadProc()
        {
            while (true)
            {
                long v1 = test;
                long v2 = test2;

                if (v1 > v2)
                    Console.WriteLine("Boom : {0}, {1}", v1, v2);
            }
        }
    }
}


--
Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP
http://www.coversant.com/blogs/cmullins
Author
11 May 2007 8:12 PM
Chris Mullins [MVP]
There's a (very obvious) bug in my test.

Please ignore that last post...

--
Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP
http://www.coversant.com/blogs/cmullins

Show quoteHide quote
"Chris Mullins [MVP]" <cmull***@yahoo.com> wrote in message
news:eJAdpbAlHHA.4852@TK2MSFTNGP03.phx.gbl...
> There's quite a bit of dicusssion going on with regard to memory models
> and interlocked writes and standard reads.
>
> I figured I would write a little program and see if I can make the problem
> show up.
> The sample app I wrote below does, every once in a while, go "Boom".
>
> This means the memory model is indeed reordering the reads of test and
> test2.
>
> My environment for running this test is:
> Dual-Core AMD Athlon
> Windows XP x64
> Compiled in Release Mode, using "Any CPU"
> Running without a debugger attached.
>
> I've run the test using "Any CPU", "x86" and "x64" as specific compile
> targets. Same result each time.
>
> Sometimes it runs for a long time without failing, other times I get a ton
> of failures. For example:
> ...
> Boom : 287884447, 287884446
> Boom : 287885420, 287885419
> Boom : 287885780, 287885779
> Boom : 287886016, 287886015
> Boom : 287887440, 287887439
> Boom : 287887798, 287887797
>
>
>
> using System;
> using System.Collections.Generic;
> using System.Text;
> using System.Threading;
>
> namespace ConsoleApplication4
> {
>    class Program
>    {
>        static long test = 0;
>        static long test2 = 0;
>
>        static void Main(string[] args)
>        {
>            Thread t1 = new Thread(ReadThreadProc);
>            Thread t2 = new Thread(ReadThreadProc);
>            Thread t3 = new Thread(WriteThreadProc);
>
>            t1.IsBackground = true;
>            t2.IsBackground = true;
>            t3.IsBackground = true;
>
>            t1.Start();
>            t2.Start();
>            t3.Start();
>
>            Console.WriteLine("Press any key to exit");
>            Console.ReadLine();
>
>        }
>
>        private static void WriteThreadProc()
>        {
>            while (true)
>            {
>                Interlocked.Increment(ref test);
>                Interlocked.Increment(ref test2);
>            }
>        }
>
>        private static void ReadThreadProc()
>        {
>            while (true)
>            {
>                long v1 = test;
>                long v2 = test2;
>
>                if (v1 > v2)
>                    Console.WriteLine("Boom : {0}, {1}", v1, v2);
>            }
>        }
>    }
> }
>
>
> --
> Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP
> http://www.coversant.com/blogs/cmullins
>
>
>
Author
11 May 2007 8:41 PM
Jon Skeet [C# MVP]
Chris Mullins [MVP] <cmull***@yahoo.com> wrote:
> There's a (very obvious) bug in my test.
>
> Please ignore that last post...

If it's any consolation, I made the exact same mistake when I first
made the blog post :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
12 May 2007 11:29 AM
Stelrad Kypski
not obvious to me. what is it?

Show quoteHide quote
"Chris Mullins [MVP]" <cmull***@yahoo.com> wrote in message
news:unrnxiAlHHA.4904@TK2MSFTNGP05.phx.gbl...
> There's a (very obvious) bug in my test.
>
> Please ignore that last post...
>
> --
> Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP
> http://www.coversant.com/blogs/cmullins
>
> "Chris Mullins [MVP]" <cmull***@yahoo.com> wrote in message
> news:eJAdpbAlHHA.4852@TK2MSFTNGP03.phx.gbl...
>> There's quite a bit of dicusssion going on with regard to memory models
>> and interlocked writes and standard reads.
>>
>> I figured I would write a little program and see if I can make the
>> problem show up.
>> The sample app I wrote below does, every once in a while, go "Boom".
>>
>> This means the memory model is indeed reordering the reads of test and
>> test2.
>>
>> My environment for running this test is:
>> Dual-Core AMD Athlon
>> Windows XP x64
>> Compiled in Release Mode, using "Any CPU"
>> Running without a debugger attached.
>>
>> I've run the test using "Any CPU", "x86" and "x64" as specific compile
>> targets. Same result each time.
>>
>> Sometimes it runs for a long time without failing, other times I get a
>> ton of failures. For example:
>> ...
>> Boom : 287884447, 287884446
>> Boom : 287885420, 287885419
>> Boom : 287885780, 287885779
>> Boom : 287886016, 287886015
>> Boom : 287887440, 287887439
>> Boom : 287887798, 287887797
>>
>>
>>
>> using System;
>> using System.Collections.Generic;
>> using System.Text;
>> using System.Threading;
>>
>> namespace ConsoleApplication4
>> {
>>    class Program
>>    {
>>        static long test = 0;
>>        static long test2 = 0;
>>
>>        static void Main(string[] args)
>>        {
>>            Thread t1 = new Thread(ReadThreadProc);
>>            Thread t2 = new Thread(ReadThreadProc);
>>            Thread t3 = new Thread(WriteThreadProc);
>>
>>            t1.IsBackground = true;
>>            t2.IsBackground = true;
>>            t3.IsBackground = true;
>>
>>            t1.Start();
>>            t2.Start();
>>            t3.Start();
>>
>>            Console.WriteLine("Press any key to exit");
>>            Console.ReadLine();
>>
>>        }
>>
>>        private static void WriteThreadProc()
>>        {
>>            while (true)
>>            {
>>                Interlocked.Increment(ref test);
>>                Interlocked.Increment(ref test2);
>>            }
>>        }
>>
>>        private static void ReadThreadProc()
>>        {
>>            while (true)
>>            {
>>                long v1 = test;
>>                long v2 = test2;
>>
>>                if (v1 > v2)
>>                    Console.WriteLine("Boom : {0}, {1}", v1, v2);
>>            }
>>        }
>>    }
>> }
>>
>>
>> --
>> Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP
>> http://www.coversant.com/blogs/cmullins
>>
>>
>>
>
>
Author
12 May 2007 12:18 PM
Jon Skeet [C# MVP]
Stelrad Kypski <b***@blah.com> wrote:
> not obvious to me. what is it?

The reads are done in the same order as the increments, which means
that even without any reordering, it's quite possible to get any of the
four possibilities.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Author
11 May 2007 9:57 PM
Jos Scherders
Hi,

One of the posts in this very interesting threat referred to this article :

http://msdn2.microsoft.com:80/en-us/library/ms686355.aspx

This article includes the following piece of code (to fix the race
condition):

volatile int iValue;
volatile BOOL fValueHasBeenComputed = FALSE;
extern int ComputeValue();

void CacheComputedValue()
{
  if (!fValueHasBeenComputed)
  {
    iValue = ComputeValue();
    fValueHasBeenComputed = TRUE;
  }
}

BOOL FetchComputedValue(int *piResult)
{
  if (fValueHasBeenComputed)
  {
    *piResult = iValue;
    return TRUE;
  }

  else return FALSE;
}

My (complete ignorent) question is this:

Must fValueHasBeenComputed  really be declared volatile for the program to
produce correct results ? Isn't it true that , because iValue is declared
volatile, it is impossible that fValueHasBeenComputed = TRUE and the result
of ComputeValue() is not stored in iValue ?
Of course, because fValueHasBeenComputed is not declared volatile it may be
that FetchComputedValue() returns FALSE when in fact iValue does contain the
computed value but the other case can never occur. E.g. FetchComputedValue
returns TRUE and piResult is incorrect.

Are my assumptions correct ?

Jos

Bookmark and Share