|
dev
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
The Interlocked on the Edge of ForeverLet's say I've got a member variable (in a heavily threaded app): private int _firstTime = 0; To make changes to this is easy enough: InterlockedExchange(ref _firstTime, 1); .... or I can use the InterlockedCompareExchange: int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0); No problems here. .... but, if I just want to read the value, I CANNOT do: int myValue = _firstTime; or if (_fistTime==0) Doing these requires a memory barrier of some type (volatile variable, monitor, etc) to have any degree of reliability. Now, reading the value out using InterlockedCompareExchange always seemed a bit silly to me. I don't want to always write: int myValue = InterlockedCompareExchange(ref _firstTime, Int32.MinValue, Int32.MinValue); I've never found (although not for lack of looking): int myValue = InterlockedRead(ref _firstTime); Of the code that I write, reading the value, and branching based on it's value, is much more common than writing to the value. I have, in the past, used volitile variables for this - but I've stopped due to the various issues regarding volatiles (non atomic reads & writes). Is there a concensus on the best way to do this? It seems like all the solutions are.... incomplete. Essentially I'm looking for a volitile varaiable that is atomically read and written. Unfortuantly, these don't exist in any language I've yet used. -- Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP http://www.coversant.com/blogs/cmullins Chris Mullins [MVP] <cmull***@yahoo.com> wrote:
<snip> > Essentially I'm looking for a volitile varaiable that is atomically I'm not sure what you're after that a simple volatile *wouldn't* give > read and written. Unfortuantly, these don't exist in any language > I've yet used. you here, given that *any* Int32 variable will be accessed atomically, whether it's volatile or not. You can't do atomic increments without something like Interlocked, but simple reads and writes are atomic. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too On Mon, 30 Apr 2007 13:13:47 -0700, Chris Mullins [MVP]
<cmull***@yahoo.com> wrote: > [...] Jon's reply matches what I believed to be true. Can you explain why it is > Of the code that I write, reading the value, and branching based on it's > value, is much more common than writing to the value. > > I have, in the past, used volitile variables for this - but I've stopped > due to the various issues regarding volatiles (non atomic reads & > writes). you believe that a non-atomic read or write is possible for a 32-bit value? If that's true, it's a pretty significant error in the way I've approached multi-threaded programming. Which is not out of the question by any means, but if that's the case I really want to know about it and know why it's the case. Pete Chris Mullins [MVP] wrote:
> I've got a quick question that's been bugging me for a long, long time: Yes.> > Let's say I've got a member variable (in a heavily threaded app): > private int _firstTime = 0; > Doing these requires a memory barrier of some type (volatile variable, > monitor, etc) to have any degree of reliability. > I've never found (although not for lack of looking): Perhaps Thread.VolatileRead() is what you're looking for?> int myValue = InterlockedRead(ref _firstTime); > I have, in the past, used volitile variables for this - but I've stopped due Can you expand on your objection to volatile variables?> to the various issues regarding volatiles (non atomic reads & writes). > Is there a concensus on the best way to do this? It seems like all the I'm not aware of any circumstances where normally aligned volatile> solutions are.... incomplete. > > Essentially I'm looking for a volitile varaiable that is atomically read and > written. Unfortuantly, these don't exist in any language I've yet used. variables of the machine word size or smaller are read or written non-atomically on the CLR. Can you explain more? -- Barry
Show quote
"Chris Mullins [MVP]" <cmull***@yahoo.com> wrote I don't see the need for a memory barrier, unless you need memory> I've got a quick question that's been bugging me for a long, long time: > > Let's say I've got a member variable (in a heavily threaded app): > private int _firstTime = 0; > > To make changes to this is easy enough: > InterlockedExchange(ref _firstTime, 1); > > ... or I can use the InterlockedCompareExchange: > int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0); > > No problems here. > > ... but, if I just want to read the value, I CANNOT do: > > int myValue = _firstTime; > or > if (_fistTime==0) > Why? > Doing these requires a memory barrier of some type (volatile variable, > monitor, etc) to have any degree of reliability. > ordering guarantees. Volatile access would make sense since you really want to read from memory. However, this only affects how the Jitter generates code. > Now, reading the value out using InterlockedCompareExchange always seemed This looks quite pointless. If you don't need memory ordering guarantees a> a bit silly to me. I don't want to always write: > > int myValue = InterlockedCompareExchange(ref _firstTime, Int32.MinValue, > Int32.MinValue); volatile read is good enough. I don't find the spec now, but I suspect InterlockedCompareExchange has probably both acquire and release semantics. > This is just a memory fence plus a volatile read. And it's not clear that > I've never found (although not for lack of looking): > int myValue = InterlockedRead(ref _firstTime); > you actually need the former. > Of the code that I write, reading the value, and branching based on it's Again volatile read is what you want.> value, is much more common than writing to the value. > > I have, in the past, used volitile variables for this - but I've stopped What makes you believe that either reads or writes are non-atomic?> due to the various issues regarding volatiles (non atomic reads & writes). > They certainly are at CLR instruction level. > Is there a concensus on the best way to do this? It seems like all the For a one-time init thing you will probably need atomic compare> solutions are.... incomplete. exchange for acquiring the construction lock and release semantics when signaling successful construction. If you need to wait for construction you'd probably have three states: - uninitialized - under_construction - constructed Checking whether the shared object is constructed is done by a volatile read from the guard variable. If the result is "constructed" you're done. Just a few cycles - maybe satisified from cache. If not, use atomic compare exchange to acquire the "under-construction" lock. If an other thread was faster wait. If not construct object and use store with release semantics to signal object constructed. I'm not sure if there is a really efficient Wait method (in this particular context) in the CLR. Anyway, the standard case (object already constructed) is very fast. > I think the CLI standard requires that to a certain extent. That said, the> Essentially I'm looking for a volitile varaiable that is atomically read > and written. Unfortuantly, these don't exist in any language I've yet > used. > text is vague. Bottom line I think you can rely on reads or writes being volatile, but not more complex operations that both read and write, e.g.: volatile int i; do_something_with(i); // ok i = some_value(); // ok i++; // bad no guarantees here But one might argue that this is really a language thing. -hg
Show quote
| To make changes to this is easy enough: you can. The Interlocked write ensures this across all cpus and cache. Any | InterlockedExchange(ref _firstTime, 1); | | ... or I can use the InterlockedCompareExchange: | int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0); | | No problems here. | | ... but, if I just want to read the value, I CANNOT do: | | int myValue = _firstTime; | or | if (_fistTime==0) | | Doing these requires a memory barrier of some type (volatile variable, | monitor, etc) to have any degree of reliability. thread reading the var after the Interlocked write will see the correct var so it is a memory barrier without a volatile modifier. All locks are ultimatley built on interlocked at the lowest level to get this behavior. So for reads, you can just go "if (var == 2){}". Naturally, this works for a single var. If you have multiple invariants you need to protect, you probabaly need a lock. William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> | Doing these requires a memory barrier of some type (volatile variable, But surely that's only guaranteed if the read *actually* happens after > | monitor, etc) to have any degree of reliability. > > you can. The Interlocked write ensures this across all cpus and cache. Any > thread reading the var after the Interlocked write will see the correct var > so it is a memory barrier without a volatile modifier. the Interlocked write occurs. If the variable isn't volatile, there's nothing to stop the reading thread from having reordered the reads. For instance: int x=0, y=0; .... Thread 1: Interlocked.Increment (ref x, 1); Interlocked.Increment (ref y, 1); Thread 2: int a = x; int b = y; The JIT *could* reorder the reads of x and y in thread 2, such that you get a=0, b=1, despite the fact that x is incremented before y. That wouldn't be the case with volatile variables. At least, that's how I understand it. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too | But surely that's only guaranteed if the read *actually* happens after Hi Jon. Not sure I totally follow you here. If Interlocked x happens, a | the Interlocked write occurs. If the variable isn't volatile, there's | nothing to stop the reading thread from having reordered the reads. For | instance: reader will never see x==0 (until it wraps). So a read on x or y is volatile using an interlocked. Naturally, in either method, a reader could read x and y after x was incremented, but before y was incremented because there is no lock here. But this is a different problem (i.e race). If we need both vars to be atomic in respect to each other and handled as an invariant "set", then we need a lock of some type protecting the set. As I said, maybe I am not following your intent and will slap my head. --wjs William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> | But surely that's only guaranteed if the read *actually* happens after I still believe that it can read y before either of them are > | the Interlocked write occurs. If the variable isn't volatile, there's > | nothing to stop the reading thread from having reordered the reads. For > | instance: > > Hi Jon. Not sure I totally follow you here. If Interlocked x happens, a > reader will never see x==0 (until it wraps). So a read on x or y is volatile > using an interlocked. Naturally, in either method, a reader could read x and > y after x was incremented, but before y was incremented because there is no > lock here. incremented, and x after both of them are incremented. There's nothing to stop the JIT from reordering the reads to read y before x, because they're not volatile. I wouldn't be surprised to learn that the interlocked operation makes sure that when it *does* try to read x and y, those values are really the latest ones, but I don't think there's any guarantee about the order of those reads unless at least one of them is volatile. (It's nearly midnight, so I can't be bothered to work out which it should be - y, I think.) I definitely *wasn't* talking about the atomicity of the two operations - just the ordering. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too So are you saying the JIT can reorder these two lines:
int a = x; int b = y; Like so: int b = y; int a = x; ? tia -- Show quoteWilliam Stacey [C# MVP] "Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message news:MPG.20a86b07bb54e61e7d@msnews.microsoft.com... | William Stacey [C# MVP] <william.sta***@gmail.com> wrote: | > | But surely that's only guaranteed if the read *actually* happens after | > | the Interlocked write occurs. If the variable isn't volatile, there's | > | nothing to stop the reading thread from having reordered the reads. For | > | instance: | > | > Hi Jon. Not sure I totally follow you here. If Interlocked x happens, a | > reader will never see x==0 (until it wraps). So a read on x or y is volatile | > using an interlocked. Naturally, in either method, a reader could read x and | > y after x was incremented, but before y was incremented because there is no | > lock here. | | I still believe that it can read y before either of them are | incremented, and x after both of them are incremented. There's nothing | to stop the JIT from reordering the reads to read y before x, because | they're not volatile. I wouldn't be surprised to learn that the | interlocked operation makes sure that when it *does* try to read x and | y, those values are really the latest ones, but I don't think there's | any guarantee about the order of those reads unless at least one of | them is volatile. (It's nearly midnight, so I can't be bothered to work | out which it should be - y, I think.) | | I definitely *wasn't* talking about the atomicity of the two operations | - just the ordering. | | -- | Jon Skeet - <sk***@pobox.com> | http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet | If replying to the group, please do not mail me too William Stacey [C# MVP] wrote:
> So are you saying the JIT can reorder these two lines: Yes, if x and y are not volatile locations. The language of the spec is> int a = x; > int b = y; > > Like so: > int b = y; > int a = x; that reads and writes to volatile locations are the only observable side effects of memory access; memory accesses may be arbitrarily reordered if they aren't volatile. For example, if the value of y has been loaded to a register because it was used somewhere before the line 'int a = x;', then the value of y used in the line 'int b = y;' may predate the actual in-memory value of x, because the value may be used from the register, rather than freshly loading it from memory. So, volatile can inhibit these kinds of optimizations (avoiding redundant loads). There's more to it than that, volatile reads have acquire semantics, volatile writes release semantics. That has stronger guarantees than just inhibiting optimization, as it affects e.g. write buffer reordering in some processors. -- Barry "Barry Kelly" <barry.j.ke***@gmail.com> wrote I see that in CLI spec, but have you actually verified it? It is not> There's more to it than that, volatile reads have acquire semantics, > volatile writes release semantics. That has stronger guarantees than > just inhibiting optimization, as it affects e.g. write buffer reordering > in some processors. what I would expect and not what I have seen. Oddly, I see that the Jitter even removes dead loads altogher. Anyway, given the rationale in the CLI spec (hardware register access), I don't see how this could possibly work in a reasonably cheap way for the P4 memory model. -hg William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> So are you saying the JIT can reorder these two lines: Exactly, so long as there aren't any memory barriers in the way. Memory > int a = x; > int b = y; > > Like so: > int b = y; > int a = x; barriers (which include operations with volatile locations) stop the JIT from moving memory operations beyond the barrier. It's unfortunate that the CLI spec isn't nearly as clear as it might be about all this, but Vance Morrison wrote a great MSDN article about the ..NET 2.0 model (which is stricter than the CLI spec): http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/ -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too Here is some more food for the party:
http://msdn2.microsoft.com/en-us/library/ms686355.aspx Here he states the compiler will not re-order volatile variable access, However these operations still *could be re-ordered by the processor. In fact, he shows correcting this issue with an Interlocked operation. This article also state that Interlocked functions *ensure appropriat barriers for memory ordering. So (at least for this example) it would seem Interlocked provides as good or better symatics as the volatile example (and also prevents CPU re-order). However, I am not sure any this matters for this example, as in both cases you still have the race issue. R1 can read x, then W1 writes x and y, then R1 reads y - or various combinations of same. So some form of critical section around both x and y would seem to be the proper behavior, unless we don't care about x or y in respect to each other (i.e. simple counters that are not treated as a set). -- Show quoteWilliam Stacey [C# MVP] PCR concurrency library: www.codeplex.com/pcr PSH Scripts Project www.codeplex.com/psobject "Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message news:MPG.20a8e1115b0c872180@msnews.microsoft.com... | William Stacey [C# MVP] <william.sta***@gmail.com> wrote: | > So are you saying the JIT can reorder these two lines: | > int a = x; | > int b = y; | > | > Like so: | > int b = y; | > int a = x; | | Exactly, so long as there aren't any memory barriers in the way. Memory | barriers (which include operations with volatile locations) stop the | JIT from moving memory operations beyond the barrier. | | It's unfortunate that the CLI spec isn't nearly as clear as it might be | about all this, but Vance Morrison wrote a great MSDN article about the | .NET 2.0 model (which is stricter than the CLI spec): | http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/ | | -- | Jon Skeet - <sk***@pobox.com> | http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet | If replying to the group, please do not mail me too William Stacey [C# MVP] wrote:
> Here is some more food for the party: Volatile read has acquire semantics. Loads won't move before an acquire,> http://msdn2.microsoft.com/en-us/library/ms686355.aspx > > Here he states the compiler will not re-order volatile variable access, > However these operations still *could be re-ordered by the processor. similarly, writes won't move after a release. These should be implemented by CPU instructions, as explained in the article you linked to, and aren't restricted to just compiler optimizations. This means that volatile read on .NET is sufficient, because the CPU won't reorder such that the second load moves before the first load, because of the acquire semantics (aka read fence). > In The have full fence semantics (except the Interlocked*Acquire and> fact, he shows correcting this issue with an Interlocked operation. This > article also state that Interlocked functions *ensure appropriat barriers > for memory ordering. Interlocked*Release). > So (at least for this example) it would seem Volatile also prevents CPU re-order, but asymmetrically: read causes> Interlocked provides as good or better symatics as the volatile example (and > also prevents CPU re-order). acquire, write causes release. The write side release means that e.g. "publishing" a new instance to a static volatile variable won't have a problem such that some fields in the instance aren't fully retired before other threads could read the instance reference - writes won't move ahead of the release / write fence. -- Barry Thanks Barry. My assertion has been that Interlocked works here, not that
volatile does not. One thing outstanding, however, is that little "blurb" about the cpu being able to reordering reads even with volatile. That is a bit unsettling. -- Show quoteWilliam Stacey [C# MVP] "Barry Kelly" <barry.j.ke***@gmail.com> wrote in message news:t4nu339s34tpgovpblna8ilkf2rb2r194e@4ax.com... | William Stacey [C# MVP] wrote: | | > Here is some more food for the party: | > http://msdn2.microsoft.com/en-us/library/ms686355.aspx | > | > Here he states the compiler will not re-order volatile variable access, | > However these operations still *could be re-ordered by the processor. | | Volatile read has acquire semantics. Loads won't move before an acquire, | similarly, writes won't move after a release. These should be | implemented by CPU instructions, as explained in the article you linked | to, and aren't restricted to just compiler optimizations. This means | that volatile read on .NET is sufficient, because the CPU won't reorder | such that the second load moves before the first load, because of the | acquire semantics (aka read fence). | | > In | > fact, he shows correcting this issue with an Interlocked operation. This | > article also state that Interlocked functions *ensure appropriat barriers | > for memory ordering. | | The have full fence semantics (except the Interlocked*Acquire and | Interlocked*Release). | | > So (at least for this example) it would seem | > Interlocked provides as good or better symatics as the volatile example (and | > also prevents CPU re-order). | | Volatile also prevents CPU re-order, but asymmetrically: read causes | acquire, write causes release. The write side release means that e.g. | "publishing" a new instance to a static volatile variable won't have a | problem such that some fields in the instance aren't fully retired | before other threads could read the instance reference - writes won't | move ahead of the release / write fence. | | -- Barry | | -- | http://barrkel.blogspot.com/ After re-reading, I think I found it. He was talking about VS2005
volatile -vs- VS2003 volatile - they seem to have implemented them better in 2005 using proper acquire and release semantics. That is info I never saw before. The volatile code he shows works correctly with vs2005 compiler, but may not with 2003 compiler, hence the alternative Interlocked example given. Makes more sense now. -- Show quoteWilliam Stacey [C# MVP] "William Stacey [C# MVP]" <william.sta***@gmail.com> wrote in message news:epyqE0QkHHA.4516@TK2MSFTNGP03.phx.gbl... | Thanks Barry. My assertion has been that Interlocked works here, not that | volatile does not. One thing outstanding, however, is that little "blurb" | about the cpu being able to reordering reads even with volatile. That is a | bit unsettling. | | -- | William Stacey [C# MVP] | | | "Barry Kelly" <barry.j.ke***@gmail.com> wrote in message | news:t4nu339s34tpgovpblna8ilkf2rb2r194e@4ax.com... || William Stacey [C# MVP] wrote: || || > Here is some more food for the party: || > http://msdn2.microsoft.com/en-us/library/ms686355.aspx || > || > Here he states the compiler will not re-order volatile variable access, || > However these operations still *could be re-ordered by the processor. || || Volatile read has acquire semantics. Loads won't move before an acquire, || similarly, writes won't move after a release. These should be || implemented by CPU instructions, as explained in the article you linked || to, and aren't restricted to just compiler optimizations. This means || that volatile read on .NET is sufficient, because the CPU won't reorder || such that the second load moves before the first load, because of the || acquire semantics (aka read fence). || || > In || > fact, he shows correcting this issue with an Interlocked operation. | This || > article also state that Interlocked functions *ensure appropriat | barriers || > for memory ordering. || || The have full fence semantics (except the Interlocked*Acquire and || Interlocked*Release). || || > So (at least for this example) it would seem || > Interlocked provides as good or better symatics as the volatile example | (and || > also prevents CPU re-order). || || Volatile also prevents CPU re-order, but asymmetrically: read causes || acquire, write causes release. The write side release means that e.g. || "publishing" a new instance to a static volatile variable won't have a || problem such that some fields in the instance aren't fully retired || before other threads could read the instance reference - writes won't || move ahead of the release / write fence. || || -- Barry || || -- || http://barrkel.blogspot.com/ | | William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> Here is some more food for the party: Hmm. That sounds like it's a broken implementation then. Basically, if > http://msdn2.microsoft.com/en-us/library/ms686355.aspx > > Here he states the compiler will not re-order volatile variable access, > However these operations still *could be re-ordered by the processor. the .NET CLR executes code in a way which violates the spec, the implementation is flawed. If the processor could reorder things, the CLR should make sure that that is invisible to the user, beyond what is possible within the CLI spec. I'd be interested to hear Joe Duffy's opinion on that part of the article. > In fact, he shows correcting this issue with an Interlocked operation. This No - because only the thread which *calls* the Interlocked method knows > article also state that Interlocked functions *ensure appropriat barriers > for memory ordering. So (at least for this example) it would seem > Interlocked provides as good or better symatics as the volatile example (and > also prevents CPU re-order). that interlocked is involved, whereas with volatile both the reading thread *and* the writing thread know to use memory barriers. Of course, if you use Interlocked in both threads, to both read *and* write the value, then everything will be okay. > However, I am not sure any this matters for If we care about atomicity, there needs to be some kind of locking.> this example, as in both cases you still have the race issue. R1 can read > x, then W1 writes x and y, then R1 reads y - or various combinations of > same. So some form of critical section around both x and y would seem to be > the proper behavior, unless we don't care about x or y in respect to each > other (i.e. simple counters that are not treated as a set). If we only care that the change to y is seen after the change to x (i.e. you can't see x=0, y=1) then volatile will do the job but I don't believe that changing the variable with Interlocked and then reading it directly in another thread is guaranteed to work. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too | Hmm. That sounds like it's a broken implementation then. Basically, if I would like to see Joe comment on that as well. If that article is | the .NET CLR executes code in a way which violates the spec, the | implementation is flawed. If the processor could reorder things, the | CLR should make sure that that is invisible to the user, beyond what is | possible within the CLI spec. | | I'd be interested to hear Joe Duffy's opinion on that part of the | article. correct, it sounds like "volatile" may not work as people expect it to be working. | No - because only the thread which *calls* the Interlocked method knows The reader does not need to know as it gets the read fence automatically (at | that interlocked is involved, whereas with volatile both the reading | thread *and* the writing thread know to use memory barriers. Of course, | if you use Interlocked in both threads, to both read *and* write the | value, then everything will be okay. the hardware layer) because the Interlocked buts up a *full memory barrier fence so reads can not be reordered at the instruction level. vista has some other versions with seperate read aquire/write aquire methods, but the ones in the framework use the full fence. I am not sure how volatile is implemented under the covers, but it would not suprise me if it also uses an interlocked operation. | If we care about atomicity, there needs to be some kind of locking. hmm. We have seem to cover the same ground. Everything I have seen (more | If we only care that the change to y is seen after the change to x | (i.e. you can't see x=0, y=1) then volatile will do the job but I don't | believe that changing the variable with Interlocked and then reading it | directly in another thread is guaranteed to work. then that link) says it does. I would be interested if you find something to contrary as that would be good info. Also, most (if not all) locks are based on Interlocked in their guts, so if it does not work, then everything in the history of the world is broke. Ok, that is bit drastic... tia -- William Stacey [C# MVP] || I'd be interested to hear Joe Duffy's opinion on that part of the I posted correction in thread above. It seems the issue with volatile was || article. with vs2003, not vs2005. --wjs William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> Well, not just how people expect it to work, but how it's *specified* > | Hmm. That sounds like it's a broken implementation then. Basically, if > | the .NET CLR executes code in a way which violates the spec, the > | implementation is flawed. If the processor could reorder things, the > | CLR should make sure that that is invisible to the user, beyond what is > | possible within the CLI spec. > | > | I'd be interested to hear Joe Duffy's opinion on that part of the > | article. > > I would like to see Joe comment on that as well. If that article is > correct, it sounds like "volatile" may not work as people expect it to be > working. to work. I've now seen your post about it basically being a potential bug in the ..NET 1.1 CLR. > | No - because only the thread which *calls* the Interlocked method knows But the read could *already* have been reordered by the JIT, ages > | that interlocked is involved, whereas with volatile both the reading > | thread *and* the writing thread know to use memory barriers. Of course, > | if you use Interlocked in both threads, to both read *and* write the > | value, then everything will be okay. > > The reader does not need to know as it gets the read fence automatically (at > the hardware layer) because the Interlocked buts up a *full memory barrier > fence so reads can not be reordered at the instruction level. before this occurs. How could the JIT know whether a variable would *ever* be used with Interlocked, and avoid reordering? > vista has But there's more to it than that - there's the restriction on JIT > some other versions with seperate read aquire/write aquire methods, but the > ones in the framework use the full fence. I am not sure how volatile is > implemented under the covers, but it would not suprise me if it also uses an > interlocked operation. reordering, and *that* can't be known about ahead of time. > | If we care about atomicity, there needs to be some kind of locking. I'm not saying that Interlocked doesn't work. I'm saying that it only > | If we only care that the change to y is seen after the change to x > | (i.e. you can't see x=0, y=1) then volatile will do the job but I don't > | believe that changing the variable with Interlocked and then reading it > | directly in another thread is guaranteed to work. > > hmm. We have seem to cover the same ground. Everything I have seen (more > then that link) says it does. I would be interested if you find something > to contrary as that would be good info. Also, most (if not all) locks are > based on Interlocked in their guts, so if it does not work, then everything > in the history of the world is broke. Ok, that is bit drastic... tia works if you use it consistently across all threads. I'll try to get round to mailing Joe about it... -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too | But the read could *already* have been reordered by the JIT, ages I assume it knows the same way it knows a var is volatile - by inspecting | before this occurs. How could the JIT know whether a variable would | *ever* be used with Interlocked, and avoid reordering? the code as it compiles and keeping some state if it sees a var wrapped with interlocked or volatile, it does not reordering. Not sure. I am not sure when and where is actually does these optimizations. I think I read something about tight local loops, but not sure. It would be interesting to see a good article on the subject. | I'll try to get round to mailing Joe about it... Thanks. That would be great to see what he says.-- William Stacey [C# MVP] PCR concurrency library: www.codeplex.com/pcr PSH Scripts Project www.codeplex.com/psobject William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> | But the read could *already* have been reordered by the JIT, ages When the JIT has to create native code for one method, it might not > | before this occurs. How could the JIT know whether a variable would > | *ever* be used with Interlocked, and avoid reordering? > > I assume it knows the same way it knows a var is volatile - by inspecting > the code as it compiles and keeping some state if it sees a var wrapped with > interlocked or volatile, it does not reordering. Not sure. I am not sure > when and where is actually does these optimizations. I think I read > something about tight local loops, but not sure. It would be interesting to > see a good article on the subject. even have loaded all the assemblies which will manipulate the same variables as that method does. I can't see that it could *possibly* know whether or not some code, somewhere, is going to use it with the Interlocked class. Compare that with finding out whether or not the variable is volatile - you look in a single place, and that's it. > | I'll try to get round to mailing Joe about it... Yeah. This relies on me finding the time, of course :)> > Thanks. That would be great to see what he says. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too Jon Skeet [C# MVP] <sk***@pobox.com> wrote:
<snip> > > | I'll try to get round to mailing Joe about it... I've found the time:> > > > Thanks. That would be great to see what he says. > > Yeah. This relies on me finding the time, of course :) http://msmvps.com/blogs/jon.skeet/archive/2007/05/09/non-volatile- reads-and-interlocked-and-how-they-interact.aspx All comments welcome. I'm sorry that it's clearly written from my viewpoint rather than that of an impartial observer - I hope I haven't misrepresented any views. I'm happy to edit where appropriate. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too "Ecma-335:2006
.... 12.6.5 5. Explicit atomic operations... These operations (e.g. Increment, Decrement, Exchange, and CompareExchange) perform implicit acquire/release operations. ..." So by my read, this contract would require the CLI to respect the same optimization rules as volatile. System.Thread.MemoryBarrier would seem to have to respect proper order too even if the vars are not decorated with volatile. Do you read it different? Thanks. -- Show quoteWilliam Stacey [C# MVP] "Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message news:MPG.20abb72ef1c3a592b6@msnews.microsoft.com... | Jon Skeet [C# MVP] <sk***@pobox.com> wrote: | | <snip> | | > > | I'll try to get round to mailing Joe about it... | > > | > > Thanks. That would be great to see what he says. | > | > Yeah. This relies on me finding the time, of course :) | | I've found the time: | | http://msmvps.com/blogs/jon.skeet/archive/2007/05/09/non-volatile- | reads-and-interlocked-and-how-they-interact.aspx | | All comments welcome. I'm sorry that it's clearly written from my | viewpoint rather than that of an impartial observer - I hope I haven't | misrepresented any views. I'm happy to edit where appropriate. | | -- | Jon Skeet - <sk***@pobox.com> | http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet | If replying to the group, please do not mail me too William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> "Ecma-335:2006 The operations on Interlocked *do* perform implicit acquire/release > ... > 12.6.5 > > 5. Explicit atomic operations... These operations (e.g. Increment, > Decrement, Exchange, and CompareExchange) perform implicit acquire/release > operations. ..." > > So by my read, this contract would require the CLI to respect the same > optimization rules as volatile. System.Thread.MemoryBarrier would seem to > have to respect proper order too even if the vars are not decorated with > volatile. Do you read it different? Thanks. operations - but the *reading* operations don't. That's the problem. To put it an alternative way, it is (in some ways - not all) a bit like acquiring a lock while writing a value, but not acquiring the same lock when reading. Yes, you've made sure that the data is available to be read with the memory barrier on the writing thread, but you haven't made sure that the reading thread actually performs the read when you expect it to. Basically, both threads need a memory barrier in order to be effective. Using Interlocked puts a memory barrier on the calling thread, but that's all - IMO :) -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too | Basically, both threads need a memory barrier in order to be effective. Hi Jon. Two issues here I think:| Using Interlocked puts a memory barrier on the calling thread, but | that's all - IMO :) 1) Interlocked and barrier. Interlocked does provide a full memory barrier. At least from the docs, this is clear. 2) Does it prevent reads from being reordered by the CLI? This is not as clear. It seems that if it implicitly expresses volatile read/write as it claims in ECMA, then it would also prevent such optimizations. Same way you would expect Thread.MemoryBarrier() to do. IIRC, these CLI optimization rules must be very localized and are very strict. They can't just do them all over the place. They can only do them them when they *know there can be no side effects. http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx "We've spent a lot of time thinking about what the correct memory model for the CLR should be. If I had to guess, we're going to switch from the ECMA model to the following model. I think that we will try to persuade other CLI implementations to adopt this same model, and that we will try to change the ECMA specification to reflect this. 1.. Memory ordering only applies to locations which can be globally visible or locations that are marked volatile. Any locals that are not address exposed can be optimized without using memory ordering as a constraint since these locations cannot be touched by multiple threads in parallel. 2.. Non-volatile loads can be reordered freely. 3.. Every store (regardless of volatile marking) is considered a release. 4.. Volatile loads are considered acquire. 5.. Device oriented software may need special programmer care. Volatile stores are still required for any access of device memory. This is typically not a concern for the managed developer." This was wrote back in May 2003, and don't know if this made it into 2.0 or latter or at all. But he seems to touch on this very issue we are talking about. Order changes can only be applied to locals that are not address exposed and do not have some volatile symatics around them. As interlocked and MemoryBarrier has implicit volatile contract, I would assume CLI inspection would have to honor those same as with volatile keyword. Maybe not. Would be nice to get a clearer writing on this. -- William Stacey [C# MVP] William Stacey [C# MVP] wrote:
I think we're starting to lose some context. Here's your original reply to Chris: William wrote: > Chris wrote: This is the basic point that Jon (and I, and the standard, and Chris> | ... but, if I just want to read the value, I CANNOT do: > | > | int myValue = _firstTime; > | or > | if (_fistTime==0) > | > | Doing these requires a memory barrier of some type (volatile variable, > | monitor, etc) to have any degree of reliability. > > you can. The Interlocked write ensures this across all cpus and cache. Brumme's post that you quote) disagree with you on. An interlocked operation is a full fence (read and write barrier, both acquire and release) on the thread that it was *executed* on. It doesn't affect reads from *other* threads. > | Basically, both threads need a memory barrier in order to be effective. There is no way it could perform its advertised purpose if it didn't> | Using Interlocked puts a memory barrier on the calling thread, but > | that's all - IMO :) > > Hi Jon. Two issues here I think: > 1) Interlocked and barrier. Interlocked does provide a full memory barrier. > At least from the docs, this is clear. > 2) Does it prevent reads from being reordered by the CLI? This is not as > clear. prevent related reads moving before it. The CLI needs to pass the location of the target to the interlocked operation, so it can't have related values cached in (e.g.) a register or it would be plainly faulty with only a *single* thread of operation. Since that discounts compiler optimizations that would move related reads before the interlocked operation, the only thing left is CPU semantics - and that's what the Interlocked operation guarantees, it guarantees a full fence for the calling thread. Show quote > http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx : Interlocked operations pass the address (reference to the location), so> [extra quoted for clarity -- Barry] > > "We've spent a lot of time thinking about what the correct memory model for > > the CLR should be. If I had to guess, we're going to switch from the ECMA > > model to the following model. I think that we will try to persuade other > > CLI implementations to adopt this same model, and that we will try to change > > the ECMA specification to reflect this. > > > > 1.. Memory ordering only applies to locations which can be globally > > visible or locations that are marked volatile. Any locals that are not > > address exposed can be optimized without using memory ordering as a > > constraint since these locations cannot be touched by multiple threads in > > parallel. > > 2.. Non-volatile loads can be reordered freely. > > 3.. Every store (regardless of volatile marking) is considered a release. > > 4.. Volatile loads are considered acquire. > > 5.. Device oriented software may need special programmer care. Volatile > > stores are still required for any access of device memory. This is > > typically not a concern for the managed developer." > This was wrote back in May 2003, and don't know if this made it into 2.0 or > latter or at all. But he seems to touch on this very issue we are talking > about. they are "address exposed" and therefore the read can't move before the interlocked operation. Volatile semantics aren't required for this; in fact, not even threads are required for this. An optimization that moved the read before the interlocked call (or any other call by reference) is wrong in every case. > Order changes can only be applied to locals that are not address That's not what Chris Brumme wrote. Instead, he writes:> exposed and do not have some volatile symatics around them. > Chris Brumme (in post above) wrote: That is to say, order changes are irrelevant (and in fact, can't even be> > 1.. Memory ordering only applies to locations which can be globally > > visible or locations that are marked volatile. Any locals that are not > > address exposed can be optimized without using memory ordering as a > > constraint since these locations cannot be touched by multiple threads in > > parallel. detected, if the compiler has no bugs) for locals that are not address exposed, since only a single thread of execution can ever see their values. That is to say, he says almost exactly the *opposite* of what you said - that order changes can *only* be detected, are *only* relevant, for locations that are globally visible, address exposed or marked volatile. > As interlocked All this is irrelevant to your original point at the root of the thread> and MemoryBarrier has implicit volatile contract, I would assume CLI > inspection would have to honor those same as with volatile keyword. Maybe > not. Would be nice to get a clearer writing on this. though, in that, contrary to what you claim, a memory barrier is still required for reads from threads *other* than the one that performed the interlocked operation, if you're going to guarantee that you read what's really there and not some stale value. -- Barry | All this is irrelevant to your original point at the root of the thread Hi Barry. I address this part first as it is key to everything else. There | though, in that, contrary to what you claim, a memory barrier is still | required for reads from threads *other* than the one that performed the | interlocked operation, if you're going to guarantee that you read what's | really there and not some stale value. *is a full fence on interlocked methods (save the newer win32 ones like InterlockedIncrementAcquire that provide acquire and release semantics). It is *gaureenteed* by the OS and hw. It is documented in many places (see some below). There can never be stale data, as hw ensures this across all cpus and caches. After an Interlock completes, any and all threads will see only the last data. This is one reason why it is a relatively expensive call. Where is it documented there is not a full barrier? Here is some linq: http://msdn2.microsoft.com/en-us/library/ms684122.aspx http://www.gamasutra.com/features/20060630/paquet_01.shtml http://blogs.msdn.com/oldnewthing/archive/2004/05/28/143769.aspx http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx -- William Stacey [C# MVP] William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
Show quote > | All this is irrelevant to your original point at the root of the thread There *is* a full barrier, *within that thread*. Not within all others.> | though, in that, contrary to what you claim, a memory barrier is still > | required for reads from threads *other* than the one that performed the > | interlocked operation, if you're going to guarantee that you read what's > | really there and not some stale value. > > Hi Barry. I address this part first as it is key to everything else. There > *is a full fence on interlocked methods (save the newer win32 ones like > InterlockedIncrementAcquire that provide acquire and release semantics). It > is *gaureenteed* by the OS and hw. It is documented in many places (see > some below). There can never be stale data, as hw ensures this across all > cpus and caches. After an Interlock completes, any and all threads will see > only the last data. This is one reason why it is a relatively expensive > call. Where is it documented there is not a full barrier? Any thread which actually *performs* a read after the Interlock completes will indeed see the new value. The point is that with reordering in the reading thread, there's no guarantee that it will make any attempt to read the value after the Interlock completes - it could have read it before the Interlocked operation executes, even if a value is later read which looks like it should have been read beforehand. Suppose the source code is: Thread 1 int b = y; int a = x; Thread 2 Interlocked.Increment (ref x); Interlocked.Increment (ref y); The first thread's code can be reordered by the JIT, and then executed as: Thread 1 Thread 2 int a = x; Interlocked.Increment (ref x); Interlocked.Increment (ref y); int b = y; Note the reordering. The guarantee that threads reading after Interlocked.Increment see the new values is still intact - it's just that x has been read before y, which appears to be wrong according to the original source code. This is because there aren't any memory barriers in the reading thread. Could you point to where in the ECMA spec it says anything about a memory barrier on one thread affecting what is possible in terms of reordering on another thread? It's important to understand that this *isn't* just a case of hardware caches and memory fences etc - the JIT itself can reorder the reads, which is what I've been trying to show above. > Here is some linq: I don't see anything in those pages to prevent the JIT from reordering > http://msdn2.microsoft.com/en-us/library/ms684122.aspx > http://www.gamasutra.com/features/20060630/paquet_01.shtml > http://blogs.msdn.com/oldnewthing/archive/2004/05/28/143769.aspx > http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx the reads on the reading thread. Half of them aren't even about .NET, which means they won't be addressing the JIT-reordering issue at all. I haven't heard back from Joe yet - I might mail Chris Brumme to see if he'd care to comment. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too | There *is* a full barrier, *within that thread*. Not within all others. What would be the point of a full barrier if it did not work across threads? You would not need it for a single thread. The full barrier ensures the cpu can not reorder reads and all reads will see the last write. But, I think your talking about reorder by the CLI as below. Show quote | Any thread which actually *performs* a read after the Interlock I understand your point. I am playing devils advocate with myself trying to | completes will indeed see the new value. The point is that with | reordering in the reading thread, there's no guarantee that it will | make any attempt to read the value after the Interlock completes - it | could have read it before the Interlocked operation executes, even if a | value is later read which looks like it should have been read | beforehand. Suppose the source code is: | | Thread 1 | int b = y; | int a = x; | | Thread 2 | Interlocked.Increment (ref x); | Interlocked.Increment (ref y); | | The first thread's code can be reordered by the JIT, and then executed | as: | | Thread 1 Thread 2 | int a = x; | Interlocked.Increment (ref x); | Interlocked.Increment (ref y); | int b = y; | | Note the reordering. The guarantee that threads reading after | Interlocked.Increment see the new values is still intact - it's just | that x has been read before y, which appears to be wrong according to | the original source code. | | This is because there aren't any memory barriers in the reading thread. find proof either way. But, by my read (reading thru the sparse lines) the JIT is *not allowed to make this optimization here. It can only reorder when: 1) There is no volatile 2) The var is not address exposed (e.g. local). 3) There is no other explicit or implicit memory barriers protecting the var (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc) So this would fail test #3 and not allow reorder by JIT. The ecma I posted hints at this, but is far from detailed on this. Chris B talks more about this, but don't know when/if he implemented it in the CLR. Naturally, I could be nuts. It would not be the first time. Thank you. -- William Stacey [C# MVP] William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
> | There *is* a full barrier, *within that thread*. Not within all others. The same point as locking has. Both threads need to try to acquire a > > What would be the point of a full barrier if it did not work across threads? lock before it becomes particularly useful, but I don't think you'd argue that it *isn't* useful. > You would not need it for a single thread. The full barrier ensures the cpu .... if they perform a genuine read *after* that point, yes.> can not reorder reads and all reads will see the last write. > But, I think your talking about reorder by the CLI as below. Yes - although from the memory model point of view (which doesn't know about CPU fences) it's irrelevant. The memory model makes certain guarantees without saying whether the optimisations which can be performed are due to the JIT or the CPU. <snip> Show quote > | Note the reordering. The guarantee that threads reading after You seem to be treating these as an "and" whereas it's not - otherwise > | Interlocked.Increment see the new values is still intact - it's just > | that x has been read before y, which appears to be wrong according to > | the original source code. > | > | This is because there aren't any memory barriers in the reading thread. > > I understand your point. I am playing devils advocate with myself trying to > find proof either way. But, by my read (reading thru the sparse lines) the > JIT is *not allowed to make this optimization here. It can only reorder > when: > 1) There is no volatile > 2) The var is not address exposed (e.g. local). > 3) There is no other explicit or implicit memory barriers protecting the var > (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc) you could never have reordering of *any* non-local reads. I agree with 1. I believe you're misreading the spec on point 2 - it's not a condition, it's just that *any* reads/writes can be reordered. > So this would fail test #3 and not allow reorder by JIT. The ecma I posted I don't see where it hints at this at all. Out of the 5 rules you > hints at this, but is far from detailed on this. Chris B talks more about > this, but don't know when/if he implemented it in the CLR. Naturally, I > could be nuts. It would not be the first time. Thank you. mentioned in the spec, the important one is this: "Non-volatile loads can be reordered freely." The statement: int a = x; is *not* a volatile load when x isn't a volatile variable. Just because some other thread may or may not perform an interlocked operation doesn't make loading it volatile. I think that's the crux of our disagreement. So, having reduced it to this (if you agree with me on that much) can you provide any indication that accessing a variable at any point on any thread with an Interlocked operation makes every load of that variable on every thread count as being volatile? I haven't seen any evidence of that. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too William Stacey [C# MVP] wrote:
> | There *is* a full barrier, *within that thread*. Not within all others. A full barrier is a combination of both a read barrier and a write> > What would be the point of a full barrier if it did not work across threads? barrier. A write barrier makes sure that all pending writes are retired, and is useful just before publishing a value, so that when it is read on a different thread / CPU, there are no pending writes that could catch any readers by surprise if they were to end up being retired a little later (out of order). The hardware-level concept of a write barrier affects only the CPU executing the write barrier, ensuring that all pending writes are retired before the publishing the final value. It's an ordering primitive for that CPU. Yes it does affect other threads, but only in so far as what they can observe from memory, in conjunction with the *other* half of the equation - a read barrier. A read barrier makes sure that no reads / prefetches / what have you have snuck before retrieve a particular value from a location, a value that you expect to have been published by another thread. The read barrier only affects the CPU performing the read. It's directly complementary to a write barrier on the publishing thread, it ensures that you read what you think you read, and not what the CPU inferred you we're going to indirectly read by (e.g.) sneaking ahead and peeking at the instruction stream. Read and write barriers are complementary. Write barriers stop writes moving forward in time, while read barriers stop reads moving backwards in time. If you don't have balanced write and read barriers on threads that are communicating via shared memory without locks, then their reads and writes will overlap, and consequently linear, intuitive, von Neumann expectations of invariants will be broken [unless of course, a finite state machine model for all possible combinations have been worked out and tested, see e.g. Cliff Click's work on a lock-free hash table for Java: http://blogs.azulsystems.com/cliff/2007/05/hardware_visibi.html ..] Again, a read barrier only affects the thread / CPU that it's executed on. It doesn't affect any writers on other threads. And read barriers of some kind (even the implicit ones with 'lock') aren't optional even when the writing thread is using volatile / interlocked / write barriers / other mechanisms. > You would not need it for a single thread. Barriers aren't needed for single threads because then there's only onethread context, one CPU, and the OS would have responsibility for any read / write barriers before context switching into / out of your application's thread of control, since there's no concurrency possible with only a single thread. > The full barrier ensures the cpu Memory barriers / fences only affect ordering of reads and writes of the> can not reorder reads and all reads will see the last write. But, I think > your talking about reorder by the CLI as below. thread / CPU they were executed on, though. > I understand your point. I am playing devils advocate with myself trying to I believe you're misreading something big. Quoting from Chris Brumme's> find proof either way. But, by my read (reading thru the sparse lines) the > JIT is *not allowed to make this optimization here. It can only reorder > when: > 1) There is no volatile > 2) The var is not address exposed (e.g. local). >> 3) There is no other explicit or implicit memory barriers protecting the var > (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc) post: http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx: > 1. Memory ordering only applies to locations which This implies that loads and stores can be reordered freely when its> can be globally visible or locations that are marked > volatile. Any locals that are not address exposed can be > optimized without using memory ordering as a constraint > since these locations cannot be touched by multiple threads > in parallel. non-exposed locals being dealt with. ("Reordered freely" is always subject to correctness on a single-threaded basis, of course, so there are limits to this reordering.) > 2. Non-volatile loads can be reordered freely. Now here's something that you seem to have missed: any and all reads(loads) that aren't volatile can be reordered *freely*, even though the location being loaded *isn't* local. > 3. Every store (regardless of volatile marking) is considered a release. This was an upgrade in the CLR's memory model over the ECMA memorymodel, IIRC, though Intel x86 architecture keeps it ok, but it did need specifying for IA64. It implies that a write barrier isn't required when publishing a value, because the CLR (but not ECMA) will handle that situation automatically. > 4. Volatile loads are considered acquire. And by contrast, from point 2, non-volatile loads are not consideredacquire, so they can freely move backwards in time. This is what makes your original post to the thread incorrect. > 5. Device oriented software may need special programmer care. I've little to say about devices in CLR :)> So this would fail test #3 and not allow reorder by JIT. I believe the rules you write are mistaken and not supported by thestandards or links you post, and in any case, interlocked operations only affect a *single* thread, not all threads. -- Barry > cpus and caches. After an Interlock completes, any and all threads will It is a full memory barrier. That means that the new value is actually in > see > only the last data. This is one reason why it is a relatively expensive > call. Where is it documented there is not a full barrier? RAM, coherent across all processors (some architectures might choose write snooping instead of invalidating caches in other cores) before the Interlocked call returns. This falls far short of forcing other threads to "see the last data", because another thread could have cached the value in a CPU register (hoisting a complex expression outside a loop, anyone?). Access to the memory address is guaranteed to yield the new value, but only volatile variables result in memory being addressed on every use of the variable name. Otherwise the compiler can perform alias analysis, determine that the reading thread can't change the variable in the interim, and thus uses the cached value. William Stacey [C# MVP] <william.sta***@gmail.com> wrote:
<snip> > 2.. Non-volatile loads can be reordered freely. And that's the absolutely key one. As far as the reading thread is concerned, it's performing non-volatile loads. It doesn't know that an Interlocked operation will occur anywhere else, and it's not loading a volatile variable - therefore it's a non-volatile load, and the operations can be reordered as shown in the post I've just made. <snip> -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too There's quite a bit of dicusssion going on with regard to memory models and
interlocked writes and standard reads. I figured I would write a little program and see if I can make the problem show up. The sample app I wrote below does, every once in a while, go "Boom". This means the memory model is indeed reordering the reads of test and test2. My environment for running this test is: Dual-Core AMD Athlon Windows XP x64 Compiled in Release Mode, using "Any CPU" Running without a debugger attached. I've run the test using "Any CPU", "x86" and "x64" as specific compile targets. Same result each time. Sometimes it runs for a long time without failing, other times I get a ton of failures. For example: .... Boom : 287884447, 287884446 Boom : 287885420, 287885419 Boom : 287885780, 287885779 Boom : 287886016, 287886015 Boom : 287887440, 287887439 Boom : 287887798, 287887797 using System; using System.Collections.Generic; using System.Text; using System.Threading; namespace ConsoleApplication4 { class Program { static long test = 0; static long test2 = 0; static void Main(string[] args) { Thread t1 = new Thread(ReadThreadProc); Thread t2 = new Thread(ReadThreadProc); Thread t3 = new Thread(WriteThreadProc); t1.IsBackground = true; t2.IsBackground = true; t3.IsBackground = true; t1.Start(); t2.Start(); t3.Start(); Console.WriteLine("Press any key to exit"); Console.ReadLine(); } private static void WriteThreadProc() { while (true) { Interlocked.Increment(ref test); Interlocked.Increment(ref test2); } } private static void ReadThreadProc() { while (true) { long v1 = test; long v2 = test2; if (v1 > v2) Console.WriteLine("Boom : {0}, {1}", v1, v2); } } } } -- Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP http://www.coversant.com/blogs/cmullins There's a (very obvious) bug in my test.
Please ignore that last post... -- Show quoteChris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP http://www.coversant.com/blogs/cmullins "Chris Mullins [MVP]" <cmull***@yahoo.com> wrote in message news:eJAdpbAlHHA.4852@TK2MSFTNGP03.phx.gbl... > There's quite a bit of dicusssion going on with regard to memory models > and interlocked writes and standard reads. > > I figured I would write a little program and see if I can make the problem > show up. > The sample app I wrote below does, every once in a while, go "Boom". > > This means the memory model is indeed reordering the reads of test and > test2. > > My environment for running this test is: > Dual-Core AMD Athlon > Windows XP x64 > Compiled in Release Mode, using "Any CPU" > Running without a debugger attached. > > I've run the test using "Any CPU", "x86" and "x64" as specific compile > targets. Same result each time. > > Sometimes it runs for a long time without failing, other times I get a ton > of failures. For example: > ... > Boom : 287884447, 287884446 > Boom : 287885420, 287885419 > Boom : 287885780, 287885779 > Boom : 287886016, 287886015 > Boom : 287887440, 287887439 > Boom : 287887798, 287887797 > > > > using System; > using System.Collections.Generic; > using System.Text; > using System.Threading; > > namespace ConsoleApplication4 > { > class Program > { > static long test = 0; > static long test2 = 0; > > static void Main(string[] args) > { > Thread t1 = new Thread(ReadThreadProc); > Thread t2 = new Thread(ReadThreadProc); > Thread t3 = new Thread(WriteThreadProc); > > t1.IsBackground = true; > t2.IsBackground = true; > t3.IsBackground = true; > > t1.Start(); > t2.Start(); > t3.Start(); > > Console.WriteLine("Press any key to exit"); > Console.ReadLine(); > > } > > private static void WriteThreadProc() > { > while (true) > { > Interlocked.Increment(ref test); > Interlocked.Increment(ref test2); > } > } > > private static void ReadThreadProc() > { > while (true) > { > long v1 = test; > long v2 = test2; > > if (v1 > v2) > Console.WriteLine("Boom : {0}, {1}", v1, v2); > } > } > } > } > > > -- > Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP > http://www.coversant.com/blogs/cmullins > > > Chris Mullins [MVP] <cmull***@yahoo.com> wrote:
> There's a (very obvious) bug in my test. If it's any consolation, I made the exact same mistake when I first > > Please ignore that last post... made the blog post :) -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too not obvious to me. what is it?
Show quote "Chris Mullins [MVP]" <cmull***@yahoo.com> wrote in message news:unrnxiAlHHA.4904@TK2MSFTNGP05.phx.gbl... > There's a (very obvious) bug in my test. > > Please ignore that last post... > > -- > Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP > http://www.coversant.com/blogs/cmullins > > "Chris Mullins [MVP]" <cmull***@yahoo.com> wrote in message > news:eJAdpbAlHHA.4852@TK2MSFTNGP03.phx.gbl... >> There's quite a bit of dicusssion going on with regard to memory models >> and interlocked writes and standard reads. >> >> I figured I would write a little program and see if I can make the >> problem show up. >> The sample app I wrote below does, every once in a while, go "Boom". >> >> This means the memory model is indeed reordering the reads of test and >> test2. >> >> My environment for running this test is: >> Dual-Core AMD Athlon >> Windows XP x64 >> Compiled in Release Mode, using "Any CPU" >> Running without a debugger attached. >> >> I've run the test using "Any CPU", "x86" and "x64" as specific compile >> targets. Same result each time. >> >> Sometimes it runs for a long time without failing, other times I get a >> ton of failures. For example: >> ... >> Boom : 287884447, 287884446 >> Boom : 287885420, 287885419 >> Boom : 287885780, 287885779 >> Boom : 287886016, 287886015 >> Boom : 287887440, 287887439 >> Boom : 287887798, 287887797 >> >> >> >> using System; >> using System.Collections.Generic; >> using System.Text; >> using System.Threading; >> >> namespace ConsoleApplication4 >> { >> class Program >> { >> static long test = 0; >> static long test2 = 0; >> >> static void Main(string[] args) >> { >> Thread t1 = new Thread(ReadThreadProc); >> Thread t2 = new Thread(ReadThreadProc); >> Thread t3 = new Thread(WriteThreadProc); >> >> t1.IsBackground = true; >> t2.IsBackground = true; >> t3.IsBackground = true; >> >> t1.Start(); >> t2.Start(); >> t3.Start(); >> >> Console.WriteLine("Press any key to exit"); >> Console.ReadLine(); >> >> } >> >> private static void WriteThreadProc() >> { >> while (true) >> { >> Interlocked.Increment(ref test); >> Interlocked.Increment(ref test2); >> } >> } >> >> private static void ReadThreadProc() >> { >> while (true) >> { >> long v1 = test; >> long v2 = test2; >> >> if (v1 > v2) >> Console.WriteLine("Boom : {0}, {1}", v1, v2); >> } >> } >> } >> } >> >> >> -- >> Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP >> http://www.coversant.com/blogs/cmullins >> >> >> > > Stelrad Kypski <b***@blah.com> wrote:
> not obvious to me. what is it? The reads are done in the same order as the increments, which means that even without any reordering, it's quite possible to get any of the four possibilities. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too Hi,
One of the posts in this very interesting threat referred to this article : http://msdn2.microsoft.com:80/en-us/library/ms686355.aspx This article includes the following piece of code (to fix the race condition): volatile int iValue; volatile BOOL fValueHasBeenComputed = FALSE; extern int ComputeValue(); void CacheComputedValue() { if (!fValueHasBeenComputed) { iValue = ComputeValue(); fValueHasBeenComputed = TRUE; } } BOOL FetchComputedValue(int *piResult) { if (fValueHasBeenComputed) { *piResult = iValue; return TRUE; } else return FALSE; } My (complete ignorent) question is this: Must fValueHasBeenComputed really be declared volatile for the program to produce correct results ? Isn't it true that , because iValue is declared volatile, it is impossible that fValueHasBeenComputed = TRUE and the result of ComputeValue() is not stored in iValue ? Of course, because fValueHasBeenComputed is not declared volatile it may be that FetchComputedValue() returns FALSE when in fact iValue does contain the computed value but the other case can never occur. E.g. FetchComputedValue returns TRUE and piResult is incorrect. Are my assumptions correct ? Jos |
|||||||||||||||||||||||