|
dev
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
NOT using the System.Threadpool in server class applicationsQuite often of late I've been asked about when to use the System Threadpool.
I finally sat down and wrote a blog detailing the issues in the ThreadPool and why it's unsuitable for use in production grade server applications. Any feedback on this is appriciated! http://www.coversant.net/dotnetnuke/Coversant/Blogs/tabid/88/EntryID/8/Default.aspx -- Chris Mullins Chris Mullins <cmull***@yahoo.com> wrote:
> Quite often of late I've been asked about when to use the System Threadpool. My only feedback is that this is something I've been aware of for a > > I finally sat down and wrote a blog detailing the issues in the ThreadPool > and why it's unsuitable for use in production grade server applications. > > Any feedback on this is appriciated! > > http://www.coversant.net/dotnetnuke/Coversant/Blogs/tabid/88/EntryID/8/Default.aspx while. When I tried to make the same point on the CLR mailing list, I was variously accused of being naive or finding problems when there really aren't any. Half the problem is that *anything* can use the system threadpool - there's no way of isolating different areas of code to use different pools. Oh well - at least I'm not alone in considering it a problem. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too "Jon Skeet [C# MVP]" <sk***@pobox.com> wrote [Jon has seen this too]> Chris Mullins <cmull***@yahoo.com> wrote: >> >> http://www.coversant.net/dotnetnuke/Coversant/Blogs/tabid/88/EntryID/8/Default.aspx > > My only feedback is that this is something I've been aware of for a That's what I've been bumping into as well. I dismiss alot of it, as its > while. When I tried to make the same point on the CLR mailing list, I > was variously accused of being naive or finding problems when there > really aren't any. from people who write proof-of-concept applications for a living, rather than production grade applications. As we all know, there's a world of difference between them. > Half the problem is that *anything* can use the system threadpool - Yea, I agree. This was, when we ran into this 3 years or so ago, the big > there's no way of isolating different areas of code to use different > pools. issue. We needed the ability to say, "CLR gets these threads, SoapBox Server gets these threads". Without that ability, the threadpool runs out of threads and the whole appdomain hangs. -- Chris Mullins Hello, Chris!
CM> I finally sat down and wrote a blog detailing the issues in the CM> ThreadPool and why it's unsuitable for use in production grade server CM> applications. CM> Any feedback on this is appriciated! CM> http://www.coversant.net/dotnetnuke/Coversant/Blogs/tabid/88/EntryID/8/ CM> Default.aspx Nice post :8-) I have some questions on how you handled issues with ThreadPool. Did you consider increasing number of worker threads in the threadpool? Yes, ThreadPool is being used by vast number of classes in the framework, however, developer possesses some power to influence that threadpool consumption, albeit indirect. For instance, consider ADO.NET, server can use limited connection pool, that will limit usage of ThreadPool threads. IMO under heavy load any threadpool can starve threads. In one of the earlier threads William Stacey mentioned SEDA architecture. ( http://groups.google.com.ua/group/microsoft.public.dotnet.framework/browse_thread/thread/d9dfe43f5f60a98d/9c4bff3f789d2e32?lnk=st&q=Chris+Mullins+async&rnum=1&hl=uk#9c4bff3f789d2e32 ) SEDA paper ( http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf ) In that scenario custom thread pool would be very helpful. But I think that such an architerure can also be built using ThreadPool class, however one has to know what class consumes threads and how many of them it consumes. This will help tune ThreadPool appropriately ( increase number of working threads )
Show quote
"Vadym Stetsyak" <vady***@ukr.net> wrote Thanks. I'm always worred about a post like that, as it goes against what so > Hello, Chris! > > CM> I finally sat down and wrote a blog detailing the issues in the > CM> ThreadPool and why it's unsuitable for use in production grade server > CM> applications. > > CM> Any feedback on this is appriciated! > > CM> > http://www.coversant.net/dotnetnuke/Coversant/Blogs/tabid/88/EntryID/8/ > CM> Default.aspx > > Nice post :8-) many other people preach. > I have some questions on how you handled issues with ThreadPool. The problem isn't the size of the threadpool. I can use one of the ICor > Did you consider increasing number of worker threads in the threadpool? interfaces (I don't remember which one) to manipulate the size of the thread pool, and have in fact done this. I've also run into this problem on machines that have 400 threads (by default) in the threadpool. The problems stems from the fact that no matter how many threads are in there, I can exhaust them all. Because the CLR itself needs threadpool threads, this means the whole system then deadlocks. What's needed are two threadpools: One for my application to use, and one for the CLR to use. There's just no getting around this. -- Chris Mullins Hello, Chris!
You wrote on Tue, 27 Jun 2006 10:12:56 -0700: CM> The problems stems from the fact that no matter how many threads are in there, I can exhaust them all. Because the CLR itself needs threadpool threads, this means the whole system then deadlocks. Strange workaround just came to my mind, it can solve the problem with single thread pool, but its not optiomal, for the software that you described in your post. The solutions is multiple AppDomains. Every AppDomain has its own ThreadPool, so we can create something like "worker" AppDomain. This approach is an overkill, due to marshalling, but nevertheless this solution exists, and application can be built using this solution. CM> What's needed are two threadpools: One for my application to use, and one for the CLR to use. There's just no getting around this. Yeah, that would simplify a lot of problems, with parallel exucution in hight performance systems. It might be easier to just use your own thread pool. There are multiple
implementations out there. -- Show quoteWilliam Stacey [MVP] "Vadym Stetsyak" <vady***@ukr.net> wrote in message news:urHGmqhmGHA.4700@TK2MSFTNGP03.phx.gbl... | Hello, Chris! | You wrote on Tue, 27 Jun 2006 10:12:56 -0700: | | CM> The problems stems from the fact that no matter how many threads | are in | there, I can exhaust them all. Because the CLR itself needs threadpool | threads, this means the whole system then deadlocks. | | | Strange workaround just came to my mind, it can solve the problem with | single thread pool, but its not optiomal, for the software that you | described in your post. | | The solutions is multiple AppDomains. Every AppDomain has its own | ThreadPool, so we can create something like "worker" AppDomain. | | This approach is an overkill, due to marshalling, but nevertheless this | solution exists, and application can be built using this solution. | | CM> What's needed are two threadpools: One for my application to use, | and | one for the CLR to use. There's just no getting around this. | | | Yeah, that would simplify a lot of problems, with parallel exucution in | hight performance systems. | | -- | Regards, | Vadym Stetsyak. | Blog: http://vadmyst.blogspot.com | | "William Stacey [MVP]" <william.sta***@gmail.com> wrote I've looked all over the place for a good threadpool implementation, and > It might be easier to just use your own thread pool. There are multiple > implementations out there. have yet to find one. My ideal thread pool would have the following features: 1 - Threads are dynamically added and removed form the queue depending on load. 2 - Threads have some degree of processor affinity in order to minimize cache issues 3 - The scheduler is smart about assigning work items to threads based on processor load. 4 - A built-in watchdog. If a work item is exceeding some timeout, a mechanism exists to abort it. 5 - Intelligent about locking on the work item queue I've built a number of thread pools, and looked at several that I've found on the Internet. None of them have ever had all these qualities.. -- Chris Mullins "Chris Mullins" <cmull***@yahoo.com> wrote: There's no good way to abort a thread, so I wouldn't be too optimistic> 4 - A built-in watchdog. If a work item is exceeding some timeout, a > mechanism exists to abort it. about finding this feature. Thread.Abort is dangerous and evil, and any shared structures being currently manipulated by the thread will likely be in an inconsistent state. The best you can do is tear down the AppDomain. -- Barry That is true. The best place for this is in your own code (running on a TP
thread). I guess an optional callback could be a safty for things user code does not check for, like timeouts on locks or send/receives. But then again, the callback has to interupt and/or abort the thread, but at least you get the option to do some last second clean up before tearing down the app. Not sure a great solution exists as you said. -- Show quoteWilliam Stacey [MVP] "Barry Kelly" <barry.j.ke***@gmail.com> wrote in message news:vjc6a2pqi32e4s106le487tumkl6qort55@4ax.com... | "Chris Mullins" <cmull***@yahoo.com> wrote: | | > 4 - A built-in watchdog. If a work item is exceeding some timeout, a | > mechanism exists to abort it. | | There's no good way to abort a thread, so I wouldn't be too optimistic | about finding this feature. Thread.Abort is dangerous and evil, and any | shared structures being currently manipulated by the thread will likely | be in an inconsistent state. The best you can do is tear down the | AppDomain. | | -- Barry | | -- | http://barrkel.blogspot.com/ I'll play. I wrote a couple before so have general interest and maybe I'll
blow the dust off it and update. In general, some of these things may have more cost then payoff for a general purpose thread pool, but let me ask. | 1 - Threads are dynamically added and removed form the queue depending on What defines "load" here? Is it Q size or cpu load. CPU load logic may | load. compete against min/max thread parms. So what is algo here? | 2 - Threads have some degree of processor affinity in order to minimize Windows using soft affinity automatically. It prefers to run a task on the | cache issues same processor as last time, but will swith if it has to based on its algo. IMO, the system will do a better job then we can do here. The successful usage cases of changing processor affinity are rare. However, I agree there may be special cases - but wonder if that will be abused if available. What are your thoughts here? | 3 - The scheduler is smart about assigning work items to threads based on What you mean? tia| processor load. | 4 - A built-in watchdog. If a work item is exceeding some timeout, a That could be a good add. It is overhead, but probably not too bad.| mechanism exists to abort it. | 5 - Intelligent about locking on the work item queue At a minimum, you need to lock on add and remove. Are you thinking of something specific, or just seen poor locking in the TPs you reviewed? -- William Stacey [MVP] "William Stacey [MVP]" <william.sta***@gmail.com> wrote [Features a threadpool should have]> | 1 - Threads are dynamically added and removed form the queue depending I think it would be more cpu load than anything else. I'm not sold however > on > | load. > > What defines "load" here? Is it Q size or cpu load. CPU load logic may > compete against min/max thread parms. So what is algo here? on this being the correct metric. The behavior that I explicitly want to avoid is dumping 100 items into a Queue, and having the thread pool immediatly spin up 100 threads. Likewise, if 100 threads are spun up, I wouldn't want to see them all terminated immediatly should there be no items in the pool. The System thread pool seems to have a timeout of some sort before spawning new threads. This seems to work fairly well. > [Thread Affinity]> | 2 - Threads have some degree of processor affinity in order to minimize > | cache issues > IMO, the system will do a better job then we can do here. The successful I tend to agree with you here, and as a result I have never myself written > usage cases of changing processor affinity are rare. However, I agree > there > may be special cases - but wonder if that will be abused if available. > What > are your thoughts here? code that has processor affinity. Windows has always done a good job of things. However I've been runnnig alot of tests latley on this machine: http://www.coversant.net/bigiron.png That particular machine has 16 physical Itanium2 processors, and some of it's behavior has been unexpected. We consistantly see certain processors load signifigantly higher than others, and it's tough to tell quite why. Down deep in the network stack, there are some issues (which are solved by the new Chimney features in the Scalable Network Pack) with interrupts and processor affinity, so this may explain some of it. On the other hand, there seems to be room for improvement. This is a more of a feeling than a fact, I'll readily admit. It's also compounded by the lack of profiling tools that run on Itanium (there are a few, but they're not nearly as usefull for .Net as the Compuware stuff is). I would really like to be able to have a way to figure out what processor cache (especially on these Itanium2 machines with their HUGE caches) my datastructure is already in, have a thread on that processor wake up, process the item from the cache, write the results to main memory. I don't think I'm going to see this any time in the furture, but it's on my wish list. I have that feeling (I know, I know) this will give a fiarly signifigant boost to performance as the load increases. > | 3 - The scheduler is smart about assigning work items to threads based Many of the threadpools that I have seen use this algorithm:> on > | processor load. > > What you mean? tia 1 -thread that wakes up 2 - thread checks for shutdown 3 - thread checks for a work item 4 - thread processes work item 5 - thread goes back to sleep 6 - goto 1 Some use variations on this, and fancy means of putting threads to sleep - I've seen a few algorithms: 1 - Keep all threads blocked in a Montior. Whenver a work items comes in, an "PulseOne" (or, god help us, a Pulse-All with a "First one wins" algorithm) 2 - Use Events and Waits to test for shutdown, items in the queue, etc. 3 - Items are posted to a Queue, a control thread wakes up and gives the work item to a particular work thread. I would really like to see an algorithm that has, in the "busy" case, as few context switches as possible. When I post an item to a queue, there should be exactly 1 context switch needed to process my item. > I am thinking of some specific stuff. That 16 processor Itanium box has > | 5 - Intelligent about locking on the work item queue > > At a minimum, you need to lock on add and remove. Are you thinking of > something specific, or just seen poor locking in the TPs you reviewed? signifigantly influenced the way I think about locking and contention. Typically a queue has two locks - one on Enqueue, one on Dequeue. This is how I've been coding it for years. However, after seeing some performance that really puzzled me, I spent some time looking into the contention in our system. While doing that, I came across this: http://makeashorterlink.com/?Q4132185D I played with this test for a while, and ran it on single, dual, quad, and 16 processor boxes, and got some shocking results. After looking into this more and more, I became quite alarmed at just how much contention we were seeing. It was.... shocking. I have all the numbers recorded, and will be doing a blog entry on my findings here in the not-too-distant future. Anyway, after much wailing and gnashing of teeth, I stumbled across thread-safe, lock-free datastructures: http://www.boyet.com/Articles/LockfreeQueue.html and have been educating myself. I added these into the test suites, and got more interesting results. On a single or dual processor machine they're signifigantly slower. On a 16 way machine, they're waaaaaay faster. The end result, is that on different processor architectures, and different numbers of processors, the locking mechanisms need to change. The difference in some cases was a full 4 orders of magnitude (in one very dramatic case, what took 85 milliseconds on a single processor x86 machine, took 200 milliseconds on a dual-processor machine, took 2 minutes on a quad processor machine, and took more than 30 minutes on a 16-way machine). Granted these cases were contrived and almost nothing but thread contention, but they illustrated the case and let me pick the right datastructures. My hypothetical thread pool would take this into account and have a variety of locking mechanisms. It would even be especially nice if, in a general some case, a work item could be handed directly to a thread with no lock required at all. In the "lots of threads awake" case, it seems as if there should be a way to do this. Now, I'll be the first to admit that locking and unlocking on the queue for work items shouldn't be too big a hit, especially when compared to parsing some xml, string manipulation, etc - but these locks are hit at least twice per operation (one in, one out), and at 10k operations per second, that adds up quick. -- Chris Chris Mullins wrote:
Show quote > "William Stacey [MVP]" <william.sta***@gmail.com> wrote [lots more great discussion about thread pools and lock-free data > > [Features a threadpool should have] > >>> 1 - Threads are dynamically added and removed form the queue >>> depending >> on >>> load. >> >> What defines "load" here? Is it Q size or cpu load. CPU load logic >> may compete against min/max thread parms. So what is algo here? > > I think it would be more cpu load than anything else. I'm not sold > however on this being the correct metric. > > The behavior that I explicitly want to avoid is dumping 100 items > into a Queue, and having the thread pool immediatly spin up 100 > threads. Likewise, if 100 threads are spun up, I wouldn't want to see > them all terminated immediatly should there be no items in the pool. structures] It seems to me that there's a great model of how to make a high performance server make the most effective use of multiple CPUs out there already: SQL Server (and, I would presume, Oracle and DB2 must use similar techniques). Within SQL Server, they go to extraordinary lengths to try to have exactly 1 unblocked thread per CPU, almost never more, and fewer only when there are fewer work items than CPUs. Further, they go to extraordinary lengths to ensure that context switches never occur and that worker thread never block. Considerable writing has been done on the inner workings of the User Mode Scheduler that SQL Server uses. For example, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqldev/html/sqldev_02252004.asp How does that mesh with your wish-list for the ultimate thread pool? Of course, it's more than a thread pool - it is, in effect, an entire Operating System built on top of the OS, requiring that code running atop it use it's I/O and memory allocation facilities exclusively. -cd "Carl Daniel [VC++ MVP]" <cpdaniel_remove_this_and_nospam@mvps.org.nospam> The SQL Server infrastructure is great, but I'm afraid it doesn't really do wrote in message news:eeK3sM4mGHA.4960@TK2MSFTNGP04.phx.gbl... > Chris Mullins wrote: >> "William Stacey [MVP]" <william.sta***@gmail.com> wrote >> >> [Features a threadpool should have] >> > It seems to me that there's a great model of how to make a high > performance server make the most effective use of multiple CPUs out there > already: SQL Server (and, I would presume, Oracle and DB2 must use > similar techniques). me any good. Now, as soon as Microsoft releases SQL Server 2000 or 2005 under a GPL or BSD license, I'll use that code :) I'm not so desperate yet as to use Fibers and write my own scheduler. Next year, perhaps, but not yet! -- Chris Mullins Chris Mullins wrote:
Show quote > "Carl Daniel [VC++ MVP]" I have, a couple years back. It would be interestsing to dust that code > <cpdaniel_remove_this_and_nospam@mvps.org.nospam> wrote in message > news:eeK3sM4mGHA.4960@TK2MSFTNGP04.phx.gbl... >> Chris Mullins wrote: >>> "William Stacey [MVP]" <william.sta***@gmail.com> wrote >>> >>> [Features a threadpool should have] >>> > >> It seems to me that there's a great model of how to make a high >> performance server make the most effective use of multiple CPUs out >> there already: SQL Server (and, I would presume, Oracle and DB2 >> must use similar techniques). > > The SQL Server infrastructure is great, but I'm afraid it doesn't > really do me any good. Now, as soon as Microsoft releases SQL Server > 2000 or 2005 under a GPL or BSD license, I'll use that code :) > > I'm not so desperate yet as to use Fibers and write my own scheduler. > Next year, perhaps, but not yet! off, make it .NET friendly (it's native C++), and use it to build a UMS-like framework for .NET. That'd be a powerful tool to have in the .NET toolbox. -cd "Carl Daniel [VC++ MVP]"
> .... and here I thought I was hardcore. You, sir, are well and truly sick! :)>> I'm not so desperate yet as to use Fibers and write my own scheduler. >> Next year, perhaps, but not yet! > I have, a couple years back. It would be interestsing to dust that code > off, make it .NET friendly (it's native C++), and use it to build a > UMS-like framework for .NET. That'd be a powerful tool to have in the > .NET toolbox. -- Chris Mullins | The behavior that I explicitly want to avoid is dumping 100 items into a Here is where trade-offs come into play and needing a specific behavior | Queue, and having the thread pool immediatly spin up 100 threads. Likewise, | if 100 threads are spun up, I wouldn't want to see them all terminated | immediatly should there be no items in the pool. instead of something general. A simple way (that I used) is to have each thread check, after, it is done with the delegate if new threads need to be created (i.e. Q.Count > 1 and TPThreads.Count < maxThreads). This avoids needing a seperate manager thread running. This, in fact, may be the required behavior for the app (i.e. don't wait but bound the max). Having to walk the active threads and check if they are in fact blocked and only then spin up more can induce delay and not sure there is a managed way to do this. | The System thread pool seems to have a timeout of some sort before Not sure exact way they do it either. Put has something to do with checking spawning | new threads. This seems to work fairly well. if threads are blocked, then spin 1 more, and so on, upto the max. | > | 2 - Threads have some degree of processor affinity in order to Not sure how much thread affinity will help here and is more likely to slow minimize | > | cache issues | | [Thread Affinity] things down. You never want to wait, so if there is a ready, willing, and able thread ready to pop the queue and start work, then that is better then trying to wait for same thread used last time or spend cycles trying to figure that out when the work could have already started or completed. So not sure we get payoff here in general. Naturally, there are probably use cases that prove my feeling wrong. | Some use variations on this, and fancy means of putting threads to sleep - Need to look very hard at this one. I use pulseall in my blocking-Q because | I've seen a few algorithms: | 1 - Keep all threads blocked in a Montior. Whenver a work items comes in, an | "PulseOne" (or, god help us, a Pulse-All with a "First one wins" algorithm) it is the right thing to do in that case. I have seen many incorrect implementation that use Pulse(). It may look right at first, but there a few funky things that can happen that can create a live-lock when using just Pulse(). In most cases, if the loop is tight, a PulseAll is not going to create a hurding issue and may, in fact, be the only way to ensure correctness. Sure you may get a couple threads here and there that wake-up only to sleep again, but that should be the exception. | I would really like to see an algorithm that has, in the "busy" case, as That is the way mine implementation works. I admit to more locking then I few | context switches as possible. When I post an item to a queue, there should | be exactly 1 context switch needed to process my item. would like, but can't figure out a way around it while keeping everything correct and not getting into lock-free stuff. I'll post here, so please take shoots at it. -- William Stacey [MVP]
Show quote
"William Stacey [MVP]" <william.sta***@gmail.com> wrote I used the word dreaded because the normal case (where only some threads > | 1 - Keep all threads blocked in a Montior. Whenver a work items comes > in, > an > | "PulseOne" (or, god help us, a Pulse-All with a "First one wins" > algorithm) > > Need to look very hard at this one. I use pulseall in my blocking-Q > because > it is the right thing to do in that case. I have seen many incorrect > implementation that use Pulse(). It may look right at first, but there a > few funky things that can happen that can create a live-lock when using > just > Pulse(). In most cases, if the loop is tight, a PulseAll is not going to > create a hurding issue and may, in fact, be the only way to ensure > correctness. Sure you may get a couple threads here and there that > wake-up > only to sleep again, but that should be the exception. should be active) has Windows waking up all the threads, then putting most of them right back to sleep. This can't be a good idea... Especially with dozens (or hundreds) of threads in the pool. I don't know the right answer to this, but there's gotta be a better way. -- Chris Mullins Chris Mullins wrote:
Show quote > "William Stacey [MVP]" <william.sta***@gmail.com> wrote I'm not sure how you're using the queue here. If it's just a simple queue> > >>| 1 - Keep all threads blocked in a Montior. Whenver a work items comes >>in, >>an >>| "PulseOne" (or, god help us, a Pulse-All with a "First one wins" >>algorithm) >> >>Need to look very hard at this one. I use pulseall in my blocking-Q >>because >>it is the right thing to do in that case. I have seen many incorrect >>implementation that use Pulse(). It may look right at first, but there a >>few funky things that can happen that can create a live-lock when using >>just >>Pulse(). In most cases, if the loop is tight, a PulseAll is not going to >>create a hurding issue and may, in fact, be the only way to ensure >>correctness. Sure you may get a couple threads here and there that >>wake-up >>only to sleep again, but that should be the exception. > > > I used the word dreaded because the normal case (where only some threads > should be active) has Windows waking up all the threads, then putting most > of them right back to sleep. This can't be a good idea... Especially with > dozens (or hundreds) of threads in the pool. > > I don't know the right answer to this, but there's gotta be a better way. > use a semaphore to control blocking on the queue. You'll need a lock as well if you have more than one queueing and one dequeueing thread. Unless it's a lock-free queue of course. If you want a fast pathed semaphore which avoids system calls much like what critical section does for mutex, you can check here http://groups.google.com/groups?selm=412F5D8D.416AE***@xemaps.com Note that if you use this on xbox360 the interlocked functions don't have memory barriers so you'll have to provide your own. For things like timer queues I use a gating mutex or semaphore to avoid the thundering herd problem caused by having the entire thread pool wait on the same timeout. The gating mutex ensures only one thread is waiting on an autoreset event with infinite timeout if there's no work and a specific timeout if there is a scheduled piece of work. So in pseudo code gate.lock(); queue.lock(); for (;;) { if queue.empty() timeout = INFINITE; else timeout = queue.peek().deadline - now; if (timeout < 0) break; queue.unlock(); event.wait(); // autoreset event queue.lock() } item = queue.pop(); queue.unlock(); gate.unlock(); Any queueing of work that changes the head of the queue (next item to be scheduled) signals the autoreset event. -- Joe Seigh When you get lemons, you make lemonade. When you get hardware, you make software.
Show quote
| gate.lock(); I am not following that totally.| queue.lock(); | for (;;) { | if queue.empty() | timeout = INFINITE; | else | timeout = queue.peek().deadline - now; | | | if (timeout < 0) | break; | | queue.unlock(); | event.wait(); // autoreset event | queue.lock() | } | | item = queue.pop(); | queue.unlock(); | gate.unlock(); | 1) if queue is empty timeout is set to -1 (i.e. INFINITE). The next "if" is then true and does a break. Then you try to pop() an empty queue? This does not seem right. Please help. 2) Where else do you wait on timeout? Should it be event.wait(timeout) instead of event.wait()? tia William Stacey [MVP] wrote:
Show quote > | gate.lock(); Right that should be "if (timeout != INFINITE && timeout < 0)"> | queue.lock(); > | for (;;) { > | if queue.empty() > | timeout = INFINITE; > | else > | timeout = queue.peek().deadline - now; > | > | > | if (timeout < 0) > | break; > | > | queue.unlock(); > | event.wait(); // autoreset event > | queue.lock() > | } > | > | item = queue.pop(); > | queue.unlock(); > | gate.unlock(); > | > > I am not following that totally. > 1) if queue is empty timeout is set to -1 (i.e. INFINITE). The next "if" is > then true and does a break. Then you try to pop() an empty queue? This > does not seem right. Please help. > 2) Where else do you wait on timeout? Should it be event.wait(timeout) > instead of event.wait()? > > tia > > and "event.wait(timeout)". -- Joe Seigh When you get lemons, you make lemonade. When you get hardware, you make software. Chris Mullins wrote:
> Anyway, after much wailing and gnashing of teeth, I stumbled across That's Maged Michael and Michael Scott's lock free queue algorithm if you> thread-safe, lock-free datastructures: > http://www.boyet.com/Articles/LockfreeQueue.html and have been educating > myself. > want to google for it. -- Joe Seigh When you get lemons, you make lemonade. When you get hardware, you make software. "Joe Seigh" <jseigh***@xemaps.com> wrote Wow. I just spent some time reading Maged Michael's paper's over at IBM > Chris Mullins wrote: >> Anyway, after much wailing and gnashing of teeth, I stumbled across >> thread-safe, lock-free datastructures: >> http://www.boyet.com/Articles/LockfreeQueue.html and have been educating >> myself. >> > That's Maged Michael and Michael Scott's lock free queue algorithm if you > want to google for it. reasearch. He's got some great, great stuff in there. Thanks for the pointer. Any idea if I can find nice (and debugged) C# implementations of his algorithms? -- Chris Mullins Hello, Chris!
CM> Any idea if I can find nice (and debugged) C# implementations of his CM> algorithms? Yep, the stuff is great out there. Also some of the algorithms are in the links provided by you, namely CAS, ( http://www.boyet.com/Articles/LockfreeStack.html ) Chris Mullins wrote:
Show quote > "Joe Seigh" <jseigh***@xemaps.com> wrote I'm not sure. Most of the stuff I did was read lock-free (as opposed> >>Chris Mullins wrote: >> >>>Anyway, after much wailing and gnashing of teeth, I stumbled across >>>thread-safe, lock-free datastructures: >>>http://www.boyet.com/Articles/LockfreeQueue.html and have been educating >>>myself. >>> >> >>That's Maged Michael and Michael Scott's lock free queue algorithm if you >>want to google for it. > > > Wow. I just spent some time reading Maged Michael's paper's over at IBM > reasearch. He's got some great, great stuff in there. Thanks for the > pointer. > > Any idea if I can find nice (and debugged) C# implementations of his > algorithms? > to write lock-free) and mainly used RCU, Maged's SMR hazard pointers, or lock-free reference counting for the GC component of the lock-free algorithms. So you could have read lock-free hash tables, linked lists, or even balanced binary trees if I had ever gotten around to working out the heuristics for balancing (you can't just plug in the current AVL or Red/Black algorithms as is). But too much patent activity in that area so I shut down those projects. You could take a look at the java.util.concurrent classes in the latest version of Java to see how they did it. JSR-166 added in lock-free support. They basically use double wide interlocked compare and exchange to implement the AtomicMarkableReference and AtomicStampedReference classes. You need stuff like that to avoid the ABA problem. You can use tracing GC if it's built in to avoid the ABA problem like the article above did but for a high throughput queue, you are going to generate a lot of garbage for the GC to collect. -- Joe Seigh When you get lemons, you make lemonade. When you get hardware, you make software. Here is rough pseudocode for a server that uses a threadpool (the clr or
other) to handle N client sockets. We break a TP rule here because we use a threadpool thread for a long running operation (i.e. the socket read). However, it is our server, so we understand and manage that issue. Basically, we add all client sockets to a list - for management, etc. We then kick off the first read (and future reads) on a thread pool thread. We set a max of threads to 100 in this example. So at any one instant, we could have a 100 pending reads. Other client reads will just queue up in the TP FIFO queue. Issues: 1) 100 slow clients (in this example) can stall the TP. This could be an attack, or just 100 slow clients. We can mitigate attack by only allowing so many connections from same IP. Other methods, such as an async server, can also stall after many pending reads, so not sure how much of issue this is in reality compared to other methods. After a client does a read/reply, it frees that thread and posts another read to the queue so other clients get time. The next client in the queue gets to read so the server will keep running as long as client reads eventually happen. 2) Writes are done sync on the same thread. Not sure this is much of an issue however. The write will be done quickly with no wait. Not sure we gain anything by doing an async write. Advantages: 1) Built in fairness. Client requests are handled FIFO. 2) Dynamic throttle. We can throttle and apply "back-pressure" just by changing the number of MaxThreads in the TP. This will limit the number of pending sync reads/readprocessing on the server at any one time. New clients just queue up. 3) Logic is sync. However we don't assign 1 thread per client for the whole client session. Each request/reply will be handle by another TP thread (possibly the same if Q is empty). This makes it easy to code and find bugs. We can also use the socket timeout feature on reads and writes. Slow clients that timeout can just be dropped (or use some algo). After a timeout, we free that thread for another client read. 4) Easy to replace system TP with a custom TP if needed. 5) Can handle many thousands of clients with little number of threads. Only testing we find the optimal number of threads needed for the app. PseudoCode (just the basics. Locks, exception handling, etc, not shown for simplicity): --------------------------------------------------------------------------------------- ThreadPool.MinThreads = 25; ThreadPool.MaxThreads = 100; while(started) { sock = Accept(); sock.ReadTimeout = 5000; if ( userList.Count > 5000 ) // Set to how many concurrent clients you want. { sock.Close(); //A hard close - too many clients. Can do this better. continue; } userList.Add(sock); // Kick off the request/response chain for client. ThreadPool.QueueUserWorkItem(ReadMethod, sock); } private void ReadMethod(object state) { // This is on a threadpool thread. // Socket sock = (Socket)state; byte[] buf = new byte[1024]; try { int read = sock.Receive(buf); // Handle request processing. // Write reply back to client. sock.Send(writebuf); if ( done ) { sock.Close() userList.Remove(sock); return; // Client done. } // Release this thread to pool, but queue another read before we leave. // Note: If there is no other queued work items, // the TP will most likely just use the same thread. ThreadPool.QueueUserWorkItem(ReadMethod, sock); } catch(Timeout te) { //Handle timeout. } catch(Exception ex) { //Handle exception } } -- William Stacey [MVP] "Chris Mullins" <cmull***@yahoo.com> wrote in message http://www.coversant.net/dotnetnuke/Coversant/Blogs/tabid/88/EntryID/8/Default.aspxnews:uN57HGamGHA.4716@TK2MSFTNGP04.phx.gbl... | Quite often of late I've been asked about when to use the System Threadpool. | | I finally sat down and wrote a blog detailing the issues in the ThreadPool | and why it's unsuitable for use in production grade server applications. | | Any feedback on this is appriciated! | | Show quote | | -- | Chris Mullins | | |
|||||||||||||||||||||||