Async Socket IO Question

Author
4 Oct 2006 6:17 PM
EmeraldShield
Hi all.  I have been digging around trying to find an answer to a few
questions that are bugging me.  I am hoping someone here can help.

1 - If you start an Async IO (BeginAccept) and the client hangs up, what
happens?  The docs really don't tell you one way or the other.  In my local
testing many of the sockets are never getting released for some reason.  I
am not seeing my delegate called.  I used an Interlocked increment on each
Begin call and a decrement in each delegate to track it, and sometimes they
come through, and sometimes they don't.  Very odd.  What is the correct behavior?

2 - Why does the .Connected property not update?  I have read the docs and
tried lots of tests.  It still does not appear to work.

try
{
    // Attempt to ensure we are still connected...
    byte[] temp = { 0x00 };
    // I tried blocking and non-blocking; it didn't make a difference.
    sp.Client.Blocking = false;
    // This is the MS recommended way.  It ALWAYS returns 0 for me and
    // Connected is not updated.
    int res = sp.Client.Send(temp, 0, 0);
    // I added this as an additional test and the same thing happens.
    // Always get back 0.
    res = sp.Client.Receive(temp, 0, SocketFlags.None);
    // If we get here we are still connected and alive...
}
catch( SocketException e )
{
    if( e.NativeErrorCode.Equals(10035) )
    {
        // Still connected - the call would block (WSAEWOULDBLOCK)
    }
    else
    {
        // We are disconnected
        TimeoutOccured(ref sp);
        return (false);
    }
}

This app is a socket server that has remote clients connect.  I can manually
telnet to the app, watch it start a read, and then kill the telnet app.  I
know the socket is gone.  I look in the process list and telnet is gone.
But the reads and sends still report it is valid, and connected still
reports true.
I do the test above in my routine prior to calling BeginReceive (I
figure there is no use beginning a receive if the client hung up), and
before sending data.  Doesn't seem to make a difference.

What am I doing wrong?

Author
4 Oct 2006 6:31 PM
Chris Mullins
I think you're mostly bumping into Socket Timeout issues.

If you kill the process of your client app, the TCP session isn't cleanly
shut down. Your server still thinks the connection exists. You can verify
this using "NetStat -a". You'll see your connection still in there.

When your server sends to the client app, that send happens just fine (the
TCP session is still there). After a few moments, your TCP send will time out,
and you'll get an error. At this point the TCP session is torn down.

Just because it's amusing, I have had more bugs in socket shutdown code than
all the other areas of networking put together. There are so many ways, and
so many conditions, that can cause a TCP session to be torn down that it's
just depressing.

Here's an MS KB article that goes into how to adjust your timeouts:
http://support.microsoft.com/?kbid=170359

Note: I don't recommend adjusting your timeouts - but you do need to have a
solid understanding of what's going on.
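
To make the above concrete, here is a minimal sketch of a probe that reports a close the local TCP stack already knows about. `PeerProbe` and `IsPeerClosed` are hypothetical names, not framework APIs. It relies on `Socket.Poll` with `SelectMode.SelectRead` returning true when either data is readable or the connection has been closed or reset, and on `Available` being 0 in the second case. It cannot see a peer that vanished without a FIN or RST reaching the server; that case still only surfaces when a send times out, as described above.

```csharp
using System;
using System.Net.Sockets;

public static class PeerProbe
{
    // Hypothetical helper for illustration; not part of the framework.
    // Returns true when the remote end has closed or reset the connection
    // and the local stack has already been told about it.
    public static bool IsPeerClosed(Socket socket, int waitMicroseconds)
    {
        try
        {
            // SelectRead is true when data is readable OR the connection has
            // been closed, reset, or terminated. With no buffered bytes, it
            // must be the latter.
            return socket.Poll(waitMicroseconds, SelectMode.SelectRead)
                   && socket.Available == 0;
        }
        catch (SocketException) { return true; }        // connection already in error
        catch (ObjectDisposedException) { return true; } // socket closed locally
    }
}
```

A false result only means "no close seen yet", so the send and receive paths still need their own error handling.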

--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins


Author
4 Oct 2006 6:53 PM
EmeraldShield
Logically I knew that... :)  yes, the connection has not been closed and is
still pending timeout.

Thanks for the link.

But what happens if 500 people hit your socket and then all hang up?  You
will get all the accepts and start receives on them.  But then what happens?
Did that just kill your IOCP pool of threads?

Thanks.


Author
4 Oct 2006 9:37 PM
Chris Mullins
"EmeraldShield" <emeraldshield@noemail.noemail> wrote

[TCP Timeouts]
> Logically I knew that... :)  yes, the connection has not been closed and
> is still pending timeout.

Glad that was easy to track down. Netstat can be your friend.

> But what happens if 500 people hit your socket and then all hang up?  You
> will get all the accepts and start receives on them.  But then what
> happens? Did that just kill your IOCP pool of threads?

The Async Socket Mechanism will keep chugging along just fine. Just because
you have 500 sockets in "BeginRead" does NOT mean you have 500 sockets
blocked inside a thread somewhere. Quite the opposite in fact.
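
As a rough illustration of that point, the sketch below posts one pending receive per socket and counts outstanding operations with Interlocked, much like the counters described in the original question. `ReceiveTracker` and its members are hypothetical names; only `BeginReceive`, `EndReceive`, and `Close` are real `Socket` calls. The assumption here is that a completed receive of 0 bytes means the peer closed, and that an exception from `EndReceive` means the connection was reset or torn down.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

// Hypothetical class for illustration; the names are not from this thread.
public sealed class ReceiveTracker
{
    private int _pending;   // receives posted but not yet completed
    private int _closed;    // connections seen to end

    public int Pending { get { return Volatile.Read(ref _pending); } }
    public int Closed  { get { return Volatile.Read(ref _closed); } }

    private sealed class State
    {
        public Socket Socket;
        public byte[] Buffer;
    }

    // Posts one asynchronous receive. No thread sits waiting on it.
    public void Start(Socket socket)
    {
        Post(new State { Socket = socket, Buffer = new byte[4096] });
    }

    private void Post(State state)
    {
        Interlocked.Increment(ref _pending);
        try
        {
            state.Socket.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                                      SocketFlags.None, OnReceive, state);
        }
        catch (Exception e) when (e is SocketException || e is ObjectDisposedException)
        {
            // The callback never runs for a receive that failed to start.
            Interlocked.Decrement(ref _pending);
            Finish(state);
        }
    }

    private void OnReceive(IAsyncResult ar)
    {
        var state = (State)ar.AsyncState;
        Interlocked.Decrement(ref _pending);

        int read;
        try { read = state.Socket.EndReceive(ar); }
        catch (Exception e) when (e is SocketException || e is ObjectDisposedException)
        { read = 0; }   // reset, timeout, or local close: treat as ended

        if (read == 0) { Finish(state); return; }   // 0 bytes: peer closed

        // ... hand the first `read` bytes of state.Buffer to the application ...
        Post(state);    // re-arm the receive
    }

    private void Finish(State state)
    {
        state.Socket.Close();
        Interlocked.Increment(ref _closed);
    }
}
```

With this shape, every posted receive is matched by exactly one decrement, whether it completes with data, with a close, or with an error.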

I wrote up a fair bit of architecture around what we found to be the best
way to do this a while back:
http://www.coversant.net/dotnetnuke/Default.aspx?tabid=88&EntryID=10

Note: the architecture described there differs from what many people
recommend. Especially some very well known, and well respected authors on
the subjects of scalability. I've personally talked with many of those
authors regarding this blog post, and had some very interesting discussions.
At the end of the day though, the architecture described here really does
work and scale that high. The architectures many others recommend (which is
where we started) completely failed us in real-world usage.

We have scaled this up beyond belief. The Async stuff works very, very well.

--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins
Author
4 Oct 2006 11:26 PM
EmeraldShield
Thanks Chris,

>> But what happens if 500 people hit your socket and then all hang up?  You
>> will get all the accepts and start receives on them.  But then what
>> happens? Did that just kill your IOCP pool of threads?
>
> The Async Socket Mechanism will keep chugging along just fine. Just
> because you have 500 sockets in "BeginRead" does NOT mean you have 500
> sockets blocked inside a thread somewhere. Quite the opposite in fact.

I have written quite a bit of IOCP code in C++ over the years, but not much
in C# yet.  I guess I am going to have to really build some test cases and
beat on it like I did while learning the IOCP / C++ code.  In fact many of
those old test apps I wrote may still be applicable...  Didn't think about
that.


> I wrote up a fair bit of architecture around what we found to be the best
> way to do this a while back:
> http://www.coversant.net/dotnetnuke/Default.aspx?tabid=88&EntryID=10
>
> Note: the architecture described there differs from what many people
> recommend. Especially some very well known, and well respected authors on
> the subjects of scalability. I've personally talked with many of those
> authors regarding this blog post, and had some very interesting
> discussions. At the end of the day though, the architecture described here
> really does work and scale that high. The architectures many others
> recommend (which is where we started) completely failed us in real-world
> usage.
>
> We have scaled this up beyond belief. The Async stuff works very, very
> well.

Very interesting read.  Non-intuitive. But you can't argue with the results.

I am puzzled as to how you were able to support 100,000 TCP connections under
NT though.  AFAIK the system only allows 10,000 by default without registry
tweaks.  And then beyond around 75,000 you start to run into paged pool
limits and vtables locked in RAM by the OS.  Unless dot net does something
different under the hood from how we used to do it in C++.

Very encouraging results though.  I am porting a smaller app of ours to C#
mostly for maintenance reasons.  It is an older C++ app that needed to be
updated for new things anyway, so I decided to rearchitect it and move up to
C# at the same time.

We support 100,000+ connections on the C++ app.  Will be interesting to see
what the C# app can do.  If this works well we have a much larger app that I
will make the move with as well.  The entire managed architecture is very,
very appealing to me after years of off-by-one crashes. :)

Thanks.
Author
5 Oct 2006 12:04 AM
Chris Mullins
"EmeraldShield" <emeraldshield@noemail.noemail> wrote
> [Scaling Socket Apps]

>> We have scaled this up beyond belief. The Async stuff works very, very
>> well.
>
> Very interesting read.  Non-intuitive. But you can't argue with the
> results.

Yea, there is that.

> I am puzzled as to how you were able to support 100,000 TCP connections
> under NT though.

Well, we've never tried under Windows NT. The O/S we do our heavy lifting on
is Windows 2003 Server.

> AFAIK the system only allows 10,000 by default without registry tweaks.

As a socket client application, there are default limits in the system. As a
socket server application, these limits don't really exist. Perhaps on NT,
but not (that I remember) on Win2K, and certainly not on Win2k3.

> And then beyond around 75,000 you start to run into paged pool limits and
> vtables locked in RAM by the OS.

You sure do. Each Async socket takes a certain amount of non-paged memory.
On a "big" 32-bit machine (4 GB of memory) we can do about 50k simultaneous
sockets. Things don't really scale that well there.

The key phrase there is "32 bit machine". On 64-bit hardware the limits are
much, much higher.

> Unless dot net does something different under the hood from how we used to
> do it in C++.

Nope. Same thing. 64-bit hardware is just bigger and badder:
http://www.coversant.net/bigiron.PNG

Low-end 64-bit hardware is practical today. A dual-core AMD machine with 4GB
of memory and Windows Server 2003 x64 installed will do just fine for big
socket apps.

> We support 100,000+ connections on the C++ app.  Will be interesting to
> see what the C# app can do.  If this works well we have a much larger app
> that I will make the move with as well.  The entire managed architecture
> is very, very appealing to me after years of off-by-one crashes. :)

If it's a simple socket server, you should be fine. When I was playing with
scalability limits (by writing nothing more than an Async Echo Server), the
limits were very high. I didn't document it anywhere, and don't really
remember what "very high" is though. :(

I do remember "very high" was such that finding enough client firepower to
max it out was difficult.


--
Chris Mullins
Author
5 Oct 2006 2:31 AM
William Stacey [C# MVP]
Thanks for writing that up Chris.  The pattern is a good one.  I have seen
it called Half-Sync/Half-Async before:
http://www.cs.wustl.edu/~schmidt/PDF/PLoP-95.pdf

--
William Stacey [C# MVP]
Author
5 Oct 2006 3:27 AM
Chris Mullins
"William Stacey [C# MVP]" wrote in message:
> Thanks for writing that up Chris.  The pattern is a good one.
> I have seen it called Half-Sync/Half-Async before:
> http://www.cs.wustl.edu/~schmidt/PDF/PLoP-95.pdf
>

That's interesting. I haven't seen that paper before, or heard the term, yet
it's very close to what we do.

As it sits now, it's fairly easy for developers to add new features into our
server code base - the majority of the code they have to write is
synchronous. It's not until things go Async that developers tend to get
quickly confused.

We've been looking to go more and more async, but given the synchronous
nature of database access (and the lack of support for an async model),
we're stuck in sync land for much of our processing.
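
For readers who have not seen the pattern, here is a small sketch of the split being discussed: an asynchronous layer posts complete messages to a queue, and a fixed set of worker threads handles them with ordinary synchronous code. This is an illustration only, not our server code. `HalfSyncHalfAsync` and `Post` are hypothetical names, and `BlockingCollection<T>` is simply one convenient queue to build it on.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical class for illustration; not taken from any real code base.
public sealed class HalfSyncHalfAsync<T> : IDisposable
{
    private readonly BlockingCollection<T> _queue = new BlockingCollection<T>();
    private readonly Thread[] _workers;

    // handler is plain blocking code (for example, a database call).
    public HalfSyncHalfAsync(int workerCount, Action<T> handler)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(() =>
            {
                // Blocks until a message arrives; ends after CompleteAdding.
                foreach (T message in _queue.GetConsumingEnumerable())
                    handler(message);
            });
            _workers[i].IsBackground = true;
            _workers[i].Start();
        }
    }

    // Called from the async side, e.g. a receive callback. Returns at once.
    public void Post(T message)
    {
        _queue.Add(message);
    }

    // Drains the queue, then stops the workers.
    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread worker in _workers) worker.Join();
        _queue.Dispose();
    }
}
```

The I/O callbacks never block on the handler, and the handler never needs to know the work arrived asynchronously. An exception thrown by the handler would end that worker thread, so real code would want a try/catch around it.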

--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins
Author
5 Oct 2006 5:51 AM
William Stacey [C# MVP]
Another related idea would be to use the CCR library (still in beta and
currently distributed with the robotics library, but very good).  It is
Port/Message based with its own thread pool(s) (can also use the native .NET
TP) and allows all kinds of interesting stuff like scatter/gather, port
joins, etc.  So for example, you could async wait on a socket message which
gets posted to a Port after full message arrives.  Your port arbiter (i.e.
delegate) now does work with the message.  This work could be kicking off 4
additional requests to different databases (or files, etc) and then start an
async wait on a Result Port for all operations to return or error.  Finally,
return result to original requester.  All this can be done without blocking
any threads on IO.   Effectively, it allows the same pattern with port
queues and thread pools, but has it all built in to the Port abstractions -
very powerful.
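
The scatter/gather flow described above can be sketched without the CCR. The example below is not CCR code and uses none of its API; it swaps in `Task.WhenAll` from the .NET Task library as a stand-in for the join on a result port. `ScatterGather`, `HandleAsync`, and the backend delegates are hypothetical names for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper; a stand-in for the CCR port join, not CCR itself.
public static class ScatterGather
{
    // Starts every backend request for one incoming message, waits for all
    // of them without blocking a thread, then combines the results.
    public static async Task<string> HandleAsync(
        string request, params Func<string, Task<string>>[] backends)
    {
        // Scatter: kick off all requests before waiting on any of them.
        Task<string>[] pending = backends.Select(b => b(request)).ToArray();

        // Gather: results come back in the same order as the backends.
        string[] results = await Task.WhenAll(pending);

        return string.Join("|", results);
    }
}
```

If any backend fails, the awaited task throws, which plays the role of the error case on the result port.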

--
William Stacey [C# MVP]
