
Conference pamsrc::objectbroker_development

Title:ObjectBroker Development - BEA Systems' CORBA
Notice:See note 2 for kit locations; note 4 for training
Moderator:RECV::GUMBELd
Created:Thu Dec 27 1990
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2482
Total number of notes:13057

2427.0. "Yet another TCP dead server socket build-up problem" by LEMAN::DONALDSON (Froggisattva! Froggisattva!) Thu Jan 30 1997 03:36

Hi. I'm working with a customer this week. (You know the one -
large Swiss bank! ;-)).

They have lots of ObjectBroker applications and they're extending them,
so that's good news. I'm here to ease them toward better and more efficient
solutions. They're opening up a number of their applications to
their global intranet.

That's starting to cause them a problem. The servers handle the
traffic and the clients get good response times. But now and then a
client leaves a socket open on a server. Eventually the server refuses
to open more sockets and goes into an unresponsive loop.

Different clients: some Windows 3.1 and some NT (migrating to NT over
perhaps 2 years). The W3.1 clients seem to *always* leave their sockets
open on the server, even if they shut down tidily (release and rundown).
The NT clients only leave sockets open if aborted in some way.

Currently we have a stop-gap solution in place: kill servers if their
socket total goes above a certain threshold. This is not ideal because
some of the clients have automatically bound objrefs to that server.
We'll fix that with another release of the clients, which will implement
a "back off and try again" policy on their requests for a set of expected
errors.
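A minimal sketch of such a back-off-and-retry policy, in Python purely for illustration (the real clients are Windows programs). `TransientError` and the default delays are hypothetical; the real set of "expected errors" would be the ObjectBroker status codes the client actually sees when a server has been killed:

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for the set of 'expected' ObjectBroker errors."""


def call_with_backoff(request, expected=(TransientError,), attempts=5,
                      initial_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call request(); on an expected error, wait and retry with a doubling delay.

    Errors not listed in `expected` propagate immediately.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except expected:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The `sleep` parameter is injectable only so the policy can be tested without real waits. Doubling the delay keeps many clients from hammering a freshly restarted server at the same moment.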

What we'd like, though, is to get rid of all the 'dead' sockets: ones
which have been inactive for a certain time. It seems that this is 
possible, but that ObjectBroker doesn't create its sockets with the right options.

We've got a mixture of VAX, Alpha, NT and Win3.1, with servers
mostly on VMS. Versions of OBB are 2.5B and 2.6. 

It seems that UCX can be configured to probe and drop sockets (we've
done that), but that ObjectBroker, when it creates sockets, doesn't
propagate these attributes. The sockets have no flags set. (In comparison,
programs like Rlogin do set them.)
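For reference, "setting the flag" at the sockets level means enabling SO_KEEPALIVE on each connection. A minimal sketch in Python for illustration (in C the call is the same `setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, ...)`). `accept_with_keepalive` is a hypothetical helper, not an ObjectBroker function, and whether UCX's probe-and-drop settings then take effect on such a socket is an assumption to check on the VMS side:

```python
import socket


def enable_keepalive(conn):
    """Ask the TCP stack to probe this connection once it has been idle."""
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def keepalive_enabled(conn):
    """True if SO_KEEPALIVE is set on the socket."""
    return conn.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def accept_with_keepalive(listener):
    """Accept a client and enable keepalives on the new server-side socket."""
    conn, addr = listener.accept()
    enable_keepalive(conn)  # set on each accepted connection explicitly
    return conn, addr
```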

Any ideas? 
	- how to recognize inactive sockets?
	- how to enable the probe and drop?
	- any other approaches that might work?

We have a server pool ready for the next release - this will 
help a lot. Moving all clients to NT will help a lot. *But* we will *still*
have a build-up of dead server sockets from aborted clients.

John D.
2427.1. "Fix the client stack" by CFSCTC::HUSTON (Steve Huston) Thu Jan 30 1997 09:43
>The W3.1 clients seem to *always* leave their sockets
>open on the server. Even if they shut down tidily (release and rundown).

This indicates a problem with the W3.1 TCP stack.  You should get a
TCP trace of one of these TCP connections, verify that the client is
not handling the session correctly, and go to the stack vendor for
a real fix.

One other off-the-cuff thing you could try on the client side... do the
rundowns and then sleep a few seconds before exiting the program.  Maybe if
the stack code has a bit more time, it'll finish completely.
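A variation on the same idea, sketched in Python and assuming the client can reach its raw socket (ObjectBroker's rundown may not expose it): instead of a fixed sleep, half-close the connection and wait, up to a bound, for the server's FIN before releasing the socket. This is the standard TCP graceful-close pattern, swapped in here as an illustration rather than something prescribed above:

```python
import socket


def close_gracefully(sock, timeout=5.0):
    """Half-close, wait (bounded) for the peer to close its side, then release.

    Unlike a fixed sleep, this returns as soon as the peer's FIN arrives.
    """
    try:
        sock.shutdown(socket.SHUT_WR)  # send our FIN; no more data from us
        sock.settimeout(timeout)
        while sock.recv(4096):  # discard remaining data until EOF (b"")
            pass
    except OSError:
        pass  # peer already gone or the wait timed out: close anyway
    finally:
        sock.close()
```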

I'd be willing to help more with this if needed - you can contact me
off-line at [email protected].

-Steve
2427.2. by LEMAN::DONALDSON (Froggisattva! Froggisattva!) Thu Jan 30 1997 10:45
Steve, thanks for the reply. I'm pretty sure you must be right 
(I've seen W3.1 OBB v2.5 cleaning up properly, so I was a
bit surprised by their claims). However, the solution
"fix the stack" will probably not be accepted (they don't
want to spend money on something that will be thrown away
soon).

In any case we need to put in place a solution for those
clients that end abruptly (powerfail or whatever). 

At the moment: we monitor (with a DCL script) the number
of sockets and re-start the server if they go above danger-level
(about 70 sockets); we check each socket and ping the remote
end - if it's inactive we disconnect the socket. As a 
solution it's clunky, but it seems to work.
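The monitoring policy above, restated as a small Python sketch. The `Connection` record and the ping result are hypothetical inputs (the real script is DCL), and disconnecting dead sockets *before* deciding whether to restart is an assumed ordering:

```python
from dataclasses import dataclass

DANGER_LEVEL = 70  # "about 70 sockets", as quoted above; tune per server


@dataclass
class Connection:
    remote_host: str
    peer_responds: bool  # hypothetical: result of pinging the remote end


def plan_actions(connections, danger_level=DANGER_LEVEL):
    """One monitoring pass: which sockets to drop, and whether to restart.

    Assumed ordering: dead sockets are disconnected first, and the server is
    restarted only if the live count is still above the danger level.
    """
    dead = [c for c in connections if not c.peer_responds]
    live_count = len(connections) - len(dead)
    return dead, live_count > danger_level
```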

I'd like any sockets which haven't been used for x minutes
to go away automatically. Any ideas how to do this?
(Wishlist: allow me to configure this).
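What "go away automatically after x minutes" would mean at the server, as a minimal Python sketch: record the last activity per connection and periodically close any that have been silent longer than a configurable timeout. `IdleReaper` is a hypothetical name, and the clock is injectable only to make the behaviour testable:

```python
import time


class IdleReaper:
    """Track last activity per connected socket and close the idle ones."""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds; the configurable "x minutes"
        self.clock = clock
        self.last_activity = {}

    def register(self, sock):
        self.last_activity[sock] = self.clock()

    def touch(self, sock):
        """Call whenever a request arrives on this socket."""
        self.last_activity[sock] = self.clock()

    def reap(self):
        """Close and forget every socket idle longer than the timeout."""
        now = self.clock()
        idle = [s for s, t in self.last_activity.items()
                if now - t > self.idle_timeout]
        for s in idle:
            del self.last_activity[s]
            s.close()
        return idle
```

The server would call `touch()` on each request and `reap()` from a periodic timer. The catch, discussed in .5, is that when several objects share one link, activity has to be tracked per link rather than per object.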

In the medium term the clients will get a release 
which releases the objref and re-connects if a request
fails. This will avoid the worst effects of re-starting 
the servers.
(Wishlist: add this binding - something like
	OBB_BINDING_AUTOMATIC_WITH_FAILOVER).
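The requested failover behaviour, sketched in Python. `bind` and `release` are hypothetical stand-ins for however the client obtains and releases its objref; this is not the ObjectBroker API, and `OBB_BINDING_AUTOMATIC_WITH_FAILOVER` is only a wishlist name:

```python
class FailoverRef:
    """Hold an objref; if a request fails, release it, rebind, and retry once."""

    def __init__(self, bind, release, retryable=(ConnectionError,)):
        self._bind = bind          # hypothetical: returns a fresh objref
        self._release = release    # hypothetical: releases an objref
        self._retryable = retryable
        self._ref = bind()

    def invoke(self, method, *args):
        try:
            return method(self._ref, *args)
        except self._retryable:
            self._release(self._ref)
            self._ref = self._bind()  # may reach a restarted or pooled server
            return method(self._ref, *args)
```

Retrying only once keeps a genuinely broken server from turning every request into a loop; combining this with a back-off policy would cover the window while a server restarts.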

John D.
2427.3. by CFSCTC::HUSTON (Steve Huston) Thu Jan 30 1997 12:19
>"fix the stack" will probably not be accepted (they dont
>want to spend money on something that will be thrown away
>soon).

Well, if the problem is a bug in the TCP stack, maybe the vendor would
give you a fix for free, especially for such an obvious problem as not
handling connection shutdown correctly.
OK, I won't push this... you know your customer.  I was trying
to get ObjectBroker (and DEC) off the hook for the problem by
transferring blame (and attention) to the appropriate place.

>In any case we need to put in place a solution for those
>clients that end abruptly (powerfail or whatever). 

This is where OBB may be able to help.  If keepalives are enabled
on the sockets, it may help to catch this condition and kill the
socket.  But it can take keepalives quite a while to notice and kill
a dead connection.  And depending on what the PC end is doing, keepalives
may not be any help at all.  More info would be needed (below...)
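To put "quite a while" in numbers: keepalives only declare a silent peer dead after the idle time plus one interval per unanswered probe. With Linux's defaults (7200 s idle, 75 s interval, 9 probes) that is 7200 + 9 * 75 = 7875 s, a little over two hours; UCX's defaults may differ. A Python sketch of the arithmetic and of shortening the timers where the platform exposes per-socket options (the values are illustrative, not recommendations):

```python
import socket


def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time after the last traffic until a silent peer is dropped."""
    return idle + interval * probes


def tune_keepalive(sock, idle=600, interval=60, probes=5):
    """Enable keepalives and shorten the timers where the platform allows it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket options exist on Linux and some other systems only;
    # where missing or unsupported, the system-wide defaults stay in force.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        if hasattr(socket, name):
            try:
                sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
            except OSError:
                pass
```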

>At the moment: we monitor (with a DCL script) the number
>of sockets and re-start the server if they go above danger-level
>(about 70 sockets); we check each socket and ping the remote
>end - if it's inactive we disconnect the socket. As a 
>solution its clunky but it seems to work.

Do you know what state the TCP connection is in when they're "stuck"
or hung?  This is a key piece of info to getting the base problem
taken care of.
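Why the state matters: it shows which side stopped following the close handshake. A Python sketch of how a state seen on the server end could be read; the interpretations are standard TCP behaviour, but which states the stuck sockets are actually in is exactly what still needs to be measured:

```python
# Standard meaning of each state when seen on the *server* end of a connection.
STATE_HINTS = {
    "ESTABLISHED": "No FIN received: the client vanished without closing "
                   "(crash, power failure, or a stack that never sends FIN), "
                   "or it is still connected and merely idle.",
    "CLOSE_WAIT": "The client closed its side; the server application has "
                  "not closed the socket yet.",
    "FIN_WAIT_2": "The server closed its side; the client never sent its own FIN.",
    "TIME_WAIT": "A normal close that the server initiated; it clears after 2*MSL.",
}


def diagnose(state):
    """Map a TCP state name (any case) to what it suggests about a stuck socket."""
    return STATE_HINTS.get(
        state.upper(), "Unrecognized state: a packet trace is needed.")
```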

>I'd like any sockets which haven't been used for x minutes
>to go away automatically. Any ideas how to do this?
>(Wishlist: allow me to configure this).

No, this is an application responsibility (e.g. OBB) and it's incredibly
hard to get it right without intimate knowledge of what both sides
are doing, which ObjectBroker does not (and can not) have.


-Steve
2427.4. by LEMAN::DONALDSON (Froggisattva! Froggisattva!) Mon Feb 10 1997 05:49
The customer is still investigating at a low priority level.
(At least their current production system is stable, if not
exactly elegant). I'll come back if we make any more progress.

>>I'd like any sockets which haven't been used for x minutes
>>to go away automatically. Any ideas how to do this?
>>(Wishlist: allow me to configure this).
>
>No, this is an application responsibility (e.g. OBB) and it's incredibly
>hard to get it right without intimate knowledge of what both sides
>are doing, which ObjectBroker does not (and can not) have.

Well, I think it should be possible to add some method 
like timeout which could do whatever is appropriate to kill
off the link: OBB::Object::timeout or something (OBB_Object_set_timeout).
It would basically say: if this object is active, time out its
link (network dependent) if there's no activity for a certain period.

John D.
2427.5. "Can't time out on per-object basis" by REQUE::ctxobj.zko.dec.com::Patrick (ObjectBroker Engineering) Mon Feb 10 1997 08:30
ObjectBroker does not create a separate logical link for each
object.  If it did, we'd not scale in systems with a large number of
objects.  It's for this same reason we don't track every object in
the system.

What we do support is multiplexing objects over a single network
link when the client holds more than one object reference that
resolves to the same server process.

Given this, if we timed out the link because Object #1 had not
been used, we'd also time out the link for Object #2 which had
just been used.

As indicated, this is a problem in the TCP/IP implementation
being used.  We will continue to consider supporting, in some
future version, mechanisms that would allow idle links to be
shut down and re-established, but nothing is definite yet.

FWIW: it appears that many of the TCP/IP implementations
            for Windows 3.x (16-bit) just don't work properly.  As
            a result, you see things just like you're seeing.  My
            suggestion is to move to 32-bit Windows quickly, if at
            all possible.


Paul Patrick
    
2427.6. by LEMAN::DONALDSON (Froggisattva! Froggisattva!) Tue Feb 11 1997 10:30
I'm glad you're multiplexing - I hadn't got a good inside
story on that (any chance of an 'internals' one-off course?).

>What we do support is multiplexing objects over a single network
>link when more than one object reference is being held by the
>client that has resulted in going to the same server process.

>Given this, if we timed out the link because Object #1 had not
>been used, we'd also time out the link for Object #2 which had
>just been used.

Well, if you take a simplistic approach that's true.
But you could use some kind of 'reference count' technique.

The common-sense expectation ought to be implementable
here - when a link hasn't been used - time it out.
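One way to reconcile this with Paul's point about multiplexing, sketched in Python: keep the last-use timestamp on the *link*, refreshed by a request through any objref bound to it, and keep a count of the objrefs sharing it. `MultiplexedLink` is hypothetical and says nothing about how ObjectBroker is built internally:

```python
import time


class MultiplexedLink:
    """One network link shared by every objref bound to the same server."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.refs = set()  # objrefs currently multiplexed over this link
        self.last_used = clock()

    def bind(self, obj_id):
        self.refs.add(obj_id)
        self.last_used = self.clock()

    def release(self, obj_id):
        self.refs.discard(obj_id)

    def request(self, obj_id):
        """A request through any objref refreshes the whole link."""
        if obj_id not in self.refs:
            raise KeyError(obj_id)
        self.last_used = self.clock()

    def should_close(self, idle_timeout):
        # Close when no objref is held any more, or when the link as a whole
        # (not any single object) has been silent longer than the timeout.
        return not self.refs or self.clock() - self.last_used > idle_timeout
```

If only Object #2 is used, the link's timestamp still moves, so Object #1's idleness alone never closes it. A client whose link is closed while it still holds objrefs would have to reconnect on its next request, which is where a failover binding like the one wished for in .2 would come in.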

>FWIW: it appears that many of the TCP/IP implementations
>            for Windows 3.x (16-bit) just don't work properly.  As
>            a result, you see things just like you're seeing.  My
>            suggestion is move to 32-bit Windows quickly, if at
>            all possible.

I understand what you're saying and I'm trying to get
more accurate data on *when* the sockets are getting 
left behind. You can imagine that in a large global enterprise
there are *lots* of different PCs and versions of software etc.

John D.
2427.7. by REQUE::BOWER (Peter Bower, ObjectBroker) Sat Feb 15 1997 07:57
    
    Steve's question from .3 is a good one. Do you know the answer
    for it ?
    
    > Do you know what state the TCP connection is in when they're "stuck"
    > or hung?  This is a key piece of info to getting the base problem
    > taken care of.
    
    A UCX SHOW DEVICE/full on a hung socket may be useful.
    
2427.8. by LEMAN::DONALDSON (Froggisattva! Froggisattva!) Mon Feb 17 1997 05:48
>    Steve's question from .3 is a good one. Do you know the answer
>    for it ?
>    
>    > Do you know what state the TCP connection is in when they're "stuck"
>    > or hung?  This is a key piece of info to getting the base problem
>    > taken care of.
    
Yes, I know. I was on site with this customer for a week 
recently, and that's when we discovered this problem. I delivered the 
consulting I'd been hired to do and I'm working elsewhere
now, so I can't push too hard to get the customer to look
at this. When I can, I'll be back with more info.

John D.