
Conference pamsrc::objectbroker_development

Title:	ObjectBroker Development - BEA Systems' CORBA
Notice:	See note 2 for kit locations; note 4 for training
Moderator:	RECV::GUMBELd

Created:	Thu Dec 27 1990
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2482
Total number of notes:	13057

2427.0. "Yet another TCP dead server socket build-up problem" by LEMAN::DONALDSON (Froggisattva! Froggisattva!) Thu Jan 30 1997 03:36

Hi. I'm working with a customer this week. (You know the one -
large Swiss bank! ;-)).

They have lots of ObjectBroker applications and they're extending.
So good news. I'm here to ease them up to better and more efficient 
solutions. They're opening up a number of their applications to
their global intranet.

That's starting to hit them with a problem. The servers handle the
traffic and the clients have good response. But now and then a
client leaves a socket open on a server. Finally the server refuses
to open more sockets and goes into an unresponsive loop.

Different clients: some Windows 3.1 and some NT (migrating to NT over
perhaps 2 years). The W3.1 clients seem to *always* leave their sockets
open on the server. Even if they shut down tidily (release and rundown).
The NT clients only leave sockets if aborted in some way.

Currently we have a stop-gap solution in place: we kill servers if their
socket count goes above a certain threshold. This is not ideal because
some of the clients have automatically bound objrefs to that server.
We'll fix that with another release of the clients which implement
a "back off and try again" policy on their requests for a set of expected
errors.
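
A "back off and try again" policy of this kind could look roughly like the
Python sketch below. Everything named in it is hypothetical:
`ExpectedRequestError` stands in for whatever set of errors the clients treat
as retryable, and `rebind` stands in for whatever the client does to get a
fresh binding. Neither is an ObjectBroker API.

```python
import time


class ExpectedRequestError(Exception):
    """Hypothetical stand-in for the 'expected' errors worth retrying."""


def call_with_backoff(request, rebind, attempts=4, first_delay=0.5,
                      sleep=time.sleep):
    """Call request(); on an expected error, rebind, wait, and try again.

    The wait doubles after every failure. The last failure is re-raised
    so the caller still sees the error if the server never comes back.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ExpectedRequestError:
            if attempt == attempts:
                raise
            rebind()       # hypothetical: drop the stale binding, bind again
            sleep(delay)   # back off before the next try
            delay *= 2
```

Doubling the delay keeps a crowd of clients from hammering a server that has
just been restarted; the attempt count and first delay are arbitrary values
chosen for illustration.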

What we'd like though is to get rid of all the 'dead' sockets. Ones
which have been inactive for a certain time. It seems that this is 
possible but that ObjectBroker doesn't create the socket correctly.

We've got a mixture of VAX, Alpha, NT and Win3.1. With servers on
mostly VMS. Versions of OBB are 2.5B and 2.6. 

It seems that UCX can be configured to probe and drop sockets (we've
done that), but ObjectBroker doesn't propagate these attributes when it
creates sockets. The sockets have no flags set. (In comparison,
programs like Rlogin do set them.)
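
For reference, this is how a program that owns a socket would ask for
keepalive probing with the standard BSD `SO_KEEPALIVE` option, sketched in
Python. It is an assumption that this is the flag missing from the
ObjectBroker sockets, and nothing here is an ObjectBroker or UCX interface.
How long the stack waits before probing and dropping a silent peer is set in
the TCP/IP configuration, not in this code.

```python
import socket


def enable_keepalive(sock):
    """Turn on TCP keepalive probing for one socket.

    Returns True if the stack reports the option as set afterwards.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("keepalive enabled:", enable_keepalive(s))
    finally:
        s.close()
```

The option is per socket, so a server would have to set it on each accepted
connection (or on the listening socket, where the stack inherits it).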

Any ideas? 
	- how to recognize inactive sockets?
	- how to enable the probe and drop?
	- any other approaches that might work?

We have a server pool ready for the next release - this will 
help a lot. Moving all clients to NT will help a lot. *But* we will *still*
have a build up of dead server sockets from aborted clients.

John D.

T.R	Title	User	Personal Name	Date	Lines
2427.1	Fix the client stack	CFSCTC::HUSTON	Steve Huston	`Thu Jan 30 1997 09:43`	16
	>The W3.1 clients seem to always leave their sockets
	>open on the server. Even if they shut down tidily (release and rundown).

	This indicates a problem with the W3.1 TCP stack. You should get a
	TCP trace of one of these TCP connections, verify that the client is
	not handling the session correctly, and go to the stack vendor for a
	real fix.

	One other off-the-cuff thing you could try on the client side... do
	the rundowns and then sleep a few seconds before exiting the program.
	Maybe if the stack code has a bit more time, it'll finish completely.

	I'd be willing to help more with this if needed - you can contact me
	off-line at [email protected].

	-Steve
2427.2		LEMAN::DONALDSON	Froggisattva! Froggisattva!	`Thu Jan 30 1997 10:45`	28
	Steve, thanks for the reply. I'm pretty sure you must be right (I've
	seen W3.1 OBB v2.5 cleaning up properly, so I was a bit surprised
	with their claims). However, the solution "fix the stack" will
	probably not be accepted (they don't want to spend money on something
	that will be thrown away soon).

	In any case we need to put in place a solution for those clients
	that end abruptly (powerfail or whatever).

	At the moment: we monitor (with a DCL script) the number of sockets
	and re-start the server if they go above danger-level (about 70
	sockets); we check each socket and ping the remote end - if it's
	inactive we disconnect the socket. As a solution it's clunky but it
	seems to work.

	I'd like any sockets which haven't been used for x minutes to go
	away automatically. Any ideas how to do this? (Wishlist: allow me to
	configure this).

	In the medium term the clients will get a release which releases the
	objref and re-connects if a request fails. This will avoid the worst
	effects of re-starting the servers. (Wishlist: add this binding -
	something like OBB_BINDING_AUTOMATIC_WITH_FAILOVER).

	John D.
2427.3		CFSCTC::HUSTON	Steve Huston	`Thu Jan 30 1997 12:19`	40
	>"fix the stack" will probably not be accepted (they don't
	>want to spend money on something that will be thrown away
	>soon).

	Well, if the problem is a bug in the TCP stack, maybe the vendor
	would give you a fix for free. Especially such an obvious problem as
	not handling connection shutdown correctly. Ok, I won't push this...
	you know your customer. I was trying to get ObjectBroker (and DEC)
	off the hook for the problem by transferring blame (and attention)
	to the appropriate place.

	>In any case we need to put in place a solution for those
	>clients that end abruptly (powerfail or whatever).

	This is where OBB may be able to help. If keepalives are enabled on
	the sockets, it may help to catch this condition and kill the
	socket. But it can take keepalives quite a while to notice and kill
	a dead connection. And depending on what the PC end is doing,
	keepalives may not be any help at all. More info would be needed
	(below...)

	>At the moment: we monitor (with a DCL script) the number
	>of sockets and re-start the server if they go above danger-level
	>(about 70 sockets); we check each socket and ping the remote
	>end - if it's inactive we disconnect the socket. As a
	>solution it's clunky but it seems to work.

	Do you know what state the TCP connection is in when they're "stuck"
	or hung? This is a key piece of info to getting the base problem
	taken care of.

	>I'd like any sockets which haven't been used for x minutes
	>to go away automatically. Any ideas how to do this?
	>(Wishlist: allow me to configure this).

	No, this is an application responsibility (e.g. OBB) and it's
	incredibly hard to get it right without intimate knowledge of what
	both sides are doing, which ObjectBroker does not (and can not) have.

	-Steve
2427.4		LEMAN::DONALDSON	Froggisattva! Froggisattva!	`Mon Feb 10 1997 05:49`	19
	The customer is still investigating at a low priority level. (At
	least their current production system is stable, if not exactly
	elegant). I'll come back if we make any more progress.

	>>I'd like any sockets which haven't been used for x minutes
	>>to go away automatically. Any ideas how to do this?
	>>(Wishlist: allow me to configure this).
	>
	>No, this is an application responsibility (e.g. OBB) and it's incredibly
	>hard to get it right without intimate knowledge of what both sides
	>are doing, which ObjectBroker does not (and can not) have.

	Well, I think it should be possible to add some method like timeout
	which could do whatever is appropriate to kill off the link.
	OBB::Object::timeout or something (OBB_Object_set_timeout). Which
	basically says: if this object is active, then time out its link
	(network dependent) if there's no activity for a certain period.

	John D.
2427.5	Can't time out on per-object basis	REQUE::ctxobj.zko.dec.com::Patrick	ObjectBroker Engineering	`Mon Feb 10 1997 08:30`	27
	ObjectBroker does not create a separate logical link for each
	object. If it did, we'd not scale in systems with large numbers of
	objects. It's for this same reason we don't track every object in
	the system.

	What we do support is multiplexing objects over a single network
	link when more than one object reference is being held by the
	client that has resulted in going to the same server process.
	Given this, if we timed out the link because Object #1 had not
	been used, we'd also time out the link for Object #2 which had
	just been used.

	As indicated, this is a problem in the TCP/IP implementation being
	used. We will continue to consider mechanisms that would allow idle
	links to be shut down and re-established, to be supported in some
	future version, but nothing definite yet.

	FWIW: it appears that many of the TCP/IP implementations
	      for Windows 3.x (16-bit) just don't work properly. As
	      a result, you see things just like you're seeing. My
	      suggestion is move to 32-bit Windows quickly, if at
	      all possible.

	Paul Patrick
2427.6		LEMAN::DONALDSON	Froggisattva! Froggisattva!	`Tue Feb 11 1997 10:30`	29
	I'm glad you're multiplexing - I hadn't got a good inside story on
	that (any chance of an 'internals' one-off course?).

	>What we do support is multiplexing objects over a single network
	>link when more than one object reference is being held by the
	>client that has resulted in going to the same server process.
	>Given this, if we timed out the link because Object #1 had not
	>been used, we'd also time out the link for Object #2 which had
	>just been used.

	Well, if you take a simplistic approach that's true. But you could
	use some kind of 'reference count' technique. The common-sense
	expectation ought to be implementable here - when a link hasn't
	been used - time it out.

	>FWIW: it appears that many of the TCP/IP implementations
	>      for Windows 3.x (16-bit) just don't work properly. As
	>      a result, you see things just like you're seeing. My
	>      suggestion is move to 32-bit Windows quickly, if at
	>      all possible.

	I understand what you're saying and I'm trying to get more accurate
	data on when the sockets are getting left behind. You can imagine
	that in a large global enterprise there are lots of different PCs
	and versions of software etc.

	John D.
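
The 'reference count' idea can be sketched in Python as a link that keeps one
last-used timestamp for all the objects multiplexed over it, so that an idle
Object #1 cannot time out a link that Object #2 has just used. The class and
method names are hypothetical and say nothing about how ObjectBroker tracks
links internally; this is a model of the bookkeeping only.

```python
import time


class MultiplexedLink:
    """Hypothetical model of one network link shared by several objects."""

    def __init__(self, idle_limit, clock=time.monotonic):
        self.idle_limit = idle_limit   # seconds of silence before timeout
        self.clock = clock
        self.object_refs = set()       # objects currently bound to the link
        self.last_used = clock()       # most recent activity by ANY object

    def bind(self, object_id):
        self.object_refs.add(object_id)
        self.last_used = self.clock()

    def release(self, object_id):
        self.object_refs.discard(object_id)

    def record_request(self, object_id):
        if object_id not in self.object_refs:
            raise KeyError(object_id)
        self.last_used = self.clock()

    def should_close(self):
        """Close when nothing references the link or nothing has used it."""
        if not self.object_refs:
            return True
        return self.clock() - self.last_used >= self.idle_limit
```

This only covers the server's view of activity. It does not address the
objection raised earlier in the thread that one side cannot know what the
other side intends to do with a quiet link.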
2427.7		REQUE::BOWER	Peter Bower, ObjectBroker	`Sat Feb 15 1997 07:57`	10
	Steve's question from .3 is a good one. Do you know the answer
	for it?

	> Do you know what state the TCP connection is in when they're "stuck"
	> or hung? This is a key piece of info to getting the base problem
	> taken care of.

	A UCX SHOW DEVICE/full on a hung socket may be useful.
2427.8		LEMAN::DONALDSON	Froggisattva! Froggisattva!	`Mon Feb 17 1997 05:48`	14
	> Steve's question from .3 is a good one. Do you know the answer
	> for it?
	>
	> > Do you know what state the TCP connection is in when they're "stuck"
	> > or hung? This is a key piece of info to getting the base problem
	> > taken care of.

	Yes, I know. I was on site with this customer for a week recently
	and we discovered this problem. I delivered the consulting I'd been
	hired to do and I'm working elsewhere now. So, I can't push too much
	to get the customer to look at this. When I can I'll be back with
	more info.

	John D.