
Conference tuxedo::dce-products

Title:	DCE Product Information
Notice:	Kit Info - See 2.-4.
Moderator:	TUXEDO::MAZZAFERRO

Created:	Fri Jun 26 1992
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2269
Total number of notes:	10003

2246.0. "Many Authenticates gives - 282110316, Error with socket (dce / cds)" by OZROCK::THOMAN (Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!") Sun May 11 1997 23:05

	When running a test suite that starts many clients at once,
	I get the following error when approximately 20% of the
	clients try to authenticate.

		282110316, Error with socket (dce / cds)
	
	Any ideas on what we can "tweak" to get around this problem ?

	We're using DCE 1.3.2 (13b) on Digital Unix 3.2c -> 3.2g 
	machines.

	Thanks, 

	Craig.

T.R	Title	User	Personal Name	Date	Lines
2246.1	error implies cdsadv/cdsclerk problems	TUXEDO::ZEE	There you go.	`Mon May 12 1997 12:06`	17
	How many clients are we talking about? Are they running under the same username? Is cdsadv still running, and how many cdsclerk processes do you have?

	The "error with socket" message implies a problem on the local node, in a process trying to talk either to cdsadv (on the initial CDS call) or to its respective cdsclerk process.

	> Any ideas on what we can "tweak" to get around this problem ?

	Short of a redesign of cdsadv and cdsclerk, if it's simply too many processes overloading one socket (the cdsLib named socket), there's not much you can do. If it's too many processes overloading a cdsclerk_<pid>_<username> socket, you could try running your clients under different usernames.

	--Roger
2246.2	Some Answers....	OZROCK::THOMAN	Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!"	`Tue May 13 1997 03:05`	62
	> How many clients are we talking about?

	As few as 10. I don't know if this is a coincidence, but when I increased maxuser to 512 so that I could raise max-threads-per-user to 5000, things improved, although I'm not sure what side effects this has. Even at 5000 threads per user, I still get 3 failures in 40 authentications.

	> Are they running under the same username?

	Yes.

	> You could try running your clients under different usernames.

	Does this mean the problem is significantly LESS likely if I run them under different names?

	> overloading one socket (cdsLib named socket)

	How can I look at the number of connections to that socket?

	> cdsclerk_<pid>_<username> socket

	... and look at these ones?

	> Is cdsadv still running, ...

	Yes.

	> and how many cdsclerk processes do you have?

	Only 4, always, whether the tests are running or not. There are 3 sessions logged into the machine (me twice and root once), so I assume they account for 3, plus the usual:

		cdsclerk -U DNS$SERVER -u 0 -m 0 ....

	(Which always makes me believe you prefer VMS ;-) ??)

	We don't run SIA.

	Thanks,
	Craig.
2246.3	Clarification...	OZROCK::THOMAN	Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!"	`Tue May 13 1997 22:13`	68
	>> cdsclerk_<pid>_<username> socket, you could try running your clients
	>> under different usernames.

	By "different usernames", do you mean different UNIX usernames or different DCE principals? I'm trying to understand how the cdsclerks are started.

	At the moment, on a DCE client (sec client & cds client) node there are 4 cdsclerks:
	---------------------------------------------------------------------------
	ie:
	/opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
	/opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
	/opt/dcelocal/bin/cdsclerk -U thoman -u 321 -m 0
	/opt/dcelocal/bin/cdsclerk -U moored -u 397 -m 0

	but only 3 logins:
	------------------
	# w
	11:09 up 18:37, 3 users, load average: 0.00, 0.01, 0.00
	User tty from login@ idle JCPU PCPU what
	thoman p0 x.dec.c 16:34 1:14 207:10 1 -tcsh
	thoman p1 x.dec.c 16:34 209:18 1 -tcsh
	root p2 x.dec.c 16:38 17:22 58 -csh

	On the DCE SERVER (both security & CDS) in my 2-node cell, there are:
	---------------------------------------------------------------------
	/opt/dcelocal/bin/cdsclerk -U DNS$SERVER -u 0 -m 0
	/opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
	/opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0

	and only 2 logins:
	------------------
	# w
	11:10 up 18:38, 2 users, load average: 0.04, 0.02, 0.01
	User tty from login@ idle JCPU PCPU what
	root p0 x.dec.com 16:33 8 1 w
	thoman p1 x.dec.com 16:34 18:17 -tcsh

	Can you please tell me how the clerk is triggered?

	Thanks,
	Craig.
2246.4	Under these load conditions, we also see:	OZROCK::THOMAN	Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!"	`Wed May 14 1997 03:10`	11
		382312534, association shutdown (dce / rpc)

	What does that mean?

	Thx,
	Craig.
2246.5	answers to .2 and .3 and .4	TUXEDO::ZEE	There you go.	`Thu May 15 1997 18:08`	75
	Note: all of the following pertains only to the DCE UNIX implementations. DCE NT and DCE VMS have different clerk designs.

	>> You could try running your clients under different usernames.
	> Does this mean the problem is significantly LESS likely
	> if I run them as different names ?

	Perhaps, depending on which socket is having the problem.

	> How can I look at the number of connections to that socket?

	I don't know a direct way to do this, but a roundabout method is to use the lsof (list open files) utility that is available in cyberspace. Run as root, it returns a list of open file descriptors for each process on the system; you can then grep for cdsLib. Timing it properly might be difficult, because communication over the cdsLib socket is short-lived. Let me know if you'd like lsof for Digital UNIX or OSF/1.

	> By "different usernames", you mean different UNIX usernames, or
	> different DCE principals.

	I mean different UNIX usernames. Strictly speaking, it's the UNIX UID/GID pair. So two processes logged in as thoman will most likely share the same cdsclerk process. Each process may have different DCE credentials, but the cdsclerk process handles that properly. The reason you may see two cdsclerk processes for the same user is probably that the GIDs of the running DCE applications differ. You can do an

		ls -lg /opt/dcelocal/var/adm/directory/cds

	to see a list of the named sockets.

	> Can you pls tell me how the clerk is triggered ?

	1. cdsadv starts, and a thread listens on the cdsLib socket in /opt/dcelocal/var/adm/directory/cds.

	2. Someone starts a DCE application. In libdce.so, the first CDS library call attempts to connect to the cdsLib socket and passes along the UID/GID info. Perhaps the socket error occurs here if there are many concurrent initial CDS calls among the processes (I don't know how many; it depends on the system configuration).

	3. The cdsadv listener thread keeps a list of existing cdsclerk processes and their respective UID/GID attributes.

	   a. If nothing matches, cdsadv forks and execs a cdsclerk process and hands both the cdsclerk process and the DCE application a unique named socket (cdsclerk_<pid>_<user>), over which the two communicate from that point on; the DCE application then closes its socket to cdsLib.

	   b. If there is a match, cdsadv returns the corresponding named socket to the DCE application, over which they communicate from that point on; again, the application then closes its socket to cdsLib.

	   Perhaps the socket error is returned on this named socket, although the error in step 2 is more likely.

	Yes, this context switching is time-consuming, and the code to handle all of the different call contexts is somewhat messy, but it was designed in the days when a process could have at most 64 file descriptors on Ultrix. We are in the process of prototyping an inline clerk, similar to the DCE NT V2.0 implementation.

	> 382312534, association shutdown (dce / rpc)
	> What's that mean ?

	This is a lower-level RPC status that usually is not seen at the user level. Where does this show up?

	--Roger
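Step 3 above boils down to a lookup keyed on the UNIX UID/GID pair: reuse a clerk when the pair matches, start a new one when it doesn't. The sketch below is a hypothetical, in-memory model of that bookkeeping only, not cdsadv's actual code; nothing is forked and no sockets are opened. The names clerk_socket_for(), clerk_count(), MAX_CLERKS and the starting "pid" of 1000 are all invented for illustration, and we assume the <pid> in cdsclerk_<pid>_<user> is the clerk's pid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* HYPOTHETICAL model of the cdsadv clerk table: one clerk per UID/GID pair.
 * No fork/exec and no sockets; "pid" is just a counter. */
#define MAX_CLERKS    256   /* arbitrary table size for the sketch */
#define SOCK_NAME_LEN 64

struct clerk {
    uid_t uid;
    gid_t gid;
    int   pid;                  /* stand-in for the forked clerk's pid */
    char  sock[SOCK_NAME_LEN];  /* e.g. "cdsclerk_1000_thoman" */
};

static struct clerk clerks[MAX_CLERKS];
static size_t nclerks = 0;
static int next_pid = 1000;     /* arbitrary first "pid" */

/* Return the named socket for (uid, gid). Reuses an existing clerk when the
 * pair matches (step 3b); otherwise records a new one (step 3a). Returns NULL
 * when the table is full. */
const char *clerk_socket_for(uid_t uid, gid_t gid, const char *user)
{
    assert(user != NULL);
    for (size_t i = 0; i < nclerks; i++) {
        if (clerks[i].uid == uid && clerks[i].gid == gid)
            return clerks[i].sock;          /* 3b: existing clerk */
    }
    if (nclerks == MAX_CLERKS)
        return NULL;
    struct clerk *c = &clerks[nclerks++];   /* 3a: "start" a new clerk */
    c->uid = uid;
    c->gid = gid;
    c->pid = next_pid++;
    snprintf(c->sock, sizeof c->sock, "cdsclerk_%d_%s", c->pid, user);
    return c->sock;
}

/* Number of distinct clerks "started" so far. */
size_t clerk_count(void)
{
    return nclerks;
}
```

In this model two root processes with the same GID share one socket, while a root process running under a different GID gets a second clerk, which is how Roger accounts for the two root clerks listed in .3.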
2246.6	RE -.1	OZROCK::THOMAN	For SII support dial 110 ! (OZY internal only)	`Thu May 22 1997 01:16`	114
	Sorry for the delay in getting back to you. Thanks for -.1; that was some really useful, but dangerous, info for us. It looks like we're going to have some real concerns in the near future. Our customer might have as many as 800 clients trying to authenticate at each change of shift of its telephone consultants.

	At the moment we're trying to walk before we run: I just tried 200 and got 25 failures due to "association shutdown (dce / rpc)". For these 200, one shell script started the test clients, su(1)'ing them as UNIX users test1,...,test100, while a second shell script started them in the reverse order. Hence AT MOST 2 application clients would have been sharing 1 cdsclerk.

	With this run of 200, I got NO "Error with socket" problems, but I can't say that one test run is conclusive, mainly because the machine is so much more heavily loaded. It's now running approx 105 cdsclerks instead of approx 5. Also, there are 200 su(1) processes that don't exist when I run all tests under root.

	>> 382312534, association shutdown (dce / rpc)
	>> What's that mean ?
	>
	> This is a lower level RPC status that usually is not seen at the user
	> level. Where does this show up?

	We get this as some (up to 10%) of the application clients try to authenticate. They all run on a DCE client machine. Below is the code that the client is running. I'm not very familiar with the DCE API, and didn't write this code.

		/*
		** Do the validation
		*/
		if( privileged )
		{
		    /* Validate and certify in a single call. */
		    sec_login_valid_and_cert_ident( dceContext, &dcePassword,
		        &passwordExpired, &dceAuthorisor, &dceStatus );
		}
		else
		{
		    /* Validate first; certify only if validation succeeded. */
		    sec_login_validate_identity( dceContext, &dcePassword,
		        &passwordExpired, &dceAuthorisor, &dceStatus );
		    if( dceStatus == error_status_ok )
		    {
		        sec_login_certify_identity( dceContext, &dceStatus );
		    }
		}
		if( (status = CHECK_DCE_STATUS( dceStatus )) != ZAP_OK )
		{
		    sec_login_purge_context( &dceContext, &dceStatus );
		    return status;
		}

		/*
		** Make sure we were authenticated by the security server and
		** don't just have local credentials. No access at all if we
		** can't establish network credentials.
		*/
		if( dceAuthorisor != sec_login_auth_src_network )
		{
		    sec_login_purge_context( &dceContext, &dceStatus );
		    status = ZAP_NETWORK;
		    return ZAP_REPORT( status, ZapSevError, ZAP_SOURCE_ZAP, status,
		                       Z18_NO_NETWORK_AUTH );
		}

	That ZAP_REPORT() macro is causing the application client to print:

		SI API: Error message from DCE
		382312534, association shutdown (dce / rpc)
		../../../base/api/src/zapauthn.c, 457

	Thanks,
	Craig.
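The "AT MOST 2 clients per cdsclerk" conclusion above follows from Roger's one-clerk-per-UID/GID rule: 200 clients spread over test1..test100 means each UID is used exactly twice. The sketch below just checks that arithmetic. It assumes all test clients share one GID, and max_clients_per_clerk(), two_script_layout() and the base UID are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL helper: given the UID each client runs under, return the
 * largest number of clients mapped to one UID. Assuming all clients share
 * one GID, that is the most clients any one cdsclerk socket must serve.
 * O(n^2), which is fine for a few hundred clients. */
size_t max_clients_per_clerk(const int *uids, size_t n)
{
    assert(uids != NULL || n == 0);
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j < n; j++)
            if (uids[j] == uids[i])
                count++;
        if (count > best)
            best = count;
    }
    return best;
}

/* Fill uids[] (which must hold 2 * users entries) like the two test scripts:
 * script 1 starts test1..testN in order, script 2 in reverse order.
 * ASSUMPTION: user testK has UID base_uid + K. */
void two_script_layout(int *uids, int users, int base_uid)
{
    for (int k = 1; k <= users; k++) {
        uids[k - 1] = base_uid + k;                      /* script 1 */
        uids[users + k - 1] = base_uid + users + 1 - k;  /* script 2 */
    }
}
```

With two_script_layout(uids, 100, 500) the maximum is 2; running all 200 clients under a single UID (as in the earlier runs under root) gives 200 clients funnelled through one clerk.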
2246.7	Does .6 help Roger?	OZROCK::THOMAN	For SII support dial 110 ! (OZY internal only)	`Wed May 28 1997 21:50`	10
	Does my description in .6 explain why I get "association shutdown"? Can you tell me what it means?

	Thanks,
	Craig.
2246.8	Also, what's "336761021 Credentials cache I/O operation failed" mean?	OZROCK::THOMAN	For SII support dial 110 ! (OZY internal only)	`Fri May 30 1997 00:24`	7
	The machine was under high load at the time, i.e. 99% CPU in use.

	Thx,
	Craig.
2246.9	association rundown doesn't mean much per se	TUXEDO::ZEE	There you go.	`Fri May 30 1997 20:09`	13
	Looking through the RPC runtime code, rpc_s_assoc_shutdown is returned if the client side receives a shutdown request from the server. Since you received this status from a sec_login* call, perhaps your security server is overloaded? Do you have any security replicas? I'm grasping a bit here, but I'll keep looking to see how a shutdown request gets generated.

	> Also, what's "336761021 Credentials cache I/O operation failed" mean?

	That is the KRB5_CC_IO status code, which doesn't help much. It looks like a side effect of the unusually high load, perhaps a call timing out.

	--Roger
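Written in hex, the three statuses in this thread are 382312534 = 0x16C9A056 (RPC), 282110316 = 0x10D0A96C (CDS) and 336761021 = 0x141290BD (credentials cache), so each comes from a different high-order prefix. The sketch below splits a status on the assumption that the low-order 12 bits index a message within a component's table; that layout is our assumption for illustration and is not stated in this thread. In real code, dce_error_inq_text() is the usual way to turn a status into its message text, and status_base(), status_offset() and print_status() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

/* ASSUMPTION for illustration: low-order 12 bits = message index within a
 * component's table; the remaining high-order bits = component base. */
#define DCE_MSG_BITS 12u
#define DCE_MSG_MASK ((1ul << DCE_MSG_BITS) - 1ul)   /* 0xFFF */

unsigned long status_base(unsigned long st)
{
    return st & ~DCE_MSG_MASK;
}

unsigned long status_offset(unsigned long st)
{
    return st & DCE_MSG_MASK;
}

/* Print a status in decimal and hex alongside the assumed split. */
void print_status(unsigned long st, const char *text)
{
    assert(text != NULL);
    printf("%lu = 0x%08lX (base 0x%08lX, offset %lu): %s\n",
           st, st, status_base(st), status_offset(st), text);
}
```

For example, print_status(382312534UL, "association shutdown (dce / rpc)") prints "382312534 = 0x16C9A056 (base 0x16C9A000, offset 86): association shutdown (dce / rpc)".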
2246.10	Yes & Scale Testing Done ?	OZROCK::THOMAN	For SII support dial 110 ! (OZY internal only)	`Mon Jun 02 1997 04:48`	23
	> you received this status from a sec_login* call, perhaps your security
	> server is overloaded?

	That's most likely the case.

	> Do you have any security replicas?

	No. It's only a 2 (or 3) node cell.

	By using 2 different UNIX user names to start the tests under, I get noticeable improvements.

	Has anyone in your team ever done any DCE CDS & Security scalability testing? If so, is there a report or a summary of the results available (for internal use only)?

	Thx,
	Craig.
