Conference tuxedo::dce-products

Title:DCE Product Information
Notice:Kit Info - See 2.*-4.*
Moderator:TUXEDO::MAZZAFERRO
Created:Fri Jun 26 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2269
Total number of notes:10003

2246.0. "Many Authenticates gives - 282110316, Error with socket (dce / cds)" by OZROCK::THOMAN (Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!") Sun May 11 1997 23:05

	When running a test suite that tries to start many clients
	at once, I get the following error when approx. 20% of the
	clients try to authenticate.

		282110316, Error with socket (dce / cds)
	
	Any ideas on what we can "tweak" to get around this problem ?

	We're using DCE 1.3.2 (13b) on Digital Unix 3.2c -> 3.2g 
	machines.

	Thanks, 

	Craig.

	

2246.1. "error implies cdsadv/cdsclerk problems" by TUXEDO::ZEE (There you go.) Mon May 12 1997 12:06, 17 lines

How many clients are we talking about?  Are they running under the same
username?  Is cdsadv still running, and how many cdsclerk processes do
you have?

The "error with socket" message implies a problem on the local node in
a process trying to talk to either cdsadv (on the initial CDS call), or
to its respective cdsclerk process.

>	Any ideas on what we can "tweak" to get around this problem ?

Besides a redesign of cdsadv and cdsclerk, if it's just too many
processes overloading one socket (cdsLib named socket), there's not much
you can do.  If it's too many processes overloading a
cdsclerk_<pid>_<username> socket, you could try running your clients
under different usernames.

--Roger

2246.2. "Some Answers...." by OZROCK::THOMAN (Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!") Tue May 13 1997 03:05, 62 lines

>How many clients are we talking about?  

	as low as 10.

	I don't know if this is a coincidence, but when I increased
	maxuser to 512, so I could increase max-threads-per-user
	to 5000, I got an improvement, but I'm not sure what
	side effects this has. Even at 5000 threads per user, I still
	get 3 failures in 40 authentications.


>Are they running under the same username? 

	Yes...

	
>You could try running your clients under different usernames.

	Does this mean the problem is significantly LESS likely
	if I run them as different names ?


>overloading one socket (cdsLib named socket)

	How can I look at the number of connections to that socket?


>cdsclerk_<pid>_<username> socket

	... and look at these ones ?


> Is cdsadv still running, ...

	Yes 


>and how many cdsclerk processes do you have?

	Only 4 - always, whether the tests are running or not.

	There are 3 sessions logged into the machine (me twice
	and root once), so I assume they account for 3, & there's 
	the usual:

		cdsclerk -U DNS$SERVER -u 0 -m 0 ....

	(Which always makes me believe you prefer VMS  ;-)  ?? )

	We  ** don't ** run SIA.




	
Thanks,

Craig.


	

2246.3. "Clarification..." by OZROCK::THOMAN (Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!") Tue May 13 1997 22:13, 68 lines

	

	
>>cdsclerk_<pid>_<username> socket, you could try running your clients
>>under different usernames.


	By "different usernames", you mean different UNIX usernames, or
	different DCE principals.


	
I'm trying to get an understanding of how the cdsclerks are started...

At the moment, on a DCE client node (sec client & cds client), there are 4
cdsclerks:
---------------------------------------------------------------------------

ie:

 /opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
 /opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
 /opt/dcelocal/bin/cdsclerk -U thoman -u 321 -m 0
 /opt/dcelocal/bin/cdsclerk -U moored -u 397 -m 0



but only 3 logins:
------------------

# w
11:09  up 18:37,  3 users,  load average: 0.00, 0.01, 0.00
User     tty        from             login@    idle   JCPU   PCPU what
thoman   p0         x.dec.c 	     16:34     1:14 207:10      1 -tcsh
thoman   p1         x.dec.c 	     16:34          209:18      1 -tcsh
root     p2         x.dec.c          16:38    17:22     58        -csh





On the DCE SERVER (both security & CDS) in my 2 node cell, there is:
---------------------------------------------------------------------

 /opt/dcelocal/bin/cdsclerk -U DNS$SERVER -u 0 -m 0
 /opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
 /opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0


and only 2 logins:
------------------ 

# w
11:10  up 18:38,  2 users,  load average: 0.04, 0.02, 0.01
User     tty        from             login@    idle   JCPU   PCPU what
root     p0         x.dec.com 	     16:33               8      1 w
thoman   p1         x.dec.com 	     16:34    18:17               -tcsh


Can you pls tell me how the clerk is triggered ?




	
	Thanks,

	Craig.

2246.4. "Under these load conditions, we also see:" by OZROCK::THOMAN (Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!") Wed May 14 1997 03:10, 11 lines

	382312534, association shutdown (dce / rpc)


	What's that mean ?


	Thx

	Craig.

2246.5. "answers to .2 and .3 and .4" by TUXEDO::ZEE (There you go.) Thu May 15 1997 18:08, 75 lines

Note: All of the following only pertains to the DCE UNIX implementations.
DCE NT and DCE VMS have different clerk designs.

>>You could try running your clients under different usernames.

>	Does this mean the problem is significantly LESS likely
>	if I run them as different names ?

Perhaps, depending on which socket seems to be having the problem.

>	How can I look at the number of connections to that socket?

I don't know of a direct way to do this, but a roundabout method is to use
the lsof (list open files) utility, which is freely available on the net.
Run as root, it lists the open file descriptors of every process
on the system.  You can then grep for cdsLib.  It might be difficult
to time it properly because the communication via the cdsLib socket
is short-lived.  Let me know if you'd like lsof for Digital UNIX or OSF/1.

>	By "different usernames", you mean different UNIX usernames, or
>	different DCE principals.

I mean different UNIX usernames.  Strictly speaking, the key is the UNIX
UID/GID pair.  So, two processes logged in as thoman will
most likely share the same cdsclerk process.  Each process may have
different DCE credentials, but the cdsclerk process deals with that
properly.  If you see two cdsclerk processes for the same user, it is
probably because the DCE applications they serve are running under
different GIDs.  You can do an

ls -lg /opt/dcelocal/var/adm/directory/cds

to see a list of the named sockets.
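
For anyone who would rather tally the clerk sockets from a program than read
the ls output by eye, here is a minimal sketch in C.  It assumes only the
cdsclerk_<pid>_<user> naming described in this topic; the function names
(parse_clerk_socket_name, count_clerk_sockets) are made up for illustration
and are not part of DCE.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a directory entry of the form "cdsclerk_<pid>_<user>".
 * Returns 1 and fills *pid and user on a match, 0 otherwise.
 * (Hypothetical helper; the name format is taken from this topic.)
 */
int parse_clerk_socket_name(const char *name, long *pid,
                            char *user, size_t user_len)
{
    static const char prefix[] = "cdsclerk_";
    size_t plen = sizeof prefix - 1;
    char *end;
    long p;

    if (strncmp(name, prefix, plen) != 0)
        return 0;
    if (!isdigit((unsigned char)name[plen]))
        return 0;                       /* pid field must start with a digit */
    p = strtol(name + plen, &end, 10);
    if (*end != '_' || end[1] == '\0')
        return 0;                       /* need "_<user>" after the pid */
    if (strlen(end + 1) >= user_len)
        return 0;                       /* user name would not fit */
    strcpy(user, end + 1);
    *pid = p;
    return 1;
}

/*
 * Print and count the clerk sockets found in dir, for example
 * /opt/dcelocal/var/adm/directory/cds.  Returns -1 if dir cannot be opened.
 */
int count_clerk_sockets(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    int n = 0;

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        long pid;
        char user[64];
        if (parse_clerk_socket_name(de->d_name, &pid, user, sizeof user)) {
            printf("%s: pid field %ld, user %s\n", de->d_name, pid, user);
            n++;
        }
    }
    closedir(d);
    return n;
}
```

Calling count_clerk_sockets("/opt/dcelocal/var/adm/directory/cds") as root
before and during a test run should show whether the number of clerk sockets
grows with the number of distinct UID/GID pairs.  It says nothing about how
many clients are connected to each socket; lsof is still needed for that.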


>Can you pls tell me how the clerk is triggered ?

1. The cdsadv starts and a thread listens over the cdsLib socket in
  /opt/dcelocal/var/adm/directory/cds.

2. Someone starts up a DCE application.  In libdce.so, the first CDS
  library call attempts to connect to the cdsLib socket and passes
  along the UID/GID info.  Perhaps the socket error occurs here if
  there are many (don't know how many - depends on the system configuration)
  concurrent *initial* CDS calls amongst the processes.

3. The cdsadv listener thread keeps a list of existing cdsclerk processes
  and their respective UID/GID attributes.

  a. If nothing matches, cdsadv will fork and exec a cdsclerk process
    and hand both the cdsclerk process and the DCE application a
    unique named socket (cdsclerk_<pid>_<user>) over which they will
    then communicate forever more, as the DCE application will then
    close its socket to cdsLib.

  b. If there is a match, cdsadv returns the corresponding named socket
    to the DCE application, over which they will communicate forever more,
    as the application will then close its socket to cdsLib.

  Perhaps the socket error is returned on this named socket, although
  the error in 2. is more likely.
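
The matching in step 3 can be pictured with the following sketch in C.  This
is not the cdsadv source; it is a toy table keyed on the UID/GID pair, with
made-up names (clerk_table, clerk_socket_for) and with the fork/exec of a new
clerk replaced by a pid supplied by the caller.

```c
#include <stdio.h>
#include <sys/types.h>

#define MAX_CLERKS    256
#define SOCK_NAME_LEN 128

/* One entry per running cdsclerk, keyed by the UNIX UID/GID pair. */
struct clerk_entry {
    uid_t uid;
    gid_t gid;
    char  sock_name[SOCK_NAME_LEN];   /* cdsclerk_<pid>_<user> */
};

struct clerk_table {
    struct clerk_entry entries[MAX_CLERKS];
    size_t count;
};

/* Step 3b: find an existing clerk whose UID and GID both match. */
const struct clerk_entry *clerk_find(const struct clerk_table *t,
                                     uid_t uid, gid_t gid)
{
    size_t i;
    for (i = 0; i < t->count; i++)
        if (t->entries[i].uid == uid && t->entries[i].gid == gid)
            return &t->entries[i];
    return NULL;
}

/*
 * Step 3a stand-in: record a new clerk.  The real advertiser would fork and
 * exec cdsclerk here; in this sketch the caller passes the pid in.
 */
const struct clerk_entry *clerk_add(struct clerk_table *t, uid_t uid,
                                    gid_t gid, long pid, const char *user)
{
    struct clerk_entry *e;
    int n;

    if (t->count >= MAX_CLERKS)
        return NULL;
    e = &t->entries[t->count];
    n = snprintf(e->sock_name, sizeof e->sock_name,
                 "cdsclerk_%ld_%s", pid, user);
    if (n < 0 || (size_t)n >= sizeof e->sock_name)
        return NULL;                  /* name would have been truncated */
    e->uid = uid;
    e->gid = gid;
    t->count++;
    return e;
}

/* Return the named socket a client with this UID/GID should talk to. */
const char *clerk_socket_for(struct clerk_table *t, uid_t uid, gid_t gid,
                             long new_pid, const char *user)
{
    const struct clerk_entry *e = clerk_find(t, uid, gid);
    if (e == NULL)
        e = clerk_add(t, uid, gid, new_pid, user);
    return e != NULL ? e->sock_name : NULL;
}
```

Two requests with the same UID and GID get the same socket name back, while
the same UID with a different GID gets a second clerk.  That is consistent
with seeing two "cdsclerk -U root -u 0" processes on one node.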

Yes, this context switching is time consuming, and the code to handle all
of the different call contexts is somewhat messy, but this was designed
in the days of a process only having a max of 64 file descriptors on Ultrix.
We are in the process of prototyping an inline clerk, similar to the
DCE NT V2.0 implementation.

>	382312534, association shutdown (dce / rpc)
>	What's that mean ?

This is a lower level RPC status that usually is not seen at the user
level.  Where does this show up?

--Roger

2246.6. "RE -.1" by OZROCK::THOMAN (For SII support dial 110 ! (OZY internal only)) Thu May 22 1997 01:16, 114 lines

	Sorry for the delay in getting back to you.

	Thanks for -.1 - that was some really useful, but dangerous
	info for us. It looks like we're going to have some real
	concerns in the near future. Our customer might have as 
	many as 800 clients trying to authenticate at each change
	of shift of its telephone consultants.

	At the moment we're trying to walk before running - I just
	tried 200 & got 25 failures due to
 
		"association shutdown (dce / rpc)"

	With these 200, 1 shell script started the test clients
	su(1)'ing them as UNIX users 
		
		test1,...,test100

	while a 2nd shell script started them in the reverse order.
	
	Hence AT MOST 2 application clients would've been sharing 
	1 cdsclerk.

	
	With this run of 200, I got NO "Error On Socket" problems
	but I can't say that 1 test run is conclusive, mainly because
	the machine is so much more heavily loaded. It's now running
	approx 105 cdsclerks, instead of approx 5. Also, there are 
	200 su(1) processes that aren't in existence when I run all
	tests under root.


>>	382312534, association shutdown (dce / rpc)
>>	What's that mean ?
>
>This is a lower level RPC status that usually is not seen at the user
>level.  Where does this show up?


	We get this as some (up to 10%) of the application clients try
	to authenticate. They all run on a DCE client machine.

	Below is the code that the client is running. I'm not very 
	familiar with the DCE API, and didn't write this code.


    /*
    **  Do the validation
    */
    if( privileged )
    {
        sec_login_valid_and_cert_ident(dceContext,
                                       &dcePassword,
                                       &passwordExpired,
                                       &dceAuthorisor,
                                       &dceStatus);
    }
    else
    {
        sec_login_validate_identity(dceContext,
                                    &dcePassword,
                                    &passwordExpired,
                                    &dceAuthorisor,
                                    &dceStatus);
        if( dceStatus == error_status_ok )
        {
            sec_login_certify_identity( dceContext,
                                       &dceStatus );
        }
    }

    if( (status = CHECK_DCE_STATUS( dceStatus )) != ZAP_OK )
    {
        sec_login_purge_context( &dceContext, &dceStatus );
        return status;
    }

    /*
    **  Make sure we were authenticated by the security server and
    **  don't just have local credentials.
    **
    **  No access at all if we can't establish network credentials.
    */
    if( dceAuthorisor != sec_login_auth_src_network )
    {
        sec_login_purge_context( &dceContext, &dceStatus );
        status = ZAP_NETWORK;
        return ZAP_REPORT( status, ZapSevError, ZAP_SOURCE_ZAP, status,
                           Z18_NO_NETWORK_AUTH );
    }




	That ZAP_REPORT() macro is causing the application client to print:

SI API: Error message from DCE
        382312534, association shutdown (dce / rpc)
        ../../../base/api/src/zapauthn.c, 457




	
	Thanks,

	Craig.






	

2246.7. "Does .6 help Roger?" by OZROCK::THOMAN (For SII support dial 110 ! (OZY internal only)) Wed May 28 1997 21:50, 10 lines
 
	Does my description in .6 explain why I get "association shutdown"?

	Can you tell me what it means ?


	Thanks,

	Craig.


2246.8. "Also, what's '336761021 Credentials cache I/O operation failed' mean?" by OZROCK::THOMAN (For SII support dial 110 ! (OZY internal only)) Fri May 30 1997 00:24, 7 lines

	The machine was under high load at the time - ie: 99% CPU
	in use.

	Thx

	Craig.

2246.9. "association rundown doesn't mean much per se" by TUXEDO::ZEE (There you go.) Fri May 30 1997 20:09, 13 lines

Looking through the RPC runtime code, rpc_s_assoc_shutdown gets returned
if the client side received a shutdown request from the server.  Since
you received this status from a sec_login* call, perhaps your security
server is overloaded?  Do you have any security replicas?  I'm grasping
a bit here, but I'll keep looking to see how a shutdown request gets
generated.
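
If the failures really are load-related, one workaround a client could try is
to retry the login sequence with a growing delay.  The sketch below is only
an illustration of that idea, not anything from the DCE runtime: the status
values are the ones quoted in this topic, but the macro names, the choice to
treat those statuses as transient, and the login_fn/sleep_fn callbacks are all
assumptions.

```c
#include <stddef.h>

/* Status values quoted in this topic; the macro names are made up. */
#define ST_OK              0UL
#define ST_CDS_SOCKET      282110316UL  /* Error with socket (dce / cds) */
#define ST_ASSOC_SHUTDOWN  382312534UL  /* association shutdown (dce / rpc) */
#define ST_CC_IO           336761021UL  /* Credentials cache I/O operation failed */

typedef unsigned long (*login_fn)(void *arg);  /* returns a DCE-style status */
typedef void (*sleep_fn)(unsigned ms);         /* caller-supplied delay */

/* Assumption: these three statuses are worth retrying under heavy load. */
int status_is_transient(unsigned long st)
{
    return st == ST_CDS_SOCKET || st == ST_ASSOC_SHUTDOWN || st == ST_CC_IO;
}

/* Delay before the next try: base_ms doubled per attempt, capped at cap_ms. */
unsigned backoff_ms(unsigned attempt, unsigned base_ms, unsigned cap_ms)
{
    unsigned d = base_ms;
    while (attempt-- > 0 && d < cap_ms)
        d *= 2;
    return d < cap_ms ? d : cap_ms;
}

/* Call login until it succeeds, fails permanently, or attempts run out. */
unsigned long login_with_retry(login_fn login, void *arg,
                               unsigned max_attempts, unsigned base_ms,
                               unsigned cap_ms, sleep_fn do_sleep)
{
    unsigned long st = ST_OK;
    unsigned attempt;

    if (max_attempts == 0)
        max_attempts = 1;               /* always try at least once */
    for (attempt = 0; attempt < max_attempts; attempt++) {
        st = login(arg);
        if (!status_is_transient(st))
            return st;                  /* success or a permanent error */
        if (attempt + 1 < max_attempts && do_sleep != NULL)
            do_sleep(backoff_ms(attempt, base_ms, cap_ms));
    }
    return st;                          /* last transient status seen */
}

/* Demo stand-in for the real login: fails *arg times, then succeeds. */
unsigned long demo_login(void *arg)
{
    int *failures_left = arg;
    if (*failures_left > 0) {
        (*failures_left)--;
        return ST_ASSOC_SHUTDOWN;
    }
    return ST_OK;
}
```

In a real client the login callback would wrap the sec_login_* sequence and
return its status.  Retrying spreads the load out in time but does not remove
it, so a security replica may still be needed for hundreds of clients.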

>Also, what's  "336761021 Credentials cache I/O operation failed"   mean?

That is the KRB5_CC_IO status code, which doesn't help much.  It looks
like a side effect of the unusually high load, perhaps a call timing out.

--Roger

2246.10. "Yes & Scale Testing Done ?" by OZROCK::THOMAN (For SII support dial 110 ! (OZY internal only)) Mon Jun 02 1997 04:48, 23 lines

>you received this status from a sec_login* call, perhaps your security
>server is overloaded?  

	That's most likely the case.


> Do you have any security replicas?  
	
	No. It's only a 2 (or 3) node cell.

	
By using 2 different UNIX user names to start the tests under I get
noticeable improvements. 


Has anyone in your team ever done any DCE CDS & Security scalability
testing? If so, is there a report, or a summary of the results, available
(for internal use only)?

	
		Thx

		Craig.