T.R | Title | User | Personal Name | Date | Lines |
---|
2246.1 | error implies cdsadv/cdsclerk problems | TUXEDO::ZEE | There you go. | Mon May 12 1997 12:06 | 17 |
| How many clients are we talking about? Are they running under the same
username? Is cdsadv still running, and how many cdsclerk processes do
you have?
The "error with socket" message implies a problem on the local node in
a process trying to talk to either cdsadv (on the initial CDS call), or
to its respective cdsclerk process.
> Any ideas on what we can "tweak" to get around this problem ?
Besides a redesign of cdsadv and cdsclerk, if it's just too many
processes overloading one socket (cdsLib named socket), there's not much
you can do. If it's too many processes overloading a
cdsclerk_<pid>_<username> socket, you could try running your clients
under different usernames.
--Roger
|
2246.2 | Some Answers.... | OZROCK::THOMAN | Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!" | Tue May 13 1997 03:05 | 62 |
|
>How many clients are we talking about?
as low as 10.
I don't know if this is a coincidence, but when I increased
the maxuser to 512, so I could increase max-threads-per-user
to 5000, I got an improvement, but I'm not sure what
side effects this has. Even at 5000 th/user, I still get 3
failures in 40 authen'ns.
>Are they running under the same username?
Yes...
>You could try running your clients under different usernames.
Does this mean the problem is significantly LESS likely
if I run them as different names ?
>overloading one socket (cdsLib named socket)
How can I look at the number of connections to that socket?
>cdsclerk_<pid>_<username> socket
... and look at these ones ?
> Is cdsadv still running, ...
Yes
>and how many cdsclerk processes do you have?
Only 4 - always, whether the tests are running or not.
There are 3 sessions logged into the machine (me twice
and root once), so I assume they account for 3, & there's
the usual:
cdsclerk -U DNS$SERVER -u 0 -m 0 ....
(Which always makes me believe you prefer VMS ;-) ?? )
We ** don't ** run SIA.
Thanks,
Craig.
|
2246.3 | Clarification... | OZROCK::THOMAN | Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!" | Tue May 13 1997 22:13 | 68 |
|
>>cdsclerk_<pid>_<username> socket, you could try running your clients
>>under different usernames.
By "different usernames", do you mean different UNIX usernames, or
different DCE principals?
I'm trying to get an understanding of how the cdsclerks are started...
At the moment, on a DCE client node (sec client & cds client), there are 4
cdsclerks:
---------------------------------------------------------------------------
ie:
/opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
/opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
/opt/dcelocal/bin/cdsclerk -U thoman -u 321 -m 0
/opt/dcelocal/bin/cdsclerk -U moored -u 397 -m 0
but only 3 logins:
------------------
# w
11:09 up 18:37, 3 users, load average: 0.00, 0.01, 0.00
User tty from login@ idle JCPU PCPU what
thoman p0 x.dec.c 16:34 1:14 207:10 1 -tcsh
thoman p1 x.dec.c 16:34 209:18 1 -tcsh
root p2 x.dec.c 16:38 17:22 58 -csh
On the DCE SERVER (both security & CDS) in my 2 node cell, there is:
---------------------------------------------------------------------
/opt/dcelocal/bin/cdsclerk -U DNS$SERVER -u 0 -m 0
/opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
/opt/dcelocal/bin/cdsclerk -U root -u 0 -m 0
and only 2 logins:
------------------
# w
11:10 up 18:38, 2 users, load average: 0.04, 0.02, 0.01
User tty from login@ idle JCPU PCPU what
root p0 x.dec.com 16:33 8 1 w
thoman p1 x.dec.com 16:34 18:17 -tcsh
Can you pls tell me how the clerk is triggered ?
Thanks,
Craig.
|
2246.4 | Under these load conditions, we also see: | OZROCK::THOMAN | Yoda on C++ to C programmers: "You Must Unlearn What You Have Learned!" | Wed May 14 1997 03:10 | 11 |
|
382312534, association shutdown (dce / rpc)
What's that mean ?
Thx
Craig.
|
2246.5 | answers to .2 and .3 and .4 | TUXEDO::ZEE | There you go. | Thu May 15 1997 18:08 | 75 |
| Note: All of the following only pertains to the DCE UNIX implementations.
DCE NT and DCE VMS have different clerk designs.
>>You could try running your clients under different usernames.
> Does this mean the problem is significantly LESS likely
> if I run them as different names ?
Perhaps, depending on which socket seems to be having the problem.
> How can I look at the number of connections to that socket?
I don't know how to do this directly, but a roundabout method is to use the
lsof (list open files) utility that is available in cyberspace.
As root, it will return a list of open file descriptors for each process
on the system. You can then grep for cdsLib. It might be difficult
to time it properly because the communication via the cdsLib socket
is short-lived. Let me know if you'd like lsof for Digital UNIX or OSF/1.
> By "different usernames", you mean different UNIX usernames, or
> different DCE principals.
I mean different UNIX usernames. Actually, it's really a UNIX system
UID/GID pair combination. So, two processes logged in as thoman will
most likely share the same cdsclerk process. Each process may have
different DCE credentials, but the cdsclerk process deals with that
properly. The reason you may see two cdsclerk processes for the same
user is probably that the GID of whatever running DCE application
is different between the two. You can do an
ls -lg /opt/dcelocal/var/adm/directory/cds
to see a list of the named sockets.
>Can you pls tell me how the clerk is triggered ?
1. The cdsadv starts and a thread listens over the cdsLib socket in
/opt/dcelocal/var/adm/directory/cds.
2. Someone starts up a DCE application. In libdce.so, the first CDS
library call attempts to connect to the cdsLib socket and passes
along the UID/GID info. Perhaps the socket error occurs here if
there are many (don't know how many - depends on the system configuration)
concurrent *initial* CDS calls amongst the processes.
3. The cdsadv listener thread keeps a list of existing cdsclerk processes
and their respective UID/GID attribute.
a. If nothing matches, cdsadv will fork and exec a cdsclerk process
and hand both the cdsclerk process and the DCE application a
unique named socket (cdsclerk_<pid>_<user>) over which they will
then communicate forever more as the DCE application will then
close its socket to cdsLib.
b. If there is a match, cdsadv returns the corresponding named socket
to the DCE application, over which they will communicate forever more
as the application will then close its socket to cdsLib.
Perhaps the socket error is returned on this named socket, although
the error in 2. is more likely.
Yes, this context switching is time consuming, and the code to handle all
of the different call contexts is somewhat messy, but this was designed
in the days of a process only having a max of 64 file descriptors on Ultrix.
We are in the process of prototyping an inline clerk, similar to the
DCE NT V2.0 implementation.
> 382312534, association shutdown (dce / rpc)
> What's that mean ?
This is a lower level RPC status that usually is not seen at the user
level. Where does this show up?
--Roger
|
2246.6 | RE -.1 | OZROCK::THOMAN | For SII support dial 110 ! (OZY internal only) | Thu May 22 1997 01:16 | 114 |
| Sorry for the delay in getting back to you.
Thanks for -.1 - that was some really useful, but dangerous
info for us. It looks like we're going to have some real
concerns in the near future. Our customer might have as
many as 800 clients trying to authenticate at each change
of shift of its telephone consultants.
At the moment we're trying to walk before running - I just
tried 200 & got 25 failures due to
"association shutdown (dce / rpc)"
With these 200, 1 shell script started the test clients
su(1)'ing them as UNIX users
test1,...,test100
while a 2nd shell script started them in the reverse order.
Hence AT-MOST 2 application clients would've been sharing
1 cdsclerk.
With this run of 200, I got NO "Error On Socket" problems
but I can't say that 1 test run is conclusive, mainly because
the machine is so much more heavily loaded. It's now running
approx 105 cdsclerks, instead of approx 5. Also, there's
200 su(1) processes that aren't in existence when I run all
tests under root.
>> 382312534, association shutdown (dce / rpc)
>> What's that mean ?
>
>This is a lower level RPC status that usually is not seen at the user
>level. Where does this show up?
We get this as some (up to 10%) of the application clients try
to authenticate. They all run on a DCE client machine.
Below is the code that the client is running. I'm not very
familiar with the DCE-API, and didn't write this code.
/*
** Do the validation
*/
if( privileged )
{
    sec_login_valid_and_cert_ident(dceContext,
                                   &dcePassword,
                                   &passwordExpired,
                                   &dceAuthorisor,
                                   &dceStatus);
}
else
{
    sec_login_validate_identity(dceContext,
                                &dcePassword,
                                &passwordExpired,
                                &dceAuthorisor,
                                &dceStatus);
    if( dceStatus == error_status_ok )
    {
        sec_login_certify_identity( dceContext,
                                    &dceStatus );
    }
}
if( (status = CHECK_DCE_STATUS( dceStatus )) != ZAP_OK )
{
    sec_login_purge_context( &dceContext, &dceStatus );
    return status;
}
/*
** Make sure we were authenticated by the security server and
** don't just have local credentials.
**
** No access at all if we can't establish network credentials.
*/
if( dceAuthorisor != sec_login_auth_src_network )
{
    sec_login_purge_context( &dceContext, &dceStatus );
    status = ZAP_NETWORK;
    return ZAP_REPORT( status, ZapSevError, ZAP_SOURCE_ZAP, status,
                       Z18_NO_NETWORK_AUTH );
}
That ZAP_REPORT() macro is causing the appl'n client to print:
SI API: Error message from DCE
382312534, association shutdown (dce / rpc)
../../../base/api/src/zapauthn.c, 457
Thanks,
Craig.
|
2246.7 | Does .6 help Roger? | OZROCK::THOMAN | For SII support dial 110 ! (OZY internal only) | Wed May 28 1997 21:50 | 10 |
|
Does my description in .6 explain why I get "association shutdown"?
Can you tell me what it means ?
Thanks,
Craig.
|
2246.8 | Also, what's "336761021 Credentials cache I/O operation failed" mean? | OZROCK::THOMAN | For SII support dial 110 ! (OZY internal only) | Fri May 30 1997 00:24 | 7 |
|
The machine was under high load at the time - ie: 99% CPU
in use.
Thx
Craig.
|
2246.9 | association rundown doesn't mean much per se | TUXEDO::ZEE | There you go. | Fri May 30 1997 20:09 | 13 |
| Looking through the RPC runtime code, rpc_s_assoc_shutdown gets returned
if the client side received a shutdown request from the server. Since
you received this status from a sec_login* call, perhaps your security
server is overloaded? Do you have any security replicas? I'm grasping
a bit here, but I'll keep looking to see how a shutdown request gets
generated.
>Also, what's "336761021 Credentials cache I/O operation failed" mean?
That is the KRB5_CC_IO status code, which doesn't help much. It looks
like a side effect of the unusually high load, perhaps a call timing out.
--Roger
|
2246.10 | Yes & Scale Testing Done ? | OZROCK::THOMAN | For SII support dial 110 ! (OZY internal only) | Mon Jun 02 1997 04:48 | 23 |
| >you received this status from a sec_login* call, perhaps your security
>server is overloaded?
That's most likely the case.
> Do you have any security replicas?
No. It's only a 2 (or 3) node cell.
By using 2 different UNIX user names to start the tests under I get
noticeable improvements.
Has anyone in your team ever done any DCE CDS & Security Scalability
Testing ? If so, is there a report, or a summary of the results available
(for internal use only) ?
Thx
Craig.
|