T.R | Title | User | Personal Name | Date | Lines |
---|
2142.1 | Xref | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Wed Mar 12 1997 10:47 | 2 |
| 441 GIDDAY::CHONG 17-DEC-1992 3 dnascd core dumps
1140 WPOPTH::ZAMBOTTI 14-JUL-1994 10 dnascd core dumps on any remote login (conflict with ENHANCED security)
|
2142.2 | | UPSAR::WALLACE | Digital: A Dilbertian Company | Wed Mar 12 1997 11:02 | 4 |
| There have been several fixes for dnascd core dumps, which I believe
are in DECnet V3.2B. You should install 3.2B, or at least grab
the dnascd executable from a 3.2B system. -- Vince
|
2142.3 | | COMICS::HESS | | Wed Mar 12 1997 11:10 | 8 |
| Hi,
Thanks for the replies. I did check the 3.2B release notes but did not
see anything relevant, and the site has already indicated that they are
not willing to upgrade on the off chance that it will fix the issue.
However, as you indicate there are fixes in this area, I will attempt to
persuade them otherwise.
Pete
|
2142.4 | | COMICS::HESS | | Wed Mar 12 1997 11:49 | 4 |
| Is it just the dnascd executable that is needed, or is anything else
required? I think they will try that before going to 3.2B, as it will
take some time for them to upgrade due to change constraints.
Pete
|
2142.5 | | UPSAR::WALLACE | Digital: A Dilbertian Company | Thu Mar 13 1997 13:36 | 4 |
| I can't think of anything that changed from 3.2A -> 3.2B that
would cause incompatibilities for dnascd. You should be able
to just upgrade that one executable. -- Vince
|
2142.6 | | COMICS::HESS | | Wed Apr 23 1997 13:33 | 8 |
| Well, the upgrade to 3.2B has been done, but the problem still
remains. They have noticed, though, that when the problem occurs free
memory is down to 0. Restarting DECnet enables logins again. I am
awaiting more information but have advised them to reduce ubc-maxpercent
from 100 to 60.
Any comments?
Thanks
Pete
|
2142.7 | | COMICS::HESS | | Tue May 13 1997 08:04 | 11 |
| Hi,
With ubc-maxpercent now at 40 the problem is still there, although
less frequent. The main issue is having to restart DECnet each time
to clear the problem; as this is a production environment, this is a
real issue for the customer. I will raise this as an IPMT. I have
extracts from the daemon.log showing the segmentation faults. Is there
anything else that would be useful to provide, e.g. will we need to
run dnascd in debug?
Thanks for any advice.
Pete.
|
2142.8 | some other resource?? | KITCHE::schott | Eric R. Schott USG Product Management | Tue May 13 1997 09:55 | 13 |
| Hi
I don't know what ubc-maxpercent would have to do with this... I suggest
you put it back. Are they running out of swap space, or some other
kernel resource?
Have you run sys_check on the system?
see
http://www-unix.zk3.dec.com/tuning/tools/sys_check/sys_check.html
|
2142.9 | | DRAGNS::WALLACE | | Tue May 13 1997 14:02 | 21 |
| Hi,
You really need to do some more problem isolation at the customer
site. This problem does not sound familiar, and we have done
some fairly rigorous load tests on DECnet.
Are you monitoring system resources, like memory and swap space?
You say you restart DECnet. I assume that means running
decnetshutdown/decnetstartup. What about less drastic
measures, i.e. just cycling various parts of DECnet, e.g.
OSI transport, routing, session control, etc.?
Can you establish outgoing connections from the system ?
Do incoming connections other than dlogin work (eg dcp) ?
Is x25 being used on the system ?
Vince
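
One way to capture the memory and swap figures asked about above is to log them periodically so they can be lined up against the time of the next failure. This is only a rough sketch: the function name `snapshot`, the log path, and the choice of `vmstat` and `swapon -s` as the commands to sample are assumptions, not something specified in this thread.

```shell
#!/bin/sh
# snapshot LOGFILE COMMAND [ARGS...]
# Hypothetical helper: appends a timestamped copy of COMMAND's output
# to LOGFILE so resource usage can be compared with failure times.
snapshot() {
    log=$1
    shift
    {
        echo "=== $(date) : $*"
        "$@" 2>&1
    } >> "$log"
}

# Example loop (commented out; the commands and interval are assumptions):
# while :; do
#     snapshot /tmp/resources.log vmstat
#     snapshot /tmp/resources.log swapon -s
#     sleep 300
# done
```

Leaving the loop running until the next login failure should show whether memory or swap is actually exhausted at that moment.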
|
2142.10 | | COMICS::HESS | | Wed May 21 1997 10:50 | 57 |
| Thanks for the input; however, things get worse. The Polycentre
performance monitor indicates that memory utilisation is around 70-80 percent
with no swapping. So far only shutting down DECnet (decnetshutdown) seems to
clear the problem. Trying to log in to the system from anywhere gives
Login information invalid at remote node.
(we are no longer seeing node unreachable)
dcp fails
outgoing works fine
incoming using explicit user and password fails also.
X25 is not being used on this system
The dna processes are in the following states
dnansd IW
dnalimd Iw
dnaevld I
dnaksd IW
dnascd S
dnanoded IW
There are no core dumps produced, and an extract from the daemon log
shows the entries for decnet as follows along with the error.
LNFSL2 >copy/log login.com lnxcm3"dcs_sh XXXXXXXXXX"::
%COPY-E-OPENOUT, error opening LNXCM3"dcs_sh password"::[]LOGIN.COM;3 as output
-RMS-E-CRE, ACP file create failed
-SYSTEM-F-INVLOGIN, login information invalid at remote node
%COPY-W-NOTCOPIED, SYS$SYSDEVICE:[OPS.OPS_HOPKINS]LOGIN.COM;3 not copied
LNFSL2 >set host lnxcm3
%SYSTEM-F-INVLOGIN, login information invalid at remote node
LNFSL2 >
Looking in the daemon.log, here are the last DECnet entries in it:
May 15 19:56:32 LNGIBX0010G fal[15904]: DIRECTORY access from LOCAL:.LNFSL2::uic=[0,0]LED_OPER, user=g1_copy, directory=/disk1/pickbackup, filename=/pickbackup/RC*.*
May 15 19:56:32 LNGIBX0010G fal[23332]: DIRECTORY access from LOCAL:.LNFSL2::uic=[0,0]LED_OPER, user=g1_copy, directory=/disk1/pickbackup, filename=/pickbackup/G1PROUT.DAT
May 15 19:56:32 LNGIBX0010G dnascd[21279]: Process exit (PID 23332).
May 15 19:56:32 LNGIBX0010G dnascd[21279]: Process exit (PID 15904).
May 15 19:56:32 LNGIBX0010G fal[27096]: DIRECTORY access from LOCAL:.LNFSL2::uic=[0,0]LED_OPER, user=g1_copy, directory=/disk1/pickbackup, filename=/pickbackup/G1CCNTRY.???
May 15 20:20:50 LNGIBX0010G netacl[28304]: permit host=hbsltw0002.btco.com/138.93.213.204 service=telnetd execute=/usr/sbin/telnetd
I am really at a loss as to what to check next. Any ideas?
Thanks
Pete
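
For pulling only the DECnet entries out of a busy daemon.log, a small filter along these lines may help. It is a sketch under assumptions: the function name `decnet_log_entries` is made up, the log path has to be passed in because its location is not given here, and the program-name pattern (fal plus anything starting with dna) is inferred from the daemon names listed above.

```shell
#!/bin/sh
# decnet_log_entries LOGFILE
# Hypothetical helper: prints syslog-style lines whose program field
# (5th whitespace-separated field, e.g. "dnascd[21279]:") belongs to a
# DECnet daemon. The name pattern is an assumption based on this thread.
decnet_log_entries() {
    awk '$5 ~ /^(fal|dna[a-z]+)\[[0-9]+\]:$/' "$1"
}

# Example (path is a placeholder):
# decnet_log_entries /path/to/daemon.log | tail -50
```

This drops unrelated entries such as the netacl/telnetd line, which makes it easier to see what dnascd logged just before logins started failing.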
|
2142.11 | | DRAGNS::WALLACE | | Wed May 21 1997 15:00 | 35 |
| Use ncl to look at session:
ncl> show session control all
ncl> show session control appl fal all
The counters might help indicate what the problem is.
If that doesn't help you could try running dnascd with some
debug options:
1) Edit /etc/cml.conf and change the line
8 lim /usr/sbin/dnascd
to
8 /usr/sbin/dnascd
2) Send a hangup signal to dnalimd, ie "kill -HUP #" where '#'
is the pid of dnalimd
3) Kill the current dnascd process
4) Manually start dnascd. You might first try
/usr/sbin/dnascd -logfile /tmp/log
and if nothing in the log file helps, kill dnascd again & try
/usr/sbin/dnascd -debug -verbose
This second version will print lots of messages to your terminal
5) In either case, after manually starting dnascd you have to
issue the necessary ncl commands to start session:
ncl create session control
ncl enable session control
Hopefully either the logging or debug messages will give enough
information to figure out why the connect is failing.
Vince
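
Steps 2 to 5 above could be collected into a small script so they are run in the same order each time. This is only a sketch: the names `run` and `debug_dnascd` are made up, the pids must be looked up by hand, the cml.conf edit in step 1 is left as a manual step, and running dnascd in the background with `&` is an assumption. The three dnascd flags are the ones quoted in this thread, combined on one command line. By default nothing is executed; the commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run helper: print each command, execute it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# debug_dnascd DNALIMD_PID DNASCD_PID [LOGFILE]
# Hypothetical wrapper for steps 2-5; step 1 (editing /etc/cml.conf)
# must already have been done by hand.
debug_dnascd() {
    limd_pid=$1
    scd_pid=$2
    logfile=${3:-/tmp/log}

    run kill -HUP "$limd_pid"      # step 2: hangup signal to dnalimd
    run kill "$scd_pid"            # step 3: stop the current dnascd

    # step 4: start dnascd by hand with debug output and a log file
    # (backgrounding it here is an assumption)
    echo "+ /usr/sbin/dnascd -debug -verbose -logfile $logfile &"
    if [ "${RUN:-0}" = "1" ]; then
        /usr/sbin/dnascd -debug -verbose -logfile "$logfile" &
    fi

    run ncl create session control # step 5: bring session control back
    run ncl enable session control
}

# Example: debug_dnascd 1234 5678 /tmp/log       (prints the commands)
#          RUN=1 debug_dnascd 1234 5678 /tmp/log (executes them)
```

Reviewing the printed commands first is cheap insurance on a production system before setting RUN=1.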
|
2142.12 | | COMICS::HESS | | Fri May 23 1997 06:16 | 14 |
| Vince, thanks for the reply,
The NCL counters did not reveal any problems. Issuing the hangup to
dnalimd caused dnascd to core dump; is this expected? dnascd is now started
with logging enabled, but nothing is being written to the log file.
Will this only log errors?
After restarting dnascd there was no problem.
It seems the next step is to run dnascd in debug as you suggest. As the problem
seems to occur more at night (but not always), they will direct the output to a
file. Does this produce a lot of data?
Also, this time they saw segmentation faults as well, but these do not
always occur.
Pete
|
2142.13 | | DRAGNS::WALLACE | | Fri May 23 1997 14:36 | 13 |
| Hi,
Sorry, it looks like you also have to specify -debug & -verbose when
you specify a log file. It should be recording various operations
that it performs in response to connect requests or other events.
It shouldn't core dump when you send the HUP signal to dnalimd.
For clarification, are you saying that the problem went away
when you restarted dnascd?
Vince
|
2142.14 | | COMICS::HESS | | Tue May 27 1997 09:20 | 7 |
| Vince,
Thanks. To confirm: yes, restarting dnascd resolved the issue
temporarily. I guess we will have to make the logging permanent to
capture this?
Pete
|
2142.15 | | DRAGNS::WALLACE | | Tue May 27 1997 14:22 | 13 |
| Hi,
Well, it seems clear enough that dnascd is getting screwed up
somehow. The following invocation on my system produces output
both to the log file and to the terminal from which I run dnascd:
./dnascd -debug -verbose -logfile /tmp/log
BTW, did you open an IPMT case on this (CFS.51442) ?
Vince
|
2142.16 | | COMICS::HESS | | Wed May 28 1997 05:18 | 5 |
| Vince,
Yes, I have raised an IPMT; this is now getting critical for the
customer. I will get the information you requested on the case.
Pete
|