T.R | Title | User | Personal Name | Date | Lines |
---|
2121.1 | | UPSAR::WALLACE | Digital: A Dilbertian Company | Thu Jan 30 1997 13:02 | 19 |
| Are we talking ULTRIX or OSF here? On ULTRIX, dlogind does a
"kill(0, SIGKILL)". On OSF, it doesn't send any signal. However,
it does do a "revoke()" on the pty, which apparently telnetd
doesn't do.
The man page for ps says:
[Digital] The system puts exiting child processes in the <defunct>
state if their parent process is still running and has not caught the
SIGCHLD signal or executed a wait() system call.
I would expect the parent of these problem processes to be some
shell, not dlogind. In any event, dlogind does catch SIGCHLD
signals and does a waitpid() in the handler routine.
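To illustrate the pattern described above (catch SIGCHLD and call waitpid() in the handler so that exited children do not stay <defunct>), here is a minimal C sketch. It is not dlogind's source; the names spawn_and_reap, on_sigchld and reaped_code are made up for the example, and it assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exit code of the last child reaped by the handler, or -1 if none yet. */
static volatile sig_atomic_t reaped_code = -1;

/* SIGCHLD handler: collect every exited child so none stays <defunct>. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;
    int status;

    (void)signo;
    while (waitpid(-1, &status, WNOHANG) > 0) {
        if (WIFEXITED(status))
            reaped_code = WEXITSTATUS(status);
    }
    errno = saved_errno;
}

/* Fork a child that exits with `code`; return the code the handler saw. */
int spawn_and_reap(int code)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    pid_t pid;

    assert(code >= 0 && code <= 255);

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Hold SIGCHLD back until sigsuspend(), so the wakeup cannot be lost. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    reaped_code = -1;
    pid = fork();
    if (pid == 0)
        _exit(code);            /* child: exit at once */
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* Parent: sleep until the handler has reaped the child. */
    while (reaped_code == -1)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped_code;
}
```

Calling spawn_and_reap(7) should return 7 once the handler has run. Note that the handler reaps any child of the process, not just the one forked here, which is fine for a sketch but may need refinement in a real daemon.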
Hope this helps.
vince
|
2121.2 | applic programming bug or init bug with revoke() ? | PANTER::MARTIN | Be vigilant... | Fri Jan 31 1997 05:28 | 37 |
| Hi Vince,
>> Are we talking ULTRIX or OSF here?
Sorry, I forgot to mention that it was on Digital UNIX v3.2x!
>> On OSF, it doesn't send any signal. However, it does
>> do a "revoke()" on the pty, which apparently telnetd
>> doesn't do.
Interesting !
>> I would expect the parent of these problem processes to be some
>> shell, not dlogind. In any event, dlogind does catch SIGCHLD
>> signals and does a waitpid() in the handler routine.
You are correct, the parent process of the application is a shell
(I think it's "sh" but I should check...)
But if dlogind does a revoke() on the pty (I don't know exactly
what that means in terms of the child-parent relationship), how
would you explain that the child of the shell (the application)
goes into the <defunct> state when both dlogind and the parent
shell process disappear?
I would expect the application to become a child of the "init"
process and be "cleaned up" by "init"'s SIGCHLD signal handler!?
As I mentioned in .0, it does work for LAT and telnet connections:
all the processes are properly killed.
So do you think it's an "init" bug with revoke() or could it be related
to the applic programming ?
Thanks for your help,
============================
Alain MARTIN/SSG Switzerland
|
2121.3 | | netrix.lkg.dec.com::thomas | The Code Warrior | Fri Jan 31 1997 09:14 | 3 |
| A guess is that the revoke is preventing the SIGHUP from being sent to
the foreground process group. dlogind should do a kill(0, SIGHUP)
before doing the revoke (and maybe a sleep(1) too).
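The ordering suggested here (SIGHUP to the process group, a short pause, then the revoke) could be sketched in C roughly as follows. This is only an illustration and not the code of the modified dlogind. revoke() is not available on every system, so the sketch takes it as a hypothetical callback (revoke_fn); fake_revoke, demo_hangup_order and the pty path are likewise invented for the example. The demo runs everything in a separate process group so the signal cannot reach the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook: on a system that has revoke(2), pass revoke here. */
typedef int (*revoke_fn)(const char *path);

/* SIGHUP our own process group first, pause, then revoke the pty. */
int hangup_then_revoke(const char *pty_path, revoke_fn do_revoke)
{
    struct sigaction ign, saved;
    int rc;

    assert(pty_path != NULL && do_revoke != NULL);

    /* Ignore SIGHUP ourselves: kill(0, ...) also targets the caller. */
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    if (sigaction(SIGHUP, &ign, &saved) == -1)
        return -1;

    rc = kill(0, SIGHUP);
    if (rc == 0) {
        sleep(1);                   /* the suggested grace period */
        rc = do_revoke(pty_path);
    }
    sigaction(SIGHUP, &saved, NULL);
    return rc;
}

/* Stand-in for revoke(2) so the sketch builds where revoke() is missing. */
static int fake_revoke(const char *path)
{
    (void)path;
    return 0;
}

/* Returns 1 if a worker in the same group was terminated by SIGHUP. */
int demo_hangup_order(void)
{
    int status;
    pid_t worker, leader = fork();

    if (leader == -1)
        return -1;
    if (leader == 0) {
        signal(SIGHUP, SIG_DFL);    /* worker inherits the default action */
        if (setpgid(0, 0) == -1)    /* own group: caller stays untouched */
            _exit(2);
        worker = fork();
        if (worker == -1)
            _exit(2);
        if (worker == 0) {
            for (;;)
                pause();            /* wait to be hung up */
        }
        if (hangup_then_revoke("/dev/hypothetical-pty", fake_revoke) != 0)
            _exit(2);
        if (waitpid(worker, &status, 0) != worker)
            _exit(2);
        _exit(WIFSIGNALED(status) && WTERMSIG(status) == SIGHUP ? 0 : 1);
    }
    if (waitpid(leader, &status, 0) != leader)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

demo_hangup_order() should return 1 when the worker was terminated by SIGHUP. A process that ignores or catches SIGHUP would survive this, so the sketch only shows the ordering and does not guarantee cleanup.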
|
2121.4 | | UPSAR::WALLACE | Digital: A Dilbertian Company | Mon Feb 03 1997 16:09 | 3 |
| I'm willing to code up Matt's suggestion if the customer is
willing to give it a try. -- Vince
|
2121.5 | let me know when dlogind is modified... | LEMAN::MARTIN_A | Be vigilant... | Wed Feb 05 1997 03:58 | 6 |
| I can ask the customer to test it if you do modify dlogind...
Let me know when the fix is available.
Cheers, ============================
Alain MARTIN/SSG Switzerland
|
2121.6 | | UPSAR::WALLACE | Digital: A Dilbertian Company | Fri Feb 07 1997 10:28 | 7 |
| Hi,
Copy over netrix::test/dlogind.note2121.Z and give it a try. Don't
forget dlogind needs to be setuid root.
Vince
|
2121.7 | modified "dlogind" does not help ! | PANTER::MARTIN | Be vigilant... | Wed Mar 19 1997 10:20 | 27 |
| Hi Vince,
We tried the modified dlogind and 8-( the result is the same...
When the DECnet connection closes at the remote side, dlogind
disappears, but the process launched at login time by .profile (a
Korn shell script doing anything at all) becomes a child of "init"
instead of being killed like its parent (dlogind).
We discovered, however, that changing the user's login shell from
ksh to csh changes the behaviour. In that case all the descendant
processes of dlogind disappear.
But using the same user account with ksh over telnet or LAT
connections does not show the symptom (all the descendant
processes disappear too).
Any idea ?
dlogind or ksh bug ????
Sorry to be so late in answering, but the customer was not ready to
test, as the workaround we provided (connecting through LAT) does work!
Cheers,
Alain
|
2121.8 | | UPSAR::WALLACE | Digital: A Dilbertian Company | Thu Mar 20 1997 16:37 | 14 |
| Hi,
That's a good clue about the ksh. We've run into differences between
shells before.
I can get a backgrounded process to hang around after dlogind exits,
but it does not run out of control. It just continues to function
normally.
Can you get a copy of the .profile file? And what happens if you
kill dlogind, rather than breaking the network connection?
Vince
|
2121.9 | .profile & more info... | PANTER::MARTIN | Be vigilant... | Fri Mar 21 1997 04:02 | 78 |
| Hi Vince,
>> Can you get a copy of the .profile file?
I just used the Digital UNIX template and launched a simple ksh
script at the end of it (the customer's version runs an application
on top of an Oracle database):
#
# *****************************************************************
# * *
# * Copyright (c) Digital Equipment Corporation, 1991, 1995 *
# * *
# * All Rights Reserved. Unpublished rights reserved under *
# * the copyright laws of the United States. *
# * *
# * The software contained on this media is proprietary to *
# * and embodies the confidential technology of Digital *
# * Equipment Corporation. Possession, use, duplication or *
# * dissemination of the software and media is authorized only *
# * pursuant to a valid written license from Digital Equipment *
# * Corporation. *
# * *
# * RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure *
# * by the U.S. Government is subject to restrictions as set *
# * forth in Subparagraph (c)(1)(ii) of DFARS 252.227-7013, *
# * or in FAR 52.227-19, as applicable. *
# *
# *****************************************************************
#
# HISTORY
#
# @(#)$RCSfile: .profile,v $ $Revision: 4.1.3.4 $ (DEC) $Date: 1992/09/30 13:49:15 $
#
PATH=$HOME/bin:${PATH:-/usr/bin:.}
export PATH
stty dec
tset -I -Q
PS1="`hostname`> "
MAIL=/usr/spool/mail/$USER
./sleep_600.ksh
Here is my sleep_600.ksh
#!/bin/ksh
echo "Sleeping for 10 min."
sleep 600
>> And what happens if you kill dlogind, rather than breaking
>> the network connection?
Exactly the same. When ksh is the user's login shell, ksh becomes a
child of the init process instead of being killed by signal 9 (KILL),
as if the ksh login shell had never received the SIGKILL signal.
So the ksh process survives, and all its descendant processes
survive as well.
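The behaviour described here, where a process outlives its parent and is adopted by another process (traditionally init), can be reproduced with a small C sketch. This is an illustration under POSIX assumptions, not a reproduction of the dlogind/ksh case: the "shell" and "app" processes are stand-ins, and the function name orphan_is_reparented is invented. It only checks that the parent pid changes, since the adopting process is not pid 1 on every system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Returns 1 if a process whose parent exited saw its parent pid change,
 * 0 if it did not within about 5 seconds, -1 on error.
 * "shell" and "app" are stand-ins, not the real ksh or application.
 */
int orphan_is_reparented(void)
{
    int fds[2];
    char flag = 0;
    pid_t shell, app;

    if (pipe(fds) == -1)
        return -1;

    shell = fork();
    if (shell == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (shell == 0) {
        pid_t shell_pid = getpid();

        close(fds[0]);
        app = fork();
        if (app == 0) {
            struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
            int tries;

            /* Poll until the original parent is gone and we are adopted. */
            for (tries = 0; tries < 500 && getppid() == shell_pid; tries++)
                nanosleep(&tick, NULL);
            flag = (getppid() != shell_pid);
            if (write(fds[1], &flag, 1) != 1)
                _exit(1);
            _exit(0);
        }
        _exit(app == -1 ? 1 : 0);   /* the "shell" exits without waiting */
    }

    close(fds[1]);
    if (waitpid(shell, NULL, 0) != shell) {
        close(fds[0]);
        return -1;
    }
    /* The orphan may still be running here; read() blocks until it reports. */
    if (read(fds[0], &flag, 1) != 1) {
        close(fds[0]);
        return -1;
    }
    close(fds[0]);
    assert(flag == 0 || flag == 1);
    return flag ? 1 : 0;
}
```

The orphan keeps running after its parent is gone, which matches the symptom above: unless something sends it a signal it does not ignore, nothing terminates it.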
With a csh login shell, when we either break the connection at the
remote side (VMS machine) or kill dlogind manually (SIGKILL), all
the descendant processes of dlogind are killed.
That's what we expect !
We tested your modified dlogind (netrix::test/dlogind.note2121.Z)
on both v3.2c and v3.2g, but we haven't noticed any difference.
Thanks again for your help. Do you think it's time for an IPMT case?
(I understand you cannot work too long on unofficial requests,
and so do we...) I would like to have a better idea of where the
problem is (dlogind or ksh bug???) before filling out the IPMT form.
Cheers,
Alain
|
2121.10 | | UPSAR::WALLACE | Digital: A Dilbertian Company | Fri Mar 21 1997 13:32 | 6 |
| I think it is probably time to open an IPMT case. Since things
seem to work OK with lat & telnet, open the case against DECnet.
If it turns out to be ksh after all we'll forward it up to OSG.
Vince
|
2121.11 | Ok, I'll fill up an IPMT form against DECnet. | PANTER::MARTIN | Be vigilant... | Tue Mar 25 1997 03:32 | 10 |
| Hi Vince,
I'll open an IPMT then against DECnet, but will wait until the
onsite engineer comes back from holiday (next week) to collect
all the necessary info.
Thanks for your help.
Alain
|