| Title: | NAS Message Queuing Bus |
| Notice: | KITS/DOC, see 4.*; Entering QARs, see 9.1; Register in 10 |
| Moderator: | PAMSRC::MARCUS EN |
| Created: | Wed Feb 27 1991 |
| Last Modified: | Thu Jun 05 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 2898 |
| Total number of notes: | 12363 |
I'm seeing
ld, caught signal 11
in my group log file regularly. After reading Note
2718 of this conference, I'll use "sjz"'s method of
giving you the versions of the DMQ processes:
# su - dmq
% what `which dmqgcp` | grep DECmessageQ
DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
% what `which dmqld` | grep DECmessageQ
DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
% what `which dmqbcp` | grep DECmessageQ
DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
% what `which dmqqe` | grep DECmessageQ
DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
# setld -i | grep DMA
DMACL32A installed DECmessageQ Client Library
DMACLS32A installed DECmessageQ Client Library Server
DMADEV32A installed DECmessageQ Development Environment
DMAEXA32A installed DECmessageQ Example Programs
DMAMAN32A installed DECmessageQ Manual Reference Pages
DMARLS32A installed DECmessageQ Release Notes
DMARTO32A installed DECmessageQ Run Time Environment
# uname -a
OSF1 mynode.ozy.dec.com V3.2 148 alpha
The symptoms are that a 2nd node is continually trying to
connect but is being denied 'cause the group init file
on the node experiencing the crashes doesn't have the
2nd node in its "XGROUP SECTION" and does have XGROUP_VERIFY
set to YES.
The solution for now is to fix the group init file, but
is there a known resource leak in this process that results
in the dmqld crashing?
I can't find any core files to give you.
Craig.
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 2751.1 | | XHOST::SJZ | Rocking the Messaging Desktop ! | Thu Jan 30 1997 21:34 | 7 |
resource leaks don't necessarily cause a signal 11 (SIGSEGV,
or segmentation fault (also known as a GPF or accvio depending
on which world you grew up in)).
Bad programming causes a signal 11. We will look into it.
_sjz.
| |||||
| 2751.2 | | XHOST::SJZ | Rocking the Messaging Desktop ! | Thu Jan 30 1997 21:37 | 12 |
one more note, when you do the setld -i | grep thing
we usually instruct people to search for "DECmessageQ"
rather than "DMA" since the 'A' in "DMA" changes by
platform.
# setld -i | grep DECmessageQ
It produces the exact same results in this case, but
it is good to know for general troubleshooting.
_sjz.
| |||||
| 2751.3 | Ok - Your Move | OZROCK::THOMAN | The House Of Script | Sun Feb 02 1997 20:19 | 24 |
RE .1
> ................ We will look into it.
Thanks.
RE .2
> rather than "DMA" since the 'A' in "DMA" changes by
> platform.
>
> # setld -i | grep DECmessageQ
I've been tricked by that "floating letter" b4. In future I'll
use what you showed.
Thx
C.
| |||||
| 2751.4 | More details from the grp log file | OZROCK::THOMAN | The House Of Script | Mon Feb 03 1997 08:28 | 71 |
I just noticed the following in our group log file on the
offending node:
ld, link listener for group 1073756384 is running
The group numbers involved are both in the two thousands!
Can you pls tell me what the (n.m) means in the lines that are
displayed in the log files?
If it is the PID of the dmq process, then I don't understand how
the number is being reassigned to the new dmqld process in the
following example:
************ dmqld (6163.0) 03-FEB-1997 10:54:56 ************
ld, link sender for group 2966 to group 2808 is exiting
************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
ld, link receiver for group 2966 from group 2808 is exiting
************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
ld, caught signal 11
************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
ld, link receiver for group 2966 from group 2808 is exiting
************ dmqld (23269.0) 03-FEB-1997 10:55:06 ************
ld, duplicate link receiver for group 2803
************ dmqld (23269.0) 03-FEB-1997 10:55:06 ************
ld, initialization failure
************ dmqld (23269.0) 03-FEB-1997 10:55:16 ************
ld, duplicate link receiver for group 2803
************ dmqld (23269.0) 03-FEB-1997 10:55:16 ************
ld, initialization failure
************ dmqld (23269.0) 03-FEB-1997 10:55:24 ************
ld, link receiver for group 2966 from group 2808 is running
************ dmqld (23269.0) 03-FEB-1997 10:55:24 ************
ld, link receiver for group 2966 is connected to group 2808
************ dmqld (24282.0) 03-FEB-1997 10:55:24 ************
ld, link sender for group 2966 to group 2808 is running
************ dmqld (24282.0) 03-FEB-1997 10:55:24 ************
ld, link sender for group 2966 is connected to group 2808
************ dmqld (23269.0) 03-FEB-1997 10:55:26 ************
ld, duplicate link receiver for group 2803
************ dmqld (23269.0) 03-FEB-1997 10:55:26 ************
Thanks
Craig.
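For anyone wanting to sort through a group log like the excerpt above, here is a minimal Python sketch that pairs each banner line with the message that follows it and groups the messages by the "(n.m)" value. It is only an illustration built from the log lines quoted in this note: the banner pattern is inferred from those lines, the function names are made up for the example, and the "(n.m)" value is treated as an opaque label because this thread does not establish what it means.

```python
import re
from collections import defaultdict

# Banner layout inferred from the excerpt above (an assumption), e.g.
# ************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
BANNER = re.compile(
    r"^\*+ (?P<proc>\S+) \((?P<ident>\d+\.\d+)\) "
    r"(?P<stamp>\d{2}-[A-Z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \*+$"
)


def parse_group_log(text):
    """Return a list of (proc, ident, stamp, message) tuples.

    Each banner is paired with the first non-blank line after it.
    A banner followed directly by another banner yields no entry.
    """
    entries = []
    pending = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = BANNER.match(line)
        if match:
            pending = (match["proc"], match["ident"], match["stamp"])
        elif pending is not None:
            entries.append(pending + (line,))
            pending = None
    return entries


def messages_by_ident(entries):
    """Group (stamp, message) pairs under their "(n.m)" label."""
    grouped = defaultdict(list)
    for _proc, ident, stamp, message in entries:
        grouped[ident].append((stamp, message))
    return dict(grouped)


# A few lines copied from the log excerpt in this note.
SAMPLE = """\
************ dmqld (6163.0) 03-FEB-1997 10:54:56 ************
ld, link sender for group 2966 to group 2808 is exiting
************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
ld, link receiver for group 2966 from group 2808 is exiting
************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
ld, caught signal 11
************ dmqld (24282.0) 03-FEB-1997 10:55:24 ************
ld, link sender for group 2966 to group 2808 is running
"""

if __name__ == "__main__":
    for ident, items in messages_by_ident(parse_group_log(SAMPLE)).items():
        crashed = any("caught signal 11" in msg for _, msg in items)
        print(ident, len(items), "signal 11" if crashed else "")
```

Grouping this way makes it easier to see which label logged the "caught signal 11" line and what else the same label reported around that time.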
| |||||
| 2751.5 | | XHOST::SJZ | Rocking the Messaging Desktop ! | Mon Feb 03 1997 09:30 | 5 |
we have a fix for the segmentation fault. stay tuned.
it should be available later this week.
_sjz.
| |||||