Title: | NAS Message Queuing Bus |
Notice: | KITS/DOC, see 4.*; Entering QARs, see 9.1; Register in 10 |
Moderator: | PAMSRC::MARCUS EN |
Created: | Wed Feb 27 1991 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 2898 |
Total number of notes: | 12363 |
I'm seeing "ld, caught signal 11" in my group log file regularly.

After reading Note 2718 of this conference, I'll use "sjz"'s method of giving you the versions of the DMQ processes:

    # su - dmq
    % what `which dmqgcp` | grep DECmessageQ
            DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
    % what `which dmqld` | grep DECmessageQ
            DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
    % what `which dmqbcp` | grep DECmessageQ
            DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
    % what `which dmqqe` | grep DECmessageQ
            DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996

    # setld -i | grep DMA
    DMACL32A     installed  DECmessageQ Client Library
    DMACLS32A    installed  DECmessageQ Client Library Server
    DMADEV32A    installed  DECmessageQ Development Environment
    DMAEXA32A    installed  DECmessageQ Example Programs
    DMAMAN32A    installed  DECmessageQ Manual Reference Pages
    DMARLS32A    installed  DECmessageQ Release Notes
    DMARTO32A    installed  DECmessageQ Run Time Environment

    # uname -a
    OSF1 mynode.ozy.dec.com V3.2 148 alpha

The symptom is that a second node keeps trying to connect but is being denied. The group init file on the node that is crashing doesn't list the second node in its "XGROUP SECTION", and it does have XGROUP_VERIFY set to YES.

The workaround for now is to fix the group init file. Is there a known resource leak in this process that causes dmqld to crash? I can't find any core files to give you.

Craig.
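If you check versions often, the four "what" commands above could be folded into one Bourne-shell function. This is only a sketch under stated assumptions. The function name dmq_versions is hypothetical and is not part of the DECmessageQ kit. It also assumes that the four daemons are on the PATH and that what(1) is available, as in the session above.

```sh
# Hypothetical helper (not shipped with DECmessageQ): print the embedded
# DECmessageQ version string of each DMQ daemon, one line per binary.
# Assumes the daemons are on PATH and what(1) exists, as on Digital UNIX.
dmq_versions() {
    for prog in dmqgcp dmqld dmqbcp dmqqe; do
        # Label the line so a mismatched binary is easy to spot.
        printf '%s: ' "$prog"
        # what(1) prints the SCCS identification strings; keep only ours.
        what "`which $prog`" | grep DECmessageQ
    done
}
```

The % prompt above suggests csh, which has no shell functions. Define this in sh instead, then run dmq_versions. Every line should report the same kit version. A line that doesn't would be worth mentioning in a problem report.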
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
2751.1 | | XHOST::SJZ | Rocking the Messaging Desktop ! | Thu Jan 30 1997 21:34 | 7 |
Resource leaks don't necessarily cause a signal 11 (SIGSEGV, a segmentation fault, also known as a GPF or an accvio depending on which world you grew up in). Bad programming, such as dereferencing an invalid pointer, causes a signal 11.

We will look into it.

_sjz.
2751.2 | | XHOST::SJZ | Rocking the Messaging Desktop ! | Thu Jan 30 1997 21:37 | 12 |
One more note: when you do the "setld -i | grep" check, we usually tell people to search for "DECmessageQ" rather than "DMA". The 'A' in "DMA" changes by platform.

    # setld -i | grep DECmessageQ

It produces exactly the same results in this case, but it is good to know for general troubleshooting.

_sjz.
2751.3 | Ok - Your Move | OZROCK::THOMAN | The House Of Script | Sun Feb 02 1997 20:19 | 24 |
RE .1

> ................ We will look into it.

Thanks.

RE .2

> rather than "DMA" since the 'A' in "DMA" changes by
> platform.
>
>     # setld -i | grep DECmessageQ

I've been tricked by that "floating letter" before. In future I'll use what you showed.

Thx
C.
2751.4 | More details from the grp log file | OZROCK::THOMAN | The House Of Script | Mon Feb 03 1997 08:28 | 71 |
I just noticed the following in our group log file on the offending node:

    ld, link listener for group 1073756384 is running

The group numbers involved are both in the two thousands!

Can you please tell me what the (n.m) means in the lines displayed in the log files? If it is the PID of the dmq process, I don't understand how the number is being reassigned to the new dmqld process in the following example:

    ************ dmqld (6163.0) 03-FEB-1997 10:54:56 ************
    ld, link sender for group 2966 to group 2808 is exiting
    ************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
    ld, link receiver for group 2966 from group 2808 is exiting
    ************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
    ld, caught signal 11
    ************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
    ld, link receiver for group 2966 from group 2808 is exiting
    ************ dmqld (23269.0) 03-FEB-1997 10:55:06 ************
    ld, duplicate link receiver for group 2803
    ************ dmqld (23269.0) 03-FEB-1997 10:55:06 ************
    ld, initialization failure
    ************ dmqld (23269.0) 03-FEB-1997 10:55:16 ************
    ld, duplicate link receiver for group 2803
    ************ dmqld (23269.0) 03-FEB-1997 10:55:16 ************
    ld, initialization failure
    ************ dmqld (23269.0) 03-FEB-1997 10:55:24 ************
    ld, link receiver for group 2966 from group 2808 is running
    ************ dmqld (23269.0) 03-FEB-1997 10:55:24 ************
    ld, link receiver for group 2966 is connected to group 2808
    ************ dmqld (24282.0) 03-FEB-1997 10:55:24 ************
    ld, link sender for group 2966 to group 2808 is running
    ************ dmqld (24282.0) 03-FEB-1997 10:55:24 ************
    ld, link sender for group 2966 is connected to group 2808
    ************ dmqld (23269.0) 03-FEB-1997 10:55:26 ************
    ld, duplicate link receiver for group 2803
    ************ dmqld (23269.0) 03-FEB-1997 10:55:26 ************

Thanks
Craig.
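To follow which (n.m) value each message belongs to, the group log could be flattened to one line per message with a short awk filter. This is a hypothetical helper, and the name dmq_log_by_tag is ours. It assumes that each banner line, such as "************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************", is followed by its message line(s), as in the excerpt above. It only copies the (n.m) value through and makes no claim about what n and m mean.

```sh
# Hypothetical helper: prefix each group log message with the process
# name and (n.m) value taken from the banner line that precedes it.
# Assumed banner format (from the excerpt in this note):
#   ************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
# Reads the files given as arguments, or stdin if none are given.
dmq_log_by_tag() {
    awk '
        # Banner line: remember "dmqld (23269.0)" and skip the line itself.
        /^\*+ dmq[a-z]+ \([0-9]+\.[0-9]+\)/ {
            tag = $2 " " $3
            next
        }
        # Any non-blank line after a banner is message text for that tag.
        tag != "" && NF > 0 { print tag ": " $0 }
    ' "$@"
}
```

Piping the result through grep for one value, for example grep '(23269.0)', lists everything logged under that tag in order. That makes it easier to see that the same value appears both before and after "caught signal 11".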
2751.5 | | XHOST::SJZ | Rocking the Messaging Desktop ! | Mon Feb 03 1997 09:30 | 5 |
We have a fix for the segmentation fault. Stay tuned; it should be available later this week.

_sjz.