
Conference pamsrc::decmessageq

Title: NAS Message Queuing Bus
Notice: KITS/DOC, see 4.*; Entering QARs, see 9.1; Register in 10
Moderator: PAMSRC::MARCUSEN
Created: Wed Feb 27 1991
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2898
Total number of notes: 12363

2751.0. "DMQ Link Driver (dmqld) catches sig 11 on D/Unix" by OZROCK::THOMAN (The House Of Script) Thu Jan 30 1997 20:01

	I'm seeing 

		ld, caught signal 11


	in my group log file regularly. After reading Note
	2718 of this conference, I'll use "sjz"'s method of
	giving you the versions of the DMQ processes:


# su - dmq
% what `which dmqgcp` | grep DECmessageQ
        DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
% what `which dmqld` | grep DECmessageQ
        DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
% what `which dmqbcp` | grep DECmessageQ
        DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
% what `which dmqqe` | grep DECmessageQ
        DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
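
The four checks above can be folded into one loop. This is a minimal
sketch, assuming the same four program names shown in the post and a
POSIX shell; the function name dmq_versions is made up for illustration.

```shell
# Report the DECmessageQ version string embedded in each DMQ program.
# The program list is taken from the post above; adjust it as needed.
dmq_versions() {
    for prog in dmqgcp dmqld dmqbcp dmqqe; do
        # command -v is the POSIX way to locate a program in PATH.
        bin=$(command -v "$prog") || { echo "$prog: not found in PATH"; continue; }
        printf '%s: ' "$prog"
        # what(1) prints the identification strings compiled into a file.
        what "$bin" | grep DECmessageQ
    done
}
```

Run it as the dmq user (dmq_versions) so that PATH matches the one the
DMQ processes are started with.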

# setld -i | grep DMA
DMACL32A        installed       DECmessageQ Client Library
DMACLS32A       installed       DECmessageQ Client Library Server
DMADEV32A       installed       DECmessageQ Development Environment
DMAEXA32A       installed       DECmessageQ Example Programs
DMAMAN32A       installed       DECmessageQ Manual Reference Pages
DMARLS32A       installed       DECmessageQ Release Notes
DMARTO32A       installed       DECmessageQ Run Time Environment

# uname -a
OSF1 mynode.ozy.dec.com V3.2 148 alpha

	
	The symptoms are that a 2nd node is continually trying to
	connect but is being denied 'cause the group init file
	on the node experiencing the crashes doesn't have the
	2nd node in its "XGROUP SECTION" and does have XGROUP_VERIFY
	set to YES.


	The solution for now is to fix the group init file, but
	is there a known resource leak in this process that results
	in the dmqld crashing ?

	I can't find any core files to give you.


	Craig.

 
T.R  Title  User  Personal Name  Date  Lines
2751.1  XHOST::SJZ  Rocking the Messaging Desktop !  Thu Jan 30 1997 21:34  7
    
    resource leaks don't necessarily cause a signal 11 (SIGSEGV,
    or segmentation fault (also known as a GPF or accvio depend-
    ing on which world you grew up in)).
    
    Bad programming causes a signal 11.  We will look  into  it.
    _sjz.
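
To confirm which signal a number refers to on a given system, the
shell can be asked directly. A small sketch, assuming a POSIX shell
whose kill supports -l with a signal number; the helper name signame
is made up for illustration.

```shell
# Print the name the local system gives to a signal number.
signame() {
    kill -l "$1"
}

signame 11
```

On most UNIX systems this prints SEGV, which matches the SIGSEGV
reading of "caught signal 11" given above.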
2751.2  XHOST::SJZ  Rocking the Messaging Desktop !  Thu Jan 30 1997 21:37  12
    
    one more note,  when you do the setld -i | grep thing
    we usually instruct people to search for "DECmessageQ"
    rather than "DMA" since the 'A' in "DMA" changes by
    platform.
    
    # setld -i | grep DECmessageQ
    
    It produces the exact same results in this case,  but
    it is good to know for general troubleshooting.
    
    _sjz.
2751.3  Ok - Your Move  OZROCK::THOMAN  The House Of Script  Sun Feb 02 1997 20:19  24
RE .1


>   ................ 	We will look  into  it.

	Thanks.

RE .2


>    rather than "DMA" since the 'A' in "DMA" changes by
>    platform.
>    
>    # setld -i | grep DECmessageQ

	I've been tricked by that "floating letter" b4. In
	future I'll use what you showed.


	Thx


	C.
2751.4  More details from the grp log file  OZROCK::THOMAN  The House Of Script  Mon Feb 03 1997 08:28  71
    
    	I just noticed the following in our group log file on the 
    	offending node:
    
    	ld, link listener for group 1073756384 is running       
    
    
    	The group numbers involved are both in the two thousands !
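
    	One quick way to compare the odd value with the real group
    	numbers is to print them all in hex. This is only an inspection
    	aid and says nothing about where the value comes from; the
    	helper name to_hex is made up for illustration.

```shell
# Print a decimal number in hexadecimal.
to_hex() {
    printf '%x\n' "$1"
}

to_hex 1073756384   # value from the "link listener" log line
to_hex 2966         # the two group numbers seen elsewhere in the log
to_hex 2808
```

    	The three lines printed are 400038e0, b96 and af8, so the logged
    	value is not simply one of the two real group numbers with a
    	stray high bit set.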
    
    
    	Can you pls tell me what the (n.m) means  in the lines that are
    	displayed in the log files ?
    
    	If it is the PID of the dmq process, then I don't understand how
    	the number is being reassigned to the new dmqld process in the
    	following example:
    
    
    ************ dmqld (6163.0) 03-FEB-1997 10:54:56 ************
    ld, link sender for group 2966 to group 2808 is exiting
    
    ************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
    ld, link receiver for group 2966 from group 2808 is exiting
    
    ************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
    ld, caught signal 11
    
    ************ dmqld (23269.0) 03-FEB-1997 10:54:56 ************
    ld, link receiver for group 2966 from group 2808 is exiting
    
    ************ dmqld (23269.0) 03-FEB-1997 10:55:06 ************
    ld, duplicate link receiver for group 2803
    
    ************ dmqld (23269.0) 03-FEB-1997 10:55:06 ************
    ld, initialization failure
    
    ************ dmqld (23269.0) 03-FEB-1997 10:55:16 ************
    ld, duplicate link receiver for group 2803
    
    ************ dmqld (23269.0) 03-FEB-1997 10:55:16 ************
    ld, initialization failure
    
    ************ dmqld (23269.0) 03-FEB-1997 10:55:24 ************
    ld, link receiver for group 2966 from group 2808 is running
    
    ************ dmqld (23269.0) 03-FEB-1997 10:55:24 ************
    ld, link receiver for group 2966 is connected to group 2808
    
    ************ dmqld (24282.0) 03-FEB-1997 10:55:24 ************
    ld, link sender for group 2966 to group 2808 is running
    
    ************ dmqld (24282.0) 03-FEB-1997 10:55:24 ************
    ld, link sender for group 2966 is connected to group 2808
    
    ************ dmqld (23269.0) 03-FEB-1997 10:55:26 ************
    ld, duplicate link receiver for group 2803
    
    ************ dmqld (23269.0) 03-FEB-1997 10:55:26 ************
    
    
    
    	Thanks
    
    	
    	Craig.
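
A log in this layout is easier to scan with each entry on one line.
The sketch below assumes every entry is a banner line followed
directly by a single message line, as in the excerpt above; it makes
no assumption about what the (n.m) tag means and just carries it
through. The function name dmq_log_summary is made up for illustration.

```shell
# Flatten a group log read on stdin to lines of the form:
#   TAG TIME MESSAGE
# where TAG is the (n.m) value from the banner, without parentheses.
dmq_log_summary() {
    awk '
        /^[ \t]*\*+ dmqld \(/ {
            # Banner line: remember the tag and time for the next line.
            id = $3
            gsub(/[()]/, "", id)
            when = $5
            pending = 1
            next
        }
        pending {
            # Only the line directly after a banner counts as its message;
            # a blank line here means the entry was truncated.
            if (NF) {
                sub(/^[ \t]+/, "")
                print id, when, $0
            }
            pending = 0
        }
    '
}
```

For example, dmq_log_summary < logfile | grep 'caught signal 11'
lists only the crash entries, where logfile stands for the path to
the group log.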
    
    
    
    
2751.5  XHOST::SJZ  Rocking the Messaging Desktop !  Mon Feb 03 1997 09:30  5
    
    we have a fix for the segmentation fault.  stay tuned.
    it should be available later this week.
    
    _sjz.