
Conference rdgeng::cics_technical

Title:	Discussion of CICS technical issues

Moderator:	IOSG::SMITHF

Created:	Mon Mar 13 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	192
Total number of notes:	680

177.0. "core produced by CICS EPI client program" by MIPSBX::"[email protected]" (Ricardo Lopez Cencerrado) Mon Apr 14 1997 12:26

The following is an extract from entry #174.0 in this conference:

>>I am developing a daemon process which serves as a gateway between a tcp 
>>client and CICS OSF using the EPI interface. The client sends the daemon the
>>transaction and receives back the host answer.
>>
>>The daemon is a multithreaded server, which generates two threads for every
>>connection: one attending to the client connection and the other taking care of
>>the EPI events received.
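The two-threads-per-connection design quoted above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the daemon's code: `conn_state`, `client_thread`, `epi_event_thread` and `run_connection` are hypothetical names, the CICS EPI call is replaced by a counter, and the sketch is written against the final POSIX threads API rather than the draft-4 DCE threads interface linked with -lpthreads on OSF/1 V3.2, whose calling conventions differ. What it shows is the discipline the bugcheck message below asks developers to check: every field shared by the two threads is touched only under one mutex, every pthread_create result is checked, and the per-connection state outlives both threads because they are joined before it goes out of scope.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-connection state shared by the two threads.
 * Every field is read or written only while holding `lock`. */
struct conn_state {
    pthread_mutex_t lock;
    pthread_cond_t  changed;      /* broadcast whenever a field changes */
    int  total;                   /* requests the client side will send */
    int  have_request;            /* 1 while `request` awaits an answer */
    int  done;                    /* client side has finished */
    int  replies;                 /* requests answered so far */
    char request[64];
};

/* Stands in for the thread reading transactions from the TCP client. */
static void *client_thread(void *arg)
{
    struct conn_state *c = arg;

    pthread_mutex_lock(&c->lock);
    for (int i = 0; i < c->total; i++) {
        while (c->have_request)           /* wait for the previous answer */
            pthread_cond_wait(&c->changed, &c->lock);
        snprintf(c->request, sizeof c->request, "TRAN%04d", i);
        c->have_request = 1;
        pthread_cond_broadcast(&c->changed);
    }
    while (c->have_request)               /* wait for the last answer */
        pthread_cond_wait(&c->changed, &c->lock);
    c->done = 1;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Stands in for the thread handling EPI events for the same connection. */
static void *epi_event_thread(void *arg)
{
    struct conn_state *c = arg;

    pthread_mutex_lock(&c->lock);
    for (;;) {
        while (!c->have_request && !c->done)
            pthread_cond_wait(&c->changed, &c->lock);
        if (!c->have_request)             /* done and nothing pending */
            break;
        c->replies++;                     /* a real daemon would talk to CICS here */
        c->have_request = 0;
        pthread_cond_broadcast(&c->changed);
    }
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Runs one simulated connection and returns the number of requests
 * answered, or -1 if a thread could not be created. */
static int run_connection(int total)
{
    struct conn_state c;                  /* must outlive both threads */
    pthread_t client, epi;
    int rc = -1;

    memset(&c, 0, sizeof c);
    c.total = total;
    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.changed, NULL);

    /* Check every pthread_create result: after a failure the pthread_t is
     * indeterminate, and joining it is undefined behaviour. */
    if (pthread_create(&epi, NULL, epi_event_thread, &c) == 0) {
        if (pthread_create(&client, NULL, client_thread, &c) == 0) {
            pthread_join(client, NULL);
            rc = 0;
        } else {                          /* no client: stop the EPI thread */
            pthread_mutex_lock(&c.lock);
            c.done = 1;
            pthread_cond_broadcast(&c.changed);
            pthread_mutex_unlock(&c.lock);
        }
        pthread_join(epi, NULL);          /* both threads are finished with `c` */
        assert(c.have_request == 0);
        if (rc == 0)
            rc = c.replies;
    }
    pthread_cond_destroy(&c.changed);
    pthread_mutex_destroy(&c.lock);
    return rc;
}
```

For example, `run_connection(5)` should return 5 once both threads have exited. One mutex and one condition variable per connection keeps the handoff easy to reason about; races in this kind of handoff can stay hidden on a uniprocessor and only surface on a multiprocessor.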

During testing of this server process at the customer's premises, the process
dumped core and the following message was written to the console:

<<<
%DECthreads bugcheck (version V3.12-311), terminating execution.
% Running on DEC OSF/1 AXP [OSF1 alpha V3.2(148); cpu type 39, configured for
%  14 cpus, 2 cpus in box, 1023Mb]
% Reason: test and set: high order bits corrupt at 0x63e36310
%     
%     The DECthreads library has detected an inconsistency in its internal
%   state and cannot continue execution. The inconsistency may be due to a bug
%   within the DECthreads library, the application program, or in any library
%   active in the address space. Common causes are unchecked stack overflows,
%   writes through uninitialized pointers, and synchronization races that
%   result in use of invalid data by some thread.
%     Application and library developers are requested to please check for
%   such problems before reporting a DECthreads library problem.
%     The information in this file may aid application program, library, or
%   DECthreads developers in determining the state of the process at the time
%   of this bugcheck. When the problem is reported, this information should be
%   included, along with a detailed procedure for reproducing the problem, if
%   that is possible. The 'detailed procedure' most likely to be of use to
%   developers is a complete program.
% 
% The bugcheck occurred at 
% ***CONCURRENT BUGCHECK***
% ***CANNOT CONTINUE: REPORTED STATE MAY BE INACCURATE AND INCOMPLETE***
>>>
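The bugcheck names unchecked stack overflows first among the common causes. A minimal sketch of ruling them out, assuming the final POSIX threads API (DECthreads' draft-4 interface on OSF/1 V3.2 may name the attribute calls differently): give each thread an explicitly sized stack instead of relying on the library default, and check every return code. The `worker` and `run_with_stack` functions are hypothetical, and the 64 KB buffer and 256 KB stack used here are illustrative figures, not measurements from the daemon.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical worker with a large automatic buffer: the kind of code that
 * can silently overrun a small thread stack. Sets *arg to 1 on completion. */
static void *worker(void *arg)
{
    char scratch[64 * 1024];                 /* 64 KB on the thread's stack */

    memset(scratch, 'x', sizeof scratch);
    *(int *)arg = (scratch[sizeof scratch - 1] == 'x');
    return NULL;
}

/* Runs `worker` on a thread with at least `stack_bytes` of stack.
 * Returns 0 on success or the pthread error number on failure. */
static int run_with_stack(size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int finished = 0;
    int rc;

    if (stack_bytes < PTHREAD_STACK_MIN)
        stack_bytes = PTHREAD_STACK_MIN;
    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, &finished);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_join(tid, NULL);
    assert(rc != 0 || finished == 1);        /* worker ran to completion */
    return rc;
}
```

A call such as `run_with_stack(256 * 1024)` returns 0 when the thread is created and joined. If a thread in the daemon needs more stack than it gets, enlarging the stack is a quick test; if the crash then disappears, the real fix is to find the deep recursion or large local buffer responsible.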

Analysing the core file with dbx, I believe the process received a SIGABRT,
caused by some thread calling the abort() function. I assume this is the normal
behaviour when the pthread library detects an internal error.

<<<
$ dbx ../project/hstcom/hstcom.mhilo.lisboa.03-03-97.debugcompile core
dbx version 3.11.8
Type 'help' for help.
Core file created by program "hstcom1"

thread 0xfffffc003a4ccf00 signal IOT/Abort trap at >*[ldexp, 0x2427d298]      cpys    $f16,$f31,$f0
(dbx) where
>  0 ldexp(0x64282620, 0x503, 0x166850, 0x0, 0x241e0a78) [0x2427d298]
   1 cma__test_and_set(0x64282620, 0x503, 0x166850, 0x0, 0x241e0a78) [0x241e0a78]
(dbx) tlist
thread 0xfffffc003a4ccf00 signal IOT/Abort trap at >*[ldexp, 0x2427d298]      cpys    $f16,$f31,$f0
thread 0xfffffc003eb76e60 stopped at >*[__chmod, 0x2427c764]    bis     r31, r31, r31
thread 0xfffffc0008eafa40 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eaef00 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eafea0 stopped at >*[__setgroups, 0x2427e378]        ldq_u   r31, 0(sp)
thread 0xfffffc0008eae140 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eaf680 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eae500 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eafe00 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eae0a0 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eae640 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eae780 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0009a3c460 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
More (n if no)?n
(dbx) where
>  0 msg_receive_trap(0x2801000000, 0x0, 0x0, 0x2a, 0xaceface) [0x241f0570]
   1 msg_receive(0x0, 0x0, 0x0, 0x0, 0x0) [0x241e6dd0]
   2 cma__vp_sleep(0x2801000001, 0x0, 0x27, 0x0, 0x0) [0x241e05dc]
   3 cma__dispatch(0x27, 0x0, 0x0, 0x0, 0x241bfe34) [0x241c6958]
   4 cma__int_wait(0x241cfeb4, 0x3c418, 0x11018, 0x20238, 0x25f88) [0x241bfe30]
   5 pthread_cond_wait(0x243bb030, 0x64535540, 0x243bafa8, 0x0, 0x1fb18) [0x241cea30]
(dbx) quit
>>>

This error has happened once on the customer's system, an 8200/300 with 2 CPUs
and 1 GB RAM running OSF 3.2C, with CICS 2.1 and Oracle 7.1.6. I have not seen
this error on my test system and do not know how to reproduce it. My test
system is a DEC 3000/500 with 64 MB RAM, OSF 3.2G and CICS 2.1A.

The customer remembers nothing special about the moment when the error took
place. From the log files generated by the server, the error seems to have
happened before the Addterminal operation finished, because no EPI console is
displayed in the log:

<<<
$ more 10819_#04_04_1997#10_26_12_000.log
DebugOn value is TRUE
DebugVerbose value is TRUE
DebugDir value is /usr/users/hst/logdir1
Log file /usr/users/hst/logdir1/10819_#04_04_1997#10_26_12_000.log created
The client inet address is: 16.190.192.70
The client port is: 26118
RegionName: TCICSAIF Timeout: -1 TermType: hft BufSizeIn: 8192 BufSizeOut: 8192
>>>

In a normal connection, once the Addterminal operation has finished, another
message is added to the log file giving the name of the current CICS terminal.
This message does not appear in the log of the connection that was running when
the error happened.

An example of a normal connection follows. Take notice of the last line:

<<<
$ more 10819_#04_04_1997#09_59_56_000.log
DebugOn value is TRUE
DebugVerbose value is TRUE
DebugDir value is /usr/users/hst/logdir1
Log file /usr/users/hst/logdir1/10819_#04_04_1997#09_59_56_000.log created
The client inet address is: 16.190.192.70
The client port is: 24070
RegionName: TCICSAIF Timeout: -1 TermType: hft BufSizeIn: 8192 BufSizeOut: 8192

Session Terminal LICO0819
>>>
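If the daemon writes its log through a buffered stdio stream, the missing "Session Terminal" line is not conclusive by itself: POSIX does not require abort() to flush open stdio streams, so lines written shortly before the abort can be lost. A minimal sketch, assuming hypothetical `log_line`, `add_terminal_stub` and `open_session` helpers (the stub stands in for the real EPI Addterminal call and simply returns the terminal name from the example log above), flushes after every line and brackets the call with a line before and after it, so the last line in the log shows whether the call was entered and whether it returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Writes one formatted line to `log` and flushes it at once, so the line
 * reaches the kernel even if the process aborts immediately afterwards.
 * Returns 0 on success, -1 on a write or flush error. */
static int log_line(FILE *log, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(log != NULL);
    va_start(ap, fmt);
    n = vfprintf(log, fmt, ap);
    va_end(ap);
    if (n < 0 || fputc('\n', log) == EOF || fflush(log) == EOF)
        return -1;
    return 0;
}

/* Hypothetical stand-in for the EPI Addterminal call: fills in a
 * terminal name and reports success. */
static int add_terminal_stub(char *term, size_t len)
{
    snprintf(term, len, "LICO0819");
    return 0;
}

/* Brackets the (stubbed) Addterminal call with log lines before and after. */
static int open_session(FILE *log, char *term, size_t len)
{
    int rc;

    log_line(log, "Calling Addterminal");
    rc = add_terminal_stub(term, len);
    log_line(log, "Addterminal returned %d", rc);
    if (rc == 0)
        log_line(log, "Session Terminal %s", term);
    return rc;
}
```

fflush() hands the data to the kernel, which is enough for it to survive a process abort, though not a system crash. With this pattern, a log ending at "Calling Addterminal" would point at the call itself, while a log ending earlier would point at code before it.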


Any suggestions about tracing and solving this problem are welcome.


Thanks in advance, 


Ricardo Lopez Cencerrado.

[Posted by WWW Notes gateway]

T.R	Title	User	Personal Name	Date	Lines
177.1	Atom....	CICS03::helen	Helen Pratt	Tue Apr 15 1997 09:09	38
	Ricardo,

	I assume from the way you have attempted to debug this with the debugger that this is your EPI application. However, if the DECthreads bugcheck you've reproduced went to the console.msg file, it could have come from a CICS application server; in that case, check the messages immediately after the bugcheck to determine which process it was.

	We have seen this type of bugcheck before, generally associated with CICS application servers. It is believed to be caused by memory walkers walking over the threads data structures. If you are seeing the problem with application servers, note which transactions are running in the application servers that terminate. If you are seeing this problem from your EPI driver, try putting Atom on it again and check for memory walkers (it might be beneficial to do this on the customer's system).

	Another thing: are you compiling your EPI application with the -threads flag?

	The reason the customer sees this problem and you don't could be associated with SMP.

	>>The customer remembers nothing special about the moment when the error took
	>>place.
	>>From the log files generated by the server, the error seems to have happened
	>>before the Addterminal operation has finished, because no EPI console is
	>>displayed in the log

	During the routines for adding a terminal, at least one new thread will be created, so this is the time when an inconsistency is likely to be noticed, rather than the point at which the inconsistency is created.

	Good luck,
	Helen.
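Helen's point is that the corruption is probably created earlier and only noticed when Addterminal creates a thread. Atom's Third Degree tool (the `-tool third` run) is the thorough way to find such memory walkers. As a lighter-weight complement that can stay in a production build, the sketch below puts guard zones around the daemon's own byte buffers (for example buffers like those sized by BufSizeIn/BufSizeOut in the log). This is a swapped-in technique, not something proposed in this thread, and `guarded_alloc`, `guarded_ok` and `guarded_free` are hypothetical helpers. Checking the guards at each log point, such as just before Addterminal, narrows down when an overrun of one of those buffers happened; it cannot detect stray writes that land elsewhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 8

/* Arbitrary fill pattern written on both sides of each guarded block. */
static const unsigned char guard_pattern[GUARD_BYTES] = {
    0xDE, 0xAD, 0xBE, 0xEF, 0xDE, 0xAD, 0xBE, 0xEF
};

/* Allocates `n` usable bytes with a guard zone before and after them.
 * Returns a pointer to the usable bytes, or NULL if malloc fails.
 * Intended for byte buffers: the result is only 8-byte aligned. */
static unsigned char *guarded_alloc(size_t n)
{
    unsigned char *raw = malloc(n + 2 * GUARD_BYTES);

    if (raw == NULL)
        return NULL;
    memcpy(raw, guard_pattern, GUARD_BYTES);
    memcpy(raw + GUARD_BYTES + n, guard_pattern, GUARD_BYTES);
    return raw + GUARD_BYTES;
}

/* Returns 1 if both guard zones around the `n`-byte block are intact. */
static int guarded_ok(const unsigned char *p, size_t n)
{
    return memcmp(p - GUARD_BYTES, guard_pattern, GUARD_BYTES) == 0 &&
           memcmp(p + n, guard_pattern, GUARD_BYTES) == 0;
}

/* Frees a guarded block; unless NDEBUG is defined, first asserts that the
 * guards are still intact. */
static void guarded_free(unsigned char *p, size_t n)
{
    if (p == NULL)
        return;
    assert(guarded_ok(p, n));
    free(p - GUARD_BYTES);
}
```

A failed `guarded_ok()` check logged with the connection's other messages would show which buffer was overrun and roughly when, which is exactly the information the bugcheck cannot give.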
177.2	instrumented version running at the customer site	NNTPD::"[email protected]"	Ricardo Lopez Cencerrado	Tue Apr 22 1997 12:26	32
	Hello,

	The error and the message file are stored in the log file created by the EPI daemon I have developed. There are no reported problems with CICS server processes at the customer's premises, and no symrecs or CICS cores associated with the daemon core.

	The compilation line for the program is the one at the end of note 174.0, and it includes the -threads parameter:

<<<
make -f test3.mk hstcom_c.third
cc -c -taso -std1 -threads -I/usr/opt/cics/include -O2 cicserver.child-fork.c
ld /usr/ccs/lib/crt0.o -taso -call_shared -L/usr/opt/cics/lib -o hstcom_c cicserver.child-fork.o -lcicsepico -L/usr/opt/dce/usr/shlib/dce -lpthreads -lmach -lc_r -lc
atom hstcom_c -tool third -env threads -L/usr/opt/cics/lib -L/usr/opt/dce/usr/shlib -all -heapbase taso
>>>

	The instrumented version has been installed on the customer's system. As soon as the error recurs, I will post the information gathered by Atom.

	Thanks for your quick help,
	Ricardo.

	[Posted by WWW Notes gateway]