
Conference rdgeng::cics_technical

Title: Discussion of CICS technical issues
Moderator: IOSG::SMITHF
Created: Mon Mar 13 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 192
Total number of notes: 680

177.0. "core produced by CICS EPI client program" by MIPSBX::"[email protected]" (Ricardo Lopez Cencerrado) Mon Apr 14 1997 13:26

The following is an extract from entry #174.0 in this conference:

>>I am developing a daemon process which serves as a gateway between a TCP
>>client and CICS OSF using the EPI interface. The client sends the daemon the
>>transaction and receives back the host answer.
>>
>>The daemon is a multithreaded server, which generates two threads for every
>>connection: one attending the client connection and the other taking care of
>>the EPI events received.

During testing of this server process at a customer's premises, the following
core dump took place, with the following message being written to the console:

<<<
%DECthreads bugcheck (version V3.12-311), terminating execution.
% Running on DEC OSF/1 AXP [OSF1 alpha V3.2(148); cpu type 39, configured for
%  14 cpus, 2 cpus in box, 1023Mb]
% Reason: test and set: high order bits corrupt at 0x63e36310
%     
%     The DECthreads library has detected an inconsistency in its internal
%   state and cannot continue execution. The inconsistency may be due to a bug
%   within the DECthreads library, the application program, or in any library
%   active in the address space. Common causes are unchecked stack overflows,
%   writes through uninitialized pointers, and synchronization races that
%   result in use of invalid data by some thread.
%     Application and library developers are requested to please check for
%   such problems before reporting a DECthreads library problem.
%     The information in this file may aid application program, library, or
%   DECthreads developers in determining the state of the process at the time
%   of this bugcheck. When the problem is reported, this information should be
%   included, along with a detailed procedure for reproducing the problem, if
%   that is possible. The 'detailed procedure' most likely to be of use to
%   developers is a complete program.
% 
% The bugcheck occurred at 
% ***CONCURRENT BUGCHECK***
% ***CANNOT CONTINUE: REPORTED STATE MAY BE INACCURATE AND INCOMPLETE***
>>>

Analysing the core file with dbx, I guess the process received a SIGABRT
caused by some thread calling the abort() function. I guess this is the normal
behaviour when the pthread library detects an internal error.

<<<
$ dbx ../project/hstcom/hstcom.mhilo.lisboa.03-03-97.debugcompile core
dbx version 3.11.8
Type 'help' for help.
Core file created by program "hstcom1"

thread 0xfffffc003a4ccf00 signal IOT/Abort trap at >*[ldexp, 0x2427d298]       cpys    $f16,$f31,$f0
(dbx) where
>  0 ldexp(0x64282620, 0x503, 0x166850, 0x0, 0x241e0a78) [0x2427d298]
   1 cma__test_and_set(0x64282620, 0x503, 0x166850, 0x0, 0x241e0a78) [0x241e0a78]
(dbx) tlist
thread 0xfffffc003a4ccf00 signal IOT/Abort trap at >*[ldexp, 0x2427d298]       cpys    $f16,$f31,$f0
thread 0xfffffc003eb76e60 stopped at >*[__chmod, 0x2427c764]    bis     r31, r31, r31
thread 0xfffffc0008eafa40 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eaef00 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eafea0 stopped at >*[__setgroups, 0x2427e378]        ldq_u   r31, 0(sp)
thread 0xfffffc0008eae140 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eaf680 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eae500 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eafe00 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eae0a0 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eae640 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0008eae780 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
thread 0xfffffc0009a3c460 stopped at >*[msg_receive_trap, 0x241f0574]   ret     r31, (r26), 1
More (n if no)?n
(dbx) where
>  0 msg_receive_trap(0x2801000000, 0x0, 0x0, 0x2a, 0xaceface) [0x241f0570]
   1 msg_receive(0x0, 0x0, 0x0, 0x0, 0x0) [0x241e6dd0]
   2 cma__vp_sleep(0x2801000001, 0x0, 0x27, 0x0, 0x0) [0x241e05dc]
   3 cma__dispatch(0x27, 0x0, 0x0, 0x0, 0x241bfe34) [0x241c6958]
   4 cma__int_wait(0x241cfeb4, 0x3c418, 0x11018, 0x20238, 0x25f88) [0x241bfe30]
   5 pthread_cond_wait(0x243bb030, 0x64535540, 0x243bafa8, 0x0, 0x1fb18) [0x241cea30]
(dbx) quit
>>>

This error has happened once on the customer system, an 8200/300 with 2 CPUs,
1 GB RAM, running OSF 3.2C, with CICS 2.1 and Oracle 7.1.6. I have not seen
this error on my test system and do not know how to reproduce it. My test
system is a DEC 3000/500 with 64 MB RAM, OSF 3.2G and CICS 2.1A.

The customer remembers nothing special about the moment when the error took
place. From the log files generated by the server, the error seems to have
happened before the Addterminal operation finished, because no EPI console is
displayed in the log:

<<<
$ more 10819_#04_04_1997#10_26_12_000.log
DebugOn value is TRUE
DebugVerbose value is TRUE
DebugDir value is /usr/users/hst/logdir1
Log file /usr/users/hst/logdir1/10819_#04_04_1997#10_26_12_000.log created
The client inet address is: 16.190.192.70
The client port is: 26118
RegionName: TCICSAIF Timeout: -1 TermType: hft BufSizeIn: 8192 BufSizeOut: 8192
>>>

A normal connection, up to the point where the Addterminal operation has
finished, adds another message to the log file, indicating the name of the
current CICS terminal. This message does not appear in the log of the
connection that was running when the error happened.

An example of a normal connection follows. Note the last line:

<<<
$ more 10819_#04_04_1997#09_59_56_000.log
DebugOn value is TRUE
DebugVerbose value is TRUE
DebugDir value is /usr/users/hst/logdir1
Log file /usr/users/hst/logdir1/10819_#04_04_1997#09_59_56_000.log created
The client inet address is: 16.190.192.70
The client port is: 24070
RegionName: TCICSAIF Timeout: -1 TermType: hft BufSizeIn: 8192 BufSizeOut: 8192

Session Terminal LICO0819
>>>
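
The comparison of the two logs above can be automated when many connection
logs have to be checked. What follows is only a minimal sketch in C: the
function name log_reached_addterminal is hypothetical, and the only thing
taken from the logs is the "Session Terminal" prefix of the last line of a
normal connection. It assumes the whole log file has already been read into
a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the daemon: returns 1 if the log text
 * contains a line that starts with "Session Terminal" (the message a normal
 * connection logs once Addterminal has finished), and 0 otherwise.
 */
int log_reached_addterminal(const char *log_text)
{
    static const char marker[] = "Session Terminal";
    const char *p = log_text;

    assert(log_text != NULL);

    while ((p = strstr(p, marker)) != NULL) {
        /* Accept the marker only when it begins a line. */
        if (p == log_text || p[-1] == '\n')
            return 1;
        p += sizeof marker - 1;   /* skip this match and keep searching */
    }
    return 0;
}
```

A log for which this returns 0 belongs to a connection that ended before the
terminal name was logged, like the one that was active when the core was
produced.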


Any suggestions about tracing and solving this problem are welcome.


Thanks in advance, 


Ricardo Lopez Cencerrado.

[Posted by WWW Notes gateway]
177.1. "Atom...." by CICS03::helen (Helen Pratt) Tue Apr 15 1997 10:09 (38 lines)

Ricardo,

I assume from the way that you have attempted to debug this
using the debugger that this is your EPI application; however,
if the DECthreads bugcheck you've reproduced went to the console.msg
file, it could be a CICS application server. In that case, to determine
which process it was, check the messages immediately after the bugcheck.

We have seen this type of bug check before, generally associated with
CICS application servers.  It is believed to be caused by memory walkers
walking over the threads data structures.  If you are seeing the problem
with application servers, note what transactions are running in the
application servers which terminate.
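
As an illustration of what a memory walker does and how one can be caught,
here is a minimal guard-word sketch in C. It is not how atom works and not
CICS or DECthreads code; the names guarded_buf, guarded_alloc, guarded_check
and the 0xDEADBEEF pattern are all made up for the example. A known pattern
is placed before and after the user area and verified later; a write that
runs off either end of the buffer destroys the pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu   /* arbitrary pattern, chosen for the example */

/* Layout of the allocation: [guard][size user bytes][guard] */
typedef struct {
    unsigned char *raw;   /* start of the whole allocation */
    size_t size;          /* number of user bytes */
} guarded_buf;

/* Allocate size user bytes with a guard word on each side; 0 on success. */
int guarded_alloc(guarded_buf *g, size_t size)
{
    uint32_t guard = GUARD_WORD;

    assert(g != NULL);
    g->raw = malloc(size + 2 * sizeof guard);
    if (g->raw == NULL)
        return -1;
    g->size = size;
    memcpy(g->raw, &guard, sizeof guard);
    memcpy(g->raw + sizeof guard + size, &guard, sizeof guard);
    return 0;
}

/* Pointer to the user area between the two guard words. */
unsigned char *guarded_data(guarded_buf *g)
{
    return g->raw + sizeof(uint32_t);
}

/* Returns 1 if both guard words are intact, 0 if something walked on them. */
int guarded_check(const guarded_buf *g)
{
    uint32_t guard = GUARD_WORD, head, tail;

    memcpy(&head, g->raw, sizeof head);
    memcpy(&tail, g->raw + sizeof guard + g->size, sizeof tail);
    return head == guard && tail == guard;
}

void guarded_free(guarded_buf *g)
{
    free(g->raw);
    g->raw = NULL;
    g->size = 0;
}
```

Note that the damage is only seen when guarded_check is called, which may be
long after the bad write happened.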

If you are seeing this problem from your EPI driver, try running atom
on it again and check for memory walkers (it might be beneficial to do
this on the customer's system).  Another thing: are you compiling your
EPI application with the -threads flag? The reason the customer sees
this problem and you don't could be associated with SMP.
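
On an SMP machine the two threads of a connection really do run at the same
time, so any data they share without a lock can be seen half-updated. The
following is only a sketch of a mutex and condition variable handoff between
an EPI-event thread and a client thread. It is written against the current
POSIX threads interface, which is an assumption: the DECthreads version on
OSF/1 3.2 has somewhat different calls. conn_state and the conn_* functions
are hypothetical names, and the integer reply merely stands in for the host
answer.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-connection state shared by the two threads. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int has_reply;          /* predicate, only touched with lock held */
    int reply;              /* stands in for the host answer */
} conn_state;

void conn_init(conn_state *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->ready, NULL);
    c->has_reply = 0;
    c->reply = 0;
}

void conn_destroy(conn_state *c)
{
    pthread_cond_destroy(&c->ready);
    pthread_mutex_destroy(&c->lock);
}

/* EPI-event side: publish the reply under the lock, then wake the waiter. */
void conn_post_reply(conn_state *c, int reply)
{
    pthread_mutex_lock(&c->lock);
    c->reply = reply;
    c->has_reply = 1;
    pthread_cond_signal(&c->ready);
    pthread_mutex_unlock(&c->lock);
}

/* Client side: wait on the predicate in a loop, then consume the reply. */
int conn_wait_reply(conn_state *c)
{
    int reply;

    pthread_mutex_lock(&c->lock);
    while (!c->has_reply)
        pthread_cond_wait(&c->ready, &c->lock);
    reply = c->reply;
    c->has_reply = 0;
    pthread_mutex_unlock(&c->lock);
    return reply;
}

/* Thread body for the demo: plays the part of the EPI-event thread. */
static void *epi_thread(void *arg)
{
    conn_post_reply((conn_state *)arg, 42);
    return NULL;
}

/* Runs one handoff; returns the value received, or -1 if no thread started. */
int run_handoff_demo(void)
{
    conn_state c;
    pthread_t t;
    int reply;

    conn_init(&c);
    if (pthread_create(&t, NULL, epi_thread, &c) != 0) {
        conn_destroy(&c);
        return -1;
    }
    reply = conn_wait_reply(&c);
    pthread_join(t, NULL);
    conn_destroy(&c);
    return reply;
}
```

The point of the sketch is that both the data and the flag are only read or
written with the mutex held, so the result is the same on one CPU or several.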

>>The customer remembers nothing special about the moment when the error took
>>place.
>>From the log files generated by the server, the error seems to have happened
>>before the Addterminal operation has finished, because no EPI console is
>>displayed in the log

During the routines for the addition of a terminal, at least one new thread
will be generated; this is therefore the time when an inconsistency is
likely to be noticed, rather than the point at which the inconsistency is
created.

Good luck,

Helen.

177.2. "instrumented version running at the customer site" by NNTPD::"[email protected]" (Ricardo Lopez Cencerrado) Tue Apr 22 1997 13:26 (32 lines)

Hello,

The error message is stored in the log file created by the EPI daemon
I have developed. There are no reported problems with CICS server processes
at the customer premises, and no symrecs or CICS cores associated with the
daemon core.

The compilation line for the program is the one at the end of note 174.0, and
includes the -threads flag.

make -f test3.mk hstcom_c.third
cc -c -taso -std1 -threads -I/usr/opt/cics/include -O2 cicserver.child-fork.c
ld /usr/ccs/lib/crt0.o -taso -call_shared  -L/usr/opt/cics/lib -o hstcom_c cicserver.child-fork.o -lcicsepico -L/usr/opt/dce/usr/shlib/dce -lpthreads -lmach -lc_r -lc

atom hstcom_c -tool third -env threads  -L/usr/opt/cics/lib -L/usr/opt/dce/usr/shlib -all -heapbase taso

The instrumented version has been installed on the customer system. As soon as
the error recurs I will post the information gathered by atom.


Thanks for your quick help,


Ricardo.
[Posted by WWW Notes gateway]