| Taylor, your note is really confusing.
Are you saying that your VAR has an existing program which works well on V7.1
but which breaks after they have added threads to it? That is, please go reread
your base note and interpret the following sentence for us:
.0> The program works well on Alpha/OVMS v7.1, but consistently dies [...]
.0> during "normal" use.
Also, if the VAR is doing their work on V7.1, why did you post image analyses
from V6.1? I presume that the debugger output is from V6.1 as well...
It might be more useful to post the output from the program run without the
debugger. Also, in order to interpret the ACCVIO messages, it would be useful
to have the output from a Debug SHOW IMAGE command, so we know which images they
are in and at what offset.
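With the SHOW IMAGE output in hand, locating an address is a range check against each image's (or each resident section's) base and end addresses, followed by a subtraction to get the offset. The following is a minimal C sketch of that lookup. It uses a few ranges copied from the SHOW IMAGE listing posted later in this thread. The struct and function names are made up for illustration, and a real tool would parse the whole listing instead of hard-coding four rows.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of DBG> SHOW IMAGE output: an image (or one of its resident
   sections, such as LIBRTL's CODE0) and the address range it occupies. */
struct image_range {
    const char *name;
    uint32_t    base;
    uint32_t    end;    /* inclusive, as SHOW IMAGE prints it */
};

/* A few rows copied from the SHOW IMAGE listing in this thread. */
static const struct image_range images[] = {
    { "LIBRTL CODE0", 0x80400000u, 0x8048B3FFu },
    { "SYS$SSISHR",   0x01176000u, 0x011A63FFu },
    { "CMA$RTL",      0x00ED8000u, 0x00F597FFu },
    { "DEBUG",        0x012DE000u, 0x013EE9FFu },
};

/* Return the row whose range contains addr and store addr's offset from
   that row's base in *offset; return NULL if no listed row contains it. */
static const struct image_range *find_image(uint32_t addr, uint32_t *offset)
{
    for (size_t i = 0; i < sizeof images / sizeof images[0]; i++) {
        if (addr >= images[i].base && addr <= images[i].end) {
            *offset = addr - images[i].base;
            return &images[i];
        }
    }
    return NULL;
}
```

With these rows, PC 8042479C resolves to LIBRTL's CODE0 section at offset 0x2479C, and virtual address 0118DE90 resolves to SYS$SSISHR at offset 0x17E90. The debugger's PC 0134BD74 falls inside DEBUG, and the PC in the %CMA-F-EXCCOP message, 00F1F2CC, falls inside CMA$RTL.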
I wouldn't pay too much attention to the "%TASK 2 has overflowed its stack"
message: by the time that comes out, the process is in such dire straits
(i.e., after an access violation and an extreme debugger intervention) that
there isn't much you can trust.
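The message's own numbers point the same way. The debug run posted in this thread reports `SP: 00000034`, `Stack top at: 011C2A00` and `Remaining bytes: -18622924`. The "remaining bytes" figure is simply SP minus the reported stack top, and an SP of 00000034 is not a plausible thread stack address. The following minimal C sketch reproduces that arithmetic; the helper name is hypothetical.

```c
#include <stdint.h>

/* Reproduce the "Remaining bytes" figure in the %TASK overflow message:
   the reported SP minus the reported stack top. Both inputs are copied
   from the message text, not read from a live thread. */
static int64_t remaining_bytes(uint32_t sp, uint32_t stack_top)
{
    return (int64_t)sp - (int64_t)stack_top;
}
```

`remaining_bytes(0x34, 0x011C2A00)` gives -18622924, which matches the message exactly. The huge negative value therefore most likely reflects a garbage SP rather than a stack that really overran by some 18 MB.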
.0> are there any patches available that encompasses the changes made to VMS 7.1
.0> but are available on 6.1 (or 6.2)
So the customer is saying that they want the enhancements that prompted us to
increase the operating system's major version number, but packaged as a patch?
(E.g., "We would rather not upgrade our system, but we're willing to completely
replace the kernel and several of the fundamental run-time libraries...")
Suppose we offered them a patch which made their V7.1 system lie and call itself
V6.2-9? 8-)
Webb
|
|
The .exe was compiled and linked on VMS 6.1, and the debugger info was
from 6.1 (since that is where the program dies).
This .exe runs just fine when moved to a 7.1 box.
It appears there is some run time issue that has been fixed in 7.1.
Below is a run of the non-debug version on VMS 6.1:
- Taylor
%SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=0118DE90,
PC=8042479C, PS=0000001B
Improperly handled condition, image exit forced.
Signal arguments:   Number = 00000005
                    Name   = 0000000C
                             00000004
                             0118DE90
                             8042479C
                             0000001B
Register dump:
R0 = 0000000000000003 R1 = 0000000000000003 R2 = 000000007FE25FD8
R3 = 0000000001192038 R4 = 0000000001190BD0 R5 = 0000000001192018
R6 = 0000000000000066 R7 = 000000000118E448 R8 = 0000000000000003
R9 = 0000000000000000 R10 = 0000000001193458 R11 = 00000066332E3225
R12 = 0000000000000000 R13 = FFFFFFFF894425A8 R14 = 0000000000000000
R15 = FFFFFFFFFFFFFFFF R16 = 0000000000000001 R17 = 0000000000000024
R18 = 0000000000000003 R19 = 0000000000000002 R20 = 000000000118F7D1
R21 = 0000000000000000 R22 = FFFFFFFFFFFFEC9B R23 = 000000000000000A
R24 = 0000000000010000 R25 = 000000000000020A R26 = FFFFFFFF805F0194
R27 = 000000007FB211D0 R28 = FFFFFFFF805F10C0 R29 = 0000000000000003
SP = 000000007F912000 PC = FFFFFFFF8042479C PS = 000000000000001B
%CMA-F-EXCCOP, exception raised; VMS condition code follows
-SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=D2D5E09C,
PC=00F1F2CC, PS=0000001B
Run of the debug version on VMS 6.1:
run tg:das
%DEBUG-W-DWNOT1PROC, the 1 process debugger cannot be run in DECwindows mode
OpenVMS Alpha AXP DEBUG Version V6.1-000
%DEBUG-I-INITIAL, language is C_PLUS_PLUS, module set to DAS
%DEBUG-I-NOTATMAIN, type GO to get to start of main program
DBG> go
break at routine DAS\main
18638: dbg.debugging=debugging;
DBG> go
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=7FF91FC0,
PC=0134BD74, PS=0000001B
%DEBUG-E-LASTCHANCE, stack exception handlers lost, re-initializing stack
%SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=011BFE90,
PC=8042479C, PS=0000001B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=7FF91FC0,
PC=0134BD74, PS=0000001B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=7FF91FC0,
PC=0134BD74, PS=0000001B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=7FF91FC0,
PC=0134BD74, PS=0000001B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=7FF91FC0,
PC=0134BD74, PS=0000001B
%DEBUG-I-ERRINSDEC, error occurred while decoding instruction at current PC
%DEBUG-F-ACCVIO, access violation, reason mask=00, virtual address=7FF91FC0,
PC=0134BD74, PS=0000001B
-MAKE-F-NOMSG, Message number 80442754
break on unhandled exception at in %TASK 2
error: %TASK 2 has overflowed its stack
SP: 00000034 Stack top at: 011C2A00 Remaining bytes: -18622924
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=7FF91FC0,
PC=0134BD74, PS=0000001B
%DEBUG-W-BADSTACK, stack corrupted - no further data available
DBG> show image
image name              set   base address  end address

CDA$ACCESS              no    01090000      011305FF
CDA$ACCESSMSG           no    01452000      014621FF
CMA$OPEN_RTL            no    00EA6000      00ED79FF
CMA$RTL                 no    00ED8000      00F597FF
CMA$TIS_SHR             no    7FBB6000      7FBE7FFF
    CODE0                     8049A000      8049ABFF
    DATA1                     7FBB6000      7FBB69FF
    DATA2                     7FBC6000      7FBC61FF
    DATA3                     7FBE6000      7FBE61FF
*DAS                    yes   00010000      000501FF
DBGTBKMSG               no    01464000      01470BFF
DCXSHR                  no    01718000      017483FF
DEBUG                   no    012DE000      013EE9FF
DEBUGSHR                no    01494000      017163FF
DEC$COBRTL              no    0033E000      003EF3FF
DECC$MSG                no    013F0000      013F13FF
DECC$SHR                no    7FE16000      7FE77FFF
    CODE0                     80554000      8060D1FF
    DATA1                     7FE16000      7FE2F5FF
    DATA2                     7FE36000      7FE3C9FF
    DATA3                     7FE46000      7FE481FF
    DATA4                     7FE56000      7FE561FF
    DATA5                     7FE66000      7FE6A3FF
    DATA6                     7FE76000      7FE77BFF
DECW$DWTLIBSHR          no    00F5A000      0108E1FF
DECW$DWTMSG             no    01440000      014501FF
DECW$DXMLIBSHR          no    00D60000      00E639FF
DECW$TERMINALMSG        no    0141E000      0143E1FF
DECW$TERMINALSHR        no    00A94000      00B757FF
DECW$TRANSPORTMSG       no    0140C000      0141C1FF
DECW$TRANSPORT_COMMON   no    00A30000      00A92410
DECW$TRANSPORT_TCPIP    no    0187A000      018BA3FF
DECW$XEXTLIBSHR         no    00B76000      00BD69FF
DECW$XLIBMSG            no    013FA000      0140A1FF
DECW$XLIBSHR            no    0092E000      00A2F5FF
DECW$XMLIBSHR           no    00BD8000      00D5FDFF
DECW$XTSHR              no    0089C000      0092D7FF
DPML$SHR                no    7FBF6000      7FD47FFF
    CODE0                     8049C000      805539FF
    DATA1                     7FBF6000      7FC239FF
    DATA2                     7FC26000      7FC3A3FF
    DATA3                     7FC46000      7FC463FF
    DATA4                     7FC56000      7FC805FF
    DATA5                     7FD46000      7FD46FFF
DY026_306               no    01F6A000      021B2DFF
FDLSHR                  no    002DC000      0033C3FF
LBRSHR                  no    00E64000      00EA43FF
LIBOTS                  no    7FB76000      7FBA7FFF
    CODE0                     8048C000      80499BFF
    DATA1                     7FB76000      7FB785FF
    DATA2                     7FB86000      7FB87BFF
    DATA3                     7FBA6000      7FBA61FF
LIBOTS2                 no    0029A000      002DA5FF
LIBRTL                  no    7FB16000      7FB67FFF
    CODE0                     80400000      8048B3FF
    DATA1                     7FB16000      7FB25FFF
    DATA2                     7FB26000      7FB26FFF
    DATA3                     7FB36000      7FB3F5FF
    DATA4                     7FB46000      7FB461FF
    DATA5                     7FB56000      7FB56FFF
    DATA6                     7FB66000      7FB673FF
PTD$SERVICES_SHR        no    00868000      0089A2D8
SECURESHRP              no    00246000      00298750
SHRCWS                  no    00482000      008669FF
SHRIMGMSG               no    013F2000      013F89FF
SHRPCRS                 no    018EE000      01E753FF
SHRSHL                  no    00052000      00245BFF
SMGSHR                  no    003F0000      004807FF
SYS$SSISHR              no    01176000      011A63FF
USS                     no    01132000      01174340

total images: 42        bytes allocated: 647152
|
| .4> It appears there is some run time issue that has been fixed in 7.1
That is certainly one possible explanation, but I can't think of any recent
fixes which would account for the reported symptoms. Besides, there are several
other possible explanations.
.4> %SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual
.4> address=0118DE90, PC=8042479C, PS=0000001B
When run without the Debugger's influence, it would seem that something inside
LIBRTL tries to modify memory at an address in the SYS$SSISHR code, which
results in an access violation.
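For reference, the "Signal arguments" dump in the non-debug run has a fixed layout for an ACCVIO: an argument count (5), the condition value (0000000C, SS$_ACCVIO), the reason mask, the inaccessible virtual address, the PC, and the PS. The following minimal C sketch decodes it. It follows the reading used in this note, where bit 2 (value 04) of the reason mask means a write (modify) attempt and a mask of 00 means a read. The struct, the function, and the `SS_ACCVIO` macro are illustrative stand-ins. On VMS itself, the real symbol is `SS$_ACCVIO` from `<ssdef.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for SS$_ACCVIO; 0x0C is the value printed
   as "Name = 0000000C" in the signal dump. */
#define SS_ACCVIO 0x0Cu

/* The ACCVIO signal arguments in the order the dump prints them. */
struct accvio_signal {
    uint32_t count;   /* number of longwords that follow: 5 for ACCVIO */
    uint32_t name;    /* condition value */
    uint32_t reason;  /* reason mask */
    uint32_t va;      /* virtual address that could not be accessed */
    uint32_t pc;      /* PC of the faulting instruction */
    uint32_t ps;      /* processor status */
};

/* True if this is an ACCVIO whose reason mask has bit 2 (value 4) set,
   i.e. the failing access was a write; false for a read (or for any
   other condition). */
static bool accvio_was_write(const struct accvio_signal *s)
{
    return s->name == SS_ACCVIO && (s->reason & 0x4u) != 0;
}
```

Applied to the dump above (`{5, 0x0C, 0x04, 0x0118DE90, 0x8042479C, 0x1B}`), this reports a write. The debugger's own ACCVIOs, with reason mask 00, come out as reads.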
.4> %SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual
.4> address=7FF91FC0, PC=0134BD74, PS=0000001B
When run under the Debugger, the Debugger itself incurs an access violation
while trying to read from somewhere near the top of P1 space, presumably in the
course of trying to deal with the application's ACCVIO. This time the
application's ACCVIO appears to occur at the same place in LIBRTL, but it is
writing to a slightly different address (one which looks like it might be in
part of the heap, perhaps a thread stack guard page):
.4> %SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual
.4> address=011BFE90, PC=8042479C, PS=0000001B
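To put these addresses in context, the 32-bit OpenVMS address space is split into P0 (program region, 00000000-3FFFFFFF), P1 (control region, 40000000-7FFFFFFF), and system space (80000000 and up). On Alpha, such addresses appear sign-extended in 64-bit registers, e.g. `PC = FFFFFFFF8042479C` in the register dump. The following minimal C sketch classifies the addresses in this note. The names are made up for illustration, and it assumes the input is a sign-extended 32-bit address, so only the low 32 bits are examined.

```c
#include <stdint.h>

enum va_region { REGION_P0, REGION_P1, REGION_SYSTEM };

/* Classify an OpenVMS 32-bit address. Assumes va is a (possibly
   sign-extended) 32-bit address, so only its low 32 bits matter. */
static enum va_region classify(uint64_t va)
{
    uint32_t low = (uint32_t)va;
    if (low >= 0x80000000u)
        return REGION_SYSTEM;   /* S0/S1: 80000000-FFFFFFFF */
    if (low >= 0x40000000u)
        return REGION_P1;       /* control region: 40000000-7FFFFFFF */
    return REGION_P0;           /* program region: 00000000-3FFFFFFF */
}

/* Number of bytes between addr and the last byte of P1 space. */
static uint32_t below_p1_top(uint32_t addr)
{
    return 0x7FFFFFFFu - addr;
}
```

Under these rules, the debugger's read at 7FF91FC0 is in P1, 0x6E03F bytes below its top. The application's write target 011BFE90 is in P0 and lies outside every range in the SHOW IMAGE listing, which is consistent with heap. The faulting PC 8042479C is in system space, where LIBRTL's resident CODE0 section lives.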
I don't know what is prompting LIBRTL to try to do this write. It would be
interesting to know what routine 8042479C is in, if someone had the time to
chase that down, but it's probably something to do with raising a condition.
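Chasing that down would mean subtracting the base of LIBRTL's CODE0 section (80400000 in the SHOW IMAGE listing) from the PC, giving offset 0x2479C. The next step is to find the last routine that starts at or before that offset in a link map or symbol table for that exact LIBRTL version, which I don't have. The following minimal C sketch of that lookup assumes such a table exists and is sorted by offset. It also assumes routines are laid out contiguously, so a PC in padding between routines would be attributed to the preceding one. The names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One link-map entry: a routine name and its starting offset within
   the image section. */
struct symbol {
    const char *name;
    uint32_t    offset;
};

/* Binary-search a table sorted by offset for the last routine starting
   at or before `offset`. Returns NULL if `offset` precedes every entry. */
static const struct symbol *containing_routine(const struct symbol *tab,
                                               size_t n, uint32_t offset)
{
    const struct symbol *best = NULL;
    size_t lo = 0, hi = n;               /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].offset <= offset) {
            best = &tab[mid];            /* candidate; look for a later one */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}
```

For example, with a purely hypothetical table whose entries start at offsets 0x0, 0x20000 and 0x30000, offset 0x2479C would resolve to the entry at 0x20000.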
In the absence of anything more incriminating, I still suspect a memory
corruptor (e.g., an uninitialized automatic pointer variable or a write outside
array bounds). (Or, possibly, a call to DECthreads with an invalid or
nonexistent object handle.)
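As an illustration of the out-of-bounds case, and of one defensive pattern against it, here is a minimal C sketch. The helper `checked_copy` is hypothetical, not a library routine. It refuses a copy that would not fit rather than writing past the end of the buffer, which is the kind of stray store that can silently damage neighbouring heap data or a thread's stack. Initializing automatic pointers to NULL, as in the usage below, similarly makes a missed assignment fail predictably instead of writing through stack garbage.

```c
#include <stddef.h>
#include <string.h>

/* Copy src (including its terminating NUL) into dst only if it fits in
   dst_size bytes. Returns 0 on success, -1 if it would overflow or if
   either pointer is NULL. */
static int checked_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL)
        return -1;                   /* catches pointers initialized to NULL */
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;                   /* would write past the end of dst */
    memcpy(dst, src, len + 1);       /* copy the string plus its NUL */
    return 0;
}
```

With `char buf[8]`, copying `"1234567"` succeeds, while `"12345678"` is rejected because it needs nine bytes. A pointer declared as `char *p = NULL;` and never assigned is also rejected rather than dereferenced.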
.4> %DEBUG-W-DWNOT1PROC, the 1 process debugger cannot be run in DECwindows mode
BTW, is there some reason why the customer is running the 1-process debugger?
They might have better (different) luck running the multiprocess debugger...
Webb
|