
Conference vaxaxp::vmsnotes

Title:VAX and Alpha VMS
Notice:This is a new VMSnotes, please read note 2.1
Moderator:VAXAXP::BERNARDO
Created:Wed Jan 22 1997
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:703
Total number of notes:3722

386.0. "who corrupted my stack ?" by COL01::VSEMUSCHIN (Duck and Recover !) Wed Mar 26 1997 14:02

    A customer has a very strange problem (OVMS Alpha 6.2-1h2, Motif 1.2):
    
    he has a GKS program which calls some X-Windows (Xt) routines from a
    shareable image linked with it. After some mouse clicks (2-500) his
    program bombs out with an ACCVIO. As we found, the stack in the pointer
    device initialisation routine becomes corrupted. This routine resides
    in the shareable image and is called directly from the main program.
    
    It is possible that another routine from the same shareable image
    initiated an asynchronous action (an AST or an I/O with AST notification)
    and gave it the address of one of its local variables as a parameter.
    That routine then returned, and the asynchronous action shot the stack
    of our program.
    
    The fact is that only one routine, and always the same one, gets a
    corrupted stack. The corrupted part of the stack is where some of the
    procedure parameters should be stored during the prologue. Instead of
    the 5th and 6th parameters (both longword pointers), the quadword 0x2C
    is stored there (always the same value).
    
    Single-stepping in DEBUG shows those values directly after execution
    of the STLs. The register has the expected contents, but the target
    memory locations contain 0x2C :-( When the program is observed with
    ANA/SYS in parallel to the debugger, there is nothing in the AST queues
    in the PCB and there are no outstanding I/Os on WSAnnn:
    
    When, instead of using the debugger, printf statements are inserted after
    each original statement ( if(par5==0) {printf(...);sys$hiber();} ), the
    program always hibernates at the 3rd printf ...
    
    When the calling program uses $SETAST to disable user-mode ASTs, it
    has no effect.
    
    Today I tried to catch the culprit using sys$setprt (note that this
    is a simplified example):
    
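    prog( par1, ... ) {
       int dummy[64*1024];
       static int hit = 0;
       {  void *addr[2]; unsigned char oldprt;
          int armed = (hit++ > 5);      /* test once so both calls agree */
          addr[0]=addr[1]=&par5;        /* sys$setprt acts on whole pages */
          /* args: inadr, retadr, acmode, prot, prvprt */
          if( armed ) sys$setprt( addr,0,0,PRT$C_UR,&oldprt);
    
          /* user program */
    
          if( armed ) sys$setprt( addr,0,0,oldprt,0);
        }
    }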
    
    The only villain that was caught was sys$imgact_C+03d0 (88ABE470),
    which tried to modify 7ee7bb28. Compare this with:
                         7ee7b4f0 or 7ee7b500 - address of par5
                         7ee7b4dc               address of dummy[last]
                         7ee3b4e0               address of dummy[0]
    What is interesting is that the address of par5 can change from call
    to call, while the address of dummy is always constant.
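    
    One possible reading of these addresses, offered as a sketch rather
    than a diagnosis: sys$setprt changes protection for whole pages, so a
    trap set on par5 also fires for any other write into the same page.
    Assuming 8 KB pages (the usual OpenVMS Alpha page size, which the note
    does not state), the faulting address 7ee7bb28 and both observed
    addresses of par5 fall into the same page. The helper names page_base
    and same_page are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 KB pages; the note does not state the page size. */
#define PAGE_SIZE 0x2000u

/* Base address of the page that contains addr. */
static uint32_t page_base(uint32_t addr)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert((PAGE_SIZE & (PAGE_SIZE - 1u)) == 0u);
    return addr & ~(PAGE_SIZE - 1u);
}

/* Nonzero when a and b lie in the same page, i.e. when a page-level
   protection change made for one of them also covers the other. */
static int same_page(uint32_t a, uint32_t b)
{
    return page_base(a) == page_base(b);
}
```

    With the addresses from the note, same_page(0x7ee7b4f0, 0x7ee7bb28)
    and same_page(0x7ee7b500, 0x7ee7bb28) are both true under the 8 KB
    assumption, so the access that was caught may have been a legitimate
    write to a neighbouring stack location rather than the corrupter.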
    
    Some ideas ?
    
    =Seva 
                                                  
    
           
    
T.R  Title  User  Personal Name  Date  Lines
386.1. "this question was cross posted in DECWINDOWS" by COL01::VSEMUSCHIN (Duck and Recover !) Wed Mar 26 1997 14:19 (10 lines)

    I asked this question in the DECWINDOWS notes conference (note 5813.*).
    The only answer was from Steve Hoffman, who supposes that the QUADWORD
    may be an IOSB (0x2c is SS$_ABORT). To be closer to Steve I decided to
    move this thread here.
    
    Btw. the shareable image contains only Xt library calls and no
    direct calls to system services. This code is written in C++
    and the main program in Fortran.
    
    =Seva
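    
    To make the IOSB guess concrete, here is a small sketch that splits
    the observed quadword the way a generic I/O status block is laid out
    (status word, count word, device-dependent longword) and decodes the
    status as an OpenVMS condition value (severity in bits 0-2, message
    number in bits 3-15, facility in bits 16-27). The type and function
    names are invented for this illustration, and the meaning of the
    fields after the status word varies by driver.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an OpenVMS condition value. */
typedef struct {
    unsigned severity;    /* bits 0-2; 4 means severe (fatal) error */
    unsigned msg_number;  /* bits 3-15 */
    unsigned facility;    /* bits 16-27; 0 is the system facility */
} cond_fields;

static cond_fields decode_condition(uint32_t value)
{
    cond_fields f;
    f.severity   = value & 0x7u;
    f.msg_number = (value >> 3) & 0x1fffu;
    f.facility   = (value >> 16) & 0xfffu;
    return f;
}

/* Generic view of an IOSB quadword. */
typedef struct {
    uint16_t status;    /* completion status (low word of the quadword) */
    uint16_t count;     /* transfer count for many drivers */
    uint32_t dev_info;  /* device-dependent information */
} iosb_view;

static iosb_view split_iosb(uint64_t quad)
{
    iosb_view v;
    v.status   = (uint16_t)(quad & 0xffffu);
    v.count    = (uint16_t)((quad >> 16) & 0xffffu);
    v.dev_info = (uint32_t)(quad >> 32);
    return v;
}

/* Worked example for the value seen on the stack: 0x2C is SS$_ABORT. */
static void check_observed_value(void)
{
    iosb_view v = split_iosb(0x2cu);
    cond_fields f = decode_condition(v.status);
    assert(f.severity == 4u && f.msg_number == 5u && f.facility == 0u);
    assert(v.count == 0u && v.dev_info == 0u);
}
```

    Read this way, the quadword looks like an I/O that was aborted before
    transferring anything, which is consistent with the IOSB theory but
    does not prove it.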
386.2. "These Stack Corrupters Are Fun To Find..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Wed Mar 26 1997 14:33 (22 lines)

: To be closer to Steve I decided to move this thread here.

   I'm not sure I want to think about that.  :-)

	--

   I'll assume the current set of Motif patches have been applied,
   and this problem is (still) being seen.

   I'd be suspicious of any subroutine calls -- it does not have to be
   a call to the X toolkit or to a system service...

   Does this error always involve a corruption to the same memory
   location?  (Is it possible to use a watchpoint or to program the
   debugger to watch the location(s), possibly enabling the watchpoint
   or breakpoint only when the appropriate application callframe(s) are
   active?  The debugger can literally be programmed to "lurk", looking
   for the footprint of the error before it takes action.)
    
   Without looking at the code -- and possibly only by running it under
   the debugger -- it's only really possible to guess...

386.3. "fairly obvious, but have you checked for..." by CUJO::SAMPSON Wed Mar 26 1997 22:33 (6 lines)

	If a routine were to declare an IOSB as an automatic
(stack) variable, queue an asynchronous I/O request, then
return without waiting for completion, this kind of corruption
could happen to some later user of the same stack address, when
the I/O completes and fills in the IOSB quadword.  I've seen
this happen before.  It's often very timing-dependent.
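
The failure mode described above can be simulated portably. The sketch
below does not use real system services or a real stack (writing through
a dangling pointer would be undefined behaviour in C); it models the stack
as a byte array and the outstanding I/O as a remembered offset. All names
(sim_stack, start_io_and_return, complete_io, victim) are invented for
this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256

static unsigned char sim_stack[STACK_BYTES];  /* simulated stack, grows downward */
static size_t sim_sp = STACK_BYTES;           /* simulated stack pointer */
static size_t pending_iosb = STACK_BYTES;     /* IOSB offset; STACK_BYTES = none */

static size_t push_frame(size_t bytes)
{
    assert(bytes <= sim_sp);
    sim_sp -= bytes;
    return sim_sp;
}

static void pop_frame(size_t bytes)
{
    sim_sp += bytes;
}

/* The bug: the IOSB lives in this routine's frame, but the request is
   still outstanding when the frame is popped. */
static void start_io_and_return(void)
{
    size_t frame = push_frame(16);
    pending_iosb = frame + 8;   /* "queue" the request */
    pop_frame(16);
}

/* What happens later: the completion status quadword is written to
   wherever the IOSB used to be. */
static void complete_io(uint64_t status)
{
    if (pending_iosb < STACK_BYTES) {
        memcpy(&sim_stack[pending_iosb], &status, sizeof status);
        pending_iosb = STACK_BYTES;
    }
}

/* A later routine at the same stack depth saves two longword parameters
   at the same offsets, then the stale I/O completes underneath it. */
static void victim(uint32_t par5, uint32_t par6, uint32_t out[2])
{
    size_t frame = push_frame(16);
    memcpy(&sim_stack[frame + 8], &par5, sizeof par5);
    memcpy(&sim_stack[frame + 12], &par6, sizeof par6);
    complete_io(0x2c);          /* arrives "asynchronously" */
    memcpy(&out[0], &sim_stack[frame + 8], sizeof out[0]);
    memcpy(&out[1], &sim_stack[frame + 12], sizeof out[1]);
    pop_frame(16);
}
```

Calling start_io_and_return() and then victim() leaves the quadword 0x2C
where the two saved parameters should be; calling victim() with no request
outstanding returns the parameters intact. In the simulation the timing is
fixed, whereas on a real system it depends on when the I/O completes.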
386.4. "catching the Huffalump" by COL01::VSEMUSCHIN (Duck and Recover !) Thu Mar 27 1997 04:03 (56 lines)

    				"We will charm it with soap and smile,
                                 We will scare it with railway fare ..."
    					Lewis Carroll, The Hunting of the Snark
    
>>   Does this error always involve a corruption to the same memory
>>   location?  
    Yes ...
    
>>    (Is it possible to use a watchpoint or to program the
>>   debugger to watch the location(s), possibly enabling the watchpoint
>>   or breakpoint only when the appropriate application callframe(s) are
>>   active?  The debugger can literally be programmed to "lurk", looking
>>   for the footprint of the error before it takes action.)
    No
    Debugger doesn't allow watchpoints in the stack, 'cause it would make
    this quiz too easy to solve ;-) Anyway, to be sure, I will RTFM once
    more to check whether I can still find something useful there. And as
    you may recall from .0, I tried to build my own Very Cunning Trap to
    catch the Huffalump. To my great sorrow, SYS$IMGSTA_C was caught
    instead of this animal. And I hoped that one of the readers might
    have an idea WHY ?
    
>>   Without looking at the code -- and possibly only by running it under
>>   the debugger -- it's only really possible to guess...

    I wrote some comments about the source code in the DECWINDOWS notes
    conference and I don't want to repeat them. The only thing I want to say
    is that investigating this code is about as promising as guessing ...
    
>>                -< fairly obvious, but have you checked for... >-
>>--------------------------------------------------------------------------------
>>	If a routine were to declare an IOSB as an automatic
>>(stack) variable, queue an asynchronous I/O request, then
>>return without waiting for completion, this kind of corruption
>>could happen to some later user of the same stack address, when
>>the I/O completes and fills in the IOSB quadword.  I've seen
>>this happen before.  It's often very timing-dependent.
    
    Yes, and now please compare it with :
    
    from .0
>>    It is possible that another routine from the same shareable image
>>    initiated an asynchronous action (an AST or an I/O with AST notification)
>>    and gave it the address of one of its local variables as a parameter.
>>    That routine then returned, and the asynchronous action shot the stack
>>    of our program.
    from .1
>>    I asked this question in the DECWINDOWS notes conference (note 5813.*).
>>    The only answer was from Steve Hoffman, who supposes that the QUADWORD
>>    may be an IOSB (0x2c is SS$_ABORT). To be closer to Steve I decided to
>>    move this thread here.
    
    Yes, this is the suggestion we agree with. What is still
    unclear is WHO did it (see also the title of this thread)
    
    =Seva
386.5. by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Thu Mar 27 1997 08:33 (2 lines)

Well, if it is SS$_ABORT, perhaps you could look for places that issue a
$CANCEL, and try to figure out who issued the operation that's being CANCELed?
386.6. "Debugger can be *Programmed*" by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Thu Mar 27 1997 09:55 (30 lines)

:>>   Does this error always involve a corruption to the same memory
:>>   location?  
:    Yes ...

    Then *program* the debugger to look at that offset in any
    routine at the same "stack depth"...

    Also look at the previous subroutines that would have been active
    at this same "stack depth", and see what variables are at the
    specified offset, and how these variables are used.  If nothing
    interesting is found at the first "stack depth", work deeper into
    the call stack.
    
:>>    (Is it possible to use a watchpoint or to program the
:>>   debugger to watch the location(s), possibly enabling the watchpoint
:>>   or breakpoint only when the appropriate application callframe(s) are
:>>   active?  The debugger can literally be programmed to "lurk", looking
:>>   for the footprint of the error before it takes action.)
:    No
:    Debugger doesn't allow watchpoints in the stack, 'cause it would make
:    this quiz too easy to solve ;-) 

   I've had good luck with programming the debugger -- the debugger
   can be far more useful than "just" watchpoints or breakpoints.

   The debugger can be programmed to take an action as a result of a
   breakpoint, and the actions can be to start evaluating the contents
   of the stack, or examining routine-local variables and decide to
   continue or call attention to a problem, etc.
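
   The same "lurking" idea can also be approximated inside the program
   when the debugger cannot watch the location. The sketch below is not
   debugger syntax; it is an in-program analogue, in the spirit of the
   printf checks from .0, that remembers the first checkpoint at which a
   watched longword no longer holds its expected value. The names lurker,
   lurk_arm and lurk_check are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const volatile uint32_t *addr;  /* location being watched */
    uint32_t expected;              /* value it held when armed */
    int first_bad;                  /* first failing checkpoint, -1 if none */
} lurker;

/* Start watching addr; its current contents become the expected value. */
static void lurk_arm(lurker *l, const volatile uint32_t *addr)
{
    assert(l != NULL && addr != NULL);
    l->addr = addr;
    l->expected = *addr;
    l->first_bad = -1;
}

/* Call at points of interest. Returns 1 while the location is intact;
   otherwise returns 0 and records the first checkpoint that saw damage. */
static int lurk_check(lurker *l, int checkpoint_id)
{
    if (*l->addr == l->expected)
        return 1;
    if (l->first_bad < 0)
        l->first_bad = checkpoint_id;
    return 0;
}
```

   Placing lurk_check calls before and after each call made by the
   suspect routine narrows the corruption to the interval between the
   last good checkpoint and first_bad, without stopping the program at
   every step.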