
Conference turris::decladebug

Title: Digital Ladebug debugger
Moderator: TLE::LUCIA
Created: Fri Feb 28 1992
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 969
Total number of notes: 3959

881.0. "4.0-31 watch; run; causes C++ program to not progress. Store_conditional problem" by WIDTH::MDAVIS (Mark Davis - compiler maniac) Wed Mar 12 1997 14:19

Perhaps my enthusiasm in 869.2 was premature :-)

For this C++ program, if I set a watchpoint on a location and then
"run", nothing happens for the 10 minutes I waited.  Most of the CPU time
is spent in ladebug, according to ps:
USER   %CPU %MEM   VSZ  RSS S    STARTED         TIME COMMAND
mdavis 94.8  9.1 22.0M  11M R  + 13:40:04     2:04.56 ladebug a.out
mdavis  3.9  0.4 1.80M 520K T  + 13:40:14     0:04.71 a.out

If I set a breakpoint on main, and don't set the watchpoint until I hit
that breakpoint, everything runs fine.

Aha!  Sometimes 2 ^C's will stop the program during initialization:

#0  0x3ff81d37c00 in streambuf(
#1  0x3ff81d448f0 in filebuf(
#2  0x3ff81d43d40 in initialize(
#3  0x3ff81d438f8 in Iostream_init(

and the instruction in streambuf is:
(ladebug) 0x3ff81d37c00/i
 [<opaque> streambuf(void), 0x3ff81d37c00]      stq_c   r3, 0(r2)

and $r2 happens to point into the same 8k page as the address I'm 
watching.  Every time this stq_c executes, it faults since it's writing
to a protected page.  Ladebug then unprotects the page, single steps,
and reprotects the page.  HOWEVER, this conditional store FAILS, since
lots of branching happened since the preceding, matching ldq_l.  So
the program loops back to the ldq_l and tries (and fails) again! (and
again and again ....)

In other words, if there's a lock on the same page we're trying to watch,
then the program will always loop infinitely.
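
A toy model of the loop described above may make the failure easier to
see.  This is only a sketch in C++, not ladebug or Alpha code: ToyMachine,
its reservation flag, and increment_with_retry are hypothetical names, and
it assumes that the fault / unprotect / single-step / reprotect round trip
always loses the reservation set by the ld?_l.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: one 64-bit location, one "reservation" flag standing in
// for the lock set by ld?_l, and a page that may be write-protected by a
// watchpoint. None of these names come from ladebug.
struct ToyMachine {
    std::uint64_t memory = 0;
    bool reservation = false;   // set by load_locked, consumed by store_conditional
    bool page_watched = false;  // true while the debugger keeps the page protected

    std::uint64_t load_locked() {
        reservation = true;
        return memory;
    }

    // Returns true if the store actually happened.
    bool store_conditional(std::uint64_t value) {
        if (page_watched) {
            // Write fault: the debugger unprotects, single-steps, reprotects.
            // Assumption: that round trip loses the reservation.
            reservation = false;
        }
        bool ok = reservation;
        if (ok) memory = value;
        reservation = false;
        return ok;
    }
};

// Runs the ld?_l ... st?_c retry loop, giving up after max_tries.
// Returns the number of attempts used, or -1 if the store never succeeded.
int increment_with_retry(ToyMachine& m, int max_tries) {
    for (int attempt = 1; attempt <= max_tries; ++attempt) {
        std::uint64_t v = m.load_locked();
        if (m.store_conditional(v + 1)) return attempt;
    }
    return -1;
}
```

With page_watched left false the loop succeeds on its first attempt; with
it set to true the loop uses up max_tries without ever storing, which is
the behavior seen in the real program (minus the retry limit).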

WHAT CAN LADEBUG DO??
1. diagnose the problem and break:  if the faulting instruction is
stq_c or stl_c, stop, warn the user, and ask if they want to stop
watching for a while.

2. try to work around the problem:
  if you fault on st?_c then
	a. unprotect, single step TWICE
	b. put a breakpoint on the instruction following the
	   st?_c
	c. do NOT protect the page!
	d. continue; the st?_c will fail, and the program should 
	   loop back and do the
	   ld?_l; ...; st?_c  successfully and then
	e. hit the breakpoint from b.
	f. PROTECT the page, and remove the breakpoint
	g. continue
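
Steps a through g can be walked through in a toy model.  This is a sketch
under stated assumptions, not a description of how ladebug is built:
ToyDebugger and all of its members are hypothetical, the faulting st?_c is
assumed to always fail, and after_store() stands for "execution reached
the instruction following the st?_c".

```cpp
#include <cstdint>

// Toy model; every name here is hypothetical.
struct ToyDebugger {
    std::uint64_t memory = 0;
    bool reservation = false;
    bool page_protected = true;           // watchpoint armed
    bool stepping_past_next = false;      // second single step of step a
    bool breakpoint_after_store = false;  // step b
    int faults = 0;

    std::uint64_t load_locked() { reservation = true; return memory; }

    bool store_conditional(std::uint64_t value) {
        if (page_protected) {
            // Steps a and c: fault, unprotect, and leave the page unprotected.
            // The faulting attempt still fails (reservation assumed lost).
            ++faults;
            reservation = false;
            page_protected = false;
            stepping_past_next = true;
        }
        bool ok = reservation;
        if (ok) memory = value;
        reservation = false;
        return ok;
    }

    // Called when execution reaches the instruction after the st?_c.
    void after_store() {
        if (stepping_past_next) {
            // Still inside step a; the breakpoint is planted afterwards (b).
            stepping_past_next = false;
            breakpoint_after_store = true;
            return;
        }
        if (breakpoint_after_store) {
            // Steps e and f: breakpoint hit, reprotect, remove breakpoint.
            breakpoint_after_store = false;
            page_protected = true;
        }
    }
};

// The program's retry loop (steps d and g happen implicitly by looping).
int increment_with_workaround(ToyDebugger& d, int max_tries) {
    for (int attempt = 1; attempt <= max_tries; ++attempt) {
        std::uint64_t v = d.load_locked();
        bool ok = d.store_conditional(v + 1);
        d.after_store();
        if (ok) return attempt;
    }
    return -1;
}
```

In this model the first attempt faults and fails, the second attempt runs
with the page unprotected and succeeds, and the page is protected again by
the time the loop exits.  The model always reaches the st?_c on the retry;
it says nothing about code that branches away before the store.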


Source program and ladebug output

cat watch_bug.cxx
extern "C"  void *putchar(char);
struct A{
  virtual void *r(){char c;return((c=*p++)?(r(),putchar(c)):(p=*--q));}
  static char *p,**q;
  A(){r();}
  ~A(){r();}};
struct B:A{};
struct C:B,A{};
struct D:C,B,A{};
char*m[]={"\nkraM\n\n.uoy ","gniteem deyojne I","\n.yadnoM no lae",
"m lufrednow eht ","rof uoy knahT   ","\n,ekiM dna ",",nhoJ ",",evetS"};
main(){B::p=*(B::q=&m[7]);D d;}
char *A::p,**A::q;

tagged 313% cxx -g watch_bug.cxx
tagged 314% ladebug a.out
Welcome to the Ladebug Debugger Version 4.0-31
------------------ 
object file name: a.out 
Reading symbolic information ...done
line: 9 Unable to parse input as legal command or C++ expression.
(ladebug) watch (&A::q)
[#1: watch 0x140000698 to 0x14000069f ]
(ladebug) run  <<<*** I have to hit ^C twice to get its attention
(ladebug) q
390.76u 180.94s 10:00 95% 46+138k 0+3io 7pf+7w 184stk+32416mem
***It was busy running for 10 minutes doing nothing useful.

tagged 315% ladebug a.out
Welcome to the Ladebug Debugger Version 4.0-31
------------------ 
object file name: a.out 
Reading symbolic information ...done
line: 9 Unable to parse input as legal command or C++ expression.
(ladebug) sti main
[#1: stop in int main(void) ]
(ladebug) run
[1] stopped at [int main(void):12 0x1200022cc]
     12 main(){B::p=*(B::q=&m[7]);D d;}
(ladebug) watch (&A::q)                <<<<*** Doing "watch" after program
                          <<<*** has started is much more successful!
[#2: watch 0x140000698 to 0x14000069f ]
(ladebug) c
[2] The contents at address range 0x140000698 to 0x14000069f
    was accessed by instruction 0x1200022d8
        Old value = 0
        New value = 5368709192
[2] stopped at [int main(void):12 0x1200022dc]
     12 main(){B::p=*(B::q=&m[7]);D d;}
(ladebug) c
[2] The contents at address range 0x140000698 to 0x14000069f
    was accessed by instruction 0x120001e34
        Old value = 5368709192
        New value = 5368709184
[2] stopped at [void* A::r(void):3 0x120001e38]
      3   virtual void *r(){char c;return((c=*p++)?(r(),putchar(c)):(p=*--q));}
(ladebug) q

881.1. by TLE::LUCIA (http://asaab.zko.dec.com/~lucia/biography.html) Wed Mar 12 1997 14:35, 4 lines

I like this trend of posting solutions with the bug reports.  Keep up the good
work, all!

Tim

881.2. "Not always so simple" by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Wed Mar 12 1997 15:50, 33 lines

Can we post bug reports on the solutions, too?

Mark's strategy assumes that the next time through the LDx_L + STx_C
sequence, you will actually reach the STx_C instruction, rather than
branching out early.  If you branch out early, the program will "free-run"
without its watchpoint turned on.  For example, the recommended
sequence to set a mutex looks roughly like:
	LOOP:	LDQ_L	R0, (R1)
		BLBS	R0, WAIT	; Don't try store if already set
		BIS	R0, #1, R2
		STQ_C	R2, (R1)	; Try to set the lock
		BEQ	R2, LOOP	; Repeat if failed
	SUCCESS:
		:
	WAIT:	<do something entirely different>
It's quite likely that the first time through this code the STQ_C faults,
but the second time through it the BLBS is taken.
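
For readers more at home in C++ than Alpha assembly, the control flow of
the sequence above can be approximated with std::atomic.  This is a rough
analogue, not a translation: try_set_lock and LockResult are hypothetical
names, and whether compare_exchange_weak is compiled to an LDQ_L / STQ_C
pair depends on the compiler and target.

```cpp
#include <atomic>
#include <cstdint>

enum class LockResult { Acquired, AlreadyHeld };

// Low bit set means "locked", mirroring the BLBS test in the sequence above.
LockResult try_set_lock(std::atomic<std::uint64_t>& word) {
    std::uint64_t seen = word.load(std::memory_order_relaxed);  // roughly LDQ_L
    for (;;) {
        if (seen & 1) {
            // Roughly BLBS R0, WAIT: leave without ever attempting the store.
            return LockResult::AlreadyHeld;
        }
        // Roughly BIS + STQ_C + BEQ: a weak compare-exchange is allowed to
        // fail spuriously, so it is retried in a loop.
        if (word.compare_exchange_weak(seen, seen | 1,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
            return LockResult::Acquired;
        }
        // On failure, `seen` now holds the current value of `word`.
    }
}
```

The AlreadyHeld return is the early exit in question: a pass through the
loop that takes it never reaches the store, so a breakpoint placed after
the store is never hit on that pass.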

What does ladebug do when it's executing 'step' or 'stepi' and comes to
a LDx_L instruction?  I seem to recall that VAX DEBUG parses the following
instructions, and sets a breakpoint on every potential branch target
and also after the STx_C instruction, then executes the whole sequence
without interference.  The Alpha architecture says STx_C can fail if
there's a taken branch since the matching LDx_L, or if there are too
many instructions since the matching LDx_L, so this is a relatively-bounded
problem.  The only trouble is if there's a computed jump in there...
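
The "breakpoint on every potential branch target and after the STx_C" idea
might be sketched as a simple forward scan.  Everything here is
hypothetical: ToyInsn is a made-up instruction record indexed by position
rather than address, the scan is purely linear, and max_len stands in for
whatever bound the architecture's "too many instructions" rule implies.

```cpp
#include <cstddef>
#include <set>
#include <vector>

enum class Kind { LoadLocked, StoreConditional, Branch, ComputedJump, Other };

// Hypothetical decoded instruction; `target` is an index into the code
// vector and is only meaningful for Kind::Branch.
struct ToyInsn {
    Kind kind;
    std::size_t target;
};

// Starting just after the load-locked at index `start`, collect the indices
// that would need temporary breakpoints: every branch target, plus the
// instruction following the store-conditional.
// Returns false if a computed jump is found, or if no store-conditional
// appears within max_len instructions; `out` is then not meaningful.
bool breakpoints_for_sequence(const std::vector<ToyInsn>& code,
                              std::size_t start, std::size_t max_len,
                              std::set<std::size_t>& out) {
    out.clear();
    for (std::size_t i = start + 1;
         i < code.size() && i <= start + max_len; ++i) {
        switch (code[i].kind) {
        case Kind::Branch:
            out.insert(code[i].target);
            break;
        case Kind::ComputedJump:
            return false;  // target unknown: the troublesome case
        case Kind::StoreConditional:
            out.insert(i + 1);
            return true;
        default:
            break;
        }
    }
    return false;
}
```

Applied to the mutex sequence above (LDQ_L at index 0, BLBS to WAIT at
index 6, BIS, STQ_C at index 3), the scan yields breakpoints at index 4
(the BEQ after the STQ_C) and index 6 (WAIT).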

Perhaps what should happen when the watchpoint triggers on STx_C is
that you step over the STx_C (which will produce 0 in its output register),
then reprotect the page to no-access, and try to step until you reach 
a LDx_L instruction.  At that point, unprotect the page and use an
appropriate strategy (such as the one in the preceding paragraph) to
step over the entire LDx_L + STx_C sequence.

881.3. by TLE::CHIU Thu Mar 13 1997 09:59, 5 lines

Thank you for reporting this problem and providing a reproducer along
with suggested solutions :-)  I'm looking into this now.

Caroline