[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference noted::hackers_v1

Title:-={ H A C K E R S }=-
Notice:Write locked - see NOTED::HACKERS
Moderator:DIEHRD::MORRIS
Created:Thu Feb 20 1986
Last Modified:Mon Aug 03 1992
Last Successful Update:Fri Jun 06 1997
Number of topics:680
Total number of notes:5456

647.0. "instruction stream tracing" by DECSIM::FARMER () Sun Jan 03 1988 17:25

	I  needed  a  way  to trace an arbitrary instruction stream and
	report the relative cost of the stream.  So I  put  together  a
	package for public consumption.  This method

	- can be more accurate than putting code in a loop and timing it.
	- works across unmatched call/returns.
	- finds the most expensive instructions in the stream.
	- reports summary of instructions executed and how many times.
	- reports estimated 8800 cycle counts.

	but it

	- won't count instructions executed in system space.
	- only estimates on variable length instructions like MOVCx.

	To  use  it,  paste the following condition handler, ENABLE it,
	and put a (BLISS BUILTIN) BPT instruction at the beginning  and
	the end of the execution stream you want to trace.

	The  condition  handler will output a trace of the instructions
	executed, and, at the end of the stream (2nd BPT), will print a
	summary of the counts of each type of instruction executed, and
	their total cost in 8800 cycles and �seconds.

	In  DECSIM, this has been very useful to find the exact cost of
	different features and commands.  Let me know what you think of
	it, and what you think it'd be useful for....

	The  8800  cycle counts were determined using a facility called
	the User Macro Level Instruction Timer.  I haven't figured  out
	all  the  instructions,  but  I've done most of the more common
	ones.  It looked to me like the cycle time of the 8800 is 45nS,
	or  at  least  all instruction times seemed to be a multiple of
	that, so that's what I used.  The times reported here are  best
	used as relative measures rather than exact times.

------------------------------------------------------------------------
FORWARD
   mnemonic: VECTOR [256, LONG],
   cycle:    VECTOR [256, BYTE];

ROUTINE Cond_handler
   (sig: REF VECTOR [, LONG],
    mech: REF VECTOR [, LONG],
    enbl: REF VECTOR [, LONG]
    ) =
!++
! FUNCTIONAL DESCRIPTION:
!
!	Condition handler that implements an instruction stream tracer,
! activated and deactivated by the presence of BPT instructions in the user's
! code.
!
! IMPLICIT INPUTS:
!	none
!
! IMPLICIT OUTPUTS:
!	none
!
! ROUTINE VALUE:
!	none
!
! SIDE EFFECTS:
!	Pages containing the BPT instructions are set to read/write.
!	Instruction tracing and summary output is sent to SYS$OUTPUT.
!--
   BEGIN
   EXTERNAL ROUTINE
      LIB$PUT_OUTPUT;

   LITERAL
      NOP  = %X'01',
      BPT  = %X'03',					   ! BreakPoinT instruction
      CHMK = %X'BC',
      CHMS = %X'BE';

   BIND
      blank_line = %ASCID'',
      n_args = .Sig [0],
      Cond = Sig[1],					   ! condition code that caused this routine to execute
      Cond_PC = Sig [n_args - 1] : REF VECTOR [, BYTE],	   ! PC of executing instruction
      Cond_PSL = Sig [n_args] : BLOCK [];		   ! Program Status Longword at time of ACCVIO

   OWN
      count: VECTOR [256, WORD],			   ! #times each opcode is executed
      Tracing,						   ! currently between 2 BPT's
      TBIT_Count;					   ! -# of instructions between BPT's

   LOCAL
      msg: VECTOR [80, BYTE],
      msg_dsc: VECTOR [2, LONG] INITIAL (0, msg),
      inadr: VECTOR [2, LONG];

   SELECTONE .Cond OF
      SET
      [SS$_TBIT]:
	 BEGIN
	 !++
	 ! We've just executed one machine instruction. The following code
	 ! reports what the next instruction is, among other things.
	 !--
	 Cond_PSL [PSL$V_TBIT] = 1;			   ! make sure TBIT is on after RET e.g.
	 !++
	 ! If the last instruction was where the BPT used to be (now a NOP),
	 ! replace the BPT instruction.
	 !--
	 TBIT_Count = .TBIT_Count - 1;
	 IF .TBIT_Count EQL 0 THEN			   ! just put in a NOP?
	    BEGIN
	    Cond_PC [-1] = BPT;				   ! replace with BPT
	    IF NOT .Tracing THEN			   ! after 2nd BPT?
	       Cond_PSL [PSL$V_TBIT] = 0;		   ! all done.
	    END;
	 !++
	 ! Report the next instruction to be executed and count it.
	 !--
	 IF (.Tracing AND
	     (.Cond_PC [0] NEQ BPT)) THEN
	    BEGIN
	    msg_dsc [0] = %ALLOCATION (msg);
	    $FAO (%ASCID'!8XL: !2XL !AD', msg_dsc, msg_dsc [0],
		  .Cond_PC, .Cond_PC [0], CH$RCHAR (.mnemonic [.Cond_PC [0]]), .mnemonic [.Cond_PC [0]] + 1);
	    LIB$PUT_OUTPUT (msg_dsc);
	    count [.Cond_PC [0]] = .count [.Cond_PC [0]] + 1;
	    END;
	 !++
	 ! Test for code in system space
	 !--
	 IF ((.Cond_PC GEQA %X'80000000') OR
	     ((.Cond_PC [0] GEQU CHMK) AND (.Cond_PC [0] LEQU CHMS))) THEN
	    LIB$PUT_OUTPUT (%ASCID'%E, can''t trace system instructions; data lost');
	 RETURN SS$_CONTINUE
	 END;	!( [SS$_TBIT]: )

      [SS$_BREAK]:
	 BEGIN
	 !++
	 ! This could be the first or the second BPT encountered. Turn on or
	 ! turn off tracing accordingly.
	 !--
	 IF NOT .Tracing THEN				   ! 1st BPT?
	    BEGIN
	    CH$FILL (0, %ALLOCATION (count), count);	   ! zero counts
	    LIB$PUT_OUTPUT (blank_line);
	    Cond_PSL [PSL$V_TBIT] = 1;			   ! turn on Tracing
	    Tracing = 1;
	    END
	 ELSE
	    BEGIN
	    !++
	    ! 2nd BPT. Report the summary of instructions executed
	    !--
	    LOCAL
	       last_idx: REF VECTOR [, BYTE],
	       total_cycles: INITIAL (0),
	       idx: VECTOR [256, BYTE];

	    LIB$PUT_OUTPUT (blank_line);
	    !++
	    ! Sort the instructions by (#times executed * #cycles per)
	    !--
	    idx [0] = 0;
	    INCR i FROM 0 TO 255 BY 1 DO
	       BEGIN
	       IF .count [.i] NEQ 0 THEN		   ! any occurrences of this instruction?
		  BEGIN
		  BIND
		     cycles = .count [.i]*.cycle [.i];

		  total_cycles = .total_cycles + cycles;
		  last_idx = idx [0];
		  UNTIL ((.last_idx [0] EQL 0) OR
			 (cycles GTRU .count [.last_idx [0]]*.cycle [.last_idx [0]])) DO
		     last_idx = idx [.last_idx [0]];
		  idx [.i] = .last_idx [0];
		  last_idx [0] = .i;
		  END;
	       END;      !( INCR )
	    !++
	    ! Print out the instructions in descending order of 'cost'
	    !--
	    LIB$PUT_OUTPUT (%ASCID'Count Cycles Instruction');
	    LIB$PUT_OUTPUT (%ASCID'----- ------ -----------');
	    last_idx = idx [0];
	    UNTIL .last_idx [0] EQL 0 DO
	       BEGIN
	       BIND i = .last_idx [0];
	       msg_dsc [0] = %ALLOCATION (msg);
	       $FAO (%ASCID'!5ULx!4UL   !AD', msg_dsc, msg_dsc [0],
		     .count [i], .cycle [i], CH$RCHAR (.mnemonic [i]), .mnemonic [i] + 1);
	       LIB$PUT_OUTPUT (msg_dsc);
	       last_idx = idx [i];
	       END;
	    LIB$PUT_OUTPUT (blank_line);
	    msg_dsc [0] = %ALLOCATION (msg);
	    $FAO (%ASCID'!UL instructions taking !UL cycles (!UL�S)',
		  msg_dsc, msg_dsc [0],
		  -.TBIT_Count, .total_cycles, (.total_cycles*45+500)/1000);
	    LIB$PUT_OUTPUT (msg_dsc);
	    Tracing = 0;				   ! setup for next BPT
	    END;
	 !++
	 ! Replace the BPT with a NOP so we can get beyond the BPT. With TBIT on
	 ! the NOP will be replaced (in the TBIT handler above) as soon as it is
	 ! executed.
	 !--
	 inadr [1] = inadr [0] = .Cond_PC;
	 $SETPRT (INADR = inadr, PROT = PRT$C_UW);	   ! make sure page is writeable
	 Cond_PC [0] = NOP;				   ! replace BPT with NOP
	 TBIT_Count = 1;				   ! initialize instruction count
	 RETURN SS$_CONTINUE
	 END;	!( [SS$_TBIT]: )

      [OTHERWISE]: RETURN SS$_RESIGNAL;
      TES;

   END;	  !( ROUTINE Cond_handler )

MACRO
   !++
   ! Pairs of opcode mnemonics and 8800 cycles (NOTE: not all cycle counts have
   ! been figured out and entered). 
   !--
   mach_instructions =
      , 0,       , 0, REI   , 0,       , 0, RET   ,11, RSB   , 6,       , 0,       , 0,   ! 0x
      , 0,       , 0, INDEX , 0, CRC  ,225, PROBER, 0, PROBEW, 0, INSQUE,28, REMQUE,15,
      BSBB  , 6, BRB   , 6, BNEQ  , 6, BEQL  , 6, BGTR  , 6, BLEQ  , 6, JSB   ,10, JMP   , 6,   ! 1x
      BGEQ  , 6, BLSS  , 6, BGTRU , 6, BLEQU , 6, BVC   , 6, BVS   , 6, BGEQU , 6, BLSSU , 6,
            , 0,       , 0,       , 0,       , 0,       , 0,       , 0,       , 0,       , 0,   ! 2x
      MOVC3 ,69, CMPC3 ,60, SCANC , 0, SPANC , 0, MOVC5 ,52, CMPC5 ,17, MOVTC , 0, MOVTUC, 0,
      BSBW  , 6, BRW   , 6, CVTWL , 4, CVTWB , 6,       , 0,       , 0,       , 0,       , 0,   ! 3x
            , 0, MATCHC, 0, LOCC  , 9, SKPC  , 0, MOVZWL, 4, ACBW  , 0, MOVAW , 2, PUSHAW, 3,
      ADDF2 , 7, ADDF3 , 8, SUBF2 , 7, SUBF3 , 8, MULF2 ,11, MULF3 ,12, DIVF2 ,35, DIVF3 ,36,   ! 4x
      CVTFB ,17, CVTFW ,17, CVTFL ,15, CVTRFL,19, CVTBF , 8, CVTWF , 8, CVTLF , 6, ACBF  , 0,
      MOVF  , 2, CMPF  , 8, MNEGF , 4, TSTF  , 0, EMODF , 0, POLYF,134,       , 0,       , 0,   ! 5x
      ADAWI , 0,       , 0,       , 0,       , 0, INSQHI,150,INSQTI,74, REMQHI,150,REMQTI,74,
            , 0,       , 0,       , 0,       , 0,       , 0,       , 0,       , 0,       , 0,   ! 6x
            , 0,       , 0,       , 0,       , 0,       , 0,       , 0,       , 0,       , 0,
            , 0,       , 0,       , 0,       , 0,       , 0,       , 0,       , 0,       , 0,   ! 7x
      ASHL  ,12, ASHQ  , 0, EMUL  ,21, EDIV  ,74, CLRQ  , 3, MOVQ  , 3, MOVAQ , 3, PUSHAQ, 3,
      ADDB2 , 3, ADDB3 , 5, SUBB2 , 3, SUBB3 , 5, MULB2 ,17, MULB3 ,18, DIVB2 ,46, DIVB3 ,47,   ! 8x
      BISB2 , 3, BISB3 , 4, BICB2 , 4, BICB3 , 5, XORB2 , 3, XORB3 , 3, MNEGB , 3, CASEB ,10,
      MOVB  , 2, CMPB  , 2, MCOMB , 0, BITB  , 0, CLRB  , 2, TSTB  , 5, INCB  , 2, DECB  , 2,   ! 9x
      CVTBL , 4, CVTBW , 4, MOVZBL, 4, MOVZBW, 4, ROTL  , 8, ACBB  , 0, MOVAB , 2, PUSHAB, 3,
      ADDW2 , 3, ADDW3 , 5, SUBW2 , 3, SUBW3 , 5, MULW2 ,17, MULW3 ,18, DIVW2 ,46, DIVW3 ,47,   ! Ax
      BISW2 , 3, BISW3 , 4, BICW2 , 4, BICW3 , 5, XORW2 , 3, XORW3 , 3, MNEGW , 3, CASEW ,10,
      MOVW  , 2, CMPW  , 2, MCOMW , 0, BITW  , 0, CLRW  , 2, TSTW  , 5, INCW  , 2, DECW  , 2,   ! Bx
      BISPSW, 0, BICPSW, 0, POPR  , 0, PUSHR , 0, CHMK  , 0, CHME  , 0, CHMS  , 0, CHMU  , 0,
      ADDL2 , 3, ADDL3 , 5, SUBL2 , 3, SUBL3 , 5, MULL2 ,17, MULL3 ,18, DIVL2 ,46, DIVL3 ,47,   ! Cx
      BISL2 , 3, BISL3 , 4, BICL2 , 4, BICL3 , 5, XORL2 , 3, XORL3 , 3, MNEGL , 3, CASEL ,10,
      MOVL  , 2, CMPL  , 7, MCOML , 2, BITL  , 2, CLRL  , 2, TSTL  , 5, INCL  , 2, DECL  , 2,   ! Dx
      ADWC  , 3, SBWC  , 3,       , 0,       , 0, MOVPSL, 0, PUSHL , 3, MOVAL , 2, PUSHAL, 3,
      BBS   , 9, BBC   , 9, BBSS  , 0, BBCS  , 0, BBSC  ,10, BBCC  ,10, BBSSI ,10, BBCCI ,10,   ! Ex
      BLBS  , 8, BLBC  , 8, FFS   ,38, FFC   ,21, CMPV  ,13, CMPZV ,14, EXTV  ,13, EXTZV ,12,
      INSV  ,14, ACBL  , 0, AOBLSS, 0, AOBLEQ, 0, SOBGEQ, 0, SOBGTR, 0, CVTLB , 6, CVTLW , 6,   ! Fx
            , 0,       , 0, CALLG ,46, CALLS ,50,       , 0,       , 0,       , 0,       , 0 %,
   instr_name [name, cycl] = %IF %NULL (name) %THEN NONAMEC %ELSE UPLIT (%ASCIC %STRING (name)) %FI %,
   instr_cycl [name, cycl] = cycl %;

BIND
   NONAMEC  = UPLIT (%ASCIC'');

OWN
   mnemonic: VECTOR [256, LONG] INITIAL (instr_name (mach_instructions)),
   cycle:    VECTOR [256, BYTE] INITIAL (BYTE (instr_cycl (mach_instructions)));
------------------------------------------------------------------------
T.RTitleUserPersonal
Name
DateLines
647.1here's some resuls from UMLIT (re.0)DECSIM::FARMERSun Jan 03 1988 17:2746
          Times for selected instructions on recent architectures
 
                                uVAX     780    8200    8600    8800 (cy)   avge
 
MOVL Reg,Reg                    0.44    0.40    0.40    0.10    0.09 (2)     .29
ADDL2 Reg,Reg                   0.44    0.40    0.39    0.10    0.14 (3)     .29
CLRL  Reg                       0.66    0.60    0.60    0.08    0.09 (2)     .41
ADDL3 Reg,Reg,Reg               0.69    0.60    0.81    0.16    0.21 (5)     .49
CLRQ  Reg                       0.88    1.20    0.80    0.16    0.14 (3)     .64
BRW                             1.17    0.85    0.73    0.29    0.25 (6)     .66
MOVL  Mem,Reg                   1.32    0.85    0.81    0.30    0.15 (3)     .69
MOVL  Reg,Mem                   1.27    1.25    0.95    0.30    0.14 (3)     .78
TSTL + BLEQ                     1.40    1.01    1.01    0.40    0.32 (7)     .83
CMPL + BLEQ                     1.56    1.23    1.19    0.44    0.40 (9)     .96
BBS   #22,Reg,disp              1.33    1.40    1.61    0.56    0.39 (9)    1.06
ADDF2 Reg,Reg                   2.96    0.79    1.46    0.24    0.32 (7)    1.15
ADDF3 Reg,Reg,Reg               2.94    1.20    1.91    0.32    0.36 (8)    1.35
BBCC  #22,Reg,disp              1.95    1.60    2.31    0.80    0.45 (10)   1.42
MOVL  Mem,Mem                   2.83    1.75    2.11    0.56    0.25 (6)    1.50
MULF2 Reg,Reg                   3.89    1.21    2.03    0.32    0.45 (11)   1.58
ASHL  #10,Reg,Reg               2.03    2.01    5.49    0.56    0.54 (12)   2.13
JSB + RSB                       3.61    3.01    2.93    1.27    0.73 (16)   2.31
CMPV #2,#7,Reg,#-2              3.14    2.82    4.83    0.48    0.59 (13)   2.37
EXTZV #4,#10,Reg,Reg            3.83    2.61    4.84    0.88    0.57 (13)   2.55
EXTV  #4,#10,Reg,Reg            3.65    3.02    5.65    0.88    0.54 (12)   2.75
MULL2 Reg,Reg                   5.24    1.86    5.92    0.64    0.75 (17)   2.88
CMPZV #2,#7,Reg,#-2             4.64    3.02    5.37    0.88    0.62 (14)   2.91
DIVF3 Reg,Reg,Reg               4.71    4.69    2.76    1.38    1.61 (36)   3.03
INSV  Reg,#4,#10,Reg            5.17    3.42    5.79    1.20    0.63 (14)   3.24
EMUL  Reg,Reg,Reg,Reg           6.40    6.93    3.48    0.73    0.95 (21)   3.70
DIVL2 Reg,Reg                   8.47    9.47    8.23    1.71    2.07 (46)   5.99
DIVL3 Reg,Reg,Reg               8.69    9.67    7.67    2.57    2.12 (47)   6.14
INSQUE + REMQUE                10.70   13.94    8.76    2.41    1.92 (43)   7.55
EDIV  Reg,Reg,Reg,Reg          11.43   11.88    7.47    4.01    3.34 (74)   7.63
CALLG #0,Rtn + RET, 0 GPRs     12.85   13.65   12.79    3.02    1.49 (33)   8.76
CALLS #0,Rtn + RET, 0 GPRs     14.10   14.33   14.20    3.42    1.66 (37)   9.54
CALLS #0,Rtn + RET, 5 GPRs     21.33   23.56   17.51    4.60    2.76 (61)  13.95
INSQHI + REMQHI                19.48   27.38   25.01    5.09   10.08 (224) 17.41
CALLS #0,Rtn + RET, 10 GPRs    27.23   32.22   25.49    5.64    3.34 (74)  18.78
CMPC3 Reg,(Reg),(Reg), 10 chr  83.41   14.30   10.30    3.85    2.71 (60)  22.91
MOVC5 #0,(R1),#0,#512,BLOCK   124.58  111.66   63.20   13.17   11.14 (247) 64.75
 
        1.  instruction times are average/typical, in micro-seconds.
 
        2.   numbers  in  parentheses  are  8800  cycles, which, if not
        exact, are meant as a relative comparison to the instructions.
647.2ERIS::CALLASI've lost my faith in nihilism.Mon Jan 04 1988 16:0810
    I find seeing the 780, the 8200, and the 8600 in a list of "recent
    architectures" to be amusing. Not only are they more correctly
    different implementations of a single architecture, the VAX
    archtecture, but it is amusing to see the 780 called "recent." I'm not
    criticizing your putting it there -- it's nice to see it as a reference
    point, but would be nicer to see timings for the "turbo" versions of
    the 8200 and 8600 -- the 8250 (even better if you find one with a
    working M-chip) and 8650, as these are more common. 
    
    	Jon
647.3DECSIM::FARMERMon Jan 04 1988 18:176
	I  did  put  the  780  times as a reference point for the other
	"implementations".  I only got times for the  "implementations"
	I  have  access  to.   I  can  send the procedure to anyone who
	wishes to find times for other "implementations".

	/cliff