[Search for users]
[Overall Top Noters]
[List of all Conferences]
[Download this site]
Title: | -={ H A C K E R S }=- |
Notice: | Write locked - see NOTED::HACKERS |
Moderator: | DIEHRD::MORRIS |
|
Created: | Thu Feb 20 1986 |
Last Modified: | Mon Aug 03 1992 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 680 |
Total number of notes: | 5456 |
647.0. "instruction stream tracing" by DECSIM::FARMER () Sun Jan 03 1988 17:25
I needed a way to trace an arbitrary instruction stream and
report the relative cost of the stream. So I put together a
package for public consumption. This method
- can be more accurate than putting code in a loop and timing it.
- works across unmatched call/returns.
- finds the most expensive instructions in the stream.
- reports summary of instructions executed and how many times.
- reports estimated 8800 cycle counts.
but it
- won't count instructions executed in system space.
- only estimates on variable length instructions like MOVCx.
To use it, paste the following condition handler, ENABLE it,
and put a (BLISS BUILTIN) BPT instruction at the beginning and
the end of the execution stream you want to trace.
The condition handler will output a trace of the instructions
executed, and, at the end of the stream (2nd BPT), will print a
summary of the counts of each type of instruction executed, and
their total cost in 8800 cycles and �seconds.
In DECSIM, this has been very useful to find the exact cost of
different features and commands. Let me know what you think of
it, and what you think it'd be useful for....
The 8800 cycle counts were determined using a facility called
the User Macro Level Instruction Timer. I haven't figured out
all the instructions, but I've done most of the more common
ones. It looked to me like the cycle time of the 8800 is 45nS,
or at least all instruction times seemed to be a multiple of
that, so that's what I used. The times reported here are best
used as relative measures rather than exact times.
------------------------------------------------------------------------
FORWARD
mnemonic: VECTOR [256, LONG],
cycle: VECTOR [256, BYTE];
ROUTINE Cond_handler
(sig: REF VECTOR [, LONG],
mech: REF VECTOR [, LONG],
enbl: REF VECTOR [, LONG]
) =
!++
! FUNCTIONAL DESCRIPTION:
!
! Condition handler that implements an instruction stream tracer,
! activated and deactivated by the presence of BPT instructions in the user's
! code.
!
! IMPLICIT INPUTS:
! none
!
! IMPLICIT OUTPUTS:
! none
!
! ROUTINE VALUE:
! none
!
! SIDE EFFECTS:
! Pages containing the BPT instructions are set to read/write.
! Instruction tracing and summary output is sent to SYS$OUTPUT.
!--
BEGIN
EXTERNAL ROUTINE
LIB$PUT_OUTPUT;
LITERAL
NOP = %X'01',
BPT = %X'03', ! BreakPoinT instruction
CHMK = %X'BC',
CHMS = %X'BE';
BIND
blank_line = %ASCID'',
n_args = .Sig [0],
Cond = Sig[1], ! condition code that caused this routine to execute
Cond_PC = Sig [n_args - 1] : REF VECTOR [, BYTE], ! PC of executing instruction
Cond_PSL = Sig [n_args] : BLOCK []; ! Program Status Longword at time of ACCVIO
OWN
count: VECTOR [256, WORD], ! #times each opcode is executed
Tracing, ! currently between 2 BPT's
TBIT_Count; ! -# of instructions between BPT's
LOCAL
msg: VECTOR [80, BYTE],
msg_dsc: VECTOR [2, LONG] INITIAL (0, msg),
inadr: VECTOR [2, LONG];
SELECTONE .Cond OF
SET
[SS$_TBIT]:
BEGIN
!++
! We've just executed one machine instruction. The following code
! reports what the next instruction is, among other things.
!--
Cond_PSL [PSL$V_TBIT] = 1; ! make sure TBIT is on after RET e.g.
!++
! If the last instruction was where the BPT used to be (now a NOP),
! replace the BPT instruction.
!--
TBIT_Count = .TBIT_Count - 1;
IF .TBIT_Count EQL 0 THEN ! just put in a NOP?
BEGIN
Cond_PC [-1] = BPT; ! replace with BPT
IF NOT .Tracing THEN ! after 2nd BPT?
Cond_PSL [PSL$V_TBIT] = 0; ! all done.
END;
!++
! Report the next instruction to be executed and count it.
!--
IF (.Tracing AND
(.Cond_PC [0] NEQ BPT)) THEN
BEGIN
msg_dsc [0] = %ALLOCATION (msg);
$FAO (%ASCID'!8XL: !2XL !AD', msg_dsc, msg_dsc [0],
.Cond_PC, .Cond_PC [0], CH$RCHAR (.mnemonic [.Cond_PC [0]]), .mnemonic [.Cond_PC [0]] + 1);
LIB$PUT_OUTPUT (msg_dsc);
count [.Cond_PC [0]] = .count [.Cond_PC [0]] + 1;
END;
!++
! Test for code in system space
!--
IF ((.Cond_PC GEQA %X'80000000') OR
((.Cond_PC [0] GEQU CHMK) AND (.Cond_PC [0] LEQU CHMS))) THEN
LIB$PUT_OUTPUT (%ASCID'%E, can''t trace system instructions; data lost');
RETURN SS$_CONTINUE
END; !( [SS$_TBIT]: )
[SS$_BREAK]:
BEGIN
!++
! This could be the first or the second BPT encountered. Turn on or
! turn off tracing accordingly.
!--
IF NOT .Tracing THEN ! 1st BPT?
BEGIN
CH$FILL (0, %ALLOCATION (count), count); ! zero counts
LIB$PUT_OUTPUT (blank_line);
Cond_PSL [PSL$V_TBIT] = 1; ! turn on Tracing
Tracing = 1;
END
ELSE
BEGIN
!++
! 2nd BPT. Report the summary of instructions executed
!--
LOCAL
last_idx: REF VECTOR [, BYTE],
total_cycles: INITIAL (0),
idx: VECTOR [256, BYTE];
LIB$PUT_OUTPUT (blank_line);
!++
! Sort the instructions by (#times executed * #cycles per)
!--
idx [0] = 0;
INCR i FROM 0 TO 255 BY 1 DO
BEGIN
IF .count [.i] NEQ 0 THEN ! any occurrences of this instruction?
BEGIN
BIND
cycles = .count [.i]*.cycle [.i];
total_cycles = .total_cycles + cycles;
last_idx = idx [0];
UNTIL ((.last_idx [0] EQL 0) OR
(cycles GTRU .count [.last_idx [0]]*.cycle [.last_idx [0]])) DO
last_idx = idx [.last_idx [0]];
idx [.i] = .last_idx [0];
last_idx [0] = .i;
END;
END; !( INCR )
!++
! Print out the instructions in descending order of 'cost'
!--
LIB$PUT_OUTPUT (%ASCID'Count Cycles Instruction');
LIB$PUT_OUTPUT (%ASCID'----- ------ -----------');
last_idx = idx [0];
UNTIL .last_idx [0] EQL 0 DO
BEGIN
BIND i = .last_idx [0];
msg_dsc [0] = %ALLOCATION (msg);
$FAO (%ASCID'!5ULx!4UL !AD', msg_dsc, msg_dsc [0],
.count [i], .cycle [i], CH$RCHAR (.mnemonic [i]), .mnemonic [i] + 1);
LIB$PUT_OUTPUT (msg_dsc);
last_idx = idx [i];
END;
LIB$PUT_OUTPUT (blank_line);
msg_dsc [0] = %ALLOCATION (msg);
$FAO (%ASCID'!UL instructions taking !UL cycles (!UL�S)',
msg_dsc, msg_dsc [0],
-.TBIT_Count, .total_cycles, (.total_cycles*45+500)/1000);
LIB$PUT_OUTPUT (msg_dsc);
Tracing = 0; ! setup for next BPT
END;
!++
! Replace the BPT with a NOP so we can get beyond the BPT. With TBIT on
! the NOP will be replaced (in the TBIT handler above) as soon as it is
! executed.
!--
inadr [1] = inadr [0] = .Cond_PC;
$SETPRT (INADR = inadr, PROT = PRT$C_UW); ! make sure page is writeable
Cond_PC [0] = NOP; ! replace BPT with NOP
TBIT_Count = 1; ! initialize instruction count
RETURN SS$_CONTINUE
END; !( [SS$_TBIT]: )
[OTHERWISE]: RETURN SS$_RESIGNAL;
TES;
END; !( ROUTINE Cond_handler )
MACRO
!++
! Pairs of opcode mnemonics and 8800 cycles (NOTE: not all cycle counts have
! been figured out and entered).
!--
mach_instructions =
, 0, , 0, REI , 0, , 0, RET ,11, RSB , 6, , 0, , 0, ! 0x
, 0, , 0, INDEX , 0, CRC ,225, PROBER, 0, PROBEW, 0, INSQUE,28, REMQUE,15,
BSBB , 6, BRB , 6, BNEQ , 6, BEQL , 6, BGTR , 6, BLEQ , 6, JSB ,10, JMP , 6, ! 1x
BGEQ , 6, BLSS , 6, BGTRU , 6, BLEQU , 6, BVC , 6, BVS , 6, BGEQU , 6, BLSSU , 6,
, 0, , 0, , 0, , 0, , 0, , 0, , 0, , 0, ! 2x
MOVC3 ,69, CMPC3 ,60, SCANC , 0, SPANC , 0, MOVC5 ,52, CMPC5 ,17, MOVTC , 0, MOVTUC, 0,
BSBW , 6, BRW , 6, CVTWL , 4, CVTWB , 6, , 0, , 0, , 0, , 0, ! 3x
, 0, MATCHC, 0, LOCC , 9, SKPC , 0, MOVZWL, 4, ACBW , 0, MOVAW , 2, PUSHAW, 3,
ADDF2 , 7, ADDF3 , 8, SUBF2 , 7, SUBF3 , 8, MULF2 ,11, MULF3 ,12, DIVF2 ,35, DIVF3 ,36, ! 4x
CVTFB ,17, CVTFW ,17, CVTFL ,15, CVTRFL,19, CVTBF , 8, CVTWF , 8, CVTLF , 6, ACBF , 0,
MOVF , 2, CMPF , 8, MNEGF , 4, TSTF , 0, EMODF , 0, POLYF,134, , 0, , 0, ! 5x
ADAWI , 0, , 0, , 0, , 0, INSQHI,150,INSQTI,74, REMQHI,150,REMQTI,74,
, 0, , 0, , 0, , 0, , 0, , 0, , 0, , 0, ! 6x
, 0, , 0, , 0, , 0, , 0, , 0, , 0, , 0,
, 0, , 0, , 0, , 0, , 0, , 0, , 0, , 0, ! 7x
ASHL ,12, ASHQ , 0, EMUL ,21, EDIV ,74, CLRQ , 3, MOVQ , 3, MOVAQ , 3, PUSHAQ, 3,
ADDB2 , 3, ADDB3 , 5, SUBB2 , 3, SUBB3 , 5, MULB2 ,17, MULB3 ,18, DIVB2 ,46, DIVB3 ,47, ! 8x
BISB2 , 3, BISB3 , 4, BICB2 , 4, BICB3 , 5, XORB2 , 3, XORB3 , 3, MNEGB , 3, CASEB ,10,
MOVB , 2, CMPB , 2, MCOMB , 0, BITB , 0, CLRB , 2, TSTB , 5, INCB , 2, DECB , 2, ! 9x
CVTBL , 4, CVTBW , 4, MOVZBL, 4, MOVZBW, 4, ROTL , 8, ACBB , 0, MOVAB , 2, PUSHAB, 3,
ADDW2 , 3, ADDW3 , 5, SUBW2 , 3, SUBW3 , 5, MULW2 ,17, MULW3 ,18, DIVW2 ,46, DIVW3 ,47, ! Ax
BISW2 , 3, BISW3 , 4, BICW2 , 4, BICW3 , 5, XORW2 , 3, XORW3 , 3, MNEGW , 3, CASEW ,10,
MOVW , 2, CMPW , 2, MCOMW , 0, BITW , 0, CLRW , 2, TSTW , 5, INCW , 2, DECW , 2, ! Bx
BISPSW, 0, BICPSW, 0, POPR , 0, PUSHR , 0, CHMK , 0, CHME , 0, CHMS , 0, CHMU , 0,
ADDL2 , 3, ADDL3 , 5, SUBL2 , 3, SUBL3 , 5, MULL2 ,17, MULL3 ,18, DIVL2 ,46, DIVL3 ,47, ! Cx
BISL2 , 3, BISL3 , 4, BICL2 , 4, BICL3 , 5, XORL2 , 3, XORL3 , 3, MNEGL , 3, CASEL ,10,
MOVL , 2, CMPL , 7, MCOML , 2, BITL , 2, CLRL , 2, TSTL , 5, INCL , 2, DECL , 2, ! Dx
ADWC , 3, SBWC , 3, , 0, , 0, MOVPSL, 0, PUSHL , 3, MOVAL , 2, PUSHAL, 3,
BBS , 9, BBC , 9, BBSS , 0, BBCS , 0, BBSC ,10, BBCC ,10, BBSSI ,10, BBCCI ,10, ! Ex
BLBS , 8, BLBC , 8, FFS ,38, FFC ,21, CMPV ,13, CMPZV ,14, EXTV ,13, EXTZV ,12,
INSV ,14, ACBL , 0, AOBLSS, 0, AOBLEQ, 0, SOBGEQ, 0, SOBGTR, 0, CVTLB , 6, CVTLW , 6, ! Fx
, 0, , 0, CALLG ,46, CALLS ,50, , 0, , 0, , 0, , 0 %,
instr_name [name, cycl] = %IF %NULL (name) %THEN NONAMEC %ELSE UPLIT (%ASCIC %STRING (name)) %FI %,
instr_cycl [name, cycl] = cycl %;
BIND
NONAMEC = UPLIT (%ASCIC'');
OWN
mnemonic: VECTOR [256, LONG] INITIAL (instr_name (mach_instructions)),
cycle: VECTOR [256, BYTE] INITIAL (BYTE (instr_cycl (mach_instructions)));
------------------------------------------------------------------------
T.R | Title | User | Personal Name | Date | Lines |
---|
647.1 | here's some resuls from UMLIT (re.0) | DECSIM::FARMER | | Sun Jan 03 1988 17:27 | 46 |
| Times for selected instructions on recent architectures
uVAX 780 8200 8600 8800 (cy) avge
MOVL Reg,Reg 0.44 0.40 0.40 0.10 0.09 (2) .29
ADDL2 Reg,Reg 0.44 0.40 0.39 0.10 0.14 (3) .29
CLRL Reg 0.66 0.60 0.60 0.08 0.09 (2) .41
ADDL3 Reg,Reg,Reg 0.69 0.60 0.81 0.16 0.21 (5) .49
CLRQ Reg 0.88 1.20 0.80 0.16 0.14 (3) .64
BRW 1.17 0.85 0.73 0.29 0.25 (6) .66
MOVL Mem,Reg 1.32 0.85 0.81 0.30 0.15 (3) .69
MOVL Reg,Mem 1.27 1.25 0.95 0.30 0.14 (3) .78
TSTL + BLEQ 1.40 1.01 1.01 0.40 0.32 (7) .83
CMPL + BLEQ 1.56 1.23 1.19 0.44 0.40 (9) .96
BBS #22,Reg,disp 1.33 1.40 1.61 0.56 0.39 (9) 1.06
ADDF2 Reg,Reg 2.96 0.79 1.46 0.24 0.32 (7) 1.15
ADDF3 Reg,Reg,Reg 2.94 1.20 1.91 0.32 0.36 (8) 1.35
BBCC #22,Reg,disp 1.95 1.60 2.31 0.80 0.45 (10) 1.42
MOVL Mem,Mem 2.83 1.75 2.11 0.56 0.25 (6) 1.50
MULF2 Reg,Reg 3.89 1.21 2.03 0.32 0.45 (11) 1.58
ASHL #10,Reg,Reg 2.03 2.01 5.49 0.56 0.54 (12) 2.13
JSB + RSB 3.61 3.01 2.93 1.27 0.73 (16) 2.31
CMPV #2,#7,Reg,#-2 3.14 2.82 4.83 0.48 0.59 (13) 2.37
EXTZV #4,#10,Reg,Reg 3.83 2.61 4.84 0.88 0.57 (13) 2.55
EXTV #4,#10,Reg,Reg 3.65 3.02 5.65 0.88 0.54 (12) 2.75
MULL2 Reg,Reg 5.24 1.86 5.92 0.64 0.75 (17) 2.88
CMPZV #2,#7,Reg,#-2 4.64 3.02 5.37 0.88 0.62 (14) 2.91
DIVF3 Reg,Reg,Reg 4.71 4.69 2.76 1.38 1.61 (36) 3.03
INSV Reg,#4,#10,Reg 5.17 3.42 5.79 1.20 0.63 (14) 3.24
EMUL Reg,Reg,Reg,Reg 6.40 6.93 3.48 0.73 0.95 (21) 3.70
DIVL2 Reg,Reg 8.47 9.47 8.23 1.71 2.07 (46) 5.99
DIVL3 Reg,Reg,Reg 8.69 9.67 7.67 2.57 2.12 (47) 6.14
INSQUE + REMQUE 10.70 13.94 8.76 2.41 1.92 (43) 7.55
EDIV Reg,Reg,Reg,Reg 11.43 11.88 7.47 4.01 3.34 (74) 7.63
CALLG #0,Rtn + RET, 0 GPRs 12.85 13.65 12.79 3.02 1.49 (33) 8.76
CALLS #0,Rtn + RET, 0 GPRs 14.10 14.33 14.20 3.42 1.66 (37) 9.54
CALLS #0,Rtn + RET, 5 GPRs 21.33 23.56 17.51 4.60 2.76 (61) 13.95
INSQHI + REMQHI 19.48 27.38 25.01 5.09 10.08 (224) 17.41
CALLS #0,Rtn + RET, 10 GPRs 27.23 32.22 25.49 5.64 3.34 (74) 18.78
CMPC3 Reg,(Reg),(Reg), 10 chr 83.41 14.30 10.30 3.85 2.71 (60) 22.91
MOVC5 #0,(R1),#0,#512,BLOCK 124.58 111.66 63.20 13.17 11.14 (247) 64.75
1. instruction times are average/typical, in micro-seconds.
2. numbers in parentheses are 8800 cycles, which, if not
exact, are meant as a relative comparison to the instructions.
|
647.2 | | ERIS::CALLAS | I've lost my faith in nihilism. | Mon Jan 04 1988 16:08 | 10 |
| I find seeing the 780, the 8200, and the 8600 in a list of "recent
architectures" to be amusing. Not only are they more correctly
different implementations of a single architecture, the VAX
archtecture, but it is amusing to see the 780 called "recent." I'm not
criticizing your putting it there -- it's nice to see it as a reference
point, but would be nicer to see timings for the "turbo" versions of
the 8200 and 8600 -- the 8250 (even better if you find one with a
working M-chip) and 8650, as these are more common.
Jon
|
647.3 | | DECSIM::FARMER | | Mon Jan 04 1988 18:17 | 6 |
| I did put the 780 times as a reference point for the other
"implementations". I only got times for the "implementations"
I have access to. I can send the procedure to anyone who
wishes to find times for other "implementations".
/cliff
|