
Conference smurf::unix_objsym

Title:	Digital UNIX Object File/Symbol Table Notes Conference

Moderator:	SMURF::LOWELL

Created:	Mon Nov 25 1996
Last Modified:	Thu Jun 05 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	71
Total number of notes:	314

33.0. "ISSUE 27: Debugging optimized code" by SMURF::LOWELL () Wed Dec 04 1996 15:12

T.R	Title	User	Personal Name	Date	Lines
33.1	comments on issue 27 from David C. P. LaFrance-Linden	SMURF::LOWELL		`Thu Dec 05 1996 16:24`	56
33.2		VIRRUS::diewald	Here In Soap Opera Central...	`Tue Dec 24 1996 14:26`	13
33.3	Proposed Optimization Symbols Organization	GEMGRP::BRENDER	Ron Brender	`Fri Feb 14 1997 13:55`	338
Optimization Symbols Section

Ron Brender
(Jeff Nelson)
14 February 1997 (Rev)

ABSTRACT

This document defines the syntax and semantics of the Third Eye Symbols section known as the "Optimization Symbols" section. This section, though it has a definition in /usr/include/sym.h, is currently unused on the Digital Unix platform.

1 MOTIVATION

The Third Eye symbol table embedded within a COFF object file is unwieldy and difficult to extend without coordination and cooperation between lots of consumers and producers. This is further compounded by two factors: first, not all consumers and producers are necessarily known. Second, the extensions that a producer or consumer wants are not necessarily compatible with the needs of any other producer or consumer.

2 PURPOSE

The definition of the Optimization Symbols section is designed to ease these problems. It gives individual producers and consumers the ability to communicate information about any aspect of the object file in any form they choose. This allows for new or modified descriptions, while keeping the rest of the symbol table unchanged.

New information can be generated at any time without coordination between all producers and consumers, though eventually a minimal amount of coordination is required. In fact, it is recommended that the information in the Optimization Symbols section eventually be folded back into the mainstream symbol description. This is so all producers and consumers can take advantage of the information which up to now has presumably been private between one or more producers and consumers.

It is assumed that the information in the Optimization Symbols section does not contradict or fundamentally violate any understanding of the object that is given in any other part of the symbol table. In other words, it is not OK to "lie" in the mainstream symbol description and tell the "truth" in the Optimization Symbols section.
It is OK, however, to modify or enhance the description of the main symbol table. This is the intended purpose of the section.

Since the prior version of this proposal was prepared (8 August 1995), the UNIX Object File and Symbol Table Working Group has adopted a specification for the .comment section that is substantially similar in goals and mechanism. This revision recasts this proposal in a manner that is more consistent with the .comment section in representation and processing.

3 DEFINITION

The Optimization Section consists of a sequence of zero or more "per procedure optimization descriptions" (PPOD). Each PPOD is pointed to by the iopt field of the procedure descriptor (PDR) to which it applies (as explained later). While each PPOD has an internal structure much like that of a .comment section, there is no meta-structure or wrapper that collects all the PPODs of a file together. (There are existing pointer and size fields of the COFF file header (HDRR) and File Descriptors (FDR) that are used in the usual way to describe the aggregate of all PPODs; more later.) In particular, it is intended that the linker (ld) be able to construct the Optimization Section of an output image much like it constructs the local symbol table -- as the concatenation of the Optimization Sections of the constituent object files. Unlike the local symbol table, however, it is intended that the linker need not interpret and/or modify the contents of a PPOD in any way.

Each constituent PPOD of the Optimization Section has a structure that is analogous to a comment section:

  o  A leading sequence of TLV "index" entries that describe the location and parts of the PPOD, followed by

  o  A raw data area containing the actual optimization descriptions.
3.1 PPOD Index Entry Structure

Each index entry has the following structure:

=================start definition===============

typedef struct {
    unsigned int  ppode_tag;
    unsigned int  ppode_len;
    unsigned long ppode_val;
} PPODHDR;

================= end definition ===============

where

  ppode_tag   Identifies the kind of data described by the entry.

  ppode_len   Indicates the size of the data, in bytes, which is found in the free form data area of this same PPOD. When this field is zero, then the only data is that found in the ppode_val field.

  ppode_val   Is, or describes the location of, the data of the given kind. When ppode_len is zero, this field contains the (only) data itself. When ppode_len is non-zero, this field is a relative file offset from the beginning of the current PPOD to the applicable data area.

The start of all data allocated in the free-form area must be octaword (16-byte) aligned. (Recall that the Optimization Section itself is octaword aligned.) It follows (and is required) that each distinct PPOD must be octaword aligned as well. The length stored in ppode_len need not be an octaword multiple, but when it is not, padding with zero-bytes must be appended to the end of the data item.

3.2 PPOD Index Entry Kinds

Every PPOD must contain at least two PPOD Entries. The first and last entries must be, respectively:

  Tag           Value   Interpretation

  PPODE_STAMP   1       Identifies the version number of the PPOD. ppode_len must be zero, and ppode_val contains the version number (initially 1).

=================start definition===============

#define PPOD_VERSION 1   /* current version number */

================= end definition ===============

  PPODE_END     2       Indicates the end of the PPOD Entries for this PPOD.
                        (Both ppode_len and ppode_val must be zero.)

The PPOD version number is for future expansion purposes; if the Optimization Section ever changes semantically or syntactically, the version number shall change so that consumers can recognize the difference.

In addition to the PPODE_STAMP and PPODE_END kinds, a number of additional kinds of data will be defined from time to time. The following kinds are currently anticipated; however, the actual specifications are (will be) given in separate documents:

  PPODE_SEM_EVENT     3   Semantic Event Descriptions
  PPODE_INLINE_INST   4   Inline Instance Descriptions
  PPODE_INLINE_LOC    5   Inline Locator Mapping Descriptions

It is expected that there will typically be a natural correlation between index entries and the data parts: the first (non-version stamp) entry describes the first data part, the second descriptor describes the second data part, and so on. However, this cannot be assumed. In addition, it is invalid to assume an ordering of the index entries.

Depending on the kind of data involved, it may be valid to have more than one entry with the same tag field value; that is, in general it is not valid to regard the tag field as a unique key.

3.3 The Data Parts

The contents and format of the data parts are arranged by private agreement between the producers and consumers of the particular kind of data part.

4 RELATIONSHIP WITH OTHER COFF STRUCTURES

The Optimization Symbols section is organized and indexed in much the same way as the Locals Symbols section:

- The Optimization Symbols section is a single section within the COFF file.

- The COFF header (HDRR) contains the starting offset and maximum index into the Optimization Symbols section, cbOptOffset and ioptMax, respectively. The cbOptOffset value is the file offset of the first byte in the Optimization Symbol table.
  This value must always be aligned on an octaword (16-byte) boundary. The ioptMax value is the count of the total number of bytes in the entire Optimization Symbols section, including index entries, data and padding bytes. Thus, this value is always a multiple of 16 bytes.

- Each File Descriptor (FDR) has a pointer and size contribution to the Optimization Symbols section, named ioptBase and copt, respectively. The ioptBase value is the byte offset from the start of the Optimization Symbols section to this file's optimization symbols. The copt value is the total number of bytes in the Optimization Symbols section which are contributed by this file, including index entries, data and padding bytes. This implies that a file's contribution to the Optimization Symbols section must be contiguous, even if the procedures in that file are not.

- Each Procedure Descriptor (PDR) has a pointer to the start of that procedure's contribution to the Optimization Symbols section, named iopt. This offset points to the start of a unique and complete Optimization Procedure Descriptor, relative to the beginning of its containing FDR's contribution. There is at most one Optimization Procedure Descriptor per routine.

A procedure's Optimization Descriptor can be found using this formula:

    HDRR.cbOptOffset + FDR.ioptBase + PDR.iopt

5 PROCESSING

Tools which produce COFF object files must produce either an empty Optimization Symbols section or a valid Optimization Symbols section.

An image with no Optimization Symbols section has HDRR.cbOptOffset and HDRR.ioptMax values of zero. A file with no contribution to an Optimization Symbols section has a FDR.copt value of zero, in which case every PDR within it has a PDR.iopt value of zero.
If a FDR has a contribution, then every procedure contained within it must have a contribution pointed to by its PDR.iopt field, even if that contribution consists only of the minimum pair of PPODE_STAMP/PPODE_END index entries.

ISSUES

  1. It does not work for a PDR.iopt of zero by itself to indicate no contribution because that (validly) points to the beginning of the FDR's contribution. Unfortunately, there is no PDR.copt (length) field analogous to the length in HDRR and FDR structures. Hence this "minimum contribution" convention.

  2. Assuming this convention, would it be OK for multiple procedures to share the same such contribution? In which case, we could require that each FDR contribution must begin with a minimum PPODE_STAMP/PPODE_END pair, so PDR.iopt == 0 does imply no contribution...?

Tools which consume COFF object files must be capable of skipping the entire Optimization Symbols section, or those parts of it which they do not understand.

Tools which both read and write COFF object files must consume a valid Optimization Symbols section (if one exists in an input file) and produce an equivalent, valid Optimization Symbols section in their output file. This means one of the following:

- The tool does not know how to process anything in the Optimization Symbols section. The tool must write an exact copy of any Optimization Symbols section it reads in. In other words, it must allow the Optimization Symbols section to pass through unchanged.

- The tool recognizes some kinds of data parts. This tool must copy, unchanged, the data parts (and descriptors) that it does not understand. The tool must read (and if necessary, transform) the data parts (and descriptors) that it does understand and write the equivalent data parts (and descriptors).
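The locating formula and the "skip what you do not understand" rule can be sketched in C as follows. This is only an illustration under stated assumptions, not part of the proposal: the PPOD is assumed to be already in memory in host byte order, fixed-width types stand in for unsigned int and unsigned long so that each index entry occupies 16 bytes (as it does on Alpha), and the helper names ppod_file_offset, ppod_padded_len and ppod_find are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index entry from section 3.1. Assumption: fixed-width stand-ins for
 * unsigned int / unsigned long, giving a 16-byte entry on any host. */
typedef struct {
    uint32_t ppode_tag;
    uint32_t ppode_len;
    uint64_t ppode_val;
} PPODHDR;

enum { PPODE_STAMP = 1, PPODE_END = 2, PPODE_SEM_EVENT = 3 };

/* Section 4: HDRR.cbOptOffset + FDR.ioptBase + PDR.iopt */
static uint64_t ppod_file_offset(uint64_t cbOptOffset, uint64_t ioptBase,
                                 uint64_t iopt)
{
    return cbOptOffset + ioptBase + iopt;
}

/* Section 3.1: data items are padded with zero bytes to an octaword. */
static uint64_t ppod_padded_len(uint64_t len)
{
    return (len + 15u) & ~(uint64_t)15u;
}

/* Hypothetical helper: find the first index entry carrying `tag`.
 * Entries with other tags are skipped without interpretation.
 * Returns 0 and fills *out on success, -1 if absent or malformed.
 * Tags are not unique keys, so later matches may also exist. */
static int ppod_find(const unsigned char *ppod, size_t avail,
                     uint32_t tag, PPODHDR *out)
{
    PPODHDR e;
    size_t off;

    assert(ppod != NULL && out != NULL);
    if (avail < sizeof e)
        return -1;
    memcpy(&e, ppod, sizeof e);
    /* The first entry must be the version stamp with a zero length. */
    if (e.ppode_tag != PPODE_STAMP || e.ppode_len != 0)
        return -1;

    for (off = sizeof e; off + sizeof e <= avail; off += sizeof e) {
        memcpy(&e, ppod + off, sizeof e);
        if (e.ppode_tag == PPODE_END)
            return -1;                      /* end of index, not found */
        if (e.ppode_tag != tag)
            continue;                       /* unknown or unwanted: skip */
        /* When ppode_len is non-zero, ppode_val is an offset from the
         * start of this PPOD; check that the data lies within bounds. */
        if (e.ppode_len != 0 &&
            (e.ppode_val > avail || e.ppode_len > avail - e.ppode_val))
            return -1;
        *out = e;
        return 0;
    }
    return -1;                              /* no PPODE_END entry seen */
}
```

A tool that only copies the section never needs ppod_find at all; it can treat the HDRR.ioptMax bytes at HDRR.cbOptOffset as opaque.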
6 COORDINATION

It is the responsibility of every producer to obtain a unique kind value for each distinct kind of data it wishes to place in the Optimization Symbols section. Obtaining a unique kind value ensures that there won't be two producers using the same kind value to mean different things.

Obtaining a unique kind value is accomplished by adding a new constant definition to the include file (currently /usr/include/symconst.h) which defines the format of the Optimization Symbols section.

Information in the Optimization Symbols section, though arranged by private agreement between producers and consumers, is meant to be shared among all consumers and producers when it makes sense to do so. Producers and consumers are jointly responsible for ensuring that the data parts they write and read, respectively, are also recognized and processed by other tools if those tools could have an impact on the information in the data part.

For example, if a compiler generates a different kind of PC-line correlation table which is used by a debugger, and an instruction-modifying tool makes changes (insertions and deletions) to the instruction stream, the compiler and debugger should both lobby the modifying tool to keep the different PC-line correlation table up to date.
33.4	Proposed Semantic Event Representation	GEMGRP::BRENDER	Ron Brender	`Fri Feb 14 1997 14:13`	144
DBGOPT Semantic Event Representation using the UNIX COFF Optimization Section

Ron Brender
(Jeff Nelson)
14 February 1997 (Revision 3)

Greg Lueck, Jeff Nelson and Mike Rickabaugh have proposed a storage representation and semantics for giving form and substance to the currently unused "Optimization Symbols" section in the COFF/Third Eye symbol table. This document builds on that framework to propose a specification for how to represent semantic event information. This information will be generated by GEM-based compilers and used by Ladebug (and ignored by dbx).

1 OVERVIEW OF SEMANTIC EVENTS

Semantic events are those points in a program where the user-visible and user-relevant semantic actions of a program actually occur. For example, for an assignment statement, the instruction that stores into a user declared variable is generally the location of a semantic event (the event temporally occurs when that instruction is executed).

Semantic event locations are generally divided into these kinds:

    assignments
    control points (conditional transfers)
    calls (and return, including PALcalls)
    labels

Not all instructions that effect these operations are necessarily visible or even interesting to users. For a complete description of how the actual set of semantic event locations is determined, see "What Every Front-End Should Know About Debugging Optimized Code", which can be found in TURRIS::AD-PROJECTS note 74.6.

2 SEMANTIC EVENT REPRESENTATION

Semantic events are represented using a semantic event kind of subsection (PPODE_SEM_EVENT == 3) in the Per Procedure Optimization Description for a procedure. There will be one instance of such a subsection that describes the semantic event information for the entire procedure.
A semantic event subsection consists of an array of Semantic Event Entries (for the entire procedure) where:

- The length of the array is specified by the Size field in the subsection descriptor, and

- Each element of the array is a Semantic Event Entry defined as described below.

Each Semantic Event Entry is a byte consisting of two 4-bit fields:

     7     4 3     0
    +-------+-------+
    | Event | Count |
    +-------+-------+

where

  .  Event is a 4-bit code that indicates the event being described:

         0      None (used for a Count of 16 or more, see below)
         1      Write (assignment) event
         2      Control event
         3      Call event
         4      Label event
         5      Instruction level only
         6      Prolog End (first instruction following)
         7      Epilog Begin (first instruction in)
         8-15   (Reserved for future use)

  .  Count is a 4-bit field with a value in the range 0 to 15 indicating the number of executable instructions following the previous event description to which this event applies.

     If more than 15 instructions separate events, then multiple event entries that indicate the null event are used to add up to the required separation.

     If more than one event applies to the same instruction, then the first event is encoded with the appropriate Count to "get to" the instruction and subsequent events are encoded using a Count of 0.

                                   NOTE

     The encoding of this field is not identical to the encoding of the Count field of a Line Number Entry. This Count encodes the values from 0 to 15 rather than 1 to 16.

The first semantic event of each procedure must be a Label event with a Count of zero. The address in the text section for this first instruction is specified in the Procedure Descriptor Entry that points to the containing Optimization Section.
Typically (but not necessarily), the last Semantic Event Entry will consist of the value 0x3n corresponding to the last RET instruction of the routine. There is no need to "describe" any out-of-line code or padding NOP instructions that may occur at the end of a routine following the last RET so long as they contain no semantic event locations.

APPENDIX A

ADDITIONS TO SYM*.H FILES

...Changes to sym.h and symconst.h are TBD...
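The Event/Count byte encoding described in section 2 can be decoded with a short loop. The sketch below is an illustration only: the names SemEvent, sem_decode and the SEV_* constants are hypothetical (the proposal leaves the sym.h additions TBD), and the result is expressed as an instruction index relative to the procedure's first instruction rather than as an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event codes from section 2; the SEV_* names are hypothetical. */
enum {
    SEV_NONE = 0, SEV_WRITE = 1, SEV_CONTROL = 2, SEV_CALL = 3,
    SEV_LABEL = 4, SEV_INSN_ONLY = 5, SEV_PROLOG_END = 6,
    SEV_EPILOG_BEGIN = 7
};

/* One decoded event: its kind and the index of the instruction it
 * applies to, counted from the first instruction of the procedure. */
typedef struct {
    unsigned event;
    uint64_t insn_index;
} SemEvent;

/* Decode `len` Semantic Event Entries. Up to `max_out` decoded events
 * are stored in `out`; the return value is the total number of
 * non-null events found, which may exceed `max_out`. */
static size_t sem_decode(const unsigned char *data, size_t len,
                         SemEvent *out, size_t max_out)
{
    uint64_t insn = 0;
    size_t n = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned event = (unsigned)data[i] >> 4;    /* bits 7..4 */
        unsigned count = (unsigned)data[i] & 0x0Fu; /* bits 3..0 */

        insn += count;          /* Count is 0..15, not 1..16 */
        if (event == SEV_NONE)
            continue;           /* null event: only advances position */
        if (n < max_out) {
            out[n].event = event;
            out[n].insn_index = insn;
        }
        n++;
    }
    return n;
}
```

Alpha instructions are four bytes each, so a consumer would obtain a text address as the procedure's start address plus 4 times insn_index.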
33.5	Proposed Representation for Split Lifetime Info	GEMGRP::BRENDER	Ron Brender	`Fri Feb 14 1997 16:02`	547
Representation of Split Lifetime Information using Digital Unix COFF

Ron Brender
(Jeff Nelson)
14 February 1997 (Revision 2)

This document proposes changes to the Digital UNIX COFF on-disk symbol table format which are required to support split lifetime variables. For background information (such as just what is a split lifetime), see "What Every Front End Should Know About Debugging Optimized Code" by Brender and Nelson. A copy is posted in TURRIS::AD-PROJECTS note 74.6.

                                   NOTE

     This revision differs significantly from the previous (dated 28 August 1995) only in that it eliminates the need/use of the scSymRef storage class.

1 OVERVIEW

The split lifetime variable description is designed to supplement an existing symbol description. This is a change from earlier split lifetime descriptions, which attempted to completely replace one definition with another.

There are several reasons why split lifetime needs to supplement, not entirely replace, a symbol's description. The most important one is that the variable may be split in a compilation unit which is independent from the compilation unit which declares the variable.

For example, consider a global variable. It is declared once, but there are potentially many independent compilation units which manipulate the variable. Because each compilation unit is independent, it is not possible to replace the global definition, because each compilation would have to know about the others in order to give a complete replacement definition.

Other significant but relatively less important reasons are due to limitations and assumptions about COFF and the Third Eye symbol table format. For example, text relocations (for relocating PC values) can only occur in the Locals symbol table and even then, are only meaningful when they occur in the same context as the compilation unit. This means that text relocations in synthesized files (e.g., for FORTRAN COMMON) don't work.
As before, the entire split lifetime description can be skipped by consumers who choose to ignore it. Those consumers will have some understanding of the variable (its name, type, and scope in which it appears), though less-accurate understanding of the symbol's address.

The remainder of this document describes the new format.

2 SPLIT LIFETIME VARIABLE DESCRIPTION

The split lifetime on-disk format for a program variable consists of:

  1. A header symbol.

  2. A referee symbol.

  3. A list of one or more lifetime descriptions, called child descriptions.

  4. A trailer symbol.

A more detailed description using the following example is given below.

     0. ( 0)(          0) docfe.f  File   Text         symref 51
     1. ( 1)(0x120001810) docfe_   Proc   Text         [24]
     2. ( 2)(          0) A        Param  Unalloc      [26]
     3. (-4)(          0) N        Param  Unalloc      [10]
     4. ( ?)(          ?)          Block  Text         symref 99
     5. ( 2)(          5) N        Split  Info         symref 23
     6. ( 2)(          5) N        Local  scInfo       [->symref 3]
     7. ( 2)(       0x11) N        Param  VarRegister  [10]
     8. ( 2)(0x120001818) N        Split  Text         symref 10
     9. ( 1)(          0) N        End    Text         symref 8
    10. ( 1)(          0) N        Param  VarRegister  [10]
    11. ( 1)(0x12000181c) N        Split  Text         symref 13
    12. ( 0)(          0) N        End    Text         symref 11
    13. ( 0)(       0x11) N        Param  VarRegister  [10]
    14. ( 0)(0x120001828) N        Split  Text         symref 16
    15. (-1)(          0) N        End    Text         symref 14
    16. (-1)(          0) N        Param  VarRegister  [10]
    17. (-1)(0x12000182c) N        Split  Text         symref 19
    18. (-2)(          0) N        End    Text         symref 17
    19. (-2)(          0) N        Param  VarRegister  [10]
    20. (-2)(0x120001838) N        Split  Text         symref 22
    21. (-3)(        0x8) N        End    Text         symref 20
    22. (-4)(          0) N        End    Info         symref 5

    Example 1. Split Lifetime Description of Parameter N.

2.1 Header Symbol

The split lifetime header symbol is identified by the symbol type and class pair (stSplit,scInfo). In Example 1, symbol entry 5 denotes the beginning of the split lifetime description for the program variable N.
The symbol value field contains a count of the number of child descriptions, which in the example is 5. The symbol index field is a forward symbol reference which points to the symbol just after the end of the split lifetime definition.

Consumers which choose not to support the split lifetime description should recognize the stSplit/scInfo symbol table entry and skip the entire description using the symbol index. (Note that consumers which skip the split lifetime description will still see a symbol definition, which in this example occurs at entry 3, the target of the referee symbol at entry 6.)

2.2 Referee Symbol

The referee symbol is the symbol entry which refers to the symbol to which the split lifetime information applies. There are two kinds of symbols which can be referred to.

  1. Non-global Variables

     The referee symbol type and class are stLocal and scInfo, respectively. The referee symbol index is an RNDX AUX entry which refers to the file and symbol offset of the variable's declaration.

     In most typical cases, the file value will be the same as the current file. In the case of FORTRAN COMMON variables, the file value will refer to the synthesized file that represents the common.

     Symbol entry 6 in Example 1 illustrates a symbol reference to a parameter variable.

  2. Global Variables

     Even though a variable is global (and therefore static), the compiler may perform "split loads" and "split flushes" on it to temporarily allocate the variable in a register.

     Symbols in the Externals symbol table cannot be referred to using the previous referee mechanism because there is no way to refer to a global symbol using an AUX RNDX. Thus, the referee symbol type and class are stGlobal and scInfo. The referee symbol index is undefined and must be zero. The symbol type informs the consumer that the variable is in the Externals table; the symbol name is then used as the identifying key to locate the definition.
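The skipping rule from section 2.1 can be sketched as follows. This is a hedged illustration: the Sym structure is a simplified, hypothetical stand-in for the real symbol table entry, the numeric values of ST_SPLIT and SC_INFO are placeholders (the actual constants would come from symconst.h), and symbol indexes are assumed to be indexes into the same array being walked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified symbol entry; not the real sym.h layout. */
typedef struct {
    int           st;     /* symbol type  */
    int           sc;     /* symbol class */
    unsigned long value;
    unsigned long index;
} Sym;

/* Placeholder values; the real ones would be assigned in symconst.h. */
enum { ST_SPLIT = 100, SC_INFO = 200 };

/* Return the index of the next symbol a consumer should look at.
 * A consumer that does not support split lifetimes jumps over the
 * whole description using the header's forward reference, which
 * points just past the trailer symbol. */
static size_t next_symbol(const Sym *syms, size_t nsyms, size_t i)
{
    assert(syms != NULL && i < nsyms);
    if (syms[i].st == ST_SPLIT && syms[i].sc == SC_INFO) {
        unsigned long target = syms[i].index;
        /* Follow the reference only if it really points forward and
         * stays inside the table; otherwise just step to the next. */
        if (target > i && target <= nsyms)
            return (size_t)target;
    }
    return i + 1;
}
```

With the entries of Example 1, calling next_symbol on entry 5 would yield 23, the first symbol after the trailer at entry 22.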
2.3 Default Lifetime

When a split lifetime is used to enhance an existing symbol description, the problem arises of what to do with the original description. This is handled by defining a meaning for the value field in the referee symbol. The symbol value is a bitmask. The meaning of the bits is as follows:

    Bits   Value   Meaning
    ----   -----   -------
    0      0       Do not use target of reference as default
           1       Use target of reference as default
    1-63   MBZ     Bits 1 through 63 are reserved and must be zero

If bit zero of the referee value field is set, then the target of the reference should be retained as the "default" representation. That is, the target symbol's type, class and address are active whenever a split child isn't.

If bit zero of the referee value field is clear, then the target symbol should NOT be used as the default representation; instead, a default representation of "unallocated" should be used by the symbol table consumer.

                           Possible Alternative

     Rather than use the referee symbol value field to encode the default vs no-default distinction (a certainly novel, if perhaps unnatural choice), we could use other pairing of st/sc values. For example, we could use

         stLocal,scInfo     Use default
         stGlobal,scInfo    Use default
         stLocal,scNil      Don't use default
         stGlobal,scNil     Don't use default

     Thoughts?

2.4 Child Descriptions

Each child description is a 3-tuple of symbol table entries.

The first symbol of the tuple, called the child symbol, is a standard symbol table entry. However, the only symbol types which may appear are stStatic, stGlobal, stParam and stLocal, because these are the symbol types which define program variables which can be split by a producer (compiler); the symbol classes are those which are already defined and paired with the preceding symbol types.
The interpretation of the value and index fields of the child symbol is no different than the already-existing rules for interpretation of the appropriate symbol type and class pair.

The second symbol of the tuple, called the low PC symbol, defines the low bound of the PC range over which the child description is active. The symbol type and class are stSplit and scText, respectively. The symbol value is the PC address of the lower bound of the lifetime range. Because this value is an address in the text section, it is automatically relocated by the linker. The symbol index is a forward symbol reference which points one past the end of the child description.

The third symbol of the tuple, called the high PC symbol, defines the high bound of the PC range over which the child description is active. The symbol type and class are stEnd and scText, respectively (as in Example 1 and the summary in section 2.7; the pair stEnd/scInfo is reserved for the trailer symbol). The symbol value is an offset from the value of the low PC symbol. The actual upper bound value is computed by the expression:

    (low PC symbol)->value + (high PC symbol)->value

The symbol index of the high PC symbol is a backward symbol reference which points to the low PC symbol to which it is paired.

The low PC and high PC values together define an address range over which the child symbol is said to be active. That is, when the program is loaded and running and the program counter is within the address range:

    low PC value <= current PC <= high PC value

then the child symbol describes the address of the split lifetime variable. Note that the address range is inclusive of the endpoints.
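The range computation above can be sketched in C. This is an illustration under stated assumptions: SplitChild is a hypothetical condensed view of one child 3-tuple (it is not a structure from sym.h), and the helper names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical condensed view of one child 3-tuple. */
typedef struct {
    uint64_t location;  /* child symbol value, e.g. a register number */
    uint64_t low_pc;    /* value of the low PC symbol                 */
    uint64_t high_off;  /* value of the high PC symbol: an offset     */
} SplitChild;

/* (low PC symbol)->value + (high PC symbol)->value */
static uint64_t child_high_pc(const SplitChild *c)
{
    return c->low_pc + c->high_off;
}

/* The range is inclusive of both endpoints. */
static bool child_is_active(const SplitChild *c, uint64_t pc)
{
    return c->low_pc <= pc && pc <= child_high_pc(c);
}

/* Scan every child, since order is not significant and ranges may
 * overlap. Returns the first active child (or NULL) and stores the
 * number of active children in *n_active. */
static const SplitChild *first_active(const SplitChild *kids, size_t n,
                                      uint64_t pc, size_t *n_active)
{
    const SplitChild *hit = NULL;

    assert(n_active != NULL && (kids != NULL || n == 0));
    *n_active = 0;
    for (size_t i = 0; i < n; i++) {
        if (child_is_active(&kids[i], pc)) {
            if (hit == NULL)
                hit = &kids[i];
            (*n_active)++;
        }
    }
    return hit;
}
```

When no child is active, a consumer would fall back to the default representation selected by bit zero of the referee symbol's value field.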
In Example 1, there are 5 split children which are decoded as follows:

    from PC 0x120001818 to PC 0x120001818:
        N is a VarRegister parameter in register 0x11
    from PC 0x12000181c to PC 0x12000181c:
        N is a VarRegister parameter in register 0x00
    from PC 0x120001828 to PC 0x120001828:
        N is a VarRegister parameter in register 0x11
    from PC 0x12000182c to PC 0x12000182c:
        N is a VarRegister parameter in register 0x00
    from PC 0x120001838 to PC 0x120001840:
        N is a VarRegister parameter in register 0x00

Consumers may not make assumptions about the order in which child descriptions appear.

Consumers may not make assumptions about the address ranges of the child descriptions. In particular, the address range of two or more split children may overlap.

2.5 Trailer Symbol

The trailer symbol is identified by the symbol type and class pair (stEnd,scInfo). In Example 1, symbol entry 22 denotes the end of the split lifetime description for program variable N.

The symbol value field is undefined and must be zero. The symbol index field is a backward symbol reference which points to the beginning of the split lifetime description.

2.6 Assertions

The following statements are all true about a given split lifetime description:

- All symbol entries used to describe a split lifetime description have the same name. In fact, producers may choose to emit the same offset into the strings table for each symbol entry.

- All split children have the same symbol type (e.g., stParam). In other words, producers are not allowed to change the symbol type depending on the PC range.

- All split children have the same variable type (e.g., offset into the AUX table). In other words, producers are not allowed to change the variable's type depending on the PC range.

- The PC address range of a split child may overlap with the ranges of one or more other split children.
  If this occurs, then more than one split child is active within the overlapping range.

- There is no significance to the order of child descriptions.

- Though the split lifetime structure mirrors that of a block structure, there is no explicit or implicit scope defined by the structure.

- The split lifetime description does NOT introduce a new variable, nor does it introduce a new name in the current scope. Consumers must take care not to treat it as a variable declaration.

2.7 Summary

To summarize, the following new symbol type and symbol class combinations are introduced by the implementation of split lifetime support:

    Symbol    Symbol
    Type      Class     Meaning
    ------    ------    -------
    stSplit   scInfo    Begins a split lifetime description of a
                        parameter, local variable, or a global
                        variable. The symbol value field is a count
                        of the number of split children. The default
                        child is excluded from this count. The symbol
                        index field points to the fallback
                        description.

    stSplit   scText    Defines a split lifetime low PC value. The
                        symbol value field is a PC address. The
                        linker automatically relocates this value.
                        The symbol index field points to the high PC
                        definition (always the next entry).

    stSplit   scAbs     Defines a split lifetime low PC value. The
                        symbol value field is a non-relocatable PC
                        address. The symbol index field points to the
                        high PC definition (always the next entry).

    stEnd     scText    Defines a split lifetime high PC value. The
                        symbol value field is a non-relocatable PC
                        offset relative to the low PC value. The
                        symbol index field points to the low PC
                        definition (always the previous entry). (Note
                        that this is not a new type/class
                        combination, but its use to define a high PC
                        value is new.)

    stEnd     scAbs     Defines a split lifetime high PC value. The
                        symbol value field is a non-relocatable PC
                        address. The symbol index field points to the
                        low PC definition (always the previous
                        entry). (Note that this is not a new
                        type/class combination, but its use to define
                        a high PC value is new.)

    stEnd     scInfo    Ends a split lifetime definition. The symbol
                        value field is undefined and must be zero.
                        The symbol index field points back to the
                        beginning of the split lifetime definition.
                        (Note that this is not a new type/class
                        combination, but its use to end a split
                        lifetime description is new.)

    stLocal   scInfo    Referee symbol within a split lifetime
                        description. The symbol value field is a
                        bitmask as defined above. The symbol index
                        field points to the symbol being extended by
                        the split lifetime description.

    stGlobal  scInfo    Referee symbol within a split lifetime
                        description. The symbol value field is a
                        bitmask as defined above. The symbol index
                        field is zero. (The name is used to look up
                        the appropriate symbol in the external
                        symbols section.)

3 SPLIT LIFETIME EXAMPLES

Different language constructs have different representations in the COFF/STABS symbol table. This section documents how split lifetime referee symbols should be used to refer to common and language-specific representations.

3.1 General Encodings

This section describes the encodings applicable to all languages.

3.1.1 Global Variables - Global variables are those whose definitions appear in the Externals symbol table. A split lifetime description which extends the global definition by necessity must appear in the Locals symbol table (so the PC values for the child lifetimes will be properly relocated). The split lifetime referee symbol is encoded using symbol type stGlobal and symbol class scInfo. The name of the referee symbol must be the same as the name of the symbol in the Externals table to which the referee symbol refers, so that consumers can match the split lifetime description with the correct global symbol.

3.1.2 Local Variables - Local variables are those declared in the scope of a routine (perhaps nested within one or more blocks). A split lifetime description which extends the local variable symbol entry must occur after the symbol entry and must be in the same scope as the local variable symbol entry. The split lifetime referee symbol is encoded using symbol type stLocal and symbol class scInfo. The symbol index is an RNDX AUX entry which refers to the file and symbol offset of the variable's declaration.

3.1.3 Routine Parameters - Split lifetime extensions of routine parameters occur after the routine's block begin symbol; the extensions do not appear adjacent to the parameter symbols. The split lifetime referee symbol is encoded using symbol type stLocal and symbol class scInfo. The symbol index is an RNDX AUX entry which refers to the file and symbol offset of the variable's declaration.

3.2 Language-Specific Encodings

This section describes the split lifetime encodings that are specific to one or more languages.

3.2.1 Imported Symbols - Many languages support the notion of importing code and data from one file or compilation unit into another. There are two ways this is done: by a textual include (the "#include" preprocessor directive in C and C++) or by a statement the language provides for that purpose (Ada's "with" and "use" statements, FORTRAN 90's "use" statement).

3.2.1.1 Textual Includes - Symbols which are imported by textual include commands appear in the symbol table in one of two places: either in the "includer" file (i.e., the file containing the include command) or in the "includee" file (i.e., the file being included). Regardless of where the producer (compiler) defines the symbol, the split lifetime referee symbol by definition must point to the place where the producer declares the symbol.
Thus, the referee symbol type and class are stLocal and scInfo, respectively, and the index is an AUX RNDX value pointing to the declaration.

3.2.1.2 Statement Includes - Symbols which are imported by language-defined include statements can also be split by the compiler. The split lifetime description must occur in the Local symbol table in the context of the routine which accesses the imported variable and which causes it to be split. The split lifetime referee symbol type and class are stLocal and scInfo, respectively. The symbol index is an AUX RNDX value which points to the imported symbol's declaration.

3.2.2 FORTRAN Entrypoint Parameters - Parameters to FORTRAN entrypoints are described in the symbol table immediately after the entrypoint symbol. If two or more entrypoints (including the main subroutine entry) declare the same parameter (i.e., a parameter with the same name), then the split lifetime description always refers to the first.

    Possible Alternative

    Perhaps it would make sense in this case for there to be multiple referee symbols?

3.2.3 FORTRAN COMMON Symbols - FORTRAN COMMON members are described in the symbol table using three constructs:

1. A global symbol in the Externals table. This symbol has the same name as the COMMON (with an underscore appended); its address is the static base address of the COMMON.

2. A synthesized file. This file has the same name as the global symbol; it contains a C-structure where the elements of the structure are the members of the COMMON. Each element's value is an offset (in bits) from the static base address of the global symbol.

3. A symbol in the Local symbol table; this appears at the lexical scope at which the COMMON is declared. The symbol has the same name as the global symbol.

A split lifetime description of a FORTRAN COMMON member must appear in the same scope as, but after, the COMMON symbol in the Locals symbol table. The split lifetime referee symbol type and class are stLocal and scInfo, respectively. The symbol index is an AUX RNDX value which points to the COMMON member definition in the synthesized file.
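
Whichever referee encoding applies, a consumer eventually has to turn each child's low/high PC pair (section 2.7) into an address range. The sketch below shows one way that might look. All names (`sl_class`, `sl_range_pair`, `sl_high_pc`, `sl_child_active`) are hypothetical, and treating the range as half-open is an assumption, since the proposal does not say whether the high PC is inclusive.

```c
#include <stdbool.h>

/* HYPOTHETICAL stand-ins for the two symbol classes a high PC entry may use. */
enum sl_class { SL_SC_TEXT, SL_SC_ABS };

/* HYPOTHETICAL, simplified low/high PC pair for one split child. */
struct sl_range_pair {
    enum sl_class high_class;  /* class of the stEnd entry */
    unsigned long low_value;   /* value of the stSplit entry: a PC address */
    unsigned long high_value;  /* value of the stEnd entry (see sl_high_pc) */
};

/* Resolve the high PC of a child.
 *   stEnd/scText: the value is an offset relative to the low PC.
 *   stEnd/scAbs:  the value is already a PC address. */
unsigned long sl_high_pc(const struct sl_range_pair *p)
{
    if (p->high_class == SL_SC_TEXT)
        return p->low_value + p->high_value;
    return p->high_value;
}

/* ASSUMPTION: the range is half-open, [low, high). */
bool sl_child_active(const struct sl_range_pair *p, unsigned long pc)
{
    return pc >= p->low_value && pc < sl_high_pc(p);
}
```

Because child ranges may overlap, a consumer would test every child rather than stop at the first match, and would fall back to the default description when none is active.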
33.6	Other DBGOPT needs and ideas	GEMGRP::BRENDER	Ron Brender	`Tue Mar 04 1997 11:43`	159
	In addition to the proposals presented in 33.3 thru 33.5, a number of additional proposals can be expected to deal with inline routine expansion and other challenges in debugging optimized code. This note tries to give a brief sketch of some of the needs and ideas that are currently being considered.

1. Inline Expansion Scope Block

The implementation of inline call expansion is fundamentally one of replacing a call with a copy of the body of the called routine, complete with copied versions of local variables, etc. A straightforward representation builds on this fact, using a new st code, as in stInlineBlock/scText. Parameters become initialized locals of the new block. Key attributes of this block are:

- Pointer to the called routine STE
- Location (possibly just the line number) of the call
- List of "end prolog" addresses
- List of "begin epilog" addresses

"End prolog" and "begin epilog" in this context are misnomers, but the names do suggest their purposes: the addresses where breaks should occur when stepping "into" the called routine and when about to "return" from the called routine. Because of the effects of optimization, these are not usefully regarded as single locations -- a list is required for each.

A particular representation has yet to be proposed, but something like stInlineBlock/scInfo might be used to provide some structure to the initial sequence of STEs within the inline block.

2. Disjoint Range Extenders

Even in non-optimized code, it is valuable to be able to represent routines and scopes that are not a single contiguous range of addresses; for optimized code this is essential! A possible representation is to include stExtendedRange/scText, stEnd/scText pairs within a scope (routine, block, inline block,...) to indicate additional ranges of addresses that are part of the containing scope.

3. Scheduled Code Masks

Scheduling of inlined code can lead to lots of small disjoint extended ranges for an inlined block.
A representation that addresses this might be something like stEndMask/scText as an alternative to stEnd/scText, where the value is interpreted as a bit mask for up to 64 instructions, beginning at the scope start, which are in fact part of the scope.

Interlude

Combining 1, 2 and 3 might lead to a symbol table structure sorta kinda like...

    !Start the inlined block...
    stInlineBlock    scText  <to end+1>

    !Then zero or more "parameters"...
    stParam          scXxx                    !parameter
    ...

    !Pointer to callee routine...
    stInlineBlock    scInfo  <aux RNDX to callee STE>

    !Location of call
    stInlineBlock    scAbs   value=lineno of call

    !Then zero or more prolog addresses...
    stInlineBlock    scText  -1               ! End prolog address,
                                              ! no associated stEnd

    !Normal STEs for "copies" of the local variables of the called routine...
    ...

    !Then zero or more epilog addresses...
    stEnd            scText  -1               ! Begin epilog address,
                                              ! does not end inline

    !Then zero or more extended range pairs...
    stExtendedRange  scText
    stEnd            scText
        ! and/or
    stExtendedRange  scText
    stEndMask        scAbs

    !Finally, the closing end...
    stEnd            scText  <to beginning>   !real end of inline

4. Source/Object Mapping (the Line Number Table)

In general the program source line number to object code instruction mapping must be many-to-many. It may be simplest conceptually to regard this as two distinct mappings

- Source to Object
- Object to Source

each of which is one-to-many.

Moreover, the current inability to cross source file boundaries in the line number table must be overcome -- even to deal properly with non-optimized code.

No particular suggestions are offered here at the moment, but a radical overhaul is definitely needed. One starting point might be to effectively eliminate the current role of File Descriptors by collapsing them all into a single virtual "Source" file that applies to the entire compilation (just to keep the current superstructure intact for the benefit of old tools) and then introduce additional new structures that provide the needed expressive power.
Much work needed here...

5. Strength Reduced Variables

A minor target of opportunity concerns strength reduced variables, where an index of a loop that steps thru an array is replaced by stepping thru the addresses of the array itself. The loop index is easily back-computed as a linear function (subtract the base of the array, divide by the stride or element size) of the real control address. How to represent this?

Meta Question

These proposals (including the split lifetime proposal) build on the existing STE "style" rather than exploit the optimization section because:

- It is necessary to represent code addresses within a procedure (and get them relocated)
- It is necessary to point to AUX entries (in a reliable way)

The question is whether, and to what extent, it is possible to do both of these from within the DBGOPT section in a way that does not require ld to learn how to parse/interpret that new section.

I think I understand how to handle code addresses, assuming that the debugger performs the final relocation when it reads the DBGOPT section. I am unclear how to handle AUX entries in a way that cannot be compromised by ld processing (e.g., merging).

Perhaps a small group can help me brainstorm possibilities here...
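
Two of the ideas above, the scheduled code mask of item 3 and the index recovery of item 5, come down to simple arithmetic, which the following sketch tries to make concrete. Nothing here is a proposed representation. The function names are hypothetical, the 4-byte instruction size assumes an Alpha-style fixed-width encoding, and mapping bit 0 (the least significant bit) to the instruction at the scope start is an assumption the note does not settle.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: fixed-width 4-byte instructions. */
#define INSN_BYTES 4u

/* Item 3: does the mask mark the instruction at pc as part of the scope?
 * ASSUMPTION: bit i (bit 0 = least significant) stands for the i-th
 * instruction counted from scope_start. */
bool mask_covers_pc(uint64_t scope_start, uint64_t mask, uint64_t pc)
{
    if (pc < scope_start || (pc - scope_start) % INSN_BYTES != 0)
        return false;                 /* before the scope or misaligned */
    uint64_t slot = (pc - scope_start) / INSN_BYTES;
    if (slot >= 64)
        return false;                 /* beyond what a 64-bit mask can describe */
    return ((mask >> slot) & 1u) != 0;
}

/* Item 5: recover a zero-based loop index from the address that replaced it.
 * Assumes addr >= base and stride != 0. */
uint64_t recover_index(uint64_t addr, uint64_t base, uint64_t stride)
{
    return (addr - base) / stride;
}
```

A source language whose arrays do not start at zero would add the lower bound to the result of `recover_index`; that adjustment is left out here.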