|
Optimization Symbols Section
Ron Brender
| (Jeff Nelson)
| 14 February 1997 (Rev)
ABSTRACT
This document defines the syntax and semantics of the Third Eye
Symbols section known as the "Optimization Symbols" section. This
section, though it has a definition in /usr/include/sym.h, is
currently unused on the Digital Unix platform.
1 MOTIVATION
The Third Eye symbol table embedded within a COFF object file is
unwieldy and difficult to extend without coordination and cooperation
among many consumers and producers. The difficulty is compounded by
two factors. First, not all consumers and producers are necessarily
known. Second, the extensions that one producer or consumer wants are
not necessarily compatible with the needs of any other producer or
consumer.
2 PURPOSE
The definition of the Optimization Symbols section is designed to ease
these problems. It gives individual producers and consumers the
ability to communicate information about any aspect of the object file
in any form they choose. This allows for new or modified
descriptions, while keeping the rest of the symbol table unchanged.
New information can be generated at any time without coordination
between all producers and consumers, though eventually a minimal
amount of coordination is required.
In fact, it is recommended that the information in the Optimization
Symbols section eventually be folded back into the mainstream symbol
description. This is so all producers and consumers can take
advantage of the information which up to now has presumably been
private between one or more producers and consumers.
It is assumed that the information in the Optimization Symbols section
does not contradict or fundamentally violate any understanding of the
object that is given in any other part of the symbol table. In other
words, it is not OK to "lie" in the mainstream symbol description and
tell the "truth" in the Optimization Symbols section. It is OK,
however, to modify or enhance the description of the main symbol
table. This is the intended purpose of the section.
| Since the prior version of this proposal was prepared (8 August 1995),
| the UNIX Object File and Symbol Table Working Group has adopted a
| specification for the .comment section that is substantially similar
| in goals and mechanism. This revision recasts this proposal in a
| manner that is more consistent with the .comment section in
| representation and processing.
3 DEFINITION
| The Optimization Section consists of a sequence of zero or more "per
| procedure optimization descriptions" (PPOD). Each PPOD is pointed to
| by the iopt field of the procedure descriptor (PDR) to which it
| applies (as explained later). While each PPOD has an internal
| structure much like that of a .comment section, there is no
| meta-structure or wrapper that collects all the PPODs of a file
| together. (There are existing pointer and size fields of the COFF
| file header (HDRR) and File Descriptors (FDR) that are used in the
| usual way to describe the aggregate of all PPODs; more later.) In
| particular, it is intended that the linker (ld) be able to construct
| the Optimization Section of an output image much like it constructs
| the local symbol table -- as the concatenation of the Optimization
| Sections of the constituent object files. Unlike the local symbol
| table, however, it is intended that the linker need not interpret
| and/or modify the contents of a PPOD in any way.
|
|
| Each constituent PPOD of the Optimization Section has a structure that
| is analogous to a comment section:
|
| o A leading sequence of TLV "index" entries that describe the
| location and parts of the PPOD, followed by
|
| o A raw data area containing the actual optimization
| descriptions.
|
|
|
| 3.1 PPOD Index Entry Structure
|
| Each index entry has the following structure:
|
| =================start definition===============
|
| typedef struct {
| unsigned int ppode_tag;
| unsigned int ppode_len;
| unsigned long ppode_val;
| } PPODHDR;
|
| ================= end definition ===============
| where
|
| ppode_tag Identifies the kind of data described by the entry.
|
| ppode_len Indicates the size of the data, in bytes, which is
| found in the free form data area of this same PPOD.
| When this field is zero, then the only data is that
| found in the ppode_val field.
|
| ppode_val Is, or describes the location of, the data of the
| given kind. When ppode_len is zero, this field
| contains the (only) data itself. When ppode_len is
| non-zero, this field is a relative file offset from
| the beginning of the current PPOD to the applicable
| data area.
|
|
| The start of all data allocated in the free-form area must be octaword
| (16-byte) aligned. (Recall that the Optimization Section itself is
| octaword aligned.) It follows (and is required) that each distinct
| PPOD must be octaword aligned as well. The length stored in ppode_len
| need not be an octaword multiple, but when it is not, padding with
| zero-bytes must be appended to the end of the data item.
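The padding rule can be sketched as follows. This is only an
illustration: the names PPOD_ALIGN, ppod_padded_len and ppod_pad_bytes
are hypothetical and are not defined in sym.h or symconst.h.

```c
#include <stddef.h>

/* Hypothetical names; not taken from sym.h or symconst.h. */
#define PPOD_ALIGN 16u  /* octaword alignment, in bytes */

/* Round a data item's length up to the next octaword multiple. */
static size_t ppod_padded_len(size_t len)
{
    return (len + (PPOD_ALIGN - 1)) & ~(size_t)(PPOD_ALIGN - 1);
}

/* Number of zero bytes a producer appends after a data item. */
static size_t ppod_pad_bytes(size_t len)
{
    return ppod_padded_len(len) - len;
}
```

A producer would store the unpadded length in ppode_len and advance
its write position by the padded length, so that the next data item
again starts on an octaword boundary.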
|
|
|
| 3.2 PPOD Index Entry Kinds
|
| Every PPOD must contain at least two PPOD entries. The first and last
| entries must be, respectively:
|
|      Tag           Value   Interpretation
|
|      PPODE_STAMP     1     Identifies the version number of the PPOD.
|                            ppode_len must be zero, and ppode_val contains
|                            the version number (initially 1).
|
|      =================start definition===============
|
|      #define PPOD_VERSION 1   /* current version number */
|
|      ================= end definition ===============
|
|      PPODE_END       2     Indicates the end of the PPOD Entries for this
|                            PPOD. (Both ppode_len and ppode_val must be
|                            zero.)
|
|
| The PPOD version number is for future expansion purposes; if the
| Optimization Section ever changes semantically or syntactically, the
| version number shall change so that consumers can recognize the
| difference.
| In addition to the PPODE_STAMP and PPODE_END kinds, a number of
| additional kinds of data will be defined from time to time. The
| following kinds are currently anticipated; however, the actual
| specifications are (will be) given in separate documents:
|
| PPODE_SEM_EVENT 3 Semantic Event Descriptions
| PPODE_INLINE_INST 4 Inline Instance Descriptions
| PPODE_INLINE_LOC 5 Inline Locator Mapping Descriptions
|
|
| It is expected that there will typically be a natural correlation
| between index entries and the data parts: the first (non-version
| stamp) entry describes the first data part, the second entry
| describes the second data part, and so on. However, this cannot be
| assumed. In addition, it is invalid to assume an ordering of the
| index entries.
|
|
| Depending on the kind of data involved, it may be valid to have more
| than one entry with the same tag field value; that is, in general it
| is not valid to regard the tag field as a unique key.
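A consumer's walk over the index entries of one PPOD might look like
the sketch below. The PPODHDR layout and the PPODE_STAMP, PPODE_END and
PPOD_VERSION values are those given above; the helpers ppod_count_tag
and ppod_entry_data, the max_entries bound, and the decision to reject
an unrecognized version are assumptions made for illustration.

```c
#include <stddef.h>

/* Index entry layout, repeated from the definition in section 3.1. */
typedef struct {
    unsigned int  ppode_tag;
    unsigned int  ppode_len;
    unsigned long ppode_val;
} PPODHDR;

#define PPODE_STAMP  1
#define PPODE_END    2
#define PPOD_VERSION 1

/*
 * Hypothetical helper: count the index entries carrying `tag` in the
 * PPOD that starts at `ppod`. `max_entries` is the caller's bound on
 * how many entries may be read. Returns -1 if the PPOD does not start
 * with a recognized PPODE_STAMP or if no PPODE_END is found in bounds.
 */
static int ppod_count_tag(const PPODHDR *ppod, size_t max_entries,
                          unsigned int tag)
{
    size_t i;
    int count = 0;

    if (max_entries < 2 ||
        ppod[0].ppode_tag != PPODE_STAMP ||
        ppod[0].ppode_len != 0 ||
        ppod[0].ppode_val != PPOD_VERSION)
        return -1;

    /* Scan every entry: tags are not unique keys and, between the
     * stamp and the end marker, no ordering may be assumed. */
    for (i = 1; i < max_entries; i++) {
        if (ppod[i].ppode_tag == PPODE_END)
            return count;
        if (ppod[i].ppode_tag == tag)
            count++;
    }
    return -1;  /* ran off the bound without seeing PPODE_END */
}

/*
 * Hypothetical helper: locate the data of entry `e`. Returns NULL when
 * ppode_len is zero, in which case ppode_val itself is the data.
 * Otherwise ppode_val is an offset from the start of the PPOD.
 */
static const unsigned char *ppod_entry_data(const PPODHDR *ppod,
                                            const PPODHDR *e)
{
    if (e->ppode_len == 0)
        return NULL;
    return (const unsigned char *)ppod + e->ppode_val;
}
```

Counting rather than looking up a single entry reflects the rule that
more than one entry may carry the same tag.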
3.3 The Data Parts
The contents and format of the data parts are arranged by private
agreement between the producers and consumers of the particular kind
of data part.
4 RELATIONSHIP WITH OTHER COFF STRUCTURES
The Optimization Symbols section is organized and indexed in much the
same way as the Locals Symbols section:
- The Optimization Symbols section is a single section within
the COFF file.
- The COFF header (HDRR) contains the starting offset and
maximum index into the Optimization Symbols section,
cbOptOffset and ioptMax, respectively. The cbOptOffset value
is the file offset of the first byte in the Optimization
| Symbols section. This value must always be aligned on an
| octaword (16-byte) boundary. The ioptMax value is the count
of the total number of bytes in the entire Optimization
| Symbols section, including index entries, data and padding
| bytes. Thus, this value is always a multiple of 16 bytes.
- Each File Descriptor (FDR) has a pointer and size
contribution to the Optimization Symbols section, named
ioptBase and copt, respectively. The ioptBase value is the
byte offset from the start of the Optimization Symbols
section to this file's optimization symbols. The copt value
is the total number of bytes in the Optimization Symbols
| section which are contributed by this file, including index
| entries, data and padding bytes. This implies that a file's
contribution to the Optimization Symbols section must be
contiguous, even if the procedures in that file are not.
- Each Procedure Descriptor (PDR) has a pointer to the start of
that procedure's contribution to the Optimization Symbols
section, named iopt. This offset locates the start of a
unique and complete Optimization Procedure Descriptor (PPOD),
relative to the beginning of its containing FDR's contribution.
There is at most one Optimization Procedure Descriptor per
routine. A procedure's Optimization Descriptor can be found
using this formula:
HDRR.cbOptOffset + FDR.ioptBase + PDR.iopt
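The formula can be sketched in C as follows. The structures OPT_HDRR,
OPT_FDR and OPT_PDR are hypothetical, trimmed-down stand-ins for the
real sym.h structures (only the fields named in this section appear,
and their types are assumed), and ppod_file_offset is an illustrative
name. The bounds checks are an added precaution, not a requirement
stated in this document.

```c
/*
 * Hypothetical stand-ins for the sym.h structures. Only the fields
 * named in this section are shown; the field types are assumed.
 */
typedef struct { unsigned long cbOptOffset; unsigned long ioptMax; } OPT_HDRR;
typedef struct { unsigned long ioptBase;    unsigned long copt;    } OPT_FDR;
typedef struct { unsigned long iopt; } OPT_PDR;

/*
 * Compute the file offset of a procedure's PPOD. Returns 1 and stores
 * the offset in *off on success; returns 0 if the image or the file
 * has no contribution, or if the fields are mutually inconsistent.
 */
static int ppod_file_offset(const OPT_HDRR *h, const OPT_FDR *f,
                            const OPT_PDR *p, unsigned long *off)
{
    if (h->cbOptOffset == 0 || h->ioptMax == 0)
        return 0;                       /* no section in this image */
    if (f->copt == 0)
        return 0;                       /* file contributes nothing */
    if (f->ioptBase > h->ioptMax || f->copt > h->ioptMax - f->ioptBase)
        return 0;                       /* FDR range exceeds section */
    if (p->iopt >= f->copt)
        return 0;                       /* PDR points outside its FDR */

    *off = h->cbOptOffset + f->ioptBase + p->iopt;
    return 1;
}
```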
5 PROCESSING
Tools which produce COFF object files must produce either an empty
Optimization Symbols section or a valid Optimization Symbols section.
An image with no Optimization Symbols section has HDRR.cbOptOffset and
HDRR.ioptMax values of zero. A file with no contribution to an
| Optimization Symbols section has a FDR.copt value of zero, in which
| case every PDR within it has a PDR.iopt value of zero. If a FDR has a
| contribution, then every procedure contained within it must have a
| contribution pointed to by its PDR.iopt field, even if that
| contribution consists only of the minimum pair of
| PPODE_STAMP/PPODE_END index entries.
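A producer's "minimum contribution" could be written as in the sketch
below. The PPODHDR layout and tag values are those of section 3; the
helper name ppod_write_minimal is hypothetical.

```c
#include <string.h>

/* Index entry layout, repeated from the definition in section 3.1. */
typedef struct {
    unsigned int  ppode_tag;
    unsigned int  ppode_len;
    unsigned long ppode_val;
} PPODHDR;

#define PPODE_STAMP  1
#define PPODE_END    2
#define PPOD_VERSION 1

/*
 * Hypothetical helper: fill `out` with the minimum valid PPOD, a
 * PPODE_STAMP entry followed by a PPODE_END entry. Returns the number
 * of bytes written.
 */
static size_t ppod_write_minimal(PPODHDR out[2])
{
    memset(out, 0, 2 * sizeof out[0]);  /* len and val default to 0 */
    out[0].ppode_tag = PPODE_STAMP;
    out[0].ppode_val = PPOD_VERSION;
    out[1].ppode_tag = PPODE_END;
    return 2 * sizeof out[0];
}
```

Where unsigned long is 8 bytes, as on Alpha, each entry occupies 16
bytes, so the pair is 32 bytes and keeps the next PPOD octaword
aligned without extra padding.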
|
| ISSUES
|
|
| 1. It does not work for a PDR.iopt of zero by itself
| to indicate no contribution because that (validly)
| points to the beginning of the FDR's contribution.
| Unfortunately, there is no PDR.copt (length) field
| analogous to the length in HDRR and FDR structures.
| Hence this "minimum contribution" convention.
|
| 2. Assuming this convention, would it be OK for
| multiple procedures to share the same such
| contribution? If so, we could require
| that each FDR contribution begin with a
| minimum PPODE_STAMP/PPODE_END pair, so that PDR.iopt ==
| 0 does imply no contribution...?
Tools which consume COFF object files must be capable of skipping the
entire Optimization Symbols section, or those parts of it which they
do not understand.
Tools which both read and write COFF object files must consume a valid
Optimization Symbols section (if one exists in an input file) and
produce an equivalent, valid Optimization Symbols section in their
output file. This means one of the following:
- The tool does not know how to process anything in the
Optimization Symbols section. The tool must write an exact
copy of any Optimization Symbols section it reads in. In
other words, it must allow the Optimization Symbols section
to pass through unchanged.
- The tool recognizes some kinds of data parts. This tool must
copy, unchanged, the data parts (and descriptors) that it
does not understand. The tool must read (and if necessary,
transform) the data parts (and descriptors) that it does
understand and write the equivalent data parts (and
descriptors).
6 COORDINATION
It is the responsibility of every producer to obtain a unique kind
value for each distinct kind of data it wishes to place in the
Optimization Symbols section. Obtaining a unique kind value ensures
that there won't be two producers using the same kind value to mean
different things. Obtaining a unique kind value is accomplished by
adding a new constant definition to the include file (currently
/usr/include/symconst.h) which defines the format of the Optimization
Symbols section.
Information in the Optimization Symbols section, though arranged by
private agreement between producers and consumers, is meant to be
shared among all consumers and producers when it makes sense to do so.
Producers and consumers are jointly responsible for ensuring that the
data parts they write and read, respectively, are also recognized and
processed by other tools if those tools could have an impact on the
information in the data part. For example, if a compiler generates a
different kind of PC-line correlation table which is used by a
debugger, and an instruction-modifying tool makes changes (insertions
and deletions) to the instruction stream, the compiler and debugger
should both lobby the modifying tool to keep the different PC-line
correlation table up to date.
|
|
Representation of Split Lifetime Information
using
Digital Unix COFF
Ron Brender
| (Jeff Nelson)
| 14 February 1997 (Revision 2)
This document proposes changes to the Digital UNIX COFF on-disk symbol
table format which are required to support split lifetime variables.
For background information (such as what a split lifetime is),
see "What Every Front End Should Know About Debugging Optimized Code"
| by Brender and Nelson. A copy is posted in TURRIS::AD-PROJECTS note
| 74.6.
|
| NOTE
|
| This revision differs significantly from the previous
| (dated 28 August 1995) only in that it eliminates the
| need/use of the scSymRef storage class.
|
1 OVERVIEW
The split lifetime variable description is designed to supplement an
existing symbol description. This is a change from earlier split
lifetime descriptions, which attempted to completely replace one
definition with another.
There are several reasons why split lifetime needs to supplement, not
entirely replace, a symbol's description. The most important one is
that the variable may be split in a compilation unit which is
independent from the compilation unit which declares the variable.
For example, consider a global variable. It is declared once, but
there are potentially many independent compilation units which
manipulate the variable. Because each compilation unit is
independent, it is not possible to replace the global definition,
because each compilation would have to know about the others in order
to give a complete replacement definition.
Other significant but relatively less important reasons are due to
limitations and assumptions about COFF and the Third Eye symbol table
format. For example, text relocations (for relocating PC values) can
only occur in the Locals symbol table and even then, are only
meaningful when they occur in the same context as the compilation
unit. This means that text relocations in synthesized files (e.g.,
for FORTRAN COMMON) don't work.
As before, the entire split lifetime description can be skipped by
consumers who choose to ignore it. Those consumers will have some
understanding of the variable (its name, type, and scope in which it
appears), though a less accurate understanding of the symbol's address.
The remainder of this document describes the new format.
2 SPLIT LIFETIME VARIABLE DESCRIPTION
The split lifetime on-disk format for a program variable consists of:
1. A header symbol.
2. A referee symbol.
3. A list of one or more lifetime descriptions, called child
descriptions.
4. A trailer symbol.
A more detailed description using the following example is given
below.
0. ( 0)( 0) docfe.f File Text symref 51
1. ( 1)(0x120001810) docfe_ Proc Text [24]
2. ( 2)( 0) A Param Unalloc [26]
| 3. (-4)( 0) N Param Unalloc [10]
| 4. ( ?)( ?) Block Text symref 99
| 5. ( 2)( 5) N Split Info symref 23
| 6. ( 2)( 5) N Local scInfo [->symref 3]
7. ( 2)(0x11) N Param VarRegister [10]
8. ( 2)(0x120001818) N Split Text symref 10
9. ( 1)( 0) N End Text symref 8
10. ( 1)( 0) N Param VarRegister [10]
11. ( 1)(0x12000181c) N Split Text symref 13
12. ( 0)( 0) N End Text symref 11
13. ( 0)(0x11) N Param VarRegister [10]
14. ( 0)(0x120001828) N Split Text symref 16
15. (-1)( 0) N End Text symref 14
16. (-1)( 0) N Param VarRegister [10]
17. (-1)(0x12000182c) N Split Text symref 19
18. (-2)( 0) N End Text symref 17
19. (-2)( 0) N Param VarRegister [10]
20. (-2)(0x120001838) N Split Text symref 22
21. (-3)( 0x8) N End Text symref 20
22. (-4)( 0) N End Info symref 5
Example 1. Split Lifetime Description of Parameter N.
2.1 Header Symbol
The split lifetime header symbol is identified by the symbol type and
class pair (stSplit,scInfo). In Example 1, symbol entry 5 denotes the
beginning of the split lifetime description for the program variable
N. The symbol value field contains a count of the number of child
descriptions, which in the example is 5. The symbol index field is a
forward symbol reference which points to the symbol just after the end
of the split lifetime definition.
Consumers which choose not to support the split lifetime description
should recognize the stSplit/scInfo symbol table entry and skip the
entire description using the symbol index. (Note that consumers which
skip the split lifetime description will still see a symbol
definition, which in this example occurs at entry 3.)
2.2 Referee Symbol
The referee symbol is the symbol entry which refers to the symbol to
which the split lifetime information applies. There are two kinds of
symbols which can be referred to.
|
|
| 1. Non-global Variables
|
| The referee symbol type and class are stLocal and scInfo,
| respectively. The referee symbol index is an RNDX AUX entry which
| refers to the file and symbol offset of the variable's declaration.
|
|
| In most typical cases, the file value will be the same as the current
| file. In the case of FORTRAN COMMON variables, the file value will
| refer to the synthesized file that represents the common.
Symbol entry 6 in Example 1 illustrates a symbol reference to a
parameter variable.
2. Global Variables
Even though a variable is global (and therefore static), the compiler
may perform "split loads" and "split flushes" on it to temporarily
allocate the variable in a register. Symbols in the Externals symbol
table cannot be referred to using the previous referee mechanism
because there is no way to refer to a global symbol using an AUX RNDX.
Thus, the referee symbol type and class are stGlobal and scInfo. The
referee symbol index is undefined and must be zero. The symbol type
informs the consumer that the variable is in the Externals table; the
symbol name is then used as the identifying key to locate the
definition.
2.3 Default Lifetime
When a split lifetime is used to enhance an existing symbol
description, the problem arises of what to do with the original
description. This is handled by defining a meaning for the value
field in the referee symbol. The symbol value is a bitmask. The
meaning of the bits is as follows:
Bits    Value   Meaning
----    -----   -------
0       0       Do not use target of reference as default
        1       Use target of reference as default
1-63    MBZ     Bits 1 through 63 are reserved and must be zero
If bit zero of the referee value field is set, then the target of the
reference should be retained as the "default" representation. That
is, the target symbol's type, class and address are active whenever a
split child isn't. If bit zero of the referee value field is clear,
then the target symbol should NOT be used as the default
representation; instead, a default representation of "unallocated"
should be used by the symbol table consumer.
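A consumer might decode the referee value as sketched below. The macro
and enumerator names are hypothetical, and the sketch assumes the value
field is held in an unsigned long (64 bits on Alpha), so that the
must-be-zero mask covers bits 1 through 63.

```c
/* Hypothetical names; the bit assignment follows the table above. */
#define SPLIT_USE_DEFAULT 0x1UL                 /* bit 0 */
#define SPLIT_MBZ_MASK    (~SPLIT_USE_DEFAULT)  /* reserved bits */

typedef enum {
    SPLIT_DEFAULT_INVALID     = -1, /* a reserved bit was set */
    SPLIT_DEFAULT_UNALLOCATED =  0, /* bit 0 clear */
    SPLIT_DEFAULT_TARGET      =  1  /* bit 0 set: use referee target */
} split_default;

/* Decide what describes the variable when no split child is active. */
static split_default split_default_kind(unsigned long referee_value)
{
    if (referee_value & SPLIT_MBZ_MASK)
        return SPLIT_DEFAULT_INVALID;
    return (referee_value & SPLIT_USE_DEFAULT) ? SPLIT_DEFAULT_TARGET
                                               : SPLIT_DEFAULT_UNALLOCATED;
}
```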
|
| Possible Alternative
|
| Rather than use the referee symbol value field to
| encode the default vs no-default distinction (a
| certainly novel, if perhaps unnatural choice), we
| could use other pairing of st/sc values. For example,
| we could use
|
| stLocal,scInfo Use default
| stGlobal,scInfo Use default
| stLocal,scNil Don't use default
| stGlobal,scNil Don't use default
|
| Thoughts?
|
2.4 Child Descriptions
Each child description is a 3-tuple of symbol table entries. The
first symbol of the tuple, called the child symbol, is a standard
symbol table entry. However, the only symbol types which may appear
are stStatic, stGlobal, stParam and stLocal, because these are the
symbol types which define program variables which can be split by a
producer (compiler); the symbol classes are those which are already
defined and paired with the preceding symbol types. The
interpretation of the value and index fields of the child symbol is no
different than the already-existing rules for interpretation of the
appropriate symbol type and class pair.
The second symbol of the tuple, called the low PC symbol, defines the
low bound of the PC range over which the child description is active.
The symbol type and class are stSplit and scText, respectively. The
symbol value is the PC address of the lower bound of the lifetime
range. Because this value is an address in the text section, it is
automatically relocated by the linker. The symbol index is a forward
symbol reference which points one past the end of the child
description.
The third symbol of the tuple, called the high PC symbol, defines the
high bound of the PC range over which the child description is active.
The symbol type and class are stEnd and scText, respectively. The
symbol value is an offset from the value of the low PC symbol. The
actual upper bound value is computed by the expression:
(low PC symbol)->value + (high PC symbol)->value
The symbol index of the high PC symbol is a backward symbol reference
which points to the low PC symbol to which it is paired.
The low PC and high PC values together define an address range over
which the child symbol is said to be active. That is, when the
program is loaded and running and the program counter is within the
address range:
low PC value <= current PC <= high PC value
then the child symbol describes the address of the split lifetime
variable. Note that the address range is inclusive of the endpoints.
In Example 1, there are 5 split children which are decoded as follows:
from PC 0x120001818 to PC 0x120001818:
N is a VarRegister parameter in register 0x11
from PC 0x12000181c to PC 0x12000181c:
N is a VarRegister parameter in register 0x00
from PC 0x120001828 to PC 0x120001828:
N is a VarRegister parameter in register 0x11
from PC 0x12000182c to PC 0x12000182c:
N is a VarRegister parameter in register 0x00
from PC 0x120001838 to PC 0x120001840:
N is a VarRegister parameter in register 0x00
Consumers may not make assumptions about the order in which child
descriptions appear.
Consumers may not make assumptions about the address ranges of the
child descriptions. In particular, the address range of two or more
split children may overlap.
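The range test and the "no ordering, overlap allowed" rules can be
sketched as follows. The split_child structure is a hypothetical
decoded form of one child 3-tuple (it is not an on-disk structure), and
the function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one child description. */
typedef struct {
    uint64_t low_pc;    /* value of the low PC symbol */
    uint64_t high_off;  /* value of the high PC symbol (an offset) */
    uint64_t location;  /* value of the child symbol, e.g. a register */
} split_child;

/* Inclusive test: low PC <= pc <= low PC + high offset. */
static int split_child_active(const split_child *c, uint64_t pc)
{
    return pc >= c->low_pc && pc - c->low_pc <= c->high_off;
}

/*
 * Record the indices of all children active at `pc`. Every child is
 * examined because order is not significant and ranges may overlap.
 * At most `out_cap` indices are stored; the total number of active
 * children is returned. A result of zero means the default
 * representation applies.
 */
static size_t split_active_children(const split_child *kids, size_t n,
                                    uint64_t pc, size_t *out,
                                    size_t out_cap)
{
    size_t i, found = 0;

    for (i = 0; i < n; i++) {
        if (split_child_active(&kids[i], pc)) {
            if (found < out_cap)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```

Filled in with the five children of Example 1, a lookup at PC
0x120001840 selects only the last child, and a lookup at PC 0x120001820
selects none.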
2.5 Trailer Symbol
The trailer symbol is identified by the symbol type and class pair
(stEnd,scInfo). In Example 1, symbol entry 22 denotes the end of the
split lifetime description for program variable N. The symbol value
field is undefined and must be zero. The symbol index field is a
backward symbol reference which points to the beginning of the split
lifetime description.
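The overall layout (header, referee, child 3-tuples, trailer) and its
forward and backward references can be checked as in the sketch below.
The sym_entry structure and the ST_ and SC_ enumerators are simplified,
hypothetical stand-ins for the real symbol entry and its st/sc codes,
and symbol indices are assumed to be plain indices into one array.
Following Example 1, the high PC symbol is taken to be stEnd/scText;
the scAbs variants are not handled.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real st/sc codes. */
enum sym_type  { ST_OTHER, ST_SPLIT, ST_END, ST_LOCAL, ST_GLOBAL };
enum sym_class { SC_OTHER, SC_INFO, SC_TEXT };

typedef struct {
    enum sym_type  st;
    enum sym_class sc;
    unsigned long  value;
    size_t         index;  /* assumed: index into the same array */
} sym_entry;

/*
 * Check the layout of the split lifetime description whose header is
 * syms[h]. Returns 1 if the structure is consistent, else 0.
 */
static int split_layout_ok(const sym_entry *syms, size_t nsyms, size_t h)
{
    size_t nchild, trailer, i;

    if (h >= nsyms || syms[h].st != ST_SPLIT || syms[h].sc != SC_INFO)
        return 0;

    nchild = (size_t)syms[h].value;        /* count of children */
    if (nchild == 0 || nchild > nsyms / 3)
        return 0;

    /* header + referee + three entries per child, then the trailer */
    trailer = h + 2 + 3 * nchild;
    if (trailer >= nsyms || syms[h].index != trailer + 1)
        return 0;

    if ((syms[h + 1].st != ST_LOCAL && syms[h + 1].st != ST_GLOBAL) ||
        syms[h + 1].sc != SC_INFO)
        return 0;                          /* not a referee symbol */

    for (i = 0; i < nchild; i++) {
        size_t low  = h + 2 + 3 * i + 1;   /* child symbol is low - 1 */
        size_t high = low + 1;

        if (syms[low].st != ST_SPLIT || syms[low].sc != SC_TEXT ||
            syms[low].index != high + 1)   /* one past the child */
            return 0;
        if (syms[high].st != ST_END || syms[high].sc != SC_TEXT ||
            syms[high].index != low)       /* back to the low PC */
            return 0;
    }

    return syms[trailer].st == ST_END && syms[trailer].sc == SC_INFO &&
           syms[trailer].value == 0 && syms[trailer].index == h;
}
```

Applied to Example 1 with h = 5, the header count of 5 places the
trailer at entry 22, and the header's forward reference (23) is one
past it.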
2.6 Assertions
The following statements are all true about a given split lifetime
description:
- All symbol entries used to describe a split lifetime
description have the same name. In fact, producers may
choose to emit the same offset into the strings table for
each symbol entry.
- All split children have the same symbol type (e.g., stParam).
In other words, producers are not allowed to change the symbol
type depending on the PC range.
- All split children have the same data type (e.g., the same offset
into the AUX table). In other words, producers are not
allowed to change the variable's type depending on the PC
range.
- The PC address range of a split child may overlap with the
range of one or more other split child ranges. If this
occurs, then more than one split child is active within the
overlapping range.
- There is no significance to the order of child descriptions.
- Though the split lifetime structure mirrors that of a block
structure, there is no explicit or implicit scope defined by
the structure.
- The split lifetime description does NOT introduce a new
variable, nor does it introduce a new name in the current
scope. Consumers must take care not to treat it as a
variable declaration.
2.7 Summary
To summarize, the following new symbol type and symbol class
combinations are introduced by the implementation of split lifetime
support:
  Symbol    Symbol
  Type      Class     Meaning
  ------    ------    -------
  stSplit   scInfo    Begins a split lifetime description of a
                      parameter, local variable, or global
                      variable. The symbol value field is a count
                      of the number of split children. The default
                      child is excluded from this count. The symbol
                      index field points to the symbol just after
                      the end of the split lifetime description.
  stSplit   scText    Defines a split lifetime low PC value. The
                      symbol value field is a PC address. The
                      linker automatically relocates this value.
                      The symbol index field points one past the
                      high PC definition (which is always the next
                      entry).
  stSplit   scAbs     Defines a split lifetime low PC value. The
                      symbol value field is a non-relocatable PC
                      address. The symbol index field points one
                      past the high PC definition (which is always
                      the next entry).
  stEnd     scText    Defines a split lifetime high PC value. The
                      symbol value field is a non-relocatable PC
                      offset relative to the low PC value. The
                      symbol index field points to the low PC
                      definition (always the previous entry). (Note
                      that this is not a new type/class combination,
                      but its use to define a high PC value is new.)
  stEnd     scAbs     Defines a split lifetime high PC value. The
                      symbol value field is a non-relocatable PC
                      address. The symbol index field points to the
                      low PC definition (always the previous entry).
                      (Note that this is not a new type/class
                      combination, but its use to define a high PC
                      value is new.)
  stEnd     scInfo    Ends a split lifetime definition. The symbol
                      value field is undefined and must be zero.
                      The symbol index field points back to the
                      beginning of the split lifetime definition.
                      (Note that this is not a new type/class
                      combination, but its use to end a split
                      lifetime description is new.)
| stLocal   scInfo    Referee symbol within a split lifetime
                      description. The symbol value field is a
                      bitmask as defined above. The symbol index
                      field points to the symbol being extended by
                      the split lifetime description.
| stGlobal  scInfo    Referee symbol within a split lifetime
|                     description. The symbol value field is a
|                     bitmask as defined above. The symbol index
|                     field is zero. (The name is used to look up
|                     the appropriate symbol in the external symbols
|                     section.)
3 SPLIT LIFETIME EXAMPLES
Different language constructs have different representations in the
COFF/STABS symbol table. This section documents how split lifetime
referee symbols should be used to refer to common and
language-specific representations.
3.1 General Encodings
This section describes the encodings applicable to all languages.
3.1.1 Global Variables - Global variables are those whose definitions
appear in the Externals symbol table. A split lifetime description
which extends the global definition by necessity must appear in the
Locals symbol table (so the PC values for the child lifetimes will be
properly relocated). The split lifetime referee symbol is encoded
using symbol type stGlobal and symbol class scInfo. The name of the
referee symbol must be the same as the name of the symbol in the
Externals table to which the referee symbol refers, so that consumers
can match the split lifetime description with the correct global
symbol.
3.1.2 Local Variables - Local variables are those declared in the
scope of a routine (perhaps nested within one or more blocks). A
split lifetime description which extends the local variable symbol
entry must occur after the symbol entry and must be in the same scope
as the local variable symbol entry. The split lifetime referee symbol
| is encoded using symbol type stLocal and symbol class scInfo. The
| symbol index is an RNDX AUX entry which refers to the file and symbol
| offset of the variable's declaration.
3.1.3 Routine Parameters - Split lifetime extensions of routine
parameters occur after the routine's block begin symbol; the
extensions do not appear adjacent to the parameter symbols. The split
lifetime referee symbol is encoded using symbol type stLocal and
| symbol class scInfo. The symbol index is an RNDX AUX entry which
| refers to the file and symbol offset of the variable's declaration.
3.2 Language-Specific Encodings
This section describes the split lifetime encodings that are specific
to one or more languages.
3.2.1 Imported Symbols - Many languages support the notion of
importing code and data from one file or compilation unit into
another. There are two ways this is done: by a textual include (the
"#include" preprocessor directive in C and C++) or by a
language-provided import statement (Ada's "with" and "use"
statements, FORTRAN 90's "use" statement).
3.2.1.1 Textual Includes - Symbols which are imported by textual
include commands appear in the symbol table in one of two places:
either in the "includer" file (i.e., the file containing the include
command) or in the "includee" file (i.e., the file being included).
Regardless of where the producer (compiler) defines the symbol, the
split lifetime referee symbol by definition must point to the place
where the producer declares the symbol. Thus, the referee symbol type
and class are stLocal and scInfo, respectively, and the index is an
AUX RNDX value pointing to the declaration.
3.2.1.2 Statement Includes - Symbols which are imported by
language-defined include statements can also be split by the compiler.
The split lifetime description must occur in the Local symbol table in
the context of the routine which accesses the imported variable and
which causes it to be split. The split lifetime referee symbol type
and class are stLocal and scInfo, respectively. The symbol index is
an AUX RNDX value which points to the imported symbol's declaration.
3.2.2 FORTRAN Entrypoint Parameters - Parameters to FORTRAN
entrypoints are described in the symbol table immediately after the
entrypoint symbol. If two or more entrypoints (including the main
subroutine entry) declare the same parameter (i.e., a parameter with
the same name), then the split lifetime description always refers to the
first.
|
| Possible Alternative
|
| Perhaps it would make sense in this case for there to
| be multiple referee symbols?
|
3.2.3 FORTRAN COMMON Symbols - FORTRAN COMMON members are described
in the symbol table using three constructs:
1. A global symbol in the Externals table. This symbol has the
same name as the COMMON (with an underscore appended); its
address is the static base address of the COMMON.
2. A synthesized file. This file has the same name as the
global symbol; it contains a C-structure where the elements
of the structure are the members of the COMMON. Each
element's value is an offset (in bits) from the static base
address of the global symbol.
3. A symbol in the Local symbol table; this appears at the
lexical scope at which the COMMON is declared. The symbol
has the same name as the global symbol.
A split lifetime description of a FORTRAN COMMON member must appear in
the same scope as, but after the COMMON symbol in the Locals symbol
table. The split lifetime referee symbol type and class are stLocal
and scInfo, respectively. The symbol index is an AUX RNDX value which
points to the COMMON member definition in the synthesized file.
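As a minimal sketch of how a consumer might use the bit offsets recorded
in the synthesized file, the following C fragment converts a member's
offset into an address. The struct and function names are hypothetical
and are not taken from sym.h; the sketch also assumes that every COMMON
member is byte-aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of one COMMON member as read from the synthesized file. */
struct common_member {
    const char *name;       /* member name */
    uint64_t offset_bits;   /* offset from the COMMON's static base, in bits */
};

/* Address of a member, given the static base address of the global symbol. */
static uint64_t member_address(uint64_t common_base,
                               const struct common_member *m)
{
    assert(m->offset_bits % 8 == 0);  /* assumption: byte-aligned members */
    return common_base + m->offset_bits / 8;
}
```

The division by 8 is the only step the bit-offset convention adds over an
ordinary C structure member lookup.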
|
|
In addition to the proposals presented in 33.3 thru 33.5, a number of additional
proposals can be expected to deal with inline routine expansion and other
debugging optimized code challenges. This note tries to give a brief sketch
of some of the needs and ideas that are currently being considered.
1. Inline Expansion Scope Block
The implementation of inline call expansion is fundamentally one of replacing
a call with a copy of the body of the called routine, complete with copied
versions of local variables, etc. A straightforward representation builds on
this fact, using a new st code, as in stInlineBlock/scText. Parameters
become initialized locals of the new block.
Key attributes of this block are:
- Pointer to the called routine STE
- Location (possibly just the line number) of the call
- list of "end prolog" addresses
- list of "begin epilog" addresses
"End prolog" and "begin epilog" in this context are misnomers, but the names
do suggest their purposes: the addresses where breaks should occur when
stepping "into" the called routine and when about to "return" from the called
routine. Because of the effects of optimization, these are not usefully
regarded as single locations -- a list is required for each.
A particular representation has yet to be proposed, but something like
stInlineBlock/scInfo might be used to provide some structure to the
initial sequence of STEs within the inline block.
2. Disjoint Range Extenders
Even in non-optimized code, it is valuable to be able to represent routines
and scopes that are not a single contiguous range of addresses; for optimized
code this is essential! A possible representation is to include
stExtendedRange/scText, stEnd/scText
pairs within a scope (routine, block, inline block,...) to indicate additional
ranges of addresses that are part of the containing scope.
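A consumer such as a debugger would presumably answer "is this PC inside
the scope?" by testing the primary range and then each extended range.
The following C sketch illustrates that lookup. The structures are
hypothetical in-memory forms, not a proposed symbol table layout, and
ranges are assumed to be half-open (start included, end excluded).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one [start, end) address range. */
struct addr_range {
    uint64_t start;   /* first address in the range */
    uint64_t end;     /* one past the last address in the range */
};

/* Hypothetical scope: its primary range plus the ranges taken from
 * stExtendedRange/stEnd pairs found inside it. */
struct scope {
    struct addr_range primary;
    const struct addr_range *extra;
    size_t n_extra;
};

static bool range_contains(const struct addr_range *r, uint64_t pc)
{
    assert(r->start <= r->end);
    return pc >= r->start && pc < r->end;
}

/* True if pc lies in the primary range or in any extended range. */
static bool scope_contains(const struct scope *s, uint64_t pc)
{
    if (range_contains(&s->primary, pc))
        return true;
    for (size_t i = 0; i < s->n_extra; i++)
        if (range_contains(&s->extra[i], pc))
            return true;
    return false;
}
```

A linear scan is used because the number of extended ranges per scope is
expected to be small; a consumer with many ranges might sort them instead.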
3. Scheduled Code Masks
Scheduling of inlined code can lead to lots of small disjoint extended
ranges for an inlined block. A representation that addresses this might
be something like
stEndMask/scText
as an alternative to stEnd/scText, where the value is interpreted as a bit
mask for up to 64 instructions, beginning at the scope start, which are in
fact part of the scope.
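The mask interpretation can be sketched in C as follows. This is only an
illustration under stated assumptions: instructions are a fixed 4 bytes
(as on Alpha), bit 0 (the least significant bit) corresponds to the
instruction at the scope start, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: fixed-size 4-byte instructions. */
#define INSN_BYTES 4u

/*
 * True if the instruction at addr belongs to a scope that begins at
 * scope_start and carries the given stEndMask value. Bit i of the mask
 * covers the i-th instruction counted from the scope start (assumed
 * bit order: bit 0 is the least significant bit).
 */
static bool mask_covers(uint64_t scope_start, uint64_t mask, uint64_t addr)
{
    assert(scope_start % INSN_BYTES == 0);
    if (addr < scope_start || (addr - scope_start) % INSN_BYTES != 0)
        return false;
    uint64_t index = (addr - scope_start) / INSN_BYTES;
    if (index >= 64)
        return false;   /* beyond the 64 instructions a mask can describe */
    return ((mask >> index) & 1u) != 0;
}
```

For example, a mask of 0x5 at scope start 0x1000 covers the instructions
at 0x1000 and 0x1008 but not the one at 0x1004.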
Interlude
Combining 1, 2 and 3 might lead to a symbol table structure sorta
kinda like...
!Start the inlined block...
stInlineBlock scText <to end+1>
!Then zero or more "parameters"...
stParam scXxx !parameter
...
!Pointer to callee routine...
stInlineBlock scInfo <aux RNDX to callee STE>
!Location of call
stInlineBlock scAbs value=lineno of call
!Then zero or more prolog addresses...
stInlineBlock scText -1 ! End prolog address,
! no associated stEnd
!Normal STEs for "copies" of the local variables of the called routine...
...
!Then zero or more epilog addresses...
stEnd scText -1 ! Begin epilog address,
! does not end inline
!Then zero or more extended range pairs...
stExtendedRange scText
stEnd scText
! and/or
stExtendedRange scText
stEndMask scAbs
!Finally, the closing end...
stEnd scText <to beginning> !real end of inline
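To make the intended reading of this sequence concrete, here is a rough
C sketch of how a consumer might walk one inline block. Everything in it
is an assumption for illustration: the enum values, the simplified entry
layout (which is not the sym.h layout), the use of an index of -1 to mark
address-only entries, and the restriction to a block that contains no
nested inline or lexical blocks and whose extended range pairs are
adjacent entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical codes: the note proposes names but assigns no values. */
enum st_code { ST_PARAM, ST_LOCAL, ST_END, ST_INLINE_BLOCK,
               ST_EXTENDED_RANGE, ST_END_MASK };
enum sc_code { SC_TEXT, SC_INFO, SC_ABS };

/* Simplified stand-in for a local symbol entry. */
struct ste {
    enum st_code st;
    enum sc_code sc;
    uint64_t value;
    int32_t index;    /* -1 marks an address-only entry in this sketch */
};

struct inline_summary {
    int32_t callee_aux;      /* index of the stInlineBlock/scInfo entry */
    uint64_t call_line;      /* value of the stInlineBlock/scAbs entry */
    size_t n_end_prolog;
    size_t n_begin_epilog;
    size_t n_extended;
};

/* Summarize the inline block opening at e[start]. Returns the position
 * of its closing stEnd, or n if no closing entry is found. */
static size_t summarize_inline(const struct ste *e, size_t n, size_t start,
                               struct inline_summary *out)
{
    assert(start < n && e[start].st == ST_INLINE_BLOCK
                     && e[start].sc == SC_TEXT);
    *out = (struct inline_summary){ .callee_aux = -1 };
    for (size_t i = start + 1; i < n; i++) {
        const struct ste *s = &e[i];
        if (s->st == ST_INLINE_BLOCK && s->sc == SC_INFO) {
            out->callee_aux = s->index;
        } else if (s->st == ST_INLINE_BLOCK && s->sc == SC_ABS) {
            out->call_line = s->value;
        } else if (s->st == ST_INLINE_BLOCK && s->sc == SC_TEXT
                   && s->index == -1) {
            out->n_end_prolog++;
        } else if (s->st == ST_EXTENDED_RANGE) {
            out->n_extended++;
            i++;    /* skip the stEnd or stEndMask closing the pair */
        } else if (s->st == ST_END && s->sc == SC_TEXT) {
            if (s->index == -1)
                out->n_begin_epilog++;
            else
                return i;    /* real end of the inline block */
        }
        /* Parameters and copied locals need no action here. */
    }
    return n;
}
```

The sketch shows why the ordering matters: the address-only entries can
be told apart from the real block delimiters only by their index field,
and an extended range's stEnd only by its position after stExtendedRange.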
4. Source/Object Mapping (the Line Number Table)
In general the program source line number to object code instruction mapping
must be many-to-many. It may be simplest conceptually to regard this as two
distinct mappings
- Source to Object
- Object to Source
each of which is one-to-many. Moreover, the current inability to cross source
file boundaries in the line number table must be overcome -- even to deal
properly with non-optimized code.
No particular suggestions are offered here at the moment, but a radical
overhaul is definitely needed. One starting point might be to effectively
eliminate the current role of File Descriptors by collapsing them all into
a single virtual "Source" file that applies to the entire compilation (just
to keep the current superstructure intact for the benefit of old tools)
and then introduce additional new structures that provide the needed
expressive power. Much work needed here...
5. Strength Reduced Variables
A minor target of opportunity concerns strength reduced variables, where
an index of a loop that steps thru an array is replaced by stepping thru
the addresses of the array itself. The loop index is easily back-computed
as a linear function (subtract the base of the array, divide by the stride
or element size) of the real control address. How to represent this?
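Whatever representation is chosen, the computation a debugger would have
to perform is small. The following C sketch shows it. The descriptor
structure and its field names are hypothetical, and the "first" field
(the index value on the first iteration) is an added assumption so that
loops not starting at zero, such as typical FORTRAN loops, can be
handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a strength-reduced loop index. */
struct reduced_index {
    uint64_t base;     /* address held when the index equals first */
    int64_t stride;    /* bytes the address advances per iteration */
    int64_t first;     /* assumed: index value on the first iteration */
};

/* Recover the source-level loop index from the real control address. */
static int64_t recover_index(const struct reduced_index *d, uint64_t ptr)
{
    assert(d->stride != 0);
    int64_t delta = (int64_t)(ptr - d->base);   /* signed byte distance */
    return d->first + delta / d->stride;
}
```

A negative stride covers loops that walk the array backwards; the
distance and the stride are then both negative and the quotient is still
the iteration count.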
Meta Question
These proposals (including the split lifetime proposal) build on the existing
STE "style" rather than exploit the optimization section because:
- It is necessary to represent code addresses within a procedure
(and get them relocated)
- It is necessary to point to AUX entries (in a reliable way)
The question is, whether and to what extent it is possible to do both of
these from within the DBGOPT section in a way that does not require ld
to learn how to parse/interpret that new section.
I think I understand how to handle code addresses, assuming that the debugger
performs the final relocation when it reads the DBGOPT section.
I am unclear how to handle AUX entries in a way that cannot be compromised
by ld processing (e.g., merging).
Perhaps a small group can help me brainstorm possibilities here...
|