
Conference smurf::unix_objsym

Title: Digital UNIX Object File/Symbol Table Notes Conference
Moderator: SMURF::LOWELL
Created: Mon Nov 25 1996
Last Modified: Thu Jun 05 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 71
Total number of notes: 314

33.0. "ISSUE 27: Debugging optimized code" by SMURF::LOWELL () Wed Dec 04 1996 15:12

T.R   Title                                                  User             Personal Name                  Date                   Lines
33.1  comments on issue 27 from David C. P. LaFrance-Linden  SMURF::LOWELL                                   Thu Dec 05 1996 16:24     56
33.2                                                         VIRRUS::diewald  Here In Soap Opera Central...  Tue Dec 24 1996 14:26     13
33.3  Proposed Optimization Symbols Organization             GEMGRP::BRENDER  Ron Brender                    Fri Feb 14 1997 13:55    338



                            Optimization  Symbols  Section

                                     Ron Brender
     |                              (Jeff Nelson)
     |                          14 February 1997 (Rev)


        ABSTRACT

        This document defines the  syntax  and  semantics  of  the  Third  Eye
        Symbols  section  known  as  the "Optimization Symbols" section.  This
        section,  though  it  has  a  definition  in  /usr/include/sym.h,   is
        currently unused on the Digital Unix platform.



        1  MOTIVATION

        The Third Eye symbol table embedded  within  a  COFF  object  file  is
        unwieldy  and difficult to extend without coordination and cooperation
        between lots of consumers and producers.  This is  further  compounded
        by   two   factors:   first,  not  all  consumers  and  producers  are
        necessarily known.  Second, the extensions that a producer or consumer
        wants  are  not  necessarily  compatible  with  the needs of any other
        producer or consumer.



        2  PURPOSE

        The definition of the Optimization Symbols section is designed to ease
        these  problems.   It  gives  individual  producers  and consumers the
        ability to communicate information about any aspect of the object file
        in   any   form   they  choose.   This  allows  for  new  or  modified
        descriptions, while keeping the rest of the  symbol  table  unchanged.
        New  information  can  be  generated  at any time without coordination
        between all producers  and  consumers,  though  eventually  a  minimal
        amount of coordination is required.


        In fact, it is recommended that the information  in  the  Optimization
        Symbols  section  eventually be folded back into the mainstream symbol
        description.   This  is  so  all  producers  and  consumers  can  take
        advantage  of  the  information  which  up  to now has presumably been
        private between one or more producers and consumers.


        It is assumed that the information in the Optimization Symbols section
        does  not contradict or fundamentally violate any understanding of the
        object that is given in any other part of the symbol table.  In  other
        words,  it is not OK to "lie" in the mainstream symbol description and
        tell the "truth" in the  Optimization  Symbols  section.   It  is  OK,
        however,  to  modify  or  enhance  the  description of the main symbol
        table.  This is the intended purpose of the section.


     |  Since the prior version of this proposal was prepared (8 August 1995),
     |  the  UNIX  Object  File  and  Symbol Table Working Group has adopted a
     |  specification for the .comment section that is  substantially  similar
     |  in  goals  and  mechanism.   This  revision recasts this proposal in a
     |  manner  that  is  more  consistent  with  the  .comment   section   in
     |  representation and processing.



        3  DEFINITION

     |  The Optimization Section consists of a sequence of zero or  more  "per
     |  procedure  optimization descriptions" (PPOD).  Each PPOD is pointed to
     |  by the iopt field of  the  procedure  descriptor  (PDR)  to  which  it
     |  applies  (as  explained  later).   While  each  PPOD  has  an internal
     |  structure  much  like  that  of  a  .comment  section,  there  is   no
     |  meta-structure  or  wrapper  that  collects  all  the  PPODs of a file
     |  together.  (There are existing pointer and size  fields  of  the  COFF
     |  file  header  (HDRR)  and  File Descriptors (FDR) that are used in the
     |  usual way to describe the aggregate of  all  PPODs;  more  later.)  In
     |  particular,  it  is intended that the linker (ld) be able to construct
     |  the Optimization Section of an output image much  like  it  constructs
     |  the  local  symbol  table  -- as the concatenation of the Optimization
     |  Sections of the constituent object files.   Unlike  the  local  symbol
     |  table,  however,  it  is  intended  that the linker need not interpret
     |  and/or modify the contents of a PPOD in any way.
     |  
     |  
     |  Each constituent PPOD of the Optimization Section has a structure that
     |  is analogous to a comment section:
     |  
     |        o  A leading sequence of TLV "index" entries that  describe  the
     |           location and parts of the PPOD, followed by
     |  
     |        o  A  raw  data  area   containing   the   actual   optimization
     |           descriptions.
     |  
     |  
     |  
     |  3.1  PPOD Index Entry Structure
     |  
     |  Each index entry has the following structure:
     |  
     |   =================start definition===============
     |  
     |  typedef struct {
     |      unsigned int  ppode_tag;
     |      unsigned int  ppode_len;
     |      unsigned long ppode_val;
     |      } PPODHDR;
     |  
     |   ================= end definition ===============


     |  where
     |  
     |  ppode_tag       Identifies the kind of data described by the entry.
     |  
     |  ppode_len       Indicates the size of the data,  in  bytes,  which  is
     |                  found  in  the  free form data area of this same PPOD.
     |                  When this field is zero, then the only  data  is  that
     |                  found in the ppode_val field.
     |  
     |  ppode_val       Is, or describes the location  of,  the  data  of  the
     |                  given  kind.   When  ppode_len  is  zero,  this  field
     |                  contains the (only) data itself.   When  ppode_len  is
     |                  non-zero,  this  field  is a relative file offset from
     |                  the beginning of the current PPOD  to  the  applicable
     |                  data area.
     |  
     |  
     |  The start of all data allocated in the free-form area must be octaword
     |  (16-byte)  aligned.   (Recall  that the Optimization Section itself is
     |  octaword aligned.) It follows (and is  required)  that  each  distinct
     |  PPOD must be octaword aligned as well.  The length stored in ppode_len
     |  need not be an octaword multiple, but when it  is  not,  padding  with
     |  zero-bytes must be appended to the end of the data item.
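
        The alignment rule above can be sketched in C as follows.  This is
        only an illustration: the names PPOD_ALIGN, ppod_round_up and
        ppod_pad_bytes are hypothetical and are not part of sym.h.

```c
/* Octaword alignment required for PPODs and their data items.
 * PPOD_ALIGN and both helpers are hypothetical names for this sketch. */
#define PPOD_ALIGN 16UL

/* Round a byte count up to the next multiple of 16. */
static unsigned long ppod_round_up(unsigned long nbytes)
{
    return (nbytes + PPOD_ALIGN - 1) & ~(PPOD_ALIGN - 1);
}

/* Number of zero bytes a producer appends after a data item whose
 * ppode_len is not an octaword multiple. */
static unsigned long ppod_pad_bytes(unsigned long ppode_len)
{
    return ppod_round_up(ppode_len) - ppode_len;
}
```

        A producer would place the next data item at the rounded-up offset,
        so a 13-byte item is followed by 3 bytes of zero padding.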
     |  
     |  
     |  
     |  3.2  PPOD Index Entry Kinds
     |  
     |  Every PPOD must contain at least two index entries.  The first entry
     |  must be PPODE_STAMP and the last must be PPODE_END:
     |  
     |      Tag           Value         Interpretation
     |  
     |  PPODE_STAMP         1   Identifies the version  number  of  the  PPOD.
     |                          ppode_len must be zero, and ppode_val contains
     |                          the version number (initially 1).
     |  
     |              =================start definition===============
     |  
     |              #define PPOD_VERSION 1  /* current version number */
     |  
     |              ================= end definition ===============
     |  
     |  PPODE_END           2   Indicates the end of the PPOD Entries for this
     |                          PPOD.   (Both  ppode_len and ppode_val must be
     |                          zero.)
     |  
     |  
     |  The PPOD version number is  for  future  expansion  purposes;  if  the
     |  Optimization  Section  ever changes semantically or syntactically, the
     |  version number shall  change  so  that  consumers  can  recognize  the
     |  difference.


     |  In addition to the  PPODE_STAMP  and  PPODE_END  kinds,  a  number  of
     |  additional  kinds  of  data  will  be  defined from time to time.  The
     |  following  kinds  are  currently  anticipated;  however,  the   actual
     |  specifications are (will be) given in separate documents:
     |  
     |  PPODE_SEM_EVENT     3   Semantic Event Descriptions
     |  PPODE_INLINE_INST   4   Inline Instance Descriptions
     |  PPODE_INLINE_LOC    5   Inline Locator Mapping Descriptions
     |  
     |  
     |  It is expected that there will  typically  be  a  natural  correlation
     |  between  index  entries  and  the  data parts:  the first (non-version
     |  stamp) entry describes the first  data  part,  the  second  descriptor
     |  describes  the  second  data part, and so on.  However, this cannot be
     |  assumed.  In addition, it is invalid to  assume  an  ordering  of  the
     |  index entries.
     |  
     |  
     |  Depending on the kind of data involved, it may be valid to  have  more
     |  than  one  entry with the same tag field value; that is, in general it
     |  is not valid to regard the tag field as a unique key.
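
        A consumer that follows these rules might scan the index entries as
        sketched below.  The PPODHDR layout and the PPODE_STAMP and PPODE_END
        values are taken from this proposal; the function ppod_count_tag and
        its max_entries bound are assumptions made for illustration.  The
        scan relies only on the stamp being first and the end marker being
        last, and it allows a tag to appear more than once.

```c
#include <stddef.h>

typedef struct {
    unsigned int  ppode_tag;
    unsigned int  ppode_len;
    unsigned long ppode_val;
} PPODHDR;

#define PPODE_STAMP 1
#define PPODE_END   2

/* Count the index entries carrying `tag` in one PPOD (hypothetical helper).
 * Returns -1 if the PPOD does not start with PPODE_STAMP or if no
 * PPODE_END is found within max_entries entries. */
static long ppod_count_tag(const PPODHDR *entries, size_t max_entries,
                           unsigned int tag)
{
    long count = 0;
    size_t i;

    if (max_entries < 2 || entries[0].ppode_tag != PPODE_STAMP)
        return -1;

    for (i = 1; i < max_entries; i++) {
        if (entries[i].ppode_tag == PPODE_END)
            return count;          /* end of the index entries */
        if (entries[i].ppode_tag == tag)
            count++;               /* tags are not unique keys */
    }
    return -1;                     /* ran off the end without PPODE_END */
}
```

        A tool that does not recognize a tag simply never asks for it, which
        is how unknown kinds are skipped.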



        3.3  The Data Parts

        The contents and format of the data  parts  are  arranged  by  private
        agreement  between  the producers and consumers of the particular kind
        of data part.



        4  RELATIONSHIP WITH OTHER COFF STRUCTURES

        The Optimization Symbols section is organized and indexed in much  the
        same way as the Locals Symbols section:

              -  The Optimization Symbols section is a single  section  within
                 the COFF file.

              -  The COFF header  (HDRR)  contains  the  starting  offset  and
                 maximum   index   into   the  Optimization  symbols  section,
                 cbOptOffset and ioptMax, respectively.  The cbOptOffset value
                 is  the  file  offset  of  the first byte in the Optimization
     |           Symbol table.  This  value  must  always  be  aligned  on  an
     |           octaword  (16-byte) boundary.  The ioptMax value is the count
                 of the total number  of  bytes  in  the  entire  Optimization
     |           Symbols  section,  including  index entries, data and padding
     |           bytes.  Thus, this value is always a multiple of 16 bytes.

              -  Each  File  Descriptor  (FDR)  has   a   pointer   and   size
                 contribution  to  the  Optimization  Symbols  section,  named
                 ioptBase and copt, respectively.  The ioptBase value  is  the
                 byte  offset  from  the  start  of  the  Optimization Symbols
                 section to this file's optimization symbols.  The copt  value
                 is  the  total  number  of  bytes in the Optimization Symbols
     |           section which are contributed by this file,  including  index
     |           entries,  data and padding bytes.  This implies that a file's
                 contribution to the  Optimization  Symbols  section  must  be
                 contiguous, even if the procedures in that file are not.

              -  Each Procedure Descriptor (PDR) has a pointer to the start of
                 that  procedure's  contribution  to  the Optimization Symbols
                 section, named iopt.  This offset, which is relative  to  the
                 beginning  of  the  containing  FDR's contribution, points to
                 the start of a unique and  complete  Optimization  Procedure
                 Descriptor (that is, a PPOD).
                 There  is  at  most one Optimization Procedure Descriptor per
                 routine.  A procedure's Optimization Descriptor can be  found
                 using this formula:

                      HDRR.cbOptOffset + FDR.ioptBase + PDR.iopt
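
        The lookup can be sketched in C as shown below.  The three "view"
        structures are hypothetical reductions that carry only the fields
        named in this section; the real HDRR, FDR and PDR declarations are in
        sym.h and their field types may differ.  The bounds checks are an
        assumption about what a careful consumer would verify, not a
        requirement stated by this proposal.

```c
/* Hypothetical reduced views of HDRR, FDR and PDR (illustration only). */
typedef struct { unsigned long cbOptOffset; unsigned long ioptMax; } OptHdrView;
typedef struct { unsigned long ioptBase;    unsigned long copt;    } OptFdrView;
typedef struct { unsigned long iopt; } OptPdrView;

/* Compute the file offset of a procedure's PPOD.
 * Returns 1 and stores the offset in *out on success; returns 0 when the
 * image or file has no optimization symbols or the offsets are out of range. */
static int ppod_file_offset(const OptHdrView *h, const OptFdrView *f,
                            const OptPdrView *p, unsigned long *out)
{
    if (h->ioptMax == 0 || f->copt == 0)
        return 0;                          /* nothing contributed */
    if (f->ioptBase + f->copt > h->ioptMax)
        return 0;                          /* FDR range exceeds the section */
    if (p->iopt >= f->copt)
        return 0;                          /* PDR offset outside the FDR range */

    *out = h->cbOptOffset + f->ioptBase + p->iopt;
    return 1;
}
```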



        5  PROCESSING

        Tools which produce COFF object files must  produce  either  an  empty
        Optimization  Symbols section or a valid Optimization Symbols section.
        An image with no Optimization Symbols section has HDRR.cbOptOffset and
        HDRR.ioptMax  values  of  zero.   A  file  with  no contribution to an
     |  Optimization Symbols section has a FDR.copt value of  zero,  in  which
     |  case every PDR within it has a PDR.iopt value of zero.  If a FDR has a
     |  contribution, then every procedure contained within  it  must  have  a
     |  contribution   pointed   to  by  its  PDR.iopt  field,  even  if  that
     |  contribution    consists    only    of    the    minimum    pair    of
     |  PPODE_STAMP/PPODE_END index entries.
     |  
     |                                  ISSUES
     |  
     |  
     |          1.  It does not work for a PDR.iopt of zero by  itself
     |              to indicate no contribution because that (validly)
     |              points to the beginning of the FDR's contribution.
     |              Unfortunately, there is no PDR.copt (length) field
     |              analogous to the length in HDRR and FDR structures.
     |              Hence this "minimum contribution" convention.
     |  
     |          2.  Assuming this  convention,  would  it  be  OK  for
     |              multiple   procedures   to  share  the  same  such
     |              contribution?  If so,  we  could  require  that
     |              each  FDR  contribution  must  begin  with   a
     |              minimum PPODE_STAMP/PPODE_END pair, so PDR.iopt ==
     |              0 does imply no contribution...?


        Tools which consume COFF object files must be capable of skipping  the
        entire  Optimization  Symbols  section, or those parts of it that they
        do not understand.


        Tools which both read and write COFF object files must consume a valid
        Optimization  Symbols  section  (if  one  exists in an input file) and
        produce an equivalent, valid  Optimization  Symbols  section  in  the
        output file.  This means one of the following:

              -  The tool does  not  know  how  to  process  anything  in  the
                 Optimization  Symbols  section.  The tool must write an exact
                 copy of any Optimization Symbols section  it  reads  in.   In
                 other  words,  it must allow the Optimization Symbols section
                 to pass through unchanged.

              -  The tool recognizes some kinds of data parts.  This tool must
                 copy,  unchanged,  the  data  parts (and descriptors) that it
                 does not understand.  The tool must read (and  if  necessary,
                 transform)  the  data  parts  (and  descriptors) that it does
                 understand  and  write  the  equivalent   data   parts   (and
                 descriptors).



        6  COORDINATION

        It is the responsibility of every producer to  obtain  a  unique  kind
        value  for  each  distinct  kind  of  data  it  wishes to place in the
        Optimization Symbols section.  Obtaining a unique kind  value  ensures
        that  there  won't  be two producers using the same kind value to mean
        different things.  Obtaining a unique kind value  is  accomplished  by
        adding  a  new  constant  definition  to  the  include file (currently
        /usr/include/symconst.h) which defines the format of the  Optimization
        Symbols section.


        Information in the Optimization Symbols section,  though  arranged  by
        private  agreement  between  producers  and  consumers, is meant to be
        shared among all consumers and producers when it makes sense to do so.
        Producers  and consumers are jointly responsible for ensuring that the
        data parts they write and read, respectively, are also recognized  and
        processed  by  other  tools if those tools could have an impact on the
        information in the data part.  For example, if a compiler generates  a
        different  kind  of  PC-line  correlation  table  which  is  used by a
        debugger, and an instruction-modifying tool makes changes  (insertions
        and  deletions)  to  the instruction stream, the compiler and debugger
        should both lobby the modifying tool to  keep  the  different  PC-line
        correlation table up to date.
33.4  Proposed Semantic Event Representation                 GEMGRP::BRENDER  Ron Brender                    Fri Feb 14 1997 14:13    144



                         DBGOPT Semantic Event Representation
                                      using the
                            UNIX COFF Optimization Section

                                     Ron Brender
     |                              (Jeff Nelson)
     |                      14 February 1997 (Revision 3)


        Greg Lueck, Jeff Nelson and Mike Rickabaugh have  proposed  a  storage
        representation  and  semantics  for  giving  form and substance to the
        currently unused "Optimization Symbols" section in the COFF/Third  Eye
        symbol  table.   This  document  builds on that framework to propose a
        specification for how to represent semantic event  information.   This
        information  will  be  generated  by  GEM-based  compilers and used by
        Ladebug (and ignored by dbx).



        1  OVERVIEW OF SEMANTIC EVENTS

        Semantic events are those points in a program where  the  user-visible
        and  user-relevant  semantic actions of a program actually occur.  For
        example, for an assignment statement, the instruction that stores into
        a user declared variable is generally the location of a semantic event
        (the event temporally  occurs  when  that  instruction  is  executed).
        Semantic event locations are generally divided into these kinds:

                assignments
                control points (conditional transfers)
                calls (and returns, including PALcalls)
                labels

        Not all instructions that  effect  these  operations  are  necessarily
        visible  or  even interesting to users.  For a complete description of
        how the actual set of semantic  event  locations  is  determined,  see
        "What  Every  Front-End  Should  Know About Debugging Optimized Code",
     |  which can be found in TURRIS::AD-PROJECTS note 74.6.



        2  SEMANTIC EVENT REPRESENTATION

        Semantic events  are  represented  using  a  semantic  event  kind  of
     |  subsection  (PPODE_SEM_EVENT  ==  3) in the Per Procedure Optimization
     |  Description for a procedure.  There will be one  instance  of  such  a
        subsection  that  describes  the  semantic  event  information for the
        entire procedure.


        A semantic event subsection consists of an  array  of  Semantic  Event
        Entries (for the entire procedure) where:

              -  The length of the array is specified by the Size field in the
                 subsection descriptor, and

              -  Each element of the array is a Semantic Event  Entry  defined
                 as described below.


        Each Semantic Event Entry is a byte consisting of two 4-bit fields:

                 7     4 3     0
                +-------+-------+
                | Event | Count |
                +-------+-------+

        where

              .  Event  is  a  4-bit  code  that  indicates  the  event  being
                 described:

                        0       None (used for a Count of 16 or more, see below)
                        1       Write (assignment) event
                        2       Control event
                        3       Call event
                        4       Label event
                        5       Instruction level only
                        6       Prolog End (first instruction following)
                        7       Epilog Begin (first instruction in)
                        8-15    (Reserved for future use)


              .  Count is a 4-bit field with a value in  the  range  0  to  15
                 indicating  the  number  of executable instructions following
                 the previous event description to which this  event  applies.
                 If  more  than 15 instructions separate events, then multiple
                 event entries that indicate the null event are used to add up
                 to  the  required separation.  If more than one event applies
                 to the same instruction, then the first event is encoded with
                 the  appropriate  Count  to  "get  to"  the  instruction  and
                 subsequent events are encoded using a Count of 0.

                                             NOTE

                         The encoding of this field is *not* identical
                         to  the encoding of the Count field of a Line
                         Number Entry.  This Count encodes the  values
                         from 0 to 15 rather than 1 to 16.
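
        The entry layout and the use of null events to bridge long gaps can
        be sketched in C as follows.  The event codes come from the table
        above; the SE_* names and the three helper functions are hypothetical
        and are not proposed additions to sym.h.  The sketch treats Count as
        the number of instructions between the previous entry and this one.

```c
#include <stddef.h>

/* Event codes from the table above (names are hypothetical). */
enum {
    SE_NONE = 0, SE_WRITE = 1, SE_CONTROL = 2, SE_CALL = 3,
    SE_LABEL = 4, SE_INSN = 5, SE_PROLOG_END = 6, SE_EPILOG_BEGIN = 7
};

/* Event is in bits 7..4, Count is in bits 3..0. */
static unsigned se_event(unsigned char e) { return (e >> 4) & 0xFu; }
static unsigned se_count(unsigned char e) { return e & 0xFu; }

/* Encode one event that lies `gap` instructions after the previous entry.
 * Gaps above 15 are bridged with null events carrying a Count of 15.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t se_encode(unsigned event, unsigned long gap,
                        unsigned char *out, size_t cap)
{
    size_t n = 0;

    while (gap > 15) {
        if (n >= cap)
            return 0;
        out[n++] = (unsigned char)((SE_NONE << 4) | 15);
        gap -= 15;
    }
    if (n >= cap)
        return 0;
    out[n++] = (unsigned char)(((event & 0xFu) << 4) | (unsigned)gap);
    return n;
}

/* Instruction offset (from the procedure's first instruction) reached by
 * the last entry: the running sum of all Count fields. */
static unsigned long se_last_offset(const unsigned char *entries, size_t n)
{
    unsigned long off = 0;
    size_t i;

    for (i = 0; i < n; i++)
        off += se_count(entries[i]);
    return off;
}
```

        For example, a Write event 40 instructions after the previous event
        is encoded as two null entries with a Count of 15 followed by a Write
        entry with a Count of 10.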


        The first semantic event of each procedure must be a Label event  with
        a  Count  of  zero.   The  address  in the text section for this first
        instruction is specified in the Procedure Descriptor Entry that points
        to the containing Optimization Section.


        Typically (but not necessarily), the last Semantic  Event  Entry  will
        consist of the value 0x3n corresponding to the last RET instruction of
        the routine.  There is no need to "describe" any out-of-line  code  or
        padding  NOP  instructions  that  may  occur  at  the end of a routine
        following the last RET so long  as  they  contain  no  semantic  event
        locations.













                                      APPENDIX A

                              ADDITIONS TO SYM*.H FILES



                    ...Changes to sym.h and symconst.h are TBD...
33.5  Proposed Representation for Split Lifetime Info        GEMGRP::BRENDER  Ron Brender                    Fri Feb 14 1997 16:02    547



                     Representation of Split Lifetime Information
                                        using
                                  Digital Unix COFF

                                     Ron Brender
     |                              (Jeff Nelson)
     |                      14 February 1997 (Revision 2)


        This document proposes changes to the Digital UNIX COFF on-disk symbol
        table format which are required to support split lifetime variables.


        For background information (such as just what is  a  split  lifetime),
        see  "What Every Front End Should Know About Debugging Optimized Code"
     |  by Brender and Nelson.  A copy is posted in  TURRIS::AD-PROJECTS  note
     |  74.6.
     |  
     |                                   NOTE
     |  
     |          This revision differs significantly from the  previous
     |          (dated  28 August 1995) only in that it eliminates the
     |          need/use of the scSymRef storage class.
     |  



        1  OVERVIEW

        The split lifetime variable description is designed to  supplement  an
        existing  symbol  description.   This  is  a change from earlier split
        lifetime descriptions,  which  attempted  to  completely  replace  one
        definition with another.


        There are several reasons why split lifetime needs to supplement,  not
        entirely  replace,  a symbol's description.  The most important one is
        that the variable  may  be  split  in  a  compilation  unit  which  is
        independent  from  the  compilation  unit which declares the variable.
        For example, consider a global variable.  It  is  declared  once,  but
        there   are  potentially  many  independent  compilation  units  which
        manipulate  the  variable.    Because   each   compilation   unit   is
        independent,  it  is  not  possible  to replace the global definition,
        because each compilation would have to know about the others in  order
        to give a complete replacement definition.


        Other significant but relatively less important  reasons  are  due  to
        limitations  and assumptions about COFF and the Third Eye symbol table
        format.  For example, text relocations (for relocating PC values)  can
        only  occur  in  the  Locals  symbol  table  and  even  then, are only
        meaningful when they occur in the  same  context  as  the  compilation
        unit.   This  means  that text relocations in synthesized files (e.g.,
        for FORTRAN COMMON) don't work.


        As before, the entire split lifetime description  can  be  skipped  by
        consumers  who  choose  to  ignore it.  Those consumers will have some
        understanding of the variable (its name, type, and scope in  which  it
        appears), though less-accurate understanding of the symbol's address.


        The remainder of this document describes the new format.



        2  SPLIT LIFETIME VARIABLE DESCRIPTION

        The split lifetime on-disk format for a program variable consists of:

             1.  A header symbol.

             2.  A referee symbol.

             3.  A list of one or more  lifetime  descriptions,  called  child
                 descriptions.

             4.  A trailer symbol.


        A more detailed description  using  the  following  example  is  given
        below.

          0. ( 0)(   0) docfe.f    File       Text       symref 51
          1. ( 1)(0x120001810) docfe_     Proc       Text       [24]
          2. ( 2)(   0) A          Param      Unalloc     [26]
     |    3. (-4)(   0) N          Param      Unalloc     [10]
     |    4. ( ?)(   ?)            Block      Text       symref 99
     |    5. ( 2)(   5) N          Split      Info       symref 23
     |    6. ( 2)(   5) N          Local      scInfo      [->symref 3]
          7. ( 2)(0x11) N          Param      VarRegister [10]
          8. ( 2)(0x120001818) N   Split      Text       symref 10
          9. ( 1)(   0) N          End        Text       symref 8
         10. ( 1)(   0) N          Param      VarRegister [10]
         11. ( 1)(0x12000181c) N   Split      Text       symref 13
         12. ( 0)(   0) N          End        Text       symref 11
         13. ( 0)(0x11) N          Param      VarRegister [10]
         14. ( 0)(0x120001828) N   Split      Text       symref 16
         15. (-1)(   0) N          End        Text       symref 14
         16. (-1)(   0) N          Param      VarRegister [10]
         17. (-1)(0x12000182c) N   Split      Text       symref 19
         18. (-2)(   0) N          End        Text       symref 17
         19. (-2)(   0) N          Param      VarRegister [10]
         20. (-2)(0x120001838) N   Split      Text       symref 22
         21. (-3)( 0x8) N          End        Text       symref 20
         22. (-4)(   0) N          End        Info       symref 5


                Example 1. Split Lifetime Description of Parameter N.



        2.1  Header Symbol

        The split lifetime header symbol is identified by the symbol type  and
        class pair (stSplit,scInfo).  In Example 1, symbol entry 5 denotes the
        beginning of the split lifetime description for the  program  variable
        N.   The  symbol  value  field contains a count of the number of child
        descriptions, which in the example is 5.  The symbol index field is  a
        forward symbol reference which points to the symbol just after the end
        of the split lifetime definition.


        Consumers which choose not to support the split  lifetime  description
        should  recognize  the  stSplit/scInfo symbol table entry and skip the
        entire description using the symbol index.  (Note that consumers which
        skip   the   split  lifetime  description  will  still  see  a  symbol
        definition, which in this example occurs at entry 3.)
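
        A consumer that skips split lifetime descriptions might step through
        the symbols as sketched below.  The ExSym structure is a hypothetical
        reduction of a symbol table entry, and the numeric values given to
        EX_stSplit and EX_scInfo are placeholders chosen for this sketch; the
        actual constants would be defined in symconst.h.

```c
#include <stddef.h>

/* Placeholder constants for this sketch only (not the real values). */
enum { EX_stSplit = 100, EX_scInfo = 11 };

/* Hypothetical reduced symbol entry with only the fields used here. */
typedef struct {
    unsigned long value;   /* for a header symbol: number of children */
    unsigned int  st;      /* symbol type */
    unsigned int  sc;      /* storage class */
    unsigned int  index;   /* for a header symbol: symbol after the trailer */
} ExSym;

/* Return the index of the next symbol to examine after symbol i.
 * A split lifetime header is skipped as a whole by following its
 * forward reference; any other symbol advances by one. */
static size_t next_symbol(const ExSym *syms, size_t nsyms, size_t i)
{
    if (syms[i].st == EX_stSplit && syms[i].sc == EX_scInfo
            && syms[i].index > i && syms[i].index <= nsyms)
        return syms[i].index;
    return i + 1;
}
```

        With the symbols of Example 1, stepping from entry 5 lands on entry
        23, so the referee, children and trailer are never examined.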



        2.2  Referee Symbol

        The referee symbol is the symbol entry which refers to the  symbol  to
        which  the split lifetime information applies.  There are two kinds of
        symbols which can be referred to.
     |  
     |  
     |  1.  Non-global Variables
     |  
     |  The  referee  symbol  type  and  class   are   stLocal   and   scInfo,
     |  respectively.   The  referee  symbol  index is an RNDX AUX entry which
     |  refers to the file and symbol offset of the variable's declaration.
     |  
     |  
     |  In most typical cases, the file value will be the same as the  current
     |  file.   In  the  case of FORTRAN COMMON variables, the file value will
     |  refer to the synthesized file that represents the common.


        Symbol entry 6 in Example  1  illustrates  a  symbol  reference  to  a
        parameter variable.


        2.  Global Variables

        Even though a variable is global (and therefore static), the  compiler
        may  perform  "split  loads"  and "split flushes" on it to temporarily
        allocate the variable in a register.  Symbols in the Externals  symbol
        table  cannot  be  referred  to  using  the previous referee mechanism
        because there is no way to refer to a global symbol using an AUX RNDX.
        Thus,  the referee symbol type and class are stGlobal and scInfo.  The
        referee symbol index is undefined and must be zero.  The  symbol  type
        informs  the consumer that the variable is in the Externals table; the
        symbol name is  then  used  as  the  identifying  key  to  locate  the
        definition.



        2.3  Default Lifetime

        When  a  split  lifetime  is  used  to  enhance  an  existing   symbol
        description,  the  problem  arises  of  what  to  do with the original
        description.  This is handled by defining  a  meaning  for  the  value
        field  in  the  referee  symbol.   The symbol value is a bitmask.  The
        meaning of the bits is as follows:

               Bits    Value   Meaning
               ----    -----   -------
                  0      0     Do not use target of reference as default
                         1     Use target of reference as default
               1-63     MBZ    Bits 1 through 63 are reserved and must be zero

        If bit zero of the referee value field is set, then the target of  the
        reference  should  be  retained as the "default" representation.  That
        is, the target symbol's type, class and address are active whenever  a
        split  child  isn't.  If bit zero of the referee value field is clear,
        then  the  target  symbol  should  NOT  be   used   as   the   default
        representation;  instead,  a  default  representation of "unallocated"
        should be used by the symbol table consumer.
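
        A consumer's handling of the referee value field might look
        roughly like the sketch below.  The struct ste type, the macro
        names and both functions are assumptions made for illustration
        only; just the bit assignments come from the table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a symbol table entry; only the value field
   matters here.  This is not the sym.h layout. */
struct ste {
    uint64_t value;   /* for a referee symbol: the bitmask described above */
};

#define SPLIT_USE_DEFAULT  UINT64_C(1)            /* bit 0 */
#define SPLIT_RESERVED_MBZ (~SPLIT_USE_DEFAULT)   /* bits 1 through 63 */

/* True when the reserved bits are all zero, as the format requires. */
bool referee_value_is_valid(const struct ste *referee)
{
    assert(referee != NULL);
    return (referee->value & SPLIT_RESERVED_MBZ) == 0;
}

/* True:  fall back to the target symbol when no split child is active.
   False: treat the variable as "unallocated" when no child is active. */
bool referee_uses_default(const struct ste *referee)
{
    assert(referee != NULL);
    return (referee->value & SPLIT_USE_DEFAULT) != 0;
}
```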
     |  
     |                           Possible Alternative
     |  
     |          Rather than use the  referee  symbol  value  field  to
     |          encode   the  default  vs  no-default  distinction  (a
     |          certainly novel,  if  perhaps  unnatural  choice),  we
     |          could use other pairings of st/sc values.  For example,
     |          we could use
     |  
     |               stLocal,scInfo     Use default
     |               stGlobal,scInfo    Use default
     |               stLocal,scNil      Don't use default
     |               stGlobal,scNil     Don't use default
     |  
     |          Thoughts?
     |  



        2.4  Child Descriptions

        Each child description is a 3-tuple  of  symbol  table  entries.   The
        first  symbol  of  the  tuple,  called the child symbol, is a standard
        symbol table entry.  However, the only symbol types which  may  appear
        are  stStatic,  stGlobal,  stParam  and stLocal, because these are the
        symbol types which define program variables which can be  split  by  a
        producer  (compiler);  the  symbol classes are those which are already
        defined  and  paired   with   the   preceding   symbol   types.    The
        interpretation of the value and index fields of the child symbol is no
        different than the already-existing rules for  interpretation  of  the
        appropriate symbol type and class pair.



        The second symbol of the tuple, called the low PC symbol, defines  the
        low  bound of the PC range over which the child description is active.
        The symbol type and class are stSplit and scText,  respectively.   The
        symbol  value  is  the  PC  address of the lower bound of the lifetime
        range.  Because this value is an address in the text  section,  it  is
        automatically  relocated by the linker.  The symbol index is a forward
        symbol  reference  which  points  one  past  the  end  of  the   child
        description.


        The third symbol of the tuple, called the high PC symbol, defines  the
        high bound of the PC range over which the child description is active.
        The symbol type and class are stEnd  and  scText,  respectively.   The
        symbol  value  is  an offset from the value of the low PC symbol.  The
        actual upper bound value is computed by the expression:

                (low PC symbol)->value + (high PC symbol)->value

        The symbol index of the high PC symbol is a backward symbol  reference
        which points to the low PC symbol to which it is paired.


        The low PC and high PC values together define an  address  range  over
        which  the  child  symbol  is  said  to  be active.  That is, when the
        program is loaded and running and the program counter  is  within  the
        address range:

                low PC value <= current PC <= high PC value

        then the child symbol describes the  address  of  the  split  lifetime
        variable.  Note that the address range is inclusive of the endpoints.
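
        The range test can be sketched in C as follows.  The struct ste
        type and the function names are hypothetical; the arithmetic is
        the upper-bound expression and the inclusive comparison given
        above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a symbol table entry (value field only). */
struct ste {
    uint64_t value;
};

/* Upper bound = (low PC symbol)->value + (high PC symbol)->value. */
uint64_t child_high_pc(const struct ste *low_pc, const struct ste *high_pc)
{
    assert(low_pc != NULL && high_pc != NULL);
    return low_pc->value + high_pc->value;
}

/* Inclusive test: low PC value <= pc <= high PC value. */
bool child_is_active(const struct ste *low_pc, const struct ste *high_pc,
                     uint64_t pc)
{
    return low_pc->value <= pc && pc <= child_high_pc(low_pc, high_pc);
}
```

        With an illustrative low PC value of 0x120001838 and a high PC
        offset of 8, the child is active from 0x120001838 through
        0x120001840 inclusive.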


        In Example 1, there are 5 split children which are decoded as follows:

           from PC 0x120001818 to PC 0x120001818:
                N is a VarRegister parameter in register 0x11
           from PC 0x12000181c to PC 0x12000181c:
                N is a VarRegister parameter in register 0x00
           from PC 0x120001828 to PC 0x120001828:
                N is a VarRegister parameter in register 0x11
           from PC 0x12000182c to PC 0x12000182c:
                N is a VarRegister parameter in register 0x00
           from PC 0x120001838 to PC 0x120001840:
                N is a VarRegister parameter in register 0x00

        Consumers may not make assumptions about  the  order  in  which  child
        descriptions appear.


        Consumers may not make assumptions about the  address  ranges  of  the
        child descriptions.  In particular, the address ranges of two or more
        split children may overlap.
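
        Because neither ordering nor disjointness can be assumed, a
        consumer has to examine every child rather than stop at the first
        match.  The sketch below shows one way to do that; struct
        split_child is a hypothetical, already-decoded form of the
        3-tuple, and find_active_children is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a symbol table entry (value field only). */
struct ste {
    uint64_t value;
};

/* One decoded child description: child, low PC and high PC symbols. */
struct split_child {
    struct ste child;     /* e.g. register number for a register child */
    struct ste low_pc;    /* value is the low PC address */
    struct ste high_pc;   /* value is an offset from low_pc.value */
};

/* Scan all children, in whatever order they appear, and record the
   indices of those active at pc.  At most out_cap indices are stored;
   the return value is the total number of active children. */
size_t find_active_children(const struct split_child *kids, size_t count,
                            uint64_t pc, size_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(kids != NULL || count == 0);
    assert(out != NULL || out_cap == 0);
    for (size_t i = 0; i < count; i++) {
        uint64_t lo = kids[i].low_pc.value;
        uint64_t hi = lo + kids[i].high_pc.value;
        if (lo <= pc && pc <= hi) {
            if (n < out_cap)
                out[n] = i;
            n++;
        }
    }
    return n;
}
```

        A return value of zero means that no split child is active, in
        which case the default lifetime rules of section 2.3 apply.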



        2.5  Trailer Symbol

        The trailer symbol is identified by the symbol  type  and  class  pair
        (stEnd,scInfo).   In  Example 1, symbol entry 21 denotes the end of the
        split lifetime description for program variable N.  The  symbol  value
        field  is  undefined  and  must  be zero.  The symbol index field is a
        backward symbol reference which points to the beginning of  the  split
        lifetime description.



        2.6  Assertions

        The following statements are all true about  a  given  split  lifetime
        description:

              -  All  symbol  entries  used  to  describe  a  split   lifetime
                 description  have  the  same  name.   In  fact, producers may
                 choose to emit the same offset into  the  strings  table  for
                 each symbol entry.

              -  All split children have the same symbol type (e.g., stParam).
                 In other words, producers are not allowed to change the symbol
                 type depending on the PC range.

              -  All split children have the same variable type (e.g., offset
                 into  the  AUX  table).   In  other  words, producers are not
                 allowed to change the variable's type  depending  on  the  PC
                 range.

              -  The PC address range of a split child may  overlap  with  the
                 range  of  one  or  more  other  split child ranges.  If this
                 occurs, then more than one split child is active  within  the
                 overlapping range.

              -  There is no significance to the order of child descriptions.

              -  Though the split lifetime structure mirrors that of  a  block
                 structure,  there is no explicit or implicit scope defined by
                 the structure.

              -  The split lifetime  description  does  NOT  introduce  a  new
                 variable,  nor  does  it  introduce a new name in the current
                 scope.  Consumers must  take  care  not  to  treat  it  as  a
                 variable declaration.
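
        A consumer that wants to sanity-check the first three assertions
        might do something like the sketch below.  The struct child_sym
        type, its field names and the numeric symbol type codes used with
        it are hypothetical, and comparing AUX indices is only an
        approximation of "same variable type".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded view of one child symbol. */
struct child_sym {
    const char *name;          /* name resolved from the strings table */
    int st;                    /* symbol type code (values are illustrative) */
    unsigned long aux_index;   /* offset into the AUX table for the type */
};

/* True when every child has the same name, symbol type and AUX index
   as the first child.  An empty or single-entry list is consistent. */
bool children_are_consistent(const struct child_sym *kids, size_t count)
{
    assert(kids != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (strcmp(kids[i].name, kids[0].name) != 0)
            return false;
        if (kids[i].st != kids[0].st)
            return false;
        if (kids[i].aux_index != kids[0].aux_index)
            return false;
    }
    return true;
}
```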



        2.7  Summary

        To  summarize,  the  following  new  symbol  type  and  symbol   class
        combinations  are  introduced  by the implementation of split lifetime
        support:

            Symbol    Symbol
            Type      Class     Meaning
            ------    ------    -------

            stSplit   scInfo    Begins  a  split  lifetime  description  of  a
                                parameter,   local   variable,   or  a  global
                                variable.  The symbol value field is  a  count
                                of  the number of split children.  The default
                                child is excluded from this count.  The symbol
                                index    field    points   to   the   fallback
                                description.

            stSplit   scText    Defines a split lifetime low  PC  value.   The
                                symbol  value  field  is  a  PC  address.  The
                                linker  automatically  relocates  this  value.
                                The  symbol  index field points to the high PC
                                definition (always the next entry).

            stSplit   scAbs     Defines a split lifetime low  PC  value.   The
                                symbol  value  field  is  a non-relocatable PC
                                address.  The symbol index field points to the
                                high PC definition (always the next entry).

            stEnd     scText    Defines a split lifetime high PC  value.   The
                                symbol  value  field  is  a non-relocatable PC
                                offset relative to  the  low  PC  value.   The
                                symbol  index  field  points  to  the  low  PC
                                definition (always the previous entry).  (Note
                                that this is not a new type/class combination,
                                but its use to define a high PC value is new.)

            stEnd     scAbs     Defines a split lifetime high PC  value.   The
                                symbol  value  field  is  a non-relocatable PC
                                address.  The symbol index field points to the
                                low PC definition (always the previous entry).
                                (Note  that  this  is  not  a  new  type/class
                                combination,  but  its use to define a high PC
                                value is new.)

            stEnd     scInfo    Ends a split lifetime definition.  The  symbol
                                value  field  is  undefined  and must be zero.
                                The symbol index  field  points  back  to  the
                                beginning  of  the  split lifetime definition.
                                (Note  that  this  is  not  a  new  type/class
                                combination,  but  its  use  to  end  a  split
                                lifetime description is new.)



     |      stLocal   scInfo    Referee  symbol  within   a   split   lifetime
                                description.   The  symbol  value  field  is a
                                bitmask as defined above.   The  symbol  index
                                field  points  to the symbol being extended by
                                the split lifetime description.

     |      stGlobal  scInfo    Referee  symbol  within   a   split   lifetime
     |                          description.   The  symbol  value  field  is a
     |                          bitmask as defined above.   The  symbol  index
     |                          field is  zero.   (The name is used to look up
     |                          the appropriate symbol in the external symbols
     |                          section.)



        3  SPLIT LIFETIME EXAMPLES

        Different language constructs have different  representations  in  the
        COFF/STABS  symbol  table.   This section documents how split lifetime
        referee  symbols   should   be   used   to   refer   to   common   and
        language-specific representations.



        3.1  General Encodings

        This section describes the encodings applicable to all languages.



        3.1.1  Global Variables - Global variables are those whose definitions
        appear  in  the  Externals symbol table.  A split lifetime description
        which extends the global definition by necessity must  appear  in  the
        Locals  symbol table (so the PC values for the child lifetimes will be
        properly relocated).  The split lifetime  referee  symbol  is  encoded
        using  symbol  type stGlobal and symbol class scInfo.  The name of the
        referee symbol must be the same as the  name  of  the  symbol  in  the
        Externals  table to which the referee symbol refers, so that consumers
        can match the split  lifetime  description  with  the  correct  global
        symbol.



        3.1.2  Local Variables - Local variables are  those  declared  in  the
        scope  of  a  routine  (perhaps  nested within one or more blocks).  A
        split lifetime description which extends  the  local  variable  symbol
        entry  must occur after the symbol entry and must be in the same scope
        as the local variable symbol entry.  The split lifetime referee symbol
     |  is  encoded  using  symbol  type stLocal and symbol class scInfo.  The
     |  symbol index is an RNDX AUX entry which refers to the file and  symbol
     |  offset of the variable's declaration.



        3.1.3  Routine  Parameters - Split  lifetime  extensions  of   routine
        parameters   occur   after  the  routine's  block  begin  symbol;  the
        extensions do not appear adjacent to the parameter symbols.  The split
        lifetime  referee  symbol  is  encoded  using  symbol type stLocal and
     |  symbol class scInfo.  The symbol index is  an  RNDX  AUX  entry  which
     |  refers to the file and symbol offset of the variable's declaration.



        3.2  Language-Specific Encodings

        This section describes the split lifetime encodings that are  specific
        to one or more languages.



        3.2.1  Imported  Symbols - Many  languages  support  the   notion   of
        importing  code  and  data  from  one  file  or  compilation unit into
        another.  There are two ways this is done:  by a textual include  (the
        "#include"  preprocessor  directive  in  C  and  C++)   or   by   a
        language-provided statement for that purpose (Ada's "with"  and  "use"
        statements, FORTRAN 90's "use" statement).



        3.2.1.1  Textual Includes - Symbols  which  are  imported  by  textual
        include  commands  appear  in  the  symbol table in one of two places:
        either in the "includer" file (i.e., the file containing  the  include
        command) or in the "includee" file (i.e., the file being included).


        Regardless of where the producer (compiler) defines  the  symbol,  the
        split  lifetime  referee  symbol by definition must point to the place
        where the producer declares the symbol.  Thus, the referee symbol type
        and  class  are  stLocal and scInfo, respectively, and the index is an
        AUX RNDX value pointing to the declaration.



        3.2.1.2  Statement   Includes - Symbols   which   are   imported    by
        language-defined include statements can also be split by the compiler.
        The split lifetime description must occur in the Local symbol table in
        the  context  of  the routine which accesses the imported variable and
        which causes it to be split.  The split lifetime referee  symbol  type
        and  class  are stLocal and scInfo, respectively.  The symbol index is
        an AUX RNDX value which points to the imported symbol's declaration.



        3.2.2  FORTRAN   Entrypoint   Parameters - Parameters    to    FORTRAN
        entrypoints  are  described  in the symbol table immediately after the
        entrypoint symbol.  If two or more  entrypoints  (including  the  main
        subroutine  entry)  declare  the  same  parameter  (i.e.,  a parameter
        of the same name), then the split lifetime description  always  refers
        to the first.
     |  
     |                           Possible Alternative
     |  
     |          Perhaps it would make sense in this case for there  to
     |          be multiple referee symbols?
     |  



        3.2.3  FORTRAN COMMON Symbols - FORTRAN COMMON members  are  described
        in the symbol table using three constructs:

             1.  A global symbol in the Externals table.  This symbol has  the
                 same  name  as  the COMMON (with an underscore appended); its
                 address is the static base address of the COMMON.

             2.  A synthesized file.  This file  has  the  same  name  as  the
                 global  symbol;  it contains a C-structure where the elements
                 of the  structure  are  the  members  of  the  COMMON.   Each
                 element's  value  is an offset (in bits) from the static base
                 address of the global symbol.

             3.  A symbol in the Local  symbol  table;  this  appears  at  the
                 lexical  scope  at  which the COMMON is declared.  The symbol
                 has the same name as the global symbol.


        A split lifetime description of a FORTRAN COMMON member must appear in
        the same scope as, but after, the COMMON symbol in the  Locals  symbol
        table.  The split lifetime referee symbol type and class  are  stLocal
        and scInfo, respectively.  The symbol index is an AUX RNDX value which
        points to the COMMON member definition in the synthesized file.


33.6. "Other DBGOPT needs and ideas" by GEMGRP::BRENDER (Ron Brender) Tue Mar 04 1997 11:43 (159 lines)

In addition to the proposals presented in 33.3 thru 33.5, a number of additional
proposals can be expected to deal with inline routine expansion and other
debugging optimized code challenges. This note tries to give a brief sketch
of some of the needs and ideas that are currently being considered.


1. Inline Expansion Scope Block

The implementation of inline call expansion is fundamentally one of replacing
a call with a copy of the body of the called routine, complete with copied
versions of local variables, etc. A straightforward representation builds on
this fact, using a new st code, as in stInlineBlock/scText. Parameters
become initialized locals of the new block.

Key attributes of this block are:

  - Pointer to the called routine STE
  - Location (possibly just the line number) of the call
  - list of "end prolog" addresses
  - list of "begin epilog" addresses

"End prolog" and "begin epilog" in this context are misnomers, but the name
does suggest their purposes: the addresses where breaks should occur when
stepping "into" the called routine and about to "return" from the called
routine. Because of the effects of optimization, these are not usefully
regarded as single locations -- a list is required for each.

A particular representation has yet to be proposed, but something like
stInlineBlock/scInfo might be used to provide some structure to the
initial sequence of STEs within the inline block.


2. Disjoint Range Extenders

Even in non-optimized code, it is valuable to be able to represent routines
and scopes that are not a single contiguous range of addresses; for optimized
code this is essential! A possible representation is to include

	stExtendedRange/scText, stEnd/scText

pairs within a scope (routine, block, inline block,...) to indicate additional
ranges of addresses that are part of the containing scope.


3. Scheduled Code Masks

Scheduling of inlined code can lead to lots of small disjoint extended
ranges for an inlined block. A representation that addresses this might
be something like

	stEndMask/scText

as an alternative to stEnd/scText, where the value is interpreted as a bit
mask for up to 64 instructions, beginning at the scope start, which are in
fact part of the scope.
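
To make the mask idea concrete, here is a rough C sketch of how a consumer
might test a PC against such a mask. It assumes fixed 4-byte Alpha
instructions and that bit n of the mask (counting from the least significant
bit) stands for the n-th instruction after the scope start; the bit order is
my assumption, since nothing above pins it down. The function and macro names
are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INSN_BYTES 4u    /* Alpha instructions are a fixed 4 bytes */
#define MASK_INSNS 64u   /* the mask covers at most 64 instructions */

/* True when pc addresses an instruction that the mask marks as part of
   the scope.  Assumes bit n (LSB = bit 0) describes instruction n. */
bool pc_in_masked_scope(uint64_t scope_start, uint64_t mask, uint64_t pc)
{
    assert(scope_start % INSN_BYTES == 0);
    if (pc < scope_start)
        return false;
    uint64_t offset = pc - scope_start;
    if (offset % INSN_BYTES != 0)
        return false;               /* not on an instruction boundary */
    uint64_t insn = offset / INSN_BYTES;
    if (insn >= MASK_INSNS)
        return false;               /* beyond the reach of the mask */
    return ((mask >> insn) & 1u) != 0;
}
```

For example, with a scope start of 0x1000 and a mask of 0x5, the instructions
at 0x1000 and 0x1008 are in the scope and the one at 0x1004 is not.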


			Interlude

    Combining 1, 2 and 3 might lead to a symbol table structure sorta
    kinda like...

!Start the inlined block...

	stInlineBlock	scText		<to end+1>

!Then zero or more "parameters"...

	stParam		scXxx				!parameter
	   ...

!Pointer to callee routine...

	stInlineBlock	scInfo		<aux RNDX to callee STE>

!Location of call

	stInlineBlock	scAbs	value=lineno of call

!Then zero or more prolog addresses...

	stInlineBlock	scText		-1		! End prolog address,
							! no associated stEnd

!Normal STEs for "copies" of the local variables of the called routine...
	...

!Then zero or more epilog addresses...

	stEnd		scText		-1		! Begin epilog address,
							! does not end inline

!Then zero or more extended range pairs...

	stExtendedRange	scText
	stEnd		scText

!		and/or
	stExtendedRange	scText
	stEndMask	scAbs

!Finally, the closing end...

	stEnd		scText		<to beginning>	!real end of inline


4. Source/Object Mapping (the Line Number Table)

In general the program source line number to object code instruction mapping
must be many-to-many. It may be simplest conceptually to regard this as two
distinct mappings

  - Source to Object
  - Object to Source

each of which is one-to-many. Moreover, the current inability to cross source
file boundaries in the line number table must be overcome -- even to deal
properly with non-optimized code.

No particular suggestions are offered here at the moment, but a radical
overhaul is definitely needed. One starting point might be to effectively
eliminate the current role of File Descriptors by collapsing them all into
a single virtual "Source" file that applies to the entire compilation (just
to keep the current superstructure intact for the benefit of old tools)
and then introduce additional new structures that provide the needed
expressive power. Much work needed here...


5. Strength Reduced Variables

A minor target of opportunity concerns strength reduced variables, where
an index of a loop that steps thru an array is replaced by stepping thru
the addresses of the array itself. The loop index is easily back-computed
as a linear function (subtract the base of array, divide by the stride or
element size) of the real control address. How to represent this?
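
The back-computation itself (not its symbol table representation, which is
still the open question) can be sketched in C as below. The function name and
the first_index parameter are my own additions for illustration; first_index
is there only because the first element's index is language dependent (0 in
C, 1 by default in FORTRAN).

```c
#include <assert.h>
#include <stdint.h>

/* Recover the source-level loop index from the address that replaced it:
   subtract the array base, divide by the stride, then add the index of
   the first element.  Assumes the address walks forward from the base. */
int64_t recover_loop_index(uint64_t control_addr, uint64_t array_base,
                           uint64_t stride, int64_t first_index)
{
    assert(stride > 0);
    assert(control_addr >= array_base);
    return first_index + (int64_t)((control_addr - array_base) / stride);
}
```

With a base of 0x1000 and 8-byte elements, a control address of 0x1018 maps
back to index 3 for a zero-based array, or 4 for a one-based array.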


			Meta Question

These proposals (including the split lifetime proposal) build on the existing
STE "style" rather than exploit the optimization section because:

  - It is necessary to represent code addresses within a procedure
    (and get them relocated)

  - It is necessary to point to AUX entries (in a reliable way)

The question is whether and to what extent it is possible to do both of
these from within the DBGOPT section in a way that does not require ld
to learn how to parse/interpret that new section.

I think I understand how to handle code addresses, assuming that the debugger
performs the final relocation when it reads the DBGOPT section. 

I am unclear how to handle AUX entries in a way that cannot be compromised
by ld processing (eg merging).

Perhaps a small group can help me brainstorm possibilities here...