[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference smurf::unix_objsym

Title:Digital UNIX Object File/Symbol Table Notes Conference
Moderator:SMURF::LOWELL
Created:Mon Nov 25 1996
Last Modified:Thu Jun 05 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:71
Total number of notes:314

67.0. "Static Links" by GEMGRP::MONTELEONE () Wed Apr 16 1997 10:45

    
    
    
    
    
    
    
    
    
    Initial proposal, written by Peter Craig.
    
    
Now that GEM's decomposition will be creating nested function definitions
and uplevel references, we need a convention for describing this code in
the debug symbol table on Unix.  David Lafrance-Linden, Bob Monteleone
and I have been talking about how to accomplish this.

The solution that I had been assuming we could use has now been shown
not to work.  I had been assuming that as long as we don't have the most
general form of bound procedure value (which we won't in DECC or Fortran 90),
then the debugger could simply walk up the stack frames until it finds the
enclosing function that declares the symbol.  This does not work because
we are creating threads and passing them the static link, and POSIX
threads do not have a connection from the base frame on their stack to
the frame on the stack of the creating thread.

The clean and fully general solution that we would all like to get to
is to augment the Unix debug symbol table to explicitly identify the
symbol that contains the static link of a function.  Unfortunately, changes
to the symbol table format take a very long time to accomplish, and a
solution to debugging decomposed applications is needed very soon.

Consequently, Bob and I are recommending two solutions: a short term one
that requires no changes to the debug symbol table structure, and a long
term one that properly describes static links.  The following example
program will be used to illustrate the solutions (in no particular
language, but I hope all will understand it never the less):

	function a
	  automatic variable x
	  function b
	    automatic variable y
	    function c
	      automatic variable z
	      function d
	      <code that uses x and y but not z>
	      end function d
	    end function c
	  end function b
	end function a

Short Term Solution:

We (compilers and debuggers) will agree on a special name, such as
__StaticLink that GEM will give the variable that contains the static
link value for a function.  When a debugger needs the static link that
was passed to a function, it will look for a symbol with that name
within the function.  The fact that function definitions are nested
is indicated by the ordering of the stProcs, stBlocks and stEnds.

If the debug user stops at a breakpoint in function d, and asks the
debugger to print the value of x, the following sequence happens in
the debugger:

  1. The debugger looks for the name x within function d and does
     not find it.

  2. Before telling the user that there is no symbol x, the debugger
     notices that function d is nested within another function, and
     consequently the debugger begins looking in the enclosing
     functions for the name x.

  3. The debugger recursively looks for the name x in the enclosing
     functions until it finds the name x in function a.  The debugger
     now knows that x is located three levels up in the tree of
     nested functions.

  4. The debugger now attempts to find the current static link for
     function a.  Starting at function d, the debugger iterates three
     times around a loop which looks for the name __StaticLink in
     the current function, finds its value, and uses its value as
     the frame pointer for addressing symbols in the next enclosing
     function.

  5. The static link for function a is used to compute the address
     of symbol x within function a's fame.

If the debug user asks for the value of Q, the debugger would fail
to find it at step 3, and would then tell the user there is no such
symbol.

If the program was compiled with optimization, then because function
d does not use any variables from function c, the optimizer may have
passed function d a static link that points directly at the nearest
ancestor containing any variables that d uses.  This can be handled
in one of two ways.  Either we can invent another special name like
__StaticLinkLevels that indicates how many levels up the static
link goes, or we can punt and simply not describe the static link
when optimizing.

If we describe the optimization, then during step 4 above, the
debugger needs to determine how many levels up it has gone on each
iteration by getting the value of __StaticLinkLevels.  If the user
asked for the symbol z, then the debugger would have to notice that
the optimization has skipped over function c, and consequently the
debugger must tell the user that optimization has made it impossible
to print z.

If we punt, then when the optimization is done, the debugger should
complete step 3 successfully and know whether or not the name is
defined.  Then if the debugger fails to find a symbol __StaticLink
at one of the required levels in step 4, the debugger must tell
the user that optimization has made it impossible to print the
variable.

Long Term Solution:

The long term solution is the same as the short term solution
except that there are no special names.  Instead the variable that
contains the static link will have a special "static link" type
which indicates that it's the static link for the current function.

When we implement the long term solution, we should similarly
describe the number of levels that each static link spans in the
tree of nested functions.
    
T.RTitleUserPersonal
Name
DateLines
67.1Discussion, extracted from mail messages...GEMGRP::MONTELEONEWed Apr 16 1997 10:48177
     
    Excerpts from mail:
    
David,

Overall I interpret your response to the uplevel proposal as saying
that the short term proposal is acceptable (with some suggested
improvements), but you're not comfortable with the long term proposal.
I like your suggestion about encoding the static link level into
the name.  I would like us to move ahead with the short term proposal
ASAP, and continue discussing the long term proposal separately.

Below are my specific comments on some of the issues you raised.

						- Peter Craig
						  [email protected]


>I'm curious about the etymology of "static link".  I can understand
>the "link" part but I'm not sure what's static about it since it
>changes with each instance of the host procedure.  At Symbolics we
>called it the lexical environment.  Anyway...
>
I think "lexical environment" would be a better name, but "static
link" is the traditional name used by Digital's compiler engineering
community.  Essentially it means a link to the "static parent" of the
procedure.  The "static parent" is the lexically enclosing procedure,
as opposed to the "dynamic parent" which is the procedure that called
you.

>The short term solution, well-known name for the reference, looks
>fine.  I believe this has to be in the outermost scope of the
>procedure.  In coff-speak, siblings of the stParam's, within the
>stProc but before the first stBlock/scText.

Good. Let's move ahead with this form of solution.

>
>I must caution about some of the level-removing optimizations you
>mention.
> -- First and functional, even though function D may not use variable
>    Z, variable Z is still in lexical scope and may want to  
>    be seen by the debugger.  I'm not sure how well user's will like
>    your contention that the debugger says an optimization has made it
>    impossible to find Z.

That is the only honest thing to tell the user given that we are
performing the optimization.  See discussion below.

> -- Second, doing such optimizations, even when there were no
>    variables (such as Z) to consider, was a rich source of bugs in
>    the Symbolic's compiler and debugger.  Perhaps it was just lack of
>    rigor; I don't know.  Once we had lexical scoping, various
>    programming styles and paradigms erupted that used it in lots of
>    different ways which allowed for optimizations, but which
>    sometimes caused the optimizations to forget how to get to the
>    right level.
>I'm not going to fight either of those; just mention them.

This optimization has been in our GEM-based Pascal and Ada compilers
for many years, and as far as I know is utterly reliable.

We (Digital) are striving for the highest level of performance possible.
If an optimization significantly interferes with the user's ability to
debug, then we recommend that users compile with -g which should
suppress that optimization.

We have implemented a number of optimizations that fall in this
category.  The GEM project (and Ron Brender in particular) are
interested in improving the user's ability to debug optimized code,
however we are not willing to limit our optimizations based on the
current debugging capabilities.

>
>With or without the optimization, I might vote for naming the variable
>__StaticLink.<uplevel> to combine the features of __StaticLink and
>__StaticLinkLevels.  I.e., we'll usually see __StaticLink.1 but
>sometimes we'll see __StaticLink.2 and higher.

I like this idea.  It seems somewhat more powerful than my original
proposal.  The debuggers' algorithms for finding the desired static
link will need to be modified to search for __StaticLink.* and
discover which level(s) are available within the current procedure.

>
>For the long term solution, I'm not comfortable extending the symbol
>table quite yet.  As some of you know, I'm a fan of well-known names
>when there's not good reason not to.  Second, I don't think it is the
>general solution.  For example, here's my cut at what the proposal
>would say:
>  Create a new st type, stStaticLink, which can be paired with sc
>  codes as follows:
>	scAbs		value is on stack, use stLocal algorithm
>	scRegister	value is in register
>	scData, scBss	value is in memory (unlikely)
>	scTlsXXX	value is TLS, use TLS algorithm	(unlikely)
>The only thing this does is move the lookup from the name to the st
>type, gobbling the st indicator that could otherwise encoded
>something.  But there are two technical problems.
> -- Suppose the compiler decided to do further optimizations by
>    passing not just the parent's link, but also the links of other
>    ancestors as well?  I.e., parent in t0, grandparent in t1, etc,
>    which get stashed someplace (registers and/or stack).  They are
>    all static links, but for different uplevels.  Does this still use
>    stStaticLink and the number of uplevels is encoded in the name?

In principle this is a problem.  In practice we haven't found it
useful to implement such an optimization in over a decade of Pascal
and Ada compilers.  I'm sure we can develop a symbol table representation
that doesn't assume a single static link if we feel that generality
is useful.

> -- In the eventuality we have bound procedures that can persist after
>    the parent function exits, I'm not sure this solution is enough.
>    For example, consider
>	function OUTER
>	  variable OO
>	  function MIDDLE
>	    variable MM
>	    return INNER;
>	    function INNER
>	      variable II
>	      <use OO,MM,II>
>	    end function
>	  end function
>        end function
>    where function INNER can be called after function MIDDLE has
>    returned.  In this case, function MIDDLE generally has to malloc()
>    some memory to hold the local variables that INNER will reference;
>    MM in this case, but also MIDDLE's link to OUTER.  I.e., something
>    like
>	function MIDDLE
>	  struct my_environment {
>	    void *	__StaticLink;
>	    variable	MM;
>	  } * __Environment = malloc(sizeof(*__Environment));
>	  __Environment->__StaticLink = __StaticLink;
>	  callable_INNER = bind_up_in_heap(INNER, __Environment);
>	  return callable_INNER;
>        end function
>    It is __Environment that MIDDLE passes to INNER as the static
>    link, not the SP or FP register.  I suppose the type of INNER's
>    __StaticLink would be a struct rather than a void *, and this
>    would be a clue to the debugger.  And in that case, I suppose a
>    stStaticLink/scInfo inside an stBlock describe a struct would be
>    equivalent to stMember.

This hypothetical feature is not likely to appear in any language that
our debuggers need to support.  I would still consider it a valid point
if it demonstrated that one proposal would be flexible enough to support
it and another proposal would not.  However, it seems to me that the
name based schemes have no inherent advantage over binary schemes.  One
could invent binary schemes that describe multiple static links to
different environments.

>
>To make a long story short, we're talking algorithms here.  I don't
>think the symbol table format is a good place to describe algorithms,
>especially if experimentation or extension is foreseeable.  Naming
>flexibility permits algorithm flexibility.  Predefined binary
>encodings generally do not permit algorithm flexibility, and the long
>lead times for format extension cause a fallback to naming schemes
>anyway.

No.  I dissagree.  We're not talking algorithms here.  We are talking
about a data structure (the debug symbol table) that describes a
program.  Name based schemes have the advantage that they permit
late binding of meanings to names.  They have the (minor) dissadvantage
of being less compact.

Historically, changing binary formats usually required all code that
uses them to change.  For this reason engineers have tended to take
a more serious attitude about specifying and documenting new features
in binary formats.  If this same careful attention were paid to
name based features that don't automatically require everyone to
change their code, I think I'd feel comfortable with them.
    
67.2A few more comments in the mailNNTPD::&quot;[email protected]&quot;David LaFrance-LindenTue Apr 22 1997 15:3366
Date: Wed, 9 Apr 1997 13:57:42 -0400
From: "David C. P. LaFrance-Linden" <david_ll>

Short term ASAP is fine.  Discussing long term is fine, too.

When faced with making decisions of debuggability vs optimization, I
have traditionally leaned toward debuggability.  I surmise from your
comments that the prevailing wind is optimization foremost.  That's
fine; we just need to be clear, e.g., documentation.

My cautionary comments about skipping uplevels is based on a personal
history.  If Digital's Pascal and Ada have been doing it successfully
for years, then presumably whatever problems have already been ironed
out.  I suspect part of the Symbolics problem was complexity.  We
tended to use macros for extending the syntax, and the macros would
expand into a style resembling continuation passing, so it was quite
possible to have nested functions to a dozen levels.  Depending on
how they were used, which ones referenced variables from where, which
ones could keep their environment on the stack and which ones needed
to spill some of their environment to heap storage, things could get
complicated.  And at times we goofed.

    however we are not willing to limit our optimizations based on the
    current debugging capabilities.

I generally agree, and sometimes forget.  Feel free to make a sign
with that on it and hit me over the head when I do forget.

Re: heap-based environment which contain static links: 
    This hypothetical feature is not likely to appear in any language that
    our debuggers need to support.  
In practice, perhaps not.  In theory, Java 1.2 might, but I'm not sure
even that would intersect with this discussion.  

    Historically, changing binary formats usually required all code that
    uses them to change.  For this reason engineers have tended to take
    a more serious attitude about specifying and documenting new features
    in binary formats.  If this same careful attention were paid to
    name based features that don't automatically require everyone to
    change their code, I think I'd feel comfortable with them.

I think we devolve into religion here.  For me, name-based schemes
don't require everyone to change their code.  If ANY scheme changes,
the people who use the scheme may have to change their code.  But more
often with binary changes, everybody might have to change their code
even if they don't use the feature.  In addition to the advantage of
late binding of meanings to names, name-based schemes also permit
experimentation, progress, and flexibility without the constraints of
a governing board.  Allowing name-based schemes for short term
solutions may also provide for experimentation and progress, but
assuming binary encoding for the long term can have flexibility
problems.  In STWG we recently obsoleted the binary form of F90 array
descriptors in favor of a named-based scheme partly because the binary
form didn't have the flexibility (distinguishing assumed shape vs
POINTER vs ALLOCATABLE) that the named-based scheme did.  Perhaps this
was just a lack of careful attention (but applied to a binary
approach!) you would like to see for named-based schemes,  I'm not
convinced.

Anyway, let's move forward with named-based, at least for the short
term.

And welcome to the "Are named-based schemes bad?" debate!


[Posted by WWW Notes gateway]
67.3Various commentsFLYBA::BRENDERRon BrenderWed Apr 23 1997 12:5580
First of all, TIME OUT!

Last time we talked about a name vs binary convention, we all agreed that we
really ought to develop a list/catalog of all the name based conventions that
currently exist and are in use -- before we go proliferating more...

While I personally tend toward the binary side of this religious debate, I
am quite willing to be pragmatic -- provided we get the universe of naming
conventions documented and under control. 

Someone(s) took on/were given the assignment to put together such a summary.

				Where is it?

End of TIME OUT...
--------------------------------------------------------------------------------

Re .1:

Is it clear just what value constitutes a static link? Are we talking "real
frame pointer" (which could be the parent's SP or FP, depending) or "virtual
frame pointer" which biases the real frame pointer by some amount (a constant
on DU, I think)?

If we are to accomodate compiler's other than GEM, do we need to allow for a
choice of static link convention?


Re .2:

>>I must caution about some of the level-removing optimizations you
>>mention.
>> -- First and functional, even though function D may not use variable
>>    Z, variable Z is still in lexical scope and may want to  
>>    be seen by the debugger.  I'm not sure how well user's will like
>>    your contention that the debugger says an optimization has made it
>>    impossible to find Z.
>
>That is the only honest thing to tell the user given that we are
>performing the optimization.  See discussion below.

I note in passing that, if the need arises, it should be straight-
forward to identify those cases where the caller's frame and the
parent's frame are the same and can be found by parsing the stack
directly rather than resorting to a static link. A first cut
at the criteria:

  - Parent frame is fixed size (eg, no use of alloca(), however spelled,
    explicit or implicit)
  - Current routine is called only directly and from the lexical parent, eg
    excludes self recusion, no bound procedure values created
  - Others?

All that remains is for a way to pass a "flag" to the effect that
the necessary criteria are satisfied.

I doubt we need to worry about this right now, however.


>>With or without the optimization, I might vote for naming the variable
>>__StaticLink.<uplevel> to combine the features of __StaticLink and
>>__StaticLinkLevels.  I.e., we'll usually see __StaticLink.1 but
>>sometimes we'll see __StaticLink.2 and higher.
>
>I like this idea.  It seems somewhat more powerful than my original
>proposal.  The debuggers' algorithms for finding the desired static
>link will need to be modified to search for __StaticLink.* and
>discover which level(s) are available within the current procedure.

Interesting suggestion. The pragmatic question is whether a single but a bit
more complicated lookup is faster than doing two independent lookups. If the
single one is no slower (and the symbol table size *is* definitely smaller),
then by all means lets use the __StaticLink.<n> convention!


Re .1, .2, .3: short term vs long term

Two ways of doing exactly the same thing has a lot of negative appeal going
for it. If we adopt some form of the short term proposal now, I can't see
later adopting a functionally equivalent but different long term scheme later.
67.4Formalized proposalGEMGRP::MONTELEONEThu Apr 24 1997 13:50278
    

This note is a proposal concerning how uplevel references will be
described by compilers and understood by debuggers on Digital Unix.
This is intended to enable users of certain compiler features to
access uplevel referenced variables while debugging.  To support
this proposal, enhancements are recommended in the GEM compiler
back end, the ladebug debugger, and the TotalView debugger.
Similar enhancements to the dbx debugger should be considered if
resources permit.

Overview:

There are two features that will be included in Digital's
Fortran 90 compiler for Digital Unix, that are not adequately
described in the debug symbol table, and are not supported by
debuggers.  Both of these features use a mechanism called "uplevel
reference" for accessing variables declared in another procedure.
One of these features may also be included in future versions
of other Digital compilers for Digital Unix.

The result is that when debugging programs that use these features,
there are locations in the program where the compiler provides
access to variables through a mechanism that the debuggers do not
understand.  When stopped at these locations in the debugger,
the debugger does not know about these variables, and thus does
not support printing their values, evaluating their addresses, or
using them in expressions within debugger commands.

The Language Features:

The first of these features is host association within internal
subprograms.  This is a feature provided by the Fortran 90 standard.
The definition of one subroutine may declare local variables, and
may also declare an internal subroutine.  Within this internal
subroutine, the local variables of the enclosing "host" subroutine
are available by "host association".

Depending upon command line options such as -automatic, and the
declaration modifiers on the host associated variable, Digital's
Fortran 90 compiler may use the uplevel reference mechanism to
address the variable.

The second feature is manual decomposition.  With this feature
the user identifies a portion of a subroutine that is to be
executed in parallel by multiple threads.  The compiler's
implementation of this feature is based upon defining a
nested (or internal) subroutine which contains the code to be
executed in parallel.  Within this nested subroutine, variables
declared within the user's original subroutine may be accessed
using the uplevel reference mechansim.

The new internal subroutines created by manual decomposition
may occur within an internal subprogram, despite the fact that
the Fortran 90 standard prohibits users from nesting internal
subroutines.  Although manual decomposition is not a feature of
the officially accepted language standard, our implementation
will adhere to the draft standards produced by the ANSI X3H5
committee (AKA the Parallel Computing Forum), which unfortunately
disbanded before a parallel language standard was adopted.

Manual decomposition may be offered by Digital in other compilers
for Digital Unix in the future.  If so, then we expect that it
will create nested subprograms and uplevel references in the
same way that it does in the Fortran 90 compiler.

Digital's Pascal and Ada compilers also use the uplevel reference
mechanism to provide access to variables declared in enclosing
subprogram definitions, as required by their respective language
standards.

How the uplevel reference mechanism works:

A program that uses uplevel references includes a forest of nested
procedures.  Each tree in this forest has a root which is a
procedure that is not defined inside the scope of any other
procedure.  Any procedure in the tree has some number of
descendents which are procedures that are defined within it
(or nested within it), and has a chain of ancestors leading
back to the root.

Traditionally, when any procedure within such a tree is executing,
there must be an active invocation of every procedure in its chain
of ancestors.  The languages ensure this by restricting the visibility
of the name for a nested procedure so that it can only be named
within its parent, and thus the parent must be active if the nested
procedure has been called.

A procedure within such a tree is passed a pointer to a stack
frame for its immediate ancestor so that it can address local
variables of that ancestor.  This pointer is called the static
link of the procedure.

Specifically, the static link contains the "real" frame pointer
for the ancestor.  If the procedure wishes to access local
variables from multiple levels up in the tree, the stack
frame of its immediate ancestor also contains the pointer to
the next level's stack frame.

There is an optimization that Digital's compilers perform and
we would like debuggers to support.  When a procedure needs to
access local variables from multiple levels up, but the program
does not use any of the variables at the intermediate levels,
the compiler may pass that procedure a static link that points
directly to the distant ancestor's stack frame.

When one of these procedures is recursive, it may have multiple
active stack frames during execution.  In this case, logically
each of these dynamic instances of that procedure creates a new
instance of the local procedures that it contains, and a new
instance of the local variables that it contains.  The new
instance of the local procedure is bound to the corresponding
new instance of its ancestor's local variables -- that is,
whenever the new instance of the local procedure is called, it
should be passed a static link that points to the corresponding
instance of the its ancestor's local variables.

In languages like Pascal and Ada, it is also possible to pass
a local procedure as an argument.  When passing such a procedure,
the compiler passes a pair of values including the pointer to
the code, and the static link that it should be bound to.
Such a pair is called a bound procedure value.

When bound procedure values are passed and recursion occurs, it
is possible that the static link for the current procedure
points to a stack frame that is not the most recent stack frame
of its ancestor that is currently on the stack.

The proposed symbol table representation:

Two features of the program need to be represented in the
debug symbol table for Digital Unix: the parent and child
relationships among procedures in the forest, and the static
link of each procedure.

Describing the forest of nested procedures:

The parent and child relationships are currently already
described by Digital's compilers in the debug symbol table.
The following is a specification for that description.

Each procedure is defined by the sequence of stProc (or stStaticProc),
stBegin, and stEnd entries.  A procedure X is an ancestor of a
procedure Y if and only if the stProc for Y occurs between the
stProc for X and the corresponding stEnd for X.  A procedure X
is the parent of procedure Y if it is an ancestor of Y, and there
exist no intervening procedure Z such that Z is an ancestor of Y,
and Z is not an ancestor of X.

Between the stProc and corresponding stEnd for a procedure,
an arbitrary number of child procedures may be defined.
Within those child procedures, additional procedure definitions
may be nested to an arbitrary depth.  Thus the trees defined
by this structure are both arbitrarily wide and arbitrarily
deep.

Describing the static links:

Digital's compilers currently do not identify the symbol that
contains the static link for a given procedure, nor do they
describe how many levels up in the procedure's tree of ancestors
the static link goes.  The following is the proposed representation
that future versions of Digital's compilers will use.

When a procedure is passed a static link, that static link
will be moved to a local automatic symbol of the procedure, with
a special name, during the prolog of the procedure.  By the end
of the prolog, the compiled code will ensure that the local symbol
whose name begins with "__StaticLink." will contain the static link
value for the procedure.  This local symbol will have a type
that is equivalent to "char *" variables in C, and will contain
the value of the "real" frame pointer for an ancestor of the
current procedure.

The full name of this symbol will be "__StaticLink." followed by
a positive decimal integer with no leading zeros.  This integer
value identifies the number of levels up the ancestor tree that
the static link points to.

For example, if the name is __StaticLink.3 then after the
procedure's prolog it contains the static link of the procedure
in which it is defined and that procedure's static link points
to a stack frame three levels up in the procedure's ancestor tree
(i.e. to the great grandfather of the procedure).

If the program is compiled without optimization (specifying -O0),
then all static links will point to a stack frame exactly one
level up in the tree.  In this case all static link variables
will be named __StaticLink.1.

How debuggers should interpret these symbol table structures:

Debuggers for Digital Unix should be enhanced to understand
which symbols are visible at a location in the program through
uplevel reference, and how to compute their addresses using
static links.  These enhancements should be used in all contexts
where the user asks the debugger about a name that might be
defined as an uplevel reference.  For example when the user
asks the debugger to print the value of an expression containing
the name.

When the debugger needs the current value, or address for a
name at some location in the program, two separate actions may
be required if the name might be defined as an uplevel reference:
finding the procedure that defines the currently visible
instance of that name (if any), and finding the address of the
currently visible instance of that name.  If only type information
is required, finding the procedure that defines the name may
be sufficient.

Finding the defining procedure is accomplished by repeatedly
looking up the name in the local symbol table of a chain of
procedures that extends from the current procedure through
its chain of ancestors until either the name is found in a
procedure or the end of the chain of ancestors is reached without
finding the name.  If this search terminates without finding
the name then the debugger should conclude that the name is
not visible by uplevel reference at the current location in the
program.

When searching for the defining procedure, the debugger should
count how many levels in the ancestor chain were traversed
before finding the name.  If zero levels were traversed, the
name is defined within the current procedure, and is not
an uplevel reference.  The number of levels traversed is
assumed to be in the variable LevelsToGo in the algorithm
below.

Finding the address for the name involves locating static
link values and dereferencing them with appropriate offsets.
basically, while the number of levels to be traversed is
greater than zero, find the static link symbol for the current
level and obtain its value.  Finally add the desired symbol's
offset from the "real" frame pointer to the final static link
value.

The recommended algorithm for finding the address is as follows:

	LevelsToGo = <from name lookup above>;
	NewProc = CurrentProcedure;
	NewFrame = FramePointerValue(CurrentProcedure);
	Failed = false;
	while (LevelsToGo > 0 && !Failed)
	  {
	  StaticLink = FindStaticLinkSym(NewProc);
	  if (StaticLink == null)
	    Failed = true;
	  else
	    {
	    NewFrame = *(NewFrame + StaticLink->symbol.offset);
	    Levels = StaticLinkLevels(StaticLink);
	    LevelsToGo = LevelsToGo - Levels;
	    for (; Levels > 0; Levels--)
	      NewProc = NewProc->proc.parent;
	    }
	  }

After executing this algorithm, if Failed is true then required
information about static links is missing in the symbol table,
and an error has occurred.  If LevelsToGo ends up less than zero,
then the optimizer's static link optimization has eliminated a
static link level that would be required to compute the address
of the name.  It is recommended that debuggers inform the user
that optimization prevents the debugger from computing the address
of the name.

If Failed is false and LevelsToGo is equal to zero, then the address
for the currently visible instance of the name is NewFrame plus the
offset of the name with respect to the "real" frame pointer for NewProc.

The function FindStaticLinkSym needs to look in the local symbol
table of the indicated procedure for the symbol whose name begins
with "__StaticLink.", and return a pointer to the symbol table
information for that symbol.

The function StaticLinkLevels returns the integer at the end of the
name for the indicated static link symbol.
    
67.5Updated proposal, incorporating comments from 4/28 meetingGEMGRP::MONTELEONETue Apr 29 1997 08:57280
This note is a proposal concerning how uplevel references will be
described by compilers and understood by debuggers on Digital Unix.
This is intended to enable users of certain compiler features to
access uplevel referenced variables while debugging.  To support
this proposal, enhancements are recommended in the GEM compiler
back end, the ladebug debugger, and the TotalView debugger.
Similar enhancements to the dbx debugger should be considered if
resources permit.

Overview:

There are two features that will be included in Digital's
Fortran 90 compiler for Digital Unix, that are not adequately
described in the debug symbol table, and are not supported by
debuggers.  Both of these features use a mechanism called "uplevel
reference" for accessing variables declared in another procedure.
One of these features may also be included in future versions
of other Digital compilers for Digital Unix.

The result is that when debugging programs that use these features,
there are locations in the program where the compiler provides
access to variables through a mechanism that the debuggers do not
understand.  When stopped at these locations in the debugger,
the debugger does not know about these variables, and thus does
not support printing their values, evaluating their addresses, or
using them in expressions within debugger commands.

The Language Features:

The first of these features is host association within internal
subprograms.  This is a feature provided by the Fortran 90 standard.
The definition of one subroutine may declare local variables, and
may also declare an internal subroutine.  Within this internal
subroutine, the local variables of the enclosing "host" subroutine
are available by "host association".

Depending upon command line options such as -automatic, and the
declaration modifiers on the host associated variable, Digital's
Fortran 90 compiler may use the uplevel reference mechanism to
address the variable.

The second feature is manual decomposition.  With this feature
the user identifies a portion of a subroutine that is to be
executed in parallel by multiple threads.  The compiler's
implementation of this feature is based upon defining a
nested (or internal) subroutine which contains the code to be
executed in parallel.  Within this nested subroutine, variables
declared within the user's original subroutine may be accessed
using the uplevel reference mechansim.

The new internal subroutines created by manual decomposition
may occur within an internal subprogram, despite the fact that
the Fortran 90 standard prohibits users from nesting internal
subroutines.  Although manual decomposition is not a feature of
the officially accepted language standard, our implementation
will adhere to the draft standards produced by the ANSI X3H5
committee (AKA the Parallel Computing Forum), which unfortunately
disbanded before a parallel language standard was adopted.

Manual decomposition may be offered by Digital in other compilers
for Digital Unix in the future.  If so, then we expect that it
will create nested subprograms and uplevel references in the
same way that it does in the Fortran 90 compiler.

Digital's Pascal and Ada compilers also use the uplevel reference
mechanism to provide access to variables declared in enclosing
subprogram definitions, as required by their respective language
standards.

How the uplevel reference mechanism works:

A program that uses uplevel references includes a forest of nested
procedures.  Each tree in this forest has a root which is a
procedure that is not defined inside the scope of any other
procedure.  Any procedure in the tree has some number of
descendents which are procedures that are defined within it
(or nested within it), and has a chain of ancestors leading
back to the root.

Traditionally, when any procedure within such a tree is executing,
there must be an active invocation of every procedure in its chain
of ancestors.  The languages ensure this by restricting the visibility
of the name for a nested procedure so that it can only be named
within its parent, and thus the parent must be active if the nested
procedure has been called.

A procedure within such a tree is passed a pointer to a stack
frame for its immediate ancestor so that it can address local
variables of that ancestor.  This pointer is called the static
link of the procedure.

Specifically, the static link contains the "real" frame pointer
for the ancestor.  If the procedure wishes to access local
variables from multiple levels up in the tree, the stack
frame of its immediate ancestor also contains the pointer to
the next level's stack frame.

There is an optimization that Digital's compilers perform and
we would like debuggers to support.  When a procedure needs to
access local variables from multiple levels up, but the program
does not use any of the variables at the intermediate levels,
the compiler may pass that procedure a static link that points
directly to the distant ancestor's stack frame.

When one of these procedures is recursive, it may have multiple
active stack frames during execution.  In this case, logically
each of these dynamic instances of that procedure creates a new
instance of the local procedures that it contains, and a new
instance of the local variables that it contains.  The new
instance of the local procedure is bound to the corresponding
new instance of its ancestor's local variables -- that is,
whenever the new instance of the local procedure is called, it
should be passed a static link that points to the corresponding
instance of the its ancestor's local variables.

In languages like Pascal and Ada, it is also possible to pass
a local procedure as an argument.  When passing such a procedure,
the compiler passes a pair of values including the pointer to
the code, and the static link that it should be bound to.
Such a pair is called a bound procedure value.

When bound procedure values are passed and recursion occurs, it
is possible that the static link for the current procedure
points to a stack frame that is not the most recent stack frame
of its ancestor that is currently on the stack.

The proposed symbol table representation:

Two features of the program need to be represented in the
debug symbol table for Digital Unix: the parent and child
relationships among procedures in the forest, and the static
link of each procedure.

Describing the forest of nested procedures:

The parent and child relationships are currently already
described by Digital's compilers in the debug symbol table.
The following is a specification for that description.

Each procedure is defined by the sequence of stProc (or stStaticProc),
stBegin, and stEnd entries.  A procedure X is an ancestor of a
procedure Y if and only if the stProc for Y occurs between the
stProc for X and the corresponding stEnd for X.  A procedure X
is the parent of procedure Y if it is an ancestor of Y, and there
exist no intervening procedure Z such that Z is an ancestor of Y,
and Z is not an ancestor of X.

Between the stProc and corresponding stEnd for a procedure,
an arbitrary number of child procedures may be defined.
Within those child procedures, additional procedure definitions
may be nested to an arbitrary depth.  Thus the trees defined
by this structure are both arbitrarily wide and arbitrarily
deep.

Describing the static links:

Digital's compilers currently do not identify the symbol that
contains the static link for a given procedure, nor do they
describe how many levels up in the procedure's tree of ancestors
the static link goes.  The following is the proposed representation
that future versions of Digital's compilers will use.

When a procedure is passed a static link, that static link
will be moved to a local automatic symbol of the procedure, with
a special name, during the prolog of the procedure.  By the end
of the prolog, the compiled code will ensure that the local symbol
whose name begins with "__StaticLink." will contain the static link
value for the procedure.

This local symbol will have a type that is equivalent to "void *"
variables in C, and will contain the value of the "real" frame
pointer for an ancestor of the current procedure.  This symbol
will be a local symbol that occurs between the stProc or stStaticProc
and the stBegin, similar to the placement of parameter symbols.

The full name of this symbol will be "__StaticLink." followed by
a positive decimal integer with no leading zeros.  This integer
value identifies the number of levels up the ancestor tree that
the static link points to.

For example, if the name is __StaticLink.3 then after the
procedure's prolog it contains the static link of the procedure
in which it is defined and that procedure's static link points
to a stack frame three levels up in the procedure's ancestor tree
(i.e. to the great grandfather of the procedure).

If the program is compiled without optimization (specifying -O0),
then all static links will point to a stack frame exactly one
level up in the tree.  In this case all static link variables
will be named __StaticLink.1.

How debuggers should interpret these symbol table structures:

Debuggers for Digital Unix should be enhanced to understand
which symbols are visible at a location in the program through
uplevel reference, and how to compute their addresses using
static links.  These enhancements should be used in all contexts
where the user asks the debugger about a name that might be
defined as an uplevel reference.  For example when the user
asks the debugger to print the value of an expression containing
the name.

When the debugger needs the current value, or address for a
name at some location in the program, two separate actions may
be required if the name might be defined as an uplevel reference:
finding the procedure that defines the currently visible
instance of that name (if any), and finding the address of the
currently visible instance of that name.  If only type information
is required, finding the procedure that defines the name may
be sufficient.

Finding the defining procedure is accomplished by repeatedly
looking up the name in the local symbol table of a chain of
procedures that extends from the current procedure through
its chain of ancestors until either the name is found in a
procedure or the end of the chain of ancestors is reached without
finding the name.  If this search terminates without finding
the name then the debugger should conclude that the name is
not visible by uplevel reference at the current location in the
program.

When searching for the defining procedure, the debugger should
count how many levels in the ancestor chain were traversed
before finding the name.  If zero levels were traversed, the
name is defined within the current procedure, and is not
an uplevel reference.  The number of levels traversed is
assumed to be in the variable LevelsToGo in the algorithm
below.

Finding the address for the name involves locating static
link values and dereferencing them with appropriate offsets.
basically, while the number of levels to be traversed is
greater than zero, find the static link symbol for the current
level and obtain its value.  Finally add the desired symbol's
offset from the "real" frame pointer to the final static link
value.

The recommended algorithm for finding the address is as follows:

	LevelsToGo = <from name lookup above>;
	NewProc = CurrentProcedure;
	NewFrame = FramePointerValue(CurrentProcedure);
	Failed = false;
	while (LevelsToGo > 0 && !Failed)
	  {
	  StaticLink = FindStaticLinkSym(NewProc);
	  if (StaticLink == null)
	    Failed = true;
	  else
	    {
	    NewFrame = *(NewFrame + StaticLink->symbol.offset);
	    Levels = StaticLinkLevels(StaticLink);
	    LevelsToGo = LevelsToGo - Levels;
	    for (; Levels > 0; Levels--)
	      NewProc = NewProc->proc.parent;
	    }
	  }

After executing this algorithm, if Failed is true then required
information about static links is missing in the symbol table,
and an error has occurred.  If LevelsToGo ends up less than zero,
then the optimizer's static link optimization has eliminated a
static link level that would be required to compute the address
of the name.  It is recommended that debuggers inform the user
that optimization prevents the debugger from computing the address
of the name.

If Failed is false and LevelsToGo is equal to zero, then the address
for the currently visible instance of the name is NewFrame plus the
offset of the name with respect to the "real" frame pointer for NewProc.

The function FindStaticLinkSym needs to look in the local symbol
table of the indicated procedure for the symbol whose name begins
with "__StaticLink.", and return a pointer to the symbol table
information for that symbol.

The function StaticLinkLevels returns the integer at the end of the
name for the indicated static link symbol.
    
67.6approvedSMURF::LOWELLTue Apr 29 1997 11:223
Approved 4/28/97

OF/STWG ECO ID = 14
67.7Supported in GEM...GEMGRP::MONTELEONEMon May 05 1997 15:126
    
    
    Support for Static Links has been checked into GEM, BL36 and above.
    
    
    Bob