[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference smurf::unix_objsym

Title:	Digital UNIX Object File/Symbol Table Notes Conference

Moderator:	SMURF::LOWELL

Created:	Mon Nov 25 1996
Last Modified:	Thu Jun 05 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	71
Total number of notes:	314

67.0. "Static Links" by GEMGRP::MONTELEONE () Wed Apr 16 1997 09:45

    
    
    
    
    
    
    
    
    
    Initial proposal, written by Peter Craig.
    
    
Now that GEM's decomposition will be creating nested function definitions
and uplevel references, we need a convention for describing this code in
the debug symbol table on Unix.  David Lafrance-Linden, Bob Monteleone
and I have been talking about how to accomplish this.

The solution that I had been assuming we could use has now been shown
not to work.  I had been assuming that as long as we don't have the most
general form of bound procedure value (which we won't in DECC or Fortran 90),
then the debugger could simply walk up the stack frames until it finds the
enclosing function that declares the symbol.  This does not work because
we are creating threads and passing them the static link, and POSIX
threads do not have a connection from the base frame on their stack to
the frame on the stack of the creating thread.

The clean and fully general solution that we would all like to get to
is to augment the Unix debug symbol table to explicitly identify the
symbol that contains the static link of a function.  Unfortunately, changes
to the symbol table format take a very long time to accomplish, and a
solution to debugging decomposed applications is needed very soon.

Consequently, Bob and I are recommending two solutions: a short term one
that requires no changes to the debug symbol table structure, and a long
term one that properly describes static links.  The following example
program will be used to illustrate the solutions (in no particular
language, but I hope all will understand it never the less):

	function a
	  automatic variable x
	  function b
	    automatic variable y
	    function c
	      automatic variable z
	      function d
	      <code that uses x and y but not z>
	      end function d
	    end function c
	  end function b
	end function a

Short Term Solution:

We (compilers and debuggers) will agree on a special name, such as
__StaticLink that GEM will give the variable that contains the static
link value for a function.  When a debugger needs the static link that
was passed to a function, it will look for a symbol with that name
within the function.  The fact that function definitions are nested
is indicated by the ordering of the stProcs, stBlocks and stEnds.

If the debug user stops at a breakpoint in function d, and asks the
debugger to print the value of x, the following sequence happens in
the debugger:

  1. The debugger looks for the name x within function d and does
     not find it.

  2. Before telling the user that there is no symbol x, the debugger
     notices that function d is nested within another function, and
     consequently the debugger begins looking in the enclosing
     functions for the name x.

  3. The debugger recursively looks for the name x in the enclosing
     functions until it finds the name x in function a.  The debugger
     now knows that x is located three levels up in the tree of
     nested functions.

  4. The debugger now attempts to find the current static link for
     function a.  Starting at function d, the debugger iterates three
     times around a loop which looks for the name __StaticLink in
     the current function, finds its value, and uses its value as
     the frame pointer for addressing symbols in the next enclosing
     function.

  5. The static link for function a is used to compute the address
     of symbol x within function a's fame.

If the debug user asks for the value of Q, the debugger would fail
to find it at step 3, and would then tell the user there is no such
symbol.

If the program was compiled with optimization, then because function
d does not use any variables from function c, the optimizer may have
passed function d a static link that points directly at the nearest
ancestor containing any variables that d uses.  This can be handled
in one of two ways.  Either we can invent another special name like
__StaticLinkLevels that indicates how many levels up the static
link goes, or we can punt and simply not describe the static link
when optimizing.

If we describe the optimization, then during step 4 above, the
debugger needs to determine how many levels up it has gone on each
iteration by getting the value of __StaticLinkLevels.  If the user
asked for the symbol z, then the debugger would have to notice that
the optimization has skipped over function c, and consequently the
debugger must tell the user that optimization has made it impossible
to print z.

If we punt, then when the optimization is done, the debugger should
complete step 3 successfully and know whether or not the name is
defined.  Then if the debugger fails to find a symbol __StaticLink
at one of the required levels in step 4, the debugger must tell
the user that optimization has made it impossible to print the
variable.

Long Term Solution:

The long term solution is the same as the short term solution
except that there are no special names.  Instead the variable that
contains the static link will have a special "static link" type
which indicates that it's the static link for the current function.

When we implement the long term solution, we should similarly
describe the number of levels that each static link spans in the
tree of nested functions.

T.R	Title	User	Personal Name	Date	Lines
67.1	Discussion, extracted from mail messages...	GEMGRP::MONTELEONE		`Wed Apr 16 1997 09:48`	177
	Excerpts from mail: David, Overall I interpret your response to the uplevel proposal as saying that the short term proposal is acceptable (with some suggested improvements), but you're not comfortable with the long term proposal. I like your suggestion about encoding the static link level into the name. I would like us to move ahead with the short term proposal ASAP, and continue discussing the long term proposal separately. Below are my specific comments on some of the issues you raised. - Peter Craig [email protected] >I'm curious about the etymology of "static link". I can understand >the "link" part but I'm not sure what's static about it since it >changes with each instance of the host procedure. At Symbolics we >called it the lexical environment. Anyway... > I think "lexical environment" would be a better name, but "static link" is the traditional name used by Digital's compiler engineering community. Essentially it means a link to the "static parent" of the procedure. The "static parent" is the lexically enclosing procedure, as opposed to the "dynamic parent" which is the procedure that called you. >The short term solution, well-known name for the reference, looks >fine. I believe this has to be in the outermost scope of the >procedure. In coff-speak, siblings of the stParam's, within the >stProc but before the first stBlock/scText. Good. Let's move ahead with this form of solution. > >I must caution about some of the level-removing optimizations you >mention. > -- First and functional, even though function D may not use variable > Z, variable Z is still in lexical scope and may want to > be seen by the debugger. I'm not sure how well user's will like > your contention that the debugger says an optimization has made it > impossible to find Z. That is the only honest thing to tell the user given that we are performing the optimization. See discussion below. > -- Second, doing such optimizations, even when there were no > variables (such as Z) to consider, was a rich source of bugs in > the Symbolic's compiler and debugger. Perhaps it was just lack of > rigor; I don't know. Once we had lexical scoping, various > programming styles and paradigms erupted that used it in lots of > different ways which allowed for optimizations, but which > sometimes caused the optimizations to forget how to get to the > right level. >I'm not going to fight either of those; just mention them. This optimization has been in our GEM-based Pascal and Ada compilers for many years, and as far as I know is utterly reliable. We (Digital) are striving for the highest level of performance possible. If an optimization significantly interferes with the user's ability to debug, then we recommend that users compile with -g which should suppress that optimization. We have implemented a number of optimizations that fall in this category. The GEM project (and Ron Brender in particular) are interested in improving the user's ability to debug optimized code, however we are not willing to limit our optimizations based on the current debugging capabilities. > >With or without the optimization, I might vote for naming the variable >__StaticLink.<uplevel> to combine the features of __StaticLink and >__StaticLinkLevels. I.e., we'll usually see __StaticLink.1 but >sometimes we'll see __StaticLink.2 and higher. I like this idea. It seems somewhat more powerful than my original proposal. The debuggers' algorithms for finding the desired static link will need to be modified to search for __StaticLink.* and discover which level(s) are available within the current procedure. > >For the long term solution, I'm not comfortable extending the symbol >table quite yet. As some of you know, I'm a fan of well-known names >when there's not good reason not to. Second, I don't think it is the >general solution. For example, here's my cut at what the proposal >would say: > Create a new st type, stStaticLink, which can be paired with sc > codes as follows: > scAbs value is on stack, use stLocal algorithm > scRegister value is in register > scData, scBss value is in memory (unlikely) > scTlsXXX value is TLS, use TLS algorithm (unlikely) >The only thing this does is move the lookup from the name to the st >type, gobbling the st indicator that could otherwise encoded >something. But there are two technical problems. > -- Suppose the compiler decided to do further optimizations by > passing not just the parent's link, but also the links of other > ancestors as well? I.e., parent in t0, grandparent in t1, etc, > which get stashed someplace (registers and/or stack). They are > all static links, but for different uplevels. Does this still use > stStaticLink and the number of uplevels is encoded in the name? In principle this is a problem. In practice we haven't found it useful to implement such an optimization in over a decade of Pascal and Ada compilers. I'm sure we can develop a symbol table representation that doesn't assume a single static link if we feel that generality is useful. > -- In the eventuality we have bound procedures that can persist after > the parent function exits, I'm not sure this solution is enough. > For example, consider > function OUTER > variable OO > function MIDDLE > variable MM > return INNER; > function INNER > variable II > <use OO,MM,II> > end function > end function > end function > where function INNER can be called after function MIDDLE has > returned. In this case, function MIDDLE generally has to malloc() > some memory to hold the local variables that INNER will reference; > MM in this case, but also MIDDLE's link to OUTER. I.e., something > like > function MIDDLE > struct my_environment { > void * __StaticLink; > variable MM; > } * __Environment = malloc(sizeof(__Environment)); > __Environment->__StaticLink = __StaticLink; > callable_INNER = bind_up_in_heap(INNER, __Environment); > return callable_INNER; > end function > It is __Environment that MIDDLE passes to INNER as the static > link, not the SP or FP register. I suppose the type of INNER's > __StaticLink would be a struct rather than a void , and this > would be a clue to the debugger. And in that case, I suppose a > stStaticLink/scInfo inside an stBlock describe a struct would be > equivalent to stMember. This hypothetical feature is not likely to appear in any language that our debuggers need to support. I would still consider it a valid point if it demonstrated that one proposal would be flexible enough to support it and another proposal would not. However, it seems to me that the name based schemes have no inherent advantage over binary schemes. One could invent binary schemes that describe multiple static links to different environments. > >To make a long story short, we're talking algorithms here. I don't >think the symbol table format is a good place to describe algorithms, >especially if experimentation or extension is foreseeable. Naming >flexibility permits algorithm flexibility. Predefined binary >encodings generally do not permit algorithm flexibility, and the long >lead times for format extension cause a fallback to naming schemes >anyway. No. I dissagree. We're not talking algorithms here. We are talking about a data structure (the debug symbol table) that describes a program. Name based schemes have the advantage that they permit late binding of meanings to names. They have the (minor) dissadvantage of being less compact. Historically, changing binary formats usually required all code that uses them to change. For this reason engineers have tended to take a more serious attitude about specifying and documenting new features in binary formats. If this same careful attention were paid to name based features that don't automatically require everyone to change their code, I think I'd feel comfortable with them.
67.2	A few more comments in the mail	NNTPD::"[email protected]"	David LaFrance-Linden	`Tue Apr 22 1997 14:33`	66
	Date: Wed, 9 Apr 1997 13:57:42 -0400 From: "David C. P. LaFrance-Linden" <david_ll> Short term ASAP is fine. Discussing long term is fine, too. When faced with making decisions of debuggability vs optimization, I have traditionally leaned toward debuggability. I surmise from your comments that the prevailing wind is optimization foremost. That's fine; we just need to be clear, e.g., documentation. My cautionary comments about skipping uplevels is based on a personal history. If Digital's Pascal and Ada have been doing it successfully for years, then presumably whatever problems have already been ironed out. I suspect part of the Symbolics problem was complexity. We tended to use macros for extending the syntax, and the macros would expand into a style resembling continuation passing, so it was quite possible to have nested functions to a dozen levels. Depending on how they were used, which ones referenced variables from where, which ones could keep their environment on the stack and which ones needed to spill some of their environment to heap storage, things could get complicated. And at times we goofed. however we are not willing to limit our optimizations based on the current debugging capabilities. I generally agree, and sometimes forget. Feel free to make a sign with that on it and hit me over the head when I do forget. Re: heap-based environment which contain static links: This hypothetical feature is not likely to appear in any language that our debuggers need to support. In practice, perhaps not. In theory, Java 1.2 might, but I'm not sure even that would intersect with this discussion. Historically, changing binary formats usually required all code that uses them to change. For this reason engineers have tended to take a more serious attitude about specifying and documenting new features in binary formats. If this same careful attention were paid to name based features that don't automatically require everyone to change their code, I think I'd feel comfortable with them. I think we devolve into religion here. For me, name-based schemes don't require everyone to change their code. If ANY scheme changes, the people who use the scheme may have to change their code. But more often with binary changes, everybody might have to change their code even if they don't use the feature. In addition to the advantage of late binding of meanings to names, name-based schemes also permit experimentation, progress, and flexibility without the constraints of a governing board. Allowing name-based schemes for short term solutions may also provide for experimentation and progress, but assuming binary encoding for the long term can have flexibility problems. In STWG we recently obsoleted the binary form of F90 array descriptors in favor of a named-based scheme partly because the binary form didn't have the flexibility (distinguishing assumed shape vs POINTER vs ALLOCATABLE) that the named-based scheme did. Perhaps this was just a lack of careful attention (but applied to a binary approach!) you would like to see for named-based schemes, I'm not convinced. Anyway, let's move forward with named-based, at least for the short term. And welcome to the "Are named-based schemes bad?" debate! [Posted by WWW Notes gateway]
67.3	Various comments	FLYBA::BRENDER	Ron Brender	`Wed Apr 23 1997 11:55`	80
	First of all, TIME OUT! Last time we talked about a name vs binary convention, we all agreed that we really ought to develop a list/catalog of all the name based conventions that currently exist and are in use -- before we go proliferating more... While I personally tend toward the binary side of this religious debate, I am quite willing to be pragmatic -- provided we get the universe of naming conventions documented and under control. Someone(s) took on/were given the assignment to put together such a summary. Where is it? End of TIME OUT... -------------------------------------------------------------------------------- Re .1: Is it clear just what value constitutes a static link? Are we talking "real frame pointer" (which could be the parent's SP or FP, depending) or "virtual frame pointer" which biases the real frame pointer by some amount (a constant on DU, I think)? If we are to accomodate compiler's other than GEM, do we need to allow for a choice of static link convention? Re .2: >>I must caution about some of the level-removing optimizations you >>mention. >> -- First and functional, even though function D may not use variable >> Z, variable Z is still in lexical scope and may want to >> be seen by the debugger. I'm not sure how well user's will like >> your contention that the debugger says an optimization has made it >> impossible to find Z. > >That is the only honest thing to tell the user given that we are >performing the optimization. See discussion below. I note in passing that, if the need arises, it should be straight- forward to identify those cases where the caller's frame and the parent's frame are the same and can be found by parsing the stack directly rather than resorting to a static link. A first cut at the criteria: - Parent frame is fixed size (eg, no use of alloca(), however spelled, explicit or implicit) - Current routine is called only directly and from the lexical parent, eg excludes self recusion, no bound procedure values created - Others? All that remains is for a way to pass a "flag" to the effect that the necessary criteria are satisfied. I doubt we need to worry about this right now, however. >>With or without the optimization, I might vote for naming the variable >>__StaticLink.<uplevel> to combine the features of __StaticLink and >>__StaticLinkLevels. I.e., we'll usually see __StaticLink.1 but >>sometimes we'll see __StaticLink.2 and higher. > >I like this idea. It seems somewhat more powerful than my original >proposal. The debuggers' algorithms for finding the desired static >link will need to be modified to search for __StaticLink.* and >discover which level(s) are available within the current procedure. Interesting suggestion. The pragmatic question is whether a single but a bit more complicated lookup is faster than doing two independent lookups. If the single one is no slower (and the symbol table size is definitely smaller), then by all means lets use the __StaticLink.<n> convention! Re .1, .2, .3: short term vs long term Two ways of doing exactly the same thing has a lot of negative appeal going for it. If we adopt some form of the short term proposal now, I can't see later adopting a functionally equivalent but different long term scheme later.
67.4	Formalized proposal	GEMGRP::MONTELEONE		`Thu Apr 24 1997 12:50`	278
	This note is a proposal concerning how uplevel references will be described by compilers and understood by debuggers on Digital Unix. This is intended to enable users of certain compiler features to access uplevel referenced variables while debugging. To support this proposal, enhancements are recommended in the GEM compiler back end, the ladebug debugger, and the TotalView debugger. Similar enhancements to the dbx debugger should be considered if resources permit. Overview: There are two features that will be included in Digital's Fortran 90 compiler for Digital Unix, that are not adequately described in the debug symbol table, and are not supported by debuggers. Both of these features use a mechanism called "uplevel reference" for accessing variables declared in another procedure. One of these features may also be included in future versions of other Digital compilers for Digital Unix. The result is that when debugging programs that use these features, there are locations in the program where the compiler provides access to variables through a mechanism that the debuggers do not understand. When stopped at these locations in the debugger, the debugger does not know about these variables, and thus does not support printing their values, evaluating their addresses, or using them in expressions within debugger commands. The Language Features: The first of these features is host association within internal subprograms. This is a feature provided by the Fortran 90 standard. The definition of one subroutine may declare local variables, and may also declare an internal subroutine. Within this internal subroutine, the local variables of the enclosing "host" subroutine are available by "host association". Depending upon command line options such as -automatic, and the declaration modifiers on the host associated variable, Digital's Fortran 90 compiler may use the uplevel reference mechanism to address the variable. The second feature is manual decomposition. With this feature the user identifies a portion of a subroutine that is to be executed in parallel by multiple threads. The compiler's implementation of this feature is based upon defining a nested (or internal) subroutine which contains the code to be executed in parallel. Within this nested subroutine, variables declared within the user's original subroutine may be accessed using the uplevel reference mechansim. The new internal subroutines created by manual decomposition may occur within an internal subprogram, despite the fact that the Fortran 90 standard prohibits users from nesting internal subroutines. Although manual decomposition is not a feature of the officially accepted language standard, our implementation will adhere to the draft standards produced by the ANSI X3H5 committee (AKA the Parallel Computing Forum), which unfortunately disbanded before a parallel language standard was adopted. Manual decomposition may be offered by Digital in other compilers for Digital Unix in the future. If so, then we expect that it will create nested subprograms and uplevel references in the same way that it does in the Fortran 90 compiler. Digital's Pascal and Ada compilers also use the uplevel reference mechanism to provide access to variables declared in enclosing subprogram definitions, as required by their respective language standards. How the uplevel reference mechanism works: A program that uses uplevel references includes a forest of nested procedures. Each tree in this forest has a root which is a procedure that is not defined inside the scope of any other procedure. Any procedure in the tree has some number of descendents which are procedures that are defined within it (or nested within it), and has a chain of ancestors leading back to the root. Traditionally, when any procedure within such a tree is executing, there must be an active invocation of every procedure in its chain of ancestors. The languages ensure this by restricting the visibility of the name for a nested procedure so that it can only be named within its parent, and thus the parent must be active if the nested procedure has been called. A procedure within such a tree is passed a pointer to a stack frame for its immediate ancestor so that it can address local variables of that ancestor. This pointer is called the static link of the procedure. Specifically, the static link contains the "real" frame pointer for the ancestor. If the procedure wishes to access local variables from multiple levels up in the tree, the stack frame of its immediate ancestor also contains the pointer to the next level's stack frame. There is an optimization that Digital's compilers perform and we would like debuggers to support. When a procedure needs to access local variables from multiple levels up, but the program does not use any of the variables at the intermediate levels, the compiler may pass that procedure a static link that points directly to the distant ancestor's stack frame. When one of these procedures is recursive, it may have multiple active stack frames during execution. In this case, logically each of these dynamic instances of that procedure creates a new instance of the local procedures that it contains, and a new instance of the local variables that it contains. The new instance of the local procedure is bound to the corresponding new instance of its ancestor's local variables -- that is, whenever the new instance of the local procedure is called, it should be passed a static link that points to the corresponding instance of the its ancestor's local variables. In languages like Pascal and Ada, it is also possible to pass a local procedure as an argument. When passing such a procedure, the compiler passes a pair of values including the pointer to the code, and the static link that it should be bound to. Such a pair is called a bound procedure value. When bound procedure values are passed and recursion occurs, it is possible that the static link for the current procedure points to a stack frame that is not the most recent stack frame of its ancestor that is currently on the stack. The proposed symbol table representation: Two features of the program need to be represented in the debug symbol table for Digital Unix: the parent and child relationships among procedures in the forest, and the static link of each procedure. Describing the forest of nested procedures: The parent and child relationships are currently already described by Digital's compilers in the debug symbol table. The following is a specification for that description. Each procedure is defined by the sequence of stProc (or stStaticProc), stBegin, and stEnd entries. A procedure X is an ancestor of a procedure Y if and only if the stProc for Y occurs between the stProc for X and the corresponding stEnd for X. A procedure X is the parent of procedure Y if it is an ancestor of Y, and there exist no intervening procedure Z such that Z is an ancestor of Y, and Z is not an ancestor of X. Between the stProc and corresponding stEnd for a procedure, an arbitrary number of child procedures may be defined. Within those child procedures, additional procedure definitions may be nested to an arbitrary depth. Thus the trees defined by this structure are both arbitrarily wide and arbitrarily deep. Describing the static links: Digital's compilers currently do not identify the symbol that contains the static link for a given procedure, nor do they describe how many levels up in the procedure's tree of ancestors the static link goes. The following is the proposed representation that future versions of Digital's compilers will use. When a procedure is passed a static link, that static link will be moved to a local automatic symbol of the procedure, with a special name, during the prolog of the procedure. By the end of the prolog, the compiled code will ensure that the local symbol whose name begins with "__StaticLink." will contain the static link value for the procedure. This local symbol will have a type that is equivalent to "char " variables in C, and will contain the value of the "real" frame pointer for an ancestor of the current procedure. The full name of this symbol will be "__StaticLink." followed by a positive decimal integer with no leading zeros. This integer value identifies the number of levels up the ancestor tree that the static link points to. For example, if the name is __StaticLink.3 then after the procedure's prolog it contains the static link of the procedure in which it is defined and that procedure's static link points to a stack frame three levels up in the procedure's ancestor tree (i.e. to the great grandfather of the procedure). If the program is compiled without optimization (specifying -O0), then all static links will point to a stack frame exactly one level up in the tree. In this case all static link variables will be named __StaticLink.1. How debuggers should interpret these symbol table structures: Debuggers for Digital Unix should be enhanced to understand which symbols are visible at a location in the program through uplevel reference, and how to compute their addresses using static links. These enhancements should be used in all contexts where the user asks the debugger about a name that might be defined as an uplevel reference. For example when the user asks the debugger to print the value of an expression containing the name. When the debugger needs the current value, or address for a name at some location in the program, two separate actions may be required if the name might be defined as an uplevel reference: finding the procedure that defines the currently visible instance of that name (if any), and finding the address of the currently visible instance of that name. If only type information is required, finding the procedure that defines the name may be sufficient. Finding the defining procedure is accomplished by repeatedly looking up the name in the local symbol table of a chain of procedures that extends from the current procedure through its chain of ancestors until either the name is found in a procedure or the end of the chain of ancestors is reached without finding the name. If this search terminates without finding the name then the debugger should conclude that the name is not visible by uplevel reference at the current location in the program. When searching for the defining procedure, the debugger should count how many levels in the ancestor chain were traversed before finding the name. If zero levels were traversed, the name is defined within the current procedure, and is not an uplevel reference. The number of levels traversed is assumed to be in the variable LevelsToGo in the algorithm below. Finding the address for the name involves locating static link values and dereferencing them with appropriate offsets. basically, while the number of levels to be traversed is greater than zero, find the static link symbol for the current level and obtain its value. Finally add the desired symbol's offset from the "real" frame pointer to the final static link value. The recommended algorithm for finding the address is as follows: LevelsToGo = <from name lookup above>; NewProc = CurrentProcedure; NewFrame = FramePointerValue(CurrentProcedure); Failed = false; while (LevelsToGo > 0 && !Failed) { StaticLink = FindStaticLinkSym(NewProc); if (StaticLink == null) Failed = true; else { NewFrame = (NewFrame + StaticLink->symbol.offset); Levels = StaticLinkLevels(StaticLink); LevelsToGo = LevelsToGo - Levels; for (; Levels > 0; Levels--) NewProc = NewProc->proc.parent; } } After executing this algorithm, if Failed is true then required information about static links is missing in the symbol table, and an error has occurred. If LevelsToGo ends up less than zero, then the optimizer's static link optimization has eliminated a static link level that would be required to compute the address of the name. It is recommended that debuggers inform the user that optimization prevents the debugger from computing the address of the name. If Failed is false and LevelsToGo is equal to zero, then the address for the currently visible instance of the name is NewFrame plus the offset of the name with respect to the "real" frame pointer for NewProc. The function FindStaticLinkSym needs to look in the local symbol table of the indicated procedure for the symbol whose name begins with "__StaticLink.", and return a pointer to the symbol table information for that symbol. The function StaticLinkLevels returns the integer at the end of the name for the indicated static link symbol.
67.5	Updated proposal, incorporating comments from 4/28 meeting	GEMGRP::MONTELEONE		`Tue Apr 29 1997 07:57`	280
	This note is a proposal concerning how uplevel references will be described by compilers and understood by debuggers on Digital Unix. This is intended to enable users of certain compiler features to access uplevel referenced variables while debugging. To support this proposal, enhancements are recommended in the GEM compiler back end, the ladebug debugger, and the TotalView debugger. Similar enhancements to the dbx debugger should be considered if resources permit. Overview: There are two features that will be included in Digital's Fortran 90 compiler for Digital Unix, that are not adequately described in the debug symbol table, and are not supported by debuggers. Both of these features use a mechanism called "uplevel reference" for accessing variables declared in another procedure. One of these features may also be included in future versions of other Digital compilers for Digital Unix. The result is that when debugging programs that use these features, there are locations in the program where the compiler provides access to variables through a mechanism that the debuggers do not understand. When stopped at these locations in the debugger, the debugger does not know about these variables, and thus does not support printing their values, evaluating their addresses, or using them in expressions within debugger commands. The Language Features: The first of these features is host association within internal subprograms. This is a feature provided by the Fortran 90 standard. The definition of one subroutine may declare local variables, and may also declare an internal subroutine. Within this internal subroutine, the local variables of the enclosing "host" subroutine are available by "host association". Depending upon command line options such as -automatic, and the declaration modifiers on the host associated variable, Digital's Fortran 90 compiler may use the uplevel reference mechanism to address the variable. The second feature is manual decomposition. With this feature the user identifies a portion of a subroutine that is to be executed in parallel by multiple threads. The compiler's implementation of this feature is based upon defining a nested (or internal) subroutine which contains the code to be executed in parallel. Within this nested subroutine, variables declared within the user's original subroutine may be accessed using the uplevel reference mechansim. The new internal subroutines created by manual decomposition may occur within an internal subprogram, despite the fact that the Fortran 90 standard prohibits users from nesting internal subroutines. Although manual decomposition is not a feature of the officially accepted language standard, our implementation will adhere to the draft standards produced by the ANSI X3H5 committee (AKA the Parallel Computing Forum), which unfortunately disbanded before a parallel language standard was adopted. Manual decomposition may be offered by Digital in other compilers for Digital Unix in the future. If so, then we expect that it will create nested subprograms and uplevel references in the same way that it does in the Fortran 90 compiler. Digital's Pascal and Ada compilers also use the uplevel reference mechanism to provide access to variables declared in enclosing subprogram definitions, as required by their respective language standards. How the uplevel reference mechanism works: A program that uses uplevel references includes a forest of nested procedures. Each tree in this forest has a root which is a procedure that is not defined inside the scope of any other procedure. Any procedure in the tree has some number of descendents which are procedures that are defined within it (or nested within it), and has a chain of ancestors leading back to the root. Traditionally, when any procedure within such a tree is executing, there must be an active invocation of every procedure in its chain of ancestors. The languages ensure this by restricting the visibility of the name for a nested procedure so that it can only be named within its parent, and thus the parent must be active if the nested procedure has been called. A procedure within such a tree is passed a pointer to a stack frame for its immediate ancestor so that it can address local variables of that ancestor. This pointer is called the static link of the procedure. Specifically, the static link contains the "real" frame pointer for the ancestor. If the procedure wishes to access local variables from multiple levels up in the tree, the stack frame of its immediate ancestor also contains the pointer to the next level's stack frame. There is an optimization that Digital's compilers perform and we would like debuggers to support. When a procedure needs to access local variables from multiple levels up, but the program does not use any of the variables at the intermediate levels, the compiler may pass that procedure a static link that points directly to the distant ancestor's stack frame. When one of these procedures is recursive, it may have multiple active stack frames during execution. In this case, logically each of these dynamic instances of that procedure creates a new instance of the local procedures that it contains, and a new instance of the local variables that it contains. The new instance of the local procedure is bound to the corresponding new instance of its ancestor's local variables -- that is, whenever the new instance of the local procedure is called, it should be passed a static link that points to the corresponding instance of the its ancestor's local variables. In languages like Pascal and Ada, it is also possible to pass a local procedure as an argument. When passing such a procedure, the compiler passes a pair of values including the pointer to the code, and the static link that it should be bound to. Such a pair is called a bound procedure value. When bound procedure values are passed and recursion occurs, it is possible that the static link for the current procedure points to a stack frame that is not the most recent stack frame of its ancestor that is currently on the stack. The proposed symbol table representation: Two features of the program need to be represented in the debug symbol table for Digital Unix: the parent and child relationships among procedures in the forest, and the static link of each procedure. Describing the forest of nested procedures: The parent and child relationships are currently already described by Digital's compilers in the debug symbol table. The following is a specification for that description. Each procedure is defined by the sequence of stProc (or stStaticProc), stBegin, and stEnd entries. A procedure X is an ancestor of a procedure Y if and only if the stProc for Y occurs between the stProc for X and the corresponding stEnd for X. A procedure X is the parent of procedure Y if it is an ancestor of Y, and there exist no intervening procedure Z such that Z is an ancestor of Y, and Z is not an ancestor of X. Between the stProc and corresponding stEnd for a procedure, an arbitrary number of child procedures may be defined. Within those child procedures, additional procedure definitions may be nested to an arbitrary depth. Thus the trees defined by this structure are both arbitrarily wide and arbitrarily deep. Describing the static links: Digital's compilers currently do not identify the symbol that contains the static link for a given procedure, nor do they describe how many levels up in the procedure's tree of ancestors the static link goes. The following is the proposed representation that future versions of Digital's compilers will use. When a procedure is passed a static link, that static link will be moved to a local automatic symbol of the procedure, with a special name, during the prolog of the procedure. By the end of the prolog, the compiled code will ensure that the local symbol whose name begins with "__StaticLink." will contain the static link value for the procedure. This local symbol will have a type that is equivalent to "void " variables in C, and will contain the value of the "real" frame pointer for an ancestor of the current procedure. This symbol will be a local symbol that occurs between the stProc or stStaticProc and the stBegin, similar to the placement of parameter symbols. The full name of this symbol will be "__StaticLink." followed by a positive decimal integer with no leading zeros. This integer value identifies the number of levels up the ancestor tree that the static link points to. For example, if the name is __StaticLink.3 then after the procedure's prolog it contains the static link of the procedure in which it is defined and that procedure's static link points to a stack frame three levels up in the procedure's ancestor tree (i.e. to the great grandfather of the procedure). If the program is compiled without optimization (specifying -O0), then all static links will point to a stack frame exactly one level up in the tree. In this case all static link variables will be named __StaticLink.1. How debuggers should interpret these symbol table structures: Debuggers for Digital Unix should be enhanced to understand which symbols are visible at a location in the program through uplevel reference, and how to compute their addresses using static links. These enhancements should be used in all contexts where the user asks the debugger about a name that might be defined as an uplevel reference. For example when the user asks the debugger to print the value of an expression containing the name. When the debugger needs the current value, or address for a name at some location in the program, two separate actions may be required if the name might be defined as an uplevel reference: finding the procedure that defines the currently visible instance of that name (if any), and finding the address of the currently visible instance of that name. If only type information is required, finding the procedure that defines the name may be sufficient. Finding the defining procedure is accomplished by repeatedly looking up the name in the local symbol table of a chain of procedures that extends from the current procedure through its chain of ancestors until either the name is found in a procedure or the end of the chain of ancestors is reached without finding the name. If this search terminates without finding the name then the debugger should conclude that the name is not visible by uplevel reference at the current location in the program. When searching for the defining procedure, the debugger should count how many levels in the ancestor chain were traversed before finding the name. If zero levels were traversed, the name is defined within the current procedure, and is not an uplevel reference. The number of levels traversed is assumed to be in the variable LevelsToGo in the algorithm below. Finding the address for the name involves locating static link values and dereferencing them with appropriate offsets. basically, while the number of levels to be traversed is greater than zero, find the static link symbol for the current level and obtain its value. Finally add the desired symbol's offset from the "real" frame pointer to the final static link value. The recommended algorithm for finding the address is as follows: LevelsToGo = <from name lookup above>; NewProc = CurrentProcedure; NewFrame = FramePointerValue(CurrentProcedure); Failed = false; while (LevelsToGo > 0 && !Failed) { StaticLink = FindStaticLinkSym(NewProc); if (StaticLink == null) Failed = true; else { NewFrame = (NewFrame + StaticLink->symbol.offset); Levels = StaticLinkLevels(StaticLink); LevelsToGo = LevelsToGo - Levels; for (; Levels > 0; Levels--) NewProc = NewProc->proc.parent; } } After executing this algorithm, if Failed is true then required information about static links is missing in the symbol table, and an error has occurred. If LevelsToGo ends up less than zero, then the optimizer's static link optimization has eliminated a static link level that would be required to compute the address of the name. It is recommended that debuggers inform the user that optimization prevents the debugger from computing the address of the name. If Failed is false and LevelsToGo is equal to zero, then the address for the currently visible instance of the name is NewFrame plus the offset of the name with respect to the "real" frame pointer for NewProc. The function FindStaticLinkSym needs to look in the local symbol table of the indicated procedure for the symbol whose name begins with "__StaticLink.", and return a pointer to the symbol table information for that symbol. The function StaticLinkLevels returns the integer at the end of the name for the indicated static link symbol.
67.6	approved	SMURF::LOWELL		`Tue Apr 29 1997 10:22`	3
	Approved 4/28/97 OF/STWG ECO ID = 14
67.7	Supported in GEM...	GEMGRP::MONTELEONE		`Mon May 05 1997 14:12`	6
	Support for Static Links has been checked into GEM, BL36 and above. Bob