| A Symbol Table Proposal for Representing Split Routines
by Peter Thompson
Introduction
Put simply, routine splitting is a means to improve performance.
The typical executable has a small amount of code that is exercised
most frequently, and a larger portion that is hit less often. Often a
small loop may call a routine on each iteration, and if the two
sections of code are widely separated, we increase the chances of
cache misses. If instead
we move the code from various routines that is most often exercised,
placing them together in a common spot, then the chance of a cache hit
increases. For example, we might have a call to a function inside a
loop:
    for (x = 0; x < MAX; x++) {
        if (analyze_array(array_head, x) == TRUE)
            do_deep_scan(array_head, x);
    }
Depending on the size of analyze_array, it may be advantageous
to move the code from the loop outside of its enclosing routine,
and bunch it together with analyze_array, and possibly do_deep_scan.
This way, when the loop is entered and analyze_array is called on
each iteration, instead of having to refill the cache each time to
read in analyze_array, we can speed through the loop with everything
needed loaded into the cache on the first iteration and not evicted
until we leave the loop. The mechanics of actually splitting the
routine and grouping the code into cacheable pages are left as
an exercise for the compilers, linker, loader, and OM. However,
as a side effect, this will place some strain on symbol table
consumers unless we plan this carefully.
Requirements
The producers and consumers of symbol tables will need to work
closely together to coordinate this change. In order to do the actual
routine splitting the proposal is to do performance analysis of the
executable, and, using the generated information, rearrange the
executable so that the basic blocks most often exercised, the 'hot
blocks', are grouped together at the top of the executable. The 'cold
blocks' will be moved downward. When this is done, the symbol
table must be changed to allow the consumers to follow the changes.
The consumers can be seen in two ways. There are what I would
consider to be the 'passive' consumers, such as odump and stdump,
which read and display the symbol table. These will merely need to be
modified to recognize and display any new information. Then there are
the active consumers. These are the debuggers, the profilers, and
likely ostrip. The debuggers need to be able to set breakpoints and
interpret variables in the section of code that is relocated, but
logically part of the original routine. The profilers need to be able
to read and instrument the executable before it is run, so that they
can produce logical output when the instrumented executable is run.
Ostrip will need to recognize the new information to decide what to
keep and what to discard at various option levels.
Proposal
There are some specific needs that can be described briefly.
First, we need a way to indicate which procedures have been split.
Then we need to know where the code was moved from, where it moved
to, and how much of the code was moved.
Taking a look at the procedure descriptor, there is a reserved
field of 13 bits. If we take one bit here for use as a flag
to indicate a split procedure, then we can maintain the size of the
PDR. So the reserved field in the PDR would change to:
unsigned split_proc : 1; /* true if procedure is split */
unsigned reserved : 12; /* reserved: must be zero */
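To make the intent concrete, here is a compilable sketch of the changed
bitfields and the check a consumer would perform. The surrounding PDR
members are omitted and the struct name pdr_tail is invented for
illustration; only split_proc and reserved come from the proposal.

```c
#include <assert.h>

/* Sketch of the tail of the procedure descriptor, assuming the
 * 13-bit reserved field is carved up as proposed.  Only the changed
 * fields are shown; the real PDR has many other members. */
struct pdr_tail {
    unsigned split_proc : 1;  /* true if procedure is split */
    unsigned reserved   : 12; /* reserved: must be zero */
};

/* A consumer would test the flag before going to the comment
 * section to look for CM_SPLIT_PRC data. */
static int is_split(const struct pdr_tail *p)
{
    return p->split_proc != 0;
}
```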
Next there are addresses and a size that would be needed by the consumers
in order to process the executables correctly. These are: the
location the code was originally located at, or more aptly,
the location of the branch to the moved code; the address
the moved code now resides at; and the length or size of the code that
was moved. All this information should be readily available
to the producer at the time the routine is split. A more complete
explanation of how the code will look after it is split is given
below in the producer task section. Suffice it to say here that
by the original location, it should be understood that this is
the location of the branch to the moved code after the procedure
has been split. It is assumed that rather than leaving a gap
that is jumped over, the remaining code would be collapsed or moved
to fill in this gap.
The location the code was originally at gives us two important
pieces of information. First, it allows us to line up each piece
of split code with its original procedure. It is a simple translation
from address to procedure, and if many procedures are split,
this will be a good way to keep them straight. Second, it allows
us to deal with the line numbers in a somewhat straightforward
manner. Due to the way the line number info is set up, it would
be easier to keep track of line numbers by looking to see if this
is a split procedure, and if so, keeping track of the location
where the split takes place. When we reach that location,
then we can manipulate the counters so that we start tracking things
at the new location, and jump back when we reach the end of the
moved portion of code, which of course we know from the size of
that code.
Given the current structure of the symbol table, it seems the
best location to store this information is in the comment
section. I propose the following to encode this information.
We need a new tag in the comment section to identify and locate
the data. The new tag, to stay with easily understandable names,
could be called:
CM_SPLIT_PRC
and would be assigned a value that does not conflict with any
existing (or soon to be assigned) tag.
The information described above, branch address, new address, and
the size of the moved code, would fit naturally in a structure
which would be defined in the scncomment.h header. The structure
can be described as follows:
    struct split_info {
        unsigned long branch_addr; /* location of branch to code */
        unsigned long vaddr;       /* current location of code */
        unsigned long size;        /* size of moved code */
    };
This should suffice to fit the needs of both consumers and producers.
The size field here would probably never need to exceed the size of an int,
as that would likely exceed the size of a cache and invalidate the whole
exercise, but it seems that it would ease alignment if it is kept at
an unsigned long. Perhaps we could break this into an unsigned
int followed by a reserved int.
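As a rough illustration of how a consumer might use the structure, here
is a compilable sketch. The helper names are invented, and the
assumption that .size counts bytes is mine; the proposal leaves the
unit open.

```c
#include <assert.h>

/* The proposed structure from scncomment.h (names as proposed). */
struct split_info {
    unsigned long branch_addr; /* location of branch to moved code */
    unsigned long vaddr;       /* current location of moved code */
    unsigned long size;        /* size of moved code, bytes assumed */
};

/* Hypothetical consumer helpers: the first address past the moved
 * region, and a test for whether an address falls inside it. */
static unsigned long split_end(const struct split_info *sp)
{
    return sp->vaddr + sp->size;
}

static int in_moved_code(const struct split_info *sp, unsigned long addr)
{
    return addr >= sp->vaddr && addr < split_end(sp);
}
```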
Producer Tasks
Currently OM is the only producer capable of splitting
a procedure. Prior to the split let us presume that we have an
image that looks something like the following:
A1-> /* Main entry point */
A2-> (C) /* Basic block, tight loop calling C */
A3-> /* intermediate initializing code */
A4-> (B) /* Another tight loop calling B */
A5 /* exit code, clean up tasks */
B1-> /* Initializing code mostly skipped */
/* after first run. */
B2-> /* Main body of routine B, tight loop */
B3 /* post loop cleanup code */
C /* whole body of C */
The basic plan is to move the most exercised routines and blocks,
the "hot" code, to the top of the executable, and for blocks,
to the top of the routine if they are not entirely split out of
the routine. The 'cold' routines and blocks are pushed towards
the bottom of the executable or routine.
After OM has split the executable, the layout of the file will look
very different. Depending on which is the hotter loop, either
C or B may come out on top. For the sake of argument, let us suppose
that A2 is the hotter loop, and C therefore the hotter routine.
This leaves us with something that should resemble the following:
A2-> /* hottest loop, calls C, ends with branch to A3 */
C /* hottest routine */
A4-> /* next hottest loop, calls B, ends with br to A5 */
B2 /* hottest block in B ends with branch to B3 */
B1-> /* entry point in B, branches to B2 */
B3 /* exit point of routine */
A1-> /* main entry point, ends with branch to A2 */
A3-> /* code immediately follows br to A2, ends with br to A4 */
A5 /* code immediately follows br to A4 */
Once OM has split the procedure in the above or a similar fashion,
all the addresses in the symbol table will be updated to reflect
the new location of the various routines and entry points. The
debuggers could probably use this as is, to set breakpoints on
main entry points, and follow the code with next and step
instructions, but would get confused in setting breakpoints
and determining scope in the sections of a routine that have
been moved. Exception handling should not be affected, since
OM will modify the code range descriptors to reflect the new layout of
the program. The same runtime procedure descriptors can be used,
as the flow through the code is not affected, just the location of
the actual pieces.
In order to allow the consumers to deal with the split procedures, OM
(and any other producer that may do this in the future) will need to
modify the procedure descriptor and write the proposed information
into the comment section. First the procedure descriptor will have
the new split_proc flag set to 1. Then a comment section header would be
written out with the CM_SPLIT_PRC tag, a cm_len equal to the
sizeof(struct split_info) and a cm_val pointing to the split_info
structure in the free-form data area of the comment section.
The split_info struct would have the location of the branch in
the split routine in .branch_addr, the current location in .vaddr,
and the size of the code moved in .size.
All of this can be done by the producer itself, or with the proposed
mcs tool.
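The producer step could be sketched roughly as follows. The cm_hdr
layout here is a simplified stand-in for the real comment section
header in scncomment.h, and the CM_SPLIT_PRC value is a placeholder;
only the tag/len/val pattern is from the proposal.

```c
#include <assert.h>
#include <string.h>

#define CM_SPLIT_PRC 0x2aul   /* placeholder tag value */

struct split_info {
    unsigned long branch_addr;
    unsigned long vaddr;
    unsigned long size;
};

/* Simplified stand-in for a comment section header; the real
 * structure in scncomment.h differs in detail. */
struct cm_hdr {
    unsigned long cm_tag;
    unsigned long cm_len;
    unsigned long cm_val;  /* offset of data in free-form area */
};

/* Sketch of the producer step: emit a header tagged CM_SPLIT_PRC
 * and copy the split_info into the free-form data area at `off'. */
static struct cm_hdr emit_split_entry(unsigned char *freeform,
                                      unsigned long off,
                                      const struct split_info *sp)
{
    struct cm_hdr h;
    h.cm_tag = CM_SPLIT_PRC;
    h.cm_len = sizeof *sp;
    h.cm_val = off;
    memcpy(freeform + off, sp, sizeof *sp);
    return h;
}
```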
Consumer Tasks
In order to facilitate access to the split procedure data, a number of APIs
will be designed. To handle the line number information, the current
access routines in libmld and libst will be updated, with the actual line
number info remaining unchanged for the purpose of this proposal.
The consumers can then decide for themselves how best to handle
this information.
APIs (very roughly designed)
New structures:
Note: The following structures were based on some code I experimented
with while investigating using the code range descriptors to passively
collect the needed info without any modification to the symbol table.
This didn't pan out the way I expected, and I'm not certain that
using the pdsc_crd typedef will really be flexible enough to suit
the needs here. But since something like it will be needed, I'm
setting this up this way just to illustrate how the structures would
look in general.
    struct sp_range_list {
        pdsc_crd crditem;
        struct split_info *sp_info;
        long size;
        struct sp_range_list *next;
    };

    struct sp_info {
        pPDR ppd;
        struct sp_range_list *spr_head;
        struct sp_range_list *spr_tail;
    };
sp_init:
sp_info *
sp_init(pObj obj)
Sp_init is called when dealing with an executable that has been
split. It basically traverses the comment section, looking for split
procedure info. It creates an array of structures ordered by procedure
descriptors. Each structure has a linked list of code ranges that
cover the extent of the entire procedure, in order as they would be
encountered if the procedure were not split.
sp_size:
unsigned long
sp_size(sp_info *spi, pPDR ppd)
Sp_size returns the size of the input procedure by looking up
the proc in the sp_info structure, and adding up the individual
pieces of the procedure.
sp_ranges:
sp_range *
sp_ranges(sp_info *spi, pPDR ppd)
Sp_ranges returns a pointer to a linked list comprising the
various code ranges of a particular procedure, as indicated by the
ppd input.
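A minimal sketch of how sp_size might walk the proposed structures.
The pdsc_crd and pPDR members are dropped here, since they do not
affect the summing logic, and the names are simplified accordingly.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified versions of the proposed structures. */
struct sp_range_list {
    long size;                  /* size of this piece of the proc */
    struct sp_range_list *next;
};

struct sp_info_hdr {
    struct sp_range_list *spr_head;
    struct sp_range_list *spr_tail;
};

/* sp_size as described: add up the individual pieces of a
 * (possibly split) procedure over its code-range list. */
static unsigned long sp_size_of(const struct sp_info_hdr *spi)
{
    unsigned long total = 0;
    const struct sp_range_list *r;
    for (r = spi->spr_head; r != NULL; r = r->next)
        total += (unsigned long)r->size;
    return total;
}
```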
Line Numbers
While this would require a number of changes in a number of files,
the line number info is surprisingly tractable. The line number
info does not rely on addresses, and is all represented as offset
information. At discrete points these offsets are added to an
existing address, starting with the entry address of a routine,
and then inspected to see if they compare with the desired address or
line number. At these places in the code, we can add a check to
see if we are in a split routine, and if so, compare the generated
address with the address we know contains a branch to the moved code.
At that point, we merely replace the current generated address
with the address the code has been moved to. We will know the
size of this moved code, so when we reach the end of that code,
we change the address again, so that it is one instruction past
the branch to the moved code. A brief pseudo-code example would
look something like this:
    branch_addr = spr_list->sp_info->branch_addr;
    new_addr = spr_list->sp_info->vaddr;
    final_addr = 0;

    /* ... code for manipulating the line number info, up to ... */
    address = address + (count * 4);
    if (address == branch_addr && final_addr == 0) {
        address = new_addr;
        branch_addr += 4; /* keep, to make the switch back later */
        final_addr = new_addr + (size * 4);
    }
    else if (final_addr && address == final_addr) {
        address = branch_addr;
        final_addr = 0;
        branch_addr = next branch address, if any, 0 otherwise;
    }
A preliminary search indicates that the following files will probably
need changing, and there may be more:
libmld/
ldfcn.c
ldlread.c
line.c
llib-lmld.c
obj.c
obj_nlist.c
obj_rewrite.c
stcu.c
stfd.c
stio.c
stprint.c
libst/
st_line.c
st_obj.c
st_open.c
This list was based on a grep search for the word 'pline', which is
frequently used in line number translation. It is quite possible
that some of these files use 'pline' in some other fashion, or
make reference to another variable with pline as part of a
name, and do not actually manipulate the line number info.
|
| Ron,
Valid questions all. I've got some ideas, but not definite answers for
some of these.
1) OM does many things. I don't know enough about what or how it does
these things to give you a good answer, but I'll track them down.
I believe you can choose, through options, what OM does, so that you could
choose just routine splitting, but that might not be the typical or
most desired option. Were I to design it, I would make the routine
splitting the last thing, so that things were rather settled by
the time you got to it, but that may be naive.
2) The code to do routine splitting is in OM already. I can check to
see what happens with entry and exit points.
3) It would be nice if the code were unchanged. OM is all post-link
optimization, so the compiler has already worked on it. We already know
of problems with line numbers in these areas, which is why it might make
sense to also look at the semantic events you've proposed. Code might be
further coerced by OM, but I'd like to see the data presented in
the sp_info represented as if splitting were the last thing done to it.
So that you could basically slap things back together into their
old places if you had to.
4) I'm unclear about what you're unclear about. How does adding a new
destination branch qualify as splitting a routine? Or are you suggesting that
the switch might have branched into the region that has now been relocated,
and we don't need to branch into what is now just a branch, but could
point to the new location instead? But that new location is tracked by
dint of the sp_info in an address_to_procedure routine, so I'm not sure
there's any concern. It shouldn't matter if things are changed to directly
point there, we just need the origin info to tell us what procedure we
are actually in.
4a) What you don't ask here, but I was thinking about when I first read
this, was that it may take more than one branch to get to the new
location, via trampoline code. That should be easily enough handled
through steps and such in the debugger, since we shouldn't stop
UNTIL we're within the new location.
5) There is no restriction as to where the new location might be. As hot
blocks and routines bubble to the top they may or may not be contiguous
with their original code.
6) I'd been thinking originally of stBlock stMoved stEnd combos, but
it seemed somewhat awkward. After all, we only represent the beginning
and end of routines, right? We don't issue an stBlock stEnd pair for
each basic block within a routine. That seemed more the focus of
the pdsc_crd info, which may still be used. Or maybe you have
some other ideas?
7) The optimization section is a possibility.
8) The sp_size was more or less a nod to the profiler folks, who are always
wondering (at least in their questions to me) about the size of a particular
block or routine. It would be post split.
9) Are you saying we should make these addrs offsets instead of absolute
addrs? I've got nothing against that.
Let me know if I've interpreted these correctly. I'm going to have to take
some of the stuff to the OM guy (whom I haven't heard back from yet on the
copy I sent to him) to get a clearer idea.
PeterT
|
| A Symbol Table Proposal to Extend Source Location Information
Authors: Ron Brender, Dennis Murphy, Peter Thompson
Extended Source Location Information.
1.1 Introduction
This document describes how Extended Source Location information
is to be contained in the Third Eye Symbol table and is intended
to serve as a specification as to how that information is
constructed and accessed by the compilers and tools.
Extended Source Location information is intended to provide more
information than is contained in the existing line number table
including PC deltas, line number deltas, file transitions,
columns, and line/column ranges.
It is also intended that the Extended Source Location information
be a compact and extensible representation. Those issues will be
addressed in the detailed description.
While it was initially intended that Extended Source Location
information be defined to allow ATOM to accurately describe "split
routines", it seems likely that it will also be used by the
compilers to describe "out-of-line" code where functions are
segregated into hot and cold regions.
The existing line number table is unable to describe either split
routines or out-of-line code, resulting in difficult-to-debug code.
This document makes use of a Tag/Length/Value (TLV)
representation, a self-describing data format. This particular TLV
representation is described in UNIX_OBJSYM note 33.3 and will not
be explained further in this document.
1.2 Basic rules
o If a compilation is to build Extended Source Location information,
that information is to be provided for the entire compilation. It is
NOT a pick-and-choose mechanism.
o Extended Source Location information is to be contained in the
Optimization Table.
o Extended Source Location information will be produced in addition
to the existing line number table, at least initially. This is
intended to provide backward compatibility with existing tools.
Issues of object file size and compact line number representation
are to be deferred.
o Each procedure descriptor will describe its Optimization Table
contribution via the pdr iopt element as detailed below.
o File descriptor entries describing the Optimization Table are
defined in a following section.
1.3 Symbol Table Placement
The Extended Source Location information will be located in the
currently unused Optimization Table.
Compilers will construct Extended Source Location information in
the Optimization Table in addition to any other information to be
placed in that table.
All entries placed in the Optimization Table are in the form of
self-describing TLV entries.
The compilation of a module will produce an object file with a
single Optimization Table that is constructed and accessed using
the existing file and procedure mechanisms.
The linker will concatenate the Optimization Tables of all modules
contributing to an image. Note in particular that the linker need
not parse or analyze the Optimization Table information in any
way. The resulting image Optimization Table will be accessed by
post-link tools such as OM and debuggers.
1.4 Form
The Extended Source Location information is managed as a
Tag/Length/Value (TLV) element of the Optimization Table.
The fdr copt element is a count of the number of bytes a file
contributed to the Optimization table.
The fdr ioptBase is used to locate the file's first TLV entry.
The file's first TLV entry must be the same as the first TLV entry
of the first procedure (via PD table) of the file.
Each procedure descriptor iopt member will locate the Optimization
Table contribution using the standard Third Eye rules for indexing
via the procedure descriptor.
The iopt value will indicate the first TLV entry for a given routine.
Tools, including the debuggers, seeking to process the Extended Source
Location information will scan a procedure's Optimization Table TLV
elements until the Extended Source Location element is found or the
procedure's terminal TLV entry is encountered.
It is an error for a module containing Extended Source Location
information to omit producing an Extended Source Location TLV entry
for any routine. However, the error is always nonfatal.
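A rough sketch of the scan loop described above. The TLV header layout
and both tag values here are placeholders (the real format is defined
in UNIX_OBJSYM note 33.3); only the scan-until-found-or-terminal
behavior is from this proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder TLV layout: 32-bit tag, 32-bit byte length of the
 * value, then the value itself; tag 0 assumed terminal. */
struct tlv {
    unsigned tag;
    unsigned len;
};

#define TLV_END 0u
#define TLV_ESL 7u   /* placeholder Extended Source Location tag */

/* Scan a procedure's Optimization Table contribution for the ESL
 * entry.  Returns a pointer to the value bytes, or NULL if the
 * terminal entry is reached first. */
static const unsigned char *find_esl(const unsigned char *p)
{
    struct tlv t;
    for (;;) {
        memcpy(&t, p, sizeof t);   /* headers may be unaligned */
        if (t.tag == TLV_END)
            return NULL;
        if (t.tag == TLV_ESL)
            return p + sizeof t;
        p += sizeof t + t.len;     /* skip this entry's value */
    }
}
```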
1.5 Issues
The Tag value for Extended Source Location information will need to
be specified.
2.1 Detailed Extended Source Location Format
The current line number information has a very compact and very
restrictive format. We propose some changes to the current format
in order to gain some flexibility and allow for future expansion.
The current format is basically a byte stream, with each byte
being divided into two 4 bit halves. The high ordered bits are
used to indicate the line number delta, with a range of -7 to 7
lines. A bit string indicating -8, or '1000'b is used to indicate
that the line number delta is too large, and is represented in the
next 2 bytes.
The low 4 bits of each byte, except in the case above, are used to
represent an instruction delta, from 1 to 16 instructions (the
range covers from 0 to 15, but you always add 1, as each pc delta
represents at least one instruction).
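The current format can be decoded with a few lines of C; this sketch
handles a single byte and leaves the -8 escape (and its 2-byte
follow-on delta) to the caller.

```c
#include <assert.h>

/* One byte of the existing line-number format: high 4 bits are a
 * signed line delta (-7..7, with -8 escaping to a 2-byte delta),
 * low 4 bits encode a PC delta of 1..16 instructions. */
static int line_delta(unsigned char b)
{
    int d = (b >> 4) & 0xf;
    return d >= 8 ? d - 16 : d;   /* sign-extend the 4-bit field */
}

static int pc_delta(unsigned char b)
{
    return (b & 0xf) + 1;         /* 0..15 encodes 1..16 insns */
}
```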
2.2 New Proposal
In order to give us greater flexibility in dealing with line
numbers we want to separate the byte stream into two different
types of formats, which we will call data mode and command mode.
The basic format described above will now be known as data mode 1.
Since we know the initial state of things from the procedure and
file descriptors before we start looking at the line number info,
it is intended that we will always presume to start in data mode
1. There will also be a data mode 2, which will be used to
represent line, pc, and column information for programs with more
than one statement on a line. Data mode 0 is currently not
defined, and would be seen as an error.
The current 'escape' of '1000'b in the line delta information
would now become an escape to the command mode.
Once an escape is encountered, the line number algorithm will
switch to interpreting command mode, and subsequent bytes will be
treated as commands, paired with LEB128 data. LEB128 (Little-Endian
Base 128) is a compact way of expressing integer data and does not
need to be used only on little-endian machines. Each command will
be 1 byte in length, with the 2 high order bits as flags for mark
and resume, respectively. The mark bit will be used to indicate
the last command in a sequence that takes us to the next line. The
resume flag would be an indication to return to the previous data
mode. The low 6 bits then, will be used to distinguish the
various commands.
LEB128 is a byte-stream way of encoding data, taking advantage
of the fact that most of the integers will be fairly small. Integers
are split into 7-bit segments, '1111111'b, and the segments are put
into the 7 lower bits of a byte. The high order eighth bit is used
as a flag to indicate whether there are any more bytes in the
number. A 1 indicates a continuation and a 0 indicates completion.
Large numbers would thus be formed by concatenating together the 7
bit segments.
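For reference, a compilable sketch of the unsigned encoding just
described; the signed variant needed for the delta commands
additionally sign-extends from the final segment and is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Unsigned LEB128: 7 data bits per byte, low-order segment first,
 * high bit set on every byte except the last. */
static size_t uleb_encode(unsigned long v, unsigned char *out)
{
    size_t n = 0;
    do {
        unsigned char b = v & 0x7f;
        v >>= 7;
        if (v != 0)
            b |= 0x80;            /* more bytes follow */
        out[n++] = b;
    } while (v != 0);
    return n;
}

static unsigned long uleb_decode(const unsigned char *in, size_t *used)
{
    unsigned long v = 0;
    unsigned shift = 0;
    size_t n = 0;
    unsigned char b;
    do {
        b = in[n++];
        v |= (unsigned long)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);           /* continuation bit set */
    *used = n;
    return v;
}
```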
2.3 Commands
We've identified a small group of commands that we intend to start
with. These may grow as new needs are assessed in the future.
The commands, with an explanation of how the LEB data should be
interpreted follow below. No particular bit pattern has been
established yet for each command.
add_pc, LEB
signed delta, Value is added to current pc 'value' (as
determined by previous line number machinations) to
end up with a pc value that is greater or less than the
current value.
add_line, LEB
signed delta, added to current line number
set_col, LEB
absolute value, used to associate pc with a particular
section of a line, rather than just the beginning of
a line.
set_file, LEB
absolute value, used to set the current information to
a particular source file. Needed to handle include
files, which we have no way of currently representing
in the present symbol table.
set_data_mode, LEB
absolute value. Only a 1 or a 2 would be acceptable
here, for data mode 1 or data mode 2. Anything
else would be an error.
add_pc_line, LEB, LEB
Sort of a larger, more flexible representation of the
current delta/pc pair. Both signed deltas.
add_pc_line_col, LEB, LEB, LEB
To handle pc, line and column data all at once.
The first 2 are signed deltas. The third is an
absolute value.
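To illustrate the intended effect of the combined commands on the
decoder's state, here is a small sketch; the struct and function names
are mine, and the LEB operands are assumed to be already decoded.

```c
#include <assert.h>

/* Sketch of the state the command-mode decoder would carry.  The pc
 * is tracked in instructions, to match the delta commands. */
struct lnstate {
    long pc;
    long line;
    long col;
    long file;
};

/* add_pc_line: both operands are signed deltas. */
static void do_add_pc_line(struct lnstate *s, long dpc, long dline)
{
    s->pc += dpc;
    s->line += dline;
}

/* add_pc_line_col: two signed deltas, then an absolute column. */
static void do_add_pc_line_col(struct lnstate *s, long dpc, long dline,
                               long col)
{
    s->pc += dpc;
    s->line += dline;
    s->col = col;   /* column is absolute, not a delta */
}
```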
|