Ultra-fast Aliasing Analysis using CLA:
A Million Lines of C Code in a Second (1)

Nevin Heintze
Research, Agere Systems
(formerly Lucent's Microelectronics Division)
[email protected]

Olivier Tardieu
Ecole des Mines, Paris
ABSTRACT
We describe the design and implementation of a system for
very fast points-to analysis. On code bases of about a million
lines of unpreprocessed C code, our system performs field-
based Andersen-style points-to analysis in less than a second
and uses less than 10MB of memory. Our two main contri-
butions are a database-centric analysis architecture called
compile-link-analyze (CLA), and a new algorithm for imple-
menting dynamic transitive closure. Our points-to analysis
system is built into a forward data-dependence analysis tool
that is deployed within Lucent to help with consistent type
modifications to large legacy C code bases.
1. INTRODUCTION
The motivation for our work is the following software main-
tenance/development problem: given a million+ lines of C
code, and a proposed change of the form "change the type
of this object (e.g. a variable or struct field) from type1 to
type2", find all other objects whose type may need to be
changed to ensure the "type consistency" of the code base.
In particular, we wish to avoid data loss through implicit
narrowing conversions. To solve this problem, we need a
global data-dependence analysis that in effect performs a
forward data-dependence analysis (Section 2 describes this
analysis, and how it differs from other more standard de-
pendence analyses in the literature). A critical part of this
dependence analysis is an adequate treatment of pointers:
for assignments such as *p = x we need to determine what
objects p could point to. This kind of aliasing analysis is
commonly called points-to analysis in the literature [4]. The
scalability of points-to analysis has been a subject of inten-
sive study over the last few years [5, 8, 21, 11, 23]. However
the feasibility of building interactive tools that employ some
form of "sufficiently-accurate" pointer analysis on million
line code-bases is still an open question.
(1) This is a substantially revised version of [16].

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee.
PLDI 2001 6/01 Snowbird, Utah, USA
© 2001 ACM ISBN 1-58113-414-2/01/06...$5.00

The paper has two main contributions. The first is an archi-
tecture for analysis systems that utilizes ideas from indexed
databases. We call this architecture compile-link-analyze
(CLA), in analogy with the standard compilation process.
This architecture provides a substrate on which we can build
a variety of analyses (we use it to implement a number of
different algorithms for Andersen-style points-to analysis,
dependence analysis and a unification-style points-to anal-
ysis, all using a common database format for representing
programs). It scales to large code bases and supports sepa-
rate and/or parallel compilation of collections of source files.
Also, its indexing structures support rapid dynamic loading
of just those components of object files that are needed for
a specific analysis, and moreover after reading a component
we have the choice of keeping it in memory or discarding it
and re-reading it if we ever need it again (this is used to
reduce the memory footprint of an analysis). We describe
CLA in detail in Section 4, and discuss how it differs from
other approaches in the literature, such as methods where
separate files are locally analyzed in isolation and then the
individual results are combined to analyze an entire code-
base.
The second contribution is a new algorithm for implement-
ing dynamic transitive closure (DTC). Previous algorithms
in the literature for Andersen's analysis are based on a tran-
sitively closed constraint graph e.g. [4, 10, 11, 21, 23, 22].
In contrast, our algorithm is based on a pre-transitive graph
i.e. we maintain the graph in a form that is not transitively
closed. When information about a node is requested, we
must perform a graph reachability computation (as opposed
to just looking up the information at the node itself in the
case of a transitively closed constraint graph). A direct im-
plementation of the pre-transitive graph idea is impractical.
We show how two optimizations - caching of reachability
computations, and cycle elimination - yield an efficient al-
gorithm. Cycle elimination has previously been employed
in the context of a transitively closed graph and shown to re-
sult in significant improvement [11], however in that work
the cost of finding cycles is non-trivial and so completeness
of cycle detection is sacrificed in order to contain its cost.
However, in the pre-transitive setting, cycle detection is es-
sentially free during graph reachability. We describe our
algorithm in detail in Section 5.
Section 6 presents various measurements of the performance
of our system. For the Lucent code bases for which our sys-
tem is targeted, runtimes are typically less than a second
(800MHz Pentium) and space utilization is about 10MB.
These code bases are in excess of a million lines of code (un-
commented non-blank lines of source, before pre-processing).
On gimp (a publicly available code base of about 440K
lines), our system performs field-based Andersen-style points-
to analysis in about a second (800MHz Pentium) and uses
about 12MB. We also present data to illustrate the space
advantages of CLA.
2. MOTIVATING APPLICATION: DEPENDENCE ANALYSIS
Our points-to analysis system is built into a forward data-
dependence analysis tool that is deployed within Lucent to
help with consistent type modifications to large legacy C
code bases. The basic problem is as follows: suppose that
the range of values to be stored in a variable must be in-
creased to support additional system functionality. This
may require changing the type of a variable, for example
from short to int. To avoid data loss through implicit nar-
rowing conversions, any objects that take values from the
changed variable must also have their types appropriately
altered. Consider the following program fragment.
short x, y, z, *p, v, w;

y = x;
z = y + 1;
p = &v;
*p = z;
w = 1;
If the type of x is changed from short to int, then we may
also have to change the types of y, z, v and probably p, but
we do not need to change the type of w.
Given an object whose type must be changed (the target),
we wish to find all other objects that can be assigned val-
ues from the specified object. This is a forward dependence
problem, as opposed to backwards dependence used for ex-
ample in program slicing [25]. Moreover it only involves
data-dependencies, as opposed to both data-dependencies
and control-dependencies which are needed in program slic-
ing. Our analysis refines forward data-dependence analysis
to reflect the importance of a dependency for the purposes
of consistent type changes. The most important dependen-
cies are those involving assignments such as x = y and z =
y+1. On the other hand, an assignment such as z1 = !y
can be ignored, since changing the type of y has no effect
on the range of values of z1, and so the type of z1 does not
need to be changed. Assignments involving operations such
as division and multiplication are less clear. We discuss this
later in the section.
An important issue for the dependence analysis is how to
treat structs. Consider the program fragment involving
structs in Figure 1. If the target is the variable target,
then u, w and s.x are all dependent objects. If the type
of target is changed from short to int, then the types of
u, w and s.x should also be changed. To effect this change
to the type of s.x, we can either change the type of the x
field of the struct S, or we can introduce a new struct type
especially for s. The advantage of the former case is that we
make minimal changes to the program. The disadvantage is
that we also change the type of t.x, and this may not be
Table 1: Classification of operations.

    Operations                      Argument 1   Argument 2
    +, -, |, &, ^                   Strong       Strong
    *                               Weak         Weak
    /, >>, <<                       Weak         None
    unary +, -                      Strong       n/a
    (other binary operations)       None         None
    (other unary operations)        None         n/a
strictly necessary. However, in practice it is likely that if
we have to change the type of the x field of s, then we will
have to change the type of the x field of t. As a result, it
is desirable to treat objects that refer to the same field in
a uniform way. By "same field", we mean not just that the
fields have the same name, but that they are the same field
of the same struct type.
Since our ultimate use of dependence analysis is to help iden-
tify objects whose type must be changed, we are not just in-
terested in the set of dependent objects. Rather, we need to
give a user information about why one object is dependent on
another. To this end, we compute the dependence chains,
which identify paths of dependence between one object and
another. In general there are many dependence paths be-
tween a pair of objects. Moreover, some paths are more
important than others. Dependencies arising from direct
assignments such as x=y are usually the most important; de-
pendencies involving arithmetic operations x=y+l, x=y>>3,
x=42*y, x=1<<y are increasingly less important. Our metric
of importance is biased towards operations that are likely
to preserve the shape and size of input data. Table 1 out-
lines a simple strong/weak/none classification that we have
employed. Our analysis computes the most important path,
and if there are several paths of the same importance, we
compute the shortest path.
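To make the classification concrete, the following is a minimal sketch in C of how an analysis might map an operator and an argument position to a dependence strength; the enum and function names are ours (not those of the deployed tool), and the operator grouping simply mirrors the strong/weak/none rows of Table 1.

    #include <string.h>

    enum strength { STRENGTH_NONE, STRENGTH_WEAK, STRENGTH_STRONG };

    /* Strength of the dependence of the result on the given argument
       (1 or 2) of an operator, in the spirit of Table 1. */
    enum strength arg_strength(const char *op, int arg)
    {
        if (!strcmp(op, "+") || !strcmp(op, "-") ||
            !strcmp(op, "|") || !strcmp(op, "&") || !strcmp(op, "^"))
            return STRENGTH_STRONG;            /* shape- and size-preserving     */
        if (!strcmp(op, "*"))
            return STRENGTH_WEAK;              /* both arguments weak            */
        if (!strcmp(op, "/") || !strcmp(op, ">>") || !strcmp(op, "<<"))
            return arg == 1 ? STRENGTH_WEAK : STRENGTH_NONE;
        return STRENGTH_NONE;                  /* comparisons, logical ops, etc. */
    }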
Large code bases often generate many dependent objects -
typically in the range 1K-100K. To help users sift through
these dependent objects and determine if they are objects
whose type must be changed, we prioritize them according
to the importance of their underlying dependence chain. We
also provide a collection of graphic user interface tools for
browsing the tree of chains and inspecting the corresponding
source code locations. In practice, there are often too many
chains to inspect - a common scenario is that a central ob-
ject that is not relevant to a code change becomes dependent
(often due to the context- or flow-insensitivity of the underly-
ing analysis), and then everything that is dependent on this
central object also becomes dependent. We address this is-
sue with some additional domain knowledge: we allow the
user to specify "non-targets", which are objects that the
user knows are certainly not dependent on the target ob-
ject. This has proven to be a very effective mechanism for
focusing on the important dependencies.
3. ANDERSEN'S POINTS-TO ANALYSIS
We review Andersen's points-to analysis and introduce some
definitions used in the rest of the paper. In the literature,
there are two core approaches to points-to analysis, ignoring
context-sensitivity and flow-sensitivity. The first approach is
1. short target;
2. struct S { short x; short y; };
3. short u, *v, w;
4. struct S s, t;
5. v = &w;
6. u = target;
7. *v = u;
8. s.x = w;

w/short <eg1.c:3> -> u/short <eg1.c:7> -> target/short <eg1.c:6> where target/short <eg1.c:1>
target/short <eg1.c:1>
u/short <eg1.c:3> -> target/short <eg1.c:6> where target/short <eg1.c:1>
S.x/short <eg1.c:2> -> w/short <eg1.c:8> -> u/short <eg1.c:7> -> target/short <eg1.c:6> ...

Figure 1: A program fragment involving structs and its dependence results (the target is target).
unification-based [24]: an assignment such as x = y invokes
a unification of the node for x and the node for y in the
points-to graph. The algorithms for the unification-based
approach typically involve union/find and have essentially
linear-time complexity. The second approach is based on
subset relationships: an assignment such as x = y gives rise
to a subset constraint x ⊇ y between the nodes x and y in
the points-to graph [4]. The algorithms for the subset-based
approach utilize some form of subtyping system, subset con-
straints or a form of dynamic transitive closure, and have
cubic-time complexity.
The unification-based approach is faster and less accurate
[22]. There has been considerable work on improving the
performance of the subset-based approach [11, 23, 21], al-
though the performance gap is still sizable (c.f. [23, 21] and
[8]). As Das very recently observed "In spite of these efforts,
Andersen's algorithm does not yet scale to programs beyond
500KLOC." [8] There has also been work on improving the
accuracy of the unification-based approach by incorporat-
ing some of the directional features of the subset-based ap-
proach to produce a hybrid unification-based algorithm [8]:
for a small increase in analysis time (and quadratic worst-
case complexity), much of the additional accuracy of the
subset-based approach can be recovered.
A Deductive Reachability Formulation
We use a context-insensitive, flow-insensitive version of the
subset-based approach that is essentially the analysis due
to Andersen [4]. One reason for this choice is the better
accuracy of the subset-based approach over the unification-
based approach. Another reason is that users of our depen-
dence analysis system must be able to inspect the depen-
dence chains produced by our system (Section 2), and un-
derstand why they were produced. Subset-based approaches
generate easier to understand results; unification-based ap-
proaches often introduce hard to understand "backwards"
flows of information due to the use of equalities.
Previous presentations of Andersen's algorithm have used
some form of non-standard type system. Our presentation
uses a simple deductive reachability system. This style of
presenting analyses was developed by McAllester [20]. It
has also been used to describe control-flow analysis [18]. To
simplify our presentation, we consider a tiny language con-
sisting of just the operations * and &. Expressions e have
the form:

    e ::= x | *x | &x

We shall assume that nested uses of * and & are removed
by a preprocessing phase. Programs are sequences of assign-
ments of the form e1 = e2 where e1 cannot be &x.

     x --> &y
    ----------  (STAR-1)   if *x = e in P
     y --> e

     x --> &y
    ----------  (STAR-2)   if e = *x in P
     e --> y

    -----------  (ASSIGN)  if e1 = e2 in P
     e1 --> e2

     e1 --> e2    e2 --> e3
    ------------------------  (TRANS)
           e1 --> e3

Figure 2: Deduction rules for aliasing analysis.
Given some program P, we construct deduction rules as
specified in Figure 2. In the first rule, the side condition
"if *x = e in P" indicates that there is an instance of this
rule for each occurrence of an assignment of the form *x = e
in P. The side conditions in the other rules are similarly
interpreted. Intuitively, an edge e1 --> e2 indicates that any
object pointer that we can derive from e2 is also derivable
from e1. The first rule deals with expressions of the form
*x on the left-hand-sides of assignments: it states that if
there is a transition from x to &y, then add a transition
from y to e, where e is the right-hand-side of the assignment.
The second rule deals with expressions of the form *x on
the right-hand-sides of assignments: it states that if there
is a transition from x to &y, then add a transition from e
to y where e is the left-hand-side of the assignment. The
third rule adds a transition from e1 to e2 for all assignments
e1 = e2 in the program, and finally, the fourth rule is just
transitive closure. The core of our points-to analysis can
now be stated as follows: x can point to y if we can derive
x --> &y. Figure 3 contains an example program and shows
how y --> &x can be derived.
Analysis of Full C
Extending this core analysis to full C presents a number of
choices. Adding values such as integers is straightforward.
It is also easy to deal with nested uses of * and & through
the addition of new temporary variables (we remark that
int x, *y;
int **z;

z  = &y;
*z = &x;

z --> &y      (ASSIGN)
*z --> &x     (ASSIGN)
y --> &x      (from STAR-1)

Figure 3: Example program and application of deduction rules to show y --> &x.
considerable implementation effort is required to avoid in-
troducing too many temporary variables). However, treat-
ing structs and unions is more complex. One possibility
is, in effect, to ignore them: each declaration of a variable
of struct or union type is treated as an unstructured mem-
ory location and any assignment to a field is viewed as an
assignment to the entire chunk, e.g. x.f is viewed as an as-
signment to x and the field component f is ignored. We
call this the field-independent approach and examples in-
clude [10, 11, 22]. Another approach is to use a field-based
treatment of structs such as that taken by Andersen [4]. In
essence, the field-based approach collects information with
each field of each struct, and so an assignment to x.f is
viewed as an assignment to f and the base object x is ig-
nored. (Note that two fields of different structs that happen
to have the same name are treated as separate entities.) The
following code illustrates the distinction between field-based
and field-independent.
struct s { int *x; int *y; } A, B;
int z;

main () {
  int *p, *q, *r, *s;
  A.x = &z; /* field-based: assigns to "x"
             * field-independent: assigns to "A" */
  p = A.x;  /* p gets &z in both approaches */
  q = A.y;  /* field-independent: q gets &z */
  r = B.x;  /* field-based: r gets &z */
  s = B.y;  /* in neither approach does s get &z */
}
In the field-independent approach, the analysis determines
that only p and q can point to &z. In the field-based ap-
proach, only p and r can point to &z. Hence, neither of
these approaches strictly dominates the other in terms of ac-
curacy. We note that while the works [10, 11, 22] are based
on Andersen's algorithm [4], they in fact differ in their treat-
ment of structs: they are field-independent whereas Ander-
sen's algorithm is field-based (2). In Section 6, we show this
choice has significant implications in practice, especially for
large code bases. Our aliasing analysis uses the field-based
approach, in large part because our dependence analysis is
also field-based.
4. COMPILE-LINK-ANALYZE
A fundamental problem in program analysis is modular-
ity: how do we analyze large code bases consisting of many
source files? The simple approach of concatenating all of the
source files into one file does not scale beyond a few thou-
sand lines of code. Moreover, if we are to build interactive
(2) Strictly speaking, while Andersen's core algorithm is field-
based, he assumes that a pre-processing phase has dupli-
cated and renamed struct definitions so that structs whose
values cannot flow together have distinct names (see Section
2.3.3 and 4.3.1 of [4]).
tools based on an analysis, then it is important to avoid re-
parsing/reprocessing the entire code base when changes are
made to one or two files.
The most basic approach to this problem is to parse compila-
tion units down to an intermediate representation, and then
defer analysis to a hybrid link-analyze phase. For example,
at the highest level of optimization, DEC's MIPS compiler
treats the internal ucode files produced by the frontend as
"object files", and then invokes a hybrid linker (uld) on the
ucode files [9]. The uld "linker" simply concatenates the
ucode files together into a single big ucode file and then
performs analysis, optimization and code generation on this
file. The advantage of this approach is it modularizes the
parsing problem - we don't have to parse the entire program
as one unit. Also, we can avoid re-parsing of the entire code
base if one source file changes. However, it does not mod-
ularize the analysis problem - the analysis proceeds as if
presented with the whole program in one file.
One common way to modularize the analysis problem is to
analyze program components (at the level of functions or
source files), and compute summary information that cap-
tures the results of these local analyses. Such summaries
are then combined/linked together in a subsequent "global-
analysis" phase to generate results for the entire program.
This idea is analogous to the construction of principal types
in type inference systems. For example, assigning α → α
to the identity function in a simply typed language is essen-
tially a way of analyzing the identity function in a modular
way. Uses of the identity function in other code fragments
can utilize α → α as a summary of the behavior of the
identity function, thus avoiding inspection of the original
function. (Of course, full polymorphic typing goes well be-
yond simply analyzing code in a modular way, since it allows
different type instantiations for different uses of a function
- akin to context-sensitive analysis - which is beyond the
scope of the present discussion.)
This modular approach to analysis has a long history. Ac-
cording to folklore, one version of the MIPS compiler em-
ployed local analysis of separate files and then combined the
local analysis results during a "linking" phase. The idea is
also implicit in Aiken et al.'s set-constraint type systems
[3], and is much more explicit in Flanagan and Felleisen's
componential analysis for set-based analysis [12]. Recently,
the idea has also been applied to points-to analysis. Das [8]
describes a hybrid unification-based points-to analysis with
the following steps. First, each source file is parsed, and
the assignment statements therein are used to construct a
points-to graph with flow edges, which is simplified using a
propagation step. The points-to graph so computed is then
"serialized" and written to disk, along with a table that
associates symbols and functions with nodes in the graph.
The second phase reads in all of these (object) files, unifies
nodes corresponding to the same symbol or function from
different object files, and reapplies the propagation step to
obtain global points-to information. In other words, the
analysis algorithm is first applied to individual files and the
internal state of the algorithm (which in this case is a points-
to graph, and symbol information) is frozen and written to
a file. Then, all of these files are thawed, linked and the
algorithm re-applied.
This means that the object files used are specific not just to
a particular class of analysis (points-to analysis), but to a
particular analysis algorithm (hybrid unification-based anal-
ysis), and arguably even to a particular implementation of
that algorithm. The object files are designed with specific
knowledge of the internal data-structures of an implemen-
tation in such a way that the object file captures sufficient
information about the internal state of the implementation
that this state can be reconstructed at a later stage.
The CLA Model
Our approach, which we call compile-link-analyze (CLA),
also consists of a local computation, a linking and a global
analysis phase. However, it represents a different set of
tradeoffs from previous approaches, and redraws the bound-
aries of what kind of work is done in each phase. A key
difference is that the first phase simply parses source files
and extracts assignment statements - no actual analysis is
performed - and the linking phase just links together the
assignment statements. One advantage of the CLA architec-
ture is that the first two phases remain unchanged for many
different implementations of points-to analysis and even dif-
ferent kinds of analysis (we return to this point later). A
follow-on advantage is that we can justify investing resources
into optimizing the representation of the collections of as-
signments, because we can reuse this work in a number of
different analysis implementations. In particular, we have
developed a database-inspired representation of assignments
and function definitions/calls/returns. This representation
is compact and heavily indexed. The indexing allows rel-
evant assignments for a specific variable to be identified in
just one lookup step, and more generally, it supports a mode
where the assignments needed to solve a particular analysis
problem can be dynamically loaded from the database on
demand.
More concretely, CLA consists of three phases. The compile
phase parses source files, extracts assignments and function
calls/returns/definitions (in what follows we just call these
"assignments"), and writes an object file that is basically an
indexed database structure of these basic program compo-
nents. No analysis is performed yet. Complex assignments
are broken down into primitive ones by introducing tempo-
rary variables. The elements of the database, which we call
primitive assignments, involve variables and (typically) at
most one operation.
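For illustration, the hypothetical fragment below shows this kind of decomposition; the temporary names are ours, and the lowered primitive assignments appear as comments (the treatment of operations such as "+" is described further below).

    struct S { int *f; } s;
    int x, y, z, *p, **q;

    void example(void)
    {
        x   = y + z;   /* primitive: x = y ("+") and x = z ("+")               */
        s.f = *q;      /* field-based: tmp1 = *q;  f = tmp1                    */
                       /* (f is the database variable for field f of struct S) */
        *p  = **q;     /* nested deref: tmp2 = *q;  tmp3 = *tmp2;  *p = tmp3   */
    }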
The link phase merges all of the database files into one
database, using the linking information present in the ob-
ject files to link global symbols (the same global symbol
may be referenced in many files). During this process we
must recompute indexing information. The "executable" file
produced has the same format as the object files, although
its linking information is typically obsolete (and could be
stripped).
The analyze phase performs the actual analysis: the linked
object file is dynamically loaded on demand into the running
analysis. Importantly, only those parts of the object file
that are required are loaded. An additional benefit of the
indexing structure of the object file is that when we have
read information from the object file we can simply discard
it and re-load it later if necessary (memory-mapped I/O
is used to support efficient reading and re-reading of the
object file). We use this feature in our implementation of
Andersen's analysis to greatly reduce the memory footprint
of the analysis. It allows us to maintain only a very small
portion of the object file in memory.
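As a sketch of how this load-and-discard style of access can be supported (our own illustration, not the actual object-file reader), the analyzer can map the linked object file once and then treat any section or block as a pointer into the mapping, re-reading it at will without explicit buffering:

    #include <stddef.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map the linked object file read-only; blocks are then read (and
       re-read) simply by following offsets into the mapping. */
    static const char *map_object_file(const char *path, size_t *len)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return NULL;
        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return NULL; }
        void *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                       /* the mapping persists after close  */
        if (base == MAP_FAILED) return NULL;
        *len = st.st_size;
        return (const char *)base;       /* a section at offset o is base + o */
    }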
An example source file and a partial sketch of its object file
representation is given in Figure 4. These object files consist
of a header section which provides an index to the remaining
sections, followed by sections containing linking information,
primitive assignments (including information about function
calls/returns/definitions) and string information, as well as
indexing information for identifying targets for the depen-
dence analysis. The primitive assignments are contained in
the dynamic section; it consists of a list of blocks, one for
each object in the source program. Each block consists of
information about the object (its name, type, source code
location and other attributes), followed by a list of prim-
itive assignments where this object is the source. For ex-
ample, the block for z contains two primitive assignments,
corresponding to the second and third assignments in the
program (a very rough intuition is that whenever z changes,
the primitive assignments in the block for z tell us what we
must recompute).
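A rough sketch in C of the layout just described (the field names and widths are ours, not the actual on-disk format): a header of section offsets, followed in the dynamic section by one block per object.

    #include <stdint.h>

    /* Header: offsets and sizes of the remaining sections. */
    struct cla_header {
        uint32_t global_off, global_size;    /* linking information           */
        uint32_t static_off, static_size;    /* address-of operations         */
        uint32_t string_off, string_size;    /* common strings                */
        uint32_t target_off, target_size;    /* hashtable for finding targets */
        uint32_t dynamic_off, dynamic_size;  /* per-object blocks             */
    };

    /* One block in the dynamic section: an object followed by the
       primitive assignments in which that object is the source. */
    struct cla_block {
        uint32_t name_off;        /* offset into the string section    */
        uint32_t type_and_attrs;  /* type, source location, attributes */
        uint32_t n_assignments;
        /* followed by n_assignments encoded primitive assignments     */
    };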
As mentioned before, one of the goals of our work is to build
infrastructure that can be used for a variety of different anal-
ysis implementations as well as different kinds of analysis.
We have used our CLA infrastructure for a number of differ-
ent subset-based points-to analysis implementations (includ-
ing an implementation based on bit-vectors, as well as many
variations of the graph-based points-to algorithm described
later in this paper), and field-independent and field-based
points-to analysis. The key point is that our object files
do not depend on the internals of our implementation and
so we can freely change the implementation details without
changing the object file format. We have also used CLA
infrastructure for implementing unification-based points-to
analysis, and for the dependence analysis described in Sec-
tion 2. Finally, we note that we can write pre-analysis op-
timizers as database to database transformers. In fact, we
have experimented with context-sensitive analysis by writ-
ing a transformation that reads in databases and simulates
context-sensitivity by controlled duplication of primitive as-
signments in the database - this requires no changes to code
in the compile, link or analyze components of our system.
We now briefly sketch how the dependence and points-to
analyses use object files. Returning to Figure 4, consider
performing points-to analysis. The starting point for points-
to analysis is primitive assignments such as q = &y in the
static section. Such an assignment says that y should be
added to the points-to set for q. This means that the points-
to set for q is now non-empty, and so we must load all prim-
itive assignments where q is the source. In this case, we
load p = q, which imposes the constraint p ⊇ q. This is
all we need to load for points-to analysis in this case. Now
consider a dependence analysis. Suppose that the target of
the dependence analysis is the variable z. We first look up
"z" in the hashtable in the target section to find all vari-
ables in the object file whose name is "z" (strictly speaking,
we find the object file offsets of all such variables). In this
case we find just one variable. We build a data-structure to
file a.c:

    int x, y, z, *p, *q;
    x = y;
    x = z;
    *p = z;
    p = q;
    q = &y;
    x = *p;

header section:  segment offsets and sizes
global section:  linking information
static section:  address-of operations; always loaded for points-to analysis
    q = &y
string section:  common strings
target section:  hashtable for finding targets
dynamic section: elements are loaded on demand, organized by object
    x @ a.c:1    none
    y @ a.c:1    x = y @ a.c:2
    z @ a.c:1    x = z @ a.c:3
                 *p = z @ a.c:4
    p @ a.c:1    x = *p @ a.c:7
    q @ a.c:1    p = q @ a.c:5

Figure 4: Example program and sketch of its object file
say that this variable is a target of the dependence analysis.
We then load the block for z, which contains the primitive
assignments x = z and *p = z. Using the first assignment,
we build a data-structure for x and then we load the block
for x, which is empty. Using the second assignment, we find
from the points-to analysis that p can point to &y, and so
we build a data-structure for y and load the block for y, etc.
In the end, we find that both x and y depend on z.
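The walkthrough above amounts to a traversal over blocks that are loaded on demand. The sketch below captures the idea; the extern functions stand in for the real object-file interface and are assumptions of this illustration.

    /* Hypothetical demand-driven dependence traversal: starting from the
       target, load each reached object's block and follow its assignments. */
    typedef struct object object_t;

    extern object_t *lookup_target(const char *name);    /* target section   */
    extern void      load_block(object_t *obj);          /* dynamic section  */
    extern int       num_dependents(object_t *obj);      /* after loading    */
    extern object_t *dependent(object_t *obj, int i);    /* i-th dependent   */
    extern int       mark_dependent(object_t *obj);      /* 0 if seen before */

    static void visit(object_t *obj)
    {
        if (!mark_dependent(obj))
            return;                        /* already a known dependent         */
        load_block(obj);                   /* demand-load its assignments       */
        for (int i = 0; i < num_dependents(obj); i++)
            visit(dependent(obj, i));      /* e.g. x for the assignment x = obj */
    }

    void dependence_analysis(const char *target_name)
    {
        visit(lookup_target(target_name));
    }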
The compilation phase we have implemented includes more
information in object files than we have sketched here. Our
object files record information about the strength of depen-
dencies (see Section 2), and also information about any oper-
ations involved in assignments. For example, corresponding
to a program assignment x = y + z, we obtain two prim-
itive assignments x = y and x = z in the database. Each
would retain information about the "+" operation. Such
information is critical for printing out informative depen-
dence chains; it is also useful for other kinds of analysis that
need to know about the underlying operations. We include
sections that record information about constants in the pro-
gram. To support advanced searches and experiment with
context-sensitive analysis, we also include information for
each local variable that identifies the function in which it
is defined. We conjecture that our object file format can
be used (or easily adapted) for any flow-insensitive analysis
that computes properties about the values of variables i.e.
any analysis that focuses entirely on the assignments of the
program, and ignores control constructs. Examples include
points-to analysis, dependence analysis, constant propaga-
tion, binding-time analysis and many variations of set-based
analysis. One advantage of organizing object files using sec-
tions (much like COFF/ELF), is that new sections can be
transparently added to object files in such a way that exist-
ing analysis systems do not need to be rewritten.
We conclude with a discussion of functions and function
pointers. Functions are handled by introducing standard-
ized names for function arguments and returns. For exam-
ple, corresponding to a function definition int f(x, y) {
... return(z) }, we generate primitive assignments x = f1,
y = f2, fret = z, where f1, f2, fret are respectively the
standardized variables for the two arguments of f and f's
return value. Similarly, corresponding to a call of the form
w = f(e1, e2), we generate primitive assignments f1 = e1,
f2 = e2 and w = fret. These standardized names are treated
as global objects, and are linked together, like other global
objects, by the linker. The treatment of indirect function
calls uses the same naming convention, however some of the
linking of formal and actual parameters happens at analysis
time. Specifically, corresponding to a function definition for
g, there is an object file entry (in the block for g) that records
the argument and return variables for g. Corresponding to
an indirect call (*f)(x, y), we mark f as a function pointer
as well as adding the primitive assignments f1 = x, f2 = y,
etc. During analysis, if a function g is added to the points-to
set for f (marked as a function pointer), then we load the
record of argument and return variables for both f and g.
Using this information, we add new assignments g1 = f1,
g2 = f2 and fret = gret.
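As a concrete illustration of this naming convention, the fragment below would give rise to the primitive assignments shown in the comments; this is a sketch, with g1, g2, gret, fp1, fp2 and fpret denoting the standardized variables in the scheme just described.

    int g(int a, int b) { return a; }   /* definition:  a = g1;  b = g2;  gret = a      */

    int (*fp)(int, int);                /* fp is marked as a function pointer           */
    int r, n, m;

    void caller(void)
    {
        fp = g;                         /* fp = &g                                       */
        r  = (*fp)(n, m);               /* indirect call:  fp1 = n;  fp2 = m;  r = fpret */
    }
    /* During analysis, once g is added to the points-to set of fp, the analyzer
       loads the argument/return record for g and adds g1 = fp1, g2 = fp2 and
       fpret = gret. */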
5. A GRAPH-BASED ALGORITHM FOR AN-
DERSEN'S ANALYSIS
Scalability of Andersen's context-insensitive flow-insensitive
points-to analysis has been a subject of much research over
the last five years. One problem with Andersen's analysis is
the "join-point" effect of context-insensitive flow-insensitive
analysis: results from different execution paths can be joined
together and distributed to the points-to sets of many vari-
ables. As a result, the points-to sets computed by the anal-
ysis can be of size O(n), where n is the size of the program;
such growth is commonly encountered in large benchmarks.
This can spell scalability disaster if all points-to sets are
explicitly enumerated.
Aiken et al. have addressed a variety of scaling issues for
Andersen's analysis in a series of papers. Their work has
included techniques for elimination of cycles in the inclusion
graph [11], and projection merging to reduce redundancies
in the inclusion graph [23]. All of these are in the context of
a transitive-closure based algorithm, and their results show
very substantial improvements over their base algorithm -
with all optimizations enabled, they report analysis times of
1500s for the gimp benchmark on a SPARC Enterprise 5000
with 2GB [23].
Alternatively, context- and flow-sensitivity can be used to
reduce the effect of join-points. However the cost of these ad-
ditional mechanisms can be large, and without other break-
throughs, they are unlikely to scale to millions of lines of
code. Also, recent results suggest that this approach may
be of little benefit for Andersen's analysis [13].
In principle, ideas from sub-transitive control-flow analysis
[18] could also be applied to avoid propagation of the in-
formation from join-points. The basic idea of sub-transitive
control-flow analysis is that the usual dynamic transitive
closure formulation of control-flow analysis is redesigned so
that the dynamic edge-adding rules are de-coupled from the
transitive closure rules. This approach can lead to linear-
time algorithms. However, it is currently only effective on
bounded-type programs, an unreasonable restriction for C.
The main focus of our algorithm, much like that of the sub-
transitive approach, is on finding a way to avoid the cost of
computing the full transitive closure of the inclusion graph.
We begin by classifying the assignments used in the deduc-
tive reachability system given in Section 3 into three classes:
(a) simple assignments, which have the form x = y, (b) base
assignments, which have the form x = &y, and (c) complex
assignments, which have the form x = *y or *x = y. For
simplicity, we omit treatment of *x = *y; it can be split into
*x = z and z = *y, where z is a new variable. In what
follows we refer to items of the form &y as lvals.
The central data-structure of our algorithm is a graph G,
which initially contains all information about the simple as-
signments and base assignments in the program. The nodes
of G are constructed as follows: for each variable x in the
program, we introduce nodes n_x and n_*x (strictly speaking,
we only need a node n_*x if there is a complex assignment
of the form y = *x). The initial edges of G are constructed
as follows: corresponding to each simple assignment x = y,
there is an edge n_x --> n_y from n_x to n_y. Corresponding to
every node n in G, there is a set of base elements, defined
as follows:

    baseElements(n_x) = {y : x = &y appears in P}

The complex assignments, which are not represented in G,
are collected into a set C. The algorithm proceeds by it-
erating through the complex assignments in C and adding
edges to G based on the information currently in G. At any
point in the algorithm, G represents what we explicitly know
about the sets of lvals for each program variable. A major
departure from previous work is that G is maintained in pre-
transitive form, i.e. we do not transitively close the graph.
As a result, whenever we need to determine the current lvals
of a specific variable, we must perform graph reachability:
to find the set of lvals for variable x, we find the set of nodes
reachable from n_x in zero or more steps, and compute the
union of the baseElements sets for all of these nodes. We use
the function getLvals(n_x) to denote this graph reachability
computation for node n_x.
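A minimal sketch of the per-node state this implies is given below; the field names are ours, and the real implementation also carries indexing and caching state.

    struct node;

    /* Shared, ordered set of lvals (the &y items). */
    struct lval_set {
        int            n;
        struct node  **elems;               /* the node n_y for each &y                */
    };

    struct node {
        struct node     *skip;              /* non-NULL once unified into another node */
        unsigned         onPath:1;          /* cycle detection during getLvals()       */
        struct lval_set *baseElements;      /* {y : x = &y appears in P}               */
        int              nsucc;             /* outgoing edges n_x --> n_y              */
        struct node    **succ;
    };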
The process of iterating through the complex assignments in
C and adding edges to G based on the information currently
in G is detailed in Figure 5. Note that line 7 need only be
executed once, rather than once for each iteration of the
loop.
Before discussing the getLvals() function, we give some in-
tuition on the computational tradeoffs involved in maintain-
ing the constraint graph in pre-transitive form and comput-
ing lvals on demand. First, during its execution, the algo-
rithm only requires the computation of lvals for some subset
of the nodes in the graph. Now, of course, at the end of the
algorithm, we may still have to compute all lvals for all graph
nodes. However, in the presence of cycle-elimination (dis-
cussed shortly), it is typically much cheaper to compute all
lvals for all nodes when the algorithm terminates than it is
to do so during execution of the algorithm. Second, the pre-
transitive algorithm trades off traversal of edges versus flow
of lvals along edges. More concretely, consider a complex
assignment such as *x = y, and suppose that the set of lvals
for x includes &x1 and &x2. As a result of this complex
assignment, we add edges from x1 to y and x2 to y. Now,
in an algorithm based on transitive-closure, all lvals asso-
ciated with y will flow back along the new edges inserted
and from there back along any paths that end with x1 or
x2. In the pre-transitive graph, the edges are added and
there is no flow of lvals. Instead, lvals are collected (when
needed) by a traversal of edges. In the transitive closure
case, there are O(n·E) transitive closure steps, where n is
the average number of distinct lvals that flow along an edge,
and E is the number of edges, versus O(E) steps per reach-
ability computation. This tradeoff favors the pre-transitive
graph approach when E is small and n is large. (We re-
mark that this analysis is for intuition only; it is not a
formal analysis, since neither the transitive closure step nor
the reachability step are O(1) operations.)
We next describe getLvals(), which is the graph reachabil-
ity component of the algorithm. A key part of the graph
reachability algorithm is the treatment of cycles. Not only
is cycle detection important for termination, but it has fun-
damental performance implications. The first argument of
getLvals() is a graph node and the second is a list of ele-
ments that define the path we are currently exploring; top-
level calls have the form getLvals(n, nil). Each node in G
has a one-bit field onPath.
The function unifyNodes() merges two nodes. We imple-
ment node merging by introducing an optional skip field
for each node. Two nodes n1 and n2 are then unified by
setting skip(n1) to n2, and merging edge and baseElement
information from n1 into n2. Subsequently, whenever node
n1 is accessed, we follow its skip pointer. We use an incre-
mental algorithm for updating graph edges to skip-nodes to
their de-skipped counterparts.
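The following small sketch shows node merging with skip pointers, assuming a node structure like the one sketched earlier; the incremental rewriting of edges and the merging of baseElements are hidden behind an assumed helper.

    struct node {
        struct node *skip;    /* other fields as in the earlier node sketch */
    };

    /* Assumed helper that merges edge and baseElement information. */
    extern void merge_edges_and_base_elements(struct node *from, struct node *into);

    /* Follow skip pointers to the representative node, compressing as we go
       (much like union-find path compression). */
    struct node *deskip(struct node *n)
    {
        while (n->skip) {
            if (n->skip->skip)
                n->skip = n->skip->skip;     /* path compression */
            n = n->skip;
        }
        return n;
    }

    /* Unify n1 into n2: n2 becomes the representative. */
    void unifyNodes(struct node *n1, struct node *n2)
    {
        n1 = deskip(n1);
        n2 = deskip(n2);
        if (n1 == n2) return;                /* already unified  */
        n1->skip = n2;
        merge_edges_and_base_elements(n1, n2);
    }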
Cycle elimination was first used for points-to analysis by
Fähndrich et al. [11]. In their work, the cost of finding cy-
cles was non-trivial and so completeness of cycle detection
was sacrificed in order to contain its cost. In contrast, cy-
cle detection is essentially free in our setting during graph
reachability computations. Moreover, we find almost all cy-
cles - more precisely, we find all cycles in the parts of the
graphs we traverse during graph reachability. In essence,
we find the costly cycles - those that are not detected are
in parts of the graph that we ignore. In other words, one
of the benefits of our algorithm is that it finds more of the
important cycles and it does so more cheaply.
/* The Iteration Algorithm */
1.  do {
2.    nochange = true;
3.    for each complex assignment *x = y in C
4.      for each &z in getLvals(n_x)
5.        add an edge n_z --> n_y to G;
6.    for each complex assignment x = *y in C
7.      add an edge n_x --> n_*y;
8.      for each &z in getLvals(n_y)
9.        add an edge n_*y --> n_z
10. } until nochange

getLvals(n, path)
{
  if (onPath(n)) {
    /* we have a cycle */
    foreach n' in path, unifyNodes(n', n);
    return(emptySet);
  } else {                /* explore new node n */
    onPath(n) = 1;
    lvals = emptySet;
    path = cons(n, path);
    lvals = union(lvals, baseElements(n));
    foreach n' such that there is an edge from n to n'
      lvals = union(lvals, getLvals(n', path));
    onPath(n) = 0;
    return(lvals);
  }
}

Figure 5: The Pre-Transitive Graph Algorithm for Points-to Analysis.
This completes the basic description of our algorithm. We
conclude with a number of enhancements to this basic al-
gorithm. First, and most important, is a caching of reach-
ability computations. Each call to getLvals() first checks to
see if the lvals have been computed for the node during the
current iteration of the iteration algorithm; if so, then the
previous lvals are returned, and if not, then they are recom-
puted (and stored in the node). Note that this means we
might use "stale" information; however if the information
is indeed stale, the nochange flag in the iteration algorithm
will ensure we compute another iteration of the algorithm
using fresh information. Second, the graph edges are main-
tained in both a hash table and a per-node list so that it is
fast to determine whether an edge has been previously added
and also to iterate through all of the outgoing edges from a
node. Third, since many lval sets are identical, a mechanism
is implemented to share common lval sets. Such sets are im-
plemented as ordered lists, and are linked into a hash table,
based on set size. When a new lval set is created, we check
to see if it has been previously created. This table is flushed
at the beginning of each pass through the complex assign-
ments. Fourth, lines 4-5 and lines 8-9 of Figure 5 are changed
so that instead of iterating over all lvals in getLvals(n_x), we
iterate over all nodes in getLvalsNodes(n_x). Conceptually,
the function getLvalsNodes() returns all of the de-skipped
nodes corresponding to the lvals computed by getLvals();
however it can be implemented more efficiently.
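For illustration, the caching of reachability computations can be as simple as stamping each node with the iteration in which its lvals were last computed; the field and function names below are assumptions of this sketch.

    struct lval_set;

    struct node {
        int              lvals_iteration;    /* iteration of the cached result */
        struct lval_set *cached_lvals;
        /* ... skip, onPath, edges and baseElements as sketched earlier ...    */
    };

    extern int current_iteration;                          /* bumped by the iteration algorithm */
    extern struct lval_set *compute_lvals(struct node *n); /* the reachability walk             */

    struct lval_set *getLvalsCached(struct node *n)
    {
        if (n->lvals_iteration != current_iteration) {
            n->cached_lvals    = compute_lvals(n);    /* may use stale edges; the nochange */
            n->lvals_iteration = current_iteration;   /* flag forces another iteration     */
        }
        return n->cached_lvals;
    }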
From the viewpoint of performance, the two most signif-
icant elements of our algorithm are cycle elimination and
the caching of reachability computations. We have observed
a slowdown by a factor in excess of 50,000 for gimp (45,000s
vs. 0.8s user time) when both of these components of the
algorithm are turned off.
6. RESULTS
Our analysis system is implemented using a mix of ML and
C. The compile phase is implemented in ML using the ckit
frontend [6]. The linker and the analyzer are implemented in
C. Our implementation deals with full C including structs,
unions, arrays and function call/return (including indirect
calls). Support for many of these features is based on simple
syntactic transformations in the compile phase. The field-
based treatment of structs is implemented as follows: we
generate a new variable for each field f of a struct defini-
tion, and then map each access of that field to the variable.
Our treatment of arrays is index-independent (we essentially
ignore the index component of subscript expressions). The bench-
marks we use are described in Table 2. The first six bench-
marks were obtained from the authors of [21], and the lines
of code reported for these are the number of lines of non-
blank, non-# lines in each case. We do not currently have
accurate source line counts for these benchmarks. The sev-
enth benchmark was obtained from the authors of [23]. The
last benchmark is the Lucent code base that is the main tar-
get of our system (for proprietary reasons, we have not in-
cluded all information on this benchmark). For each bench-
mark, we also measure the size of the preprocessed code in
bytes, the size of the object files produced by the analy-
sis (compiler + linker, also in bytes), and the number of
primitive assignments in the object files - the five kinds of
assignments allowed in our intermediate language.
We remark that line counts are only a very rough guide
to program size. Source code is misleading for many rea-
sons. For instance, macro expansion can considerably in-
crease the amount of work that must be performed by an
analysis. Preprocessed code is misleading because many ex-
traneous extern declarations are included as the result of
generic system include files. Moreover, these system include
files can vary considerably in size from system to system.
AST node counts of preprocessed code are a better mea-
sure of complexity because they de-emphasize the effects of
coding style; however there is no agreed upon notion of AST
nodes, and AST nodes might still be inflated by unnecessary
declarations generated by rampant include files. Counts of
primitive assignments may be a more robust measure.
Results from these benchmarks are included in Table 3.
These results measure analysis where (a) each static oc-
currence of a memory allocation primitive (malloc, calloc,
etc.) is treated as a fresh location, and (b) we ignore con-
stant strings. This is the default setup we use for points-to
and dependence analysis. The first column of Table 3 rep-
resents the count of program objects (variables and fields)
for which we compute non-empty pointer sets; it does not
include any temporary variables introduced by the analysis.
The second column represents the total sizes of the points-to
sets for all program objects. The third and fourth columns
give wall-clock time and user time in seconds respectively,
as reported by /bin/time using a single processor of a two-
processor Pentium 800MHz machine with 2GB of memory
running Linux (3). The fifth column represents space utiliza-

(3) Red Hat Linux release 6.2 (Piglet), VA Linux release 6.2.3 07/28/00 bl.1 P2,
Kernel 2.2.14-VA.5.1smp on a 2-processor i686.
Table 2: Benchmarks.

            LOC       LOC         preproc.  object   program    primitive assignments
            (source)  (preproc.)  size      size     variables  x = y    x = &y   *x = y   x = *y   other
nethack     -         44.1K       1.4MB     0.7MB    3856       9118     1115     30       34       -
burlap      -         74.6K       2.4MB     1.4MB    6859       14202    1049     1160     714      1897
vortex      -         170.3K      7.7MB     2.6MB    11395      24218    7458     353      231      1866
emacs       -         93.5K       40.2MB    2.6MB    12587      31345    3461     614      154      1029
povray      -         175.5K      68.1MB    3.1MB    12570      29565    4009     2431     1190     3085
gcc         -         199.8K      69.0MB    4.4MB    18749      62556    3434     1673     585      1467
gimp        440K      7486.7K     201.6MB   27.2MB   131552     303810   25578    5943     2397     6428
lucent      1.3M      -           -         20.1MB   96509      270148   72355    1562     991      3989
tion in MB, obtained by summing static data and text sizes
(reported by /bin/size), and dynamic allocation (as re-
ported by malloc_stats()). We note that for the lucent
benchmark - the target code base of our system - we see
total wall-clock times of about half a second, and space uti-
lization of under 10MB.
The last three columns explain why the space utilizations
are so low: these columns respectively show the number
of primitive assignments maintained "in-core", the number
loaded during the analysis, and the total number of primitive
assignments in the object file. Note that only primitive as-
signments relevant to aliasing analysis are loaded (e.g. non-
pointer arithmetic assignments are usually ignored). Recall
that once we have loaded a primitive assignment from an ob-
ject file and used it, we can discard it, or keep it in memory
for future use. Our discard strategy is: discard assignments
x = y and x = &y, but maintain all others. These num-
bers demonstrate the effectiveness of the load-on-demand
and load-and-throw-away strategies supported by the CLA
architecture.
Table 4 studies the effect of changing the baseline system.
The first group of columns represents the baseline and is
just a repeat of information from Table 3. The second
group shows the effect of changing the underlying treat-
ment of structs from field-based to field-independent. We
caution that these results are very preliminary, and should
not be interpreted as a conclusive comparison of field-based
and field-independent. In particular, there are a number of
opportunities for optimization that appear to be especially
important in the field-independent case that have not been
implemented in our current system. We expect that these op-
timizations could significantly close the time and space gap
between the two approaches. However, it is clear that the
choice between field-based and field-independent has signifi-
cant implications in practice. Most points-to systems in the
literature use the field-independent approach. Our results
suggest that the field-based approach might in fact represent a better
tradeoff. The question of the relative accuracy of the two
approaches is open - even the metric for measuring their
relative accuracy is open to debate.
We conclude by briefly discussing empirical results from re-
lated systems in the literature. Since early implementations
of Andersen's analysis [22], much progress has been made
[11, 23, 21]. Currently, the best results for Andersen's are
analysis times of about 430 seconds for about 500K lines of
code (using a single 195 MHz processor on a multi-processor
SGI Origin machine with 1.5GB) [21]. The main limiting
factor in these results is that space utilization (as measured
by the amount of live data after GC) is 150MB and up - in
fact the largest benchmark in [21] ran out of memory. Re-
sults from [23] report analysis times of 1500s for gimp (on
a SPARC Enterprise 5000 with 2GB). We note that both
of these implementations of Andersen's analysis employ a
field-independent treatment of structs, and so these results
are not directly comparable to ours (see the caveats above
about the preliminary nature of results in Table 4).
Implementations of Steensgaard's algorithm are faster and
use less memory. Das reports that Word97 (about 2.2 mil-
lion lines of code) runs in about 60s on a 450MHz Intel Xeon
running Windows NT [8]. Das also reports that modifica-
tions to Steensgaard's algorithm to improve accuracy yield
analysis times of about 130s, and memory usage of "less
than 200MB" for the same benchmark. We again note that
Das uses a field-independent treatment of structs.
7. CONCLUSION
We have introduced CLA, a database-centric analysis archi-
tecture, and described how we have utilized it to implement
a variety of high-performance analysis systems for points-
to analysis and dependence analysis. Central to the per-
formance of these systems are CLA's indexing schemes and
support for demand-driven loading of database components.
We have also described a new algorithm for implementing
dynamic transitive closure that is based on maintaining a
pre-transitive graph, and computing reachability on demand
using caching and cycle elimination techniques.
The original motivation for this work was dependence anal-
ysis to help identify potential narrowing bugs that may be
introduced during type modifications to large legacy C code
bases. The points-to analysis system described in this paper
has been built into a forward data-dependence analysis tool
that is deployed within Lucent. Our system has uncovered
many serious new errors not found by code inspections and
other tools.
Future work includes exploration of context-sensitivity, and
a more accurate treatment of structs that goes beyond field-
based and field-independent (e.g. modeling of the layout of
C structs in memory[7], so that an expression x.f is treated
as an offset "f" from some base object x).
Table 3: Results.

            pointer    points-to  wall    user    process   assignments
            variables  relations  time    time    size      in core   loaded    in file
nethack     1018       7K         0.03s   0.01s   5.2MB     114       5933      10402
burlap      3332       201K       0.08s   0.03s   5.4MB     3201      12907     19022
vortex      4359       392K       0.15s   0.11s   5.7MB     1792      15411     34126
emacs       8246       11232K     0.54s   0.51s   6.0MB     1560      28445     36603
povray      6126       141K       0.11s   0.09s   5.7MB     5886      27566     40280
gcc         11289      123K       0.20s   0.17s   6.0MB     2732      53805     69715
gimp        45091      15298K     1.05s   1.00s   12.1MB    8377      144534    344156
lucent      22360      3865K      0.46s   0.38s   8.8MB     4281      101856    349045

Table 4: Effect of a field-independent treatment of structs.

            field-based                            field-independent (preliminary)
            pointers  relations  utime   size      pointers  relations  utime     size
nethack     1018      7K         0.01s   5.2MB     1714      97K        0.03s     5.2MB
burlap      3332      201K       0.03s   5.4MB     2903      323K       0.21s     5.9MB
vortex      4359      392K       0.11s   5.7MB     4655      164K       0.09s     5.7MB
emacs       8246      11232K     0.51s   6.0MB     8314      14643K     1.05s     6.7MB
povray      6126      141K       0.09s   5.7MB     5759      1375K      0.39s     6.6MB
gcc         11289     123K       0.17s   6.0MB     10984     408K       0.65s     8.8MB
gimp        45091     15298K     1.00s   12.1MB    39888     79603K     30.12s    18.1MB
lucent      22360     3865K      0.46s   8.8MB     26085     19665K     137.20s   59.0MB

Acknowledgements: Thanks to Satish Chandra and Jeff Foster for access to
their respective systems and benchmarks.
8. REFERENCES
[1] A. Aiken, M. Fähndrich, J. Foster, and Z. Su, "A Toolkit for Constructing Type- and Constraint-Based Program Analyses", TIC, 1998.
[2] A. Aiken and E. Wimmers, "Solving Systems of Set Constraints", LICS, 1992.
[3] A. Aiken and E. Wimmers, "Type Inclusion Constraints and Type Inference", ICFP, 1993.
[4] L. Andersen, "Program Analysis and Specialization for the C Programming Language", PhD thesis, DIKU report 94/19, 1994.
[5] D. Atkinson and W. Griswold, "Effective Whole-Program Analysis in the Presence of Pointers", 1998 Symp. on the Foundations of Soft. Eng.
[6] S. Chandra, N. Heintze, D. MacQueen, D. Oliva and M. Siff, "ckit: an extensible C frontend in ML", to be released as an SML/NJ library.
[7] S. Chandra and T. Reps, "Physical Type Checking for C", PASTE, 1999.
[8] M. Das, "Unification-Based Pointer Analysis with Directional Assignments", PLDI, 2000.
[9] "Appendix D: Optimizing Techniques (MIPS-Based C Compiler)", Programmer's Guide: Digital UNIX Version 4.0, Digital Equipment Corporation, March 1996.
[10] J. Foster, M. Fähndrich and A. Aiken, "Flow-Insensitive Points-to Analysis with Term and Set Constraints", U. of California, Berkeley, UCB//CSD-97-964, 1997.
[11] M. Fähndrich, J. Foster, Z. Su and A. Aiken, "Partial Online Cycle Elimination in Inclusion Constraint Graphs", PLDI, 1998.
[12] C. Flanagan and M. Felleisen, "Componential Set-Based Analysis", PLDI, 1997.
[13] J. Foster, M. Fähndrich and A. Aiken, "Polymorphic versus Monomorphic Flow-insensitive Points-to Analysis for C", SAS, 2000.
[14] N. Heintze, "Set Based Program Analysis", PhD thesis, Carnegie Mellon University, 1992.
[15] N. Heintze, "Set-Based Analysis of ML Programs", LFP, 1994.
[16] N. Heintze, "Analysis of Large Code Bases: The Compile-Link-Analyze Model", unpublished report, November 1999.
[17] N. Heintze and J. Jaffar, "A decision procedure for a class of Herbrand set constraints", LICS, 1990.
[18] N. Heintze and D. McAllester, "On the Cubic-Bottleneck of Subtyping and Flow Analysis", LICS, 1997.
[19] "Programming Languages - C", ISO/IEC 9899:1990, International Standard, 1990.
[20] D. McAllester, "On the Complexity Analysis of Static Analysis", SAS, 1999.
[21] A. Rountev and S. Chandra, "Off-line Variable Substitution for Scaling Points-to Analysis", PLDI, 2000.
[22] M. Shapiro and S. Horwitz, "Fast and Accurate Flow-Insensitive Points-To Analysis", POPL, 1997.
[23] Z. Su, M. Fähndrich, and A. Aiken, "Projection Merging: Reducing Redundancies in Inclusion Constraint Graphs", POPL, 2000.
[24] B. Steensgaard, "Points-to Analysis in Almost Linear Time", POPL, 1996.
[25] F. Tip, "Generation of Program Analysis Tools", Institute for Logic Language and Computation dissertation series 1995-5, 1995.