Jump to content

Draft:Context-free language reachability

From Wikipedia, the free encyclopedia

Context-free language reachability is an algorithmic problem with applications in static program analysis. Given a graph with edge labels from some alphabet and a context-free grammar over that alphabet, the problem is to determine whether there exists a path through the graph such that the concatenation of the labels along the path is a string accepted by the grammar.

Variations

[edit]

In addition to the decision problem formulation given above, there are several related function problem formulations of CFL-reachability. For brevity, define an L-path to be a path with edge labels in the language of the grammar. Then:[1]

  • The all-pairs variant is to determine all pairs of nodes such that there exists an L-path between them.
  • The single-source variant is to determine all nodes that are reachable from a given source node via an L-path.
  • The single-target variant is to determine all nodes that are the sources of L-paths that end at a given target node.
  • The single-source/single-target variant is to determine whether there is an L-path between two given nodes.

Algorithms

[edit]

There is a relatively simple dynamic programming algorithm for solving all-pairs CFL-reachability. The algorithm requires a normalized grammar, where each production has at most two symbols (terminals or nonterminals) on the right-hand side. The runtime of this algorithm , where is the number of terminals and nonterminals in the normalized grammar (which is linear with respect to the original grammar), and is the number of nodes in the graph.[2] The algorithm works by repeatedly adding summary edges to the graph: given a production , if there exists an edge between some nodes x and y labeled with B and an edge between y and z labeled C, then the algorithm adds a new edge labeled A between x and z. This process is repeated until saturation, i.e., until no more summary edges may be added.

Applications to program analysis

[edit]

Several problems in program analysis can be formulated as CFL-reachability problems, including:[3]

Alias analysis

[edit]

Consider an imperative language with pointers, like a simplified C. The program expression graph (PEG) for a program in such a language has a node for each expression in the program, and two kinds of edges:

  • A pointer dereference edge labeled d from each pointer dereference expression *e to the corresponding variable expression e
  • An assignment edge labeled a from r to l for each assignment l = r

For each d- and a-edge, there are also corresponding edges in the opposite direction, labeled ~d and ~a, respectively.

The CFL-reachability problem over the PEG and the following grammar encodes the may-alias problem:[6]

M ::= ~d V d
V ::= ~F M? F
F ::= (a M?)*
~F ::= (M? ~a)*

The nonterminal M signifies that two memory locations may alias, i.e., they point to the same value. Nonterminal V signifies that two values may alias, i.e., they hold pointers that may alias. F signifies data-flows, which are sequences of assignments interleaved with memory aliases. ~F is the inverse production of F.

The following grammar is equivalent:

M ::= ~d V d
V ::= (M? ~a)* M? (a M?)*
[edit]

Every CFL-reachability problem can be encoded as a Datalog program.[7]

References

[edit]
  1. ^ (Reps 1998)
  2. ^ (Melski & Reps 2000)
  3. ^ (Reps 1998, p. 2)
  4. ^ He, Dongjie; Lu, Jingbo; Xue, Jingling (2024). "A CFL-Reachability Formulation of Callsite-Sensitive Pointer Analysis with Built-In On-The-Fly Call Graph Construction". 38th European Conference on Object-Oriented Programming (ECOOP 2024). Schloss Dagstuhl – Leibniz-Zentrum für Informatik: 18:1–18:29. doi:10.4230/LIPIcs.ECOOP.2024.18.
  5. ^ Lu, Jingbo; He, Dongjie; Xue, Jingling (2021-07-23). "Eagle: CFL-Reachability-Based Precision-Preserving Acceleration of Object-Sensitive Pointer Analysis with Partial Context Sensitivity". ACM Trans. Softw. Eng. Methodol. 30 (4): 46:1–46:46. doi:10.1145/3450492. ISSN 1049-331X.
  6. ^ Zheng, Xin; Rugina, Radu (2008-01-07). "Demand-driven alias analysis for C". Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '08. New York, NY, USA: Association for Computing Machinery. pp. 197–208. doi:10.1145/1328438.1328464. hdl:1813/8222. ISBN 978-1-59593-689-9.
  7. ^ (Reps 1998, p. 6)