Draft:Context-free language reachability
Draft article not currently submitted for review.
This is a draft Articles for creation (AfC) submission. It is not currently pending review. While there are no deadlines, abandoned drafts may be deleted after six months. To edit the draft click on the "Edit" tab at the top of the window. To be accepted, a draft should:
It is strongly discouraged to write about yourself, your business or employer. If you do so, you must declare it. Where to get help
How to improve a draft
You can also browse Wikipedia:Featured articles and Wikipedia:Good articles to find examples of Wikipedia's best writing on topics similar to your proposed article. Improving your odds of a speedy review To improve your odds of a faster review, tag your draft with relevant WikiProject tags using the button below. This will let reviewers know a new draft has been submitted in their area of interest. For instance, if you wrote about a female astronomer, you would want to add the Biography, Astronomy, and Women scientists tags. Editor resources
Last edited by Citation bot (talk | contribs) 19 minutes ago. (Update) |
Context-free language reachability is an algorithmic problem with applications in static program analysis. Given a graph with edge labels from some alphabet and a context-free grammar over that alphabet, the problem is to determine whether there exists a path through the graph such that the concatenation of the labels along the path is a string accepted by the grammar.
Variations
[edit]In addition to the decision problem formulation given above, there are several related function problem formulations of CFL-reachability. For brevity, define an L-path to be a path with edge labels in the language of the grammar. Then:[1]
- The all-pairs variant is to determine all pairs of nodes such that there exists an L-path between them.
- The single-source variant is to determine all nodes that are reachable from a given source node via an L-path.
- The single-target variant is to determine all nodes that are the sources of L-paths that end at a given target node.
- The single-source/single-target variant is to determine whether there is an L-path between two given nodes.
Algorithms
[edit]There is a relatively simple dynamic programming algorithm for solving all-pairs CFL-reachability. The algorithm requires a normalized grammar, where each production has at most two symbols (terminals or nonterminals) on the right-hand side. The runtime of this algorithm , where is the number of terminals and nonterminals in the normalized grammar (which is linear with respect to the original grammar), and is the number of nodes in the graph.[2] The algorithm works by repeatedly adding summary edges to the graph: given a production , if there exists an edge between some nodes x and y labeled with B and an edge between y and z labeled C, then the algorithm adds a new edge labeled A between x and z. This process is repeated until saturation, i.e., until no more summary edges may be added.
Applications to program analysis
[edit]Several problems in program analysis can be formulated as CFL-reachability problems, including:[3]
- Interprocedural program slicing
- Many interprocedural data-flow analyses
- Certain kinds of shape analysis
- Flow-insensitive pointer analysis, including variants with different kinds of polyvariance and on-the-fly callgraph construction.[4][5]
Alias analysis
[edit]Consider an imperative language with pointers, like a simplified C. The program expression graph (PEG) for a program in such a language has a node for each expression in the program, and two kinds of edges:
- A pointer dereference edge labeled
d
from each pointer dereference expression*e
to the corresponding variable expressione
- An assignment edge labeled
a
fromr
tol
for each assignmentl = r
For each d
- and a
-edge, there are also corresponding edges in the opposite direction, labeled ~d
and ~a
, respectively.
The CFL-reachability problem over the PEG and the following grammar encodes the may-alias problem:[6]
M ::= ~d V d
V ::= ~F M? F
F ::= (a M?)*
~F ::= (M? ~a)*
The nonterminal M
signifies that two memory locations may alias, i.e., they point to the same value. Nonterminal V
signifies that two values may alias, i.e., they hold pointers that may alias. F
signifies data-flows, which are sequences of assignments interleaved with memory aliases. ~F
is the inverse production of F
.
The following grammar is equivalent:
M ::= ~d V d
V ::= (M? ~a)* M? (a M?)*
Related problems
[edit]Every CFL-reachability problem can be encoded as a Datalog program.[7]
References
[edit]- ^ (Reps 1998)
- ^ (Melski & Reps 2000)
- ^ (Reps 1998, p. 2)
- ^ He, Dongjie; Lu, Jingbo; Xue, Jingling (2024). "A CFL-Reachability Formulation of Callsite-Sensitive Pointer Analysis with Built-In On-The-Fly Call Graph Construction". 38th European Conference on Object-Oriented Programming (ECOOP 2024). Schloss Dagstuhl – Leibniz-Zentrum für Informatik: 18:1–18:29. doi:10.4230/LIPIcs.ECOOP.2024.18.
- ^ Lu, Jingbo; He, Dongjie; Xue, Jingling (2021-07-23). "Eagle: CFL-Reachability-Based Precision-Preserving Acceleration of Object-Sensitive Pointer Analysis with Partial Context Sensitivity". ACM Trans. Softw. Eng. Methodol. 30 (4): 46:1–46:46. doi:10.1145/3450492. ISSN 1049-331X.
- ^ Zheng, Xin; Rugina, Radu (2008-01-07). "Demand-driven alias analysis for C". Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '08. New York, NY, USA: Association for Computing Machinery. pp. 197–208. doi:10.1145/1328438.1328464. hdl:1813/8222. ISBN 978-1-59593-689-9.
- ^ (Reps 1998, p. 6)
- Reps, Thomas (1998-12-01). "Program analysis via graph reachability". Information and Software Technology. 40 (11): 701–726. doi:10.1016/S0950-5849(98)00093-7. ISSN 0950-5849.
- Melski, David; Reps, Thomas (2000-10-06). "Interconvertibility of a class of set constraints and context-free-language reachability". Theoretical Computer Science. PEPM'97. 248 (1): 29–98. doi:10.1016/S0304-3975(00)00049-9. ISSN 0304-3975.