Algorithms for CFLs, mostly incorporating dynamic programming. This lecture is the last on CFLs/PDAs.
This lecture's notes were originally typed up by @ikones.
1Algorithms for CFLs¶
1.1Determining if w∈L(G)¶
Given w∈Σ∗, a CFG G, how can we determine if w∈L(G) ?
The naive algorithm runs in O(2n): convert G to CNF, produce all the possible parse (binary) trees of height |w|, and check if w is produced by any of them.
A better approach would be to use dynamic programming - the above, plus memoisation. Given G=(V,Σ,P,S) and input w=a1…an, ai∈Σ, create all possible substrings of w of length 1, length 2, and so on, memoising the partial results.
We describe a 2-indexed family of subsets of V. Define Xij={A∈V∣A∗⇒ai…aj} (where V is the set of variables). For example, X35 is the set of variables which produce a3a4a5. Observe that w∈L(G)⇔S∈X1n, i.e., our word is in the language if and only if our start variable is in the set of variables which can produce a1…an.
Our base case for the program will be Xii={A∈V∣A⇒ai}.
Our inductive rule will be if B∈Xik, C∈X(k+1)j, and A→BC∈P (where P is taken to be our set of production rules), then A∈Xij for i≤k≤j.
The general strategy is to compute all the Xii, work up to X1n, and check if S∈X1n (i.e., build the memoisation matrix based on i−j in Xij).
1.1.1Example¶
Consider the grammar with the following production rules:
S→AB∣BCA→BA∣aB→CC∣bC→AB∣a
and suppose we want to find out if baaba∈L(G). We would build the following table, starting from the bottom row and working our way up:
5 | {A,S,C} | - | - | - | - |
---|---|---|---|---|---|
4 | ∅ | {S,A,C} | - | - | - |
3 | ∅ | {B} | {B} | - | - |
2 | {A,S} | {B} | {C,S} | {A,S} | - |
1 | {B} | {A,C} | {A,C} | {B} | {A,C} |
b | a | a | b | a |
where the number in the leftmost column presents (i−j)+1 in Xij, and, for example, the first entry in the '3' column is obtained by checking if any symbols produce the concatenation of an element of X11 and X23 or a concatenation of an element of X12 and X33, as formulated in our inductive rule. Since S∈X1n, we determine that baaba is in fact in L(G).
The total size of this table is O(n2). Each check of the form B∈Xik, C∈X(k+1)j, and A→BC∈P takes time independent of the input, so O(1), and for each entry in the table this check has to be performed O(n) times. So our algorithm runs in O(n3).
1.2Determining if L(G)=∅¶
How do we check if the language produced by a given grammar is empty?
Suppose our production rules are of the form
S→whateverA→α1∣α2∣…∣αkB→β1∣β2∣…∣βkC→γ1∣γ2∣…∣γk
Recall that a variable A is said to be productive if and only if it eventually generates a terminal variable (i.e., A∗⇒α∈Σ∗). The idea is we iteratively check to see if S can eventually produce a variable which is productive.
We can use the following algorithm, in which we iterate until there are no more changes:
Empty-Test:
Given: Production rules P
Prod ← ∅
Look for A∈P such that A is productive
Add each such A to Prod
Loop until no change to Prod:
Look for B∈P such that B→C where C∈(Σ∪Prod)∗
End-Loop
If S∈Prod then L(G)≠∅.
Else L(G)=∅
Here's a more verbose description:
- Look for A∈V such that A∗⇒α∈Σ∗
- If there aren't any, then yes, L(G)=∅. If there are any such variables, add them to some set which we'll call Prod (initialised to be empty).
- Look for B∈V such that B→β∈(Σ∗∪Prod)∗.
- Iff S∈Prod, then L(G)≠∅.
- If, at any point, we see that S∈Prod, we can stop and return false (i.e., L(G)≠∅).
Note that there is no algorithm that can check if L(G)=Σ∗ for general G.
2Comments on compilers¶
Parsing is the process of extracting a parse tree for a string.
Compilers work by first performing lexical analysis via a DFA (for annotating types, etc) then parsing.
My notes for this section aren't very well-structured so I'm going to leave out the rest (it was only a few lines anyway).