Processing math: 100%

Thursday, October 24, 2013 CC-BY-NC
Algorithms for CFLs

Maintainer: admin

Algorithms for CFLs, mostly incorporating dynamic programming. This lecture is the last on CFLs/PDAs.

This lecture's notes were originally typed up by @ikones.

1Algorithms for CFLs

1.1Determining if wL(G)

Given wΣ, a CFG G, how can we determine if wL(G) ?

The naive algorithm runs in O(2n): convert G to CNF, produce all the possible parse (binary) trees of height |w|, and check if w is produced by any of them.

A better approach would be to use dynamic programming - the above, plus memoisation. Given G=(V,Σ,P,S) and input w=a1an, aiΣ, create all possible substrings of w of length 1, length 2, and so on, memoising the partial results.

We describe a 2-indexed family of subsets of V. Define Xij={AVAaiaj} (where V is the set of variables). For example, X35 is the set of variables which produce a3a4a5. Observe that wL(G)SX1n, i.e., our word is in the language if and only if our start variable is in the set of variables which can produce a1an.

Our base case for the program will be Xii={AVAai}.

Our inductive rule will be if BXik, CX(k+1)j, and ABCP (where P is taken to be our set of production rules), then AXij for ikj.

The general strategy is to compute all the Xii, work up to X1n, and check if SX1n (i.e., build the memoisation matrix based on ij in Xij).

1.1.1Example

Consider the grammar with the following production rules:

SABBCABAaBCCbCABa

and suppose we want to find out if baabaL(G). We would build the following table, starting from the bottom row and working our way up:

5 {A,S,C} - - - -
4 {S,A,C} - - -
3 {B} {B} - -
2 {A,S} {B} {C,S} {A,S} -
1 {B} {A,C} {A,C} {B} {A,C}
b a a b a

where the number in the leftmost column presents (ij)+1 in Xij, and, for example, the first entry in the '3' column is obtained by checking if any symbols produce the concatenation of an element of X11 and X23 or a concatenation of an element of X12 and X33, as formulated in our inductive rule. Since SX1n, we determine that baaba is in fact in L(G).

The total size of this table is O(n2). Each check of the form BXik, CX(k+1)j, and ABCP takes time independent of the input, so O(1), and for each entry in the table this check has to be performed O(n) times. So our algorithm runs in O(n3).

1.2Determining if L(G)=

How do we check if the language produced by a given grammar is empty?

Suppose our production rules are of the form

SwhateverAα1α2αkBβ1β2βkCγ1γ2γk

Recall that a variable A is said to be productive if and only if it eventually generates a terminal variable (i.e., AαΣ). The idea is we iteratively check to see if S can eventually produce a variable which is productive.

We can use the following algorithm, in which we iterate until there are no more changes:

Empty-Test:
Given: Production rules P
Prod
Look for AP such that A is productive
Add each such A to Prod
Loop until no change to Prod:
Look for BP such that BC where C(ΣProd)
End-Loop
If SProd then L(G).
Else L(G)=

Here's a more verbose description:

  • Look for AV such that AαΣ
  • If there aren't any, then yes, L(G)=. If there are any such variables, add them to some set which we'll call Prod (initialised to be empty).
  • Look for BV such that Bβ(ΣProd).
  • Iff SProd, then L(G).
  • If, at any point, we see that SProd, we can stop and return false (i.e., L(G)).

Note that there is no algorithm that can check if L(G)=Σ for general G.

2Comments on compilers

Parsing is the process of extracting a parse tree for a string.

Compilers work by first performing lexical analysis via a DFA (for annotating types, etc) then parsing.

My notes for this section aren't very well-structured so I'm going to leave out the rest (it was only a few lines anyway).