Thursday, October 24, 2013 CC-BY-NC
Algorithms for CFLs

Maintainer: admin

Algorithms for CFLs, mostly incorporating dynamic programming. This lecture is the last on CFLs/PDAs.

This lecture's notes were originally typed up by @ikones.

1Algorithms for CFLs

1.1Determining if $w \in L(G)$

Given $w\in \Sigma^*$, a CFG $G$, how can we determine if $w\in L(G)$ ?

The naive algorithm runs in $O(2^n)$: convert $G$ to CNF, produce all the possible parse (binary) trees of height $|w|$, and check if $w$ is produced by any of them.

A better approach would be to use dynamic programming - the above, plus memoisation. Given $G = (V, \Sigma, P, S)$ and input $w = a_1\ldots a_n$, $a_i \in \Sigma$, create all possible substrings of $w$ of length $1$, length $2$, and so on, memoising the partial results.

We describe a 2-indexed family of subsets of $V$. Define $X_{ij} = \{A \in V \mid A \overset{*}{\Rightarrow} a_i\ldots a_j \}$ (where $V$ is the set of variables). For example, $X_{35}$ is the set of variables which produce $a_3 a_4 a_5 $. Observe that $w\in L(G) \Leftrightarrow S\in X_{1n}$, i.e., our word is in the language if and only if our start variable is in the set of variables which can produce $a_1\ldots a_n$.

Our base case for the program will be $X_{ii} = \{A\in V \mid A\Rightarrow a_i \}$.

Our inductive rule will be if $B\in X_{ik}$, $C\in X_{(k+1)j}$, and $A\to BC \in P$ (where $P$ is taken to be our set of production rules), then $A \in X_{ij}$ for $i\le k \le j$.

The general strategy is to compute all the $X_{ii}$, work up to $X_{1n}$, and check if $S\in X_{1n}$ (i.e., build the memoisation matrix based on $i -j $ in $X_{ij}$).

1.1.1Example

Consider the grammar with the following production rules:

$$\begin{align} S & \to AB \mid BC \\ A & \to BA \mid a \\ B & \to CC \mid b \\ C & \to AB \mid a \end{align}$$

and suppose we want to find out if $baaba \in L(G)$. We would build the following table, starting from the bottom row and working our way up:

5 $\{A, S, C\}$ - - - -
4 $\emptyset$ $\{S,A,C\}$ - - -
3 $\emptyset$ $\{B\}$ $\{B\}$ - -
2 $\{A,S\}$ $\{B\}$ $\{C,S\}$ $\{A,S\}$ -
1 $\{B\}$ $\{A,C\}$ $\{A,C\}$ $\{B\}$ $\{A,C\}$
$b$ $a$ $a$ $b$ $a$

where the number in the leftmost column presents $(i-j)+1$ in $X_{ij}$, and, for example, the first entry in the '3' column is obtained by checking if any symbols produce the concatenation of an element of $X_{11}$ and $X_{23}$ or a concatenation of an element of $X_{12}$ and $X_{33}$, as formulated in our inductive rule. Since $S \in X_{1n}$, we determine that $baaba$ is in fact in $L(G)$.

The total size of this table is $O(n^2)$. Each check of the form $B\in X_{ik}$, $C \in X_{(k+1)j}$, and $A\to BC \in \mathcal{P}$ takes time independent of the input, so $O(1)$, and for each entry in the table this check has to be performed $O(n)$ times. So our algorithm runs in $O(n^3)$.

1.2Determining if $L(G) = \varnothing$

How do we check if the language produced by a given grammar is empty?

Suppose our production rules are of the form

$$\begin{align} S & \to \text{whatever} \\ A & \to \alpha_1 \mid \alpha_2 \mid \ldots \mid \alpha_k \\ B & \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_k \\ C & \to \gamma_1 \mid \gamma_2 \mid \ldots \mid \gamma_k \end{align}$$

Recall that a variable $A$ is said to be productive if and only if it eventually generates a terminal variable (i.e., $A \overset{*}{\Rightarrow} \alpha \in \Sigma^*$). The idea is we iteratively check to see if $S$ can eventually produce a variable which is productive.

We can use the following algorithm, in which we iterate until there are no more changes:

Empty-Test:
Given: Production rules $\mathcal{P}$
Prod $\leftarrow$ $\emptyset$
Look for $A\in \mathcal{P} $ such that $A$ is productive
Add each such $A$ to Prod
Loop until no change to Prod:
Look for $B \in \mathcal{P}$ such that $B \to C$ where $C\in (\Sigma \cup $Prod$)^*$
End-Loop
If $S\in$Prod then $L(G) \ne \emptyset$.
Else $L(G) = \emptyset$

Here's a more verbose description:

  • Look for $A \in V$ such that $A \overset{*}{\Rightarrow} \alpha \in \Sigma^*$
  • If there aren't any, then yes, $L(G) = \varnothing$. If there are any such variables, add them to some set which we'll call $\text{Prod}$ (initialised to be empty).
  • Look for $B \in V$ such that $B \to \beta \in (\Sigma^* \cup \text{Prod})^*$.
  • Iff $S \in \text{Prod}$, then $L(G) \neq \varnothing$.
  • If, at any point, we see that $S \in \text{Prod}$, we can stop and return false (i.e., $L(G) \neq \varnothing$).

Note that there is no algorithm that can check if $L(G) = \Sigma^*$ for general $G$.

2Comments on compilers

Parsing is the process of extracting a parse tree for a string.

Compilers work by first performing lexical analysis via a DFA (for annotating types, etc) then parsing.

My notes for this section aren't very well-structured so I'm going to leave out the rest (it was only a few lines anyway).