**Maintainer:**admin

Algorithms for CFLs, mostly incorporating dynamic programming. This lecture is the last on CFLs/PDAs.

This lecture's notes were originally typed up by @ikones.

*1*Algorithms for CFLs¶

*1.1*Determining if $w \in L(G)$¶

Given $w\in \Sigma^*$, a CFG $G$, how can we determine if $w\in L(G)$ ?

The naive algorithm runs in $O(2^n)$: convert $G$ to CNF, produce all the possible parse (binary) trees of height $|w|$, and check if $w$ is produced by any of them.

A better approach would be to use dynamic programming - the above, plus memoisation. Given $G = (V, \Sigma, P, S)$ and input $w = a_1\ldots a_n$, $a_i \in \Sigma$, create all possible substrings of $w$ of length $1$, length $2$, and so on, memoising the partial results.

We describe a 2-indexed family of subsets of $V$. Define $X_{ij} = \{A \in V \mid A \overset{*}{\Rightarrow} a_i\ldots a_j \}$ (where $V$ is the set of variables). For example, $X_{35}$ is the set of variables which produce $a_3 a_4 a_5 $. Observe that $w\in L(G) \Leftrightarrow S\in X_{1n}$, i.e., our word is in the language if and only if our start variable is in the set of variables which can produce $a_1\ldots a_n$.

Our base case for the program will be $X_{ii} = \{A\in V \mid A\Rightarrow a_i \}$.

Our inductive rule will be if $B\in X_{ik}$, $C\in X_{(k+1)j}$, and $A\to BC \in P$ (where $P$ is taken to be our set of production rules), then $A \in X_{ij}$ for $i\le k \le j$.

The general strategy is to compute all the $X_{ii}$, work up to $X_{1n}$, and check if $S\in X_{1n}$ (i.e., build the memoisation matrix based on $i -j $ in $X_{ij}$).

*1.1.1*Example¶

Consider the grammar with the following production rules:

$$\begin{align} S & \to AB \mid BC \\ A & \to BA \mid a \\ B & \to CC \mid b \\ C & \to AB \mid a \end{align}$$

and suppose we want to find out if $baaba \in L(G)$. We would build the following table, starting from the bottom row and working our way up:

5 | $\{A, S, C\}$ | - | - | - | - |
---|---|---|---|---|---|

4 | $\emptyset$ | $\{S,A,C\}$ | - | - | - |

3 | $\emptyset$ | $\{B\}$ | $\{B\}$ | - | - |

2 | $\{A,S\}$ | $\{B\}$ | $\{C,S\}$ | $\{A,S\}$ | - |

1 | $\{B\}$ | $\{A,C\}$ | $\{A,C\}$ | $\{B\}$ | $\{A,C\}$ |

$b$ | $a$ | $a$ | $b$ | $a$ |

where the number in the leftmost column presents $(i-j)+1$ in $X_{ij}$, and, for example, the first entry in the '3' column is obtained by checking if any symbols produce the concatenation of an element of $X_{11}$ and $X_{23}$ or a concatenation of an element of $X_{12}$ and $X_{33}$, as formulated in our inductive rule. Since $S \in X_{1n}$, we determine that $baaba$ is in fact in $L(G)$.

The total size of this table is $O(n^2)$. Each check of the form $B\in X_{ik}$, $C \in X_{(k+1)j}$, and $A\to BC \in \mathcal{P}$ takes time independent of the input, so $O(1)$, and for each entry in the table this check has to be performed $O(n)$ times. So our algorithm runs in $O(n^3)$.

*1.2*Determining if $L(G) = \varnothing$¶

How do we check if the language produced by a given grammar is empty?

Suppose our production rules are of the form

$$\begin{align} S & \to \text{whatever} \\ A & \to \alpha_1 \mid \alpha_2 \mid \ldots \mid \alpha_k \\ B & \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_k \\ C & \to \gamma_1 \mid \gamma_2 \mid \ldots \mid \gamma_k \end{align}$$

Recall that a variable $A$ is said to be **productive** if and only if it eventually generates a terminal variable (i.e., $A \overset{*}{\Rightarrow} \alpha \in \Sigma^*$). The idea is we iteratively check to see if $S$ can eventually produce a variable which is productive.

We can use the following algorithm, in which we iterate until there are no more changes:

**Empty-Test**:

**Given**: Production rules $\mathcal{P}$

**Prod** $\leftarrow$ $\emptyset$

Look for $A\in \mathcal{P} $ such that $A$ is productive

Add each such $A$ to **Prod**

Loop until no change to **Prod**:

Look for $B \in \mathcal{P}$ such that $B \to C$ where $C\in (\Sigma \cup $**Prod**$)^*$

End-Loop

If $S\in$**Prod** then $L(G) \ne \emptyset$.

Else $L(G) = \emptyset$

Here's a more verbose description:

- Look for $A \in V$ such that $A \overset{*}{\Rightarrow} \alpha \in \Sigma^*$
- If there aren't any, then yes, $L(G) = \varnothing$. If there are any such variables, add them to some set which we'll call $\text{Prod}$ (initialised to be empty).
- Look for $B \in V$ such that $B \to \beta \in (\Sigma^* \cup \text{Prod})^*$.
- Iff $S \in \text{Prod}$, then $L(G) \neq \varnothing$.
- If, at any point, we see that $S \in \text{Prod}$, we can stop and return false (i.e., $L(G) \neq \varnothing$).

Note that there is no algorithm that can check if $L(G) = \Sigma^*$ for general $G$.

*2*Comments on compilers¶

**Parsing** is the process of extracting a parse tree for a string.

Compilers work by first performing lexical analysis via a DFA (for annotating types, etc) then parsing.

My notes for this section aren't very well-structured so I'm going to leave out the rest (it was only a few lines anyway).