Algorithms for CFLs, mostly incorporating dynamic programming. This lecture is the last on CFLs/PDAs.
This lecture's notes were originally typed up by @ikones.
1Algorithms for CFLs¶
1.1Determining if $w \in L(G)$¶
Given $w\in \Sigma^*$, a CFG $G$, how can we determine if $w\in L(G)$ ?
The naive algorithm runs in $O(2^n)$: convert $G$ to CNF, produce all the possible parse (binary) trees of height $|w|$, and check if $w$ is produced by any of them.
A better approach would be to use dynamic programming - the above, plus memoisation. Given $G = (V, \Sigma, P, S)$ and input $w = a_1\ldots a_n$, $a_i \in \Sigma$, create all possible substrings of $w$ of length $1$, length $2$, and so on, memoising the partial results.
We describe a 2-indexed family of subsets of $V$. Define $X_{ij} = \{A \in V \mid A \overset{*}{\Rightarrow} a_i\ldots a_j \}$ (where $V$ is the set of variables). For example, $X_{35}$ is the set of variables which produce $a_3 a_4 a_5 $. Observe that $w\in L(G) \Leftrightarrow S\in X_{1n}$, i.e., our word is in the language if and only if our start variable is in the set of variables which can produce $a_1\ldots a_n$.
Our base case for the program will be $X_{ii} = \{A\in V \mid A\Rightarrow a_i \}$.
Our inductive rule will be if $B\in X_{ik}$, $C\in X_{(k+1)j}$, and $A\to BC \in P$ (where $P$ is taken to be our set of production rules), then $A \in X_{ij}$ for $i\le k \le j$.
The general strategy is to compute all the $X_{ii}$, work up to $X_{1n}$, and check if $S\in X_{1n}$ (i.e., build the memoisation matrix based on $i -j $ in $X_{ij}$).
1.1.1Example¶
Consider the grammar with the following production rules:
$$\begin{align} S & \to AB \mid BC \\ A & \to BA \mid a \\ B & \to CC \mid b \\ C & \to AB \mid a \end{align}$$
and suppose we want to find out if $baaba \in L(G)$. We would build the following table, starting from the bottom row and working our way up:
5 | $\{A, S, C\}$ | - | - | - | - |
---|---|---|---|---|---|
4 | $\emptyset$ | $\{S,A,C\}$ | - | - | - |
3 | $\emptyset$ | $\{B\}$ | $\{B\}$ | - | - |
2 | $\{A,S\}$ | $\{B\}$ | $\{C,S\}$ | $\{A,S\}$ | - |
1 | $\{B\}$ | $\{A,C\}$ | $\{A,C\}$ | $\{B\}$ | $\{A,C\}$ |
$b$ | $a$ | $a$ | $b$ | $a$ |
where the number in the leftmost column presents $(i-j)+1$ in $X_{ij}$, and, for example, the first entry in the '3' column is obtained by checking if any symbols produce the concatenation of an element of $X_{11}$ and $X_{23}$ or a concatenation of an element of $X_{12}$ and $X_{33}$, as formulated in our inductive rule. Since $S \in X_{1n}$, we determine that $baaba$ is in fact in $L(G)$.
The total size of this table is $O(n^2)$. Each check of the form $B\in X_{ik}$, $C \in X_{(k+1)j}$, and $A\to BC \in \mathcal{P}$ takes time independent of the input, so $O(1)$, and for each entry in the table this check has to be performed $O(n)$ times. So our algorithm runs in $O(n^3)$.
1.2Determining if $L(G) = \varnothing$¶
How do we check if the language produced by a given grammar is empty?
Suppose our production rules are of the form
$$\begin{align} S & \to \text{whatever} \\ A & \to \alpha_1 \mid \alpha_2 \mid \ldots \mid \alpha_k \\ B & \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_k \\ C & \to \gamma_1 \mid \gamma_2 \mid \ldots \mid \gamma_k \end{align}$$
Recall that a variable $A$ is said to be productive if and only if it eventually generates a terminal variable (i.e., $A \overset{*}{\Rightarrow} \alpha \in \Sigma^*$). The idea is we iteratively check to see if $S$ can eventually produce a variable which is productive.
We can use the following algorithm, in which we iterate until there are no more changes:
Empty-Test:
Given: Production rules $\mathcal{P}$
Prod $\leftarrow$ $\emptyset$
Look for $A\in \mathcal{P} $ such that $A$ is productive
Add each such $A$ to Prod
Loop until no change to Prod:
Look for $B \in \mathcal{P}$ such that $B \to C$ where $C\in (\Sigma \cup $Prod$)^*$
End-Loop
If $S\in$Prod then $L(G) \ne \emptyset$.
Else $L(G) = \emptyset$
Here's a more verbose description:
- Look for $A \in V$ such that $A \overset{*}{\Rightarrow} \alpha \in \Sigma^*$
- If there aren't any, then yes, $L(G) = \varnothing$. If there are any such variables, add them to some set which we'll call $\text{Prod}$ (initialised to be empty).
- Look for $B \in V$ such that $B \to \beta \in (\Sigma^* \cup \text{Prod})^*$.
- Iff $S \in \text{Prod}$, then $L(G) \neq \varnothing$.
- If, at any point, we see that $S \in \text{Prod}$, we can stop and return false (i.e., $L(G) \neq \varnothing$).
Note that there is no algorithm that can check if $L(G) = \Sigma^*$ for general $G$.
2Comments on compilers¶
Parsing is the process of extracting a parse tree for a string.
Compilers work by first performing lexical analysis via a DFA (for annotating types, etc) then parsing.
My notes for this section aren't very well-structured so I'm going to leave out the rest (it was only a few lines anyway).