1The pumping lemma for CFLs¶
1.1Relation to regular languages¶
Just as we had for regular languages, contextfree languages also have a pumping lemma that can be used to prove that a language is not contextfree. However, this pumping lemma is quite different from the one we've seen before. The major difference between the machines for recognising contextfree languages and those for regular languages is that the latter are finitestate machines, while the former are more like infinitestate machines due to the existence of the unbounded stack. Thus, the argument we used to prove the pumping lemma for DFAs will not work for PDAs.
So where do we find the structure needed to prove the pumping lemma? In the grammar, of course. We learned earlier that any CFL can be described by a grammar in Chomsky normal form. We'll use this fact in our proof of the pumping lemma.
1.2Sketch of the proof¶
Assume that $L$ has a contextfree grammar $G$ in Chomsky normal form, where $G$ contains $n$ nonterminals (where $n$ is finite).
Recall that in CNF, our rules are all of the form $A \to BC$ and $A \to a$ (or $S \to \epsilon$). Consider a string of length $2^{n+1}$. By the form of the rules in CNF, we know that the branching in the tree for generating this string is at most binary. So the smallest tree you can build to represent such a tree has a height of $n+1$.
Here's another way to think about this: going down the tree, you must have hit at least $n+1$ nonterminals on the way. Since there are only $n$ distinct nonterminals in this grammar, then, by the pigeonhole principle, at least two occurrences along the path must be using the same nonterminal, which we'll call $X$.
Now divide the string up into 5 pieces: $uvwxy$. We'll call the tree that generates $w$ $t$, and the (larger) tree that generates $vwx$ $T$, as in the diagram below.
This shows us where we have the opportunity to pump. In a contextfree language, we could just replace $t$ by $T$ or vice versa (pumping up vs. pumping down). So then you would have either $uvxy$ or $uvvwxxy$, and in the latter case you could keep pumping indefinitely, resulting in $uv^iwx^iy$ for some integer $i$.
This is the essence of the proof for the pumping lemma. We won't formalise it.
1.3Formal statement of the pumping lemma¶
Let $L$ be a CFL. $\exists p \geq 0$ such that $\forall s \in L$ with $s \geq p$, $\exists u, v, w, x, y \in \Sigma^*$ where $s = uvwxy$ such that
 $vx > 0$ (i.e., $v$ and $x$ can't both be the empty string);
 $vwx \leq p$;
 $\forall i \geq 0$, $uv^iwx^iy \in L$.
1.4Using the contrapositive¶
As with regular languages, we can use the contrapositive of the pumping lemma to prove that a language is not contextfree. Here's a formal statement of the contrapositive:
Given some language $L$. If, $\forall p \geq 0$, $\exists s \in L$ such that $s \geq p$, and $\forall u, v, w, x, y \in \Sigma^*$ where $s = uvwxy$, with $vx > 0$ and $vwx \leq p$, $\exists i \geq 0$ such that $uv^iwx^iy \notin L$, then $L$ is not contextfree.
1.4.1Examples¶
$$L = \{a^nb^na^n\}$$
We see intuitively that a pushdown automaton with only one stack is incapable of recognising this language. To prove it, we can use the pumping lemma. We can use the same sort of game formulation that we used with regular languages:
 Adversary chooses some $p$
 We produce a string $s = a^pb^pa^p$, which is clearly $\in L$
 Adversary chooses some $u,v,w,x,y\in \Sigma^*$ such that $s = uvwxy$.
Our job, now, is to choose some $i$ such that $uv^iwx^iy \notin L$. We can do this on a casebycase basis, making use of the fact that $vwx$ must be $\leq p$ (and so $v$ and $x$ cannot be greater than $p$ characters apart). We break up the string $s$ into blocks as follows:
$$\underbrace{a^p}_{\text{block 1}} \underbrace{b^p}_{\text{block 2}} \underbrace{a^p}_{\text{block 3}}$$
 Case 1: both $v$ and $x$ are in block 1. Choose $i \neq 1$. Then $uv^iwx^iy$ would have too many $a$'s at the beginning, and so the resulting string would not be part of $L$.
 Case 2: $v$ or $x$ straddles a block boundary. Choose $i = 2$. Then, the $a$'s and $b$'s would be out of order in the resulting string.
 Case 3: $v = \epsilon$ and $x$ is fully within a block or $x = \epsilon$ and $v$ is fully within a block. Choose $i\neq 1$. Then there would be too many of one letter.
(There are in fact many other ways of choosing the cases  the solution presented above is merely one of many possibilities.)
$$L = \{ww \mid w \in \Sigma^* \} \tag{$\Sigma = \{a, b\}$}$$
This one is trickier since the language is so unconstrained. Instead of using the same approach that we used for the last question, we'll make the following fact: if $L$ is contextfree and $R$ is regular, then $L \cap R$ is contextfree. We define a language $L' = L \cap \{a^*b^*a^*b^*\} = \{a^nb^ma^nb^m\}$. Since $L'$ is the intersection of $L$ with a regular language, we can use the pumping lemma on $L'$ (which is more structured and thus easier to pump), as proving that $L'$ is not contextfree suffices to prove that $L$ is not contextfree.
 Adversary picks some $p$.
 Choose $s = a^pb^pa^pb^p$.
 Adversary chooses some $u,v,w,x,y\in \Sigma^*$ such that $s = uvwxy$.
Now we break up the string $s$ into two equal blocks (halves). As before, we analyse the different possibilities for $v$ and $x$:
 Case 1: $v$ and $x$ are both fully contained in the same block. Choose $i \neq 1$. Then the two halves will no longer match and so the resulting word is not in $L'$.
 Case 2: $v$ and $x$ are both fully contained in different blocks. Because they can only be a maximum of $p$ letters apart, $v$ must be in the $b$ section of the first block and $x$ must be in the $a$ section of the second block. If we choose $i \neq 1$, the two halves will again no longer match.
 Case 3: either $v$ or $x$ straddles the boundary between the two blocks. If we pump up, then there will be a middle section in the resulting word where the $a$'s and $b$'s are out of order.
$$L = \{a^ib^j \mid j = i^2 \}$$
Left as an exercise for the reader.
2Further notes on CFLs¶
2.1Oneletter alphabets¶
There is no CFL over a oneletter alphabet that is not regular. The proof is left as an exercise for the reader.
2.2Designing a CFL from its complement¶
Recall $L = \{ww \mid w \in \Sigma^* \}$, which we showed is not contextfree. However, its complement, $\overline{L}$, is in fact contextfree. Let's design a grammar for $\overline{L}$:
$$\begin{align} S & \to AB \mid BA \mid A \mid B \\ A & \to CAC \mid a \\ B & \to CBC \mid b \\ C & \to a \mid b \end{align}$$
Let's check that this grammar indeed generates $\overline{L}$. This means that no words generated by $G$ can be of the form $ww$, and that all words not of the form $ww$ must be generated by this grammar.
Well, between the rules $S \to A$ and $S \to B$, we can generate all the possible oddlength strings. $A$ generates all oddlength strings with an $a$ in the middle (through $AS \overset{*}{\to} C \ldots C a \ldots C$) and $B$ generates all oddlength strings with a $b$ in the middle. Then, $S \to AB$ and $S \to BA$ together generate all evenlength strings that are not of the form $ww$. To convince yourself that this is true, consider any word $s$ generated by $S \to AB$. Suppose the $A$ part generates a string $xay$ where $x, y \in \Sigma^*$ and $x = y = n$, and suppose the $B$ part generates a string $ubv$ where $u = v = m$. Then the string $s$ can be divided into two (not necessarily equal) parts:
$$s = \underbrace{xay}_{A \overset{*}{\to}} \mid \underbrace{ubv}_{B \overset{*}{\to}}$$
Recall that $y = n$ and $u = m$. So the $a$ and $b$ shown above are exactly $m+n$ characters apart. The word as a whole has a length of $2(m+n+1)$, which means that each half has a length of $m+n+1$. Now, the first $a$ is $n$ letters away from the start of the word. But the $b$ is also $n$ letters away from the midpoint, simply because each half must have $m+n+1$ characters. Here's an illustration:
So the two halves of the resulting string differ  one half has an $a$ at some position where the other half has a $b$. Similar reasoning holds for words generated by the $S \to BA$ rule.
Thus we have an example of a CFL whose complement is not a CFL, illustrating that CFLs are not closed under complementation^{1}. We also learned that CFLs are not closed under intersection; here's an illustration of that:
$$L_1 = \{a^nb^nc^i\} \quad L_2 = \{a^ib^nc^n \} \quad L_1 \cap L_2 = \{a^nb^nc^n \} \mbox{where $L_1$, $L_2$ are CFLs but $L_1 \cap L_2$ is not}$$

CFLs that can be represented by deterministic PDAs are closed under complementation, however. ↩