Tuesday, September 10, 2013 CC-BY-NC
Nondeterministic finite automata

Maintainer: admin

Introduction to nondeterministic finite automata. This class was taught by Costin Popescu (who is neither the professor nor the TA for the class).

1 Nondeterministic finite automata

Let $\Sigma = \{0, 1\}$, and suppose we want to check divisibility by 4 (akin to the divisibility-by-3 problem in the previous lecture). Essentially, we accept a word if and only if its last 2 digits are 0. We can do this using a nondeterministic finite automaton. Unlike with a DFA, there can be multiple transitions from a state for a given letter (see diagram below for an illustration). We can spawn multiple threads concurrently, each moving to a different state, and we consider a word to be accepted if there is some possible sequence of transitions that ends in a final state.

Our diagram for this machine would look something like this:

Divisibility by 4 NFA

Thus, at $q_0$, if we read a 1, we stay at $q_0$. On the other hand, if we read a 0, we spawn two threads: one that goes to $q_1$, and one that stays on $q_0$.

Alternatively, we can think of this as backtracking - first we try one path, and if it doesn't work out, we try another path, until we find one that works (among the legal possibilities). If we never find one that works, then we reject the word.
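The thread metaphor can be sketched directly in code: track the set of states occupied by all live threads after each letter. The state names and the $q_1$/$q_2$ transitions below are my reconstruction of the diagram, not taken from it.

```python
# Subset-tracking simulation of the divisibility-by-4 NFA sketched above.
# Missing (state, letter) entries mean that thread dies.
DELTA = {
    ("q0", "0"): {"q0", "q1"},  # on 0, stay at q0 and also spawn into q1
    ("q0", "1"): {"q0"},
    ("q1", "0"): {"q2"},        # a second trailing 0: final state reached
}
START, FINAL = {"q0"}, {"q2"}

def accepts(word: str) -> bool:
    current = set(START)
    for letter in word:
        # advance every live thread by one letter, in parallel
        current = {q2 for q in current for q2 in DELTA.get((q, letter), set())}
    return bool(current & FINAL)  # accepted iff some thread ends in a final state

print(accepts("100"))  # True: ends in 00
print(accepts("110"))  # False
```

Note that this tracks all threads at once rather than backtracking; both views compute the same answer.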

An NFA is formally defined as a 4-tuple $(Q, Q_0, \Delta, F)$ (the alphabet $\Sigma$ is left implicit):

  • $Q$: The set of states.
  • $Q_0$: The set of start states (there can be more than one - in fact it could be all of $Q$); $Q_0 \subseteq Q$.
  • $\Delta$: The function defining the transitions, $Q \times \Sigma \to \mathcal P(Q)$.¹
  • $F$: The set of final states, $F \subseteq Q$.


Similarly to $\delta^*$ in the previous lecture, we define $\Delta^*: \mathcal P(Q) \times \Sigma^* \to \mathcal P(Q)$. That is, given a set of states and a word, it returns the set of states reachable after reading the word. The mathematical definition is as follows:

$$\Delta^*(A, ax) = \bigcup_{q \in A} \bigcup_{q' \in \Delta(q, a)} \Delta^*(\{q'\}, x)$$

where $a$ is a letter and $x$ is a word.
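The recursive definition transliterates almost directly into code. A minimal sketch, with the transition relation given as a dict from (state, letter) to a set of states (missing entries mean the empty set); the table is the divisibility-by-4 example as I read it from the diagram:

```python
# Δ* transcribed from its recursive definition.
DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "0"): {"q2"},
}

def delta_star(A, word):
    if word == "":                # base case: Δ*(A, ε) = A
        return set(A)
    a, x = word[0], word[1:]      # split ax into a letter and the rest
    result = set()
    for q in A:                   # union over q ∈ A ...
        for q2 in DELTA.get((q, a), set()):   # ... and q' ∈ Δ(q, a)
            result |= delta_star({q2}, x)
    return result

print(delta_star({"q0", "q1"}, "100"))  # the three states q0, q1, q2 (print order may vary)
```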

For example, if $A = \{q_0, q_1\}$, $a = 1$ and $x = 00$ (so the word $w = ax$ is $100$), then

$$\Delta^*(A, ax) = \Delta^*(\{q_0, q_1\}, 100) = \bigcup_{q \in \{q_0, q_1\}} \bigcup_{q' \in \Delta(q, 1)} \Delta^*(\{q'\}, 00)$$

where $q' \in \Delta(q, 1)$ ranges over all the states we can transition to with the letter 1 from each starting state $q \in A$.


Distributivity over the union operation: $\Delta^*(A \cup B, x) = \Delta^*(A, x) \cup \Delta^*(B, x)$

Concatenation: $\Delta^*(A, xy) = \Delta^*(\Delta^*(A, x), y)$ (can be proved by induction on $x$, using the distributivity property above)
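A possible induction for the concatenation identity. Since the definition of $\Delta^*$ peels letters from the left, it is natural to induct on $x$ (for a fixed $y$), with the distributivity identity doing the heavy lifting:

```latex
% Base case (x = \epsilon), using \Delta^*(A, \epsilon) = A:
\Delta^*(A, \epsilon y) = \Delta^*(A, y) = \Delta^*(\Delta^*(A, \epsilon), y)
% Inductive step (x = a x'):
\begin{align}
\Delta^*(A, a x' y)
  &= \bigcup_{q \in A} \bigcup_{q' \in \Delta(q, a)} \Delta^*(\{q'\}, x' y)
     && \text{(definition)} \\
  &= \bigcup_{q \in A} \bigcup_{q' \in \Delta(q, a)} \Delta^*(\Delta^*(\{q'\}, x'), y)
     && \text{(induction hypothesis)} \\
  &= \Delta^*\Big(\bigcup_{q \in A} \bigcup_{q' \in \Delta(q, a)} \Delta^*(\{q'\}, x'),\; y\Big)
     && \text{(distributivity)} \\
  &= \Delta^*(\Delta^*(A, a x'), y)
     && \text{(definition)}
\end{align}
```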

1.2 Comparing NFAs and DFAs

First, we define $L(N)$ to be the language accepted by the NFA $N$: the set of words for which at least one execution path ends in a final state of $N$. Formally, this is:

$$L(N) = \{w | \Delta^*(Q_0, w) \cap F \neq \varnothing \}$$

The question that is probably on your mind is: How do NFAs compare to DFAs? Are they more powerful than DFAs? The possibly surprising answer is no - NFAs are no more expressive than DFAs. In fact, any arbitrary NFA can be converted into a DFA that recognises the same language.

Stated formally, we have the following: any NFA $N = (Q, Q_0, \Delta, F)$ can be converted into an equivalent DFA $M = (S, s_0, \delta, \hat F)$ such that $L(N) = L(M)$. We construct $M$ explicitly:

  • $S = \mathcal P(Q)$
  • $s_0 = Q_0$ (since $Q_0 \in \mathcal P(Q)$ and thus is an element of $S$)
  • $\displaystyle \delta(A, a) = \bigcup_{q\in A} \Delta(q, a)$ - that is, $\Delta^*(A, a)$
  • $\hat F = \{A \subseteq Q | A \cap F \neq \varnothing\}$ (i.e., the set of subsets of $Q$ that contain at least one final state)
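The construction above can be sketched in code. To keep it small, this version builds only the subsets reachable from $s_0 = Q_0$ rather than all of $\mathcal P(Q)$ (which makes no difference to the language); the NFA is again the divisibility-by-4 example, with transitions as I reconstruct them.

```python
# Subset construction: DFA states are frozensets of NFA states,
# explored lazily from the start subset Q0.
DELTA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "0"): {"q2"}}
SIGMA = "01"
Q0, F = frozenset({"q0"}), {"q2"}

def subset_construction():
    delta_dfa, todo, seen = {}, [Q0], {Q0}
    while todo:
        A = todo.pop()
        for a in SIGMA:
            # δ(A, a) = union of Δ(q, a) over q ∈ A
            B = frozenset(q2 for q in A for q2 in DELTA.get((q, a), set()))
            delta_dfa[(A, a)] = B
            if B not in seen:
                seen.add(B)
                todo.append(B)
    final = {A for A in seen if A & F}  # F̂: subsets containing a final state
    return seen, delta_dfa, final

states, delta_dfa, final = subset_construction()
print(len(states))  # 3 reachable subsets, out of 2^3 = 8 possible
```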

1.2.1 Proof by induction

We now prove that $M$ is equivalent to $N$, by induction on the length of the input word $w$. In the base case, $w = \epsilon$. Then $\Delta^*(A, \epsilon) = A$ (since if you have an empty word, you can't transition anywhere) and so $\delta^*(A, \epsilon) = A$, by definition.

Induction hypothesis: Assume, for all $A \subseteq Q$ (equivalently, all $A \in S$), that $\Delta^*(A, x) = \delta^*(A, x)$. (So we assume that the transitions coincide on $x$.) Then, suppose that $ax \in \Sigma^*$ (where $a$ is a letter in the alphabet, and $x$ is a word over the alphabet). Let's look at the execution of the $\delta^*$ function. $\delta^*(A, ax) = \delta^*(\delta(A, a), x)$ by definition of $\delta^*$, and $\delta(A, a) = \Delta^*(A, a)$, by the way we defined $\delta$ for this machine $M$.


$$\begin{align} \delta^*(A, ax) & = \delta^*(\Delta^*(A, a), x) \tag{by definition} \\ & = \Delta^*(\Delta^*(A, a), x) \tag{by induction hypothesis} \\ & = \Delta^*(A, ax) \tag{by definition} \end{align}$$

Now let's look at $L(N)$:

$$\begin{align} L(N) & = \{w | \Delta^*(Q_0, w) \cap F \neq \varnothing\} \tag{by definition} \\ & = \{w|\Delta^*(Q_0, w) \in \hat F\} \tag{by definition of $\hat F$} \\ & = \{w|\delta^*(Q_0, w) \in \hat F\} \tag{by the thing we just proved by induction} \\ & = \{w|\delta^*(s_0, w) \in \hat F\} \\ & = L(M) \; \blacksquare \end{align}$$

Sidenote: the resulting DFA will have an exponential number of states relative to the corresponding NFA, due to the way it is constructed (using power sets). This illustrates why NFAs are useful: because they are usually simpler to design. Also note that a DFA is nothing but a special case of an NFA.


Next, we add a new concept to our study of NFAs: $\epsilon$-transitions (exactly what they sound like: transitions from one state to another that consume no input).

Here's an example:

Simple epsilon transition

This machine would accept anything that does not end with 0 (i.e., anything that ends with 1 plus the empty word).

We can now formally add $\epsilon$-transitions to our definition of an NFA $N = (Q, Q_0, \Delta, F)$. First, we change $\Delta$ to be a function from $Q \times (\Sigma \cup \{\epsilon \}) \to \mathcal P(Q)$. Next, we define the $\epsilon$-closure (of a state $q \in Q$) as the set of all $q'$ reachable from $q$ using only $\epsilon$-transitions (zero or more of them). Note that this always includes the state $q$ itself. In the example above, the $\epsilon$-closure of $q_0$ would be $\{q_0, q_1\}$, and the $\epsilon$-closure of $q_1$ would be $\{q_1\}$.

Next, we modify $\Delta^*$ to $\hat\Delta: \mathcal P(Q) \times \Sigma^* \to \mathcal P(Q)$. In the base case, $\hat \Delta (A, \epsilon)$ returns the $\epsilon$-closure of $A$ (i.e., the union of all the $\epsilon$-closures for the states $q \in A$, aka $\displaystyle \bigcup_{q \in A} \epsilon\text{-closure}(q)$).

Then, for a letter $a$ and a word $x$,

$$\hat \Delta (A, ax) = \hat\Delta\Big(\bigcup_{q \in \epsilon\text{-closure}(A)} \Delta(q, a),\; x\Big)$$

that is: close $A$ under $\epsilon$-transitions, read the letter $a$, and recurse on the rest of the word (each recursive call closes its own argument under $\epsilon$-transitions, down to the base case).

We can then eliminate the $\epsilon$-transitions: an equivalent ordinary NFA is $N' = (Q, Q_0, \Delta', F')$, where $\Delta'(q, a) = \hat \Delta(\{q\}, a)$, and $F' = F \cup \{q_0\}$ if the $\epsilon$-closure of $q_0$ contains a state in $F$, and just $F$ otherwise.
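The $\epsilon$-closure itself is a simple reachability computation. A minimal sketch by fixpoint search, where `EPS` maps each state to the states reachable by a single $\epsilon$-transition; the machine here is my reading of the example above ($q_0 \xrightarrow{\epsilon} q_1$):

```python
# ε-closure by breadth-first reachability over ε-edges only.
EPS = {"q0": {"q1"}}   # q0 --ε--> q1; q1 has no ε-transitions

def eps_closure(q):
    closure, todo = {q}, [q]   # a state is always in its own closure
    while todo:
        p = todo.pop()
        for r in EPS.get(p, set()):
            if r not in closure:
                closure.add(r)
                todo.append(r)
    return closure

print(eps_closure("q0"))  # {'q0', 'q1'} (print order may vary)
print(eps_closure("q1"))  # {'q1'}
```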

In terms of expressiveness, NFAs with $\epsilon$-transitions are equivalent to standard NFAs.


Here's an example where NFAs with $\epsilon$-transitions can come in handy. Let's say we have two languages $L_1$ and $L_2$, both of which are regular, and we have machines $M_1$ and $M_2$ that can recognise these languages (respectively). We want to create a machine that can recognise a new language $L$, which is created by interleaving $L_1$ and $L_2$. For instance, if $L_1 = \{aba\}$, and $L_2 = \{cc\}$, then $L=\{abacc, abcac, acbac, cabac, abcca, acbca, cabca, accba, cacba, ccaba\}$.

How do we construct a machine $M$ for recognising $L$? A simple method is to create a new start state, add $\epsilon$-paths from this state to the start states of $M_1$ and $M_2$, and then add $\epsilon$-paths between $M_1$ and $M_2$ (in both directions). See the diagram below for a visual representation.

Constructing M using epsilon transitions

Now let's define $M$ formally. First, note that $M_1 = (S_1, s_1, \delta_1, F_1)$ accepts $L_1$; $M_2 = (S_2, s_2, \delta_2, F_2)$ accepts $L_2$. We construct the NFA $M = (Q, Q_0, \Delta, F)$ as follows:

  • $Q = (S_1 \times S_2 \times \{1\}) \cup (S_1 \times S_2 \times \{2\}) \cup \{q_0\}$ (where $q_0$ is the new start state we have created, and the $\{1\}$ and $\{2\}$ are used to separate $M_1$ and $M_2$ in the state space)
  • $Q_0 = \{q_0\}$
  • $\Delta$ is defined as follows:
    • $\Delta(q_0, \epsilon) = \{(s_1, s_2, 1), (s_1, s_2, 2)\}$
    • $\Delta((s, s', 1), a) = \{(\delta_1(s, a), s', 1)\}$
    • $\Delta((s, s', 2), a) = \{(s, \delta_2(s', a), 2)\}$
    • $\Delta((s, s', 1), \epsilon) = \{(s, s', 2)\}$ (so if we're in $M_1$ and we get an $\epsilon$-transition, we go to $M_2$)
    • $\Delta((s, s', 2), \epsilon) = \{(s, s', 1)\}$ (vice versa)
  • $F = \{(s, s', 1) \,|\, s \in F_1,\, s' \in F_2\} \cup \{(s, s', 2) \, | \, s \in F_1, \, s' \in F_2\}$ (note that this is a disjoint union)
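The construction above can be simulated compactly. Rather than carrying the 1/2 tag and explicit $\epsilon$-moves, the sketch below collapses them: on each letter, the simulation nondeterministically lets either $M_1$ or $M_2$ consume it, which is exactly the choice the $\epsilon$-hops between the two copies provide. The transition tables are my own partial DFAs for $L_1 = \{aba\}$ and $L_2 = \{cc\}$ (a missing entry kills that thread).

```python
# Shuffle (interleaving) of two languages via a product-state simulation.
D1 = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 3}   # M1: accepts "aba" (final state 3)
D2 = {(0, "c"): 1, (1, "c"): 2}                # M2: accepts "cc"  (final state 2)
F1, F2 = {3}, {2}

def accepts_shuffle(word):
    states = {(0, 0)}                  # pairs (state of M1, state of M2)
    for a in word:
        nxt = set()
        for s1, s2 in states:
            if (s1, a) in D1:          # let M1 consume the letter ...
                nxt.add((D1[(s1, a)], s2))
            if (s2, a) in D2:          # ... or let M2 consume it
                nxt.add((s1, D2[(s2, a)]))
        states = nxt
    # accept iff both machines can simultaneously be in final states
    return any(s1 in F1 and s2 in F2 for s1, s2 in states)

print(accepts_shuffle("abcca"))  # True: an interleaving of aba and cc
print(accepts_shuffle("abccc"))  # False
```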
  1. $\mathcal P(Q)$ means the power set of $Q$. So the function $\Delta$ returns a subset of $Q$.