Any errors in the final exam due to the use of these notes will not be given any special consideration. Prakash has not endorsed these notes and is not responsible for any mistakes in them. You can find Prakash's own notes on this topic here.
1The recursion theorem¶
This is actually a fixed-point theorem, though it won't be presented as such.
1.1Warm-up: quines¶
As a warm-up topic, we'll talk about quines. As a simple example, we could tell a computer to write the following sentence twice, the second time in quotes. "As a simple example, we could tell a computer to write the following sentence twice, the second time in quotes."
Pretty much any quine behaves in a manner similar to this. Note that quines have access to their own text so they're a special case of the recursion theorem.
1.2Obtaining the text of a program¶
Recall that if $P$ is a program, $\langle P \rangle$ is a procedure that can be used within programs to get the text of $P$. We will now prove that we don't need this procedure - in fact, we already have this power!
Consider the program which does the following:
- obtain $\langle P \rangle$
- output $\langle P \rangle$
Alternatively:
- obtain $\langle P \rangle$
- run $\langle P \rangle$
Also consider this program, which takes in a word $w$:
- If $w = \epsilon$ then output 0
- Else:
- Obtain $\langle P \rangle$
- Run $\langle P \rangle$ on $\text{tail}(w)$, store it as $n$
- Output $n+1$
Clearly this program gives us the length of the word.
Alternatively, if we were to just run $\langle P \rangle$ on $w$, either the return value would be 0 (if the length is 0) or the program would diverge. (Why is this relevant?)
1.3Statement of the theorem¶
Now back to the theorem itself. Let $T$ be a Turing machine that computes some function $t: \Sigma^* \times \Sigma^* \to \Sigma^*$. In general, $t$ may be a partial function if $T$ fails to halt. There is another Turing machine $R$, which computes $r: \Sigma^* \to \Sigma^*$ where $\forall w \in \Sigma^*$, $r(w) = t(\langle R \rangle, w)$. so it's like $T$ except one of the inputs is replaced by its own code. So this will tell us that a program can obtain its own code naturally.
1.4Self-reproducing programs¶
First, let's distinguish between three different things:
- the machine
- the code of the machine
- a machine that prints out its own code
Given any $w \in \Sigma^*$, we can construct a Turing machine $P_w$ such that $P_w$ ignores its input and outputs $w$. From there, it's easy to construct a total computable function $q: \Sigma^* \to \Sigma^*$ such that $q(w) = \langle P_w \rangle$ (so just the text of a program that prints $w$). I guess this has to be a machine and not the machine.
Notation: $A; B$ means that you run $A$, and the output of $A$ is fed as input to $B$ (with UNIX-style piping: A | B
). So $\langle A; B\rangle$ is the code for $A$, then a semicolon, then the code for $B$ (?).
Now, this $B$ expects the description for a Turing machine as input. We'll call this description $\langle M \rangle$. $B$ outputs $\langle P_{\langle M \rangle}; M \rangle$. First, it creates the description of a program that just prints out $M$. (We can force $B$ to receive the code for a machine, since we are controlling its input - i.e., the output of $A$.) $B$ is fairly easy to write, so we can obtain $\langle B \rangle$ easily.
Then we just let $A$ be $P_{\langle B\rangle}$, i.e., a program that outputs $\langle B \rangle$. So $B$ receives $\langle B \rangle$ (its own source code!) as input, and outputs $\langle P_{\langle B\rangle}; B \rangle = \langle A, B \rangle$. VoilĂ , a self-reproducing program.
TODO: Add diagrams.
1.5The set of minimal programs¶
Consider the following set:
$$MIN_{TM} = \{\langle M \rangle \mid \text{No TM with a shorter description recognises $L(M)$}\}$$
i.e., the set of all "minimal" programs. Obviously this set exists, but can we produce it? The answer is no, we can't. This set is not even RE. Here's a proof by contradiction: Assume that the set is RE, and so we have an enumerator, $E$. Define some program $R$ which runs $E$ to produce the Turing machines in $MIN_{TM}$, and keeps going until it finds some $M$ such that $|\langle <_i \rangle| > |\langle R \rangle|$ (i.e., a machine whose description is longer than that of $R$). Then we simply run the machine $M_i$ on $w$ and do whatever $M_i$ does. But then $R$ is a machine that does exactly the same thing as $M_i$, and yet $R$ is shorter, by the way we chose $M_i$. So $M_i$ can't actually be in $MIN_{TM}$ after all, since it's not the minimal machine for the language that it recognises! Contradiction, and so the set is not RE.
Note that this proof works because we must eventually encounter some $M_i$ whose description is longer than that of $R$ eventually. If all the descriptions were shorter, then there would only be finitely many programs, but we know that there are infinitely many programs.
1.6Some stuff about fixed-point theory¶
Take an arbitrary total computable function $\sigma: \mathbb N \to \mathbb N$. There exists a $u \in \mathbb N$ such that $G_u = G_{\sigma(n)}$1. Consider the following factorial function (written in Python because that's about as close to pseudocode as we can get):
def fact(n): return n * fact(n-1) if n else 1
Now consider the following function:
def H(f): return lambda n: n * f(n-1) if n else 0
Then H(fact) = fact
, which is kind of cool. So every recursive definition is actually a fixed-point equation.
1.6.1Finding fixed points¶
How did we know that H
has a fixed point? In fact, the recursion theorem tells us this. It even tells us how to find it!
Consider a function on a closed interval whose range is that same interval. As long as the function is continuous, then the function has a fixed point somewhere on the interval. On the other hand, if it were an open interval, there wouldn't necessarily be a fixed point (example: $f(x) = x/2$; if 0 is omitted, there is no fixed point).
Sidenote: computability and continuity are related concepts! We can rewrite the $\delta$-$\epsilon$ definition of continuity in terms of bits: $f$ is continuous at $x_0$ if $\forall n >0$, $\exists m > 0$ such that $\forall y$ such that $|x_0 - y| < 2^{-m}$, $|f(x_0) - f(y) | < 2^{-m}$. This has to do with how much precision is needed in the inputs to get a certain amount of precision in the output.
-
$G$ is a Godel universal function. See the detailed notes on the instructor's website for more information. ↩