Final review CC-BY-NC

Maintainer: admin

The final examination will take place on Tuesday, April 23, at 6pm, in Bronfman. Your room number depends on your last name: Bau-Liu is in room 045, Liu-Zha is in room 046. If, like me, your last name is Liu, then presumably you will have to write your exam in both rooms.

The exam will consist of 6 long-answer questions, possibly including true/false questions.

This page will consist of an overview of the material learned in this course, following the organisation of the textbook (Linear Algebra Done Right by Sheldon Axler), and will be an extension of the midterm review page. Besides this, the following material is available on Wikinotes to help you prepare for the final:

You may also find these notes (created by Professor Tosun) to be useful, if you enjoy craning your neck: What we have done

1 Chapter 1: Vector spaces

1.1 Complex numbers

Nothing new

1.2 Definition of a vector space

A set $V$ along with the operations of addition and scalar multiplication such that the following properties hold:

  • Commutativity of addition
  • Associativity of addition and scalar multiplication
  • Additive identity
  • Additive inverse
  • Multiplicative identity (scalar)
  • Distributivity

1.3 Properties of vector spaces

  • The additive identity is unique
  • Additive inverses are unique for each element
  • $0v = 0$ for $v \in V$
  • $a0 = 0$ for $a \in \mathbb F$
  • $(-1)v = -v$ for $v \in V$

1.4 Subspaces

A subset $U$ of $V$, which:

  • Contains the zero vector $0 \in V$
  • Is closed under addition
  • Is closed under scalar multiplication

Examples:

  • $\mathcal P(\mathbb F)$: The set of all polynomials with a root at 1 is a subspace. So is the set of all polynomials with a constant term of 0.
  • $\mathbb R^2$: The only subspaces are the zero subspace (which contains only the zero vector), the lines passing through the origin, and $\mathbb R^2$ itself.
  • $\mathbb R^3$: The only subspaces are the zero subspace, the lines passing through the origin, the planes passing through the origin, and $\mathbb R^3$ itself.

1.5 Sums and direct sums

1.5.1 Sums

$U_1 + \ldots + U_m$ is the set of all elements $u$ such that $u = u_1 + \ldots + u_m$ where $u_i \in U_i$.

If $U_1, \ldots, U_m$ are subspaces of $V$, their sum $U$ is also a subspace of $V$. This is easy to verify. The zero vector is obviously in the sum, as the zero vector is in each $U_i$ and thus $0 = 0 + \ldots + 0 \in U$. The sum of any vectors $u, v \in U$ is also in $U$, as we can write them, respectively, as

$$u = u_1 + \ldots + u_m \qquad v = v_1 + \ldots + v_m \qquad u_i, v_i \in U_i$$

Then, taking their sum, we get:

$$u + v = u_1 + \ldots + u_m + v_1 + \ldots + v_m = \underbrace{(u_1 + v_1)}_{\in U_1} + \ldots + \underbrace{(u_m + v_m)}_{\in U_m}$$

which is the sum of elements from $U_1, \ldots, U_m$ and thus belongs to $U$. Furthermore, if $a \in \mathbb F$ and $u = u_1 + \ldots + u_m$, then $au = a(u_1 + \ldots + u_m) = au_1 + \ldots + au_m$ and since $U_1, \ldots, U_m$ are subspaces, they are themselves closed under scalar multiplication, and so $au_i \in U_i$ for each $i$. Hence, $au$ is the sum of elements of $U_1, \ldots, U_m$ and thus belongs to $U$.

Also, the sum is the smallest subspace of $V$ containing all of $U_1, \ldots, U_m$.

If $V$ is the sum of $U_1, \ldots, U_m$, then any element of $V$ can be written as $u_1 + \ldots + u_m$ with $u_i \in U_i$.

1.5.2 Direct sums

If each element of $V$ can be written uniquely as a sum $u_1 + \ldots + u_m$ (with $u_i \in U_i$), then $V$ is the direct sum of $U_1, \ldots, U_m$, written $V = U_1 \oplus \cdots \oplus U_m$.

$V$ is the direct sum of $U_1, \ldots, U_m$ if and only if it is the sum and there is only one way to write the zero vector (namely, with every $u_i = 0$). The proof for $\Rightarrow$ is trivial; the proof for $\Leftarrow$ is pretty simple too: set 0 equal to the difference of two representations of the same vector, so all the terms must be zero, and the two representations coincide.

$V = U \oplus W$ if and only if $V = U + W$ and $U \cap W = \{0\}$. Proof for $\Rightarrow$: the first is from the definition; for the second, assume that $v \in U \cap W$. Then $-v \in U$ and $-v \in W$ (closure under scalar multiplication by -1). So $v + (-v) = 0$ is a sum of a vector in $U$ and a vector in $W$. We know that $0 + 0 = 0$. Since this representation is unique, by the fact that $V$ is the direct sum, then $v = 0$. Thus 0 is the only element in the intersection. For $\Leftarrow$, we need to show that the zero vector can only be written one way. Suppose that $0 = w + u$ where $u\in U$ and $w \in W$. Then $w = -u$ so $w \in U$ and also $u \in W$. Thus $u$ and $w$ are both in the intersection. But the only such element is 0. Thus $u = w = 0$ and so there is a unique way of writing the zero vector as a sum of elements in $U$ and $W$. $\blacksquare$
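A concrete sanity check (my own example, not from the book): in $\mathbb R^2$, take $U = \{(x, 0) : x \in \mathbb R\}$ and $W = \{(0, y) : y \in \mathbb R\}$. Then $U + W = \mathbb R^2$ and $U \cap W = \{0\}$, so $\mathbb R^2 = U \oplus W$, and the unique decomposition of any vector is

$$(x, y) = \underbrace{(x, 0)}_{\in U} + \underbrace{(0, y)}_{\in W}$$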

1.6 Exercises

(Interesting results encountered in the exercises.)

  • $-(-v) = v$ for any $v \in V$. Proof: $-(-v) = -(-v) + 0 = -(-v) + ((-v) + v) = (-(-v) + (-v)) + v = 0 + v = v$
  • Suppose $av = 0$. Assume that $a \neq 0$. Then $a$ has a reciprocal, $1/a$. Multiplying both sides by $1/a$ gives us $1/a \cdot av = 1v = v = 0$. Thus $v= 0$ if $a \neq 0$. On the other hand, if $a = 0$, the claim holds trivially. So either $a$ or $v$ is 0 (or both).
  • Subspaces: If a subset is defined by the sum of the elements in a tuple being equal to a constant, then it's a subspace only if that constant is 0. If it's a product instead of a sum, it's never a subspace: the constant must be 0 for the zero vector to be in the subset, but even then the set isn't closed under vector addition (e.g. $(1, 0) + (0, 1) = (1, 1)$ leaves the set).
  • The intersection of subspaces is always a subspace. Proof: so trivial I don't even know how to write it out.
  • The union of subspaces is only a subspace if one of them is contained within the other. Proof: Let the subspaces be $U$ and $V$. Suppose neither is contained in the other, so there is a $u\in U$ not in $V$ and a $v \in V$ not in $U$. Now, both $u$ and $v$ are in the union. Consider $u + v$. If $u + v$ were in the union as well, then it must be that $u + v$ is in either $U$ or $V$. If it's in $U$, then $-u$ is in $U$ by closure under scalar multiplication, and $-u + (u+v) = v$ would also be in $U$ by closure under addition. But we have established already that $v \notin U$. The argument for $V$ is similar. Thus, only if one is contained within the other does closure under vector addition hold.
  • The sum of a subspace with itself is itself: $U + U = U$
  • Addition of subspaces is commutative and associative. The additive identity is clearly the zero subspace. Consequently, this is the only subspace that has an additive inverse (itself).
  • If $V = W \oplus U_1 = W \oplus U_2$, $U_1$ and $U_2$ can be different. Example in $\mathbb R^2$: $W = \{(y, y)\}$ works with both $U_1 = \{(x, 0)\}$ and $U_2 = \{(0, y)\}$.

2 Chapter 2: Finite-dimensional vector spaces

2.1 Span and linear independence

Span
set of all linear combinations; always a subspace of whatever the ambient space is
Finite-dimensional vector space
There is a list of vectors (recall that lists must be finite) that spans the vector space
Linear independence for $v_1, \ldots, v_m$
$a_1v_1 + \ldots + a_mv_m = 0$ only happens when all the a's are 0. Note that the empty list is linearly independent (so that when you remove a vector from a linearly independent list it remains that way).
Linear dependence lemma
If the vectors are linearly dependent, then one vector in the list can be written as a linear combination of the others (proof: at least one coefficient is non-zero so divide by it). Also, we can remove one without affecting the span (proof: use the previous bit and just replace that vector with the linear combination of the others).
Spanning sets
The length of any independent list $\leq$ the length of any spanning set. Proof: given a spanning list, add a vector from a linearly independent list to it. This makes it linearly dependent. We can then remove one of the vectors in the original list and the new list will still span the vector space. Repeat this process until we've added all the vectors from the linearly independent list. There will always be a spanning-set vector to remove, so this proof holds.

2.2 Bases

Basis for $V$
Linearly independent spanning set. Any vector in $V$ can be written uniquely as a linear combination of basis elements. We can reduce a spanning list to a basis by removing linearly dependent elements, and we can extend a linearly independent list to a basis by adding elements from a spanning list that are not in the span of the lin ind list.

For finite-dimensional $V$: Given a subspace $U$ of $V$, there exists a subspace $W$ of $V$ such that $V = U \oplus W$. Proof: $U$ has a basis because it is itself finite-dimensional (being a subspace of a finite-dimensional space). We can extend the basis of $U$ to a basis of $V$ by adding some vectors $w_1,\ldots, w_n$, which we let be the basis for $W$. Then we just have to prove that $V$ is their sum and their intersection is $\{0\}$.

2.3 Dimension

Any two bases of a fdvs have the same length. Proof: one is linearly independent, the other spans, so we get an inequality; this is true in the other direction too.

Dimension of a subspace never exceeds the dimension of the ambient space (any linear independent list can be extended, etc). Any spanning list or linearly independent list with length $\dim V$ is a basis for $V$.

2.3.1 Lunch in Chinatown

$$\dim(U_1 + U_2) = \dim U_1 + \dim U_2 -\dim(U_1 \cap U_2)$$

Proof: construct a basis for everything, look at the number of vectors in each.

If $V$ is the sum, and $\dim V$ is the sum of the dims, then $V$ is the direct sum. Proof: bases.
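Quick example of the formula in action (numbers mine): in $\mathbb R^3$, let $U_1$ be the $xy$-plane and $U_2$ the $xz$-plane. Their intersection is the $x$-axis, so

$$\dim(U_1 + U_2) = 2 + 2 - 1 = 3$$

which checks out, since $U_1 + U_2$ is all of $\mathbb R^3$ (though this is not a direct sum, because the intersection isn't $\{0\}$).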

2.4 Exercises

  • If $(v_1, \ldots, v_n)$ is linearly independent but $(v_1+w, \ldots, v_n+w)$ is linearly dependent, then $w$ is in the span of $(v_1, \ldots, v_n)$. Proof: write out the definition of linear dependence, rearrange terms, and divide by the coefficient in front of $w$ (non-zero, by the independence of the $v$'s).
  • The subset of $\mathcal P(\mathbb F)$ defined by the zero polynomial plus all polynomials of degree exactly $m > 0$ is not a subspace, because it's not closed under addition: for $c \neq 0$, $x^m + c$ and $-x^m +c$ are both members, but their sum, $2c$, is not.
  • It is possible to write an fdvs as the sum of one-dimensional subspaces. Proof: It must have a basis. So make a subspace out of each basis vector. QED.
  • Given a list of length $n$ which is not a spanning set, can it be linearly independent in a vector space of dimension $n$? No, because since it's not a spanning set, it's not a basis, and we know that any linearly independent list of length $n$ must be a basis. Thus it can't be linearly independent.
  • Lunch in Chinatown doesn't hold for > 2 subspaces.

3 Chapter 3: Linear maps

3.1 Definitions and examples

Linear map from $V$ to $W$: function $T: V \to W$ that satisfies additivity ($T(u+v) = Tu + Tv$) and homogeneity ($T(av) = a(Tv)$, aka it preserves scalar multiplication).

Examples: zero map, identity map, differentiation, integration, multiplication, shifting elements, linear combinations of elements, etc.

Also, $\mathcal L(V, W)$ is a vector space. We add a "product" operation, which is really just composition, in the expected order (so $(ST)(v) = S(Tv)$). This is associative and distributive, and there is an identity $I$, but it is not generally commutative.

3.2 Nullspaces and ranges

Nullspace: $\{ v \in V: Tv = 0\}$. Always a subspace. Easy to prove. $T$ is injective if and only if its nullspace is $\{0\}$. Also easy to prove. 0 is always in the nullspace, and if $T$ is injective, then 0 is the only thing in the nullspace. Also, if 0 is the only thing in the nullspace, then if $Tu = Tv$, $0 = Tv - Tu = T(v-u)$ so $v - u= 0$ and $v= u$, proving injectivity.

Range: $\{Tv: v \in V\}$. Also a subspace. Also easy to prove. $T$ is surjective if and only if its range is $W$.

3.2.1 Dimension of the range and nullspace

$$\dim V = \dim \text{null} T +\dim \text{range} T$$

(The rank-nullity theorem, Theorem 3.4)

Proof: create bases, apply $T$, the nullspace parts disappear because they go to 0; this proves the spanning. To prove the linear independence part, show that the only way to get 0 is if the coefficients are 0, because of the linear independence of the basis of the ambient space.
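A sanity-check example (mine): let $T: \mathbb R^3 \to \mathbb R^2$ be the projection $T(x, y, z) = (x, y)$. The nullspace is the $z$-axis (dimension 1) and the range is all of $\mathbb R^2$ (dimension 2), so

$$\dim \mathbb R^3 = 3 = \underbrace{1}_{\dim \text{null} T} + \underbrace{2}_{\dim \text{range} T}$$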

Corollary: If $\dim V > \dim W$, there are no injective maps from $V$ to $W$, and there are no surjective maps from $W$ to $V$.

3.3 The matrix of a linear map

To find it: apply $T$ to each vector of the given (domain) basis and write the coordinates of the result, with respect to the basis of the target space, as a column.

The matrix of a vector with respect to some basis of length $n$ is just the $n \times 1$ column vector of the coefficients for the basis vectors. Then, $M(Tv) = M(T)M(v)$.
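For example (not from the textbook): let $D$ be differentiation on the space of polynomials of degree $\leq 2$, with basis $(1, x, x^2)$. Since $D1 = 0$, $Dx = 1$ and $Dx^2 = 2x$, each column holds the coordinates of the image of a basis vector, and for, say, $p = 3 + 2x + x^2$:

$$M(D) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}, \qquad M(D)M(p) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix} = M(Dp)$$

since $Dp = 2 + 2x$.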

3.4 Invertibility

$T \in \mathcal L(V, W)$ is invertible if there exists $S \in \mathcal L(W, V)$ such that $ST = TS = I$ (obviously unique).

Invertibility $\Leftrightarrow$ injectivity and surjectivity (both). Proof: Suppose $T^{-1}$ exists. Then if $Tv=Tu$, $v = T^{-1}(Tv) = T^{-1}(Tu) = u$ so $v = u$, and it's injective. Also, if $w$ is in the target space, then $T(T^{-1}w) = w$ where $T^{-1}w$ is in the domain, and so every element in the target space is part of the range. Conversely, suppose that it is injective and surjective. Create a map $S$ defined as the inverse, and prove that $ST = TS = I$ when applied to any vector. Then we just need to prove that $S$ is linear.

Isomorphic vector spaces: there is an invertible linear map between them. Thus there is a bijection between them, and they have the same dimension (from the nullspace-range formula)

$$\dim \mathcal L(V, W) = \dim V \cdot \dim W$$

Operator: linear map from $V$ to $V$ ($T \in \mathcal L(V)$). TFAE: invertibility, surjectivity, injectivity.

3.5 Exercises

  • A linear operator defined on a 1-dimensional VS is equivalent to multiplication by a scalar. Proof follows easily from homogeneity.
  • We can always extend a linear map on a subspace to a linear map on the ambient space. It won't be injective, obviously, but we just ignore the part of any vector that we don't have a rule for (in terms of basis vectors, etc).
  • If $T \in \mathcal L(V, \mathbb F)$, and $u \notin \text{null}T$, then $V$ is the direct sum of the nullspace and $\{au: a\in \mathbb F\}$. Proof: show that their intersection is zero, and that their sum is $V$ (take one vector from each such that their sum is an arbitrary $v$).
  • If $T \in \mathcal L(V, W)$ is injective, then $T$ applied to each element of a lin ind list gives a lin ind list. Proof: if $a_1Tv_1 + \ldots + a_nTv_n = 0$, then by linearity $T(a_1v_1 + \ldots + a_nv_n) = 0$; by injectivity the nullspace is $\{0\}$, so $a_1v_1 + \ldots + a_nv_n = 0$, and the independence of the original list forces all the $a$'s to be 0.
  • A product of injective maps is injective. Proof: apply argument inductively, starting from the last.
  • If $T: V \to W$ is surjective, then the list composed of applying the map to each element in a spanning set of $V$ spans $W$. Easy to prove: $w \in W$, since $T$ is surjective there is a $v$ such that $Tv = w$. We can write $v$ as a linear combination of the spanning set. Applying $T$ to both sides gives us $w = Tv$ as the desired linear combination.
  • There is a surjective linear map from $V$ to $W$ only if the dimension of $V$ is at least that of $W$. Pretty easy to see. To prove, use the rank-nullity theorem for one direction, and create an obviously surjective linear map for the other direction.
  • $ST$ is invertible $\Leftrightarrow$ $S$ and $T$ are both invertible. Proof of $\Rightarrow$: $T$ is injective (nullspace has only 0), and $S$ is surjective (range is $V$). In the other direction, just multiply to get $I$.

4 Chapter 4: Polynomials

4.1 Degree

$p$ has degree $m \geq 1$. Then $\lambda$ is a root $\Leftrightarrow$ there exists a $q$ with degree $m-1$ such that

$$p(z) = (z-\lambda)q(z)$$

Proof of the non-trivial direction: $p(z) = p(z) - p(\lambda)$ (since $p(\lambda) = 0$), and writing the difference out term by term lets you factor $(z-\lambda)$ out of each $z^k - \lambda^k$, giving the desired $q$.

Corollary: $p$ has at most $m$ distinct roots.

Another corollary: the standard basis of $\mathcal P$ is, of course, linearly independent.

Division algorithm: Analogous to that for integers. $q = sp + r$, where $\deg r < \deg p$.
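Worked example (my own): dividing $x^3 + x + 1$ by $x + 1$,

$$x^3 + x + 1 = \underbrace{(x^2 - x + 2)}_{s}\underbrace{(x + 1)}_{p} + \underbrace{(-1)}_{r}, \qquad \deg r = 0 < 1 = \deg p$$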

4.2 Complex coefficients

4.2.1 Fundamental theorem of algebra

Every nonconstant polynomial over $\mathbb C$ has a root $\in \mathbb C$. Proof: beyond the scope of this course probably.

Corollary: $p$ has a unique factorisation of the form $p(z) = c(z-\lambda_1)\cdots(z-\lambda_m)$ where the lambdas are the complex roots.

4.3 Real coefficients

Complex conjugate definition: negate the imaginary part ($\overline{a + bi} = a - bi$)

If $\lambda$ is a root, so is $\overline{\lambda}$. Proof: take the complex conjugate of both sides ($\overline 0 = 0$).

$p$ has a unique factorisation into linear factors and irreducible quadratic factors (quadratics with no real roots).

4.4 Exercises

  • The uniqueness corollary to the division algorithm: if we assume two different representations, we get $(s-s')p = (r- r')$ but the degree of the right side is less than the degree of $p$. So we must have $s-s' = 0$ and so $r-r'=0$, thus $r=r'$ and $s=s'$.

5 Chapter 5: Eigenvalues and eigenvectors

5.1 Invariant subspaces

A subspace $U$ is invariant under an operator $T \in \mathcal L(V)$ if $u \in U$ implies $Tu \in U$, i.e., if $T$ maps $U$ into $U$. In that case the restriction $T|_U$ is itself an operator on $U$.

For a one-dimensional $V$, the only invariant subspaces are the trivial ones (zero, whole space). For complex vector spaces with dimension greater than 1, there will be other invariant subspaces as well; for real vector spaces, dimension must be greater than 2.

For a self-map (operator), both the nullspace and range are invariant.

Eigenvalue $\lambda$: there exists a nonzero $v \in V$ such that $Tv = \lambda v$. So, $T$ has a one-dimensional invariant subspace $\Leftrightarrow$ $T$ has an eigenvalue. Also, $\lambda$ is an eigenvalue if and only if $T - \lambda I$ is not injective (or invertible, or surjective). $v$ is an eigenvector (found by looking at the nullspace of $T-\lambda I$).
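Quick worked example (made up): let $T(x, y) = (x + 2y, 2x + y)$ on $\mathbb R^2$. Setting $Tv = \lambda v$ gives the simultaneous equations $x + 2y = \lambda x$ and $2x + y = \lambda y$, whose solutions are $\lambda = 3$ (eigenvectors $(x, x)$) and $\lambda = -1$ (eigenvectors $(x, -x)$). Indeed:

$$T(1, 1) = (3, 3) = 3(1, 1), \qquad T(1, -1) = (-1, 1) = -(1, -1)$$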

If the operator is a scalar multiple of $I$, so $T=aI$, then $a$ is the only eigenvalue and every vector is an eigenvector.

Some operators, like the counterclockwise 90 degree rotation (defined by $T(x, y) = (-y, x)$), don't have any real eigenvalues. (This one does have complex eigenvalues: $\pm i$.)

Eigenvectors corresponding to distinct eigenvalues are always linearly independent. Proof: suppose not, and let $v_k$ be the first eigenvector in the span of the previous ones; write it as a linear combination of them, apply $T$ to one copy and multiply the other by $\lambda_k$, and subtract: since the eigenvalues are all distinct, the coefficients must all be 0, contradicting $v_k \neq 0$.

The maximum number of distinct eigenvalues is $\dim V$, since the corresponding eigenvectors form a linearly independent list, whose length is at most $\dim V$.

5.2 Polynomials applied to operators

$T^m$ is $T$ applied $m$ times

5.3 Upper-triangular matrices

Every operator on a fd, non-zero complex vector space has an eigenvalue. Proof: consider the vectors $(v, Tv, T^2v, \ldots, T^nv)$ for some nonzero $v$. This list has $n+1$ vectors, so it cannot be linearly independent when $\dim V = n$. So we can write 0 as a linear combination with not-all-zero complex coefficients. Take the largest index with a nonzero coefficient, and use the coefficients up to there as the coefficients of a polynomial. We can factor that over the complex numbers! Replace $z$ with $T$ and we see that $T-\lambda_j I$ is not injective for at least one $j$, since the product sends the non-zero vector $v$ to 0.

Upper triangular matrix: everything below the diagonal is 0 etc.

If $T \in \mathcal L(V)$, then TFAE: (1) the matrix with respect to the basis is upper triangular; (2) $Tv_k$ for any basis vector $v_k$ is in the span of the first $k$ basis vectors; (3) the span of the first $k$ basis vectors is invariant under $T$. Proof: just need to show that (2) implies (3). Well, $Tv_j \in \text{span}(v_1, \ldots, v_j)$, which is itself contained within $\text{span}(v_1, \ldots, v_k)$ for $j \leq k$. Then $T$ applied to any linear combination of $(v_1, \ldots, v_k)$ also lands in $\text{span}(v_1, \ldots, v_k)$, and so $\text{span}(v_1, \ldots, v_k)$ is invariant under $T$.

Any operator on a finite-dimensional complex vector space has an upper-triangular matrix with respect to some basis. Proof: by induction on the dimension; just have to show that $Tv_k \in \text{span}(v_1, \ldots, v_k)$ for each vector of a suitably chosen basis, then use the previous proposition.

$T$ is invertible $\Leftrightarrow$ its upper triangular matrix has no zeros on the diagonal (the diagonal entries are the eigenvalues, as shown below). Proof: if the $k$th diagonal entry is 0, then $Tv_k \in \text{span}(v_1, \ldots, v_{k-1})$, so $T$ maps the $k$-dimensional space $\text{span}(v_1, \ldots, v_k)$ into the $(k-1)$-dimensional space $\text{span}(v_1, \ldots, v_{k-1})$; it therefore can't be injective, so it's not invertible. For the other direction, if $T$ is not invertible, then it's not injective, so there is a nonzero $v$ with $Tv=0$. Write $v$ as a linear combination of the basis vectors, and apply $T$ to both sides: $0 = T(a_1v_1 + \ldots + a_kv_k) = (a_1Tv_1 + \ldots + a_{k-1}Tv_{k-1}) + a_kTv_k$, where $a_k$ is the last nonzero coefficient. Then $Tv_k$ is in the span of $(v_1, \ldots, v_{k-1})$ (since $a_kTv_k$ is, and we can just multiply by the reciprocal of $a_k$, since it's non-zero). So when we write $Tv_k$ as a linear combination of the basis vectors, the coefficient for $v_k$ will be 0, and so the corresponding value on the diagonal, $\lambda_k$, will also be zero.

To prove that the values along the diagonal are eigenvalues, consider the matrix representation of $T-\lambda I$, which will have $\lambda_i - \lambda$ along its diagonal for the various $i$. We know from above that $T-\lambda I$ is not invertible if and only if there is a zero on its diagonal, i.e., if and only if $\lambda = \lambda_i$ for some $i$. And $T - \lambda I$ not invertible (equivalently, not injective) means exactly that $(T-\lambda I)v = 0$ for some non-zero $v$, i.e., that $\lambda$ is an eigenvalue. So the eigenvalues are exactly the diagonal entries.

5.4 Diagonal matrices

An operator has a diagonal matrix with respect to some basis if it has $\dim V$ distinct eigenvalues (the corresponding eigenvectors are then linearly independent, and hence a basis). (It can also have a diagonal matrix even when the eigenvalues are not distinct, sometimes.)

TFAE:

  • $T$ has a diagonal matrix with respect to some basis
  • The eigenvectors of $T$ form a basis of $V$
  • $V$ is the direct sum of one-dimensional invariant subspaces
  • $V$ is the sum of the nullspaces $\text{null}(T-\lambda_k I)$ over the distinct eigenvalues $\lambda_k$
  • $\dim V$ is the sum of the dimensions of those nullspaces

5.5 Invariant subspaces on real vector spaces

Every operator has an invariant subspace of dim 1 or 2. Proof: polynomial thing again (factorisation).

Every operator on an odd-dimensional space has an eigenvalue. Proof: by induction, on the dimension. tl;dr

5.6 Exercises

  • A subspace that is invariant under every operator $T \in \mathcal L(V)$ is either zero or $V$. Proof: Suppose we have a subspace $U$ that contains at least one non-zero vector $u$ but not all vectors in $V$ (so there is a $w \notin U$). Extend $u$ to a basis of $V$. Define a linear operator by sending any vector, written in this basis as $v = au + \ldots$, to $aw$. Then $u$ is sent to $w \notin U$, so $U$ is not invariant under this operator.
  • Suppose $ST = TS$. Then $\text{null} (T-\lambda I)$ is invariant under $S$ for any $\lambda$. Proof: Suppose $v$ is in the nullspace. Apply $T-\lambda I$ to $Sv$. By distributivity, you get $TSv - \lambda Sv = STv - \lambda Sv = S((T-\lambda I)v) = S(0) = 0$. So $Sv$ is also in the nullspace. Thus it's invariant. Pretty neat.
  • Finding eigenvectors and eigenvalues of a linear op: Just use the definition. $Tv = \lambda v$. Solve for $\lambda$ first (creating simultaneous equations based on what each element is mapped to), then for $v$ (actually a set of eigenvectors). Don't forget that 0 can be a valid eigenvalue.
  • $T$ has at most $(\dim \text{range } T) + 1$ distinct eigenvalues. Proof: an eigenvector for a nonzero eigenvalue lies in the range (since $v = T(v/\lambda)$), and eigenvectors for distinct eigenvalues are linearly independent, so there are at most $\dim \text{range } T$ distinct nonzero eigenvalues, plus possibly the eigenvalue 0.
  • If $\lambda$ is an eigenvalue of an invertible $T$, then $1/\lambda$ is an eigenvalue of $T^{-1}$ (with the same eigenvector).
  • $ST$, $TS$ have the same eigenvalues.
  • If every vector is an eigenvector, then we have a scalar multiple of the identity operator.
  • If every subspace of dimension $\dim V - 1$ is invariant under $T \in \mathcal L(V)$, then $T$ is a scalar multiple of the identity operator. Proof: if $T$ is not a scalar multiple of $I$, there is a $v \in V$ which is not an eigenvector of $T$. So $v$ and $Tv$ are linearly independent. We can extend this to a basis of $V$. Now remove $Tv$ from the list. The span of the remaining vectors has dimension $\dim(V) - 1$ and contains $v$; however, $Tv$ is not in it, so it's not invariant under $T$. Proof by contradiction concluded.
  • If $P^2=P$, then the ambient space is the direct sum of the nullspace and range of $P$. Proof: write $v = Pv + (v - Pv)$; $v-Pv$ is in the nullspace (since $P(v - Pv) = Pv - P^2v = 0$), and $Pv$ is in the range.

6 Chapter 6: Inner-product spaces

6.1 Inner products

Inner product: a function that takes $u, v \in V$ and outputs a number from the field. Properties:

  • Positive definiteness ($\langle v, v \rangle \geq 0$, with equality iff $v=0$)
  • Linearity in the first argument (additivity and homogeneity)
  • Conjugate symmetry ($\langle u, v \rangle = \overline{\langle v, u \rangle}$ - just symmetry in a real inner product space)

Standard inner products:

  • The standard inner product on $\mathbb F^n$ is the Euclidean inner product, which is like the dot product but with the conjugate of the elements in the second vector.
  • For $\mathcal P(\mathbb F)$, we have $\displaystyle \langle p, q\rangle = \int_0^1 p(x)\overline{q(x)}\,dx$.
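Quick computation to see the conjugate in the second slot (numbers made up): in $\mathbb C^2$,

$$\langle (1, i), (2, 1+i) \rangle = 1 \cdot \overline{2} + i \cdot \overline{1+i} = 2 + i(1 - i) = 3 + i$$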

Note that the function defined by $f: v \to \langle v, w \rangle$ where $w$ is fixed is a linear map from $V$ to $\mathbb F$. Since 0 must go to 0, we have that $\langle 0, w \rangle = \langle w, 0 \rangle = 0$ for any $w \in V$. We also have conjugate homogeneity in the second slot: $\langle u, av \rangle = \overline{a} \langle u, v\rangle$

6.2 Norms

The norm of a vector is the square root of its inner product with itself. Properties:

  • The norm is only 0 if the vector is (by positive definiteness)
  • $\|av\|^2 = |a|^2\|v\|^2$, from which we get that $\|av\| = |a|\|v\|$. Note that working with norms squared is usually easier, for obvious reasons.
  • If $\langle u, v \rangle = 0$, we say that $u$ and $v$ are orthogonal. (0 is orthogonal to everything, and is the only vector orthogonal to itself.)

6.2.1 Pythagorean theorem

If $u$ and $v$ are orthogonal, then $\|u+v\|^2 = \|u\|^2 + \|v\|^2$.

Proof: $\|u+v\|^2 = \langle u+v, u+v \rangle = \|u\|^2 + \|v\|^2 + \langle u, v \rangle + \langle v, u \rangle$ but the last two terms are 0 since they're orthogonal.

6.2.2 Orthogonal decomposition

Suppose we have two vectors $u$ and $v$, and we want to write $v$ as the sum of $au$ for $a \in \mathbb F$ and $w \in V$ where $w$ is orthogonal to $u$. How can we do this? Well, $v = au + w$. If $w = v-au$, then that satisfies it. Now we just need to choose $a$ accordingly to ensure that $w$ and $u$ are orthogonal. We need $\langle w, u \rangle = \langle v-au, u\rangle=0$. If we rewrite this in terms of norms, we have that

$$\langle v-au, u \rangle = \langle v, u \rangle - \langle au, u \rangle = \langle v, u \rangle - a \|u\|^2 = 0$$

Thus, we choose $a$ to be

$$a = \frac{\langle v, u \rangle}{\|u\|^2}$$

where we assume that $u \neq 0$ (otherwise, we could just set $w = v$ in which case the decomposition is trivial).

So the decomposition we wanted is

$$v = \underbrace{\left ( \frac{\langle v, u \rangle}{\|u\|^2} u \right )}_{\text{scalar multiple of } u} + \underbrace{\left ( v - \frac{\langle v, u \rangle}{\|u\|^2} u \right )}_{\text{orthogonal to } u}$$
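Numerical example (mine): take $u = (1, 1)$ and $v = (3, 1)$ in $\mathbb R^2$. Then $\langle v, u \rangle = 4$ and $\|u\|^2 = 2$, so $a = 2$ and

$$v = \underbrace{(2, 2)}_{2u} + \underbrace{(1, -1)}_{w}, \qquad \langle (1, -1), (1, 1) \rangle = 0$$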

6.2.3 Cauchy-Schwarz inequality

$$|\langle u, v \rangle | \leq \|u\|\|v\|$$

Equality holds only if one is a scalar multiple of the other.

Proof: Assume $u\neq 0$ (the case $u = 0$ is trivial). Then $\|u \|^2 \neq 0$, and so we can divide by it, and the decomposition shown above for $v$ is valid: $v$ is the sum of two orthogonal vectors, the scalar multiple $\frac{\langle v, u \rangle}{\|u\|^2}u$ and $w$. The Pythagorean theorem tells us that the squared norm of their sum is the sum of their squared norms. So:

$$\|v\|^2 = \left\| \frac{ \langle v, u \rangle }{\|u\|^2} u \right\|^2 + \|w \|^2 = \frac{|\langle v, u \rangle|^2}{\|u\|^4} \|u\|^2 + \|w\|^2 = \frac{|\langle v, u \rangle|^2}{\|u\|^2} + \|w\|^2$$

Then we just multiply both sides by $\|u\|^2$ to get

$$\|v\|^2\|u\|^2 = |\langle v, u \rangle|^2 + \|w\|^2\|u\|^2$$

Since the last term is never negative, we get $\|v\|^2\|u\|^2 \geq |\langle v, u \rangle|^2$, and if we take the square root we get the Cauchy-Schwarz inequality, as desired. We only get equality if $\|w\|^2 = 0$ (since we already established that $u\neq 0$ and so $\|u\|^2 \neq 0$). This can only happen if $w= 0$, by positive definiteness. But if $w=0$ then $v$ is just a scalar multiple of $u$!

6.2.4 The triangle inequality

$$\|u+v\| \leq \|u\| + \|v\|$$

Proof: Write it out, use Cauchy-Schwarz. Equality only happens if one is a (non-negative) scalar multiple of the other.

6.2.5 The parallelogram equality

$$\|u+v\|^2 + \|u-v\|^2 = 2(\|u\|^2 + \|v\|^2)$$

Proof: Directly from the inner products. Very simple.

6.3 Orthonormal bases

Orthonormal list: any two vectors in it are orthogonal, and the norm of each is 1.

Note that $\|a_1e_1 + \ldots + a_ne_n\|^2 = |a_1|^2 + \ldots + |a_n|^2$ where $(e_1, \ldots, e_n)$ is an orthonormal list. Proof: Pythagorean theorem. A corollary of this is that any orthonormal list is linearly independent (just try to make zero). Thus, if it's also a spanning set (i.e., of length $\dim V$), then it's an orthonormal basis.

6.3.1 Linear combinations of an orthonormal basis

$$v = \langle v, e_1 \rangle e_1 + \ldots + \langle v, e_n \rangle e_n$$

Proof: Assume $(e_1, \ldots, e_n)$ is a basis. Then since it's a spanning set, we can write $v = a_1e_1 + \ldots + a_ne_n$. Take the inner product of $v$ with $e_j$ (with $v$ in the first slot): $\langle v, e_j \rangle = a_1 \langle e_1, e_j \rangle + \ldots + a_n \langle e_n, e_j \rangle$. Since it's an orthonormal basis, all the inner products on the right side vanish except $\langle e_j, e_j \rangle$, which is 1 (since each vector has a norm of 1). So we just get $\langle v, e_j \rangle = a_j$. Applying this for every $j$ should convince you of the validity of this proposition.

If we take the squared norm of $v$, we get:

$$\|v \|^2 = |\langle v, e_1 \rangle|^2 + \ldots + |\langle v, e_n\rangle|^2$$

which follows from the two previous propositions.

6.3.2 Gram-Schmidt

We can turn any linearly independent list into an orthonormal list that spans the same subspace! Here's how:

First, set $e_1 = v_1 / \|v_1\|$. (This ensures that $e_1$ has a norm of 1.) The rest is done inductively. For $j > 1$, we set

$$e_j = \frac{v_j - \langle v_j, e_1 \rangle e_1 - \cdots - \langle v_j, e_{j-1} \rangle e_{j-1}}{\|v_j - \langle v_j, e_1 \rangle e_1 - \cdots - \langle v_j, e_{j-1} \rangle e_{j-1}\|}$$

Note that the denominator cannot be zero because all the $v$s are linearly independent. Then $e_j$ clearly has a norm of 1. We just need to show that it's orthogonal to every other $e_k$. If we take the inner product of $e_j$ and $e_k$ where $k < j$, then all the $\langle e_i, e_k \rangle$ terms disappear (due to pairwise orthogonality) for $i \neq k$ so all we have left is $\langle v_j, e_k \rangle - \langle v_j, e_k \rangle$ in the numerator. Of course, this is just 0. So $(e_1, \ldots, e_j)$ is indeed an orthonormal list. We can see that $v_j$ is in the span of $(e_1, \ldots, e_j)$ (just have to rearrange the formula for $e_j$), which means that the $v$ spanning set is contained within the $e$ one. Since both lists are also linearly independent, both subspaces have the same dimension $j$, and thus are equal.
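Worked example (my own numbers): apply this to the linearly independent list $(v_1, v_2) = ((1, 1, 0), (1, 0, 1))$ in $\mathbb R^3$. First, $e_1 = (1, 1, 0)/\sqrt 2$. Then $\langle v_2, e_1 \rangle = 1/\sqrt 2$, so

$$v_2 - \langle v_2, e_1 \rangle e_1 = (1, 0, 1) - \tfrac{1}{2}(1, 1, 0) = \left(\tfrac12, -\tfrac12, 1\right), \qquad e_2 = \frac{1}{\sqrt 6}(1, -1, 2)$$

and indeed $\langle e_1, e_2 \rangle = 0$ with $\|e_1\| = \|e_2\| = 1$.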

Gram-Schmidt can be applied to any finite-dimensional inner-product space! Thus any FDIPS has an orthonormal basis. Furthermore, any orthonormal list can be extended to an orthonormal basis.

6.3.3 Upper triangular matrices and orthonormal bases

Any operator $T$ that has an upper-triangular matrix with respect to some basis has a UT matrix wrt an orthonormal basis. Proof: We know that the span of $(v_1, \ldots, v_j)$ is invariant under $T$ for each $j$. If we apply Gram-Schmidt to get an orthonormal basis $(e_1, \ldots, e_n)$, then $\text{span}(e_1, \ldots, e_j) = \text{span}(v_1, \ldots, v_j)$, so the same spans are invariant, and the matrix wrt the orthonormal basis is also UT. QED

Note that in a complex VS, any operator has a UT matrix with respect to an orthonormal basis, by the above.

6.4 Orthogonal projections and minimisation problems

6.4.1 The orthogonal complement

The orthogonal complement of $U$, denoted $U^{\perp}$, is the set of all vectors that are orthogonal to every vector in $U$.

Properties:

  • Is a subspace (0 is obvious, closure under scalar multiplication follows from homogeneity, closure under addition follows from additivity)
  • Complement of $V$ is $\{0\}$ and vice versa
  • If $U_1 \subset U_2$, the complements have the opposite relation
  • $U \oplus U^{\perp} = V$ where $U$ is a subspace of $V$. Proof: Let $(e_1, \ldots, e_m)$ be an orthonormal basis for $U$. We can write any vector in $V$ as $v = (\langle v, e_1 \rangle e_1 + \ldots + \langle v, e_m \rangle e_m) + (v - \langle v, e_1 \rangle e_1 - \ldots - \langle v, e_m \rangle e_m) = u + w$. Now $u \in U$, clearly. Is $w \in U^{\perp}$? Yes, because if we take $\langle w, e_j\rangle$ for any $j$ we get $\langle v, e_j \rangle - \langle v, e_j \rangle = 0$, since all the other inner products vanish. This proves that $V$ is the sum. We just need to show that the intersection is only the zero vector. But this is obvious, since anything in the intersection would have to be orthogonal to itself, and by positive definiteness only 0 meets this criterion.

Corollary: The complement of $U^{\perp}$ is $U$. We can prove inclusion in both directions. I don't feel like doing it.

6.4.2 Orthogonal projections

$P_Uv$ maps $v$ to the part of its orthogonal decomposition that is in $U$.

Properties:

  • Range is $U$
  • Nullspace is $U^{\perp}$
  • $v-P_Uv$ is in the nullspace
  • $P_U^2 = P_U$
  • $\|P_Uv\| \leq \|v\|$

These are useful in minimisation problems: Given some $v \in V$, find the point $u$ in the subspace $U$ to minimise $\|v-u\|$. How do we do that? First, let's prove a proposition:

$$\|v-P_Uv\| \leq \|v-u\|$$

Start from $\|v-P_Uv\|^2$. This is $\leq \|v-P_Uv \|^2 + \|P_Uv - u\|^2$, which is itself equal to $\|v-u\|^2$ (the middle terms cancel out) because of the Pythagorean theorem (which applies because the vectors are orthogonal). Then we just take square roots. Equality only holds if $\|P_Uv-u\|=0$, which is exactly when $u = P_Uv$. So the solution to a minimisation problem is to take $u = P_Uv$!

6.4.3 Solving minimisation problems

Suppose we want to approximate some function $f$ in terms of polynomials of degree $n$ or less. First, we need to find an orthonormal basis of $\mathcal P^n$. Then we just compute the inner product of $f$ with each vector in the basis, and use the resulting scalar as the coefficient for the corresponding vector. There is a halfhearted example here.
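Small worked example (my own, using the $\int_0^1$ inner product from earlier): approximate $f(x) = x^2$ by a polynomial of degree 1 or less. Gram-Schmidt applied to $(1, x)$ gives the orthonormal basis $\left(1, \sqrt{12}\left(x - \tfrac12\right)\right)$, and

$$P_Uf = \langle f, 1 \rangle \, 1 + \left\langle f, \sqrt{12}\left(x - \tfrac12\right)\right\rangle \sqrt{12}\left(x - \tfrac12\right) = \tfrac13 + \left(x - \tfrac12\right) = x - \tfrac16$$

so $x - \tfrac16$ is the closest degree-$\leq 1$ polynomial to $x^2$ in this norm.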

6.5 Linear functionals and adjoints

Linear functional: $\varphi: V \to \mathbb F$ (linear map). Equivalent to sending $v$ to $\langle v, u \rangle$ where $u$ is some fixed vector. Furthermore, there is a unique choice of $u$ for any linear functional. The existence proof simply requires writing $v$ in terms of its orthonormal basis. For uniqueness, assume that $\langle v, u_1 \rangle = \langle v, u_2 \rangle$. Then $0 = \langle v, u_1 -u_2\rangle$ for any $v$, including $v = u_1 - u_2$. By positive definiteness, we conclude that $v = u_1 - u_2 = 0$, and so $u_1=u_2$ QED.

Adjoint: If $T \in \mathcal L(V, W)$, then $T^{*} \in \mathcal L(W, V)$ is defined such that $\langle Tv, w \rangle = \langle v, T^*w\rangle$. To compute the adjoint for a given $T$, just use the definition and try to get it in the form of an inner product. This should let you compute $T^* w$, and if you made $w$ arbitrary, that gives you the definition of $T^*$. Note that the adjoint is always a linear map (additivity and homogeneity follow from the definition).
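For instance (example made up): let $T \in \mathcal L(\mathbb R^3, \mathbb R^2)$ be $T(x, y, z) = (x + 2y, 3z)$. Then for any $(a, b) \in \mathbb R^2$,

$$\langle T(x, y, z), (a, b) \rangle = (x + 2y)a + 3zb = x(a) + y(2a) + z(3b) = \langle (x, y, z), (a, 2a, 3b) \rangle$$

so $T^*(a, b) = (a, 2a, 3b)$.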

Properties of the adjoint operator (used in the sense of operator/operand, not linear operator):

  • Additivity: $(S+T)^* = S^* + T^*$
  • Conjugate homogeneity: $(aT)^* = \overline{a}T^*$
  • The adjoint of an adjoint is the original
  • $I^* = I$
  • $(ST)^* = T^*S^*$
  • The nullspace of an adjoint is the complement of the range of $T$
  • Range of an adjoint is complement of nullspace of $T$
  • The above two, replacing $T$ with $T^*$ and vice versa (the proofs of these follow almost directly from the definitions)

6.5.1 The conjugate transpose

I knew there was some relation between adjoints and matrix transposition! The conjugate transpose of a matrix is obtained by transposing the matrix and taking the conjugate of each element; with respect to orthonormal bases, the matrix of $T^*$ is the conjugate transpose of the matrix of $T$.
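For example (random matrix):

$$\begin{pmatrix} 1 + i & 2 \\ 3 & -i \end{pmatrix}^* = \begin{pmatrix} 1 - i & 3 \\ 2 & i \end{pmatrix}$$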

6.6 Exercises

  • $\langle x, y \rangle = \|x\|\|y\|\cos\theta$ (for the Euclidean inner product on $\mathbb R^2$, with $\theta$ the angle between $x$ and $y$). Proof: draw a triangle, use the cosine law.
  • $\|u \| \leq \|u+av\|$ for all $a \in \mathbb F$ $\Leftrightarrow \langle u, v\rangle = 0$. Proof: Pythagorean theorem for one direction. The other direction is weird.
  • To prove that a certain inner product does not exist, show that it violates something (e.g., the parallelogram equality - memorise it!)
  • If we try to apply Gram-Schmidt to a linearly dependent list, at some point we will get a division by zero and the world will explode.
  • To find an orthonormal basis with respect to which some operator has a UT matrix, first find a regular basis that gives a UT matrix, then apply Gram-Schmidt.
  • To prove that something is an orthogonal projection, show that it sends any vector in the ambient space to one portion of its orthogonal decomposition (specifically, the range of $P$). Do this using the $v = Pv + (v - Pv)$ trick, where $Pv$ is in the range and $v-Pv$ is in the nullspace. These two are complements, so, QED.
  • There is a relationship between orthogonal projections and invariant subspaces. It's very subtle. You've probably never heard of it.
  • To find the adjoint, just work from the definitions, etc
  • The conjugate of an eigenvalue of $T$ is an eigenvalue of $T^*$. Proof: use the definition of eigenvalues (with the $(T-\lambda I)v = 0$ thing), apply the adjoint operator to both sides, etc.
  • If an adjoint is injective, the original is surjective, and vice versa (all four possibilities).

7 Chapter 7: Operators on inner-product spaces

7.1 Self-adjoint and normal operators

7.1.1 Self-adjoint operators

Self-adjoint: if $T=T^*$ (aka Hermitian). The matrix has to be equal to its conjugate transpose. Self-adjointness is preserved under addition and scalar multiplication. The concept of a self-adjoint operator can be considered to be loosely analogous to that of a number that is equal to its own conjugate (that is, a real number).

7.1.2 Eigenvalues of self-adjoint operators

All eigenvalues of self-adjoint operators are real. Proof: use the definitions. We get that $\lambda \|v\|^2 = \overline{\lambda} \|v\|^2$ so $\lambda = \overline{\lambda}$ so $\lambda$ is real.
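Spelled out: if $Tv = \lambda v$ with $v \neq 0$ and $T = T^*$, then

$$\lambda \|v\|^2 = \langle \lambda v, v \rangle = \langle Tv, v \rangle = \langle v, T^*v \rangle = \langle v, Tv \rangle = \langle v, \lambda v \rangle = \overline{\lambda}\|v\|^2$$

and dividing by $\|v\|^2 \neq 0$ gives $\lambda = \overline{\lambda}$.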

7.1.3 Zero operators

If $V$ is a complex (and not real) inner product space, and $T \in \mathcal L(V)$ such that $\langle Tv, v \rangle = 0$ for all $v \in V$, then $T = 0$. Proof: We can write $\langle Tu, w \rangle$ in a weird way, getting a bunch of inner-product terms on the right, each of which can be written as $\langle Tv, v \rangle$ for some $v$. If all of those are 0, then the LHS is zero as well, for every choice of $u$ and $w$. Taking $w=Tu$, we get $\langle Tu, Tu \rangle = 0$ for every $u$, so $Tu = 0$ for every $u$; that is, $T =0$.

Remark: This is only true if $V$ is not real. For example, if we're working on $\mathbb R^2$, with $T(x, y) = (-y, x)$, then $\langle Tv, v \rangle = \langle (-y, x), (x, y) \rangle = -yx + xy = 0$, and yet $T \neq 0$. Of course, this arises from the fact that the eigenvalues are not real, and thus not part of the field we're working over. In a complex vector space, all the eigenvalues are indeed part of the field we're working over (i.e., $\mathbb C$), and so if $\langle Tv, v \rangle = 0$ for all $v$, then it must be that the operator is the zero operator.

Corollary: $T$ is self-adjoint $\Leftrightarrow$ $\langle Tv, v \rangle \in \mathbb R$ for all $v$. Proof: subtract the conjugate, and use conjugate symmetry and the definition of the adjoint: $\langle Tv, v \rangle - \overline{\langle Tv, v\rangle} = \langle (T-T^*)v, v\rangle$. If $\langle Tv, v \rangle$ is real for every $v$, then it's equal to its conjugate, so the LHS is 0; then $\langle (T-T^*)v, v \rangle = 0$ for every $v$, in which case $T - T^* = 0$ (by the above proposition), so $T=T^*$ and $T$ is self-adjoint. The other direction is the same idea.

Also, if $T$ is self-adjoint and $\langle Tv, v\rangle = 0$ for all $v$, then $T = 0$. We already know this is true for a complex IPS, so assume that we're working in a real IPS. The proof is the same as before, with $w = Tu$, only we use the fact that $\langle w, Tu \rangle = \langle Tu, w \rangle$ by conjugate symmetry and self-adjointness.

7.1.4 Normal operators

$T$ is normal if

$$TT^* = T^*T$$

(i.e., commutes with its adjoint). All self-adjoint operators are normal. Not all normal ops are self-adjoint.

Relation to norm: $T$ is normal $\Leftrightarrow$ $\|Tv \| = \|T^*v\|$ for all $v$. Proof: Since $TT^* - T^*T = 0$, we apply this operator to the first slot of an inner-product with $v$ in both slots. Use additivity to separate the two, and move one over to the other side, to get $\langle T^*T v, v \rangle = \langle TT^*v, v \rangle$. Then just use the adjoint definition to get $\langle Tv, Tv \rangle = \langle T^*v, T^*v \rangle$ which is equivalent to the norm formulation. This works in both directions. QED

Eigenvectors and eigenvalues of adjoints: If $T$ is normal, then any eigenvector of $T$ for $\lambda$ is also an eigenvector of $T^*$ for $\overline{\lambda}$. Proof: $(T-\lambda I)$ is normal as well, and applied to $v$ it gives 0. Use the norm relation above to get that $\| (T-\lambda I)^* v \| = 0$, and if we distribute the adjoint we get $\|(T^*-\overline{\lambda}I)v \| = 0$, so that's that.

Orthogonality of eigenvectors: eigenvectors for distinct eigenvalues of a normal op are orthogonal. Proof: suppose $Tv_1 = \lambda_1v_1$ and $Tv_2 = \lambda_2v_2$. Consider $(\lambda_1 - \lambda_2)\langle v_1, v_2 \rangle$. By distributivity and conjugate symmetry, this equals $\langle \lambda_1 v_1, v_2 \rangle - \langle v_1, \overline{\lambda_2} v_2 \rangle = \langle Tv_1, v_2 \rangle - \langle v_1, T^*v_2 \rangle$ (the last equality comes from the fact that a normal operator and its adjoint have the same eigenvectors, with conjugate eigenvalues). Now, the last expression is 0 by the definition of the adjoint. So $(\lambda_1 - \lambda_2)\langle v_1, v_2 \rangle = 0$, and if the eigenvalues are distinct, $(\lambda_1-\lambda_2) \neq 0$, so it must be that the inner product is 0. Thus the eigenvectors are orthogonal.

7.2 The spectral theorem

This is a very important theorem! It seeks to answer the following question: for which operators can the eigenvectors form an orthonormal basis (or, equivalently, which operators have diagonal matrices wrt an orthonormal basis)? For complex vector spaces, the answer is normal operators; for real, self-adjoint. Let's look at the complex spectral theorem first.

Before that, let's look at a quick example. Consider the following matrix: $\displaystyle \begin{pmatrix} 2 & -3 \\ 3 & 2 \end{pmatrix}$. This so happens to be the matrix of a normal operator (verify that yourself if you like). We can find the eigenvalues by solving a system of simultaneous equations using the quadratic formula: $\lambda = 2 \pm 3i$. Then the eigenvectors are $(x, -ix)$ (or equivalently $(ix, x)$) for $2+3i$, and $(-iy, y)$ for $2-3i$. To normalise, just set $x=y=1$ (or any other arbitrary number) and divide by the resulting norm. This gives us the following orthonormal basis:

$$\left ( (i/\sqrt{2}, 1/\sqrt{2}), (-i/\sqrt{2}, 1/\sqrt{2} ) \right )$$

The matrix for $T$ with respect to this basis so happens to be a diagonal matrix whose diagonal elements are the eigenvalues, in the same order:

$$\begin{pmatrix} 2 + 3i & 0 \\ 0 & 2-3i \end{pmatrix}$$

7.2.1 The complex spectral theorem

Now we can prove the complex spectral theorem, valid on any complex IPS ($T$ is normal $\Leftrightarrow$ $T$'s eigenvectors can form an orthonormal basis). $\Leftarrow$ is easy: wrt an orthonormal basis of eigenvectors, $T$ has a diagonal matrix (see section 5.4), so $T^*$ also has a diagonal matrix (since the conjugate transpose of a diagonal matrix is also diagonal). If they're both diagonal, they commute. So $T$ is normal. To prove $\Rightarrow$, note that $T$ has a UT matrix wrt some orthonormal basis (see section 6.3.3). Now we just have to prove that this matrix is diagonal. We do this by looking at $\|T e_j\|$ and $\|T^*e_j\|$ for each $j$. We know that these quantities are equal, by the norm relation in the previous section. The matrix for $T^*$ is the conjugate transpose of the matrix for $T$. Since we're looking at norms, the conjugation doesn't matter, so we're essentially comparing the sum of the squared absolute values of the entries in the $j$th column versus the $j$th row. We already know that everything below the diagonal entry in the $j$th column is 0. By induction (it holds for $j=1$, etc), everything after the diagonal entry in the $j$th row is 0 too. Which is pretty neat. This shows that the matrix is diagonal.

7.2.2 The real spectral theorem

The proof of the real spectral theorem is a little more involved. First, a lemma: a quadratic polynomial $x^2 + bx + c$ with no real roots ($b^2 < 4c$) is non-zero at every point on the real line, and plugging a self-adjoint operator into it gives an invertible operator. The proof of this lemma just requires showing that the resulting operator, let's call it $S$, sends no non-zero vector to 0: using Cauchy-Schwarz, $\langle Sv, v \rangle \geq (\|Tv\| - \tfrac{|b|}{2}\|v\|)^2 + (c - \tfrac{b^2}{4})\|v\|^2 > 0$ for $v \neq 0$ (a square plus a positive constant times the squared norm of a non-zero vector). Since $Sv$ is never 0 for non-zero $v$, $S$ is injective, and thus invertible.

Next, another lemma: Any self-adjoint op on a real IPS has an eigenvalue. Proof: Starts off the same as in section 5.3. Consider the vectors $(v, Tv, T^2v, \ldots, T^nv)$ for some nonzero $v$; this list has $n+1$ vectors, so it is not linearly independent when $\dim V = n$. So we can write 0 as a linear combination with not-all-zero real coefficients. Take the largest index with a nonzero coefficient, and use the coefficients up to there as the coefficients of a polynomial. Factoring this over the reals, we end up with some irreducible quadratics and some linear factors. We just need to prove that none of the irreducible quadratic factors sends anything non-zero to 0. But this follows from the previous lemma! None of these quadratics have real roots, so the resulting operators are invertible. So if the product applied to $v$ is 0, it must be one of the linear factors $(T-\lambda_i I)$ that sends a non-zero vector to 0; specifically, there is at least one $\lambda_i$ for which this is true. But then $\lambda_i$ is an eigenvalue of $T$. QED

Now we can prove the real spectral theorem (same as complex except replace normal by self-adjoint and complex by real). $\Rightarrow$ is trivial, as before. For $\Leftarrow$, assume that $T$ is self-adjoint. We can prove this by induction on the dimension of $V$. For $\dim V = 1$, we only have one eigenvalue and one eigenvector, so of course we can create an orthonormal basis of eigenvectors.

For the induction step, assume that it holds for $\dim V = n-1$, and let's try to prove it for $n$. Now, we know that $T$ has at least one eigenvalue $\lambda$ and corresponding eigenvector $u$, from the previous lemma. Consider only the normalised $u$ (just take any eigenvector and divide it by its norm). $u$ by itself forms the basis for some subspace $U$. Let's prove that $U^{\perp}$ is invariant under $T$. Well, if $v \in U^{\perp}$, then $\langle u, Tv \rangle = \langle Tu, v \rangle$ (by self-adjointness) $= \langle \lambda u, v \rangle$ (since $Tu = \lambda u$) $= \lambda \langle u, v \rangle = 0$ since $v \in U^{\perp}$. So $Tv \in U^{\perp}$ as well, and $U^{\perp}$ is indeed invariant under $T$.

Then, we create a new operator, which we'll call $S$, which is exactly the same as $T$ in terms of effects except its domain and range are $U^{\perp}$ (so $S = T|_{U^{\perp}} \in \mathcal L(U^{\perp})$). Is $S$ self-adjoint? Yes, because if $v, w \in U^{\perp}$, then $\langle Sv, w \rangle = \langle Tv, w \rangle = \langle v, Tw \rangle = \langle v, Sw \rangle$ (in other words, it inherits $T$'s self-adjointness). Now, recall that $S$ is operating on a vector space of dimension $n-1$ (since it's the complement of $U$, of dimension 1, in $V$, of dimension $n$). So the induction hypothesis applies to $S$ - the eigenvectors of $S$ can form an orthonormal basis of $U^{\perp}$. But all the eigenvectors of $S$ are also eigenvectors of $T$, since $Sv = Tv$ for any $v$ in $U^{\perp}$. Furthermore, $u$ is clearly orthogonal to every eigenvector of $S$, since those are all in $U^{\perp}$. So $u$ together with the eigenvectors of $S$ forms an orthonormal basis of $V$ consisting of eigenvectors of $T$ (recall that $u$ is an eigenvector of $T$).

That concludes the proof. It's pretty long, but quite nice.

Corollary: $V$ is the direct sum of the eigenspaces (the subspaces formed by the eigenvectors for each eigenvalue, i.e., $\text{null}(T-\lambda_i I)$). Proof: by the spectral theorem (basis of eigenvectors, etc) plus section 5.4. Also, each eigenvector is orthogonal to the eigenvectors corresponding to all other eigenvalues (proved in the previous section).

7.3 Normal operators on real inner-product spaces

7.3.1 Two-dimensional vector spaces

On a two-dimensional real IPS, TFAE:

  1. $T$ is normal but not self-adjoint
  2. the matrix of $T$ wrt any orthonormal basis is NOT diagonal and looks like $\displaystyle \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$
  3. Same as above, but only for some orthonormal basis; same format but the bottom left entry is positive (and the top right is negative)

Proof:

  • 1 -> 2: The matrix of $T$ wrt some ortho basis $(e_1, e_2)$ is $\displaystyle \begin{pmatrix} a & c \\ b & d \end{pmatrix}$. Then $\|Te_1\|^2 = a^2 + b^2$ and $\|T^*e_1\|^2 = a^2+c^2$ (conjugate transpose, etc). Since the operator is normal, we know that $\|Te_1\|^2 = \|T^*e_1\|^2$ so $b^2=c^2$. Then either $c=b$ or $c=-b$. If the former were true, then $T$ would be self-adjoint, which is not the case. So $c = -b$. Also, the matrix can't be diagonal (otherwise $T$ would be self-adjoint) so $b \neq 0$. Now, to determine $d$, we compute $T^*T$ and $TT^*$ which, we know, are the same, since $T$ is normal. Thus the entries in the top right corner of each are identical. So $ab-bd = bd-ab$. Thus $ab=bd$. We can divide by $b$ since it's nonzero, getting $a=d$. QED.
  • 2 -> 3: Assume that for a basis $(e_1, e_2)$, $b < 0$ in the matrix above (if $b > 0$ we're already done). Now consider the basis $(e_1, -e_2)$. Well, $Te_1 = ae_1 + be_2 = ae_1 - b(-e_2)$, and $T(-e_2) = -Te_2 = be_1 + a(-e_2)$, so the matrix wrt this basis is $\displaystyle \begin{pmatrix} a & b \\ -b & a \end{pmatrix}$. Since $b < 0$, $-b > 0$, so we can equally well write this matrix as $\displaystyle \begin{pmatrix} a' & -b' \\ b' & a' \end{pmatrix}$ where $b' > 0$. QED.
  • 3 -> 1: If $T$ has a matrix in the form mentioned in 3, then $T$ is clearly not self-adjoint since $b > 0$ (so $b$ is never equal to $-b$). Can we prove that $T$ is normal? Yes, just multiply out the matrices. Left as an exercise for the reader.

7.3.2 Block matrices

When an entry isn't just an entry, it's a matrix consisting of other entries. So a matrix can contain other matrices which presumably can also contain other matrices MATRICEPTION

7.3.3 Invariant subspaces and normal operators

$U$ is a subspace of $V$, invariant under the normal operator $T \in \mathcal L(V)$. Then the following are true:

  1. $U^{\perp}$ is also invariant under $T$. Proof: start with an ortho basis $(e_1, \ldots, e_m)$ of $U$. Extend that to an ortho basis of $V$ (so we add some vectors $f_1, \ldots, f_n$). Now, since $U$ is invariant under $T$, the matrix of $T$ wrt this basis of $V$ has $n$ rows of zeros under the first $m$ columns (because all the coefficients for the $f$'s are 0). Picture the matrix as consisting of A, B in the top row and C, D in the bottom row: C is all 0. Since $T$ is normal we know that $\|Te_j\|^2 = \|T^*e_j\|^2$. The former is the sum of squares of entries in the $j$th column; the latter is the sum of squares of entries in the $j$th row (conjugate transpose, etc). But this tells us that B is all zeroes. So for $Tf_k$, the coefficients for the $e$ vectors are all zero, which means that $Tf_k \in \text{span}(f_1, \ldots, f_n)$. So $U^{\perp}$ is invariant under $T$!
  2. $U$ is invariant under $T^*$. Proof: Just take the transpose of the above matrix. For $T^*e_j$, the coefficients for the $f$ vectors are all zero. QED.
  3. $(T|_U)^* = (T^*)|_U$ (I don't remember seeing this notation introduced anywhere, but it seems to mean "$T$ restricted to the subspace $U$"). Proof: Create a new operator $S = T|_U$. For any $u, v \in U$, we have $\langle Su, v \rangle = \langle Tu, v \rangle = \langle u, T^*v \rangle$, but also $\langle Su, v \rangle = \langle u, S^*v \rangle$, so $T^*v = S^*v$ (both lie in $U$, by (2)). Thus the adjoint of $T|_U$ is $(T^*)|_U$.
  4. $T|_U$ is normal. Proof: by (3), and the fact that $T$ and $T^*$ commute.
  5. $T|_{U^{\perp}}$ is normal. Proof: well, (1) tells us that $U^{\perp}$ is invariant under $T$. So (4) applies to $U^{\perp}$ as well.

7.3.4 Block diagonal matrices

A square matrix whose diagonal is composed of square matrices (can be of different sizes). All blocks not along the diagonal are zero. Technically any square matrix can be considered a block diagonal matrix. Standard diagonal matrices are block diagonal matrices with 1x1 blocks (though any other dimension is valid too).

If we have two block diagonal matrices whose corresponding blocks are the same size (and which have the same number of blocks), then their product is the block diagonal matrix made by multiplying each pair of corresponding blocks and sticking the result in the expected place. Analogous to standard diagonal matrices, where you just multiply the corresponding entries along the diagonal.

7.3.4.1 Relation to normal operators

Now we get to a nice result that seems vaguely reminiscent of polynomial factorisation over the reals. If we're in a real IPS, and $T \in \mathcal L(V)$, then $T$ is normal $\Leftrightarrow$ $T$ has a block diagonal matrix wrt some ortho basis of $V$ where each block is either 1x1 or 2x2 of the form

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix} \quad b > 0$$

Proof: $\Leftarrow$ is easy - the matrices commute (try it). $\Rightarrow$ is longer: let $T$ be normal, and prove using strong induction on the dimension. If $n=1$ or $n=2$, the theorem trivially holds (use the spectral theorem if $T$ is self-adjoint and section 7.3.1 otherwise). For our induction step, we note that if $T$ has an eigenvalue, then there is a one-dimensional invariant subspace (i.e., the span of the associated eigenvector). If it doesn't, then at the very least it has a two-dimensional invariant subspace (see section 5.5).

Let this invariant subspace of $\dim$ 1 or 2 be $U$. If the dim is 1, we take any non-zero vector in it and divide it by its norm to get a vector of norm 1; this vector is a valid ortho basis for $U$. Clearly the matrix of $T|_U$ wrt this basis is 1x1. On the other hand, if the dim is 2, then $T|_U$ must be normal (see section 7.3.3, number 4). However, it can't be self-adjoint, for if it were, it would have an eigenvalue (proved in a lemma in section 7.2.2), and an eigenvector of $T|_U$ is also an eigenvector of $T$ - but we only took $\dim U = 2$ because $T$ had no eigenvalue. So, since $T|_U$ is normal but not self-adjoint, it has a 2x2 matrix of the desired form wrt some ortho basis (by section 7.3.1).

Then we just apply the induction hypothesis to $U^{\perp}$ which has dimension less than $n$. So there is some ortho basis for $U^{\perp}$ that results in a matrix of the desired form, and if we join it with the ortho basis of $U$, we get an ortho basis of $V$ that does what we want. QED

7.4 Exercises

Note that sections 4 and beyond were not covered.

  • To show that an operator is not self-adjoint, you just need to produce $u, v$ such that $\langle Tu, v\rangle \neq \langle u, Tv \rangle$.
  • Note that the proposition which states that the matrix of an adjoint is the conjugate transpose of the original matrix only applies wrt orthonormal bases.
  • The product of two self-adjoint operators is self-adjoint if and only if the two operators commute.
  • The set of self-adjoint operators is a subspace only in a real IPS, not in a complex IPS (not closed under scalar multiple because of the whole complex conjugate thing - $iI$ is not, for example).
  • If $P^2=P$, then $P$ is an orthogonal projection $\Leftrightarrow$ $P$ is self-adjoint. Proof: Find a subspace $U$ for the projection, consider $V = U \oplus U^{\perp}$, look at the decomps of two vectors in $V$, use additivity and orthogonality to prove that $P$ is self-adjoint (should be easy). For the other direction, write any vector $v$ as the sum of something in the range of $P$ ($U$) and its nullspace ($U^{\perp}$), using the trick that $v-Pv$ is in the nullspace (because $P(v-Pv) = Pv - P^2v = Pv-Pv=0$).
  • The set of normal operators on a vs with $\dim > 1$ is not a subspace, because additivity is not satisfied. Try to make up an example where the sum of two normal operators fails to be normal.
  • For a normal operator, the nullspaces of $T^k$ and $T$ are identical. Same for range. Reason: too long to write out.
  • To prove that a particular operator is not self-adjoint given only some of its mappings, first try to determine two different eigenvalues (and associated eigenvectors) from these mappings. Then take the inner product of the eigenvectors from distinct eigenvalues and show that they are not orthogonal (and thus the operator cannot be self-adjoint).
  • If we're working on a complex IPS, a normal operator $T$ is self-adjoint $\Leftrightarrow$ all of its eigenvalues are real.
  • Every normal op on a complex IPS has a square root and a cube root (and presumably a $k$th root for any $k > 3$ too). Proof: by the spectral theorem, define a new operator that uses the square/cube roots of the eigenvalues of the original op (see the sketch after this list).
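To make that last bullet concrete, here's a minimal sketch. By the spectral theorem, there is an ortho basis $(e_1, \ldots, e_n)$ of eigenvectors with $Te_j = \lambda_j e_j$. Define

$$S\left(a_1e_1 + \cdots + a_ne_n\right) = \sqrt{\lambda_1}\,a_1e_1 + \cdots + \sqrt{\lambda_n}\,a_ne_n$$

where $\sqrt{\lambda_j}$ is any complex square root of $\lambda_j$. Then $S^2$ agrees with $T$ on every basis vector, so $S^2 = T$; using cube roots of the $\lambda_j$ instead gives a cube root.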

8Chapter 8: Operators on complex vector spaces

8.1Generalised eigenvectors

Why do we need generalised eigenvectors? The motivation arises from the desire to describe $V$ via the direct sum decomposition of invariant subspaces:

$$V = U_1 \oplus \cdots \oplus U_n$$

It would be nice if all of these subspaces were one-dimensional. Of course, that's not always possible - indeed, it's only possible when the eigenvectors form a basis (in which case we have the decomposition $\DeclareMathOperator{\n}{null} V = \n(T-\lambda_1 I)\oplus\cdots\oplus \n(T-\lambda_nI)$ for the distinct eigenvalues $\lambda_1, \ldots, \lambda_n$). Last chapter, we showed that this happens for every normal (in particular, every self-adjoint) operator on a complex IPS. Unfortunately, it doesn't hold for all operators. To improve this bleak situation, we introduce the concept of generalised eigenvectors: non-zero vectors $v$ such that $(T-\lambda I)^j v = 0$ for some $j > 0$. Then we have the following decomposition:

$$V = \n (T-\lambda_1 I)^{\dim V} \oplus \cdots \oplus \n (T-\lambda_n I)^{\dim V}$$

(We will soon see that we can always take $j = \dim V$ for generalised eigenvectors.)

Obviously, all regular eigenvectors are generalised eigenvectors (take $j = 1$). Just as obviously, the converse is not true - otherwise, what would be the point of introducing this concept?
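For a concrete example, take $N(x, y) = (0, x)$ on $\mathbb F^2$ (the same operator used as a nilpotent example below). Its only eigenvalue is 0, and its eigenvectors are the non-zero multiples of $(0, 1)$, which don't span $\mathbb F^2$. But $N^2 = 0$, so every vector is a generalised eigenvector for 0. In particular, $(1, 0)$ is a generalised eigenvector that is not an eigenvector:

$$N(1, 0) = (0, 1) \neq (0, 0) \qquad \text{but} \qquad N^2(1, 0) = N(0, 1) = (0, 0)$$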

8.1.1Nullspaces of powers

Consider $T^k$. What is its nullspace? Well, if $T^kv = 0$, then $T^{k+1}(v) = T(T^kv) = T(0) = 0$. Thus the nullspace of $T^k$ is always within the nullspace of $T^{k+1}$. We have the following relation:

$$\{0\} = \n T^0 \subseteq \n T^1 \subseteq \cdots \subseteq \n T^k \subseteq \n T^{k+1} \subseteq \cdots$$

If ever two nullspaces in the above ordering are equal, all subsequent nullspaces will be equal as well. Proof: Suppose $\n T^m = \n T^{m+1}$. We need to show that we also have $\n T^{m+k} = \n T^{m+k+1}$ for all $k > 0$. We know already that the former is contained within the latter; now we just have to show the reverse inclusion. Suppose $v \in \n T^{m+k+1}$, so that $T^{m+k+1}v = 0$. Then $T^{m+1}(T^kv) = 0$. So $T^kv$ is in the nullspace of $T^{m+1}$. But then it must be in the nullspace of $T^{m}$ as well, since $\n T^m = \n T^{m+1}$! Thus $T^m(T^kv) = T^{m+k}v = 0$, i.e., $v \in \n T^{m+k}$. This is exactly what we wanted to show.

Are we guaranteed that two consecutive nullspaces are ever the same? Indeed we are: by the time $m = \dim V$. Proof: suppose not, i.e., suppose $\n T^{\dim V} \neq \n T^{\dim V + 1}$. Then, by the previous proposition, no two consecutive nullspaces before that point can be equal either, so every inclusion in the chain is strict and the dimension increases by at least 1 with every power. But then $\n T^{\dim V +1}$ would have dimension at least $\dim V + 1$, which is not possible as it's still a subspace of $V$! Thus we must have that $\n T^{\dim V} = \n T^{\dim V + 1} = \cdots$.
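As a small worked example, take $N(x, y, z) = (0, x, y)$ on $\mathbb F^3$, so $\dim V = 3$:

$$\n N^0 = \{0\} \subsetneq \n N^1 = \{(0, 0, z)\} \subsetneq \n N^2 = \{(0, y, z)\} \subsetneq \n N^3 = \mathbb F^3 = \n N^4 = \cdots$$

The inclusions are strict until the exponent hits $\dim V = 3$, after which the chain stays constant.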

As a corollary of the above, we get that the subspace composed of the generalised eigenvectors for an evalue $\lambda$ (plus the zero vector) is exactly $\n (T-\lambda I)^{\dim V}$. Proof: for one direction, any $v \in \n (T-\lambda I)^{\dim V}$ is a generalised eigenvector by definition (with $j = \dim V$). For the other direction, if $(T-\lambda I)^jv = 0$ for some $j > 0$, then $v \in \n (T-\lambda I)^j \subseteq \n (T-\lambda I)^{\dim V}$ by the two previous statements.

8.1.2Nilpotent operators

Nilpotent: $T^k = 0$ for some $k$. Example: $N(x, y) = (0, x)$. Then $N^2 = 0$. Differentiation on $\mathcal P_m(\mathbb F)$ (polynomials of degree at most $m$) is also nilpotent, with the power being $m+1$ (and the dimension of the VS is also $m+1$).

In fact, we never have to raise a nilpotent operator to a power higher than $\dim V$ to make it 0. Proof: every $v \in V$ is a generalised eigenvector for $\lambda = 0$, so $N^{\dim V}v = (N - 0I)^{\dim V}v = 0$ by the corollary above.

8.1.3Ranges of powers

The inclusion chain for range is exactly the opposite of what it was for nullspace, for obvious reasons (note that $T^0 = I$):

$$\DeclareMathOperator{\r}{range} V = \r T^0 \supseteq \r T^1 \supseteq \cdots \supseteq \r T^k \supseteq \r T^{k+1} \supseteq \cdots$$

Similarly, we have that $\r T^{\dim V} = \r T^{\dim V + 1} = \cdots$. We can prove this by making use of the dimension of the vector space and the dimension of the nullspace (proved previously).

8.2The characteristic polynomial

Consider the UT matrix of $T$ wrt some basis of a complex VS. We know from section 5.3 that the eigenvalues are the values along the diagonal. So if $T$ has $n = \dim V$ distinct eigenvalues, then there are $n$ different values along the diagonal.

What about when $T$ has fewer than $\dim V$ distinct eigenvalues? Clearly some eigenvalues are repeated. How many times? Exactly $\dim \n (T-\lambda I)^{\dim V}$ times (i.e., the dimension of the generalised eigenspace - what other books call the algebraic multiplicity). Proof: By induction on dimension. If $(v_1, \ldots, v_n)$ is the basis for $V$, then consider $(v_1, \ldots, v_{n-1})$. This is a basis for a subspace $U$ that is invariant under $T$, because of the whole UT matrix thing (section 5.3 again). Consider $T|_U$: this operator's UT matrix has dim $n-1$, and we can apply the IH. Then we go back up to $V$. The rest of the proof is really long and complicated, forget it.

8.2.1Multiplicity

This textbook uses multiplicity to mean the dimension of the generalised eigenspace ($\dim \n (T-\lambda I)^{\dim V}$), which is what other books call the algebraic multiplicity (not to be confused with the geometric multiplicity, $\dim \n (T-\lambda I)$). As stated previously, this is what controls the number of times an eigenvalue shows up on the diagonal.

The sum of multiplicities is $\dim V$ on a complex VS (think about the size of the matrix).

8.2.2Finding the characteristic polynomial

Characteristic polynomial: $(x-\lambda_1)^{d_1} \cdots (x-\lambda_m)^{d_m}$, where $\lambda_1, \ldots, \lambda_m$ are the distinct eigenvalues and $d_i$ is the multiplicity of $\lambda_i$. Total degree = $\dim V$, obviously. The roots are the eigenvalues. Given a UT matrix, we can easily read off its characteristic polynomial.
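For instance, if the UT matrix of $T$ wrt some basis is (a made-up example)

$$\begin{pmatrix} 2 & 1 & 3 \\ 0 & 2 & 4 \\ 0 & 0 & 5 \end{pmatrix}$$

then 2 appears twice on the diagonal and 5 appears once, so the characteristic polynomial is $(x-2)^2(x-5)$; the entries above the diagonal don't matter.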

8.2.3Cayley-Hamilton theorem

If we put the operator itself through its own char poly (i.e., compute $q(T)$), we get 0. Proof: show that $q(T)v_i = 0$ for each $i$, where $(v_1, \ldots, v_n)$ is a basis for $V$ (complex) wrt which $T$ has a UT matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$. We can use induction on $i$. For $i = 1$, $Tv_1 = \lambda_1 v_1$, so $(T-\lambda_1I)v_1 = 0$. In general, since the matrix is UT, $(T-\lambda_iI)v_i$ is in the span of the previous $v$'s, so we can write it as a linear combination of them. We also know, by the IH, that $(T-\lambda_1I)v_1 = 0$, $(T-\lambda_1I)(T-\lambda_2I)v_2 = 0$, etc. So if we rewrite $(T-\lambda_iI)v_i$ in terms of the other $v$'s and apply the earlier factors, we see that

$$(T-\lambda_1I)(T-\lambda_2I)\cdots(T-\lambda_nI)v_n = 0$$

and so all of the basis vectors go to zero. Then $q(T)$ is just the zero operator, as desired.
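As a quick 2x2 sanity check, take

$$A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}, \qquad q(x) = (x-2)(x-3)$$

Then

$$q(A) = (A - 2I)(A - 3I) = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$

as promised.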

8.3Decomposition of an operator

8.3.1Nullspaces of polynomials

$\n p(T)$ is invariant under $T$. Proof: take any $v \in \n p(T)$, so that $p(T)v = 0$. Now consider $p(T)(Tv)$. We can rewrite this as $T(p(T)v)$, since $T$ commutes with any polynomial in $T$. So it's just $T(0) = 0$, showing that $Tv$ is indeed in the nullspace.

8.3.2Decomposition into nilpotent operators

For $T$ on a complex VS, let $\lambda_1, \ldots, \lambda_n$ be the distinct eigenvalues and $U_1, \ldots, U_n$ the corresponding generalised eigenspaces ($U_i = \n (T-\lambda_i I)^{\dim V}$). Then $V$ is the direct sum of the $U$s, each $U_i$ is invariant under $T$, and $(T-\lambda_i I)|_{U_i}$ is nilpotent. Proof: directly from definitions and other stuff we've proved before.

8.3.3Bases from generalised eigenvectors

Not all ops on a complex VS have enough regular eigenvectors to form a basis. BUT, there are always enough generalised eigenvectors. Proof: take a basis of each generalised eigenspace above and concatenate them; since $V$ is the direct sum of those eigenspaces, the result is a basis of $V$ made up of generalised eigenvectors.

8.3.4Matrices of nilpotent operators

Any nilpotent operator has a UT matrix with all 0's along the diagonal wrt some basis. Proof: choose a basis of $\n N$, extend it to a basis of $\n N^2$, and so on until we have a basis of $V$ (this works thanks to the chain of inclusions for $\n N$, $\n N^2$, etc). Wrt this basis, the matrix of $N$ is UT with 0's on the diagonal.

8.3.5Upper-triangular block matrices

If $T$ has $m$ distinct eigenvalues, then $T$ has a block diagonal matrix where each block $i$ is UT with the $i$th eigenvalue repeated along the diagonal (the number of times = its multiplicity). Proof: bases, etc. Follows from earlier statements.

8.4Square roots

8.4.1Square roots and nilpotent operators

If $N$ is nilpotent then $I+N$ has a sqrt. Proof: Taylor expansion of $\sqrt{1+x}$. It's actually a finite sum because $N^m = 0$ for some $m$, and after that everything is 0 too. So there is a sqrt of the form $I + a_1N + a_2N^2 + \cdots + a_{m-1}N^{m-1}$. To solve for the $a$s, we square this and set it equal to $I+N$:

$$I + 2a_1N + (2a_2 + a_1^2)N^2 + \cdots = I+N$$

Then we choose the coefficients to make like terms match. So $a_1 = 1/2$, $a_2 = -1/8$, etc. It doesn't really matter what the $a$'s actually are, since this is just an existence theorem.
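As a tiny concrete case, with the same $N(x, y) = (0, x)$ from before: $N^2 = 0$, so the series stops after the linear term and

$$\sqrt{I+N} = I + \tfrac{1}{2}N, \qquad \left(I + \tfrac{1}{2}N\right)^2 = I + N + \tfrac{1}{4}N^2 = I + N$$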

8.4.2Invertible operators and square roots

Every invertible op has a sqrt. This holds on a complex VS (the analogous statement can fail on a real VS, just like negative real numbers don't have real square roots). It follows from the previous result: on each generalised eigenspace $U_i$, we can write $T|_{U_i} = \lambda_i I + N_i$ where $N_i$ is nilpotent. Now, none of the eigenvalues is 0 (since $T$ is invertible), so we can factor out $\lambda_i$ to get $T|_{U_i} = \lambda_i(I + N_i/\lambda_i)$. Any scalar multiple of a nilpotent operator is also nilpotent, so $I + N_i/\lambda_i$ has a sqrt, and multiplying that sqrt by a complex square root of $\lambda_i$ gives an operator $S_i$ with $S_i^2 = T|_{U_i}$.

Now, since $V$ is the direct sum of the $U$'s, we can write any $v \in V$ uniquely as $v = u_1 + \cdots + u_n$ with $u_i \in U_i$. Then define $Sv = S_1u_1 + \cdots + S_nu_n$. This $S$ is in fact a square root of $T$.

We can actually use this proof for $k$th roots other than $k=2$.

8.5The minimal polynomial

Minimal poly: the unique monic poly $p$ of smallest degree such that $p(T) = 0$.

Proof of existence: Suppose $V$ has dim $n$, so $\mathcal L(V)$ has dim $n^2$ (see section 3.4). Consider $(I, T, T^2, \ldots, T^{n^2})$. This is a list of $n^2+1$ operators, so they can't be linearly independent. Let $m$ be the smallest number such that $(I, T, T^2, \ldots, T^m)$ is dependent. Then there is a unique choice of coefficients $a_0, \ldots, a_{m-1}$ such that $T^m + a_{m-1}T^{m-1} + \cdots + a_1T + a_0I = 0$, and the corresponding monic polynomial is the min poly.

$q(T) = 0$ $\Leftrightarrow$ the min poly, $p$, divides $q$. $\Leftarrow$ is easy (write $q = ps$, so $q(T) = p(T)s(T) = 0$). For $\Rightarrow$, use the division algorithm to write $q = ps + r$ with $\deg r < \deg p$, and conclude that $r=0$ (otherwise $r(T) = 0$ would contradict the minimality of $p$).

The roots of the min poly are exactly the eigenvalues. For one direction, let $\lambda$ be a root, so we can factor $p(x) = (x-\lambda)q(x)$, giving $0 = p(T)v = (T-\lambda I)(q(T)v)$ for every $v$. Since $q$ has smaller degree than $p$, $q(T) \neq 0$, so $q(T)v \neq 0$ for some $v$; but $T - \lambda I$ sends that non-zero vector to 0, so $\lambda$ is an evalue. For the other direction, we want to prove that any evalue is a root of the min poly. Suppose $Tv = \lambda v$ and we apply $p(T)$ to $v$. If we keep replacing $T$ with $\lambda$ in $p(T)v = 0$ we just get $p(\lambda)v = 0$. Since eigenvectors are non-zero by definition, it has to be $p(\lambda)$ that is zero. So $\lambda$ is a root.

How do we find a min poly from a matrix? In theory: make a linearly dependent list (with powers of the matrix instead of powers of the op), find coefficients that combine to 0, then turn those coefficients into a monic polynomial - that's the min poly. This can be done via row-reduction, if necessary. Hopefully we won't ever have to actually do this, because the calculations get messy really quickly (unless you're a computer). There is a useful shortcut, though: the min poly divides the char poly, so once you know the char poly you only have to test its monic divisors - and if the two have the same degree (i.e., $\dim V$), then they are equal.

It seems like this chapter ends now, but I feel like there's more to be learned about the minimal polynomial ... some notes:

  • every factor $(x - \lambda_i)$ of the char poly appears in the min poly at least once (since every eigenvalue is a root of the min poly)
  • sometimes the exponent (multiplicity) of a factor is smaller in the min poly than in the char poly
  • WHEN IS THE MULTIPLICITY REDUCED??? (see the example after this list)
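Here's a small example that also answers the question in the last bullet. Compare

$$A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} \qquad \text{and} \qquad B = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

Both have char poly $(x-2)^2$. For $A$, $A - 2I \neq 0$ but $(A-2I)^2 = 0$, so the min poly is $(x-2)^2$; for $B$, $B - 2I = 0$ already, so the min poly is just $x-2$. In terms of the Jordan form (coming up in section 8.6): the exponent of $(x-\lambda)$ in the min poly is the size of the largest Jordan block for $\lambda$, so the multiplicity is reduced exactly when that largest block is smaller than the multiplicity of $\lambda$.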

8.6Jordan form

Any op $T$ on a CVS has a matrix which is mostly diagonal (except for maybe the line above the diagonal) wrt some basis. Before we can prove this, we launch into an unexpected sidebar on nilpotent operators:

8.6.1Nilpotent operators and bases

For some nilpotent ops, it's possible to make a basis out of $(v, Nv, N^2v, \ldots, N^{n-1}v)$ with the right choice of $v$. For example, for the op which puts 0 in the first arg and moves the rest down, we can choose $v = (1, 0, \ldots)$ to get a basis. But there are ops for which this can't be done. $N(x, y) = (0, 0)$ is a very simple example; $(v, Nv)$ would just look like $(v, 0)$ no matter what the $v$. On the other hand, if we could use two vectors - $v_1 = (1, 0)$ and $v_2 = (0, 1)$ - then we could make a basis, with $(v_1, v_2)$.

More generally, one suspects that for the right choice of vectors it's possible to make a basis out of $(v_1, Nv_1, \ldots, N^{a_1}v_1, v_2, Nv_2, \ldots, N^{a_2}v_2, \ldots)$ (note that each $a_i$ is the largest exponent that doesn't kill the vector, so that $N^{a_i}v_i \neq 0$ but $N^{a_i+1}v_i = 0$). Indeed this is true, and furthermore, the highest-power vectors $(N^{a_1}v_1, \ldots, N^{a_k}v_k)$ form a basis of the nullspace of $N$. We can prove this with strong induction on the dimension. First, since $N$ is nilpotent, its nullspace contains some non-zero vectors and thus has non-zero dimension, so its range has dimension less than $\dim V$. So consider $\r N$ instead of $V$ and apply the IH to that: we get a basis of $\r N$ built from some vectors $u_1, \ldots, u_j$ in the range, with the highest powers forming a basis for $\r N \cap \n N$. Now, since the $u$s are all in the range, each has a preimage under $N$; using those preimages as the $v$'s just applies $N$ one more time to each chain. For the nullspace part, we add vectors from a subspace which, when direct-summed with $\r N \cap \n N$, gives $\n N$.

Now we would just have to prove that this actually works (prove linear independence) but I don't like how long this proof is so I'm going to stop here. It feels like it should work, which is what matters right?

8.6.2Jordan bases

Jordan basis: $T$ wrt this basis has a block diagonal matrix where each nearly-diagonal block has a repeated eigenvalue along the diagonal and 1 in the line directly above the diagonal (and 0 everywhere else).

Any $T$ on a CVS has a Jordan basis. Proof: If $N$ is a nil op, consider applying it on each of $(N^{a_i}v_i, \ldots, Nv_i, v_i)$. The first vector maps to zero, because of the way $a_i$ was defined above. The second vector maps to the first vector, the third maps to the second, etc, etc. So we get a matrix that looks like

$$\begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & 0 & \ddots & \vdots \\ \vdots & \vdots & \ddots & 1 \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$

So this does work for nilpotent ops. What about others? If it's not nilpotent, we can decompose the vector space into invariant subspaces (the eigenspaces), and $T- \lambda_iI$ limited to the eigenspace $U_i$ is indeed nilpotent (see section 8.3.2). So then we can make a basis for each of these nil ops, and if we put them all together, we get a basis for $V$, which is what we wanted.
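For a concrete picture: an operator on $\mathbb C^3$ with eigenvalue 5 (one Jordan block of size 2) and eigenvalue 3 (one block of size 1) has, wrt a Jordan basis, the matrix

$$\begin{bmatrix} 5 & 1 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$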

8.7Exercises

  • Finding generalised eigenvectors of an op: First, find the eigenvalues. For an evalue $\lambda$, we want to find $v$ for which $(T-\lambda I)^{\dim V}v = 0$. So multiply that out using the binomial theorem if necessary, and factor out non-zero constants. Basically the generalised eigenvectors for a particular eigenvalue are the vectors in $\n (T-\lambda I)^{\dim V}$. As for how to solve that, I suppose it depends on the particular operator. I don't know of any generalised (ha) techniques.
  • If $N$ is nilpotent, then the list $(v, Nv, N^2v, \ldots, N^mv)$, where $N^mv \neq 0$, is lin ind. Proof: Write out the $a_0v + a_1Nv + \cdots + a_mN^mv = 0$ thing, then apply $N^m$ to both sides: the RHS stays 0, and every term on the LHS except $a_0N^mv$ vanishes (because $N^{m+1}v = 0$), so $a_0 = 0$. Then apply $N^{m-1}$ to get $a_1 = 0$, and so on - all of the coefficients are zero.
  • To prove that an op doesn't have a square root: one trick is to show that it's nilpotent but that some power of it (say $T^{\dim V - 1}$) is non-zero. Then, if it did have a square root $S$, $S$ would be nilpotent too, so $S^{\dim V} = 0$; but a high enough even power of $S$ equals a power of $T$ that you know isn't zero - contradiction.
  • If $ST$ is nil, so is $TS$. Proof: if $(ST)^n = 0$, then $(TS)^{n+1} = 0$ (write it out and regroup it as $T(ST)^nS$).
  • A nil op has only one eigenvalue: 0. Proof: $N$ is not injective, so $N - \lambda I$ is not injective for $\lambda =0$, thus 0 is an eigenvalue. Now show that it's the only one. If $\lambda$ is an evalue, with evector $v$, we have $Nv = \lambda v$. If $N^m = 0$, then we just have to apply $N$ $m-1$ times to both sides to get that $\lambda^mv = N^mv = 0$ so yeah, zero is the only one.
  • Only $N=0$ is both self-adjoint and nilpotent. Proof: the spectral theorem tells us that the evectors of a self adj op can form an ortho basis. From the previous exercise, we know that a nil op has only one evalue: 0. Thus $Ne_i = 0e_i = 0$ for every ortho basis vector $e_i$. So $N$ sends everything in $V$ to zero. Thus $N=0$, if we want to give it a label that badly. I think $N$ is just misunderstood.
  • If $n = \dim V$, and $\n T^n \neq \n T^{n-1}$, then $T$ is nilpotent, and the dimension of $\n T^i$ is $i$ for $0 \leq i \leq n$. Proof: The nullspace dimension monotonically increases (i.e., never decreases) with each exponent. If those two aren't equal, the previous consecutive pairs can't be equal either, so the dimension must increase by at least 1 at each step - and since it goes from 0 to at most $n$ in $n$ steps, it increases by exactly 1 each time, giving $\dim \n T^i = i$. In particular $\n T^n = V$ (since its dim is $n$), so $T^n = 0$ (everything goes to 0) and $T$ is nilpotent.
  • If $\r T^{m+1} = \r T^m$, then the same is true for all higher exponents. Reason: take something in $\r T^{m+1}$, find a pre-image, then its pre-image, etc.
  • A vector space is not necessarily the direct sum of the range and nullspace of an op. The dimensions always match though. On the other hand, if we look at the nullspace and range of $T^{\dim V}$, then $V$ is the direct sum. Proof: provide the decomp of $v$ (sum of vector in nullspace, range), show that the dimensions match (or show that the intersection only has 0)?
  • If 0 is the only eigenvalue of $N$ on a CVS, then $N$ is nilpotent. Not necessarily true on an RVS. Proof: If 0 is the only evalue, then every vector is a generalised evector for 0 (since the generalised eigenspaces sum to $V$), so $N^{\dim V} = 0$.
  • If $\dim \n T^{n-2} \neq \dim \n T^{n-1}$ (where $n = \dim V$), then $T$ has at most 2 distinct evalues. Proof: $\n T^{n-1}$ has dimension either $n-1$ or $n$. So 0 is an evalue with multiplicity $\geq n-1$. There can only be one other evalue at most, because the sum of the multiplicities can't exceed $n$.
  • Creating an operator with a specific char poly: You know the evalues and their multiplicities, and the dimension is the degree of the polynomial. Make a diagonal operator that maps each basis vector $e_i$ to $\lambda_j e_i$, using each $\lambda_j$ as many times as its multiplicity. That should do it.
  • Eigenvectors of $T$ form a basis $\Leftrightarrow$ all generalised evectors are also evectors. Proof of one direction: Let $v$ be a gen evector for $\lambda$, and write it as a linear combo of basis vectors. Since these basis vectors are also evectors, we can write $(T-\lambda I)v =(\lambda_1-\lambda)a_1v_1 + \ldots$ and if we apply $T-\lambda I$ another $n-1$ times, we get $(T-\lambda I)^nv$ on the LHS, which must be 0; so the RHS must be 0 too. By linear independence, $(\lambda_i - \lambda)^na_i = 0$ for each $i$, so $\lambda_i = \lambda$ whenever $a_i \neq 0$. That means $v$ is a combination of eigenvectors with eigenvalue $\lambda$, i.e., $v$ is itself an eigenvector.
  • Finding the square root of $I+N$: First prove that $N$ is nilpotent (calculate $N^iv$). Then do the Taylor expansion thing. Then use the formulations of $N^2, N^3, \ldots$ that you computed above to write out the formulation for the sqrt operator.
  • If $T$ is invertible, we can write $T^{-1}$ as a poly of $T$. Proof: Consider the min poly: $q(T) = T^m + \ldots + a_1T + a_0I = 0$. If $a_0$ were 0, we could multiply by $T^{-1}$ to get a monic polynomial of smaller degree that kills $T$, which is a contradiction. So $a_0$ is nonzero and thus we can divide by it, and solve for $I$ on one side. Multiply both sides by $T^{-1}$ and this time you'll get a poly of $T$ for $T^{-1}$. Yay!
  • Finding an op which has a given min poly: Use the $p(T) = 0$ definition, work from there. Show that none of the divisors make it go to 0. You can also find the evalues and show that they must be roots of the min poly if necessary. Then just check that none of the other possibilities (accounting for multiplicity) work. Is there no general rule for this?
  • Eigenvectors of $T$ form a basis $\Leftrightarrow$ min poly has no repeated roots. Proof: $(T-\lambda_1I)(T-\lambda_2I)\cdots$ is the zero operator, since it sends each basis vector to 0. So the minimal polynomial is a divisor, thus it can't have repeated roots. Other direction: min poly has no repeated roots, and consider the subspaces of generalised eigenvectors. Somehow they are also standard eigenvectors. Use previous exercise.
  • The min poly of a normal op on an IPS has no repeated roots.
  • Given a non-zero $v$, let $p$ be the monic poly of smallest degree such that $p(T)v = 0$ (for that particular $v$, not necessarily for all of them). Then $p$ divides the min poly. Proof: division algorithm (write the min poly as $ps + r$), apply everything to $v$ to get $r(T)v = 0$, then conclude $r = 0$ since otherwise it would contradict the minimality of $p$.
  • Finding the min/char poly of a matrix: First compute $Te_i$ for each basis vector $e_i$ (go column by column; the entries are the coefficients). If $(e_i, Te_i, \ldots, T^{n-1}e_i)$ is linearly ind for some $e_i$, then the min poly must have degree $n$ (the dim), in which case the char poly is the same as the min poly (find them by expressing $T^ne_i$ as a linear combination of the lower powers applied to $e_i$). If not, not sure.
  • Suppose $(v_1, \ldots, v_n)$ is a Jordan basis for $V$. What does the matrix wrt $(v_n, \ldots, v_1)$ look like? Well, each block would be transposed, and the diagonal is sort of flipped (along the / axis so it's still a diagonal, just in reverse order).

9Chapter 9: Operators on real vector spaces

Not covered in the lectures.

10Chapter 10: Trace and determinant

10.1Change of basis

10.1.1MTRX 101

Identity, inverses, etc.

10.1.2Change-of-basis matrices

Change-of-basis matrices: the inverse matrix gives you the opposite direction.

To actually find the change-of-basis matrix, refer to the MATH 223 HTSEFP. I can't seem to find anything beyond theory in this section.

10.2Trace

Depends only on $T$, not on the basis chosen, a laudable trait that it inherits from the char poly.

On a CVS, equal to the sum of the eigenvalues (repeated depending on multiplicity).

For a UT (or block UT) matrix, it's the sum along the diagonal. This happens to be the sum of the evalues too.

The trace of $BA$ and $AB$ is the same if $A$ and $B$ are square, of the same size.
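A quick numeric check of that fact, with made-up 2x2 matrices:

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad AB = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}, \quad BA = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}$$

and both $AB$ and $BA$ have trace 5.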

The trace of an op is independent of the basis chosen for the matrix! Proof: changing basis replaces $M$ with $A^{-1}MA$, and $\operatorname{trace}(A^{-1}(MA)) = \operatorname{trace}((MA)A^{-1}) = \operatorname{trace}(M)$ by the previous fact. So even a matrix that is not UT works.

Trace of operators distributes over addition: the trace of $S+T$ is the trace of $S$ plus the trace of $T$. Also, $\operatorname{trace}(ST) = \operatorname{trace}(TS)$. Both properties are inherited from matrices.

There are no operators such that $ST-TS = I$: if we apply the trace to both sides, the LHS gives $\operatorname{trace}(ST) - \operatorname{trace}(TS) = 0$, while the trace of $I$ is $\dim V \neq 0$.

10.3Determinant of an operator

Defined as $(-1)^{\dim V}$ times the constant term in the char poly. On a CVS, it's the product of the eigenvalues (including repeats); for an RVS, we include the eigenpairs somehow (take the second coordinate, since trace uses the first coordinate).

Invertible op $\Leftrightarrow$ non-zero det. Proof: the det is the product of eigenvalues, so it's only zero if one of the eigenvalues is 0, in which case it's not invertible.

Char poly = $\det(z I - T)$. Proof: $\lambda$ is an evalue of $T$ $\Leftrightarrow$ $z-\lambda$ is an evalue of $zI - T$. Same thing for eigenpairs.

10.4Determinant of a matrix

In this chapter we slowly work our way up to defining the determinant of a matrix. If we want to do it from first principles (i.e., what we've learned so far), we find the characteristic polynomial, then take the constant term and multiply it by $(-1)^n$.
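For a generic 2x2 matrix this recovers the familiar formula: with

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

we get $\det(zI - A) = z^2 - (a+d)z + (ad - bc)$, so the constant term times $(-1)^2$ is $ad - bc$, the usual determinant.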

Note: If $(v, Tv, T^2v, \ldots, T^{n-1}v)$ is linearly independent (where $n = \dim V$), then $p(T) \neq 0$ for any non-zero polynomial $p$ of degree less than $n$ (just apply $p(T)$ to $v$). Hence the minimal polynomial has degree $n$ and is equal to the characteristic polynomial. Kind of neat.

Permutations: swapping two entries of a permutation multiplies its sign by $-1$.

Formula for computing the determinant: You probably know it already; it's reproduced below for reference, though the equation itself is long and doesn't provide much insight.
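The usual permutation expansion, summing over all permutations $\sigma$ of $\{1, \ldots, n\}$:

$$\det A = \sum_{\sigma} \operatorname{sign}(\sigma)\, a_{\sigma(1),1}\, a_{\sigma(2),2} \cdots a_{\sigma(n),n}$$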

Determinants of block UT matrices: the product of the determinants of the block matrices along the diagonal.

Swapping two columns flips the sign of the determinant (from permutation theory etc).

Two equal columns (or one a scalar multiple of the other): the determinant is zero. Proof: for equal columns, swap them - by the previous statement $\det(A) = -\det(A)$, so $\det(A) = 0$. For the scalar-multiple case, pull the scalar out of that column first.

$\det(AB) = \det(BA) = \det(A)\det(B)$ (for square matrices of the same size).

Determinants of operators are independent of the basis used for the matrix. Also, the det of an op is the same as the det of its associated matrix (wrt any basis).

10.5Exercises

Note that section 5 (Volume) was not covered.

  • If an op has the same matrix wrt to any basis, it's a scalar multiple of $I$. Proof: $(v, Tv)$ must be lin dep, otherwise $T$ would have different matrices wrt $(v, Tv, \ldots)$ and $(2v, Tv, \ldots)$. Since it's lin dep, every vector is an evector. Thus, scalar multiple of $I$.
  • A change-of-basis matrix is the same as the matrix for the operator that maps each basis vector to the corresponding one. Kind of obvious when you think about it.
  • If the evectors of $T$ form a basis and all the evalues are real, then the trace of $T^2$ is $\geq 0$. Proof: wrt a basis of evectors, $T^2$ is a diagonal matrix with the squares of the evalues along the diagonal.
  • Finding a formula for the trace of a given $T$: Create an ortho basis, compute $T$ for each basis vector, figure out what the matrix looks like, then sum the diagonal elements.
  • If $P^2 = P$, its trace is an integer $\geq 0$. Proof: $V$ is the direct sum of the nullspace and the range of $P$ (proved in chapter 5), so we can make a basis for $V$ out of basis vectors for the nullspace and the range. The diagonal of the matrix wrt this basis is 1 for all of the range basis vectors and 0 for all of the nullspace basis vectors. So, the trace is $\dim \r P$. Pretty cool.
  • On an IPS, the trace of an adjoint is the conjugate of the original trace. Proof: Write it as the sum of the diagonal entries of the matrix wrt to some basis, so a sum of inner products, then use the adjoint definition & conjugate symmetry. Works out nicely.
  • Multiplication with a scalar distributes for trace ($\DeclareMathOperator{\trace}{trace} \trace(cT) = c\trace(T)$).
  • $\trace(ST) \neq \trace(S)\trace(T)$ in general
  • If $\trace(ST) = 0$ for all $S$, $T = 0$. Proof: if $Tv \neq 0$ for some $v$, we can extend that to a basis, and we can create an op whose matrix wrt this basis has one 1 in the diagonal (so a trace of 1). Contradiction, so $T = 0$.
  • $\trace(T^*T) = \|Te_1\|^2 + \ldots + \|Te_n\|^2$, for any ortho basis. Proof: $\trace(T^*T) = \langle T^*Te_1, e_1 \rangle + \ldots + \langle T^*T e_n, e_n \rangle = \langle Te_1, Te_1 \rangle + \ldots$ (by adjoint def) and there you have it.
  • $\det(cT) = c^{\dim V}\det(T)$ (multiplying a single column by $c$ multiplies the det by $c$, and $cT$ scales all $\dim V$ columns)
  • $\det(S+T) \neq \det(S) + \det(T)$ obviously

I stopped around exercise 21. That's it for now.

  1. The textbook says $(e_1, -e_2)$, which I think is a mistake