3.3 Linear maps

3.3.1 Definitions

Definition 3.9 Let U and V be \(\FF\)-vector spaces. Then \(f:U \to V\) is called a linear map if

  • \(f(\mathbf{x}+\mathbf{y})=f(\mathbf{x})+f(\mathbf{y})\) for all \(\mathbf{x},\mathbf{y} \in U\), and
  • \(f(\lambda \mathbf{x})= \lambda f(\mathbf{x})\) for all \(\mathbf{x} \in U\) and all \(\lambda\in \FF\).

Other words used for precisely the same concept are linear transformation and linear function. We say f is linear to mean that it is a linear map.


By using the two properties of a linear map repeatedly, we get \[ f(l_1 \mathbf{x}_1 + \cdots + l_n \mathbf{x}_n) = l_1 f(\mathbf{x}_1) + \cdots + l_n f(\mathbf{x}_n) \] for any linear map f, any \(n \geq 1\), any scalars \(l_1, \ldots, l_n\) and any \(\mathbf{x}_1, \ldots, \mathbf{x}_n \in U\).
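As a quick sanity check, this identity can be verified numerically for the matrix maps introduced in Example 3.10 below. The following is a minimal sketch assuming numpy; the matrix, vectors and scalars are arbitrary choices.

```python
import numpy as np

# A concrete linear map f(x) = Ax; any matrix gives one (see Example 3.10 below)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])
f = lambda x: A @ x

# Arbitrary vectors and scalars
x1, x2, x3 = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0]), np.array([3.0, -1.0, 0.0])
l1, l2, l3 = 2.0, -1.0, 0.5

lhs = f(l1 * x1 + l2 * x2 + l3 * x3)        # f of the linear combination
rhs = l1 * f(x1) + l2 * f(x2) + l3 * f(x3)  # the same combination of the images
print(np.allclose(lhs, rhs))                # True
```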

Example 3.10

  • Let U and V be \(\FF\)-vector spaces. The identity map \(\id_U :U \to U\) defined by \(\id_U(\mathbf{u}) = \mathbf{u}\) for all \(\mathbf{u}\in U\) is linear, as is the zero map \(U \to V\) that sends every element of U to \(\mathbf{0}_V\).
  • Important! Let A be an m✕n matrix with entries in \(\FF\) and define \(T_A: \FF^n \to \FF^m\) by \(T_A(\mathbf{v})=A \mathbf{v}\). Then \(T_A\) is linear. Here \(\FF^n\) is the vector space of column vectors of height n with entries from \(\FF\). This example is important because it shows every matrix gives rise to a linear map. We will see later that every linear map can be represented as a matrix, linking the concrete linear algebra we did in the previous part of these notes with what we’re doing now.
  • Let \(f: \rr^3 \to \rr\) be \(f \begin{pmatrix} a\\b\\c \end{pmatrix} = a+b-2c\). Then f is linear. On the other hand, if \(f \begin{pmatrix} a\\b\\c \end{pmatrix} =\sqrt{a^2+b^2+c^2}\) then f is not linear: for example, for any nonzero \(\mathbf{x}\) we have \(f(-\mathbf{x}) = f(\mathbf{x}) > 0 > -f(\mathbf{x})\), so \(f(-\mathbf{x}) \neq -f(\mathbf{x})\). (There is a numerical spot-check of this bullet point after the list.)
  • Let V be the vector space of polynomials in one variable x over \(\FF\). Let \(D:V \to V\) be defined by \(D(f) = \frac{df}{dx}\). Then D is linear.
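Here is that spot-check: a minimal sketch, assuming numpy, which tests the two defining properties for the linear map \((a,b,c) \mapsto a+b-2c\) and shows the Euclidean-length map failing the scalar property. The test vectors are arbitrary choices.

```python
import numpy as np

f = lambda v: v[0] + v[1] - 2 * v[2]     # the linear example above
g = lambda v: np.linalg.norm(v)          # the non-example: Euclidean length

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.0, 4.0, 1.0])
lam = -2.0

# f passes both spot checks
print(np.isclose(f(x + y), f(x) + f(y)))     # True
print(np.isclose(f(lam * x), lam * f(x)))    # True

# g fails the scalar check: g(-2x) = 2 g(x), not -2 g(x)
print(np.isclose(g(lam * x), lam * g(x)))    # False
```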

Lemma 3.10 If \(f:U \to V\) is linear then \(f(\mathbf{0}_U)=\mathbf{0}_V\).
Proof. \(f(\mathbf{0}_U)=f(\mathbf{0}_U+\mathbf{0}_U)=f(\mathbf{0}_U)+f(\mathbf{0}_U)\) so adding the additive inverse of \(f(\mathbf{0}_U)\) to both sides we get \(\mathbf{0}_V = f(\mathbf{0}_U)\).

3.3.2 Kernel and image

Definition 3.10 Let \(f:U \to V\) be a linear map between \(\FF\)-vector spaces U and V.

  • The kernel of f, written \(\ker f\), is defined to be \[\begin{equation*} \{ \mathbf{u} \in U : f(\mathbf{u})=\mathbf{0}_V \}. \end{equation*}\]
  • The image of f, written \(\im f\), is defined to be \[\begin{equation*} \{f(\mathbf{u}) : \mathbf{u} \in U\}. \end{equation*}\]

Lemma 3.11 Let \(f:U \to V\) be a linear map between \(\FF\)-vector spaces. Then \(\ker f\) is a subspace of U and \(\im f\) is a subspace of V.

Proof. \(\mathbf{0}_U \in \ker f\) by Lemma 3.10 and \(\mathbf{0}_V = f(\mathbf{0}_U)\in \im f\) so both contain the appropriate zero vector.

Let \(\mathbf{x},\mathbf{y} \in \ker f\) and \(\lambda \in \FF\). Then \(f(\mathbf{x}+\mathbf{y})=f(\mathbf{x})+f(\mathbf{y})=\mathbf{0}_V+\mathbf{0}_V=\mathbf{0}_V\) and \(f(\lambda \mathbf{x})=\lambda f(\mathbf{x})=\lambda \mathbf{0}_V = \mathbf{0}_V\), so \(\ker f\) is closed under addition and under scalar multiplication.

Let \(f(\mathbf{x}),f(\mathbf{y}) \in \im f\) and \(\lambda \in \FF\). Then \(f(\mathbf{x})+f(\mathbf{y})=f(\mathbf{x}+\mathbf{y}) \in \im f\) and \(\lambda f(\mathbf{x}) = f(\lambda \mathbf{x}) \in \im f\) so \(\im f\) is closed under addition and scalar multiplication.
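For the matrix maps \(T_A\) of Example 3.10, kernel and image can be computed mechanically: \(\ker T_A\) is the null space of A and \(\im T_A\) is its column space. Below is a small sketch using sympy; the matrix is an arbitrary choice.

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6]])          # arbitrary 2x3 example, so T_A maps F^3 to F^2

ker_basis = A.nullspace()        # basis of ker T_A: vectors v with A v = 0
im_basis = A.columnspace()       # basis of im T_A: spanned by the columns of A

print(ker_basis)  # two basis vectors, e.g. (-2, 1, 0) and (-3, 0, 1)
print(im_basis)   # one basis vector, the column (1, 2)
```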

3.3.3 The matrix of a linear map

We’ll assume from now on that the vector spaces we work with are nonzero and finite-dimensional.

Let \(f: U \to V\) be a linear map and \(\mathcal{B} = \mathbf{b}_1,\ldots, \mathbf{b}_n\) a basis of U. Imagine that you want to communicate f to someone else. You only have to tell them \(f(\mathbf{b}_1), \ldots, f(\mathbf{b}_n)\), because they can then work out \(f(\mathbf{u})\) for any \(\mathbf{u} \in U\) by first expressing \(\mathbf{u}\) as a linear combination of the \(\mathbf{b}_i\) \[ \mathbf{u} = \sum_{i=1}^n \lambda_i \mathbf{b}_i\] and then using the linearity property to get \[ f(\mathbf{u}) = \sum_{i=1}^n \lambda_i f(\mathbf{b}_i).\] If the two of you agree a basis \(\mathcal{C} =\mathbf{c}_1,\ldots,\mathbf{c}_m\) of V then you can express each \(f(\mathbf{b}_j)\) as a linear combination of the \(\mathbf{c}_i\)s, so that you may as well only tell them the scalars involved in these linear combinations.

What this means is that a linear map \(f:U \to V\) is completely determined by the data of a basis \(\mathcal{B}\) of U, a basis \(\mathcal{C}\) of V, and the scalars needed to express the \(f(\mathbf{b}_i)\) in terms of the \(\mathbf{c}_j\). Organising these mn scalars into a rectangular grid gives a matrix representing f which depends on the choice of bases \(\mathcal{B}\) and \(\mathcal{C}\).

Definition 3.11 Let

  • \(f: U \to V\) be linear
  • \(\mathcal{B} = \mathbf{b}_1, \ldots, \mathbf{b}_n\) be a basis of U
  • \(\mathcal{C} = \mathbf{c}_1,\ldots \mathbf{c}_m\) be a basis of V

and define scalars \(a_{ij}\) by \(f(\mathbf{b}_j) = \sum_{i=1}^m a_{ij} \mathbf{c}_i\). Then the matrix of f with respect to initial basis \(\mathcal{B}\) and final basis \(\mathcal{C}\), written \([f]_{\mathcal{C}}^{\mathcal{B}}\), is the m × n matrix \((a_{ij})\).

This means that the \(j\)th column of this m × n matrix records the coefficients you need to express \(f(\mathbf{b}_j)\) as a linear combination of the \(\mathbf{c}_i\).

When we’re talking about a linear map from a vector space to itself we say the matrix of f with respect to \(\mathcal{B}\) to mean the matrix with respect to initial basis \(\mathcal{B}\) and final basis \(\mathcal{B}\).
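When \(U = \FF^n\) and \(V = \FF^m\), this recipe can be carried out by a computer: column j of \([f]^{\mathcal{B}}_{\mathcal{C}}\) is the solution of a linear system expressing \(f(\mathbf{b}_j)\) in the basis \(\mathcal{C}\). The sketch below assumes numpy; the helper name matrix_of_map is introduced here purely for illustration.

```python
import numpy as np

def matrix_of_map(f, B, C):
    """Matrix of a linear map f: F^n -> F^m with respect to initial basis B
    (a list of vectors in F^n) and final basis C (a list of vectors in F^m).
    Column j holds the coefficients of f(B[j]) as a combination of the C[i]."""
    C_cols = np.column_stack(C)                        # invertible m x m matrix of basis vectors
    cols = [np.linalg.solve(C_cols, f(b)) for b in B]  # coefficients of f(b_j) in the basis C
    return np.column_stack(cols)

# With standard bases this just recovers A itself for f = T_A
A = np.array([[1.0, 2.0], [3.0, 4.0]])
f = lambda v: A @ v
std = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(matrix_of_map(f, std, std))   # [[1. 2.] [3. 4.]]
```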

Example 3.11

  • Let \(D: \RR[x]_{\leq 2} \to \RR[x]_{\leq 2}\) be the differentiation map. The vector space \(\RR[x]_{\leq 2}\) has a basis \(\mathcal{B} = 1,x,x^2\). We find the matrix of D with respect to initial and final bases \(\mathcal{B}\). Here is what D does to the elements of \(\mathcal{B}\): \[\begin{align*} D(1)&= 0 = 0\times 1 + 0 \times x + 0 \times x^2 \\ D(x) &= 1 = 1\times 1 + 0 \times x + 0 \times x^2\\ D(x^2) &= 2x = 0\times 1 + 2 \times x + 0 \times x^2 \end{align*}\] The first of these tells us the first column of \([D]^{\mathcal{B}}_{\mathcal{B}}\), and so on. We get \[\begin{equation*} [D]^{\mathcal{B}}_{\mathcal{B}}= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \end{equation*}\]
  • Let A be any m✕n matrix. Let \(T_A: \rr^n \to \rr^m\) be \(T_A( \mathbf{v} ) = A \mathbf{v}\). Let \(\mathcal{B}\) and \(\mathcal{C}\) be the standard bases of \(\rr^n\) and \(\rr^m\). Then \([ T_A]_ {\mathcal{C}} ^ {\mathcal{B}} = A\). This is because the jth column of \([T_A]^ {\mathcal{B}} _ {\mathcal{C}}\) is the representation of \(T_A( \mathbf{e} _j)\) with respect to the basis \(\mathcal{C}\), and \(T_A( \mathbf{e} _j) = A \mathbf{e} _j\) which equals the jth column of A.
  • Let \(T : \rr^3 \to \rr^3\) be the linear map defined by \[\begin{equation*} T \begin{pmatrix} x\\y\\z \end{pmatrix} = \begin{pmatrix} x+2y+3z \\ 4y+5z \\ 5y+6z \end{pmatrix} \end{equation*}\] We’ll find \([T]_{\mathcal{B}}^{\mathcal{B}}\), where \(\mathcal{B}=(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)\) is the standard basis of \(\rr^3\). To do that we have to work out how to express \(T(\mathbf{e}_j)\) as a linear combination of the standard basis elements, for each j. We have \[\begin{align*} T(\mathbf{e}_1) &= \begin{pmatrix} 1\\0\\0 \end{pmatrix} = \mathbf{e}_1 + 0\mathbf{e}_2 + 0\mathbf{e}_3 \\ T(\mathbf{e}_2) &= \begin{pmatrix} 2\\4\\5 \end{pmatrix} = 2\mathbf{e}_1 + 4\mathbf{e}_2 + 5\mathbf{e}_3 \\ T(\mathbf{e}_3) &= \begin{pmatrix} 3\\5\\6 \end{pmatrix} = 3\mathbf{e}_1 + 5\mathbf{e}_2 + 6\mathbf{e}_3 \end{align*}\] and so \([T]_{\mathcal{B}}^{\mathcal{B}} = \begin{pmatrix} 1&2&3\\0&4&5 \\0&5&6 \end{pmatrix}\). (A numerical version of this calculation appears after this example.)
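Here is that numerical version: a minimal numpy sketch which builds \([T]_{\mathcal{B}}^{\mathcal{B}}\) column by column, using the fact that column j is just \(T(\mathbf{e}_j)\) written in the standard basis. The function T below re-implements the formula from the example.

```python
import numpy as np

def T(v):
    x, y, z = v
    return np.array([x + 2*y + 3*z, 4*y + 5*z, 5*y + 6*z])

E = np.eye(3)                                        # standard basis e_1, e_2, e_3 as columns
M = np.column_stack([T(E[:, j]) for j in range(3)])  # column j = T(e_j)
print(M)
# [[1. 2. 3.]
#  [0. 4. 5.]
#  [0. 5. 6.]]
```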
Example 3.12 The matrix of the identity map \(\id : V \to V\) is the identity matrix if the initial and final bases chosen are the same, and the matrix of the zero map from U to V with respect to any initial and final bases is the appropriately-sized zero matrix.

Matrix multiplication is defined the way it is so that it corresponds to composition of linear maps.

Proposition 3.3 Let \(f:U \to V\) and \(g:V \to W\) be linear maps, let \(\mathcal{B}=\mathbf{b}_1,\ldots,\mathbf{b}_l\) be a basis of U, let \(\mathcal{C}=\mathbf{c}_1,\ldots,\mathbf{c}_m\) be a basis of V and let \(\mathcal{D}=\mathbf{d}_1,\ldots,\mathbf{d}_n\) be a basis of W. Then \[[g\circ f]^{\mathcal{B}}_{\mathcal{D}} = [g]^{\mathcal{C}}_{\mathcal{D}}[f]^{\mathcal{B}}_{\mathcal{C}}.\]
Proof. To work out the matrix of \(g\circ f\) on the left hand side, we must find \(g(f(\mathbf{b}_j))\) as a linear combination of the \(\mathbf{d}_i\): the coefficient of \(\mathbf{d}_i\) in \(g(f(\mathbf{b}_j))\) is the i, j entry of the matrix. So let’s do that: \[\begin{equation*} g(f(\mathbf{b}_j)) = g\left( \sum_{k=1}^m f_{kj} \mathbf{c}_k \right) = \sum_{k=1}^m f_{kj}g(\mathbf{c}_k) =\sum_{k=1}^m \sum_{i=1}^n f_{kj}g_{ik}\mathbf{d}_i \end{equation*}\] where \(f_{kj}\) is the k, j entry of \([f]^{\mathcal{B}}_{\mathcal{C}}\) and \(g_{ik}\) is the i, k entry of \([g]^{\mathcal{C}}_{\mathcal{D}}\). So the i, j entry of \([g\circ f]^{\mathcal{B}}_{\mathcal{D}}\), that is, the coefficient of \(\mathbf{d}_i\) in the above, is \(\sum_{k=1}^m g_{ik}f_{kj}\) which is, by definition of matrix multiplication, the i, j entry of \([g]^{\mathcal{C}}_{\mathcal{D}}[f]^{\mathcal{B}}_{\mathcal{C}}\).

We can take the matrix of a linear map f with respect to different choices of initial and final bases. The next result, called the change of basis formula, tells you how the resulting matrices are related.

Corollary 3.4 Let \(f:U \to V\) be a linear map and let \(\mathcal{B}, \mathcal{B}'\) be bases of U and \(\mathcal{C}, \mathcal{C}'\) be bases of V. Then \[\begin{equation*} [f]^{\mathcal{B}'}_{\mathcal{C}'} = [\id_V]^{\mathcal{C}}_{\mathcal{C}'} [f]^{\mathcal{B}}_ {\mathcal{C}} [\id_U]^{\mathcal{B'}}_{\mathcal{B}} \end{equation*}\]
Proof. \(f = \id_V \circ f \circ \id_U\), so using Proposition 3.3 twice: \[\begin{equation*} [f]^{\mathcal{B'}}_{\mathcal{C}'} = [\id_V \circ f \circ \id_U]^{\mathcal{B'}}_{\mathcal{C}'} = [\id_V]^{\mathcal{C}}_{\mathcal{C}'}[f\circ \id_U]^{\mathcal{B}'}_{\mathcal{C}} = [\id_V]^{\mathcal{C}}_{\mathcal{C}'} [f]^{\mathcal{B}}_{\mathcal{C}} [\id_U]^{\mathcal{B}'}_{\mathcal{B}}. \end{equation*}\]

Example 3.13 Let \(T: \mathbb{R}^2 \to \mathbb{R}^2\) be the linear map \[\begin{equation*} T \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -16x+6y \\ -45x+17y \end{pmatrix} \end{equation*}\] Let \(\mathcal{B}\) be the standard basis \(\mathbf{e}_1, \mathbf{e}_2\) of \(\mathbb{R}^2\) and let \(\mathcal{C}\) be the basis \(\mathbf{c}_1 = \begin{pmatrix} 2 \\ 5\end{pmatrix}, \mathbf{c}_2 = \begin{pmatrix} 1 \\ 3 \end{pmatrix}\).

The matrix of T with respect to \(\mathcal{B}\) is easy to find: \[\begin{align*} T( \mathbf{e}_1) &= T \begin{pmatrix}1\\0\end{pmatrix} \\ &= \begin{pmatrix} -16 \\ -45 \end{pmatrix} \\ &= -16 \mathbf{e}_1 -45 \mathbf{e}_2 \\ T( \mathbf{e}_2) &= T \begin{pmatrix}0\\1\end{pmatrix} \\ &= \begin{pmatrix} 6 \\ 17 \end{pmatrix} \\ &= 6 \mathbf{e}_1 +17 \mathbf{e}_2 \end{align*}\] so \([T]^{\mathcal{B}}_{\mathcal{B}} = \begin{pmatrix} -16 & 6 \\ -45 & 17 \end{pmatrix}\).

\([\id]^{\mathcal{C}}_{\mathcal{B}}\) is also easy: it’s the matrix which tells us how to express the elements of \(\mathcal{C}\) in terms of the standard basis. \[\begin{align*} \mathbf{c}_1 &= 2 \mathbf{e}_1 + 5 \mathbf{e}_2 \\ \mathbf{c}_2 &= 1 \mathbf{e}_1 + 3 \mathbf{e}_2 \end{align*}\] and so \([\id]^{\mathcal{C}}_{\mathcal{B}} = \begin{pmatrix} 2&1\\5&3\end{pmatrix}\).

How to express the \(\mathbf{e}_i\) in terms of the \(\mathbf{c}_i\) isn’t so obvious, so on the face of it computing \([\id]^{\mathcal{B}}_{\mathcal{C}}\) is harder. But we can avoid that by using Proposition 3.3: \([\id]^{\mathcal{B}}_{\mathcal{C}} [\id]^{\mathcal{C}}_{\mathcal{B}} = [\id\circ\id]^{\mathcal{C}}_{\mathcal{C}} = [\id]^{\mathcal{C}}_{\mathcal{C}}=I_2\), so \[\begin{align*} [\id]_{\mathcal{C}}^{\mathcal{B}} &= ([\id]^{\mathcal{C}}_{\mathcal{B}}) ^{-1} \\ &= \begin{pmatrix} 2&1\\5&3\end{pmatrix}^{-1} \\ &= \begin{pmatrix} 3&-1\\-5&2\end{pmatrix} \end{align*}\]

We could work out \([T]^{\mathcal{C}}_{\mathcal{C}}\) directly using the definition, but instead we are going to practise using the change of basis formula. It says \[\begin{align*} [T]^{\mathcal{C}}_{\mathcal{C}} &= [\id]^{\mathcal{B}}_{\mathcal{C}} [T]^{\mathcal{B}}_{\mathcal{B}} [\id]^{\mathcal{C}}_{\mathcal{B}} \\ &= \begin{pmatrix} 3&-1 \\ -5 & 2 \end{pmatrix} \begin{pmatrix} -16 & 6 \\ -45 & 17 \end{pmatrix} \begin{pmatrix} 2&1 \\5&3\end{pmatrix} \\ &= \begin{pmatrix} -1 & 0 \\ 0 & 2\end{pmatrix} \end{align*}\]

What about \([T]^{\mathcal{B}}_{\mathcal{C}}\)? Again we could do this directly from the definition by computing \(T(\mathbf{e}_1)\) and \(T(\mathbf{e}_2)\) and expressing them in terms of the \(\mathbf{c}_i\)s. But we already have the information we need: by Proposition 3.3, \[\begin{align*} [T]^{\mathcal{B}}_{\mathcal{C}} &= [T\circ \id]^{\mathcal{B}}_{\mathcal{C}} \\ &= [T]^{\mathcal{C}}_{\mathcal{C}} [\id]^{\mathcal{B}}_{\mathcal{C}} \\ &= \begin{pmatrix} -1&0\\0&2\end{pmatrix} \begin{pmatrix} 3&-1\\-5&2\end{pmatrix} \\ &= \begin{pmatrix} -3&1 \\ -10 & 4 \end{pmatrix} \end{align*}\] To check our answer we compute \(T(\mathbf{e}_1)\), which is \(\begin{pmatrix} -16 \\ -45 \end{pmatrix}\). If the matrix is correct this should be the same as \(-3 \mathbf{c}_1 - 10 \mathbf{c}_2\), and you can check that it really is.
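The matrix arithmetic in this example is easy to double-check by machine. Here is a minimal numpy sketch, where A and P stand for \([T]^{\mathcal{B}}_{\mathcal{B}}\) and \([\id]^{\mathcal{C}}_{\mathcal{B}}\) above:

```python
import numpy as np

A = np.array([[-16.0, 6.0],
              [-45.0, 17.0]])     # [T]^B_B
P = np.array([[2.0, 1.0],
              [5.0, 3.0]])        # [id]^C_B: its columns are c_1 and c_2
P_inv = np.linalg.inv(P)          # [id]^B_C

print(np.round(P_inv @ A @ P))    # [T]^C_C = diag(-1, 2)
print(np.round(P_inv @ A))        # [T]^B_C = [[-3, 1], [-10, 4]], matching the example
```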

3.3.4 The rank-nullity theorem

Definition 3.12 Let U and V be finite-dimensional \(\FF\)-vector spaces and \(f:U \to V\) be a linear map. Then

  • the rank of f, written \(\rk f\), is \(\dim \im f\)
  • the nullity of f , written \(\nul f\), is \(\dim \ker f\).
Theorem 3.2 (Rank-nullity theorem) Let U and V be finite-dimensional \(\FF\)-vector spaces and \(f:U \to V\) be a linear map. Then \[\dim \im f + \dim \ker f = \dim U.\]

Proof. Let \(\mathbf{k} _1,\ldots, \mathbf{k} _m\) be a basis of \(\ker f\). Using Proposition 3.1, extend it to a basis \(\mathbf{k} _1,\ldots, \mathbf{k} _m, \mathbf{u} _1,\ldots, \mathbf{u}_n\) of U. Then \(\dim \ker f= m\), \(\dim U = m+n\) and so we need to prove that \(\dim \im f = n\). To do this we show that \(f(\mathbf{u}_1),\ldots,f(\mathbf{u}_n)\) is a basis of \(\im f\).

First we show that \(f(\mathbf{u}_1),\ldots,f(\mathbf{u}_n)\) spans \(\im f\). If \(\mathbf{x} \in U\) then we can write \[\mathbf{x} = \sum_{i=1}^m \lambda_i \mathbf{k}_i + \sum_{i=1}^n \mu_i \mathbf{u}_i \] for some scalars \(\lambda_i,\mu_i\) since \(\mathbf{k}_1,\ldots, \mathbf{k}_m, \mathbf{u}_1,\ldots , \mathbf{u}_n\) is a basis of U. Then \[\begin{multline*} f(\mathbf{x})=f\left( \sum_{i=1}^m \lambda_i \mathbf{k}_i + \sum_{i=1}^n \mu_i \mathbf{u}_i \right) =\sum_{i=1}^m \lambda_i f(\mathbf{k}_i) + \sum_{i=1}^n \mu_i f(\mathbf{u}_i) =\sum_{i=1}^n \mu_i f(\mathbf{u}_i) \end{multline*}\] because \(f(\mathbf{k}_i)=\mathbf{0}_V\) for all i. This shows that any element \(f(\mathbf{x})\) of the image of f is in the span of the \(f(\mathbf{u}_i)\).

Now we show \(f(\mathbf{u}_1),\ldots,f(\mathbf{u}_n)\) is linearly independent. Suppose \[\sum_{i=1}^n \lambda_i f(\mathbf{u}_i)=\mathbf{0}_V\] for some scalars \(\lambda_i\); we want to prove all these scalars must be zero. By linearity \(f\left(\sum_{i=1}^n \lambda_i\mathbf{u}_i\right)=\mathbf{0}_V\) and so \(\sum _{i=1}^n \lambda_i \mathbf{u}_i \in \ker f\). But \(\mathbf{k}_1,\ldots, \mathbf{k}_m\) is a basis of the kernel of f, so there exist scalars \(\mu_i\) such that \[\begin{equation*} \sum_{i=1}^n \lambda_i \mathbf{u}_i = \sum_{i=1}^m \mu_i \mathbf{k}_i \end{equation*}\] or equivalently \[\begin{equation*} \sum_{i=1}^m \mu_i \mathbf{k}_i-\sum_{i=1}^n \lambda_i \mathbf{u}_i=\mathbf{0}_U. \end{equation*}\] This is a linear dependence relation on \(\mathbf{k}_1,\ldots, \mathbf{k}_m, \mathbf{u}_1,\ldots, \mathbf{u}_n\), but those vectors are a basis of U hence linearly independent. Thus all the \(\lambda_i\) and \(\mu_i\) are zero, completing the proof that \(f(\mathbf{u}_1),\ldots,f(\mathbf{u}_n)\) is linearly independent.
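For a matrix map the theorem is easy to check in practice: the rank is the dimension of the column space of A and the nullity the dimension of its null space. A minimal sympy sketch, with an arbitrarily chosen matrix:

```python
from sympy import Matrix

A = Matrix([[1, 2, 0, 1],
            [0, 1, 1, 0],
            [1, 3, 1, 1]])    # arbitrary 3x4 example: T_A maps F^4 to F^3

rank = A.rank()                # dim im T_A
nullity = len(A.nullspace())   # dim ker T_A
print(rank, nullity)           # 2 2
print(rank + nullity == A.cols)  # True: agrees with dim U = 4
```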

3.3.5 Isomorphisms of vector spaces

A linear map \(f:U \to V\) which is a bijection is called an isomorphism of vector spaces or a vector space isomorphism or just an isomorphism. If there exists an isomorphism from U to V we say that U and V are isomorphic and write \(U \cong V\).

In this section we’ll assume all vector spaces are finite-dimensional.

Lemma 3.12 If \(U \cong V\) then \(\dim U = \dim V\).

Proof. There is a linear map \(f : U \to V\) which is a bijection. By the rank-nullity theorem, \(\dim \ker f + \dim \im f = \dim U\).

Since \(f\) is one-to-one and \(f( \mathbf{0}_U) = \mathbf{0}_V\), there are no other elements of U which f sends to \(\mathbf{0}_V\), so \(\ker f = \{ \mathbf{0}_U\}\) and \(\dim \ker f = 0\).

Since \(f\) is onto, \(\im f = V\) so \(\dim \im f = \dim V\).

Substituting these into the rank-nullity theorem, \(\dim V = \dim U\).

Symbols which look a bit like equality ought to behave a bit like equality, so let’s show that if \(U \cong V\) then \(V \cong U\). To do this we need to show that if f is a vector space isomorphism then so is its inverse.

Lemma 3.13 Let \(f: U \to V\) be an isomorphism of vector spaces. Then so is \(f^{-1} : V \to U\).

Proof. f is a bijection, and the inverse of a bijection is a bijection. We only need show that \(f^{-1}\) is linear.

Let \(\mathbf{x}, \mathbf{y} \in V\). Since f is onto, \(\mathbf{x} = f(\mathbf{u})\) and \(\mathbf{y} = f(\mathbf{v})\) for some \(\mathbf{u}, \mathbf{v} \in U\). Notice that \(\mathbf{u} = f^{-1}(\mathbf{x})\) and \(\mathbf{v} = f^{-1}(\mathbf{y})\). Now

\[\begin{align*} f^{-1}(\mathbf{x} + \mathbf{y}) &= f^{-1}( f(\mathbf{u}) + f(\mathbf{v})) \\ &= f^{-1}(f(\mathbf{u}+\mathbf{v})) & \text{as } f \text{ is linear} \\ &= \mathbf{u} + \mathbf{v} \\ &= f^{-1}(\mathbf{x}) + f^{-1}(\mathbf{y}) \end{align*}\]

so \(f^{-1}\) satisfies the first part of the definition of linearity. Now let a be a scalar.

\[\begin{align*} f^{-1}(a \mathbf{x}) &= f^{-1}(a f(\mathbf{u})) \\ &= f^{-1}(f(a\mathbf{u})) & \text{as } f \text{ is linear} \\ &= a\mathbf{u} \\ &= a f^{-1}(\mathbf{x}) \end{align*}\] and so \(f^{-1}\) satisfies the other part of the linearity definition.

Another familiar property of equality is the transitive property: if a = b and b = c then a = c. The same holds for isomorphisms.

Lemma 3.14 If \(U \cong V\) and \(V \cong W\) then \(U \cong W\).

Proof. Let \(f : U \to V\) and \(g : V \to W\) be isomorphisms. I claim that \(g\circ f : U \to W\) is an isomorphism.

Certainly \(g\circ f\) is a bijection, because the composition of two bijections is a bijection. We only need to show that the composition of two linear maps is again linear. Let \(\mathbf{x}, \mathbf{y} \in U\). Then

\[\begin{align*} (g\circ f)(\mathbf{x}+\mathbf{y}) &= g(f(\mathbf{x}+\mathbf{y})) \\ &= g(f(\mathbf{x}) + f(\mathbf{y})) & \text{as } f \text{ is linear} \\ &= g(f(\mathbf{x})) + g(f(\mathbf{y})) & \text{as } g \text{ is linear} \\ &= (g\circ f)(\mathbf{x}) + (g\circ f)(\mathbf{y}) \end{align*}\]

so \(g\circ f\) satisfies the first part of the definition of linearity. The second part is similar.

We can now show that every vector space is isomorphic to a column vector space:

Theorem 3.3 Let V be a \(\ff\)-vector space with \(\dim V = n\). Then \(V \cong \ff^n\).

Proof. Choose a basis \(\mathbf{v}_1,\ldots, \mathbf{v}_n\) of \(V\). Every element of V can be written uniquely as \(\sum_{i=1}^n a_i \mathbf{v}_i\) for some scalars \(a_i\), so we can define a map \(f: V \to \ff^n\) by \(f( \sum_i a_i \mathbf{v}_i) = \begin{pmatrix} a_1 \\ \vdots \\ a_n\end{pmatrix}\).

f is linear: if \(\mathbf{a} = \sum_i a_i \mathbf{v}_i\) and \(\mathbf{b} = \sum_i b_i \mathbf{v}_i\) are elements of V then \(\mathbf{a} + \mathbf{b} = \sum_i a_i \mathbf{v}_i + \sum_i b_i \mathbf{v}_i = \sum_i (a_i+b_i) \mathbf{v}_i\), so

\[\begin{equation*} f(\mathbf{a}+\mathbf{b}) = \begin{pmatrix} a_1+b_1 \\ \vdots \\ a_n + b_n \end{pmatrix} = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = f(\mathbf{a}) + f(\mathbf{b}) \end{equation*}\]

Similarly, \(f( a \mathbf{a}) = a f(\mathbf{a})\) for any scalar \(a \in \ff\).

f is onto because any element \(\begin{pmatrix} a_1 \\ \vdots \\ a_n\end{pmatrix}\) of \(\ff^n\) equals \(f( \sum_i a_i \mathbf{v}_i)\).

f is one-to-one because if \(\mathbf{a} = \sum_i a_i \mathbf{v}_i\) and \(\mathbf{b} = \sum_i b_i \mathbf{v}_i\) and \(f(\mathbf{a}) = f(\mathbf{b})\) then \(\begin{pmatrix} a_1 \\ \vdots \\ a_n\end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_n\end{pmatrix}\), so \(a_i = b_i\) for all i, so \(\mathbf{a} = \mathbf{b}\). Therefore f is an isomorphism.
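Concretely, once a basis of V is fixed, the isomorphism f of this proof is computed by solving a linear system. A minimal numpy sketch for \(V = \rr^2\) with an arbitrarily chosen basis; the names V_cols and coords are introduced here for illustration:

```python
import numpy as np

# An arbitrary basis v_1, v_2 of R^2, stored as the columns of V_cols
V_cols = np.column_stack([np.array([2.0, 5.0]), np.array([1.0, 3.0])])

def coords(x):
    """The isomorphism of Theorem 3.3: send x to its coordinate vector (a_1, ..., a_n)
    with x = sum_i a_i v_i, found by solving a linear system."""
    return np.linalg.solve(V_cols, x)

x = np.array([4.0, 11.0])
a = coords(x)
print(a)                            # [1. 2.], so x = 1*v_1 + 2*v_2
print(np.allclose(V_cols @ a, x))   # True: the coordinates really do recover x
```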

We can now show that the converse of Lemma 3.12 holds: if \(\dim U = \dim V\) then \(U \cong V\). For if they have the same dimension n then \(U \cong \ff^n\) and \(V \cong \ff^n\) by the theorem above. By Lemma 3.13 (the inverse of an isomorphism is an isomorphism), \(\ff^n \cong V\). By the transitivity property of Lemma 3.14, \(U \cong \ff^n\) and \(\ff^n \cong V\) imply \(U \cong V\).

3.3.6 Eigenvalues and eigenvectors

We defined eigenvalues and eigenvectors for square matrices in the last part of these notes. Now we will define them for linear maps from a vector space to itself. Because we know how to turn an n✕n matrix into a linear map \(\ff^n \to \ff^n\) (Example 3.10, part 2), this will generalize the old definition for matrices.

Definition 3.13 Let \(f:V \to V\) be a linear map. An eigenvector for f is a nonzero \(\mathbf{v} \in V\) such that there exists \(\lambda \in \ff\) with \(f( \mathbf{v} )= \lambda \mathbf{v}\). In this case we say that \(\mathbf{v}\) is a \(\lambda\)-eigenvector of f, and that \(\lambda\) is an eigenvalue of f.


Example 3.14

  • Let A be a n✕n matrix. Let \(T_A: \ff^n \to \ff^n\) be \(T_A( \mathbf{v} )= A \mathbf{v}\). Then a column vector \(\mathbf{v}\) is an eigenvector for \(T_A\) in the sense defined above if and only if it is an element of \(\ff^n\) which is an eigenvector for A in the sense defined in the previous part of these notes.
  • Let V be the real vector space of all polynomials in one variable x with real coefficients and degree at most n, and let \(D:V \to V\) be \(D(f)= \frac{df}{dx}\), the differentiation map. Then D is a linear map. Any nonzero constant polynomial c is a 0-eigenvector of D, because \[\begin{equation*} D(c) = 0 = 0 \times c. \end{equation*}\] No nonconstant polynomial f is an eigenvector of D, because \(D(f)\) is nonzero and has strictly smaller degree than f, so it cannot be a scalar multiple of f.
  • Let V be the real vector space of all differentiable functions \(\rr \to \rr\), and let \(D:V \to V\), \(D(f)=f'\), be the differentiation map again. The function \(f(x)=e^{\lambda x}\) is a \(\lambda\)-eigenvector of D.
  • Let \(M_{2\times 2}(\rr)\) be the real vector space of all 2✕2 matrices with real entries. Let \(\operatorname{tr}:M_{2\times 2}(\rr) \to M_{2\times 2}(\rr)\) be the map \(\operatorname{tr}(A) = A^T\), the transpose of A. The matrices \[\begin{pmatrix} 0&1\\1&0 \end{pmatrix} \text{ and } \begin{pmatrix} 0&1 \\ -1 & 0 \end{pmatrix} \] are a 1-eigenvector and a \(-1\)-eigenvector of \(\operatorname{tr}\) respectively. Can you find some more 1-eigenvectors?
  • Let \(\id_V:V \to V\) be the identity map. Then any nonzero \(\mathbf{v}\in V\) is an eigenvector for \(\id_V\) with eigenvalue 1.

Proposition 3.4 Let \(T:V \to V\) be linear and let \(\mathbf{v}_1,\ldots, \mathbf{v}_n\) be eigenvectors of T whose eigenvalues \(\lambda_1,\ldots, \lambda_n\) are all distinct. Then \(\mathbf{v}_1,\ldots, \mathbf{v}_n\) are linearly independent.

Slogan version: eigenvectors with different eigenvalues are linearly independent.


Proof. By contradiction. Choose a counterexample with n as small as possible. Note that n must be bigger than 1 in this counterexample, as eigenvectors are nonzero so an eigenvector \(\mathbf{v} _1\) on its own is linearly independent.

A counterexample is a linear dependence relation \[\begin{equation} \tag{3.3} \sum _{i=1}^n a_i \mathbf{v}_i= \mathbf{0}_V \end{equation}\] with not all \(a_i=0\). Apply the linear map \(T-\lambda_n \id_V\) to both sides. We have \[(T-\lambda _n\id_V) \mathbf{v}_i = T( \mathbf{v}_i)-\lambda _n \mathbf{v}_i = \lambda _i \mathbf{v}_i - \lambda _n \mathbf{v}_i = (\lambda _i-\lambda _n) \mathbf{v}_i\] and so (3.3) becomes \[\sum_{i=1}^n a_i (\lambda_i-\lambda_n) \mathbf{v}_i = \mathbf{0}_V.\] The \(i=n\) term is 0, so we can rewrite this as \[\begin{equation*} \sum _{i=1}^{n-1} a_i (\lambda_i-\lambda_n) \mathbf{v}_i = \mathbf{0}_V. \end{equation*}\] This relation involves only the \(n-1\) eigenvectors \(\mathbf{v}_1,\ldots,\mathbf{v}_{n-1}\), whose eigenvalues are still distinct; since n was chosen as small as possible, they are linearly independent, so all the coefficients \(a_i(\lambda_i-\lambda_n)\) must be 0. Since \(\lambda_i-\lambda _n \neq 0\) for \(i<n\) we must have \(a_i=0\) for \(i<n\). Then (3.3) just says \(a_n \mathbf{v}_n= \mathbf{0}_V\), so \(a_n=0\) too. We’ve shown all the \(a_i\) in (3.3) were zero, a contradiction.
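The slogan can also be observed numerically: take a matrix with distinct eigenvalues, collect eigenvectors, and check that they are independent. A minimal numpy sketch, with an arbitrarily chosen matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])     # arbitrary matrix with distinct eigenvalues 2, 3, 5

eigenvalues, eigenvectors = np.linalg.eig(A)    # eigenvectors are returned as columns
print(eigenvalues)                              # [2. 3. 5.]
print(np.linalg.matrix_rank(eigenvectors))      # 3: the eigenvector columns are linearly independent
```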


Definition 3.14

  • A linear map \(T:V \to V\) is called diagonalizable if and only if there is a basis of V consisting of eigenvectors of T.
  • An n✕n matrix A with entries from \(\ff\) is called diagonalizable over \(\ff\) if and only if there is a basis of \(\ff^n\) consisting of eigenvectors of A.

This means A is diagonalizable if and only if \(T_A:\ff^n \to \ff^n\) is diagonalizable.

Diagonalizable linear maps \(T:V \to V\) are the easiest linear maps to understand, because they behave in a simple way: V has a basis \(\mathbf{v} _1,\ldots, \mathbf{v}_n\) say, consisting of eigenvectors of T, and all T does is to multiply each \(\mathbf{v} _i\) by a certain scalar.

Example 3.15

  • The zero map \(V \to V\) is diagonalizable. Any basis of V is a basis consisting of 0-eigenvectors for this map.
  • The identity map \(V \to V\) is diagonalizable. Any basis of V is a basis consisting of 1-eigenvectors for this map. For the same sort of reason, any scalar multiple \(\lambda \id_V\) of the identity is diagonalizable too.
  • Let \(n>1\) and let V be the space of polynomials of degree at most n in one variable x over the real numbers. Let \(D:V \to V\) be the differentiation map. We’ve already seen that the only eigenvectors of D are the nonzero constant polynomials, so there cannot be a basis of V consisting of eigenvectors of D, and therefore D is not diagonalizable.
  • The transpose map \(\operatorname{tr}: M_{2\times 2}(\rr) \to M_{2\times 2}(\rr)\) is diagonalizable. First note that we have \(\tr (\tr(A))=A\). It follows that if \(\lambda\) is an eigenvalue of \(\tr\) then \(\lambda^2=1\): if A is a \(\lambda\)-eigenvector then \(A= \tr(\tr(A)) = \tr(\lambda A) = \lambda \tr(A) = \lambda ^2 A\). So \(\lambda=\pm 1\). A 1-eigenvector is a matrix which is its own transpose, e.g. \[\begin{pmatrix} 1&0\\0&0 \end{pmatrix} , \begin{pmatrix} 0&0\\0&1 \end{pmatrix} , \begin{pmatrix} 0&1 \\1&0 \end{pmatrix}.\] The matrix \[\begin{pmatrix} 0& 1 \\ -1 & 0 \end{pmatrix}\] is a \(-1\)-eigenvector for \(\tr\). You can check that these four eigenvectors are linearly independent, so they form a basis of the four-dimensional vector space \(M_{2\times 2}(\rr)\), so \(\tr\) is diagonalizable.

A diagonal matrix is a square matrix of the form \[\begin{equation*} \begin{pmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{pmatrix} \end{equation*}\] that is, a matrix \((d_{ij})\) where \(d_{ij}=0\) if \(i \neq j\). We write \(\diag( a_1,\ldots,a_n)\) for the n✕n diagonal matrix whose \((1,1),(2,2),\ldots,(n,n)\) entries are \(a_1,a_2,\ldots,a_n\).

If \(T:V \to V\) is diagonalizable and \(\mathcal{B} = \mathbf{v} _1,\ldots, \mathbf{v}_n\) is a basis of V in eigenvectors of T then its matrix with respect to \(\mathcal{B}\) is diagonal: \[\begin{equation*} [T]_ {\mathcal{B}} ^ {\mathcal{B}} = \diag ( \lambda_1,\ldots,\lambda_n) \end{equation*}\] where \(\lambda_i\) is the eigenvalue of \(\mathbf{v} _i\).
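As an illustration, the transpose map from Example 3.15 can be written in the eigenvector basis found there, and the resulting matrix is indeed diagonal. Here is a minimal numpy sketch, identifying \(M_{2\times 2}(\rr)\) with \(\rr^4\) by flattening each matrix to \((a_{11},a_{12},a_{21},a_{22})\); this identification and the name transpose_map are introduced here only for the computation.

```python
import numpy as np

# The transpose map on 2x2 matrices, acting on R^4 via the flattening (a11, a12, a21, a22)
def transpose_map(v):
    a, b, c, d = v
    return np.array([a, c, b, d])

# The eigenvector basis from Example 3.15, flattened
B = [np.array([1.0, 0.0, 0.0, 0.0]),    # [[1,0],[0,0]]   (1-eigenvector)
     np.array([0.0, 0.0, 0.0, 1.0]),    # [[0,0],[0,1]]   (1-eigenvector)
     np.array([0.0, 1.0, 1.0, 0.0]),    # [[0,1],[1,0]]   (1-eigenvector)
     np.array([0.0, 1.0, -1.0, 0.0])]   # [[0,1],[-1,0]]  (-1-eigenvector)

B_cols = np.column_stack(B)
# Column j of the matrix with respect to B = coefficients of transpose_map(b_j) in the basis B
M = np.column_stack([np.linalg.solve(B_cols, transpose_map(b)) for b in B])
print(np.round(M))   # diag(1, 1, 1, -1)
```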

Proposition 3.5 If \(\dim V=n\) and \(T:V \to V\) has n distinct eigenvalues then T is diagonalizable.
Proof. Let the distinct eigenvalues be \(\lambda_1,\ldots,\lambda_n\), and for each i let \(\mathbf{v}_i\) be a \(\lambda_i\)-eigenvector. Then the \(\mathbf{v}_i\) form a basis of V: they are linearly independent by Proposition 3.4, and there are \(n=\dim V\) of them, so they span a subspace of V of dimension \(\dim V\), which must be all of V.

It is important to remember that the converse of this result is false. A diagonalizable linear map does not have to have \(\dim V\) distinct eigenvalues. Consider the identity map \(V \to V\): it only has one eigenvalue, namely 1, but it is diagonalizable: any basis of V is a basis consisting of 1-eigenvectors of the identity map.

Proposition 3.6 Let A be a diagonalizable matrix. Let \(\mathbf{v} _1,\ldots, \mathbf{v}_n\) be a basis of \(\ff^n\) consisting of eigenvectors of A. Let P be the matrix whose columns are the \(\mathbf{v}_i\). Then \[\begin{equation*} P^{-1}AP =D \end{equation*}\] where D is the diagonal matrix whose ith diagonal entry is the eigenvalue of \(\mathbf{v} _i\).
Proof. Let \(\mathcal{B}\) be the standard basis of \(\ff^n\) and \(\mathcal{C}\) the basis \(\mathbf{v}_1,\ldots, \mathbf{v}_n\). P is the matrix of the identity map with respect to initial basis \(\mathcal{C}\) and final basis \(\mathcal{B}\), that is, \(P=[\id]^{\mathcal{C}}_{\mathcal{B}}\). By Proposition 3.3, \[\begin{equation*} I_n= [\id]_{\mathcal{B}}^{\mathcal{B}} = [\id]_{\mathcal{B}}^{\mathcal{C}} [\id]_{\mathcal{C}}^{\mathcal{B}} = P [\id]_{\mathcal{C}}^{\mathcal{B}} . \end{equation*}\] So P is invertible with inverse \(P^{-1}=[\id]_{\mathcal{C}}^{\mathcal{B}}\). Computing \([T_A]^{\mathcal{C}}_{\mathcal{B}}\) in two ways, as \([T_A \circ \id]^{\mathcal{C}}_{\mathcal{B}}\) and as \([\id \circ T_A]^{\mathcal{C}}_{\mathcal{B}}\), Proposition 3.3 gives \[\begin{equation*} [T_A]_{\mathcal{B}}^{\mathcal{B}} [\id]_{\mathcal{B}}^{\mathcal{C}} = [\id]^{\mathcal{C}}_{\mathcal{B}} [T_A]^{\mathcal{C}}_{\mathcal{C}}. \end{equation*}\] Since \([T_A]^{\mathcal{B}}_{\mathcal{B}} =A\) and \([T_A]^{\mathcal{C}}_{\mathcal{C}}=D\) this says \(AP=PD\), which is equivalent to what we wanted to prove.

One reason this is useful is in finding powers of matrices. It’s very easy to find powers of diagonal matrices, for example, \[\begin{equation*} \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} ^n = \begin{pmatrix} a^n & 0 \\ 0 & b^n \end{pmatrix} . \end{equation*}\] If we have a diagonalizable matrix A there is a basis of \(\ff^n\) consisting of eigenvectors of A, and we can form a matrix P whose columns are the elements of this basis. Then \(P^{-1}AP=D\) for a diagonal matrix D, so \(A=PDP^{-1}\), so \[\begin{align*} A^n &= (PDP^{-1})^n \\ &= PDP^{-1}PDP^{-1}\cdots PDP^{-1}PDP^{-1} \\ &= PDI_n D I_n \cdots I_n D I_n P^{-1} \\ &= PDD\cdots D P^{-1}\\ &= PD^n P^{-1} \end{align*}\] Since \(D^n\) is easy to find, this gives a method of computing a closed formula for \(A^n\).

Example 3.16 Let \(A = \begin{pmatrix} 1 & 1 \\ 1&1 \end{pmatrix}\). We’ve seen that the characteristic polynomial of A is \(x(x-2)\), so A has two distinct eigenvalues 0 and 2, so A is diagonalizable. If we find a 0-eigenvector and a 2-eigenvector they must be linearly independent (as they are eigenvectors with different eigenvalues), so they will form a basis of \(\rr^2\). It turns out that \(\begin{pmatrix} 1\\1 \end{pmatrix}\) is a 2-eigenvector and \(\begin{pmatrix} 1\\-1 \end{pmatrix}\) is a 0-eigenvector. Let P be the matrix with these eigenvectors as columns, so \(P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\). Then \(P^{-1}AP\) is a diagonal matrix with the eigenvalues of A on the diagonal: \[\begin{equation*} P^{-1}AP = \begin{pmatrix} 2&0 \\ 0 & 0 \end{pmatrix} . \end{equation*}\]
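To tie this back to the discussion of powers above, here is a numpy check of this example together with the formula \(A^n = PD^nP^{-1}\); the exponent n = 5 is an arbitrary choice.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])    # columns: the 2-eigenvector and the 0-eigenvector
P_inv = np.linalg.inv(P)

D = P_inv @ A @ P
print(np.round(D))             # [[2. 0.] [0. 0.]]

n = 5
A_n = P @ np.diag(np.diag(D) ** n) @ P_inv             # A^n = P D^n P^(-1)
print(np.allclose(A_n, np.linalg.matrix_power(A, n)))  # True
```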