2.2 Matrix multiplication

We are going to build up the definition of matrix multiplication in several steps. The first is that if \(\mathbf{r}= (r_1,\ldots, r_n)\) is a \(1\times n\) row vector and \(\mathbf{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}\) is an \(n\times 1\) column vector, we define \[ \mathbf{r}\mathbf{c} = r_1c_1 + \cdots + r_n c_n. \] This might remind you of the dot product, if you have seen that before.

For example, if \(\mathbf{r}=(1\; 2)\) and \(\mathbf{c} = \begin{pmatrix} 3 \\ 4\end{pmatrix}\) then \[\mathbf{rc} = 1\times 3 + 2 \times 4 = 11.\]
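If you want to check small examples like this on a computer, here is a minimal sketch in Python (assuming the NumPy library, which is not part of these notes):

```python
import numpy as np

# The row-column product rc = r1*c1 + ... + rn*cn from the definition above.
r = np.array([1, 2])      # the 1x2 row vector r
c = np.array([3, 4])      # the 2x1 column vector c, stored as a flat array
print(np.sum(r * c))      # 1*3 + 2*4 = 11, matching the worked example
```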

Next, suppose A is an \(m\times n\) matrix and \(\mathbf{c}\) is again a height n column vector. Let \(\mathbf{r}_1,\ldots, \mathbf{r}_m\) be the rows of A; these are \(1\times n\) row vectors. We then define \(A\mathbf{c}\) to be the \(m\times 1\) column vector \[ \begin{pmatrix} \mathbf{r}_1 \mathbf{c} \\ \vdots \\ \mathbf{r}_m \mathbf{c} \end{pmatrix}.\] For example, if \(A = \begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}\) and \(\mathbf{c} = \begin{pmatrix}1\\0\\-1\end{pmatrix}\) then \[ A\mathbf{c} =\begin{pmatrix} 1\times 1 + 2 \times 0 + 3 \times (-1) \\ 4 \times 1 + 5 \times 0 + 6 \times (-1)\end{pmatrix} = \begin{pmatrix} -2 \\ -2 \end{pmatrix}. \]
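The same example can be checked in Python, where the `@` operator performs matrix multiplication (again assuming NumPy):

```python
import numpy as np

# A is 2x3 and c has height 3; each entry of A @ c is a row of A times c.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
c = np.array([1, 0, -1])
print(A @ c)              # [-2 -2], as computed above
```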

Finally, the most general definition of matrix multiplication. Suppose A is \(m\times n\) and B is \(n\times p\) (so that the length of a row of A is the same as the height of a column of B). Write \(\mathbf{c}_j\) for the jth column of B. Then we define AB to be the \(m\times p\) matrix whose jth column is \(A\mathbf{c}_j\): \[AB = (A\mathbf{c}_1 \; A\mathbf{c}_2 \; \cdots \; A \mathbf{c}_p).\] This time if \(A = \begin{pmatrix}1&2\\3&4\end{pmatrix}\) and \(B = \begin{pmatrix} 5&6&7 \\ 1&0&-1 \end{pmatrix}\), the columns of B are \(\mathbf{c}_1 = \begin{pmatrix}5\\1\end{pmatrix}\), \(\mathbf{c}_2 = \begin{pmatrix} 6\\0\end{pmatrix}\), and \(\mathbf{c}_3 = \begin{pmatrix} 7\\-1\end{pmatrix}\), and \[\begin{align*}AB &= (A\mathbf{c}_1\; A\mathbf{c}_2 \; A\mathbf{c}_3) \\ &= \begin{pmatrix} 1\times 5 + 2\times 1 & 1\times 6 + 2 \times 0 & 1\times 7 + 2 \times (-1) \\ 3\times 5 + 4 \times 1 &3\times 6 + 4\times 0 & 3 \times 7 + 4 \times (-1) \end{pmatrix} \\ &= \begin{pmatrix} 7 & 6 & 5 \\ 19 & 18 & 17 \end{pmatrix}.\end{align*}\]
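Here is a sketch of the columnwise definition in Python: build AB one column of B at a time and compare with NumPy's built-in product (the variable names are just for illustration):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6, 7],
              [1, 0, -1]])

# The jth column of AB is A times the jth column of B.
cols = [A @ B[:, j] for j in range(B.shape[1])]
print(np.column_stack(cols))   # [[ 7  6  5] [19 18 17]]
print(A @ B)                   # the same matrix
```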

This definition of matrix multiplication is good for hand calculations, and for emphasising that matrix multiplication happens columnwise (that is, A multiplies into B one column of B at a time). Sometimes, especially when proving properties of matrix multiplication, it is more convenient to have a definition with an explicit formula. The definition we have just seen is equivalent to the following:


Definition 2.5 Let \(A=(a_{ij})\) be \(m\times n\) and \(B=(b_{ij})\) be \(n\times p\). The matrix product AB is the \(m\times p\) matrix whose i, j entry is \[\sum_{k=1}^n a_{ik}b_{kj}.\]


Notice that we only define the product AB when the number of columns of A is the same as the number of rows of B.
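Definition 2.5 translates directly into code. Here is a sketch in Python of the formula, checked against NumPy's `@` operator (the function name `matmul` is ours, chosen for illustration):

```python
import numpy as np

def matmul(A, B):
    """The i, j entry of AB is the sum over k of a_ik * b_kj (Definition 2.5)."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6, 7], [1, 0, -1]])
print(matmul(A, B))            # agrees with A @ B
```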

Here are some important properties of matrix multiplication:


Proposition 2.1 Let A and A’ be \(m\times n\), B and B’ be \(n\times p\), and C be \(p\times q\). Let \(\lambda\) be a number.

  • (AB)C=A(BC) (that is, matrix multiplication is associative).
  • (A+A’)B=AB + A’B, A(B+B’)=AB+AB’ (matrix multiplication distributes over addition).
  • \((\lambda A) B = \lambda (AB) = A(\lambda B)\).
  • \(A \mathbf{0}_{n\times p}= \mathbf{0}_{m\times p}\) and \(\mathbf{0}_{p\times m} A = \mathbf{0}_{p\times n}\).
  • \((AB)^T = B^T A^T\).

Proof. Let \(A=(a_{ij}), A'=(a'_{ij}), B=(b_{ij}), B'=(b'_{ij}), C=(c_{ij})\). During this proof we also write \(X_{ij}\) to mean the i, j entry of a matrix X.

  • AB has i, j entry \(\sum_{k=1}^n a_{ik}b_{kj}\), so the i, j entry of (AB)C is \[ \sum_{l=1}^p (AB)_{il} c_{lj} = \sum_{l=1}^p \sum_{k=1}^n a_{ik}b_{kl}c_{lj}.\] On the other hand, the i, j entry of BC is \(\sum_{l=1}^pb_{il}c_{lj}\) so the i, j entry of A(BC) is \[\begin{align*}\sum_{k=1}^n a_{ik}(BC)_{kj} &= \sum_{k=1}^n a_{ik} \sum_{l=1}^p b_{kl}c_{lj}\\ &= \sum_{k=1}^n \sum_{l=1}^p a_{ik}b_{kl}c_{lj}. \end{align*}\] These are the same: it doesn’t matter if we do the k or l summation first, since we just get the same terms in a different order.
  • The i, j entry of (A+A’)B is \(\sum_{k=1}^n (a_{ik} + a'_{ik})b_{kj}\) which equals \(\sum_{k=1}^n a_{ik}b_{kj} + \sum_{k=1}^n a'_{ik}b_{kj}\), but this is the sum of the i, j entry of AB and the i, j entry of A’B, proving the first equality. The second is similar.
  • The i, j entry of \(\lambda A\) is \(\lambda a_{ij}\), so the i, j entry of \((\lambda A)B\) is \[ \sum_{k=1}^n(\lambda a_{ik})b_{kj} = \lambda \sum_{k=1}^n a_{ik}b_{kj}= \lambda (AB)_{ij}\] so \((\lambda A)B\) and \(\lambda (AB)\) have the same i, j entry for any i, j, and are therefore equal. The second equality can be proved similarly.
  • This is clear from the definition of matrix multiplication.
  • The i, j entry of AB is \(\sum_{k=1}^n a_{ik}b_{kj}\), so the i, j entry of \((AB)^T\) is \(\sum_{k=1}^n a_{jk}b_{ki}\). If we write the i, j entry of \(B^T\) as \(\beta_{ij}\) and the i, j entry of \(A^T\) as \(\alpha_{ij}\) then the i, j entry of \(B^T A^T\) is \(\sum_{k=1}^n \beta_{ik}\alpha_{kj}\), but \(\beta_{ik}=b_{ki}\) and \(\alpha_{kj} = a_{jk}\) so this is \(\sum_{k=1}^n a_{jk}b_{ki}\), which is the same as the i, j entry of \((AB)^T\).
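None of this replaces the proof, but a quick numerical spot-check of the identities in Proposition 2.1 can be reassuring (a sketch assuming NumPy; the matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, (2, 3))    # a random 2x3 matrix
B = rng.integers(-5, 5, (3, 4))    # a random 3x4 matrix
C = rng.integers(-5, 5, (4, 2))    # a random 4x2 matrix

print(np.array_equal((A @ B) @ C, A @ (B @ C)))   # associativity
print(np.array_equal((A @ B).T, B.T @ A.T))       # (AB)^T = B^T A^T
```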

These properties show some of the ways in which matrix multiplication is like ordinary multiplication of numbers. There are two important ways in which it is different: in general, \(AB \neq BA\), for example \[\begin{align*} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} & = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix} \\ \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} &= \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix} \end{align*}\] and unlike for multiplying numbers, we can have \(AB=\mathbf{0}\) even when \(A,B\neq \mathbf{0}\): \[ \begin{pmatrix} 0&1\\0&0 \end{pmatrix} \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0&0\\0&0 \end{pmatrix}.\]
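Both failures are easy to reproduce in Python (assuming NumPy, with the same matrices as above):

```python
import numpy as np

# Matrix multiplication is not commutative...
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)    # [[19 22] [43 50]]
print(B @ A)    # [[23 34] [31 46]]

# ...and two non-zero matrices can multiply to the zero matrix.
X = np.array([[0, 1], [0, 0]])
Y = np.array([[0, 2], [0, 0]])
print(X @ Y)    # [[0 0] [0 0]]
```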

There is a matrix which behaves like 1 does under multiplication:


Definition 2.6 The \(n\times n\) identity matrix \(I_n\) is the matrix whose i, j entry is 1 if i=j and 0 otherwise.

For example, \[ I_2= \begin{pmatrix} 1&0\\0&1\end{pmatrix},\;\; I_3 = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix}. \]


Proposition 2.2 Let A be \(m\times n\). Then \(AI_n = A = I_m A\).

Proof. Let \(A=(a_{ij})\), and write \(I_n = (\delta_{ij})\) so that \(\delta_{ij}=0\) if \(i\neq j\) and \(\delta_{ii}=1\) for each i. Using the definition of matrix multiplication, the i, j entry of \(AI_n\) is \[ \sum_{k=1}^n a_{ik}\delta_{kj}. \] Because \(\delta_{kj}\) is zero unless k=j, the only term in this sum which might not be zero is the one when k=j. This term is \(a_{ij}\times 1 = a_{ij}\), so the i, j entry of \(AI_n\) is the same as the i, j entry of A and therefore \(AI_n=A\).

The proof that \(I_m A=A\) is similar.
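In NumPy the identity matrix is produced by `np.eye`, so Proposition 2.2 can be spot-checked like this (a sketch, not part of the proof):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                    # a 2x3 matrix
print(np.array_equal(A @ np.eye(3), A))      # A I_3 = A
print(np.array_equal(np.eye(2) @ A, A))      # I_2 A = A
```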

When we work in \(\mathbb{R}^n\) or \(\mathbb{C}^n\), we use the notation \(\mathbf{e}_i\) to refer to the height n column vector with a 1 in position i and zeroes elsewhere. For example, if \(n = 3\) then

\[\mathbf{e}_1 = \begin{pmatrix} 1\\0\\0\end{pmatrix}, \mathbf{e}_2 = \begin{pmatrix} 0\\1\\0\end{pmatrix}, \mathbf{e}_3 = \begin{pmatrix} 0\\0\\1\end{pmatrix}.\]

The vector \(\mathbf{e}_i\) is called the \(i\)th standard basis vector; you will learn what a basis is in the next section. These vectors have an important property with respect to matrix multiplication:

Lemma 2.1 Let A be an \(m\times n\) matrix. Then for any \(1\leq x \leq n\) the product \(A \mathbf{e}_x\) is equal to the xth column of A.
Proof. Let \(A = (a_{ij})\), so for any \(1\leq i \leq m\) the ith entry of the height m column vector \(A \mathbf{e}_x\) is \(\sum_{k=1}^n a_{ik}e_k\) where \(e_k\) is the kth entry of \(\mathbf{e}_x\). Since \(e_k\) is zero unless k=x, in which case it is 1, the sum equals \(a_{ix}\). It follows \[A\mathbf{e}_x = \begin{pmatrix} a_{1x} \\ \vdots \\ a_{mx}\end{pmatrix}\] which is the xth column of A.
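Lemma 2.1 is also easy to see in code (assuming NumPy; remember that NumPy indexes from 0, so \(\mathbf{e}_2\) has its 1 at index 1):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
e2 = np.array([0, 1, 0])   # the standard basis vector e_2 in R^3
print(A @ e2)              # [2 5], the 2nd column of A
print(A[:, 1])             # the same column, extracted directly
```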

2.2.1 Invertibility

Definition 2.7 An \(n\times n\) matrix A is called invertible if there is an \(n\times n\) matrix B such that \(AB=I_n=BA\).

The next lemma shows that if A is invertible then there is one and only one matrix B such that \(AB=I_n=BA\).


Lemma 2.2 Suppose \(AB = BA = I_n\) and \(AC = CA = I_n\). Then B=C.
Proof. \[\begin{align*} B&= I_nB & \text{as } I_nX=X \text{ for any } X \\ &= (CA)B & \text{as } CA = I_n \\ &= C(AB) & \text{associativity} \\ &=CI_n & \text{as } AB = I_n \\ &= C. \end{align*}\]

When A is invertible we use the notation \(A^{-1}\) for the one and only matrix such that \(AA^{-1}=A^{-1}A=I_n\).

Not every non-zero square matrix is invertible: you can check directly that \(\begin{pmatrix} 0&1\\0&0 \end{pmatrix}\) and \(\begin{pmatrix}1&1\\1&1\end{pmatrix}\) aren’t invertible, for example. More generally, suppose that A and B are any non-zero matrices such that \(AB=\mathbf{0}_{n\times n}\) (we have already seen examples of this). If A were invertible we could multiply this equation on the left by the inverse of A to get \[\begin{align*} A^{-1} AB &= A^{-1} \mathbf{0}_{n\times n} \\ B &= \mathbf{0}_{n\times n} \end{align*}\] which is not the case. It follows that if there is a non-zero matrix B such that \(AB=\mathbf{0}_{n\times n}\) then A isn’t invertible (and a similar argument, multiplying on the right instead of the left, shows neither is B).
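For what it's worth, numerical libraries detect this too: NumPy's `np.linalg.inv` raises an error when asked to invert a matrix that has no inverse (a sketch, assuming NumPy):

```python
import numpy as np

A = np.array([[0, 1],
              [0, 0]])               # one of the non-invertible examples above
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as err:
    print("not invertible:", err)

M = np.array([[1, 2],
              [3, 4]])               # an invertible matrix, for contrast
print(np.linalg.inv(M) @ M)          # I_2, up to floating-point rounding
```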

Here is another useful result that can tell you a matrix is not invertible:

Lemma 2.3 Let A be a matrix with a row of zeroes or a column of zeroes. Then A is not invertible.

Proof. If A has a column of zeroes then, because matrix multiplication works columnwise, the corresponding column of \(BA\) is B times that column of zeroes, which is zero. So \(BA\) has a column of zeroes for any \(B\), and therefore never equals the identity matrix; thus A is not invertible.

The proof for the case when A has a row of zeroes is similar.
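The columnwise argument in the proof can be seen directly (assuming NumPy; the matrices are arbitrary illustrations):

```python
import numpy as np

A = np.array([[1, 0, 2],
              [3, 0, 4]])    # the second column of A is all zeroes
B = np.array([[1, 1],
              [2, 5]])       # any 2x2 matrix
print(B @ A)                 # the second column of BA is still all zeroes
```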

2.2.2 Why define matrix multiplication this way?

The last thing in this section on matrix multiplication is a quick comment on where it comes from.
Write \(\mathbb{R}^p\) for the set of all \(p\times 1\) column vectors with real numbers as entries. If A is an \(m\times n\) matrix (with real number entries, say) then there is a function \[\begin{align*} T_A : \mathbb{R}^n &\to \mathbb{R}^m \\ T_A(\mathbf{v}) &= A\mathbf{v}. \end{align*}\]

Now suppose B is \(n\times p\), so that there is a similar map \(T_B : \mathbb{R}^p \to \mathbb{R}^n\). The composition \(T_A \circ T_B\) makes sense as a map \(\mathbb{R}^p \to \mathbb{R}^m\), and it turns out \[ T_A \circ T_B = T_{AB}.\] The connection with composition of maps is why matrix multiplication is defined the way it is.
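Here is a sketch of that fact in Python (assuming NumPy; `T_A` and `T_B` are our names for the two maps):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])          # 2x2, so T_A : R^2 -> R^2
B = np.array([[5, 6, 7], [1, 0, -1]])   # 2x3, so T_B : R^3 -> R^2

T_A = lambda v: A @ v                   # T_A(v) = Av
T_B = lambda v: B @ v                   # T_B(v) = Bv

v = np.array([1, 1, 1])
print(T_A(T_B(v)))       # apply T_B, then T_A: [18 54]
print((A @ B) @ v)       # T_{AB}(v): the same vector
```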

We use a similar notation for complex column vectors: \(\mathbb{C}^p\) denotes the set of all height p column vectors with complex number entries.