2.2 Matrix multiplication

We are going to build up the definition of matrix multiplication in several steps. The first is that if \(\mathbf{r}= (r_1,\ldots, r_n)\) is a \(1\times n\) row vector and \(\mathbf{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}\) is an \(n\times 1\) column vector, we define \[ \mathbf{r}\mathbf{c} = r_1c_1 + \cdots + r_n c_n. \] This might remind you of the dot product, if you have seen that before.

For example, if \(\mathbf{r}=(1\; 2)\) and \(\mathbf{c} = \begin{pmatrix} 3 \\ 4\end{pmatrix}\) then \[\mathbf{rc} = 1\times 3 + 2 \times 4 = 11.\]
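If you want to check small examples like this on a computer, here is a minimal sketch in Python (assuming the NumPy library, which is not part of these notes):

```python
import numpy as np

# The row-column product rc = r1*c1 + ... + rn*cn from the definition above.
r = np.array([1, 2])      # the 1x2 row vector r
c = np.array([3, 4])      # the 2x1 column vector c, stored as a flat array
print(np.sum(r * c))      # 1*3 + 2*4 = 11, matching the worked example
```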

Next, suppose A is an \(m\times n\) matrix and \(\mathbf{c}\) is again a height n column vector. Let \(\mathbf{r}_1,\ldots, \mathbf{r}_m\) be the rows of A; these are \(1\times n\) row vectors. We then define \(A\mathbf{c}\) to be the \(m\times 1\) column vector \[ \begin{pmatrix} \mathbf{r}_1 \mathbf{c} \\ \vdots \\ \mathbf{r}_m \mathbf{c} \end{pmatrix}.\] For example, if \(A = \begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}\) and \(\mathbf{c} = \begin{pmatrix}1\\0\\-1\end{pmatrix}\) then \[ A\mathbf{c} =\begin{pmatrix} 1\times 1 + 2 \times 0 + 3 \times (-1) \\ 4 \times 1 + 5 \times 0 + 6 \times (-1)\end{pmatrix} = \begin{pmatrix} -2 \\ -2 \end{pmatrix}. \]
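The same example can be checked in Python, where the `@` operator performs matrix multiplication (again assuming NumPy):

```python
import numpy as np

# A is 2x3 and c has height 3; each entry of A @ c is a row of A times c.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
c = np.array([1, 0, -1])
print(A @ c)              # [-2 -2], as computed above
```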

Finally, the most general definition of matrix multiplication. Suppose A is \(m\times n\) and B is \(n\times p\) (so that the length of a row of A is the same as the height of a column of B). Write \(\mathbf{c}_j\) for the jth column of B. Then we define AB to be the \(m\times p\) matrix whose jth column is \(A\mathbf{c}_j\): \[AB = (A\mathbf{c}_1 \; A\mathbf{c}_2 \; \cdots \; A \mathbf{c}_p).\] This time if \(A = \begin{pmatrix}1&2\\3&4\end{pmatrix}\) and \(B = \begin{pmatrix} 5&6&7 \\ 1&0&-1 \end{pmatrix}\), the columns of B are \(\mathbf{c}_1 = \begin{pmatrix}5\\1\end{pmatrix}\), \(\mathbf{c}_2 = \begin{pmatrix} 6\\0\end{pmatrix}\), and \(\mathbf{c}_3 = \begin{pmatrix} 7\\-1\end{pmatrix}\), and \[\begin{align*}AB &= (A\mathbf{c}_1\; A\mathbf{c}_2 \; A\mathbf{c}_3) \\ &= \begin{pmatrix} 1\times 5 + 2\times 1 & 1\times 6 + 2 \times 0 & 1\times 7 + 2 \times (-1) \\ 3\times 5 + 4 \times 1 &3\times 6 + 4\times 0 & 3 \times 7 + 4 \times (-1) \end{pmatrix} \\ &= \begin{pmatrix} 7 & 6 & 5 \\ 19 & 18 & 17 \end{pmatrix}.\end{align*}\]
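Here is a sketch of the columnwise definition in Python: build AB one column of B at a time and compare with NumPy's built-in product (the variable names are just for illustration):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6, 7],
              [1, 0, -1]])

# The jth column of AB is A times the jth column of B.
cols = [A @ B[:, j] for j in range(B.shape[1])]
print(np.column_stack(cols))   # [[ 7  6  5] [19 18 17]]
print(A @ B)                   # the same matrix
```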

This definition of matrix multiplication is good for hand calculations, and for emphasising that matrix multiplication happens columnwise (that is, A multiplies into B one column of B at a time). Sometimes, especially when proving properties of matrix multiplication, it is more convenient to have a definition with an explicit formula. The definition we have just seen is equivalent to the following:


Definition 2.5 Let \(A=(a_{ij})\) be \(m\times n\) and \(B=(b_{ij})\) be \(n\times p\). The matrix product AB is the \(m\times p\) matrix whose i, j entry is \[\sum_{k=1}^n a_{ik}b_{kj}.\]


Notice that we only define the product AB when the number of columns of A is the same as the number of rows of B.
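Definition 2.5 translates directly into code. Here is a sketch in Python of the formula, checked against NumPy's `@` operator (the function name `matmul` is ours, chosen for illustration):

```python
import numpy as np

def matmul(A, B):
    """The i, j entry of AB is the sum over k of a_ik * b_kj (Definition 2.5)."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6, 7], [1, 0, -1]])
print(matmul(A, B))            # agrees with A @ B
```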

Here are some important properties of matrix multiplication:


Proposition 2.1 Let A and A’ be \(m\times n\), B and B’ be \(n\times p\), and C be \(p\times q\). Let \(\lambda\) be a number.

  • (AB)C=A(BC) (that is, matrix multiplication is associative).
  • (A+A’)B=AB + A’B, A(B+B’)=AB+AB’ (matrix multiplication distributes over addition).
  • \((\lambda A) B = \lambda (AB) = A(\lambda B)\).
  • \(A \mathbf{0}_{n\times p}= \mathbf{0}_{m\times p}\) and \(\mathbf{0}_{p\times m} A = \mathbf{0}_{p\times n}\).
  • \((AB)^T = B^T A^T\).

Proof. Let \(A=(a_{ij}), A'=(a'_{ij}), B=(b_{ij}), B'=(b'_{ij}), C=(c_{ij})\). During this proof we also write \(X_{ij}\) to mean the i, j entry of a matrix X.

  • AB has i, j entry \(\sum_{k=1}^n a_{ik}b_{kj}\), so the i, j entry of (AB)C is \[ \sum_{l=1}^p (AB)_{il} c_{lj} = \sum_{l=1}^p \sum_{k=1}^n a_{ik}b_{kl}c_{lj}.\] On the other hand, the i, j entry of BC is \(\sum_{l=1}^pb_{il}c_{lj}\) so the i, j entry of A(BC) is \[\begin{align*}\sum_{k=1}^n a_{ik}(BC)_{kj} &= \sum_{k=1}^n a_{ik} \sum_{l=1}^p b_{kl}c_{lj}\\ &= \sum_{k=1}^n \sum_{l=1}^p a_{ik}b_{kl}c_{lj}. \end{align*}\] These are the same: it doesn’t matter if we do the k or l summation first, since we just get the same terms in a different order.
  • The i, j entry of (A+A’)B is \(\sum_{k=1}^n (a_{ik} + a'_{ik})b_{kj}\) which equals \(\sum_{k=1}^n a_{ik}b_{kj} + \sum_{k=1}^n a'_{ik}b_{kj}\), but this is the sum of the i, j entry of AB and the i, j entry of A’B, proving the first equality. The second is similar.
  • The i, j entry of \(\lambda A\) is \(\lambda a_{ij}\), so the i, j entry of \((\lambda A)B\) is \[ \sum_{k=1}^n(\lambda a_{ik})b_{kj} = \lambda \sum_{k=1}^n a_{ik}b_{kj}= \lambda (AB)_{ij}\] so \((\lambda A)B\) and \(\lambda (AB)\) have the same i, j entry for any i, j, and are therefore equal. The second equality can be proved similarly.
  • This is clear from the definition of matrix multiplication.
  • The i, j entry of AB is \(\sum_{k=1}^n a_{ik}b_{kj}\), so the i, j entry of \((AB)^T\) is \(\sum_{k=1}^n a_{jk}b_{ki}\). If we write the i, j entry of \(B^T\) as \(\beta_{ij}\) and the i, j entry of \(A^T\) as \(\alpha_{ij}\) then the i, j entry of \(B^T A^T\) is \(\sum_{k=1}^n \beta_{ik}\alpha_{kj}\), but \(\beta_{ik}=b_{ki}\) and \(\alpha_{kj} = a_{jk}\) so this is \(\sum_{k=1}^n a_{jk}b_{ki}\), which is the same as the i, j entry of \((AB)^T\).
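None of this replaces the proof, but a quick numerical spot-check of the identities in Proposition 2.1 can be reassuring (a sketch assuming NumPy; the matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, (2, 3))    # a random 2x3 matrix
B = rng.integers(-5, 5, (3, 4))    # a random 3x4 matrix
C = rng.integers(-5, 5, (4, 2))    # a random 4x2 matrix

print(np.array_equal((A @ B) @ C, A @ (B @ C)))   # associativity
print(np.array_equal((A @ B).T, B.T @ A.T))       # (AB)^T = B^T A^T
```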

These properties show some of the ways in which matrix multiplication is like ordinary multiplication of numbers. There are two important ways in which it is different: in general, \(AB \neq BA\), for example \[\begin{align*} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} & = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix} \\ \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} &= \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix} \end{align*}\] and unlike for multiplying numbers, we can have \(AB=\mathbf{0}\) even when \(A,B\neq \mathbf{0}\): \[ \begin{pmatrix} 0&1\\0&0 \end{pmatrix} \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0&0\\0&0 \end{pmatrix}.\]
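Both failures are easy to reproduce in Python (assuming NumPy, with the same matrices as above):

```python
import numpy as np

# Matrix multiplication is not commutative...
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)    # [[19 22] [43 50]]
print(B @ A)    # [[23 34] [31 46]]

# ...and two non-zero matrices can multiply to the zero matrix.
X = np.array([[0, 1], [0, 0]])
Y = np.array([[0, 2], [0, 0]])
print(X @ Y)    # [[0 0] [0 0]]
```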

There is a matrix which behaves like 1 does under multiplication:


Definition 2.6 The \(n\times n\) identity matrix \(I_n\) is the matrix whose i, j entry is 1 if i=j and 0 otherwise.

For example, \[ I_2= \begin{pmatrix} 1&0\\0&1\end{pmatrix},\;\; I_3 = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix}. \]


Proposition 2.2 Let A be \(m\times n\). Then \(AI_n = A = I_m A\).

Proof. Let \(A=(a_{ij})\), and write \(I_n = (\delta_{ij})\) so that \(\delta_{ij}=0\) if \(i\neq j\) and \(\delta_{ii}=1\) for each i. Using the definition of matrix multiplication, the i, j entry of \(AI_n\) is \[ \sum_{k=1}^n a_{ik}\delta_{kj}. \] Because \(\delta_{kj}\) is zero unless k=j, the only term in this sum which might not be zero is the one when k=j. This term is \(a_{ij}\times 1 = a_{ij}\), so the i, j entry of \(AI_n\) is the same as the i, j entry of A and therefore \(AI_n=A\).

The proof that \(I_m A=A\) is similar.
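In NumPy the identity matrix is produced by `np.eye`, so Proposition 2.2 can be spot-checked like this (a sketch, not part of the proof):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                    # a 2x3 matrix
print(np.array_equal(A @ np.eye(3), A))      # A I_3 = A
print(np.array_equal(np.eye(2) @ A, A))      # I_2 A = A
```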

When we work in \(\mathbb{R}^n\) or \(\mathbb{C}^n\), we use the notation \(\mathbf{e}_i\) to refer to the height n column vector with a 1 in position i and zeroes elsewhere. For example, if \(n = 3\) then

\[\mathbf{e}_1 = \begin{pmatrix} 1\\0\\0\end{pmatrix}, \mathbf{e}_2 = \begin{pmatrix} 0\\1\\0\end{pmatrix}, \mathbf{e}_3 = \begin{pmatrix} 0\\0\\1\end{pmatrix}.\]

The vector \(\mathbf{e}_i\) is called the \(i\)th standard basis vector; you will learn what a basis is in the next section. These vectors have an important property with respect to matrix multiplication:

Lemma 2.1 Let A be an \(m\times n\) matrix. Then for any \(1\leq x \leq n\) the product \(A \mathbf{e}_x\) is equal to the xth column of A.
Proof. Let \(A = (a_{ij})\), so for any \(1\leq i \leq m\) the ith entry of the height m column vector \(A \mathbf{e}_x\) is \(\sum_{k=1}^n a_{ik}e_k\) where \(e_k\) is the kth entry of \(\mathbf{e}_x\). Since \(e_k\) is zero unless k=x, in which case it is 1, the sum equals \(a_{ix}\). It follows \[A\mathbf{e}_x = \begin{pmatrix} a_{1x} \\ \vdots \\ a_{mx}\end{pmatrix}\] which is the xth column of A.
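Lemma 2.1 is also easy to see in code (assuming NumPy; remember that NumPy indexes from 0, so \(\mathbf{e}_2\) has its 1 at index 1):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
e2 = np.array([0, 1, 0])   # the standard basis vector e_2 in R^3
print(A @ e2)              # [2 5], the 2nd column of A
print(A[:, 1])             # the same column, extracted directly
```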

2.2.1 Invertibility

Definition 2.7 An \(n\times n\) matrix A is called invertible if there is an \(n\times n\) matrix B such that \(AB=I_n=BA\).

The next lemma shows that if A is invertible then there is one and only one matrix B such that \(AB=I_n=BA\).


Lemma 2.2 Suppose \(AB = BA = I_n\) and \(AC = CA = I_n\). Then B=C.
Proof. \[\begin{align*} B&= I_nB & \text{as } I_nX=X \text{ for any } X \\ &= (CA)B & \text{as } CA = I_n \\ &= C(AB) & \text{associativity} \\ &=CI_n & \text{as } AB = I_n \\ &= C. \end{align*}\]

When A is invertible we use the notation \(A^{-1}\) for the one and only matrix such that \(AA^{-1}=A^{-1}A=I_n\).

Not every non-zero square matrix is invertible: you can check directly that \(\begin{pmatrix} 0&1\\0&0 \end{pmatrix}\) and \(\begin{pmatrix}1&1\\1&1\end{pmatrix}\) aren’t invertible, for example. More generally, suppose that A and B are any non-zero matrices such that \(AB=\mathbf{0}_{n\times n}\) (we have already seen examples of this). If A were invertible we could multiply this equation on the left by the inverse of A to get \[\begin{align*} A^{-1} AB &= A^{-1} \mathbf{0}_{n\times n} \\ B &= \mathbf{0}_{n\times n} \end{align*}\] which is not the case. It follows that if there is a non-zero matrix B such that \(AB=\mathbf{0}_{n\times n}\) then A isn’t invertible (and a similar argument, multiplying on the right instead of the left, shows neither is B).
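For what it's worth, numerical libraries detect this too: NumPy's `np.linalg.inv` raises an error when asked to invert a matrix that has no inverse (a sketch, assuming NumPy):

```python
import numpy as np

A = np.array([[0, 1],
              [0, 0]])               # one of the non-invertible examples above
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as err:
    print("not invertible:", err)

M = np.array([[1, 2],
              [3, 4]])               # an invertible matrix, for contrast
print(np.linalg.inv(M) @ M)          # I_2, up to floating-point rounding
```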

Here is another useful result that can tell you a matrix is not invertible:

Lemma 2.3 Let A be a matrix with a row of zeroes or a column of zeroes. Then A is not invertible.

Proof. If A has a column of zeroes then, because matrix multiplication works columnwise, the corresponding column of \(BA\) is B times that column of zeroes, which is zero. So \(BA\) has a column of zeroes for any \(B\), and therefore never equals the identity matrix; thus A is not invertible.

The proof for the case when A has a row of zeroes is similar.
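The columnwise argument in the proof can be seen directly (assuming NumPy; the matrices are arbitrary illustrations):

```python
import numpy as np

A = np.array([[1, 0, 2],
              [3, 0, 4]])    # the second column of A is all zeroes
B = np.array([[1, 1],
              [2, 5]])       # any 2x2 matrix
print(B @ A)                 # the second column of BA is still all zeroes
```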

2.2.2 Why define matrix multiplication this way?

The last thing in this section on matrix multiplication is a quick comment on where it comes from.
Write \(\mathbb{R}^p\) for the set of all \(p\times 1\) column vectors with real numbers as entries. If A is an \(m\times n\) matrix (with real number entries, say) then there is a function \[\begin{align*} T_A : \mathbb{R}^n &\to \mathbb{R}^m \\ T_A(\mathbf{v}) &= A\mathbf{v}. \end{align*}\]

Now suppose B is \(n\times p\), so that there is a similar map \(T_B : \mathbb{R}^p \to \mathbb{R}^n\). The composition \(T_A \circ T_B\) makes sense as a map \(\mathbb{R}^p \to \mathbb{R}^m\), and it turns out \[ T_A \circ T_B = T_{AB}.\] The connection with composition of maps is why matrix multiplication is defined the way it is.
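Here is a sketch of that fact in Python (assuming NumPy; `T_A` and `T_B` are our names for the two maps):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])          # 2x2, so T_A : R^2 -> R^2
B = np.array([[5, 6, 7], [1, 0, -1]])   # 2x3, so T_B : R^3 -> R^2

T_A = lambda v: A @ v                   # T_A(v) = Av
T_B = lambda v: B @ v                   # T_B(v) = Bv

v = np.array([1, 1, 1])
print(T_A(T_B(v)))       # apply T_B, then T_A: [18 54]
print((A @ B) @ v)       # T_{AB}(v): the same vector
```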

We use a similar notation for complex column vectors: \(\mathbb{C}^p\) denotes the set of all height p column vectors with complex number entries.