In the previous parts of this series [1], [2], and [3], we have observed:
- interpretation of multiplication of a matrix by a vector,
- the physical meaning of matrix-matrix multiplication,
- the behavior of several special-type matrices, and
- visualization of matrix transpose.
In this story, I want to share my perspective on what lies beneath matrix inversion, why different formulas related to inversion are the way they actually are, and finally, why calculating the inverse can be done much more easily for matrices of several special types.
Here are the definitions that I use throughout the stories of this series:
- Matrices are denoted with uppercase (like ‘A‘, ‘B‘), while vectors and scalars are denoted with lowercase (like ‘x‘, ‘y‘ or ‘m‘, ‘n‘).
- |x| – is the length of vector ‘x‘,
- Aᵀ – is the transpose of matrix ‘A‘,
- B⁻¹ – is the inverse of matrix ‘B‘.
Definition of the inverse matrix
From part 1 of this series – “matrix-vector multiplication” [1], we remember that a certain matrix “A“, when multiplied by a vector ‘x‘ as “y = Ax“, can be treated as a transformation of the input vector ‘x‘ into the output vector ‘y‘. If so, then the inverse matrix A⁻¹ should do the reverse transformation – it should transform vector ‘y‘ back to ‘x‘:
\begin{equation*}
x = A^{-1}y
\end{equation*}
Substituting “y = Ax” into this equation gives us:
\begin{equation*}
x = A^{-1}y = A^{-1}(Ax) = (A^{-1}A)x
\end{equation*}
which means that the product of the original matrix and its inverse – A⁻¹A – should be a matrix which does no transformation to any input vector ‘x‘. In other words:
\begin{equation*}
(A^{-1}A) = E
\end{equation*}
where “E” is the identity matrix.
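As a quick numerical sketch of this definition (the matrix here is an arbitrary invertible example, not one from the text):

```python
import numpy as np

# An arbitrary invertible 2x2 matrix, chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_inv = np.linalg.inv(A)

# Transform x into y, then undo the transformation with the inverse
x = np.array([1.0, -2.0])
y = A @ x
x_restored = A_inv @ y

print(np.allclose(x_restored, x))         # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True: A^{-1}A = E
```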

The first question that can arise here is: is it always possible to reverse the influence of a certain matrix “A“? The answer is – it is possible only if no two different input vectors x₁ and x₂ are transformed through “A” into the same output vector ‘y‘. In other words, the inverse matrix A⁻¹ exists only if for any output vector ‘y‘ there exists exactly one input vector ‘x‘ which is transformed through “A” into it:
\begin{equation*}
y = Ax
\end{equation*}


In this series, I don’t want to dive too deep into the formal part of definitions and proofs. Instead, I want to observe several cases where it is actually possible to invert the given matrix “A“, and we will see how the inverse matrix A⁻¹ is calculated for each of those cases.
Inverting chains of matrices
An important formula related to matrix inverse is:
\begin{equation*}
(AB)^{-1} = B^{-1}A^{-1}
\end{equation*}
which states that the inverse of a product of matrices is equal to the product of the inverse matrices, but in the reverse order. Let’s understand why the order of the matrices is reversed.
What is the physical meaning of the inverse (AB)⁻¹? It should be a matrix that turns back the influence of the matrix (AB). So if:
\begin{equation*}
y = (AB)x,
\end{equation*}
then, we should have:
\begin{equation*}
x = (AB)^{-1}y.
\end{equation*}
Now, the transformation “y = (AB)x” goes in two steps: first, we compute:
\begin{equation*}
Bx = t,
\end{equation*}
which gives an intermediate vector ‘t‘, and then that ‘t‘ is multiplied by “A“:
\begin{equation*}
y = At = A(Bx).
\end{equation*}

So the matrix “A” influenced the vector after it was already influenced by “B“. In this case, to turn back such a sequential influence, we should first turn back the influence of “A“, by multiplying ‘y‘ by A⁻¹, which will give us:
\begin{equation*}
A^{-1}y = A^{-1}(ABx) = (A^{-1}A)Bx = EBx = Bx = t,
\end{equation*}
… the intermediate vector ‘t‘, produced a bit above.

Note, the vector ‘t’ participates here twice.
Then, after getting back the intermediate vector ‘t‘, to restore ‘x‘ we should also reverse the influence of matrix “B“. That is done by multiplying ‘t‘ by B⁻¹:
\begin{equation*}
B^{-1}t = B^{-1}(Bx) = (B^{-1}B)x = Ex = x,
\end{equation*}
or writing it all in an expanded way:
\begin{equation*}
x = B^{-1}(A^{-1}A)Bx = (B^{-1}A^{-1})(AB)x,
\end{equation*}
which explicitly shows that to turn back the influence of the matrix (AB) we should use (B⁻¹A⁻¹).

Note, both vectors ‘x’ and ‘t’ participate here twice.
This is why in the inverse of a product of matrices, their order is reversed:
\begin{equation*}
(AB)^{-1} = B^{-1}A^{-1}
\end{equation*}
The same principle is applied when we have more matrices in a chain, like:
\begin{equation*}
(ABC)^{-1} = C^{-1}B^{-1}A^{-1}
\end{equation*}
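This reversal is easy to verify numerically. A small sketch (the random 4×4 matrices here are my own stand-ins, almost surely invertible):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

left = np.linalg.inv(A @ B)
right = np.linalg.inv(B) @ np.linalg.inv(A)   # reversed order
wrong = np.linalg.inv(A) @ np.linalg.inv(B)   # same order: this is (BA)^{-1}, not (AB)^{-1}

print(np.allclose(left, right))  # True
print(np.allclose(left, wrong))  # False for generic A and B
```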
Inversion of several special matrices
Now, with this perception of what lies beneath matrix inversion, let’s see how matrices of several special types are inverted.
Inverse of cyclic-shift matrix
A cyclic-shift matrix is a matrix “V“ which, when multiplied by an input vector ‘x‘, produces an output vector “y = Vx“ where all values of ‘x‘ are cyclically shifted by some ‘k‘ positions. To achieve that, the cyclic-shift matrix “V” has two lines of 1s running parallel to its main diagonal, while all its other cells are 0s.
\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5
\end{pmatrix}
= y = Vx =
\begin{bmatrix}
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0
\end{bmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
=
\begin{pmatrix}
x_3 \\ x_4 \\ x_5 \\ x_1 \\ x_2
\end{pmatrix}
\end{equation*}

Now, how should we undo the transformation of the cyclic-shift matrix “V“? Obviously, we should apply another cyclic-shift matrix V⁻¹, which cyclically shifts all the values of ‘y‘ downwards by ‘k‘ positions (remember, “V” was shifting all the values of ‘x‘ upwards).
\begin{equation*}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
= x = V^{-1}Vx =
\begin{bmatrix}
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0
\end{bmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
= V^{-1}y
\end{equation*}

This is why the inverse of a cyclic-shift matrix is another cyclic-shift matrix:
\begin{equation*}
V_1^{-1} = V_2
\end{equation*}
More than that, we can note that the X-diagram of V⁻¹ is actually the horizontal flip of the X-diagram of “V“. And from the previous part of this series – “transpose of a matrix” [3], we remember that the horizontal flip of an X-diagram corresponds to the transpose of that matrix. This is why the inverse of a cyclic-shift matrix is equal to its transpose:
\begin{equation*}
V^{-1} = V^T
\end{equation*}
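A short sketch of this fact, building the 5×5 shift-by-2 matrix from the example above with NumPy:

```python
import numpy as np

# 5x5 cyclic-shift matrix that shifts a vector's values upwards by k = 2
k = 2
V = np.roll(np.eye(5), k, axis=1)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = V @ x
print(y)  # [3. 4. 5. 1. 2.]

# The transpose shifts the values back down, so it is the inverse
print(np.allclose(np.linalg.inv(V), V.T))  # True
print(np.allclose(V.T @ y, x))             # True
```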
Inverse of an exchange matrix
An exchange matrix, often denoted by “J“, is a matrix which, when multiplied by an input vector ‘x‘, produces an output vector ‘y‘ containing all the values of ‘x‘ but in reverse order. To achieve that, “J” has 1s on its anti-diagonal, while all its other cells are 0s.
\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5
\end{pmatrix}
= y = Jx =
\begin{bmatrix}
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
=
\begin{pmatrix}
x_5 \\ x_4 \\ x_3 \\ x_2 \\ x_1
\end{pmatrix}
\end{equation*}

Obviously, to undo this type of transformation, we should apply one more exchange matrix.
\begin{equation*}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
= x = J^{-1}Jx =
\begin{bmatrix}
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
= J^{-1}y
\end{equation*}

This is why the inverse of an exchange matrix is the exchange matrix itself:
\begin{equation*}
J^{-1} = J
\end{equation*}
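In code, the exchange matrix can be built by flipping the identity matrix left to right; a minimal check:

```python
import numpy as np

# 5x5 exchange matrix: ones on the anti-diagonal
J = np.fliplr(np.eye(5))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = J @ x
print(y)  # [5. 4. 3. 2. 1.]

# Reversing twice restores the original order: J is its own inverse
print(np.allclose(J @ J, np.eye(5)))  # True
```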
Inverse of a permutation matrix
A permutation matrix is a matrix “P” which, when multiplied by an input vector ‘x‘, rearranges its values into a different order. To achieve that, an n×n permutation matrix “P” has ‘n‘ 1s, arranged in such a way that no two of them appear in the same row or the same column. All other cells of “P” are 0s.
\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5
\end{pmatrix}
= y = Px =
\begin{bmatrix}
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0
\end{bmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
=
\begin{pmatrix}
x_3 \\ x_1 \\ x_4 \\ x_5 \\ x_2
\end{pmatrix}
\end{equation*}

Now, what type of matrix should the inverse of a permutation matrix be? In other words, how do we undo the transformation of a permutation matrix “P“? Obviously, we need to do another rearrangement, which acts in the reverse order. So, for example, if the input value x₃ was moved by “P” to the output value y₁, then the inverse permutation matrix P⁻¹ should move the input value y₁ back to the output value x₃. This means that when drawing the X-diagrams of the permutation matrices P⁻¹ and “P“, one will be the reflection of the other.

Similarly to the case of an exchange matrix, in the case of a permutation matrix we can visually note that the X-diagrams of “P” and P⁻¹ differ only by a horizontal flip. That is why the inverse of any permutation matrix “P” is equal to its transpose:
\begin{equation*}
P^{-1} = P^T
\end{equation*}
\begin{equation*}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
= x = P^{-1}Px =
\begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0
\end{bmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
= P^{-1}y
\end{equation*}
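A quick numerical check of this property, using the permutation matrix from the example above:

```python
import numpy as np

# The permutation matrix from the example: y = (x3, x1, x4, x5, x2)
P = np.array([[0, 0, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [0, 1, 0, 0, 0]], dtype=float)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = P @ x
print(y)  # [3. 1. 4. 5. 2.]

# The transpose moves every value back to its original position
print(np.allclose(P.T @ y, x))             # True
print(np.allclose(np.linalg.inv(P), P.T))  # True
```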
Inverse of a rotation matrix
A rotation matrix on the 2D plane is a matrix “R“ which, when multiplied by a vector x = (x₁, x₂), rotates the point (x₁, x₂) counter-clockwise by a certain angle θ around the origin. Its formula is:
\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2
\end{pmatrix}
= y = Rx =
\begin{bmatrix}
\cos(\theta) & -\sin(\theta) \\
\sin(\theta) & \phantom{+}\cos(\theta)
\end{bmatrix}
\begin{pmatrix}
x_1 \\ x_2
\end{pmatrix}
\end{equation*}

Now, what should the inverse of a rotation matrix be? How do we undo the rotation produced by a matrix “R“? Obviously, that should be another rotation matrix, this time with the angle -θ (or 360°-θ):
\begin{equation*}
R^{-1} =
\begin{bmatrix}
\cos(-\theta) & -\sin(-\theta) \\
\sin(-\theta) & \phantom{+}\cos(-\theta)
\end{bmatrix}
=
\begin{bmatrix}
\phantom{+}\cos(\theta) & \sin(\theta) \\
-\sin(\theta) & \cos(\theta)
\end{bmatrix}
= R^T
\end{equation*}
This is why the inverse of a rotation matrix is another rotation matrix. We also see that the inverse R⁻¹ is equal to the transpose of the original matrix “R“.
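This identity is easy to check numerically; a sketch with an arbitrary angle of 60°:

```python
import numpy as np

def rotation(theta):
    """2D counter-clockwise rotation matrix by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R = rotation(np.pi / 3)  # rotate by 60 degrees

# Rotating back by -theta is the same as multiplying by R^T
print(np.allclose(np.linalg.inv(R), rotation(-np.pi / 3)))  # True
print(np.allclose(np.linalg.inv(R), R.T))                   # True
```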
Inverse of a triangular matrix
An upper-triangular matrix is a square matrix that has zeros below its diagonal. Because of that, in its X-diagram, there are no arrows directed downwards:

The horizontal arrows correspond to cells of the diagonal, while the arrows that are directed upwards correspond to the cells above the diagonal.
A lower-triangular matrix is defined similarly – it has zeros above its main diagonal. In this article we will concentrate only on upper-triangular matrices, as for lower-triangular ones inversion is performed in an analogous way.
For simplicity, let’s at first address inverting a 2×2-sized upper-triangular matrix ‘A‘.

Once ‘A‘ is multiplied by an input vector ‘x‘, the result vector “y = Ax” has the following form:
\begin{equation*}
y =
\begin{pmatrix}
y_1 \\ y_2
\end{pmatrix}
=
\begin{bmatrix}
a_{1,1} & a_{1,2} \\
0 & a_{2,2}
\end{bmatrix}
\begin{pmatrix}
x_1 \\ x_2
\end{pmatrix}
=
\begin{pmatrix}
a_{1,1}x_1 + a_{1,2}x_2 \\
a_{2,2}x_2
\end{pmatrix}
\end{equation*}
Now, when calculating the inverse matrix A⁻¹, we want it to act in the reverse order:

How should we restore (x₁, x₂) from (y₁, y₂)? The first and simplest step is to restore x₂, using only y₂, because y₂ was originally affected only by x₂. We don’t need the value of y₁ for that:

Next, how should we restore x₁? This time, we can’t use only y₁, because the value “y₁ = a₁,₁x₁ + a₁,₂x₂” is a mixture of x₁ and x₂. But we can restore x₁ by using both y₁ and y₂ properly. Here, y₂ helps to filter out the influence of x₂, so that the pure value of x₁ can be restored:

We see now that the inverse A⁻¹ of the upper-triangular matrix “A” is also an upper-triangular matrix.
What about triangular matrices of larger sizes? Let’s now take a 3×3-sized matrix and find its inverse analytically.

Values of the output vector ‘y‘ are obtained now from ‘x‘ in the following way:
\begin{equation*}
y =
\begin{pmatrix}
y_1 \\ y_2 \\ y_3
\end{pmatrix}
= Ax =
\begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} \\
0 & a_{2,2} & a_{2,3} \\
0 & 0 & a_{3,3}
\end{bmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3
\end{pmatrix}
=
\begin{pmatrix}
a_{1,1}x_1 + a_{1,2}x_2 + a_{1,3}x_3 \\
a_{2,2}x_2 + a_{2,3}x_3 \\
a_{3,3}x_3
\end{pmatrix}
\end{equation*}
As we are interested in building the inverse matrix A⁻¹, our target is to find (x₁, x₂, x₃), having the values of (y₁, y₂, y₃):
\begin{equation*}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3
\end{pmatrix}
= A^{-1}y =
\begin{bmatrix}
\text{?} & \text{?} & \text{?} \\
\text{?} & \text{?} & \text{?} \\
\text{?} & \text{?} & \text{?}
\end{bmatrix}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3
\end{pmatrix}
\end{equation*}
In other words, we must solve the system of linear equations mentioned above.
Doing that will first restore the value of x₃ as:
\begin{equation*}
y_3 = a_{3,3}x_3, \hspace{1cm} x_3 = \frac{1}{a_{3,3}} y_3
\end{equation*}
which will clarify the cells of the last row of A⁻¹:
\begin{equation*}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3
\end{pmatrix}
= A^{-1}y =
\begin{bmatrix}
\text{?} & \text{?} & \text{?} \\
\text{?} & \text{?} & \text{?} \\
0 & 0 & \frac{1}{a_{3,3}}
\end{bmatrix}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3
\end{pmatrix}
\end{equation*}
Having x₃ figured out, we can bring all its occurrences to the left side of the system:
\begin{equation*}
\begin{pmatrix}
y_1 - a_{1,3}x_3 \\
y_2 - a_{2,3}x_3 \\
y_3 - a_{3,3}x_3
\end{pmatrix}
=
\begin{pmatrix}
a_{1,1}x_1 + a_{1,2}x_2 \\
a_{2,2}x_2 \\
0
\end{pmatrix}
\end{equation*}
which will allow us to calculate x₂ as:
\begin{equation*}
y_2 - a_{2,3}x_3 = a_{2,2}x_2, \hspace{1cm}
x_2 = \frac{y_2 - a_{2,3}x_3}{a_{2,2}} = \frac{y_2 - (a_{2,3}/a_{3,3})y_3}{a_{2,2}}
\end{equation*}
This already clarifies the cells of the second row of A⁻¹:
\begin{equation*}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3
\end{pmatrix}
= A^{-1}y =
\begin{bmatrix}
\text{?} & \text{?} & \text{?} \\[0.2cm]
0 & \frac{1}{a_{2,2}} & -\frac{a_{2,3}}{a_{2,2}a_{3,3}} \\[0.2cm]
0 & 0 & \frac{1}{a_{3,3}}
\end{bmatrix}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3
\end{pmatrix}
\end{equation*}
Finally, having the values of x₃ and x₂ figured out, we can do the same trick, now moving x₂ to the left side of the system:
\begin{equation*}
\begin{pmatrix}
y_1 - a_{1,3}x_3 - a_{1,2}x_2 \\
y_2 - a_{2,3}x_3 - a_{2,2}x_2 \\
y_3 - a_{3,3}x_3
\end{pmatrix}
=
\begin{pmatrix}
a_{1,1}x_1 \\
0 \\
0
\end{pmatrix}
\end{equation*}
from which x₁ will be derived as:
\begin{equation*}
\begin{aligned}
& y_1 - a_{1,3}x_3 - a_{1,2}x_2 = a_{1,1}x_1, \\
& x_1
= \frac{y_1 - a_{1,3}x_3 - a_{1,2}x_2}{a_{1,1}}
= \frac{y_1 - (a_{1,3}/a_{3,3})y_3 - a_{1,2}\frac{y_2 - (a_{2,3}/a_{3,3})y_3}{a_{2,2}}}{a_{1,1}}
\end{aligned}
\end{equation*}
so the first row of matrix A⁻¹ will also be clarified:
\begin{equation*}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3
\end{pmatrix}
= A^{-1}y =
\begin{bmatrix}
\frac{1}{a_{1,1}} & -\frac{a_{1,2}}{a_{1,1}a_{2,2}} & \frac{a_{1,2}a_{2,3} - a_{1,3}a_{2,2}}{a_{1,1}a_{2,2}a_{3,3}} \\[0.2cm]
0 & \frac{1}{a_{2,2}} & -\frac{a_{2,3}}{a_{2,2}a_{3,3}} \\[0.2cm]
0 & 0 & \frac{1}{a_{3,3}}
\end{bmatrix}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3
\end{pmatrix}
\end{equation*}
After deriving A⁻¹ analytically, we can see that it is also an upper-triangular matrix.
Paying attention to the sequence of actions that we used here to calculate A⁻¹, we can now say for sure that the inverse of any upper-triangular matrix ‘A‘ is also an upper-triangular matrix:

An analogous judgment will show that the inverse of a lower-triangular matrix is another lower-triangular matrix.
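The back-substitution steps described above can be sketched in code. This is a minimal illustration, not an optimized routine: each column of the inverse is found by solving Ax = eⱼ from the bottom row upwards, and the test matrix is an arbitrary example of mine:

```python
import numpy as np

def invert_upper_triangular(A):
    """Invert an upper-triangular matrix by back substitution:
    column j of the inverse is the solution x of A x = e_j."""
    n = A.shape[0]
    inv = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):  # bottom row first, as in the derivation
            x[i] = (e[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        inv[:, j] = x
    return inv

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])
A_inv = invert_upper_triangular(A)

print(np.allclose(A_inv @ A, np.eye(3)))     # True
print(np.allclose(np.tril(A_inv, -1), 0.0))  # True: the inverse stays upper-triangular
```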
A numerical example of inverting a chain of matrices
Let’s have another look at why the order of the matrices is reversed during the inversion of a chain. Recalling the formula:
\begin{equation*}
(AB)^{-1} = B^{-1}A^{-1}
\end{equation*}
This time, for both ‘A‘ and ‘B‘ we will take matrices of certain special types. The first matrix, “A = V“, will be a cyclic-shift matrix:

Let’s recall here that to restore the input vector ‘x‘, the inverse V⁻¹ should do the opposite – cyclically shift the values of the argument vector ‘y‘ downwards:

The second matrix “B=S” will be a diagonal matrix with different values on its main diagonal:

The inverse S⁻¹ of such a scale matrix, in order to restore the original vector ‘x‘, must halve only the first two values of its argument vector ‘y‘:

Now, what kind of behavior will the product matrix “VS” have? When calculating “y = VSx“, it will double only the first two values of the input vector ‘x‘, and then cyclically shift the entire result upwards.

We know already that once the output vector “y = VSx” is calculated, to reverse the influence of the product matrix “VS” and to restore the input vector ‘x‘, we should do:
\begin{equation*}
x = (VS)^{-1}y = S^{-1}V^{-1}y
\end{equation*}
In other words, the order of matrices ‘V‘ and ‘S‘ should be reversed during inversion:

And what will happen if we try to invert the effect of “VS” in an improper way, without reversing the order of the matrices, assuming that V⁻¹S⁻¹ is what should be used?

We see that the original vector (x₁, x₂, x₃, x₄) from the right side is not restored on the left side now. Instead, we have the vector (2x₁, x₂, 0.5x₃, x₄) there. One reason for this is that the value x₃ should not be halved on its path, but it actually does get halved: at the moment when the matrix S⁻¹ is applied, x₃ appears at the second position from the top, so it is halved. The same applies to the path of the value x₁. All this results in an altered vector on the left side.
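This mix-up can be reproduced numerically. A sketch assuming a 4×4 shift-by-one matrix V and a scale matrix S that doubles the first two values (my own small stand-ins for the illustrated example):

```python
import numpy as np

V = np.roll(np.eye(4), 1, axis=1)   # cyclic shift upwards by one position
S = np.diag([2.0, 2.0, 1.0, 1.0])   # doubles the first two values

x = np.array([1.0, 2.0, 3.0, 4.0])
y = V @ S @ x

ok  = np.linalg.inv(S) @ np.linalg.inv(V) @ y  # proper order:   S^{-1}V^{-1}
bad = np.linalg.inv(V) @ np.linalg.inv(S) @ y  # improper order: V^{-1}S^{-1}

print(ok)   # [1. 2. 3. 4.]   -> x is restored
print(bad)  # [2. 2. 1.5 4.]  -> (2*x1, x2, 0.5*x3, x4), an altered vector
```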
Conclusion
In this story, we have looked at the matrix inversion operation A⁻¹ as something that undoes the transformation of the given matrix “A“. We have observed why inverting a chain of matrices like (ABC)⁻¹ actually reverses the order of multiplication, resulting in C⁻¹B⁻¹A⁻¹. Also, we got a visual perspective on why inverting several special types of matrices results in another matrix of the same type.
Thanks for reading!
This is probably the last part of my “Understanding Matrices” series. I hope you enjoyed reading all 4 parts! If that is the case, feel free to follow me on LinkedIn, as hopefully other articles will be coming soon, and I’ll post updates there!
My gratitude to:
– Asya Papyan, for precise design of all the used illustrations ( behance.net/asyapapyan ).
– Roza Galstyan, for careful review of the draft, and useful suggestions ( linkedin.com/in/roza-galstyan-a54a8b352/ ).
If you enjoyed reading this story, feel free to connect with me on LinkedIn ( linkedin.com/in/tigran-hayrapetyan-cs/ ).
All used images, unless otherwise noted, are designed by request of the author.
References:
[1] – Understanding matrices | Part 1: Matrix-Vector Multiplication
[2] – Understanding matrices | Part 2: Matrix-Matrix Multiplication