In the first two stories of this series [1], [2], we:
- Introduced the X-way interpretation of matrices,
- Observed the physical meaning and special cases of matrix-vector multiplication,
- Looked at the physical meaning of matrix-matrix multiplication,
- Observed its behavior on several special cases of matrices.
In this story, I want to share my thoughts about the transpose of a matrix, denoted as Aᵀ – the operation that simply flips the content of the table around its main diagonal.
In contrast to many other operations on matrices, it is quite easy to transpose a given matrix ‘A‘ on paper. However, its physical meaning often stays hidden, and it is not so clear why the following transpose-related formulas actually work:
- (AB)ᵀ = BᵀAᵀ,
- (y, Ax) = (x, Aᵀy),
- (AᵀA)ᵀ = AᵀA.
In this story, I am going to give my interpretation of the transpose operation, which, among other things, will show why the mentioned formulas are actually the way they are. So let’s dive in!
But first of all, let me recall the definitions that are used throughout the stories of this series:
- Matrices are denoted with uppercase (like ‘A‘, ‘B‘), while vectors and scalars are denoted with lowercase (like ‘x‘, ‘y‘ or ‘m‘, ‘n‘).
- |x| – the length of vector ‘x‘,
- rows(A) – the number of rows of matrix ‘A‘,
- columns(A) – the number of columns of matrix ‘A‘,
- Aᵀ – the transpose of matrix ‘A‘,
- aᵀᵢ,ⱼ – the value in the i-th row and j-th column of the transposed matrix Aᵀ,
- (x, y) – the dot product of vectors ‘x‘ and ‘y‘ (i.e. “x₁y₁ + x₂y₂ + … + xₙyₙ“).
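These conventions are easy to mirror in code. Below is a small NumPy sketch of the quantities defined above (NumPy and the sample values are my own illustration, not part of the series):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # a 3x2 matrix: rows(A) = 3, columns(A) = 2

x = np.array([7, 8])          # |x| = 2 = columns(A), so 'Ax' makes sense

AT = A.T                      # the transpose of A, a 2x3 matrix
print(AT[0, 2], A[2, 0])      # both print 5: aT[i][j] equals a[j][i]

dot = np.dot(x, x)            # (x, x) = 7*7 + 8*8 = 113
```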
Transpose vs. X-way interpretation
In part 1 of this series – “matrix-vector multiplication” [1], I introduced the X-way interpretation of matrices. Let’s recall it with an example:

From there, we also remember that the left stack of the X-diagram of ‘A‘ can be associated with rows of matrix ‘A‘, while its right stack can be associated with the columns.

At the same time, the values which arrive at the 2nd item from the top of the left stack are the values of the 2nd row of ‘A’ (highlighted in purple).
Now, if transposing a matrix is actually flipping the table around its main diagonal, it means that all the columns of ‘A‘ become rows in ‘Aᵀ‘, and vice versa.

And if transposing means changing the places of rows and columns, then perhaps we can do the same on the X-diagram? Thus, to swap rows and columns of the X-diagram, we should flip it horizontally:

Will the horizontally flipped X-diagram of ‘A‘ represent the X-diagram of ‘Aᵀ‘? We know that cell “aᵢ,ⱼ” is present in the X-diagram as the arrow starting from the j‘th item of the right stack (the columns side) and directed towards the i‘th item of the left stack (the rows side). After flipping horizontally, that same arrow will start from the j‘th item of the left stack and will be directed to the i‘th item of the right stack.

This means that the definition of transpose, “aᵢ,ⱼ = aᵀⱼ,ᵢ“, does hold.
Concluding this chapter, we have seen that transposing matrix ‘A‘ is the same as horizontally flipping its X-diagram.
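This flip rule is easy to sanity-check numerically. A quick sketch (the matrix below is an arbitrary example of mine):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])     # a 2x3 matrix
AT = A.T                      # its transpose, 3x2

# Every cell swaps its row and column index: a[i][j] == aT[j][i]
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        assert A[i, j] == AT[j, i]
```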
Transposing a chain of matrices
Let’s see how interpreting Aᵀ as the horizontal flip of the X-diagram of ‘A‘ will help us to uncover the physical meaning of some transpose-related formulas. Let’s start with the following:
\begin{equation*}
(AB)^T = B^T A^T
\end{equation*}
which says that transposing the product “A*B“ is the same as multiplying the transposes Aᵀ and Bᵀ, but in reverse order. Now, why does the order actually become reversed?
From part 2 of this series – “matrix-matrix multiplication” [2], we remember that the matrix multiplication “A*B” can be interpreted as a concatenation of X-diagrams of ‘A‘ and ‘B‘. Thus, having:
y = (AB)x = A*(Bx)
will force the input vector ‘x‘ to go at first through the transformation of matrix ‘B‘, and then the intermediate result will go through the transformation of matrix ‘A‘, after which the output vector ‘y‘ will be obtained.

And now the physical meaning of the formula “(AB)ᵀ = BᵀAᵀ“ becomes clear: flipping the X-diagram of the product “A*B“ horizontally will obviously flip the separate X-diagrams of ‘A‘ and ‘B‘, but it will also reverse their order:

In the previous story [2], we have also seen that a cell cᵢ,ⱼ of the product matrix ‘C=A*B‘ describes all the possible ways in which xⱼ of the input vector ‘x‘ can affect yᵢ of the output vector ‘y = (AB)x‘.
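This “all possible ways” reading can be checked directly: every path from xⱼ to yᵢ passes through exactly one intermediate item k, contributing the product aᵢ,ₖ·bₖ,ⱼ. A sketch with arbitrary sample matrices:

```python
import numpy as np

A = np.array([[1, 2, 0],
              [0, 3, 1]])     # 2x3
B = np.array([[4, 1],
              [2, 0],
              [1, 5]])        # 3x2
C = A @ B                     # the product, 2x2

# c[i][j] sums the weights of all paths x_j -> item k -> y_i:
i, j = 1, 0
paths = [A[i, k] * B[k, j] for k in range(B.shape[0])]
print(sum(paths), C[i, j])    # both print 7  (0*4 + 3*2 + 1*1)
```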

Now, when transposing the product “C=A*B“, thus calculating matrix Cᵀ, we want to have the mirroring effect – so cᵀⱼ,ᵢ will describe all possible ways by which yᵢ can affect xⱼ. And in order to get that, we should just flip the concatenation diagram:

Of course, this interpretation generalizes to transposing the product of several matrices:
\begin{equation*}
(ABC)^T = C^T B^T A^T
\end{equation*}

Why AᵀA is always symmetric, for any matrix A
A symmetric matrix ‘S‘ is an n×n square matrix where, for any indexes i, j ∈ [1..n], we have ‘sᵢ,ⱼ = sⱼ,ᵢ‘. This means that it is symmetric about its main diagonal.

We see that transposing a symmetric matrix has no effect. So, a matrix ‘S‘ is symmetric if and only if:
\begin{equation*}
S^T = S
\end{equation*}
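As a tiny numeric illustration (the matrix below is an arbitrary symmetric example of mine):

```python
import numpy as np

S = np.array([[2, 7, 1],
              [7, 5, 3],
              [1, 3, 4]])     # s[i][j] == s[j][i] for all i, j

# Transposing a symmetric matrix has no effect:
assert np.array_equal(S, S.T)
```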
Similarly, the X-diagram of a symmetric matrix ‘S‘ has the property that it is not changed by a horizontal flip. That is because for any arrow sᵢ,ⱼ there is an equal arrow sⱼ,ᵢ:

In matrix analysis, we have a formula stating that for any matrix ‘A‘ (not necessarily symmetric), the product AᵀA is always a symmetric matrix. In other words:
\begin{equation*}
(A^T A)^T = A^T A
\end{equation*}
It is hard to get a feel for why this formula is correct when looking at matrix multiplication in the traditional way. But its correctness becomes obvious when looking at matrix multiplication as the concatenation of X-diagrams:

What will happen if an arbitrary matrix ‘A‘ is concatenated with its horizontal flip Aᵀ? The result AᵀA will be symmetric, as after a horizontal flip, the right factor ‘A‘ comes to the left side and is flipped, becoming Aᵀ, while the left factor Aᵀ comes to the right side and is also flipped, becoming ‘A‘. The flipped diagram thus represents the same product AᵀA, so the flip has no effect.
This is why for any matrix ‘A‘, the product AᵀA is always symmetric.
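We can confirm this on a matrix that is not even square (a sketch; any shape works, since AᵀA is always a square matrix of size columns(A) × columns(A)):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # 3x2, clearly not symmetric (not even square)

G = A.T @ A                   # 2x2
print(G)                      # [[35 44]
                              #  [44 56]]  -- symmetric
assert np.array_equal(G, G.T)
```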
Understanding why (y, Ax) = (x, Aᵀy)
There is another formula in matrix analysis, stating that:
\begin{equation*}
(y, Ax) = (x, A^T y)
\end{equation*}
where “(u, v)” is the dot product of vectors ‘u‘ and ‘v‘:
\begin{equation*}
(u, v) = u_1 v_1 + u_2 v_2 + \dots + u_n v_n
\end{equation*}
The dot product can be calculated only for vectors of equal length, and its result is a single number, not a vector. If we try to illustrate the dot product “(u, v)“ in a way similar to X-diagrams, we can draw something like this:

Now, what does the expression (y, Ax) actually mean? It is the dot product of vector ‘y‘ with the vector “Ax“ (i.e., with vector ‘x‘ after it has gone through the transformation of “A“). For the expression (y, Ax) to make sense, we should have:
- |x| = columns(A), and
- |y| = rows(A).
First, let’s calculate (y, Ax) formally. Here, every value yᵢ is multiplied by the i-th value of the vector Ax, denoted here as “(Ax)ᵢ“:
\begin{equation*}
(Ax)_i = a_{i,1}x_1 + a_{i,2}x_2 + \dots + a_{i,m}x_m
\end{equation*}
Multiplying it by yᵢ, we will have:
\begin{equation*}
y_i(Ax)_i = y_i a_{i,1}x_1 + y_i a_{i,2}x_2 + \dots + y_i a_{i,m}x_m
\end{equation*}
And after summing all the terms over “i ∈ [1, n]“, we will have:
\begin{equation*}
\begin{split}
(y, Ax) = y_1(Ax)_1 + y_2(Ax)_2 + \dots + y_n(Ax)_n = \\
= y_1 a_{1,1}x_1 + y_1 a_{1,2}x_2 + &\dots + y_1 a_{1,m}x_m + \\
+ y_2 a_{2,1}x_1 + y_2 a_{2,2}x_2 + &\dots + y_2 a_{2,m}x_m + \\
&\vdots \\
+ y_n a_{n,1}x_1 + y_n a_{n,2}x_2 + &\dots + y_n a_{n,m}x_m
\end{split}
\end{equation*}
which clearly shows that in the product (y, Ax), every cell aᵢ,ⱼ of the matrix “A“ participates exactly once, together with the factors yᵢ and xⱼ.
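The triple-sum expansion above can be replayed in code: summing yᵢ·aᵢ,ⱼ·xⱼ over every pair (i, j) gives exactly the dot product (y, Ax). A sketch with arbitrary sample values:

```python
import numpy as np

A = np.array([[1, 2, 0],
              [0, 3, 1]])     # n = 2 rows, m = 3 columns
x = np.array([1, 2, 1])       # |x| = columns(A)
y = np.array([2, 1])          # |y| = rows(A)

# Sum over all (i, j): every cell a[i][j] participates exactly once,
# together with the factors y[i] and x[j].
total = sum(y[i] * A[i, j] * x[j]
            for i in range(A.shape[0])
            for j in range(A.shape[1]))

print(total, np.dot(y, A @ x))   # both print 17
```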
Now let’s move to X-diagrams. If we want to draw something like an X-diagram of vector “Ax“, we can do it in the following way:

Next, if we want to draw the dot product (y, Ax), we can do it this way:

On this diagram, let’s see how many ways there are to reach the left endpoint from the right one. A path from right to left can pass through any arrow of A‘s X-diagram. If it passes through a certain arrow aᵢ,ⱼ, it is the path composed of xⱼ, the arrow aᵢ,ⱼ, and yᵢ.

And this exactly matches the formal behavior of (y, Ax) derived above, where (y, Ax) was the sum of all products of the form “yᵢ·aᵢ,ⱼ·xⱼ“. We can conclude that, in the X-interpretation, (y, Ax) is equal to the sum over all possible paths from the right endpoint to the left one.
Now, what will happen if we flip this entire diagram horizontally?

From the algebraic perspective, the sum over all paths from right to left will not change, as all participating terms remain the same. But from the geometrical perspective, the vector ‘y‘ goes to the right part, the vector ‘x‘ comes to the left part, and the matrix “A“ is flipped horizontally; in other words, “A“ is transposed. So the flipped X-diagram now corresponds to the dot product of vectors “x“ and “Aᵀy“, i.e., it has the value (x, Aᵀy). We see that both (y, Ax) and (x, Aᵀy) represent the same sum, which proves that:
\begin{equation*}
(y, Ax) = (x, A^T y)
\end{equation*}
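A final numeric check of this identity (again with arbitrary sample values; note that ‘A‘ need not be square – the two sides just pair ‘y‘ with the rows and ‘x‘ with the columns):

```python
import numpy as np

A = np.array([[1, 2, 0],
              [0, 3, 1]])     # 2x3
x = np.array([1, 2, 1])       # |x| = columns(A) = 3
y = np.array([2, 1])          # |y| = rows(A) = 2

left = np.dot(y, A @ x)       # (y, Ax)
right = np.dot(x, A.T @ y)    # (x, ATy)
print(left, right)            # both print 17
```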
Conclusion
That’s all I wanted to present in regard to the matrix transpose operation. I hope that the visual methods illustrated above will help all of us to gain a better grasp of various matrix operations.
In the next (and probably the last) story of this series, I will address inverting matrices, and how it can be visualized by the X-interpretation. We will see why formulas like “(AB)⁻¹ = B⁻¹A⁻¹“ are the way they actually are, and we will observe how the inverse works on several special types of matrices.
So see you in the next story!
My gratitude to:
– Asya Papyan, for the precise design of all the used illustrations (linkedin.com/in/asya-papyan-b0a1b0243/),
– Roza Galstyan, for careful review of the draft (linkedin.com/in/roza-galstyan-a54a8b352/).
If you enjoyed reading this story, feel free to follow me on LinkedIn, where, among other things, I will also post updates (linkedin.com/in/tigran-hayrapetyan-cs/).
All used images, unless otherwise noted, are designed by request of the author.
References
[1] – Understanding matrices | Part 1: matrix-vector multiplication – https://towardsdatascience.com/understanding-matrices-part-1-matrix-vector-multiplication/
[2] – Understanding matrices | Part 2: matrix-matrix multiplication – https://towardsdatascience.com/understanding-matrices-part-2-matrix-matrix-multiplication/