
Understanding Matrices | Part 3: Matrix Transpose

In the first two stories of this series [1], [2], we:

  • Introduced X-way interpretation of matrices
  • Observed physical meaning and special cases of matrix-vector multiplication
  • Looked at the physical meaning of matrix-matrix multiplication
  • Observed its behavior on several special cases of matrices

In this story, I want to share my thoughts about the transpose of a matrix, denoted as Aᵀ – the operation that simply flips the content of the table around its main diagonal.

An example of a 3×4 matrix “A”, and its transpose “Aᵀ”.

In contrast to many other operations on matrices, it is quite easy to transpose a given matrix ‘A‘ on paper. However, the physical meaning of the operation often remains unclear. It is also not obvious why the following transpose-related formulas actually work:

  • (AB)ᵀ = BᵀAᵀ,
  • (y, Ax) = (x, Aᵀy),
  • (AᵀA)ᵀ = AᵀA.

In this story, I am going to give my interpretation of the transpose operation, which, among other things, will show why the mentioned formulas are the way they are. So let’s dive in!
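For readers who like to double-check such identities numerically, here is a minimal NumPy sketch (my own addition, not part of the original illustrations; the sizes and random values are arbitrary) verifying all three formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(3, 4))   # an arbitrary 3x4 matrix
B = rng.integers(-5, 6, size=(4, 2))   # columns(A) must equal rows(B)
x = rng.integers(-5, 6, size=4)        # |x| = columns(A)
y = rng.integers(-5, 6, size=3)        # |y| = rows(A)

# (AB)^T = B^T A^T
print(np.array_equal((A @ B).T, B.T @ A.T))      # True

# (y, Ax) = (x, A^T y)
print(np.dot(y, A @ x) == np.dot(x, A.T @ y))    # True

# (A^T A)^T = A^T A, i.e. A^T A is symmetrical
print(np.array_equal((A.T @ A).T, A.T @ A))      # True
```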

But first of all, let me recall the definitions that are used throughout the stories of this series:

  • Matrices are denoted with uppercase (like ‘A‘, ‘B‘), while vectors and scalars are denoted with lowercase (like ‘x‘, ‘y‘ or ‘m‘, ‘n‘).
  • |x| – the length of vector ‘x‘,
  • rows(A) – number of rows of matrix ‘A‘,
  • columns(A) – number of columns of matrix ‘A‘,
  • Aᵀ – the transpose of matrix ‘A‘,
  • aᵀᵢ,ⱼ – the value in the i-th row and j-th column of the transposed matrix Aᵀ,
  • (x, y) – the dot product of vectors ‘x‘ and ‘y‘ (i.e. “x₁y₁ + x₂y₂ + … + xₙyₙ“).

Transpose vs. X-way interpretation

In part 1 of this series – “matrix-vector multiplication” [1], I introduced the X-way interpretation of matrices. Let’s recall it with an example:

An example of a matrix and the corresponding X-diagram. All arrows in the diagram are directed from right to left. The arrow which starts at item ‘j’ on the right and finishes at item ‘i’ on the left corresponds to cell “aᵢ,ⱼ” of the matrix.

From there, we also remember that the left stack of the X-diagram of ‘A‘ can be associated with rows of matrix ‘A‘, while its right stack can be associated with the columns.

In the X-diagram of matrix ‘A’, the values which go from the 3rd-from-the-top item of the right stack are the values of the 3rd column of ‘A’ (highlighted in red).
At the same time, the values which come to the 2nd-from-the-top item of the left stack are the values of the 2nd row of ‘A’ (highlighted in purple).

Now, if transposing a matrix is actually flipping the table around its main diagonal, it means that all the columns of ‘A‘ become rows in ‘Aᵀ‘, and vice versa.

The original matrix ‘A’ and its transpose ‘Aᵀ‘. We see how the 3rd column of ‘A’ becomes the 3rd row in ‘Aᵀ‘.

And if transposing means swapping the roles of rows and columns, then perhaps we can do the same on the X-diagram? To swap rows and columns of the X-diagram, we should flip it horizontally:

Horizontal flip of the X-diagram of ‘A’ corresponds to the transpose of ‘A’. We see that the values adjacent to the 3rd-from-the-top item of the right stack of the original X-diagram (the 3rd column of ‘A’), which are [9, 7, 14], are the same as the values adjacent to the 3rd-from-the-top item of the left stack of the flipped X-diagram (the 3rd row of Aᵀ).

Will the horizontally flipped X-diagram of ‘A‘ represent the X-diagram of ‘Aᵀ‘? We know that cell “aᵢ,ⱼ” is present in the X-diagram as the arrow starting from the j-th item of the right stack, directed towards the i-th item of the left stack. After flipping horizontally, that same arrow will start from the i-th item of the right stack and will be directed to the j-th item of the left stack.

The value “a₁,₃ = 9″ equals the value “aᵀ₃,₁ = 9″.

This means that the defining property of the transpose, “aᵢ,ⱼ = aᵀⱼ,ᵢ“, does hold.

Concluding this chapter, we have seen that transposing matrix ‘A‘ is the same as horizontally flipping its X-diagram.
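To make the “flip” concrete, here is a small sketch of my own: it stores a matrix as a list of X-diagram arrows (“from the j-th item of the right stack to the i-th item of the left stack, carrying aᵢ,ⱼ”), flips every arrow, and checks that the rebuilt matrix equals Aᵀ. Only the third column [9, 7, 14] echoes the figures above; the remaining values are arbitrary:

```python
import numpy as np

A = np.array([[1, 0,  9, 2],
              [4, 5,  7, 0],
              [3, 8, 14, 6]])          # an arbitrary 3x4 example

# X-diagram of A: an arrow from right item j to left item i, carrying a[i, j]
arrows = [(i, j, A[i, j]) for i in range(A.shape[0]) for j in range(A.shape[1])]

# Horizontal flip: the left and right stacks swap places,
# so each arrow (i, j, value) becomes (j, i, value)
flipped = [(j, i, v) for (i, j, v) in arrows]

# Rebuild the matrix described by the flipped diagram
T = np.zeros((A.shape[1], A.shape[0]), dtype=A.dtype)
for i, j, v in flipped:
    T[i, j] = v

print(np.array_equal(T, A.T))          # True: the flipped X-diagram is the X-diagram of A^T
```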


Transposing a chain of matrices

Let’s see how interpreting Aᵀ as a horizontal flip of the X-diagram of ‘A‘ helps us uncover the physical meaning of some transpose-related formulas. Let’s start with the following:

\begin{equation*}
(AB)^T = B^T A^T
\end{equation*}

which says that transposing the product “A*B” is the same as multiplying the transposes Aᵀ and Bᵀ, but in reverse order. Now, why does the order actually become reversed?

From part 2 of this series – “matrix-matrix multiplication” [2], we remember that the matrix multiplication “A*B” can be interpreted as a concatenation of X-diagrams of ‘A‘ and ‘B‘. Thus, having:

y = (AB)x = A*(Bx)

will force the input vector ‘x‘ to go first through the transformation of matrix ‘B‘; then the intermediate result goes through the transformation of matrix ‘A‘, after which the output vector ‘y‘ is obtained.

Moving input vector ‘x’ from right to left, through X-diagrams of ‘A’ and ‘B’. At first, after moving through the transformation of ‘B’, it becomes an intermediate vector ‘t = Bx’, which, after moving through the transformation of ‘A’, becomes the final vector ‘y = At = A(Bx)’.
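As a quick check of this two-step view, here is a minimal sketch (my own addition; the sizes and values are arbitrary) showing that passing ‘x’ through ‘B’ and then through ‘A’ gives the same result as applying the product “A*B” at once:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(2, 3))
B = rng.integers(-3, 4, size=(3, 4))
x = rng.integers(-3, 4, size=4)

t = B @ x            # intermediate vector: x after the transformation of B
y = A @ t            # final vector: t after the transformation of A

print(np.array_equal(y, (A @ B) @ x))   # True: same as applying the product A*B at once
```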

And now the physical meaning of the formula “(AB)ᵀ = BᵀAᵀ” becomes clear: flipping the X-diagram of the product “A*B” horizontally will obviously flip the separate X-diagrams of ‘A‘ and ‘B‘, but it will also reverse their order:

Flipping horizontally 2 adjacent figures ‘A’ and ‘B’ will result in the horizontal flip of both figures separately (step 1), as well as in swapping their order (step 2).

In the previous story [2], we have also seen that a cell cᵢ,ⱼ of the product matrix ‘C=A*B‘ describes all the possible ways in which xⱼ of the input vector ‘x‘ can affect yᵢ of the output vector ‘y = (AB)x‘.

Concatenation of X-diagrams of ‘A’ and ‘B’, which corresponds to the product “A*B”. All 4 possible paths by which the input value ‘x₄‘ can affect the output value ‘y₂‘ are highlighted in red.

Now, when transposing the product “C=A*B“, and thus calculating the matrix Cᵀ, we want to have the mirroring effect – so cᵀⱼ,ᵢ should describe all possible ways by which yᵢ can affect xⱼ. And in order to get that, we should just flip the concatenation diagram:

If “C = A*B”, then the value of “c₂,₄” corresponds to the sum of all 4 possible paths from ‘x₄‘ to ‘y₂‘ (highlighted in red). At the same time, it is equal to “c₂,₄ = cᵀ₄,₂“, which corresponds to the sum of the same 4 possible paths from ‘y₂‘ to ‘x₄‘, in the horizontally flipped concatenation of “A*B”, which is “BᵀAᵀ“.
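Here is how that path counting looks for the particular cell c₂,₄ (a sketch of my own, with arbitrary values; the inner dimension is 4, so there are 4 paths, and note that indices are 1-based in the text but 0-based in the code):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.integers(-3, 4, size=(3, 4))   # left factor: 4 intermediate items -> 3 outputs
B = rng.integers(-3, 4, size=(4, 5))   # right factor: 5 inputs -> 4 intermediate items
C = A @ B

# c_{2,4}: sum over all 4 paths x_4 -> (intermediate item k) -> y_2
c_2_4 = sum(A[1, k] * B[k, 3] for k in range(B.shape[0]))
print(c_2_4 == C[1, 3])                # True

# In the flipped (transposed) product, the same value sits at position (4, 2)
print(C[1, 3] == (B.T @ A.T)[3, 1])    # True
```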

Of course, this interpretation can be generalized to transposing the product of several matrices:

\begin{equation*}
(ABC)^T = C^T B^T A^T
\end{equation*}

Horizontally flipping 3 adjacent items ‘A’, ‘B’, and ‘C’ (not necessarily matrices), and reversing their order will have the effect of horizontally flipping the sequence “ABC” itself.
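A quick numerical check of the three-factor case (again a sketch of my own, with arbitrary compatible sizes):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.integers(-3, 4, size=(2, 3))
B = rng.integers(-3, 4, size=(3, 4))
C = rng.integers(-3, 4, size=(4, 2))

# (ABC)^T = C^T B^T A^T
print(np.array_equal((A @ B @ C).T, C.T @ B.T @ A.T))   # True
```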

Why AᵀA is always symmetrical, for any matrix A

A symmetrical matrix ‘S‘ is an n×n square matrix where, for any indexes i, j ∈ [1..n], we have ‘sᵢ,ⱼ = sⱼ,ᵢ‘. This means that it is symmetrical about its main diagonal, and that transposing it has no effect.

An example of a 4×4 symmetrical matrix. All values are symmetrical about the main diagonal. For example, “a₃,₁ = a₁,₃ = 16″.

Since transposing a symmetrical matrix has no effect, a matrix ‘S‘ is symmetrical if and only if:

\begin{equation*}
S^T = S
\end{equation*}
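In code, this definition is essentially a one-liner; here is a small sketch (the helper name is_symmetrical is my own, hypothetical choice):

```python
import numpy as np

def is_symmetrical(S: np.ndarray) -> bool:
    """A square matrix is symmetrical iff it equals its own transpose."""
    return S.shape[0] == S.shape[1] and np.array_equal(S, S.T)

S = np.array([[2, 4, 1],
              [4, 0, 7],
              [1, 7, 5]])
print(is_symmetrical(S))                    # True
print(is_symmetrical(S + np.eye(3, k=1)))   # False: off-diagonal symmetry is broken
```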

Similarly, the X-diagram of a symmetrical matrix ‘S‘ is unchanged by a horizontal flip. That is because for any arrow sᵢ,ⱼ there is an equal arrow sⱼ,ᵢ:

An example of a 3×3 symmetrical matrix ‘S’ and its X-diagram. We have there ‘s₁,₂ = s₂,₁ = 4′. The corresponding arrows are highlighted.

In matrix analysis, there is a formula stating that for any matrix ‘A‘ (not necessarily symmetrical), the product AᵀA is always a symmetrical matrix. In other words:

\begin{equation*}
(A^T A)^T = A^T A
\end{equation*}

It is not easy to get a feel for why this formula is correct when looking at matrix multiplication in the traditional way. But its correctness becomes obvious when looking at matrix multiplication as a concatenation of X-diagrams:

Concatenation of ‘Aᵀ‘ and ‘A’, which is a concatenation of two mirrored objects, is always a symmetrical object. Flipping such a concatenation horizontally will have no effect.

What will happen if an arbitrary matrix ‘A‘ is concatenated with its horizontal flip Aᵀ? The result AᵀA will be symmetrical: after a horizontal flip, the right factor ‘A‘ comes to the left side and is flipped, becoming Aᵀ, while the left factor Aᵀ comes to the right side and is also flipped, becoming ‘A‘.

This is why, for any matrix ‘A‘, the product AᵀA is always symmetrical. (The same fact also follows algebraically from the previous chapter: (AᵀA)ᵀ = Aᵀ(Aᵀ)ᵀ = AᵀA.)
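The same conclusion can be checked element by element: (AᵀA)ᵢ,ⱼ is the sum of the products aₖ,ᵢ·aₖ,ⱼ over all rows k of ‘A’, and that sum is unchanged when i and j are swapped. A small sketch of my own, with an arbitrary non-square ‘A’:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.integers(-3, 4, size=(3, 4))   # any rectangular matrix
G = A.T @ A                            # a 4x4 matrix

# (A^T A)[i, j] = sum_k a[k, i] * a[k, j], which is unchanged when i and j are swapped
for i in range(4):
    for j in range(4):
        assert G[i, j] == sum(A[k, i] * A[k, j] for k in range(3))

print(np.array_equal(G, G.T))          # True: A^T A is symmetrical
```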


Understanding why (y, Ax) = (x, Aᵀy)

There is another formula in matrix analysis, stating that:

\begin{equation*}
(y, Ax) = (x, A^T y)
\end{equation*}

where “(u, v)” is the dot product of vectors ‘u‘ and ‘v‘:

\begin{equation*}
(u,v) = u_1 v_1 + u_2 v_2 + \dots + u_n v_n
\end{equation*}

The dot product can be calculated only for vectors of equal length. Also, the dot product is not a vector but a single number. If we try to illustrate the dot product “(u, v)” in a way similar to X-diagrams, we can draw something like this:

As the dot product is the accumulation of the terms uᵢ·vᵢ, we can present it as the sum of all possible paths from the right endpoint to the left one.

Now, what does the expression (y, Ax) actually mean? It is the dot product of the vector ‘y‘ with the vector “Ax” (that is, with the vector ‘x‘ after it has gone through the transformation of “A“). For the expression (y, Ax) to make sense, we should have:

|x| = columns(A), and
|y| = rows(A).

First, let’s calculate (y, Ax) formally. Here, every value yᵢ is multiplied by the i-th value of the vector Ax, denoted here as “(Ax)ᵢ“ (below, n = rows(A) and m = columns(A)):

\begin{equation*}
(Ax)_i = a_{i,1}x_1 + a_{i,2}x_2 + \dots + a_{i,m}x_m
\end{equation*}

Multiplying it by yᵢ, we will have:

\begin{equation*}
y_i(Ax)_i = y_i a_{i,1}x_1 + y_i a_{i,2}x_2 + \dots + y_i a_{i,m}x_m
\end{equation*}

And after summing all these terms over i ∈ [1, n], we will have:

\begin{equation*}
\begin{split}
(y, Ax) = y_1(Ax)_1 + y_2(Ax)_2 + \dots + y_n(Ax)_n = \\
= y_1 a_{1,1}x_1 + y_1 a_{1,2}x_2 + &\dots + y_1 a_{1,m}x_m + \\
+ y_2 a_{2,1}x_1 + y_2 a_{2,2}x_2 + &\dots + y_2 a_{2,m}x_m + \\
&\vdots \\
+ y_n a_{n,1}x_1 + y_n a_{n,2}x_2 + &\dots + y_n a_{n,m}x_m
\end{split}
\end{equation*}

which clearly shows that in the product (y, Ax), every cell aᵢ,ⱼ of the matrix “A” participates exactly once, together with the factors yᵢ and xⱼ.
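The same expansion can be written as a double loop over all cells of ‘A’ (a sketch of my own; the values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 4
A = rng.integers(-3, 4, size=(n, m))
x = rng.integers(-3, 4, size=m)        # |x| = columns(A)
y = rng.integers(-3, 4, size=n)        # |y| = rows(A)

# Every cell a[i, j] participates exactly once, together with y[i] and x[j]
triple_sum = sum(y[i] * A[i, j] * x[j] for i in range(n) for j in range(m))

print(triple_sum == np.dot(y, A @ x))      # True
print(triple_sum == np.dot(x, A.T @ y))    # True: the same sum, read in the flipped way shown below
```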

Now let’s move to X-diagrams. If we want to draw something like an X-diagram of vector “Ax“, we can do it in the following way:

The product “Ax” is a vector of length equal to “|Ax| = rows(A)”, while “|x| = columns(A)”. Here, values of vector “x” are attached from the right side, and on the left side, we receive values of the result vector “Ax”.

Next, if we want to draw the dot product (y, Ax), we can do it this way:

Values of vector ‘y’ are attached to the left side of the X-diagram of “A”, while values of vector ‘x’ remain attached to its right side.

On this diagram, let’s see how many ways there are to reach the left endpoint from the right one. A path from right to left can pass through any arrow of A‘s X-diagram. If it passes through a certain arrow aᵢ,ⱼ, the path is composed of xⱼ, the arrow aᵢ,ⱼ, and yᵢ.

If a path from right to left passes through the arrow “a₄,₂” of the X-diagram of “A”, then it also passes through the values “y₄” and “x₂“.

And this exactly matches the formal expansion of (y, Ax) derived above, where (y, Ax) was the sum of all triples of the form “yᵢ·aᵢ,ⱼ·xⱼ“. We can conclude that, in the X-interpretation, (y, Ax) is equal to the sum of all possible paths from the right endpoint to the left one.

Now, what will happen if we flip this entire diagram horizontally?

Horizontally flipping the X-diagram of “(y, Ax)” results in the X-diagram of “(x, Aᵀy)”.

From the algebraic perspective, the sum of all paths from right to left does not change, as all participating terms remain the same. But from the geometrical perspective, the vector ‘y‘ moves to the right part, the vector ‘x‘ moves to the left part, and the matrix “A” is flipped horizontally; in other words, “A” is transposed. So the flipped X-diagram now corresponds to the dot product of the vectors “x” and “Aᵀy”, i.e. it has the value (x, Aᵀy). We see that both (y, Ax) and (x, Aᵀy) represent the same sum, which proves that:

\begin{equation*}
(y, Ax) = (x, A^T y)
\end{equation*}


Conclusion

That’s all I wanted to present in regard to the matrix transpose operation. I hope that the visual methods illustrated above will help all of us to gain a better grasp of various matrix operations.

In the next (and probably the last) story of this series, I will address inverting matrices, and how it can be visualized with the X-interpretation. We will see why formulas like “(AB)⁻¹ = B⁻¹A⁻¹” are the way they actually are, and we will observe how the inverse works on several special types of matrices.

So see you in the next story!


My gratitude to:

– Asya Papyan, for the precise design of all the used illustrations (linkedin.com/in/asya-papyan-b0a1b0243/),
– Roza Galstyan, for careful review of the draft (linkedin.com/in/roza-galstyan-a54a8b352/).

If you enjoyed reading this story, feel free to follow me on LinkedIn, where, among other things, I will also post updates (linkedin.com/in/tigran-hayrapetyan-cs/).

All images used, unless otherwise noted, were designed at the request of the author.


References

[1] – Understanding matrices | Part 1: matrix-vector multiplication – https://towardsdatascience.com/understanding-matrices-part-1-matrix-vector-multiplication/

[2] – Understanding matrices | Part 2: matrix-matrix multiplication – https://towardsdatascience.com/understanding-matrices-part-2-matrix-matrix-multiplication/
