
The Math You Need to Pan and Tilt 360° Images


You’re certainly already familiar with spherical or 360 images. They’re used in Google Street View or in virtual house tours to give you an immersive feeling by letting you look around in any direction.

Since such images live on the unit sphere, storing them in memory as flat images isn’t straightforward. In practice, we usually store them as flat arrays using one of the two following formats:

  • Cubemap (6 images): each image corresponds to the face of a cube onto which the unit sphere has been projected.
  • Equirectangular image: similar to a planisphere map of the Earth. The south and north poles of the unit sphere are stretched so that the sphere flattens onto a regular grid. Unlike the cubemap, it is stored as a single image, which simplifies boundary handling during image processing, but introduces significant distortion near the poles.
6 images of the cubemap with annotated faces — Figure by the author, from Understanding 360 Images
Equirectangular view of the cubemap with annotated faces — Figure by the author, from Understanding 360 Images

In a previous article (Understanding 360 images), I explained the math behind the conversion between these two formats. In this article we’ll focus only on the equirectangular format and investigate the math behind modifying the camera pose of an equirectangular image.

It’s a great opportunity to better understand spherical coordinates, rotation matrices and image remapping!

The images below illustrate the kind of transform we’d like to apply.

“What would my 360 image look like if it were tilted 20° downwards?”

360 image of the cubemap with annotated faces, tilted 20° downwards — Image by the author

“What would my 360 image look like if it were shifted 45° to the right?”

360 image of the cubemap with annotated faces, shifted 45° to the right — Image by the author

“What would my 360 image look like if it were shifted 45° to the right and tilted 20° downwards?”

360 image of the cubemap with annotated faces, shifted 45° to the right and tilted 20° downwards — Image by the author

N.B. The most widely used image coordinate system has the vertical Y axis pointing downwards in the image and the horizontal X axis pointing to the right. Thus, I find it more intuitive that a positive horizontal Δθ shift moves the image to the right, while a negative Δθ moves it to the left. However, this counterintuitively implies that moving the image to the right corresponds to looking left in the actual scene! Similarly, a positive vertical Δφ tilt moves the image downwards. The choice of convention is arbitrary; it doesn’t really matter.


Photo by Grillot Edouard on Unsplash

1. Spherical Camera Model

Spherical coordinates

On a planisphere map of the Earth, horizontal lines correspond to latitudes while vertical lines correspond to longitudes.

When converting from Cartesian to spherical coordinates, a point M in the scene is fully described by its radius r and its two angles θ and φ. These angles allow us to unwrap the sphere into an equirectangular image, where θ serves as the longitude and φ as the latitude.

Spherical coordinates with Theta around the Y axis and Phi around the X axis — Figure by the author, from Understanding 360 Images

I’ve arbitrarily chosen to use the Right_Down_Front XYZ camera convention (See my previous article about camera poses) and to have θ=φ=0 in front of us. Feel free to use another convention. You get the same image at the end anyway. I just find it more convenient.

The image below illustrates the convention we’re using, with θ varying horizontally on the equirectangular image from -π on the left, through 0 at the center, to +π on the right. Note that the left and right edges of the image wrap around seamlessly into each other. As for φ, the north pole is at -π/2 and the south pole is at π/2.

θ (longitude) and φ (latitude) — Figure by the author, from Understanding 360 Images

Mapping to pixel coordinates is then just an affine transform.
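For a W×H image, writing (u, v) for the horizontal and vertical pixel coordinates (and ignoring the half-pixel offset that depends on how you define pixel centers), this could read:

$$u = \frac{\theta + \pi}{2\pi} W, \qquad v = \frac{\varphi + \pi/2}{\pi} H$$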

Rotation Matrices

When working with 3D rotations it can quickly get messy without using matrix form. Rotation matrices provide a convenient way to express rotations as plain matrix-vector multiplications.

In our Right_Down_Front XYZ camera convention (arbitrarily chosen), the rotation of angle φ around the X axis is described by the matrix below.
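
$$R_{\varphi} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & \sin\varphi \\ 0 & -\sin\varphi & \cos\varphi \end{pmatrix}$$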

As you can see, this matrix leaves the X-axis unchanged since it’s its rotation axis.

Having cosines along the diagonal makes sense because φ=0 must yield the identity matrix.

As for the sign before the sines, I find it helpful to refer to the spherical coordinates diagram above and think about what would happen for a tiny positive φ. The point directly in front of the camera is (0,0,1), i.e. the tip of the front axis Z, and will thus be rotated into the last column of Rφ: (0, sinφ, cosφ). This gives us a vector close to Z but also with a tiny positive component along the Y axis, exactly as expected!

Similarly, we have the matrix describing the rotation of angle θ around the Y axis.
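
$$R_{\theta} = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}$$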

Conversion between Cartesian and spherical coordinates

The point M with spherical coordinates (φ,θ) can be converted to 3D Cartesian coordinates p by starting from the point (0,0,1) in front of the camera, tilting it by φ around the X axis and finally panning it by θ around the Y axis.

Spherical coordinates with Theta around the Y axis and Phi around the X axis — Figure by the author, from Understanding 360 Images

The equations below derive the Cartesian coordinates by successively applying the rotation matrices to (0,0,1). The radius r has been omitted since we’re only interested in the unit sphere.
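
$$p = R_{\theta} R_{\varphi} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = R_{\theta} \begin{pmatrix} 0 \\ \sin\varphi \\ \cos\varphi \end{pmatrix} = \begin{pmatrix} \cos\varphi \sin\theta \\ \sin\varphi \\ \cos\varphi \cos\theta \end{pmatrix}$$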

To recover the spherical angles (φ,θ) we simply have to apply the inverse trigonometric functions to the components of p. Note that since φ lies in [-π/2,π/2] we know that the factor cosφ is guaranteed to remain positive, which allows us to safely apply arctan2.
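
$$\varphi = \arcsin(p_y), \qquad \theta = \operatorname{arctan2}(p_x, p_z)$$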


Photo by Jakob Owens on Unsplash

2. Tilt/Pan a 360 Image

Image remapping

Our goal is to transform an equirectangular image into another equirectangular image that mimics a (Δφ, Δθ) angular shift. In Understanding 360 images I explained how to transform between equirectangular images and cubemaps. Fundamentally it’s the same process: a sampling task where we apply a transform to existing pixels to generate new pixels.

Since the transform can produce floating-point coordinates we have to use interpolation rather than just moving integer pixels.

It may sound counter-intuitive, but when remapping an image we actually need the reverse transform and not the transform itself. The determinant of the Jacobian of the transform defines how local density changes, which means there won’t always be a one-to-one correspondence between input and output pixels. If we were to apply the transform to each of the input pixels to populate the new image we could end up with huge holes in the transformed image because of density variations.

Thus we need to define the reverse transform, so that we can iterate over each output pixel’s coordinates, map it back to the input image and interpolate its color from the neighboring pixels.

Transition Matrix

We have two 3D coordinate systems:

  • Frame 0: Input 360 image
  • Frame 1: Output 360 image transformed by (Δφ, Δθ)

Each frame locally defines its own spherical angles.

We’re now looking for the 3×3 transition matrix between both frames.

The transition is defined by the fact that we want the center of the input image 0 to be mapped to the point at (Δφ, Δθ) in the output image 1.

It turns out that it’s precisely what we already do in the spherical coordinates system: mapping the point at (0,0,1) to given spherical angles. Thus we end up with the following transform.
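
$$p_1 = R_{\Delta\theta}\, R_{\Delta\varphi}\, p_0$$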

The reverse transform is then:
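
$$p_0 = \left(R_{\Delta\theta}\, R_{\Delta\varphi}\right)^{-1} p_1 = R_{-\Delta\varphi}\, R_{-\Delta\theta}\, p_1$$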

Warning: Rotation matrices generally don’t commute when their rotation axes differ. Order matters!

This directly gives us the pose of the transformed camera. In fact, you can replace p1 by any basis axis of frame 1, namely (1,0,0), (0,1,0) or (0,0,1), and look at where it lands in frame 0.

Thus, the camera is first panned by -Δθ and then tilted by -Δφ. This is exactly what you would do intuitively with your camera: orient yourself towards the target and then adjust the tilt. Reversing the order would result in a skewed or rolled camera orientation.

Reverse transform

Let’s expand the matrix form to end up with a clean closed-form expression for the reverse transform, yielding the angles of frame 0 from the angles of frame 1.

First we substitute p1 with its definition to make φ1 and θ1 appear.
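
$$p_0 = R_{-\Delta\varphi}\, R_{-\Delta\theta}\, R_{\theta_1} R_{\varphi_1} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$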

It turns out that both rotation matrices around the Y axis end up side-by-side and can be merged into a single rotation matrix of angle θ1-Δθ.
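
$$p_0 = R_{-\Delta\varphi}\, R_{\theta_1 - \Delta\theta}\, R_{\varphi_1} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$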

The right part of the equation corresponds to a spherical point of coordinates (φ1, θ1-Δθ). We substitute it by its explicit form.
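
$$p_0 = R_{-\Delta\varphi} \begin{pmatrix} \cos\varphi_1 \sin(\theta_1 - \Delta\theta) \\ \sin\varphi_1 \\ \cos\varphi_1 \cos(\theta_1 - \Delta\theta) \end{pmatrix}$$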

We then substitute the remaining rotation matrix with its explicit form and perform the multiplication.
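
$$p_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\Delta\varphi & -\sin\Delta\varphi \\ 0 & \sin\Delta\varphi & \cos\Delta\varphi \end{pmatrix} \begin{pmatrix} \cos\varphi_1 \sin(\theta_1 - \Delta\theta) \\ \sin\varphi_1 \\ \cos\varphi_1 \cos(\theta_1 - \Delta\theta) \end{pmatrix} = \begin{pmatrix} \cos\varphi_1 \sin(\theta_1 - \Delta\theta) \\ \cos\Delta\varphi \sin\varphi_1 - \sin\Delta\varphi \cos\varphi_1 \cos(\theta_1 - \Delta\theta) \\ \sin\Delta\varphi \sin\varphi_1 + \cos\Delta\varphi \cos\varphi_1 \cos(\theta_1 - \Delta\theta) \end{pmatrix}$$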

We finally use inverse trigonometric functions to retrieve (φ0,θ0).
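
$$\varphi_0 = \arcsin\big(\cos\Delta\varphi \sin\varphi_1 - \sin\Delta\varphi \cos\varphi_1 \cos(\theta_1 - \Delta\theta)\big)$$

$$\theta_0 = \operatorname{arctan2}\big(\cos\varphi_1 \sin(\theta_1 - \Delta\theta),\ \sin\Delta\varphi \sin\varphi_1 + \cos\Delta\varphi \cos\varphi_1 \cos(\theta_1 - \Delta\theta)\big)$$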

Great! We now know where to interpolate within the input image.

Code

The code below loads a spherical image, selects arbitrary rotation angles (Δφ,Δθ), computes the reverse transform maps and finally applies them using cv2.remap to get the output transformed image.

N.B. This code is intended for pedagogical purposes. There is still room for performance optimization!
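
Here’s a minimal NumPy/OpenCV sketch of that pipeline. The reverse-transform maps are the closed-form expressions derived above; the file names, the pixel-center convention and the BORDER_WRAP border mode are my own arbitrary choices.

```python
import cv2
import numpy as np

# Load the equirectangular image (H x W x 3, typically W = 2 * H).
img = cv2.imread("equirectangular.jpg")
h, w = img.shape[:2]

# Angular shifts: positive d_theta shifts the image to the right,
# positive d_phi tilts it downwards (see the N.B. about conventions above).
d_phi = np.deg2rad(20.0)
d_theta = np.deg2rad(45.0)

# Spherical angles (phi_1, theta_1) of every output pixel center.
theta_1 = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi   # [-pi, pi]
phi_1 = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2.0     # [-pi/2, pi/2]
theta_1, phi_1 = np.meshgrid(theta_1, phi_1)

# Reverse transform derived above: p0 = R(-dphi) R(theta1-dtheta) R(phi1) (0,0,1).
x0 = np.cos(phi_1) * np.sin(theta_1 - d_theta)
y0 = (np.cos(d_phi) * np.sin(phi_1)
      - np.sin(d_phi) * np.cos(phi_1) * np.cos(theta_1 - d_theta))
z0 = (np.sin(d_phi) * np.sin(phi_1)
      + np.cos(d_phi) * np.cos(phi_1) * np.cos(theta_1 - d_theta))

# Back to the spherical angles of the input image.
# The clip guards against |y0| exceeding 1 by floating-point error.
phi_0 = np.arcsin(np.clip(y0, -1.0, 1.0))
theta_0 = np.arctan2(x0, z0)

# Affine mapping from angles to input pixel coordinates.
map_x = ((theta_0 + np.pi) / (2.0 * np.pi) * w - 0.5).astype(np.float32)
map_y = ((phi_0 + np.pi / 2.0) / np.pi * h - 0.5).astype(np.float32)

# Interpolate; BORDER_WRAP keeps the left/right seam continuous.
out = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)
cv2.imwrite("equirectangular_shifted.jpg", out)
```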

Camera Pan only

What if Δφ=0 and the transform is a pure pan?

When Δφ=0 the tilt rotation matrix becomes the identity and the expression of p0 simplifies to its canonical form in spherical coordinates (φ1, θ1-Δθ).
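
$$p_0 = R_{\theta_1 - \Delta\theta}\, R_{\varphi_1} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \;\Longrightarrow\; (\varphi_0, \theta_0) = (\varphi_1,\ \theta_1 - \Delta\theta)$$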

The transform is straightforward: just a plain subtraction on the θ angle. We’re simply rolling the image horizontally by a floating-point Δθ shift.

Camera Tilt only

What if Δθ=0 and the transform is a pure tilt?

When Δθ=0 the pan rotation matrix becomes the identity. But unfortunately it doesn’t change anything: we’ve just replaced θ1-Δθ by θ1 in the equation, that’s it.


Photo by Kaitlin Duffey on Unsplash

3. Behavior at Image Boundaries

Introduction

Let’s see how points at the boundaries of the input 360 image are affected by the transform.

For instance, the south pole corresponds to φ0=π/2, which significantly simplifies the equations with cos(φ0)=0 and sin(φ0)=1. We also know that the value of θ0 doesn’t matter, since each pole is reduced to a single point.

Unfortunately, substituting φ0=π/2 into the final reverse transform formula derived above gives us a clearly non-trivial equation to solve for (φ1,θ1).

Classic mistake! Instead of using the reverse transform formula, it would be much simpler to use the forward transform. Let’s derive it.

Direct Transform

Unlike the reverse transform we can’t merge rotation matrices because pan and tilt rotations strictly alternate.

Let’s use the explicit form of p0 and the rotation matrices RΔθ and RΔφ. Since points (φ0,θ0) at the image boundary greatly simplify the equations thanks to their convenient cos(φ0) and sin(φ0) values, I’ve chosen to first compute the product RΔθRΔφ, to keep substitutions for trivial p0 values straightforward.
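
$$p_1 = R_{\Delta\theta} R_{\Delta\varphi}\, p_0 = \begin{pmatrix} \cos\Delta\theta & -\sin\Delta\theta \sin\Delta\varphi & \sin\Delta\theta \cos\Delta\varphi \\ 0 & \cos\Delta\varphi & \sin\Delta\varphi \\ -\sin\Delta\theta & -\cos\Delta\theta \sin\Delta\varphi & \cos\Delta\theta \cos\Delta\varphi \end{pmatrix} \begin{pmatrix} \cos\varphi_0 \sin\theta_0 \\ \sin\varphi_0 \\ \cos\varphi_0 \cos\theta_0 \end{pmatrix}$$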

North/South Poles

The south pole is defined by φ0=π/2. We have cos(φ0)=0 and sin(φ0)=1, which simplifies the product to just keeping the second column of the rotation matrix.
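
$$p_1 = \begin{pmatrix} -\sin\Delta\theta \sin\Delta\varphi \\ \cos\Delta\varphi \\ -\cos\Delta\theta \sin\Delta\varphi \end{pmatrix}$$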

As expected θ0 doesn’t appear in the expression. Although the poles are infinitely stretched at the top/bottom of the spherical image they are still single points in 3D space.

Considering Δφ in [-π,π], we can replace φ1 by π/2-|Δφ|.

Graph of arcsin(cosΔφ) — Generated by the author on desmos

As for θ1 it will depend on the sign of sinΔφ. Note that on [-π,π] sinΔφ and Δφ have the same sign. Finally, we get:
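
$$\varphi_1 = \frac{\pi}{2} - |\Delta\varphi|, \qquad \theta_1 = \begin{cases} \Delta\theta & \text{if } \Delta\varphi < 0 \\ \Delta\theta + \pi & \text{if } \Delta\varphi > 0 \end{cases}$$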

When Δφ<0, the image tilts upwards and the camera starts looking downwards. As a result the south pole appears in front of the camera at θ1=Δθ. However, when Δφ>0 the camera starts looking upwards and the south pole moves to the back, which explains the θ1=Δθ+π.

The math for the north pole is very similar; we get:
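
$$\varphi_1 = |\Delta\varphi| - \frac{\pi}{2}, \qquad \theta_1 = \begin{cases} \Delta\theta + \pi & \text{if } \Delta\varphi < 0 \\ \Delta\theta & \text{if } \Delta\varphi > 0 \end{cases}$$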

In real-life 360 images the south pole is easy to spot because it corresponds to the tripod. To make this clearer, I’ve added in the figure below a green band at the bottom of the input 360 image to mark the south pole and a magenta band at the top to highlight the north pole. The left column corresponds to negative Δφ angles, while the right column corresponds to positive Δφ angles.

Evolution of the south pole (in green) and north pole (in magenta) with respect to Δφ, with Δθ fixed at π/3 — Figure by the author

Left/Right Edge

The left and right edges of the 360 image coincide and correspond to θ0=±π, which means cosθ0=-1 and sinθ0=0.

Recognizing the cosine difference identity helps simplify the equations.
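
$$p_1 = R_{\Delta\theta} R_{\Delta\varphi} \begin{pmatrix} 0 \\ \sin\varphi_0 \\ -\cos\varphi_0 \end{pmatrix} = \begin{pmatrix} -\sin\Delta\theta \cos(\varphi_0 - \Delta\varphi) \\ \sin(\varphi_0 - \Delta\varphi) \\ -\cos\Delta\theta \cos(\varphi_0 - \Delta\varphi) \end{pmatrix}$$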

The inverse trigonometric functions give us:
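
$$\varphi_1 = \arcsin\big(\sin(\varphi_0 - \Delta\varphi)\big), \qquad \theta_1 = \operatorname{arctan2}\big(-\sin\Delta\theta \cos(\varphi_0 - \Delta\varphi),\ -\cos\Delta\theta \cos(\varphi_0 - \Delta\varphi)\big)$$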

As long as x remains in [-π/2,π/2], arcsin(sin x) is the identity function. But beyond this range, the function turns into a periodic triangular wave.

Graph of arcsin(sinx) — Generated by the author on desmos

As for θ1 it will be either Δθ or Δθ+π depending on the sign of cos(φ0-Δφ).
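
$$\theta_1 = \begin{cases} \Delta\theta & \text{if } \cos(\varphi_0 - \Delta\varphi) < 0 \\ \Delta\theta + \pi & \text{if } \cos(\varphi_0 - \Delta\varphi) > 0 \end{cases}$$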

It’s actually really intuitive: the front and back edges are just the connections between the north and south poles. Since the poles travel along the two vertical tracks at Δθ and Δθ+π, the front/back edges simply follow them.

In the images below I’ve highlighted the front edge in cyan and the back edge in red. Like before, the left column corresponds to negative Δφ angles, while the right column corresponds to positive Δφ angles.

Evolution of the back edge (in red), front edge (in cyan), south pole (in green) and north pole (in magenta) with respect to Δφ, with Δθ fixed at π/3 — Figure by the author

Photo by Dan Cristian Pădureț on Unsplash

Conclusion

I hope you enjoyed this thorough investigation of what it actually means to pan or tilt a spherical image as much as I did!

The math can get a bit verbose at times, but the matrix form helps a lot in keeping things tidy, and the final formulas end up reasonably short.

In the end it’s really satisfying to apply the transform to the image with just a dozen lines of code.

Once you truly understand how image remapping works you can easily use it for a lot of applications: converting between cubemaps and spherical images, undistorting images, stitching panoramas…
