In this post we look at the transformation matrix with all steps.

## Introduction

The vertices of the 3D scene are stored in static arrays, and then in "buffers", by the Javascript code. In order to render the scene as viewed by an observer located at an arbitrary position in the scene, the vertex coordinates must be transformed such that the visible part of the scene is in the viewable cube (or view "frustum"). The necessary transformations are summarized hereafter:

- The 3 translations correspond to the 3 degrees of freedom of the observer in pure translation
- The scaling represents a zoom capability of the observer
- The 3 rotations correspond to the 3 degrees of freedom in rotation
- The projection is necessary because the visualization device we use is a 2D monitor
- The aspect ratio must be corrected due to the default scaling used to fill the WebGL context

The next sections describe these transformations step-by-step starting with a reminder about homogeneous coordinates.

## Homogeneous coordinates

A coordinate transformation in the 3D space is a function that takes a 3-component vector as input and returns a transformed 3-component vector. In general, such a function could be anything (square, exponentiation, etc.) and would transform the space in an arbitrary way. If the transforming function is chosen to be a single matrix multiplication, several common transformations can be realized: rotation, scaling and symmetry in the 3D space can all be implemented as 3×3 matrix multiplications. Furthermore, because it is a linear operation, several successive transformations can be combined into one matrix multiplication, the necessary matrix being the product of the matrices of the successive operations. Consequently, because only one matrix operation is necessary, the method is well adapted to massive computations in the GPU.

The only problem with 3×3 matrix multiplications is that some operations that are necessary to render a 3D scene, namely translations and projections, cannot be realized.

Homogeneous coordinates make it possible to concentrate all transformations into one matrix multiplication by adding an extra dimension (not the time!), such that:

- All operations are done on 4-components vectors, using 4×4 matrix multiplications
- The "homogeneous coordinates" of a point are hence the 4 components of a vector (x,y,z,w), where x, y, z are the coordinates of the point in the space, and w (from "weight") an appended scale factor set to 1 at the beginning, which is used only for internal computation
- 3D space coordinates are obtained back by dividing the 3 first components by the 4th one:
`X=x/w , Y=y/w, Z=z/w`
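As an illustration of the three points above, here is a minimal sketch in plain Javascript. The helper names `mulMatVec` and `toSpace` are hypothetical (they are not part of the code of this tutorial), and matrices are written row by row for readability.

```javascript
// Hypothetical helper: multiplies a 4x4 matrix (array of 4 rows) by a 4-component vector
function mulMatVec(m, v) {
  return m.map(function (row) {
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3];
  });
}

// Hypothetical helper: converts homogeneous coordinates (x,y,z,w) back to 3D space coordinates
function toSpace(v) {
  return [v[0] / v[3], v[1] / v[3], v[2] / v[3]];
}

// Uniform scaling by 3, written as a 4x4 matrix
var scale3 = [
  [3, 0, 0, 0],
  [0, 3, 0, 0],
  [0, 0, 3, 0],
  [0, 0, 0, 1]
];

var point = [1, 2, 3, 1];              // w is set to 1 at the beginning
var scaled = mulMatVec(scale3, point); // [3, 6, 9, 1]
console.log(toSpace(scaled));          // [3, 6, 9]

// The same 3D point can be written with another scale factor w
console.log(toSpace([6, 12, 18, 2]));  // [3, 6, 9]
```

The last line shows that the homogeneous coordinates of a point are not unique: multiplying the 4 components by the same non-zero factor gives the same point in space.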

Several areas can be identified in the 4×4 matrix (in wxMaxima: `mat:matrix( [Mxx,Mxy,Mxz,Cx], [Myx,Myy,Myz,Cy], [Mzx,Mzy,Mzz,Cz], [Lx,Ly,Lz,W]);`):

The coefficients in the matrix can be interpreted in independent groups (provided the other coefficients are set to 0, except the diagonal set to 1):

- `Mij`: the 3×3 matrix used for the common linear transformations
- `Ci`: after multiplication by a point vector `(x,y,z,w)`, the result is `(x + Cx w, y + Cy w, z + Cz w, w)`, hence these coefficients are used to add constant values, i.e. for translations
- `Lj`: after multiplication by the point vector, the scale factor becomes a linear combination of the point coordinates (`Lx x + Ly y + Lz z + W w`), i.e. it makes it possible to divide the coordinates by one another (when 3D coordinates are transformed back), which is useful for projection
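The effect of the `Ci` and `Lj` groups can be checked numerically with a small sketch. The helper `mulMatVec` and the matrices `withC` and `withL` are hypothetical names chosen for this illustration; the numeric values are arbitrary.

```javascript
// Hypothetical helper: multiplies a 4x4 matrix (array of 4 rows) by a 4-component vector
function mulMatVec(m, v) {
  return m.map(function (row) {
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3];
  });
}

// Only the Ci group is set (Cx=10, Cy=20, Cz=30): constants are added to the coordinates
var withC = [
  [1, 0, 0, 10],
  [0, 1, 0, 20],
  [0, 0, 1, 30],
  [0, 0, 0, 1]
];

// Only Lz is set (Lz=1, diagonal kept at 1): the new scale factor is z + w
var withL = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 1, 1]
];

var p = [1, 2, 4, 1];
console.log(mulMatVec(withC, p)); // [11, 22, 34, 1]
console.log(mulMatVec(withL, p)); // [1, 2, 4, 5]
```

In the second case the coordinates themselves are unchanged, but the later division by the 4th component (here 5) divides all of them by a value that depends on z.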

The next sections illustrate how the space transformations are implemented in 4×4 matrices.

In addition, homogeneous coordinates support operations on points that are infinitely far away. Indeed, if the scale factor is near 0, the last 4D to 3D transformation (division by the scale factor) will result in values nearing infinity. But before this last transformation, all operations are done on finite values without risk of errors.

## Observer position and orientation

We define the observer with the following parameters:

- *(ox,oy,oz)*: the point looked at
- *(rx,ry,rz)*: orientation of the observer around the axes
- *d*: the distance between the observer and the point looked at

## Translation

A translation is necessary to follow the movements of the observer. The general form of a translation matrix with homogeneous coordinates is as follows:

Multiplied by the point coordinates given by the column-vector (x,y,z,w), we obtain:

As usual with homogeneous coordinates, the transformation is completed only after the last operation of dividing the first three components by *w* (which should be equal to 1 at this stage):

Here we define the translation as a change of the origin (0,0,0) of the model. The new origin is noted *(ox,oy,oz)* and corresponds to the point where the observer is located (or, more precisely, the point the observer is looking at). The translation matrix is as follows (in wxMaxima: `translation:matrix( [1,0,0,-ox], [0,1,0,-oy], [0,0,1,-oz], [0,0,0,1]);`):

Negative values are due to the definition we use: the model is translated from the *(ox,oy,oz)* point to the (0,0,0) point, i.e. in the direction opposite to the vector *(ox,oy,oz)*.
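The sign convention can be verified with a short sketch. The function names `translationMatrix` and `mulMatVec` are hypothetical; the matrix layout is the one given in the wxMaxima command above, written row by row.

```javascript
// Builds the translation matrix that moves the point (ox,oy,oz) to the origin
function translationMatrix(ox, oy, oz) {
  return [
    [1, 0, 0, -ox],
    [0, 1, 0, -oy],
    [0, 0, 1, -oz],
    [0, 0, 0, 1]
  ];
}

// Hypothetical helper: multiplies a 4x4 matrix (array of 4 rows) by a 4-component vector
function mulMatVec(m, v) {
  return m.map(function (row) {
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3];
  });
}

var t = translationMatrix(1, 2, 3);
console.log(mulMatVec(t, [1, 2, 3, 1])); // [0, 0, 0, 1]: the point looked at becomes the origin
console.log(mulMatVec(t, [0, 0, 0, 1])); // [-1, -2, -3, 1]: the old origin moves the opposite way
```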

## Rotation Basics

Basically, a rotation is a transformation of the plane. In the plane formed by the X and Y axes, the rotation around the Z axis makes a point turn around the origin according to a given angle:

Let’s take the point with polar coordinates (r,α). The Cartesian coordinates are given by:

`x = r * cos(α)`

`y = r * sin(α)`

In polar coordinates, a rotation around the origin is simply done by changing the angle component. Hence in our example the new point (x’,y’) after rotation by the angle theta is given by:

`x' = r * cos( α + θ )`

`y' = r * sin( α + θ )`

By applying the angle addition formulas:

`x' = r * ( cos(α) * cos(θ) - sin(α) * sin(θ) )`

`y' = r * ( sin(α) * cos(θ) + cos(α) * sin(θ) )`

Now we can re-order the terms to make the Cartesian coordinates of the original point appear:

`x' = x * cos(θ) - y * sin(θ)`

`y' = x * sin(θ) + y * cos(θ)`

This is the standard form of a 2D rotation, given hereafter as a multiplication matrix:

There is another way to obtain this formula by using complex numbers. Indeed, a multiplication by a complex number of the form `r * e^(iθ)` results in a rotation of angle θ and a scaling of factor r. So we rewrite the original point as *x+iy*, and multiply it with the complex number of angle θ and modulus 1 (defined using trigonometric functions) to get the transformed point:

`x' + iy' = ( x + iy ) * ( cos(θ) + i sin(θ) )`

After development (remember i² = -1) and re-ordering:

`x' + iy' = x cos(θ) - y sin(θ) + i ( x sin(θ) + y cos(θ) )`

The real and imaginary values must be identified with x’ and y’, and then we get the same formula as with the first method.
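Both derivations can be compared numerically. The following sketch uses hypothetical function names (`rotate2D`, `mulComplex`) and arbitrary sample values; angles are in radians.

```javascript
// 2D rotation of (x, y) by the angle theta, using the standard form derived above
function rotate2D(x, y, theta) {
  var c = Math.cos(theta), s = Math.sin(theta);
  return [x * c - y * s, x * s + y * c];
}

// Product of two complex numbers given as [real, imaginary] pairs
function mulComplex(a, b) {
  return [a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0]];
}

var theta = Math.PI / 3;
var viaMatrix = rotate2D(2, 1, theta);
var viaComplex = mulComplex([2, 1], [Math.cos(theta), Math.sin(theta)]);
console.log(viaMatrix, viaComplex); // both pairs hold the same values

// Polar check: the point (r=2, α=30°) rotated by 60° ends at the angle 90°,
// i.e. very close to (0, 2) up to floating-point rounding
var alpha = Math.PI / 6;
console.log(rotate2D(2 * Math.cos(alpha), 2 * Math.sin(alpha), theta));
```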

## Rotations of the observer

Rotations are needed due to the orientation of the observer. In our definition, the observer rotates around the point looked at. Taken separately, each single rotation can be understood as a rotation around one axis, defined by angles such that positive values represent:

- *rx*: goes down, looks towards the top
- *ry*: goes right, looks towards the left
- *rz*: rolls to the right staying at the same place

In order to simplify the matrices, we set:

- *sx = sin(rx)*, *cx = cos(rx)*
- *sy = sin(ry)*, *cy = cos(ry)*
- *sz = sin(rz)*, *cz = cos(rz)*

Using the results of the previous paragraphs, the rotations around the axes can be represented in homogeneous coordinates as follows, starting with the rotation around Z (in wxMaxima: `rotz:matrix( [cz,-sz,0,0], [sz,cz,0,0], [0,0,1,0], [0,0,0,1]);`):

The rotations around X and Y are obtained by permutation, i.e. by moving the 2×2 rotation matrix within the 4×4 identity matrix. The rotation matrix around X (in wxMaxima: `rotx:matrix( [1,0,0,0], [0,cx,-sx,0], [0,sx,cx,0], [0,0,0,1]);`):

Around Y (in wxMaxima: `roty:matrix( [cy,0,sy,0], [0,1,0,0], [-sy,0,cy,0], [0,0,0,1]);`):

The different rotations must be combined, which is described in a section below.
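The three matrices can be written as small Javascript functions to see how each one moves the axes. This is only a sketch: the names `rotX`, `rotY`, `rotZ` and `mulMatVec` are hypothetical, and the matrices are copied row by row from the wxMaxima commands above.

```javascript
function rotX(rx) {
  var cx = Math.cos(rx), sx = Math.sin(rx);
  return [[1, 0, 0, 0], [0, cx, -sx, 0], [0, sx, cx, 0], [0, 0, 0, 1]];
}

function rotY(ry) {
  var cy = Math.cos(ry), sy = Math.sin(ry);
  return [[cy, 0, sy, 0], [0, 1, 0, 0], [-sy, 0, cy, 0], [0, 0, 0, 1]];
}

function rotZ(rz) {
  var cz = Math.cos(rz), sz = Math.sin(rz);
  return [[cz, -sz, 0, 0], [sz, cz, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];
}

// Hypothetical helper: multiplies a 4x4 matrix (array of 4 rows) by a 4-component vector
function mulMatVec(m, v) {
  return m.map(function (row) {
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3];
  });
}

var quarter = Math.PI / 2;
// With these matrices, a quarter turn sends X to Y (around Z), Y to Z (around X)
// and Z to X (around Y), up to floating-point rounding
console.log(mulMatVec(rotZ(quarter), [1, 0, 0, 1]));
console.log(mulMatVec(rotX(quarter), [0, 1, 0, 1]));
console.log(mulMatVec(rotY(quarter), [0, 0, 1, 1]));
```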

## Scaling and aspect ratio

Two scaling operations are needed: one for the zooming factor of the observer, the other to correct the X/Y distortion due to the standard rendering mechanism.

The zoom is an isotropic scaling, i.e. the 3 dimensions are scaled equally. Such a transformation is represented by the following matrix (in wxMaxima: `scale:matrix( [s,0,0,0], [0,s,0,0], [0,0,s,0], [0,0,0,1]);`):

The correction of the aspect ratio is anisotropic: the Z coordinate remains unchanged and either the X or Y coordinate must be scaled such that a square is rendered as a square and not as a rectangle. Here we choose to leave the Y coordinate unchanged and to adapt the X coordinates (horizontal compression). Without correction, the point at (1,1) would be rendered at the top-right corner of the view port (i.e. with x equal to half of the width of the context). Once corrected, the same point must be rendered at a width *x’* such that it equals half of the height of the context (hence no distortion). Formally:

`x' = context_height/2`

`x = context_width/2`

By dividing the first formula by the second and re-ordering:

`x' = x * context_height / context_width`

We define the aspect ratio as the quotient of the width by the height (remember the common "16/9" screen dimension), i.e. it will be greater than 1 for a standard screen and equal to 1 for a square view port:

`ar = context_width / context_height`

With this definition, the matrix used for the aspect ratio correction divides the X coordinate by *ar* and leaves Y unchanged, as in the Javascript code below (with wxMaxima: `aspect_ratio:matrix([1/ar,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]);`):
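A quick numeric check of the horizontal compression, as a sketch under assumptions: the function name `aspectRatioMatrix` and the 1600×900 context size are only examples, and the correction is applied on X as described in the text and as done in the final Javascript code.

```javascript
// Builds the aspect ratio correction: X is divided by ar, Y and Z are unchanged
function aspectRatioMatrix(contextWidth, contextHeight) {
  var ar = contextWidth / contextHeight;
  return [
    [1 / ar, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
  ];
}

var width = 1600, height = 900;
var m = aspectRatioMatrix(width, height);

// The point (1,1) has x=1 before correction; after correction x' = 1/ar
var xCorrected = m[0][0] * 1;

// Horizontal distance to the center of the context in pixels, compared to half the height:
// both values are 450 up to floating-point rounding, hence a square stays a square
console.log(xCorrected * width / 2, height / 2);
```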

## Combining rotations, scaling and translations

The transformations seen so far can be combined in a basic transformation matrix, i.e. without the perspective projection part. Because the matrix product is not a commutative operation, the order in which matrices are multiplied is important. Some rules can be deduced from the above sections to determine possible combinations:

- The rotation around the Z axis is better after the rotations around X and Y, such that it is a final rotation of the rendered plane
- Scaling and rotations can be combined in any order
- The translation must be the first operation for it moves the center of all subsequent operations

We choose the following order: first translations, then rotations, then scaling and aspect ratio.

A very important point to note about the way WebGL (as well as OpenGL) uses arrays: matrices are flattened in linear arrays according to the "column-major order". This means that data is stored one column after the other, and not row by row. Because wxMaxima returns matrices in row-major order, the matrices we use must first be transposed before being passed to any WebGL function. We obtain the desired matrix with wxMaxima using the previously entered matrices and the final command `transpose(aspect_ratio. scale. rotz. roty. rotx. translation);`:
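To make the ordering and the column-major layout concrete, here is a small numeric sketch. The helpers `mulMat` and `toColumnMajor` are hypothetical (the tutorial itself pre-computes the product symbolically with wxMaxima), and the sample matrices are arbitrary.

```javascript
// Hypothetical helper: product of two 4x4 matrices given as arrays of rows
function mulMat(a, b) {
  var r = [];
  for (var i = 0; i < 4; i++) {
    r.push([]);
    for (var j = 0; j < 4; j++) {
      var sum = 0;
      for (var k = 0; k < 4; k++) sum += a[i][k] * b[k][j];
      r[i].push(sum);
    }
  }
  return r;
}

// Hypothetical helper: flattens a matrix column after column, as expected by WebGL
function toColumnMajor(m) {
  var out = new Float32Array(16);
  for (var row = 0; row < 4; row++) {
    for (var col = 0; col < 4; col++) out[col * 4 + row] = m[row][col];
  }
  return out;
}

var translation = [[1, 0, 0, -5], [0, 1, 0, -6], [0, 0, 1, -7], [0, 0, 0, 1]];
var scale = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]];

// The rightmost matrix is applied first: translation, then scaling
var combined = mulMat(scale, translation);
console.log(Array.from(toColumnMajor(combined)));
// [2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, -10, -12, -14, 1]

// In the other order the translation is not scaled: the product is not commutative
console.log(mulMat(translation, scale)[0][3]); // -5
```

In the flat array the translation components end up at the indices 12, 13 and 14, which is where they also appear in the arrays of the Javascript code of this post.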

This basic transformation matrix is implemented in the Javascript code in addition to the full one (see section below).

## Perspective projection

The projection is needed to render a 3D scene onto a 2D screen; it is hence a transformation of the space to a plane. The projection plane is typically chosen perpendicular to the view direction. In our basic transformation matrix, the depth coordinate is just ignored to render the model (we keep *(x,y)* from *(x,y,z)*). This kind of projection, called "orthographic" (it can be shown that it preserves parallelism), cannot render a scene realistically with perspective because the depth information is simply lost.

To produce a perspective effect, i.e. such that far objects are rendered smaller than near ones, the depth information must be combined with the other coordinates in the projection computation. The following picture illustrates what happens on the Y coordinates during the projection of a point to the screen in the general case:

Here we look at the positions of the objects in the plane ZY, i.e. we see the observer and what it observes from the side. The observer is located at *(0,0)* and the point to be rendered (real position) is at *(Z,Y)*. We set the "projection screen" at a distance *d* from the observer. Very naturally, we find that the observer "sees" the point to be rendered through the projection screen at a position *Y’* such that (using Thales' theorem, a.k.a. similar triangles):

`Y' / d = Y / Z`

Or, after re-ordering:

`Y' = d * Y / Z`

Several things can be seen in this formula:

- The main operation is a division by *Z*, which matches the expected behavior (the bigger Z, the smaller the rendered objects).
- The parameter *d* has the effect of a zoom factor, and due to this distance *d* there is always a translation to take into account for a correct positioning of the camera. This unwanted scaling and translation make it difficult to predict the final position of rendered points.
- If the observer is far away from the projection screen, i.e. *d* and *Z* are high hence the ratio *d/Z≈1*, the result of the projection *Y’≈Y* is almost the same as without projection. Consequently, if *d* is high, the perspective effect is minimized (like for an orthographic projection).
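These observations can be reproduced with a one-line function. The name `projectY` and the sample values are hypothetical; the formula is the one obtained above by similar triangles.

```javascript
// Projects the Y coordinate of a point located at depth Z onto a screen at distance d
function projectY(Y, Z, d) {
  return d * Y / Z;
}

// The bigger Z, the smaller the rendered object
console.log(projectY(1, 2, 1)); // 0.5
console.log(projectY(1, 4, 1)); // 0.25

// If d and Z are both high, d/Z is close to 1 and the perspective effect vanishes
console.log(projectY(1, 1002, 1000)); // about 0.998
```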

Besides, in order to fit in the homogeneous coordinates concept, the projection must be represented as a matrix multiplication. As an illustration, in the simplest case (no translation, no scaling), a projection is a product by a matrix of the following form (in wxMaxima: `pure_projection:matrix( [1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,1,0]);`):

Multiplied by the column-vector coordinates of a point *(x,y,z,1)* (or in wxMaxima `point:matrix([x],[y],[z],[1]);`), the result is (`pure_projection.point;`):

The division by Z appears only after the homogeneous coordinates are transformed back into 3D space coordinates (division by the 4th coordinate) :

The main issue with this simple projection matrix is that the Z coordinate equals 1 at the end, and is thus lost. The Z value is necessary to correctly display overlapping faces according to their depths (the nearest faces must be drawn last).
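The loss of the Z coordinate can be observed with a short sketch (the helper `mulMatVec` and the sample point are hypothetical; the matrix is the `pure_projection` matrix given above, written row by row):

```javascript
// Hypothetical helper: multiplies a 4x4 matrix (array of 4 rows) by a 4-component vector
function mulMatVec(m, v) {
  return m.map(function (row) {
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3];
  });
}

var pureProjection = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 1, 0]
];

// The 4th component becomes a copy of z
var h = mulMatVec(pureProjection, [3, 6, 2, 1]);
console.log(h); // [3, 6, 2, 2]

// Back to space coordinates: x and y are divided by z, and z/z is always 1
console.log([h[0] / h[3], h[1] / h[3], h[2] / h[3]]); // [1.5, 3, 1]
```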

We will hence use a matrix that allows keeping the Z coordinate, and at the same time make the result more predictable. The goal here is to find a projection matrix such that a given set of points remain invariant during the projection, that the parameter *d* (distance between observer and projection screen) controls only the perspective effect without zoom (for which there is an explicit scaling operation as seen in a section above), and that the Z coordinates of all points in the visible interval are scaled to the standard [-1,1] interval.

The perspective projection is formally defined with the following requirements:

1. The projection screen shall be at a distance *d* from the observer.
2. Points in the projection screen at Z=0 shall be invariant in the operation (no scaling, etc.)
3. Points at Z=n (near plane, i.e. points nearer than this plane are not displayed) shall have a coordinate Z’=-1 after projection
4. Points at Z=f (far plane) shall have a coordinate Z’=+1 after projection

Requirement 1 implies a translation on the Z axis. Because the translated value of the *z* coordinate must be used by later operations on all other coordinates, the translation is isolated in a dedicated pre-projection matrix (`pre_projection:matrix( [1,0,0,0], [0,1,0,0], [0,0,1,d], [0,0,0,1]);`):

The multiplication by a point *(x,y,z,1)* gives (`pre_projected_point:pre_projection.point;`):

We now look for a matrix containing unknowns that can answer our needs after the first translation on Z, i.e. a matrix including a scaling on X and Y, an additional scaling factor and a projection factor. We assume that the following matrix is usable for this task (in wxMaxima `conjectured_projection:matrix( [A,0,0,0], [0,A,0,0], [0,0,B,C], [0,0,1,0]);`):

The product of the coordinates of a point *(x,y,z,1)* by the pre-projection matrix and then by our conjectured matrix gives (`conjectured_projected_point: conjectured_projection. pre_projected_point;`):

After the last transformation from homogeneous coordinates to space coordinates (division by the 4th component), we obtain (`conjectured_projected_point_3D: matrix( [conjectured_projected_point[1][1]], [conjectured_projected_point[2][1]], [conjectured_projected_point[3][1]])/ conjectured_projected_point[4][1];`):

We extract the first and third components (`xcoord:conjectured_projected_point_3D[1][1];` and `zcoord:conjectured_projected_point_3D[3][1];`) and use them to create a system of equations by substituting values given by Requirements 2, 3 and 4 (in wxMaxima `equations: [subst(0,z,xcoord)=x, subst(f,z,zcoord)=1, subst(n,z,zcoord)=-1];`):

This is a common system of linear equations showing corner conditions if we want to keep the sign of all factors:

- *d>0*: If the distance between eye and projection screen equals 0, all points are at infinity. If lower than 0, everything is inverted.
- *n+d>0*: If the distance *d* is not greater than *-n* (the distance between the near plane and the point looked at), the coordinates of the points between the observer and the near plane will be inverted.
- *f+d>0*: Same as the condition on the near plane.

It can be solved by wxMaxima (`solutions: solve(equations,[A,B,C]);`):

The results are quite complex and show the obvious corner condition *f-n>0*, i.e. near and far planes are placed in increasing Z coordinates (the Z axis is pointed towards the back of the rendering cube).
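The solutions can be checked numerically against Requirements 2, 3 and 4. The sketch below uses the values of A, B and C that appear in the Javascript code of this post (C is written here in an equivalent factored form); the function names `projectionCoefficients` and `project` and the sample values d=2, n=-1, f=1 are hypothetical.

```javascript
// Coefficients of the conjectured projection matrix for given d, n and f
function projectionCoefficients(d, n, f) {
  return {
    A: d,
    B: (n + f + 2 * d) / (f - n),
    C: -2 * (n + d) * (f + d) / (f - n) // same value as -(d*(2*n+2*f)+2*f*n+2*d*d)/(f-n)
  };
}

// Applies pre-projection, projection and the final division to a 3D point [x, y, z]
function project(p, d, n, f) {
  var k = projectionCoefficients(d, n, f);
  var zt = p[2] + d;                                      // pre-projection: translation on Z
  var h = [k.A * p[0], k.A * p[1], k.B * zt + k.C, zt];   // conjectured projection
  return [h[0] / h[3], h[1] / h[3], h[2] / h[3]];         // back to space coordinates
}

var d = 2, n = -1, f = 1;
console.log(project([5, 7, 0], d, n, f));    // [5, 7, 0.5]: x and y are invariant at Z=0
console.log(project([5, 7, n], d, n, f)[2]); // -1: near plane
console.log(project([5, 7, f], d, n, f)[2]); // 1: far plane
```

Note that the requirements only fix X and Y at Z=0; the depth of these points after projection (0.5 in this example) is not 0.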

## Full transformation matrix

Now that the last operation is defined, it can be combined as the last step. The product and the final transposition are done in wxMaxima with the code `full_transformation:transpose(conjectured_projection. pre_projection. aspect_ratio. scale. rotz. roty. rotx. translation);`:

## Exchange of X and Z

For a better understanding of the projection operation, we add a final matrix that only exchanges the X and Z coordinates (`exchange_xz:matrix( [0,0,1,0], [0,1,0,0], [1,0,0,0], [0,0,0,1]);`):

This operation must be activated only for didactic purposes. It shows the projection operation from the side. The resulting matrix is given hereafter (`full_transformation_exz: transpose(exchange_xz. conjectured_projection. pre_projection. aspect_ratio. scale. rotz. roty. rotx. translation);`):

Depending on whether the final XZ exchange is activated, the corresponding matrix is selected.

## Javascript code

We use wxMaxima to get the results in a C-like syntax and insert the formulas in `getTransformationMatrix`. As the distance to camera must be greater than the near plane, we use the transformation matrix without projection for smaller values of *d*, and a third matrix for a projection including the final XZ exchange:

```javascript
// Returns a transformation matrix as a flat array with 16 components, given:
// ox, oy, oz: new origin (translation)
// rx, ry, rz: rotation angles (radians)
// s: scaling factor
// d: distance between camera and origin after translation,
//    if d <= -n skips projection completely
// f: z coordinate of far plane (normally positive)
// n: z coordinate of near plane (normally negative)
// ar: aspect ratio of the viewport (e.g. 16/9)
// exz: if true exchanges X and Z coords after projection
function getTransformationMatrix(ox, oy, oz, rx, ry, rz, s, d, f, n, ar, exz)
{
  // Pre-computes trigonometric values
  var cx = Math.cos(rx), sx = Math.sin(rx);
  var cy = Math.cos(ry), sy = Math.sin(ry);
  var cz = Math.cos(rz), sz = Math.sin(rz);

  // Tests if d is too small, hence making perspective projection not possible
  if (d <= -n)
  {
    // Transformation matrix without projection
    return new Float32Array([
      (cy*cz*s)/ar,cy*s*sz,-s*sy,0,
      (s*(cz*sx*sy-cx*sz))/ar,s*(sx*sy*sz+cx*cz),cy*s*sx,0,
      (s*(sx*sz+cx*cz*sy))/ar,s*(cx*sy*sz-cz*sx),cx*cy*s,0,
      (s*(cz*((-oy*sx-cx*oz)*sy-cy*ox)-(oz*sx-cx*oy)*sz))/ar,
      s*(((-oy*sx-cx*oz)*sy-cy*ox)*sz+cz*(oz*sx-cx*oy)),
      s*(ox*sy+cy*(-oy*sx-cx*oz)),1
    ]);
  }
  else
  {
    // Pre-computes values determined with wxMaxima
    var A=d;
    var B=(n+f+2*d)/(f-n);
    var C=-(d*(2*n+2*f)+2*f*n+2*d*d)/(f-n);

    // Tests if X and Z must be exchanged
    if(!exz)
    {
      // Full transformation matrix
      return new Float32Array([
        (cy*cz*s*A)/ar,cy*s*sz*A,-s*sy*B,-s*sy,
        (s*(cz*sx*sy-cx*sz)*A)/ar,s*(sx*sy*sz+cx*cz)*A,cy*s*sx*B,cy*s*sx,
        (s*(sx*sz+cx*cz*sy)*A)/ar,s*(cx*sy*sz-cz*sx)*A,cx*cy*s*B,cx*cy*s,
        (s*(cz*((-oy*sx-cx*oz)*sy-cy*ox)-(oz*sx-cx*oy)*sz)*A)/ar,
        s*(((-oy*sx-cx*oz)*sy-cy*ox)*sz+cz*(oz*sx-cx*oy))*A,
        C+(s*(ox*sy+cy*(-oy*sx-cx*oz))+d)*B,s*(ox*sy+cy*(-oy*sx-cx*oz))+d
      ]);
    }
    else
    {
      // Full transformation matrix with XZ exchange
      return new Float32Array([
        -s*sy*B,cy*s*sz*A,(cy*cz*s*A)/ar,-s*sy,
        cy*s*sx*B,s*(sx*sy*sz+cx*cz)*A,(s*(cz*sx*sy-cx*sz)*A)/ar,cy*s*sx,
        cx*cy*s*B,s*(cx*sy*sz-cz*sx)*A,(s*(sx*sz+cx*cz*sy)*A)/ar,cx*cy*s,
        C+(s*(ox*sy+cy*(-oy*sx-cx*oz))+d)*B,s*(((-oy*sx-cx*oz)*sy-cy*ox)*sz+cz*(oz*sx-cx*oy))*A,
        (s*(cz*((-oy*sx-cx*oz)*sy-cy*ox)-(oz*sx-cx*oy)*sz)*A)/ar,s*(ox*sy+cy*(-oy*sx-cx*oz))+d
      ]);
    }
  }
}
```

## Results

As usual, the result can be seen on-line. Additional HTML/Javascript controls were added to let the user play with the parameters of the transformation. It should look like this:

The controls give the ability to decompose the full transformation step-by-step, i.e. we start with the following view:

After the rotation around the X axis:

After the rotation around the Y axis:

After the rotation around the Z axis:

After scaling:

After perspective projection:

We use the XZ exchange to display what happens on the Z coordinates during the perspective projection, i.e. both images show the same scene after projection (with no other transformation), the first from the front, the second from the side:

## Summary

Nothing really new about WebGL in this post, mainly mathematics:

- Operations with "homogeneous coordinates" consist in successive 4×4 matrix products, where space coordinates are represented by 4-components column-vectors, the fourth component being initialized to 1
- Operations such as translation, rotation, scaling and projection on a plane are combined in one matrix product
- After this product, the 3D coordinates are obtained back by dividing the first 3 components by the fourth one
- Rotation and scaling operations are common three dimensional transformations adapted to homogeneous coordinates by putting the 3×3 transformation matrix into a 4×4 identity matrix
- The translations cannot be represented by a 3×3 matrix (non-linear operations) and need a specific 4×4 matrix with translation vector components on the 4th column
- The perspective projection is as well represented as a 4×4 matrix through which the 4th component of the homogeneous coordinates is made proportional to the Z component, resulting in a division by Z during the conversion from homogeneous to space coordinates (division by the 4th coordinate).
- A quite complex calibration of the factors used in the perspective projection is necessary to keep the Z coordinate and scale it to the interval [-1,+1]
- The parameters chosen for the transformation are: *(ox,oy,oz)* point looked at, *(rx,ry,rz)* rotation angles, *ar* aspect ratio, *s* scaling, *d* distance between observer and projection screen, *f* far plane Z coordinate, *n* near plane Z coordinate

#1 by Jason Slemons on September 28, 2011 - 21:21

I think the explanation of homogeneous coordinates could use an example, also i don’t understand the section on the observer. it seems like the observer should be stationary and the object should move, not the other way around.

#2 by blogoben on September 29, 2011 - 17:30

First of all, thanks a lot for your comments. My intent with this post is not to go deep into homogeneous coordinates. This is a new field for me and I lack experience. As soon as I understand more about the topic, I will post.

For the observer, it is only a matter of definition. At the end, matrix operations transform vertices no matter if the operations were defined according to this or this point of view. I made the choice for a parameterization of the observer such that the successive operations can be understood step by step and such that the result on the screen is predictable. Due to the distance d, it is really the case that the observer goes around the point looked at. Play with the HTML controls to get an idea how it works.

I am not planning to update this post once again (I’m working on it for weeks now!) but want to concentrate on the WebGL API and new effects such as lighting and texturing.

#3 by Jason Slemons on September 29, 2011 - 18:39

thanks for the reply, and best of luck with the rest of webGL!

#4 by Tom Novelli on June 13, 2012 - 17:43

Excellent write-up and diagrams. Lack of experience is an asset when you’re trying to explain something like this.

Thanks for not using a matrix lib. I wonder if you ended up using one…? Some quick searching tells me most of them are slow and/or buggy, but gl-matrix looks good.

#5 by blogoben on June 14, 2012 - 09:20

I used only Maxima for matrix pre-calculations, no library. This is a deliberate choice for this mini-tutorial: one single HTML page and no use of external libraries. If needed, I may write some simple matrix operation functions, or find a way to use the built-in functions of the shaders.

#6 by Scott on December 28, 2013 - 04:36

Thanks very much for your thoughtful explanation of these concepts. I looked all over the web and this page was the single most helpful page that I found.