How to transform a 2D image into a 3D space?

Tutorial

3D transformation matrices for 2D images in Python with OpenCV. Rotation, Translation, Shearing and Scaling.

Information
  • axel.thevenot@edu.devinci.fr
  • 06 24 98 20 33
Dates
  • Creation: 01/31/2020
  • Update: 08/26/2020
Tags: axel_thevenot, scanner-opencv, billard-interactif, innovation

Transformation matrix

Transformation matrices are used to modify and reposition points from one frame to another. They are widely used in video games and Computer Vision, and it is impossible to enumerate all their uses: they can augment images during Deep Learning training, or serve as the basis for a document scanner, which you could build from this tutorial.

As said before, these matrices allow you to move from one frame to another. In other words, each point in space will have a different position in the other frame.

For the moment we do not know what the matrix M and its coefficients correspond to. In practice, the coefficients are not found directly; there is a better way to build this matrix. Indeed, a transformation matrix can be decomposed into four matrices, each playing a role in the transformation of coordinates in space.

We note the Translation matrix, the Rotation matrix, the Scaling matrix and the Shearing (or Skewing) matrix.

I have decomposed the matrix in the above order, and that order matters: matrix multiplication is not commutative, so composing the same four matrices in a different order generally produces a different transformation. We will also have to move to homogeneous coordinates, and therefore introduce two new matrices.
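Written out as a single product, and following the multiplication order assembled in the transform function later in this tutorial (H and H' are the homogeneous-coordinate matrices introduced in the next section), the full matrix reads:

\[
M = H' \cdot Sh \cdot Sc \cdot T \cdot R \cdot H
\]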


Homogeneous coordinates


The two matrices that we are going to see, H and H', allow us to go from a Cartesian coordinate system to a projective coordinate system and back, respectively. Note that H' is not the inverse matrix of H.

Throughout this tutorial we will work in a 3D Cartesian coordinate system. We will therefore have coordinates of the form (x, y, z) in the Cartesian coordinate system and (x, y, z, w) in the projective coordinate system.
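The standard convention for converting a projective point back to Cartesian coordinates, and the one the H' matrix defined later relies on, is to divide by the last coordinate:

\[
(x, y, z, w) \;\mapsto\; \left( \frac{x}{w},\; \frac{y}{w},\; \frac{z}{w} \right), \qquad w \neq 0
\]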

To explain what projective coordinates are, I will use a 2D analogy for simplicity. Imagine a screen of size X, Y, which is easy to picture, and a projector placed at a distance W from it. A point is then described by the coordinates (X, Y, W): we are in a 2D space expressed in projective (or homogeneous) coordinates.


Now that we are in this projective space, we can move the projector forward or backward from the screen by a distance ΔW, and it is easy to see the impact of this displacement on the projected image. If we move the projector backwards, the image is projected larger and spills beyond the screen; conversely, bringing the projector closer to the screen reduces the displayed size of the image.






It is difficult (if not impossible) to picture this projective frame associated with a 3D Cartesian frame, but the principle remains the same: we go from (X, Y, Z) to (X, Y, Z, W).

In this tutorial we will apply these transformation matrices with angles in degrees (0° to 360°) and measurements in pixels. We have not yet defined the transformation matrices themselves; for now we will replace each of them with the identity matrix, which leaves the coordinates unchanged, just as multiplying a number by 1 does.
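As a minimal check of this "multiply by 1" behaviour, multiplying an arbitrary homogeneous point by the 4x4 identity returns the same point (the point below is made up for illustration):

import numpy as np

# the 4x4 identity leaves any homogeneous point unchanged,
# just like multiplying a number by 1
point = np.array([12, 34, 0, 1])
print(np.array_equal(np.eye(4) @ point, point))  # True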



Let's go back to our matrices for the transition from Cartesian coordinates to projective coordinates and vice versa. For the moment the matrix M is the identity matrix.

Here, h and w are the height and width of the image to be processed. In the rest of this tutorial we will consider that our image lives in a 3D frame. If h and w appear in these matrices, it is to re-center the frame on the center of the image: in an image, the origin of the reference frame (the coordinates (0, 0)) is the top-left corner.

So if we want to place the origin of the frame at the center of the image, we have to apply a translation, which I will talk about in the next paragraph.

In addition, we can see the appearance of the variable f which corresponds to the focal length of our projector in the diagram we saw before.
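Concretely, working through the H_M and Hp_M arrays defined in the code just below, a pixel (x, y) is first re-centered and lifted into the projective space, and on the way back the focal length f projects the transformed point onto the image plane before the division by the last coordinate:

\[
H \cdot (x, y, 1)^T = \left( x - \tfrac{w}{2},\; y - \tfrac{h}{2},\; 1,\; 1 \right)^T
\]
\[
H' \cdot (X, Y, Z, W)^T = \left( fX + \tfrac{w}{2} Z,\; fY + \tfrac{h}{2} Z,\; Z \right)^T \;\mapsto\; \left( \tfrac{fX}{Z} + \tfrac{w}{2},\; \tfrac{fY}{Z} + \tfrac{h}{2} \right)
\]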

import cv2
import numpy as np

def transform(image, translation=(0, 0, 0), rotation=(0, 0, 0),
              scaling=(1, 1, 1), shearing=(0, 0, 0)):
    # get the values on each axis
    t_x, t_y, t_z = translation
    r_x, r_y, r_z = rotation
    sc_x, sc_y, sc_z = scaling
    sc_x, sc_y, sc_z = 1. / sc_x, 1. / sc_y, 1. / sc_z
    sh_x, sh_y, sh_z = shearing


    # convert degree angles to rad
    theta_rx = np.deg2rad(r_x)
    theta_ry = np.deg2rad(r_y)
    theta_rz = np.deg2rad(r_z)
    theta_shx = np.deg2rad(sh_x)
    theta_shy = np.deg2rad(sh_y)
    theta_shz = np.deg2rad(sh_z)

    # get the height and the width of the image
    h, w = image.shape[:2]

    # compute its diagonal
    diag = (h ** 2 + w ** 2) ** 0.5
    # compute the focal length (as if there is a projector)
    f = diag
    if np.sin(theta_rz) != 0:
        f /= 2 * np.sin(theta_rz)


    # set the image from cartesian to projective dimension
    H_M = np.array([[1, 0, -w / 2],
                    [0, 1, -h / 2],
                    [0, 0,      1],
                    [0, 0,      1]])

    # set the image from projective to cartesian dimension
    Hp_M = np.array([[f, 0, w / 2, 0],
                     [0, f, h / 2, 0],
                     [0, 0,     1, 0]])

    """
        We will define our matrices here in next parts
                                                       """
    identity = np.array([[1, 0, 0, 0],
                       [0, 1, 0, 0],
                       [0, 0, 1, 0],
                       [0, 0, 0, 1]])
    T_M = identity
    R_M = identity
    Sc_M = identity
    Sh_M = identity

    # compute the full transform matrix
    M = np.dot(T_M, R_M)
    M = np.dot(Sc_M, M)
    M = np.dot(Sh_M, M)
    M = np.dot(Hp_M, np.dot(M, H_M))

    # apply the transformation
    image = cv2.warpPerspective(image, M, (w, h))
    return image

So we have the framework for our image transformation function. It starts by unpacking the transformation parameters for each axis and then defines the homogeneous passage matrices; the transformation matrices themselves are still the identity and will be defined in the next parts.
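As a quick usage sketch (assuming the imports and the transform function defined above, and a local image file such as the logo.jpg used at the end of this tutorial), the skeleton can already be called, even though the transformation matrices are still the identity:

# hypothetical quick test of the skeleton above: with the transformation
# matrices still set to the identity, only the H / H' projection is applied
img = cv2.imread('logo.jpg')
img_out = transform(img)
cv2.imshow('Transformed', img_out)
cv2.waitKey(0)
cv2.destroyAllWindows()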


Translation

Let's move on to the translation matrix. As its name indicates, this allows us to perform translations on the different axes.
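Applied to a point in homogeneous coordinates, the translation matrix defined below simply adds an offset to each coordinate (with t_z replaced by the adjusted value described just after):

\[
T \cdot (x, y, z, 1)^T = (x + t_x,\; y + t_y,\; z + t_z,\; 1)^T
\]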

You will notice that the Z-axis translation has to be adjusted: our translation, expressed in pixels, does not take the "distance to the projector" into account, so we simulate it by updating the Z component as follows.


t_z = (f - t_z) * sc_z
# translation matrix to translate the image
T_M = np.array([[1, 0, 0, t_x],
                [0, 1, 0, t_y],
                [0, 0, 1, t_z],
                [0, 0, 0,   1]])


Rotation

The rotation matrix can be broken down into the product of rotation matrices on the different axes. In a 3D space we have 3 rotation matrices.



# calculate cos and sin of angles
sin_rx, cos_rx = np.sin(theta_rx), np.cos(theta_rx)
sin_ry, cos_ry = np.sin(theta_ry), np.cos(theta_ry)
sin_rz, cos_rz = np.sin(theta_rz), np.cos(theta_rz)
# get the rotation matrix on x axis
R_Mx = np.array([[1,      0,       0, 0],
                 [0, cos_rx, -sin_rx, 0],
                 [0, sin_rx,  cos_rx, 0],
                 [0,      0,       0, 1]])

# get the rotation matrix on y axis
R_My = np.array([[cos_ry, 0, -sin_ry, 0],
                 [     0, 1,       0, 0],
                 [sin_ry, 0,  cos_ry, 0],
                 [     0, 0,       0, 1]])

# get the rotation matrix on z axis
R_Mz = np.array([[cos_rz, -sin_rz, 0, 0],
                 [sin_rz,  cos_rz, 0, 0],
                 [     0,       0, 1, 0],
                 [     0,       0, 0, 1]])

# compute the full rotation matrix
R_M = np.dot(np.dot(R_Mx, R_My), R_Mz)
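As a quick sanity check (not part of the original code), a 90° rotation about the Z axis built with the same convention as R_Mz should send the point (1, 0, 0) to (0, 1, 0):

import numpy as np

# hypothetical check of the rotation convention used above
theta = np.deg2rad(90)
R_check = np.array([[np.cos(theta), -np.sin(theta), 0, 0],
                    [np.sin(theta),  np.cos(theta), 0, 0],
                    [            0,              0, 1, 0],
                    [            0,              0, 0, 1]])
print(np.round(R_check @ np.array([1, 0, 0, 1]), 6))  # -> [0. 1. 0. 1.]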

Scale

The scaling matrix allows us, simply put, to imitate a zoom. Taking our image again, a zoom by two on X or Y can be understood as stretching the image by that factor along those axes (while looking at it from the same distance). As for the Z axis, a zoom by two could be interpreted as the combination of a zoom by two on X and on Y, but I prefer to consider that there is no zoom on X and Y and that we simply look at the image from half the distance.

Note that you can also mirror the image along an axis by passing a value of -1.
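On a homogeneous point, the scaling matrix multiplies each coordinate by its factor, which is why a factor of -1 mirrors the corresponding axis (keep in mind that in the transform function above the input factors were already inverted, sc = 1 / sc, before this matrix is built):

\[
Sc \cdot (x, y, z, 1)^T = (sc_x \, x,\; sc_y \, y,\; sc_z \, z,\; 1)^T
\]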

# get the scaling matrix
Sc_M = np.array([[sc_x,    0,    0, 0],
                 [   0, sc_y,    0, 0],
                 [   0,    0, sc_z, 0],
                 [   0,    0,    0, 1]])


Shear

The shearing matrix makes it possible to stretch (to shear) on the different axes.

In 3D we therefore have a shearing matrix which is broken down into distortion matrices on the 3 axes.
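The basic building block is the same on every axis: a shear controlled by an angle θ offsets one coordinate by tan(θ) times another and leaves the driving coordinate unchanged, for example:

\[
(x, y) \;\mapsto\; (x + \tan(\theta)\, y,\; y)
\]

The code below distributes the three angles sh_x, sh_y and sh_z across the three axis matrices in this way.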

# get the tan of angles
tan_shx = np.tan(theta_shx)
tan_shy = np.tan(theta_shy)
tan_shz = np.tan(theta_shz)
# get the shearing matrix on x axis
Sh_Mx = np.array([[      1, 0, 0, 0],
                  [tan_shy, 1, 0, 0],
                  [tan_shz, 0, 1, 0],
                  [      0, 0, 0, 1]])
# get the shearing matrix on y axis
Sh_My = np.array([[1, tan_shx, 0, 0],
                  [0,       1, 0, 0],
                  [0, tan_shz, 1, 0],
                  [0,       0, 0, 1]])

# get the shearing matrix on z axis
Sh_Mz = np.array([[1, 0, tan_shx, 0],
                  [0, 1, tan_shy, 0],
                  [0, 0,       1, 0],
                  [0, 0,       0, 1]])
# compute the full shearing matrix
Sh_M = np.dot(np.dot(Sh_Mx, Sh_My), Sh_Mz)


Create some sliders

We then have all the necessary matrices to transform our image. It would be interesting to be able to visualize their effect in real time on an image of our choice. Let's create a slider.


class Slider:
    """Slider with cursor to get/set a value"""

    def __init__(self, name, shape, min_value, max_value, starting_value=None):
        """
        Init the slider
        :param name: name of the slider
        :param shape: shape of the image describing the slider
        :param min_value: minimum value of the slider
        :param max_value: maximum value of the slider
        :param starting_value: starting value of the cursor
        """
        self.name = name
        self.shape = shape
        self.min_value = min_value
        self.max_value = max_value
        self.value = starting_value
        # set the cursor to the middle of the slider by default
        if self.value is None:
            self.value = (self.min_value + self.max_value) / 2


    def set_cursor_from_image_coord(self, x):
        """
        Set the cursor value from an x-coordinate on the slider image
        :param x: x-coordinate of the new value in range [0, width_of_image]
        """
        # normalize the x-coordinate value according to the image shape
        x_normalize =  x / self.shape[1]
        # rescale it according to its minimum / maximum value
        x_rescale = x_normalize * (self.max_value - self.min_value) + self.min_value
        # check it does not go outside the slider
        if self.min_value < x_rescale <= self.max_value:
            self.value = x_rescale


    def get_image(self):
        """
        Create an image to describe the state of the slider
        :return: image of the slider
        """
        # set the background to black
        slider_img = np.zeros((*self.shape, 3))

        # put its name, its minimum value and its maximum value on the image
        cv2.putText(slider_img, self.name, (0, 20), cv2.FONT_HERSHEY_SIMPLEX, .55, (1, 1, 1), 2, cv2.LINE_AA)
        cv2.putText(slider_img, str(self.min_value), (50, 80), cv2.FONT_HERSHEY_SIMPLEX, .55, (1, 1, 1), 2, cv2.LINE_AA)
        cv2.putText(slider_img, str(self.max_value), (self.shape[1] - 100, 80), cv2.FONT_HERSHEY_SIMPLEX, .55, (1, 1, 1), 2, cv2.LINE_AA)

        # draw the slider as a centered rectangle in the image
        slider_pt1 = ( 1 * self.shape[1] // 20, 4 * self.shape[0] // 10)
        slider_pt2 = (19 * self.shape[1] // 20, 6 * self.shape[0] // 10)
        cv2.rectangle(slider_img, slider_pt1, slider_pt2, (0.6, 0.6, 0.6) , 5)
        cv2.rectangle(slider_img, slider_pt1, slider_pt2, (0.8, 0.8, 0.8),-1)

        # get the value of the slider and rescale it to set the cursor
        # at its right place according to the slider on the image
        x_normalize = (self.value - self.min_value) / (self.max_value - self.min_value)
        x = self.shape[1] // 20 + 18 * self.shape[1] // 20 * x_normalize
        x = int(x)

        # draw the cursor as a rectangle too
        slider_pt1 = (x - self.shape[1] // 50, 2 * self.shape[0] // 10)
        slider_pt2 = (x + self.shape[1] // 50, 8 * self.shape[0] // 10)
        cv2.rectangle(slider_img, slider_pt1, slider_pt2, (0.6, 0.6, 0.6) , 5)
        cv2.rectangle(slider_img, slider_pt1, slider_pt2, (0.8, 0.8, 0.8),-1)
        return slider_img

At this point, the slider is only a static image; we cannot interact with it yet.

Stack the sliders and add interactions

Let's create a stacker of these sliders so we can control them at the same time with the mouse.

class SliderStacker:

    def __init__(self, sliders):
        """
        Initialize the slider stacker
        :param sliders: list of the sliders to stack
        """
        # get the sliders and their height assuming
        # they all have the same shape
        self.sliders = sliders
        self.slider_height = self.sliders[0].shape[0]

        # create an attribute to remember if the left mouse button is pressed
        self.l_pressed = False

        # Create a window to interact with the sliders
        cv2.namedWindow('Sliders', cv2.WINDOW_NORMAL)
        cv2.setMouseCallback('Sliders', self.mouse_event)

    def mouse_event(self, event, x, y, flags, param):
        """
        Allow the user to manually change the slider value
        :param event: event raised by the mouse
        :param x: x coordinate of the mouse at the event time
        :param y: y coordinate of the mouse at the event time
        :param flags: flags of the event
        :param param: param of the event
        """
        # get the index of the slider under the mouse and the local y-coordinate
        index, y = y // self.slider_height, y % self.slider_height
        # If the left click is pressed, get the slider index
        if event == cv2.EVENT_LBUTTONDOWN:
            self.l_pressed = True
            self.slider_index = index
            # set the new value according to the x-coordinate of the mouse
            self.sliders[self.slider_index].set_cursor_from_image_coord(x)

        # If the mouse is moving while dragging a cursor, set its new position
        elif event == cv2.EVENT_MOUSEMOVE and self.l_pressed:
            self.sliders[self.slider_index].set_cursor_from_image_coord(x)

        # If the left click is released
        elif event == cv2.EVENT_LBUTTONUP:
            self.l_pressed = False

    def update(self):
        """Update the frame to display the current state of the sliders"""

        # get each slider image (as a list, since np.vstack expects a sequence)
        img_sliders = [s.get_image() for s in self.sliders]
        # display stacking them vertically
        cv2.imshow('Sliders', np.vstack(img_sliders))





Run it...

Do not forget that the order of the matrices that make up the transformation matrix is important. I invite you to change this order to see for yourself.
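For example, one minimal experiment (a sketch to be applied inside the transform function defined earlier) is to swap the first two factors of the product and compare the rendering when both a translation and a rotation are set:

# inside transform(): swap T_M and R_M in the composition and observe that
# the result differs as soon as both transformations are non-trivial
M = np.dot(R_M, T_M)                    # original: np.dot(T_M, R_M)
M = np.dot(Sc_M, M)
M = np.dot(Sh_M, M)
M = np.dot(Hp_M, np.dot(M, H_M))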

if __name__ == '__main__':
    
    img_input = cv2.imread('logo.jpg')
    cv2.namedWindow('Rendering', cv2.WINDOW_NORMAL)
    slider_shape = (100, 1000)
    h, w, _ = img_input.shape
    sliders = []
    sliders.append(Slider('T(x)', slider_shape, - w, w))
    sliders.append(Slider('T(y)', slider_shape, - h, h))
    sliders.append(Slider('T(z)', slider_shape, - 2 * (h + w), 2 * (h + w)))
    sliders.append(Slider('R(x)', slider_shape, -180, 180))
    sliders.append(Slider('R(y)', slider_shape, -180, 180))
    sliders.append(Slider('R(z)', slider_shape, -180, 180))
    sliders.append(Slider('Sc(x)', slider_shape, -2, 2, starting_value=1))
    sliders.append(Slider('Sc(y)', slider_shape, -2, 2, starting_value=1))
    sliders.append(Slider('Sc(z)', slider_shape, -2, 2, starting_value=1))
    sliders.append(Slider('Sh(x)', slider_shape, -180, 180))
    sliders.append(Slider('Sh(y)', slider_shape, -180, 180))
    sliders.append(Slider('Sh(z)', slider_shape, -180, 180))
    params = SliderStacker(sliders)

    while True:

        slider_values = [s.value for s in params.sliders]
        translation = tuple(slider_values[:3])
        rotation = tuple(slider_values[3:6])
        scaling = tuple(slider_values[6:9])
        shearing = tuple(slider_values[9:])
        params.update()
        img_output = transform(img_input, translation, rotation, scaling, shearing)
        
        # Display until the 'q' key is pressed
        cv2.imshow('Rendering', img_output)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cv2.destroyAllWindows()


Source

Image Transformation, MIT Media Lab

Programmer's Guide to Homogeneous Coordinates, Hackernoon