3D transformation matrices for a 2D image in Python with OpenCV: Rotation, Translation, Shearing and Scaling
- Creation: 01/31/2020
- Update: 02/24/2020
Transformation matrices are used to modify and reposition points from one frame to another. They are widely used in video games and computer vision. Their uses are too numerous to list, but they include augmenting images during training in deep learning and building a document scanner, which you can create from this tutorial.
As mentioned above, these matrices allow you to move from one frame to another. In other words, each point in space has a different position when it is expressed in another frame.
For the moment we do not know what the matrix M and its coefficients correspond to. In practice, the coefficients are not chosen directly; a better approach is to decompose the matrix. A transformation matrix can be decomposed into 4 matrices, each playing a role in the transformation of coordinates in space.
These are the Translation matrix, the Rotation matrix, the Scaling matrix and the Shearing (or Skewing) matrix.
I have decomposed the matrix in the above order, and the order matters. In this case, the translation is applied first, then the rotation, and so on. However, we will have to move to homogeneous coordinates and therefore introduce two new matrices.
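To see why the order matters, here is a minimal sketch with NumPy. The helper names `translation_matrix` and `rotation_z_matrix` are hypothetical and only serve this illustration; it assumes 4x4 matrices acting on column vectors, so the rightmost matrix is applied to the point first.

```python
import numpy as np


def translation_matrix(t_x, t_y, t_z):
    """Build a 4x4 homogeneous translation matrix (illustrative helper)."""
    T = np.eye(4)
    T[:3, 3] = (t_x, t_y, t_z)
    return T


def rotation_z_matrix(angle_deg):
    """Build a 4x4 homogeneous rotation around the Z axis (illustrative helper)."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])


T = translation_matrix(10, 0, 0)
R = rotation_z_matrix(90)
p = np.array([1.0, 0.0, 0.0, 1.0])  # the point (1, 0, 0) with w = 1

# the rightmost matrix acts on the point first
translate_then_rotate = R @ T @ p   # lands near (0, 11, 0)
rotate_then_translate = T @ R @ p   # lands near (10, 1, 0)
```

The same two matrices give two different points, which is why swapping the products in the transformation function changes the rendered image.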
The two matrices that we are going to see, H and H' respectively, allow us to go from a Cartesian coordinate system to a projective coordinate system and back. Note that H' is not the inverse matrix of H.
Throughout this tutorial we will work in a 3D Cartesian coordinate system. We will therefore have coordinates of the form (x, y, z) in the Cartesian coordinate system and (x, y, z, w) in the projective coordinate system.
To explain what projective coordinates are, I will use a 2D analogy for simplicity. Imagine a screen of size X, Y, which is quite easy to picture. We add a third value W, the distance from the screen to the projector, and obtain the coordinates (X, Y, W). We are then in a 2D space expressed in projective, or homogeneous, coordinates.
Now that we are in this projective space, we can move the projector forward or backward from the screen by a distance ΔW. It is easy to picture the effect of this displacement on the projected image. If we move the projector backwards, the projected image grows and spills beyond the screen. Conversely, bringing the projector closer to the screen reduces the display size of the image on the screen.
It is difficult to picture this projective frame associated with a 3D Cartesian frame, but the principle remains the same. We go from (X, Y, Z) to (X, Y, Z, W).
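As a rough illustration of the passage between the two coordinate systems, here is a minimal sketch. The helpers `to_homogeneous` and `to_cartesian` are hypothetical names, and the sketch assumes the usual convention that a Cartesian point is recovered by dividing by the last coordinate w.

```python
import numpy as np


def to_homogeneous(point):
    """Append w = 1 to a Cartesian point (illustrative helper)."""
    return np.append(np.asarray(point, dtype=float), 1.0)


def to_cartesian(point_h):
    """Divide by the last coordinate w, then drop it (illustrative helper)."""
    point_h = np.asarray(point_h, dtype=float)
    return point_h[:-1] / point_h[-1]


p = to_homogeneous((2, 4, 6))          # (2, 4, 6, 1)
# multiplying all four coordinates by the same factor describes the same point
same_point = to_cartesian(2 * p)       # (2, 4, 6)
# doubling w alone halves the Cartesian coordinates
halved = to_cartesian([2, 4, 6, 2])    # (1, 2, 3)
```

This matches the projector analogy: changing W alone changes the apparent size of what is displayed.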
In this tutorial we will apply these transformation matrices with angles in degrees (0° to 360°) and measurements in pixels. We have not defined the transformation matrices yet; we will define them later. For the moment we replace them with the identity matrix. This matrix does not modify the frame; it is like multiplying a number by 1.
Let's go back to our matrices for the transition from Cartesian coordinates to projective coordinates and vice versa. For the moment the matrix M is the identity matrix.
Here, h and w are the height and width of the image to be processed. In the rest of this tutorial we will consider that our image is located in a 3D frame. h and w appear in our equation in order to move the origin of the frame to the center of the image. Indeed, in an image, the origin of the reference frame (the coordinates (0, 0)) is the top-left corner of the image.
So if we want to place the origin of the reference frame at the center of the image, we have to apply a translation, which I will talk about in the next paragraph.
In addition, the variable f appears; it corresponds to the focal length of our projector in the diagram we saw before.
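The following sketch checks two of the claims above numerically, using the same H and H' matrices as the tutorial's code. The values of `h`, `w` and `f` are arbitrary and chosen only for the example.

```python
import numpy as np

h, w, f = 100, 200, 50.0  # arbitrary image size and focal length

# Cartesian (x, y, 1) -> projective (x - w/2, y - h/2, 1, 1)
H_M = np.array([[1, 0, -w / 2],
                [0, 1, -h / 2],
                [0, 0, 1],
                [0, 0, 1]])
# projective -> Cartesian, with the focal length and the re-centering
Hp_M = np.array([[f, 0, w / 2, 0],
                 [0, f, h / 2, 0],
                 [0, 0, 1, 0]])

M = np.eye(4)             # identity in place of the real transformation
full = Hp_M @ M @ H_M     # 3x3 matrix that can be applied to pixel coordinates

center = np.array([w / 2, h / 2, 1.0])
mapped = full @ center
mapped = mapped / mapped[2]   # back to pixel coordinates
```

H is 4x3 and H' is 3x4, so neither can be the inverse of the other in the usual sense, yet their product is a 3x3 matrix that leaves the center of the image in place.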
```python
import cv2
import numpy as np


def transform(image,
              translation=(0, 0, 0),
              rotation=(0, 0, 0),
              scaling=(1, 1, 1),
              shearing=(0, 0, 0)):
    # get the values on each axis
    t_x, t_y, t_z = translation
    r_x, r_y, r_z = rotation
    sc_x, sc_y, sc_z = scaling
    sc_x, sc_y, sc_z = 1. / sc_x, 1. / sc_y, 1. / sc_z
    sh_x, sh_y, sh_z = shearing

    # convert degree angles to rad
    theta_rx = np.deg2rad(r_x)
    theta_ry = np.deg2rad(r_y)
    theta_rz = np.deg2rad(r_z)
    theta_shx = np.deg2rad(sh_x)
    theta_shy = np.deg2rad(sh_y)
    theta_shz = np.deg2rad(sh_z)

    # get the height and the width of the image
    h, w = image.shape[:2]
    # compute its diagonal
    diag = (h ** 2 + w ** 2) ** 0.5
    # compute the focal length (as if there is a projector)
    f = diag
    if np.sin(theta_rz) != 0:
        f /= 2 * np.sin(theta_rz)

    # set the image from cartesian to projective dimension
    H_M = np.array([[1, 0, -w / 2],
                    [0, 1, -h / 2],
                    [0, 0,      1],
                    [0, 0,      1]])
    # set the image projective to cartesian dimension
    Hp_M = np.array([[f, 0, w / 2, 0],
                     [0, f, h / 2, 0],
                     [0, 0,     1, 0]])

    # We will define our matrices here in next parts
    identity = np.array([[1, 0, 0, 0],
                         [0, 1, 0, 0],
                         [0, 0, 1, 0],
                         [0, 0, 0, 1]])
    T_M = identity
    R_M = identity
    Sc_M = identity
    Sh_M = identity

    # compute the full transform matrix
    M = np.dot(T_M, R_M)
    M = np.dot(Sc_M, M)
    M = np.dot(Sh_M, M)
    M = np.dot(Hp_M, np.dot(M, H_M))

    # apply the transformation
    image = cv2.warpPerspective(image, M, (w, h))
    return image
```
So we have the framework for our image transformation function. It starts by unpacking the transformations on each axis and then defines the matrices for moving to and from homogeneous coordinates.
Let's move on to the translation matrix. As its name indicates, this allows us to perform translations on the different axes.
You will notice that the Z-axis translation had to be updated. Indeed our translation, in pixels, does not take into account the "distance to the projector". We must therefore simulate it by offsetting the Z-axis translation with the focal length f.
```python
t_z = (f - t_z)
# translation matrix to translate the image
T_M = np.array([[1, 0, 0, t_x],
                [0, 1, 0, t_y],
                [0, 0, 1, t_z],
                [0, 0, 0,   1]])
```
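To get a feel for the Z-axis translation, here is a small sketch that projects a single point for two values of `t_z`. The function `project_x` is a hypothetical helper; it reuses the `f - t_z` offset from the code above and, as a simplification, leaves the image center at (0, 0).

```python
import numpy as np


def project_x(x, t_z, f):
    """Project a point at horizontal offset x after a Z translation of t_z."""
    T_M = np.array([[1, 0, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 1, f - t_z],
                    [0, 0, 0, 1]], dtype=float)
    # simplified projective-to-Cartesian matrix (no re-centering)
    Hp_M = np.array([[f, 0, 0, 0],
                     [0, f, 0, 0],
                     [0, 0, 1, 0]], dtype=float)
    # z = 1 and w = 1, as produced by the Cartesian-to-projective matrix
    point = np.array([x, 0.0, 1.0, 1.0])
    u, v, s = Hp_M @ T_M @ point
    return u / s


far = project_x(10, t_z=0, f=99.0)    # depth 100 -> 990 / 100
near = project_x(10, t_z=50, f=99.0)  # depth 50  -> 990 / 50
```

Halving the depth doubles the projected offset, which is the zoom effect a positive Z translation produces in this setup.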
The rotation matrix can be broken down into the product of rotation matrices on the different axes. In a 3D space we have 3 rotation matrices.
```python
# calculate cos and sin of angles
sin_rx, cos_rx = np.sin(theta_rx), np.cos(theta_rx)
sin_ry, cos_ry = np.sin(theta_ry), np.cos(theta_ry)
sin_rz, cos_rz = np.sin(theta_rz), np.cos(theta_rz)
# get the rotation matrix on x axis
R_Mx = np.array([[1,      0,       0, 0],
                 [0, cos_rx, -sin_rx, 0],
                 [0, sin_rx,  cos_rx, 0],
                 [0,      0,       0, 1]])
# get the rotation matrix on y axis
R_My = np.array([[cos_ry, 0, -sin_ry, 0],
                 [     0, 1,       0, 0],
                 [sin_ry, 0,  cos_ry, 0],
                 [     0, 0,       0, 1]])
# get the rotation matrix on z axis
R_Mz = np.array([[cos_rz, -sin_rz, 0, 0],
                 [sin_rz,  cos_rz, 0, 0],
                 [     0,       0, 1, 0],
                 [     0,       0, 0, 1]])
# compute the full rotation matrix
R_M = np.dot(np.dot(R_Mx, R_My), R_Mz)
```
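A quick sanity check on a rotation matrix can help catch sign mistakes. The sketch below uses a hypothetical helper `rotation_z` with the same layout as `R_Mz` above and checks two standard properties: a 90° rotation sends the X axis onto the Y axis, and the transpose of a rotation matrix is its inverse.

```python
import numpy as np


def rotation_z(angle_deg):
    """4x4 rotation around the Z axis, same layout as R_Mz (illustrative helper)."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])


R = rotation_z(90)
rotated = R @ np.array([1.0, 0.0, 0.0, 1.0])       # close to (0, 1, 0, 1)
is_orthonormal = np.allclose(R @ R.T, np.eye(4))   # transpose acts as inverse
back = rotation_z(-90) @ rotated                    # close to (1, 0, 0, 1)
```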
The Scaling matrix, to put it simply, imitates a zoom. Taking our image example again, a zoom by two on X or Y stretches the image by that factor along the axis (while looking at the image from the same distance). As for the Z-axis, a zoom by two can be interpreted as the combination of a zoom by two on X and on Y. But I prefer to consider that there is no zoom on X and Y and that we simply look at the image from half the distance.
Note that you can mirror the image along an axis by passing a value of -1.
```python
# get the scaling matrix
Sc_M = np.array([[sc_x,    0,    0, 0],
                 [   0, sc_y,    0, 0],
                 [   0,    0, sc_z, 0],
                 [   0,    0,    0, 1]])
```
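As a small illustration of zooming and mirroring, the sketch below applies a scaling matrix directly to one point. `scaling_matrix` is a hypothetical helper; the point is assumed to be measured from the image center, and the inversion of the factors done in the transformation function is left out here.

```python
import numpy as np


def scaling_matrix(sc_x, sc_y, sc_z):
    """4x4 homogeneous scaling matrix (illustrative helper)."""
    return np.diag([sc_x, sc_y, sc_z, 1.0])


corner = np.array([30.0, 20.0, 1.0, 1.0])       # offset from the image center

zoomed = scaling_matrix(2, 2, 1) @ corner       # (60, 40, 1, 1)
mirrored = scaling_matrix(-1, 1, 1) @ corner    # (-30, 20, 1, 1)
```

Because the origin sits at the image center, a factor of -1 on X flips the point to the other side of the vertical axis through the center.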
The shearing matrix makes it possible to slant (shear) the image along the different axes.
In 3D, the shearing matrix therefore breaks down into shearing matrices on the 3 axes.
```python
# get the tan of angles
tan_shx = np.tan(theta_shx)
tan_shy = np.tan(theta_shy)
tan_shz = np.tan(theta_shz)
# get the shearing matrix on x axis
Sh_Mx = np.array([[      1, 0, 0, 0],
                  [tan_shy, 1, 0, 0],
                  [tan_shz, 0, 1, 0],
                  [      0, 0, 0, 1]])
# get the shearing matrix on y axis
Sh_My = np.array([[1, tan_shx, 0, 0],
                  [0,       1, 0, 0],
                  [0, tan_shz, 1, 0],
                  [0,       0, 0, 1]])
# get the shearing matrix on z axis
Sh_Mz = np.array([[1, 0, tan_shx, 0],
                  [0, 1, tan_shy, 0],
                  [0, 0,       1, 0],
                  [0, 0,       0, 1]])
# compute the full shearing matrix
Sh_M = np.dot(np.dot(Sh_Mx, Sh_My), Sh_Mz)
```
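To make the effect of a shear concrete, here is a minimal sketch with a single shear term, the one that shifts X in proportion to Y (the `tan_shx` entry of `Sh_My` above). The helper name `shear_x_along_y` is hypothetical.

```python
import numpy as np


def shear_x_along_y(angle_deg):
    """Shift x proportionally to y by tan(angle) (illustrative helper)."""
    t = np.tan(np.deg2rad(angle_deg))
    return np.array([[1, t, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=float)


Sh = shear_x_along_y(45)                         # tan(45°) is about 1
top = Sh @ np.array([0.0, 10.0, 0.0, 1.0])       # slides to about (10, 10, 0, 1)
base = Sh @ np.array([5.0, 0.0, 0.0, 1.0])       # stays at (5, 0, 0, 1)
```

Points on the line y = 0 do not move, while points further from it slide further along X, which is what turns a rectangle into a parallelogram.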
Create some sliders
We then have all the necessary matrices to transform our image. It would be interesting to be able to visualize their effect in real time on an image of our choice. Let's create a slider.
```python
class Slider:
    """Slider with cursor to get/set a value"""

    def __init__(self, name, shape, min_value, max_value, starting_value=None):
        """
        Init the slider
        :param name: name of the slider
        :param shape: shape (height, width) of the image describing the slider
        :param min_value: minimum value of the slider
        :param max_value: maximum value of the slider
        :param starting_value: starting value of the cursor
        """
        self.name = name
        self.shape = shape
        self.min_value = min_value
        self.max_value = max_value
        self.value = starting_value
        # set the cursor to the middle of the slider by default
        if self.value is None:
            self.value = (self.min_value + self.max_value) / 2

    def set_cursor_from_image_coord(self, x):
        """
        Set the value of the slider from an x-coordinate on its image
        :param x: coordinate of the new value in range [0, width_of_image]
        """
        # normalize the x-coordinate value according to the image width
        x_normalize = x / self.shape[1]
        # rescale it according to its minimum / maximum value
        x_rescale = x_normalize * (self.max_value - self.min_value) + self.min_value
        # check it does not go outside the slider
        if self.min_value < x_rescale <= self.max_value:
            self.value = x_rescale

    def get_image(self):
        """
        Create an image to describe the state of the slider
        :return: image of the slider
        """
        height, width = self.shape
        # set the background to black
        slider_img = np.zeros((*self.shape, 3))
        # put its name, its minimum value and its maximum value on the image
        cv2.putText(slider_img, self.name, (0, 20),
                    cv2.FONT_HERSHEY_SIMPLEX, .55, (1, 1, 1), 2, cv2.LINE_AA)
        cv2.putText(slider_img, str(self.min_value), (50, 80),
                    cv2.FONT_HERSHEY_SIMPLEX, .55, (1, 1, 1), 2, cv2.LINE_AA)
        cv2.putText(slider_img, str(self.max_value), (width - 100, 80),
                    cv2.FONT_HERSHEY_SIMPLEX, .55, (1, 1, 1), 2, cv2.LINE_AA)

        # draw the slider as a centered rectangle in the image
        slider_pt1 = (1 * width // 20, 4 * height // 10)
        slider_pt2 = (19 * width // 20, 6 * height // 10)
        cv2.rectangle(slider_img, slider_pt1, slider_pt2, (0.6, 0.6, 0.6), 5)
        cv2.rectangle(slider_img, slider_pt1, slider_pt2, (0.8, 0.8, 0.8), -1)

        # get the value of the slider and rescale it to set the cursor
        # at its right place according to the slider on the image
        x_normalize = (self.value - self.min_value) / (self.max_value - self.min_value)
        x = width // 20 + 18 * width // 20 * x_normalize
        x = int(x)

        # draw the cursor as a rectangle too
        slider_pt1 = (x - width // 50, 2 * height // 10)
        slider_pt2 = (x + width // 50, 8 * height // 10)
        cv2.rectangle(slider_img, slider_pt1, slider_pt2, (0.6, 0.6, 0.6), 5)
        cv2.rectangle(slider_img, slider_pt1, slider_pt2, (0.8, 0.8, 0.8), -1)
        return slider_img
```
At this point the slider is only a static image; it cannot be interacted with yet.
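The core of the slider is a linear mapping between a pixel column and a value. Here is that mapping on its own, as a minimal sketch; `pixel_to_value` and `value_to_pixel` are hypothetical helpers that use the full image width and ignore the small margins drawn around the slider.

```python
def pixel_to_value(x, width, min_value, max_value):
    """Map a pixel column in [0, width] to a value in [min_value, max_value]."""
    return x / width * (max_value - min_value) + min_value


def value_to_pixel(value, width, min_value, max_value):
    """Inverse mapping, rounded down to an integer pixel column."""
    return int((value - min_value) / (max_value - min_value) * width)


# a rotation slider from -180 to 180 degrees drawn on a 1000-pixel-wide image
middle = pixel_to_value(500, 1000, -180, 180)    # 0.0
quarter = pixel_to_value(250, 1000, -180, 180)   # -90.0
column = value_to_pixel(-90, 1000, -180, 180)    # 250
```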
Stack the sliders and Interactions
Let's create a stacker of these sliders so we can control them at the same time with the mouse.
```python
class SliderStacker():
    def __init__(self, sliders):
        """
        Initialize the stacker
        :param sliders: list of the sliders to stack
        """
        # get the sliders and their height assuming
        # they all have the same shape
        self.sliders = sliders
        self.slider_height = self.sliders[0].shape[0]
        # create attributes to remember if the left mouse button is pressed
        # and which slider is being dragged
        self.l_pressed = False
        self.slider_index = 0
        # Create a window to interact with the sliders
        cv2.namedWindow('Sliders', cv2.WINDOW_NORMAL)
        cv2.setMouseCallback('Sliders', self.mouse_event)

    def mouse_event(self, event, x, y, flags, param):
        """
        Allow the user to manually change the slider value
        :param event: event raised from the mouse
        :param x: x coordinate of the mouse at the event time
        :param y: y coordinate of the mouse at the event time
        :param flags: flags of the event
        :param param: param of the event
        """
        # get the index of the slider where the mouse is and its y-coordinates
        index, y = y // self.slider_height, y % self.slider_height
        # If the left click is pressed, get the slider index
        if event == cv2.EVENT_LBUTTONDOWN:
            self.l_pressed = True
            self.slider_index = index
            # set the new value according to the x-coordinate of the mouse
            self.sliders[self.slider_index].set_cursor_from_image_coord(x)
        # If the mouse is moving while dragging a cursor, set its new position
        elif event == cv2.EVENT_MOUSEMOVE and self.l_pressed:
            self.sliders[self.slider_index].set_cursor_from_image_coord(x)
        # If the left click is released
        elif event == cv2.EVENT_LBUTTONUP:
            self.l_pressed = False

    def update(self):
        """Update the frame to display the current state of the sliders"""
        # get each slider image (np.vstack expects a sequence, not a generator)
        img_sliders = [s.get_image() for s in self.sliders]
        # display stacking them vertically
        cv2.imshow('Sliders', np.vstack(img_sliders))
```
Do not forget that the order of the matrices that make up the transformation matrix is important. I invite you to change this order to see for yourselves.
```python
if __name__ == '__main__':
    img_input = cv2.imread('logo.jpg')
    # cv2.imread returns None when the file cannot be read
    if img_input is None:
        raise FileNotFoundError("could not read 'logo.jpg'")
    cv2.namedWindow('Rendering', cv2.WINDOW_NORMAL)

    slider_shape = (100, 1000)
    h, w, _ = img_input.shape

    sliders = []
    sliders.append(Slider('T(x)', slider_shape, - h, h))
    sliders.append(Slider('T(y)', slider_shape, - w, w))
    sliders.append(Slider('T(z)', slider_shape, - 2 * (h + w), 2 * (h + w)))
    sliders.append(Slider('R(x)', slider_shape, -180, 180))
    sliders.append(Slider('R(y)', slider_shape, -180, 180))
    sliders.append(Slider('R(z)', slider_shape, -180, 180))
    sliders.append(Slider('Sc(x)', slider_shape, -2, 2, starting_value=1))
    sliders.append(Slider('Sc(y)', slider_shape, -2, 2, starting_value=1))
    sliders.append(Slider('Sc(z)', slider_shape, -2, 2, starting_value=1))
    sliders.append(Slider('Sh(x)', slider_shape, -180, 180))
    sliders.append(Slider('Sh(y)', slider_shape, -180, 180))
    sliders.append(Slider('Sh(z)', slider_shape, -180, 180))

    params = SliderStacker(sliders)

    while True:
        slider_values = [s.value for s in params.sliders]
        translation = tuple(slider_values[:3])
        rotation = tuple(slider_values[3:6])
        scaling = tuple(slider_values[6:9])
        shearing = tuple(slider_values[9:])

        params.update()
        # pass the arguments in the order expected by transform()
        img_output = transform(img_input, translation, rotation, scaling, shearing)
        cv2.imshow('Rendering', img_output)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cv2.destroyAllWindows()
```