How to implement a Deep Learning dataset annotation platform? (2/2): Dynamic interface with OpenCV


Implement a platform / interface to manually annotate a Deep Learning dataset. Make the interface dy

  • Creation: 01/30/2020
  • Update: 08/26/2020

Part 2: Making an OpenCV window dynamic

In the first part, we designed the static structure of our interface for segmenting a dataset. The program can be launched from the shell:

$ python path/to/image/folder path/to/target/folder path/to/config.csv

Here is a quick reminder of what we have already done and what remains:

  • import the input images
  • import their equivalent target if it already exists
  • save
  • be adaptable according to a configuration file of the different classes (config.csv)
  • inform us of the image and the class we are treating
  • allow a global rendering of the input image and the modifications on its associated target in real time
  • modify the targets according to the drawing shapes
  • erase them with an eraser in the event of an error
  • be controllable from a keyboard and mouse
  • as a bonus: being able to zoom in to be more precise

    def __init__(self, x_save_dir, y_save_dir, config_save_path):
        # Here are the last attributes for the configuration
        # Init the variables to manually interact with the window
        # reference point memory list
        self.ref_p = []
        # activation of the eraser
        self.brush = False
        # brush size to erase manually
        self.brush_size = 20

        # current mouse position
        self.mouse_pos = [0, 0]
        # left mouse button is pressed
        self.l_pressed = False
        # current zoom factor (1 means no zoom)
        self.zoom_factor = 3
        # Set the window as normal
        cv2.namedWindow("GUI", cv2.WINDOW_NORMAL)
        cv2.setMouseCallback("GUI", self.mouse_event)
        # Set the window in full screen
        cv2.setWindowProperty("GUI", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)

    def mouse_event(self, *args):
        """Let the user manually interact
           with the images and their targets"""

For this second part of the tutorial, we first need a few new attributes. These are the last ones we will have to add. They keep in memory the positions of the successive clicks, the state of the eraser (on or off), the zoom factor, and the position of the mouse in real time. Finally, we create an OpenCV window that calls the mouse_event() function at each mouse event. We will implement this function later.

Interact with keyboard keys

The purpose of this section is to design the interaction between the interface and the user through the keyboard. We start by defining the key bindings at the top of our Python script.

keys = {
    'next_channel': ord('e'),      # ASCII number = 101
    'previous_channel': ord('d'),  # ASCII number = 100
    'next_image': ord('f'),        # ASCII number = 102
    'previous_image': ord('s'),    # ASCII number = 115
    'zoom': ord('z'),              # ASCII number = 122
    'validate': 13,                # ASCII number = 13 (Enter)
    'undo': ord('u'),              # ASCII number = 117
    'brush': ord('b'),             # ASCII number = 98
    'delete': 8,                   # ASCII number = 8 (Backspace)
    'quit': ord('q')               # ASCII number = 113
}

class objectview(object):
    def __init__(self, d):
        self.__dict__ = d

# to transform the dictionary into an object
keys = objectview(keys)

Here you can redefine the commands as you like. I made these choices because I use an AZERTY keyboard; feel free to change them if they don't suit you. Each key is identified by an integer between 0 and 255 following the ASCII table. You will notice that I have transformed the dictionary into an object. This is done for clarity during implementation: we can access the next_image value by writing "keys.next_image" instead of "keys['next_image']".
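
As an aside, the standard library offers a similar dictionary-to-object conversion. The sketch below uses types.SimpleNamespace in place of the objectview class. The bindings dictionary is a shortened, illustrative copy of the one above, and this is only a possible alternative, not what the tutorial's code uses.

```python
from types import SimpleNamespace

# A shortened copy of the key bindings, kept as a plain dictionary
bindings = {
    "next_image": ord("f"),
    "previous_image": ord("s"),
    "validate": 13,  # Enter (carriage return)
    "delete": 8,     # Backspace
}

# SimpleNamespace exposes each dictionary entry as an attribute
keys = SimpleNamespace(**bindings)

print(keys.next_image)  # same value as bindings["next_image"]
print(keys.validate)
```

Either approach gives the same attribute-style access; the objectview class simply avoids the extra import.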

All the actions assigned to the keyboard keys are handled in the run() function, which can be rewritten as below. It lists the commands triggered by each key. The functions it calls will be implemented afterwards.

    def run(self):
        while True:
            # Display the current gui frame
            cv2.imshow("GUI", self.get_frame())
            # Continuously wait for a pressed key
            key = cv2.waitKey(1) & 0xFF
            # cycle through the zoom factors (a factor of 1 disables the zoom)
            if key == keys.zoom:
                self.zoom_factor = 1 + self.zoom_factor % 5
            # if the enter key is pressed for an undefined contour target
            # draw it from the reference points
            if key == keys.validate and not self.shapes[]:
                self.set_poly()
            # remove the last reference point by pressing the undo key
            elif key == keys.undo:
                self.ref_p = self.ref_p[:-1]
            # activate/deactivate the brush eraser
            elif key == keys.brush:
                self.brush = not self.brush
            # go to the next image
            elif key == keys.next_image:
                self.update_image(1)
            # go to the previous image
            elif key == keys.previous_image:
                self.update_image(-1)
            # go to the next channel
            elif key == keys.next_channel:
                self.update_channel(1)
            # go to the previous channel
            elif key == keys.previous_channel:
                self.update_channel(-1)
            # delete the current image and its target
            elif key == keys.delete:
            # exit the gui if the 'q' key is pressed
            elif key == keys.quit:
                # remember to save before quitting
                break

We now have the basis for calling the methods that switch from one image to another or from one class to another, activate or deactivate the eraser, change the zoom, and so on.
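
To make the zoom and undo branches more concrete, here is a minimal sketch that isolates them as plain functions, so they can be tried without opening a window. The names next_zoom, undo_last_point and the raw_key value are hypothetical and only serve the illustration.

```python
def next_zoom(zoom_factor, max_zoom=5):
    """Cycle the zoom factor through 1, 2, ..., max_zoom and back to 1."""
    return 1 + zoom_factor % max_zoom


def undo_last_point(ref_p):
    """Return the reference points without the most recent one."""
    # Slicing an empty list is safe: [][:-1] == []
    return ref_p[:-1]


# Hypothetical raw key code with extra high bits set;
# the 0xFF mask keeps only the low byte, as in run().
raw_key = 0x100000 | ord("z")
key = raw_key & 0xFF

# Press the zoom key six times, starting from the default factor of 3
zoom = 3
steps = []
for _ in range(6):
    zoom = next_zoom(zoom)
    steps.append(zoom)

print(steps)
print(undo_last_point([[10, 20], [30, 40]]))
```

Because slicing an empty list returns an empty list, the undo branch does not need a separate length check.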

All we need to do is to implement the mouse_event() function to have all the requested interactions. Then we will be able to implement the actions to be performed depending on the interaction.


    def mouse_event(self, event, x, y, flags, param):
        """
        Let the user manually interact with the images and their targets
        :param event: event raised from the mouse
        :param x: x coordinate of the mouse at the event time
        :param y: y coordinate of the mouse at the event time
        :param flags: flags of the event
        :param param: param of the event
        """
        # mouse move
        if event == cv2.EVENT_MOUSEMOVE:
            # update mouse position
            self.mouse_pos = [x, y]
            # check if the left button is pressed to have an action
            if self.l_pressed:
                # erase the activated pixels in the channel
                if self.brush:
                    self.set_brush_eraser()
                # else draw the pixel if it is a `pixel by pixel` draw shape
                elif self.shapes[] == 1:
                    self.set_pixel()
        # left button released
        if event == cv2.EVENT_LBUTTONUP:
            self.l_pressed = False
        # left button pressed
        if event == cv2.EVENT_LBUTTONDOWN:
            self.l_pressed = True
            # check if the brush eraser is activated
            if self.brush:
                self.set_brush_eraser()
            # check if the click is inside the image
            elif 0 <= x < self.X.shape[1] and 0 <= y < self.X.shape[0]:
                # check if it is a pixel by pixel draw shape
                if self.shapes[] == 1:
                    self.set_pixel()
                else:
                    # append the current point to the history
                    self.ref_p.append([x, y])
                    # check if the channel has a fixed number of reference points
                    if self.shapes[]:
                        # check if the channel is a circle draw shape
                        # and if the two points are given
                        if self.shapes[] == len(self.ref_p) == 2:
                            self.set_circle()
                        # else wait to reach the number of reference points
                        # given by the shape of the channel
                        elif self.shapes[] == len(self.ref_p):
                            self.set_poly()
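
The decision taken on a left click can also be read as a small pure function. The sketch below is only an illustration: the click_action name and its return values are hypothetical, and the meaning of the shape codes (0, 1, 2, n) is inferred from the comments in the code above rather than from the configuration file itself.

```python
def click_action(shape, ref_p, point):
    """Decide what a left click does for a given channel shape code.

    Assumed shape codes (inferred from the tutorial's comments):
      0 -> polygon with an unlimited number of points (closed with Enter)
      1 -> pixel-by-pixel drawing
      2 -> circle defined by the two ends of a diameter
      n -> polygon closed automatically after n points (n > 2)
    Returns the name of the action and the updated list of points.
    """
    if shape == 1:
        # Pixel mode never stores reference points
        return "set_pixel", ref_p
    points = ref_p + [list(point)]
    if shape == 0:
        # Unlimited contour: keep collecting until the user validates
        return "wait", points
    if shape == 2 and len(points) == 2:
        return "set_circle", points
    if shape == len(points):
        return "set_poly", points
    return "wait", points


print(click_action(2, [[0, 0]], (4, 0)))
print(click_action(4, [[0, 0], [1, 0]], (1, 1)))
```

Keeping the decision separate from the drawing makes it easy to check each shape type without a mouse or a window.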

This method mainly lets us draw our targets according to the type of shape that defines the class. It checks whether the eraser is activated and whether the click falls inside the image. Speaking of the eraser, it would be nice to see its location when it is activated, as well as the zoom window and the reference points. We just need to add the following code to the get_frame() function, before concatenating the images.

        # get the associated color to the current channel
        color = self.hex2tuple(self.colors[], normalize=True)

        # draw each reference point
        for pt in self.ref_p:
            cv2.circle(x_img, tuple(pt), 2, color, cv2.FILLED)

        # draw the diameter of the circle if it is the channel mode shape
        # and there is one ref point
        if self.shapes[] == 2 and len(self.ref_p):
            self.draw_circle_visualization(x_img, y_img, color)
        # draw the polygon if it is the channel mode shape
        elif self.shapes[] != 1 and len(self.ref_p):
            self.draw_poly_visualization(x_img, y_img, color)

        # draw the brush if the eraser is activated
        if self.brush:
            self.draw_brush(x_img, y_img, color)

        # if the zoom is activated
        if self.zoom_factor > 1:
            self.draw_zoom_window(x_img, y_img)

What about the other stuff?

If you have copied and pasted the above lines of code and tried to start the program, you will have noticed that some functions are not yet defined. Here they are! Once these stubs are in place, you can already place reference points wherever you want and delete them with the undo command (in my case, the 'u' key).

The rest of the methods will perform all the actions indicated by the user. Luckily, they are very short and quick to implement.

    def update_image(self, *args):
        """Update the current image"""

    def update_channel(self, *args):
        """Update the current channel"""

    def set_pixel(self, *args):
        """Set value to 1 at the mouse position
        in the case of `1` draw shape"""

    def set_poly(self, *args):
        """Draw a polygon from the points given as its contour"""

    def set_circle(self, *args):
        """Draw a circle with the two reference points in memory
           considering they give us the diameter of the circle"""

    def set_brush_eraser(self, *args):
        """Erase the target image
           according to the brush size and the mouse position"""

    def draw_poly_visualization(self, *args):
        """Draw the lines in the case of polygons to visualize them"""

    def draw_circle_visualization(self, *args):
        """Draw the circle in the case of the `2` draw type to visualize
        the circle before setting the 2nd and last reference point"""

    def draw_brush(self, *args):
        """Draw the brush if activated on the images"""

    def draw_zoom_window(self, *args):
        """ Draw the zoom window if activated on the input image
            and draw its position on the target image"""

Let's start by being able to move from one image or class to another with the keyboard.

    def update_image(self, increment=0):
        # save the previous one and its target
        # update the index of the image
        self.n = (self.n + increment) % len(self.X_paths)
        # load the new current image
        # remove the potential reference points
        self.ref_p = []

    def update_channel(self, increment=0):
        # update the index of the current channel = ( + increment) % self.n_class
        # remove the potential references points
        self.ref_p = []

We can now use our keyboard to move through our image folder and browse the different classes.
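
The wrap-around relies on how Python's modulo operator handles negative numbers. The following minimal sketch shows it in isolation; the step_index helper and the file names are hypothetical and are not part of the tutorial's class.

```python
def step_index(current, increment, length):
    """Move an index forward or backward, wrapping around at both ends."""
    # For a positive length, Python's % always returns a value in [0, length),
    # so stepping back from index 0 lands on the last element.
    return (current + increment) % length


paths = ["img_0.png", "img_1.png", "img_2.png"]  # hypothetical file names

n = 0
n = step_index(n, -1, len(paths))  # previous image, starting from the first
print(paths[n])
n = step_index(n, +1, len(paths))  # next image, starting from the last
print(paths[n])
```

With the default increment of 0, the index is unchanged, which presumably lets the same method reload the current image.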

Let's draw!


The config.csv file that we saw in the first part specifies a type of shape for each class, so we need to define a drawing method for each type: pixel by pixel, with circles, or with polygons.

    def set_pixel(self):
        # draw the pixel at the mouse position
        x, y = self.mouse_pos
        self.Y[y, x,] = 1

    def set_poly(self):
        if len(self.ref_p) > 1:
            # create a new mask
            mask = np.zeros((*self.Y.shape[:2], 3))
            # get the contour points (cv2.fillPoly expects 32-bit integers)
            pts = np.array([[pt] for pt in self.ref_p], dtype=np.int32)
            # fill the polygon in the mask, then keep a single channel
            cv2.fillPoly(mask, pts=[pts], color=(1, 1, 1))
            mask = mask[:, :, 0]
            # merge the polygon with the pixels already set in the target
            mask = [[1 * (value or mask[i, j])
                     for j, value in enumerate(line)]
                    for i, line in enumerate(self.Y[:, :,])]
            # update the segmented target
            self.Y[:, :,] = np.array(mask)
            # remove the points
            self.ref_p = []

    def set_circle(self):
        # get the two references points in memory
        p1, p2 = self.ref_p
        # get the center of the circle and its radius from the two
        # selected points, taken as the two ends of a diameter
        cx = (p1[0] + p2[0]) // 2
        cy = (p1[1] + p2[1]) // 2
        r = ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5 / 2
        # get the mask of the circle
        mask = [[1 * (value or ((j - cx) ** 2 + (i - cy) ** 2 < r ** 2))
                    for j, value in enumerate(line)]
                        for i, line in enumerate(self.Y[:, :,])]
        # draw the circle
        self.Y[:, :,] = np.array(mask)
        # remove the points
        self.ref_p = []

    def draw_poly_visualization(self, x_img, y_img, color):
        # draw a line between the first reference point and the mouse pos
        cv2.line(x_img, tuple(self.mouse_pos), tuple(self.ref_p[0]), color, 1)
        cv2.line(y_img, tuple(self.mouse_pos), tuple(self.ref_p[0]), color, 1)
        # draw a line between the reference points
        for i in range(len(self.ref_p) - 1):
            cv2.line(x_img, tuple(self.ref_p[i]), tuple(self.ref_p[i + 1]), color, 1)
            cv2.line(y_img, tuple(self.ref_p[i]), tuple(self.ref_p[i + 1]), color, 1)
        # draw a line between the last reference point and the mouse pos
        cv2.line(x_img, tuple(self.mouse_pos), tuple(self.ref_p[-1]), color, 1)
        cv2.line(y_img, tuple(self.mouse_pos), tuple(self.ref_p[-1]), color, 1)

    def draw_circle_visualization(self, x_img, y_img, color):
        p1, p2 = self.ref_p[0], self.mouse_pos
        # get the center of the circle and its radius from the two
        # points, taken as the two ends of a diameter
        cx = (p1[0] + p2[0]) // 2
        cy = (p1[1] + p2[1]) // 2
        r = ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5 / 2
        # draw the circle to visualize the rendered segmented circle
        cv2.circle(x_img, (cx, cy), int(r), color, 1)
        cv2.circle(y_img, (cx, cy), int(r), color, 1)

All of these functions modify the target, with the exception of the last two. The draw_circle_visualization() and draw_poly_visualization() functions do not alter the target: they draw on the colored image in get_frame(), which gives us a preview of the shape formed by the reference points already placed and the current mouse position.
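
The nested list comprehensions used to build the masks visit every pixel in pure Python, which can become slow on large images. As a possible alternative, here is a minimal vectorized sketch of the circle case with NumPy. The circle_from_diameter and add_circle helpers, and the small 7x7 target, are hypothetical; the sketch assumes a single 2D channel of the target and is not the tutorial's implementation.

```python
import numpy as np


def circle_from_diameter(p1, p2):
    """Return the center (cx, cy) and radius of the circle with diameter p1-p2."""
    cx = (p1[0] + p2[0]) // 2
    cy = (p1[1] + p2[1]) // 2
    r = ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5 / 2
    return cx, cy, r


def add_circle(channel, p1, p2):
    """Return a copy of a 2D target channel with the disk set to 1."""
    cx, cy, r = circle_from_diameter(p1, p2)
    # Row (i) and column (j) coordinates, broadcast to the channel's shape
    i, j = np.ogrid[:channel.shape[0], :channel.shape[1]]
    inside = (j - cx) ** 2 + (i - cy) ** 2 < r ** 2
    # Keep the pixels that were already set and add those inside the disk
    return np.where(inside, 1, channel)


target = np.zeros((7, 7), dtype=np.uint8)  # hypothetical single-channel target
target[0, 0] = 1                           # a pixel annotated earlier
result = add_circle(target, (1, 3), (5, 3))
print(result)
```

The same idea applies to polygons: once cv2.fillPoly has produced a binary mask, it can be merged into the channel with a single array operation instead of a double loop.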

We are now ready to segment our image.