How to implement a Deep Learning dataset annotation platform? (1/2): Creating the interface structure


Implement a platform/interface to manually annotate a Deep Learning dataset. Design of the interface.

  • Creation: 01/29/2020
  • Update: 02/24/2020

Part 1: Design of an interface

In this part we will create the static structure of a Deep Learning dataset annotation interface. Then we will make this structure interactive in the next part. If you don't have an image dataset at hand I invite you to download the project folder.

Annotations and segmentation

Annotating a dataset means giving meaning to raw data, a meaning the machine can understand. This allows it to learn from the annotated (or labeled) data. Let's take a few examples to make the term annotation less confusing.

Different types of annotation exist. With a dataset of cat images, for example, each of them could be annotated (or labeled):

  •  By 0 or 1: contains or does not contain a cat
  • By the coordinates (x1, y1) and (x2, y2): respectively the top-left and bottom-right corners of the box containing the cat in the image
  • Pixel by pixel with a value of 0 or 1: answers the question "Is this pixel a cat pixel?"

In the last case, the annotation has pixel-level precision. This is called segmentation. This labeled data is called the targets.
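To make these three annotation types concrete, here is a minimal Python sketch (the image size, box coordinates and mask values are purely illustrative):

```python
import numpy as np

# 1. image-level label: 1 if the image contains a cat, 0 otherwise
label = 1

# 2. bounding box: top-left (x1, y1) and bottom-right (x2, y2) corners
box = ((12, 8), (140, 96))

# 3. segmentation: one 0/1 value per pixel of a (height, width) image
mask = np.zeros((96, 128), dtype=bool)
mask[40:60, 50:80] = True  # mark the pixels that belong to the cat
```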

This tutorial aims to design a graphical interface with OpenCV and Python to label image segmentation targets. But whatever the type of annotation, what is expected from a dataset annotation interface remains similar.

Segmentation by shape

Let's get to the point. We will implement our interface from a config.csv file that defines how each class (or channel) will be treated. By class, we mean category. Throughout this tutorial, I will build the interface with the segmentation of a billiard game as an example. The different classes are then the billiard cue, the red ball, the yellow ball and the black ball (I didn't have a white ball available, but we would have had an extra class). My config.csv file is the following:
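The original file was shown as an image, so here is an illustrative reconstruction matching the columns described below; the exact colors and shape codes are my assumptions (for instance, the cue as a 4-sided polygon):

```csv
class,bgr_color,shape
cue,#00ff00,4
red_ball,#0000ff,2
yellow_ball,#00ffff,2
black_ball,#000000,2
```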

This file is composed as follows:

  • class column: name of the defined class.
  • bgr_color column: color of the class, in hexadecimal BGR format, with which we want it to be represented.
  • shape column: shape used to draw the class:
      • 0: a polygon whose number of sides is not known in advance
      • 1: drawn pixel by pixel
      • 2: a circle drawn from two points that define one of its diameters
      • 3 or more: a polygon whose number of sides is known in advance

The number we give to each class is the number of reference points we will have to indicate with our interface. In other words, for a ball (shape = 2), we will draw it by clicking on the two ends of its diameter in the image. If we don't know the number of sides of the polygon in advance (shape = 0), we validate our clicks by pressing the Enter key. In the other cases, the shape is validated automatically once the right number of clicks has been made. We will come back to this in the second part of this tutorial. To begin, let's tackle the global structure of our interface.

Global structure of our interface

Before diving into the implementation, let's take a quick look at what we expect from this interface. To create a dataset with annotated targets, it must be able to:

  • import the input images
  • import their equivalent targets if they already exist
  • save the targets
  • be adaptable according to a configuration file of the different classes (config.csv)
  • modify the targets according to the drawing shapes
  • erase them in the event of an error
  • inform us of the image and the class currently being processed
  • allow a global rendering of the input image and of the modifications on its associated target in real time
  • be controllable from a keyboard and mouse
  • as a bonus: zoom in to be more precise

Our interface will be in the form of a class and will contain the methods below. We will complete these methods one by one during this part to have a fully functional interface structure. We will also complete the necessary function arguments at the same time.

import os
import glob
import argparse
import cv2
import numpy as np
parser = argparse.ArgumentParser()
parser.add_argument('x_save_dir', help='Train images directory')
parser.add_argument('y_save_dir', help='Target images directory')
parser.add_argument('config_path', help='Path to csv config file')
args = parser.parse_args()

class ManualSegmentation:
    """Class to manually segment the dataset"""

    def __init__(self, *args):
        """Init the manual segmentation GUI"""

    def load_config(self, *args):
        """Load the config of the targets if the path is given"""

    def load(self, *args):
        """Load the current image and its target"""

    def save(self, *args):
        """Save the current target matrix"""

    def delete(self, *args):
        """Delete the current image and its target"""

    def hex2tuple(*args):
        """Convert an hexadecimal color to its associated tuple"""

    def get_x_image(self, *args):
        """Get the normalized training image"""

    def get_y_image(self, *args):
        """Get the colorized target as an image"""

    def get_params_image(self, *args):
        """Build the parameters bar to display information
        at the bottom of the GUI"""

    def get_frame(self, *args):
        """Get the GUI frame"""

    def run(self, *args):
        """Run the GUI until quit"""

if __name__ == "__main__":
    ms = ManualSegmentation(args.x_save_dir, args.y_save_dir, args.config_path)

Import the configurations

First we'll need to initialize our interface.

    def __init__(self, x_save_dir, y_save_dir, config_save_path):
        # number of classes
        self.n_class = None
        # drawing shapes of the different classes
        self.shapes = None
        # hexadecimal BGR colors of the different classes
        self.colors = None
        # names of the different classes
        self.class_names = None
        # load the config file to fill the attributes above
        self.load_config(config_save_path)

To process this configuration data, we need to create some attributes. We need to know the total number of classes that our target will have. We also need to know their names, their typical shapes and the color associated with them. Finally, we can load these configurations from our config.csv file.

    def load_config(self, path):
        if os.path.exists(path):
            # get config of the targets as a str matrix
            config = np.genfromtxt(
                path, delimiter=",", dtype=str, comments=None
            )
            # convert to a dictionary
            config = {
                column[0]: np.array(column[1:])
                for column in zip(*config)
            }
            # load the shapes as integers
            self.shapes = np.array([int(value) for value in config['shape']])
            # load the hexadecimal BGR colors
            self.colors = config['bgr_color']
            # load the names of the different classes
            self.class_names = config['class']
            # get the number of classes
            self.n_class = len(self.class_names)
        else:
            print(f"The path {path} does not exist")

As we have seen, we did not instantiate the object's attributes in the __init__() function, since they are instantiated when the configuration file is imported. It is important to change the comments argument of the numpy.genfromtxt() function: by default, comments are indicated by a #, so there would have been a reading problem with the BGR colors, which are also prefixed by a #. Then, we instantiate our attributes as data arrays, depending on the configuration.
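The effect of comments=None can be checked on a small in-memory example (the row content here is illustrative):

```python
import io
import numpy as np

data = "class,bgr_color,shape\nred_ball,#0000ff,2"

# with the default comments="#", everything after the # would be dropped;
# comments=None keeps the hexadecimal colors intact
config = np.genfromtxt(io.StringIO(data), delimiter=",", dtype=str, comments=None)
print(config[1][1])  # "#0000ff"
```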

Import and save our data

With our configurations well defined, we can then start importing our data. So we will need a few more attributes to complete our __init__() function.

    def __init__(self, x_save_dir, y_save_dir, config_save_path):
        # [...] here the configuration attributes seen above
        # paths of the training images sorted by their ID number
        self.X_paths = glob.glob(os.path.join(x_save_dir, "*.jpg"))
        self.X_paths = sorted(self.X_paths, key=lambda k: (len(k), k))
        # directory of the targets
        self.Y_dir = y_save_dir
        # current index in the paths list
        self.n = 0
        # current channel of the target matrix
        self.channel = 0
        # current X (training) and Y (target) to deal with
        self.X, self.Y = None, None
        # load a training image and its target
        self.load()

We'll never import all the data at once. In the simple case where we end up with 4 GB of images, we can't afford to load everything into one variable. So we will load them one by one using their paths. In the same way as for the configuration attributes, we can't instantiate all our attributes in __init__(): self.X and self.Y will be instantiated in our load() function. These two variables correspond to the input image that will be displayed on the interface and its target.
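The (len(k), k) sort key used above orders numbered file names naturally, where a plain lexicographic sort would put 10.jpg before 2.jpg. A quick check:

```python
paths = ["10.jpg", "2.jpg", "1.jpg"]

# plain lexicographic sort: "10.jpg" comes before "2.jpg"
assert sorted(paths) == ["1.jpg", "10.jpg", "2.jpg"]

# sorting by (length, value) restores the numeric order
assert sorted(paths, key=lambda k: (len(k), k)) == ["1.jpg", "2.jpg", "10.jpg"]
```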

    def load(self):
        # get the input image
        self.X = cv2.imread(self.X_paths[self.n])
        # get the target image
        # get its name by removing the folders in the X path, separated by `/` or `\`
        file_name = self.X_paths[self.n].split("\\")[-1].split("/")[-1]
        # split by the `.` to remove the extension and get the name
        name = file_name.split(".")[0]
        # get its according target path if it exists
        y_path = glob.glob(os.path.join(self.Y_dir, name + ".npy"))
        # if the target already exists, load it
        if len(y_path):
            self.Y = np.load(y_path[0])
        # else set it to an empty matrix (full of zeros)
        else:
            self.Y = np.zeros((*self.X.shape[:2], self.n_class))

We import an image and its target assuming that the target file is in .npy format and has the same name as its associated image. In other words, if the image has the path /Path/Image/0123456.png then its associated target has the path /Path/Target/0123456.npy.

If this file doesn't exist, a matrix is created from scratch. Note that during segmentation, if an image has a dimension (height, width, 3), its target will have a dimension (height, width, class_number).
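The empty-target construction in load() can be checked with a dummy image (the sizes below are illustrative):

```python
import numpy as np

X = np.zeros((240, 320, 3), dtype=np.uint8)  # a dummy 240x320 BGR image
n_class = 4                                  # e.g. cue + three balls

# one binary channel per class, same spatial size as the image
Y = np.zeros((*X.shape[:2], n_class))
print(Y.shape)  # (240, 320, 4)
```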

    def save(self):
        # get its name by removing the folders in the path, separated by `/` or `\`
        file_name = self.X_paths[self.n].split("\\")[-1].split("/")[-1]
        # split by the `.` to remove the extension and get the name
        name = file_name.split(".")[0]
        # convert to boolean to save space
        self.Y = self.Y.astype(bool)
        # save it
        np.save(os.path.join(self.Y_dir, name + ".npy"), self.Y)

In segmentation, for each class and for each pixel, we have 0 or 1 to say whether or not that pixel belongs to the class. To save this data while optimizing its memory size, we can store it in binary form by converting it to boolean with numpy_array.astype(bool).
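The memory gain is easy to verify with NumPy: the default float64 dtype uses 8 bytes per value, while bool uses a single byte (the target size below is illustrative):

```python
import numpy as np

Y = np.zeros((480, 640, 4))        # default dtype float64: 8 bytes per value
Y_bool = Y.astype(bool)            # 1 byte per value

print(Y.nbytes // Y_bool.nbytes)   # 8
```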

Finally, for some reason, we may decide that an image must be deleted (for example if its quality leaves something to be desired). So let's create this function, remembering to load the next image afterwards.

    def delete(self):
        # remove the current image from the disk and from the paths list
        path_to_remove = self.X_paths.pop(self.n)
        os.remove(path_to_remove)
        # get the target's name
        name = path_to_remove.split("\\")[-1].split("/")[-1].split(".")[0]
        # get its according target path if it exists
        y_path = glob.glob(os.path.join(self.Y_dir, name + ".npy"))
        # if the target exists, delete it
        if len(y_path):
            os.remove(y_path[0])
        # update the current image index
        self.n = self.n % len(self.X_paths)
        # load the new current training sample
        self.load()

Display the images

Everything is in place to create the static interface with OpenCV. So we need to be able to visualize the images and targets.

    @staticmethod
    def hex2tuple(value, normalize=False):
        # get the hexadecimal value
        value = value.lstrip("#")
        # get its length
        lv = len(value)
        # get the associated color in the 0 to 255 range
        color = np.array([int(value[i : i + lv // 3], 16)
                          for i in range(0, lv, lv // 3)])
        # normalize it if needed
        if normalize:
            color = tuple(color / 255)
        return color

    def get_x_image(self):
        return (self.X - np.min(self.X)) / (np.max(self.X) - np.min(self.X))

    def get_y_image(self):
        # create a black background image
        y_image = np.zeros((*self.Y.shape[:2], 3))
        # for each channel, set the color to the image according to its mask
        for i, color in enumerate(self.colors):
            # get the mask
            mask = self.Y[:, :, i] > 0
            # update the colorized image
            y_image[np.where(mask)] = self.hex2tuple(color, normalize=True)
        return y_image

We first define the hex2tuple() method to convert the color format #BGR to (B, G, R). Then we normalize the input image, and colorize the target according to the classes.
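To check the conversion in isolation, here is the same logic as a free function (a sketch for illustration only):

```python
def hex2tuple(value, normalize=False):
    # strip the leading "#" and split the string into three equal parts
    value = value.lstrip("#")
    lv = len(value)
    color = tuple(int(value[i:i + lv // 3], 16) for i in range(0, lv, lv // 3))
    # optionally scale each component from 0-255 to 0-1
    if normalize:
        color = tuple(c / 255 for c in color)
    return color

print(hex2tuple("#0000ff"))        # (0, 0, 255): red in BGR
print(hex2tuple("#ff0000", True))  # (1.0, 0.0, 0.0): blue, normalized
```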

Create the parameters bar

You can now create a parameter bar to get a preview of what you're dealing with. Don't be afraid of the length of the code block. It may be a little long, but it is not very complex.

The idea is to create a bar summarizing the current parameters: on the left we place the name of the image we are processing, and on the right the current channel, its name and its color.

    def get_params_image(self):
        # create the image to display the current parameters
        params_img = np.ones((self.X.shape[0] // 10, self.X.shape[1] * 2, 3))

        # get the image name
        name = self.X_paths[self.n].split("\\")[-1].split("/")[-1]
        # compute the text height to center the name and compute its fontsize
        text = "{0} - ({1}/{2})".format(name, self.n + 1, len(self.X_paths))
        size = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 1, 2)
        text_height = size[0][1]
        fontsize = (params_img.shape[0] / text_height) // 2
        # put the name of the image at left
        cv2.putText(
            params_img,
            text,
            (params_img.shape[0] // 2, params_img.shape[0] * 2 // 3),
            cv2.FONT_HERSHEY_SIMPLEX,
            fontsize,
            (0, 0, 0),
            2,
        )

        # set the current class's color at right

        # get the up left corner of the color legend rectangle
        up_left = (
            params_img.shape[1] * 17 // 20,
            params_img.shape[0] * 1 // 4,
        )
        # get the down right corner of the color legend rectangle
        down_right = (
            params_img.shape[1] * 19 // 20,
            params_img.shape[0] * 3 // 4,
        )
        # draw the rectangle to legend the current channel
        color = self.hex2tuple(self.colors[self.channel], normalize=True)
        cv2.rectangle(params_img, up_left, down_right, color, cv2.FILLED)
        cv2.rectangle(params_img, up_left, down_right, (0, 0, 0), 2)

        # set the current class's name at the left of its color

        # get the name of the class
        legend = self.class_names[self.channel]
        # compute its size to put it at the left of the colored rectangle
        size = cv2.getTextSize(legend, cv2.FONT_HERSHEY_SIMPLEX, fontsize, 2)
        # put the text
        cv2.putText(
            params_img,
            legend,
            (
                params_img.shape[1] * 16 // 20 - size[0][0],
                params_img.shape[0] * 2 // 3,
            ),
            cv2.FONT_HERSHEY_SIMPLEX,
            fontsize,
            (0, 0, 0),
            2,
        )
        return params_img

Put it all together

All that's left to do is to assemble the whole thing into an image (or frame) and implement the method to run the whole thing.

    def get_frame(self):
        # get the current training image
        x_img = self.get_x_image()
        # get the colorized training target matrix
        y_img = self.get_y_image()
        # apply a filter to see the training image
        # behind the colorized training target matrix
        alpha = 0.3
        y_img = cv2.addWeighted(x_img, alpha, y_img, 1 - alpha, 0)
        params_img = self.get_params_image()
        # concatenate the training image and the target image horizontally
        concat_xy = np.hstack((x_img, y_img))
        # concatenate vertically with parameter bar image
        gui_img = np.vstack((concat_xy, params_img))
        return gui_img

    def run(self):
        while True:
            # set the window as resizable
            cv2.namedWindow("GUI", cv2.WINDOW_NORMAL)
            # set the window in full screen
            cv2.setWindowProperty(
                "GUI", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN
            )
            # display the current gui frame
            cv2.imshow("GUI", self.get_frame())
            # quit the program when the 'q' key is pressed
            key = cv2.waitKey(1) & 0xFF
            if key == ord('q'):
                break
        cv2.destroyAllWindows()
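The cv2.addWeighted() call used in get_frame() computes alpha * src1 + beta * src2 + gamma element-wise, so the blend can be sketched in pure NumPy (with tiny illustrative patches):

```python
import numpy as np

x_img = np.full((2, 2, 3), 1.0)   # a white "input" patch
y_img = np.zeros((2, 2, 3))       # a black "target" patch
alpha = 0.3

# equivalent of cv2.addWeighted(x_img, alpha, y_img, 1 - alpha, 0)
blended = x_img * alpha + y_img * (1 - alpha)
print(blended[0, 0, 0])  # 0.3
```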

All you need to do now is run the program.

if __name__ == "__main__":
    ms = ManualSegmentation(args.x_save_dir, args.y_save_dir, args.config_path)
    ms.run()

You can also run the program from your console (powershell, bash...).

$ python your_script.py path/to/image/folder path/to/target/folder path/to/config.csv

The interface is currently only a static display. In the next part, we will make it interactive to make it fully operational in the annotation of the segmentation dataset.