How to implement a Deep Learning dataset annotation platform? (1/2): Creating the interface structure


Implement a platform/interface to manually annotate a Deep Learning dataset. Design of the interface.

  • Creation: 01/29/2020
  • Update: 02/24/2020

Part 1: Design of an interface

In this part we will create the static structure of a Deep Learning dataset annotation interface. Then we will make this structure interactive in the next part. If you don't have an image dataset at hand I invite you to download the project folder.

Annotations and segmentation

Annotating a dataset means giving meaning to raw data, a meaning the machine can understand. This allows it to learn from the annotated (or labeled) data. Let's take a few examples to make the term annotation less confusing.

Different types of annotation exist. With a dataset of cat images, for example, each of them could be annotated (or labeled):

  •  By 0 or 1: contains or does not contain a cat
  • By the coordinates (x1, y1) and (x2, y2): respectively the top-left and bottom-right corners of the box containing the cat in the image
  • Pixel by pixel with a value of 0 or 1: answers the question "Is this pixel a cat pixel?"

In the last case, the annotation has pixel-level precision. This is called segmentation. This labeled data is called the targets.
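To make these three annotation types concrete, here is a minimal Python sketch (the image size, box coordinates and mask values are purely illustrative):

```python
import numpy as np

# 1. image-level label: 1 if the image contains a cat, 0 otherwise
label = 1

# 2. bounding box: top-left (x1, y1) and bottom-right (x2, y2) corners
box = ((12, 8), (140, 96))

# 3. segmentation: one 0/1 value per pixel of a (height, width) image
mask = np.zeros((96, 128), dtype=bool)
mask[40:60, 50:80] = True  # mark the pixels that belong to the cat
```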

This tutorial aims to design a graphical interface with OpenCV and Python to label image segmentation targets. But whatever the type of annotation, what is expected from a dataset annotation interface remains similar.

Segmentation by shape

Let's get to the point. We will implement our interface from a config.csv file that defines how each class (or channel) will be treated. By class, we mean category. Throughout this tutorial, I will build the interface with the segmentation of a billiard game as an example. The different classes are then the billiard cue, the red ball, the yellow ball and the black ball (I didn't have a white ball available, but we would have had an extra class). My config.csv file is the following:
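The original file was shown as an image, so here is an illustrative reconstruction matching the columns described below; the exact colors and shape codes are my assumptions (for instance, the cue as a 4-sided polygon):

```csv
class,bgr_color,shape
cue,#00ff00,4
red_ball,#0000ff,2
yellow_ball,#00ffff,2
black_ball,#000000,2
```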

This file is composed as follows:

  • class column: name of the defined class.
  • bgr_color column: color of the class, in hexadecimal BGR format, with which we want it to be represented.
  • shape column: shape used to draw the class:
      • 0: a polygon whose number of sides is not known in advance
      • 1: drawn pixel by pixel
      • 2: a circle drawn from two points that define one of its diameters
      • 3 or more: a polygon whose number of sides is known in advance

The number we give to each class is the number of reference points we will have to indicate with our interface. In other words, for a ball (shape = 2), we will draw it by clicking on the two ends of its diameter in the image. If we don't know the number of sides of the polygon in advance (shape = 0), we validate our clicks by pressing the Enter key. In the other cases, the shape is validated automatically once the right number of clicks has been made. We will come back to this in the second part of this tutorial. To begin, let's tackle the global structure of our interface.

Global structure of our interface

Before diving into the implementation, let's take a quick look at what we expect from this interface. To create a dataset with annotated targets, it must be able to:

  • import the input images
  • import their equivalent targets if they already exist
  • save the targets
  • be adaptable according to a configuration file of the different classes (config.csv)
  • modify the targets according to the drawing shapes
  • erase them in the event of an error
  • inform us of the image and the class currently being processed
  • allow a global rendering of the input image and of the modifications on its associated target in real time
  • be controllable from a keyboard and mouse
  • as a bonus: zoom in to be more precise

Our interface will be in the form of a class and will contain the methods below. We will complete these methods one by one during this part to have a fully functional interface structure. We will also complete the necessary function arguments at the same time.

import os
import glob
import argparse
import cv2
import numpy as np
parser = argparse.ArgumentParser()
parser.add_argument('x_save_dir', help='Train images directory')
parser.add_argument('y_save_dir', help='Target images directory')
parser.add_argument('config_path', help='Path to csv config file')
args = parser.parse_args()

class ManualSegmentation:
    """Class to manually segment the dataset"""

    def __init__(self, *args):
        """Init the manual segmentation GUI"""

    def load_config(self, *args):
        """Load the config of the targets if the path is given"""

    def load(self, *args):
        """Load the current image and its target"""

    def save(self, *args):
        """Save the current target matrix"""

    def delete(self, *args):
        """Delete the current image and its target"""

    def hex2tuple(*args):
        """Convert an hexadecimal color to its associated tuple"""

    def get_x_image(self, *args):
        """Get the normalized training image"""

    def get_y_image(self, *args):
        """Get the colorized target as an image"""

    def get_params_image(self, *args):
        """Build the parameters bar to display information
        at the bottom of the GUI"""

    def get_frame(self, *args):
        """Get the GUI frame"""

    def run(self, *args):
        """Run the GUI until quit"""

if __name__ == "__main__":
    ms = ManualSegmentation(args.x_save_dir, args.y_save_dir, args.config_path)

Import the configurations

First we'll need to initialize our interface.

    def __init__(self, x_save_dir, y_save_dir, config_save_path):
        # number of classes
        self.n_class = None
        # drawing shapes of the different classes
        self.shapes = None
        # hexadecimal BGR colors of the different classes
        self.colors = None
        # names of the different classes
        self.class_names = None
        # load the config file to fill the attributes above
        self.load_config(config_save_path)

To process this configuration data, we need to create some attributes. We need to know the total number of classes that our target will have. We also need to know their names, their typical shapes and the color associated with them. Finally, we can load these configurations from our config.csv file.

    def load_config(self, path):
        if os.path.exists(path):
            # get config of the targets as a str matrix
            config = np.genfromtxt(
                path, delimiter=",", dtype=str, comments=None
            )
            # convert to a dictionary
            config = {
                column[0]: np.array(column[1:])
                for column in zip(*config)
            }
            # load the shapes as integers
            self.shapes = np.array([int(value) for value in config['shape']])
            # load the hexadecimal BGR colors
            self.colors = config['bgr_color']
            # load the names of the different classes
            self.class_names = config['class']
            # get the number of classes
            self.n_class = len(self.class_names)
        else:
            print(f"The path {path} does not exist")

As we have seen, we did not instantiate the object's attributes in the __init__() function, since they are instantiated when the configuration file is imported. It is important to change the comments argument of the numpy.genfromtxt() function: by default, comments are indicated by a #, so there would have been a reading problem with the BGR colors, which are also prefixed by a #. Then, we instantiate our attributes as data arrays, depending on the configuration.
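The effect of comments=None can be checked on a small in-memory example (the row content here is illustrative):

```python
import io
import numpy as np

data = "class,bgr_color,shape\nred_ball,#0000ff,2"

# with the default comments="#", everything after the # would be dropped;
# comments=None keeps the hexadecimal colors intact
config = np.genfromtxt(io.StringIO(data), delimiter=",", dtype=str, comments=None)
print(config[1][1])  # "#0000ff"
```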

Import and save our data

With our configurations well defined, we can then start importing our data. So we will need a few more attributes to complete our __init__() function.

    def __init__(self, x_save_dir, y_save_dir, config_save_path):
        # [...] here the configuration attributes seen above
        # paths of the training images sorted by their ID number
        self.X_paths = glob.glob(os.path.join(x_save_dir, "*.jpg"))
        self.X_paths = sorted(self.X_paths, key=lambda k: (len(k), k))
        # directory of the targets
        self.Y_dir = y_save_dir
        # current index in the paths list
        self.n = 0
        # current channel of the target matrix
        self.channel = 0
        # current X (training) and Y (target) to deal with
        self.X, self.Y = None, None
        # load a training image and its target
        self.load()

We'll never import all the data at once. In the simple case where we end up with 4 GB of images, we can't afford to load everything into one variable. So we will load them one by one using their paths. In the same way as for the configuration attributes, we can't instantiate all our attributes in __init__(): self.X and self.Y will be instantiated in our load() function. These two variables correspond to the input image that will be displayed on the interface and its target.
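The (len(k), k) sort key used above orders numbered file names naturally, where a plain lexicographic sort would put 10.jpg before 2.jpg. A quick check:

```python
paths = ["10.jpg", "2.jpg", "1.jpg"]

# plain lexicographic sort: "10.jpg" comes before "2.jpg"
assert sorted(paths) == ["1.jpg", "10.jpg", "2.jpg"]

# sorting by (length, value) restores the numeric order
assert sorted(paths, key=lambda k: (len(k), k)) == ["1.jpg", "2.jpg", "10.jpg"]
```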

    def load(self):
        # get the input image
        self.X = cv2.imread(self.X_paths[self.n])
        # get the target image
        # get its name by removing the folders in the X path, separated by `/` or `\`
        file_name = self.X_paths[self.n].split("\\")[-1].split("/")[-1]
        # split by the `.` to remove the extension and get the name
        name = file_name.split(".")[0]
        # get its according target path if it exists
        y_path = glob.glob(os.path.join(self.Y_dir, name + ".npy"))
        # if the target already exists, load it
        if len(y_path):
            self.Y = np.load(y_path[0])
        # else set it to an empty matrix (full of zeros)
        else:
            self.Y = np.zeros((*self.X.shape[:2], self.n_class))

We import an image and its target assuming that the target file is in .npy format and has the same name as its associated image. In other words, if the image has the path /Path/Image/0123456.png then its associated target has the path /Path/Target/0123456.npy.

If this file doesn't exist, a matrix is created from scratch. Note that during segmentation, if an image has a dimension (height, width, 3), its target will have a dimension (height, width, class_number).
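The empty-target construction in load() can be checked with a dummy image (the sizes below are illustrative):

```python
import numpy as np

X = np.zeros((240, 320, 3), dtype=np.uint8)  # a dummy 240x320 BGR image
n_class = 4                                  # e.g. cue + three balls

# one binary channel per class, same spatial size as the image
Y = np.zeros((*X.shape[:2], n_class))
print(Y.shape)  # (240, 320, 4)
```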

    def save(self):
        # get its name by removing the folders in the path, separated by `/` or `\`
        file_name = self.X_paths[self.n].split("\\")[-1].split("/")[-1]
        # split by the `.` to remove the extension and get the name
        name = file_name.split(".")[0]
        # convert to boolean to save space
        self.Y = self.Y.astype(bool)
        # save it
        np.save(os.path.join(self.Y_dir, name + ".npy"), self.Y)

In segmentation, for each class and for each pixel, we have 0 or 1 to say whether or not that pixel belongs to the class. To save this data while optimizing its memory size, we can store it in binary form by converting it to boolean with numpy_array.astype(bool).
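The memory gain is easy to verify with NumPy: the default float64 dtype uses 8 bytes per value, while bool uses a single byte (the target size below is illustrative):

```python
import numpy as np

Y = np.zeros((480, 640, 4))        # default dtype float64: 8 bytes per value
Y_bool = Y.astype(bool)            # 1 byte per value

print(Y.nbytes // Y_bool.nbytes)   # 8
```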

Finally, for some reason, we may decide that an image must be deleted (for example if its quality leaves something to be desired). So let's create this function, remembering to load the next image afterwards.

    def delete(self):
        # remove the current image from the disk and from the paths list
        path_to_remove = self.X_paths.pop(self.n)
        os.remove(path_to_remove)
        # get the target's name
        name = path_to_remove.split("\\")[-1].split("/")[-1].split(".")[0]
        # get its according target path if it exists
        y_path = glob.glob(os.path.join(self.Y_dir, name + ".npy"))
        # if the target exists, delete it
        if len(y_path):
            os.remove(y_path[0])
        # update the current image index
        self.n = self.n % len(self.X_paths)
        # load the new current training sample
        self.load()

Display the images

Everything is in place to create the static interface with OpenCV. So we need to be able to visualize the images and targets.

    @staticmethod
    def hex2tuple(value, normalize=False):
        # get the hexadecimal value
        value = value.lstrip("#")
        # get its length
        lv = len(value)
        # get the associated color in the 0 to 255 range
        color = np.array([int(value[i : i + lv // 3], 16)
                          for i in range(0, lv, lv // 3)])
        # normalize it if needed
        if normalize:
            color = tuple(color / 255)
        return color

    def get_x_image(self):
        return (self.X - np.min(self.X)) / (np.max(self.X) - np.min(self.X))

    def get_y_image(self):
        # create a black background image
        y_image = np.zeros((*self.Y.shape[:2], 3))
        # for each channel, set the color to the image according to its mask
        for i, color in enumerate(self.colors):
            # get the mask
            mask = self.Y[:, :, i] > 0
            # update the colorized image
            y_image[np.where(mask)] = self.hex2tuple(color, normalize=True)
        return y_image

We first define the hex2tuple() method to convert the color format #BGR to (B, G, R). Then we normalize the input image, and colorize the target according to the classes.
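To check the conversion in isolation, here is the same logic as a free function (a sketch for illustration only):

```python
def hex2tuple(value, normalize=False):
    # strip the leading "#" and split the string into three equal parts
    value = value.lstrip("#")
    lv = len(value)
    color = tuple(int(value[i:i + lv // 3], 16) for i in range(0, lv, lv // 3))
    # optionally scale each component from 0-255 to 0-1
    if normalize:
        color = tuple(c / 255 for c in color)
    return color

print(hex2tuple("#0000ff"))        # (0, 0, 255): red in BGR
print(hex2tuple("#ff0000", True))  # (1.0, 0.0, 0.0): blue, normalized
```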

Create the parameters bar

You can now create a parameter bar to get a preview of what you're dealing with. Don't be afraid of the length of the code block. It may be a little long, but it is not very complex.

The idea is to create a bar summarizing the current parameters: on the left we place the name of the image we are processing, and on the right the current channel, its name and its color.

    def get_params_image(self):
        # create the image to display the current parameters
        params_img = np.ones((self.X.shape[0] // 10, self.X.shape[1] * 2, 3))

        # get the image name
        name = self.X_paths[self.n].split("\\")[-1].split("/")[-1]
        # compute the text height to center the name and compute its fontsize
        text = "{0} - ({1}/{2})".format(name, self.n + 1, len(self.X_paths))
        size = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 1, 2)
        text_height = size[0][1]
        fontsize = (params_img.shape[0] / text_height) // 2
        # put the name of the image at left
        cv2.putText(
            params_img,
            text,
            (params_img.shape[0] // 2, params_img.shape[0] * 2 // 3),
            cv2.FONT_HERSHEY_SIMPLEX,
            fontsize,
            (0, 0, 0),
            2,
        )

        # set the current class's color at right

        # get the up left corner of the color legend rectangle
        up_left = (
            params_img.shape[1] * 17 // 20,
            params_img.shape[0] * 1 // 4,
        )
        # get the down right corner of the color legend rectangle
        down_right = (
            params_img.shape[1] * 19 // 20,
            params_img.shape[0] * 3 // 4,
        )
        # draw the rectangle to legend the current channel
        color = self.hex2tuple(self.colors[self.channel], normalize=True)
        cv2.rectangle(params_img, up_left, down_right, color, cv2.FILLED)
        cv2.rectangle(params_img, up_left, down_right, (0, 0, 0), 2)

        # set the current class's name at the left of its color

        # get the name of the class
        legend = self.class_names[self.channel]
        # compute its size to put it at the left of the colored rectangle
        size = cv2.getTextSize(legend, cv2.FONT_HERSHEY_SIMPLEX, fontsize, 2)
        # put the text
        cv2.putText(
            params_img,
            legend,
            (
                params_img.shape[1] * 16 // 20 - size[0][0],
                params_img.shape[0] * 2 // 3,
            ),
            cv2.FONT_HERSHEY_SIMPLEX,
            fontsize,
            (0, 0, 0),
            2,
        )
        return params_img

Put it all together

All that's left to do is to assemble the whole thing into an image (or frame) and implement the method to run the whole thing.

    def get_frame(self):
        # get the current training image
        x_img = self.get_x_image()
        # get the colorized training target matrix
        y_img = self.get_y_image()
        # apply a filter to see the training image
        # behind the colorized training target matrix
        alpha = 0.3
        y_img = cv2.addWeighted(x_img, alpha, y_img, 1 - alpha, 0)
        params_img = self.get_params_image()
        # concatenate the training image and the target image horizontally
        concat_xy = np.hstack((x_img, y_img))
        # concatenate vertically with parameter bar image
        gui_img = np.vstack((concat_xy, params_img))
        return gui_img

    def run(self):
        while True:
            # set the window as resizable
            cv2.namedWindow("GUI", cv2.WINDOW_NORMAL)
            # set the window in full screen
            cv2.setWindowProperty(
                "GUI", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN
            )
            # display the current gui frame
            cv2.imshow("GUI", self.get_frame())
            # quit the program when the 'q' key is pressed
            key = cv2.waitKey(1) & 0xFF
            if key == ord('q'):
                break
        cv2.destroyAllWindows()
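The cv2.addWeighted() call used in get_frame() computes alpha * src1 + beta * src2 + gamma element-wise, so the blend can be sketched in pure NumPy (with tiny illustrative patches):

```python
import numpy as np

x_img = np.full((2, 2, 3), 1.0)   # a white "input" patch
y_img = np.zeros((2, 2, 3))       # a black "target" patch
alpha = 0.3

# equivalent of cv2.addWeighted(x_img, alpha, y_img, 1 - alpha, 0)
blended = x_img * alpha + y_img * (1 - alpha)
print(blended[0, 0, 0])  # 0.3
```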

All you need to do now is run the program.

if __name__ == "__main__":
    ms = ManualSegmentation(args.x_save_dir, args.y_save_dir, args.config_path)
    ms.run()

You can also run the program from your console (powershell, bash...).

$ python your_script.py path/to/image/folder path/to/target/folder path/to/config.csv

The interface is currently only a static display. In the next part, we will make it interactive to make it fully operational in the annotation of the segmentation dataset.