Custom object detection using Tensorflow Object Detection API
Problem to solve
Given a collection of images containing a target object in many different shapes, lighting conditions, poses, and quantities, train a model so that, given a new image, it draws a bounding box around each occurrence of the target object present in the image.
Steps to take
- Step 1 - Label the images
- Step 2 - Install Tensorflow Object Detection API
- Step 3 - Prepare the labeled images as input
- Step 4 - Configure an object detection pipeline for training
- Step 5 - Train and evaluate the pipeline
- Step 6 - Export the trained model for inferencing
- Common errors and solutions
Step 1 - Label the images
You can use tools such as VoTT or LabelImg to label images. Here we use VoTT to output data in Pascal VOC format.
- Open the folder which contains your collection of images
- Put in labels. You can train the model to recognize multiple types of objects, but here we will only recognize one type of object, say, helmets.
- Go through each image:
- Draw a bounding box around each occurrence of the target object, helmet, in that image.
- The default label is applied automatically; click on any other applicable label.
- Export to Pascal VOC format. The output folder will look like this:
+Annotations (contains the label info in xml for each image)
+ImageSets
  +Main
    -{label}_train.txt
    -{label}_val.txt
+JPEGImages (contains the image files)
-pascal_label_map.pbtxt (map of label and id)
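Before moving on, it can help to sanity-check the exported annotations. The short Python sketch below parses one Pascal VOC XML file from the Annotations folder and prints its labeled bounding boxes; the file name is a placeholder for one of your own annotation files.

import xml.etree.ElementTree as ET

# Parse one exported annotation file (placeholder name)
tree = ET.parse('Annotations/helmet_001.xml')
root = tree.getroot()

print('image:', root.findtext('filename'))
for obj in root.findall('object'):
    name = obj.findtext('name')  # label, e.g. "helmet"
    box = obj.find('bndbox')
    print(name, box.findtext('xmin'), box.findtext('ymin'),
          box.findtext('xmax'), box.findtext('ymax'))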
Step 2 - Install Tensorflow Object Detection API
Instead of starting from scratch, pick an Azure Data Science VM (DSVM) or Deep Learning VM with a GPU attached. This saves a lot of setup steps because these VMs come with a plethora of machine learning frameworks and tools preinstalled, including Tensorflow. We will use an Ubuntu 16.04-based DSVM here. As for the VM size, you can start with a small size such as DS2_v3, but when it’s time to train, scale it up to a larger size; otherwise training on hundreds of images will probably take many days.
- Install Tensorflow Object Detection API
git clone https://github.com/tensorflow/models.git
- Most of the dependencies in the installation doc are already installed on the DSVM. However, make sure to follow the steps in each of the sections after the Dependencies section to perform the additional installation. If these steps are not followed, you will see errors along the way.
- Make sure the Tensorflow version installed on the DSVM is compatible with the git-cloned Object Detection API.
- To check Tensorflow version, run
python3 -c 'import tensorflow as tf; print(tf.__version__)'
- To check Tensorflow installation location, run
python3 -c 'help("tensorflow")'
- Since the git-cloned API always tracks the latest Tensorflow, run
pip3 install --upgrade tensorflow-gpu
to upgrade Tensorflow to the latest version.
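- After upgrading, it’s worth confirming that Tensorflow can actually see the GPU. A quick check (assuming Tensorflow 1.x, as used throughout this guide):
python3 -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'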
Step 3 - Prepare the labeled images as Tensorflow input
The Tensorflow Object Detection API takes TFRecords as input, so we need to convert the Pascal VOC data to TFRecords. The conversion scripts are located in the object_detection/dataset_tools folder. You need to modify one of the files, such as create_pascal_tf_record.py or create_pet_tf_record.py, to convert your data; pick the script whose input format is closest to yours. Here we pick create_pascal_tf_record.py as our template and modify it to convert our VoTT output above. Don’t worry about making a mistake here; if you do, you will quickly see an error when you run the following commands. Run the script to convert the input data to TFRecords:
python object_detection/dataset_tools/{my_create_tf_record}.py --set=train --data_dir=path/to/VoTToutputFolder --output_dir=path/to/TFRecordsOutput
python object_detection/dataset_tools/{my_create_tf_record}.py --set=val --data_dir=path/to/VoTToutputFolder --output_dir=path/to/TFRecordsOutput
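If you are unsure what the modified script needs to produce, here is a simplified sketch of the per-image logic: it encodes one image and its boxes into a tf.train.Example and writes it with a TFRecordWriter (Tensorflow 1.x API). The feature keys follow those used by create_pascal_tf_record.py; the file paths, box coordinates, and label id are hypothetical, and the real script also writes additional fields such as image/filename.

import tensorflow as tf

def _bytes(values): return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
def _floats(values): return tf.train.Feature(float_list=tf.train.FloatList(value=values))
def _int64s(values): return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

# Hypothetical image with one normalized box for label "helmet" (id 1 in pascal_label_map.pbtxt)
with tf.gfile.GFile('JPEGImages/helmet_001.jpg', 'rb') as f:
    encoded_jpg = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': _bytes([encoded_jpg]),
    'image/format': _bytes([b'jpeg']),
    'image/height': _int64s([480]),
    'image/width': _int64s([640]),
    'image/object/bbox/xmin': _floats([0.1]),
    'image/object/bbox/xmax': _floats([0.4]),
    'image/object/bbox/ymin': _floats([0.2]),
    'image/object/bbox/ymax': _floats([0.6]),
    'image/object/class/text': _bytes([b'helmet']),
    'image/object/class/label': _int64s([1]),
}))

writer = tf.python_io.TFRecordWriter('tfrecords/train.record')
writer.write(example.SerializeToString())
writer.close()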
Step 4 - Configure an object detection pipeline for training
Instead of creating a model from scratch, a common practice is to fine-tune a pre-trained model from the Tensorflow Detection Model Zoo on your own dataset. These models are trained on well-known datasets which may not include the type of object you are trying to detect, but we can leverage transfer learning to teach them to detect new types of objects. If you don’t have a GPU, pick a faster model over a more accurate one. Here, we choose ssd_mobilenet_v1_coco.
- Download the pre-trained ssd_mobilenet_v1_coco from Tensorflow Detection Model Zoo. It should include the following files:
-checkpoint
-frozen_inference_graph.pb
-model.ckpt.data-00000-of-00001
-model.ckpt.index
-model.ckpt.meta
-pipeline.config
-saved_model/saved_model.pb
- Since we have a lot of artifacts, including input image data, TFRecords, pre-trained model, and training output, it’s a good idea to organize the directory similar to what’s suggested on Tensorflow Object Detection github. Our directory looks like this:
+helmet_detection
  +data (contains the output from VoTT)
  +tfrecords (contains generated tfrecords)
  +models
    +ssd_mobilenet_v1_coco (contains downloaded ssd_mobilenet_v1_coco model)
      +train (contains the training output files)
- Edit pipeline.config with the following main modifications (an example fragment follows this list). See our sample config.
- num_classes should be 1 if you are detecting one type of object
- fine_tune_checkpoint should be path/to/downloaded_ssd_mobilenet_v1_coco/model.ckpt
- label_map_path should be path/to/pascal_label_map.pbtxt in the input data
- train_input_reader.tf_record_input_reader.input_path should be path/to/train_tfrecord
- eval_input_reader.tf_record_input_reader.input_path should be path/to/val_tfrecord
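For orientation, the relevant fragments of the edited pipeline.config look roughly like this (paths are placeholders, and "..." stands for the remaining fields of the downloaded config, which are left unchanged):

model {
  ssd {
    num_classes: 1
    ...
  }
}
train_config {
  fine_tune_checkpoint: "path/to/downloaded_ssd_mobilenet_v1_coco/model.ckpt"
  ...
}
train_input_reader {
  label_map_path: "path/to/pascal_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "path/to/train_tfrecord"
  }
}
eval_input_reader {
  label_map_path: "path/to/pascal_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "path/to/val_tfrecord"
  }
}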
Step 5 - Train and evaluate the pipeline
From the tensorflow/models/research/ directory, run the following command to train the model:
python object_detection/model_main.py --pipeline_config_path=path/to/modified_pipeline.config --model_dir=path/to/training_output --alsologtostderr
On a GPU, it may take a couple of hours for precision to go above, say, 80%, or for loss to go below, say, 1. On a CPU, it could take much longer. Run tensorboard to observe how precision and loss change as the model learns:
tensorboard --logdir=path/to/training_output
If your images are of low quality, the target object is very hard to detect in the images, or you have few images (fewer than 50), the mean average precision and total loss may appear erratic and fail to converge even after training for a long time. Start with an easy-to-detect object and good-quality images.
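To quickly confirm how many labeled examples actually made it into the training set, you can count the records in the generated TFRecord file (Tensorflow 1.x API; the path is a placeholder):
python3 -c "import tensorflow as tf; print(sum(1 for _ in tf.python_io.tf_record_iterator('path/to/train_tfrecord')))"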
Step 6 - Export the trained model for inferencing
- Pick a checkpoint in the training output folder; each checkpoint consists of the following 3 files:
-model.ckpt-{checkpoint#}.data-00000-of-00001
-model.ckpt-{checkpoint#}.index
-model.ckpt-{checkpoint#}.meta
- From the tensorflow/models/research folder, run
python object_detection/export_inference_graph.py --input_type=image_tensor --pipeline_config_path=path/to/pipeline.config --trained_checkpoint_prefix=path/to/training_output_dir/model.ckpt-{checkpoint#} --output_directory=path/to/output_model_files_for_inference
- Modify object_detection/object_detection_tutorial.ipynb to use our trained model and our test image. Here’s our sample notebook.
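If you prefer a plain script over the notebook, the following sketch condenses what the tutorial notebook does: it loads the exported frozen_inference_graph.pb and runs detection on one test image (Tensorflow 1.x; the paths and the 0.5 score threshold are placeholders you can adjust).

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the exported frozen graph
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('path/to/output_model_files_for_inference/frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# Load a test image as a uint8 array with a batch dimension
image = np.expand_dims(np.array(Image.open('path/to/test_image.jpg').convert('RGB')), axis=0)

with tf.Session(graph=graph) as sess:
    boxes, scores, classes, num = sess.run(
        ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0', 'num_detections:0'],
        feed_dict={'image_tensor:0': image})

# Boxes are normalized [ymin, xmin, ymax, xmax]; print detections above the threshold
for box, score, cls in zip(boxes[0], scores[0], classes[0]):
    if score > 0.5:
        print(int(cls), float(score), box)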
Common errors and solutions
I’ve encountered the following main issues during this process of custom object detection. With some research, I found that the community has already found resolutions or workarounds.
- Many errors can result from forgetting to run the following from the tensorflow/models/research folder. Make sure this is set in every shell session:
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
- Error message “Value Error: First Step Cannot Be Zero”
Resolution: https://github.com/tensorflow/models/issues/3794
- Error message “tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /usr/local/lib/python2.7/dist-packages/tensorflow/models/model/model.ckpt.data-00000-of-00001”
Resolution: https://github.com/tensorflow/models/issues/2231. Set fine_tune_checkpoint=file_path/model.ckpt
- Error message “TypeError: can’t pickle dict_values objects”
Resolution: https://github.com/tensorflow/models/issues/4780