
Using Tensorflow Object Detection API to build a Toy detector

Here I extend the API to train on a new object that is not part of the COCO dataset. In this case I chose a toy that was lying around. See gif below. So far, I have been impressed by the performance of the API. The steps highlighted here can be extended to any single or multiple object detector that you want to build.

Tensorflow Toy Detector

You can find the code on my Github repo.
1. Collecting data
The first step is collecting images for your project. You could download them from Google, ensuring you have a wide variation in angles, brightness, scale, etc. In my case I created a video of the little aeroplane toy and used OpenCV to extract images from the video, which saved me a lot of time. I ensured that images were taken from multiple angles. You can also randomly change the brightness of some of the images so that the detector can work under different lighting conditions. Overall, 100–150 images will suffice. See some sample images below:

Sample images

PS: Since the video was taken on my iPhone, the original images were pretty big (1920x1090). This would have required a lot of memory, so I used PIL to resize them to around 500x300, roughly keeping the aspect ratio.
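The resize step can be sketched with PIL like this (the helper name is mine; the 500x300 target comes from the post):

```python
from PIL import Image

def resize_keep_aspect(img, max_w=500, max_h=300):
    """Shrink an image to fit within max_w x max_h, preserving aspect ratio."""
    scale = min(max_w / img.width, max_h / img.height)
    new_size = (int(img.width * scale), int(img.height * scale))
    return img.resize(new_size, Image.BILINEAR)
```

For example, a 1920x1080 frame comes out at 500x281, which fits the "500x300 kind of keeping the aspect ratio" description.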
2. Annotating the images
I used labelImg to annotate the images. It is a very handy tool, and the annotations are created in the Pascal VOC format, which is useful later on. It is written in Python and uses Qt for the interface. I used Python 3 + Qt5 with no problems. See the example of an annotated image below. Essentially we identify xmin, ymin, xmax and ymax for the object and pass those to the model along with the image for training.

Annotating using labelimg
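The Pascal VOC files that labelImg writes are plain XML, so the xmin/ymin/xmax/ymax values can be pulled out with the standard library. A minimal sketch (the filename and coordinates in the sample are made-up example values):

```python
import xml.etree.ElementTree as ET

# Minimal Pascal VOC annotation of the kind labelImg produces.
SAMPLE = """
<annotation>
  <filename>toy_0001.jpg</filename>
  <size><width>500</width><height>281</height><depth>3</depth></size>
  <object>
    <name>toy</name>
    <bndbox>
      <xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>200</ymax>
    </bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) for each object in a VOC file."""
    boxes = []
    for obj in ET.fromstring(xml_text).iter("object"):
        bb = obj.find("bndbox")
        boxes.append((obj.find("name").text,
                      int(bb.find("xmin").text), int(bb.find("ymin").text),
                      int(bb.find("xmax").text), int(bb.find("ymax").text)))
    return boxes
```

Having the boxes in this format is what makes the conversion scripts in the next step almost turnkey.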

3. Creating the TFR datasets
The Tensorflow API wants the datasets to be in the TFRecord file format. This is probably the trickiest part. However, Tensorflow has provided a couple of handy scripts to get you started: create_pascal_tf_record.py and create_pet_tf_record.py. I was able to use create_pet_tf_record.py with minimal edits, since labelImg already creates annotations in the correct format. I also like that this script randomly takes 30% of the data and creates a validation TFR file.
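The random 70/30 train/validation split that the script performs can be sketched in plain Python (function name and fraction default are mine, mirroring the behaviour described above):

```python
import random

def split_train_val(examples, val_fraction=0.3, seed=42):
    """Shuffle examples and carve off a validation set before writing
    TFRecords, as create_pet_tf_record.py does."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_val = int(len(examples) * val_fraction)
    return examples[n_val:], examples[:n_val]
```

With 150 annotated images this yields 105 training and 45 validation examples.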
You will also need to create a label.pbtxt file that is used to convert label names to numeric ids. In my case it was as simple as:
item {
 id: 1
 name: 'toy'
}
I have included the label_map.pbtxt file and the create_pet_tf_records.py file on my Github. In case you get stuck anywhere, I highly recommend the Oxford-IIIT Pets walkthrough provided by Tensorflow.
4. Creating a model config file
Once the TFR datasets are created, you first need to decide whether you will use an existing model and fine-tune it, or build one from scratch. I highly recommend using an existing model, since most of the features learnt by CNNs are object agnostic, and fine-tuning an existing model is usually an easy and accurate process. Please note that if you do decide to build from scratch you will need much more than 150 images, and training will take days. The API provides 5 different models that trade off speed of execution against accuracy in placing bounding boxes. See the table below:

Tensorflow Detection Models

For this project I decided to use the faster_rcnn_resnet101 model that was trained on the COCO dataset. This is a very nice link if you want to learn more about RCNN models.
Tensorflow provides several sample config files to get started. I decided to use the faster_rcnn_resnet101_coco file and updated any paths that need to be configured in the file. Don't forget to update the number of classes too.
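The fields I had to touch look roughly like this (the sample configs ship with the literal PATH_TO_BE_CONFIGURED placeholder; the fragment below is abridged, not the full file):

```
model {
  faster_rcnn {
    num_classes: 1   # a single class: "toy"
    ...
  }
}
train_config {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  ...
}
train_input_reader {
  tf_record_input_reader { input_path: "PATH_TO_BE_CONFIGURED/train.record" }
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
}
```

Forgetting num_classes is an easy mistake: the COCO sample configs default to 90 classes.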
5. Training the model
Finally! All the hard (and boring) work is done and we can start training the model. Since I have a reasonable GPU, I decided to train locally. However, you can also train on the cloud. Again, the Tensorflow documentation has made this easy and provides all the steps.
You can start the training job and the evaluation job on two separate terminals at the same time, and launch Tensorboard to monitor performance. After training for 2–3 hours, I could see total loss drop to 0.077 and precision rise to 0.99. Looking at images in Tensorboard, we can see that the model becomes accurate fairly quickly.

Model gets accurate pretty quickly
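The two jobs and the monitoring dashboard described above are launched roughly like this (a sketch based on the Object Detection API scripts of that era; all paths are placeholders for your own setup):

```shell
# terminal 1 - training, from the object_detection directory
python train.py --logtostderr \
    --pipeline_config_path=path/to/faster_rcnn_resnet101.config \
    --train_dir=path/to/train_dir

# terminal 2 - evaluation against checkpoints as they appear
python eval.py --logtostderr \
    --pipeline_config_path=path/to/faster_rcnn_resnet101.config \
    --checkpoint_dir=path/to/train_dir \
    --eval_dir=path/to/eval_dir

# terminal 3 - Tensorboard picks up both jobs' summaries
tensorboard --logdir=path/to/train_dir
```

Running eval.py in parallel is what produces the predicted-image summaries shown in Tensorboard above.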

6. Testing the model
To test the model, we first select a model checkpoint (usually the latest) and export it as a frozen inference graph. The script for this is also on my Github. I tested the model on a new video recorded on my iPhone. As in my previous article, I used the Python moviepy library to parse the video into frames, run the object detector on each frame, and collate the results back into a video.
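The per-frame pipeline can be sketched with numpy and a stand-in detector (the detector stub and its box coordinates are placeholders; in the real pipeline the frozen graph's session run happens there, and moviepy's fl_image applies the function frame by frame):

```python
import numpy as np

def draw_box(frame, box, color=(0, 255, 0)):
    """Draw a 2-pixel-wide rectangle (xmin, ymin, xmax, ymax) on an RGB frame."""
    xmin, ymin, xmax, ymax = box
    frame[ymin:ymax, xmin:xmin + 2] = color
    frame[ymin:ymax, xmax - 2:xmax] = color
    frame[ymin:ymin + 2, xmin:xmax] = color
    frame[ymax - 2:ymax, xmin:xmax] = color
    return frame

def detect_toy(frame):
    """Placeholder for the frozen-inference-graph call."""
    return [(10, 10, 50, 50)]  # one made-up detection

def annotate_video(frames):
    """Run the detector on each frame and collate the annotated results."""
    out = []
    for frame in frames:
        annotated = frame.copy()
        for box in detect_toy(annotated):
            annotated = draw_box(annotated, box)
        out.append(annotated)
    return out
```

Copying each frame before drawing keeps the source video untouched while the annotated frames are written back out.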

Next Steps

A couple of things I noticed, and some additional explorations for the future:
  • During testing, I found the Faster RCNN model a bit slow. Next, I will explore using the fastest model, SSD MobileNet, and see whether there is a noticeable decrease in accuracy
  • For this model, I just used the default parameters in the model config file for faster_rcnn_resnet101_coco. It might be worth exploring whether they can be tweaked for better performance
  • With a little more effort, this process can be extended to additional categories
Give me a ❤️ if you liked this post :) Hope you pull the code and try it yourself.
PS: I have my own deep learning model building consultancy and love to work on interesting problems. I have helped several startups deploy innovative AI based solutions. If you have a project that we can collaborate on, then please contact me at priya.toronto3@gmail.com
