If a computer had eyes, what would it be able to recognize? Distinguishing cats from dogs would be nice, but recognizing all 7,870 objects in the Open Images dataset would be better! Here’s an exercise: look out the window and count how many objects you can recognize. Object recognition brings that human faculty to a computer. It enables a breadth of applications that were previously very difficult for a computer, from self-driving cars and advanced security to Facebook’s face recognition and traffic management.
Computer vision has come a long way, from classifying entire images to recognizing individual objects within an image. That’s the difference between “Here’s a picture of a road with vehicles” and “There are 12-15 cars and 4-6 motorcycles in this photo”. An algorithm that can count objects like this gives enough context to declare, say, “medium traffic”.
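To make the “medium traffic” idea concrete, here is a minimal sketch that turns a list of detected labels into a traffic level. The thresholds, the set of vehicle labels and the function name `traffic_level` are all made up for illustration; a real system would tune them to the camera and the road.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
LIGHT_MAX = 5
MEDIUM_MAX = 20

# Hypothetical set of labels that count as vehicles.
VEHICLE_LABELS = {"car", "motorcycle", "bus", "truck"}


def traffic_level(labels):
    """Map a list of detected object labels to a coarse traffic level."""
    # Count only the labels that are vehicles; ignore everything else.
    counts = Counter(label for label in labels if label in VEHICLE_LABELS)
    total = sum(counts.values())
    if total <= LIGHT_MAX:
        level = "light traffic"
    elif total <= MEDIUM_MAX:
        level = "medium traffic"
    else:
        level = "heavy traffic"
    return level, dict(counts)


# Made-up detections: 13 cars, 5 motorcycles and 2 pedestrians.
labels = ["car"] * 13 + ["motorcycle"] * 5 + ["person"] * 2
level, counts = traffic_level(labels)
print(level, counts)
```

An image classifier could only say “road with vehicles”; the per-object labels are what make the count, and therefore the traffic level, possible.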
Sounds great. How can I sign up?
Many of the big cloud providers have ready-made APIs for this: Google, Microsoft and Amazon each have an implementation. If you have the budget, simply plug one into your application and you’ll have your own Not Hotdog app. But I want to focus on something you can get your hands dirty with. Most of these APIs are powered by deep learning models, and in object recognition, YOLO is your guy. I recommend checking out the amazing opener the authors wrote for YOLO v3:
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry.
The authors have a sense of humor, but don’t let that fool you: it is a great algorithm. It is named You Only Look Once because its main contribution is speed. Other models work in two stages, region proposal followed by object classification. YOLO uses a single model to output both the bounding boxes and the class probabilities in one pass, which makes it much faster than two-stage detectors while remaining competitive in accuracy.
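As a rough illustration of what “bounding boxes and class probabilities from a single model” means, the sketch below scores one predicted box. It is a simplified picture, not the actual YOLO code: it assumes each box comes with an objectness value and one probability per class, and it multiplies the two to get a score. The function name `decode_box`, the class names and the numbers are all hypothetical.

```python
def decode_box(objectness, class_probs, class_names, threshold=0.5):
    """Return (label, score) for one predicted box, or None if the score is too low.

    Simplified scoring: score = objectness * probability of the best class.
    """
    # Pick the class with the highest probability for this box.
    best_index = max(range(len(class_probs)), key=lambda i: class_probs[i])
    score = objectness * class_probs[best_index]
    if score < threshold:
        return None
    return class_names[best_index], score


# Made-up values for a single box.
class_names = ["person", "car", "dog"]
confident = decode_box(0.9, [0.1, 0.8, 0.1], class_names)
unsure = decode_box(0.3, [0.1, 0.8, 0.1], class_names)
print(confident, unsure)
```

Because the box and its class scores come out of the same forward pass, there is no separate proposal step to wait for, which is where the speed comes from.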
This YOLO is nice. How can I get it?
There’s an open-source library for that. ImageAI is a Python library that lets developers do object recognition with off-the-shelf trained models. You can also train it on your own objects.
I tried the simple recognition use case in this Kaggle notebook. It’s as simple as the following snippet:
    from imageai.Detection import ObjectDetection

    # load YOLO here
    detector = ObjectDetection()
    detector.setModelTypeAsYOLOv3()
    detector.setModelPath("yolo.h5")
    detector.loadModel()

    # load your image here as "input_img"
    # ...

    # with output_type='array', the call returns the annotated image
    # together with the list of detections
    annotated_img, detections = detector.detectObjectsFromImage(
        input_img,
        input_type='array',
        minimum_percentage_probability=50,
        output_type='array')

I tried it out on the Open Images dataset, picking out pictures with at least one “Person” object in them. The YOLO model came off-the-shelf from ImageAI and is trained on 80 classes, including persons, vehicles and household items. I set the model to output only regions whose probability of being an object is greater than 50%.
See how it performed on some of my sample inputs:
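The selection described above can be sketched in plain Python. This assumes each detection is a dict with `name` and `percentage_probability` keys, which is the shape ImageAI documents for its results; the helper names and the sample values are made up for illustration, and this sketch keeps detections at or above the threshold.

```python
def filter_detections(detections, min_probability=50.0):
    """Keep only detections at or above the given probability (in percent)."""
    return [d for d in detections if d["percentage_probability"] >= min_probability]


def contains_label(detections, label="person"):
    """Return True if at least one detection carries the given label."""
    return any(d["name"] == label for d in detections)


# Made-up detections for one image.
sample = [
    {"name": "person", "percentage_probability": 99.2, "box_points": [10, 20, 110, 220]},
    {"name": "tie", "percentage_probability": 61.5, "box_points": [50, 80, 70, 140]},
    {"name": "dog", "percentage_probability": 32.0, "box_points": [150, 160, 210, 230]},
]

kept = filter_detections(sample, min_probability=50.0)
has_person = contains_label(kept, "person")
print([d["name"] for d in kept], has_person)
```

Raising `min_probability` trades missed objects for fewer false labels, which is worth keeping in mind when reading the results.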
From left to right. (1) 100% a person, but notice the tie. (2) It’s amazing that overlapping bounding boxes are recognized. (3) An out-of-focus person can still be recognized. (4 & 5) Drawings are recognized as persons. (6-8) Multiple persons can be detected. (9) The model thought there were two dogs.
(1) This is an example where the algorithm is stumped. (2) Feeling good! (3) The 404 message is mislabeled as a clock. (4) Multiple persons mislabeled as a single person. (5) I don’t really know what’s happening here — some kind of party? (6) Wrong labels all around! (7-9) All correct.
(1) I don’t think this image contains a person anyway! (2) Pretty cool that a bowl was found. (3-9) Persons!
The image with the person walking through rubble raises the possibility that this same technology could help disaster recovery efforts. Drones that patrol forest fires or floods, for example, could recognize stranded persons.
ImageAI outputs the detected objects as Python dicts, so you should be able to manipulate them easily in your app. Here’s a mundane application: being able to tell individual objects apart in a photo lets you auto-sort your growing stockpile of camera images. Or at the very least, spot clocks, bowls and skateboards.
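Here is a minimal sketch of that photo-sorting idea. It assumes you have already run the detector on each photo and collected the results in a dict keyed by file name, with every detection being a dict that has a `name` key. The function name `index_photos_by_label`, the file names and the detections are hypothetical.

```python
from collections import defaultdict


def index_photos_by_label(detections_by_photo):
    """Build a mapping from object label to the photos that contain it."""
    index = defaultdict(set)
    for photo, detections in detections_by_photo.items():
        for detection in detections:
            index[detection["name"]].add(photo)
    # Sort the photo names so the result is stable and easy to read.
    return {label: sorted(photos) for label, photos in index.items()}


# Made-up detector results for three photos.
detections_by_photo = {
    "IMG_001.jpg": [{"name": "person"}, {"name": "skateboard"}],
    "IMG_002.jpg": [{"name": "clock"}],
    "IMG_003.jpg": [{"name": "person"}, {"name": "bowl"}, {"name": "person"}],
}

albums = index_photos_by_label(detections_by_photo)
print(albums["person"])
```

From here, moving each photo into a folder per label is a short loop over `albums`.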