Image Recognition with AWS “Rekognition”

AWS Rekognition (sic) is a cloud service that provides a number of features for image recognition. You can train it with your own custom images, or you can use the canned training that’s already been done for you, if that’s sufficient.

I decided to try to use the tool both ways: by training it with some custom images and by using the out-of-the-box, pre-trained capabilities of Rekognition.

Rekognition Custom Labels

You use AWS Rekognition Custom Labels if you have your own images, which make sense for your application, but which aren’t handled by the standard Rekognition service. For example, you have good widgets coming off your production line, and damaged ones. You could train Rekognition to know the difference between the two and move the damaged ones off to the side.

I decided to train Rekognition to recognize various birds. I’m thinking of pointing a camera at my bird feeder and sending the output to Rekognition. Caltech has a really nice library of labeled bird pictures covering 200 different varieties of birds. They have color images and some outlines of the birds’ shapes. I just used the color images.

First you go into the Rekognition UI in the AWS Console and click on Use Custom Labels. You need to create a data set. Rekognition creates a bucket for you, and you can dump the images in there. You can drag and drop them, but if you have a lot of images you’re better off using the AWS CLI, with a command like this (which copies a folder and all the images in it to an S3 bucket):

aws s3 cp --recursive 200.Common_Yellowthroat/ s3://mikes-bird-bucket/200.Common_Yellowthroat

I only copied some of the folders to S3 because you pay for Rekognition training time and you may not want to load it up with all 18,000 images until you have a feel for what that’s going to cost you. And anyway, a lot of those birds aren’t native to my area.

Having created a data set, you click on Project and create a new Rekognition project. You tell Rekognition where your images are. Conveniently, the Caltech bird images were nicely organized in folders named with the correct label for each image, i.e., the type of bird. So Rekognition was able to pick up the correct type of bird for each picture with no further work by me.

I started training, and waited. And waited. It took a while. Maybe an hour. And that was with only about 500 images. Rekognition is, I believe, optimizing the hyperparameters of the deep learning model it’s using, which means training the model multiple times to find the best ones. So that takes a while.

But the results are amazing. My model ended up with an F1 score of 0.977. The best possible score is 1.0, so the model is extremely accurate at classifying the nine types of birds I trained it to recognize. And if you look at these images, you’ll see that the differences between some of the varieties of birds are pretty subtle, so it’s impressive that Rekognition can tell the difference.
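For reference, F1 is the harmonic mean of precision and recall, so it is only high when both are high. A minimal sketch of the calculation follows; the values passed in are made up for illustration and are not the numbers behind the 0.977 score.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values, for illustration only.
print(f1_score(1.0, 1.0))  # perfect precision and recall give the best score
print(f1_score(0.5, 1.0))  # weak precision drags the score down despite perfect recall
```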

Recognizing New Images

Having trained our model, how do we use it? First, you need to start the model. The Rekognition console provides an AWS CLI command that does that. It takes about 5 minutes to fire it up. Then you can use the AWS CLI to give your model an image to classify, but to create a real application you’ll want to write a little code. I found some example code that AWS provides in the SDK for Rekognition, and managed to modify it and get it to work for Rekognition Custom Labels (although the API call I’m using here isn’t mentioned in the documentation, which might not be up-to-date; I just guessed it and it worked). Here’s my code:

To make this work I had to install boto (the AWS Python SDK), figure out how to get my AWS API credentials, save them to a file, and put that file in my .boto folder.
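A call of this kind can be sketched as follows, assuming the newer boto3 SDK and its detect_custom_labels operation (boto3 reads credentials from its standard locations, such as ~/.aws/credentials). This is an illustration under those assumptions, not the original listing. The helper names, the region, and the model ARN placeholder are all hypothetical.

```python
def make_rekognition_client(region_name="us-east-1"):
    """Create a Rekognition client (requires boto3 and configured credentials)."""
    import boto3  # imported lazily so the helper below can be tried without it
    return boto3.client("rekognition", region_name=region_name)


def classify_image(client, model_arn, image_bytes, min_confidence=50.0):
    """Send image bytes to a running Custom Labels model.

    Returns (label name, confidence) pairs, highest confidence first.
    """
    response = client.detect_custom_labels(
        ProjectVersionArn=model_arn,       # ARN of the trained model version
        Image={"Bytes": image_bytes},      # raw bytes of a JPEG or PNG file
        MinConfidence=min_confidence,      # drop labels below this confidence
    )
    labels = [(item["Name"], item["Confidence"]) for item in response["CustomLabels"]]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


# Example usage (placeholder ARN and file name; the model must already be started):
# client = make_rekognition_client()
# with open("bird.jpg", "rb") as f:
#     print(classify_image(client, "<your-project-version-arn>", f.read()))
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before pointing it at a model that is billing by the hour.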

Using Standard Rekognition

I managed to get an AWS account with some free credits, so I was able to train my custom model with 9 bird varieties without paying out of pocket, although it did use up a couple of dollars in credits. If I wanted to train my model with all 200 bird varieties, that might cost a lot more.

The other thing that worried me was the cost of inference (running the model against new images). You get a few hours as part of the AWS free tier, but after that it costs $4.00 an hour, and if you’re running it 24×7 that adds up. It looks like AWS is firing up a VM to run my model (it’s hard to tell because it doesn’t show up as an EC2 instance in the console). I’m guessing that’s what’s happening because of the delay in starting up my model, and because of the cost.

So I’m wondering if, instead of using Custom Labels, I can just use standard out-of-the-box Rekognition. Pricing for standard image recognition (i.e., no custom labels) is really cheap. It’s priced per API call, and you can get 5,000 calls per month free. Even when you exhaust that, it costs only 0.1 cents per image. So for my personal application here, I’d like to use standard Rekognition if possible. It turns out it’s quite good. It can’t distinguish between 10 different kinds of sparrows, but it can tell you that a bird is a sparrow, and not, say, a blackbird.
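The arithmetic behind that comparison can be sketched using only the prices quoted above. The 30-day month and the rate of one image per minute are assumptions for illustration, and the free inference hours are ignored.

```python
CUSTOM_LABELS_PER_HOUR = 4.00   # dollars per hour the custom model is running (from the text)
STANDARD_PER_IMAGE = 0.001      # 0.1 cents per image, in dollars (from the text)
STANDARD_FREE_CALLS = 5000      # free API calls per month (from the text)


def custom_labels_monthly_cost(hours_per_day=24, days=30):
    """Cost of keeping a Custom Labels model running, billed by the hour."""
    return CUSTOM_LABELS_PER_HOUR * hours_per_day * days


def standard_monthly_cost(images_per_month):
    """Cost of standard Rekognition, billed per image after the free calls."""
    billable = max(0, images_per_month - STANDARD_FREE_CALLS)
    return billable * STANDARD_PER_IMAGE


images = 60 * 24 * 30  # assumed: one image per minute for 30 days
print(f"Custom Labels, running 24x7: ${custom_labels_monthly_cost():,.2f}")
print(f"Standard Rekognition, {images:,} images: ${standard_monthly_cost(images):,.2f}")
```

Under those assumptions the custom model comes to about $2,880 a month, against roughly $38 for the same month of per-image calls.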

The Python code to call the REST API for standard Rekognition is similar to the code for Rekognition Custom Labels. The main difference is that you haven’t trained a custom model, and don’t need to start it up ahead of time, or refer to it in your API call:
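A call along these lines can be sketched as follows, assuming boto3 and its detect_labels operation. This is an illustration rather than the original listing; the helper names, region, and default thresholds are hypothetical choices.

```python
def make_rekognition_client(region_name="us-east-1"):
    """Create a Rekognition client (requires boto3 and configured credentials)."""
    import boto3  # imported lazily so the helpers below can be tried without it
    return boto3.client("rekognition", region_name=region_name)


def detect_labels(client, image_bytes, max_labels=10, min_confidence=90.0):
    """Ask the pre-trained service for labels; no model ARN is needed."""
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


def format_labels(image_name, labels):
    """Render the labels one per line, with the confidence on the following line."""
    lines = [f"Detected labels for {image_name}"]
    for name, confidence in labels:
        lines.append(f"Label {name}")
        lines.append(f"Confidence {confidence}")
    return "\n".join(lines)


# Example usage (placeholder file name):
# client = make_rekognition_client()
# with open("bird.jpg", "rb") as f:
#     print(format_labels("bird.jpg", detect_labels(client, f.read())))
```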

For the above picture, which was in the test data set, the output is:

Detected labels for 010.Red_winged_Blackbird/Red_Winged_Blackbird_0011_5845.jpg
Label Animal
Confidence 99.97032165527344
Label Bird
Confidence 99.97032165527344
Label Beak
Confidence 94.90032196044922
Label Agelaius
Confidence 93.90814971923828
Label Blackbird
Confidence 93.90814971923828

So Rekognition is 99.97% sure it’s an animal, and it’s 93.9% sure it’s an Agelaius (red-winged blackbird), which is correct.


AWS Rekognition Custom Labels does a phenomenal job of letting you train a custom image recognition model with no programming. You’ll need to do some programming to use the model, but you don’t need much, if any, deep learning knowledge to get great results. But it’s expensive because it’s not really (as far as I can tell) a true serverless deployment, so you pay for every hour you need it running. By contrast, the standard Rekognition service is truly serverless: you pay only for the number of API calls you make. And Rekognition has been pretrained to do a lot of useful things out of the box. I was able to get it to recognize blackbirds and sparrows, for example. It can also do other things, like put bounding boxes around portions of an image to show you where, for example, cars or other objects are located.

Test Driving Google’s AutoML Vision to Diagnose Malaria

Dr. Google gets average precision of 0.988 in classifying infected vs. non-infected

Google’s AutoML services have been promoted as making machine learning accessible for people who may not have a lot of ML expertise. The idea behind AutoML is that it will automatically determine the best model and hyperparameters for your data. This can take a lot of CPU power, so Google argues that it’s best to do it in the cloud, because there’s no sense in having that kind of computing power lying around idle (except for the brief intervals during which you’re training a model).

In the case of the Google AutoML Vision service, you can use a graphical UI to drag in a bunch of images, label them, and start learning with the click of a button. While that’s theoretically true, to make effective use of Google AutoML Vision for a real-world problem, you’re going to want to have some understanding of the Google Cloud Platform (in particular, storage buckets, and IAM permissions), and probably a little Linux command line expertise. And if you want to actually build an application that uses your ML model, of course you’re going to do some programming.

Google’s AutoML service has generated a lot of hype. Actually it’s also generated some criticism: that it’s overhyped, and that throwing a shitload of CPU at a problem in order to determine the best ML model is not as good an approach as having a smart, insightful person apply their smarts and insights. That’s probably true, but as I’ll show, this is still a useful tool, and while there were some things that bugged me about it, the results were pretty compelling relative to the effort involved. People have also complained about the cost. I managed to get a trial Google Cloud Platform (GCP) account with $300 in credit, so this didn’t cost me any green dollars. If you check, you might be able to get the same deal. Apparently it costs about $20 per hour of training, but you can get an hour for free. I managed to train my model within an hour.

Background: NIH Malaria Dataset

In this post, I’m going to grab a malaria dataset that originated with the National Institutes of Health (NIH), but which someone posted on Kaggle. It contains over 27,000 PNG images of blood smears, half of which show malaria and half of which don’t. According to the NIH: “…the images were manually annotated by an expert slide reader at the Mahidol-Oxford Tropical Medicine Research Unit in Bangkok, Thailand.”

Here are a few of the images in the dataset, shown after I imported them into AutoML.


First, let’s take a look at the results. Following that, I’ll provide a bit more detail on how to load the images into the tool, train the model, and call the API to make a prediction from a Python program.

Once training has completed, Google AutoML Vision provides a nice summary showing the results, including some graphs of recall vs. precision, and the confusion matrix. So on the test data, Google got 94.8% accuracy. Not bad.
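For readers new to these metrics, here is a minimal sketch of how accuracy, precision, and recall fall out of a two-class confusion matrix. The counts are made up for illustration and are not the numbers AutoML reported.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Compute metrics from the four cells of a two-class confusion matrix."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),  # share of all images classified correctly
        "precision": _ratio(tp, tp + fp),                # of images flagged positive, share truly positive
        "recall": _ratio(tp, tp + fn),                   # of truly positive images, share that were flagged
    }


# Hypothetical counts, treating "parasitized" as the positive class.
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```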

Loading data into Google AutoML Vision

If you log in to the GCP Console, you’ll find the menu item for Vision (it’s under “Machine Learning”). If you fire that up, it will ask you if you want to create a custom model, or use a pre-trained model (which is pretty cool if you want it to recognize run-of-the-mill objects, or pictures of celebrities, but that’s not going to work for our malaria dataset). Select Custom Model.

While you can drag and drop the images into the Google AutoML Vision web page, good luck doing that with 27,000 images. Instead, you need to get them into a Google Cloud Storage bucket. This is actually a bit of a pain, because you can’t download a zip file containing all of the images to a GCP bucket and unzip it there with the Google storage command line interface. You might be able to find a way to do that with a little programming, but instead I just created a Linux VM in GCP, attached a large disk to it, and downloaded the large images zip file from Kaggle to that disk. I unzipped the files and then moved them into a Cloud Storage bucket with the gsutil command line utility.

Once you start using AutoML Vision with the custom model, Google creates a Cloud Storage bucket for you, and gives AutoML Vision permissions to access it. So ideally, you should move your images into that bucket. It normally has a name like project_name-vcm, where project_name is the GCP project that you are using. (Projects serve as a kind of “container” within GCP for resources that you want to group logically. You can group servers, storage buckets, etc.)

To copy your files to your bucket, after unzipping them on the VM’s disk, use the gsutil cp command. Make sure you use the -m flag to speed things up (multi-threaded copying).

gsutil -m cp -r my-images-directory gs://mike-image-recognition-vcm

Now you need to build a file that Google AutoML will use to find the images in the bucket. This is just a CSV that contains the path to each file, and the label (uninfected or parasitized). Before starting, I unzipped all the files and ended up with 2 directories: one containing files labeled as parasitized, and one containing files labeled uninfected. I did the following steps from the Linux command line:

# From the Linux command line, cd to the Parasitized directory and put all image file names in a file.
# Keep only .png names so extraneous entries like Thumbs.db don't end up in your file. AutoML will puke on that.
# The list is written to the parent directory so that the CSV doesn't list itself.
cd Parasitized
ls | grep '\.png$' > ../parasitized_labels.csv

# add 'parasitized' as a label to all lines in the file. I believe AutoML wants lower case labels
sed -i 's|$|,parasitized|' ../parasitized_labels.csv

# do the same for uninfected files (the directory name Uninfected is assumed here).
cd ../Uninfected
ls | grep '\.png$' > ../uninfected_labels.csv
sed -i 's|$|,uninfected|' ../uninfected_labels.csv

# Combine the two files into one, called malaria_labels.csv
cd ..
cat parasitized_labels.csv uninfected_labels.csv > malaria_labels.csv

# add the Google Cloud Storage bucket path to the start of each line in the file
# (-i edits the file in place; without it sed only prints the result)
sed -i 's|^|gs://mike-image-recognition-vcm/malaria/allimages/|' malaria_labels.csv

Optionally, you can also specify whether you want a given image to be used for training, evaluation, or testing. If you don’t, Google AutoML will decide automatically. I just let AutoML handle it, although I later regretted this somewhat because AutoML didn’t seem to have a way of finding out, after the fact, which images were used for training, etc.
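One way to keep track of the split is to assign it yourself before uploading. The sketch below assumes the AutoML Vision CSV format accepts an optional first column of TRAIN, VALIDATION, or TEST ahead of the path and label; check the current import documentation before relying on that. The 80/10/10 proportions and the function names are assumptions for illustration. Hashing the path makes the assignment repeatable, so the same image always lands in the same set.

```python
import csv
import hashlib
import io


def assign_split(image_path, train_frac=0.8, validation_frac=0.1):
    """Deterministically map an image path to TRAIN, VALIDATION, or TEST."""
    digest = hashlib.md5(image_path.encode("utf-8")).hexdigest()
    # The first 8 hex digits give a repeatable number in [0, 1).
    fraction = int(digest[:8], 16) / 0x100000000
    if fraction < train_frac:
        return "TRAIN"
    if fraction < train_frac + validation_frac:
        return "VALIDATION"
    return "TEST"


def add_split_column(csv_text):
    """Turn 'path,label' rows into 'SET,path,label' rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        path, label = row
        writer.writerow([assign_split(path), path, label])
    return out.getvalue()


# Example usage (file names are placeholders):
# with open("malaria_labels.csv") as f:
#     labeled = add_split_column(f.read())
# with open("malaria_labels_with_split.csv", "w") as f:
#     f.write(labeled)
```

Because the set name is written into the CSV, the rows marked TEST can later be pulled out and sent to the prediction API by hand.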

Having done all that, you’ve got a file that describes the location of your 27,000+ files, and labels them. Now, you need to copy the malaria_labels.csv file into the same Google storage bucket where your images reside. The path within the bucket doesn’t matter. Once again, do this with the gsutil cp command, as above.

Creating your Dataset

Now it’s time to create your dataset in the Google AutoML Vision UI. Click “Create Dataset”, give your dataset a name, and provide a path to your CSV file (created as above), describing your files and your labels, which must be inside the bucket that AutoML Vision created. If AutoML Vision doesn’t like your CSV, you’ll get an error message, so you can fix the error and try again.

It will take a little while for AutoML to import all of your images. Once it’s complete, you should see the AutoML UI come back and show you samples of all of your images, and some stats like the number of images labeled as parasitized and uninfected.


You train your model using the UI (there is also a command line interface). Just go to the Train tab, and start training. You can limit how much CPU will be used (remember, you pay for this). I limited my training to the 1 free hour, and managed to get the model trained within that time. AutoML Vision will come back with some stats on the training results, including your precision, recall, and a confusion matrix, shown above.

Predictions – Calling your Model from Python

Once you’ve trained your model, it’s automatically deployed as a REST web service. This is pretty cool because you can immediately call your model from an application and use it to make predictions. There’s no need to write any code or run a server to do this.

Google gives you some Python code for this, but I had some difficulty getting it to work without a couple of changes. This was mainly due to permissions. First you’ll need to make sure the AutoML API has been turned on. Go into the GCP console, enter AutoML in the search box, and you’ll find the screen where you need to do that pretty easily. You also need a service account that has permission to use AutoML. I can’t remember if GCP created that for me when I started using AutoML, or if I had to create it myself.

The sample Python code that Google gives you assumes that this service account has been declared in your environment. That sounded great, but it didn’t seem to work for me running Python in a Jupyter notebook on my Mac. So instead I declared the service account credentials explicitly in my code. You’ll need to go into the IAM area of GCP and download a JSON key file for your service account. You’ll provide a path to this key when you make the API call. This worked well. Here’s what I did:
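The approach of loading the key file explicitly can be sketched as follows, assuming the google-cloud-automl client library and its automl_v1beta1 module. This is an illustration, not the original listing. The helper names, the us-central1 location, and the score threshold are assumptions, and the exact predict signature may differ between library versions.

```python
def make_prediction_client(key_path):
    """Build a prediction client from a service account JSON key file."""
    # Imported lazily so the helpers below can be tried without the library installed.
    from google.cloud import automl_v1beta1
    return automl_v1beta1.PredictionServiceClient.from_service_account_file(key_path)


def model_path(project_id, model_id, location="us-central1"):
    """Full resource name of a deployed model (location is an assumption)."""
    return f"projects/{project_id}/locations/{location}/models/{model_id}"


def predict_image(client, model_name, image_bytes, score_threshold="0.5"):
    """Send one image to the model and return (label, score) pairs."""
    payload = {"image": {"image_bytes": image_bytes}}
    params = {"score_threshold": score_threshold}  # passed as a string
    response = client.predict(name=model_name, payload=payload, params=params)
    return [(item.display_name, item.classification.score) for item in response.payload]


# Example usage (key path, project, model ID, and image name are placeholders):
# client = make_prediction_client("/path/to/service-account-key.json")
# with open("cell.png", "rb") as f:
#     print(predict_image(client, model_path("my-project", "my-model-id"), f.read()))
```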


Pros and Cons – What’s inside the box?

The pros of using Google AutoML are clear. You can train a custom model without coding, and the results are good. But there are some things that bugged me.

The main annoyance was that I wasn’t able to see what model AutoML ended up with. I couldn’t find out the model architecture, let alone find out the specific weights. If anyone knows how to do that let me know, but the UI doesn’t show it, and the command line tools didn’t seem to offer this. This would be helpful if I wanted to take the results and try to improve on them myself.

I was unable to see which images the tool had selected for training, evaluation, and testing. Perhaps I should just trust the tool, but I wanted to use my Python program to take the test images and run predictions on them myself. You can’t do that unless you predefined which images to use for test. I just tried it with a few random images, but it would be nice to know which ones were in the training set, test set, etc. I think Google needs to give users more visibility into what’s going on inside the box.

The tool is a bit intolerant of bad data in your CSV file. I might have hoped it would skip bad lines and keep going, but instead it threw an error and failed entirely.
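A quick local check before uploading can catch most of these problems. The sketch below assumes the two-column path,label layout built earlier; the allowed labels, the accepted file extensions, and the function name are assumptions for illustration.

```python
import csv
import io

ALLOWED_LABELS = {"parasitized", "uninfected"}
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def find_bad_lines(csv_text, bucket_prefix="gs://"):
    """Return (line number, reason) pairs for rows that look wrong."""
    problems = []
    for number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != 2:
            problems.append((number, "expected 2 columns"))
            continue
        path, label = row
        if not path.startswith(bucket_prefix):
            problems.append((number, "path does not start with " + bucket_prefix))
        elif not path.lower().endswith(IMAGE_EXTENSIONS):
            problems.append((number, "not an image file"))
        elif label not in ALLOWED_LABELS:
            problems.append((number, "unexpected label"))
    return problems


# Example usage (file name is a placeholder):
# with open("malaria_labels.csv") as f:
#     for number, reason in find_bad_lines(f.read()):
#         print(f"line {number}: {reason}")
```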

But all in all Google’s AutoML Vision tool worked better than I expected, and I think some people who don’t have time to code their own models might find it to be useful.