Note that (as I describe in the next post), a big issue here is that I used transfer learning, starting with the Resnet model. Apparently Resnet’s already seen some of these images, so my 80%+ accuracy is probably an overestimate.
Previously, I wrote about how to use AWS Rekognition to distinguish between different varieties of birds. You can train AWS Rekognition Custom Labels with photos of birds that live in your area, or try to get by with the cheaper standard Rekognition service, although it will give you a less specific bird ID. Recently I extended this work to add a motion sensing camera using a Raspberry Pi 4 and the Raspberry Pi High Quality Camera. I also changed the architecture of the AWS portion of the solution to use AWS Lambda, S3 and SNS.
As shown in the diagram below, the Raspberry Pi sends pictures from my bird feeder to an S3 bucket at AWS. When a new image arrives in S3, this invokes a Python Lambda function that sends the photo to AWS Rekognition, which uses its ML-based image recognition capabilities to determine what’s in the photo. If a bird is detected, this triggers a message to an SNS topic, which you can use to get a text or email. If a squirrel is detected, a message is sent to a different SNS topic. So you might use texts to notify yourself of a squirrel sighting so you can go chase it away, and use email to notify yourself about interesting birds. Or you could even hook up the Raspberry Pi to shoot water at any squirrels invading the bird feeder (which might be a project for next summer). Eventually I added a simple web site built using the AWS S3 static web site approach, to allow easy viewing of the best pictures.
Raspberry Pi Motion Detection
A Raspberry Pi is just a very small Linux box with an ARM processor. There’s a package called PI-TIMOLO which I found to be very useful. You can run it on the Raspberry Pi to detect motion, and automatically snap a photo. You don’t need an infrared motion detector attached to the Raspberry Pi (although that might not be a bad idea). PI-TIMOLO scans a low-res stream from your camera, and if it detects a significant difference from one frame to the next, it concludes something has moved, and momentarily stops the stream, snaps a high-res picture and puts it in a folder.
I pointed a Raspberry Pi High Quality (HQ) camera with a cheap telephoto lens at my bird feeder and set up PI-TIMOLO. There are several PI-TIMOLO settings that you need to fiddle with to get good results for your particular situation, but you can ignore a lot of the settings, such as those related to video and panoramic photos. Just focus on the image and motion settings. I’ll put a sample of the PI-TIMOLO settings I used in my GitHub repo.
I have a small Python program running on the Raspberry Pi, which polls the directory where PI-TIMOLO puts its photos. If my code senses a new photo in that folder, it crops the photo (as I’ll explain in a minute) and makes an API call to send it to an S3 bucket in AWS. Here’s the code:
For this code to work, you need to install boto (the AWS Python SDK) on your Raspberry Pi, and also create an IAM user in AWS with rights to write to S3. You need to copy and paste the AWS user credentials from that IAM user into a credentials file in your .aws or .boto folder on the Raspberry Pi, so that your Python code has the credentials needed to put files into an S3 bucket.
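A minimal sketch of a watcher like the one described above might look like this. It assumes boto3 and Pillow are installed. The function names, the polling interval and the center-crop fraction are hypothetical choices for illustration, not necessarily what runs on the Pi in this project.

```python
import os
import time


def center_crop_box(width, height, keep=0.5):
    """Return a (left, upper, right, lower) box keeping the central `keep` fraction."""
    crop_w, crop_h = int(width * keep), int(height * keep)
    left, upper = (width - crop_w) // 2, (height - crop_h) // 2
    return (left, upper, left + crop_w, upper + crop_h)


def new_jpegs(folder, seen):
    """List .jpg files in `folder` that are not yet in the `seen` set."""
    names = sorted(n for n in os.listdir(folder) if n.lower().endswith(".jpg"))
    return [n for n in names if n not in seen]


def crop_and_upload(path, bucket, key):
    """Crop the photo around its center and put the result in S3."""
    import io
    import boto3           # AWS SDK for Python
    from PIL import Image  # Pillow

    with Image.open(path) as img:
        cropped = img.crop(center_crop_box(*img.size))
        buf = io.BytesIO()
        cropped.save(buf, format="JPEG")
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buf.getvalue())


def watch(folder, bucket, prefix="new-bird-images/", interval=5):
    """Poll `folder` forever, uploading each new photo once."""
    seen = set()
    while True:
        for name in new_jpegs(folder, seen):
            crop_and_upload(os.path.join(folder, name), bucket, prefix + name)
            seen.add(name)
        time.sleep(interval)
```

Cropping around the center only makes sense if the feeder sits in the middle of the frame. If yours doesn't, swap in a fixed box that matches where the birds actually land.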
AWS S3, Lambda, Rekognition, and SNS
Once that’s working, you create a Lambda function in AWS. Lambda is the flagship AWS serverless computing service. A Lambda is a piece of code that will run once it’s invoked by some trigger, which could be an incoming API call, or a timer, or in my case, something arriving in an S3 bucket. Once it’s done its job, the Lambda terminates. It’s nice for my ML-based bird and squirrel detector because it lets me run the application without running a single server, so it’s very economical. Between AWS S3, Lambda, Rekognition and SNS, and the S3 static web site, I’ve got all of this great functionality, with state-of-the-art image recognition, cloud storage, and email and text notifications, and it’s practically free if I can settle for the standard Rekognition service (i.e., not Custom Labels–more on that later). Just make sure your Raspberry Pi doesn’t go crazy sending too many photos to S3, e.g., if your motion detection settings are too loose, because eventually too many API calls could add up. But if the camera takes about 200 pictures a day, the costs to run this are minimal: the AWS free tier gives you 5,000 Rekognition calls per month, and calls beyond that cost only about 0.1 cents per image.
To create a Lambda, first, you need to create a role that the Lambda will assume, with rights to S3, Lambda, Rekognition, and SNS. Go to IAM in the AWS console, and create a new role and give it the following policies:
Then go to Lambda in the AWS console. Click Create Function, and choose Author From Scratch. Name your Lambda, and choose Python 3.8 as your language, and be sure to give it the role you created earlier so it can access S3, Rekognition, and SNS.
Having created the Lambda, you want to set it to trigger based on a new object being created in S3 (i.e., a new photo being sent from the Raspberry Pi). So you click the Trigger button and select the S3 bucket and folder (which AWS also calls a prefix) where your pictures will be sent from the Raspberry Pi. Then, it’s time to enter your Python code. Here’s mine, but you’ll need to enter your own SNS topic identifiers:
After waking up (upon the arrival of a new picture in S3), the Lambda calls the Rekognition API to see what’s in the photo. Rekognition is an AWS image recognition service. I described Rekognition in a prior blog post. In short, if you provide Rekognition with an image, it tells you what it thinks is in it. I found that Rekognition’s out-of-the-box training enabled it to recognize birds, and it could sometimes identify the correct species of bird. It can also identify squirrels, which are frequent pests around feeders. In the code above, I’m using the standard Rekognition service, which comes pre-trained to recognize common objects, animals, and celebrities (in case someone famous shows up around my bird feeder).
So anyway, my Lambda code calls the Rekognition API and looks at the response to see if Rekognition sees a bird or a squirrel. I created two AWS Simple Notification Service (SNS) topics with the AWS console: one for bird sightings and one for squirrel sightings. So depending on what Rekognition saw in the image, my Lambda posts a message to the appropriate SNS topic using the SNS API. You can subscribe to an SNS topic with email or SMS texts, and get notified of the type of bird that Rekognition sees, and also get notified of squirrel sightings.
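The flow described above can be sketched roughly as follows. This is an illustration rather than the Lambda from this project: the topic ARNs are placeholders, the "Squirrel" label name and the ignore list are assumptions about what Rekognition returns, and sending squirrel alerts ahead of bird alerts is just one possible choice.

```python
import urllib.parse

# Hypothetical topic ARNs; substitute your own.
BIRD_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:bird-sightings"
SQUIRREL_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:squirrel-sightings"

# Labels too generic to be worth reporting (assumed names).
IGNORED_LABELS = {"Bird Feeder", "Furniture", "Grass", "Nature"}


def interesting(labels):
    """Drop labels that show up in nearly every backyard photo."""
    return [l for l in labels if l["Name"] not in IGNORED_LABELS]


def pick_topic(labels):
    """Return the SNS topic ARN matching the labels, or None if nothing notable."""
    names = {l["Name"] for l in labels}
    if "Squirrel" in names:
        return SQUIRREL_TOPIC_ARN
    if "Bird" in names:
        return BIRD_TOPIC_ARN
    return None


def format_message(labels, key):
    """Build the notification text, one label per line."""
    lines = ["Detected labels in photo " + key]
    lines += ["{} (Confidence {:.1f})".format(l["Name"], l["Confidence"])
              for l in interesting(labels)]
    return "\n".join(lines)


def lambda_handler(event, context):
    import boto3  # provided by the Lambda Python runtime

    # The S3 trigger passes the bucket and (URL-encoded) object key.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    response = boto3.client("rekognition").detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=50,
    )
    labels = response["Labels"]
    topic = pick_topic(labels)
    if topic is not None:
        boto3.client("sns").publish(TopicArn=topic,
                                    Message=format_message(labels, key))
    return {"labels": [l["Name"] for l in labels]}
```

Keeping the decision logic in small functions like `pick_topic` means you can test it on your laptop with hand-made label lists before you ever deploy to Lambda.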
Configuring the Raspberry Pi to have the correct PI-TIMOLO and camera settings took some time. You want PI-TIMOLO to be sensitive enough to trigger a photo when it detects a bird, but not so sensitive that a few leaves blowing in the wind trigger a photo.
This was the first time I developed a Lambda function. I had prototyped and tested most of the code on my Mac, but it was a bit difficult to debug when I was creating the Lambda code in the AWS Console because I was debugging it by looking in AWS CloudWatch to see what happened. Typically, there is a lag of up to five minutes before you can check CloudWatch to see if you got any errors. AWS has a tool for debugging Lambdas locally (called SAM) so that’s probably worth learning if you’re going to create complex Lambdas, but I managed to muscle through without it this time.
Using the out-of-the-box Rekognition service, I was soon getting many notifications telling me that there was a bird feeder in the picture, or lawn furniture. So it was easy enough to filter those out. Then it started telling me there was grass or “nature” in every picture, so I had to filter that out. Eventually, I found that Rekognition was identifying the fact that there was a bird in the picture, but it couldn’t identify the species. The problem was my image quality. Previously when I provided Rekognition with professional quality bird photos, it could often ID the species. But my photos just weren’t as good. I found that by cropping the photos so that the bird took up a higher portion of the image, Rekognition tended to focus more on the bird and could sometimes recognize the bird species. So I added some cropping logic to my Raspberry Pi code, using the Python Pillow library. After cropping a woodpecker image (below) from my Raspberry Pi camera and sending it to Rekognition, I got this notification from SNS:
Detected labels for photo at time 2020-11-22-18:34:11
Bird (Confidence 99.25276184082031)
Flicker Bird (Confidence 78.45641326904297)
Woodpecker (Confidence 78.45641326904297)
Finch (Confidence 54.26727294921875)
in photo new-bird-images/2020-11-22-13-34-09.jpg
It’s a red-bellied woodpecker, so Rekognition did determine it was a woodpecker (with moderate confidence), although not a red-bellied one. It also thought it might be a flicker bird (and a finch, with low confidence). That’s wrong, although when I looked up flickers on the Cornell Ornithology site, it turns out they are a type of woodpecker, so this answer isn’t completely off-base. Rekognition sometimes gives some odd results, however. For example, the photo on the left was labeled “poultry” by Rekognition. Perhaps it looks a bit like a small chicken? Another time Rekognition decided that a photo from my backyard had a penguin in it. I checked and it was indeed a photo of a black and white bird. Of course, ML image recognition models only know about training images they were previously provided. They don’t have an understanding of the context of the images, which would let them know that it’s ridiculous to report that there’s a penguin in my backyard.
Rekognition Custom Labels
To get really accurate results, you can consider training Rekognition Custom Labels (or some other ML-based approach that involves custom training). Following the approach in my prior post, I used images from the Caltech-UCSD bird image data set to train Rekognition on common birds in my region. I threw in some pictures of squirrels so that Rekognition could also identify them. It took about 1.4 hours to train the model, but the results were impressive (although I found I was able to come close with a fast.ai model that I wrote with a few lines of Python). Below, you can see how Rekognition Custom Labels performs on the test images for various bird species that live in the Northeast (and squirrels)–it’s almost never wrong (at least when using professional quality pictures)!
But while this ML service does an outstanding job, it’s too expensive for an individual hobbyist to leave running all day long. You get a few hours as part of the AWS free tier, but after that it’s $4.00 per hour. Luckily I was able to get $100 in free AWS credits in my account, so I could try it for free. To economize, AWS recommends starting the service up, using it, and taking it down when you’re done. So you can’t really run it for hours and hours, which was my original design. For an enterprise, however, this service would be well worth the price, and it’s really not that expensive compared to other options.
When I tried Rekognition Custom Labels with pictures taken by my Raspberry Pi camera, initially I found that Rekognition Custom Labels wasn’t recognizing anything. But when I cropped the images, Rekognition correctly determined that the picture (above) was a red-bellied woodpecker. That was very cool, although I learned some tough lessons about training a model when I sent about 1200 pictures to Rekognition Custom Labels. The bird IDs were often wrong because the model had been trained on images it was unlikely to see at my feeder (e.g., blue jays aren’t common around here this time of year, but it thought it saw a lot of blue jays). Additionally, my images were sometimes out of focus because the bird was moving. So if I was going to run this model all the time, I’d retrain it with pictures that are closer to what it would see in production–ones from my feeder rather than stock images from a data set.
Given the cost of Rekognition Custom Labels, I’m probably just going to run this with the much cheaper, off-the-shelf Rekognition for a while, and consider moving to a cheaper custom model in the future.
AWS Rekognition (sic) is a cloud service that provides a number of features for image recognition. You can train it with your own custom images, or you can use the canned training that’s already been done for you, if that’s sufficient.
I decided to try to use the tool both ways: by training it with some custom images and by using the out-of-the-box, pre-trained capabilities of Rekognition.
Rekognition Custom Labels
You use AWS Rekognition Custom Labels if you have your own images, which make sense for your application, but which aren’t handled by the standard Rekognition service. For example, you have good widgets coming off your production line, and damaged ones. You could train Rekognition to know the difference between the two and move the damaged ones off to the side.
I decided to train Rekognition to be able to recognize various birds. I’m thinking of pointing a camera at my bird feeder and sending the output to Rekognition. Caltech has a really nice library of labeled bird pictures of 200 different varieties of birds. They have color images and some outlines of the birds’ shapes. I just used the color images. First you go into the Rekognition UI in the AWS Console. Click on Use Custom Labels. You need to create a data set. Rekognition creates a bucket for you, and you can dump the images in there. You can drag and drop them, but if you have a lot of images you’re better off using the AWS CLI, with a command like this (which copies a folder containing all its images to an S3 bucket):
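If you'd rather script the copy than use the CLI, the same thing can be sketched in Python with boto3. This is a swapped-in alternative, not the CLI command from the post, and the function names and the prefix argument are hypothetical. The point it illustrates is that each image's folder name should survive in the S3 key, since that folder name is what becomes the label.

```python
import os


def s3_keys_for_folder(local_root, prefix=""):
    """Yield (local_path, s3_key) pairs, preserving sub-folder names in the key."""
    for dirpath, _dirs, files in os.walk(local_root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, local_root).replace(os.sep, "/")
            yield path, prefix + rel


def upload_folder(local_root, bucket, prefix=""):
    """Upload every file under `local_root` to the given bucket."""
    import boto3  # AWS SDK for Python

    s3 = boto3.client("s3")
    for path, key in s3_keys_for_folder(local_root, prefix):
        s3.upload_file(path, bucket, key)
```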
I only copied some of the folders to S3 because you pay for Rekognition training time and you may not want to load it up with all 18,000 images until you have a feel for what that’s going to cost you. And anyway, a lot of those birds aren’t native to my area.
Having created a data set, you click on Project, and create a new Rekognition project. You tell Rekognition where your images are. Conveniently, the Caltech bird images were nicely organized in folders that were named with the correct label for each image, i.e., the type of bird. So Rekognition was able to understand the correct type of bird in the pictures with no further work by me. I started training, and waited. And waited. It took a while. Maybe an hour. And that was with only about 500 images. Rekognition is, I believe, optimizing the hyperparameters of the deep learning model it’s using. So that takes a while because it means it’s training the model multiple times to find the best hyperparameters. But the results are amazing. My model ended up with an F1 score of .977. The best possible score is 1.0, so the model is extremely accurate at classifying the nine types of birds I trained it to recognize. And if you look at these images, you’ll see that the differences between some of the varieties of birds are pretty subtle, so it’s impressive that Rekognition can tell the difference.
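For anyone unfamiliar with the F1 score: it's the harmonic mean of precision and recall, which is why it tops out at 1.0 and gets dragged down if either one is poor. A tiny sketch of the standard formula follows. How Rekognition aggregates it across the nine classes isn't shown here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```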
Recognizing New Images
Having trained our model, how do we use it? First, you need to start the model. In the Rekognition console they provide an AWS CLI command that does that. It takes about 5 minutes to fire it up. Then you can use the AWS CLI to provide your model with an image to classify, but to create a real application you’ll want to write a little code. I found some example code that AWS provides in the SDK for Rekognition, and managed to modify it and get it to work for Rekognition Custom Labels (although the API call I’m using here isn’t mentioned in the documentation, which might not be up-to-date; I just guessed it and it worked). Here’s my code:
To make this work I had to install boto (the AWS Python SDK) and figure out how to get my AWS API credentials and load them into a file and put them into my .boto folder.
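As a rough sketch of what calling a trained Custom Labels model looks like with current boto3: the client exposes `start_project_version`, `detect_custom_labels` and `stop_project_version`. Whether these are exactly the calls used in the code above is an assumption. The model ARN is whatever the Rekognition console shows for your trained model, and the helper names are made up for illustration.

```python
def top_custom_label(response):
    """Return (name, confidence) of the most confident custom label, or None."""
    labels = response.get("CustomLabels", [])
    if not labels:
        return None
    best = max(labels, key=lambda l: l["Confidence"])
    return best["Name"], best["Confidence"]


def start_model(model_arn):
    """Start the model (billing begins; it takes a few minutes to come up)."""
    import boto3
    boto3.client("rekognition").start_project_version(
        ProjectVersionArn=model_arn, MinInferenceUnits=1)


def classify(image_path, model_arn, min_confidence=50):
    """Send a local image to the running model and return its best guess."""
    import boto3
    with open(image_path, "rb") as f:
        response = boto3.client("rekognition").detect_custom_labels(
            ProjectVersionArn=model_arn,
            Image={"Bytes": f.read()},
            MinConfidence=min_confidence,
        )
    return top_custom_label(response)


def stop_model(model_arn):
    """Stop the model so you stop paying for it."""
    import boto3
    boto3.client("rekognition").stop_project_version(ProjectVersionArn=model_arn)
```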
Using Standard Rekognition
I managed to get an AWS account with some free credits, so I was able to train my custom model with 9 bird varieties for free, although it did cost a couple dollars in credits. But if I wanted to train my model with all 200 bird varieties, that might cost a lot more. And the other thing that worried me was the cost of inference (running the model against new images). You get a few hours as part of the AWS free tier, but after that it costs $4.00 an hour. If you’re running it 24×7 that adds up. It looks like AWS is firing up a VM to run my model (it’s hard to tell because it doesn’t show up as an EC2 instance in the console). But I’m guessing that’s what’s happening because of the delay in starting up my model, and due to the cost. So I’m wondering if instead of using Custom Labels, I can just use standard out-of-the-box Rekognition. Pricing for standard image recognition (i.e., no custom labels) is really cheap. It’s priced per API call, and you can get 5000 calls per month free. Even when you exhaust that, it costs only 0.1 cents per image. So for my personal application here, I’d like to use standard Rekognition if possible. It turns out it’s quite good. It can’t distinguish between 10 different kinds of sparrows, but it can tell you that a bird is a sparrow, and not, say, a blackbird.
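To put numbers on that comparison, here's a back-of-the-envelope sketch using only the prices quoted above. It ignores the few free Custom Labels hours and assumes a 30-day month. The 200 images a day in the usage note is an illustrative volume.

```python
CUSTOM_LABELS_PER_HOUR = 4.00   # dollars per hour the model is running
STANDARD_PER_IMAGE = 0.001      # dollars per image (0.1 cents)
FREE_CALLS_PER_MONTH = 5000     # standard Rekognition free tier


def custom_labels_monthly(hours_per_day, days=30):
    """Monthly cost of keeping a Custom Labels model running."""
    return CUSTOM_LABELS_PER_HOUR * hours_per_day * days


def standard_monthly(images_per_day, days=30):
    """Monthly cost of standard Rekognition after the free calls are used up."""
    billable = max(0, images_per_day * days - FREE_CALLS_PER_MONTH)
    return billable * STANDARD_PER_IMAGE
```

Running Custom Labels around the clock, `custom_labels_monthly(24)`, comes to $2,880 a month, while `standard_monthly(200)` comes to about a dollar.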
The Python code to call the REST API for standard Rekognition is similar to the code for Rekognition Custom Labels. The main difference is that you haven’t trained a custom model, and don’t need to start it up ahead of time, or refer to it in your API call:
So Rekognition is 99.97% sure it’s an animal, and it’s 93.9% sure it’s an Agelaius (red-winged blackbird), which is correct.
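A call of the kind described above might look something like this sketch, which uses boto3's `detect_labels` on a local image file. The helper names and the confidence cutoff are hypothetical, and this isn't necessarily the code that produced the result quoted above.

```python
def summarize_labels(response, min_confidence=50.0):
    """Return (name, confidence) pairs above the cutoff, most confident first."""
    labels = [(l["Name"], l["Confidence"])
              for l in response["Labels"]
              if l["Confidence"] >= min_confidence]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


def detect(image_path, max_labels=10):
    """Ask standard Rekognition what it sees in a local image file."""
    import boto3  # AWS SDK for Python

    with open(image_path, "rb") as f:
        response = boto3.client("rekognition").detect_labels(
            Image={"Bytes": f.read()}, MaxLabels=max_labels)
    return summarize_labels(response)
```

Note there's no model ARN and no start or stop step here, which is the practical difference from Custom Labels.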
AWS Rekognition Custom Labels does a phenomenal job of letting you train a custom image recognition model, with no programming. You’ll need to do some programming to use the model, but you don’t need much, if any, deep learning knowledge to get great results. But it’s expensive because it’s not really (as far as I can tell) a true serverless deployment. So you pay for every hour you need it running. By contrast, the standard Rekognition service is truly serverless. You pay only based on the number of API calls you make. And Rekognition has been pretrained to do a lot of useful things out of the box. I was able to get it to recognize blackbirds, and sparrows, for example. And it can also do other things, like put bounding boxes around portions of an image to show you where, for example, cars or other objects are located.
Here’s an attempt to see if I can use historical pro golf data to predict how much money a player will earn in a year. It turns out that while some of the metrics tracked by the PGA correlate with the money a player earns, it’s not easy to predict a player’s winnings. Check out the post on Kaggle.
Dr. Google gets average precision of 0.988 in classifying infected vs. non-infected
Google’s AutoML services have been promoted as making machine learning accessible for people who may not have a lot of ML expertise. The idea behind AutoML is that it will automatically determine the best model and hyperparameters for your data. This can take a lot of CPU power, so Google argues that it’s best to do it in the cloud, because there’s no sense in having that kind of computing power lying around idle (except for the brief intervals during which you’re training a model).
In the case of the Google AutoML Vision service, you can use a graphical UI to drag in a bunch of images, label them, and start learning with the click of a button. While that’s theoretically true, to make effective use of Google AutoML Vision for a real-world problem, you’re going to want to have some understanding of the Google Cloud Platform (in particular, storage buckets, and IAM permissions), and probably a little Linux command line expertise. And if you want to actually build an application that uses your ML model, of course you’re going to do some programming.
Google’s AutoML service has generated a lot of hype. Actually it’s also generated some criticism–that it’s overhyped, and that throwing a shitload of CPU at a problem in order to determine the best ML model is not as good an approach as having a smart insightful person apply their smarts and insights. That’s probably true, but as I’ll show, this is still a useful tool, and while there were some things that bugged me about it, the results were pretty compelling relative to the effort involved. People have also complained about the cost. I managed to get a trial Google Cloud Platform (GCP) account with $300 in credit, so this didn’t cost me any green dollars. If you check, you might be able to get the same deal. Apparently it costs about $20 per hour of training, but you can get an hour for free. I managed to train my model within an hour.
Background: NIH Malaria Dataset
In this post, I’m going to grab a malaria dataset that originated with the National Institutes of Health (NIH), but which someone posted on Kaggle. It contains over 27,000 PNG images of blood smears, half of which show malaria and half of which don’t. According to the NIH: “…the images were manually annotated by an expert slide reader at the Mahidol-Oxford Tropical Medicine Research Unit in Bangkok, Thailand.”
Here are a few of the images in the dataset, shown after I imported them into AutoML.
First, let’s take a look at the results. Following that, I’ll provide a bit more detail on how to load the images into the tool, train the model, and call the API to make a prediction from a Python program.
Once training has completed, Google AutoML Vision provides a nice summary showing the results, including some graphs of recall vs. precision, and the confusion matrix. So on the test data, Google got 94.8% accuracy. Not bad.
Loading data into Google AutoML Vision
If you log in to the GCP Console, you’ll find the menu item for Vision (it’s under “Machine Learning”). If you fire that up, it will ask you if you want to create a custom model, or use a pre-trained model (which is pretty cool if you want it to recognize run-of-the-mill objects, or pictures of celebrities, but that’s not going to work for our malaria dataset). Select Custom Model.
While you can drag and drop the images into the Google AutoML Vision web page, good luck doing that with 27,000 images. Instead, you need to get them into a Google Cloud Storage bucket. This is actually a bit of a pain, because you can’t download a zip file containing all of the images to a GCP bucket, and unzip it there with the Google storage command line interface. You might be able to find a way to do that with a little programming, but instead I just created a Linux VM in GCP, attached a large disk to it, and downloaded the large images zip file from Kaggle to that disk. I unzipped the files and then moved them into a Cloud Storage bucket with the Google GSUTIL command line utilities. Once you start using AutoML Vision with the custom model, Google creates a cloud storage bucket for you, and gives AutoML Vision permissions to access it. So ideally, you should move your images into that bucket. It normally has a name like project_name-vcm, where project_name is the GCP project that you are using. (Projects serve as a kind of “container” within GCP, for resources that you want to group logically. You can group servers, storage buckets, etc.)
To copy your files to your bucket, after unzipping them on the VM’s disk, use the GSUTIL cp command. Make sure you use the -m flag to speed things up (multi-threaded copying).
Now you need to build a file that Google AutoML will use to find the images in the bucket. This is just a CSV that contains the path to each file, and the label (uninfected or parasitized). Before starting, I unzipped all the files and ended up with 2 directories: one containing files labeled as parasitized, and one containing files labeled uninfected. I did the following steps from the Linux command line:
# From the Linux command line, cd to the Parasitized directory and put all file names in a file.
# Watch out that you don't end up with extraneous entries in your file, like thumbs.db
# (or the CSV file itself, if you write it into the same directory). AutoML will puke on that.
ls > parasitized_labels.csv

# Add 'parasitized' as a label to all lines in the file. I believe AutoML wants lower case labels.
sed -i 's|$|,parasitized|' parasitized_labels.csv

# Do the same for uninfected files.
ls > uninfected_labels.csv
sed -i 's|$|,uninfected|' uninfected_labels.csv

# Combine the two files into one, called malaria_labels.csv
cat parasitized_labels.csv >> malaria_labels.csv
cat uninfected_labels.csv >> malaria_labels.csv

# Add the Google Cloud Storage bucket path to the start of each line in the file.
# (-i edits the file in place; without it, sed only prints the result to the screen)
sed -i 's|^|gs://mike-image-recognition-vcm/malaria/allimages/|' malaria_labels.csv
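The shell steps above can also be sketched in Python, which makes it easier to keep stray files like thumbs.db out of the CSV. This is an alternative to the commands above, not part of the original workflow. The function names are made up, and it assumes (as the sed command does) that every image ends up under a single bucket path.

```python
import csv
import os


def build_label_rows(image_dirs, gcs_prefix, extensions=(".png",)):
    """image_dirs maps label -> local directory. Returns [(gcs_path, label), ...]."""
    rows = []
    for label, directory in image_dirs.items():
        for name in sorted(os.listdir(directory)):
            if name.lower().endswith(extensions):  # skips thumbs.db and similar
                rows.append((gcs_prefix + name, label))
    return rows


def write_labels_csv(rows, out_path):
    """Write one 'gs://path,label' line per image."""
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

For example, you could call `build_label_rows({"parasitized": "Parasitized", "uninfected": "Uninfected"}, "gs://mike-image-recognition-vcm/malaria/allimages/")` and pass the result to `write_labels_csv`.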
Optionally, you can also specify whether you want a given image to be used for training, evaluation, or testing. If you don’t, Google AutoML will decide automatically. I just let AutoML handle it, although I later regretted this somewhat because AutoML didn’t seem to have a way of finding out, after the fact, which images were used for training, etc.
Having done all that, you’ve got a file that describes the location of your 27,000+ files, and labels them. Now, you need to copy the malaria_labels.csv file into the same Google storage bucket where your images reside. The path within the bucket doesn’t matter. Once again, do this with the GSUTIL command, like above.
Creating your Dataset
Now it’s time to create your dataset in the Google AutoML Vision UI. Click “Create Dataset”, give your dataset a name, and provide a path to your CSV file (created as above), describing your files and your labels, which must be inside the bucket that AutoML Vision created. If AutoML Vision doesn’t like your CSV, you’ll get an error message, so you can fix the error and try again.
It will take a little while for AutoML to import all of your images. Once it’s complete, you should see the AutoML UI come back and show you samples of all of your images, and some stats like the number of images labeled as parasitized and uninfected.
You train your model using the UI (there is also a command line interface). Just go to the Train tab, and start training. You can limit how much CPU will be used (remember, you pay for this). I limited my training to the 1 free hour, and managed to get the model trained within that time. AutoML Vision will come back with some stats on the training results, including your precision, recall, and a confusion matrix, shown above.
Predictions – Calling your Model from Python
Once you’ve trained your model, it’s automatically deployed as a REST web service. This is pretty cool because you can immediately call your model from an application and use it to make predictions. There’s no need to write any code or run a server to do this.
Google gives you some Python code for this, but I had some difficulty getting it to work without a couple changes. This was mainly due to permissions. First you’ll need to make sure the AutoML API has been turned on. Go into the GCP console, enter AutoML in the search box, and you’ll find the screen where you need to do that pretty easily. You need to have a service account that has permission to use AutoML. I can’t remember if GCP created that for me when I started using AutoML, or if I had to create it myself. Anyway, the sample Python code that Google gives you assumes that this service account has been declared in your environment, and this sounded great, but that didn’t seem to work for me running Python in a Jupyter notebook on my Mac. So instead I declared the service account credentials in my code, explicitly. You’ll need to go into the IAM area of GCP, and download a JSON key file for your service account. You’ll provide a path to this key when you make the API call. This worked well. Here’s what I did:
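A sketch of that approach, loading the service account key explicitly, is shown below. It's written against the `google-cloud-automl` client library as it exists in more recent releases, so it may differ from the client version used at the time of this post. The helper names, the `us-central1` location default and all the arguments are placeholders you'd replace with your own.

```python
def model_resource_name(project_id, model_id, location="us-central1"):
    """Build the full resource name the prediction API expects."""
    return "projects/{}/locations/{}/models/{}".format(project_id, location, model_id)


def best_prediction(results):
    """results: iterable of (display_name, score). Return the top pair or None."""
    results = list(results)
    return max(results, key=lambda r: r[1]) if results else None


def predict_image(image_path, key_path, project_id, model_id):
    """Classify one local image with a deployed AutoML Vision model."""
    from google.cloud import automl  # pip install google-cloud-automl

    # Credentials come from the downloaded JSON key, not the environment.
    client = automl.PredictionServiceClient.from_service_account_file(key_path)
    with open(image_path, "rb") as f:
        payload = automl.ExamplePayload(image=automl.Image(image_bytes=f.read()))
    request = automl.PredictRequest(
        name=model_resource_name(project_id, model_id), payload=payload)
    response = client.predict(request=request)
    return best_prediction(
        (r.display_name, r.classification.score) for r in response.payload)
```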
The pros of using Google AutoML are clear. You can train a custom model without coding, and the results are good. But there are some things that bugged me.
The main annoyance was that I wasn’t able to see what model AutoML ended up with. I couldn’t find out the model architecture, let alone find out the specific weights. If anyone knows how to do that let me know, but the UI doesn’t show it, and the command line tools didn’t seem to offer this. This would be helpful if I wanted to take the results and try to improve on them myself.
I was unable to see which images the tool had selected for training, evaluation, and testing. Perhaps I should just trust the tool, but I wanted to use my Python program to take the test images and run predictions on them myself. You can’t do that unless you predefined which images to use for test. I just tried it with a few random images, but it would be nice to know which ones were in the training set, test set, etc. I think Google needs to give users more visibility into what’s going on inside the box.
The tool is a bit intolerant of bad data in your CSV file. I might have hoped it would skip bad lines and keep going, but instead it threw an error and failed entirely.
But all in all Google’s AutoML Vision tool worked better than I expected, and I think some people who don’t have time to code their own models might find it to be useful.