I recently bought a new toy which I think really represents the future of AI. It’s the Coral chip, from Google. Google is a leader in Machine Learning with their Tensorflow platform, and now they are pushing down to the “edge” of machine learning with their Coral chip. This is important because if we can buy low-power, cost-effective chips that can speed up machine learning inferencing, we can run ML at a reasonable price on very low-powered devices such as IoT devices that can’t run ML in the cloud (e.g., because they don’t have the connectivity, or because they need to make quick, autonomous decisions using ML).
Google calls the Coral chip a TPU, for Tensor Processing Unit, because it’s a custom chip that’s designed to natively process tensors, which is how data is represented in neural networks. Because it’s designed to do exactly this, it’s way faster than just using a standard CPU when it comes to doing the math required to do neural network ‘inferencing’, i.e., making predictions using an ML model.
Buying and installing the Google Edge TPU
I ordered the Coral USB accelerator from Mouser. If you like machine learning, it’s a lot fun for just $60. Of course the main idea of the Coral is that hardware makers will build it into their products, but with the USB accelerator, you can easily add it to your Pi. Here, you can see my Pi with the USB accelerator attached by USB.
To install the Edge TPU on your Raspberry Pi, you follow these instructions. They recommend using the ‘standard’ runtime, but they also provide a “maximum operating frequency” runtime, although they warn that if you use this, your USB accelerator can get very hot, to the point where it can burn you! And you need to accept their warning before they even let you install the runtime software that makes the Coral run that fast.
Well, that sounded like too much fun to ignore, so I tried both the standard and the high-speed modes.
The Coral chip only works with Tensorflow Lite (TFLITE), which is Google’s version of Tensorflow designed for processing at the edge. In a prior post, I used my library of photos taken by my Raspberry Pi of birds at my bird feeder. I sorted the photos into folders named for the relevant species of bird, and fed it into the TFLITE model maker to build a TFLITE model. Then, I took that model and compiled it for the Coral Edge TPU according to Google’s instructions. You can see my Python notebook for training and compiling my model in Github. I got most of this code from examples provided by Google, but if you’re trying to train your own model and run it on a Coral chip, hopefully this will be useful for you. Once you have the model (and the labels in a .txt file), you can use the sample Python code that Google provides to run a TFLITE image recognition model on your Pi. I made a couple minor modifications to this, but nothing much.
Below is a video of a bird being recognized by the model without the Coral chip. The important number to look at is the time to run the TFLITE model, in milliseconds. You can see it takes around 140ms each time we take a grab from the Pi’s video stream and feed it into the model and run it.
Now let’s check out a Goldfinch being recognized by the TFLITE model running on the Edge TPU. Wow! Each call of the model executes in under 7ms! That’s 1/20th of the time it takes to run on the Raspberry Pi’s (low powered) ARM processor.
Just for fun I tried out the “maximum operating frequency” runtime, which Google warned us about. It gets the execution time of my ML model down to around 5ms or so. So that’s around a 25% reduction in execution time vs. the standard Edge TPU runtime, but compared with the time to run the model on the Pi’s CPU (140ms), that 2ms savings is probably not worth the extra power consumption in most cases. If I get a chance I’ll measure the temperature of the TPU after running for a while on the “max” runtime. Maybe I can cook an egg with it.
I’ve been interested in running my bird feeder image recognition model fully on a Raspberry Pi. I’m using a Pi model 4 with the High Quality (HQ) Pi camera to take pictures of the feeder, but so far I’ve only run the Machine Learning model for image recognition in the cloud. Using the fast.ai Python library, I was able to get about 95% accuracy when recognizing 15 bird species that are typically seen in my area.
But the process of sending the image to the cloud, calling the image recognition software, and getting a response back to the Raspberry Pi could take a couple seconds. That’s not fast enough if you need to take a quick action based on the result. For example, I’ve toyed with the idea of recognizing squirrels near the bird feeder and warning them away by squirting them automatically with a hose. To quickly activate the hose when the camera sees a squirrel, we’d want to run the ML algorithm on “the edge”, i.e., on the Pi itself, not in the cloud.
Tensorflow Lite, by Google, looks like an interesting approach to running ML on the edge. You can get in the game pretty quickly with some tutorials and sample code that Google provides. I took my library of bird pictures, with labels, and used a Python notebook that follows Google’s sample code. The results (around 75% test accuracy) aren’t as strong as those I achieved with fast.ai (about 95%), but they weren’t bad given the limitations of the basic TFLite approach that Google recommends.
Google provides some image recognition models that you can use for transfer learning. Actually they have a whole bunch of these. These models come pre-trained by Google on basic image recognition, so you can use them as a starting point. They have so many that it’s difficult to know which one will be best for your situation without just trying it. I ended up with EfficientNet, which is the default recommended by Google. One issue is that (as far as I can tell) Google’s image_classifier.create API works best when you just leave the weights in EfficientNet’s pre-trained model alone, and only train the neural network layer that you add onto EfficientNet in order to classify the photos in your situation (in my case, determining the species of bird in the photo). If you’re willing to invest more time, I think you could use Keras to try to freeze EfficientNet for a few training epochs, and then allow EfficientNet’s weights to be trained after that. This is what fast.ai lets you do very easily, so I think that’s one reason the results from fast.ai are much better than with TFLite. The other issue with TFLite (at least as I implemented it) was that I was quantizing the model down to 8 bit floating point numbers. You do that to get a speed improvement so that you can run inferencing (i.e., classifying the bird’s species) on your low-powered machine, but you’re making a trade-off that limits the accuracy of your model.
Still, I was able to run the model on the Pi and get some correct image classifications. Here, I’ve included some videos of the model running on my Pi, and recognizing a cardinal and a red-bellied woodpecker. For these, I trained my model using the Python notebook in my Github repo, and then downloaded the model and label files to the Pi. From there, you can just use the sample Raspberry Pi image classification code that Google provides to run the model against a video stream from your Pi camera. The code will run TFLite inferencing against a series of shots from your camera, and it will annotate the video steam with the result. You can see the label and the probability that the model assigns to the classification right on the video. Now these particular species are colorful and relatively easy to identify. When it comes to less distinctive birds like sparrows and finches, the TFLite model does noticeably worse than my fast.ai model.
Given my experience with TFLite, I’d say it will work well to give you a quick answer to a simple image classification problem, running directly on your Raspberry Pi. So I think it could be viable for, say, telling a squirrel from a bird. But if you want to accurately distinguish between many species of birds, I think you’re better off using some more computing horsepower and using a more powerful ML library, like fast.ai.
If you read my previous post on my efforts to create an accurate ML-based bird recognition model for use with photos from my bird feeder, you’ll know I was able to obtain validation accuracy of up to 95% using images mainly from the Caltech-UCSD bird image library (after cutting the number of species down to the ones I tend to see at my feeder in the Northeast US). However, my accuracy in identifying the species in actual photos from my bird feeder seemed to be much lower than 95%. I believe that’s because:
The Caltech-UCSD images may have been used to train Resnet, which is a “starter” model that I used to train my custom bird recognition model. That means that Resnet does well at identifying images from that data set because it’s already seen them before. But it doesn’t do as well when identifying images from my bird feeder because, obviously, it’s never seen them before.
Images from my bird feeder just don’t look much like the images in the data set, which I assume were taken by professional photographers, or at least by serious amateurs. Some of my photos are good (if I get lucky), but other times I get a photo of the bird’s rear end, or the bird is moving, so it’s a bit out of focus. Further, I think there’s some benefit to having images that are consistent. My bird feeder is consistently located within my photos, so you could imagine that the ML model is factoring the bird feeder out of it’s decisions regarding which label to assign to new photos, because the bird feeder didn’t play a role in any of the labels assigned to the training images.
So based on these factors, I decided to gather my own library of images and use these to train an ML model. This took a while, but as you’ll see, the accuracy I was able to achieve was quite good.
I use a Raspberry Pi 4 with a “High Quality” camera, with a Canon lens borrowed from my SLR camera, which lets me zoom in on the feeder from a spot inside my kitchen. I use PI-TIMOLO, running on the Raspberry Pi, to detect motion and capture a photo.
I’m using the Fast.ai Python library. I’ve found that it’s the quickest and easiest way to get good results. I’ve been playing around with Tensorflow-Lite to see if I can run the ML model on the “edge”, i.e., on the Raspberry Pi, not some high-powered system with a GPU in the cloud. While the results aren’t bad with Tensorflow-Lite (although not as good as with Fast.ai), it requires more fiddling with hyperparameters to get the best possible results. Fast.ai can figure out the best learning rate, for example, and can even adjust it automatically while training proceeds.
I used Google Colab to train the model. Colab gives you free access to a GPU, which is a specialized processor that cuts the training time of a model like mine by about 80%. If you use it too much, Google might cut you off and you can either wait a while until you can use a GPU again, or pay up for their Pro version, which costs $10/month.
Gathering Your Own Image Data Set
I put my Python notebook in Github, but I’m not going to include all my images. If you want to try this on your own you’ll probably want to gather your own photos. This wasn’t difficult, but it took some dedication (obsession?) in order to sort through hundreds of photos each day and put the good ones in folders, organized by bird species. Before taking on bird ML as a hobby, I didn’t know much about birds, so I had to ask for help in identifying the birds now and then (thanks, Steve!). By the end, my image data set had about 2000 images of 15 different bird species. I didn’t separate the various sparrow species–I just lumped them together–but maybe I’ll try that later.
In 9 training epochs I was able to get the results below. For the first 3 iterations, the Resnet weights are frozen, but in the last 6 epochs, these can be adjusted. The idea is that it’s not worth adjusting thousands of Resnet neural network weights early in the game, when your model is completely clueless. But once your model has been trained to a reasonable level, you can let fast.ai adjust the Resnet weights to optimize your results.
So with an error rate of 0.05, we’re at 95% validation accuracy. Given that Resnet’s never seen these pictures, we hope that this accuracy will carry over to inference on real images captured by the Rasperry Pi, day-by-day. Did it? Yes!
As an example, you can see that in my Python notebook I ran inferencing on 15 additional images, and it got only one wrong. You can see an example of this at the left. Below, you can see the model’s results. It thinks this is a cardinal with very high certainty (0.999), which is correct.
The API is the same one your Tesla smartphone app uses to control your Tesla, but it offers a lot more functionality than the app. So you can find out your battery level or the location of the car, and even take actions like honk the horn or lock the doors. You just need your email address and password to access the API, so I suppose it should probably make you slightly nervous that the North Koreans could roll down your windows and drive your Tesla into a lake.
I put that thought aside, and decided it might be interesting to gather time-series data from my car, and it might be a good source of data for machine learning projects. Maybe I could see how different styles of driving, or weather, affect the car’s mileage, for example. It was fun for a few days, but the fun stopped when Tesla locked me out of the API and my smart phone app no longer worked.
AWS Lambda and DynamoDB
I decided to use AWS Lambda and DynamoDB for this project. If you want to try this (I take no responsibility if Tesla shuts off your access, or if some 15-year olds from your neighborhood hack your car and take it for a joy ride), my code is in Github.
The way this works is pretty simple. The Lambda wakes up periodically and calls the Tesla API to gather a few data points. There are many data points available, but I choose just a subset of these, such as the latitude and longitude of the car, it’s current speed, and battery level. Then the Lambda writes them into a DynamoDB table and terminates.
Building and Configuring the Lambda
The hardest part about this project was building the Lambda so that the AWS Python environment had the necessary external libraries. If you’re just using the AWS Python SDK (boto) in your code, you can just import it without doing anything special. But if you’re using an oddball library like the MyTesla library I was using, you need to zip your Lambda up in a special way, along with MyTesla and the Requests library, and upload the zip file to AWS (either via the Console or the CLI). I found this to be a bit frustrating, maybe because Amazon’s instructions were a bit unclear. I found this set of instructions to be helpful, though.
Once you’ve got the Lambda zip file uploaded, you set the Lambda to be triggered by AWS EventBridge. This is like a cron job, waking up your Lambda using a timer. You’ll need to fill in your Tesla login credentials as Lambda environment variables, as below. This lets you avoid putting that in your code, where you might accidentally reveal it to others if you’re not careful. You also need to provide the name of your DynamoDB table, which is where you’re going to store the data about your Tesla.
DynamoDB is a cloud No-SQL database. It’s very economical, and lets you keep the project completely serverless, so the cost to run this is negligible. Go over to DynamoDB in the AWS console, and create a table. Because it’s a No-SQL database, you don’t even need to define all your columns ahead of time–just a key. I generated a unique ID for each record in my Lambda code, and used that as the key, but I probably could have just chosen the date/time stamp as the key. An important parameter to set in DynamoDB is the Provisioned read and write capacity units for your table. The default was 4 for each, but I just set these at 1 (for both). You can read up on what these numbers mean, but 1 is plenty for this application. AWS estimated that this table would cost me $0.59 per month with that read/write capacity. Below is a screen shot showing my DynamoDB table with some real readings from my car.
Now the only other thing you need to do is go into IAM in the AWS Console and make sure the role your Lambda is using has DynamoDB write access. When you create a Lambda you can assign your own role or let it create a default role for you. In this case, you can just take the default Lambda role and manually add the DynamoDBLambdaWritePolicy policy to it, and save it.
Once I went through the brain damage of understanding how to build a Lambda function with external Python libraries, everything worked great. I was conservative, and set the Lambda to wake up only once an hour. I didn’t try to wake up the car if it was sleeping, so that meant if the car wasn’t being actively used, there was no data returned. But I wanted to get some more readings, because in my 40 minute commute to work, I might get unlucky and get zero readings. So I changed the frequency to once every 30 minutes.
This worked for a few hours, until suddenly the Lambda started getting errors from the Tesla API. Then my Tesla iPhone app (which is also the key to the car) stopped working, and also gave me an API error. Uh oh. This made me a bit nervous. I wasn’t sure that if I contacted Tesla customer support and asked them to unlock my API access that they’d help me, or even know what I was asking them to do. I turned off the Lambda and luckily a few hours later, my credentials unlocked themselves and my smartphone app started working again. Maybe I just got unlucky, but I assume that Tesla locks out your credentials based on some unknown criteria, which I apparently triggered. So I think I’ll end this project on an up note, and leave the Lambda off.
P.S. One reader mentioned to me that there are commercial apps using the Tesla API, so in theory what I was doing here should have worked. His idea was that perhaps I should store the token received from Tesla and use it for subsequent API calls, and that perhaps asking for a new token each time I made an API call was what triggered Tesla to lock me out. It sounds plausible–perhaps they have a limit on the number of tokens they hand out.
I tailored my fastai model for bird (and squirrel) recognition to the situation in my backyard, so that I can use it to recognize birds that my Raspberry Pi HQ camera sees near my bird feeder. Fastai is a high-level Python library that sits on top of Pytorch. They’ve got an amazing set of videos that explain how to use it, so once you get up the learning curve, a model like mine can be written with just a few lines of Python.
I basically took the model I had previously trained on the whole Caltech-UCSD bird image dataset, but I removed all the training images for irrelevant birds that I don’t see in my backyard. Additionally, I spiked the data set with some images of squirrels and a couple of birds that I see in my backyard but which aren’t in the Caltech-UCSD data set (Hairy Woodpeckers and Black-capped Chickadees).
I think these are downy woodpeckers (plus a goldfinch)
You can see the Python notebook for my fastai model in Github:
Depending on the random seed, I can get over 95% accuracy with this model, so that’s close to the accuracy that Amazon Rekognition Custom Labels achieved (97%). However AWS took 1.4 hours to train the thing, because they are doing hyperparameter optimization which seems to require throwing a shitload of CPU at the model, and training it many times in a row. By contrast, mine takes just a few minutes to run with a GPU. Amazon’s service does a nice job of letting someone train a real custom model, with limited ML expertise, but it costs about $4.00 per hour for inference (i.e., predictions). So it’s not really viable for home use, although I think it’s great for Enterprises that don’t have deep ML skills. I should be able to run my fastai model for much less. I might run this on a low-end AWS EC2 instance, but we’ll have to see how an instance without a GPU performs. (I assume an instance with a GPU will be too expensive). Apparently you can also deploy fastai on AWS Lambda, so that might be the way to go.
A (likely) big problem, though, is that I doubt the accuracy of this model is really going to be 95%. The Caltech-UCSD people warn you right on their web site that these images may have been used to to train models such as Resnet. I’m using transfer learning, starting with Resnet. If Resnet’s already seen some of these images, testing your model with these images is probably going to give you an inflated idea of how good it really is. This looks to be the case: When I use this model with real images from my bird feeder, the results are worse than 95%. I don’t have a handle on the accuracy yet, but it’s definitely not 95%. Some of this is no doubt because my images are sometimes not in focus, or the bird may not always be oriented perfectly, whereas the data set images are generally of good quality. But some of this may be simply that my model just doesn’t have the capability to give you 95% accuracy on images it’s never seen before, and 95% is an unrealistic number because Resnet had a peek at some of the data set images previously. So I think I’d be better off training my model using actual images from my bird feeder cam, and not from an image data set. It will take me some effort to build up that data set, but that may be necessary to get the best possible accuracy in the real world. So the lesson here is that working with a canned data set won’t necessarily translate into real-world accuracy.
Note that (as I describe in the next post), a big issue here is that I used transfer learning, starting with the Resnet model. Apparently Resnet’s already seen some of these images, so my 80%+ accuracy is probably an overestimate.
Previously, I wrote about how to use AWS Rekognition to distinguish between different varieties of birds. You can train AWS Rekognition Custom Labels with photos of birds that live in your area, or try to get by with the cheaper standard Rekognition service, although it will give you a less specific bird ID. Recently I extended this work to add a motion sensing camera using a Raspberry Pi 4 and the Raspberry Pi High Quality Camera. I also changed the architecture of the AWS portion of the solution to use AWS Lambda, S3 and SNS.
As shown in the diagram below, the Raspberry Pi sends pictures from my bird feeder to an S3 bucket at AWS. When a new image arrives in S3, this invokes a Python Lambda function that sends the photo to AWS Rekognition, which uses its ML-based image recognition capabilities to determine what’s in the photo. If a bird is detected, this triggers a message to an SNS topic, which you can use to get a text or email. If a squirrel is detected, a message is sent to a different SNS topic. So you might use texts to notify yourself of a squirrel sighting so you can go chase it away, and use email to notify yourself about interesting birds. Or you could even hook up the Raspberry Pi to shoot water at any squirrels invading the bird feeder (which might be a project for next summer). Eventually I added a simple web site built using the AWS S3 static web site approach, to allow easy viewing of the best pictures.
Raspberry Pi Motion Detection
A Raspberry Pi is just a very small Linux box with an ARM processor. There’s a package called PI-TIMOLO which I found to be very useful. You can run it on the Raspberry Pi to detect motion, and automatically snap a photo. You don’t need an infrared motion detector attached to the Raspberry Pi (although that might not be a bad idea). PI-TIMOLO scans a low-res stream from your camera, and if it detects a significant difference from one frame to the next, it concludes something has moved, and momentarily stops the stream, snaps a high-res picture and puts it in a folder.
I pointed a Raspberry Pi High Quality (HQ) camera with a cheap telephoto lens at my bird feeder and set up PI-TIMOLO. There are several PI-TIMOLO settings that you need fiddle with to get good results for your particular situation, but you can ignore a lot of the settings, such as those related to video, and panoramic photos. Just focus on the image and motion settings. I’ll put a sample of the PI-TIMOLO settings I used in my Github repo.
I have a small Python program running on the Raspberry Pi, which trolls the directory where PI-TIMOLO puts its photos. If my code senses a new photo in that folder, it crops the photo (as I’ll explain in a minute) and makes an API call to send it to an S3 bucket in AWS. Here’s the code:
For this this code to work, you need to install boto (the AWS Python SDK) on your Raspberry Pi, and also create an IAM user in AWS with rights to write to S3. You need to copy and paste the AWS user credentials from that IAM user into a credentials file in your .aws or .boto folder on the Raspberry Pi, so that your Python code has the credentials needed to put files into an S3 bucket.
AWS S3, Lambda, Rekognition, and SNS
Once that’s working, you create a Lamdba function in AWS. Lambda is the flagship AWS serverless computing service. A Lambda is a piece of code that will run once it’s invoked by some trigger, which could be an incoming API call, or a timer, or in my case, something arriving in an S3 bucket. Once it’s done its job, the Lambda terminates. It’s nice for my ML-based bird and squirrel detector because it lets me run the application without running a single server, so it’s very economical. Between AWS S3, Lambda, Rekognition and SNS, and the S3 static web site, I’ve got all of this great functionality, with state-of-the-art image recognition, cloud storage, and email and text notifications, and it’s practically free if I can settle for the standard Rekognition service (i.e., not Custom Labels–more on that later). Just make sure your Raspberry Pi doesn’t go crazy sending too many photos to S3, e.g., if your motion detection settings are too loose, because eventually too many API calls could add up. But if the camera takes about 200 pictures a day, the costs to run this are minimal because the AWS free tier gives you 5,000 Rekognition calls per month.
To create a Lambda, first, you need to create a role that the Lambda will assume, with rights to S3, Lambda, Rekognition, and SNS. Go to IAM in the AWS console, and create a new role and give it the following policies:
Then go to Lambda in the AWS console. Click Create Function, and choose Author From Scratch. Name your Lambda, and choose Python 3.8 as your language, and be sure to give it the role you created earlier so it can access S3, Rekognition, and SNS.
Having created the Lambda, you want to set it to trigger based on a new object being created in S3 (i.e., a new photo being sent from the Raspberry Pi). So you click the Trigger button and select the S3 bucket and folder (which AWS also calls a prefix) where your pictures will be sent from the Raspberry Pi. Then, it’s time to enter your Python code. Here’s mine, but you’ll need to enter your own SNS topic identifiers:
After waking up (upon the arrival of a new picture in S3), the Lambda calls the Rekognition API to see what’s in the photo. Rekognition is an AWS image recognition service. I described Rekognition in a prior blog post. In short, if you provide Rekognition with an image, it tells you what it thinks is in it. I found that Rekognition’s out-of-the-box training enabled it to recognize birds, and it could sometimes identify the correct species of bird. It can also identify squirrels, which are frequent pests around feeders. In the code above, I’m using the standard Rekognition service, which comes pre-trained to recognize common objects, animals, and celebrities (in case someone famous shows up around my bird feeder).
So anyway, my Lambda code calls the Rekognition API and looks at the response to see if Rekognition sees a bird or a squirrel. I created two AWS Simple Notification Service (SNS) topics with the AWS console: one for bird sightings and one for squirrel sightings. So depending on what Rekognition saw in the image, my Lambda posts a message to the appropriate SNS topic using the SNS API. You can subscribe to an SNS topic with email or SMS texts, and get notified of the type of bird that Rekognition sees, and also get notified of squirrel sightings.
Configuring the Raspberry Pi to have the correct PI-TIMOLO and camera settings took some time. You want PI-TIMOLO to be sensitive enough to trigger a photo when it detects a bird, but not so sensitive that a few leaves blowing in the wind triggers a photo.
This was the first time I developed a Lambda function. I had prototyped and tested most of the code on my Mac, but it was a bit difficult to debug when I was creating the Lambda code in the AWS Console because I was debugging it by looking in AWS Cloudwatch to see what happened. Typically, there is a lag of up to five minutes before you can check Cloudwatch to see if you got any errors. AWS has a tool for debugging Lambdas locally (called SAM) so that’s probably worth learning if you’re going to create complex Lambdas, but I managed to muscle through without it this time.
Using the out-of-the-box Rekognition service, I was soon getting many notifications telling me that there was a bird feeder in the picture, or lawn furniture. So it was easy enough to filter those out. Then it started telling me there was grass or “nature” in every picture, so I had to filter that out. Eventually, I found that Rekognition was identifying the fact that there was a bird in the picture, but it couldn’t identify the species. The problem was my image quality. Previously when I provided Rekognition with professional quality bird photos, it could often ID the species. But my photos just weren’t as good. I found that by cropping the photos so that the bird took up a higher portion of the image, Rekognition tended to focus more on the bird and could sometimes recognize the bird species. So I added some cropping logic to my Raspberry Pi code, using the Python Pillow library. After cropping a woodpecker image (below) from my Raspberry Pi camera and sending it to Rekognition, I got this notification from SNS:
Detected labels for photo at time 2020-11-22-18:34:11 Bird (Confidence 99.25276184082031) Flicker Bird (Confidence 78.45641326904297) Woodpecker (Confidence 78.45641326904297) Finch (Confidence 54.26727294921875) in photo new-bird-images/2020-11-22-13-34-09.jpg
It’s a red bellied woodpecker, so Rekognition did determine it was a woodpecker (with moderate confidence), although not a red bellied one. It also thought it might be a flicker bird (and a finch, with low confidence). That’s wrong, although when I looked up flickers on the Cornell Ornithology site, it turns out they are a type of woodpecker, so this answer isn’t completely off-base. Rekognition sometimes gives some odd results, however. For example, the photo on the left was labeled “poultry” by Rekognition. Perhaps it looks a bit like a small chicken? Another time Rekognition decided that a photo from my backyard had a penguin in it. I checked and it was indeed a photo of a black and white bird. Of course, ML image recognition models only know about training images they were previously provided. They don’t have an understanding of the context of the images, which would let them know that it’s ridiculous to report that there’s a penguin in my backyard.
Rekognition Custom Labels
To get really accurate results, you can consider training Rekognition Custom Labels (or some other ML-based approach that involves custom training). Following the approach in my prior post, I used images from the Caltech-UCSD bird image data set to train Rekognition on common birds in my region. I threw in some pictures of squirrels so that Rekognition could also identify them. It took about 1.4 hours to train the model, but the results were impressive (although I found I was able to come close with a fast.ai model that I wrote with a few lines of Python). Below, you can see how Rekognition Custom Labels performs on the test images for various bird species that live in the Northeast (and squirrels)–it’s almost never wrong (at least when using professional quality pictures)!
But while this ML service does an outstanding job, it’s too expensive for an individual hobbyist to leave running all day long. You get a few hours as part of the AWS free tier, but after that it’s $4.00 per hour. Luckily I was able to get $100 in free AWS credits in my account, so I could try it for free. To economize, AWS recommends starting the service up, using it, and taking it down when you’re done. So you can’t really run it for hours and hours, which was my original design. For an enterprise, however, this service would be well worth the price, and it’s really not that expensive compared to other options.
When I tried Rekognition Custom Labels with pictures taken by my Raspberry Pi camera, initially I found that Rekognition Custom Labels wasn’t recognizing anything. But when I cropped the images Rekognition correctly determined that the picture (above) was a red bellied woodpecker. That was very cool, although I learned some tough lessons about training a model when I sent about 1200 pictures to Rekognition Custom Labels. The bird IDs were often wrong because the model had been trained on images it was unlikely to see at my feeder (e.g., blue jays aren’t common around here this time of year, but it thought it saw a lot of blue jays). Additionally, my images were sometimes out of focus because the bird was moving. So if I was going to run this model all the time, I’d retrain it with pictures that are closer to what it would see in production–ones from my feeder rather than stock images from a data set.
Given the cost of Rekognition Custom Labels, I’m probably just going to just run this with the much cheaper, off-the-shelf Rekognition for a while, and consider moving to a cheaper custom model in the future.
AWS Rekognition (sic) is a cloud service that provides a number of features for image recognition. You can train it with your own custom images, or you can use the canned training that’s already been done for you, if that’s sufficient.
I decided to try to use the tool both ways: by training it with some custom images and by using the out-of-the-box, pre-trained capabilities of Rekognition.
Rekognition Custom Labels
You use AWS Rekognition Custom Labels if you have your own images, which make sense for your application, but which aren’t handled by the standard Rekognition service. For example, you have good widgets coming off your production line, and damaged ones. You could train Rekognition to know the difference between the two and move the damaged ones off to the side.
I decided to train Rekognition to be able to recognize various birds. I’m thinking of pointing a camera at my bird feeder and sending the output to Rekognition. Caltech has a really nice library of labeled bird pictures of 200 different varieties of birds. They have color images and some outlines of the bird’s shapes. I just used the color images. First you go into the Recognition UI in the AWS Console. Click on Use Custom Labels. You need to create a data set. Recognition creates a bucket for you, and you can dump the images in there. You can drag and drop them, but if you have a lot of images you’re better off using the AWS CLI, with a command like this (which copies a folder containing all its images to an S3 bucket):
I only copied some of the folders to S3 because you pay for Rekognition training time and you may not want to load it up with all 18,000 images until you have a feel for what that’s going to cost you. And anyway, a lot of those birds aren’t native to my area.
Having created a data set, you click on Project, and create a new Rekognition project. You tell Rekognition where your images are. Conveniently, the Caltech bird images were nicely organized in folders that were named with the correct label for each image, i.e., the type of bird. So Recognition was able to understand the correct type of bird in the pictures with no further work by me. I started training, and waited. And waited. It took a while. Maybe an hour. And that was with only about 500 images. Rekognition is, I believe, optimizing the hyperparameters of the deep learning model it’s using. So that takes a while because it means it’s training the model multiple times to find the best hyperparameters. But the results are amazing. My model ended up with an F1 score of .977. The best possible score is 1.0, so the model is extremely accurate at classifying the nine types of birds I trained it to recognize. And if you look at these images, you’ll see that the differences between some of the varieties of birds is pretty subtle, so it’s impressive that Rekognition can tell the difference.
Recognizing New Images
Having trained our model, how do we use it? First, you need to start the model. In the Rekognition console they provide an AWS CLI command that does that. It takes about 5 minutes to fire it up. Then you can use the AWS CLI to provide your model with an image to classify, but to create a real application you’ll want to write a little code. I found some example code that AWS provides in the SDK for Rekognition, and managed to modify it and get it to work for Rekognition Custom Labels (although the API call I’m using here isn’t mentioned in the documentation, which might not be up-do-date. I just guessed it and it worked). Here’s my code:
To make this work I had to install boto (the AWS Python SDK) and figure out how to get my AWS API credentials and load them into a file and put them into my .boto folder.
Using Standard Rekognition
I managed to get an AWS account with some free credits, so I was able to train my custom model with 9 bird varieties for free, although it did cost a couple dollars in credits. But if I wanted to train my model with all 200 bird varieties, that might cost a lot more. And the other thing that worried me was the cost of inference (running the model vs. new images). You get a few hours as part of the AWS free tier, but after that it costs $4.00 an hour. But if you’re running it 24×7 that adds up. It looks like AWS is firing up a VM to run my model (its hard to tell because it doesn’t show up as an EC2 instance in the console). But I’m guessing that’s what’s happening because of the delay in starting up my model, and due to the cost. So I’m wondering if instead of using Custom Labels, I can just use standard out-of-the-box Rekognition. Pricing for standard image recognition (i.e., no custom labels) is really cheap. It’s priced per API call, and you can get 5000 calls per month free. Even when you exhaust that, it costs only 0.1 cents per image. So for my personal application here, I’d like to use standard Rekognition if possible. It turns out it’s quite good. It can’t distinguish between 10 different kinds of sparrows, but it can tell you that a bird is a sparrow, and not, say, a blackbird.
The Python code to call the REST API for standard Rekognition is similar to the code for Rekognition Custom Labels. The main difference is that you haven’t trained a custom model, and don’t need to start it up ahead of time, or refer to it in your API call:
So Rekognition is 99.97% sure it’s an animal, and it’s 93.9% sure it’s an Agelaius (red-winged blackbird), which is correct.
AWS Rekognition Custom Labels does a phenomenal job of letting you train a custom image recognition model, with no programming. You’ll need to do some programming to use the model, but you don’t need much, if any, deep learning knowledge to get great results. But it’s expensive because it’s not really (as far as I can tell) a true serverless deployment. So you pay for every hour you need it running. By contrast, the standard Rekognition service is truly serverless. You pay only based on the number of API calls you make. And Rekognition has been pretrained to do a lot of useful things out of the box. I was able to get it to recognize blackbirds, and sparrows, for example. And it can also do other things, like put bounding boxes around portions of an image to show you where, for example, cars or other objects are located.