ML-based bird and squirrel detector with Raspberry Pi and AWS Rekognition, Lambda, S3 and SNS

Previously, I wrote about how to use AWS Rekognition to distinguish between different varieties of birds. You can train AWS Rekognition Custom Labels with photos of birds that live in your area, or try to get by with the cheaper standard Rekognition service, although it will give you a less specific bird ID. Recently I extended this work to add a motion sensing camera using a Raspberry Pi 4 and the Raspberry Pi High Quality Camera. I also changed the architecture of the AWS portion of the solution to use AWS Lambda, S3 and SNS.

As shown in the diagram below, the Raspberry Pi sends pictures from my bird feeder to an S3 bucket at AWS. When a new image arrives in S3, this invokes a Python Lambda function that sends the photo to AWS Rekognition, which uses its ML-based image recognition capabilities to determine what’s in the photo. If a bird is detected, this triggers a message to an SNS topic, which you can use to get a text or email. If a squirrel is detected, a message is sent to a different SNS topic. So you might use texts to notify yourself of a squirrel sighting so you can go chase it away, and use email to notify yourself about interesting birds. Or you could even hook up the Raspberry Pi to shoot water at any squirrels invading the bird feeder (which might be a project for next summer). Eventually I added a simple web site built using the AWS S3 static web site approach, to allow easy viewing of the best pictures.

Raspberry Pi Motion Detection

A Raspberry Pi is just a very small Linux box with an ARM processor. There’s a package called PI-TIMOLO which I found to be very useful. You can run it on the Raspberry Pi to detect motion, and automatically snap a photo. You don’t need an infrared motion detector attached to the Raspberry Pi (although that might not be a bad idea). PI-TIMOLO scans a low-res stream from your camera, and if it detects a significant difference from one frame to the next, it concludes something has moved, and momentarily stops the stream, snaps a high-res picture and puts it in a folder.
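To make the frame-differencing idea concrete, here is a minimal sketch in plain Python. It is not PI-TIMOLO's actual code: the function names and both thresholds are hypothetical, and frames are assumed to be small grayscale grids of 0-255 values.

```python
from typing import Sequence

# A frame is assumed to be rows of 0-255 grayscale pixel values.
Frame = Sequence[Sequence[int]]


def changed_pixel_count(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> int:
    """Count pixels whose brightness changed by more than pixel_threshold."""
    return sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
        if abs(p - c) > pixel_threshold
    )


def motion_detected(prev: Frame, curr: Frame,
                    pixel_threshold: int = 25,
                    min_changed_pixels: int = 3) -> bool:
    """Report motion when enough pixels differ between consecutive frames."""
    return changed_pixel_count(prev, curr, pixel_threshold) >= min_changed_pixels
```

The two thresholds play the role of the sensitivity settings: one controls how big a brightness change counts, the other how many changed pixels it takes to trigger a photo.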

I pointed a Raspberry Pi High Quality (HQ) camera with a cheap telephoto lens at my bird feeder and set up PI-TIMOLO. There are several PI-TIMOLO settings that you need to fiddle with to get good results for your particular situation, but you can ignore a lot of the settings, such as those related to video and panoramic photos. Just focus on the image and motion settings. I’ll put a sample of the PI-TIMOLO settings I used in my GitHub repo.

I have a small Python program running on the Raspberry Pi, which polls the directory where PI-TIMOLO puts its photos. If my code senses a new photo in that folder, it crops the photo (as I’ll explain in a minute) and makes an API call to send it to an S3 bucket in AWS. Here’s the code:

https://github.com/mesadowski/RaspberryPi-Bird-Image-Recognition/blob/main/s3_send_bird_pic_crop.py
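For readers who just want the shape of that program, here is a rough sketch of the polling loop. It is not the code in the repo: the function names, the poll interval, and the use of boto3 (with its upload_file call) are my assumptions, and cropping is left out.

```python
import os
import time


def find_new_photos(folder, seen):
    """Return paths of .jpg files in folder that are not in seen, sorted by name."""
    new_paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.lower().endswith(".jpg") and path not in seen:
            new_paths.append(path)
    return new_paths


def upload_photo(path, bucket, prefix):
    """Upload one file to S3 under prefix (e.g. "new-bird-images/")."""
    import boto3  # assumption: boto3 is installed; imported lazily on purpose
    key = prefix + os.path.basename(path)
    boto3.client("s3").upload_file(path, bucket, key)
    return key


def watch(folder, bucket, prefix, poll_seconds=5):
    """Poll folder forever, uploading each new photo exactly once."""
    seen = set()
    while True:
        for path in find_new_photos(folder, seen):
            upload_photo(path, bucket, prefix)
            seen.add(path)
        time.sleep(poll_seconds)
```

A real version should also make sure a photo is completely written before uploading it, for example by skipping files modified in the last second or two.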

For this code to work, you need to install boto (the AWS Python SDK) on your Raspberry Pi, and also create an IAM user in AWS with rights to write to S3. You need to copy and paste the AWS user credentials from that IAM user into a credentials file in your .aws or .boto folder on the Raspberry Pi, so that your Python code has the credentials needed to put files into an S3 bucket.

AWS S3, Lambda, Rekognition, and SNS

Once that’s working, you create a Lambda function in AWS. Lambda is the flagship AWS serverless computing service. A Lambda is a piece of code that runs when it’s invoked by some trigger, which could be an incoming API call, a timer, or in my case, something arriving in an S3 bucket. Once it’s done its job, the Lambda terminates. It’s nice for my ML-based bird and squirrel detector because it lets me run the application without running a single server, so it’s very economical. Between AWS S3, Lambda, Rekognition and SNS, and the S3 static web site, I’ve got all of this great functionality, with state-of-the-art image recognition, cloud storage, and email and text notifications, and it’s practically free if I can settle for the standard Rekognition service (i.e., not Custom Labels; more on that later). Just make sure your Raspberry Pi doesn’t go crazy sending too many photos to S3, e.g., if your motion detection settings are too loose, because eventually too many API calls could add up. But if the camera takes about 200 pictures a day (roughly 6,000 a month), the costs to run this are minimal because the AWS free tier gives you 5,000 Rekognition calls per month, so only about 1,000 calls a month are billed.
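The arithmetic behind that estimate is easy to check. In the sketch below, the 5,000 free calls and 200 photos a day come from the text above; the per-call price is left as a parameter because it is my assumption, not a figure from this post, so check current AWS pricing before relying on it.

```python
def billable_calls(photos_per_day, days_in_month=30, free_calls_per_month=5000):
    """Rekognition calls per month that fall outside the free tier."""
    return max(0, photos_per_day * days_in_month - free_calls_per_month)


def monthly_cost(photos_per_day, price_per_call,
                 days_in_month=30, free_calls_per_month=5000):
    """Monthly Rekognition cost in dollars for a given (assumed) per-call price."""
    return billable_calls(photos_per_day, days_in_month,
                          free_calls_per_month) * price_per_call
```

At 200 photos a day, billable_calls(200) gives 1,000 calls beyond the free tier; at an assumed price of a tenth of a cent per call, that is about a dollar a month.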

To create a Lambda, first, you need to create a role that the Lambda will assume, with rights to S3, Lambda, Rekognition, and SNS. Go to IAM in the AWS console, and create a new role and give it the following policies:

Then go to Lambda in the AWS console. Click Create Function, and choose Author From Scratch. Name your Lambda, and choose Python 3.8 as your language, and be sure to give it the role you created earlier so it can access S3, Rekognition, and SNS.

Having created the Lambda, you want to set it to trigger based on a new object being created in S3 (i.e., a new photo being sent from the Raspberry Pi). So you click the Trigger button and select the S3 bucket and folder (which AWS also calls a prefix) where your pictures will be sent from the Raspberry Pi. Then, it’s time to enter your Python code. Here’s mine, but you’ll need to enter your own SNS topic identifiers:

https://github.com/mesadowski/RaspberryPi-Bird-Image-Recognition/blob/main/BirdLambda.py
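If you haven’t worked with S3 triggers before, the first job of the Lambda is to pull the bucket and object key out of the event it receives. Here is a minimal sketch of that step (the function name is hypothetical, and this is not copied from my Lambda). Object keys arrive URL-encoded in the event, so they need decoding before you hand them to another service.

```python
from urllib.parse import unquote_plus


def photos_from_event(event):
    """Return (bucket, key) pairs for each object named in an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event (a space arrives as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs
```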

After waking up (upon the arrival of a new picture in S3), the Lambda calls the Rekognition API to see what’s in the photo. Rekognition is an AWS image recognition service. I described Rekognition in a prior blog post. In short, if you provide Rekognition with an image, it tells you what it thinks is in it. I found that Rekognition’s out-of-the-box training enabled it to recognize birds, and it could sometimes identify the correct species of bird. It can also identify squirrels, which are frequent pests around feeders. In the code above, I’m using the standard Rekognition service, which comes pre-trained to recognize common objects, animals, and celebrities (in case someone famous shows up around my bird feeder).

So anyway, my Lambda code calls the Rekognition API and looks at the response to see if Rekognition sees a bird or a squirrel. I created two AWS Simple Notification Service (SNS) topics with the AWS console: one for bird sightings and one for squirrel sightings. So depending on what Rekognition saw in the image, my Lambda posts a message to the appropriate SNS topic using the SNS API. You can subscribe to an SNS topic with email or SMS texts, and get notified of the type of bird that Rekognition sees, and also get notified of squirrel sightings.
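The decision logic described above can be sketched roughly as follows. This is an illustration rather than the code in my repo: the function names and the 50% confidence cutoff are hypothetical, and I’m assuming Rekognition reports the labels "Bird" and "Squirrel" by those exact names. The rekognition and sns arguments are boto3 clients, e.g. boto3.client("rekognition") and boto3.client("sns").

```python
def labels_from_response(response, min_confidence=50.0):
    """Map label name to confidence for labels at or above min_confidence."""
    return {
        label["Name"]: label["Confidence"]
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    }


def choose_topic(labels, bird_topic_arn, squirrel_topic_arn):
    """Pick the SNS topic for a photo; squirrels take priority over birds."""
    if "Squirrel" in labels:  # assumed label name
        return squirrel_topic_arn
    if "Bird" in labels:
        return bird_topic_arn
    return None


def notify(rekognition, sns, bucket, key, bird_topic_arn, squirrel_topic_arn):
    """Label the photo in S3 and publish to the matching SNS topic, if any."""
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
    )
    labels = labels_from_response(response)
    topic_arn = choose_topic(labels, bird_topic_arn, squirrel_topic_arn)
    if topic_arn is not None:
        summary = " ".join(f"{name} (Confidence {conf})" for name, conf in labels.items())
        sns.publish(TopicArn=topic_arn,
                    Message=f"Detected labels: {summary} in photo {key}")
    return topic_arn
```

Passing the clients in as arguments keeps the logic easy to test on a laptop with stand-in objects before deploying it.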

Challenges

Configuring the Raspberry Pi to have the correct PI-TIMOLO and camera settings took some time. You want PI-TIMOLO to be sensitive enough to trigger a photo when it detects a bird, but not so sensitive that a few leaves blowing in the wind trigger a photo.

This was the first time I developed a Lambda function. I had prototyped and tested most of the code on my Mac, but it was a bit difficult to debug when I was creating the Lambda code in the AWS Console because I was debugging it by looking in AWS CloudWatch to see what happened. Typically, there is a lag of up to five minutes before you can check CloudWatch to see if you got any errors. AWS has a tool for debugging Lambdas locally (called SAM), so that’s probably worth learning if you’re going to create complex Lambdas, but I managed to muscle through without it this time.

Results

Using the out-of-the-box Rekognition service, I was soon getting many notifications telling me that there was a bird feeder in the picture, or lawn furniture. So it was easy enough to filter those out. Then it started telling me there was grass or “nature” in every picture, so I had to filter that out. Eventually, I found that Rekognition was identifying the fact that there was a bird in the picture, but it couldn’t identify the species. The problem was my image quality. Previously when I provided Rekognition with professional quality bird photos, it could often ID the species. But my photos just weren’t as good. I found that by cropping the photos so that the bird took up a higher portion of the image, Rekognition tended to focus more on the bird and could sometimes recognize the bird species. So I added some cropping logic to my Raspberry Pi code, using the Python Pillow library. After cropping a woodpecker image (below) from my Raspberry Pi camera and sending it to Rekognition, I got this notification from SNS:

Detected labels for photo at time 2020-11-22-18:34:11 Bird (Confidence 99.25276184082031) Flicker Bird (Confidence 78.45641326904297) Woodpecker (Confidence 78.45641326904297) Finch (Confidence 54.26727294921875) in photo new-bird-images/2020-11-22-13-34-09.jpg

It’s a red bellied woodpecker, so Rekognition did determine it was a woodpecker (with moderate confidence), although not a red bellied one. It also thought it might be a flicker bird (and a finch, with low confidence). That’s wrong, although when I looked up flickers on the Cornell Ornithology site, it turns out they are a type of woodpecker, so this answer isn’t completely off-base. Rekognition sometimes gives some odd results, however. For example, the photo on the left was labeled “poultry” by Rekognition. Perhaps it looks a bit like a small chicken? Another time Rekognition decided that a photo from my backyard had a penguin in it. I checked and it was indeed a photo of a black and white bird. Of course, ML image recognition models only know about training images they were previously provided. They don’t have an understanding of the context of the images, which would let them know that it’s ridiculous to report that there’s a penguin in my backyard.
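The two fixes described in this section, ignoring uninteresting labels and cropping before upload, can be sketched like this. The ignored label names, the function names, and the choice of a centered crop that keeps half of each dimension are all assumptions for illustration; the right crop depends on where the feeder sits in your frame.

```python
# Hypothetical names for the labels that cluttered my notifications.
IGNORED_LABELS = {"Bird Feeder", "Furniture", "Grass", "Nature"}


def interesting_labels(labels, ignored=IGNORED_LABELS):
    """Drop labels that show up in nearly every photo of the feeder."""
    return [name for name in labels if name not in ignored]


def center_crop_box(width, height, keep_fraction=0.5):
    """Return a (left, upper, right, lower) box keeping the central part of the image."""
    crop_w = int(width * keep_fraction)
    crop_h = int(height * keep_fraction)
    left = (width - crop_w) // 2
    upper = (height - crop_h) // 2
    return (left, upper, left + crop_w, upper + crop_h)


def crop_photo(src_path, dst_path, keep_fraction=0.5):
    """Save a centered crop of src_path to dst_path using Pillow."""
    from PIL import Image  # Pillow, imported lazily so the helpers above run without it
    with Image.open(src_path) as img:
        box = center_crop_box(img.width, img.height, keep_fraction)
        img.crop(box).save(dst_path)
```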

Rekognition Custom Labels

To get really accurate results, you can consider training Rekognition Custom Labels (or some other ML-based approach that involves custom training). Following the approach in my prior post, I used images from the Caltech-UCSD bird image data set to train Rekognition on common birds in my region. I threw in some pictures of squirrels so that Rekognition could also identify them. It took about 1.4 hours to train the model, but the results were impressive (although I found I was able to come close with a fast.ai model that I wrote with a few lines of Python). Below, you can see how Rekognition Custom Labels performs on the test images for various bird species that live in the Northeast (and squirrels)–it’s almost never wrong (at least when using professional quality pictures)!

But while this ML service does an outstanding job, it’s too expensive for an individual hobbyist to leave running all day long. You get a few hours as part of the AWS free tier, but after that it’s $4.00 per hour. Luckily I was able to get $100 in free AWS credits in my account, so I could try it for free. To economize, AWS recommends starting the service up, using it, and taking it down when you’re done. So you can’t really run it for hours and hours, which was my original design. For an enterprise, however, this service would be well worth the price, and it’s really not that expensive compared to other options.
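To put the hourly price in perspective, here is the back-of-the-envelope calculation. The $4.00 per hour figure is from above; the number of free hours is left as a parameter since I only know it as "a few", and the function name is hypothetical.

```python
def custom_labels_cost(hours_running, free_hours=0.0, price_per_hour=4.00):
    """Dollar cost of keeping a Custom Labels model running for hours_running hours."""
    return max(0.0, hours_running - free_hours) * price_per_hour
```

Leaving the model up around the clock for a 30-day month is 720 hours, so custom_labels_cost(720) comes to $2,880 before any free hours, which is why starting and stopping the model around short sessions matters so much.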

To use Rekognition Custom Labels, you need to change your Lambda function slightly to use the detect_custom_labels API call rather than the detect_labels call. Then, start up the model (you can use the AWS CLI for this). If you click on your model in the console once training has been completed and scroll to the bottom, AWS provides you with the CLI command and you can just cut and paste it into a terminal window. Wait several minutes, and eventually the AWS Console will tell you it’s running. Just make sure you stop the model when you’re done so you don’t continue to rack up charges.
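If you would rather stay in Python than use the CLI, the same steps can be sketched with boto3. This is a rough outline under assumptions, not my Lambda code: the function names and the 50% cutoff are hypothetical, model_arn stands for the ARN of your trained model version, and rekognition is a boto3 client created with boto3.client("rekognition").

```python
def start_model(rekognition, model_arn, min_inference_units=1):
    """Ask AWS to start the trained model; it takes several minutes to come up."""
    rekognition.start_project_version(
        ProjectVersionArn=model_arn,
        MinInferenceUnits=min_inference_units,
    )


def stop_model(rekognition, model_arn):
    """Stop the model so it no longer accrues hourly charges."""
    rekognition.stop_project_version(ProjectVersionArn=model_arn)


def detect_custom(rekognition, model_arn, bucket, key, min_confidence=50.0):
    """Return (name, confidence) pairs from the custom model for a photo in S3."""
    response = rekognition.detect_custom_labels(
        ProjectVersionArn=model_arn,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"])
            for label in response.get("CustomLabels", [])]
```

Note that the custom call returns its results under CustomLabels rather than Labels, so any code that reads the response has to change along with the API call.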

When I tried Rekognition Custom Labels with pictures taken by my Raspberry Pi camera, initially I found that Rekognition Custom Labels wasn’t recognizing anything. But when I cropped the images Rekognition correctly determined that the picture (above) was a red bellied woodpecker. That was very cool, although I learned some tough lessons about training a model when I sent about 1200 pictures to Rekognition Custom Labels. The bird IDs were often wrong because the model had been trained on images it was unlikely to see at my feeder (e.g., blue jays aren’t common around here this time of year, but it thought it saw a lot of blue jays). Additionally, my images were sometimes out of focus because the bird was moving. So if I was going to run this model all the time, I’d retrain it with pictures that are closer to what it would see in production–ones from my feeder rather than stock images from a data set.

Given the cost of Rekognition Custom Labels, I’m probably just going to run this with the much cheaper, off-the-shelf Rekognition for a while, and consider moving to a cheaper custom model in the future.