Image Recognition on the Edge (TFLite on Raspberry Pi)

I’ve been interested in running my bird feeder image recognition model fully on a Raspberry Pi. I’m using a Pi model 4 with the High Quality (HQ) Pi camera to take pictures of the feeder, but so far I’ve only run the Machine Learning model for image recognition in the cloud. Using the Python library, I was able to get about 95% accuracy when recognizing 15 bird species that are typically seen in my area.

But the process of sending the image to the cloud, calling the image recognition software, and getting a response back to the Raspberry Pi could take a couple seconds. That’s not fast enough if you need to take a quick action based on the result. For example, I’ve toyed with the idea of recognizing squirrels near the bird feeder and warning them away by squirting them automatically with a hose. To quickly activate the hose when the camera sees a squirrel, we’d want to run the ML algorithm on “the edge”, i.e., on the Pi itself, not in the cloud.

Tensorflow Lite

Tensorflow Lite, by Google, looks like an interesting approach to running ML on the edge. You can get in the game pretty quickly with some tutorials and sample code that Google provides. I took my library of bird pictures, with labels, and used a Python notebook that follows Google’s sample code. The results (around 75% test accuracy) aren’t as strong as those I achieved with (about 95%), but they weren’t bad given the limitations of the basic TFLite approach that Google recommends.

Google provides some image recognition models that you can use for transfer learning. Actually they have a whole bunch of these. These models come pre-trained by Google on basic image recognition, so you can use them as a starting point. They have so many that it’s difficult to know which one will be best for your situation without just trying it. I ended up with EfficientNet, which is the default recommended by Google. One issue is that (as far as I can tell) Google’s image_classifier.create API works best when you just leave the weights in EfficientNet’s pre-trained model alone, and only train the neural network layer that you add onto EfficientNet in order to classify the photos in your situation (in my case, determining the species of bird in the photo). If you’re willing to invest more time, I think you could use Keras to try to freeze EfficientNet for a few training epochs, and then allow EfficientNet’s weights to be trained after that. This is what lets you do very easily, so I think that’s one reason the results from are much better than with TFLite. The other issue with TFLite (at least as I implemented it) was that I was quantizing the model down to 8 bit floating point numbers. You do that to get a speed improvement so that you can run inferencing (i.e., classifying the bird’s species) on your low-powered machine, but you’re making a trade-off that limits the accuracy of your model.

Still, I was able to run the model on the Pi and get some correct image classifications. Here, I’ve included some videos of the model running on my Pi, and recognizing a cardinal and a red-bellied woodpecker. For these, I trained my model using the Python notebook in my Github repo, and then downloaded the model and label files to the Pi. From there, you can just use the sample Raspberry Pi image classification code that Google provides to run the model against a video stream from your Pi camera. The code will run TFLite inferencing against a series of shots from your camera, and it will annotate the video steam with the result. You can see the label and the probability that the model assigns to the classification right on the video. Now these particular species are colorful and relatively easy to identify. When it comes to less distinctive birds like sparrows and finches, the TFLite model does noticeably worse than my model.

Given my experience with TFLite, I’d say it will work well to give you a quick answer to a simple image classification problem, running directly on your Raspberry Pi. So I think it could be viable for, say, telling a squirrel from a bird. But if you want to accurately distinguish between many species of birds, I think you’re better off using some more computing horsepower and using a more powerful ML library, like