
If you read my previous post on my efforts to create an accurate ML-based bird recognition model for use with photos from my bird feeder, you’ll know I was able to obtain validation accuracy of up to 95% using images mainly from the Caltech-UCSD bird image library (after cutting the number of species down to the ones I tend to see at my feeder in the Northeast US). However, my accuracy in identifying the species in actual photos from my bird feeder seemed to be much lower than 95%. I believe that’s because:
- The Caltech-UCSD images may have been used to train Resnet, the pretrained “starter” model I built my custom bird recognition model on top of. If so, Resnet does well on that data set partly because it has already seen those images, but that advantage doesn’t carry over to photos from my bird feeder, which, obviously, it has never seen before.
- Images from my bird feeder just don’t look much like the images in the data set, which I assume were taken by professional photographers, or at least serious amateurs. Some of my photos are good (if I get lucky), but other times I get a shot of the bird’s rear end, or the bird is moving and a bit out of focus. There’s also some benefit to having consistent images: my bird feeder appears in the same spot in every photo, so you can imagine the model learning to factor the feeder out of its decisions about which label to assign, because the feeder didn’t play a role in any of the labels assigned to the training images.
So based on these factors, I decided to gather my own library of images and use these to train an ML model. This took a while, but as you’ll see, the accuracy I was able to achieve was quite good.
Technologies Used
I use a Raspberry Pi 4 with the “High Quality” camera module and a Canon lens borrowed from my SLR, which lets me zoom in on the feeder from a spot inside my kitchen. I use PI-TIMOLO, running on the Raspberry Pi, to detect motion and capture a photo.
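In case it helps to see the idea, here’s a rough sketch of the kind of frame-differencing loop that motion-triggered capture tools use. This is not PI-TIMOLO’s actual code (its implementation is far more configurable); the device index, threshold, and file names are all just assumptions for illustration.

```python
import time
import cv2

# Rough illustration of motion-triggered capture, NOT PI-TIMOLO's actual code.
# Assumes the Pi camera shows up to OpenCV as video device 0.
cam = cv2.VideoCapture(0)
_, prev = cam.read()
prev_gray = cv2.GaussianBlur(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY), (21, 21), 0)

while True:
    ok, frame = cam.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    diff = cv2.absdiff(prev_gray, gray)                  # what changed since the last frame?
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > 5000:                    # arbitrary "enough motion" threshold
        cv2.imwrite(f"capture_{int(time.time())}.jpg", frame)
    prev_gray = gray
    time.sleep(0.5)
```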
I’m using the Fast.ai Python library. I’ve found it’s the quickest and easiest way to get good results. I’ve also been playing around with TensorFlow Lite to see if I can run the model at the “edge”, i.e., on the Raspberry Pi itself rather than some high-powered system with a GPU in the cloud. The TensorFlow Lite results aren’t bad (although not as good as with Fast.ai), but it takes more fiddling with hyperparameters to get there. Fast.ai, for example, has a learning-rate finder that suggests a good learning rate, and its one-cycle training schedule adjusts the rate automatically as training proceeds.
I used Google Colab to train the model. Colab gives you free access to a GPU, a specialized processor that cuts the training time of a model like mine by about 80%. If you use it too much, Google might cut you off; then you can either wait a while until you’re allowed a GPU again, or pay for the Pro version, which costs $10/month.
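As a quick sanity check, after selecting a GPU runtime in Colab (Runtime > Change runtime type > GPU), you can confirm that one was actually allocated before kicking off a long training run:

```python
import torch

# Confirm Colab actually gave us a GPU before starting training
print(torch.cuda.is_available())          # True if a GPU runtime was allocated
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4"
```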
Gathering Your Own Image Data Set

I put my Python notebook on GitHub, but I’m not going to include all my images. If you want to try this yourself, you’ll probably want to gather your own photos. This wasn’t difficult, but it took some dedication (obsession?) to sort through hundreds of photos each day and put the good ones in folders, organized by bird species. Before taking on bird ML as a hobby, I didn’t know much about birds, so I had to ask for help identifying them now and then (thanks, Steve!). By the end, my image data set had about 2,000 images of 15 different bird species. I didn’t separate the various sparrow species (I just lumped them together), but maybe I’ll try that later.
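If you organize your photos this way, fast.ai can label each image by its parent folder name. A minimal sketch looks something like this (the folder names, the 224-pixel resize, and the 20% validation split are just example choices):

```python
from fastai.vision.all import *

# Example layout (one folder per species):
#   bird_photos/Cardinal/IMG_0001.jpg
#   bird_photos/Blue_Jay/IMG_0002.jpg
#   bird_photos/Tufted_Titmouse/IMG_0003.jpg
path = Path('bird_photos')

# Label each image by its parent folder and hold out 20% of images for validation
dls = ImageDataLoaders.from_folder(
    path, valid_pct=0.2, seed=42,
    item_tfms=Resize(224), batch_tfms=aug_transforms()
)
dls.show_batch(max_n=9)   # quick visual check that images and labels line up
```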
Results
In 9 training epochs I was able to get the results below. For the first 3 epochs, the Resnet weights are frozen; in the last 6 epochs, they can be adjusted. The idea is that it’s not worth adjusting millions of Resnet neural network weights early in the game, when your model is completely clueless. But once your model has been trained to a reasonable level, you can let fast.ai adjust the Resnet weights to optimize your results.
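For reference, here’s roughly what that training schedule looks like in fast.ai. I’m assuming the data loaders sketched above and a resnet34 backbone, which may not match my notebook exactly:

```python
from fastai.vision.all import *

# Same hypothetical folder layout as above
dls = ImageDataLoaders.from_folder('bird_photos', valid_pct=0.2, seed=42,
                                   item_tfms=Resize(224), batch_tfms=aug_transforms())

# vision_learner downloads Resnet weights pretrained on ImageNet
# (in older fastai versions this was called cnn_learner)
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.lr_find()   # sweeps learning rates and suggests a good starting value

# 3 epochs with the Resnet body frozen (only the new head trains),
# then 6 more epochs with everything unfrozen (the two tables below)
learn.fine_tune(6, freeze_epochs=3)
```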
Frozen phase (first 3 epochs, Resnet weights fixed):

epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 3.697723 | 1.892719 | 0.590698 | 02:16 |
1 | 2.527083 | 0.875750 | 0.267442 | 02:17 |
2 | 1.701044 | 0.560366 | 0.186047 | 02:18 |

Unfrozen phase (6 more epochs, full network fine-tuned):

epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.566087 | 0.316642 | 0.100000 | 02:20 |
1 | 0.360757 | 0.277740 | 0.083721 | 02:20 |
2 | 0.249010 | 0.196168 | 0.067442 | 02:19 |
3 | 0.183660 | 0.210550 | 0.067442 | 02:20 |
4 | 0.121282 | 0.191973 | 0.053488 | 02:22 |
5 | 0.086573 | 0.180368 | 0.051163 | 02:20 |
So with an error rate of about 0.05, we’re at roughly 95% validation accuracy. Given that Resnet has never seen these pictures, we can hope this accuracy will carry over to inference on the real images captured by the Raspberry Pi day by day. Did it? Yes!

As an example, in my Python notebook I ran inference on 15 additional images, and the model got only one wrong. Below you can see its output for one of those photos: it thinks the bird is a Cardinal with very high certainty (0.999), which is correct.
('Cardinal', tensor(4), tensor([7.0226e-10, 3.4064e-09, 3.4986e-05, 8.6896e-07, 9.9989e-01, 6.3346e-08, 3.4776e-09, 1.3251e-07, 1.2651e-06, 1.4257e-08, 1.1414e-07, 6.5111e-05, 9.6264e-08, 7.8018e-07, 2.4581e-06]))
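That tuple is fast.ai’s standard predict output: the decoded label, the class index, and the probability assigned to each of the 15 species. Loading the trained model and running it on a new feeder photo looks something like this (the file names below are just placeholders):

```python
from fastai.vision.all import *

# Assumes the trained model was saved earlier with learn.export('birds.pkl');
# both file names here are placeholders.
learn = load_learner('birds.pkl')

img = PILImage.create('feeder_shots/2021-05-01_0734.jpg')
label, idx, probs = learn.predict(img)
print(label, float(probs[idx]))   # e.g. Cardinal 0.9998
```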
So gathering my own images and training the model on them gave me accuracy that actually holds up on real photos from my feeder, which I couldn’t get by relying only on canned data from a publicly available data set.