I tailored my fastai model for bird (and squirrel) recognition to the situation in my backyard, so that I can use it to recognize birds that my Raspberry Pi HQ camera sees near my bird feeder. Fastai is a high-level Python library that sits on top of PyTorch. They’ve got an amazing set of videos that explain how to use it, so once you get up the learning curve, a model like mine can be written with just a few lines of Python.
I basically took the model I had previously trained on the whole Caltech-UCSD bird image dataset, but I removed the training images for birds I don’t see in my backyard. Additionally, I spiked the data set with some images of squirrels and a couple of birds that I do see in my backyard but which aren’t in the Caltech-UCSD data set (Hairy Woodpeckers and Black-capped Chickadees).
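For reference, the whole training pipeline really is only a few lines of fastai. This is a minimal sketch rather than the exact code in my notebook: the backyard_birds folder name, the resnet34 backbone, and the transform and epoch choices are placeholders I’m assuming for illustration.

```python
from fastai.vision.all import *

# Hypothetical folder layout: one subfolder per class I actually see at the feeder,
# e.g. backyard_birds/Downy_Woodpecker/, backyard_birds/Black-capped_Chickadee/,
# backyard_birds/Squirrel/, ...
path = Path('backyard_birds')

dls = ImageDataLoaders.from_folder(
    path,
    valid_pct=0.2,                     # hold out 20% of images for validation
    seed=42,                           # accuracy shifts a bit with the random split
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(size=224),
)

# Transfer learning from a ResNet backbone pre-trained on ImageNet
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(5)                     # a few minutes on a GPU
```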

I think these are Downy Woodpeckers (plus a goldfinch)
You can see the Python notebook for my fastai model on GitHub:
Depending on the random seed, I can get over 95% accuracy with this model, which is close to the accuracy that Amazon Rekognition Custom Labels achieved (97%). However, AWS took 1.4 hours to train its model, because it does hyperparameter optimization, which seems to require throwing a shitload of CPU at the model and training it many times in a row. By contrast, mine takes just a few minutes to train with a GPU. Amazon’s service does a nice job of letting someone with limited ML expertise train a real custom model, but it costs about $4.00 per hour for inference (i.e., predictions). So it’s not really viable for home use, although I think it’s great for enterprises that don’t have deep ML skills. I should be able to run my fastai model for much less. I might run it on a low-end AWS EC2 instance, but we’ll have to see how an instance without a GPU performs. (I assume an instance with a GPU will be too expensive.) Apparently you can also deploy fastai on AWS Lambda, so that might be the way to go.
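For the cheap-deployment idea, fastai lets you export the trained Learner and reload it on a box with no GPU. A rough sketch of what CPU-only inference could look like, where bird_model.pkl and feeder_snapshot.jpg are placeholder names I’m assuming, not files from my notebook:

```python
from fastai.vision.all import *

# After training, in the notebook:  learn.export('bird_model.pkl')

# On the EC2 instance (or inside a Lambda handler): CPU-only inference
learn_inf = load_learner('bird_model.pkl', cpu=True)

pred_class, pred_idx, probs = learn_inf.predict('feeder_snapshot.jpg')
print(f'{pred_class}: {probs[pred_idx].item():.2%}')
```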
A (likely) big problem, though, is that I doubt the accuracy of this model is really going to be 95%. The Caltech-UCSD people warn you right on their web site that these images may have been used to train models such as ResNet. I’m using transfer learning, starting with ResNet. If ResNet has already seen some of these images, testing your model with them is probably going to give you an inflated idea of how good it really is. This looks to be the case: when I use this model with real images from my bird feeder, the results are worse than 95%. I don’t have a handle on the accuracy yet, but it’s definitely not 95%. Some of this is no doubt because my images are sometimes out of focus, or the bird isn’t always oriented perfectly, whereas the data set images are generally of good quality. But some of it may simply be that my model can’t deliver 95% accuracy on images it’s never seen before, and 95% is an unrealistic number because ResNet had a peek at some of the data set images previously. So I think I’d be better off training my model on actual images from my bird feeder cam, not on a canned image data set. It will take me some effort to build up that data set, but that may be necessary to get the best possible accuracy in the real world. So the lesson here is that high accuracy on a canned data set won’t necessarily translate into real-world accuracy.
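One way to get a handle on the real-world number is to score the exported model against a folder of labeled snapshots from the feeder cam that were never used in training. A minimal sketch, assuming a hypothetical feeder_images folder organized one subfolder per species (same layout as the training data) and the bird_model.pkl export from the earlier sketch:

```python
from fastai.vision.all import *

# Labeled feeder-cam snapshots, one subfolder per species, mirroring the
# training layout so parent_label() recovers the true class from the path
feeder_files = get_image_files(Path('feeder_images'))

learn_inf = load_learner('bird_model.pkl')

correct = 0
for f in feeder_files:
    pred_class, _, _ = learn_inf.predict(f)
    correct += str(pred_class) == parent_label(f)

print(f'Real-world accuracy: {correct / len(feeder_files):.1%}')
```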