I hacked my Tesla. It turned out to be a bad idea.

I heard that Tesla has an undocumented “owner” API. While Tesla doesn’t support it, some smart people have reverse engineered and documented it. Someone even provided a Python library that you can use.

The API is the same one your Tesla smartphone app uses to control the car, but it offers a lot more functionality than the app. You can find out your battery level or the location of the car, and even take actions like honking the horn or locking the doors. You just need your email address and password to access the API, so I suppose it should make you slightly nervous that the North Koreans could roll down your windows and drive your Tesla into a lake.

I put that thought aside and decided it might be interesting to gather time-series data from my car, which could be a good source of data for machine learning projects. Maybe I could see how different styles of driving, or the weather, affect the car’s mileage, for example. It was fun for a few days, but the fun stopped when Tesla locked me out of the API and my smartphone app no longer worked.

AWS Lambda and DynamoDB

I decided to use AWS Lambda and DynamoDB for this project. If you want to try this (I take no responsibility if Tesla shuts off your access, or if some 15-year-olds from your neighborhood hack your car and take it for a joy ride), my code is on GitHub.

The way this works is pretty simple. The Lambda wakes up periodically and calls the Tesla API to gather a few data points. There are many data points available, but I chose just a subset of these, such as the latitude and longitude of the car, its current speed, and the battery level. Then the Lambda writes them into a DynamoDB table and terminates.
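
To make that concrete, here's a minimal sketch of what such a handler could look like. The Tesla call is a placeholder (poll_vehicle_state is a hypothetical helper standing in for whatever your Tesla library provides), the environment variable names are just examples, and the DynamoDB write uses standard boto3:

import os
import uuid
import datetime
import boto3

dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):
    # Credentials and the table name come from Lambda environment variables (see below)
    email = os.environ['TESLA_EMAIL']
    password = os.environ['TESLA_PASSWORD']
    table = dynamodb.Table(os.environ['TABLE_NAME'])

    # Hypothetical helper: call the unofficial owner API (e.g., via the MyTesla library)
    # and pull out the handful of fields you care about
    vehicle_data = poll_vehicle_state(email, password)

    table.put_item(Item={
        'id': str(uuid.uuid4()),                        # unique key for the record
        'timestamp': datetime.datetime.utcnow().isoformat(),
        'latitude': str(vehicle_data['latitude']),
        'longitude': str(vehicle_data['longitude']),
        'speed': str(vehicle_data['speed']),
        'battery_level': str(vehicle_data['battery_level']),
    })
    return {'status': 'ok'}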

Building and Configuring the Lambda

The hardest part about this project was building the Lambda so that the AWS Python environment had the necessary external libraries. If you’re only using the AWS Python SDK (boto) in your code, you can import it without doing anything special. But if you’re using an oddball library like the MyTesla library I was using, you need to zip your Lambda up in a special way, along with MyTesla and the Requests library, and upload the zip file to AWS (either via the Console or the CLI). I found this a bit frustrating, maybe because Amazon’s instructions were unclear. I found this set of instructions to be helpful, though.

Once you’ve got the Lambda zip file uploaded, you set the Lambda to be triggered by AWS EventBridge. This is like a cron job, waking up your Lambda on a timer. You’ll need to fill in your Tesla login credentials as Lambda environment variables, as below. This lets you keep them out of your code, where you might accidentally reveal them to others if you’re not careful. You also need to provide the name of your DynamoDB table, which is where you’re going to store the data about your Tesla.

DynamoDB is a cloud NoSQL database. It’s very economical, and it keeps the project completely serverless, so the cost to run this is negligible. Go over to DynamoDB in the AWS console and create a table. Because it’s a NoSQL database, you don’t even need to define all your columns ahead of time–just a key. I generated a unique ID for each record in my Lambda code and used that as the key, but I probably could have just used the date/time stamp as the key. An important parameter to set in DynamoDB is the provisioned read and write capacity units for your table. The default was 4 for each, but I set both to 1. You can read up on what these numbers mean, but 1 is plenty for this application. AWS estimated that this table would cost me $0.59 per month with that read/write capacity. Below is a screenshot showing my DynamoDB table with some real readings from my car.
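
If you'd rather create the table from code than from the console, a boto3 sketch along these lines should work (the table name and key attribute match what I describe above, but they're up to you):

import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.create_table(
    TableName='tesla-data',                        # whatever name you pass to the Lambda
    KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'S'}],
    ProvisionedThroughput={'ReadCapacityUnits': 1, 'WriteCapacityUnits': 1},
)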

Now the only other thing you need to do is go into IAM in the AWS Console and make sure the role your Lambda is using has DynamoDB write access. When you create a Lambda you can assign your own role or let it create a default role for you. In this case, you can just take the default Lambda role and manually add the DynamoDBLambdaWritePolicy policy to it, and save it.

It works!

Once I went through the brain damage of understanding how to build a Lambda function with external Python libraries, everything worked great. I was conservative and set the Lambda to wake up only once an hour. I didn’t try to wake up the car if it was sleeping, so if the car wasn’t being actively used, no data came back. But I wanted more readings, because in my 40-minute commute to work I might get unlucky and get zero readings. So I changed the frequency to once every 30 minutes.

Uh oh

This worked for a few hours, until suddenly the Lambda started getting errors from the Tesla API. Then my Tesla iPhone app (which is also the key to the car) stopped working, and it gave me an API error too. Uh oh. This made me a bit nervous. I wasn’t sure that if I contacted Tesla customer support and asked them to unlock my API access, they’d help me, or even know what I was asking them to do. I turned off the Lambda and, luckily, a few hours later my credentials unlocked themselves and my smartphone app started working again. Maybe I just got unlucky, but I assume that Tesla locks out your credentials based on some unknown criteria, which I apparently triggered. So I think I’ll end this project on an up note and leave the Lambda off.

P.S. One reader mentioned to me that there are commercial apps using the Tesla API, so in theory what I was doing here should have worked. His idea was that I should store the token received from Tesla and reuse it for subsequent API calls, and that asking for a new token on every call may have been what triggered Tesla to lock me out. It sounds plausible–perhaps they have a limit on the number of tokens they hand out.
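
If I ever turn the Lambda back on, a token cache would look roughly like this. This is only a sketch: get_new_token is a stand-in for however your Tesla library authenticates, and I'm assuming a small DynamoDB table used just for the token:

import time
import boto3

dynamodb = boto3.resource('dynamodb')
token_table = dynamodb.Table('tesla-token')        # hypothetical single-item table

def get_token():
    item = token_table.get_item(Key={'id': 'token'}).get('Item')
    if item and int(item['expires_at']) > time.time():
        return item['access_token']                # reuse the cached token
    token, expires_in = get_new_token()            # stand-in for the library's auth call
    token_table.put_item(Item={
        'id': 'token',
        'access_token': token,
        'expires_at': int(time.time()) + expires_in,
    })
    return token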


Fast.ai image recognition for recognizing birds at my bird feeder

I tailored my fastai model for bird (and squirrel) recognition to the situation in my backyard, so that I can use it to recognize birds that my Raspberry Pi HQ camera sees near my bird feeder. Fastai is a high-level Python library that sits on top of PyTorch. They’ve got an amazing set of videos that explain how to use it, so once you get up the learning curve, a model like mine can be written with just a few lines of Python.
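
To give a sense of what “a few lines” means, a training sketch looks something like this (not my exact notebook; it assumes the images are organized one folder per class, and uses cnn_learner, which newer fastai releases call vision_learner):

from fastai.vision.all import *

path = Path('backyard_birds')                      # one subfolder per species/class
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2,
                                   item_tfms=Resize(224),
                                   batch_tfms=aug_transforms())
learn = cnn_learner(dls, resnet34, metrics=accuracy)   # transfer learning from a ResNet
learn.fine_tune(4)                                 # a few epochs is often enough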

I basically took the model I had previously trained on the whole Caltech-UCSD bird image dataset, but removed the training images for birds I don’t see in my backyard. Additionally, I spiked the data set with some images of squirrels and a couple of birds that I do see in my backyard but which aren’t in the Caltech-UCSD data set (Hairy Woodpeckers and Black-capped Chickadees).

I think these are downy woodpeckers (plus a goldfinch)

You can see the Python notebook for my fastai model on GitHub:

https://github.com/mesadowski/RaspberryPi-Bird-Image-Recognition/blob/main/Bird_ID_fastai_model.ipynb

Depending on the random seed, I can get over 95% accuracy with this model, so that’s close to the accuracy that Amazon Rekognition Custom Labels achieved (97%). However, AWS took 1.4 hours to train the thing, because it does hyperparameter optimization, which seems to require throwing a shitload of CPU at the model and training it many times in a row. By contrast, mine takes just a few minutes to run with a GPU. Amazon’s service does a nice job of letting someone with limited ML expertise train a real custom model, but it costs about $4.00 per hour for inference (i.e., predictions). So it’s not really viable for home use, although I think it’s great for enterprises that don’t have deep ML skills. I should be able to run my fastai model for much less. I might run this on a low-end AWS EC2 instance, but we’ll have to see how an instance without a GPU performs. (I assume an instance with a GPU will be too expensive.) Apparently you can also deploy fastai on AWS Lambda, so that might be the way to go.
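
Wherever it ends up running, inference with an exported fastai model doesn't take much; roughly this, assuming the trained learner was saved with learn.export():

from fastai.vision.all import *

learn = load_learner('export.pkl')                 # the exported model file
pred_class, pred_idx, probs = learn.predict(PILImage.create('feeder_photo.jpg'))
print(pred_class, probs[pred_idx])                 # predicted species and its probability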

A (likely) big problem, though, is that I doubt the accuracy of this model is really going to be 95%. The Caltech-UCSD people warn you right on their website that these images may have been used to train models such as Resnet. I’m using transfer learning, starting with Resnet. If Resnet has already seen some of these images, testing your model with them is probably going to give you an inflated idea of how good it really is. This looks to be the case: when I use this model with real images from my bird feeder, the results are worse than 95%. I don’t have a handle on the accuracy yet, but it’s definitely not 95%. Some of this is no doubt because my images are sometimes out of focus, or the bird isn’t always oriented perfectly, whereas the data set images are generally of good quality. But some of it may be simply that my model doesn’t have the capability to give you 95% accuracy on images it’s never seen before, and 95% is an unrealistic number because Resnet had a peek at some of the data set images previously. So I think I’d be better off training my model using actual images from my bird feeder cam, not from an image data set. It will take me some effort to build up that data set, but that may be necessary to get the best possible accuracy in the real world. So the lesson here is that accuracy on a canned data set won’t necessarily translate into real-world accuracy.

Fast.ai 200 bird species image recognition model on Kaggle (80+% accuracy)

I’m trying to improve on my ML-based bird recognition results. Currently, a Raspberry Pi with a camera detects motion at the feeder and sends pictures to the AWS Rekognition service, which I’m using to try to identify the species of bird. But I’d like to improve on the accuracy of AWS Rekognition while keeping the cost down. So here, I’m using fastai, a high-level Python library that runs on top of PyTorch. So far, I’m getting over 80% accuracy on the 200-bird Caltech-UCSD bird image dataset. You can check out my model on Kaggle, here:

https://www.kaggle.com/mesadowski/caltech-ucsd-bird-image-dataset-fastai

Note that (as I describe in the next post), a big issue here is that I used transfer learning, starting with the Resnet model. Apparently Resnet’s already seen some of these images, so my 80%+ accuracy is probably an overestimate.

ML-based bird and squirrel detector with Raspberry Pi and AWS Rekognition, Lambda, S3 and SNS

Previously, I wrote about how to use AWS Rekognition to distinguish between different varieties of birds. You can train AWS Rekognition Custom Labels with photos of birds that live in your area, or try to get by with the cheaper standard Rekognition service, although it will give you a less specific bird ID. Recently I extended this work to add a motion sensing camera using a Raspberry Pi 4 and the Raspberry Pi High Quality Camera. I also changed the architecture of the AWS portion of the solution to use AWS Lambda, S3 and SNS.

As shown in the diagram below, the Raspberry Pi sends pictures from my bird feeder to an S3 bucket at AWS. When a new image arrives in S3, this invokes a Python Lambda function that sends the photo to AWS Rekognition, which uses its ML-based image recognition capabilities to determine what’s in the photo. If a bird is detected, this triggers a message to an SNS topic, which you can use to get a text or email. If a squirrel is detected, a message is sent to a different SNS topic. So you might use texts to notify yourself of a squirrel sighting so you can go chase it away, and use email to notify yourself about interesting birds. Or you could even hook up the Raspberry Pi to shoot water at any squirrels invading the bird feeder (which might be a project for next summer). Eventually I added a simple web site built using the AWS S3 static web site approach, to allow easy viewing of the best pictures.

Raspberry Pi Motion Detection

A Raspberry Pi is just a very small Linux box with an ARM processor. There’s a package called PI-TIMOLO which I found to be very useful. You can run it on the Raspberry Pi to detect motion, and automatically snap a photo. You don’t need an infrared motion detector attached to the Raspberry Pi (although that might not be a bad idea). PI-TIMOLO scans a low-res stream from your camera, and if it detects a significant difference from one frame to the next, it concludes something has moved, and momentarily stops the stream, snaps a high-res picture and puts it in a folder.

I pointed a Raspberry Pi High Quality (HQ) camera with a cheap telephoto lens at my bird feeder and set up PI-TIMOLO. There are several PI-TIMOLO settings that you need to fiddle with to get good results for your particular situation, but you can ignore a lot of the settings, such as those related to video and panoramic photos. Just focus on the image and motion settings. I’ll put a sample of the PI-TIMOLO settings I used in my GitHub repo.

I have a small Python program running on the Raspberry Pi, which polls the directory where PI-TIMOLO puts its photos. If my code senses a new photo in that folder, it crops the photo (as I’ll explain in a minute) and makes an API call to send it to an S3 bucket in AWS. Here’s the code:

https://github.com/mesadowski/RaspberryPi-Bird-Image-Recognition/blob/main/s3_send_bird_pic_crop.py

For this code to work, you need to install boto (the AWS Python SDK) on your Raspberry Pi, and also create an IAM user in AWS with rights to write to S3. You need to copy and paste the AWS credentials from that IAM user into a credentials file in your .aws or .boto folder on the Raspberry Pi, so that your Python code has the credentials needed to put files into an S3 bucket.
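
The credentials file is just a few lines (use the access key pair for the IAM user you created, not your root account):

# ~/.aws/credentials on the Raspberry Pi
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY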

AWS S3, Lambda, Rekognition, and SNS

Once that’s working, you create a Lambda function in AWS. Lambda is the flagship AWS serverless computing service. A Lambda is a piece of code that runs once it’s invoked by some trigger, which could be an incoming API call, or a timer, or in my case, something arriving in an S3 bucket. Once it’s done its job, the Lambda terminates. It’s nice for my ML-based bird and squirrel detector because it lets me run the application without running a single server, so it’s very economical. Between AWS S3, Lambda, Rekognition and SNS, and the S3 static web site, I’ve got all of this great functionality, with state-of-the-art image recognition, cloud storage, and email and text notifications, and it’s practically free if I can settle for the standard Rekognition service (i.e., not Custom Labels–more on that later). Just make sure your Raspberry Pi doesn’t go crazy sending too many photos to S3, e.g., if your motion detection settings are too loose, because eventually too many API calls could add up. But if the camera takes about 200 pictures a day, the costs to run this are minimal because the AWS free tier gives you 5,000 Rekognition calls per month.

To create a Lambda, first, you need to create a role that the Lambda will assume, with rights to S3, Lambda, Rekognition, and SNS. Go to IAM in the AWS console, create a new role, and attach policies granting it access to those services (plus CloudWatch Logs, so the Lambda can write logs).

Then go to Lambda in the AWS console. Click Create Function, and choose Author From Scratch. Name your Lambda, and choose Python 3.8 as your language, and be sure to give it the role you created earlier so it can access S3, Rekognition, and SNS.

Having created the Lambda, you want to set it to trigger based on a new object being created in S3 (i.e., a new photo being sent from the Raspberry Pi). So you click the Trigger button and select the S3 bucket and folder (which AWS also calls a prefix) where your pictures will be sent from the Raspberry Pi. Then, it’s time to enter your Python code. Here’s mine, but you’ll need to enter your own SNS topic identifiers:

https://github.com/mesadowski/RaspberryPi-Bird-Image-Recognition/blob/main/BirdLambda.py

After waking up (upon the arrival of a new picture in S3), the Lambda calls the Rekognition API to see what’s in the photo. Rekognition is an AWS image recognition service. I described Rekognition in a prior blog post. In short, if you provide Rekognition with an image, it tells you what it thinks is in it. I found that Rekognition’s out-of-the-box training enabled it to recognize birds, and it could sometimes identify the correct species of bird. It can also identify squirrels, which are frequent pests around feeders. In the code above, I’m using the standard Rekognition service, which comes pre-trained to recognize common objects, animals, and celebrities (in case someone famous shows up around my bird feeder).

So anyway, my Lambda code calls the Rekognition API and looks at the response to see if Rekognition sees a bird or a squirrel. I created two AWS Simple Notification Service (SNS) topics with the AWS console: one for bird sightings and one for squirrel sightings. So depending on what Rekognition saw in the image, my Lambda posts a message to the appropriate SNS topic using the SNS API. You can subscribe to an SNS topic with email or SMS texts, and get notified of the type of bird that Rekognition sees, and also get notified of squirrel sightings.
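
The full Lambda is in the repo linked above, but the heart of it boils down to two API calls, roughly like this (the topic ARNs, label names, and confidence threshold here are illustrative):

import boto3

rekognition = boto3.client('rekognition')
sns = boto3.client('sns')

def classify_and_notify(bucket, key):
    # Ask Rekognition what it sees in the new S3 object
    response = rekognition.detect_labels(
        Image={'S3Object': {'Bucket': bucket, 'Name': key}},
        MaxLabels=10, MinConfidence=70)
    labels = [label['Name'] for label in response['Labels']]

    # Route the sighting to the appropriate SNS topic
    if 'Squirrel' in labels:
        sns.publish(TopicArn='arn:aws:sns:us-east-1:123456789012:squirrel-topic',  # your squirrel topic ARN
                    Message='Squirrel sighted at the feeder: ' + key)
    elif 'Bird' in labels:
        sns.publish(TopicArn='arn:aws:sns:us-east-1:123456789012:bird-topic',      # your bird topic ARN
                    Message='Bird labels: ' + ', '.join(labels) + ' in ' + key)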

Challenges

Configuring the Raspberry Pi to have the correct PI-TIMOLO and camera settings took some time. You want PI-TIMOLO to be sensitive enough to trigger a photo when it detects a bird, but not so sensitive that a few leaves blowing in the wind trigger a photo.

This was the first time I had developed a Lambda function. I had prototyped and tested most of the code on my Mac, but it was a bit difficult to debug while creating the Lambda code in the AWS Console, because I had to look in AWS CloudWatch to see what happened. Typically, there is a lag of up to five minutes before you can check CloudWatch to see if you got any errors. AWS has a tool for debugging Lambdas locally (called SAM), so that’s probably worth learning if you’re going to create complex Lambdas, but I managed to muscle through without it this time.

Results

Using the out-of-the-box Rekognition service, I was soon getting many notifications telling me that there was a bird feeder in the picture, or lawn furniture. So it was easy enough to filter those out. Then it started telling me there was grass or “nature” in every picture, so I had to filter that out. Eventually, I found that Rekognition was identifying the fact that there was a bird in the picture, but it couldn’t identify the species. The problem was my image quality. Previously when I provided Rekognition with professional quality bird photos, it could often ID the species. But my photos just weren’t as good. I found that by cropping the photos so that the bird took up a higher portion of the image, Rekognition tended to focus more on the bird and could sometimes recognize the bird species. So I added some cropping logic to my Raspberry Pi code, using the Python Pillow library. After cropping a woodpecker image (below) from my Raspberry Pi camera and sending it to Rekognition, I got this notification from SNS:

Detected labels for photo at time 2020-11-22-18:34:11
Bird (Confidence 99.25276184082031)
Flicker Bird (Confidence 78.45641326904297)
Woodpecker (Confidence 78.45641326904297)
Finch (Confidence 54.26727294921875)
in photo new-bird-images/2020-11-22-13-34-09.jpg
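
For reference, the cropping I added on the Raspberry Pi doesn't take much code. A simple center crop with Pillow looks something like this (the crop fraction is something you'd tune for your camera and feeder distance):

from PIL import Image

def center_crop(in_path, out_path, fraction=0.6):
    img = Image.open(in_path)
    w, h = img.size
    cw, ch = int(w * fraction), int(h * fraction)  # keep the middle of the frame
    left, top = (w - cw) // 2, (h - ch) // 2
    img.crop((left, top, left + cw, top + ch)).save(out_path)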

Poultry?

It’s a red-bellied woodpecker, so Rekognition did determine it was a woodpecker (with moderate confidence), although not a red-bellied one. It also thought it might be a flicker (and a finch, with low confidence). That’s wrong, although when I looked up flickers on the Cornell Ornithology site, it turns out they are a type of woodpecker, so this answer isn’t completely off-base. Rekognition sometimes gives some odd results, however. For example, the photo on the left was labeled “poultry” by Rekognition. Perhaps it looks a bit like a small chicken? Another time Rekognition decided that a photo from my backyard had a penguin in it. I checked, and it was indeed a photo of a black and white bird. Of course, ML image recognition models only know about the training images they were previously provided. They don’t have an understanding of the context of the images, which would let them know that it’s ridiculous to report that there’s a penguin in my backyard.

Rekognition Custom Labels

To get really accurate results, you can consider training Rekognition Custom Labels (or some other ML-based approach that involves custom training). Following the approach in my prior post, I used images from the Caltech-UCSD bird image data set to train Rekognition on common birds in my region. I threw in some pictures of squirrels so that Rekognition could also identify them. It took about 1.4 hours to train the model, but the results were impressive (although I found I was able to come close with a fast.ai model that I wrote with a few lines of Python). Below, you can see how Rekognition Custom Labels performs on the test images for various bird species that live in the Northeast (and squirrels)–it’s almost never wrong (at least when using professional quality pictures)!

But while this ML service does an outstanding job, it’s too expensive for an individual hobbyist to leave running all day long. You get a few hours as part of the AWS free tier, but after that it’s $4.00 per hour. Luckily I was able to get $100 in free AWS credits in my account, so I could try it for free. To economize, AWS recommends starting the service up, using it, and taking it down when you’re done. So you can’t really run it for hours and hours, which was my original design. For an enterprise, however, this service would be well worth the price, and it’s really not that expensive compared to other options.

To use Rekognition Custom Labels, you need to change your Lambda function slightly to use the detect_custom_labels API call rather than the detect_labels call. Then, start up the model (you can use the AWS CLI for this). If you click on your model in the console once training has been completed and scroll to the bottom, AWS provides you with the CLI command and you can just cut and paste it into a terminal window. Wait several minutes, and eventually the AWS Console will tell you it’s running. Just make sure you stop the model when you’re done so you don’t continue to rack up charges.
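
The change to the Lambda is small; the call ends up looking roughly like this (the project version ARN is the one the console shows for your trained model, and the bucket and key are placeholders):

import boto3

rekognition = boto3.client('rekognition')

response = rekognition.detect_custom_labels(
    ProjectVersionArn='arn:aws:rekognition:us-east-1:123456789012:project/birds/version/birds.2020-11-22/1',  # your model's ARN
    Image={'S3Object': {'Bucket': 'my-bird-bucket', 'Name': 'new-bird-images/photo.jpg'}},
    MinConfidence=50)

for label in response['CustomLabels']:
    print(label['Name'], label['Confidence'])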

When I tried Rekognition Custom Labels with pictures taken by my Raspberry Pi camera, initially I found that it wasn’t recognizing anything. But when I cropped the images, Rekognition correctly determined that the picture (above) was a red-bellied woodpecker. That was very cool, although I learned some tough lessons about training a model when I sent about 1,200 pictures to Rekognition Custom Labels. The bird IDs were often wrong because the model had been trained on images it was unlikely to see at my feeder (e.g., blue jays aren’t common around here this time of year, but it thought it saw a lot of blue jays). Additionally, my images were sometimes out of focus because the bird was moving. So if I were going to run this model all the time, I’d retrain it with pictures that are closer to what it would see in production–ones from my feeder rather than stock images from a data set.

Given the cost of Rekognition Custom Labels, I’m probably going to run this with the much cheaper, off-the-shelf Rekognition for a while, and consider moving to a cheaper custom model in the future.

Image Recognition with AWS “Rekognition”

AWS Rekognition (sic) is a cloud service that provides a number of features for image recognition. You can train it with your own custom images, or you can use the canned training that’s already been done for you, if that’s sufficient.

I decided to try to use the tool both ways: by training it with some custom images and by using the out-of-the-box, pre-trained capabilities of Rekognition.

Rekognition Custom Labels

You use AWS Rekognition Custom Labels when you have your own images that matter for your application but aren’t handled by the standard Rekognition service. For example, suppose you have good widgets coming off your production line, and damaged ones. You could train Rekognition to know the difference between the two and move the damaged ones off to the side.

I decided to train Rekognition to be able to recognize various birds. I’m thinking of pointing a camera at my bird feeder and sending the output to Rekognition. Caltech has a really nice library of labeled bird pictures covering 200 different varieties of birds. They have color images and some outlines of the birds’ shapes. I just used the color images. First you go into the Rekognition UI in the AWS Console. Click on Use Custom Labels. You need to create a data set. Rekognition creates a bucket for you, and you can dump the images in there. You can drag and drop them, but if you have a lot of images you’re better off using the AWS CLI, with a command like this (which copies a folder containing all its images to an S3 bucket):

aws s3 cp --recursive 200.Common_Yellowthroat/ s3://mikes-bird-bucket/200.Common_Yellowthroat

I only copied some of the folders to S3 because you pay for Rekognition training time and you may not want to load it up with all 18,000 images until you have a feel for what that’s going to cost you. And anyway, a lot of those birds aren’t native to my area.

Having created a data set, you click on Project and create a new Rekognition project. You tell Rekognition where your images are. Conveniently, the Caltech bird images were nicely organized in folders named with the correct label for each image, i.e., the type of bird. So Rekognition was able to understand the correct type of bird in the pictures with no further work by me. I started training, and waited. And waited. It took a while. Maybe an hour. And that was with only about 500 images. Rekognition is, I believe, optimizing the hyperparameters of the deep learning model it’s using. That takes a while because it means training the model multiple times to find the best hyperparameters. But the results are amazing. My model ended up with an F1 score of 0.977. The best possible score is 1.0, so the model is extremely accurate at classifying the nine types of birds I trained it to recognize. And if you look at these images, you’ll see that the differences between some of the varieties of birds are pretty subtle, so it’s impressive that Rekognition can tell the difference.

Recognizing New Images

Having trained our model, how do we use it? First, you need to start the model. In the Rekognition console they provide an AWS CLI command that does that. It takes about 5 minutes to fire it up. Then you can use the AWS CLI to provide your model with an image to classify, but to create a real application you’ll want to write a little code. I found some example code that AWS provides in the SDK for Rekognition, and managed to modify it and get it to work for Rekognition Custom Labels (although the API call I’m using here isn’t mentioned in the documentation, which might not be up-to-date. I just guessed it and it worked). Here’s my code:

https://github.com/mesadowski/AWSRekognition/blob/main/Rekognition_Custom_Labels.py

To make this work I had to install boto (the AWS Python SDK) and figure out how to get my AWS API credentials and load them into a file and put them into my .boto folder.

Using Standard Rekognition

I managed to get an AWS account with some free credits, so I was able to train my custom model with 9 bird varieties for free, although it did cost a couple of dollars in credits. But if I wanted to train my model with all 200 bird varieties, that might cost a lot more. And the other thing that worried me was the cost of inference (running the model against new images). You get a few hours as part of the AWS free tier, but after that it costs $4.00 an hour. If you’re running it 24×7, that adds up. It looks like AWS is firing up a VM to run my model (it’s hard to tell because it doesn’t show up as an EC2 instance in the console). But I’m guessing that’s what’s happening because of the delay in starting up my model, and because of the cost. So I’m wondering if, instead of using Custom Labels, I can just use standard out-of-the-box Rekognition. Pricing for standard image recognition (i.e., no custom labels) is really cheap. It’s priced per API call, and you can get 5,000 calls per month free. Even when you exhaust that, it costs only 0.1 cents per image. So for my personal application here, I’d like to use standard Rekognition if possible. It turns out it’s quite good. It can’t distinguish between 10 different kinds of sparrows, but it can tell you that a bird is a sparrow and not, say, a blackbird.

The Python code to call the REST API for standard Rekognition is similar to the code for Rekognition Custom Labels. The main difference is that you haven’t trained a custom model, and don’t need to start it up ahead of time, or refer to it in your API call:

https://github.com/mesadowski/AWSRekognition/blob/main/Rekognition_Standard.py

For the above picture, which was in the test data set, the output is:

Detected labels for 010.Red_winged_Blackbird/Red_Winged_Blackbird_0011_5845.jpg
Label Animal
Confidence 99.97032165527344
Label Bird
Confidence 99.97032165527344
Label Beak
Confidence 94.90032196044922
Label Agelaius
Confidence 93.90814971923828
Label Blackbird
Confidence 93.90814971923828

So Rekognition is 99.97% sure it’s an animal, and it’s 93.9% sure it’s an Agelaius (red-winged blackbird), which is correct.

Summary

AWS Rekognition Custom Labels does a phenomenal job of letting you train a custom image recognition model, with no programming. You’ll need to do some programming to use the model, but you don’t need much, if any, deep learning knowledge to get great results. But it’s expensive because it’s not really (as far as I can tell) a true serverless deployment. So you pay for every hour you need it running. By contrast, the standard Rekognition service is truly serverless. You pay only based on the number of API calls you make. And Rekognition has been pretrained to do a lot of useful things out of the box. I was able to get it to recognize blackbirds, and sparrows, for example. And it can also do other things, like put bounding boxes around portions of an image to show you where, for example, cars or other objects are located.

Test Driving Google’s AutoML Vision to Diagnose Malaria

Dr. Google gets average precision of 0.988 in classifying infected vs. non-infected

Google’s AutoML services have been promoted as making machine learning accessible for people who may not have a lot of ML expertise. The idea behind AutoML is that it will automatically determine the best model and hyperparameters for your data. This can take a lot of CPU power, so Google argues that it’s best to do it in the cloud, because there’s no sense in having that kind of computing power lying around idle (except for the brief intervals during which you’re training a model).

In the case of the Google AutoML Vision service, you can use a graphical UI to drag in a bunch of images, label them, and start learning with the click of a button. While that’s theoretically true, to make effective use of Google AutoML Vision for a real-world problem, you’re going to want to have some understanding of the Google Cloud Platform (in particular, storage buckets, and IAM permissions), and probably a little Linux command line expertise. And if you want to actually build an application that uses your ML model, of course you’re going to do some programming.

Google’s AutoML service has generated a lot of hype. It’s also generated some criticism–that it’s overhyped, and that throwing a shitload of CPU at a problem in order to determine the best ML model is not as good an approach as having a smart, insightful person apply their smarts and insights. That’s probably true, but as I’ll show, this is still a useful tool, and while there were some things that bugged me about it, the results were pretty compelling relative to the effort involved. People have also complained about the cost. I managed to get a trial Google Cloud Platform (GCP) account with $300 in credit, so this didn’t cost me any green dollars. If you check, you might be able to get the same deal. Apparently it costs about $20 per hour of training, but you can get an hour for free. I managed to train my model within an hour.

Background: NIH Malaria Dataset

In this post, I’m going to grab a malaria dataset that originated with the National Institute of Health (NIH), but which someone posted on Kaggle. It contains over 27,000 PNG images of blood smears, half of which show malaria and half of which don’t. According to the NIH: “…the images were manually annotated by an expert slide reader at the Mahidol-Oxford Tropical Medicine Research Unit in Bangkok, Thailand.”

Here are a few of the images in the dataset, shown after I imported them into AutoML.

Results

First, let’s take a look at the results. Following that, I’ll provide a bit more detail on how to load the images into the tool, train the model, and call the API to make a prediction from a Python program.

Once training has completed, Google AutoML Vision provides a nice summary showing the results, including some graphs of recall vs. precision, and the confusion matrix. So on the test data, Google got 94.8% accuracy. Not bad.

Loading data into Google AutoML Vision

If you log in to the GCP Console, you’ll find the menu item for Vision (it’s under “Machine Learning”). If you fire that up, it will ask you whether you want to create a custom model or use a pre-trained model (which is pretty cool if you want it to recognize run-of-the-mill objects, or pictures of celebrities, but that’s not going to work for our malaria dataset). Select Custom Model.

While you can drag and drop the images into the Google AutoML Vision web page, good luck doing that with 27,000 images. Instead, you need to get them into a Google Cloud Storage bucket. This is actually a bit of a pain, because you can’t download a zip file containing all of the images to a GCP bucket and unzip it there with the Google storage command line interface. You might be able to find a way to do that with a little programming, but instead I just created a Linux VM in GCP, attached a large disk to it, and downloaded the large images zip file from Kaggle to that disk. I unzipped the files and then moved them into a Cloud Storage bucket with the Google GSUTIL command line utilities. Once you start using AutoML Vision with a custom model, Google creates a cloud storage bucket for you and gives AutoML Vision permissions to access it. So ideally, you should move your images into that bucket. It normally has a name like project_name-vcm, where project_name is the GCP project that you are using. (Projects serve as a kind of “container” within GCP for resources that you want to group logically. You can group servers, storage buckets, etc.)

To copy your files to your bucket, after unzipping them on the VM’s disk, use the GSUTIL cp command. Make sure you use the -m flag to speed things up (multi-threaded copying).

gsutil -m cp -r my-images-directory gs://mike-image-recognition-vcm

Now you need to build a file that Google AutoML will use to find the images in the bucket. This is just a CSV that contains the path to each file, and the label (uninfected or parasitized). Before starting, I unzipped all the files and ended up with 2 directories: one containing files labeled as parasitized, and one containing files labeled uninfected. I did the following steps from the Linux command line:

# from the Linux command line, cd to the Parasitized directory and put all file names in a file. 
# Watch out that you don't end up with extraneous entries in your file, like thumbs.db. AutoML will puke on that
ls > parasitized_labels.csv

# add 'parasitized' as a label to all lines in the file. I believe AutoML wants lower case labels
sed -i 's|$|,parasitized|' parasitized_labels.csv

# do the same for uninfected files.
ls > uninfected_labels.csv
sed -i 's|$|,uninfected|' uninfected_labels.csv

# Combine the two files into one, called malaria_labels.csv
cat parasitized_labels.csv >> malaria_labels.csv
cat uninfected_labels.csv >> malaria_labels.csv

# add the Google Cloud Storage bucket path to the start of each line in the file
sed -i 's|^|gs://mike-image-recognition-vcm/malaria/allimages/|' malaria_labels.csv

Optionally, you can also specify whether you want a given image to be used for training, evaluation, or testing. If you don’t, Google AutoML will decide automatically. I just let AutoML handle it, although I later regretted this somewhat because AutoML didn’t seem to have a way of finding out, after the fact, which images were used for training, etc.

Having done all that, you’ve got a file that describes the location of your 27,000+ files, and labels them. Now, you need to copy the malaria_labels.csv file into the same Google storage bucket where your images reside. The path within the bucket doesn’t matter. Once again, do this with the GSUTIL command, like above.

Creating your Dataset

Now it’s time to create your dataset in the Google AutoML Vision UI. Click “Create Dataset”, give your dataset a name, and provide a path to your CSV file (created as above), describing your files and your labels, which must be inside the bucket that AutoML Vision created. If AutoML Vision doesn’t like your CSV, you’ll get an error message, so you can fix the error and try again.

It will take a little while for AutoML to import all of your images. Once it’s complete, you should see the AutoML UI come back and show you samples of all of your images, and some stats like the number of images labeled as parasitized and uninfected.

Training

You train your model using the UI (there is also a command line interface). Just go to the Train tab, and start training. You can limit how much CPU will be used (remember, you pay for this). I limited my training to the 1 free hour, and managed to get the model trained within that time. AutoML Vision will come back with some stats on the training results, including your precision, recall, and a confusion matrix, shown above.

Predictions – Calling your Model from Python

Once you’ve trained your model, it’s automatically deployed as a REST web service. This is pretty cool because you can immediately call your model from an application and use it to make predictions. There’s no need to write any code or run a server to do this.

Google gives you some Python code for this, but I had some difficulty getting it to work without a couple of changes. This was mainly due to permissions. First you’ll need to make sure the AutoML API has been turned on. Go into the GCP console, enter AutoML in the search box, and you’ll find the screen where you need to do that pretty easily. You need to have a service account that has permission to use AutoML. I can’t remember if GCP created that for me when I started using AutoML, or if I had to create it myself. Anyway, the sample Python code that Google gives you assumes that this service account has been declared in your environment. That sounded great, but it didn’t seem to work for me running Python in a Jupyter notebook on my Mac. So instead I declared the service account credentials in my code, explicitly. You’ll need to go into the IAM area of GCP and download a JSON key file for your service account. You’ll provide a path to this key when you make the API call. This worked well. Here’s what I did:

The code is in this gist:
https://gist.github.com/mesadowski/ca907c00bf03acdfa098e6566922d621
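
In case the gist isn't available, the shape of the call is roughly this. Treat it as a sketch: the client API has shifted a bit across versions of the google-cloud-automl library, and the project ID, model ID, key path, and file name are placeholders:

from google.cloud import automl_v1beta1 as automl

# Authenticate explicitly with the service account's JSON key
client = automl.PredictionServiceClient.from_service_account_json('my-service-account-key.json')

model_name = 'projects/MY_PROJECT_ID/locations/us-central1/models/MY_MODEL_ID'
with open('cell_image.png', 'rb') as f:
    payload = {'image': {'image_bytes': f.read()}}

response = client.predict(name=model_name, payload=payload)
for result in response.payload:
    print(result.display_name, result.classification.score)   # e.g., parasitized vs. uninfected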

Pros and Cons – What’s inside the box?

The pros of using Google AutoML are clear. You can train a custom model without coding, and the results are good. But there are some things that bugged me.

The main annoyance was that I wasn’t able to see what model AutoML ended up with. I couldn’t find out the model architecture, let alone find out the specific weights. If anyone knows how to do that let me know, but the UI doesn’t show it, and the command line tools didn’t seem to offer this. This would be helpful if I wanted to take the results and try to improve on them myself.

I was unable to see which images the tool had selected for training, evaluation, and testing. Perhaps I should just trust the tool, but I wanted to use my Python program to take the test images and run predictions on them myself. You can’t do that unless you predefined which images to use for test. I just tried it with a few random images, but it would be nice to know which ones were in the training set, test set, etc. I think Google needs to give users more visibility into what’s going on inside the box.

The tool is a bit intolerant of bad data in your CSV file. I might have hoped it would skip bad lines and keep going, but instead it threw an error and failed entirely.

But all in all Google’s AutoML Vision tool worked better than I expected, and I think some people who don’t have time to code their own models might find it to be useful.