Training an ML model with a GPU on the cheap with Google Cloud Platform

This may or may not be useful to people who are working on projects and need a GPU for cheap. I use AWS at work, but I’ve found their approach to ML is overkill for my basic needs. i.e., you can use AWS Sagemaker’s managed notebooks but connecting it to your code and providing a place to put model checkpoints requires (I believe) hooking it up to an S3 bucket, providing the right bucket IAM permissions, etc. Or you can get an AWS instance with a GPU, but similarly you need to connect it to S3. It’s not that difficult, but it’s some overhead you might not want to bother with if you aren’t too familiar with AWS.

I find Google is easier to work with in this respect. You can use Colab. You get a GPU sometimes, but if you don’t want Google to kill your session at an inconvenient time, you need to pay up for the Pro or Pro+ subscription, which are $10 and $50 per month, respectively. Pro+ is actually a good value. It might be more than you want to spend, although it’s nice because you can run multiple sessions at a time and you get access to good GPUs (e.g., NVIDIA P100).

Another option is a GCP instance, so I’m using that too. With a new account you can get $300 in credits. Fire up a GPU ML instance like this:

https://cloud.google.com/deep-learning-vm/docs/cloud-marketplace

You can get a low end GPU instance for about $300 per month. So your credits cover that. You will need to request a quota increase from 0 to 1 GPUs, because the default quota is 0. If you try to launch the instance without this, you’ll find it doesn’t work. So ask for a quota increase to 1 GPU, as follows, and a little while later you should be able to launch a GPU instance.

https://cloud.google.com/compute/quotas

You can go for a higher-end instance ($1000 a month or so) but just be careful to stop it when you aren’t using it, and delete it once your project is done, just to be sure. I find GCP is much easier to use for training than AWS. You SSH into the instance by clicking the SSH button next to it on the VM Instances screen under Compute Engine. Then, from the browser-based SSH window, you can upload your code and any data you need. No need to connect it to a bucket and work out IAM permissions for accessing it.

https://cloud.google.com/compute/docs/instances/transfer-files

GCP’s deep learning images come pre-installed with a lot of python libraries, but if you need more, just pip install them, and you are in business! You can be up and running in just a few minutes, and if you’re careful you should be able to do this for free (within your GCP credits).