This project is interesting in that it is slightly different from the others I have done so far: it is meant to supplement an existing service rather than stand on its own. Colab is Google's code notebook service that lets people collaborate and run code for free on idle hardware. By its very nature it is ephemeral, so on every start you need to download and install all of your code and packages. Here we attempt to fix this by creating a Google Cloud Storage bucket and a service account that can mount that bucket like a shared drive on your Colab instance. If you are also willing to provision a VM on Google Cloud, you can ensure you will always have a machine and GPU available to you.
Implementation Lessons
This was fairly straightforward to build, but there are some security considerations. If you look at the diagram, you will see that the code that mounts the storage uses Google Drive and a service account key, and also reaches out to GCP Cloud IAM and the storage buckets. This is because we are FUSE mounting the storage bucket onto the VM that Colab is running on, whether that VM is provided by Google or provisioned by us. To do that we either needed an entirely public bucket with read and write access (not recommended) or we needed to store our access credentials in our code (also not recommended). So we created a service account, which is Google's term for an account that is authorized to access some part of your cloud and is meant to be used by code rather than by a human. You can then export a JSON key file with its access credentials. Here we store that key in our Google Drive, so it is not public but can still be read by our Colab instance, which uses the credential file to authenticate to Cloud Storage and read/write files.
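As a rough sketch of that flow, assuming the key was exported from IAM and uploaded to Drive at a path like MyDrive/keys/colab-gcs-sa.json (the folder and filename here are placeholders), a Colab cell might look like this:

```python
# Minimal sketch of the credential flow described above (paths are placeholders).
import os
from google.colab import drive

# Mount Google Drive so the service account key file is readable
drive.mount('/content/drive')
KEY_PATH = '/content/drive/MyDrive/keys/colab-gcs-sa.json'  # hypothetical location

# Authenticate the gcloud CLI as the service account, and expose the key to
# anything that reads Application Default Credentials (gcsfuse included)
!gcloud auth activate-service-account --key-file={KEY_PATH}
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = KEY_PATH
```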
As for provisioning virtual machines: this is only necessary if you are attempting to use Colab when demand is high and there are no free machines available. Google offers a premium tier for Colab with a monthly fee, but even that does not guarantee access to a machine when you want it. The only way to ensure hardware is available on demand is to provision a machine in your own Google Cloud account. This can be expensive if you leave it running, but there is nothing stopping you from starting an instance from the marketplace and terminating it once you are done a few hours later. The one caveat is that GCP does not allow you to provision GPUs by default; you will need to request a quota increase if you want to run models that need GPU acceleration on your Colab instance.
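The marketplace flow above is done through the console, but as a general illustration of the pay-per-hour approach, a GPU VM can also be created and torn down from the command line. Everything below is illustrative rather than a fixed recipe: the instance name, zone, machine type, accelerator, and image are placeholders, the zone needs GPU quota, and the same commands run in Cloud Shell without the leading exclamation mark.

```python
# Hedged sketch: create a pay-per-hour GPU VM, then delete it when finished.
!gcloud compute instances create colab-gpu-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=pytorch-latest-gpu \
    --image-project=deeplearning-platform-release \
    --boot-disk-size=100GB

# Terminate when done so you only pay for the hours you actually used
!gcloud compute instances delete colab-gpu-vm --zone=us-central1-a
```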
Know your use cases
Adding persistent storage and compute resources is not always necessary. This fits a specific niche: you want to load large models or data into a Colab notebook, you want that loading to be secure and low latency, or you have timing requirements and cannot wait for free resources to become available. If you don't have these requirements, the ephemeral nature of Colab may be best, even if starting up your models and copying data in takes a few minutes longer. Storage and VMs both incur usage costs.
Mount Storage Buckets with gcsFUSE
Google develops gcsfuse, a Linux package built on FUSE (Filesystem in Userspace) for mounting Cloud Storage buckets as local filesystems. This allows us to mount storage buckets as volumes for holding configuration files and reading/writing persistent binary files.
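In a Colab cell the install and mount look roughly like the sketch below, following Google's published apt instructions; the bucket name and mount point are placeholders, and the service account key is assumed to already be active as shown earlier.

```python
# Sketch: install gcsfuse and mount a bucket inside the Colab VM (runs as root).
!echo "deb https://packages.cloud.google.com/apt gcsfuse-`lsb_release -c -s` main" > /etc/apt/sources.list.d/gcsfuse.list
!curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt-get -qq update && apt-get -qq install -y gcsfuse

# Mount the bucket; --implicit-dirs makes "folders" in the bucket visible
!mkdir -p /content/gcs
!gcsfuse --implicit-dirs my-colab-models /content/gcs

# Anything written under /content/gcs now persists in the bucket across sessions
!ls /content/gcs
```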
IAM Service Accounts
Service accounts are what allow Colab to access the storage buckets in our project. GCP takes a fairly strict zero trust posture, so a service account is attached to the workload in much the same way that task roles are attached to containers in AWS to grant them permissions on other resources.
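Creating the account itself is a one-time setup step. A hedged sketch of it with gcloud and gsutil follows; the account name, project ID, bucket, and key filename are placeholders, and it can be run from Cloud Shell (drop the leading !) or from a Colab cell after authenticating as yourself.

```python
# One-time setup sketch: service account, bucket permission, key file.
from google.colab import auth
auth.authenticate_user()  # authenticate as a human with IAM admin rights

# Create the service account in the project (names are placeholders)
!gcloud iam service-accounts create colab-storage-sa --display-name="Colab storage access" --project=my-project

# Grant it read/write access to the bucket only, not the whole project
!gsutil iam ch serviceAccount:colab-storage-sa@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin gs://my-colab-models

# Export the JSON key that later gets uploaded to Drive
!gcloud iam service-accounts keys create colab-gcs-sa.json --iam-account=colab-storage-sa@my-project.iam.gserviceaccount.com
```

The resulting colab-gcs-sa.json is the credential file referenced in the earlier mounting step.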
Colab and Google Drive Connection
There is a strong connection between Colab and Drive: you can save your notebook and its code to your Drive very easily, and you can later open Drive and tell it to launch that notebook in Colab. Colab will also autosave changes as you work if the notebook is saved to your Drive.
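Programmatic access to Drive from a notebook is essentially a one-liner, which is also how the credential file from earlier becomes reachable:

```python
# Colab cell: mount Google Drive into the notebook filesystem.
from google.colab import drive
drive.mount('/content/drive')

# Notebooks saved from Colab land in this folder by default
!ls "/content/drive/MyDrive/Colab Notebooks"
```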
Retrospective
I use this for running Stable Diffusion and testing out models, which means I have large RAM and GPU requirements. There isn't always a GPU available when I attempt to connect to Colab, but the ability to launch a VM in my GCP account with a GPU attached, and to pay per hour only until I terminate it, means I can always run the models when I want to. Startup time and disk size limits can also be an issue when each model is several gigabytes; offloading those models to another storage system, where they are always available and I control the versioning, is incredibly beneficial. But there is a balance to strike between access and cost. Storage is cheap but not free, and VMs with GPUs are moderately expensive when run for long periods. That trade-off should always be considered before building out past the free tier of the Colab service.