Google currently provides free GPU/TPUs for research, as does Kaggle. These use Jupyter notebooks.
You may have heard Simmons say that he is, in general, not a huge fan of Jupyter notebooks (formerly IPython notebooks); however, they make sense in the cloud, if for no other reason than that they seamlessly integrate graphical output.
Of course, other cloud solutions are available: AWS, GCP, etc. Those typically are not free, but Simmons has a bunch of mini-grants for various cloud providers.
Colab
This section walks through an example of how to use a Jupyter notebook in the cloud to get your hands on a free GPU or TPU.
- Go to colab.research.google.com, sign in (or create an account), and create a new Python 3 notebook.
- Click on Runtime and select Change runtime type.
- In the dialog that appears, select GPU as the hardware accelerator.
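Once the runtime has been switched, it is worth checking from a notebook cell that PyTorch can actually see the GPU. The following is a minimal sketch; the function name `describe_accelerator` is made up for illustration, and it checks only for CUDA GPUs (not TPUs).

```python
def describe_accelerator():
    """Return a short description of the accelerator that PyTorch can see."""
    try:
        import torch  # normally preinstalled on Colab; may be absent elsewhere
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Name of the first visible CUDA device
        return "GPU: " + torch.cuda.get_device_name(0)
    return "CPU only (no CUDA device visible)"


result = describe_accelerator()
print(result)
```

If this reports CPU only, the runtime type was probably not changed (or no GPU was available when the session started).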
We now want to upload a program to a notebook and run it. Scroll through this notebook's session that Simmons ran on Colab:
View the contents of the Iframe above in a separate window.
Here is the Python program that was uploaded and executed in the notebook:
mnist
    #!/usr/bin/env python3
    import torch
    import torch.nn as nn
    #import torch.optim as optim
    from torchvision import datasets
    from torch.utils.data import DataLoader
    import du.lib as dulib
    import du.utils
    from du.conv.models import ConvFFNet

    # process commandline arguments
    parser = du.utils.stand_args(
        desc=du.utils._markup(
            "Classify MNIST data. In the presence of a gpu try "
            "running this like: `mnist -sm 1.0` "),
        lr=0.001, mo=0.92, bs=20, epochs=15,
        prop=5/6, gpu=(-1,), graph=0, widths=(10,), channels=(1,16), verb=2,
        small=1, cm=False, print_lines=(7,8))
    args = parser.parse_args()

    # get the mnist data
    dl = DataLoader(datasets.MNIST('~/data/mnist', train=True, download=True,))
    features = dl.dataset.data.to(dtype=torch.float32)
    targets = dl.dataset.targets.to(dtype=torch.long)

    # use only a (randomly chosen) percentage of the data
    if args.small < 1:
        features, _ , targets, _ = dulib.coh_split(args.small, features, targets)

    # it's standard practice to train on first 50K digits and test on last 10K
    if args.prop == 5/6:
        train_feats, test_feats, train_targs, test_targs =\
            dulib.coh_split(5/6, features, targets, randomize=False)
    else:
        train_feats, test_feats, train_targs, test_targs =\
            dulib.coh_split(args.prop, features, targets)

    # center and normalize training data
    train_feats, train_means = dulib.center(train_feats)
    train_feats, train_stdevs = dulib.normalize(train_feats)

    # if test_data, center/normalize those w/r to means/stdevs of train data
    if args.prop < 1:
        test_feats, _ = dulib.center(test_feats, train_means)
        test_feats, _ = dulib.normalize(test_feats, train_stdevs)

    print('training on {} of {} examples'.format(len(train_feats),len(features)))

    model = ConvFFNet(
        in_size = (28, 28),
        n_out = 10,
        channels = args.channels,
        widths = args.widths)

    model = dulib.train(
        model = model,
        crit = nn.NLLLoss(),
        train_data = (train_feats, train_targs),
        test_data = (test_feats, test_targs) if args.prop < 1 else None,
        #learn_params = optim.SGD(model.parameters(),lr=args.lr,momentum=args.mo),
        args = args)

    print('{:.2f}% correct on training data'.format(
        100*dulib.confusion_matrix(
            (model,train_feats), train_targs, torch.arange(10))))

    if args.prop < 1:
        print('On test data:')
        print('{:.2f}% correct, overall.'.format(
            100*dulib.confusion_matrix(
                (model,test_feats), test_targs, torch.arange(10), show = args.cm)))
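One detail of the program above that is easy to miss: the test features are centered and normalized using the means and standard deviations of the training features, not their own. The sketch below illustrates that idea on a tiny one-dimensional example in plain Python. It is not DUlib's implementation; the helper names `fit_standardizer` and `standardize` are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_values):
    """Compute the mean and (population) standard deviation of training data."""
    mean = fmean(train_values)
    std = pstdev(train_values)
    # Guard against division by zero for constant features
    return mean, (std if std > 0 else 1.0)


def standardize(values, mean, std):
    """Center and scale values using previously computed statistics."""
    return [(v - mean) / std for v in values]


train = [0.0, 2.0, 4.0, 6.0]
test = [3.0, 9.0]

# Statistics come from the training data only ...
mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
# ... and are reused, unchanged, on the test data.
test_z = standardize(test, mean, std)

print("train:", train_z)
print("test: ", test_z)
```

Reusing the training statistics keeps information about the test set from leaking into preprocessing, and ensures that the model sees test inputs on the same scale as the inputs it was trained on.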
Notes on performance:
There are two runs of the mnist program in the Colab notebook above: one that includes validation (and graphing) on test data and one that ignores test data.
On Colab, the training time reported during the former run is just over 3 minutes; the latter is just over 1 minute. The same two runs on the Arch machine in Simmons' office (so on a GTX 1080) take, respectively, about 102 and 40 seconds. There is presumably some overhead on the Colab machines that is not present on the Arch machine, since a Tesla T4 is a newer card than a 1080 and should be at least competitive with it.
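To put rough numbers on that comparison, the following sketch computes how much slower Colab was for each run. The Colab timings are approximations: "just over 3 minutes" and "just over 1 minute" are rounded down to 180 and 60 seconds, so the true ratios are slightly larger.

```python
# Approximate training times in seconds (Colab values are rounded down; see above).
timings = {
    "with validation": {"colab": 180.0, "arch_1080": 102.0},
    "training only": {"colab": 60.0, "arch_1080": 40.0},
}


def slowdown(colab_seconds, local_seconds):
    """Return how many times longer the Colab run took than the local run."""
    return colab_seconds / local_seconds


ratios = {
    run: round(slowdown(t["colab"], t["arch_1080"]), 2)
    for run, t in timings.items()
}

for run, ratio in ratios.items():
    print("{}: Colab took about {}x as long".format(run, ratio))
```

So, under these approximations, Colab is somewhere around 1.5 to 1.8 times slower than the 1080 for this particular program.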
References:
- Colab notebook describing loading and saving files from external sources like your machine to the Colab filesystem.
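For the simple case of getting a single program (such as mnist) from your machine into a Colab session, something like the following may be all you need. This is a hedged sketch that relies on the `files.upload()` helper from the `google.colab` package, which is only importable inside a Colab runtime; the wrapper name `upload_program` is made up for illustration.

```python
def upload_program():
    """Open Colab's file picker and return the names of the uploaded files.

    Outside of Colab, where google.colab is unavailable, return an empty list.
    """
    try:
        from google.colab import files  # only importable inside a Colab runtime
    except ImportError:
        return []
    uploaded = files.upload()  # dict mapping each filename to its contents
    return sorted(uploaded)


names = upload_program()
print("uploaded:", names if names else "nothing (not running in Colab?)")
```

After the upload, and assuming DUlib is installed in the session, a cell such as `!python3 mnist -sm 1.0` should run the program.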
Kaggle
Setup:
- Create an account at kaggle.com, sign in, and go to Notebooks.
- Click on the Your Work tab and Create New Notebook.
- Select the type to be Notebook.
- Under Advanced Settings, select GPU or TPU as the accelerator.
- Click create.
Choosing Notebook above creates a Jupyter Notebook environment in which you can see the matplotlib graphics that DUlib optionally uses. If you don't need graphics, choose Script as the type instead.
To install DUlib to your Kaggle session:
- Click on Console in the bottom left and enter the following (or its equivalent with your desired version number):
      python3 -m pip install git+https://github.com/sj-simmons/DUlib.git@v0.9 --user
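A quick way to confirm that the install succeeded is to ask Python whether it can locate the package. This sketch assumes the import name is `du` (as in the `import du.lib` line of the mnist program); the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# "du" is the import name used by the mnist program (import du.lib, du.utils).
for name in ("torch", "torchvision", "du"):
    status = "found" if is_installed(name) else "NOT found"
    print("{:12s} {}".format(name, status))
```

If `du` is reported as not found right after a `--user` install, restarting the session's Python interpreter may help, since the user site directory is only picked up at startup.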
To make sure everything is working, consider cutting and pasting
Simmons is working on finishing this (help him if you like)
Simmons' machine
This is technically not a cloud solution, but if/when your project warrants it, you may have an account on the machine in Simmons' office (which holds dual GTX 1080s).
First, contact Simmons so that he can create your account on the Arch Linux box. Then follow the setup details and access instructions on this page of the DL@DU Project's GitHub repo.
Resources
- pytorch.org/get-started/cloud-partners
- Tools that might help price EC2 instances:
- ec2instances.info
- ec2.shop (cleaner, but only for CPU instances)
- Discount GPUs at gpu.land; this might be legit (let Simmons know if you try it).