GPU servers

Google currently provides free GPUs and TPUs for research (through Colab), as does Kaggle. Both services use Jupyter notebooks.

You may have heard Simmons say that he is, in general, not a huge fan of Jupyter notebooks (formerly IPython notebooks); however, they make sense in the cloud if for no other reason than that they seamlessly integrate graphical output.

Of course, more cloud solutions are available: AWS, GCP, etc. Those typically are not free, but Simmons has a bunch of mini-grants for various cloud providers.

Colab

This section walks through an example of using a Jupyter notebook in the cloud to get your hands on a free GPU or TPU.

Go to colab.research.google.com, sign in (or create an account), and create a new Python 3 notebook.
Click on Runtime and select Change runtime type.
In the dialog that appears, select GPU as the hardware accelerator.
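
Once the runtime has been switched, a quick check from a notebook cell can confirm that PyTorch actually sees the GPU. The following is a minimal sketch, assuming PyTorch is installed in the runtime; the helper name `describe_device` is made up for illustration.

```python
try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def describe_device():
    """Return a short description of the device PyTorch would compute on."""
    if torch is None:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Index 0 is the first (and, on a free runtime, typically the only) GPU.
        return "GPU: " + torch.cuda.get_device_name(0)
    return "CPU only (no CUDA device visible)"


print(describe_device())
```

If this reports CPU only, the runtime type most likely was not changed, or no GPU was available when the session started.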

We now want to upload a program to a notebook and run it. Scroll through this notebook session that Simmons ran on Colab:

The <a href="./dulib_test0.asciidoc">asciidoc version</a> and <a href="./dulib_test0_files/dulib_test0_13_1.png">png image</a> of the session are also available.

Here is the Python program that was uploaded and executed in the notebook:

mnist
#!/usr/bin/env python3
import torch
import torch.nn as nn
#import torch.optim as optim
from torchvision import datasets
from torch.utils.data import DataLoader
import du.lib as dulib
import du.utils
from du.conv.models import ConvFFNet
# process commandline arguments
parser = du.utils.stand_args(
    desc=du.utils._markup("Classify MNIST data. In the presence of a gpu try\
    running this like: `mnist -sm 1.0` "), lr=0.001, mo=0.92, bs=20, epochs=15,
    prop=5/6, gpu=(-1,), graph=0, widths=(10,), channels=(1,16), verb=2,
    small=1, cm=False, print_lines=(7,8))
args = parser.parse_args()
# get the mnist data
dl = DataLoader(datasets.MNIST('~/data/mnist', train=True, download=True,))
features = dl.dataset.data.to(dtype=torch.float32)
targets = dl.dataset.targets.to(dtype=torch.long)
# use only a (randomly chosen) percentage of the data
if args.small < 1:
  features, _ , targets, _ = dulib.coh_split(args.small, features, targets)
# it's standard practice to train on first 50K digits and test on last 10K
if args.prop == 5/6:
  train_feats, test_feats, train_targs, test_targs =\
      dulib.coh_split(5/6, features, targets, randomize=False)
else:
  train_feats, test_feats, train_targs, test_targs =\
      dulib.coh_split(args.prop, features, targets)
# center and normalize training data
train_feats, train_means = dulib.center(train_feats)
train_feats, train_stdevs = dulib.normalize(train_feats)
# if test_data, center/normalize those w/r to means/stdevs of train data
if args.prop < 1:
  test_feats, _ = dulib.center(test_feats, train_means)
  test_feats, _ = dulib.normalize(test_feats, train_stdevs)
print('training on {} of {} examples'.format(len(train_feats),len(features)))
model = ConvFFNet(
    in_size = (28, 28),
    n_out = 10,
    channels = args.channels,
    widths = args.widths)
model = dulib.train(
    model = model,
    crit = nn.NLLLoss(),
    train_data = (train_feats, train_targs),
    test_data = (test_feats, test_targs) if args.prop < 1 else None,
    #learn_params = optim.SGD(model.parameters(),lr=args.lr,momentum=args.mo),
    args = args)
print('{:.2f}% correct on training data'.format(100*dulib.confusion_matrix(
    (model,train_feats), train_targs,torch.arange(10))))
if args.prop < 1:
  print('On test data:')
  print('{:.2f}% correct, overall.'.format(100*dulib.confusion_matrix(
      (model,test_feats), test_targs,torch.arange(10),show = args.cm)))
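
The centering and normalizing step in the program reuses the training set's means and standard deviations on the test set, so that both sets pass through the same transformation. A minimal sketch of that idea in plain PyTorch follows. It is not DUlib's implementation; the helper names `fit_stats` and `apply_stats` are hypothetical, and replacing zero standard deviations with 1 is an assumption made here to avoid dividing by zero (MNIST has pixels that are constant across the training set).

```python
import torch


def fit_stats(train_feats):
    """Compute per-feature means and standard deviations from training data."""
    means = train_feats.mean(dim=0)
    stdevs = train_feats.std(dim=0, unbiased=False)
    return means, stdevs


def apply_stats(feats, means, stdevs):
    """Center and scale feats using statistics computed on the training data."""
    # Assumption: constant features (stdev 0) are left unscaled.
    safe_stdevs = torch.where(stdevs == 0, torch.ones_like(stdevs), stdevs)
    return (feats - means) / safe_stdevs


# Tiny made-up data: two training examples, one test example, two features.
train_feats = torch.tensor([[1.0, 5.0], [3.0, 5.0]])
test_feats = torch.tensor([[2.0, 6.0]])

means, stdevs = fit_stats(train_feats)
train_std = apply_stats(train_feats, means, stdevs)
test_std = apply_stats(test_feats, means, stdevs)  # uses the training statistics
print(train_std)
print(test_std)
```

The point of the design is that the test data never contributes to the statistics, so the model sees test examples transformed exactly as the training examples were.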

Notes on performance:

There are two runs of the mnist program in the Colab notebook above: one that includes validation (and graphing) on test data and one that ignores test data.
On Colab, the training time reported during the former run is just over 3 minutes; the latter is just over 1 minute. The same two runs on the Arch machine in Simmons' office (so on a 1080) take about 102 and 40 seconds, respectively. Presumably there is some overhead on the Colab machines that is not present on the Arch machine, since on paper a Tesla T4 should be at least comparable to a 1080.
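
To put rough numbers on that gap, the arithmetic below takes "just over 3 minutes" and "just over 1 minute" to be about 180 and 60 seconds. Those two values are approximations assumed here, not measurements.

```python
# Approximate training times in seconds (the Colab values are rounded assumptions).
colab = {"with test data": 180, "without test data": 60}
arch = {"with test data": 102, "without test data": 40}

# Ratio of Colab time to Arch-machine time for each kind of run.
slowdown = {run: colab[run] / arch[run] for run in colab}

for run, ratio in slowdown.items():
    print(f"{run}: Colab took about {ratio:.2f}x as long")
```

By this estimate, Colab took roughly 1.5 to 1.8 times as long as the Arch machine on these runs.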

References:

A Colab notebook describing how to load and save files between external sources (such as your machine) and the Colab filesystem.
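
For the common case of getting a file from your own machine into a Colab session, the `google.colab` package that comes preinstalled on Colab provides `files.upload()`. A minimal sketch follows; the wrapper name `upload_to_colab` is made up for illustration, and outside of Colab (where `google.colab` is not importable) it simply returns an empty dictionary.

```python
def upload_to_colab():
    """Open Colab's file picker and return {filename: file contents as bytes}."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print("Not running on Colab; nothing uploaded.")
        return {}
    return files.upload()


uploaded = upload_to_colab()
for name, contents in uploaded.items():
    print(f"uploaded {name} ({len(contents)} bytes)")
```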

Kaggle

Setup:

Create an account at kaggle.com, sign in, and go to Notebooks.
Click on the Your Work tab and Create New Notebook.
Select the type to be Notebook.
Under Advanced Settings, select GPU or TPU as the accelerator.
Click create.

Choosing Notebook above creates a Jupyter Notebook environment in which you can see the matplotlib graphics that DUlib optionally uses. If you don't need graphics, choose Script as the type instead.

To install DUlib in your Kaggle session:

Click on Console in the bottom left and enter the following (or its equivalent with your desired version number):

python3 -m pip install git+https://github.com/sj-simmons/DUlib.git@v0.9  --user
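
After the install finishes, one generic way to confirm that Python can find the package is to look it up by its import name, which is `du` judging from the imports in the mnist program above. This is a sketch using only the standard library; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


print("DUlib importable:", is_installed("du"))
```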

To make sure everything is working, consider cutting and pasting

Simmons is working on finishing this (help him if you like)

Simmons' machine

This is technically not a cloud solution but, if and when your project warrants it, you can get an account on the machine in Simmons' office (which holds dual 1080s).

First, contact Simmons so that he can create your account on the Arch Linux box. Then follow the setup details and access instructions on this page of the DL@DU Project's Github repo.

Resources

pytorch.org/get-started/cloud-partners
Tools that might help price EC2 instances:
- ec2instances.info
- ec2.shop (cleaner, but only for CPU instances)
Discount GPUs at gpu.land; this might be legit (let Simmons know if you try it).
