Cloud Machine Learning: Train in the Cloud, Deploy at the Edge

Cloud Economics

Currently, the financial benefit of cloud computing skews toward ephemeral workloads rather than always-on workloads. One ephemeral example is cloud functions such as AWS Lambda, Google Cloud Functions, or Azure Functions: short-lived cloud functions are much more cost-effective than always-on cloud VMs or always-on on-premises VMs. Another ephemeral example is the temporary use of cloud computing for disaster recovery (DR) or for machine learning (ML) model training. One does not typically need DR resources on more than a temporary basis, and one does not typically need a large pool of ML GPU resources on more than a temporary basis either. If your business requires a large pool of GPU resources on a constant basis, then definitely build on-premises rather than in the cloud. 
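Whether renting or buying wins comes down to utilization: the break-even point is the number of rented GPU-hours that costs as much as buying the hardware outright. A minimal sketch, assuming a hypothetical $2,000 GPU card and the roughly $0.90/hour p2.xlarge rate quoted later in this post, and ignoring power, cooling, and hosting costs:

```shell
#!/bin/sh
# Break-even sketch: how many rented GPU-hours cost as much as buying a card?
# Assumptions (hypothetical, for illustration only): a $2,000 GPU card and an
# on-demand rate of ~$0.90/hour. Power, cooling, and hosting are ignored.

# break_even_hours PURCHASE_PRICE HOURLY_RATE -> rental hours with the same cost
break_even_hours() {
  awk -v p="$1" -v r="$2" 'BEGIN { printf "%.0f\n", p / r }'
}

# break_even_months HOURS -> the same figure as months of always-on use
# (assuming ~730 hours per month)
break_even_months() {
  awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 730 }'
}

hours=$(break_even_hours 2000 0.90)
echo "Break-even: $hours GPU-hours (~$(break_even_months "$hours") months always-on)"
```

Under those assumptions, about 2,222 GPU-hours (roughly three months of always-on use) cost as much as the card, which is why occasional training runs favor renting and constant GPU demand favors buying.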

I have been spending a lot of time with the $99 Jetson Nano IoT device and have found that using temporary cloud GPUs is a cost-effective way to speed up ML model creation. (I have pages and pages of notes from my use of the Nano. Please let me know if you have a Nano and would like any pointers.)

Teaching My Nano to Recognize Sign Language

There is an excellent set of sign language training images on GitHub. I thought it would be fun to teach the Nano to recognize sign language through a live video feed from the Nano camera. Plus, our daughter leads an ASL Club at school, so she could be my test subject 🙂.


NVIDIA provides a free machine learning training environment called DIGITS, which takes all the programming pain out of building and testing machine learning models. DIGITS can be compiled from source, run in a container, or spun up as an AWS VM. To get comfortable with DIGITS, I had been running it, compiled from source, in an Ubuntu VM on my Mac. NVIDIA DIGITS is GPU-aware (naturally), but my Ubuntu VM has no access to a GPU, so all my training ran on the vCPUs and was much slower than it could have been. 

Image Loading Comparison

Loading 87,000 small images of sign language hands into a DIGITS dataset was quick in both environments: about 4 minutes in the laptop VM and about 2 minutes in the cloud VM.

Model Creation Comparison

Model creation is where the cloud VM (with access to an NVIDIA GPU) shines. During model creation, DIGITS runs training code that analyzes 75% of the images to learn patterns and repeatedly tests those patterns against the other 25% of the images, held out for validation, aiming to maximize accuracy and minimize loss over successive passes (epochs) through the data.
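DIGITS performs this split itself when it builds the dataset. Purely to illustrate the idea, the sketch below shuffles a list of image paths and writes the first 75% to a training list and the remainder to a validation list. The function and file names are hypothetical, and it assumes GNU coreutils (`shuf`), as found on Ubuntu and the Nano.

```shell
#!/bin/sh
# Sketch of a 75/25 train/validation split over a list of image paths.
# Input: one image path per line on stdin. Names are hypothetical.
# Usage: split_75_25 TRAIN_FILE VAL_FILE < paths.txt
split_75_25() {
  tmp=$(mktemp)
  shuf > "$tmp"                                 # randomize order (GNU shuf)
  total=$(wc -l < "$tmp")                       # number of images
  n_train=$(( total * 75 / 100 ))               # integer 75% for training
  head -n "$n_train" "$tmp" > "$1"              # first 75% -> training list
  tail -n +"$(( n_train + 1 ))" "$tmp" > "$2"   # remaining 25% -> validation list
  rm -f "$tmp"
}
```

For example, `find sign-language-images -name '*.jpg' | split_75_25 train.txt val.txt` (hypothetical directory) would put 65,250 of 87,000 paths in train.txt and 21,750 in val.txt. Shuffling first matters because a directory listing is usually grouped by class folder, so an unshuffled split could leave some signs out of validation entirely.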

When I started model creation on my laptop VM, DIGITS estimated the time to completion at 11 days (!). I aborted that run and started to look into the time and expense of using a cloud VM instead. The recommended EC2 instance type for NVIDIA DIGITS is p2.xlarge, which costs about $0.90/hour, and my estimated run time on the cloud VM was 8 hours. About $7.20 of compute (8 hours at $0.90/hour) versus buying an NVIDIA GPU card for a few thousand dollars sounded like a good economic value to me. (Plus, Amazon issues a $100 credit to my account in any month that someone uses my Dad Jokes Alexa Skill 🙂)
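The comparison above reduces to two lines of arithmetic. This sketch reproduces them with `awk`, using only the post's own figures (8 hours at about $0.90/hour in the cloud, versus an 11-day estimate on the laptop VM); the function names are illustrative.

```shell
#!/bin/sh
# Worked arithmetic for the training run, using the figures from the post.

# run_cost HOURS RATE -> estimated on-demand cost in dollars
run_cost() {
  awk -v h="$1" -v r="$2" 'BEGIN { printf "%.2f\n", h * r }'
}

# speedup LAPTOP_DAYS CLOUD_HOURS -> how many times faster the GPU VM is
speedup() {
  awk -v d="$1" -v h="$2" 'BEGIN { printf "%.0f\n", d * 24 / h }'
}

echo "Cloud run cost: \$$(run_cost 8 0.90)"   # prints: Cloud run cost: $7.20
echo "Speedup: $(speedup 11 8)x"               # prints: Speedup: 33x
```

The 11-day estimate is 264 hours, so the GPU instance is roughly 33 times faster on this job.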

Train in the Cloud, Deploy at the Edge

Once the model was trained, I could download it to my Nano and shut down the cloud VM to stop the billing. Thankfully, DIGITS makes downloading the model easy: just click Download Model.

The download is a gzipped tar file with a naming convention like 20190514-175338-623c_epoch_30.0.tar.gz. I expanded it (tar xzvf) into a "sign-language-model" directory on my Nano. The Nano development kit includes working source code for recognizing images from files as well as from a live camera, using Caffe models created by DIGITS. You can find the code and instructions in the jetson-inference repository on GitHub.
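The expand step can be wrapped in a small helper. This is a sketch: `extract_model` and `list_model_files` are hypothetical names, and it assumes the tarball holds `deploy.prototxt`, a `*.caffemodel` snapshot, and `labels.txt` at its top level, as the imagenet-camera arguments in this post imply.

```shell
#!/bin/sh
# Expand a DIGITS model download (a gzipped tar file) into its own directory.
# Usage: extract_model 20190514-175338-623c_epoch_30.0.tar.gz sign-language-model
extract_model() {
  mkdir -p "$2"             # create the destination directory if it does not exist
  tar -xzvf "$1" -C "$2"    # x = extract, z = gunzip, v = list files, f = archive file
}

# List the files the camera program is pointed at: the network definition,
# the trained weights, and the class labels (fails if any is missing).
list_model_files() {
  ls "$1"/deploy.prototxt "$1"/*.caffemodel "$1"/labels.txt
}
```

Running `list_model_files sign-language-model` right after extraction is a quick way to confirm the snapshot file name before passing it to `--model`.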

All I needed to do was pass a few parameters to the live camera recognition program (imagenet-camera) to load the downloaded machine learning model:

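# Directory the DIGITS model tarball was expanded into (assumed path; adjust as needed)
NET=~/sign-language-model
~/jetson-inference/build/aarch64/bin/imagenet-camera \
--prototxt=$NET/deploy.prototxt \
--model=$NET/snapshot_iter_15300.caffemodel \
--labels=$NET/labels.txt \
--input_blob=data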

The Jetson Nano loads this model into memory and uses its 128-core GPU to recognize live images at up to 60 frames per second. That high-frame-rate live recognition is what sets the Nano apart from other IoT devices such as the Raspberry Pi and the Google Coral.
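To put a frame rate in perspective: at F frames per second, each frame gets on average 1000/F milliseconds for capture, inference, and display. A trivial sketch (the function name is illustrative):

```shell
#!/bin/sh
# Average time available per frame, in milliseconds, at a given frame rate.
frame_budget_ms() {
  awk -v f="$1" 'BEGIN { printf "%.1f\n", 1000 / f }'
}

frame_budget_ms 60   # prints 16.7
frame_budget_ms 30   # prints 33.3
```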


Not every workload is most cost-effective or performs best in the public cloud, but some do. One needs to define the requirements of each workload, as well as the capabilities of on-premises and cloud data centers, to determine the correct location for each workload. In this case, I needed an NVIDIA GPU for only a few hours and had none on-premises, so renting one in a cloud VM made sense. I have also been hearing the recommendation "Train in the cloud, deploy at the edge," and this seemed like a good opportunity to test that concept. Mission accomplished.

I hope you enjoyed this post. As always, I welcome your feedback.


CBlack said…
Nicely written and a great example use case to extrapolate from
Dan Sheldon said…
Great write-up Dennis! Thank you for all your help in getting my own Jetson Nano and remote DIGITS experiments up and functional. I'll have to drop off one of my ML drones for you to play with!