Amazon SageMaker Accelerates Machine Learning Development

March 03 2020

From sales forecasting to virtual personal assistants, machine learning (ML) has become integral to many common business processes and applications. Its power lies in enabling organizations to identify trends and patterns in large, diverse data sets, many of which are imperceptible or difficult for people to detect.

In addition, ML helps automate data analysis, significantly reducing time and cost. It’s also less error-prone than manual analysis, and it enables organizations to deliver more personalized services and differentiated products. However, using the technology has its challenges.

ML development is a complex, iterative, time-consuming, and often expensive process. It doesn’t help that there are few integrated tools for the ML workflow, which means having to combine disparate tools and workflows.

An Overview of SageMaker

AWS launched Amazon SageMaker in 2017 to help simplify things. The fully managed end-to-end ML service enables data scientists and developers to quickly build, train, and deploy ML models. It’s a robust, full-featured tool that stands out from the competition because of its unique capabilities, use of familiar components, and ease of integration with other AWS and open source services. Our team at ClearScale has found it extremely useful in helping several of our clients because of its flexibility, efficiency, and ability to create customized models.

Because SageMaker is a managed service, there’s no need to build, manage, or maintain infrastructure or tooling to support ML. It runs a user’s model on auto-scaling clusters spread across multiple Availability Zones (AZs) to deliver high performance and high availability. Storage and network charges are based on usage, so costs are controlled. It also has built-in security and compliance for ML workloads, so there’s no need to invest in added security.

SageMaker offers hosted Jupyter notebooks that help users explore and visualize the training data kept in Amazon Simple Storage Service (S3). The Jupyter notebooks require no setup, so processing the training data sets can begin immediately. It takes only a few clicks in the SageMaker console to create a fully managed notebook instance, pre-loaded with useful libraries for ML. Then the data can be added.

Users can connect directly to data in S3 or use AWS Glue to move data from Amazon Redshift and Amazon DynamoDB into a data store for processing. SageMaker provides pre-trained ML models that can be deployed as-is, as well as built-in ML algorithms that users can train on their own data. According to AWS, the built-in algorithms are optimized to deliver up to 10 times the performance of running the same algorithms elsewhere.

When an Amazon FSx for Lustre file system is linked to an S3 bucket, it automatically copies objects from S3 to the file system the first time they are accessed. The same FSx file system can be used across multiple SageMaker jobs to prevent repeated downloading of common objects.
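The copy-on-first-access behavior can be pictured with a small simulation. This is only a toy model, not FSx for Lustre itself: the class name, the dictionary standing in for an S3 bucket, and the object keys are all hypothetical.

```python
class LazyLoadingFileSystem:
    """Toy stand-in for a file system that copies objects from a bucket on first access."""

    def __init__(self, bucket):
        self._bucket = bucket   # dict of key -> bytes, standing in for an S3 bucket
        self._cache = {}        # objects already copied onto the file system
        self.downloads = 0      # how many copies were made from the bucket

    def read(self, key):
        # Copy the object only if no earlier read has already brought it over.
        if key not in self._cache:
            self._cache[key] = self._bucket[key]
            self.downloads += 1
        return self._cache[key]


def run_job(file_system, keys):
    """Pretend training job: reads every object and returns the bytes consumed."""
    return sum(len(file_system.read(key)) for key in keys)


# Hypothetical training data; the keys and contents are made up for illustration.
bucket = {
    "train/part-0.csv": b"a,b\n1,2\n",
    "train/part-1.csv": b"a,b\n3,4\n",
}
fs = LazyLoadingFileSystem(bucket)

run_job(fs, list(bucket))          # first job triggers one copy per object
first_job_downloads = fs.downloads
run_job(fs, list(bucket))          # second job reuses the copies already present
print(first_job_downloads, fs.downloads)
```

In this sketch the second job adds no downloads, which is the saving that sharing one file system across jobs is meant to provide.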

SageMaker uses common ML algorithms optimized to run efficiently against large data sets in a distributed environment. With the distributed model building, training, and validation service, users can pick an AWS algorithm off the shelf, import a popular framework, or write and deploy their own algorithm with Docker containers. SageMaker is also pre-configured so that it can run Apache MXNet and TensorFlow.

With the check of a single box, SageMaker spins up multiple copies of the model being trained and uses ML to evaluate each change in parallel and tune parameters accordingly.

For training, users can specify a location in S3 and the instance they want to use. With a single click, SageMaker spins up an isolated cluster and software-defined network with autoscaling and data pipelines to start training. When the process is completed, it tears down the cluster.
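For readers who prefer the API to the console, the same choices (an S3 location and an instance type) can be sketched as a request dictionary. The field names follow the CreateTrainingJob API as we understand it. The job name, role ARN, image URI, bucket paths, volume size, and time limit are placeholders chosen for illustration, and the helper function is our own.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.xlarge",
                               instance_count=1):
    """Assemble keyword arguments for a SageMaker CreateTrainingJob call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,    # Docker image that holds the algorithm
            "TrainingInputMode": "File",   # copy the data set to the instance first
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,          # assumed size, adjust to the data set
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # assumed limit
    }


# All values below are placeholders, not real resources.
request = build_training_job_request(
    job_name="demo-training-job",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
    role_arn="arn:aws:iam::<account>:role/<sagemaker-role>",
    train_s3_uri="s3://<bucket>/train/",
    output_s3_uri="s3://<bucket>/output/",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("sagemaker").create_training_job(**request)
print(sorted(request))
```

Keeping the request as plain data makes it easy to review or version before anything is launched.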

SageMaker-enabled models can be connected to AWS services. Various interfaces can be used to interact with SageMaker. It has two APIs: a high-level API for working with a variety of pre-optimized ML libraries (like MXNet, TensorFlow, and scikit-learn), and a low-level API that allows running completely custom jobs where anything goes. Any library and any API that can fit into a Docker image can be used with SageMaker. The service also provides SageMaker API bindings for languages such as Python, Ruby, and JavaScript.

HTTPS endpoints are used for model hosting, which can scale to support traffic and allow for A/B testing of multiple models at the same time. The algorithms can be deployed straight into production on EC2 instances with one click, after which they are deployed with autoscaling across Availability Zones.
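A/B testing on a hosted endpoint comes down to listing several production variants with weights. The sketch below builds such a configuration using field names from the CreateEndpointConfig API as we understand it. The variant names, model names, instance type, and weights are hypothetical, and the helper functions are ours.

```python
def build_endpoint_config(config_name, variants, instance_type="ml.m5.large"):
    """variants: list of (variant_name, model_name, weight) tuples."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
                "InitialVariantWeight": weight,
            }
            for variant_name, model_name, weight in variants
        ],
    }


def traffic_shares(config):
    """Share of requests per variant: its weight divided by the sum of all weights."""
    variants = config["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical models: send most traffic to the current model, a little to the challenger.
config = build_endpoint_config(
    "demo-ab-config",
    [("model-a", "churn-model-v1", 9.0), ("model-b", "churn-model-v2", 1.0)],
)
shares = traffic_shares(config)
print(shares)

# With boto3 installed and credentials configured, the config could be created with:
#   boto3.client("sagemaker").create_endpoint_config(**config)
```

Starting the challenger at a small weight limits exposure while its predictions are compared with the current model.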

So far, the reviews of SageMaker have been good. Of the 69 reviews posted on Gartner Peer Insights, 67 gave the service 4 or 5 stars. Based on our experience using SageMaker thus far, our team at ClearScale gives it 5 stars.

SageMaker Studio

With the 2019 release of SageMaker Studio — the first serverless and fully integrated development environment (IDE) — AWS has made the ML development process even easier.

SageMaker Studio uses a single web-based visual interface for performing all ML development activities, including notebooks, experiment management, automatic model creation, debugging and profiling, and model drift detection. All steps in the ML workflow are tracked within the IDE. This makes it easy to move back and forth between steps, as well as to copy, modify, and replay each step.

SageMaker Studio Notebooks

Amazon SageMaker Studio Notebooks are the next generation of Amazon SageMaker notebooks. These notebooks include the following new features: AWS Single Sign-On (AWS SSO) integration, fast start-up times, and the ability to share notebooks with a single click.

SageMaker Experiments

Amazon SageMaker Experiments tracks the results of ML runs by organizing them into experiments and trials. Training a model requires running data through the model for several iterations, and entails trying different algorithms, fine-tuning parameters, adjusting features, and more. SageMaker Experiments enables users to store each optimization as an “experiment,” capturing input parameters, configurations, results, and other information for each iteration. Users can then browse through them using a visual interface to review their performance.
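The experiment-and-trial structure is easy to picture as data. The following is a minimal in-memory sketch of the idea, not the SageMaker Experiments SDK. The class names, trial names, parameters, and metric values are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """One training iteration: what went in and what came out."""
    name: str
    parameters: dict
    metrics: dict = field(default_factory=dict)


@dataclass
class Experiment:
    """A named collection of related trials."""
    name: str
    trials: list = field(default_factory=list)

    def log_trial(self, name, parameters, metrics):
        trial = Trial(name, parameters, metrics)
        self.trials.append(trial)
        return trial

    def best_trial(self, metric, maximize=True):
        # Compare trials on a single metric, as one would when browsing results.
        pick = max if maximize else min
        return pick(self.trials, key=lambda trial: trial.metrics[metric])


# Made-up numbers, used only to show the bookkeeping.
experiment = Experiment("churn-prediction")
experiment.log_trial("xgboost-depth-4", {"max_depth": 4, "eta": 0.3}, {"validation_auc": 0.81})
experiment.log_trial("xgboost-depth-6", {"max_depth": 6, "eta": 0.1}, {"validation_auc": 0.86})
experiment.log_trial("linear-baseline", {"l2": 0.01}, {"validation_auc": 0.74})

best = experiment.best_trial("validation_auc")
print(best.name, best.parameters)
```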

SageMaker Autopilot

Using a single API call, or a few clicks in SageMaker Studio, Amazon SageMaker Autopilot inspects the data set and then runs candidates to determine the optimal combination of data preprocessing steps, ML algorithms, and hyperparameters. The information is used to train an inference pipeline that can be deployed on a real-time endpoint or for batch processing. It also generates Python code showing exactly how data was preprocessed. Autopilot currently supports:

  • Input data in tabular format, with automatic data cleaning and preprocessing
  • Automatic algorithm selection for linear regression, binary classification, and multi-class classification
  • Automatic hyperparameter optimization
  • Distributed training
  • Automatic instance and cluster size selection
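The "single API call" mentioned above can be sketched as one request dictionary. The field names follow the CreateAutoMLJob API as we understand it. The job name, role ARN, S3 paths, and target column are placeholders. Setting the problem type and objective metric together, or leaving both out so Autopilot chooses, is a choice made in this sketch rather than a documented rule.

```python
VALID_PROBLEM_TYPES = {"Regression", "BinaryClassification", "MulticlassClassification"}


def build_autopilot_request(job_name, role_arn, input_s3_uri, target_column,
                            output_s3_uri, problem_type=None, metric_name=None):
    """Assemble keyword arguments for a SageMaker CreateAutoMLJob call."""
    request = {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_uri,           # tabular data, for example CSV files
            }},
            "TargetAttributeName": target_column,  # the column to predict
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
    }
    if problem_type is not None:
        if problem_type not in VALID_PROBLEM_TYPES:
            raise ValueError(f"unknown problem type: {problem_type}")
        if metric_name is None:
            raise ValueError("this sketch expects a metric whenever a problem type is given")
        request["ProblemType"] = problem_type
        request["AutoMLJobObjective"] = {"MetricName": metric_name}
    return request


# Placeholder values throughout; nothing here refers to a real account.
request = build_autopilot_request(
    job_name="demo-autopilot-job",
    role_arn="arn:aws:iam::<account>:role/<sagemaker-role>",
    input_s3_uri="s3://<bucket>/tabular/",
    target_column="churned",
    output_s3_uri="s3://<bucket>/autopilot-output/",
    problem_type="BinaryClassification",
    metric_name="F1",
)

# With boto3 installed and credentials configured, the job could be started with:
#   boto3.client("sagemaker").create_auto_ml_job(**request)
print(request["ProblemType"], request["AutoMLJobObjective"])
```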

SageMaker Debugger

Amazon SageMaker Debugger provides full visibility into the training of ML models by monitoring, recording, and analyzing the tensor data that captures the state of an ML training job at each point in its lifecycle. It can automatically detect commonly occurring errors such as gradient values getting too large or too small. Pre-built Docker images are available for running custom rules, or users can build their own Docker image for custom rule evaluation.
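To make the "too large or too small" check concrete, here is a minimal sketch of such a rule over recorded gradient norms. It is not SageMaker Debugger's rule implementation. The function name, thresholds, and sample values are assumptions chosen for illustration.

```python
import math


def check_gradients(gradient_norms, vanish_below=1e-7, explode_above=1e3):
    """Return (step, issue) pairs for steps whose gradient norm looks unhealthy."""
    findings = []
    for step, norm in enumerate(gradient_norms):
        if not math.isfinite(norm):
            # NaN or infinity: comparisons are unreliable, so test this first.
            findings.append((step, "non_finite"))
        elif norm > explode_above:
            findings.append((step, "exploding"))
        elif norm < vanish_below:
            findings.append((step, "vanishing"))
    return findings


# Hypothetical per-step gradient norms saved during a training run.
norms = [0.8, 0.5, 2.1, 4.0e4, float("nan"), 3.0e-9]
for step, issue in check_gradients(norms):
    print(f"step {step}: {issue} gradient")
```

A real rule would read the saved tensors rather than a plain list, but the decision logic has this general shape.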

SageMaker Model Monitor

Amazon SageMaker Model Monitor continuously monitors the quality of Amazon SageMaker ML models in production. Alerts can be set for data drift and other deviations in model quality. Users can employ pre-built monitoring capabilities that require no coding, or write code to provide custom analysis. In addition, SageMaker Model Monitor:

  • Checks data quality in production (inference) and detects changes in model accuracy
  • Uses Deequ, an open source library built on Apache Spark, to measure data quality in large datasets and suggest baseline values based on the provided dataset
  • Provides the ability to add Python-based lifecycle hooks
  • Allows for creating custom monitoring Docker containers
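The baseline-and-compare idea behind data drift alerts can be sketched in a few lines. This is a simple mean-shift check written for illustration. It is not how Deequ or Model Monitor computes its statistics, and the feature values and threshold are made up.

```python
from statistics import mean, stdev


def suggest_baseline(values):
    """Summarize a training feature so production data can be compared against it."""
    return {"mean": mean(values), "std": stdev(values)}


def detect_drift(baseline, live_values, max_shift_in_std=3.0):
    """Flag drift when the live mean moves more than N baseline standard deviations."""
    shift = abs(mean(live_values) - baseline["mean"]) / baseline["std"]
    return shift > max_shift_in_std, shift


# Hypothetical "customer age" feature: training data, then two production windows.
training_ages = [34, 41, 29, 38, 45, 31, 36, 40]
baseline = suggest_baseline(training_ages)

stable_window = [35, 39, 33, 42, 37]
drifted_window = [61, 58, 66, 70, 63]

for name, window in [("stable", stable_window), ("drifted", drifted_window)]:
    drifted, shift = detect_drift(baseline, window)
    print(f"{name}: drift={drifted}, shift={shift:.2f} std")
```

In practice the flag would feed an alert, which is the role the monitoring schedule plays in production.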

SageMaker Ground Truth

We’re also pleased with Amazon SageMaker Ground Truth, an automated data-labeling service that AWS launched in 2018 and continues to enhance. It’s a service we successfully used, along with Amazon Forecast and Amazon Personalize, to help a customer enhance the processes behind its college/university recruitment and retention tool.

Datasets are usually obtained from various sources and employ different formats. Algorithms can’t work with raw data, so data preparation often requires manual labeling. SageMaker Ground Truth uses pre-trained ML models to automatically label raw data, significantly reducing the time and effort required to create labeled datasets. It gets progressively better over time by learning from labels created by manual methods.
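The division of work between automatic and manual labeling can be sketched as a confidence-based router. This illustrates the general idea only, not Ground Truth's implementation. The threshold, the keyword "model," and the sample texts are hypothetical.

```python
def route_for_labeling(items, predict, confidence_threshold=0.9):
    """Auto-label confident predictions; queue everything else for human labelers."""
    auto_labeled, needs_human = [], []
    for item in items:
        label, confidence = predict(item)
        if confidence >= confidence_threshold:
            auto_labeled.append((item, label))
        else:
            needs_human.append(item)
    return auto_labeled, needs_human


def toy_predict(text):
    """Hypothetical stand-in for a trained model: keyword rules with made-up confidences."""
    words = text.lower().split()
    if "refund" in words:
        return "billing", 0.97
    if "password" in words:
        return "account", 0.95
    return "other", 0.40


tickets = ["I want a refund", "Reset my password please", "Where is your office?"]
auto_labeled, needs_human = route_for_labeling(tickets, toy_predict)
print(auto_labeled)
print(needs_human)
```

Labels produced by people for the queued items can then be used to retrain the model, so fewer items need manual attention in later rounds.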

Amazon SageMaker Operators for Kubernetes

Another enhancement to SageMaker launched in 2019 was Amazon SageMaker Operators for Kubernetes. It makes it easier for developers and data scientists who use Kubernetes, the open-source, general-purpose container orchestration system, to train, tune, and deploy ML models in SageMaker. The operators can be installed on a Kubernetes cluster to create SageMaker jobs natively using the Kubernetes API and command-line Kubernetes tools, such as kubectl.

SageMaker Features Worth Noting

SageMaker also includes a wide variety of extremely beneficial, time-saving features. The following are three that we think stand out:

  • Amazon SageMaker Elastic Inference. Amazon Elastic Inference (EI) accelerates the throughput and decreases the latency of real-time inferences from deep learning (DL) models deployed as SageMaker hosted models for significantly less than the cost of using a GPU instance as an endpoint. An EI accelerator, in one of the available sizes, can be added to a deployable model in addition to a CPU instance type. That model can then be added as a production variant to an endpoint configuration used to deploy a hosted endpoint. An EI accelerator can also be added to an Amazon SageMaker notebook instance for testing and evaluating inference performance when building models. Elastic Inference is supported in EI-enabled versions of TensorFlow and MXNet. To use other DL frameworks, the model can be exported using ONNX and then imported into MXNet.

  • Automatic Hyperparameter Tuning. Amazon SageMaker hyperparameter tuning (automatic model tuning) finds the best version of a model by running many training jobs on a dataset using the algorithm and ranges of hyperparameters the user specifies. It then chooses the hyperparameter values that yield the best-performing model based on the metric selected. Hyperparameter tuning can be used with built-in algorithms, custom algorithms, and Amazon SageMaker pre-built containers for ML frameworks.

  • SageMaker Neo + IoT Greengrass. Amazon SageMaker Neo enables ML models to train once and run anywhere in the cloud and at the edge. It automatically optimizes TensorFlow, MXNet, PyTorch, ONNX, and XGBoost models for deploying on ARM, Intel, and Nvidia processors, with the models running up to twice as fast and consuming less than a tenth of the memory footprint. By using Neo with AWS IoT Greengrass, models can be retrained in Amazon SageMaker and the optimized models updated quickly to improve intelligence on a broad range of edge devices based on the Nvidia Jetson TX2, Arm v7 (Raspberry Pi), or Intel Atom platforms.
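Of the three features above, automatic hyperparameter tuning is the easiest to illustrate in code. The sketch below uses plain random search over user-specified ranges and keeps the values with the best metric. It is a simplified stand-in, not SageMaker's tuner. The objective function is a made-up formula that takes the place of a real training job.

```python
import random


def validation_loss(learning_rate, max_depth):
    """Hypothetical objective metric (lower is better); a real one would train a model."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2


def random_search(ranges, objective, num_jobs, seed=0):
    """Run num_jobs trials with randomly drawn hyperparameters and keep the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(num_jobs):
        params = {
            "learning_rate": rng.uniform(*ranges["learning_rate"]),  # continuous range
            "max_depth": rng.randint(*ranges["max_depth"]),          # integer range
        }
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Ranges the user would specify; the values here are assumptions.
ranges = {"learning_rate": (0.01, 0.3), "max_depth": (2, 10)}
best_params, best_score = random_search(ranges, validation_loss, num_jobs=50)
print(best_params, best_score)
```

Each loop iteration corresponds to one training job. In the managed service those jobs can run in parallel, and the search strategy can be smarter than random sampling.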

ClearScale & The Future of SageMaker and ML

SageMaker is proving to be an invaluable asset for developers involved in ML. It simplifies the ML development process and offers both time savings and cost savings. It allows for greater customization of models as well as of the components used in the ML development process. With SageMaker Studio, users have complete access, control, and visibility into each step required to build, train, and deploy models.

Based on the frequency and usefulness of AWS’s SageMaker enhancements, we’re confident that even bigger and better features are coming soon — and will keep coming from AWS.

Still, ML development remains a complex process. As the use cases for it expand and new technologies emerge, the processes will likely get even more complicated or at least require more creative approaches.

Simply choosing the right tools will be a challenge. Just consider all the options AWS already offers: Personalize, Forecast, Fraud Detector, Transcribe, Translate, Comprehend, and more. There’s also the question of whether the required solution needs to be custom or can be fulfilled using an off-the-shelf product, managed services, or a combination of them all.

This is where ClearScale comes in. ClearScale is an AWS Premier Consulting Partner and an early adopter of AWS services and tools, including those specific to ML, DL, and AI.

We’re among the few companies that have the experience and expertise to employ the right AWS tools, as well as best practices, to create efficient cloud-based ML solutions. We’ve been working at the forefront of the ML space with our customers to create new ways of tapping the power and potential of ML. You can read our ML / AI case studies here.

If you’re interested in learning how ClearScale can help you get in on the action, contact us.
