Explain the features of Amazon Elastic Inference

In this recipe, we will learn about Amazon Elastic Inference. We will also learn about the features of Amazon Elastic Inference.

Recipe Objective - Explain the features of Amazon Elastic Inference

Amazon Elastic Inference allows users to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances or Amazon ECS tasks, reducing the cost of running deep learning inference by up to 75%. Amazon Elastic Inference supports TensorFlow, Apache MXNet, PyTorch and ONNX models. Inference is the process of making predictions using a trained model.

In deep learning applications, inference accounts for up to 90% of total operational costs, for two reasons. First, standalone GPU instances are typically designed for model training, not for inference. While training jobs batch-process hundreds of data samples in parallel, inference jobs usually process a single input in real time and thus consume only a small amount of GPU compute, which makes standalone GPU inference cost-inefficient. Standalone CPU instances, in turn, are not specialized for matrix operations and are often too slow for deep learning inference. Second, different models have different GPU, CPU and memory requirements, so optimizing for one resource can lead to underutilization of the others and higher costs.

Amazon Elastic Inference solves these problems by letting users attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 or Amazon SageMaker instance type or Amazon ECS task, with no code changes. Users can choose the CPU instance in AWS that is best suited to the overall compute and memory needs of the application, and then separately configure the right amount of GPU-powered inference acceleration, allowing them to efficiently utilize resources and reduce costs.
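As a concrete illustration of attaching acceleration separately from the instance, the sketch below builds the launch parameters that, with boto3, would be passed as keyword arguments to `ec2_client.run_instances()`. The AMI ID and the chosen instance and accelerator sizes are hypothetical placeholders; the dict is only constructed here, not sent to AWS.

```python
# Sketch: parameters for attaching an Elastic Inference accelerator to an
# EC2 instance at launch time. With boto3 these would be passed as keyword
# arguments to ec2_client.run_instances(). The AMI ID and the chosen sizes
# are hypothetical placeholders.
launch_params = {
    "ImageId": "ami-0123456789abcdef0",  # hypothetical Deep Learning AMI
    "InstanceType": "m5.large",          # CPU instance sized for the app
    "MinCount": 1,
    "MaxCount": 1,
    # Acceleration is declared separately from the instance type:
    "ElasticInferenceAccelerators": [{"Type": "eia2.medium", "Count": 1}],
}

print(launch_params["ElasticInferenceAccelerators"][0]["Type"])
```

Because acceleration is declared in the launch request rather than in the model code, the same application code runs unchanged with or without an accelerator attached.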

Benefits of Amazon Elastic Inference

  • Amazon Elastic Inference allows users to choose the instance type that is best suited to the overall compute and memory needs of their application. Users can separately specify the amount of inference acceleration they need, which reduces inference costs by up to 75% because users no longer need to over-provision GPU compute for inference. Amazon Elastic Inference can provide as little as a single-precision TFLOPS (trillion floating-point operations per second) of inference acceleration or as much as 32 mixed-precision TFLOPS, a much more appropriate range of inference compute than the up to 1,000 TFLOPS provided by a standalone Amazon EC2 P3 instance, giving users exactly what they need. Amazon Elastic Inference can also scale the amount of inference acceleration up and down using Amazon EC2 Auto Scaling groups to meet the demands of the application without over-provisioning capacity: when EC2 Auto Scaling adds EC2 instances to meet increasing demand, it automatically scales up the attached accelerator for each instance, responding to changes in demand.
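The cost effect of right-sizing can be sketched with a back-of-the-envelope comparison. The hourly prices below are illustrative placeholders, not current AWS pricing; consult the AWS pricing pages for real figures.

```python
# Illustrative cost comparison: a standalone GPU instance versus a CPU
# instance plus a separately attached accelerator. All prices here are
# hypothetical placeholders, not authoritative AWS pricing.
gpu_instance_per_hour = 3.06   # e.g. a standalone P3-class GPU instance
cpu_instance_per_hour = 0.096  # e.g. a general-purpose CPU instance
accelerator_per_hour = 0.12    # e.g. a small EI accelerator

combined = cpu_instance_per_hour + accelerator_per_hour
savings = 1 - combined / gpu_instance_per_hour
print(f"Savings: {savings:.0%}")
```

With these illustrative numbers the combined CPU-plus-accelerator setup comes out well past the "up to 75%" savings the service advertises, because the standalone GPU instance sits mostly idle on single-input inference workloads.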

System Requirements

  • Any Operating System (Mac, Windows, Linux)

This recipe explains Amazon Elastic Inference and the features of Amazon Elastic Inference.

Features of Amazon Elastic Inference

    • It provides Auto-scaling.

Amazon Elastic Inference accelerators can be part of the same Amazon EC2 Auto Scaling group used to scale Amazon SageMaker, Amazon EC2, and Amazon ECS instances. When EC2 Auto Scaling adds more EC2 instances to meet the demands of the user's application, it also scales up the accelerator attached to each instance. Similarly, when Auto Scaling reduces EC2 instances as demand goes down, it scales down the attached accelerator for each instance. This makes it easy to scale inference acceleration alongside the application's compute capacity to meet demand.
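Because the accelerator is part of the launch configuration, an Auto Scaling group picks it up automatically for every instance it adds. A minimal sketch of the relevant launch-template data follows; with boto3 it would be passed to `ec2_client.create_launch_template()`. The template name, AMI ID and sizes are hypothetical placeholders.

```python
# Sketch: launch-template data for an Auto Scaling group whose instances
# each launch with their own Elastic Inference accelerator. Scaling out
# adds instance+accelerator pairs; scaling in removes them together.
# The names and sizes below are hypothetical placeholders.
launch_template = {
    "LaunchTemplateName": "inference-fleet",  # hypothetical name
    "LaunchTemplateData": {
        "ImageId": "ami-0123456789abcdef0",   # hypothetical AMI
        "InstanceType": "c5.large",
        "ElasticInferenceAccelerators": [{"Type": "eia2.medium", "Count": 1}],
    },
}

print(launch_template["LaunchTemplateName"])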

    • It provides a choice of single or mixed-precision operations.

Amazon Elastic Inference accelerators support both the single-precision (32-bit floating point) operations and mixed-precision (16-bit floating point) operations. The single-precision provides an extremely large numerical range to represent the parameters used by the model. Also, most models don’t need this much precision and calculating numbers that large results in unnecessary loss of performance. And to avoid the problem, mixed-precision operations allow users to reduce the numerical range by half to gain up to 8x greater inference performance.
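The range difference described above can be checked directly from the two formats' limits, here with NumPy:

```python
import numpy as np

# Representable range of single precision (32-bit) versus half precision
# (16-bit) floating point, the two modes EI accelerators support.
fp32 = np.finfo(np.float32)
fp16 = np.finfo(np.float16)

print(f"float32 max: {fp32.max:.3e}")  # on the order of 3.4e+38
print(f"float16 max: {fp16.max}")      # 65504.0
```

Most model weights and activations fit comfortably within the half-precision range, which is why trading the unused range for extra throughput is usually a good deal at inference time.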

    • It is available in multiple amounts of acceleration.

Amazon Elastic Inference is available in multiple throughput sizes ranging from 1 to 32 trillion floating-point operations per second (TFLOPS) per accelerator, making it efficient for accelerating a wide range of inference models, including computer vision, natural language processing, and speech recognition. Compared to standalone Amazon EC2 P3 instances, which start at 125 TFLOPS (the smallest P3 instance available), Amazon Elastic Inference starts at a single TFLOPS per accelerator, allowing users to scale up inference acceleration in more appropriate increments. For more complex models, users can select larger accelerator sizes, up to 32 TFLOPS per accelerator.
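Right-sizing can be expressed as picking the smallest accelerator that meets a model's throughput need. The sketch below uses real EI accelerator type names, but the TFLOPS figures attached to them are approximate illustrations; check the AWS documentation for exact per-type throughput.

```python
# Pick the smallest Elastic Inference accelerator meeting a TFLOPS target.
# The type names are real EI accelerator types; the throughput figures are
# approximate illustrations, not authoritative AWS specifications.
ACCELERATORS = {  # type -> approximate mixed-precision TFLOPS
    "eia2.medium": 8,
    "eia2.large": 16,
    "eia2.xlarge": 32,
}

def smallest_accelerator(required_tflops: float) -> str:
    """Return the smallest accelerator type meeting the requirement."""
    for name, tflops in sorted(ACCELERATORS.items(), key=lambda kv: kv[1]):
        if tflops >= required_tflops:
            return name
    raise ValueError("No single accelerator is large enough")

print(smallest_accelerator(10))  # picks the mid-size type
```

The same idea applies in reverse: if profiling shows the accelerator mostly idle, stepping down one size reduces cost without touching application code.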

    • It supports the Open Neural Network Exchange (ONNX) format.

Amazon Elastic Inference supports ONNX, an open format that makes it possible to train a model in one deep learning framework and then transfer it to another for inference, allowing users to take advantage of the relative strengths of different frameworks. ONNX is integrated into PyTorch, MXNet, Chainer, Caffe2, and Microsoft Cognitive Toolkit, and there are connectors for many other frameworks, including TensorFlow. To use ONNX with Amazon Elastic Inference, users transfer their trained models to the AWS-optimized version of Apache MXNet for production deployment.

    • It comes integrated with Amazon SageMaker, Amazon EC2, and Amazon ECS.

Amazon Elastic Inference comes integrated with Amazon SageMaker, Amazon EC2 and Amazon ECS. There are multiple ways to run inference workloads on AWS: deploy the model on Amazon SageMaker for a fully managed experience, or run it on Amazon EC2 instances or Amazon ECS tasks and manage it yourself. Amazon Elastic Inference works seamlessly with all three, allowing users to add inference acceleration in every scenario. Users specify the desired amount of inference acceleration when they create their model's HTTPS endpoint in Amazon SageMaker, when they launch their Amazon EC2 instance, or when they define their Amazon ECS task.
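In SageMaker, the accelerator is one extra argument at endpoint creation time. The sketch below builds the keyword arguments that, with the SageMaker Python SDK, would be passed as `Model.deploy(**deploy_kwargs)`; the chosen sizes are hypothetical placeholders and nothing is actually deployed here.

```python
# Sketch: deployment arguments for a SageMaker real-time endpoint with an
# Elastic Inference accelerator attached. With the SageMaker Python SDK,
# these would be passed as Model.deploy(**deploy_kwargs). The chosen sizes
# are hypothetical placeholders.
deploy_kwargs = {
    "initial_instance_count": 1,
    "instance_type": "ml.m5.large",        # CPU instance for the endpoint
    "accelerator_type": "ml.eia2.medium",  # separately chosen acceleration
}

print(deploy_kwargs["accelerator_type"])
```

Note the `ml.` prefix on both names: SageMaker uses its own instance-type and accelerator-type namespace, distinct from the plain EC2 names.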

    • It supports TensorFlow, Apache MXNet and PyTorch.

Amazon Elastic Inference is designed to be used with AWS's enhanced versions of TensorFlow Serving, Apache MXNet and PyTorch. These enhancements enable the frameworks to automatically detect the presence of inference accelerators, optimally distribute model operations between the accelerator's GPU and the instance's CPU, and securely control access to the accelerators using AWS Identity and Access Management (IAM) policies. The enhanced TensorFlow Serving, MXNet and PyTorch libraries are provided automatically in Amazon SageMaker, AWS Deep Learning AMIs, and AWS Deep Learning Containers, so users don't have to make any code changes to deploy their models in production.
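As an illustration of that "no code changes" point, loading and running a TorchScript model looks the same with or without an accelerator; only the AWS-enhanced PyTorch build adds the execution hint shown in the trailing comment. The model and input here are hypothetical toys.

```python
import torch
import torch.nn as nn

# Sketch: TorchScript inference as it would run on an EI-enabled instance.
# The scripted model and input are hypothetical toys.
scripted = torch.jit.script(nn.Linear(4, 2).eval())
x = torch.randn(1, 4)

with torch.no_grad():
    y = scripted(x)
print(y.shape)  # a (1, 2) output

# On the AWS-enhanced PyTorch build, the accelerator is targeted with an
# extra execution hint (not available in stock PyTorch), e.g.:
#   with torch.jit.optimized_execution(True, {"target_device": "eia:0"}):
#       y = scripted(x)
```

The inference call itself is unchanged; the enhanced framework decides which operations run on the attached accelerator and which stay on the instance's CPU.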

