Explain the features of Amazon Elastic Inference

In this recipe, we will learn about Amazon Elastic Inference. We will also learn about the features of Amazon Elastic Inference.

Recipe Objective - Explain the features of Amazon Elastic Inference

Amazon Elastic Inference allows users to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances or Amazon ECS tasks, reducing the cost of running deep learning inference by up to 75%. Amazon Elastic Inference supports TensorFlow, Apache MXNet, PyTorch and ONNX models. Inference is the process of making predictions using a trained model.

In deep learning applications, inference accounts for up to 90% of total operational costs, for two reasons. First, standalone GPU instances are typically designed for model training, not for inference. While training jobs batch-process hundreds of data samples in parallel, inference jobs usually process a single input in real time and thus consume only a small amount of GPU compute, which makes standalone GPU inference cost-inefficient. Standalone CPU instances, in turn, are not specialized for matrix operations and are often too slow for deep learning inference. Second, different models have different GPU, CPU and memory requirements, so optimizing for one resource can lead to underutilization of the others and higher costs.

Amazon Elastic Inference solves these problems by letting users attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 or Amazon SageMaker instance type or Amazon ECS task, with no code changes. Users can choose the CPU instance in AWS that is best suited to the overall compute and memory needs of the application, and then separately configure the right amount of GPU-powered inference acceleration, allowing them to efficiently utilize resources and reduce costs.
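As a concrete illustration of attaching acceleration separately from the instance, the sketch below builds the launch parameters that, with boto3, would be passed as keyword arguments to `ec2_client.run_instances()`. The AMI ID and the chosen instance and accelerator sizes are hypothetical placeholders; the dict is only constructed here, not sent to AWS.

```python
# Sketch: parameters for attaching an Elastic Inference accelerator to an
# EC2 instance at launch time. With boto3 these would be passed as keyword
# arguments to ec2_client.run_instances(). The AMI ID and the chosen sizes
# are hypothetical placeholders.
launch_params = {
    "ImageId": "ami-0123456789abcdef0",  # hypothetical Deep Learning AMI
    "InstanceType": "m5.large",          # CPU instance sized for the app
    "MinCount": 1,
    "MaxCount": 1,
    # Acceleration is declared separately from the instance type:
    "ElasticInferenceAccelerators": [{"Type": "eia2.medium", "Count": 1}],
}

print(launch_params["ElasticInferenceAccelerators"][0]["Type"])
```

Because acceleration is declared in the launch request rather than in the model code, the same application code runs unchanged with or without an accelerator attached.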

Benefits of Amazon Elastic Inference

  • Amazon Elastic Inference allows users to choose the instance type that is best suited to the overall compute and memory needs of their application. Users can separately specify the amount of inference acceleration they need, which reduces inference costs by up to 75% because users no longer need to over-provision GPU compute for inference. Amazon Elastic Inference can provide as little as a single-precision TFLOPS (trillion floating-point operations per second) of inference acceleration or as much as 32 mixed-precision TFLOPS, a much more appropriate range of inference compute than the up to 1,000 TFLOPS provided by a standalone Amazon EC2 P3 instance, giving users exactly what they need. Amazon Elastic Inference can also scale the amount of inference acceleration up and down using Amazon EC2 Auto Scaling groups to meet the demands of the application without over-provisioning capacity: when EC2 Auto Scaling adds EC2 instances to meet increasing demand, it automatically scales up the attached accelerator for each instance, responding to changes in demand.
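The cost effect of right-sizing can be sketched with a back-of-the-envelope comparison. The hourly prices below are illustrative placeholders, not current AWS pricing; consult the AWS pricing pages for real figures.

```python
# Illustrative cost comparison: a standalone GPU instance versus a CPU
# instance plus a separately attached accelerator. All prices here are
# hypothetical placeholders, not authoritative AWS pricing.
gpu_instance_per_hour = 3.06   # e.g. a standalone P3-class GPU instance
cpu_instance_per_hour = 0.096  # e.g. a general-purpose CPU instance
accelerator_per_hour = 0.12    # e.g. a small EI accelerator

combined = cpu_instance_per_hour + accelerator_per_hour
savings = 1 - combined / gpu_instance_per_hour
print(f"Savings: {savings:.0%}")
```

With these illustrative numbers the combined CPU-plus-accelerator setup comes out well past the "up to 75%" savings the service advertises, because the standalone GPU instance sits mostly idle on single-input inference workloads.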

System Requirements

  • Any Operating System (Mac, Windows, Linux)

This recipe explains Amazon Elastic Inference and the features of Amazon Elastic Inference.

Features of Amazon Elastic Inference

    • It provides Auto-scaling.

Amazon Elastic Inference accelerators can be part of the same Amazon EC2 Auto Scaling group used to scale Amazon SageMaker, Amazon EC2, and Amazon ECS instances. When EC2 Auto Scaling adds more EC2 instances to meet the demands of the user's application, it also scales up the accelerator attached to each instance. Similarly, when Auto Scaling reduces EC2 instances as demand goes down, it scales down the attached accelerator for each instance. This makes it easy to scale inference acceleration alongside the application's compute capacity to meet demand.
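Because the accelerator is part of the launch configuration, an Auto Scaling group picks it up automatically for every instance it adds. A minimal sketch of the relevant launch-template data follows; with boto3 it would be passed to `ec2_client.create_launch_template()`. The template name, AMI ID and sizes are hypothetical placeholders.

```python
# Sketch: launch-template data for an Auto Scaling group whose instances
# each launch with their own Elastic Inference accelerator. Scaling out
# adds instance+accelerator pairs; scaling in removes them together.
# The names and sizes below are hypothetical placeholders.
launch_template = {
    "LaunchTemplateName": "inference-fleet",  # hypothetical name
    "LaunchTemplateData": {
        "ImageId": "ami-0123456789abcdef0",   # hypothetical AMI
        "InstanceType": "c5.large",
        "ElasticInferenceAccelerators": [{"Type": "eia2.medium", "Count": 1}],
    },
}

print(launch_template["LaunchTemplateName"])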

    • It provides a choice of single or mixed-precision operations.

Amazon Elastic Inference accelerators support both the single-precision (32-bit floating point) operations and mixed-precision (16-bit floating point) operations. The single-precision provides an extremely large numerical range to represent the parameters used by the model. Also, most models don’t need this much precision and calculating numbers that large results in unnecessary loss of performance. And to avoid the problem, mixed-precision operations allow users to reduce the numerical range by half to gain up to 8x greater inference performance.
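The range difference described above can be checked directly from the two formats' limits, here with NumPy:

```python
import numpy as np

# Representable range of single precision (32-bit) versus half precision
# (16-bit) floating point, the two modes EI accelerators support.
fp32 = np.finfo(np.float32)
fp16 = np.finfo(np.float16)

print(f"float32 max: {fp32.max:.3e}")  # on the order of 3.4e+38
print(f"float16 max: {fp16.max}")      # 65504.0
```

Most model weights and activations fit comfortably within the half-precision range, which is why trading the unused range for extra throughput is usually a good deal at inference time.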

    • It is available in multiple amounts of acceleration.

Amazon Elastic Inference is available in multiple throughput sizes ranging from 1 to 32 trillion floating-point operations per second (TFLOPS) per accelerator, making it efficient for accelerating a wide range of inference models, including computer vision, natural language processing, and speech recognition. Compared to standalone Amazon EC2 P3 instances, which start at 125 TFLOPS (the smallest P3 instance available), Amazon Elastic Inference starts at a single TFLOPS per accelerator, allowing users to scale up inference acceleration in more appropriate increments. For more complex models, users can select larger accelerator sizes, up to 32 TFLOPS per accelerator.
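Right-sizing can be expressed as picking the smallest accelerator that meets a model's throughput need. The sketch below uses real EI accelerator type names, but the TFLOPS figures attached to them are approximate illustrations; check the AWS documentation for exact per-type throughput.

```python
# Pick the smallest Elastic Inference accelerator meeting a TFLOPS target.
# The type names are real EI accelerator types; the throughput figures are
# approximate illustrations, not authoritative AWS specifications.
ACCELERATORS = {  # type -> approximate mixed-precision TFLOPS
    "eia2.medium": 8,
    "eia2.large": 16,
    "eia2.xlarge": 32,
}

def smallest_accelerator(required_tflops: float) -> str:
    """Return the smallest accelerator type meeting the requirement."""
    for name, tflops in sorted(ACCELERATORS.items(), key=lambda kv: kv[1]):
        if tflops >= required_tflops:
            return name
    raise ValueError("No single accelerator is large enough")

print(smallest_accelerator(10))  # picks the mid-size type
```

The same idea applies in reverse: if profiling shows the accelerator mostly idle, stepping down one size reduces cost without touching application code.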

    • It supports the Open Neural Network Exchange (ONNX) format.

Amazon Elastic Inference supports ONNX, an open format that makes it possible to train a model in one deep learning framework and then transfer it to another for inference, allowing users to take advantage of the relative strengths of different frameworks. ONNX is integrated into PyTorch, MXNet, Chainer, Caffe2, and Microsoft Cognitive Toolkit, and there are connectors for many other frameworks, including TensorFlow. To use ONNX with Amazon Elastic Inference, users transfer their trained models to the AWS-optimized version of Apache MXNet for production deployment.

    • It comes integrated with Amazon SageMaker, Amazon EC2, and Amazon ECS.

Amazon Elastic Inference comes integrated with Amazon SageMaker, Amazon EC2 and Amazon ECS. There are multiple ways to run inference workloads on AWS: deploy the model on Amazon SageMaker for a fully managed experience, or run it on Amazon EC2 instances or Amazon ECS tasks and manage it yourself. Amazon Elastic Inference works seamlessly with all three, allowing users to add inference acceleration in every scenario. Users specify the desired amount of inference acceleration when they create their model's HTTPS endpoint in Amazon SageMaker, when they launch their Amazon EC2 instance, or when they define their Amazon ECS task.
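In SageMaker, the accelerator is one extra argument at endpoint creation time. The sketch below builds the keyword arguments that, with the SageMaker Python SDK, would be passed as `Model.deploy(**deploy_kwargs)`; the chosen sizes are hypothetical placeholders and nothing is actually deployed here.

```python
# Sketch: deployment arguments for a SageMaker real-time endpoint with an
# Elastic Inference accelerator attached. With the SageMaker Python SDK,
# these would be passed as Model.deploy(**deploy_kwargs). The chosen sizes
# are hypothetical placeholders.
deploy_kwargs = {
    "initial_instance_count": 1,
    "instance_type": "ml.m5.large",        # CPU instance for the endpoint
    "accelerator_type": "ml.eia2.medium",  # separately chosen acceleration
}

print(deploy_kwargs["accelerator_type"])
```

Note the `ml.` prefix on both names: SageMaker uses its own instance-type and accelerator-type namespace, distinct from the plain EC2 names.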

    • It supports TensorFlow, Apache MXNet and PyTorch.

Amazon Elastic Inference is designed to be used with AWS's enhanced versions of TensorFlow Serving, Apache MXNet and PyTorch. These enhancements enable the frameworks to automatically detect the presence of inference accelerators, optimally distribute model operations between the accelerator's GPU and the instance's CPU, and securely control access to the accelerators using AWS Identity and Access Management (IAM) policies. The enhanced TensorFlow Serving, MXNet and PyTorch libraries are provided automatically in Amazon SageMaker, AWS Deep Learning AMIs, and AWS Deep Learning Containers, so users don't have to make any code changes to deploy their models in production.
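As an illustration of that "no code changes" point, loading and running a TorchScript model looks the same with or without an accelerator; only the AWS-enhanced PyTorch build adds the execution hint shown in the trailing comment. The model and input here are hypothetical toys.

```python
import torch
import torch.nn as nn

# Sketch: TorchScript inference as it would run on an EI-enabled instance.
# The scripted model and input are hypothetical toys.
scripted = torch.jit.script(nn.Linear(4, 2).eval())
x = torch.randn(1, 4)

with torch.no_grad():
    y = scripted(x)
print(y.shape)  # a (1, 2) output

# On the AWS-enhanced PyTorch build, the accelerator is targeted with an
# extra execution hint (not available in stock PyTorch), e.g.:
#   with torch.jit.optimized_execution(True, {"target_device": "eia:0"}):
#       y = scripted(x)
```

The inference call itself is unchanged; the enhanced framework decides which operations run on the attached accelerator and which stay on the instance's CPU.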

