Introduction to Amazon Athena and its use cases

In this recipe, we will learn about Amazon Athena and its use cases.
Last Updated: 23 Nov 2022


Recipe Objective - Introduction to Amazon Athena and its use cases

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and users pay only for the queries they run. It is easy to use: point Athena at the data in Amazon S3, define the schema, and start querying with standard SQL. Most results are delivered within seconds. With Athena, there is no need for complex ETL jobs to prepare data for analysis, which makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

Athena is integrated out of the box with the AWS Glue Data Catalog, which allows users to create a unified metadata repository across various services, crawl data sources to discover schemas, populate the catalog with new and modified table and partition definitions, and maintain schema versioning.

Because Athena is serverless, it is both scalable and cost-effective. Customers are charged on a pay-per-query basis, and the charge for each query depends on the amount of data it scans. The standard charge for scanning 1 TB of data in S3 is 5 USD. That looks small at first glance, but when many queries run over hundreds or thousands of gigabytes of data, costs can get out of control.
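To make the pricing arithmetic concrete, here is a minimal Python sketch that estimates the scan charge from the 5 USD per terabyte rate quoted above. The function name `estimate_scan_cost` is hypothetical, and the sketch assumes a binary terabyte (1024^4 bytes) and ignores any per-query minimums or rounding that actual billing may apply, so treat the numbers as rough estimates only.

```python
# Rough cost estimate for Athena scans, based on the rate quoted in this recipe.
# Assumption: 1 TB = 1024**4 bytes; real billing may round or apply minimums.
BYTES_PER_TB = 1024 ** 4
PRICE_PER_TB_USD = 5.0


def estimate_scan_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Return the approximate USD charge for scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # Show how the charge grows with the amount of data scanned per query.
    for gigabytes in (10, 500, 5000):
        cost = estimate_scan_cost(gigabytes * 1024 ** 3)
        print(f"{gigabytes} GB scanned -> about ${cost:.2f}")
```

Multiplying the per-query estimate by the number of queries run per day gives a quick sense of how scheduled or dashboard-driven workloads can add up.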


Benefits of Amazon Athena

Athena offers pay-per-query pricing: users pay only for the queries they run, at $5 per terabyte scanned. Users can save 30% to 90% on per-query costs and get better performance by compressing, partitioning, and converting their data into columnar formats. Athena queries data directly in Amazon S3, so there are no additional storage charges beyond S3.

Athena is fast. Users don't have to worry about having enough compute resources to get fast, interactive query performance, because Athena automatically executes queries in parallel and most results come back within seconds.

Athena is open, powerful, and standard. It uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. It is ideal for quick, ad hoc querying, but it can also handle complex analysis, including large joins, window functions, and arrays. Athena is highly available and executes queries using compute resources across multiple facilities and multiple devices in each facility. It uses Amazon S3 as its underlying data store, which makes users' data highly available and durable.

Athena is serverless. Users can quickly query their data without having to set up and manage any servers or data warehouses: just point to the data in Amazon S3, define the schema, and start querying using the built-in query editor.

Athena lets users query data instantly. It allows users to tap into all their data in S3 without setting up complex processes to extract, transform, and load (ETL) the data.
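Besides the built-in query editor, queries can also be submitted programmatically. The following is a minimal sketch, assuming the boto3 Athena client, of starting a query, waiting for it to finish, and reading how many bytes it scanned. The helper name `run_athena_query`, the database name, the table name, and the S3 output location are all hypothetical placeholders, not values from this recipe.

```python
import time

# States after which an Athena query will not change any further.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query, poll until it reaches a terminal state, and
    return a (state, bytes_scanned) tuple.

    `client` is expected to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
            return state, scanned
        time.sleep(poll_seconds)


# Example usage (requires boto3 and AWS credentials; all names are placeholders):
#
#   import boto3
#   athena = boto3.client("athena")
#   state, scanned = run_athena_query(
#       athena,
#       "SELECT status, COUNT(*) FROM example_logs GROUP BY status",
#       database="example_db",
#       output_location="s3://example-bucket/athena-results/",
#   )
#   print(state, scanned)
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub in tests. The reported bytes scanned is the figure that pay-per-query pricing is based on, so logging it per query is a simple way to see whether compression, partitioning, or columnar formats are paying off.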


System Requirements

Any operating system (Mac, Windows, Linux)

This recipe explains Amazon Athena and the use cases of Amazon Athena.

Use cases of Amazon Athena

It provides flexibility

Amazon Athena's open and versatile architecture doesn't restrict users to a specific vendor, technology, or tool. Users can, for example, work with a wide range of open-source file formats and switch freely between query engines without adjusting the schema.

It provides wide accessibility

Amazon Athena is accessible to a wide range of users, not just developers and engineers. Business analysts and other data professionals can adopt it too, because the service runs its queries using standard SQL, which is simple and straightforward.

It provides a cost-effective service

Amazon Athena is cost-effective and can be considerably cheaper than its close competitors. The reason is that the service doesn't charge users for compute instances. Instead, users pay only for the queries they run.

It provides a serverless service

Amazon Athena saves users the trouble that comes with infrastructure management. Users don't have to worry about setting up clusters, regulating capacity, or loading data, since Athena is delivered as a fully managed serverless service.
