Introduction to Amazon Athena and its use cases

In this recipe, we will learn about Amazon Athena and its use cases.

Recipe Objective - Introduction to Amazon Athena and its use cases

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and users pay only for the queries they run. To use it, users simply point Athena at their data in Amazon S3, define the schema, and start querying with standard SQL; most results are delivered within seconds. Because there is no need for complex ETL jobs to prepare data for analysis, anyone with SQL skills can quickly analyze large-scale datasets.

Amazon Athena is integrated out of the box with the AWS Glue Data Catalog. This lets users create a unified metadata repository across various services, crawl data sources to discover schemas, populate the Catalog with new and modified table and partition definitions, and maintain schema versioning.

Because Athena is serverless, it is both scalable and cost-effective. Customers are charged per query, based on the amount of data each query scans rather than on the number of queries run. The standard charge is 5 USD per TB of data scanned from S3. Although that looks like a small amount at first glance, costs can get out of control when many queries run over hundreds or thousands of GB of data.
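To make the 5 USD per TB figure concrete, here is a minimal cost-estimate sketch. It assumes the flat rate quoted in this recipe and treats 1 TB as 10^12 bytes; both are assumptions, and the sketch ignores billing details such as rounding and per-query minimums, so check current AWS pricing before relying on it. The function names are illustrative only.

```python
from typing import Iterable

# Assumptions (not taken from AWS documentation): a flat rate of 5 USD per TB
# scanned, as quoted in this recipe, and 1 TB = 10**12 bytes.
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 10**12


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimated charge in USD for one query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # The charge scales linearly with the bytes scanned.
    return bytes_scanned * price_per_tb / BYTES_PER_TB


def estimate_workload_cost(bytes_per_query: Iterable[int], price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Sum the estimated charges for a batch of queries."""
    return sum(estimate_query_cost(b, price_per_tb) for b in bytes_per_query)


# One query scanning 1 TB costs 5 USD.
print(estimate_query_cost(10**12))  # 5.0

# Hypothetical workload: 200 queries, each scanning 500 GB.
print(estimate_workload_cost([500 * 10**9] * 200))  # 500.0
```

Under these assumptions, 200 queries that each scan 500 GB already cost about 500 USD, which is how per-query pricing can add up on large datasets.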

Benefits of Amazon Athena

    • Pay per query

Users pay only for the queries they run, at $5 per terabyte scanned. They can save 30% to 90% on per-query costs, and get better performance, by compressing and partitioning their data and converting it to columnar formats. Athena queries data directly in Amazon S3, so there are no additional storage charges beyond S3.

    • Fast

Users don't have to worry about having enough compute resources to get fast, interactive query performance. Amazon Athena automatically executes queries in parallel, so most results come back within seconds.

    • Open, powerful, and standard

Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. It is ideal for quick, ad-hoc querying, but it can also handle complex analysis, including large joins, window functions, and arrays. Athena is highly available: it executes queries using compute resources across multiple facilities and multiple devices in each facility, and it uses Amazon S3 as its underlying data store, which keeps users' data highly available and durable.

    • Serverless

Users can query their data without setting up or managing any servers or data warehouses. They just point to their data in Amazon S3, define the schema, and start querying using the built-in query editor.

    • Query data instantly

Amazon Athena lets users tap into all their data in S3 without first setting up complex processes to extract, transform, and load (ETL) the data.
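The partitioning savings mentioned above come from partition pruning: if a table's data is laid out in Hive-style S3 prefixes such as `year=2023/month=01/` and the table is declared with those partition columns, a query that filters on them only needs to read the matching prefixes. The plain-Python simulation below illustrates that effect on the scanned-bytes bill. The table layout, object sizes, and helper names are hypothetical, the 5 USD per TB rate is the one quoted in this recipe, and this is not how Athena's query planner is implemented.

```python
from dataclasses import dataclass

# Assumption: the flat 5 USD per TB rate quoted in this recipe, 1 TB = 10**12 bytes.
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 10**12


@dataclass(frozen=True)
class S3Object:
    key: str         # hypothetical object key relative to the table location
    size_bytes: int


def partition_values(key: str) -> dict:
    """Extract Hive-style key=value segments, e.g. 'year=2023/month=01/...'."""
    values = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            values[name] = value
    return values


def prune(objects, **filters):
    """Keep only objects whose partition values match every filter."""
    return [
        obj for obj in objects
        if all(partition_values(obj.key).get(col) == val for col, val in filters.items())
    ]


def cost_usd(objects) -> float:
    """Estimated charge for reading every byte of the given objects."""
    scanned = sum(obj.size_bytes for obj in objects)
    return scanned * PRICE_PER_TB_USD / BYTES_PER_TB


# Hypothetical table: one year of data, 100 GB per monthly partition.
table = [
    S3Object(f"sales/year=2023/month={m:02d}/part-00000.parquet", 100 * 10**9)
    for m in range(1, 13)
]

print(cost_usd(table))                                  # full scan: 6.0
print(cost_usd(prune(table, year="2023", month="01")))  # one month: 0.5
```

Columnar formats such as Parquet and ORC compound the effect, because a query that selects only a few columns reads only those columns' data.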

System Requirements

  • Any operating system (Mac, Windows, Linux)

This recipe explains Amazon Athena and its use cases.

Use cases of Amazon Athena

    • It provides flexibility

Amazon Athena’s open and versatile architecture doesn’t restrict users to a specific vendor, technology, or tool. For example, users can work with a wide range of open-source file formats and switch freely between query engines without adjusting the schema.

    • It is widely accessible

Amazon Athena is accessible to anyone who knows SQL, not just developers and engineers. Business analysts and other data professionals can adopt it as well, because the service runs its queries in standard SQL, which is simple, straightforward, and widely known.

    • It is cost-effective

Amazon Athena can be cheaper than services that bill for provisioned compute, because it doesn’t charge users for compute instances. Instead, users pay only for the queries they run, based on the data each query scans.

    • It is serverless

Amazon Athena spares users the trouble that comes with infrastructure management. Users don’t have to worry about setting up clusters, regulating capacity, or loading data, since Athena is delivered as a fully managed serverless service.
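Because Athena is serverless and speaks standard SQL, running a query from code amounts to submitting SQL and waiting for the result; there is no cluster to start first. The sketch below uses the boto3 Athena client calls `start_query_execution`, `get_query_execution`, and `get_query_results`. The database `example_db`, the `sales` table and its columns, and the results bucket `s3://example-athena-results/` are hypothetical placeholders, and `run_query` is an illustrative helper, not part of boto3. The client is passed in as a parameter so the function can be exercised without AWS access.

```python
import time

# Hypothetical placeholders: replace with your own database, table, and results bucket.
DATABASE = "example_db"
OUTPUT_LOCATION = "s3://example-athena-results/"
QUERY = "SELECT month, SUM(amount) AS total FROM sales WHERE year = '2023' GROUP BY month"

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit `sql` with an Athena client and wait for it to finish.

    Returns (state, rows, bytes_scanned). `run_query` is an illustrative
    helper, not part of boto3.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state. There is no cluster to
    # start or size beforehand.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    # Bytes scanned is what the per-TB charge is based on.
    scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
    if state != "SUCCEEDED":
        return state, [], scanned

    # Only the first page of results is read here; for SELECT queries the
    # first row holds the column names.
    page = client.get_query_results(QueryExecutionId=query_id)
    rows = [[cell.get("VarCharValue") for cell in row["Data"]]
            for row in page["ResultSet"]["Rows"]]
    return state, rows, scanned


# Usage (requires boto3, AWS credentials, and an existing `sales` table):
#   import boto3
#   state, rows, scanned = run_query(boto3.client("athena"), QUERY, DATABASE, OUTPUT_LOCATION)
```

Larger result sets are paginated, so a fuller version would follow the `NextToken` returned by `get_query_results`.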
