How to connect to an API endpoint and query data using Python?


This recipe helps you connect to an API endpoint and query data using Python.


Recipe Objective

In big data scenarios, we often connect to multiple API endpoints and retrieve data from them. This is the very first step of data extraction, after which we perform processing such as parsing, cleaning, and transformation to derive business value from the data.

System requirements:

  • Install the Python module below if it is not already available:
  • pip install requests
  • The code below can be run in a Jupyter notebook or any Python console.
  • In this scenario we use an open API, the Studio Ghibli API (https://ghibliapi.herokuapp.com/), to make requests.
  • Lists of other public APIs are easy to find online if you want to experiment with different endpoints.

Step 1: Import the module

The most common library for making requests and working with APIs is the requests library. You’ll need to import it. Let’s start with that important step:

import requests
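
To quickly confirm the install worked, you can also print the library version. A minimal sketch; any recent version of requests is fine for this recipe:

import requests

# Print the installed version of the requests library
print(requests.__version__)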

Step 2: Making an HTTP request

To make a request to the API, there are different types of requests, such as GET and POST. GET is the most commonly used one; it retrieves data from the API. When a GET request is successful, the response carries status code 200. To make a GET request we use the get() method.

Sample code to make a request:

import requests

# Request the list of films from the Studio Ghibli API
response = requests.get('https://ghibliapi.herokuapp.com/films/')

if response.status_code == 200:
    print("Successful connection with API.")
    print('-------------------------------')
    data = response.json()  # Parse the response body as JSON
    print(data)
elif response.status_code == 404:
    print("Unable to reach URL.")
else:
    print("Unable to connect to API or retrieve data.")

In the above code, if the response status code is 200, it prints "Successful connection with API." and then prints the data as a JSON object; if the status code is 404, it prints "Unable to reach URL."; otherwise it prints the generic failure message.

Output of the above code: the message "Successful connection with API." followed by the full list of films printed as a JSON object.
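
Note that requests can also raise an exception for error responses instead of checking status codes by hand, and passing a timeout keeps the call from hanging indefinitely. A minimal alternative sketch using the same URL:

import requests

try:
    # Fail fast if the server does not respond within 10 seconds
    response = requests.get('https://ghibliapi.herokuapp.com/films/', timeout=10)
    # Raise requests.exceptions.HTTPError for any 4xx/5xx status code
    response.raise_for_status()
    data = response.json()
    print("Fetched {} films.".format(len(data)))
except requests.exceptions.RequestException as err:
    print("Request failed:", err)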

Step 3: Query the data by sending params

To query specific data from the API, we pass a "params" dictionary as an argument to the get() method; requests encodes it into the URL query string.

import requests

# Filter the films endpoint by a specific record id
response = requests.get(
    'https://ghibliapi.herokuapp.com/films/',
    params={'id': "4e236f34-b981-41c3-8c65-f8c9000b94e7"}
)

if response.status_code == 200:
    print("Successful connection with API.")
    print('-------------------------------')
    data = response.json()
    # Print selected fields from each matching record
    for record in data:
        print("Title: {},\n Release_Date: {},\n Director: {},\n".format(
            record['title'], record['release_date'], record['director']))
elif response.status_code == 404:
    print("Unable to reach URL.")
else:
    print("Unable to connect to API or retrieve data.")

Output of the above code: In the above code we query specific data from the API and print it in a readable structure; the result is the record matching the given id, showing its title, release date, and director.
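
If you want to inspect the full JSON payload rather than a few hand-picked fields, the standard-library json module can pretty-print it. A minimal sketch, reusing the data variable from the snippet above:

import json

# Pretty-print the parsed JSON with two-space indentation
print(json.dumps(data, indent=2))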
