In Eclipse, you have to add all the jars that provide the org.apache.hadoop.mapred.* packages to the project's build path. Right-click the project -> Properties -> Java Build Path -> Libraries tab -> Add External JARs, and add the following:
/usr/lib/hadoop - add hadoop-annotations.jar, hadoop-auth.jar, hadoop-common.jar
/usr/lib/hadoop/client - add all jars
/usr/lib/hadoop/lib - add all jars
Once these jars are added, Eclipse will resolve the Hadoop imports and recognize the MapReduce programs.
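If you want to confirm that the jars were actually picked up, a small check like the one below may help. It is only a sketch: the class name HadoopClasspathCheck and the helper isOnClasspath are made up for illustration, and the three Hadoop class names are just typical ones that the jars above are expected to provide on a standard install. Run it as a Java application inside the same Eclipse project.

```java
public class HadoopClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // on the current classpath, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: these classes are representative of what a MapReduce
        // program imports; adjust the list to match your own code.
        String[] expected = {
            "org.apache.hadoop.conf.Configuration", // usually from hadoop-common
            "org.apache.hadoop.mapred.JobConf",     // old MapReduce API
            "org.apache.hadoop.mapreduce.Job"       // new MapReduce API
        };
        for (String name : expected) {
            String status = isOnClasspath(name) ? "found" : "MISSING";
            System.out.println(status + "  " + name);
        }
    }
}
```

Any line reported as MISSING suggests that the jar containing that class has not been added to the build path yet.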