Once you enroll in a batch, you are welcome to participate in any future batches free of charge. If you have technical questions, our support team will help you resolve them.
Learn industry-ready Hadoop frameworks, tools, vocabulary, and best practices from industry experts who work at Fortune 500 companies.
Learning a new technology is always a challenge. Our tech support team can help you troubleshoot problems via email or remote sessions.
Our mentors are industry professionals with years of experience. They help design our up-to-date, cutting-edge curriculum. You can reach out to them anytime.
In the first class, we will learn about Big Data, Hadoop and MapReduce concepts. We will set up the Cloudera QuickStart VM and go over the Hadoop installation. The faculty will guide you on how to access all datasets and set up your GitHub account.
In this project, we will install Sqoop and practice importing and exporting data with it. First, we will import the MySQL World database tables into HDFS, first with the default delimiters and then with non-default file formats. Next, we will practice importing the Sakila film database tables into HDFS. Finally, we will export a Parquet file of customer data from HDFS to MySQL.
In this project, we will build a Flume agent that ingests data from a spooling directory source into HDFS, using interceptors and channel selectors. We will then work through the well-known Twitter example and build a Flume agent that streams data from Twitter to HDFS.
In this project, we will learn about Apache Pig and how to use it to process the MovieLens dataset. We will get familiar with the various Pig operators used for data processing. We will cover how to use UDFs and how to write your own custom UDFs. Finally, we will take a look at diagnostics and performance tuning.
In this project, we will learn about Apache Hive, another popular processing framework. We will use Hive to process NYSE daily price trading data. We will look at the different file formats available and how to create and load data into Hive tables. Finally, we will show how partitioning works in Hive.
We will dive deeper into Hive in this project, working with the airline on-time performance dataset. As part of processing this data, we will learn about joins. We will also learn how to use built-in UDFs and how to create your own custom UDFs/UDAFs.
Here we will introduce the workflow manager Oozie. We will learn how to create and run an Oozie workflow for a MovieLens data processing pipeline, using Pig and Hive actions to set up the workflow. We will also build Oozie coordinators using time and data triggers.
This is the final project, and it uses multiple tools learned throughout the course. We will load live Twitter feeds related to job advertisements. We will set up batch processing of the tweets on HDFS and then extract the data to a MySQL database. We will also use Oozie as the workflow manager.