Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with iPython notebooks and datasets.
Add project experience to your Linkedin/Github profiles.
Data engineering is the science of acquiring, aggregating or collection, processing and storage of data either in batch or in real time as well as providing variety of means of serving these data to other users which could include a data scientist. It involves software engineering practises on big data.
In this big data project for beginners, we will continue from a previous hive project on "Data engineering on Yelp Datasets using Hadoop tools" where we applied some data engineering principles to the Yelp Dataset in the areas of processing, storage and retrieval. Like in that session, We will not include data ingestion since we are already downloading the data from the yelp challenge website. But unlike that session, we will focus on doing the entire data processing using spark.
Hive Project -Learn to write a Hive program to find the first unique URL, given 'n' number of URL's.
In this project, we will evaluate and demonstrate how to handle unstructured data using Spark.
In this big data project, we will be performing an OLAP cube design using AdventureWorks database. The deliverable for this session will be to design a cube, build and implement it using Kylin, query the cube and even connect familiar tools (like Excel) with our new cube.