Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with iPython notebooks and datasets.
Add project experience to your Linkedin/Github profiles.
This was great. The use of Jupyter was great. Prior to learning Python I was a self taught SQL user with advanced skills. I hold a Bachelors in Finance and have 5 years of business experience.. I... Read More
Recently I became interested in Hadoop as I think its a great platform for storing and analyzing large structured and unstructured data sets. The experts did a great job not only explaining the... Read More
In this spark project, we will embark on real-time data collection and aggregation from a simulated real-time system.
The dataset for the project which will simulate our sensor data delivery is from Microsoft Research Asia GeoLife project. According to the paper, the dataset recoded a broad range of users’ outdoor movements, including not only life routines like go home and go to work but also some entertainments and sports activities, such as shopping, sightseeing, dining, hiking, and cycling. This trajectory dataset can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation.
As a part of this big data project, we will use the data to provide real time aggregates of the movements along a number of dimension like effective distance, duration, trajectories and more. All streamed data will be stored in the NoSQL database - HBase.
In this project, we will show how to build an ETL pipeline on streaming datasets using Kafka.
Learn to write a Hadoop Hive Program for real-time querying.
In this Spark project, we are going to bring processing to the speed layer of the lambda architecture which opens up capabilities to monitor application real time performance, measure real time comfort with applications and real time alert in case of security