Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with iPython notebooks and datasets.
Add project experience to your Linkedin/Github profiles.
I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a... Read More
I have 11 years of experience and work with IBM. My domain is Travel, Hospitality and Banking - both sectors process lots of data. The way the projects were set up and the mentors' explanation was... Read More
Spark SQL offers the platform to provide a structured data to any dataset regardless its source or form. And once that structured data is formed, it can be queried using tools like Hive, Impala, and other Hadoop data warehouse tools.
In this spark project, we will go through Spark SQL syntax to process the dataset, perform some joins with other supplementary data as well as make the data available for the query using the Spark SQL thrift server. On provision of the data, we will perform some interesting query and other go through some performance tuning technique for Spark SQL.
In this hadoop project, we are going to be continuing the series on data engineering by discussing and implementing various ways to solve the hadoop small file problem.
In this big data project, we'll work with Apache Airflow and write scheduled workflow, which will download data from Wikipedia archives, upload to S3, process them in HIVE and finally analyze on Zeppelin Notebooks.
In this NoSQL project, we will use two NoSQL databases(HBase and MongoDB) to store Yelp business attributes and learn how to retrieve this data for processing or query.