One of the questions in the Module 2 quiz has "hadoop dfs -dus /input/data" as the right answer. While this works, there is also a note that "dus" is deprecated, and to please use "du -s" instead. Perhaps "hadoop dfs -du -s /input/data" would be a better answer?
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.