Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
Not all datasets come structured. In fact, there are more unstructured or semi-structured datasets than structured ones. As data engineers, we should give the data at least a reasonable amount of structure or schema before it becomes useful for any downstream operation.
In this Hackerday session, we will evaluate and demonstrate how to handle rather unstructured datasets from the ginniemae.gov data disclosure history site. The dataset is free-text data that comes with a codebook describing it. A lot happens between the codebook and the data, and we will walk through all of it in this session.
Ginnie Mae is a federally owned corporation that helps to create and guarantee mortgage-backed securities in the US housing market. It does a lot more than that; see https://www.investopedia.com/terms/g/ginniemae.asp for more.
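To make the idea of giving structure to codebook-described text concrete, here is a minimal sketch in Python. It assumes the records are fixed-width lines and that the codebook lists each field's name, start position, and length. The field names, positions, and the sample line below are hypothetical and are not taken from the actual Ginnie Mae codebook.

```python
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    name: str    # column name as it would appear in the codebook
    start: int   # 1-based start position of the field
    length: int  # field width in characters


# Hypothetical codebook entries for illustration only; the real layout
# must be read from the codebook published alongside the data.
CODEBOOK: List[Field] = [
    Field("record_type", 1, 1),
    Field("pool_id", 2, 6),
    Field("issue_date", 8, 8),
    Field("balance", 16, 10),
]


def parse_line(line: str, codebook: List[Field]) -> Dict[str, str]:
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    record: Dict[str, str] = {}
    for field in codebook:
        begin = field.start - 1  # convert the 1-based position to a 0-based index
        record[field.name] = line[begin:begin + field.length].strip()
    return record


# Made-up sample line that matches the hypothetical layout above.
sample = "P123456201901010000123456"
row = parse_line(sample, CODEBOOK)
print(row)
```

Every value is kept as a string here. Casting dates and amounts to proper types would be a separate step once the schema is settled, and lines shorter than the layout simply yield empty strings for the missing fields.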
In this big data project, we'll work through a real-world scenario using the Cortana Intelligence Suite tools, including the Microsoft Azure Portal, PowerShell, and Visual Studio.
In this Hive project, we will build a Hive data warehouse from a raw dataset stored in HDFS and present the data in a relational structure so that querying it feels natural.
In this project, we will look at Cassandra and what it is especially suited for in a Hadoop environment, how to integrate it with Spark, and how to install it in our lab environment.