We have come to learn that Hadoop's distributed file system (HDFS) was engineered to favor a small number of large files over many small ones. In practice, however, we rarely control how data arrives: many ingestion pipelines deliver data to the infrastructure in small bits, and whether or not we are implementing a data lake on HDFS, we will have to deal with these small inputs.
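To see why many small files strain HDFS, consider the NameNode, which keeps metadata for every file and block in memory. A commonly cited rule of thumb is roughly 150 bytes of heap per file, directory, or block object; the figure below is an assumption used only for illustration, not an exact measurement:

```python
# Rough sketch of NameNode memory pressure from small files.
# Assumption: ~150 bytes of NameNode heap per file or block object
# (a widely quoted rule of thumb, not an exact number).
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    # Each file contributes one file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# The same 10 TB of data stored two ways:
# 10 million 1 MB files vs. ~78,125 files of 128 MB (one block each).
small_files = namenode_heap_bytes(10_000_000)  # ~3 GB of NameNode heap
large_files = namenode_heap_bytes(78_125)      # ~23 MB of NameNode heap
print(small_files, large_files)
```

Under these assumptions, the small-file layout needs on the order of a hundred times more NameNode memory for the same volume of data, which is the heart of the problem this session addresses.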
In this hackerday, we continue the data engineering series by discussing and implementing various ways to solve Hadoop's small file problem.
We will start by defining what the small file problem means, how it almost inevitably arises, how to identify the cluster bottlenecks it causes, and a variety of ways to solve it.