With petabytes of digital information surrounding us on any topic under the sun, we often forget how important reading a book is for gaining an in-depth understanding of the latest big data technologies. We usually turn to sites like ProjectPro, whose free resources are quite informative when it comes to learning about Hadoop and its components. However, books are always special and play a vital role even in the digital era, though you might have switched to reading an eBook instead of a paperback copy. The popular author Neil Richard MacKinnon Gaiman said: “A book is a dream that you hold in your hand.”
“Hadoop: The Definitive Guide” by Tom White could be the guide that helps you fulfil your dream of a career as a Hadoop developer or a big data professional. Technologies like Hadoop, MapReduce, Apache Spark, and Apache Storm are the latest promises in the big data world for lightning-fast cluster computing. With 2016 being the best time to build a career in big data, nothing beats the theoretical, in-depth understanding of Hadoop concepts that “Hadoop: The Definitive Guide” provides.
Doug Cutting, Hadoop founder, Yahoo!, said about “Hadoop: The Definitive Guide”:
“Now you have the opportunity to learn about Hadoop from a master—not only of the technology, but also of common sense and plain talk.”
“Hadoop: The Definitive Guide” introduces the world of big data to a layman, assuming that the reader has no prior knowledge of big data. With big data analytics technologies like Hadoop and Apache Spark gaining a mainstream presence in the enterprise, the Hadoop ecosystem is becoming more specialized and is evolving continuously. The book is available in four editions, and each edition reflects how the Hadoop ecosystem has evolved since the previous one. Often referred to as the bible for Hadoopers, it is an excellent reference for professionals looking to harness the power of big data with the Hadoop ecosystem. Though the book does not include a basic head-to-toe tutorial on writing your first MapReduce program (ProjectPro has it for you here), it really helps you understand how each component in the Hadoop ecosystem works and how all of them work together. The book might not teach you how to develop big data solutions, but it helps you understand the entire Hadoop domain.
ProjectPro offers Big Data and Hadoop Training to help you learn Hadoop by working on hands-on projects.
Many candidates want to understand everything about Hadoop so they can gauge whether it is the technology they need to process big data on a distributed platform. To help them become competent with Hadoop quickly and efficiently, Tom White’s “Hadoop: The Definitive Guide” has everything a Hadoop book should offer its readers: an understanding of how each component in the Hadoop ecosystem works, why it works that way, and how it fits into the design of the overall Hadoop framework.
The latest, fourth edition of the book is organized into 24 chapters with 4 appendices. The chapters are well organized, with the very first chapter introducing what Hadoop is and its history. It also highlights the advantages of using Hadoop over other tools for parallel processing of large datasets. It then goes on to introduce the concepts of map, reduce, and combiner functions, which form an integral part of the Hadoop framework. “Hadoop: The Definitive Guide” takes you slowly through Hadoop and the usage of its components, but at times it becomes a bit difficult and confusing to follow unless you practice hands-on with the code.
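To make the roles of the map, combiner, and reduce functions more concrete, here is a minimal sketch that simulates the three steps in plain Python on a word-count task. It does not use Hadoop at all and is not taken from the book; the function names (`map_fn`, `combine_fn`, `reduce_fn`, `run_job`) and the idea of passing each input split as a list of lines are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combine_fn(pairs):
    """Combiner: pre-aggregate one mapper's output locally,
    so fewer pairs have to travel to the reducers."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def reduce_fn(key, values):
    """Reduce: merge all values that share a key into one total."""
    return key, sum(values)


def run_job(splits):
    """Simulate a job: each split (a list of lines) plays one map task."""
    shuffled = defaultdict(list)  # stands in for the shuffle-and-sort phase
    for split in splits:
        mapped = [pair for line in split for pair in map_fn(line)]
        for key, value in combine_fn(mapped):
            shuffled[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(shuffled.items()))


if __name__ == "__main__":
    print(run_job([["big data big"], ["data hadoop"]]))
```

The combiner here is safe only because addition is associative and commutative, so summing partial counts gives the same result as summing the raw pairs. In a real Hadoop job the framework, not your code, decides whether and how often the combiner runs.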
Each chapter in the book provides illustrative working examples of the core Hadoop concepts, along with descriptions and technical details. However, the reader should follow the examples from the beginning to understand the different Hadoop components, as the book reuses the same data and use cases from previous chapters. This makes things easier for readers, as they don’t have to dig into a new case for every new example, which could otherwise distract them.
The dedicated chapters on configuring and operating a Hadoop cluster, together with appendices that detail the installation process with example code, help readers get up to speed on installing and implementing the Hadoop framework. In later chapters, the book goes on to highlight some practical applications of Hadoop, which give the reader a clear understanding of the situations in which Hadoop can be used and the best way to implement it in each. The case studies are not programming related, so they might not interest some of the coding geeks, but they give a clear picture of how Hadoop is used in practice to overcome the various parallel processing problems that enterprises encounter.
As “Hadoop: The Definitive Guide” is mainly focussed on data processing, the latest (fourth) edition adds two new chapters on the processing frameworks Apache Spark and Apache Crunch, one on the data ingestion tool Apache Flume, and a dedicated chapter on the Apache Parquet data format.
The best thing about the book is that it does not merely teach the various Hadoop features but also gives guidelines on how to use them effectively. So, you will learn not just to create and run a MapReduce application, but also how to tune various parameters to optimize performance, how to configure various properties, and how to test and debug a MapReduce job. These concepts can also be learnt over time on the job, but the purpose of this book is to help Hadoop developers get up to speed on the best practices for writing MapReduce jobs and get a good grasp of the design philosophy of Hadoop MapReduce applications.
The most appreciable part of the book is that it has links to the latest online documentation, along with hints and coding gotchas that a Hadoop developer always looks out for. As with many books, the only thing you might not like is that it can get a bit dry and boring at times, given the density of topics, which could lead to frustration and confusion.
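As a rough illustration of what testing and debugging a MapReduce job can look like in practice, the sketch below unit-tests a mapper as an ordinary function and counts malformed records instead of failing on them. It is plain Python rather than Hadoop code and is not the book's own example; the `parse_mapper` function, the `key,value` record layout, and the `malformed` counter name are all hypothetical.

```python
from collections import Counter


def parse_mapper(line, counters):
    """Hypothetical mapper: parse a 'key,value' record and emit (key, int).

    Bad records are skipped and counted, so a handful of corrupt lines
    does not fail the whole job but still shows up when debugging.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        counters["malformed"] += 1
        return []
    key, raw_value = parts
    try:
        return [(key, int(raw_value))]
    except ValueError:
        counters["malformed"] += 1
        return []


def test_parse_mapper():
    """Exercise the mapper in isolation, without a cluster."""
    counters = Counter()
    assert parse_mapper("1950,22", counters) == [("1950", 22)]
    assert parse_mapper("1950,-11", counters) == [("1950", -11)]
    assert parse_mapper("bad line", counters) == []
    assert parse_mapper("1950,n/a", counters) == []
    assert counters["malformed"] == 2


if __name__ == "__main__":
    test_parse_mapper()
    print("mapper tests passed")
```

Keeping the parsing logic in a small function with no framework dependencies is the design choice that makes this kind of fast, local test possible before a job is ever submitted to a cluster.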
The author, Tom White, cites a Martin Gardner quote in the book: “Beyond calculus, I am lost. That was the secret of my column’s success. It took me so long to understand what I was writing about that I knew how to write in a way most readers would understand.”
The message that Tom White intends to convey with this quote is that there is no substitute for reading the code and understanding how a Hadoop component works. However, small examples make it easier to understand the various aspects of how a Hadoop component works.
The two important things that you can be sure to take away after reading the Hadoop book are:
The book has a lot of information to absorb, so beginners who are new to Hadoop are advised to look through a couple of videos and become acquainted with the vocabulary of the Hadoop ecosystem before they dive into the details of the book. Here are a couple of videos that will get you acquainted with common terms related to big data and Hadoop:
One can read the book in two phases by following a lecture-then-lab approach.
So, if you want to master your ride on the elephant in the big data room, what are you waiting for? Get a relaxing chair and good eyeglasses, and start reading “Hadoop: The Definitive Guide” by Tom White to explore the world of big data and Hadoop.