Big data and Hadoop are catchphrases in today's tech media for describing the storage and processing of huge amounts of data. While you may be familiar with both terms, there is a high probability that people around you are not sure what big data is, what Hadoop is, what big data analytics is, or why any of it is important. Over the years, big data has been defined in various ways, and there is a lot of confusion surrounding the two terms. So here is a detailed explanation of what big data and Hadoop are, which will help you take the first few strategic steps toward a lucrative big data career.
What is Big Data and what is the Big Deal?
This is what happened on Facebook in the last 20 minutes: 1 million links shared, 1.5 million event invites, 1.9 million friend requests, 2.7 million photos uploaded, 2.8 million messages sent, 1 million tags, 1.5 million status updates and 2.8 million comments. 12 terabytes of data were generated through Twitter feeds in the last 6 hours. Add to that 5 million global share trades per second and millions of photos and videos. All these facts point to the Big Data trend making waves in the market.
There are hundreds of companies like Facebook, Twitter, and LinkedIn generating enormous volumes of data. To gain a competitive advantage, organizations have to make the best use of the unstructured data they collect for profitable business decision making. This situation, where companies and institutions have to support, store, analyze and make decisions using large amounts of data, is called Big Data.
What is Big Data according to Gartner?
Gartner defines Big Data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
What is Big Data according to IBM?
More than 80% of data captured today is unstructured. It comes from sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. IBM refers to all this unstructured data as Big Data.
What is Big Data according to EMC?
Joe Tucci, CEO of EMC, said that big data is best defined by example: “Big data would be the mass of seismic data an oil company accumulates when exploring for new sources of oil,” he said. “It would be the imaging data that a health care provider generates with multiple MRIs and other medical imaging techniques. It is the data that supports the rendering of video in 3D movies. The important thing is that this is petabyte scale from the start and grows in huge chunks to multi-multi-multi petabytes.”
The most evident example of truly “big” data that everyone is familiar with is Google Search. Google search works so quickly that hardly anyone spares a moment to think about the number of Google bots crawling the web to keep the results current. Search results delivered in milliseconds are the outcome of distributed processing of big data. Google search keeps an index of words instead of searching through web pages directly, because looking up a word in an index is far faster than scanning every page. Generating the index requires scanning all the web pages, and Google built its own MapReduce framework to spread that scan across a huge number of servers and integrate the results into an index. Hadoop MapReduce is an open source implementation of the same idea.
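The index idea can be sketched in a few lines of Python. This is only a toy illustration of a MapReduce-style inverted index, not how Google or Hadoop actually implement it; the function names and the sample pages are made up for the example.

```python
from collections import defaultdict


def map_page(url, text):
    """Map step: emit one (word, url) pair for each distinct word on a page."""
    return [(word, url) for word in set(text.lower().split())]


def build_index(pages):
    """Shuffle and reduce steps: group the emitted pairs by word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word, page_url in map_page(url, text):
            index[word].add(page_url)
    return index


def search(index, word):
    """Look the word up in the index instead of scanning every page."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical pages, used only for illustration.
pages = {
    "a.example/hadoop": "Hadoop stores big data",
    "b.example/search": "Search engines index big web pages",
}
index = build_index(pages)
```

A call such as `search(index, "big")` then returns the pages containing that word without touching the page text again, which is the reason an index is worth the cost of building it.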
What is Big Data Analytics?
If you are convinced of the potential of big data but are still unsure what it can really do for you and your company, then Big Data Analytics is what you need to leverage for profitable business decision making.
Why collect and store zettabytes of data if it cannot be leveraged for analysis in full context? Or if one has to wait for years to get outcomes?
Big Data Analytics is the process of analysing large structured and unstructured data sets to discover unknown correlations, hidden patterns and any other valuable information that can be leveraged for better business decision making. It tackles even the most challenging business problems through high-performance analytics. Big data analytics drives innovation by helping organizations make the best possible decisions through high-performance data mining, predictive analytics, text mining, social sentiment analysis, forecasting and optimization. In addition, organizations are realizing that the distinct properties of deep learning and machine learning are well suited to addressing their requirements in novel ways through big data analytics.
Big Data Analytics is big business, with IDC anticipating that the Big Data market will grow at a 27% CAGR, reaching close to $32.4 billion by the end of 2017. Organizations are increasingly leveraging high-performance big data analytics to find deep, actionable insights in their big data. Most big data projects begin with the need to answer business questions. With big data analytics in place, an organization can increase efficiency, enhance its operations, increase sales, enhance customer service and improve risk management strategies.
Big Deal Companies are striking with Big Data Analytics
It’s a Big Deal because, using Big Data, one can build better products, offer better services and predict the future better. All this means Big Money. So Big Data is a Big Deal!
1) Macy’s, one of the largest department store chains in the US, runs a daily price check analysis on millions of items based on demand and inventory. Whenever a neighbouring competitor anywhere between Los Angeles and New York reduces prices on various products, Macy’s analytics system adjusts the pricing of close to 73 million items based on availability and demand to keep pace with the competition. Its algorithms are designed to adjust prices several times a day to react better to local competition. If there is no competitor in the neighbourhood, prices remain unchanged.
2) The latest semantic search at Walmart relies on machine learning, text analysis and synonym mining to produce more relevant search results. Walmart has seen a significant increase of 15% in the number of online shoppers completing their purchase, which amounts to billions of dollars.
3) Tesco PLC, one of the largest supermarket chains in the UK, collected over 70 million refrigerator-related data points and analysed them to improve performance and efficiency. Analytics helped Tesco monitor how the units were performing and predict when they would need to be serviced. Tesco also used these data points for predictive maintenance to cut down on the energy costs of the refrigerators.
What is Hadoop?
Hadoop is an open source software project. It is used chiefly in situations where there are large amounts of data to analyze, both structured and unstructured. For example, banks need to draw patterns from millions of transactions, retail stores need to craft promotions based on millions of purchases, social networks need to analyze billions of events, and ad networks need to analyze billions of clicks.
Traditional software frameworks are incapable of efficiently handling such large volumes of data on conventional hardware. The Hadoop ecosystem consists of a set of tools such as MapReduce, Hive and Pig that give developers the flexibility to perform operations on large amounts of data using ordinary hardware. With Hadoop, data analysis jobs are split up across many computers and processed in parallel. A Hadoop cluster is built from multiple commodity hardware servers, which makes it economical to scale and to support huge data sets.
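The split-and-process-in-parallel idea can be sketched as follows. This is a minimal illustration and not Hadoop code: a thread pool on a single machine stands in for the separate servers of a cluster, and the function names and sample click data are assumptions made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Split the input into roughly equal chunks, one per worker."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_chunk(chunk):
    """Each worker counts the items in its own chunk only."""
    return Counter(chunk)


def parallel_count(records, workers=3):
    """Process the chunks in parallel, then merge the partial counts."""
    chunks = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical ad-click log, used only for illustration.
clicks = ["ad1", "ad2", "ad1", "ad3", "ad1", "ad2", "ad3"]
totals = parallel_count(clicks)
```

Because each chunk is counted independently, adding more workers (or, in a real cluster, more machines) lets the same job handle more data without changing the logic.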
With 80% of data being unstructured in nature, it is difficult for legacy systems to analyse it. Hadoop is a core platform for structuring big data so that it can be used for further analysis. The real driver behind the adoption of Hadoop is cost: “Before the advent of Big Data Hadoop, data storage was expensive”.
What is Hadoop according to Gartner?
According to Gartner analyst Merv Adrian, Hadoop originally meant the Hadoop file system, MapReduce and some utilities. As those utilities got formalized, became projects themselves and were supported by commercial distributors, the list grew: Pig, Hive, HBase, and ZooKeeper were Hadoop too. Then Accumulo, Avro, Cascading, Flume, Mahout, Oozie, Spark, Sqoop, and YARN joined the list. Merv Adrian redefined Hadoop as:
What is Hadoop according to IBM?
IBM defines Big Data Hadoop as follows: “Big Data Hadoop is a software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.”
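To make "detect and handle failures at the application layer" a little more concrete, here is a toy sketch in Python. It is not Hadoop's actual mechanism; the node functions and `run_with_failover` are hypothetical names invented for this example. The point is only that the software, rather than expensive hardware, notices a failed machine and reruns the work elsewhere.

```python
def run_with_failover(task, nodes):
    """Try the task on each node in turn; return the first successful result
    together with the list of failures seen along the way."""
    failures = []
    for node in nodes:
        try:
            return node(task), failures
        except RuntimeError as err:
            # The failure is detected in software and the task is retried
            # on the next available node.
            failures.append(str(err))
    raise RuntimeError(f"task failed on all nodes: {failures}")


def healthy_node(task):
    """A hypothetical working server that simply sums the numbers it is given."""
    return sum(task)


def broken_node(task):
    """A hypothetical server that has gone down."""
    raise RuntimeError("node unreachable")


result, failures = run_with_failover([1, 2, 3], [broken_node, healthy_node])
```

Here the first node fails, the failure is recorded, and the same task completes on the second node, so the caller still gets its answer.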
Why is a funny looking elephant the logo for Big Data Hadoop?
“Hadoop” is the name of a toy elephant that belonged to the son of Doug Cutting, the creator of Hadoop. Doug decided to name his software project after his son's toy.
Who invented Hadoop?
“All roads lead to Rome”: Google invented the basic frameworks that constitute what is today popularly known as Hadoop. Google faced the future first, with the problem of handling billions of searches and indexing millions of web pages. When it could not find any large-scale, distributed, scalable computing platform for its needs, it went ahead and created its own. Doug Cutting was inspired by Google’s white papers and decided to create an open source project called “Hadoop”.
Yahoo further contributed to this project and played a key role in developing Hadoop for enterprise applications. Since then, many companies such as Facebook, LinkedIn, eBay, Hortonworks and Cloudera have contributed to the Hadoop project.
So why should I care?
“For every 100 Big Data jobs there are only 2 qualified candidates” - Fast Company
“By 2018, the United States will create 290,000 to 340,000 new big data jobs and more than half could go unfilled because skilled candidates are in short supply.” – McKinsey
Convinced? Not yet? Here’s more.
“In a survey of 3,000 global companies, more than 83% of respondents identified business analytics from big data as a top priority” - IBM
“With Big Data skills, there is an opportunity for students to gain a market advantage while starting their career” – Terri Griffith, Professor, Santa Clara University
If you are interested in understanding the core components of the Hadoop ecosystem, please check out this post on defining the architecture components of the big data ecosystem.