Is Hadoop Going to Replace the Data Warehouse?

Is Hadoop built to replace data warehouses completely, or is a combination of Hadoop and the data warehouse the future? Will Hadoop complement data warehouses?

By ProjectPro

Hadoop is one of the most talked-about innovations in the IT industry and has shaken up data centre infrastructure at many organizations. Although the appetite for Hadoop and related big data technologies is growing rapidly, Hadoop is not out to spell the death of data warehousing; data warehousing as a technology is evolving. Whether Hadoop will replace the data warehouse remains a hot topic of discussion, even as the vast majority of enterprises maintain that they still need data warehouses.

“There’s no relationship between the EDW and Hadoop right now — they are going to be complementary. It’s NOT about rip and replace: we’re not going to get rid of RDBMS or MPP, but instead use the right tool for right job — and that will very much be driven by price.”- Alisdair Anderson said at a Hadoop Summit.



Will increased enterprise adoption of Hadoop cause the demise of the data warehouse, or will Hadoop and the data warehouse be a winning combination for business? This is not a simple question, and it does not lend itself to a simple answer. The best available answer is: “Use the best tool for the job.” The sections below present the arguments on each side of this much-debated topic and show how each technology retains its own value as the big data landscape evolves.

 


Hadoop vs. Data Warehouse

Shocking headlines such as “Is data warehouse dead?” and “The death of MapReduce” have been doing the rounds on the web. These superficial headlines do nothing but obscure the true potential of both the data warehouse and Hadoop. Hadoop has made a great mark in the big data analytics space, but one cannot ignore the tremendous success of the data warehouse over the last decade. As big data gets bigger, the hype around Hadoop keeps growing, with much of the excitement coming from IoT, but Hadoop will not replace the data warehouse; rather, it sits side by side with it.

Dimensional Research surveyed 319 data professionals on their use of technologies such as big data, Hadoop and data warehousing. The results indicate that Hadoop is complementing data warehouse systems:

  • 96% of respondents said that their organization is not decreasing its investment in data warehouses. This suggests that the data warehouse is far from dead.
  • 64% of respondents said that Hadoop and the data warehouse complement each other. A majority, then, see the two technologies coexisting rather than competing.

Hadoop and Data Warehouse – Understanding the Difference

“We don’t see anybody today trying to build an IDW with Hadoop. This is a capability issue, not a cost issue. Hadoop is not an IDW. Hadoop is not a database. Comparing these two for an IDW workload is comparing apples to oranges. I don’t know anybody who would try to build an IDW in Hadoop. There are many elements of the IDW on the technical side that are well refined and have been for 25 years. Things like workload management, the way concurrency works, and the way security works – there are many different aspects of a modern IDW that you are not going to see in Hadoop today. I would not see these two as equivalent.” said Bob Page, the VP of development at Hortonworks.

Hadoop and a data warehouse are different kinds of things: Hadoop is a big data technology for storing and managing large volumes of data, whereas a data warehouse is an architecture for organizing data to ensure its integrity. A data warehouse is usually implemented in a single RDBMS that acts as a central store, whereas Hadoop and HDFS span multiple machines to handle volumes of data too large to fit on a single machine.

Has Hadoop really replaced Data Warehouses in the enterprise?

Hadoop has not replaced the data warehouse to date, but it has taken over a few workloads from the integrated data warehouse (IDW); ETL workloads in particular are often offloaded to Hadoop. Even so, Hadoop is middleware infrastructure for parallelism, not an ETL solution, and it does not have what it takes to be a data warehouse. ETL transformations on Hadoop must be coded by hand, which becomes costly over the years as maintenance costs pile up. Hadoop also lacks many ETL subsystem features, such as data lineage, role-based security, data quality and profiling subsystems, and workflow management. It is therefore unlikely to replace the data warehouse, but the two can exist together and make a perfect combination, just like peanut butter and chocolate. Companies are making new investments in Hadoop, but they are looking for ways to optimize their existing EDW investments rather than replace them. For data scientists and data analysts, Hadoop and the EDW are two different tools that work best when used together.


Hadoop and Data Warehouse – An Attractive Couple in the Big Data World

Hadoop and the data warehouse do not exactly mix, but they are complementary technologies that can coexist to give a business a better chance of handling big data workloads. The two are optimized for different business requirements and can be used together, each where it best fits the use case. A data warehouse makes the best use of relational, structured data, whereas Hadoop excels at storing and managing unstructured data, which traditional data warehouses cannot handle.

A Hadoop-only strategy can be dangerous for a business’s data needs. It makes little sense to use Hadoop to process 20 GB of structured data, which a traditional data warehouse can handle easily. Moreover, many businesses lack the Hadoop-skilled personnel and the resources to deploy and run a Hadoop cluster for simple data queries, and using Hadoop this way wastes its real potential. Traditional data warehouses therefore remain a necessity for day-to-day business operations where Hadoop is difficult to use.


Let’s take a simple example of how Hadoop and a data warehouse work together in an integrated, complementary manner. Suppose an IT company decides to pull data from its social networking website and merge it with data from its data warehouse to update the social circle of each user’s friends. Here, using Hadoop to calculate a person’s social influence score can be a cost-effective and timely solution. The score is then sent to the data warehouse, where the organization’s campaign manager can see each user’s social influence score and re-segment users accordingly. Each system does the job it is best suited for: Hadoop processes unstructured social network data in parallel with minimal response time, and the data warehouse puts the results to use, helping business analysts, data scientists and other users make meaningful decisions.
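The flow in this example can be sketched in a few lines of Python. This is a minimal, single-process illustration and not an actual Hadoop job: the map and reduce functions only imitate the MapReduce pattern, and an in-memory SQLite database stands in for the data warehouse. The event format, the scoring weights, the `influence_scores` table and the segment threshold are all assumptions made for the sketch.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw social events: (author, action, count).
RAW_EVENTS = [
    ("alice", "share", 4),
    ("alice", "mention", 10),
    ("bob", "share", 1),
    ("bob", "mention", 2),
]

# Assumed weights; a real influence score would be defined by the business.
WEIGHTS = {"share": 3, "mention": 1}


def map_event(event):
    """Map step: emit (user, weighted points) for one raw event."""
    user, action, count = event
    return user, WEIGHTS.get(action, 0) * count


def reduce_scores(pairs):
    """Reduce step: sum the points emitted for each user."""
    totals = defaultdict(int)
    for user, points in pairs:
        totals[user] += points
    return dict(totals)


def load_into_warehouse(conn, scores):
    """Load the aggregated scores into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS influence_scores "
        "(user_id TEXT PRIMARY KEY, score INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO influence_scores VALUES (?, ?)",
        sorted(scores.items()),
    )
    conn.commit()


def segment_users(conn, threshold=10):
    """A query the campaign manager might run to re-segment users."""
    rows = conn.execute(
        "SELECT user_id, CASE WHEN score >= ? THEN 'high' ELSE 'standard' END "
        "FROM influence_scores ORDER BY user_id",
        (threshold,),
    )
    return dict(rows.fetchall())


scores = reduce_scores(map_event(e) for e in RAW_EVENTS)
warehouse = sqlite3.connect(":memory:")
load_into_warehouse(warehouse, scores)
segments = segment_users(warehouse)
print(scores, segments)
```

On a real cluster the map and reduce steps would run in parallel across many machines; only the small, aggregated result would travel to the warehouse.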

Scenarios like this show how the two technologies can coexist while remaining independent of each other. Which of them to use, however, depends entirely on the business problem being solved:

  • An organization can deploy a big data solution on a Hadoop cluster alongside a data warehouse.
  • An organization can deploy a big data solution with only a data warehouse.
  • An organization can deploy a big data solution with only Hadoop.

Where the Data Warehouse Falls Short Compared to Hadoop

  • Traditional data warehouse systems cannot ingest complex hierarchical data types, such as polytrees or graphs, or other kinds of unstructured data.
  • The exponential growth of big data challenges the scalability and cost of a data warehouse, because licence cost models for data warehousing systems are often based on the number of CPUs.
  • A data warehouse cannot ingest data that has no definite schema, because it follows a schema-on-write approach, unlike Hadoop, which favours schema-on-read. Data warehouse professionals have to spend a lot of time modelling the data, which can leave stakeholders waiting months for the answer to a particular business question.
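The difference between schema-on-write and schema-on-read in the last point can be illustrated with a small Python sketch. It is a simplification under stated assumptions: SQLite stands in for a warehouse table with an enforced schema, a plain list of raw JSON lines stands in for files on HDFS, and the record fields are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw records; the second has nested fields and no "amount".
RAW_LINES = [
    '{"user_id": "alice", "amount": 30}',
    '{"user_id": "bob", "tags": ["new"], "device": {"os": "linux"}}',
]


def schema_on_write(lines):
    """Warehouse style: the table schema is enforced when rows are written."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (user_id TEXT NOT NULL, amount INTEGER NOT NULL)"
    )
    accepted, rejected = 0, 0
    for line in lines:
        record = json.loads(line)
        try:
            conn.execute(
                "INSERT INTO sales (user_id, amount) VALUES (?, ?)",
                (record.get("user_id"), record.get("amount")),
            )
            accepted += 1
        except sqlite3.IntegrityError:
            # The record does not fit the declared schema, so it is refused.
            rejected += 1
    return accepted, rejected


def schema_on_read(lines):
    """Hadoop style: keep raw lines as-is, apply structure only when reading."""
    stored = list(lines)  # ingestion never fails; nothing is validated here
    total = 0
    for line in stored:
        record = json.loads(line)
        total += record.get("amount", 0)  # the reader decides how to treat gaps
    return len(stored), total


write_result = schema_on_write(RAW_LINES)
read_result = schema_on_read(RAW_LINES)
print(write_result, read_result)
```

The schema-on-write path refuses the record that does not match the model, while the schema-on-read path stores everything and pushes the interpretation work to query time.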

Where Hadoop Falls Short Compared to a Data Warehouse

  • Data security is a major concern in Hadoop, whose security features are still evolving, whereas the data warehouse has a long-established reputation for being secure.
  • Apache Hadoop typically cannot deliver the fast response times that mission-critical workloads require, whereas a data warehouse achieves responses in seconds through indices and high-performance hardware and software. The best practice is to offload other types of workloads to Hadoop and leave mission-critical workloads on the data warehouse, so that resources are used efficiently.
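To make the point about indices concrete, the sketch below asks a relational engine how it plans to answer the same query before and after an index exists. SQLite is used purely as a stand-in for a warehouse RDBMS; the `orders` table, its data and the index name are hypothetical, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = 42"


def query_plan(connection, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


# Without an index the engine has to scan every row of the table.
plan_without_index = query_plan(conn, QUERY)

# With an index it can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = query_plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

A batch job over raw files resembles the first plan, since it reads all of the data to answer the question; an indexed warehouse table resembles the second.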


Hadoop vs. Data Warehouse – Decide Which One to Use When

| Business Requirement | Data Warehouse | Hadoop |
|---|---|---|
| Clean, consistent and high-quality data | Yes | A partial Yes (as Hadoop does not have many data quality solutions) |
| Raw unstructured data | No | Yes |
| Analysis of preliminary data | No | Yes |
| Low latency and interactive reports | Yes | A partial Yes |
| To discover unexplored business questions | Yes | Yes |
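As a rough illustration, the table above can be encoded as a lookup and queried for a given set of requirements. The numeric ratings, the requirement keys and the decision rule below are assumptions made for this sketch, not a formal selection method: a platform is suggested on its own only when the table rates it a full "Yes" for every listed requirement.

```python
# Ratings transcribed from the table: 2 = Yes, 1 = a partial Yes, 0 = No.
YES = 2
PLATFORMS = ("data_warehouse", "hadoop")

FIT = {
    "clean_high_quality_data": {"data_warehouse": 2, "hadoop": 1},
    "raw_unstructured_data": {"data_warehouse": 0, "hadoop": 2},
    "preliminary_data_analysis": {"data_warehouse": 0, "hadoop": 2},
    "low_latency_interactive_reports": {"data_warehouse": 2, "hadoop": 1},
    "discover_unexplored_questions": {"data_warehouse": 2, "hadoop": 2},
}


def recommend(requirements):
    """Suggest a platform that fully covers every listed requirement.

    Returns the single platform rated "Yes" for all requirements,
    "either" if both qualify, or "both" if neither does on its own.
    """
    full_fit = [
        platform
        for platform in PLATFORMS
        if all(FIT[req][platform] == YES for req in requirements)
    ]
    if len(full_fit) == 1:
        return full_fit[0]
    if len(full_fit) == 2:
        return "either"
    return "both"


print(recommend(["raw_unstructured_data", "preliminary_data_analysis"]))
print(recommend(["clean_high_quality_data", "low_latency_interactive_reports"]))
print(recommend(["raw_unstructured_data", "low_latency_interactive_reports"]))
```

A mix of raw unstructured data and low-latency reporting yields "both", which matches the article's broader argument that the two systems are usually deployed side by side.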


To decide whether Hadoop or a data warehouse architecture is the better fit for a particular business case, consider the following key factors:

  • Cost – The cost of scaling up systems, together with maintenance and support costs, is extremely important when choosing a data warehouse, Hadoop or both.
  • SQL support – The chosen solution should support the SQL functionality the requirements call for.
  • Operational performance – Performance is at the heart of any real-time application; whether it is Hadoop or a traditional data warehouse, the chosen system should be able to handle the mixed workloads of big data applications.

These criteria can be evaluated based on the desired outcome to be achieved for a specific business use case.

The Hadoop ecosystem is evolving constantly, and it would come as no surprise if it could soon handle all types of mission-critical workloads, removing the need for a data warehouse. For now, though, that is not the case, and the two are likely to coexist. Hadoop complements a data warehouse with its ability to perform large-scale analytics on diverse data types and will continue to do so. Organizations should keep their existing data warehouses and bring in Hadoop functionality to meet their business requirements, as Hadoop has become a favourite of enterprises leveraging big data.

Forrester Research analyst Mike Gualtieri sums up the Hadoop vs. data warehouse debate in a single statement: “It’s not reasonable to think in terms of ‘death or glory’ for either EDWs or Hadoop.” Hadoop will change how data warehousing is implemented over the next few years, but it will not make the practices of data warehousing obsolete. Opinions vary on whether Hadoop will replace or complement the data warehouse, but one thing is certain: both are here to stay as integral parts of the big data landscape.

