The big data industry has made Hadoop the cornerstone technology for large-scale data processing, but deploying and maintaining Hadoop clusters is no cakewalk. The challenges of maintaining a well-run Hadoop environment have led to the growth of the Hadoop-as-a-Service (HDaaS) market.
“Typical enterprise Hadoop distributions rely on a lot of moving parts -- leveraging up to 20 or more disparate components -- which leads to management complexity and total cost of ownership (TCO) issues,” said Wikibon analyst George Gilbert, proposing Hadoop-as-a-Service to address these customer concerns.
According to a report by Research and Markets, the global Hadoop-as-a-Service (HDaaS) market is expected to grow at a compound annual growth rate of 84.8% from 2014 to 2019. IDC anticipates the HDaaS market will reach $16.1 billion by 2020, with North America alone being the highest revenue-generating region at $11.6 billion.
HDaaS is expected to see a bullish run in the next few years as an increasing number of small and medium enterprises adopt it. As a gateway into the big data analytics market, the Hadoop-as-a-Service offering is highly appreciated by subject matter experts and data scientists. Want to know more about the rise of Hadoop as a service: what HDaaS is, the advantages it brings to an enterprise, and the companies offering it? Here is a short editorial piece on the Hadoop-as-a-Service offering.
“Customers building their outward facing Web and mobile applications on public clouds while trying to build Hadoop applications on-premises should evaluate vendors offering it as-a-service. Hadoop already comes with significant administrative complexity by virtue of its multi-product design. On top of that, operating elastic applications couldn't be more different from the client-server systems IT has operated for decades,” said Wikibon analyst George Gilbert.
Some enterprises have the desired Hadoop skills and the urge to build, operate, and maintain huge Hadoop clusters in-house; yet many still do not want to make sustained investments in developing an in-house Hadoop capability. Hadoop-as-a-Service comes to the rescue of all such prospective Hadoop users.
In simple terms, HDaaS is a cloud computing solution that makes processing big data fast and cost-effective by eliminating the operational challenges of running Hadoop, so that enterprises can focus on business growth. The key to anything as-a-service is to provide it as a complete utility: consumers of Hadoop want to access the service quickly, gain access to it whenever they need it, and pay only for what they have consumed. Several things have to be in place before a consumer can consume Hadoop-as-a-Service: organizations have to provide a self-service portal, a policy management framework, tenant isolation, metering and chargeback mechanisms, and monitoring tools.
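To make one of those prerequisites concrete, here is a toy sketch of the kind of policy check a self-service portal might run before provisioning a tenant's cluster. The tenant names and quota limits are entirely illustrative, not any vendor's actual policy framework.

```python
from dataclasses import dataclass

# Illustrative per-tenant quotas that a policy management framework
# might enforce before a self-service request is approved.
QUOTAS = {"analytics-team": 50, "marketing": 10}

@dataclass
class ClusterRequest:
    tenant: str
    nodes: int

def approve(req: ClusterRequest) -> bool:
    """Policy check: a tenant may not request more nodes than its quota;
    unknown tenants are rejected outright (quota defaults to zero)."""
    return req.nodes <= QUOTAS.get(req.tenant, 0)

print(approve(ClusterRequest("marketing", 8)))    # within quota
print(approve(ClusterRequest("marketing", 64)))   # rejected by policy
```

A real portal would layer tenant isolation, metering, and monitoring on top of a gate like this; the point is simply that self-service only works when policy is checked by software rather than by a ticket queue.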
Hadoop-as-a-Service (HDaaS), also known as Hadoop in the cloud, is an alternative to on-premise Hadoop deployment for companies that need to adopt Hadoop technology but cannot justify large teams of data centre administrators or find the skilled resources to run clusters themselves.
The demand for cost-effective big data management is driving the growth of the HDaaS market. Cloud computing, together with the ability to harness the power of Hadoop for analysing big data, is helping organizations manage their data effectively. However, lack of awareness about the technology and a shortage of skilled personnel are major roadblocks hindering the growth of the HDaaS market, as enterprises rethink investing in Hadoop-based big data solutions.
Enterprises running Hadoop on-premise have learned that infrastructure management, the distributed nature of Hadoop, configuration and tuning, and provisioning and availability for bulky workloads are difficult and expensive to maintain. Hadoop-as-a-Service overcomes these challenges by offering -
The Hadoop-as-a-Service market is segmented based on deployment type -
Run It Yourself (RIY) solutions require Hadoop skills to configure and operate Hadoop, and manual intervention to handle huge workloads. On the contrary, pure-play HDaaS solutions provide users with a non-technical interface to use Hadoop without having to understand the underlying software. Pure-play solutions do not require manual reconfiguration whenever data sizes grow or shrink. As the Hadoop environment is fully managed for customers in pure-play HDaaS offerings, enterprises do not need in-house Hadoop expertise. Performance comes as an added advantage of pure-play HDaaS offerings, as data is always available for running analysis and production jobs.
There are many other cloud vendors offering Hadoop-as-a-Service, including Rackspace, Microsoft Azure, HP Helion, EMC, Cask Data, MapR, Hortonworks, Infochimps, Pentaho, Mortar Data (Datadog), etc. Hadoop-as-a-Service vendors are helping enterprises live up to the Hadoop hype while minimizing computing budgets.
A Quora user sums up well how one can choose an HDaaS provider from the many cloud options available -
With so many HDaaS (Hadoop-as-a-Service) providers, understanding what makes a good Hadoop-as-a-Service offering is extremely important. HDaaS providers offer a variety of features and support, right from basic access to complete service-support options. A good HDaaS solution is one that provides integrated big data software, simplified cluster management, and on-demand elastic clusters at minimal cost. Any Hadoop-as-a-Service solution should possess the following characteristics -
Enterprises use Hadoop-as-a-Service (HDaaS) to minimize the need to hire professionals with specialized Hadoop skills. A data scientist may have in-depth knowledge of machine learning but might not be well-versed in configuring a Hadoop cluster. HDaaS solutions should be self-configuring, in the sense that the service dynamically configures the required number of nodes and tunes parameters depending on the workload and storage required. Self-configuring HDaaS solutions minimize administration time and deliver faster results.
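As an illustration only, a self-configuring service might size a cluster from the workload with a heuristic like the one below. The per-node capacity, replication factor, and bounds are assumed values for the sketch, not any vendor's actual sizing logic.

```python
import math

# Assumed cluster parameters for illustration; a real service would use
# richer signals (queue depth, memory pressure, SLA targets).
NODE_STORAGE_GB = 1024      # usable HDFS capacity per worker node
REPLICATION = 3             # default HDFS replication factor
MIN_NODES, MAX_NODES = 3, 100

def nodes_for(dataset_gb, parallel_tasks, tasks_per_node=8):
    """Pick a cluster size that satisfies both the storage footprint
    (replicated data must fit) and the compute demand (task slots)."""
    storage_nodes = math.ceil(dataset_gb * REPLICATION / NODE_STORAGE_GB)
    compute_nodes = math.ceil(parallel_tasks / tasks_per_node)
    return max(MIN_NODES, min(MAX_NODES, max(storage_nodes, compute_nodes)))

print(nodes_for(100, 4))        # small job -> floor of 3 nodes
print(nodes_for(10000, 200))    # storage-bound workload
```

The design choice worth noticing is that the answer is the *maximum* of the storage-driven and compute-driven sizes: a cluster that fits the data but starves the tasks (or vice versa) is under-provisioned either way.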
Various conditions may require restarting a failed sub-process of a large job, and deadlock scenarios can also occur. An important characteristic of an HDaaS solution is support for non-stop operation without requiring manual intervention from a system administrator.
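A minimal sketch of that idea: automatically re-running a failed sub-task a few times before escalating, rather than paging an administrator. The retry count and back-off values are arbitrary placeholders.

```python
import time

def run_with_retries(task, max_attempts=3, backoff_s=0.0):
    """Re-run a failed sub-task automatically, as a managed service would;
    only after max_attempts does the failure surface for human attention."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)  # linear back-off between retries

# Simulated flaky sub-task: fails twice, then succeeds.
attempts = {"n": 0}
def flaky_task():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_retries(flaky_task))
```

Real Hadoop schedulers do the equivalent at the task-attempt level; the point is that transient failures should be absorbed by the service, not forwarded to an operator.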
The pricing model of an HDaaS solution should be tied to Hadoop units of work, in particular HDFS storage and YARN job units. HDaaS providers that offer such pricing models can be comparatively less expensive than others, especially when taking into account the overall cost of ownership.
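A back-of-the-envelope comparison under such a usage-based model might look like the following; the rates and usage figures are hypothetical, not any provider's actual prices.

```python
# Usage-based bill from two Hadoop units of work: HDFS storage
# (GB-months) and YARN compute (vcore-hours). Rates are made up.
def monthly_cost(hdfs_gb_months, yarn_vcore_hours,
                 storage_rate=0.03, compute_rate=0.05):
    return hdfs_gb_months * storage_rate + yarn_vcore_hours * compute_rate

usage = {"hdfs_gb_months": 2000, "yarn_vcore_hours": 1500}
provider_a = monthly_cost(**usage)   # usage-based pricing
provider_b = 300.0                   # hypothetical flat monthly fee
cheaper = "A" if provider_a < provider_b else "B"
print(provider_a, cheaper)
```

For a bursty workload like this one, paying per unit of work undercuts the flat fee; the comparison flips for sustained heavy usage, which is why the calculation should be run against your own numbers.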
HDFS has gained prominence for its reliability and cost-effective storage at scale, and data analysts and data scientists rely on it for interactive work. An HDaaS solution should not require users to manage data that is not native to Hadoop, and should be compatible with the growing ecosystem of third-party applications. Storing data in HDFS ensures that there are no delays and avoids the additional cost of translating data from one format to another.
Managing and maintaining Hadoop clusters is a complex task, and when it comes to handling diverse workloads for multiple customers, it is easier said than done. HDaaS providers that have experienced support staff for maintaining and running production jobs should be preferred.
An HDaaS solution must meet the needs of both a Hadoop administrator and a data scientist. Management consoles of an HDaaS service must be streamlined in a manner that helps administrators perform related management tasks quickly, with minimal manual intervention. Low-level monitoring should be handled by the HDaaS provider, and the administrative interface should simply display the overall health of the Hadoop service.
A data scientist spends most of the time on data preparation, so an HDaaS solution should offer a rich and powerful environment for analysis. Data scientists should be able to run Hadoop jobs through Pig, Hive, Mahout, and other data science tools. As soon as a data scientist logs in to the service, compute operations should be available without long delays for deploying Hadoop clusters and loading data from various data stores.
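For intuition, the map-shuffle-reduce pattern that tools like Pig and Hive ultimately compile Hadoop jobs down to can be sketched in-process in a few lines of plain Python. This is a word count, the canonical example, shrunk to a single machine; a real cluster distributes the same phases across nodes.

```python
from collections import Counter
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every token in the input line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each key after the shuffle."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Hadoop as a Service", "Hadoop in the cloud"]
mapped = chain.from_iterable(map_phase(l) for l in lines)
print(reduce_phase(mapped))
```

The value of an HDaaS environment is that the data scientist writes only the map and reduce logic (or the Pig/Hive query that generates it) while the service handles distribution, shuffling, and fault tolerance.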
Some HDaaS solutions ensure that Hadoop is available, patched, scaled, maintained, and monitored continuously, while others provide a well-run environment for running and managing Hadoop jobs. Choosing the HDaaS solution that best fits the business requirements yields cost benefits.
HDaaS solutions are emerging as a promising alternative to building and maintaining Hadoop clusters in-house. An HDaaS provider that meets both the Hadoop data management needs and the data science requirements of the business is well-liked by customers. What began as a small open source project has become the rock star of the data-driven landscape. It might take time for Hadoop to replace existing data investments, but the market growth of Hadoop-as-a-Service will compel organizations to consider Hadoop.