In his keynote address at Strata + Hadoop World 2016 in San Jose, Doug Cutting said that Hadoop has not peaked and is not going to be phased out. Hadoop is still evolving, and experts consider it to be in its early stages. Cutting admits that HDFS and MapReduce may have reached their peak, but so many developments are still happening around Hadoop that it remains the go-to open source technology for data analysis and storage.
Most companies know what Hadoop is used for, but more often than not they fail to implement it correctly, causing a loss of time and data that is crucial to business needs. Marketshare found this out the hard way. Marketshare earlier used Amazon’s EMR and paired it with an Oracle database and Tableau. In 2014, Marketshare shifted to Altiscale. Right now it is in the process of moving to Arcadia Data.
LinkedIn has open sourced Dr. Elephant, a performance tuning and monitoring tool that helps Hadoop and Spark users improve the performance of their flows. LinkedIn first presented the tool at the eighth annual Hadoop Summit in 2015. Dr. Elephant was created to standardize and automate the use of Hadoop by users at different levels of proficiency.
At a recent press conference at the Hadoop Summit in Dublin, Microsoft, HPE and EMC discussed their experiences in implementing Hadoop. Unsurprisingly, all of them agreed that simplicity is more important than speed when it comes to Hadoop implementation. Hortonworks, while presenting the features of Spark for real-time data analysis, emphasized that customers want simplicity over speed when it comes to getting value from their data. (Source: http://www.computerworlduk.com/data/enterprises-just-want-simplicity-when-it-comes-their-hadoop-big-data-strategy-3638198/ )
At the recent Strata + Hadoop World 2016 event, Doug Cutting, the father of Hadoop, said that he is amazed at how far the technology has come in the data management space. Coming from a search technology background himself, Cutting understands how data works and keeps looking for newer ways to solve data processing problems. He says that the Hadoop project has sparked a revolution in the open data space, and that he has seen many industries move away from platforms they have used for decades to open source data management tools. (Source: http://searchdatamanagement.techtarget.com/video/Hadoop-father-Doug-Cutting-talks-of-changes-on-data-front )
Hortonworks, the leader in Hadoop distribution, said at the recent Hadoop Summit in Dublin that it is set to become profitable at the end of this year. The company hopes to become cash-flow positive by the end of 2016. Open source technology can deliver a healthy bottom line, but the marketing costs are huge. Many open source technologies are now following suit. Apache Spark is a great example from the Hadoop zoo that has gained unprecedented momentum.
Hadoop is at the heart of the data infrastructure at Trulia, a San Francisco-based real estate company. Hadoop helps Trulia deliver personalized recommendations to customers based on complex data science models that analyse terabytes of data daily. To ensure reliability in its multi-tenant, multi-workload environment and to complete complex jobs on time, Trulia has turned to Pepperdata. Pepperdata specializes in adaptive Hadoop performance, guaranteeing quality of service on Hadoop.
The global market for Hadoop has been growing exponentially since its inception in 2009 for managing big data, because of its cost-effectiveness and low maintenance charges compared with other data frameworks. The Hadoop market is divided into three types: hardware, software and services. This report unveils the growth of the Hadoop market by industry use, type and geography.
With huge interest in cloud-based applications that use NoSQL for batch processing and real-time analytics over data pipes, the biggest challenge is designing applications in a streaming way rather than the Hadoop or data lake way. DataStax CEO, Derek, says that the technology exists to solve nifty use cases but that there is a huge skills gap.
As an increasing number of organizations adopt Hadoop to unlock novel opportunities hidden in the tidal wave of big data, several challenges and points of concern arise around diverse data sources and sustainability. Enterprises do not just run MapReduce jobs; they also want to build enterprise applications on HDFS for various types of analytics users. Thus, data has to be protected, persisted and secured for longer periods of time, which presents various challenges and difficulties for users of Hadoop data analytics.
With global market growth in Hadoop technology, Hadoop project management has become a growing domain. However, managing a Hadoop project is not a cakewalk and presents many challenges related to performance, scalability, timeliness of the data, dependability, data governance, security, data access and interoperability. This makes it a good area for people to hone their Hadoop skills if they want to get their hands dirty with Hadoop project management.
Many organizations leveraging big data are turning to SQL-on-Hadoop tools to process data faster without having to program MapReduce jobs in Java. Premier Inc. earlier used batch-oriented MapReduce to power the web-based BI dashboard application used by data analysts, hospital purchasing managers and supply chain executives. It has now switched to Cloudera Impala, a SQL-on-Hadoop tool, for faster Hadoop query performance.