Hadoop is the cornerstone of the big data industry; however, the challenges involved in maintaining a Hadoop network have led to the development and growth of the Hadoop-as-a-Service (HaaS) market. Industry research indicates that the global Hadoop-as-a-Service market is anticipated to reach $16.2 billion by 2020, growing at a compound annual growth rate of 70.8% from 2014 to 2020. With market leaders like Microsoft and SAP expanding their reach across end-user industries, HaaS is likely to witness rapid growth in the next 7 years. Organizations like Commerzbank have already launched new platforms based on HaaS solutions, which demonstrates that HaaS is a promising option for building and managing big data clusters. HaaS will compel organizations to consider Hadoop as a solution to various big data challenges.
(Source - https://insidebigdata.com/2018/09/07/hadoop-service-need-hour-superior-business-solutions/ )
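To put the growth figures above in perspective, the short sketch below works backwards from the 2020 forecast to the market size the numbers imply for 2014. It assumes six compounding periods (2014 to 2020) and annual compounding; the function and variable names are illustrative and the result is only as reliable as the quoted forecast.

```python
# Back-of-the-envelope check of the quoted HaaS forecast.
# Assumption: "from 2014 to 2020" means six annual compounding periods.


def implied_base(final_value, cagr, years):
    """Starting value that grows to final_value at the given annual rate."""
    return final_value / (1.0 + cagr) ** years


def project(base_value, cagr, years):
    """Value reached after compounding base_value for the given years."""
    return base_value * (1.0 + cagr) ** years


FINAL_2020_BILLION = 16.2  # forecast quoted in the article
CAGR = 0.708               # 70.8% compound annual growth rate
YEARS = 6                  # assumption: 2014 -> 2020

base_2014 = implied_base(FINAL_2020_BILLION, CAGR, YEARS)
print(f"Implied 2014 market size: ${base_2014:.2f} billion")
```

Under these assumptions the forecast implies a 2014 market well under $1 billion, which shows how steep a 70.8% annual growth rate is.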
Considering the importance of the cloud, Hortonworks is partnering with Red Hat and IBM to transform Hadoop into a cloud-native platform. Today Hadoop can run in the cloud, but it cannot exploit the capabilities of cloud architecture to the fullest. Making Hadoop cloud-native is not a mere matter of buzzword compliance; the goal is to make it more fleet-footed. Of the workloads from the Hadoop incumbents (MapR, Hortonworks, and Cloudera), 25% are running in the cloud; however, by next year it is anticipated that half of all new big data workloads will be deployed in the cloud. Hortonworks is unveiling the Open Hybrid Architecture initiative for transforming Hadoop into a cloud-native platform. The initiative will address containerization, support Kubernetes, and include a roadmap for separating compute from data.
(Source - https://www.zdnet.com/article/hortonworks-unveils-roadmap-to-make-hadoop-cloud-native/ )
LinkedIn’s open-source project TonY aims at scaling and managing deep learning jobs in TensorFlow using the YARN scheduler in Hadoop. TonY uses YARN’s resource and task scheduling system to run TensorFlow jobs on a Hadoop cluster. It can also schedule GPU-based TensorFlow jobs through Hadoop, allocate memory separately for TensorFlow nodes, and request different types of resources (CPUs vs. GPUs). It also ensures that job results are saved to HDFS at regular intervals, so that an interrupted or crashed job can resume from where it left off. LinkedIn claims that there is no additional overhead for TensorFlow jobs when using TonY, because it sits at a layer that orchestrates distributed TensorFlow and does not interfere with the execution of the jobs themselves. TonY is also used for visualizing, optimizing, and debugging TensorFlow apps.
(Source - https://www.infoworld.com/article/3305590/tensorflow/linkedin-open-sources-a-tool-to-run-tensorflow-on-hadoop.html )
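The save-at-intervals and resume-after-crash behaviour described above can be illustrated with a small, generic sketch. This is not the LinkedIn tool's API: a local directory stands in for HDFS, a running sum stands in for model training, and every function name here is hypothetical.

```python
import json
import os


def save_checkpoint(ckpt_dir, step, state):
    """Write the checkpoint to a temp file, then rename it into place."""
    path = os.path.join(ckpt_dir, f"ckpt-{step:08d}.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_latest_checkpoint(ckpt_dir):
    """Return (step, state) from the newest checkpoint, or (0, 0) if none."""
    names = sorted(
        n for n in os.listdir(ckpt_dir)
        if n.startswith("ckpt-") and n.endswith(".json")
    )
    if not names:
        return 0, 0
    with open(os.path.join(ckpt_dir, names[-1])) as f:
        data = json.load(f)
    return data["step"], data["state"]


def run_job(ckpt_dir, total_steps, every, fail_at=None):
    """Run a toy job, checkpointing every `every` steps.

    `fail_at` simulates a crash just before the given step is processed.
    """
    step, state = load_latest_checkpoint(ckpt_dir)
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated crash")
        state += step  # stand-in for one training step
        step += 1
        if step % every == 0:
            save_checkpoint(ckpt_dir, step, state)
    return state
```

Calling `run_job` a second time with the same directory after a crash picks up from the last saved step instead of starting over, so only the work done since the last checkpoint is repeated.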
Microsoft has announced new connectors that will allow businesses to use SQL Server to query other databases such as MongoDB, Oracle, and Teradata. This will turn Microsoft SQL Server into a virtual integration layer, where data never has to be replicated or moved to SQL Server. SQL Server 2019 will come with built-in support for Hadoop and Spark. It will support big data clusters through the Google-incubated Kubernetes container orchestration system. Every big data cluster will include SQL Server, the Hadoop file system, and Spark.
(Source - https://techcrunch.com/2018/09/24/microsofts-sql-server-gets-built-in-support-for-spark-and-hadoop/)
Big data is changing the way data is used in agriculture. The FAO, the Bill and Melinda Gates Foundation, and national governments have launched a US$500-million effort to help developing countries collect data on small-scale farmers in order to fight hunger and promote rural development. Collecting accurate information about seed varieties, farmers’ technological capacity, and farmers’ income will help coalition members understand how ongoing agricultural investments are making an impact. This data will also enable governments to customize policies to help farmers.
(Source - https://www.nature.com/articles/d41586-018-06800-8)
Milwaukee-based mining equipment maker Count Komatsu Mining Corp. is looking to churn more data in place and share BI analytics of that data within and outside the organization. To improve efficiency, Count Komatsu has combined several big data tools, including Spark, Hadoop, Kafka, Kudu, and Impala from Cloudera. It has also added on-cluster analytics software from BI-on-Hadoop analytics toolmaker Arcadia Data. This big data platform has been assembled to analyze sensor data collected by equipment in the field, in order to keep track of wear and tear on massive shovels and earth movers. The company foresees a future in which the platform will use IoT application data for better predictive and prescriptive equipment maintenance.
(Source - https://searchdatamanagement.techtarget.com/feature/Mining-equipment-maker-uses-BI-on-Hadoop-to-dig-for-data )