Recap of Hadoop News for September 2018


Hadoop-as-a-Service: The Need Of The Hour For Superior Business, September 7, 2018

Hadoop is the cornerstone of the big data industry; however, the challenges involved in maintaining a Hadoop network have led to the development and growth of the Hadoop-as-a-Service (HaaS) market. Industry research reveals that the global Hadoop-as-a-Service market is anticipated to reach $16.2 billion by 2020, growing at a compound annual growth rate of 70.8% from 2014 to 2020. With market leaders like Microsoft and SAP expanding their reach among end-user industries, HaaS is likely to witness rapid growth over the next seven years. Organizations like Commerzbank have already launched new platforms based on HaaS solutions, demonstrating that HaaS is a promising approach for building and managing big data clusters. HaaS will encourage more organizations to consider Hadoop as a solution to their big data challenges.
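As a quick sanity check on those figures, a 70.8% compound annual growth rate from 2014 to 2020 implies a 2014 market of roughly $0.65 billion (the calculation below simply works backwards from the projected 2020 number):

```python
# Implied 2014 market size, working backwards from the projected
# 2020 figure and the stated compound annual growth rate (CAGR).
projected_2020 = 16.2          # billions of US dollars
cagr = 0.708                   # 70.8% per year
years = 2020 - 2014            # six growth periods

implied_2014 = projected_2020 / (1 + cagr) ** years
print(round(implied_2014, 2))  # roughly 0.65 (billion USD)
```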

(Source - )


Hortonworks unveils roadmap to make Hadoop cloud-native, September 10, 2018

Recognizing the importance of the cloud, Hortonworks is partnering with Red Hat and IBM to transform Hadoop into a cloud-native platform. Today Hadoop can run in the cloud, but it cannot exploit the capabilities of cloud architecture to the fullest. The idea of making Hadoop cloud-native is not a mere matter of buzzword compliance; the goal is to make it more fleet-footed. Today, roughly 25% of workloads from the Hadoop incumbents - MapR, Hortonworks, and Cloudera - run in the cloud; however, by next year it is anticipated that half of all new big data workloads will be deployed on the cloud. Hortonworks is unveiling the Open Hybrid Architecture initiative for transforming Hadoop into a cloud-native platform, which will address containerization, support Kubernetes, and include a roadmap for separating compute from data.

(Source - )


LinkedIn open-sources a tool to run TensorFlow on Hadoop, September 13, 2018.

LinkedIn’s open-source project TonY (TensorFlow on YARN) aims at scaling and managing deep learning jobs in TensorFlow using Hadoop’s YARN scheduler. TonY uses YARN’s resource and task scheduling system to run TensorFlow jobs on a Hadoop cluster. It can also schedule GPU-based TensorFlow jobs through Hadoop, allocate memory separately for TensorFlow nodes, request different types of resources (CPUs vs. GPUs), and ensure that job outcomes are saved to HDFS at regular intervals so that interrupted or crashed jobs can resume from where they left off. LinkedIn claims there is no additional overhead for TensorFlow jobs when using TonY, because it sits at the layer that orchestrates distributed TensorFlow and does not interrupt the execution of TensorFlow jobs. TonY is also used for visualizing, optimizing, and debugging TensorFlow apps.
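The checkpoint-and-resume behavior described above can be sketched in plain Python. This is a minimal illustration of the general pattern, not TonY's actual API: a local JSON file stands in for a checkpoint path on HDFS, and the training step itself is elided.

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # stand-in for a checkpoint path on HDFS


def load_checkpoint():
    # Resume from the last saved step if a checkpoint exists.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["step"]
    return 0


def save_checkpoint(step):
    # Persist progress so an interrupted job can pick up where it left off.
    with open(CHECKPOINT, "w") as f:
        json.dump({"step": step}, f)


def train(total_steps=100, save_every=10):
    start = load_checkpoint()
    for step in range(start, total_steps):
        # ... one training step would run here ...
        if (step + 1) % save_every == 0:
            save_checkpoint(step + 1)
    return total_steps - start  # steps actually executed this run
```

Running `train()` a second time after a completed (or interrupted) run picks up from the saved step instead of starting over, which is the guarantee TonY provides for long-running TensorFlow jobs.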

(Source - )


Microsoft’s SQL Server gets built-in support for Spark and Hadoop, September 24, 2018.

Microsoft has announced the addition of new connectors that will allow businesses to use SQL Server to query other databases like MongoDB, Oracle, and Teradata. This will turn Microsoft SQL Server into a virtual integration layer where the data never has to be replicated or moved into SQL Server. SQL Server 2019 will come with built-in support for Hadoop and Spark, and will support big data clusters through the Google-incubated Kubernetes container orchestration system. Every big data cluster will include SQL Server, Spark, and the Hadoop file system.

(Source - )

Big data project aims to transform farming in world’s poorest countries, September 24, 2018.

Big data is changing the way agriculture uses data. The FAO, the Bill and Melinda Gates Foundation, and national governments have launched a US$500-million effort to help developing countries collect data on small-scale farmers, with the aim of fighting hunger and promoting rural development. Collecting accurate information about seed varieties, farmers’ technological capacity, and farmers’ incomes will help coalition members understand how ongoing agricultural investments are making an impact. This data will also enable governments to tailor policies to help farmers.

(Source - )

Mining equipment-maker uses BI on Hadoop to dig for, September 26, 2018.

Milwaukee-based mining equipment maker Komatsu Mining Corp. is looking to churn more data in place and share BI analytics on that data within and outside the organization. To enhance efficiency, Komatsu has combined several big data tools from Cloudera, including Spark, Hadoop, Kafka, Kudu, and Impala, along with on-cluster analytics software from BI-on-Hadoop toolmaker Arcadia Data. This big data platform was assembled to analyze sensor data collected by equipment in the field, tracking the wear and tear of massive shovels and earth movers. The company foresees a future in which the platform will use IoT application data for better predictive and prescriptive equipment maintenance.
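A drastically simplified version of the wear-monitoring idea - flagging equipment whose sensor readings drift past a threshold - might look like the sketch below. The field names, metric, and threshold are all hypothetical; in Komatsu's platform this kind of logic would run over Kafka streams and Spark rather than in-memory Python.

```python
def flag_worn_equipment(readings, vibration_limit=7.0):
    """Return IDs of machines whose average vibration exceeds the limit.

    `readings` maps an equipment ID to a list of vibration samples;
    both the metric and the limit are illustrative stand-ins for real
    sensor data streamed off shovels and earth movers.
    """
    flagged = []
    for equipment_id, samples in readings.items():
        if samples and sum(samples) / len(samples) > vibration_limit:
            flagged.append(equipment_id)
    return flagged


# Hypothetical sensor samples for three machines.
sensor_data = {
    "shovel-001": [6.1, 6.4, 6.0],   # within normal wear range
    "shovel-002": [7.8, 8.2, 7.9],   # trending toward failure
    "loader-014": [5.2, 5.5],
}

print(flag_worn_equipment(sensor_data))  # → ['shovel-002']
```

Predictive maintenance systems extend this idea by learning thresholds (or full failure models) from historical sensor data instead of hard-coding them.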

(Source - )



Relevant Projects

Create A Data Pipeline Based On Messaging Using PySpark And Hive - Covid-19 Analysis
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

Spark Project-Analysis and Visualization on Yelp Dataset
The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of data processing into Elasticsearch. Also, use the visualisation tool in the ELK stack to visualize various kinds of ad hoc reports from the data.

Real-Time Log Processing using Spark Streaming Architecture
In this Spark project, we bring processing to the speed layer of the lambda architecture, which opens up capabilities to monitor application performance in real time, measure real-time user comfort with applications, and raise real-time security alerts.

Hive Project - Visualising Website Clickstream Data with Apache Hadoop
Analyze clickstream data of a website using Hadoop Hive to increase sales by optimizing every aspect of the customer experience on the website from the first mouse click to the last.

Data Warehouse Design for E-commerce Environments
In this hive project, you will design a data warehouse for e-commerce environments.

Event Data Analysis using AWS ELK Stack
This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. Tools used include Nifi, PySpark, Elasticsearch, Logstash and Kibana for visualisation.

Airline Dataset Analysis using Hadoop, Hive, Pig and Impala
Hadoop Project - Perform basic big data analysis on an airline dataset using big data tools - Pig, Hive, and Impala.

Implementing Slowly Changing Dimensions in a Data Warehouse using Hive and Spark
Hive Project - Understand the various types of SCDs and implement these slowly changing dimensions in Hadoop Hive and Spark.

Tough engineering choices with large datasets in Hive Part - 1
Explore efficient Hive usage in this Hadoop Hive project using various file formats such as JSON, CSV, ORC, and AVRO, and compare their relative performance.

Analysing Big Data with Twitter Sentiments using Spark Streaming
In this big data spark project, we will do Twitter sentiment analysis using spark streaming on the incoming streaming data.