Recap of Hadoop News for May


News on Hadoop-May 2016


Microsoft Azure beats Amazon Web Services and Google for Hadoop Cloud Solutions. May 3, 2016. 

In the race for the best Big Data Hadoop cloud solution, Microsoft Azure came out on top, beating tough contenders such as Google and Amazon Web Services. Azure emerged as the leader in Forrester's 37-criteria evaluation. Microsoft's cloud-first strategy is clearly paying off.


Pachyderm Stack Challenges the core of Hadoop. May 10, 2016.

The Pachyderm team has begun to turn Hadoop's weaknesses into opportunities. They have created containers for data storage and analysis as an alternative to the Hadoop Distributed File System (HDFS). The Pachyderm stack uses Docker containers: the Pachyderm File System is a replacement for HDFS, and Pachyderm Pipelines is a replacement for MapReduce.



AtScale scoops up $11 million in Series B funding for its Hadoop-based BI. May 16, 2016.

With several players joining the vendor bandwagon in the Hadoop ecosystem, AtScale is well positioned to succeed. AtScale makes BI easy on Hadoop, offering interactive, multi-dimensional analytics along with the governance, security and integration that enterprises demand from open source projects. AtScale's strong performance has helped the company raise $11 million in a Series B funding round.



Hadoop 3 Poised to Boost Storage Capacity, Resilience with Erasure Coding. May 18, 2016.

As discussed at a recent big data conference, the next major version of Hadoop, Hadoop 3, is likely to double effective storage capacity while increasing resiliency through the addition of erasure coding. Erasure coding is an error-correction technique commonly used in object storage systems that hold huge amounts of unstructured data. Hadoop 3 will use erasure codes when reading and writing data to HDFS. With many other new features, such as automatically derived heap sizes, a rewrite of the shell scripts, automatically derived MapReduce memory settings, task-level native optimization and support for more than two NameNodes, Hadoop 3 is poised to transform the big data space.
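
The "double storage capacity" figure follows from simple overhead arithmetic. HDFS traditionally keeps three replicas of every block, so 1 TB of data occupies 3 TB of disk (200% overhead). A Reed-Solomon layout with 6 data cells and 3 parity cells, RS(6,3), assumed here as a representative Hadoop 3 scheme, stores the same 1 TB in 1.5 TB (50% overhead), so the same disks hold twice as much data. The minimal Python sketch below computes both overheads and then illustrates the core erasure-coding idea with a single XOR parity cell that can rebuild any one lost cell. XOR parity is a deliberate simplification: HDFS erasure coding uses Reed-Solomon codes, which survive several simultaneous losses, and every function name here is hypothetical rather than a Hadoop API.

```python
# Hypothetical illustration, not Hadoop code: replication vs. erasure-coding
# overhead, plus a single-parity (XOR) erasure code. HDFS itself uses
# Reed-Solomon codes; XOR parity is used here only because it is the
# simplest erasure code that fits in a few lines.
from functools import reduce


def replication_overhead(replicas: int) -> float:
    """Extra storage as a fraction of raw data, e.g. 3 replicas -> 2.0 (200%)."""
    return float(replicas - 1)


def ec_overhead(data_units: int, parity_units: int) -> float:
    """Extra storage for a (data, parity) scheme, e.g. RS(6,3) -> 0.5 (50%)."""
    return parity_units / data_units


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal cells (zero-padded) and append one XOR parity cell."""
    cell = -(-len(data) // k)  # ceiling division
    padded = data.ljust(cell * k, b"\0")
    cells = [padded[i * cell:(i + 1) * cell] for i in range(k)]
    return cells + [xor_blocks(cells)]


def recover(stripe: list, lost: int) -> bytes:
    """Rebuild the single missing cell at index `lost` by XOR-ing all survivors."""
    survivors = [c for i, c in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)


if __name__ == "__main__":
    print(f"3x replication overhead: {replication_overhead(3):.0%}")  # 200%
    print(f"RS(6,3) overhead:        {ec_overhead(6, 3):.0%}")        # 50%

    stripe = encode(b"hello erasure coding", k=4)
    original = stripe[2]
    stripe[2] = None  # simulate losing the node that held cell 2
    assert recover(stripe, lost=2) == original
    print("recovered cell 2:", original)
```

Raising `k` lowers the parity overhead (1/k in this single-parity sketch) but means each rebuild must read more surviving cells; Reed-Solomon generalizes the same trade-off to several parity cells.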


Hadoop Evolution: What You Need to Know. May 23, 2016. EnterpriseAppsToday

Apache Hadoop has been constrained by a skills shortage, implementation complexity and a lack of standardization, but organizations are finding new ways to use it. Though Hadoop usage may have limitations, experts suggest that it is not going away and that organizations might shift to the cloud to run Hadoop. With a major shift in the outlook of data analysts and vendors, Hadoop is likely to play a vital role in business decisions through new emerging approaches.


Global Hadoop Market Poised to Surge from USD 5.0 Billion in 2015 to USD 59.0 Billion by 2021. May 26, 2016. MarketResearchStore.Com

A MarketResearchStore report anticipates that global demand for Hadoop will reach $59 billion in 2021, up from $5.0 billion in 2015, a CAGR of 51%. The three major segments of the Hadoop market are software, services and hardware, of which services is the largest, with 40% of total revenue. Geographically, North America dominated the regional Hadoop market in 2015 and is expected to continue the trend over the next few years, holding 50% of the entire Hadoop market. Asia Pacific and Europe are also growing regions for the Hadoop market.
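
As a quick consistency check, the growth rate implied by the headline figures ($5.0 billion in 2015 to $59.0 billion by 2021) follows from CAGR = (end / start)^(1/years) - 1. The sketch below assumes six annual compounding periods (2015 to 2021); `cagr` is a hypothetical helper, not the report's own methodology.

```python
# Sketch: recompute the CAGR from the report's start and end figures.
# Assumption: 2015 -> 2021 counts as six annual compounding periods.
# Figures are in USD billions.


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    rate = cagr(5.0, 59.0, 2021 - 2015)
    print(f"Implied CAGR 2015-2021: {rate:.1%}")  # prints: Implied CAGR 2015-2021: 50.9%
```

The result, about 50.9%, rounds to the 51% quoted in the report.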



Global Hadoop Market Growth of 59.37% CAGR by 2020. May 31, 2016. BusinessWire

Research and Markets recently released its Global Hadoop Market 2016-2020 report. The report anticipates that the global Hadoop market will grow at a compound annual growth rate (CAGR) of 59.37%. It highlights the growth of the Hadoop market across three geographical segments: the Americas, EMEA and APAC.
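
A CAGR is easier to grasp as a total growth multiple, (1 + r)^n. The sketch below assumes the 2016-2020 forecast window spans four annual compounding periods; the report's exact convention is not stated here, so treat the result as illustrative. `growth_multiple` is a hypothetical helper.

```python
# Sketch: turn a compound annual growth rate into a total growth multiple.
# Assumption: 2016 -> 2020 is counted as four annual compounding periods.


def growth_multiple(rate: float, years: int) -> float:
    """How many times larger a quantity becomes after compounding at `rate` for `years` periods."""
    return (1 + rate) ** years


if __name__ == "__main__":
    multiple = growth_multiple(0.5937, 2020 - 2016)
    print(f"Market size multiple over the forecast period: {multiple:.2f}x")  # about 6.45x
```

Under that assumption, a 59.37% CAGR means the market would grow roughly sixfold over the forecast period.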

