Recap of Apache Spark News for March

News on Apache Spark - March 2016

Hortonworks joins with Hewlett Packard to accelerate Apache Spark. March 1, 2016. 

Hewlett Packard Labs announced a new collaboration with Hortonworks to accelerate the growth and adoption of Apache Spark. The two companies will focus on a new class of analytics workloads that benefit from large pools of shared memory. Scott Gnau, CTO of Hortonworks, says the collaboration aims to advance Apache Spark and its big data solutions.

Syncsort delivers native mainframe format for Apache Spark and Hadoop. March 4, 2016. 

Syncsort, the mainframe and big data company, has made it possible for companies to work with mainframe data in its native format in Apache Spark and Hadoop. The main goal is to let companies in industries such as banking and insurance keep their data in the native format while still applying Apache Spark solutions.

Apache Spark and Hadoop are not mutually exclusive but a match made in heaven. March 7, 2016. 

A report circulated recently claiming that, with enterprises adopting better computing engines, Hadoop is now a dead technology. In fact, Hadoop is very much alive and kicking. Apache Spark is just a general-purpose computation engine that processes data in memory. If you think Apache Spark is going to replace Hadoop, you are short-changing yourself.

Apache Spark has a contender and it is Apache Flink 1.0. March 9, 2016. 

Hadoop users need fast, easy-to-use stream processing for big data, and Apache Flink aims to meet that demand. Apache Flink is a strong contender to Apache Spark and released version 1.0, its first API-stable release, this week. Apache Spark is mostly used for in-memory processing, but it has serious limitations when processing incoming real-time data. Apache Flink is looking to resolve that issue.

SAP’s HANA Vora is leveraging Apache Spark execution to bridge the divide between enterprise and Hadoop data. March 15, 2016.

As big data matures and becomes more critical, it is important to make distributed processing and Hadoop systems available so that data can be analysed in a coherent way. SAP HANA Vora brings an Apache Spark-powered OLAP analysis solution to business analysts and data scientists to bridge the gap between structured and unstructured data.

After the recent Spark Summit East 2016 in New York, analytics veteran Thomas Dinsmore spoke on the wide enterprise adoption of Apache Spark. March 14, 2016.

Thomas Dinsmore, the analytics software veteran, has been following the progress of Spark since its first launch. Dinsmore states that Apache Spark has yet to reach full development maturity. It is not as fast at data processing as everyone seems to think, but it does warrant attention in the IT world.

Apache Spark and Scala creators to speak at the Scala conference in New York in May. March 15, 2016.

Martin Odersky, the creator of the Scala programming language, will deliver the opening keynote address on May 9, 2016 in New York City. Matei Zaharia, the creator of Apache Spark, will also deliver a keynote address. The 3-day conference runs from May 9 to 11 at New World Stages and will be followed by hands-on training courses in Spark and Scala on May 12 and May 13.

Apache Spark is running into speed bumps with enterprise-wide adoption. March 22, 2016. 

Apache Spark is running into speed bumps, which was expected, but as adoption of Spark grows, these limitations need to be overcome. Apache Spark can be run on a Hadoop distribution, on Mesos, or standalone in the cloud. Apache Spark hits a bump when run on Hadoop, which can be easily rectified by adding more hardware.

Apache Spark is gaining momentum in Big Data Analytics. March 23, 2016. 

At the recent Spark Summit East 2016 in New York, it was evident that big names are investing in Spark. In June last year, IBM announced that it would assign 3,500 researchers to Spark and donate IBM SystemML, its machine learning technology, to the Spark open source community.

Apache Spark is the answer to get real time insights on data and make speedy decisions. March 28, 2016. 

The time it takes to turn raw data into meaningful information can cost you the chance to convert a lead into a sale, yet very few companies manage to get this critical step right. Companies with a lot of resources, such as Netflix and Amazon, have perfected real-time analytics. Companies that cannot afford a large data analysis team can rely on Spark for fast streaming and real-time analytics.

Apache Spark will dominate the Big Data landscape by 2022, Wikibon says. March 30, 2016.

A recent research report by Wikibon predicted that the Apache Spark big data processing framework will account for more than one-third of big data spending by the end of 2022. From 2020 to 2022, Apache Spark will become the “design time foundation” for building predictive models through machine learning, accounting for 37% of all big data spending. With the increased adoption and evolution of Spark, this is the best time for big data professionals to master their skills in Apache Spark technology.

IBM adds Apache Spark platform to zSystems mainframe. March 31, 2016.

IBM released a new mainframe platform for Apache Spark. The new z/OS platform will help data scientists apply advanced analytics to the big data held in mainframe systems and gain valuable business insights in real time. Data scientists no longer need to move or download data from the mainframe; with zSystems they can apply in-memory analytics directly through Spark.


Spark Leads Big Data Boom, Researcher Says. March 31, 2016.

Market researcher Wikibon predicted that the global market for unified streaming analytics will be driven largely by enterprise adoption of Spark, accounting for 16% (about $11.5 billion) of overall big data spending by 2020. The three key drivers of increasing demand for big data technologies are maturing data lakes, emerging intelligent self-tuning systems, and evolving intelligent systems of engagement. Spark is seen as a crucial catalyst for these three emerging trends.





