Recap of Apache Spark News for April 2017


Five emerging technologies for rapid digital transformation, April 4, 2017

A survey of which technologies IT leaders would employ to move faster and improve agility in 2017 named Apache Spark as one of the important tools. The other technologies that stood out in the survey include Puppet, Capriza, Okta and MultiChain. Apache Spark provides agility because it can process large amounts of data quickly, which helps IT leaders make better business decisions with more confidence. Data processing that earlier required days or weeks can now be completed in just a few hours, or in real time, using Apache Spark.



Diablo Technologies Joins Hortonworks Partner Program, Achieves HDP Certification, April 4, 2017

Diablo Technologies announced that it has joined the Hortonworks ISV/IHV partner program and achieved “Product Integration Certification for the Supermicro® Memory1™ server on the Hortonworks Data Platform (HDP™)”. Diablo’s certification with HDP will now allow customers to leverage terabytes of high-performance application memory on a well-integrated Spark platform. The advantages of Apache Spark deployments on Diablo-powered Memory1 include 60% reduced TCO, more than 2x performance improvement per cluster, 5:1 server consolidation and an approximately 389% work-per-server advantage.


New MapR Ecosystem Pack Optimizes Security and Performance for Apache Spark, April 10, 2017

MapR released version 3.0 of its MapR Ecosystem Pack (MEP) program, which provides enhanced security for Apache Spark, new Spark connectors for MapR-DB and HBase, a faster version of Hive, and updates and integrations with Drill. The latest version of MEP includes Apache Spark 2.1.0, focusing on enhancements in enterprise-ready stability and security, and a native Spark connector for MapR-DB JSON that will ease the process of developing real-time and batch pipelines.



Databricks Eyes Data Engineers With Spark, April 12, 2017

Databricks rolled out the latest version of its cloud-based platform on Spark to target data engineering workloads. The new cloud-based data science platform will allow data engineers to combine structured streaming, ETL, SQL and machine learning workloads on Spark. The goal of this platform is to support the secure deployment of multiple data pipelines in production. The new platform addresses the increasing demand for data engineers, a hybrid job role between data scientists and data analysts.


Impetus Technologies Reveals Winners of Spark Streaming Innovation Contest, April 18, 2017

600 people participated in the Spark Streaming Innovation Contest. Registrants from around the world competed to solve a real-world anomaly detection problem using the visual development platform StreamAnalytix, which leverages Apache Spark in batch and streaming modes to create real-time ML applications. The participants were evaluated on the quality of the application built, the extent and quality of StreamAnalytix usage, and how well the solution was documented. A total of $18,000 in prize money was awarded to the winners: the grand prize ($10,000) went to Venu Kanaparthy of Redlands, California; the first runner-up prize ($5,000) to Anindya Saha of Foster City, California; and the second runner-up prize ($3,000) to Kalyan Janaki of Denver, Colorado.



Spark processing engine more at home in cloud, Databricks CEO says, SearchDataManagement, April 20, 2017

Databricks CEO Ali Ghodsi said that Apache Spark has seen an increased rate of adoption since 2016. He added that most Fortune 500 companies are looking to move quickly to analytics in the cloud, for the following reasons:

i) The cost of cloud services has become extremely competitive.

ii) People are more inclined to move to the cloud to improve their security.

iii) Rate of innovation is faster in the cloud.

