Recap of Apache Spark News for April 2017

Five emerging technologies for rapid digital transformation, April 4, 2017.

A survey of the technologies IT leaders would employ to move faster and with more agility in 2017 highlighted Apache Spark as one of the most important tools for agility this year. Other technologies that stood out in the survey include Puppet, Capriza, Okta, and MultiChain. Apache Spark provides agility through its ability to process large amounts of data quickly, which can help IT leaders make better business decisions with more confidence. Data processing that earlier required weeks or days can now be completed in just a few hours, or in real time, using Apache Spark.



Diablo Technologies Joins Hortonworks Partner Program, Achieves HDP Certification, April 4, 2017.

Diablo Technologies announced that, through its partnership in the Hortonworks ISV/IHV program, it has achieved “Product Integration Certification for the Supermicro® Memory1™ server on the Hortonworks Data Platform (HDP™)”. Diablo’s certification with HDP will now allow customers to leverage terabytes of high-performance application memory on a well-integrated Spark platform. The advantages of Apache Spark deployments on Diablo-powered Memory1 include 60% reduced TCO, more than 2x performance improvement per cluster, 5:1 server consolidation, and approximately 389% more work per server.


New MapR Ecosystem Pack Optimizes Security and Performance for Apache Spark, April 10, 2017.

MapR released version 3.0 of its MapR Ecosystem Pack (MEP) program, which provides enhanced security for Apache Spark, new Spark connectors for MapR-DB and HBase, a faster version of Hive, and updates and integrations with Drill. The latest version of MEP includes Apache Spark 2.1.0, focusing on enhancements in enterprise-ready stability and security, and a native Spark connector for MapR-DB JSON that will ease the process of developing real-time and batch pipelines.



Databricks Eyes Data Engineers With Spark, April 12, 2017.

Databricks rolled out the latest version of its cloud-based Spark platform to target data engineering workloads. The new platform will allow data engineers to combine structured streaming, ETL, SQL, and machine learning workloads on Spark, and aims to enable the secure deployment of multiple data pipelines in production. It addresses the increasing demand for data engineers, a hybrid job role between data scientist and data analyst.


Impetus Technologies Reveals Winners of Spark Streaming Innovation Contest, April 18, 2017.

600 people from around the world participated in the Spark Streaming Innovation Contest, competing to solve a real-world anomaly detection problem using StreamAnalytix, a visual development platform that leverages Apache Spark in batch and streaming modes to create real-time ML applications. Participants were evaluated on the quality of the application built, the extent and quality of StreamAnalytix usage, and how well the solution was documented. A total of $18,000 in prize money was awarded: the grand prize of $10,000 went to Venu Kanaparthy of Redlands, California; the first runner-up prize of $5,000 to Anindya Saha of Foster City, California; and the second runner-up prize of $3,000 to Kalyan Janaki of Denver, Colorado.



Spark processing engine more at home in cloud, Databricks CEO says. SearchDataManagement, April 20, 2017.

Databricks CEO Ali Ghodsi said that Apache Spark has seen an increased rate of adoption since 2016. He added that most Fortune 500 companies are looking to move quickly to doing analytics in the cloud for the following reasons -

i) The cost of cloud services has become extremely competitive.

ii) People are more inclined to move to the cloud to improve their security.

iii) The rate of innovation is faster in the cloud.


Relevant Projects

Real-time Auto Tracking with Spark-Redis
Spark Project - Discuss real-time monitoring of taxis in a city. The real-time data streaming will be simulated using Flume. The ingestion will be done using Spark Streaming.

Hadoop Project for Beginners-SQL Analytics with Hive
In this hadoop project, learn about the features in Hive that allow us to perform analytical queries over large datasets.

Event Data Analysis using AWS ELK Stack
This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. Tools used include Nifi, PySpark, Elasticsearch, Logstash and Kibana for visualisation.

Analysing Big Data with Twitter Sentiments using Spark Streaming
In this big data spark project, we will do Twitter sentiment analysis using spark streaming on the incoming streaming data.

Online Hadoop Projects - Solving the small file problem in Hadoop
In this hadoop project, we are going to be continuing the series on data engineering by discussing and implementing various ways to solve the hadoop small file problem.

Real-Time Log Processing in Kafka for Streaming Architecture
The goal of this apache kafka project is to process log entries from applications in real-time using Kafka for the streaming architecture in a microservice sense.

Web Server Log Processing using Hadoop
In this hadoop project, you will use a sample application log file from an application server to demonstrate a scaled-down server log processing pipeline.

Hadoop Project-Analysis of Yelp Dataset using Hadoop Hive
The goal of this hadoop project is to apply some data engineering principles to Yelp Dataset in the areas of processing, storage, and retrieval.

Analyse Yelp Dataset with Spark & Parquet Format on Azure Databricks
In this Databricks Azure project, you will use Spark & Parquet file formats to analyse the Yelp reviews dataset. As part of this you will deploy Azure data factory, data pipelines and visualise the analysis.

Implementing Slowly Changing Dimensions in a Data Warehouse using Hive and Spark
Hive Project - Understand the various types of SCDs and implement these slowly changing dimensions in Hadoop Hive and Spark.