Recap of Apache Spark News for October

News on Apache Spark - October 2016

What's Driving Apache Spark Growth? SQL, Streaming and Machine, October 3, 2016

A recent report by Databricks Inc., the primary commercial steward for Spark, highlighted that the technology is growing exponentially. Apache Spark growth is driven by the use of SQL, machine learning and streaming analytics. Over 900 organizations and 1615 participants responded to the survey, including data scientists, data architects, data engineers, and others. The survey highlighted the fastest growing areas of the Apache Spark ecosystem: 39% of Spark users were leveraging the machine learning library MLlib, 57% were streaming users, 67% were Spark SQL users, and the top growth area was the DataFrame API with 153% growth.

Spark Release 2.0.1, October 3, 2016

Apache Spark version 2.0.1 was released recently. It is a maintenance release containing 300 stability and bug fixes, based on the branch-2.0 maintenance branch of Spark. All 2.0.0 users are strongly recommended to upgrade to this stable release.

The New Mainstream Appeal of Apache, October 13, 2016

A recent survey from Databricks shows that Spark’s momentum has increased exponentially in the past year and that it is more popular than ever. The survey states that the number of users has tripled since 2015, totalling 225,000. This shows the increased adoption of Apache Spark among businesses.


Apache Spark: seventh heaven for developers, October 13, 2016

Apache Spark has gained prominence in the big data domain because of its parallel data processing capabilities. It allows you to rapidly develop big data applications for machine learning, stream processing and graph analytics. A key concern when handling big data is speed. A notable difference between Spark and Hadoop MapReduce is that Spark has an optimized directed acyclic graph (DAG) execution engine, which results in an efficient query plan for data transformations.
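
To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Spark's API; the `LazyDataset` class and its methods are hypothetical names chosen for illustration. Transformations are only recorded as a plan (here a linear chain, the simplest kind of DAG), and the `collect` action then runs every recorded step in a single pass over the data. This is loosely how a planned pipeline avoids writing out intermediate results after each step.

```python
from typing import Any, Callable, Iterable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, source: Iterable[Any], plan: Tuple[Step, ...] = ()):
        self._source = source
        self._plan = plan  # recorded transformations, in order

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Record the step only; nothing is computed yet.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Record the step only; nothing is computed yet.
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    def explain(self) -> List[str]:
        # Show the recorded plan without running it.
        return [kind for kind, _ in self._plan]

    def collect(self) -> List[Any]:
        # The "action": apply all recorded steps to each item in one pass.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._plan:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


events = LazyDataset(range(10))
pipeline = events.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(pipeline.explain())  # ['map', 'filter']
print(pipeline.collect())  # [0, 4, 16, 36, 64]
```

A real engine does far more (partitioning, shuffles, optimization of the plan), so treat this only as an intuition for deferred, planned execution.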

SlamData 4.0 Adds Analytic Support for Apache Spark, Couchbase, and, October 18, 2016

SlamData Inc., the company building the industry's first complete BI solution for complex modern data, announced the release of SlamData 4.0. "SlamData's mission has always been to completely solve the biggest problem facing enterprise BI: data chaos," said Jeff Carr, SlamData's CEO and cofounder. SlamData 4.0 provides new connectors for modern data sources, adding support for MarkLogic, MongoDB, Spark on Hadoop and Couchbase.

Spark architecture finds place at center of big data, October 25, 2016

Spark architecture once had a very limited role in the big data architecture at Webtrends, a company that collects user activity data from websites and mobile devices. Now, however, Apache Spark plays a critical role in the company's new analytics platform and is at the heart of the Webtrends Infinity Analytics application. Webtrends set up a 160-node Apache Spark system to optimize online marketing campaigns in real time by analysing the activity data streaming into its Hadoop clusters.

Alluxio launches its memory-centric storage system for big data, October 26, 2016

Alluxio, founded by Haoyuan Li, allows distributed computing frameworks like Apache Hadoop and Spark to access big data through memory-centric storage by providing a unified namespace across distributed storage systems. It can be thought of as a sophisticated cache for big data workloads. Alluxio launched Enterprise and Community editions of the software to monetize its work by offering advanced features and support.


