Recap of Apache Spark News for July

News on Apache Spark - July 2016

MongoDB “connects” with Apache Spark with its new connector. July 4, 2016.

MongoDB Inc. has announced a new connector for Apache Spark that will allow Spark developers and data scientists to use its database to work with rapidly moving data. Kelly Stirman, MongoDB’s VP of Strategy, has said that MongoDB users have expressed great interest in working with the Spark connector. MongoDB has taken its Apache Hadoop connector and enhanced it for Spark.


Sparkling Water 2.0 enables machine learning with Apache Spark. July 4, 2016.

Sparkling Water 2.0, a new tool created by the startup H2O.ai (earlier known as 0xdata Inc.), provides an open source platform for algorithmic development. It makes machine learning algorithms easier to use during data analysis. Instead of using Apache Spark’s machine learning library MLlib, the Sparkling Water 2.0 application programming interface allows users to tap into H2O’s AI platform. The tool lets users combine Spark features with H2O’s own columnar compression, fully featured machine learning algorithms and speed.

Not all Apache Spark support is the same. Choose wisely. July 7, 2016.

Apache Spark has become extremely popular since its launch in 2012, and over the last year it has gained momentum in enterprise adoption. But not all Apache Spark support is the same. Customers should look at four main facets before using Spark libraries: how Spark is used in the platform, what is available in the Apache Spark package, how everyone on the team is exposed to Spark, and how to perform analytics with the various libraries in Spark.


Splice Machine, which uses Hadoop and Spark, took its new RDBMS Sandbox live on Amazon Web Services (AWS). June 18, 2016.

Splice Machine, an open source RDBMS powered by Hadoop and Spark, has announced its new open source Sandbox for developers. The new open source Sandbox 2.0 community edition is now up for testing on AWS.


TIBCO’s Apache Spark accelerator is out and about to make the fast Spark faster. July 26, 2016.

Hayden Schultz, the global architect for TIBCO, talks about how to bring technology that is making waves in the industry closer to customers' understanding and use. TIBCO focuses on building applications that boost a technology’s core features. In the case of Apache Spark, the accelerator will help build accelerated systems on top of Apache Spark to speed up big data solutions.


Databricks unveils commercial support for Apache Spark 2.0. July 28, 2016.

Apache Spark 2.0 is now available to users on the Databricks platform. Spark 2.0 is reported to be 5 to 10 times faster than Spark 1.6 and adds support for applications requiring Structured Streaming. Tungsten's Phase 2 whole-stage code generation and Catalyst's code optimization add to the enhanced speed of Spark 2.0. The latest release comes bundled with many new features, such as machine learning model persistence, DataFrame-based machine learning APIs and standard SQL support.

Apache MLlib: making practical machine learning easy and scalable. July 29, 2016. Jaxenter

Machine learning might seem futuristic, but it is not. Apache Spark’s scalable machine learning library, MLlib, is making machine learning easier for machine learning engineers and data scientists. MLlib does not only fit models; it can also be used for other stages of the workflow, such as data collection, data labelling, feature extraction, model tuning, model evaluation and deployment. Together with other Apache Spark components, MLlib provides a unified solution for data scientists under a single big data framework.


