Recap of Apache Spark News for September

News on Apache Spark - September 2016

GigaSpaces Launches the Next Generation Apache Spark, September 6, 2016

GigaSpaces, a provider of in-memory computing technologies, launched InsightEdge, a data-grid-enabled real-time analytics platform for faster data analytics with Apache Spark. “Gaining meaningful intelligence from data has typically been hindered by the speed at which data becomes usable. InsightEdge makes it possible for companies to make decisions that are informed by all of the data they have available at any given second,” said Ali Hodroj, VP of Products and Strategy in the IMC Business Unit for GigaSpaces.


Azul's New Zing Release Brings Cassandra, Spark Performance, September 7, 2016

Azul, a leading provider of Java runtime solutions for developers, has released the latest version of Zing, its Java runtime that replaces legacy JVMs. The latest version of the Zing JVM improves the performance and scalability of DataStax, Cassandra and Apache Spark applications.

Taking Spark Apps from Prototype to, September 12, 2016

Cask Data Application Platform (CDAP) is an open source, enterprise-ready integration platform that provides the components needed to build a production-ready data platform around Spark. The latest version, CDAP 3.4, provides easy-to-use APIs for Java and Scala, along with support for Spark SQL, Spark Streaming, and fine-grained transactions with Apache Tephra.

Spark GraphX in Action Book Review and Interview. InfoQ, September 12, 2016

“Spark GraphX in Action”, by Michael Malak and Robin East, provides complete, tutorial-based coverage of the graph processing library GraphX. Readers can learn how to use SQL with Spark graphs through the GraphFrames API and how to apply machine learning algorithms to graph data.

Apache Spark Earns Datanami Awards for Machine Learning, Real-time Analytics, and More, September 19, 2016

Apache Spark, the fast and flexible big data processing engine created by Matei Zaharia, was honoured with four awards at the Datanami Readers' and Editors' Choice Awards, with votes contributed by the big data community worldwide. The four honours include:
1) Readers' Choice - Best Big Data Product or Technology: Machine Learning 
2) Readers' Choice - Best Big Data Product or Technology: Real-Time Analytics 
3) Readers' and Editors' Choice - Top 5 Open Source Projects to Watch 
4) Readers' Choice - Best Big Data Startup: Databricks


Apache Spark Survey 2016 Results Now, September 27, 2016

Databricks revealed the results of the Apache Spark Survey, conducted in July 2016 to analyse growth trends in the Spark community. The survey drew 1,615 Apache Spark users from 900 different companies. The results point to continued growth for Apache Spark, the Swiss Army knife of big data: the number of Spark users and meetup members has tripled since 2015, and the number of people working on Spark projects has gone up by 67%.


sparklyr: R interface for Apache Spark, September 27, 2016

Growing demand for a native dplyr interface to Apache Spark has led to sparklyr, a new package that lets R programmers work with big data in Apache Spark. Industry giants have already started using the new interface: IBM has incorporated sparklyr into its Data Science Experience, H2O has built an integration between H2O Sparkling Water and sparklyr, and Cloudera is experimenting with the interface to ensure that it meets the requirements of its enterprise customers.



