Recap of Apache Spark News for May

News on Apache Spark - May 2016

Google's Cloud Dataflow Outperforms Apache Spark. May 3, 2016.

In a recent benchmark study by Mammoth Data Inc., Google's Cloud Dataflow service beat Apache Spark by a wide margin. The result is contested, since the study was sponsored by Google. Mammoth Data counters that it uses Apache Spark in its own consultancy work and argues that the study is therefore objective.

Airbnb Builds on Apache Spark Work With Mobile Matching Engine. May 4, 2016. The Wall Street Journal

Startup darling Airbnb is using Spark to better match short-term renters with landlords. Data analytics is at the heart of the home-listing service, which renders personalized search results for its users. To understand those users better, Airbnb has built a new matching engine for its mobile application on Apache Spark and its machine learning capabilities. Airbnb is a prime example of a company making good use of big data technologies and tools for business growth.

Spark 2.0: Databricks has launched a faster and more efficient version. May 11, 2016.

Spark 1.0 was launched almost two years ago, and a change was due. Spark 2.0 is the result of a continuous effort to double down on what Spark does well and to mitigate its limitations. It improves SQL support and streamlines the APIs, and it is reported to run up to 10x faster than Spark 1.0.

Updates to TIBCO Analytics Include New Apache Spark and IoT Functionality. May 21, 2016. App Developer Magazine

TIBCO's analytics platform applies big data analytics to augmented intelligence, using streaming analytics solutions to offer a practical approach to cognitive computing. TIBCO has announced several updates to the platform: new data wrangling features in TIBCO Spotfire, enhanced BI support in TIBCO Jaspersoft, code-free operational intelligence dashboards in TIBCO LiveView Web, and a new accelerator package for Apache Spark and IoT. More detail on the accelerator package is available at App Developer Magazine.

IBM Uses Apache Spark across Its Products to Help Enterprise Customers. May 24, 2016.

IBM is harnessing Spark and other emerging big data technologies such as Zeppelin, Flink, and Storm to handle unstructured and streaming data in a single memory-efficient platform. IBM offers Spark-as-a-Service in the cloud and is embedding Spark into the Watson analytics platform. IBM has also moved its ETL platform on top of Apache Spark, which has helped it cut the ETL codebase from 40 million lines to 4 million.

Datadog Adds Hadoop and Spark Integrations to Leading Cloud-Scale Monitoring Platform. May 24, 2016. Business Wire

Datadog, a leading vendor of SaaS-based monitoring for cloud applications, has announced support for Hadoop and Spark. The Datadog monitoring service brings together data from servers, applications, databases, tools, and services to give users a unified view of the apps they host and run in the cloud. The platform will now integrate with HDFS, Hadoop MapReduce, YARN, and Spark, so that users of those technologies can take advantage of Datadog's dashboards, full-stack visibility, targeted alerts, and collaboration tools.

Could Concord topple Apache Spark from its big data throne? May 25, 2016. TechRepublic

After Hadoop, Spark has joined the big data elite and has the most active big data development community in the world, but whether it will stay there is the big question. With the new stream-based data processing project Concord, Apache Spark's reign seems to be at risk. Concord, built on top of Apache Mesos, fills the gap left by Apache Spark in event-based and low-latency streaming. Will it be able to topple Spark? Let's wait and watch.

Apache Spark 2.0 Technical Preview. May 31, 2016. InfoQ

Databricks has announced the technical preview of Apache Spark 2.0. The preview is intended only to gather feedback from the community before the release is ready for production use. The new release builds on feedback from the big data community and emphasizes two major areas of improvement: the SQL interface and the programming APIs.
