Recap of Apache Spark News for May 2018


SnapLogic Introduces SnapLogic eXtreme to Help Data Engineers Operationalize Cloud-Based Big Data, May 2, 2018

SnapLogic announced SnapLogic eXtreme, a new solution that supports complex data processing on cloud big data services such as Microsoft Azure HDInsight, Amazon Elastic MapReduce (EMR), and Google Cloud Dataproc. Big data engineers and data integrators can now use SnapLogic eXtreme together with the company's Enterprise Integration Cloud platform to build powerful Apache Spark pipelines and manage cloud data architectures without having to write complex code. SnapLogic eXtreme helps businesses reduce operational time and cost while closing the skills gap that keeps them from making the most of cloud-based data lakes and services.



Of all the businesses in the world, real estate looks among the simplest from the outside. What could be complex about buying or selling a property? In practice, the decision making is far from straightforward. The decision to buy or sell depends on many parameters, which were earlier assessed manually and depended entirely on an individual's judgment. With big data, these decisions can now be made much faster, and solutions can be tailored to each client. Below are some of the use cases in which big data proves beneficial for real estate businesses:

1) Finances

2) Appraisals

3) Insurance analytics

4) Targeted marketing

5) Money laundering prevention.



‘DOTA analytics’: Big data meets e-sports in software giant deal with Team, May 10, 2018

The appeal of big data is evident from the way it has transformed industry after industry, and big data analytics is now applied in even the most complex of them. The latest example is the deal between SAP and Team Liquid, one of the top e-sports teams. SAP HANA has also been used in the past to provide data analytics services to the German national football team, which helped the team win the 2014 World Cup. Team Liquid and SAP will partner on machine learning software that provides on-demand analytics and even predicts the team's success rate in each match. The software will also aim to improve the team's gold collection.

Vodafone India leverages Artificial Intelligence and Big Data, May 16, 2018

The 21st century is the dawn of the digitization era, and current technology trends bear this out. Businesses are increasingly focused on customer-specific experiences. This is visible in the long-running competition among India's telcos, which spans everything from customized user experiences to tariff plans. Vodafone, India's second-largest telecom company, has moved its IT infrastructure to leverage big data and AI. It recently launched a chatbot, TOBi, which will be integrated with its mobile apps for customer service and even remote troubleshooting. Vodafone hopes these digital initiatives will help it serve its 22.5 crore subscribers. The same trend can be seen among carriers worldwide: by gaining insight into each customer's voice and data consumption, they can tailor their service offerings to specific user needs and reduce their dependence on physical customer care units.


Google acquires Cask Data to beef up its tools for building and running big data analytics, May 17, 2018

In today’s fast-paced world, we expect everything in the blink of an eye, whether it is information or the infrastructure that processes it. Cloud-based environments give us access to on-demand infrastructure. In this area Google, with $4 billion in revenue, is still catching up with AWS at $20 billion and Microsoft at $21 billion, both in revenue and as the cloud of choice for corporations. To keep pace, Google has acquired Cask Data, a startup that specializes in solutions for big data analytics services on Hadoop. Cask’s key product is the Cask Data Application Platform (CDAP), which reduces the time and complexity of building big data applications and makes it easier for enterprises to run them. According to Cask’s blog post, CDAP will be updated with new features by both the Google Cloud and Cask teams and will remain open source. The financial terms were not disclosed, but Cask was valued at $40 million at the time of the deal.

