Recap of Apache Spark News for May 2018



SnapLogic Introduces SnapLogic eXtreme to Help Data Engineers Operationalize Cloud-Based Big Data, May 2, 2018

SnapLogic announced SnapLogic eXtreme, a new solution that supports complex data processing on cloud big data services such as Microsoft Azure HDInsight, Amazon Elastic MapReduce (EMR), and Google Cloud Dataproc. Big data engineers and data integrators can now use SnapLogic eXtreme together with the company's Enterprise Integration Cloud platform to build powerful Apache Spark pipelines and manage cloud data architectures without having to write complex code. SnapLogic eXtreme helps businesses reduce operational time and cost while closing the skills gap that keeps them from getting the most out of cloud-based data lakes and services.




From the outside, real estate looks like one of the simplest businesses in the world. What could be complex about buying or selling a property? In practice, the decision making is far from straightforward. The decision to buy or sell depends on many parameters, and it used to be made manually, relying entirely on an individual's judgment. Thanks to big data, these decisions can now be made in the blink of an eye, with solutions tailored to each client. Below are some of the use cases in which big data proves beneficial for real estate businesses:

1) Finances

2) Appraisals

3) Insurance analytics

4) Targeted marketing

5) Money laundering prevention.



‘DOTA analytics’: Big data meets e-sports in software giant deal with Team, May 10, 2018

The appeal of big data is evident from the way it has transformed industry after industry, including some of the most diverse and complex ones. The latest example is the deal between SAP and one of the top e-sports teams, Team Liquid. SAP has previously provided data analytics services, built on SAP HANA, to the German national football team, which helped the team win the 2014 World Cup. Team Liquid and SAP will partner on machine learning software that provides on-demand analytics and even predicts the team's success rate in each match. The software will also aim to improve the team's gold collection.

Vodafone India leverages Artificial Intelligence and Big Data, May 16, 2018

The 21st century marks the dawn of the digital era, and current technology trends bear this out. Those trends show businesses focusing increasingly on customer-specific experiences. This is visible in the long-running war among Indian telcos, which spans everything from customized user experiences to tariff plans. Vodafone, India's second-largest telecom company, has moved its IT infrastructure to leverage big data and AI. It recently launched a chatbot, TOBi, which will be integrated with its mobile apps for customer service and even remote troubleshooting. Vodafone hopes these digital initiatives will help it serve its 22.5 crore (225 million) subscribers. The same trend can be seen among carriers worldwide: by gaining insight into each customer's voice and data consumption profile, they can tailor their service offerings to specific user needs and reduce their dependency on physical customer care units.


Google acquires Cask Data to beef up its tools for building and running big data analytics, May 17, 2018

In today’s fast-paced world, we want everything in the blink of an eye, whether it is information or the infrastructure that processes it. Cloud-based environments give us access to on-demand infrastructure. In this area Google, with about $4 billion in revenue, is still catching up with AWS ($20 billion) and Microsoft ($21 billion), both in revenue and as the cloud of choice for corporates. To keep pace, Google has acquired Cask Data, a startup that specializes in solutions for big data analytics services on Hadoop. Cask's key product is the Cask Data Application Platform (CDAP), which reduces the time and complexity of building big data applications and makes it easier for enterprises to run them. According to Cask’s blog post, CDAP will be updated with new features by both the Google Cloud and Cask teams and will remain open source. The financial terms were not disclosed; at the time of the deal, Cask was valued at $40 million.






Relevant Projects

Data Mining Project on Yelp Dataset using Hadoop Hive
Use the Hadoop ecosystem to glean valuable insights from the Yelp dataset. You will analyze the different patterns found in the Yelp dataset to come up with various approaches to solving a business problem.

Spark Project-Analysis and Visualization on Yelp Dataset
The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of the data processing into Elasticsearch. You will also use the visualization tool in the ELK stack to visualize various kinds of ad hoc reports from the data.

Online Hadoop Projects - Solving small file problem in Hadoop
In this Hadoop project, we continue the series on data engineering by discussing and implementing various ways to solve the Hadoop small file problem.

Implementing Slow Changing Dimensions in a Data Warehouse using Hive and Spark
Hive Project - Understand the various types of SCDs and implement these slowly changing dimensions in Hadoop Hive and Spark.

Analysing Big Data with Twitter Sentiments using Spark Streaming
In this big data Spark project, we will do Twitter sentiment analysis using Spark Streaming on the incoming streaming data.

Yelp Data Processing Using Spark And Hive Part 1
In this big data project, we will continue from a previous Hive project, "Data engineering on Yelp Datasets using Hadoop tools", and do the entire data processing using Spark.

Design a Hadoop Architecture
Learn to design Hadoop Architecture and understand how to store data using data acquisition tools in Hadoop.

Data Warehouse Design for E-commerce Environments
In this Hive project, you will design a data warehouse for e-commerce environments.

Hadoop Project for Beginners-SQL Analytics with Hive
In this Hadoop project, learn about the features in Hive that allow us to perform analytical queries over large datasets.

Tough engineering choices with large datasets in Hive Part - 2
This project continues the previous Hive project, "Tough engineering choices with large datasets in Hive Part - 1", and works on processing big datasets using Hive.