Recap of Apache Spark News for November 2017


The future of the future: Spark, big data insights, streaming and deep learning in the cloud, November 1, 2017

With Apache Spark booming and its community growing at a rapid pace, Spark is making waves in the big data ecosystem. Though Spark in the cloud is nothing new, Databricks is announcing its latest addition, Delta, a smart cache layer in the cloud that offers scalability and elasticity. A smart cache layer like Delta brings an array of benefits for people working in the cloud, provided they are willing to shell out big bucks. However, Databricks' major focus is on growing its proprietary platform by making streaming and deep learning work together in the cloud.


Microsoft launches Azure Databricks, a new cloud data platform based on Apache Spark, November 15, 2017.

Azure users interested in gleaning meaningful business insights by parsing huge amounts of data will soon be able to use Azure Databricks, a service built around the popular open source big data framework and developed in collaboration with Databricks. The first Spark-as-a-service offering from any of the major cloud vendors, Azure Databricks will be used to model real-time data patterns. For instance, the platform could be used to measure how guests in a hotel move around the lobby so the hotel can decide on the best placement of furniture and guest services.


Cloudera Bets Its Future on Scalability for Spark, GATK, November 15, 2017. 

Shawn Dolley, global industry leader of health and life sciences at Cloudera, said that Spark “is becoming the lingua franca of research computing pipeline generation”. Earlier, Cloudera was primarily a support organization for most of the big data technologies, but now one third of the demand for Cloudera services comes from people working on computational pipelines who want them to run on Apache Spark. Cloudera (in which Intel holds an 18% stake) is among the leading providers of support for Apache Spark when it comes to clinical data.

Big-data company Qubole brings Apache Spark to AWS Lambda, November 22, 2017.

Qubole is making Apache Spark easier and more flexible to use by giving its customers the ability to run Spark applications on the AWS Lambda service. Executing Spark apps on Lambda, a serverless compute service, requires customers to pay only for the compute power they use, without having to manage servers, making the platform elastic and efficient in terms of resource usage. This overcomes two major problems that previously made running Spark applications on Lambda a challenging task: first, Spark's inability to communicate directly with the AWS Lambda service, and second, AWS Lambda's runtime resources, which are limited to a maximum runtime duration of 5 minutes, 512 MB of disk space and 1536 MB of memory.


Azure gets Apache Spark, Cassandra and MariaDB, November 22, 2017.

Microsoft has incorporated various third-party platforms into its Azure cloud to help data analysts and developers. Its latest Azure capabilities include a beta Spark cluster computing platform named Azure Databricks that will help data analysts and developers glean insights from enterprise data. Developers can sign up for the beta version of Azure Databricks.



Relevant Projects

Analysing Big Data with Twitter Sentiments using Spark Streaming
In this big data Spark project, we will perform Twitter sentiment analysis using Spark Streaming on incoming streaming data.
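For intuition, the core per-tweet step in such a pipeline is a scoring function mapped over the stream. The sketch below is purely illustrative (it is not the project's actual code, and the tiny word lexicon is hypothetical); a real project would use a proper sentiment library:

```python
# Minimal lexicon-based sentiment scorer: a hypothetical sketch of the
# function one might map over a Spark Streaming DStream of tweet texts.
POSITIVE = {"good", "great", "love", "awesome", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def score_tweet(text):
    """Return (text, score): +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (text, score)

# In an actual Spark Streaming job this would be applied as, e.g.:
#   scored = tweets.map(score_tweet)   # `tweets` being a DStream of strings
```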

Create A Data Pipeline Based On Messaging Using PySpark And Hive - Covid-19 Analysis
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

Tough engineering choices with large datasets in Hive Part - 2
This is in continuation of the previous Hive project "Tough engineering choices with large datasets in Hive Part - 1", where we will work on processing big data sets using Hive.

Real-Time Log Processing in Kafka for Streaming Architecture
The goal of this Apache Kafka project is to process log entries from applications in real time, using Kafka as the streaming backbone in a microservices architecture.

Data processing with Spark SQL
In this Apache Spark SQL project, we will go through provisioning data for retrieval using Spark SQL.

Spark Project - Real-time data collection and Spark Streaming Aggregation
In this big data project, we will embark on real-time data collection and aggregation from a simulated real-time system using Spark Streaming.

Design a Hadoop Architecture
Learn to design Hadoop Architecture and understand how to store data using data acquisition tools in Hadoop.

Spark Project-Analysis and Visualization on Yelp Dataset
The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of data processing into Elasticsearch. Also, use the visualisation tool in the ELK stack to visualize various kinds of ad-hoc reports from the data.

Finding Unique URLs using Hadoop Hive
Hive Project - Learn to write a Hive program to find the first unique URL, given 'n' URLs.
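The underlying logic (find the first URL that occurs exactly once among n URLs) can be sketched in plain Python as a count-then-filter in two passes; a Hive solution would express the same idea in HiveQL with a GROUP BY and a HAVING count = 1:

```python
from collections import Counter

def first_unique_url(urls):
    """Return the first URL that appears exactly once, or None if there is none."""
    counts = Counter(urls)      # first pass: count occurrences of each URL
    for url in urls:            # second pass: preserve the original order
        if counts[url] == 1:
            return url
    return None
```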

Yelp Data Processing Using Spark And Hive Part 1
In this big data project, we will continue from a previous Hive project, "Data engineering on Yelp Datasets using Hadoop tools", and do the entire data processing using Spark.