Recap of Apache Spark News for March 2018

Visual Spark Studio IDE for Spark Apps, March 1, 2018

The new free Visual Spark Studio IDE allows Spark users to build, test, and run Spark applications on the desktop. The IDE is a free, cut-down version of the Impetus StreamAnalytix platform and acts as a lightweight development tool for processing data and performing analytics on Apache Spark. It can be used to build Spark applications in both batch and streaming mode, with ready-to-use operators to drag and drop, select, connect, and configure into a complete, functional Apache Spark pipeline. The IDE can also ingest data from local data sources, write to other data targets, and offers the machine learning capabilities of the StreamAnalytix platform in a single instance.

(Source - )

Databricks Announces Availability of Apache Spark 2.3 Within its Unified Analytics Platform, March 6, 2018

Databricks made Apache Spark 2.3.0 available on its Unified Analytics Platform through a new version of its compute engine, Databricks Runtime 4.0. Databricks Runtime 4.0 brings new features, including machine learning model export to simplify production deployments, along with performance optimizations. Apache Spark 2.3.0 is a major milestone: it introduces the continuous processing mode of Structured Streaming with millisecond-level latency, along with other enhancements across the project.

(Source - )

Splice Machine Releases Native Spark Integration For Data Science and IoT, March 8, 2018.

Splice Machine, the leading data platform that powers intelligent applications, announced the availability of its native Apache Spark DataSource to simplify and accelerate IoT and machine learning applications. The new connector provides a fast, native, ACID-compliant datastore for Apache Spark. Data engineers, data scientists, and Hadoop developers can use Apache Spark directly, without excessive data transfers in and out of Splice Machine. Splice Machine’s underlying Spark engine can be leveraged directly for advanced Spark capabilities such as Spark Streaming, MLlib, and Spark SQL. Moreover, because Splice Machine is native to Spark, it does not require data serialization and transfer across JDBC/ODBC connections.
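To see why a native DataSource matters, it helps to compare it with the generic JDBC path that the announcement says it avoids. The sketch below uses Spark's standard reader API only; the `"splice"` format name, JDBC URL, and table name are illustrative placeholders, not the connector's documented interface.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("splice-demo").getOrCreate()

# Generic JDBC path: every row is serialized through the JDBC driver,
# which becomes a bottleneck for large scans.
jdbc_df = (spark.read.format("jdbc")
           .option("url", "jdbc:splice://host:1527/splicedb")  # placeholder URL
           .option("dbtable", "iot_readings")                  # placeholder table
           .load())

# Native DataSource path (hypothetical format name): the connector runs
# inside the same Spark engine, so no per-row JDBC/ODBC serialization
# or transfer is needed.
native_df = (spark.read.format("splice")
             .option("table", "iot_readings")
             .load())
```

Both calls return an ordinary Spark DataFrame; the difference is where the rows travel before Spark sees them.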

(Source : )

Apache Spark Market – IoT Devices Anticipated to Drive Growth in the Near Future, March 12, 2018

Apache Spark has matured as a big data processing framework and become a mainstream solution at an ideal time, as IoT devices are increasingly proliferating in the market. IoT devices are expected to drive the Apache Spark market over the next few years, given the growing need to process large datasets. Demand for the Apache Spark framework is also anticipated to rise in the near future as fog computing gains popularity. Major players in the Apache Spark market include Cloudera, Inc., IBM Corporation, Databricks, MapR Technologies Inc., and Qubole, Inc.

(Source : )

What’s new in Apache Spark? Low-latency streaming and Kubernetes support, March 15, 2018.

Apache Spark 2.3 has been released with two major new features. The biggest is the most significant change to streaming since Spark Streaming was first added to the Apache Spark project: Spark 2.3 brings Continuous Processing to Structured Streaming, providing low-latency responses on the order of 1 millisecond instead of the roughly 100 milliseconds you are likely to get with micro-batching. The other major enhancement in 2.3 is native Kubernetes integration for executing Apache Spark jobs in container clusters. Native Kubernetes support means organizations no longer need to maintain temperamental Hadoop deployments just to run Spark; they can move their Spark deployments onto Kubernetes, either in the cloud or on-premises.
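As a minimal sketch of the new trigger, the helper below builds the keyword arguments you would pass to Structured Streaming's `writeStream.trigger(...)` call in Spark 2.3+; the commented usage at the end assumes a running cluster with a streaming source and is not executed here.

```python
def trigger_kwargs(mode: str, interval: str = "1 second") -> dict:
    """Build kwargs for DataStreamWriter.trigger() in Spark 2.3+.

    'continuous' -> experimental continuous processing (~1 ms latency);
                    the interval is the checkpoint frequency, not a batch size.
    anything else -> classic micro-batch mode (~100 ms latency);
                    the interval is the micro-batch cadence.
    """
    if mode == "continuous":
        return {"continuous": interval}
    return {"processingTime": interval}


# Usage against a live cluster (requires pyspark and a streaming DataFrame df):
#
# query = (df.writeStream
#            .format("console")
#            .trigger(**trigger_kwargs("continuous", "1 second"))
#            .start())
```

Switching a pipeline between the two modes is then a one-argument change, which is the point of the Spark 2.3 design: the rest of the Structured Streaming query stays identical.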

(Source : )

How Microsoft Azure Databricks can help companies speed big data and AI, March 23, 2018

The big data analytics platform, based on Apache Spark and optimized for Azure, will help organizations better integrate and scale machine learning projects. AWS continues to lead the cloud space, adopted by 64% of firms in 2018, but Azure is also gaining popularity, with 45% of organizations using it. One early adopter reported that using Microsoft Azure Databricks enhanced the productivity of its data science team by more than 50%. Azure Databricks is integrated with other Azure services, including Azure Data Lake Store, Azure SQL Data Warehouse, Azure Cosmos DB, and Azure IoT Hub.
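As an illustration of that integration, the configuration fragment below shows one way a Databricks notebook might read from Azure Data Lake Store (Gen1) using OAuth service-principal credentials. The account name, path, and credential values are placeholders, and the exact configuration keys may vary by runtime version, so treat this as a sketch rather than a reference.

```python
# Runs inside a Databricks notebook, where `spark` is predefined.
# All bracketed values are placeholders -- replace with your own.
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "<application-id>")
spark.conf.set("dfs.adls.oauth2.credential", "<client-secret>")
spark.conf.set("dfs.adls.oauth2.refresh.url",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Read a CSV straight out of the lake into a Spark DataFrame.
df = spark.read.csv("adl://<account>.azuredatalakestore.net/events/2018/03/",
                    header=True)
```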

(Source : )
