Recap of Apache Spark News for March 2018

Visual Spark Studio IDE For Spark Apps. March 1, 2018.

The new free Spark Studio IDE lets Spark users build, test, and run Spark applications on the desktop. The IDE is a free, cut-down version of the Impetus StreamAnalytix platform and acts as a lightweight development tool for processing data and performing analytics on Apache Spark. It can be used to build Spark applications for both batch and streaming modes, with ready-to-use operators that users drag and drop, select, connect, and configure to form a fully functional Apache Spark pipeline. The IDE can also work with local data sources and other data targets, and it offers all the machine learning capabilities of the StreamAnalytix platform in a single instance.

(Source - )

Databricks Announces Availability of Apache Spark 2.3 Within its Unified Analytics, March 6, 2018

Databricks made Apache Spark 2.3.0 available on its Unified Analytics Platform as part of Databricks Runtime 4.0. Databricks Runtime 4.0 adds new features, including Machine Learning Model Export to simplify production deployments, as well as performance optimizations. Apache Spark 2.3.0 is a major milestone because it introduces the continuous processing mode of Structured Streaming, with millisecond-level latency, along with other enhancements across the project.

(Source - )

Splice Machine Releases Native Spark Integration For Data Science and IoT, March 8, 2018.

Splice Machine, a data platform for powering intelligent applications, announced the availability of its native Apache Spark DataSource to simplify and accelerate IoT and machine learning applications. The new connector provides a fast, native, ACID-compliant datastore for Apache Spark. Data engineers, data scientists, and Hadoop developers can use Apache Spark directly without excessive data transfers in and out of Splice Machine. Splice Machine's underlying Spark engine can be used directly to take advantage of advanced Spark capabilities such as Spark Streaming, MLlib, and Spark SQL. Moreover, because Splice Machine is native to Spark, it does not require data serialization and transfer across JDBC/ODBC connections.

(Source : )

Apache Spark Market – IoT devices are anticipated to drive in Near, March 12, 2018

Apache Spark has matured as a big data processing framework and has become a mainstream solution at an ideal time, as the number of IoT devices in the market keeps growing. IoT devices are expected to drive the Apache Spark market over the next few years because of the increasing need to process large datasets. Demand for the Apache Spark framework is also anticipated to rise in the near future as fog computing gains popularity. Major players in the Apache Spark market are Cloudera, Inc., IBM Corporation, Databricks, MapR Technologies Inc., and Qubole, Inc.

(Source : )

What’s new in Apache Spark? Low-latency streaming and, March 15, 2018.

Apache Spark 2.3 has been released with two notable features. The first is the biggest change to streaming operations since Spark Streaming was added to the Apache Spark project: Spark 2.3 brings Continuous Processing to Structured Streaming, which can provide low-latency responses on the order of 1 millisecond instead of the roughly 100 milliseconds you are likely to get with micro-batching. The other major enhancement in 2.3 is native Kubernetes integration for running Apache Spark jobs on container clusters. Hadoop deployments can be temperamental beasts, and native Kubernetes support gives organizations an alternative: they can move their Spark deployments onto Kubernetes, either in the cloud or on-premises.
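
In code, the difference between micro-batching and Continuous Processing comes down to the trigger passed when a Structured Streaming query starts. The following is a minimal PySpark sketch, not taken from the source article. It assumes Spark 2.3 or later and uses the built-in `rate` test source and `console` sink. The helper names `trigger_kwargs` and `start_rate_query` are hypothetical and exist only for illustration.

```python
def trigger_kwargs(mode, interval):
    """Build keyword arguments for DataStreamWriter.trigger().

    Hypothetical helper for this sketch. In micro-batch mode the interval
    is how often a batch is started; in continuous mode it is how often
    progress is checkpointed, not the per-record latency.
    """
    if mode == "continuous":
        return {"continuous": interval}
    if mode == "micro-batch":
        return {"processingTime": interval}
    raise ValueError("mode must be 'continuous' or 'micro-batch'")


def start_rate_query(spark, mode="continuous", interval="1 second"):
    """Start a test stream that prints generated rows to the console.

    `spark` is an existing SparkSession (Spark 2.3+). The query uses no
    aggregations, since continuous mode only supports map-like
    operations such as select and filter.
    """
    events = (
        spark.readStream
        .format("rate")                # built-in source that generates rows
        .option("rowsPerSecond", 10)
        .load()
    )
    return (
        events.writeStream
        .format("console")
        .trigger(**trigger_kwargs(mode, interval))
        .start()
    )
```

Given a running session, `start_rate_query(spark).awaitTermination()` would run the stream in continuous mode, while `start_rate_query(spark, "micro-batch", "100 milliseconds")` would run the same query with micro-batches.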

(Source : )

How Microsoft Azure Databricks can help companies speed big data and AI, March 23, 2018

Azure Databricks, a big data analytics platform based on Apache Spark and optimized for Azure, is intended to help organizations better integrate and scale machine learning projects. AWS continues to lead the cloud space, adopted by 64% of firms in 2018, but Azure is also gaining popularity, with 45% of organizations using it. One company using Microsoft Azure Databricks was able to improve the productivity of its data science team by more than 50%. Azure Databricks is integrated with other Azure services, including Azure Data Lake Store, Azure SQL Data Warehouse, Azure Cosmos DB, and Azure IoT Hub.

(Source : )
