Recap of Apache Spark News for January

News on Apache Spark - January 2016

Spark 1.6 is released. January 5, 2016

The Apache Software Foundation released Spark 1.6 on January 4, 2016. This version of Apache Spark offers better memory management and performance enhancements for faster processing of the Parquet data format. The release also improves performance for streaming state management. Together, these changes should speed up Apache Spark when it works with existing Hadoop systems.


Mtell Operationalizes Apache Spark for the Industrial Internet of Things (IIoT). January 21, 2016

Mtell’s prescriptive and predictive analytics platform, Mtell Previse™, has been revised to run with all Apache Spark open source components incorporated. Mtell is extending its machine learning platform to Apache Spark to improve performance and simplify applications for the IIoT. Data analysts and data scientists with expertise in Scala, Python and R programming can extend the Mtell Previse platform by deploying custom algorithms, custom business logic and calculations.


Hadoop Survey Shows Spark Coming of Age in 2016. January 20, 2016

According to a recent survey on “Hadoop Perspective for 2016” by Syncsort, 70% of the respondents expressed interest in deploying the Apache Spark framework in 2016 over other compute frameworks because of its compute performance and its support for interactive, streaming and other analytics capabilities. Apache Spark allows companies to leverage novel big data platforms without having to replace their existing big data tools or learn new skills.


ClearStory adopts Apache Spark 1.6 advances to empower its Business Analysts. January 23, 2016

With more diverse and sophisticated Big Data to handle, businesses are struggling to obtain timely and relevant insights from their Big Data solutions. With the Apache Spark 1.6 platform, ClearStory helps businesses with unrestricted data discovery and free-form data exploration.


Google wants to donate its Dataflow technology to the Apache Software Foundation. January 20, 2016

Google recently announced that it would be submitting its Dataflow data processing technology to the Apache Software Foundation. Google’s Dataflow technology can handle both stream and batch processing of large datasets. Under the proposal, Dataflow would become an Apache project, along with its runners for Apache Flink and Spark.


Is Spark replacing Hadoop? January 27, 2016

In 2010, Hadoop was the heart of the big data industry and Spark was just a research project at the University of California, Berkeley. Today, Spark is an integral part of big data conversations happening across the globe. With many businesses going straight to Spark and forgoing Hadoop, whether Spark will replace Hadoop is the big “Big Data” question of 2016.


Spark 2015 Year In Review. January 5, 2016

2015 saw rapid growth in the enterprise adoption of Apache Spark, with 1000 developers contributing code to the Spark community compared to 500 in 2014. 2015 was a noteworthy year for Apache Spark in big data, with major developments in platform APIs, performance optimizations through Project Tungsten, Spark Streaming, and machine learning APIs for data science.

