Recap of Apache Spark News for June

News on Apache Spark - June 2016

Apache Spark shoots up as one of the highest paying job profiles in 2016. June 8, 2016.

According to Stack Overflow’s latest survey, Spark developers in the US earn an average of $125,000 per year. Going by the responses to the same survey, Apache Spark also recorded the second-highest year-on-year increase in adoption, at 163.5%.

(Source: )

IBM Releases Cloud-Based Apache Spark Development Environment. June 8, 2016.

The new environment, called the Data Science Experience and hosted on the Bluemix cloud platform, will offer 250 data sets, a collaborative workspace and various open source tools to help data scientists speed up real-time analytics. The Data Science Experience will provide data scientists with a single, security-rich managed environment for data curation, data ingestion and data analysis by combining data, content, models and open source resources such as RStudio, Jupyter Notebook and H2O on Apache Spark.

(Source: )

Couchbase Apache Spark Connector Accelerates Time to Insight and Time to Action for Digital Economy Businesses. June 12, 2016.

The new Couchbase Spark Connector pairs Apache Spark, an analytics platform for extracting meaningful insights from data, with Couchbase, an operational database platform for turning those insights into actions. The connector will help businesses deliver richer customer experiences through web, mobile and IoT applications. It will add value for use cases such as network intrusion detection, failure detection, Customer 360 view, real-time product recommendations and fraud detection.

(Source: )

Databricks makes a strategic partnership with the CIA in Apache Spark adoption. June 22, 2016.

Databricks has entered a strategic partnership with In-Q-Tel (IQT), the CIA’s investment arm. Although In-Q-Tel is a separate entity from the CIA, its track record shows a strong interest in working with world-class analytics companies.

(Source: )

Apache Spark cluster computing continues to mature. June 22, 2016.

It’s not just well-known vendors like IBM that are betting on Apache Spark; many organizations are adopting it for in-memory cluster computing. ClearStory Data has unveiled a Spark-based technology known as IDOD, short for Infinite Data Overlap Detection. This technology uses data inference and ClearStory Data’s harmonization technology (the technology that measures how well two data sets can be combined). IDOD will help non-technical users mix and match data from various sources and analyse it. IDOD will automate data preparation and blending of the desired data to provide results in minutes, compared with the days or weeks that manual modelling takes.

(Source: )

Structured Streaming in Spark – Explained by Matei Zaharia. June 27, 2016.

In his MesosCon 2016 keynote speech, Matei Zaharia talked about Apache Spark’s advanced data analytics capabilities and its upcoming 2.0 release. The most significant feature in Apache Spark 2.0 is structured streaming. "With structured streaming, you're able to take the data in a stream, build a table in Spark SQL, and serve the table through JDBC, and anything that talks SQL can query the real time state of your stream," Zaharia said.

(Source: )
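
The idea Zaharia describes, a stream that continuously feeds a table which any SQL client can query, can be illustrated with a small sketch. This is not Spark code: it uses Python's built-in sqlite3 module as a stand-in for the Spark SQL table, and the table name word_counts and the helper functions are hypothetical names chosen for illustration only.

```python
import sqlite3


def create_result_table(conn):
    """Create the (hypothetical) result table that the stream keeps up to date."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts (word TEXT PRIMARY KEY, total INTEGER)"
    )


def apply_micro_batch(conn, events):
    """Fold one micro-batch of (word, count) events into the running totals."""
    for word, count in events:
        # Make sure a row exists for the word, then add the new count to it.
        conn.execute(
            "INSERT OR IGNORE INTO word_counts (word, total) VALUES (?, 0)", (word,)
        )
        conn.execute(
            "UPDATE word_counts SET total = total + ? WHERE word = ?", (count, word)
        )
    conn.commit()


def current_state(conn):
    """Query the live state of the stream with plain SQL."""
    rows = conn.execute("SELECT word, total FROM word_counts ORDER BY word")
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_result_table(conn)
    # Two micro-batches arriving one after the other.
    apply_micro_batch(conn, [("spark", 1), ("mesos", 1)])
    apply_micro_batch(conn, [("spark", 2)])
    print(current_state(conn))
```

Each call to apply_micro_batch plays the role of newly arrived stream data, and current_state is the "anything that talks SQL" side: it always sees the totals as of the latest batch. In Spark itself the table would be maintained by the streaming engine rather than by hand-written update statements.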

BlueTalon Extends Data-Centric Security Platform to Support Apache Spark. June 27, 2016.

BlueTalon has released its data-centric security solution for Apache Spark. BlueTalon is the first company to provide data security across Hadoop, Spark, Hive and other platforms used for big data processing. Data-centric security solutions help companies eliminate security blind spots by giving them the ability to control the data layer directly. BlueTalon ensures precise and dynamic security controls with dynamic data masking, data authorization and stealth analytics to protect sensitive data.

(Source: )

MongoDB Enables Advanced Real-Time Analytics on Fast Moving Data with New Connector for Apache Spark. June 28, 2016.

The new MongoDB connector for Apache Spark will help developers and data scientists glean valuable insights in real time from operational and streaming data. Industry estimates suggest that 80% of analytics development effort goes into data integration, and the new MongoDB connector for Spark will eliminate the need to shuttle data between operational and analytics data infrastructure. The new connector will help developers build applications faster and more easily on a single analytics and database technology stack.

(Source: )

Hortonworks tightens Hadoop security, intros Spark-based notebook for data scientists. June 28, 2016.

At the Hadoop Summit in San Jose, California, Hortonworks announced several updates to its big data platform, including enhanced security, easier data analytics using Apache Spark and better developer productivity. Hortonworks also announced the availability of Apache Zeppelin, a Spark-based notebook for data scientists. Apache Zeppelin is a graphical environment that will help data scientists create and share data visualizations. As Zeppelin is built on Apache Spark, data scientists can enjoy the fast data processing speeds that Spark offers as an in-memory system while creating Tableau-like data visualizations.

(Source: )


