News on Apache Spark - August 2016
Apache Spark 2.0: MLlib Preview. August 6, 2016. insideBIGDATA.com
At the recent Spark Summit 2016, Joseph K. Bradley of Databricks presented “Apache Spark MLlib 2.0 Preview: Data Science and Production”. MLlib 2.0 focuses on the APIs that are critical to data science. The pain points it addresses are customising pipelines and improving model persistence for production.
MapR’s $50mn funding leverages Apache Hadoop and Spark. August 9, 2016. TechRepublic.com
MapR, the popular big data platform vendor, announced $50mn in new funding and is now looking at an IPO. The move strengthens the rise of MapR’s core product, the MapR Converged Data Platform, which leverages Hadoop and Spark. The platform offers features such as global event streaming, real-time database capabilities and enterprise storage for developing and running innovative data applications.
Securing Apache Spark Shuffle Using Apache Commons Crypto. August 11, 2016. Dzone.com
Apache Commons Crypto, a cryptographic library optimized with AES-NI (Advanced Encryption Standard New Instructions), will provide performance advantages for Spark shuffle encryption over the existing approach. Programmers can use Apache Commons Crypto to implement high-performance AES encryption and decryption with minimal code and effort. The Apache Commons Crypto project was originally developed by Intel under the name Chimera and is now available to developers as a subproject of Apache Commons.
Can Spark do for machine learning what it’s done for data? August 17, 2016. SiliconAngle.com
Spark is breaking the language barrier by offering algorithm implementations and APIs for multiple languages. As of now, Spark’s Structured Streaming can be applied to batch data for learning tasks, and predictions can be made using Structured Streaming. Apache Spark 2.0 is touted to have continuous streaming application capabilities and will expand support for training machine learning models.
Spark innovation: Catalyst Optimizer simplifies complicated queries. August 19, 2016.
A fundamental piece of Spark, Spark SQL’s Catalyst Optimizer simplifies the execution of complicated queries and provides high performance. The Catalyst Optimizer supports Databricks’ new DataFrame API to make big data accessible and simple for users.
Hadoop-Based Data Lakes are augmenting the use of Apache Spark. August 22, 2016. ComputerWeekly.com
Developers still treat the adoption of Apache Spark with caution, mostly because Spark lacks its own distributed storage layer. Even so, Spark is becoming invaluable for real-time data processing, and companies in gaming, betting and financial fraud detection are swearing by it. Spark still needs Hadoop’s storage, and that is where Hadoop data lakes come in.
Bridging the Gap with Spark and SAP HANA. August 24, 2016. DBTA.com
Hadoop’s capabilities have helped businesses store and access large amounts of data, but organizations still face high-performance demands. Emerging in-memory frameworks like SAP HANA Vora and Apache Spark give organizations tools to overcome the limitations of batch-oriented processing and achieve real-time, iterative access to data on Hadoop clusters.
HPE adapts Vertica analytical database to world with Hadoop, Spark. August 31, 2016. SearchDataManagement
To compete in a field of diverse data tools, HPE has adapted its analytical database with Vertica 8.0, which expands support for Hadoop, Spark and Kafka. Vertica 8.0’s high-performance querying capabilities can now reach into Hadoop and Spark and bring valid result sets back into the database environment.