News on Apache Spark - November 2016
The newly released Redis-ML component for the popular in-memory data store accelerates machine learning functions with Apache Spark. InfoWorld.com, November 2, 2016.
Redis, the in-memory data store, recently expanded its functionality through a module architecture that includes a new machine learning add-on. The add-on speeds up the delivery of results by serving them from already trained models instead of training the models itself. The module works with the machine learning components of Apache Spark, which handles the data gathering phase. Redis plugs into the Apache Spark cluster through the Redis Spark ML module.
Industry Trends and Apache Spark's Evolving Role in the Big Data Landscape. DZone.com, November 4, 2016.
Apache Spark is currently the biggest trend in the field of big data and is opening up novel opportunities. In the future, Apache Spark is expected to be more useful than Hadoop for computational purposes, and Spark workloads will increasingly move into production. Trends also suggest that companies will start drinking their own Spark champagne: Spark will no longer be used only for customer-centric use cases, but also to build models, stress test the risk involved in financial instruments, and much more.
Machine learning and data science workloads ignite Apache Spark adoption. CBROnline.com, November 8, 2016.
According to a Cloudera study conducted by Taneja Group among 7,000 professionals involved in big data, 54% are actively using Apache Spark, while 64% find it invaluable and plan to expand their usage over the next year. With the emergence of machine learning applications, and with 71% employing Apache Spark for data science, Spark is here to stay to support the growing number of real-time processing workloads.
(Source: http://www.cbronline.com/news/enterprise-it/software/machine-learning-data-science-workloads-ignite-apache-spark-adoption/)
Databricks Sets New World Record for CloudSort Benchmark Using Apache Spark. Dbta.com, November 16, 2016.
Databricks has set a new record in the CloudSort Benchmark, a third-party benchmarking competition for processing large datasets. In collaboration with Nanjing University and Alibaba Group, Databricks architected an efficient cloud platform for processing large datasets. The platform sorted 100 TB of data at a cost of $1.44 per TB, beating the earlier record of $4.51 per TB. The benchmark measures the lowest cost per TB of sorting data at public cloud prices.
(Source: http://www.dbta.com/Editorial/News-Flashes/Databricks-Sets-New-World-Record-for-CloudSort-Benchmark-Using-Apache-Spark-114812.aspx)
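To put the CloudSort figures in perspective, the short Python sketch below works out the total cost of sorting the 100 TB dataset at the new and the previous record price, and the relative cost reduction. Only the three numbers come from the news item; the function and variable names are illustrative assumptions, not part of the benchmark.

```python
# Figures quoted in the CloudSort news item; all names here are illustrative.
DATASET_TB = 100          # size of the sorted dataset, in TB
NEW_COST_PER_TB = 1.44    # USD per TB, new record
OLD_COST_PER_TB = 4.51    # USD per TB, previous record


def total_cost(cost_per_tb, size_tb):
    """Total cost in USD of sorting size_tb terabytes at cost_per_tb."""
    return cost_per_tb * size_tb


def cost_reduction(old_cost, new_cost):
    """Fraction by which new_cost is lower than old_cost."""
    return (old_cost - new_cost) / old_cost


new_total = total_cost(NEW_COST_PER_TB, DATASET_TB)
old_total = total_cost(OLD_COST_PER_TB, DATASET_TB)
reduction = cost_reduction(OLD_COST_PER_TB, NEW_COST_PER_TB)

print(f"New record total:      ${new_total:.2f}")
print(f"Previous record total: ${old_total:.2f}")
print(f"Cost reduction:        {reduction:.1%}")
```

At these prices the full 100 TB sort costs about $144 instead of about $451, a reduction of roughly 68%.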
Review: Spark lights up machine learning. InfoWorld.com, November 16, 2016.
Apache Spark's machine learning library, MLlib, brings machine learning capabilities to large compute clusters and can be combined with TensorFlow for deep learning. Users can now configure Databricks Apache Spark clusters to use GPUs rather than stock CPUs. GPUs can give users 10 times faster training of complex machine learning algorithms on big data.
DWP explores use of Apache Spark. UKAuthority.com, November 9, 2016.
Data scientists at the Department for Work and Pensions (DWP) are exploring Apache Spark for processing large datasets. The DWP data science team is using Spark to investigate AI, machine learning, and new uses of data. A team of 20 or more people has been working on the platform for half a year, hoping to create an application with a specific service that will prove valuable within a short period of time. Their goal is to build up some of their own capabilities and look at the analysis that can be performed using Apache Spark.
Spark 2.0.2 Released. Apache.org, November 14, 2016.
Apache Spark 2.0.2 has been released. Databricks strongly recommends that all existing users upgrade to the latest version, which includes fixes across several areas along with Kafka 0.10 and runtime metrics support. The release also includes several bug fixes on top of 2.0.1.
Couchbase 4.6 Developer Preview Released, Adds Real-Time Connectors for Apache Spark 2.0 and Kafka. InfoQ.com, November 28, 2016.
Couchbase 4.6 adds several new features, including a full-text search capability based on bleve, an open source library written in Go. Another feature is Cross Datacenter Replication, whose main focus is to ensure that applications used in different geographic locations remain in a consistent state. The other main features include connectors for the real-time analytics technologies Spark 2.0 and Kafka. The Spark 2.0 connector supports Structured Streaming and automatic flow control on a Couchbase cluster.
(Source: https://www.infoq.com/news/2016/11/couchbase-4.6-developer-preview)