News on Apache Spark - February 2017
New MemSQL Spark 2 Connector Operationalizes Powerful Advanced Analytics. MarketWired.com, February 1, 2017.
MemSQL, which describes itself as the provider of the fastest database platform for real-time analytics, announced the release of its Spark 2 connector. The new connector supports Spark versions 2.0 and 2.1. It lets users access and manipulate data in both MemSQL and Spark, using SparkSession as the entry point for the DataFrame API, and it allows bidirectional data movement between MemSQL and Apache Spark. The MemSQL Spark 2 Connector also improves performance through SQL pushdown support, with filter and predicate DataFrame operations processed faster inside the database.
(Source : http://www.marketwired.com/press-release/new-memsql-spark-2-connector-operationalizes-powerful-advanced-analytics-2192773.htm )
Machine learning A-team: TensorFlow, Apache Spark MLlib, MOA and more. Jaxenter.com, February 2, 2017.
As technology giants rely heavily on artificial intelligence and machine learning, demand for machine learning experts is increasing. Hiring in the field is on the rise, and candidates need the right tools in their skill set to get hired. Python is the most popular language for machine learning today, but there are other tools that professionals must master. Frameworks worth mastering include Apache Spark MLlib, Amazon Machine Learning, Google Cloud Machine Learning, and TensorFlow, among others.
(Source : https://jaxenter.com/machine-learning-frameworks-list-131500.html )
Benchmarks Show Diablo Technologies' Memory1 Doubles the Speed of Apache Spark Graph Processing. PRNewsWire, February 7, 2017.
Benchmark data released by Inspur Systems and Diablo Technologies showed that the Memory1 solution reduced processing time for graph analytics on Apache Spark workloads by more than half. The benchmark used the k-core decomposition algorithm on GraphX, Spark's graph analytics engine. According to the companies, Spark users can get more work done per server and process larger datasets in considerably less time than on servers with DRAM alone. The result is more work from existing resources, less server sprawl, and improved total cost of ownership (TCO).
(Source : http://www.prnewswire.com/news-releases/benchmarks-show-diablo-technologies-memory1-doubles-the-speed-of-apache-spark-graph-processing-300403107.html )
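For readers unfamiliar with the algorithm named in the benchmark, here is a minimal single-machine sketch of k-core decomposition in plain Python. It is not the GraphX implementation or the benchmark code; the function name `core_numbers` and the sample graph are assumptions made for illustration. The core number of a vertex is the largest k such that the vertex belongs to a subgraph in which every vertex has at least k neighbours.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected graph given as (u, v) pairs.

    Hypothetical helper for illustration: repeatedly peel off the vertex with
    the smallest remaining degree. The core number of a peeled vertex is the
    largest minimum degree seen so far.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Simple O(n^2) selection; production code would use bucket queues.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle (a, b, c) with one pendant vertex d attached to a.
sample_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(sorted(core_numbers(sample_edges).items()))
# [('a', 2), ('b', 2), ('c', 2), ('d', 1)]
```

The triangle forms a 2-core, while the pendant vertex only reaches core number 1. On large graphs this peeling is memory-intensive, which is why it serves as a stress test for in-memory engines.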
Taking on Apache Spark, Hazelcast launches faster data processing engine. SiliconAngle.com, February 7, 2017.
Hazelcast Inc. is drawing attention away from popular big data frameworks such as Apache Spark and Apache Flink, claiming that its new open source solution, Hazelcast Jet, is a faster, lower-latency big data processing engine. Hazelcast Jet uses an in-memory data grid to store incoming big data streams. The idea behind Hazelcast Jet is to keep both storage and computation in memory so that incoming data can be processed in parallel. This will enable big data applications to operate as close to real time as possible.
(Source : http://siliconangle.com/blog/2017/02/07/hazelcast-launches-lightweight-distributed-data-processing-engine-rival-apache-spark/ )
Cloudera And Intel Speed Up Machine Learning Workloads With Apache Spark, Intel® Math Kernel Library Integration. GlobeNewsWire.com, February 8, 2017.
Cloudera unveiled a jointly tested solution with Intel to enhance machine learning capabilities and AI workloads. Benchmark tests on Cloudera with Spark and the Intel Math Kernel Library (Intel MKL) show that the combined offering can improve the performance of machine learning algorithms on huge datasets with minimal hardware and time. This will help organisations speed up their predictive analytics investments.
(Source : https://globenewswire.com/news-release/2017/02/08/915258/0/en/Cloudera-And-Intel-Speed-Up-Machine-Learning-Workloads-With-Apache-Spark-Intel-Math-Kernel-Library-Integration.html )
EnterpriseDB Releases New Apache Spark Connector. Dbta.com, February 9, 2017.
EDB is launching a new version of its EDB Postgres Data Adapter for Hadoop that adds compatibility with Apache Spark, the in-memory cluster computing framework. The new Spark connector will allow users to combine analytic workloads based on HDFS with the operational data stored in Postgres, using the Spark interface. Jason Davis, senior director of product management at EnterpriseDB, said: “Any enterprise that’s collecting weblog information, storing data about customers for analytic purposes, will benefit the most from this.”
(Source : http://www.dbta.com/Editorial/News-Flashes/EnterpriseDB-Releases-New-Apache-Spark-Connector-116278.aspx )
Yahoo supercharges TensorFlow with Apache Spark. TechCrunch.com, February 14, 2017.
Yahoo built CaffeOnSpark to benefit the machine learning community, so that developers could build models in Caffe and scale them with parallel processing. Yahoo is now pairing the popular TensorFlow framework with Apache Spark in a new project, TensorFlowOnSpark. The combination will make the deep learning framework appealing to developers who want to build deep learning models that need to run on large computing clusters. With TensorFlowOnSpark, developers can quickly adapt their existing TensorFlow programs.
(Source : https://techcrunch.com/2017/02/13/yahoo-supercharges-tensorflow-with-apache-spark/ )
Drizzle on tap to spur Spark Streaming architecture. SearchDataManagement, February 16, 2017.
Until now, the Apache Spark architecture has focused on programming benefits, but its streaming capabilities are now being put to fuller use. Drizzle aims to reduce streaming latency relative to other streaming options. The Drizzle framework is expected to become part of Apache Spark by the end of 2017. This streaming update from Databricks is intended to provide a platform for a broad range of new big data analytics use cases. Drizzle is being developed to help promote the use of the lambda architecture, which combines real-time and batch processing.
(Source : http://searchdatamanagement.techtarget.com/news/450413166/Drizzle-on-tap-to-spur-Spark-Streaming-architecture )
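To make the lambda architecture mentioned above concrete, here is a toy sketch in plain Python. It does not use Drizzle or Spark Streaming; the names `batch_view`, `SpeedLayer`, and `query`, and the page-count events, are all assumptions chosen for illustration. A batch layer recomputes results over all historical data, a speed layer keeps incremental results for recent events, and a query merges the two views.

```python
from collections import Counter


def batch_view(events):
    """Batch layer (hypothetical): recompute counts over the full history."""
    return Counter(event["key"] for event in events)


class SpeedLayer:
    """Speed layer (hypothetical): count events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["key"]] += 1


def query(batch, speed, key):
    """Serving step: merge the batch view with the real-time view."""
    return batch[key] + speed.counts[key]


# Historical events already covered by the last batch run.
history = [{"key": "page_a"}, {"key": "page_a"}, {"key": "page_b"}]
batch = batch_view(history)

# One new event arrives before the next batch run.
speed = SpeedLayer()
speed.update({"key": "page_a"})

print(query(batch, speed, "page_a"))  # 3
print(query(batch, speed, "page_b"))  # 1
```

In a real deployment the batch view would be rebuilt periodically and the speed layer reset afterwards; lower streaming latency shortens the window in which the speed layer's results lag behind incoming data.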
Containers help move Spark and Hadoop into analytics operations. SearchDataManagement.com, February 21, 2017.
Moving individual Spark and Hadoop projects into production has become an overwhelming task, but container technology is set to ease the process. Configuration complexity becomes a major roadblock when a deployment is shared with a larger user base. To get around this problem, users are applying DevOps-oriented container and microservices techniques as they stitch Hadoop and Spark components together. Container implementations can be scripted, but this is challenging because big data pipelines feature an increasing number of components. Apache Spark, in particular, cannot easily adapt to container methods because of its complex workloads, and this is where the BlueData platform addresses container implementation needs.
(Source : http://searchdatamanagement.techtarget.com/news/450413510/Containers-help-move-Spark-and-Hadoop-into-analytics-operations )