News on Apache Spark - June 2017
IBM sparks conversations about analytics, processing and the hunt for ET. Computing.co.uk, June 5, 2017.
IBM data scientists and developers will present multiple talks on uses of the Apache Spark framework, including its applications in parallel processing and storage. The key highlights of the presentations by IBM scientists at the Spark Summit include -
- A talk on how to make the most of distributed storage, titled “Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash”.
- Another talk, on how to do perfect parallel processing, will be presented by Kazuaki Ishizaki, focusing on the machine learning library framework and its internal APIs.
- IBM scientist Gil Vernik will present a talk on NASA’s SETI project on the IBM Cloud platform.
(Source : https://www.computing.co.uk/ctg/news/3011305/ibm-sparks-conversations-about-analytics-processing-and-the-hunt-for-et )
Apache Spark MapR Connector Provides JSON Support. I-programmer.info, June 5, 2017
MapR-DB, a high-performance NoSQL database, supports two primary data models: wide-column tables and JSON documents. A new Spark connector has been unveiled for the MapR-DB JSON data model that gives developers API access to MapR-DB JSON documents from Spark through the Open JSON Application Interface (OJAI). The connector supports loading data from a MapR-DB table as a Spark RDD of OJAI documents and saving a Spark RDD into a MapR-DB JSON table. It also supports the DataFrame and Dataset APIs, making it easy to query MapR-DB binary tables and HBase tables directly with Apache Spark. This makes it easier to build faster data pipelines by removing any intermediary layers, and it reduces the latency associated with data movement.
MemSQL Showcases Machine Learning Image Recognition for Apache Spark. GlobeNewsWire.com, June 5, 2017.
MemSQL, billed as the provider of the fastest real-time data warehouse, is hosting a session on June 7, 2017 at Spark Summit 2017 that will dig into image recognition techniques using Apache Spark and how these techniques can be applied in production. The session will be led by MemSQL CTO Nikita Shamgunov at Kiosk 7 in the Expo Hall at Moscone West in San Francisco, from 2:40 PM to 3:10 PM. The key highlights of the session include -
- Use of a fast relational datastore to persist data from Spark
- Architectural considerations in building an image recognition pipeline
- Real-time capabilities for instant matches
- Advantages and pitfalls of using particular approaches
(Source: http://www.globenewswire.com/news-release/2017/06/05/1008243/0/en/MemSQL-Showcases-Machine-Learning-Image-Recognition-for-Apache-Spark.html )
Microsoft’s new Machine Learning library makes data scientists more productive on Apache Spark. Mspoweruser.com, June 8, 2017.
Microsoft released a new machine learning library to help data scientists be more productive on Apache Spark. The MMLSpark library provides simplified, consistent APIs for handling various data types such as categoricals or text, increases the rate of experimentation, and helps leverage cutting-edge machine learning methods on large datasets. Data scientists just need to pass the data to the model and the MMLSpark library will do the rest. They can easily change the feature space and algorithm without having to worry about recoding the pipeline. The capabilities of MMLSpark include scalable image processing pipelines, DNN featurization and training on a GPU node.
(Source : https://mspoweruser.com/microsofts-new-machine-learning-library-make-data-scientists-productive-apache-spark/ )
Riot Games turns to Spark to weed out 'toxic' players. ITnews.com.au, June 8, 2017.
Riot Games is set to incorporate Google’s TensorFlow and Apache Spark to improve its ability to identify players who use abusive language in in-game chat and punish them accordingly. Riot Games uses a neural model known as Word2Vec, which digs deep into the language used by players and deciphers meaning from the context in which words are used. This is the first critical step in helping them build a blacklist of language or words they do not want to see in chat. Wes Kerr, a senior data scientist in the player behaviour team, says that they are trying to improve the performance of the model by looking into Apache Spark. According to Kerr, Apache Spark has helped Riot Games change the amount of training data that is fed to the model, and it is well known that more training data generally leads to better-performing algorithms.
(Source: https://www.itnews.com.au/news/riot-games-turns-to-spark-to-weed-out-toxic-players-464582 )
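The idea of growing a blacklist from word embeddings, as described in the Riot Games item above, can be illustrated with a minimal sketch. It is not Riot Games' implementation: the vectors are hand-made toy values standing in for what a trained Word2Vec model would produce, and the names VECTORS, cosine, expand_blacklist and the 0.9 threshold are hypothetical choices made only for this example.

```python
import math

# Hypothetical, hand-made 3-dimensional vectors. A real Word2Vec model
# would learn much larger vectors from chat logs.
VECTORS = {
    "trash": [0.9, 0.1, 0.0],
    "garbage": [0.85, 0.15, 0.05],
    "noob": [0.7, 0.3, 0.1],
    "hello": [0.0, 0.2, 0.9],
    "thanks": [0.05, 0.1, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expand_blacklist(seeds, vectors, threshold=0.9):
    """Return the seeds plus every known word whose vector is close to a seed."""
    known_seeds = [s for s in seeds if s in vectors]
    blacklist = set(seeds)
    for word, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            blacklist.add(word)
    return blacklist


print(sorted(expand_blacklist({"trash"}, VECTORS)))
# prints ['garbage', 'noob', 'trash']
```

Words that appear in similar contexts end up with similar vectors, so starting from a single flagged seed word pulls in its neighbours while unrelated words such as "hello" stay out of the list.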
Impetus Technologies Unveils New, TensorFlow-Based Deep Learning Feature on Apache Spark for StreamAnalytix. PRNewswire.com, June 15, 2017.
Impetus Technologies, a leading big data software and services company, released an integrated deep learning capability for its StreamAnalytix platform, which will be showcased at the DataWorks Summit 2017 in San Jose, California. The company will showcase an image recognition application that runs on a Spark Streaming pipeline on StreamAnalytix. The combination of stream analytics and deep learning enables a new breed of applications with machine learning capabilities in voice analytics, anomaly detection and IoT.
(Source : http://www.prnewswire.com/news-releases/impetus-technologies-unveils-new-tensorflow-based-deep-learning-feature-on-apache-spark-for-streamanalytix-300474651.html )
Hotels.com uplifts big data to get closer to customers. ITNews.com.au, June 16, 2017.
Hotels.com, a subsidiary of Expedia, has upgraded its big data platform to provide better and more relevant hotel suggestions to its visitors. A year and a half of hard work has changed how the analytics algorithms at Hotels.com work. The company has moved entirely to an AWS cloud-based platform with Apache Spark running at its heart. For this kind of business, after location and room price, the next most important thing for customers is photos. Using Spark and other big data technologies, Hotels.com has achieved a major breakthrough in analyzing photos: detecting duplicate photos for a particular hotel, identifying what each photo shows (a restaurant or a lobby, for example) and deciding how the images should be ranked in the gallery. With the use of data science and machine learning, the company will soon be able to present highly personalized results and on-screen presentations to each customer based on their choices and preferences.
(Source : https://www.itnews.com.au/news/hotelscom-uplifts-big-data-to-get-closer-to-customers-464688 )
Cray Moves to Lasso 'Big Data Deluge'. eetimes.com, June 20, 2017.
Cray Inc. launched Urika-XC, a free downloadable analytics software suite for its high-end, Intel Xeon-powered XC line of supercomputers. The suite tackles the big data deluge using open source analytics, artificial intelligence (AI) and deep learning. Its main components include Cray’s own graph engine, the Apache Spark framework, the BigDL deep learning framework for Spark, the Dask parallel computing libraries for analytics, and popular programming languages such as Scala, Java, Python and R. All these components are free, but Cray will offer complete support for the suite through a subscription that covers maintenance, updates and technical support.
(Source : http://www.eetimes.com/document.asp?doc_id=1331917 )
Cray adds big data software to its supercomputers. NetworkWorld.com, June 30, 2017.
Cray is synonymous with supercomputing and supercomputers. The company released new software, named Urika-XC, for its XC series of supercomputers, which merges the processing and modelling of scientific and analytics data. Urika-XC is an amalgamation of tools such as Spark, Cray’s graph analytics engine, Intel’s BigDL library, Python and Maven. The new software gives users of XC supercomputers the power of deep learning and Urika’s graph engine for the fastest analytics results.