News on Apache Spark - June 2018
H2O.ai Shares Advancements for H2O Sparkling Water at Spark + AI Summit 2018. PRNewswire.com, June 4, 2018.
H2O.ai, the open source leader in AI, discussed its latest innovations in H2O Sparkling Water, its API for Apache Spark, at the Spark + AI Summit. Sparkling Water makes the best use of Apache Spark features: RDDs, elegant APIs and multi-tenant context. These Spark features are coupled with the power of H2O’s visual intelligence capabilities such as columnar compression, speed and fully featured machine learning algorithms. The machine learning algorithms benefit from H2O MOJOs (Model Object, Optimized), a concept shared across the entire H2O platform for sharing and exchanging models with an emphasis on scoring speed, exchangeability, traceability and backward compatibility. This solution provides greater flexibility in finding the best algorithm and lets enterprise customers use H2O algorithms alongside MLlib algorithms on Apache Spark.
(Source - https://www.prnewswire.com/news-releases/h2oai-shares-advancements-for-h2o-sparkling-water-at-spark--ai-summit-2018-300658797.html)
Project Hydrogen Unites Apache Spark with DL Frameworks. Datanami.com, June 5, 2018.
The creators of Apache Spark today announced a new endeavour that aims to eliminate the barriers that prevent organizations from using Spark with deep learning frameworks such as MXNet and TensorFlow. Apache Spark has no doubt replaced the Hadoop stack, but the elegance of the Apache Spark framework breaks down when other distributed machine learning frameworks are brought into the loop. With data science growing at a rapid pace, most data scientists want to explore the capabilities of various deep learning frameworks like MXNet, Keras, TensorFlow, and others. However, Apache Spark does not gel well with deep learning frameworks. Project Hydrogen presents a possible solution to this problem by providing a new scheduling primitive, referred to as the Gang Scheduler, which addresses the dependencies introduced by the deep learning schedulers.
(Source - https://www.datanami.com/2018/06/05/project-hydrogen-unites-apache-spark-with-dl-frameworks/ )
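The idea behind gang scheduling can be illustrated with a minimal sketch in plain Python. This is not Spark's scheduler or its API; the function names `gang_schedule` and `independent_schedule` and the slot and task labels are hypothetical, chosen only to show the all-or-nothing behaviour that distributed deep learning jobs typically need, in contrast to launching tasks independently as slots become free.

```python
from typing import Dict, List, Optional


def gang_schedule(free_slots: List[str], tasks: List[str]) -> Optional[Dict[str, str]]:
    """All-or-nothing placement: every task gets a slot, or nothing is launched."""
    if len(free_slots) < len(tasks):
        # Not enough capacity for the whole job, so launch no task at all.
        return None
    return dict(zip(tasks, free_slots))


def independent_schedule(free_slots: List[str], tasks: List[str]) -> Dict[str, str]:
    """Task-by-task placement, for contrast: start whatever currently fits."""
    return dict(zip(tasks, free_slots))


if __name__ == "__main__":
    workers = ["worker-0", "worker-1", "worker-2"]  # hypothetical training tasks
    slots = ["slot-a", "slot-b"]                    # hypothetical free executor slots

    # Independent scheduling starts two of three workers; the job cannot make
    # progress if the workers must all communicate with each other.
    print(independent_schedule(slots, workers))

    # Gang scheduling refuses to start a partial job.
    print(gang_schedule(slots, workers))
```

With only two free slots for three tasks, the independent version starts a partial job while the gang version starts nothing until the whole job fits.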
A $940 million startup that VCs initially rejected is trying to do for pharma what it did for Netflix. ConsultantsInsider.com, June 6, 2018.
Databricks, the five-year-old data-crunching startup, is worth $940 million after its big data framework, Apache Spark, gained popularity and saw increased adoption by enterprises from Netflix to Shell. With its new initiative, it aims to provide the same kind of personalized experience that it provides to companies like Netflix; the new tool, however, will crunch genetics data for pharma companies. The first client to use this tool is Regeneron, a New York-based company that manufactures popular drugs for eye and skin conditions. The company has a sizable genetics database of anonymized information from more than 300,000 people. That data, together with Databricks' new platform, will let the company speed up its drug discovery and development process in ways that were not possible before.
(Source - https://consultantsinsider.com/articles/A-940-million-startup-that-VCs-initially-rejected-is-trying-to-do-for-pharma-what-it-did-for-Netflix--5b18050fd1c64b1e338d3a3f )
Databricks Helps Turn Clinical and Genomic Big Data into Insights to Improve Patient Lives. BusinessWire.com, June 6, 2018.
Databricks unveiled the Unified Analytics Platform for genomic data processing, AI and tertiary analytics at the Spark + AI Summit, an annual gathering of 4,000 data engineers, analytics leaders and data scientists. The platform is intended to speed up the discovery of critical medical treatments and help healthcare and life sciences organizations make advancements in personalized medicine. Using this platform, healthcare organizations can process and analyze large-scale genomics data up to 100 times faster than with existing solutions to foster critical research.
(Source - https://www.businesswire.com/news/home/20180606006101/en/Databricks-Helps-Turn-Clinical-Genomic-Big-Data )
Apache Spark Market Size, Share, Growth, Analysis, Forecast to 2025. thefreenewsman.com, June 15, 2018.
The Apache Spark market is highly competitive and characterized by continuous changes in customer requirements, technology, industry standards and novel product enhancements. North America held the highest share of the overall global Apache Spark market, at 48% of revenue. The North America segment is anticipated to grow at a rate of 32.9% throughout the forecast period. According to a study, North America generates 50% of the global data and is one of the fastest growing regions adopting Apache Spark. The other key highlights from the global Apache Spark market report 2017-2025 are as follows -
- The segment for data tiers greater than 10 PB is expected to grow at a compound annual growth rate (CAGR) of 36% during 2018-2025.
- The UK held the highest market share in the Europe region, nearly 30%, for the year 2017.
- Asia-Pacific is anticipated to grow at the highest CAGR, 36.4%, during 2018-2025.
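To put the growth rates above in perspective, here is a minimal Python sketch of how a compound annual growth rate accumulates. The function name `growth_multiple` is illustrative, and treating 2018-2025 as seven compounding years is an assumption; the report summary does not state its base year or base revenue, so only relative growth is computed.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return how many times larger a quantity becomes after compounding."""
    return (1.0 + cagr) ** years


# Assumption: 2018-2025 is treated as 7 compounding periods.
YEARS = 7

if __name__ == "__main__":
    segments = [
        ("Data tier > 10 PB", 0.36),   # 36% CAGR from the report summary
        ("Asia-Pacific", 0.364),       # 36.4% CAGR from the report summary
    ]
    for label, rate in segments:
        print(f"{label}: grows to x{growth_multiple(rate, YEARS):.1f} of its starting size")
```

Under that assumption, a segment growing at 36% a year ends the period at roughly 8.6 times its starting size.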
Apache Spark and Big Data: What's Ahead. TDWI.org, June 22, 2018.
With increasing adoption of Apache Spark in the industry as big data grows, here are five big data trends that deserve attention -
i) The shift from storage to computational power
Apache Spark is at the center of the smart-computation evolution because of its large-scale, in-memory data processing. Apache Spark will see significant growth, particularly in highly competitive business domains such as manufacturing, pharma, and finance.
ii) Better and Improved Cloud Infrastructures
Organizations are using Apache Spark to benefit from the open source community's rapid innovation cycles. It is faster to upgrade to the latest version of the software in the cloud than in an on-premise implementation. Cloud infrastructure has improved over time with investments from Microsoft, Amazon, and Google, making it easier for enterprises with large data volumes to adopt a complete cloud-based Spark implementation, which in turn drives widespread adoption of Apache Spark.
iii) Improved Security and Governance Models
As adoption of Spark increases, so will the availability of enterprise-grade security and data governance frameworks, which should attract even the most conservative business domains, such as insurance and finance, to adopt Spark.
iv) The advent of the BigDL deep learning library by Intel
The advent of the deep learning library has paved the way for a completely new set of users and business use cases that span the deep learning landscape.
v) Increasing demand and popularity for Python and Spark
Big data developers and data scientists have adopted Python as the go-to language for programming with Spark because code readability, maintainability and familiarity are better with Python.
(Source - https://tdwi.org/articles/2018/06/22/ta-all-apache-spark-and-big-data-whats-ahead.aspx)
Databricks Partners with RStudio To Increase Productivity of Data Science Teams. InsideBigData.com, June 29, 2018.
Databricks announced a partnership with RStudio to enhance the productivity of data science teams. The two companies will integrate the Databricks Unified Analytics Platform with RStudio Server to simplify R programming on big data for data scientists. The collaboration aims to remove the major roadblocks that have stalled several R-based AI and machine learning projects. It will help data science teams in the following ways -
- Provide simplified access to large datasets as all datasets will be accessible in the Unified Analytics Platform and data scientists can work on the code in RStudio.
- Data scientists will be able to use familiar tools and languages to execute R jobs, resulting in enhanced productivity among the data science teams.
- This partnership will provide data scientists with the ability to auto-scale cloud-based clusters to handle jobs whilst keeping the overall TCO low.
(Source - https://insidebigdata.com/2018/06/29/databricks-partners-rstudio-increase-productivity-data-science-teams/ )