Recap of Apache Spark News for April 2018

Apache Spark Monthly News Update - Learn what happened in the Apache Spark community in April 2018.

Last Updated: 14 Feb 2024 | BY ProjectPro

News on Apache Spark - April 2018

Databricks to Host AI Thought Leaders at Spark + AI Summit 2018. Globenewswire.com, April 4, 2018.

Databricks, provider of the Unified Analytics Platform, is hosting the Spark + AI Summit in San Francisco from June 4-6, 2018. Marc Andreessen will be the keynote speaker and will take part in a fireside chat with Databricks co-founder and CEO Ali Ghodsi. The Spark + AI Summit is a leading event where data engineers, data scientists, and business professionals discuss current topics in analytics, big data, and practical applications of AI. The agenda also features a talk by Matei Zaharia, co-founder of Databricks, on why large-scale data processing matters to AI applications and how Spark enables data and AI projects.

(Source: https://globenewswire.com/news-release/2018/04/04/1460213/0/en/Databricks-to-Host-AI-Thought-Leaders-at-Spark-AI-Summit-2018.html)

MapR Introduces New Capabilities to Build Real-Time Streaming and Global IoT Applications. martechadvisor.com, April 6, 2018.

MapR Technologies announced new capabilities for its data platform, available across every cloud. The enhancements will help organizations build powerful, real-time streaming and global IoT applications. The updates in MapR Converged Data Platform release 6.0.1 and MapR Expansion Pack 5.0 include Event Streams (MapR-ES), Apache Spark, and Apache Drill release 1.13. They support streaming pipelines that can stretch across millions of endpoints while also supporting rich analytics that can be used to divide and aggregate streams.

(Source: https://www.martechadvisor.com/news/bi-ci-amp-data-visualization/mapr-introduces-new-capabilities-to-build-real-time-streaming-and-iot-applications/)

Immuta Introduces Apache Spark Ecosystem Support and Automated Governance Reporting for Data Science Programs. BusinessWire.com, April 9, 2018.

Immuta unveiled new features of its data management platform, including native Apache Spark SQL policy enforcement and automated governance reporting. These additions in Immuta v2.1 allow organizations to process, secure, and audit data at massive scale in Apache Spark, while integrated compliance reporting gives greater visibility into and control over how data is being used. The release extends Immuta’s capabilities to the Apache Spark ecosystem for large-scale processing, with native policy enforcement within Spark (policies include time windowing, minimization, purpose limitation, dynamic row- and column-level controls, and automated differential privacy).

(Source: https://www.businesswire.com/news/home/20180409005470/en/Immuta-Introduces-Apache-Spark-Ecosystem-Support-Automated)

Prostate cancer: Big data unlocks 80 new drug targets. MedicalNewsToday.com, April 17, 2018.

An international team used a two-pronged approach of big data and DNA analysis to dig deep into the genetics of prostate cancer. Prostate cancer is one of the most common types of cancer in men in the US, with an estimated 164,690 cases and approximately 30,000 deaths from the disease. The researchers took data from 112 men with prostate cancer and combined it with data from other studies, so that data from a total of 930 patients was used for the analysis. With the help of the latest big data methods, the team gathered new insights into the genetic changes that result in the development and progression of prostate cancer. Having identified the genes involved, they were able to create a map of the proteins these genes can encode. The scientists found that 80 of the proteins they uncovered were potential drug targets; 11 of them are targeted by existing drugs, and 7 could be targeted by drugs already in clinical trials.

(Source: https://www.medicalnewstoday.com/articles/321511.php)

Big Data Information Access: Spark, Presto and Apache Hive (Oh My). channele2e.com, April 18, 2018.

Recent research from "Big Data-as-a-service" company Qubole found that more than three-quarters of organizations depend on multiple open source big data frameworks to glean insights from data. Apache Spark and Presto are among the fastest-growing open source big data engines. The findings show that Apache Spark grew by 365% in the total number of commands run, whereas Presto grew by 420% in compute hours. Apache Hive, another major big data engine, also grew, with the number of commands increasing by 129% over the year. The study also looked at the number of users accessing each of the big data platforms: Apache Spark and Hadoop saw increases of 171% and 136%, respectively, in users running commands on the platform.

(Source: https://www.channele2e.com/software/big-data/spark-presto-apache-hive/)
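Percentage increases of this size are easy to misread, so here is a minimal sketch of the arithmetic, assuming each figure means "grew by N%" relative to the earlier period. The function name `growth_multiplier` and the dictionary labels are illustrative shorthand, not terms from the Qubole report.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert 'grew by N%' into the ratio of the new level to the old level."""
    return 1 + percent_increase / 100


# Figures quoted in the summary above; the labels are shorthand, not Qubole's own.
reported_increases = {
    "Spark commands run": 365,
    "Hive commands run": 129,
    "Spark users": 171,
    "Hadoop users": 136,
}

for label, pct in reported_increases.items():
    ratio = growth_multiplier(pct)
    print(f"{label}: +{pct}% is about {ratio:.2f}x the earlier level")
```

Under that reading, a 365% increase means the later figure is roughly four and a half times the earlier one, not three and a half times.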

Here are five key things happening in the global Big Data market. YourStory.com, April 19, 2018.

According to IDC, big data revenues are expected to cross $187 billion in 2019 as the amount of data continues to double every two years, and by the end of 2020 the digital universe will hold 4 trillion gigabytes of data. Here are five interesting trends across the worldwide big data market, as observed in the Big Data Activation Report 2018 from Qubole:

There is widespread adoption of big data tools. 73% of organizations use at least three open source big data engines, the most popular being Apache Hadoop/Hive, Apache Spark, and Presto.
58 million commands were processed by the three popular big data engines: Hadoop/Hive, Presto, and Apache Spark.
Several new big data tools are gaining adoption, and nearly 30% of organizations have started using new tools such as Apache Airflow.
Increased productivity and automation. Data-driven organizations are focusing on optimizing the number of users running commands in each engine. Large-scale implementations have 188 users per engine, while small-scale implementations have 16 users per engine.
There is a lack of data science skills. Hiring of data scientists has grown by over 650% since 2012.

(Source: https://yourstory.com/2018/04/five-key-things-happening-global-big-data-market/)
