Recap of Apache Spark News for December 2017

How CardinalCommerce grew its big data analytics capabilities, December 1, 2017.

CardinalCommerce Corp., a Mentor, Ohio-based company acquired by Visa in 2017, generates huge amounts of financial transaction data. Gleaning valuable insights from that data has always been both a top priority and a central challenge. The company built a small on-premises Spark cluster for basic data processing tasks, such as pulling data back from CardinalCommerce's payment platform for reporting. Later, it moved its Spark workloads to Amazon EMR, a cloud-based big data service, which gave it the flexibility to scale Apache Spark for larger workloads as required. The lesson CardinalCommerce learned from doing big data analytics with Spark is that it sparked demand for data across the organization and increased the number of in-house users engaged in the analytics process.

(Source: )

AMD scores EPYC gig powering new Azure instances, December 5, 2017.

AMD has earned a place in a top-tier cloud by winning Microsoft's business for the next generation of Azure L-series instances. AMD's 32-core, 2.2 GHz EPYC 7551 will power the new Lv2 Azure instance type, which is optimized for storage-intensive workloads as well as workloads like Apache Spark. AMD regarded the win as proof that it is once again a force in the server CPU market.

(Source: )

A Decade into Big Data, December 11, 2017.

The first major data-oriented approach, Hadoop, grew out of Google research papers. The Hadoop project started in 2006 and was gaining momentum by 2008, and big data has evolved rapidly ever since; Hadoop turned 10 in 2016. Another important contributor to the big data world has been Apache Spark, which addressed the performance limitations and high costs associated with Hadoop's disk-based storage approach. With the advent of Spark, there was a huge shift from batch processing to real-time and event processing. Big data has moved through several stages, from the Hadoop era to Spark and then to data lakes and data fabrics, and data enthusiasts are wondering what comes next. As sensors and related technologies evolve, data will be streamed into big data clusters for real-time analysis. As more companies adopt data science and machine learning, Apache Spark machine learning and Google TensorFlow are expected to be the blockbuster tools for predictive analytics and deep learning.

(Source: )

Accelerite takes single-pipeline approach to data transformation and, December 12, 2017.

Cloud management vendor Accelerite released ShareInsights 2.0, an end-to-end self-service analytics platform that handles data preparation, data visualization, collaboration and online analytical processing, all from a single user interface. Accelerite's platform prepares and queries terabytes of data in minutes. ShareInsights 2.0 runs on top of a Hadoop cluster and leverages existing Apache Spark instances for predictive analytics and machine learning tasks. The platform has more than 50 connectors for linking to other data sources and more than 100 analytical widgets for tasks ranging from simple aggregation to complex machine learning.

(Source: )

Big data delivers higher revenue and faster growth, December 19, 2017.

60% of organizations that adopt big data cite improved efficiency and increased productivity among the biggest gains from using it. In 2014, Hadoop and Spark attracted high interest but saw low adoption; now 70% of enterprises either use these big data technologies in production or plan to use them in the future. 90% of organizations say that moving away from legacy systems and investing in big data technologies like Hadoop and Spark has not only proved valuable for deriving meaningful insights but also helped them save money.

(Source: )

Microsoft's cloud Big Data service cuts prices up to 52 percent, December 18, 2017.

Microsoft announced price cuts for HDInsight (HDI), its Azure cloud-hosted big data offering based on Hadoop, Spark, Storm, Kafka, Hive and Microsoft R Server. To make its prices far more competitive with Amazon's EMR service on AWS, Microsoft reduced the charges for R Server by 80% and for HDInsight by up to 52%. The service itself remains unchanged, including its three-nines (99.9%) service level agreement, which Microsoft positions as a differentiator from its competitors.

(Source: )

53% of Companies are adopting Big Data Analytics, December 24, 2017.

Big data adoption reached 53% in 2017, up from 17% in 2015. The leading early adopters of big data are the telecom and financial sectors, with data warehouse optimization as the top use case. The software that has gained the most popularity for big data includes Apache Spark, Hadoop MapReduce and YARN: 30% of the organizations surveyed consider Apache Spark a critical component of their big data strategies, while 20% consider Hadoop MapReduce and YARN critical. The most preferred big data access methods include Spark SQL, Hadoop HDFS, Hadoop Hive and Amazon S3, with 73% of organizations considering Spark SQL a key component for implementing their analytic strategies.

(Source: )

Artificial Intelligence Needs Big Data, and Big Data Needs AI, December 26, 2017.

Big data and artificial intelligence have formed a symbiotic relationship: each needs the other to deliver on its promise. Mike Matchett, a senior analyst with Taneja Group who has been observing the revolution in the AI market, said that Apache Spark is the spark for AI development using big data. AI workloads are resource-intensive, and many organizations lack the infrastructure to run them. Under such circumstances, open source tools like Apache Spark make the proposition cost-effective and compelling. Apache Spark has gained widespread adoption for its in-memory, real-time processing and fast machine learning at scale.

(Source: )

Relevant Projects

Implementing Slow Changing Dimensions in a Data Warehouse using Hive and Spark
Hive Project - Understand the various types of slowly changing dimensions (SCDs) and implement them in Hadoop Hive and Spark.

Real-Time Log Processing in Kafka for Streaming Architecture
The goal of this Apache Kafka project is to process application log entries in real time, using Kafka for the streaming architecture in a microservice setting.

Data Mining Project on Yelp Dataset using Hadoop Hive
Use the Hadoop ecosystem to glean valuable insights from the Yelp dataset. You will analyze the patterns found in the Yelp dataset to come up with various approaches to solving a business problem.

Yelp Data Processing Using Spark And Hive Part 1
In this big data project, we will continue from a previous Hive project, "Data engineering on Yelp Datasets using Hadoop tools", and perform all of the data processing using Spark.

Spark Project-Analysis and Visualization on Yelp Dataset
The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of the data processing into Elasticsearch. Then, use the visualization tool in the ELK stack to build various kinds of ad-hoc reports from the data.

Create A Data Pipeline Based On Messaging Using PySpark And Hive - Covid-19 Analysis
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

Hive Project - Visualising Website Clickstream Data with Apache Hadoop
Analyze clickstream data of a website using Hadoop Hive to increase sales by optimizing every aspect of the customer experience on the website from the first mouse click to the last.

Hadoop Project for Beginners-SQL Analytics with Hive
In this Hadoop project, learn about the features in Hive that let you perform analytical queries over large datasets.

Web Server Log Processing using Hadoop
In this Hadoop project, you will use a sample log file from an application server to demonstrate a scaled-down server log processing pipeline.

Real-time Auto Tracking with Spark-Redis
Spark Project - Discuss real-time monitoring of taxis in a city. The real-time data streaming will be simulated using Flume. The ingestion will be done using Spark Streaming.