Hewlett Packard Labs announced a new collaboration with Hortonworks to accelerate the growth and adoption of Apache Spark. They will focus their efforts on a new class of analytics workloads that benefit from large pools of shared memory. Scott Gnau, CTO of Hortonworks, says the collaboration aims to advance Apache Spark and Hortonworks' big data solutions.
Syncsort, the mainframe and big data company, has made it possible for companies to run Apache Spark on data in its native mainframe format. This was done mainly to allow companies in industries like banking and insurance to keep their data in that native format while still applying Apache Spark solutions to it.
A report recently circulated claiming that, with enterprises adopting newer computing engines, Hadoop is now a dead technology. In reality, Hadoop is very much alive and kicking. Apache Spark is a general-purpose computation engine that performs data processing in-memory; if you think Apache Spark is going to replace Hadoop outright, you are just short-changing yourself.
Big data workloads increasingly demand fast, easy stream processing, and Apache Flink is here to meet that demand. Flink, a strong contender to Apache Spark, launched its first API-stable version this week. Apache Spark is mostly used for in-memory processing but has serious limitations when processing incoming real-time data; Apache Flink is looking to resolve that issue.
As big data matures and becomes more business-critical, it is important to make distributed processing and Hadoop systems available for analysing data in a coherent way. SAP HANA Vora brings an Apache Spark-powered OLAP analysis solution to business analysts and data scientists, bridging the gap between structured and unstructured data.
Thomas Dinsmore, the analytics software veteran, has been following the progress of Spark since its first launch. Dinsmore states that Apache Spark has yet to reach full development maturity and is not as fast a data processing engine as everyone seems to think, but that it nonetheless warrants attention in the IT world.
Martin Odersky, the creator of the Scala programming language, will deliver the opening keynote address on May 9, 2016 in New York City. Matei Zaharia, creator of Apache Spark, will also deliver a keynote address. The three-day conference runs May 9-11 at New World Stages and will be followed by hands-on training courses in Spark and Scala on May 12 and May 13.
Apache Spark is running into speed bumps, which was expected, but as adoption of Spark grows there is a need to overcome these limitations. Apache Spark can run on a Hadoop distribution, on Mesos, or standalone in the cloud. Spark tends to hit a bump when run on Hadoop, which can often be rectified by adding more hardware.
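The three deployment options mentioned above can be sketched as `spark-submit` invocations. This is a minimal sketch; the host names, ports, and the application file `my_app.py` are placeholders, not details from the article.

```shell
# Standalone cluster: point --master at the Spark master's URL.
spark-submit --master spark://master-host:7077 my_app.py

# Hadoop YARN: Spark picks up cluster details from the Hadoop
# configuration (HADOOP_CONF_DIR must point at the config files).
spark-submit --master yarn --deploy-mode cluster my_app.py

# Mesos: point --master at the Mesos master.
spark-submit --master mesos://mesos-host:5050 my_app.py

# Local mode (all cores on one machine), handy for development.
spark-submit --master local[*] my_app.py
```

The `--master` flag is what selects the cluster manager; the application code itself stays the same across all four modes.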
At the recent Spark Summit East 2016 in New York, it was very evident that big names are investing in Spark technology. Last June, IBM announced that it would assign 3,500 researchers to Spark and donate IBM SystemML, its machine learning technology, to the Spark open source community.
The time taken to turn raw data into meaningful information may cost you the time needed to convert a lead into a sale, yet very few companies have cracked this critical juncture. Resource-rich companies like Netflix and Amazon have perfected real-time data analytics, but companies that cannot afford a huge data analysis team can rely on Spark for fast streaming and real-time analytics.
A recent research report by Wikibon predicted that the Apache Spark big data processing framework will constitute more than one-third of big data spending by the end of 2022. From 2020 to 2022, Apache Spark will become the "design time foundation" for building predictive models through machine learning, accounting for 37% of all big data spending. With the increased adoption and evolution of Spark, this is the best time for big data professionals to master their skills in Apache Spark technology.
IBM released a new mainframe platform for Apache Spark. The new z/OS platform will help data scientists apply advanced analytics to the big data held in mainframe systems for valuable real-time business insights. Data scientists no longer need to move or download data from the mainframe; with z Systems they can apply in-memory analytics through Spark directly.
Market researcher Wikibon predicted that the global market for unified streaming analytics will be driven largely by enterprise Spark adoption, accounting for 16% (about $11.5 billion) of overall big data spending by 2020. The three key drivers of increasing demand for big data technologies are maturing data lakes, emerging intelligent self-tuning systems, and evolving intelligent systems of engagement. Spark is seen as a crucial catalyst for all three of these trends.