Apache Spark is fast becoming the de facto execution engine in the Hadoop ecosystem, and enterprises are adopting it eagerly for building data-driven applications. Databricks has launched new APIs to automate its Spark infrastructure. But while DevOps teams work with command-line APIs to automate that infrastructure, data scientists need a more visual platform to run their algorithms, and there is still no unifying structure that brings the workflows of these two teams together.
Holden Karau, principal software engineer of Big Data at IBM, has launched her book “High Performance Spark” with her co-author Rachel. The first four chapters have been released, covering an introduction to high-performance Spark, how Spark works, DataFrames, Datasets and Spark SQL, and joins (SQL and core). Addressing the release of these first chapters, Karau says there are many exciting things happening with Apache Spark 2.0 as it moves from the RDD to the Dataset model. This will help business analysts work easily with Spark and productionize it with the help of data engineers.
Cloudera announced the general availability of Cloudera Enterprise 5.7. This release improves performance across key workloads and operations, promising up to 3x faster data processing with the added support for Hive-on-Spark. Apache Spark plays a central role in this release and is positioned to replace MapReduce.
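For Hive deployments like the one this release targets, switching the execution engine from MapReduce to Spark is a one-line configuration change (shown here per session; the same property can be set cluster-wide in hive-site.xml):

```sql
-- Run Hive queries on Spark instead of MapReduce for this session.
SET hive.execution.engine=spark;
```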
IBM’s new z/OS platform for Apache Spark will make life easier for data scientists and developers by giving them real-time, secure access to mainframe data. IBM is holding up the commitment it made last year to dedicate 3,500 IBM researchers and developers to Spark projects. As part of that endeavour, IBM’s z Systems group has also established a GitHub organization where developers can collaborate on Apache Spark.
As interest in Apache Spark continues to grow, Think Big has incorporated Spark into its big data frameworks for developing enterprise-quality data lakes and big data analytic applications. Its customers can now use the Spark framework in the cloud or on commodity Hadoop environments, optimized to run enterprise-class, mission-critical big data workloads.
The IBM z Systems mainframe with the new z/OS platform for Apache Spark is meant to ease and speed up data analysis, so that data scientists and big data developers can apply advanced analytics to large data sets and glean real-time insights. The platform runs on the z/OS mainframe operating system, enabling data scientists to analyse data at its point of origin without having to extract, transform, or load it.
Apache Apex is an open-source batch and stream processing platform compatible with HDFS and YARN. This new Apache project meets enterprises’ big data needs for real-time reporting, monitoring, and learning with millisecond-level precision. Apex may look similar to other open-source data frameworks such as Apache Spark, Storm, or Samza, but it is likely to rival all of them on usability.
Pivotal launched SnappyData, a new database solution powered by the in-memory transactional data store GemFire and Apache Spark. SnappyData uses Spark’s in-memory analytics engine to perform live SQL analytics on streaming and static data sets, and users can write queries either in SQL or as Spark abstractions. SnappyData also extends Spark Streaming in various ways by allowing users to manipulate and query data streams as if they were tables.