MongoDB Inc. has announced a new connector for Apache Spark that will allow Spark developers and data scientists to use its database to work with rapidly moving data. Kelly Stirman, MongoDB’s VP of Strategy, has said that MongoDB users have expressed great interest in working with the Spark connector. MongoDB has taken its existing Apache Hadoop connector and enhanced it for Spark.
A new tool, Sparkling Water 2.0, created by the startup H2O.ai (formerly known as 0xdata Inc.), provides an open source platform for algorithmic development. Sparkling Water 2.0 makes it easier to use machine learning algorithms during data analysis. Instead of using Apache Spark’s machine learning library MLlib, users can tap into H2O’s AI platform through the Sparkling Water 2.0 application programming interface. The tool lets users combine Spark's features with H2O’s own columnar compression, fully featured machine learning algorithms and speed.
Apache Spark has become extremely popular since its launch in 2012, and over the past year it has gained momentum in enterprise adoption. But not all Apache Spark support is the same. Customers should look at four main facets before using Spark libraries: how Spark is used in the platform, what is available in the Apache Spark package, how everyone on the team is exposed to Spark, and how to perform analytics with the various libraries in Spark.
Splice Machine, an open source RDBMS powered by Hadoop and Spark, today announced its new open source Sandbox for developers. The new open source Sandbox 2.0 community edition is now available for testing on AWS.
Hayden Schultz, the global architect for TIBCO, talks about how to bring technology that is making waves in the industry closer to the customer's understanding and use. TIBCO is all about building an application to boost a technology’s core feature. In the case of Apache Spark, the application will help build accelerated systems on top of Apache Spark to stimulate big data solutions.
Apache Spark 2.0 is now available to users on the Databricks platform. Spark 2.0 is 5 to 10 times faster than Spark 1.6 and adds support for applications that require Structured Streaming. Tungsten's Phase 2 whole-stage code generation and Catalyst's code optimization add to the enhanced speed of Spark 2.0. The latest release comes bundled with many novel features, such as machine learning model persistence, DataFrame-based machine learning APIs and standard SQL support.
Machine learning might seem futuristic, but it is not. Apache Spark’s scalable machine learning library, MLlib, is making machine learning easy for machine learning engineers and data scientists. MLlib not only fits models but can also be used for various other stages, such as data collection, data labelling, feature extraction, model tuning, model evaluation and deployment. Together with other Apache Spark components, MLlib provides data scientists with a unified solution under a single big data framework.