Apache Impala, the massively parallel processing engine born at Cloudera, has acquired the status of a top-level project within the Apache Software Foundation. The main objective of Impala is to bring SQL-like interactivity to big data analytics, just like other big data tools such as Hive, Spark SQL, Drill, HAWQ and Presto. Apache Impala puts special emphasis on high concurrency and low latency, features that have at times eluded Hadoop-style applications. Organizations like NYSE (New York Stock Exchange), Caterpillar and Cox Automotive have used Impala in their data architecture to separate storage management from query processing.
(Source : http://searchdatamanagement.techtarget.com/news/450431143/Apache-Impala-gets-top-level-status-as-open-source-Hadoop-tool )
The rise of big data will continue to gain speed in 2018, and the source below outlines four emerging big data trends to watch for data-driven initiatives in the coming year.
(Source : http://www.cxotoday.com/story/4-big-data-trends-to-watch-in-2018/ )
Big data and Hadoop developers are likely to get an early holiday present, as Hadoop version 3.0 is nearly complete. The release brings several new features, such as a 4x improvement in scalability and a 50% increase in capacity, while other features, such as support for GPUs and Docker and an S3-compatible storage API, are set to be incorporated in versions 3.1 and 3.2 in 2018. The Hadoop community is putting the finishing touches on Hadoop 3.0 and plans to release it by mid-December 2017, barring any unforeseen occurrences.
(Source : https://www.datanami.com/2017/12/05/hadoop-3-0-likely-arrive-christmas/ )
Oracle’s Big Data Cloud Service provides organizations with a platform for quickly and easily implementing a big data Hadoop architecture alongside other open source technologies. The service uses Oracle’s cloud infrastructure and other technologies to set up, manage and scale Hadoop clusters through a centralized server. This eliminates much of the complexity involved in implementing Hadoop clusters for Oracle Hadoop users by providing them with the tools required to deploy a big data system, secure its environment and integrate it with other systems. Big Data Cloud Service is evolving quickly, and the list of supported Apache tools will keep changing over time.
(Source : http://searchoracle.techtarget.com/tip/Big-Data-Cloud-Service-streamlines-Oracle-Hadoop-deployments )
Bigtop is an Apache Software Foundation project used for packaging, testing and configuring the open source components that constitute the Hadoop infrastructure. The best feature of Bigtop is that users can spin up a virtual Hadoop cluster using a single command. Apache has released the latest version, 1.2.1, which supports OpenJDK 8 and adds a novel sandbox feature that allows Hadoop developers to run big data pseudo clusters on Docker. Bigtop packages Hadoop RPMs and DEBs so that Hadoop developers can manage and maintain their Hadoop clusters, and it provides an integrated smoke testing framework along with a suite of 50 test files.
(Source : http://www.i-programmer.info/news/197-data-mining/11374-apache-bigtop-adds-openjdk-8-support.html )
A hedge fund is running new experiments to see whether machine learning can help it find profitable patterns. AQR, the $208 billion quantitative investment group led by Clifford Asness, plans to use “big data” such as satellite imaging to see whether it can help the firm manage money. AQR is performing various experiments that parse big datasets, such as images of the shadows cast by tankers and oil wells, to look for profitable patterns in the market. If the experiments bear fruit, the firm would carry out further research and make the approach an integral part of AQR.
(Source : https://www.ft.com/content/3a8f69f2-df34-11e7-a8a4-0a1e63a52f9c )
Apache Hadoop continues to evolve with version 3.0, released on December 14, 2017, which will accommodate workloads other than just batch analytics, in particular real-time queries and long-running services. The source below describes the major enhancements that have been added to Hadoop 3.0.
(Source : https://jaxenter.com/hadoop-3-0-is-here-139861.html )
Traditional credit scoring is limited because the lender's decision relies solely on information from credit bureaus, which misses several deserving borrowers and small startups. GiniMachine, an artificial intelligence based credit scoring platform, uses advanced machine learning algorithms to let lenders build, validate and deploy high-performing risk models in just a few minutes, without requiring expertise in math, statistics or machine learning. The AI system analyses various parameters that are otherwise ignored by traditional scoring systems. Lenders can build scoring models for different types of credit products, diverse markets and operations, and various customer bases by taking in parameters such as age, income, occupation and more.
(Source : https://www.benzinga.com/fintech/17/12/10929639/ginimachine-brings-credit-scoring-into-the-age-of-big-data )
With increasing demand to store, process and manage large datasets, it is becoming important for companies to install and run Hadoop clusters. So far, companies have mostly adopted on-premise installations of Hadoop, but this trend is about to change as Hadoop becomes available through various cloud platforms. Moving Hadoop workloads to the cloud makes sense and benefits organizations in several ways, which the source below discusses.
(Source : http://www.cio.in/opinion/why-hadoop-cloud-makes-sense )
With Hadoop 3.0 shipped recently, Hadoop distribution providers are all set to support a number of new classes of application workloads in 2018. Two important enhancements to Hadoop 3.0 are the ability to run big data applications incorporating deep learning and machine learning algorithms on GPUs and on field-programmable gate arrays (FPGAs). It might take some time for all the tooling to settle in an enterprise setting and become compatible with Hadoop 3.0. However, once that is done, diverse applications will run on top of Hadoop-based data lakes, expanding well beyond traditional batch analytics.
(Source : https://www.itbusinessedge.com/blogs/it-unmasked/apache-software-foundation-sets-hadoop-sights-higher-for-2018.html )