News on Hadoop - December 2017
Apache Impala gets top-level status as open source Hadoop tool. TechTarget.com, December 1, 2017.
The massively parallel processing engine born at Cloudera has acquired top-level project status within the Apache Software Foundation. Impala's main objective is to bring SQL-like interactivity to big data analytics, much like other big data tools such as Hive, Spark SQL, Drill, HAWQ and Presto. Apache Impala puts special emphasis on high concurrency and low latency, features that have at times eluded Hadoop-style applications. Organizations like NYSE (New York Stock Exchange), Caterpillar and Cox Automotive have used Impala in their data architectures to separate storage management from query processing.
(Source : http://searchdatamanagement.techtarget.com/news/450431143/Apache-Impala-gets-top-level-status-as-open-source-Hadoop-tool )
4 Big Data Trends To Watch In 2018. CXOToday.com, December 4, 2017.
The rise of big data will continue to gain speed in 2018. Here are the emerging trends for data-driven initiatives -
- Data will no longer sit in silos, as hundreds of connectors from companies and projects like Talend and Apache Spark make it easy to add datasets in hours instead of weeks or months.
- 2018 will see the increased emergence of micro-subscription models, as tools like Cassandra and Apache Kafka make real-time processing at scale possible alongside Google TensorFlow and Python.
- Organizations will become future-ready by continuing to use their legacy systems while building parallel systems that draw on legacy data to make operations more efficient.
- 2018 will be the era of AI: people will soon be able to buy and sell products and services and locate or resolve problems using their voice.
(Source : http://www.cxotoday.com/story/4-big-data-trends-to-watch-in-2018/ )
Hadoop 3.0 Likely to Arrive Before Christmas. Datanami.com, December 5, 2017.
Big data and Hadoop developers are likely to get an early holiday present as Hadoop 3.0 nears completion. It brings several new features, including a 4x improvement in scalability and a 50% increase in storage capacity, while other additions - such as support for GPUs and Docker and an S3-compatible storage API - are slated for versions 3.1 and 3.2 in 2018. The Hadoop community is putting the finishing touches on Hadoop 3.0 and plans to release it by mid-December 2017, barring any unforeseen delays.
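The claimed 50% capacity gain presumably comes from Hadoop 3.0's HDFS erasure coding, which replaces 3x block replication with Reed-Solomon parity. A rough back-of-the-envelope sketch (the RS(6,3) policy here is an illustrative assumption):

```python
# Rough storage-overhead comparison: 3x replication (the Hadoop 2.x
# default) vs. Reed-Solomon RS(6,3) erasure coding, one of the
# policies shipped with Hadoop 3.0. RS(6,3) is assumed for illustration.

def replication_overhead(replicas=3):
    """Raw bytes stored per logical byte under n-way replication."""
    return float(replicas)

def erasure_coding_overhead(data_blocks=6, parity_blocks=3):
    """Raw bytes stored per logical byte under RS(data, parity)."""
    return (data_blocks + parity_blocks) / data_blocks

rep = replication_overhead()        # 3.0x raw storage
ec = erasure_coding_overhead()      # 1.5x raw storage
savings = 1 - ec / rep              # fraction of raw storage saved

print(f"replication: {rep:.1f}x, erasure coding: {ec:.1f}x, savings: {savings:.0%}")
```

Under RS(6,3), six data blocks plus three parity blocks cost 1.5x raw storage versus 3x for replication - half the raw bytes for the same logical data, hence the roughly 50% capacity gain.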
(Source : https://www.datanami.com/2017/12/05/hadoop-3-0-likely-arrive-christmas/ )
Big Data Cloud Service streamlines Oracle Hadoop deployments. TechTarget.com, December 6, 2017.
Oracle’s Big Data Cloud Service gives organizations a platform for quickly and easily implementing a big data Hadoop architecture along with other open source technologies. The service uses Oracle’s cloud infrastructure and related technologies to set up, manage and scale Hadoop clusters through a centralized server. This eliminates the complexity of implementing Hadoop clusters for Oracle Hadoop users by providing all the tools required to deploy a big data system, secure its environment and integrate it with other systems. Big Data Cloud Service is evolving quickly, and the list of supported Apache tools will keep changing over time.
(Source : http://searchoracle.techtarget.com/tip/Big-Data-Cloud-Service-streamlines-Oracle-Hadoop-deployments )
Apache Bigtop Adds OpenJDK 8 Support. i-programmer.info/news, December 11, 2017.
Bigtop is an Apache Foundation project used for packaging, testing and configuring the open source components that constitute the Hadoop infrastructure. Its best-known feature is that users can spin up a virtual Hadoop cluster with a single command. Apache has released version 1.2.1, which supports OpenJDK 8 and adds a novel sandbox feature that lets Hadoop developers run big data pseudo-clusters on Docker. Bigtop packages Hadoop RPMs and DEBs so that developers can manage and maintain their Hadoop clusters, and it provides an integrated smoke-testing framework along with a suite of 50 test files.
(Source : http://www.i-programmer.info/news/197-data-mining/11374-apache-bigtop-adds-openjdk-8-support.html )
AQR to explore use of ‘big data’ despite past doubts. Ft.com, December 12, 2017.
The hedge fund is running new experiments to see whether machine learning can help it find profitable patterns. The $208 billion hedge fund group led by Clifford Asness plans to use “big data” such as satellite imaging to see whether it can help the firm manage money. The quantitative investment group AQR is parsing big datasets - such as images of shadows cast by tankers and oil wells - to see whether they reveal any profitable patterns in the market. If the experiments bear fruit, the firm will pursue further research and make the approach an integral part of AQR.
(Source : https://www.ft.com/content/3a8f69f2-df34-11e7-a8a4-0a1e63a52f9c )
Apache Hadoop 3.0 is here. Jaxenter.com, December 15, 2017.
Apache Hadoop continues to evolve: version 3.0, released on December 14, 2017, accommodates workloads beyond batch analytics, in particular real-time queries and long-running services. Some of the major enhancements in Hadoop 3.0 include -
- Hadoop shell scripts have been rewritten
- Hadoop JARs have been compiled to run on Java 8.
- Support has been added for a native implementation of the map output collector, which can improve performance by 30% or more for shuffle-intensive jobs.
- The new version adds hadoop-client-api and hadoop-client-runtime artifacts, which shade Hadoop's dependencies into a single JAR.
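For downstream projects, consuming the shaded client typically looks something like the following Maven fragment (the coordinates are those published for the 3.0.0 release; exact usage in a given build may vary):

```xml
<!-- Compile against the shaded client API; pull the shaded runtime
     only at run time, so Hadoop's internal dependencies stay hidden. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.0.0</version>
  <scope>runtime</scope>
</dependency>
```

This keeps an application's own dependency versions (Guava, Jackson, etc.) from clashing with the versions Hadoop uses internally.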
(Source : https://jaxenter.com/hadoop-3-0-is-here-139861.html )
GiniMachine Brings Credit Scoring Into The Age Of Big Data. Benzinga.com, December 19, 2017.
Traditional credit scoring is limited because the lender's decision relies solely on information from credit bureaus, which misses many deserving borrowers and small startups. GiniMachine, an artificial intelligence based credit scoring platform, uses advanced machine learning algorithms that let lenders build, validate and deploy high-performing risk models in just a few minutes, without requiring expertise in math, statistics or machine learning. The AI system analyses parameters that traditional scoring systems ignore. Users can build scoring models for different types of credit products, diverse markets and operations, and various customer bases by taking in parameters like age, income, occupation and more.
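As an illustration only (this is not GiniMachine's actual algorithm, and the applicants, features and numbers below are invented), a minimal credit-scoring model on parameters like age and income can be sketched as a logistic regression trained from scratch:

```python
# Minimal credit-scoring sketch: logistic regression on two applicant
# features, trained by batch gradient descent on tiny synthetic data.
# Pure stdlib; all data here is made up for illustration.
import math

# Synthetic applicants: (age, income in $k) -> repaid loan? (1 = yes)
raw = [
    ((25, 20), 0), ((30, 25), 0), ((22, 18), 0), ((28, 22), 0),
    ((45, 80), 1), ((50, 95), 1), ((40, 70), 1), ((55, 90), 1),
]
# Scale features to roughly [0, 1] so plain gradient descent is stable.
data = [((age / 100.0, inc / 100.0), y) for (age, inc), y in raw]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability of repayment for scaled features x under weights w."""
    return sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])

def train(data, lr=1.0, epochs=2000):
    """Fit [bias, w_age, w_income] by batch gradient descent on log-loss."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            err = predict(w, x) - y      # gradient of the log-loss
            grad[0] += err
            grad[1] += err * x[0]
            grad[2] += err * x[1]
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

w = train(data)
# Score a new applicant: 48 years old, $85k income.
score = predict(w, (48 / 100.0, 85 / 100.0))
```

A real scoring platform would use many more features, regularization, and out-of-sample validation; the point here is only that "build and validate a risk model" reduces to fitting and evaluating a classifier like this one.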
(Source : https://www.benzinga.com/fintech/17/12/10929639/ginimachine-brings-credit-scoring-into-the-age-of-big-data )
Why Hadoop in the Cloud Makes Sense? CIO.in, December 20, 2017.
With the growing need to store, process and manage large datasets, it is becoming important for companies to install and run Hadoop clusters. Until now, most companies have adopted on-premises Hadoop installations, but that trend is set to change as Hadoop becomes available through various cloud platforms. Moving Hadoop workloads to the cloud makes sense and benefits organizations in the following ways-
- The ability to spawn nodes on the fly and pay only for the duration of their use makes more sense than investing in software, hardware and manpower to manage Hadoop clusters on-premises.
- Hadoop administrators are hard to find, so moving Hadoop workloads from on-premises to the cloud spares organizations the hunt for skilled manpower.
- Several leading cloud service providers offer strong cloud options for Hadoop implementation at an economical cost.
(Source : http://www.cio.in/opinion/why-hadoop-cloud-makes-sense )
Apache Software Foundation Sets Hadoop Sights Higher for 2018. ITBusinessEdge.com, December 21, 2017.
With Hadoop 3.0 recently shipped, Hadoop distribution providers are set to support several new classes of application workloads in 2018. Two important enhancements in Hadoop 3.0 are the ability to run big data applications that incorporate deep learning and machine learning algorithms on GPUs and on field-programmable gate arrays. It might take some time for all the tooling to settle in an enterprise setting and become compatible with Hadoop 3.0. Once it does, however, a diverse set of applications running on top of Hadoop-based data lakes will expand well beyond traditional batch analytics.
(Source : https://www.itbusinessedge.com/blogs/it-unmasked/apache-software-foundation-sets-hadoop-sights-higher-for-2018.html )