Microsoft's cloud-based Azure Data Lake will soon be available for big data analytic workloads. Azure Data Lake has three important components: Azure Data Lake Store, Azure Data Lake Analytics, and U-SQL. Data is stored in the HDFS-compatible Azure Data Lake Store and processed by the Azure Data Lake Analytics component, while the U-SQL component allows users to query the data.
There are several tools in the Hadoop ecosystem for handling and analyzing data, but most of them focus on only a single part of a much larger process. Organizations need to review the complete analytics pipeline to improve it. The most common reason for failed Hadoop implementations is a lack of personnel with the right big data and Hadoop skills, and manually coding each process and integrating it with existing systems takes a very long time. Pentaho published a whitepaper titled “Hadoop and the Analytic Data Pipeline” that highlights the key categories to focus on: big data ingestion, transformation, analytics, and solutions.
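The ingestion, transformation, and analytics stages from the whitepaper can be pictured as a minimal pipeline. This is a conceptual sketch only, with illustrative function and field names, and it is not Pentaho's API:

```python
# Minimal sketch of an ingest -> transform -> analyze pipeline;
# names are illustrative, not Pentaho APIs.
import csv
import io

RAW = "user,bytes\nalice,120\nbob,300\nalice,80\n"

def ingest(text):
    """Ingestion: parse raw CSV records into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: cast types and drop malformed records."""
    out = []
    for r in rows:
        try:
            out.append({"user": r["user"], "bytes": int(r["bytes"])})
        except (KeyError, ValueError):
            continue  # skip bad rows rather than failing the whole pipeline
    return out

def analyze(rows):
    """Analytics: aggregate bytes per user."""
    totals = {}
    for r in rows:
        totals[r["user"]] = totals.get(r["user"], 0) + r["bytes"]
    return totals

print(analyze(transform(ingest(RAW))))  # {'alice': 200, 'bob': 300}
```

Tools like Pentaho aim to replace exactly this kind of hand-written glue code with a managed, end-to-end pipeline.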
(Source: http://www.techrepublic.com/resource-library/whitepapers/hadoop-and-the-analytic-data-pipeline/ )
Trifacta, the global leader in data preparation and data wrangling, launched its new offering, Wrangler Edge, for analyst teams wrangling data outside big data environments. More than 4,000 companies across 132 countries use Wrangler to explore, transform, and join diverse datasets. Wrangler Edge is an enhanced version of the free edition that adds broader connectivity, larger data volumes, and support for multiple users, with the ability to schedule and operationalize workflows.
Apache Hadoop is open source software based on Java. It is used to run applications on large clusters of commodity hardware (servers). Hadoop is designed to scale from a single server to thousands of machines with a very high degree of fault tolerance: rather than relying on high-end hardware to prevent failures, the cluster software itself detects and handles faults.
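The fault-handling idea above can be sketched in a few lines. This is a conceptual simulation, not Hadoop code: the scheduler (not the hardware) notices a dead worker and reruns its task on a surviving node, which is essentially what Hadoop's framework does when a node stops sending heartbeats.

```python
# Conceptual sketch (not Hadoop code): the framework detects a failed
# worker and reruns its task on another node.
def run_job(tasks, nodes, fail_node=None):
    """Assign each task to a node; reassign tasks whose node has failed."""
    results = {}
    for i, task in enumerate(tasks):
        node = nodes[i % len(nodes)]
        if node == fail_node:  # heartbeat lost: treat the node as down
            node = next(n for n in nodes if n != fail_node)  # reschedule
        results[task] = (node, task.upper())  # "process" the task
    return results

nodes = ["node-1", "node-2", "node-3"]
out = run_job(["a", "b", "c"], nodes, fail_node="node-2")
# Task "b" still completes, just on a surviving node.
print(out["b"][1])  # B
```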
Hortonworks' valuation had been in a death spiral, dropping from $1.1 billion to less than $400 million over the year because of elusive profitability and missed revenue. Hortonworks delivered better-than-expected revenue in the third quarter, though it missed some of its targets. However, it expects to surpass analyst expectations in the fourth quarter.
(Source: http://www.techrepublic.com/article/hortonworks-breaks-free-from-death-spiral-but-cloud-is-still-a-threat/ )
The Hadoop standardisation body, the Open Data Platform initiative (ODPi), released version 2.0 of its specification, which adds updated runtime specifications to the Hadoop interoperability program. Charaka Goonatilake, CTO at Panaseer, a company that offers Hadoop-based security analytics software, said he admired "the community effort to drive standardisation as it's sorely needed in the heavily fragmented Hadoop industry."
(Source: http://www.theregister.co.uk/2016/11/14/odpi_20/ )
Demand for Spark and cloud services is increasing as demand for Hadoop services slows. Hadoop was expected to be the next big thing in the enterprise, but it has already reached “peak Hadoop”. In enterprise IT, Hadoop has been overtaken by the cloud; or rather, Hadoop moved so fast that it undercut itself. Hadoop has not fallen completely, but it is in decline. Because Hadoop's data management capabilities are not yet matched by Spark or other big data cloud services, enterprises will continue to use pieces of Hadoop.
AtScale has made its name by providing an access layer on top of Hadoop so that it can be used directly as a data warehouse. AtScale is now offering a Unified Analytics Platform with support for the Teradata Data Warehouse, Google Dataproc, and BigQuery. Through the Unified Analytics Platform, AtScale provides vendor-neutral middleware for accessing data stored in Hadoop over SQL and MDX. The new analytics platform architecture is based on three main pillars: DESIGN, CACHE, and QUERY.
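The three pillars can be illustrated with a toy semantic layer. This is a hedged sketch of the general pattern, not AtScale's actual API: a logical model is designed once, an aggregate cache serves repeated queries, and only cache misses are routed to the underlying engine (Hadoop SQL, BigQuery, etc.).

```python
# Illustrative sketch (not AtScale's API) of the DESIGN / CACHE / QUERY
# pillars: a logical model, an aggregate cache, and a query router that
# falls back to the underlying engine on a cache miss.
class SemanticLayer:
    def __init__(self, engine):
        self.engine = engine   # backend: e.g. a Hadoop SQL engine or BigQuery
        self.cache = {}        # pre-built aggregates (CACHE pillar)

    def query(self, measure, dimension):
        key = (measure, dimension)    # logical query (DESIGN pillar)
        if key not in self.cache:     # QUERY pillar: route to the engine
            self.cache[key] = self.engine(measure, dimension)
        return self.cache[key]

calls = []
def fake_engine(measure, dimension):
    calls.append((measure, dimension))
    return 42                  # stand-in for an expensive backend scan

layer = SemanticLayer(fake_engine)
layer.query("sales", "region")  # cache miss: hits the engine
layer.query("sales", "region")  # cache hit: served from the aggregate cache
print(len(calls))  # 1
```

The vendor neutrality comes from the fact that only the `engine` callable changes when the backing store changes; the BI tool keeps issuing the same logical queries.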
For customers who prefer not to work on Microsoft Azure, Hortonworks has launched a new cloud service on AWS. HDP users can quickly set up a Hadoop cluster in the cloud with various billing options available on the AWS Marketplace. The new Hortonworks Data Cloud for AWS will give customers a prescriptive experience for many Apache Spark, Apache Hive, and Hadoop use cases, with hourly and annual billing options as well as full community support.
AtScale provides fast multidimensional analysis and interactive capabilities directly on big data. AtScale recently announced an expansion of its services: it will provide modern businesses with a BI platform both on premises and in the cloud. Along with Hadoop, AtScale has announced preview availability of support for data stored in Google Dataproc, BigQuery, and Teradata.
(Source : http://www.atscale.com/press/atscale-5-0-delivers-industrys-first-modern-business-intelligence-platform-enables-bi-hadoop-big-data-premises-cloud/)
Microsoft recently announced that its Azure Data Lake is production ready, backed by Microsoft's 99.9 percent service level agreement. Azure Data Lake can process structured and unstructured data with no limits, and it took a year to reach production readiness. Azure Data Lake is being marketed as “Big Cognition”. According to a Microsoft Channel 9 presentation, the Azure Data Lake services have three components: HDInsight, Data Lake Store, and the new Data Lake Analytics. Azure Data Lake is based on the open Apache HDFS standard.