News on Hadoop - February 2018
Kyvos Insights to Host Webinar on Accelerating Business Intelligence with Native Hadoop BI Platforms. PRNewswire.com, February 1, 2018.
The leading big data analytics company Kyvos Insights is hosting a webinar titled “Accelerate Business Intelligence with Native Hadoop BI Platforms” on February 7, 2018 at 10 AM PST. The webinar will draw on examples from the many organizations that depend on Kyvos, as well as data compiled by Forrester Research. It will feature Forrester Research VP and Principal Analyst Boris Evelson as the guest speaker, who will show how a Hadoop BI platform can help organizations glean valuable insights from data to drive realistic business outcomes.
(Source : https://www.prnewswire.com/news-releases/kyvos-insights-to-host-webinar-on-accelerating-business-intelligence-with-native-hadoop-bi-platforms-300592068.html )
Three ways to prepare your mainframe files in Hadoop. Lemagit.fr, February 6, 2018.
Hadoop data lakes provide a new home for historical data that still has analytical value, but the challenge in making use of this data lies in migrating it into big data environments for easy access and analysis. In most cases the historical data is stored in mainframe formats such as COBOL, VSAM, and IMS files. To migrate legacy data to a Hadoop-based data lake, the various target data format options should be considered based on the use case. There are three different ways to convert mainframe files into formats that can support extensive analysis:
i) SQL-Based Storage - Exploiting SQL data engines such as Hive, Spark SQL, and Impala that are layered on top of Hadoop.
ii) File-to-File Transformation - The original files are transformed into a modern file format such as ASCII, and the original data instances are stored in the new files.
iii) Storage of Individual Objects - Every instance of data is transformed into its own object in JSON or XML format. This offers better flexibility for analyzing Hadoop datasets, because objects stored in a clustered architecture make an ideal environment for running Spark or Hadoop MapReduce programs.
(Source : http://www.lemagit.fr/conseil/Trois-facons-pour-preparer-des-fichiers-mainframe-a-Hadoop )
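Option iii above can be sketched in a few lines. The record layout, field names, and sample data below are hypothetical (a real conversion would be driven by the actual COBOL copybook); the sketch only illustrates turning one fixed-width EBCDIC record into a JSON object:

```python
import json

# Hypothetical fixed-width layout, as a COBOL copybook might define it:
# CUST-ID PIC X(6), CUST-NAME PIC X(20), BALANCE PIC 9(7)V99 (simplified)
LAYOUT = [("cust_id", 0, 6), ("cust_name", 6, 26), ("balance", 26, 35)]

def record_to_json(raw: bytes) -> str:
    """Decode one EBCDIC record and emit it as a JSON object (option iii)."""
    text = raw.decode("cp037")  # cp037 = common US EBCDIC code page
    obj = {name: text[start:end].strip() for name, start, end in LAYOUT}
    obj["balance"] = int(obj["balance"]) / 100  # two implied decimal places
    return json.dumps(obj)

# A sample record, encoded to EBCDIC purely for demonstration
sample = "C00042Jane Smith          000123456".encode("cp037")
print(record_to_json(sample))
```

Each resulting JSON document can then be landed in HDFS as its own object, ready for Spark or MapReduce jobs.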
How Erasure Coding Changes Hadoop Storage Economics. Datanami.com, February 7, 2018
Erasure coding, introduced in Hadoop 3.0, lets users pack up to 50% additional data within the same Hadoop cluster. This feature is a great boon to Hadoop users who are struggling with data storage, but the additional capacity comes at the cost of greater CPU and network overhead. Experts say this will require Hadoop users to strike a balance between the demands of cost and performance. From a network-bandwidth standpoint, small Hadoop clusters of 10 to 50 nodes are better suited for erasure coding, as most of them use a single switch and plenty of network bandwidth is available from node to node. Companies that want to adopt erasure coding, in particular those that run larger clusters of more than 100 nodes, need to choose their use cases carefully based on the computational cost.
(Source : https://www.datanami.com/2018/02/07/erasure-coding-changes-hadoop-storage-economics/ )
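The storage economics above reduce to simple arithmetic. Hadoop 3.0's default erasure-coding policy is Reed-Solomon with 6 data blocks plus 3 parity blocks (1.5x overhead) versus 3x for classic replication; the dataset size below is illustrative:

```python
# Raw capacity needed to hold a given dataset under 3x replication
# vs Reed-Solomon RS(6,3): 6 data blocks + 3 parity = 1.5x overhead.
def raw_needed(data_tb, overhead):
    """Raw disk (TB) required to store data_tb of user data."""
    return data_tb * overhead

data_tb = 600                      # illustrative dataset size
print(raw_needed(data_tb, 3.0))    # 1800.0 TB under 3x replication
print(raw_needed(data_tb, 9 / 6))  # 900.0 TB under RS(6,3), a 50% saving
```

The halved raw-storage requirement is where the headline savings come from; the parity math is what adds the CPU and network cost on reads after a node failure.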
Alibaba Throws $486 Million Behind Big Data. Footwearnews.com, February 8, 2018.
Alibaba is investing $486 million in a Chinese big data firm focused on the hotel, catering, and retail industries. The e-commerce giant is set to buy a 38% stake in Beijing Shiji Information Technology Co. Ltd. Alibaba is shifting into a “New Retail” strategic cooperation and plans to leverage big data as part of a bigger push to restructure the retail market. Jack Ma also announced a plan to spend 15 billion dollars on research and development over 3 years, with the target of serving 2 billion customers and creating 100 million jobs over the next 2 decades.
(Source : http://footwearnews.com/2018/business/technology/alibaba-invests-millions-big-data-china-492811/ )
LinkedIn open-sources Dynamometer for Hadoop performance testing at scale. Venturebeat.com, February 8, 2018
In 2015, the professional social networking company LinkedIn added 500 machines to its HDFS cluster to enhance performance. However, the team ran into a bug that caused jobs targeting the Hadoop cluster to time out. LinkedIn has now released Dynamometer, an open-source tool named after the device used to test cars, which helps enterprises stress-test large-scale Hadoop systems without huge infrastructure. The tool simulates large-scale Hadoop clusters with only 5% of the actual underlying infrastructure, helping developers test software at scale. Dynamometer lets customers test the same kinds of workloads they see in production and make sure the system can withstand any software updates.
(Source - https://venturebeat.com/2018/02/08/linkedin-open-sources-dynamometer-for-hadoop-performance-testing-at-scale/ )
Ebates Migrates the Data Analytics Stack: SQL Server to Hadoop. InformationWeek.com, February 14, 2018.
The online consumer coupon and incentives website Ebates needed to move from its SQL Server to a system with greater processing capacity; the question was how to do it. The company initially considered a bigger machine, but none was available. Ebates then chose to run a proof of concept on the tiny toy elephant, Hadoop. The Hadoop cluster POC outperformed the SQL Server within just a few months, which was enough to convince the company to migrate its data to a Hadoop cluster operationally, as it also provided some unexpected benefits. Ebates built the Hadoop cluster on 16 Dell machines and built out ETL and reporting on this new system.
(Source : https://www.informationweek.com/big-data/big-data-analytics/ebates-migrates-the-data-analytics-stack-sql-server-to-hadoop/d/d-id/1331047? )
Will HarperDB Replace Hadoop In The Near Future? AnalyticsIndiaMag.com, February 21, 2018.
HarperDB, a big data product founded in 2017, integrates the functionality and support of both SQL and NoSQL databases on a single platform. This dual functionality provides NoSQL capability without sacrificing SQL components such as math functions, SQL joins, and multiple operators. HarperDB focuses on large datasets without worrying much about programming languages. HarperDB is referred to as an “exploded data model” because it is a single model built to satisfy both SQL and NoSQL criteria. In the HarperDB model, each SQL query or JSON entity is integrated into an index table, which eliminates the need to assign foreign keys and thereby reduces the disk footprint. HarperDB is a new entrant in the big data space and might take a while to establish a strong foothold. We will have to wait and watch whether it can replace dominant big data frameworks such as Apache Hadoop and Spark in the analytics industry.
(Source : https://analyticsindiamag.com/is-harperdb-the-new-hadoop/ )
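The “exploded data model” idea can be illustrated with a toy sketch. This is emphatically not HarperDB's actual implementation; it is only an assumption-laden illustration of how indexing every attribute of an inserted JSON record lets one store serve both SQL-style filters and NoSQL-style lookups without foreign keys:

```python
from collections import defaultdict

class ExplodedTable:
    """Toy model: each attribute of a JSON row gets its own value -> row-id index."""

    def __init__(self):
        self.rows = {}  # row_id -> full JSON record
        self.indexes = defaultdict(lambda: defaultdict(set))  # attr -> value -> {row_ids}

    def insert(self, row_id, record):
        self.rows[row_id] = record
        for attr, value in record.items():  # "explode" every attribute
            self.indexes[attr][value].add(row_id)

    def where(self, attr, value):
        """Equality filter, as either a SQL WHERE clause or a NoSQL query would use."""
        return [self.rows[rid] for rid in self.indexes[attr][value]]

t = ExplodedTable()
t.insert(1, {"name": "dog", "breed": "husky"})
t.insert(2, {"name": "cat", "breed": "tabby"})
print(t.where("breed", "husky"))  # [{'name': 'dog', 'breed': 'husky'}]
```

Because every attribute is already indexed at insert time, no join keys need to be declared up front, which is the flexibility the article attributes to the model.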
$87.14 Bn for Global Hadoop Market by 2022. Newsofsoftware.com, February 23, 2018.
According to a Zion Market Research report, the global Hadoop market was valued at 7.69 billion USD in 2016 and is anticipated to reach 87.14 billion USD by 2022, growing at a compound annual growth rate of roughly 50% between 2017 and 2022. The major participants in the Hadoop market include AWS, IBM, Teradata, Cloudera, Hortonworks, Datameer, Oracle, VMWare and others. The complete Hadoop market report, available on Zionmarketresearch.com, provides a decisive view of the market by classifying it on the basis of type, end users, and regions.
(Source : http://newsofsoftware.com/2018/02/87-14-bn-for-global-hadoop-market-by-2022/ )
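The report's growth figure can be sanity-checked from its own endpoints using the standard compound-annual-growth-rate formula:

```python
# CAGR = (end / start) ** (1 / years) - 1
start, end, years = 7.69, 87.14, 6   # USD billions, 2016 -> 2022
cagr = (end / start) ** (1 / years) - 1
print(round(cagr * 100, 1))  # ~49.9, consistent with the stated ~50% CAGR
```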