Recap of Hadoop News for January 2018

Apache Hadoop 3.0 goes GA, adds hooks for cloud and GPUs. TechTarget.com, January 3, 2018.

The latest update to the 11-year-old big data framework, Hadoop 3.0, allows clusters to pool GPU resources, reduces storage requirements, and adds a new federation scheme that lets the YARN resource manager and job scheduler expand the number of nodes that can run within a Hadoop cluster. The YARN federation feature is expected to be especially useful for cloud-based Hadoop applications. Carlo Aldo Curino, a principal scientist at Microsoft, said that with YARN federation a routing layer sits on top of the underlying clusters, providing greater scalability on cloud platforms like Microsoft Azure. He added that the biggest Hadoop clusters to date have been in the low thousands of nodes; people now need tens of thousands of nodes, and running YARN federation will help them get there.

(Source : http://searchdatamanagement.techtarget.com/news/450432578/Apache-Hadoop-30-goes-GA-adds-hooks-for-cloud-and-GPUs )
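To picture the routing layer Curino describes, here is a minimal conceptual sketch of federation-style routing, assuming several named sub-clusters and a simple hash-on-queue policy; it is not the actual YARN Router API, and the sub-cluster names are invented for illustration.

```python
# Conceptual sketch of a federation-style routing layer: several independent
# sub-clusters are presented as one logical cluster, and a router decides where
# each submitted application lands. This is NOT the real YARN Router API;
# the sub-cluster names and the hash-on-queue policy are illustrative assumptions.
import hashlib

SUB_CLUSTERS = ["subcluster-a", "subcluster-b", "subcluster-c"]  # assumed names


def route_application(app_id: str, queue: str) -> str:
    """Pick a sub-cluster by hashing the queue name, so load spreads across
    sub-clusters while submissions from the same queue stay together."""
    digest = int(hashlib.md5(queue.encode()).hexdigest(), 16)
    return SUB_CLUSTERS[digest % len(SUB_CLUSTERS)]


if __name__ == "__main__":
    for app_id, queue in [("app-0001", "etl"), ("app-0002", "ml"), ("app-0003", "etl")]:
        print(app_id, "->", route_application(app_id, queue))
```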


Hadoop 3 confronts the realities of storage growth. Zdnet.com, January 3, 2018


Apache Hadoop was built around the concept of cheap commodity infrastructure a decade ago, but the latest release, Hadoop 3.x, confronts the truth that too much cheap storage can get expensive. When Hadoop was built, the idea was to provide linearly scalable parallel computing close to big data on commodity hardware, so that storage costs could be an afterthought for organizations. The assumption behind Hadoop's original approach to high availability was to keep data available through three replicas on cheap storage. Hadoop 3.0 acknowledges that all this cheap storage can actually turn out to be expensive, and introduces erasure coding, a RAID-like mechanism that reduces data sprawl. The price developers pay is that they no longer get immediate failover access, since data managed through RAID-style approaches has to be reconstructed. In practice, the erasure coding feature of Hadoop 3.0 will more likely be used as a data tiering strategy, where colder data is stored on cheaper and slower media.

(Source : http://www.zdnet.com/article/hadoop-3-confronts-the-realities-of-storage-growth/)
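To make the storage trade-off concrete, here is a back-of-the-envelope sketch comparing 3x replication with a Reed-Solomon RS(6,3) erasure coding layout of the kind HDFS supports in Hadoop 3; the 100 TB figure is an arbitrary assumption for illustration.

```python
# Raw storage needed for 100 TB of logical data under classic 3x replication
# versus a Reed-Solomon RS(6,3) layout (6 data blocks + 3 parity blocks),
# as supported by HDFS erasure coding in Hadoop 3. The 100 TB figure is an
# arbitrary assumption for illustration.

logical_data_tb = 100

# 3x replication: every block is stored three times -> 200% overhead.
replicated_tb = logical_data_tb * 3

# RS(6,3): 3 parity blocks for every 6 data blocks -> 50% overhead.
data_blocks, parity_blocks = 6, 3
erasure_coded_tb = logical_data_tb * (data_blocks + parity_blocks) / data_blocks

print(f"3x replication: {replicated_tb} TB raw "
      f"({replicated_tb / logical_data_tb - 1:.0%} overhead)")
print(f"RS(6,3) coding: {erasure_coded_tb:.0f} TB raw "
      f"({erasure_coded_tb / logical_data_tb - 1:.0%} overhead)")
```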


6 Key Future Prospects of Big Data Analytics in Healthcare Market for Forecast Period 2017 - 2026. Globenewswire.com, January 5, 2018.

According to a report collated by Fact.MR, the big data analytics in healthcare market is expected to register a double-digit CAGR annually through 2017-2026. By the end of 2026, more than US$ 45,000 Mn in revenue will be garnered from sales of big data analytics in healthcare across the globe. The report further highlights that big data analytics will be used extensively for cutting down healthcare costs and boosting precision medicine research. The major takeaways from Fact.MR's report on big data analytics in the healthcare sector are:

  • Despite on-premise deployments being the most sought-after, cloud-based big data deployments in the healthcare market will see faster expansion through 2026.
  • CRM will remain the go-to tool for big data analytics in the healthcare market.
  • Healthcare providers will remain the dominant spenders on big data analytics globally.
  • Asia-Pacific excluding Japan will continue to be the fastest-expanding and most remunerative market for big data analytics.
  • Access to operational information is expected to be the largest use of big data analytics in healthcare, in terms of revenue.

(Source : https://globenewswire.com/news-release/2018/01/05/1284203/0/en/6-Key-Future-Prospects-of-Big-Data-Analytics-in-Healthcare-Market-for-Forecast-Period-2017-2026.html )


Could big data unlock safer commutes for cyclists? Marketplace.org, January 8, 2018

Cities across the US are finding it hard to make roads safe for bikes as cycling has become a popular means of commuting and recreation. City planners are looking to big data solutions to decide where to add bike lanes and routes, and to fix smaller-scale problems at crossings and intersections. Planners in Texas do not have enough data about the routes cyclists travel, so the state has acquired two years' worth of data from the popular cycling app Strava, which has 84,000 users there (about 20% of the biking community), for an undisclosed amount. With the Strava data, the Texas Department of Transportation can see which routes are actually being used instead of merely guessing where to put infrastructure or bike lanes.
(Source : https://www.marketplace.org/2018/01/08/tech/could-big-data-unlock-safer-commutes-cyclists )
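A planner's basic workflow with this kind of ride data can be sketched in a few lines of pandas: count rides per road segment and surface the busiest segments that still lack a bike lane. The column names and sample values below are hypothetical, not Strava's actual export format.

```python
# Hypothetical sketch of how ride-trace data could guide bike-lane planning:
# count rides per road segment and rank the busiest segments that still lack
# a bike lane. Column names and sample values are illustrative assumptions.
import pandas as pd

rides = pd.DataFrame({
    "segment_id": ["S1", "S2", "S1", "S3", "S1", "S2"],
    "has_bike_lane": [False, True, False, False, False, True],
})

ride_counts = (
    rides.groupby(["segment_id", "has_bike_lane"])
         .size()
         .reset_index(name="ride_count")
)

# The busiest segments without a bike lane are candidates for new infrastructure.
candidates = (
    ride_counts[~ride_counts["has_bike_lane"]]
    .sort_values("ride_count", ascending=False)
)
print(candidates)
```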

Hadoop 3.0 Perspectives by Hortonworks' Hadoop YARN & MapReduce Development Lead, Vinod Kumar Vavilapalli. InsideBigData.com, January 16, 2018.

Apache Hadoop has become the go-to framework within the big data ecosystem for running and managing big data applications on large hardware clusters in distributed environments. Hortonworks' Hadoop YARN & MapReduce development lead, Vinod Kumar Vavilapalli, offered his perspective on the latest release of Hadoop 3.0 in a Q&A with the InsideBigData editorial team; here is a quick summary of what he had to say. According to Vinod, the key element that distinguishes the latest release from Hadoop 2.0 is storage: Hadoop 3.0 brings erasure coding as an optional storage mechanism alongside the replication-based system. The other key element focuses on computation through extensible resource types, with Hadoop 3 bringing machine learning and deep learning workloads to the Hadoop cluster through GPU and FPGA resources. Vinod says the major industry trends today are cloud, machine learning and deep learning, and the proliferation of big data at ever greater scale, and Hadoop 3.0 caters to most of these trends, making it the de facto distributed data-processing framework for big data.

(Source : https://insidebigdata.com/2018/01/16/hadoop-3-0-perspectives-hortonworks-hadoop-yarn-mapreduce-development-lead-vinod-kumar-vavilapalli/ )
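The extensible resource-types idea can be pictured as a container request that carries arbitrary named, countable resources alongside memory and vCores. The sketch below is a conceptual illustration, not the YARN client API; yarn.io/gpu and yarn.io/fpga are the resource names Hadoop 3 uses for GPUs and FPGAs, while the request shape itself is an assumption.

```python
# Conceptual sketch of an "extensible resource types" request in Hadoop 3:
# besides memory and vCores, a container request can name additional countable
# resources such as GPUs or FPGAs. The dict shape here is an illustrative
# assumption, not the YARN client API; the resource names "yarn.io/gpu" and
# "yarn.io/fpga" are the ones Hadoop 3 uses.

def container_request(memory_mb, vcores, extra_resources=None):
    """Build a container request with arbitrary named resources."""
    resources = {"memory-mb": memory_mb, "vcores": vcores}
    resources.update(extra_resources or {})
    return resources

# A deep learning task asking for 2 GPUs on top of CPU and memory.
print(container_request(8192, 4, {"yarn.io/gpu": 2}))

# An inference task that instead requests a single FPGA.
print(container_request(4096, 2, {"yarn.io/fpga": 1}))
```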

Adobe adds Hadoop connector to Adobe Campaign. Zdnet.com, January 16, 2018.

Hadoop is a valuable data source at Adobe, but marketers have not been able to use it; primarily the IT and analytics teams have been working with it. With the intent of giving marketers direct access to big data and Hadoop, Adobe has added a new Hadoop connector that brings more data into Adobe Campaign, part of the company's Experience Cloud, than it has been able to analyse before. The connector, built on Apache Hive, will bring in additional data sources from POS terminals, mobile devices and kiosks, making it easier to organise marketing campaigns across diverse channels. Hadoop can prove valuable to many industries, and one major use case is retail, where unstructured data stored in Hadoop can be added to loyalty programs.

(Source : http://www.zdnet.com/article/adobe-adds-hadoop-connector-to-adobe-campaign/ )
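As a rough illustration of pulling point-of-sale data out of Hadoop through Apache Hive, the sketch below uses the open-source PyHive client rather than Adobe's connector; the host, table and column names are assumptions for the example.

```python
# Illustrative query of point-of-sale data stored in Hadoop through Apache Hive,
# using the open-source PyHive client. This is not Adobe's connector; the host,
# table and column names are assumptions for the example.
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000, username="marketer")
cursor = conn.cursor()

# Aggregate in-store purchases per loyalty member to build a campaign audience.
cursor.execute("""
    SELECT loyalty_id, COUNT(*) AS purchases, SUM(amount) AS total_spend
    FROM pos_transactions
    WHERE transaction_date >= '2018-01-01'
    GROUP BY loyalty_id
    HAVING SUM(amount) > 500
""")

for loyalty_id, purchases, total_spend in cursor.fetchall():
    print(loyalty_id, purchases, total_spend)

cursor.close()
conn.close()
```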

Could 'big data' help Cleveland reduce health disparities - and create jobs? Cleveland.com, January 21, 2018

Cleveland has serious clout when it comes to providing healthcare locally, nationally and globally. Researchers in Cleveland are exploring opportunities to use big data, which computers can analyze to identify patterns and make predictions, to pinpoint health disparities in the city and to support upcoming start-ups, healthcare firms, technology companies and consulting firms. A study conducted by the Center for Population Dynamics at Cleveland State University states that Cleveland has many opportunities to grow its economy by leveraging big data to improve residents' health. "The health systems are learning that there are a lot of other things that determine people's health - social determinants, environmental determinants. There's going to be a more holistic approach around that," said Stephen McHale, the former chief executive of Explorys, a Cleveland-based healthcare analytics business acquired by IBM in 2015.
 
(Source : http://www.cleveland.com/business/index.ssf/2018/01/could_big_data_help_cleveland.html )

Big data to help ensure food safety in Beijing. Xinhuanet.com, January 21, 2018.

The Beijing Municipal Commission of Commerce is using big data and cloud computing technology to ensure food safety in Beijing. The commission is working with organizations to set up a food traceability system that tracks all stages of the food supply chain, including production, processing, packaging, delivery and sales. It has also signed a strategic agreement with Chinese e-commerce giant JD.com to cooperate on food traceability.

(Source : http://www.xinhuanet.com/english/2018-01/21/c_136913185.htm )

Big data in cloud computing demands an IT skill set change. TechTarget.com, January 29, 2018.

Organizations are shifting their big data workloads to the cloud. Though this does not require a complete overhaul of IT skills, it does require some changes for admin and dev teams. Big data in cloud computing can reduce overall costs compared to on-premise deployments. Not every big data project requires the organization to have a big data expert, but the ones that involve Hadoop will. While it might seem straightforward to replace a 5-node Hadoop cluster on-premise with a 5-node Hadoop cluster in the cloud, several management challenges arise around software interoperability. To succeed with big data in cloud computing, IT teams must focus on four important categories of skills: administration, development, data analysis and data visualization.

(Source : http://searchcloudcomputing.techtarget.com/tip/Big-data-in-cloud-computing-demands-an-IT-skill-set-change)
 
