News on Hadoop - January 2018
Apache Hadoop 3.0 goes GA, adds hooks for cloud and GPUs. TechTarget.com, January 3, 2018.
The latest update to the 11-year-old big data framework, Hadoop 3.0, lets clusters pool GPU resources, reduces storage requirements, and adds a novel federation scheme that lets the YARN resource manager and job scheduler expand the number of nodes that can run within a Hadoop cluster. YARN federation in Hadoop 3.0 should find wide use in cloud-based Hadoop applications. Carlo Aldo Curino, a principal scientist at Microsoft, said that with YARN federation there is a routing layer that sits on top of HDFS clusters, providing greater scalability on cloud platforms like Microsoft Azure. Curino added that the largest Hadoop clusters to date have been in the low thousands of nodes, but people now need tens of thousands of nodes, and running YARN federation will help them get there.
(Source : http://searchdatamanagement.techtarget.com/news/450432578/Apache-Hadoop-30-goes-GA-adds-hooks-for-cloud-and-GPUs )
Hadoop 3 confronts the realities of storage growth. Zdnet.com, January 3, 2018
Apache Hadoop was built around the concept of cheap commodity infrastructure a decade ago, but the latest release, Hadoop 3.x, confronts the truth that too much cheap storage can get expensive. When Hadoop was built, the idea was to provide linearly scalable parallel computing and bring it close to big data using commodity hardware, so that storage costs could be an afterthought for organizations. Hadoop’s original approach to high availability assumed that data would be kept available through three replicas on cheap storage. However, Hadoop 3.0 acknowledges that too much cheap storage can actually turn out to be expensive, and it brings in erasure coding, a RAID-like mechanism, to reduce data sprawl. The price developers pay is that they cannot get immediate failover access, because data managed through RAID-style approaches must first be reconstructed. In practice, the erasure coding feature of Hadoop 3.0 is more likely to be used as a data tiering strategy, where data is stored on cheaper and slower media.
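To make the storage trade-off concrete, here is a back-of-the-envelope sketch comparing three-way replication with a Reed-Solomon erasure code of six data cells plus three parity cells per stripe, the layout of HDFS's built-in RS-6-3-1024k policy. It assumes idealized full stripes and ignores cell size, small-file effects and metadata. The `Scheme` class and `compare` helper are hypothetical illustrations, not Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Hypothetical model of a redundancy scheme (not a Hadoop API)."""
    name: str
    data_units: int       # units of user data per group (1 for replication)
    redundant_units: int  # extra replicas or parity cells per group

    def raw_size(self, logical_size: float) -> float:
        # Raw capacity consumed = logical size * (data + redundancy) / data
        return logical_size * (self.data_units + self.redundant_units) / self.data_units

    def overhead_pct(self) -> float:
        # Extra storage beyond the logical data, as a percentage
        return 100.0 * self.redundant_units / self.data_units

    def failures_tolerated(self) -> int:
        # Replicas or parity cells that can be lost without losing data
        return self.redundant_units


REPLICATION_3 = Scheme("3x replication", 1, 2)
RS_6_3 = Scheme("RS(6,3)", 6, 3)


def compare(logical_tb: float) -> None:
    """Print raw capacity, overhead and fault tolerance for each scheme."""
    for s in (REPLICATION_3, RS_6_3):
        print(f"{s.name}: {s.raw_size(logical_tb):.0f} TB raw, "
              f"{s.overhead_pct():.0f}% overhead, "
              f"tolerates {s.failures_tolerated()} lost units")


compare(100)
```

For 100 TB of logical data this prints 300 TB raw (200% overhead) for replication versus 150 TB raw (50% overhead) for RS(6,3), which still survives three lost cells. The flip side, as the article notes, is that a lost cell must be rebuilt by decoding the surviving cells rather than simply reading another copy.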
(Source : http://www.zdnet.com/article/hadoop-3-confronts-the-realities-of-storage-growth/)
6 Key Future Prospects of Big Data Analytics in Healthcare Market for Forecast Period 2017 - 2026. GlobeNewswire.com, January 5, 2018.
According to a report collated by Fact.MR, the big data analytics in healthcare market is expected to see a double-digit CAGR through 2017-2026. By the end of 2026, more than US$ 45,000 Mn in revenue will be generated from sales of big data analytics in healthcare across the globe. The Fact.MR report further highlights that big data analytics will be used extensively to cut healthcare costs and boost precision medicine research. The major takeaways from Fact.MR’s report on big data analytics in the healthcare sector are:
- Although on-premise deployments remain the most sought-after, cloud-based big data deployments in the healthcare market will expand faster through 2026.
- CRM will remain the go-to tool for big data analytics in healthcare market.
- Healthcare providers will remain the dominant spenders in global big data analytics.
- Asia-Pacific excluding Japan will continue to be the fastest-expanding and most remunerative market for big data analytics.
- Access to operational information is expected to be the largest use of big data analytics in healthcare, in terms of revenues.
(Source : https://globenewswire.com/news-release/2018/01/05/1284203/0/en/6-Key-Future-Prospects-of-Big-Data-Analytics-in-Healthcare-Market-for-Forecast-Period-2017-2026.html )
Could big data unlock safer commutes for cyclists? Marketplace.org, January 8, 2018.
Cities across the US are finding it hard to make roads safe for bikes as cycling has become a popular means of commuting and recreation. City planners are turning to big data so they can decide where to add bike lanes and routes, and also fix smaller-scale problems such as dangerous crossings or intersections. City planners in Texas did not have enough data about the routes cyclists travel, so Texas acquired two years’ worth of data from the popular cycling app Strava, covering 84,000 users (20% of the entire biking community), for an undisclosed amount. With the Strava data, the Texas Department of Transportation can see which routes are actually being used instead of merely guessing where to put infrastructure or bike lanes.
(Source : https://www.marketplace.org/2018/01/08/tech/could-big-data-unlock-safer-commutes-cyclists )
Hadoop 3.0 Perspectives by Hortonworks’ Hadoop YARN & MapReduce Development Lead, Vinod Kumar Vavilapalli. InsideBigData.com, January 16, 2018.
Apache Hadoop has become the go-to framework within the big data ecosystem for running and managing big data applications on large Hadoop clusters in distributed environments. Vinod Kumar Vavilapalli, Hortonworks’ Hadoop YARN & MapReduce Development Lead, offered his perspective on the latest release, Hadoop 3.0, in a Q&A with the insideBIGDATA editorial team; here is a quick summary of what he had to say. According to Vinod, the key element that distinguishes the latest release from Hadoop 2.0 is storage: Hadoop 3.0 brings erasure coding as an optional storage mechanism alongside the replication-based system. The other key element is computation through extensible resource types, which bring machine learning and deep learning workloads to Hadoop clusters through GPU and FPGA resources. Vinod says the major industry trends of today are cloud, machine learning and deep learning, and the proliferation of big data demanding greater scalability. Hadoop 3.0 caters to most of these trends, making it the de facto distributed data-processing framework for big data.
(Source : https://insidebigdata.com/2018/01/16/hadoop-3-0-perspectives-hortonworks-hadoop-yarn-mapreduce-development-lead-vinod-kumar-vavilapalli/ )
Adobe adds Hadoop connector to Adobe Campaign. Zdnet.com, January 16, 2018.
Hadoop is a valuable data source at Adobe, but marketers have not been able to use it; primarily only the IT and analytics teams have worked with it. To give marketers direct access to big data in Hadoop, Adobe has added a new Hadoop connector that lets Adobe Campaign, part of the company’s Experience Cloud, analyze more data than it could before. The connector, built on Apache Hive, brings in additional data from sources such as POS terminals, mobile devices and kiosks, which will make it easier to organize marketing campaigns across diverse channels. Hadoop can prove valuable to many industries; one major use case is retail, where unstructured data stored in Hadoop can be added to loyalty programs.
(Source : http://www.zdnet.com/article/adobe-adds-hadoop-connector-to-adobe-campaign/ )
Could 'big data' help Cleveland reduce health disparities - and create jobs? Cleveland.com, January 21, 2018.
Cleveland has serious clout when it comes to providing healthcare locally, nationally and globally. Researchers in Cleveland are exploring opportunities to use big data, which computers can analyze to identify patterns and make predictions, to pinpoint health disparities in the city and to support start-ups, healthcare firms, technology companies and consulting firms. A study by the Center for Population Dynamics at Cleveland State University states that Cleveland has many opportunities to grow its economy by leveraging big data to improve residents’ health. "The health systems are learning that there are a lot of other things that determine people's health - social determinants, environmental determinants. There's going to be a more holistic approach around that," said Stephen McHale, the former chief executive of Explorys, a Cleveland-based healthcare analytics business acquired by IBM in 2015.
(Source : http://www.cleveland.com/business/index.ssf/2018/01/could_big_data_help_cleveland.html )
Big data to help ensure food safety in Beijing. Xinhuanet.com, January 21, 2018.
Beijing Municipal Commission of Commerce is using big data and cloud computing technology to ensure food safety in Beijing. The commission is working with organizations to set up a food traceability system that tracks every stage of the food supply chain, including production, processing, packaging, delivery and sales. The commission has signed a strategic agreement with Chinese e-commerce giant JD.com to cooperate on food traceability.
(Source : http://www.xinhuanet.com/english/2018-01/21/c_136913185.htm )
Big data in cloud computing demands an IT skill set change. TechTarget.com, January 29, 2018.
Organizations are shifting their big data workloads to the cloud. Though this does not require a complete overhaul of IT skills, it does require some changes for admin and dev teams. Big data in cloud computing can reduce overall costs compared with on-premise deployments. Not every big data project requires the organization to have a big data expert, but those that involve Hadoop will. It might seem straightforward to replace a 5-node Hadoop cluster on-premise with a 5-node Hadoop cluster in the cloud, but several management challenges arise around software interoperability. To succeed with big data in cloud computing, IT teams must focus on four important skill categories: administration, development, data analysis and data visualization.
(Source : http://searchcloudcomputing.techtarget.com/tip/Big-data-in-cloud-computing-demands-an-IT-skill-set-change)