News on Hadoop - July 2018
Hadoop data governance services surface in wake of GDPR. TechTarget.com, July 2, 2018
GDPR has turned out to be a strong motivator for bringing greater governance to big data. At the recent DataWorks Summit 2018, although much of the attention was focused on Hadoop pioneer Hortonworks' plans to expand its services in the cloud, managing data privacy also drew considerable interest. Just one month after the European Union's GDPR came into force, implementers at the summit discussed various ways to populate data lakes, curate data and improve Hadoop data governance services. Such services are set to become a bigger part of the scene, not just for big data but for all data.
(Source - https://searchdatamanagement.techtarget.com/podcast/Hadoop-data-governance-services-surface-in-wake-of-GDPR )
Predicting Future Online Threats with Big Data. InsideBigData.com, July 4, 2018
According to a published study, the average annual number of security breaches is expected to see a net increase of 27.4%. To counter this rapid growth in cyber crime and threats, big data is considered a major driver for detecting, preventing and predicting future online security threats. Below are a few examples of how big data can be used against online threats:
- Big data is used to analyze network vulnerabilities by identifying the databases most likely to be targeted by hackers seeking IDs, addresses, email accounts and payment information. This helps organizations reduce the risk of online threats and stay ahead of hackers.
- Detecting irregularities in online behavior and device use: anomalies observed in employees' online behavior or in device usage patterns can flag potential threats and help strengthen online security.
(Source - https://insidebigdata.com/2018/07/04/predicting-future-online-threats-big-data/ )
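The article does not describe a specific detection method. As an illustration only, the following is a minimal sketch of one common approach to spotting irregular behavior: comparing a new observation against a user's own historical baseline using a z-score. The metric (daily login counts), the function name `is_anomalous` and the default threshold of 3.0 are all hypothetical assumptions, not taken from the source.

```python
from statistics import mean, stdev


def is_anomalous(history, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` standard deviations
    from the mean of `history` (a hypothetical per-user baseline metric,
    e.g. daily login counts)."""
    if len(history) < 2:
        raise ValueError("need at least two baseline observations")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of the baseline
    if sigma == 0:
        # A perfectly constant baseline: any deviation at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical baseline: one employee's logins per day over a week.
baseline = [20, 22, 19, 21, 23, 20, 22]
print(is_anomalous(baseline, 85))  # a sudden spike is flagged
print(is_anomalous(baseline, 24))  # a small variation is not
```

Scoring against a separate baseline, rather than a window that includes the new value, keeps a single large spike from inflating the standard deviation and masking itself.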
MapR Data Platform gets object tiering and S3 support. TechTarget.com, July 5, 2018
MapR Data Platform 6.1 adds support for Amazon's S3 API and automated tiering to cloud-based object storage. The new version provides policy-based data placement across capacity, performance and archive tiers. It also comes bundled with fast-ingest erasure coding for high-capacity storage on premises and in public clouds, an installer option that provides security by default, and volume-based encryption of data at rest. The new policy-based tiering moves data between tiers automatically. As businesses move their data lake infrastructure to the cloud, version 6.1 will not only address the associated storage cost issues but will also do so automatically.
(Source - https://searchstorage.techtarget.com/news/252444267/MapR-Data-Platform-gets-object-tiering-and-S3-support )
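To make the idea of policy-based tiering concrete, here is a minimal sketch of an age-based placement policy. It is not MapR's API or its actual policy engine: the `FileInfo` type, the `choose_tier` and `plan_moves` helpers, the tier names and the 30/180-day cut-offs are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    last_access: datetime


def choose_tier(f, now,
                capacity_after=timedelta(days=30),
                archive_after=timedelta(days=180)):
    """Hypothetical policy: place data by how long ago it was last accessed."""
    age = now - f.last_access
    if age >= archive_after:
        return "archive"
    if age >= capacity_after:
        return "capacity"
    return "performance"


def plan_moves(files, current_tiers, now):
    """Return {path: target_tier} for files not already on their target tier."""
    moves = {}
    for f in files:
        target = choose_tier(f, now)
        if current_tiers.get(f.path) != target:
            moves[f.path] = target
    return moves


now = datetime(2018, 7, 5)
files = [
    FileInfo("/lake/hot.parquet", now - timedelta(days=1)),
    FileInfo("/lake/warm.parquet", now - timedelta(days=45)),
    FileInfo("/lake/cold.parquet", now - timedelta(days=200)),
]
current = {f.path: "performance" for f in files}
print(plan_moves(files, current, now))
```

A real system would run such a policy periodically and then carry out the planned moves; separating "plan" from "move" makes the policy easy to inspect and test.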
Novartis drug development gets big data analytics boost. ScienceBusiness.net, July 12, 2018
Novartis, one of the largest pharmaceutical companies, has joined the race in artificial intelligence and machine learning. "Nerve Live", a platform designed by the Predictive Analytics and Design Group in the Global Drug Development (GDD) department, uses advanced technologies to harness the data generated by Novartis. Work on the platform began back in 2015, and the first step was to integrate data that had been sealed off within different teams and domains. The next step was to create an analytics engine to process this data and derive meaningful insights from it. The third step was to create intuitive models that can be used in decision making. So far five modules have been created, with many more in the pipeline. Two of them, Trial Footprint Optimizer and Sense, help GDD identify and plan countries for clinical studies and provide a transparent platform for teams working on clinical trials at Novartis and across the globe, respectively.
(Source - https://sciencebusiness.net/network-news/novartis-drug-development-gets-big-data-analytics-boost )
Hadoopi - Raspberry Pi Hadoop Cluster. i-programmer.info, July 17, 2018
Hadoopi, as the name suggests, combines Hadoop and Raspberry Pi, and is available as an open source project on GitHub. The project contains all the configuration files and pre-built code needed to configure and run a cluster of five Raspberry Pi 3s running Hadoop with the Hue web interface. Hadoopi supports most of the major Hadoop components, such as Hive, Spark and HBase. To improve performance and reliability, the latest version of Hadoopi uses wired networking. It also supports collecting metrics with Prometheus and visualizing them on Grafana dashboards.
At 1 lakh new jobs, hiring will be flat in FY19: Nasscom. HinduBusinessLine.com, July 26, 2018
Incremental hiring by IT companies this year will remain the same as last year, with 1 lakh (100,000) new jobs added to the IT industry, bringing total employment to 40 lakh (4 million) people in 2018-19. The majority of these jobs will be in emerging technologies such as artificial intelligence, big data and machine learning. According to Nasscom, total demand for professionals in these technologies will be around 5.11 lakh in 2018 and is expected to grow by more than 50% by 2021. In addition, under its 9-66-155 scheme, part of the FutureSkills platform, Nasscom has identified 9 technologies across 66 job roles and 155 skills that will be needed in those technologies.
Big data and predictive analytics pull in smokers for lung screening. HealthcareITnews.com, July 27, 2018
Predictive analytics and focused cloud-based marketing have helped Chesapeake Regional Healthcare draw smokers into getting screened through its Lung Cancer Screening Trigger Campaign. According to Chesapeake Regional Healthcare, lung cancer has an 80% probability of being cured if it is detected at an early stage using low-dose computed tomography (CT) imaging. For the campaign, the organization uses SaaS-based analytics tools and marketing technology to identify, educate and target people who are already at risk of lung cancer or who need a lung screening test. The campaign got around 5.21% of the new patients and 9.17% of all the patients it targeted to undergo lung screening. To achieve this result, Chesapeake partnered with Tea Leaves Health, a cloud-based analytics provider that also supplies self-reported smoker data and related modelling tools.