Recap of Hadoop News for July 2018

Hadoop data governance services surface in wake of GDPR, July 2, 2018

GDPR has turned out to be a strong motivator for bringing greater governance to big data. At the recent DataWorks Summit 2018, although most of the attention was focused on Hadoop pioneer Hortonworks' plans to expand its services in the cloud, managing data privacy also drew great interest. Just one month after the European Union's GDPR mandate took effect, implementers at the summit discussed ways to populate data lakes, curate data and improve Hadoop data governance services. Hadoop data governance services are set to become a bigger part of the scene, not just for big data but for all data.



Predicting Future Online Threats with Big Data, July 4, 2018

A study found that the average annual number of security breaches is expected to increase by 27.4%. To counter this rapid rise in cyber crime, big data is considered a major driver for detecting, preventing and predicting future online security threats. A few examples of how big data can be used against online threats:

  • Analyzing Network Vulnerabilities - Big data is used to identify the databases most likely to be attacked by hackers for IDs, addresses, email accounts and payment information. This helps organizations reduce the risk of online threats and stay ahead of hackers.
  • Detecting Irregularities in Online Behavior and Device Use - Anomalies observed in the online behavior of employees, or in the analysis of device use, can also help enhance online security.
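The second bullet can be sketched in miniature. The snippet below is a minimal, hypothetical illustration (not any vendor's actual detection logic): it flags days whose login count deviates from the mean by more than a chosen number of standard deviations, the simplest form of the anomaly detection the article alludes to.

```python
from statistics import mean, stdev

def flag_anomalies(daily_logins, threshold=3.0):
    """Return indices of days whose login count deviates from the
    mean by more than `threshold` standard deviations (z-score)."""
    mu = mean(daily_logins)
    sigma = stdev(daily_logins)
    return [i for i, n in enumerate(daily_logins)
            if sigma > 0 and abs(n - mu) / sigma > threshold]

# A normal week of logins, then a sudden spike on day 7 (index 7).
logins = [21, 19, 22, 20, 18, 21, 20, 95]
print(flag_anomalies(logins, threshold=2.0))  # → [7]
```

At big data scale the same idea would run over streams of events rather than a small list, but the statistical test is the same.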



MapR Data Platform gets object tiering and S3 API support, July 5, 2018

MapR Data Platform 6.1 adds support for Amazon's S3 API and automated tiering for cloud-based object storage. The new version provides policy-based data placement across performance, capacity and archive tiers. It also bundles erasure coding for high-capacity storage on premises and in public clouds, an installer option that enables security by default, and volume-based encryption of data at rest. With businesses moving data lake infrastructure to the cloud, the new policy-based tiering not only addresses the associated storage costs but also moves data between tiers automatically.
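The idea behind policy-based tiering can be sketched as a simple rule over data age. This is a hypothetical illustration of the concept only; in MapR 6.1 the actual placement policies are configured in the platform, not written in application code, and the tier names and thresholds below are made up.

```python
from datetime import datetime, timedelta

# Hypothetical age thresholds for each tier (not MapR's defaults).
TIERS = [
    (timedelta(days=7), "performance"),   # hot: recently accessed
    (timedelta(days=90), "capacity"),     # warm: occasionally accessed
]

def place(last_access, now):
    """Pick a storage tier from the age of a volume's last access."""
    age = now - last_access
    for limit, tier in TIERS:
        if age <= limit:
            return tier
    return "archive"  # cold: erasure-coded object storage

now = datetime(2018, 7, 5)
print(place(datetime(2018, 7, 3), now))   # → performance
print(place(datetime(2018, 5, 1), now))   # → capacity
print(place(datetime(2017, 1, 1), now))   # → archive
```

A real platform would evaluate rules like this continuously and migrate the underlying objects in the background, which is what "moves data automatically" refers to.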


Novartis drug development gets big data analytics boost, July 12, 2018

Novartis, one of the largest pharmaceutical companies, has joined the race toward Artificial Intelligence and Machine Learning. "Nerve Live", a platform designed by the Predictive Analytics and Design Group in the Global Drug Development (GDD) department, uses advanced technologies to harness the data generated by Novartis. The journey began back in 2015, and the first step was to integrate data that had been sealed inside different teams across different domains. The next step was to create an analytics engine to process this data and derive meaningful insights from it. The third step was to build intuitive models that can support decision making. So far five modules have been created, with many more in the pipeline. Among them, the Trial Footprint Optimizer helps GDD identify and plan the countries for clinical studies, while Sense provides a transparent platform for the teams working on clinical trials at Novartis and across the globe.



Hadoopi - Raspberry Pi Hadoop, July 17, 2018

Hadoopi, as the name suggests, is an amalgamation of Hadoop and Raspberry Pi, available as an open-source project on GitHub. The project contains all the necessary configuration files and pre-built code to configure and run a Hadoop cluster of 5 Raspberry Pi 3s, with Hue as the web interface. Hadoopi supports most of the major Hadoop ecosystem components, such as Hive, Spark, and HBase. To improve performance and reliability, the latest version of Hadoopi uses wired networking. It also supports collecting metrics with Prometheus and visualizing them on dashboards provided by Grafana.
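Prometheus collects metrics by scraping a plain-text `/metrics` endpoint on each node. As a rough sketch of what a Pi node might expose (the metric name follows Prometheus conventions, but the port and handler here are hypothetical, not taken from the Hadoopi project):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics():
    """Render a node metric in Prometheus' plain-text exposition format."""
    load1 = os.getloadavg()[0]  # 1-minute load average (Unix only)
    lines = [
        "# HELP node_load1 1-minute load average.",
        "# TYPE node_load1 gauge",
        f"node_load1 {load1}",
    ]
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

# Uncomment to serve; Prometheus would then scrape
# http://<pi-address>:9101/metrics on its scrape interval.
# HTTPServer(("", 9101), MetricsHandler).serve_forever()
```

In practice a stock exporter such as node_exporter does this job, and Grafana then queries Prometheus to draw the dashboards.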


At 1 lakh new jobs, hiring will be flat in FY19: Nasscom, July 26, 2018

Incremental hiring by IT companies this year will remain the same as the previous year, i.e. about 1 lakh new jobs will be added to the IT industry, taking total employment to 40 lakh people in 2018-19. The majority of these jobs will be in emerging technologies such as Artificial Intelligence, Big Data and Machine Learning. As per Nasscom, total demand for these technologies will be around 5.11 lakh in 2018 and is expected to grow by more than 50% by 2021. Also, under the 9-66-155 scheme, part of the FutureSkills platform, Nasscom has identified 9 technologies, 66 job roles across them, and 155 skills that will be needed for those roles.

Big data and predictive analytics pull in smokers for lung cancer screening, July 27, 2018

Predictive analytics and focused cloud-based marketing by Chesapeake Regional Healthcare have pulled in smokers to get screened through its Lung Cancer Screening Trigger Campaign. According to Chesapeake, there is an 80% probability of curing lung cancer if it is detected at an early stage using low-dose computed tomography imaging. Chesapeake uses SaaS-based analytics tools and marketing technology to identify, educate and target people who are already at risk of lung cancer or who need a lung screening test. The campaign converted around 5.21% of the new patients and 9.17% of all the patients it targeted for lung screening. To achieve this, Chesapeake partnered with Tea Leaves Health, a cloud-based analytics provider that also supplies self-reported smoker data and modelling tools around it.
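The targeting step boils down to filtering a patient population against screening criteria. The sketch below uses thresholds loosely modeled on common low-dose CT eligibility guidelines (age, pack-years smoked, years since quitting); these thresholds and the patient records are illustrative assumptions, not Chesapeake's actual model.

```python
def eligible_for_screening(age, pack_years, years_since_quit):
    """Hypothetical low-dose CT screening filter: current or recent
    heavy smokers in the typical screening age range."""
    return (55 <= age <= 80
            and pack_years >= 30
            and years_since_quit <= 15)

# Made-up patient records for illustration.
patients = [
    {"id": 1, "age": 62, "pack_years": 35, "years_since_quit": 0},
    {"id": 2, "age": 45, "pack_years": 40, "years_since_quit": 2},
    {"id": 3, "age": 70, "pack_years": 31, "years_since_quit": 10},
]
targets = [p["id"] for p in patients if eligible_for_screening(
    p["age"], p["pack_years"], p["years_since_quit"])]
print(targets)  # → [1, 3]
```

A real campaign layers predictive scoring and marketing data on top of a filter like this, which is where partners such as Tea Leaves Health come in.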
