Recap of Hadoop News for June

News on Hadoop-June 2016

Big Data and Hadoop News for June

No poop, Datadog loops in Hadoop. June 6, 2016.

Datadog, a leading provider of cloud monitoring as a service, has announced support for Hadoop, the framework for processing large datasets across clusters of computers. Hadoop users can use Datadog’s dashboards for targeted alerts and full-stack visibility. They will now be able to correlate host-level system metrics with hundreds of Hadoop metrics, making it easier to see what is happening throughout their stack. Users will also be able to set alerts for when critical Hadoop jobs don’t finish on time.

Hadoop creator Doug Cutting on the near-future tech that will unlock big data. June 8, 2016.

Speaking at the Strata + Hadoop World conference, Hadoop creator Doug Cutting discussed the growing I/O bottleneck: processing speed and efficiency are improving faster than storage read/write rates. Cutting predicts that Intel’s new 3D XPoint storage chips, which retrieve data up to 1,000 times faster, will open the platform to new uses, letting users process large datasets in memory and bypass the latency inherent in fetching data from disk.

How Hadoop is being used in Business Operations. June 16, 2016.

Pepperdata surveyed about 134 software engineers and data scientists from industries such as finance and education. Respondents named the skills gap, a shortage of knowledge about Hadoop and how to implement it, as the biggest challenge in using Hadoop in operations.

California-based BlueData becomes the first vendor to offer Big-Data-as-a-Service (BDaaS). June 23, 2016.

BlueData, whose software lets enterprises run Apache Hadoop and Spark in a virtualized environment, is branching out with a new version of its EPIC platform. It lets users run compute workloads on AWS while keeping their data on-premises.

Apache Hadoop distributions are becoming ODPi compliant. June 27, 2016.

ODPi is a nonprofit organization dedicated to standardizing the big data and Hadoop ecosystem through a common reference specification known as ODPi Core. The Hadoop distributions from Altiscale, ArenaData, Hortonworks, IBM, and Infosys are now ODPi Runtime Compliant, meaning they adhere to a common set of expectations for running big data solutions.

MapR's New Spyglass Initiative Aims to Ease and Advance Hadoop Administration. June 28, 2016.

With many companies still struggling with Hadoop’s complexity to produce data-driven results, MapR announced a new initiative called Spyglass. Spyglass aims to ease Hadoop administration by improving administrator productivity and making cluster management more efficient. It will provide in-depth visibility through customizable dashboards that simplify big data deployments.

Microsoft has now integrated support for Apache Spark into Microsoft R Server for Hadoop. June 29, 2016.

To bring Spark’s fast data processing within reach of its R users, Microsoft has integrated Spark support into its R Server for Hadoop. The integration lets R users run R functions across thousands of Apache Spark nodes and train models on data that is a thousand times larger. By combining R Server’s parallelized algorithms with Spark’s in-memory architecture, it can run algorithms several times faster than open-source alternatives.

BMC evolving with Hadoop to launch new data solutions. June 29, 2016.

BMC is evolving with new Hadoop initiatives. As Hadoop turns 10 this year, BMC has opened up a number of new APIs. Automation is a key ingredient of its services, and the company is working on automating jobs across different platforms, including Hadoop, behind a simple, appealing interface. Many existing technologies warn customers of potential threats, but BMC aims to go further by actually preventing threats and securing operations rather than just issuing warnings.


3 ways Yahoo employed Hadoop to optimize utilization. June 30, 2016. SiliconAngle

At Hadoop Summit 2016 in San Jose, CA, Mark Holderbaugh, senior director of Hadoop Engineering at Yahoo, highlighted three ways in which Yahoo uses Hadoop to optimize utilization. To meet growing customer demand and provide a better user experience, Yahoo:

  • Implemented the cluster management technology YARN.
  • Embraced Apache Storm for monitoring clusters and analyzing data.
  • Migrated to Tez, which helped it run millions of jobs and increase utilization by 50%.

Attunity Introduces Hadoop Data Usage Analytics for Growing Data Lake Environments. June 30, 2016.

Attunity, a leading provider of big data management software, released its latest solution, Attunity Visibility for Hadoop. The release gives organizations comprehensive data usage analytics that help them measure Hadoop data storage usage to optimize cost and performance, plan capacity accurately, and ensure that all data governance and compliance requirements are met.
