Make a Career Change from Mainframe to Hadoop - Learn Why

Mainframe legacy systems might not be part of technology conversations anymore, but they remain critically important to business. In 1990, analysts predicted that cheaper computing resources would bring about the death of Mainframes. More than 20 years after this prediction, Mainframes are still going strong and will be around for quite some time. 81% of CIOs believe that Mainframes will remain an integral part of business. The largest and most critical industries across the globe, including healthcare, insurance, finance and retail, still generate data from Mainframes. Mainframe data cannot be ignored because it drives mission-critical applications across myriad industries. Can any distributed platform address mainframe workloads? Is there an easy and cost-effective way to make use of this data? The answer is a resounding YES.

By using the Hadoop distributed processing framework to offload data from legacy Mainframe systems, companies can optimize the cost of maintaining Mainframe CPUs. As an increasing number of organizations migrate from Mainframe to Hadoop to exploit big data, let's take a look at the top skills, the technicalities and the cost challenges involved in migrating from Mainframe to Hadoop.

Mainframe to Hadoop Migration

Need to Offload Data from Mainframes to Hadoop

Mainframe legacy systems account for 60% of the global enterprise transactions happening today. 70% of the Fortune 500 companies, including the top 25 retailers in the USA, 9 top insurers and the top 25 banks, still depend on Mainframes to process more than 30 billion transactions every day. Most of the Fortune 500 companies still process 80% of their corporate data with Mainframes. The information held in Mainframes is highly critical: healthcare records, ATM transactions, package tracking information, credit card records and so on.

Organizations run critical applications on Mainframe systems. These systems generate huge volumes of data, but they lack the capability to support newer business requirements such as processing unstructured data, and they involve huge maintenance costs. The wealth of data stored and processed in mainframes is vital, but the resources required to manage it there are highly expensive. Businesses today spend approximately $100,000 per TB every year to lock their data and back it up to tape. Managing the same amount of data on Hadoop costs $1,000 to $4,000. To address this huge cost of operation, organizations are increasingly offloading data to the Hadoop framework, shifting to clusters of commodity servers to analyse the bulk of their data. Offloading data to Hadoop might not seem urgent, but it has real benefits for the business: the data becomes available for analysts to explore and discern novel business opportunities, ensuring that no information is left untapped.
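To put those figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It assumes the Hadoop figure is also a per-TB, per-year cost (the text does not say so explicitly), and the 50 TB data volume is a made-up example rather than a number from any real deployment.

```python
# Per-TB annual cost figures quoted in the article.
MAINFRAME_COST_PER_TB = 100_000
# Assumption: the Hadoop range is also per TB per year.
HADOOP_COST_PER_TB_LOW = 1_000
HADOOP_COST_PER_TB_HIGH = 4_000


def annual_savings(terabytes: int, hadoop_cost_per_tb: int) -> int:
    """Yearly saving from keeping `terabytes` of data on Hadoop instead of the mainframe."""
    return terabytes * (MAINFRAME_COST_PER_TB - hadoop_cost_per_tb)


def cost_ratio(hadoop_cost_per_tb: int) -> float:
    """How many times more expensive the mainframe is per TB."""
    return MAINFRAME_COST_PER_TB / hadoop_cost_per_tb


if __name__ == "__main__":
    data_volume_tb = 50  # hypothetical volume, for illustration only
    for label, cost in (("low", HADOOP_COST_PER_TB_LOW), ("high", HADOOP_COST_PER_TB_HIGH)):
        print(
            f"Hadoop {label} estimate: {cost_ratio(cost):.0f}x cheaper per TB, "
            f"${annual_savings(data_volume_tb, cost):,} saved per year"
        )
```

Under these assumptions the mainframe comes out 25 to 100 times more expensive per TB, which is why even a partial offload can be worth considering.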

Challenges to be Successful with Hadoop and Mainframes

“The largest organizations want to leverage the scalability and cost benefits of Big Data platforms like Apache Hadoop and Apache Spark to drive real-time insights from previously unattainable mainframe data, but they have faced significant challenges around accessing that data and adhering to compliance requirements,” said Tendü Yoğurtçu, General Manager of Syncsort’s Big Data business.

  • Mainframe systems contain highly sensitive information, whereas Hadoop manages data from diverse sources, from harmless tweets to sensitive records. Any data transfer from mainframes to Hadoop must therefore be performed with the utmost care to ensure security. Organizations need to make sure that any software they install on mainframe systems to load data into Hadoop is legitimate and has a good security track record.
  • Many people think that moving mainframe data to Hadoop is very simple. It is not: there are several integration gaps, since Hadoop does not have any native support for mainframes.
  • Mainframes and Hadoop use different data formats. Hadoop tools generally expect ASCII-compatible text, whereas Mainframes store text in EBCDIC and often store numbers as packed decimals.
  • There is a huge skills gap, as both mainframe and Hadoop skills are in demand. If finding a JCL or COBOL developer is difficult, then finding a Hadoop developer who also understands mainframes is like trying to find a needle in a haystack. Mainframe data has to be rationalized with a COBOL copybook, which requires a special skillset that a person with only Hadoop skills might not possess, and vice versa. Still, the switch from Mainframes to Hadoop is achievable and is a great technological adventure.
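To make the data format gap concrete, here is a minimal Python sketch of what converting a single mainframe record can involve. It is an illustration under stated assumptions, not how any vendor tool works: it assumes the text is encoded in code page 037 (one of several EBCDIC code pages, available in Python as the `cp037` codec), and the two-field record layout (`NAME_LENGTH`, `AMOUNT_LENGTH`, `AMOUNT_SCALE`) is hypothetical. In a real migration the layout would come from the COBOL copybook.

```python
from decimal import Decimal

# Hypothetical layout: 10-byte EBCDIC name followed by a 3-byte
# packed-decimal amount with two implied decimal places.
NAME_LENGTH = 10
AMOUNT_LENGTH = 3
AMOUNT_SCALE = 2


def decode_ebcdic(raw: bytes, codec: str = "cp037") -> str:
    """Decode an EBCDIC text field and strip trailing padding spaces."""
    return raw.decode(codec).rstrip()


def unpack_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)  # the last byte holds one digit plus the sign
    sign_nibble = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):  # 0xD (and 0xB) mark a negative value
        value = -value
    return Decimal(value).scaleb(-scale)


def parse_record(record: bytes) -> dict:
    """Split one fixed-width record into its decoded fields."""
    name = decode_ebcdic(record[:NAME_LENGTH])
    amount_bytes = record[NAME_LENGTH:NAME_LENGTH + AMOUNT_LENGTH]
    return {"name": name, "amount": unpack_packed_decimal(amount_bytes, AMOUNT_SCALE)}


if __name__ == "__main__":
    # Build a sample record: "SMITH" padded to 10 bytes, then +123.45 packed.
    sample = "SMITH".ljust(NAME_LENGTH).encode("cp037") + bytes([0x12, 0x34, 0x5C])
    print(parse_record(sample))
```

Even this toy version shows why a plain character-set conversion of the whole file is not enough: the packed amount bytes are not text at all, so each field has to be decoded according to its type.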

There are many solutions from vendors like Syncsort, Veristorm, Compuware and BMC that target mainframe data with enhanced Hadoop ETL tools. Veristorm and Syncsort are developing various solutions to clear the bottleneck for organizations that still have valuable information locked in mainframe systems.

“Our customers tell us we have delivered a solution that will allow them to do things that were previously impossible. Not only do we simplify and secure the process of accessing and integrating mainframe data with Big Data platforms, but we also help organizations who need to maintain data lineage when loading mainframe data into Hadoop,” said Tendü Yoğurtçu, General Manager of Syncsort’s Big Data business.

Mainframe Legacy Systems Ride on Hadoop: Offloading from Mainframe to Hadoop

With the advent of scalable, fault-tolerant and cost-effective big data technology like Hadoop, organizations can now reduce the maintenance and processing expenses of mainframe legacy systems by adding a Hadoop layer or by offloading batch processing data from Mainframes to Hadoop. Companies can address big data analytic requirements using Hadoop and the distributed analytical model of the Apache Mahout libraries, while leveraging the stored legacy data for valuable business insights. eBay, Google, Facebook, Twitter and Yahoo are already making the most of mainframe and Hadoop technology.

Hadoop fits well alongside COBOL and other legacy technologies, so by migrating or offloading from mainframe to Hadoop, batch processing can be done faster, more efficiently and at a lower cost. Moving from mainframe to Hadoop is a good move now because of the reduced batch processing and infrastructure costs. Hadoop code is also flexible and easy to maintain, which helps in the rapid development of new functionality.

Organizations should begin by creating copies of selected mainframe datasets in HDFS, and then migrate large volumes of data from various semi-structured sources and RDBMS. The final step is to migrate expensive batch mainframe workloads to Hadoop.

There are several Hadoop components that organizations can take direct advantage of when offloading from Mainframes to Hadoop:

  • HDFS, Hive and MapReduce help process huge volumes of legacy data and batch workloads, and store the intermediate results of processing. Batch jobs can be taken off mainframe systems and processed using Pig, Hive or MapReduce, and the results can be moved back to the mainframe, which helps reduce MIPS (million instructions per second) cost.
  • Sqoop helps move data between Hadoop and RDBMS, while Flume helps collect streaming data such as logs and events into Hadoop.
  • Oozie helps schedule batch jobs, much like the job scheduler in mainframes.

Advantages of Using Hadoop with Mainframes for Legacy Workload

  • Organizations can retain and analyse data at a much more granular level and with a longer history.
  • Hadoop reduces the cost and strain on legacy platforms.
  • Hadoop helps revolutionize enterprise workloads by reducing batch processing times for mainframes and enterprise data warehouses (EDW).

Why Should Mainframe Professionals Learn Hadoop in 2016?

Huge Talent Crunch for “Mainframe + Hadoop” Professionals

The lack of talent with “Mainframe + Hadoop” skills is becoming a persistent problem for CIOs of organizations that want to push mainframe-hosted data and Hadoop-powered analysis closer together. Companies that still depend on mainframes are finding it difficult to hire professionals who combine mainframe knowledge with Hadoop skills, and who can support transaction processing and legacy applications while also leveraging analytics from the data.

There is a huge mainframe knowledge gap these days because of limited training opportunities and a lack of university programs that offer a Mainframe specialization. Anybody with an internet connection can easily learn Python, Java, R or Hadoop from MOOC platforms like DeZyre, Coursera and Udacity, but it would be hard to find a COBOL or JCL course on these websites.

Why Mainframe professionals should learn Hadoop

Here’s a heads-up to mainframe professionals from DeZyre Industry Experts: “Mainframe development languages like JCL or COBOL alone are not cool any more. Just having Mainframe skills does not look good on an IT resume to attract hiring managers, even though they are applicable to several employers. It is much trendier to have mainframe development programming skills along with Hadoop skills on your resume to land a top gig as a big data professional.”

The future is all set for Apache Hadoop and mainframes to rule the world of data management systems. Organizations that are migrating from mainframes to Hadoop are in search of professionals with knowledge of analytics. Your skills solely as a mainframe professional will not be enough to cater to the data management requirements of the present and the future. This is the best time for mainframe professionals to start updating their skillset with Hadoop. If you are a mainframe professional looking to advance your career in Hadoop technology, you can talk to one of our career counsellors.

We would love to answer any questions you have about moving from Mainframe to Hadoop. Please leave a comment below.


