Company Name: Capgemini
Location: New York, NY
Date Posted: 27th Oct, 2016
Responsibilities:
- Design and build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time.
- Collaborate with other teams to design and develop data tools that support both operations and product use cases.
- Source high volumes of data from diverse data platforms into the Hadoop platform.
- Perform offline analysis of large data sets using components from the Hadoop ecosystem.
- Evaluate big data technologies and prototype solutions to improve our data processing architecture.
Qualifications:
- Knowledge of the Private Banking & Wealth Management domain is an added advantage
- 10+ years of hands-on programming experience with 3+ years in Hadoop platform
- Experience designing and architecting Hadoop-based platforms for building data lakes
- Knowledge of the various components of the Hadoop ecosystem and experience applying them to practical problems
- Proficiency in Java and at least one scripting language such as Python or Scala
- A flair for data, schemas, and data modeling, and for bringing efficiency to the big data life cycle
- Experience building ETL frameworks in Hadoop using Pig, Hive, or MapReduce
- Experience creating custom UDFs and custom input/output formats and SerDes
- Ability to acquire, compute, store, and provision various types of datasets in the Hadoop platform
- Understanding of various visualization platforms (Tableau, QlikView, and others)
- Experience with data warehousing, ETL tools, and MPP database systems
- Strong object-oriented design and analysis skills
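As context for the custom-extension requirement above: besides Java UDFs, Hive can call an external script through its TRANSFORM clause, which pipes rows to the script as tab-separated lines on stdin and reads tab-separated lines back from stdout. Below is a minimal sketch of such a script in Python; the column names, table name, and cleanup rule are hypothetical, not taken from the posting.

```python
#!/usr/bin/env python
"""Minimal sketch of a Hive streaming transform script (hypothetical).

A Hive query would invoke it along these lines:

    SELECT TRANSFORM (account_id, amount)
    USING 'python normalize_amount.py'
    AS (account_id, amount_usd)
    FROM transactions;
"""
import sys


def transform_row(line):
    """Parse one tab-separated row and normalize the amount field."""
    account_id, amount = line.rstrip("\n").split("\t")
    # Hypothetical cleanup: strip currency formatting and cast to float.
    amount_usd = float(amount.replace("$", "").replace(",", ""))
    # Hive expects tab-separated output columns, one row per line.
    return "%s\t%.2f" % (account_id, amount_usd)


if __name__ == "__main__":
    for row in sys.stdin:
        print(transform_row(row))
```

The same row-at-a-time contract (read stdin, write stdout) also makes such a script usable as a Hadoop Streaming mapper.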