Data Engineer, Apache Spark

Company Name: Capital One
Location: McLean, Virginia
Date Posted: Aug 3, 2016

Responsibilities:

- Collaborating as part of a cross-functional Agile team to create and enhance software that enables state-of-the-art, next-generation Big Data and Fast Data applications

- Building efficient storage for structured and unstructured data

- Developing and deploying distributed Big Data applications using open-source frameworks such as Apache Spark, Apex, Flink, Storm, Akka, and Kafka on the AWS Cloud

- Utilizing programming languages such as Java, Scala, and Python, and open-source RDBMS and NoSQL databases such as PostgreSQL and MongoDB

- Utilizing Hadoop modules such as YARN and MapReduce, and related Apache projects such as Hive, HBase, Pig, and Cassandra
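For readers unfamiliar with the MapReduce module mentioned above, its map-shuffle-reduce flow can be sketched in plain Python with the canonical word-count example (the function names here are illustrative only, not Hadoop APIs):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big pipelines", "fast data"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle_phase(pairs))
# counts == {"big": 2, "data": 2, "pipelines": 1, "fast": 1}
```

In Hadoop itself the shuffle is performed by the framework across the cluster; only the mapper and reducer are user code.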

- Developing data-enabling software utilizing open-source frameworks and projects such as Spring, AngularJS, Solr, and Drools

- Leveraging DevOps techniques and practices such as Continuous Integration, Continuous Deployment, Test Automation, Build Automation, and Test-Driven Development to enable the rapid delivery of working code, utilizing tools such as Jenkins, Maven, Nexus, Chef, Terraform, Ruby, Git, and Docker

- Performing unit tests and conducting reviews with other team members to make sure your code is rigorously designed, elegantly coded, and effectively tuned for performance
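The test-driven unit testing mentioned above can be sketched with Python's built-in unittest module; the dedupe helper is a hypothetical example of a small data-management routine, not something from this posting:

```python
import unittest

def dedupe_preserving_order(records):
    # Illustrative data-management helper: drop duplicate records
    # while keeping first-seen order.
    seen = set()
    result = []
    for record in records:
        if record not in seen:
            seen.add(record)
            result.append(record)
    return result

class DedupeTest(unittest.TestCase):
    # Under TDD, tests like these are written first, and the helper
    # is implemented until they pass.
    def test_removes_duplicates(self):
        self.assertEqual(dedupe_preserving_order([1, 2, 1, 3]), [1, 2, 3])

    def test_empty_input(self):
        self.assertEqual(dedupe_preserving_order([]), [])
```

Run with `python -m unittest`; CI tools like the Jenkins builds mentioned above typically execute such suites on every commit.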


Basic Qualifications:

- Bachelor’s Degree or military experience

- At least 3 years of professional work experience coding in data management

Preferred Qualifications:

- 2+ years of experience with the Hadoop stack

- 2+ years of experience with distributed computing frameworks

- 2+ years of experience with cloud computing (AWS a plus)

- 2+ years of NoSQL implementation experience (MongoDB and Cassandra a plus)

- 4+ years of Java or Scala development experience

- 4+ years of scripting experience

- 4+ years of experience with relational database systems and SQL (PostgreSQL a plus)

- 4+ years of ETL design, development, and implementation experience

- 4+ years of UNIX/Linux experience