Company Name: Blackwood Seven
Location: Greater Los Angeles Area
Date Posted: 21st Feb, 2017
- Participate in design and optimization of our Data Lake (Hadoop / Spark / Hive) and Elasticsearch data infrastructure
- Optimize Airflow ETL jobs and Hive table schemas, while supporting our Data Science team
- Investigate tools that may accelerate data discovery (e.g., Apache Zeppelin)
- Collaborate and document complex flows in easy-to-follow diagrams
- Bachelor’s Degree (technical or science preferred)
- 6+ years working with Linux
- 3+ years working with Hadoop and Hive
- 3+ years working in Python in a production environment
- Experience working in an Agile Scrum environment
- Experience with scalable architectures and large data processing
- Experience in evaluating emerging technologies and their applicability
- Experience in effectively communicating data flows, design choices, and risks
- Experience in query tuning and optimization
- Experience in production backend development with Git or a similar source control system
- Experience with Amazon Web Services a plus
- Experience with Apache Spark / PySpark a plus
- Experience with Apache Airflow a plus
- Experience with Elasticsearch a plus