Software Engineer - Data Processing Platform (Hoodie)
Company Name: Uber
Location: San Francisco, CA
Date Posted: March 15, 2017
- As a member of the Data Processing Platform team, you will help redefine how compute and data processing are done for massive data sets at Uber.
- You will focus on delivering production systems that will be the bedrock of Uber's most demanding workloads, engineered for self-healing and low operational complexity so that they work as a turnkey solution for many diverse workloads.
- You will be responsible for devising solutions that reduce the end-to-end latencies of multi-terabyte-scale pipelines to a few minutes, and for implementing them on some of the most business-critical data feeds at Uber.
- You will be a core contributor to the future of Hoodie (https://github.com/uber/hoodie) and other open source projects this team will be working on.
- You will work on open problems such as ingesting updates into HDFS at scale, supporting large, complex stream-stream and stream-dataset joins, and unifying stream and batch processing.
- You will be challenged to understand and architect solutions related to consistency, durability, failure-recovery scenarios, storage formats, indexing techniques, and query optimization.
Qualifications and Required Skills
- Bachelor's degree or higher in an engineering field (Computer Science, Computer Engineering, etc.) with solid fundamentals
- 5+ years of experience designing, implementing, and productionizing large-scale distributed systems
- Clear, abstract thinking around hard problems, a deep conceptual understanding of complex systems, and the ability to communicate design ideas very clearly
- Ability to create the right abstractions and write clean, concise, self-documenting code, with expertise in at least one programming language (Java, Go, Scala, C/C++, etc.), right down to the runtime
- Deep understanding of a massively parallel computing framework such as Apache Spark, and thorough knowledge of database and file system internals
- Understanding of the fundamental tradeoffs in large-scale systems, with strong opinions on how systems should be architected and how design choices affect latency, scan performance, write throughput, etc.
- Constant curiosity about how things break at scale, with the ability to quickly validate a claim with a simple prototype
Nice to Have Skills (it would be great if you could check some of these boxes!)
- Open source contributions to a storage or compute system (Apache Spark, Apache Parquet, RocksDB, etc.)
- Have fundamentally rethought an existing status quo, pitched a better solution to management, and rallied the troops to deliver it
- You love writing concise functional code and spend more time writing tests than production code
- Excited about evangelizing and open-sourcing the technology you build, and about giving technical talks at conferences