Software Engineer - Data Processing Platform (Hoodie)

Company Name: Uber
Location: San Francisco, CA
Date Posted: 15th Mar, 2017
  • As a member of the Data Processing Platform team, you will help redefine how compute and data processing is done for massively large data sets at Uber.
  • You will focus on delivering production systems that will be the bedrock of Uber's most demanding workloads, with a focus on engineering for self-healing and reduced operational complexity that serves as a turnkey solution for many diverse workloads.
  • You will be responsible for devising solutions that reduce the end-to-end latency of multi-terabyte-scale pipelines to a few minutes, and for implementing them on some of the most business-critical data feeds at Uber.
  • You will be a core contributor to the future of Hoodie and other open source projects this team will be working on.
  • You will work on open problems like ingesting updates to HDFS at scale, supporting large, complex stream-stream and stream-dataset joins, unification of stream and batch processing, etc.
  • You will be challenged to understand and architect solutions related to consistency, durability, failure-recovery scenarios, storage formats, indexing techniques, and query optimization.
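
To give a flavor of the "ingesting updates" problem above: the core operation is an upsert, merging a batch of changed records into an existing dataset by record key. The sketch below is a toy, in-memory illustration of that semantics in Python (hypothetical names throughout; it is not Hoodie's actual API, which performs this merge incrementally over columnar files on HDFS with an index to locate each key):

```python
def upsert(dataset, updates, key="id"):
    """Merge a batch of update records into a dataset keyed by `key`.

    Records whose key already exists are replaced (an update);
    unseen keys are appended (an insert).
    """
    merged = {rec[key]: rec for rec in dataset}  # index existing records by key
    for rec in updates:
        merged[rec[key]] = rec                   # update-or-insert each record
    return sorted(merged.values(), key=lambda r: r[key])

# Example: one fare correction (id=2) and one new trip (id=3).
trips = [{"id": 1, "fare": 10.0}, {"id": 2, "fare": 7.5}]
batch = [{"id": 2, "fare": 8.0}, {"id": 3, "fare": 12.0}]
print(upsert(trips, batch))
```

Doing this at multi-terabyte scale, without rewriting the whole dataset on every batch, is the hard part this role focuses on.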



Qualifications and Required Skills

  • Bachelor's degree or higher in an engineering field (Computer Science, Computer Engineering, etc.) with solid fundamentals
  • 5+ years of experience designing, implementing, and productionizing large-scale distributed systems
  • Clear, abstract thinking about hard problems, a deep conceptual understanding of complex systems, and the ability to communicate design ideas clearly
  • Ability to create the right abstractions and write clean, concise, self-documenting code, with expertise in at least one programming language (Java, Go, Scala, C/C++, etc.), right down to the runtime
  • Deep understanding of a massively parallel computing framework like Apache Spark, and thorough knowledge of database and file system internals
  • Understanding of the fundamental tradeoffs in large-scale systems, with strong opinions on how systems should be architected and how design choices affect latency, scan performance, write throughput, etc.
  • Always curious about how things break at scale, with the ability to quickly validate a claim with a simple prototype

Nice to Have Skills (it would be great if you could check some of these boxes!)

  • Open source contributions to a storage or compute system (Apache Spark, Apache Parquet, RocksDB, etc.)
  • Have fundamentally rethought an existing status quo, pitched a better solution to management, and rallied the troops to deliver it.
  • You are addicted to writing concise, functional code, and you spend more time writing tests than the code itself.
  • Excited about evangelizing and open-sourcing the technology you build, and about giving technical talks at conferences.