Engineering Manager, Large-Scale Machine Learning

San Francisco

Full Time
Discovering and enacting the path to safe artificial general intelligence

Posted 2 weeks ago

We’re looking for an engineering manager to lead a team building models vastly more capable than GPT-3, CLIP, and DALL-E. We train our models on some of the largest machine learning supercomputers in the world, which requires a team with deep software engineering expertise in order to extract maximum performance.
The challenges we face are difficult, but are similar in shape to those you’d see at a growing web startup. We’re constantly improving the introspectability and reliability of our systems, finding new performance bottlenecks, and designing better abstractions for managing our code. We sweat the details, making sure we understand every piece of our system, but are also pragmatic: sometimes we’ll pause further progress to perfect our tooling so we can understand a new issue, and sometimes we’ll just ignore weirdness or instabilities in order to continue training our models. Our stack is primarily Python + Kubernetes.
The team moves fast and you need to be ready to move fast with it. You will work with many other teams across the company to integrate their contributions into a maximally capable model. Machine learning experience is a plus, but deep software and infrastructure experience is a requirement. 

We look for a track record of the following:

  • Experience managing small teams building ambitious projects on tight deadlines
  • Taking on challenges to make a team run as efficiently as possible
  • Working with other teams across the company rather than in a single silo
  • A desire to understand problems across the stack for yourself, whether networking issues, performance problems, memory leaks, or simply reading unfamiliar code to find where issues might lie
  • Experience managing infrastructure deployments

You might be a good fit if you:

  • Enjoy developing a deep technical understanding of your team's research work and codebase, and are excited to write code and dig into problems yourself
  • Operate without ego and put the needs of the team above your own desires
  • Work cooperatively and collaboratively with other teams in high-pressure situations
  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done
  • Have brought a software project from early stage to mid stage as a founder at a startup
  • Enjoy working as part of a fast-moving team, where perfectionism can sometimes be at odds with (but sometimes directly required for) pragmatism
About OpenAI
We’re building safe Artificial General Intelligence (AGI), and ensuring it leads to a good outcome for humans. We believe that unreasonably great results are best delivered by a highly creative group working in concert. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
This position is subject to a background check for any convictions directly related to its duties and responsibilities. Only job-related convictions will be considered and will not automatically disqualify the candidate. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodations via
  • Health, dental, and vision insurance for you and your family
  • Unlimited time off (we encourage 4+ weeks per year)
  • Parental leave
  • Flexible work hours
  • Lunch and dinner each day
  • 401(k) plan with matching
Job tags: AGI Engineering Kubernetes Machine Learning Python Research
Job region(s): North America