Senior Software Engineer, Large-Scale Machine Learning

San Francisco

Full Time Senior-level / Expert
Discovering and enacting the path to safe artificial general intelligence

Posted 5 days ago

We’re looking for software engineers to help us build models vastly more capable than GPT-3, CLIP, and DALL-E. We train our models on some of the largest machine learning supercomputers in the world, which requires a team with deep software engineering expertise in order to extract maximum performance.
The challenges we face are difficult, but are similar in shape to those you’d see at a growing web startup. We’re constantly improving the introspectability and reliability of our systems, finding new performance bottlenecks, and designing better abstractions for managing our code. We sweat the details, making sure we understand every piece of our system, but are also pragmatic: sometimes we’ll pause further progress to perfect our tooling so we can understand a new issue, and sometimes we’ll just ignore weirdness or instabilities in order to continue training our models. Our stack is primarily Python + Kubernetes.
No machine learning experience is required — we can teach you whatever ML you need — but we look for depth of software and infrastructure experience. This role is not a support role; you’ll be expected to be self-directed and find the most important problems to solve while working in close concert with a team with a shared goal.

We look for a track record of the following:

  • Experience designing, implementing, and running production services, especially through a growth phase or period of rapid change.
  • Designing clean and flexible abstractions for new problems which are still in the process of being understood.
  • A desire to dig into problems across the stack, whether networking issues, performance problems, memory leaks, or simply reading unfamiliar code to find where issues might lie.
  • Comfort managing and monitoring infrastructure deployments.

You might be a good fit if you:

  • Are self-directed and enjoy figuring out the most important problem to work on.
  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.
  • Have brought software from early to mid stage, for example as an early infrastructure engineer or founder at a startup, or by creating a new infrastructure component at a large company.
  • Find or create the best tools to accelerate your and others’ workflows, and are very efficient in your editor and shell.
  • Enjoy working as part of a fast-moving team, where perfectionism can sometimes be at odds with (but sometimes directly required for) pragmatism.
About OpenAI
We’re building safe Artificial General Intelligence (AGI), and ensuring it leads to a good outcome for humans. We believe that unreasonably great results are best delivered by a highly creative group working in concert. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
This position is subject to a background check for any convictions directly related to its duties and responsibilities. Only job-related convictions will be considered and will not automatically disqualify the candidate. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodations via
  • Health, dental, and vision insurance for you and your family
  • Unlimited time off (we encourage 4+ weeks per year)
  • Parental leave
  • Flexible work hours
  • Lunch and dinner each day
  • 401(k) plan with matching
Job tags: AGI Engineering Kubernetes Machine Learning ML Python
Job region(s): North America