Machine Learning Operations Data Engineer

Somerville, MA

Full Time Senior-level / Expert USD 115K - 180K *

Flagship Pioneering, Inc.

We create breakthroughs in human health and sustainability and build bioplatform companies.

About Generate Biomedicines

Generate Biomedicines is a new kind of therapeutics company – existing at the intersection of machine learning, biological engineering, and medicine – pioneering Generative Biology™ to create breakthrough medicines where novel therapeutics are computationally generated, instead of being discovered. Generate has built a machine learning-powered biomedicines platform with the potential to generate new drugs across a wide range of biologic modalities. This platform represents a potentially fundamental shift in what is possible in the field of biotherapeutic development.

We pursue this audacious vision because we believe in the unique and revolutionary power of generative biology to radically transform the lives of billions, with an outsized opportunity for patients in need. We are seeking collaborative, relentless problem solvers who share our passion for impact to join us!

Generate was founded in 2018 by Flagship Pioneering and has received over $420 million in funding, providing the resources to rapidly scale the organization. The company has offices in Somerville and Andover, Massachusetts, with over 200 employees.

The Role:

Generation of novel proteins through data-driven machine learning models is at the core of Generate’s platform. We aim to replace the traditional approach to drug development with one characterized by intentionality, surgical precision, and speed, by developing methods for protein generation that reliably generalize across biological functions, disease areas, and therapeutic modalities.

We are seeking creative, motivated Machine Learning Scientists to develop and apply our core technologies for ML-powered protein generation. They will join a vibrant and growing machine learning group at Generate to develop innovative methods for protein generation and modeling, leveraging both in-house and external data to train and evaluate models while also deploying new algorithms into production on our experimental platform. The successful candidate will work closely with experimental scientists from the Protein Sciences and Medicines groups to rapidly advance the scientific program.

Key responsibilities:

  • Develop novel machine learning models and algorithms for data-driven generation of proteins, and hone them through deployment on our experimental platform.
  • Advance and evaluate the state of the art for machine learning models of protein sequence, structure, and function, including but not limited to protein sequence design, structure prediction, complex prediction, and function learning.
  • Use our integrated data platform to devise models able to leverage measured labels “in-the-loop”.
  • Work with the Protein Sciences and Medicines groups to tailor modeling efforts toward high-impact therapeutic applications.
  • Develop production-quality code in a team setting and work with MLOps to train and deploy models at scale.
  • Present progress from scientific work in regular research meetings and prepare reports and slide decks for broader internal and external communication.

Qualifications:

  • PhD in Computational Biology, Computer Science, or a related field, with demonstrated experience working on scientific applications.
  • 3+ years of experience developing machine learning methods to solve scientific problems, with a particular interest in applications to protein modeling and adjacent fields such as genomics, chemistry, immunology, or physics.
  • Experience developing, debugging, and applying models using modern deep learning frameworks.
  • Proficiency in Python and experience analyzing data with NumPy/SciPy, R, or similar.

Nice to have:

    • Foundational knowledge of probabilistic machine learning and optimization methods.
    • Practical experience developing deep generative models (e.g., autoregressive models, VAEs, flows, GANs, EBMs, etc.).
    • Publications in major ML conferences or scientific journals that apply ML to problems in molecular biology, structural biology, or genetics, especially at the intersection of machine learning and proteins.
    • Demonstrated experience developing software in a team setting.
    • Experience optimizing code for performance.


Generate Biomedicines is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status.

COVID Safety:

Generate Biomedicines enforces a mandatory vaccination policy for COVID-19. All employees must be fully vaccinated and have received a booster. The purpose of this policy is to safeguard the health of our employees, their families, and the community at large from infectious disease that may be reduced by vaccinations. The company will make exceptions to this policy if required by applicable law and will consider requests for an exemption from this policy due to a medical reason, a sincerely held religious belief, or any other exemption that may be recognized by applicable law.

Recruitment & Staffing Agencies: Generate Biomedicines does not accept unsolicited resumes from any source other than candidates. The submission of unsolicited resumes by recruitment or staffing agencies to Generate Biomedicines or its employees is strictly prohibited unless the agency has been contacted directly by Generate Biomedicines' internal Talent Acquisition team. Any resume submitted by an agency in the absence of a signed agreement will automatically become the property of Generate Biomedicines, and Generate Biomedicines will not owe any referral or other fees with respect thereto.

* Salary range is an estimate based on our salary survey 💰

Tags: Autoregressive models Biology Chemistry Computer Science Deep Learning Engineering Generative modeling Genetics Machine Learning ML models MLOps NumPy PhD Physics Python R Research SciPy

Perks/benefits: Career development Conferences

Region: North America
Country: United States