Data Engineer, Informatics & ML Platform
Somerville, MA USA
Applications have closed
Flagship Pioneering, Inc.
We are Flagship Pioneering. We are a biotechnology company that invents platforms and builds companies that change the world.
Company Summary:
What if you could join a rapidly growing company and play a critical role in bringing new medicines to patients by viewing and treating disease in a revolutionary way?
Cellarity's mission is to bring breakthrough medicines to patients by completely redefining the way drugs are discovered. Founded by Flagship Pioneering in 2017, Cellarity is designing medicines against the cell as opposed to a single molecular target. The company has developed a unique combination of expertise across network biology, chemistry, high-resolution data, and machine learning to unlock new treatment options in a vast array of disease areas. Cellarity currently has drug discovery programs underway in metabolic disease, hematology, immuno-oncology, and respiratory disease. The company has raised $123 million in a Series B funding round with contributions from world-renowned investors such as BlackRock, The Baupost Group, and Banque Pictet, alongside Flagship Pioneering.
What this position is all about:
Research Informatics & Data Engineering is part of an enterprise effort to enable data-driven science at Cellarity by building a robust technology platform. This partner-centric group is embedded with stakeholders across Cellarity’s novel pipeline value chain from Computation & Data Science to Exploratory & Platform Biology and Medicinal Chemistry. Our focus is to build an end-to-end operational platform bridging lab data generation and data science in an exploratory environment, ensuring data is democratized across the company. We consistently strive to innovate, iterate, and improve our practices, while driving novel drug discovery at Cellarity.
The successful candidate will be responsible for advancing and optimizing our data infrastructure, architecture, integrations, and pipeline development, building a robust computational platform in collaboration with our bench and data scientists.
What will you be responsible for?
- Design, implement, test, and maintain data pipelines for various workloads, including scientific data ingestion, platform integrations, instrument raw data processing, computational & data science workflows, ML model training, and inference at scale.
- Develop well-documented production-ready code, working in a collaborative CI/CD development environment including use of git and participation in code reviews.
- Design and implement high-quality testable APIs and microservices.
- Implement and maintain databases for raw and processed scientific data from a variety of internal and external sources (e.g., partner and public repositories).
- Design data models for entities, assays, and results from experiments and informatics pipelines in collaboration with bench and computational scientists.
- Define, contribute to, and proactively communicate data engineering standards and practices, establishing repeatable templates and frameworks and promoting efficient use of cloud services and tools.
- Manage relationships and build solutions with external consultants/contractors and vendor engineers.
- Innovate with, and advise on, the latest technologies and standard methodologies in data engineering, and identify and implement effective technical solutions.
- Assist in the management and administration of our AWS environment.
What experiences will you need?
- BS/MS in Computer Science, Bioinformatics, Data Science, or a related discipline with 5+ years of software engineering experience.
- 5+ years of hands-on Python development experience, including Pythonic design and object-oriented programming. Experience with R is a plus.
- Demonstrated proficiency with workflow orchestration frameworks such as Prefect, Airflow, Nextflow, Snakemake, and AWS Step Functions; experience with scientific data and NGS pipeline development is a plus.
- Demonstrated proficiency with cloud development (AWS strongly preferred) using infrastructure-as-code frameworks and computing services (e.g., AWS ECS, AWS Batch).
- Proficiency with database engineering and optimization (e.g., PostgreSQL, GraphQL, Redshift, Aurora).
- Practical experience with data and metadata modeling, including alignment of optimized database design with metadata usage.
- Proficiency with modern software development methodologies such as Agile, source control, project management, and issue tracking with JIRA.
- Demonstrated ability to work successfully in cross-functional teams, with an emphasis on teamwork, collaboration, and communication within the team and across the department.
What will set you apart?
- Professional AWS certifications.
- Experience in building pipelines/workflows for biomedical, NGS, and/or high-throughput molecular profiling data.
- Experience with Electronic Lab Notebook (ELN) & LIMS platforms.
- Proficiency with container strategies using Docker, Fargate, and ECR.
- Proficiency with Linux and shell scripting.
- Experience working with GxP and non-GxP data.
What it’s like to work at Cellarity:
At Cellarity, we
- Push boundaries: we create a legacy with breakthrough science in the service of patients
- Act with urgency: we work quickly and with conviction, and are eager to learn from data to iterate
- Own it: we transcend our job descriptions and relentlessly follow through on our commitments
- Tell it like it is: we give regular feedback on behaviors and are accountable for how we treat people
- Energize others: we are easy to work with and build strength from differing perspectives
Cellarity is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.
Recruitment & Staffing Agencies: Cellarity does not accept unsolicited resumes from any source other than candidates. The submission of unsolicited resumes by recruitment or staffing agencies to Cellarity or its employees is strictly prohibited unless contacted directly by Cellarity’s internal Talent Acquisition team. Any resume submitted by an agency in the absence of a signed agreement will automatically become the property of Cellarity, and Cellarity will not owe any referral or other fees with respect thereto.