Computational Biologist, Data Engineering
Invitae is dedicated to bringing comprehensive genetic information into mainstream medicine to improve healthcare for billions of people. Our team is driven to make a difference for the patients we serve. We are leading the transformation of the genetics industry by making genetic testing affordable and accessible for everyone to guide health decisions across all stages of life.
Invitae needs a Computational Biologist with a special focus on high-throughput genomics and data engineering to help us achieve our mission. The Data Engineering team is collaborating with the Research Team to build infrastructure and data products that will accelerate research discoveries and help us launch new products. The teams are highly skilled, cross-functional groups of AI engineers and scientists, software engineers, bioinformaticians, and assay development scientists who collaborate with Product, Development, Operations, and Lab Automation teams to drive decision-making and data-driven product development at Invitae. The teams lead high-risk/high-impact projects, engage in internal and external collaborations, regularly partner with our executive team, and develop key strategic frameworks for all business areas.
We’re looking for someone who has an “anything is possible” mindset. You’re proud of your work, skills, and expertise, but you’re not cynical, inflexible, or concerned only about your portfolio. You want to make a difference by improving healthcare. You love being part of a highly collaborative, high-energy team that isn’t afraid to swing for the fences and exceed expectations. You are a passionate, curious researcher with uncanny observational and analytical skills, an impressive ability to collaborate with other teams, and a relentless drive to do what’s right for Invitae’s customers and our internal Research and Data Engineering teams. You can work with minimal direction while delivering high-quality results, and you have a good instinct for the point where the possible meets the practical. You will work on overlapping projects, so you are good at multitasking, juggling competing priorities, and organizing yourself to ensure you deliver on your commitments.
What you’ll do:
- Design and build an ETL pipeline that can process millions of samples of genetic data (e.g., panels, exomes, whole genomes).
- Develop data models to traverse multi-omic datasets for individual patient analysis and cohort analysis.
- Evaluate and use systems that can process large amounts of diverse data (e.g., AWS HealthLake, Snowflake, Apache Spark).
- Develop analysis capabilities to process the data and derive insights using computational tools and machine learning algorithms.
- Demonstrate proof-of-concept use cases to showcase the utility of a research data platform.
What you bring:
- Significant industry or academic experience (5+ years) working with large, diverse datasets. Preference will be given to candidates who have demonstrated a deep understanding of biological data as well as experience processing it.
- Proven experience optimizing performance with large datasets (multi-terabyte or more).
- Ability to develop flexible systems without clear requirements.
- Strong organizational, written, and oral communication skills, including client-facing project representation.
- Flexible and pragmatic problem-solving skills, with a rigorous and data-driven approach to troubleshooting.
At Invitae, we value diversity and provide equal employment opportunities (EEO) to all employees and applicants without regard to race, color, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the San Francisco Fair Chance Ordinance.