Sr Genomic Data Scientist (Hereditary Disease)
Redwood City, CA
Passionate about precision medicine and advancing the healthcare industry?
Recent advancements in genomics and computer technology have finally made it possible for AI to impact clinical care in a meaningful way. Tempus' proprietary platform connects an entire ecosystem of real-world evidence to deliver real-time, actionable insights to physicians, providing critical information about the right treatments for the right patients, at the right time.
We are seeking a genomics data scientist with interdisciplinary experience, including a track record of supporting innovative, high-quality research by managing and modelling large volumes of clinical, genetic, and/or genomic data and results in a distributed database and analytical environment. You will lead data ingestion, organization, and the implementation of analysis workflows for large-scale human cohorts with genetic and multi-dimensional, multi-modality phenotypic data. These data will come from public sources as well as from private datasets generated in-house and obtained through our collaborations.
- Bring in genomics/genetics datasets from external and internal sources to help develop internal resources for various analytical approaches.
- Prototype a robust data platform to efficiently house and represent critical human genetic, genomic, and clinical/phenotypic data, to inform genetic risk predictive models, cohort selection and clinical test validation across a range of disease areas.
- Develop scalable, high-quality analysis pipelines for clinical trials and clinical diagnostics products.
- Leverage the opportunities and efficiencies afforded by access to a hybrid cloud-based, distributed ecosystem of database technologies.
- Collaborate with other data scientists and statistical geneticists to leverage multimodal data in training polygenic risk scores, machine learning models, and other predictive models.
- Work with scientists and clinicians to design and perform analyses on clinical sequencing data that generate clinically actionable insights in order to improve quality of care.
- Communicate with internal and external scientific teams as well as product, science, and bioinformatics leadership.
- Produce high-quality, detailed documentation for all projects.
- PhD/Masters or equivalent experience in genetics, biomedical informatics, or related life sciences areas.
- 5+ years of experience in complex data analysis and architecture design, and familiarity with applications of FAIR principles.
- Hands-on experience developing and maintaining database systems and manipulating data using SQL, working within a POSIX CLI environment.
- Computational skills using Python (strongly preferred), Java, C/C++ or other programming languages.
- Experience with complex longitudinal human clinical/phenotype data, e.g. from electronic health records, epidemiological cohorts, or clinical trials.
- Experience with genetic and genomic data types, including public genetic databases and results data from high-throughput genetic assays (e.g. UK Biobank, gnomAD, etc.).
Ideal Candidates Will Possess
- Experience with Python/Jupyter notebooks and/or R/Bioconductor in analyzing large data sets.
- Experience mining modern, large-scale genetic databases (e.g. ExAC/gnomAD, UK Biobank, UK10K, EBI GWAS Catalog, 1KG, etc.).
- Experience with distributed database technologies and related big-data analysis tools (e.g. Spark, BigQuery; the Apache Hadoop/Hive ecosystem).
- Experience with communicating insights and presenting concepts to a diverse audience.
- Demonstrated knowledge of best-practice coding processes and data change control.
- Experience in implementing and parallelizing pipelines in cloud computing environments.
- Experience in analyzing large-scale multimodal datasets to train and validate machine learning models.
- Self-driven and works well in interdisciplinary teams.
- Track record of publications.