Senior Data Scientist
Sanofi
We are an innovative global healthcare company with one purpose: to chase the miracles of science to improve people’s lives.
ABOUT THE JOB:
The Computational Biology Cluster is part of the Precision Medicine & Computational Biology (PMCB) global research function at Sanofi. We are looking for a Senior Scientist (genAI and LLMs for Precision Medicine research) with a passion for building software/data products for pharmaceutical, life science or healthcare verticals. The post holder will be part of the Data Science & Artificial Intelligence Lab in the Computational Biology cluster and will help to index, integrate, and infer new biomedical insights from massive-scale biomedical data.
The Data Science lab is an innovation-driven team that uses the full spectrum of machine learning methods to address growing needs in precision medicine research. Within the lab, the successful candidate will specialize in methods related to generative AI (genAI) and large language models (LLMs) to accelerate drug target discovery, development, and repositioning.
Sanofi Research Dataset is poised to be one of the largest human disease datasets in the pharmaceutical industry. The successful candidate will have access to the data and collaborate with a multi-disciplinary group of talented scientists. They will lead the development and implementation of state-of-the-art genAI, LLM and machine learning methods, focusing on training and fine-tuning large models on text, images, and custom experimental data. The candidate will work in an exciting, interdisciplinary environment that spans the different stages of the discovery pipeline, and will interact with multiple internal and external organizations. Close interaction and synergy with all PMCB clusters and various therapeutic area functions will be expected.
Use AI / ML to impact precision medicine and drug discovery research.
Develop, implement, and apply state-of-the-art ML-based methods to analyze large and/or complex collections of datasets.
Communicate results and methodologies clearly to multidisciplinary and international project teams.
Document and follow good coding practices.
Execute work plans on time, and update and report relevant results to project teams and stakeholders.
Maintain close collaborations with other data scientists as well as with scientists from different backgrounds.
Constantly monitor literature to maintain in-depth knowledge of the most recent developments in data science, bioinformatics, and cutting-edge AI/ML/DL algorithms as well as the latest applications in the field of drug discovery.
Actively engage in evaluation and coordination of both academic and startup collaborations.
Education and Professional Experience
A PhD degree in Artificial Intelligence, Data Science, Computational Biology, Computer Science, Machine Learning or Bioinformatics.
0-3 years of post-PhD industry or academic experience with a strong track record of publications, accomplishments, and project experience in applications of generative AI and large language models.
Excellent attention to detail, problem-solving skills, and dedication to addressing complex problems in biomedicine using an AI-first mindset.
Strong written, oral, and interpersonal communication skills.
Strong aptitude for working within a multidisciplinary team environment.
Strong project management skills, including organization, time management, prioritization, and follow-up.
Experience with building and fine-tuning foundation models trained on text, image, genomic, clinical, healthcare and/or other data types.
Experience and demonstration of skills in a core machine learning area: computer vision, natural language processing, multi-modality learning
Experience in accessing, customizing and internalizing models from model zoos including: TensorFlow Hub, PyTorch Hub, Hugging Face Transformers Hub, Model Zoo by Apache MXNet, Caffe Model Zoo, ONNX Model Zoo, ModelDepot, Fastai Model Zoo, TorchVision Models, Facebook AI Research (FAIR) Models
Experience with large language models, particularly GPT (Generative Pre-trained Transformer) variants, XLNet, Bloom, BERT, LaMDA, Falcon, Llama
Experience with advanced NLP techniques, software packages and algorithm development
Experience in AI/ML Ops and Data Ops for petabytes of data and database optimization
Knowledge of large language models, graph learning, deep learning, and generative AI algorithms.
Proficiency in Python and/or R
Experience with some of the leading AI/ML frameworks and libraries, including TensorFlow, PyTorch, OpenCV, OpenSlide, scikit-learn, scikit-image, scikit-LLM, LangChain, OpenAI, Hugging Face, llm, Lamini, etc.
Experience with various database technologies, including SQL, NoSQL, graph databases and vector databases.
Familiarity with good coding practices (documentation, version control) and modern environments (cloud, high performance computing).
Experience with a pharma / biotech environment or with translational research problems is a plus.
Experience in deploying models into production environments by leveraging cloud (AWS, Azure or GCP), local or hybrid computing environments.
Fluency in English (spoken and written)
Sanofi Inc. and its U.S. affiliates are Equal Opportunity and Affirmative Action employers committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race; color; creed; religion; national origin; age; ancestry; nationality; marital, domestic partnership or civil union status; sex, gender, gender identity or expression; affectional or sexual orientation; disability; veteran or military status or liability for military status; domestic violence victim status; atypical cellular or blood trait; genetic information (including the refusal to submit to genetic testing) or any other characteristic protected by law.
At Sanofi, diversity and inclusion is foundational to how we operate and embedded in our Core Values. We recognize that, to truly tap into the richness diversity brings, we must lead with inclusion and have a workplace where those differences can thrive and be leveraged to empower the lives of our colleagues, patients and customers. We respect and celebrate the diversity of our people, their backgrounds and experiences, and provide equal opportunity for all.