Machine Learning Engineer
San Diego
AI Fund
RapidFire AI is a cutting-edge deep tech startup specializing in scaling Machine Learning solutions. We are dedicated to empowering customers to effortlessly scale their AI workloads, ensuring they stay at the forefront of innovation in their industries.
About the Role
We are seeking a highly motivated and skilled Machine Learning Engineer to join our growing team. In this role, you will develop distributed infrastructure for Deep Learning (DL) applications on the cloud and contribute to the design of new features. You will collaborate closely with other developers and customer-facing personnel to ensure a seamless product experience.
Responsibilities:
- Design, develop, deploy, and maintain large-scale DL infrastructure software, following best practices and software engineering guidelines
- Contribute to designing efficient, modular, and fault-tolerant distributed systems that scale DL computations
- Automate the setup, launch, and orchestration of end-to-end training and experimentation pipelines written with PyTorch, TensorFlow, or Keras
- Use and extend libraries like FSDP, DDP, DeepSpeed, and GPipe to train DL models across multiple GPUs
- Use tools like Pandas and Dask to handle large multimodal datasets
- Troubleshoot code and fix bugs to ensure smooth functioning of the application
- Monitor and troubleshoot cluster resource usage to ensure optimal performance
- Conform to continuous integration and continuous delivery (CI/CD) pipeline standards for code deployment
- Communicate effectively with the wider team to ensure successful application development and deployment
- Collaborate with other developers to define and implement cloud infrastructure strategies for DL applications
- Stay up to date with the latest advancements and best practices in DL and AI technologies
Required Skills:
- 4+ years programming experience with Python
- Proven experience as an ML Engineer working on deploying production model training and/or inference
- Excellent knowledge of the DL tools PyTorch and TensorFlow
- Excellent knowledge of, and experience using, DL systems libraries such as FSDP, DDP, DeepSpeed, and GPipe
- Familiarity with LLMs, fine-tuning, and associated concepts
- Familiarity with operating systems concepts, memory management, networking, and cloud computing
- Familiarity with AWS infrastructure components like EC2, S3, EBS, EFS, EKS, and Lambda
- Basic experience with version control systems (e.g., Git) and collaborative development workflows
- Understanding of CI/CD methodologies and tools
- Excellent communication and collaboration skills
- Ability to work independently and as part of a team
- Strong problem-solving and analytical skills
- A passion for learning and staying updated with the latest ML technologies
Nice to Have:
- Familiarity with Docker and Kubernetes to integrate code with underlying layers of deployment
- Experience working on AWS or other public cloud providers
- Experience with ML usability tools such as MLflow, W&B, or AWS SageMaker
Tags: AWS CI/CD DDP Deep Learning Distributed Systems Docker EC2 FSDP Git Kubernetes LLMs Machine Learning MLFlow Model training Pandas Pipelines PyTorch SageMaker TensorFlow Weights & Biases
Perks/benefits: Career development Startup environment