Machine Learning Engineer

San Diego

Full Time Mid-level / Intermediate USD 126K - 210K *

AI Fund

Posted 4 weeks ago

About RapidFire AI
RapidFire AI is a cutting-edge deep tech startup specializing in scaling Machine Learning solutions. We are dedicated to empowering customers to effortlessly scale their AI workloads, ensuring they stay at the forefront of innovation in their industries.
About the Role
We are seeking a highly motivated and skilled Machine Learning Engineer to join our growing team. In this role, you will develop distributed infrastructure for Deep Learning (DL) applications in the cloud and contribute to the design of new features. You will collaborate closely with other developers and customer-facing personnel to ensure a seamless product experience.

Responsibilities:

Design, develop, deploy, and maintain large-scale DL infrastructure software, following best practices and software engineering guidelines
Contribute to designing efficient distributed systems that scale DL computations and are modular and fault-tolerant
Automate the setup, launch, and orchestration of end-to-end training and experimentation pipelines written with PyTorch, TensorFlow, or Keras
Use and extend libraries like FSDP, DDP, DeepSpeed, and GPipe to train DL models across multiple GPUs
Use tools like Pandas and Dask to handle large multimodal datasets
Troubleshoot code and fix bugs to ensure smooth functioning of the application
Monitor and troubleshoot cluster resource usage to ensure optimal performance
Conform to continuous integration and continuous delivery (CI/CD) pipeline standards for code deployment
Communicate effectively with the wider team to ensure successful application development and deployment
Collaborate with other developers to define and implement cloud infrastructure strategies for DL applications
Stay up to date with the latest advancements in DL and AI technologies and best practices

Required Skills:

4+ years programming experience with Python
Proven experience as an ML Engineer working on deploying production model training and/or inference
Excellent knowledge of the DL tools PyTorch and TensorFlow
Excellent knowledge of, and experience using, DL systems libraries such as FSDP, DDP, DeepSpeed, and GPipe
Familiarity with LLMs, finetuning, and associated concepts
Familiarity with operating systems concepts, memory management, networking, and cloud computing
Familiarity with AWS infrastructure components like EC2, S3, EBS, EFS, EKS, and Lambda
Basic experience with version control systems (e.g., Git) and collaborative development workflows
Understanding of CI/CD methodologies and tools
Excellent communication and collaboration skills
Ability to work independently and as part of a team
Strong problem-solving and analytical skills
A passion for learning and staying updated with the latest ML technologies

Nice to Have:

Familiarity with Docker and Kubernetes to integrate code with underlying layers of deployment
Experience working on AWS or other public cloud providers
Experience with ML usability tools such as MLflow, W&B, or Amazon SageMaker

* Salary range is an estimate based on our AI, ML, Data Science Salary Index 💰

Tags: AWS CI/CD DDP Deep Learning Distributed Systems Docker EC2 FSDP Git Kubernetes LLMs Machine Learning MLFlow Model training Pandas Pipelines PyTorch SageMaker TensorFlow Weights & Biases

Perks/benefits: Career development Startup environment

Region: North America

Country: United States

Categories: Engineering Jobs Machine Learning Jobs
