Machine Learning Operations Engineer

Vancouver, BC

Applications have closed

Sanctuary AI

Sanctuary is on a mission to create the world’s first human-like intelligence in general-purpose robots.

Your New Role and Team

Sanctuary AI, a multi-award-winning LinkedIn Top Startup company, is looking to hire a Machine Learning (ML) Operations Engineer. Reporting to the Data Collection Team Lead, you’ll gain a comprehensive understanding of the infrastructure that powers our ML training pipelines.
The best candidate for this role will be a versatile, creative engineer with proven experience in infrastructure management, cloud platform operations, and deployment automation. You’ll be a valued contributor as you learn how our sophisticated multi-degree-of-freedom robotic systems work, while collaborating with cross-functional teams, including ML, Platform, Product Design, and Hardware and Sensor teams, to build and operate our data collection and ML model training pipelines.

Our Success Criteria

  • Build and support a secure, extensive, scalable, repeatable, and high-performing data collection platform
  • Set up and manage the necessary infrastructure for machine learning workloads, including cloud services, containers, and orchestration tools
  • Evaluate and deploy new tools and processes to optimize the effectiveness of our data collection and ML research activities
  • Monitor training cluster performance, and troubleshoot hardware and software errors, including Docker and GPU driver issues
  • Build data collection and ML training pipelines, and support our researchers in containerizing ML workloads

Your Experience

Qualifications
  • Bachelor's degree or higher in Computer Science or related field
  • 3+ years of experience with Docker, Kubernetes, and at least one of AWS, GCP, or Azure cloud services
  • 2+ years of experience with ML frameworks, platforms and tools
  • Knowledge of professional engineering practices for the full product life cycle, including coding standards, code reviews, source management, agile processes, testing, and operations
  • Demonstrated ability to design, implement, and test in a fast-paced environment

Skills
  • Demonstrated proficiency with Python for data and ML pipeline development
  • Demonstrated proficiency with Linux, Docker, and Kubernetes
  • Demonstrated proficiency with observability platforms such as Splunk, Datadog, ELK Stack, and Prometheus/Grafana
  • Demonstrated familiarity with MLOps and machine learning frameworks such as PyTorch and TensorFlow
  • A passion for deployment automation practices, such as GitOps and CI/CD

Traits
  • Above all else, a consistently positive attitude and a willingness to do whatever it takes to create robust solutions to complex problems
  • Optimistic listening and conflict resolution capabilities
  • Advanced verbal and written communication and interpersonal skills
  • Self-motivated and able to solve problems independently
  • Demonstrated ability to influence without authority and create a sense of urgency
  • Obsession with bringing human-like intelligence to machines

Working at Sanctuary AI

Sanctuary AI is an equal opportunity employer; employment with Sanctuary AI is governed based on skills, competence, and qualifications and will not be influenced in any way by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability, or any other legally protected status. In 2023, Sanctuary AI moved into a state-of-the-art office facility and has been recognized by LinkedIn as a Top Startup company.

Benefits

Full-time (non co-op) employees enjoy medical/dental/vision coverage, life insurance, wellness programs, stock options, paid time off (3 weeks vacation accrued annually, paid statutory holidays, paid statutory sick leave, and statutory parental leave), scheduling and worksite flexibility by role, and more.

About Sanctuary AI

Founded in 2018 by Geordie Rose, Suzanne Gildert, Olivia Norton, and Ajay Agrawal, Sanctuary AI is a Vancouver, Canada-based company. Sanctuary AI is on a mission to create the world’s first human-like intelligence in general-purpose robots that will help us work more safely, efficiently, and sustainably and, in the not-too-distant future, help us explore, settle, and prosper in outer space.
Members of the Sanctuary AI team founded D-Wave (a pioneer in the quantum computing industry), Kindred (first use of reinforcement learning in a production robot), and the Creative Destruction Lab (pioneered a revolutionary method for the commercialization of science for the betterment of humankind). The team has experience launching market-defining innovations rooted in previously unsolved and deep scientific problems.



Recruiting & Employment Agency Notice: Recruitment and hiring is conducted internally by Sanctuary AI. We are not seeking or soliciting any new agency partnerships or agreements at this time. Any employment agency or professional recruiter (“Agency”) that provides an unsolicited resume(s) or otherwise presents a prospective job candidate through the Sanctuary AI career site or directly to any Sanctuary AI employee, irrevocably grants to Sanctuary AI the unrestricted right to engage, hire, or contract with that candidate at Sanctuary AI's sole discretion without any compensation to the Agency. We appreciate your interest in working together, and should the need arise our Talent Acquisition team will contact any external firms directly.

Tags: Agile AWS Azure CI/CD Computer Science Docker ELK Engineering GCP GPU Grafana Kubernetes Linux Machine Learning MLOps Model training Pipelines Python PyTorch Reinforcement Learning Research Splunk TensorFlow Testing

Perks/benefits: Career development Equity Health care Insurance Medical leave Parental leave Startup environment

Region: North America
Country: Canada