Senior Software Engineer, Machine Learning

Remote, ON, Canada

Red Hat

Red Hat is the world’s leading provider of enterprise open source solutions, including high-performing Linux, cloud, container, and Kubernetes technologies.

About the job

Do you want to be part of a team that is focused on scaling the deployment, prompt-tuning, and monitoring of Foundation Models and Large Language Models (LLMs)? The OpenShift AI team is looking for a Senior Software Engineer with Kubernetes and Machine Learning experience to join our rapidly growing engineering team. Our team’s focus is to make machine learning model deployment and monitoring seamless and scalable across the hybrid cloud and the edge. This is a very exciting opportunity to build and impact the next generation of hybrid cloud MLOps platforms. 

 

In this role, you'll contribute as a technical expert on the model serving and inference runtime features of the open source Open Data Hub project and OpenShift AI by actively participating in KServe, Kubeflow, Hugging Face, vLLM, and several other open source communities. You will work as part of an evolving development team to rapidly design, secure, build, test, and release model serving, trustworthy AI, and model registry capabilities. This is primarily an individual contributor role: you will be a key contributor to the upstream MLOps communities and collaborate closely with internal cross-functional development teams.

 

What you will do

  • Be an influencer and leader in MLOps-related open source communities to help build an active MLOps open source ecosystem for Open Data Hub and OpenShift AI

  • Act as a Model Serving SME within Red Hat by supporting customer-facing discussions, presenting at technical conferences, and evangelizing OpenShift AI within internal communities of practice

  • Architect and design new features in collaboration with open source communities such as Kubeflow and KServe

  • Contribute to developing and integrating model inference and runtimes in the OpenShift AI product

  • Collaborate with our product management and customer engineering teams to identify and expand product functionalities

  • Mentor, influence, and coach a team of distributed engineers

What you will bring

  • Advanced-level knowledge of Python or Golang is a MUST
  • Hands-on experience in deploying and maintaining machine learning models in production environments

  • Ideally, work hybrid in Montreal, CA or Toronto, CA, or remotely in the Eastern Time Zone
  • Solid understanding of the fundamentals of model inferencing and runtime architectures

  • Advanced-level knowledge of and experience with development in Go and Python

  • Strong experience with Kubernetes

  • Excellent written and verbal communication skills; fluent English language skills

The following will be considered a plus: 

  • Bachelor's degree in statistics, mathematics, computer science, operations research, or a related quantitative field, or equivalent expertise; Master’s or PhD is a big plus

  • Experience in engineering, consulting, or another field related to model serving and monitoring, model registry, or deep neural networks, either in a customer environment or supporting a data science team

  • Familiarity with popular Python machine learning libraries such as PyTorch, TensorFlow, and Hugging Face

  • Experience with monitoring and alerting tools such as Prometheus

The salary range for this position is $111,260 - $183,530. Actual offer will be based on your qualifications.

 

 

 

Tags: Architecture Computer Science Consulting Engineering Golang HuggingFace Kubeflow Kubernetes LLMs Machine Learning Mathematics ML models MLOps Model deployment Model inference Open Source PhD Python PyTorch Research Statistics TensorFlow

Perks/benefits: Conferences

Regions: Remote/Anywhere North America
Country: Canada