AI/ML Infrastructure Lead

Remote, United States

Red Hat

Red Hat is the world’s leading provider of enterprise open source solutions, including high-performing Linux, cloud, container, and Kubernetes technologies.

About the job

The Red Hat Engineering team is looking for a lead developer in the PyTorch community to join our growing AI engineering team as a Principal Software Engineer. In this role, you will lead the design and development of the AI/ML components for Red Hat Enterprise Linux (RHEL). This AI/ML engineering effort focuses on developing, improving, and integrating the PyTorch framework, along with the related support libraries and tools that give ISVs, our internal RHEL teams, and our layered product teams (e.g., Red Hat OpenShift AI) the environment they need to develop AI-based applications, models, and tools. You will work alongside our GPU and Hardware Accelerator teams, who will be integrating hardware support for the AI components this team builds. We're open to candidates in the US and other countries.

What you will do

  • Design and build the RHEL AI/ML infrastructure
  • Actively participate in and collaborate with the AI/ML framework upstream communities, like PyTorch
  • Work closely with key stakeholders within Red Hat who are using the frameworks, libraries and tools that you build
  • Optimize the AI infrastructure for better performance and tighter integration with our layered products
  • Be a key part of the team defining the AI platforms that will be used by the AI/ML industry

What you will bring

  • Expertise in the PyTorch AI/ML platform, specifically focusing on PyTorch internals
  • Fluency in Python, C/C++, and Linux scripting tools
  • Open source development experience with the internals of PyTorch, and related tools, utilities and libraries
  • Strong English written and verbal communication and organizational skills
  • Passion for growing the AI/ML infrastructure

The following are considered a plus

  • Master's degree or PhD in computer science or a related field (a big plus)
  • Experience with AI/ML model development on PyTorch or other AI/ML platforms like TensorFlow
  • Experience with the LLVM or MLIR compiler infrastructure projects
  • Experience in operating system integration and tooling; RHEL experience is a big plus
  • Experience with container-related technologies like Podman, OpenShift or Kubernetes
  • Knowledge of the Git version control system
  • Passion for developing open source solutions for the AI/ML space

The salary range for this position is $142,140 - $234,500. The actual offer will be based on your qualifications.

Tags: Computer Science Engineering Git GPU Kubernetes Linux Machine Learning ML infrastructure ML models Open Source PhD Python PyTorch TensorFlow

Regions: Remote/Anywhere North America
Country: United States