Sr. ML Infrastructure Engineer
Remote
Runway
Runway is an applied AI research company shaping the next era of art, entertainment and human creativity. At Runway, we believe everyone has a story to tell. Our mission is to make professional video and content creation accessible to all. We are taking recent advancements in computer graphics, the web, and machine learning to push the boundaries of creativity and, in turn, lower the barriers to content creation, unlocking a new wave of storytelling 🚀
Over the last three years, we’ve raised funding from top-tier investors including Coatue, Lux, and Amplify, all with a team small enough to fit at one (growing) table. Our team consists of creative, open-minded, caring and entrepreneurial individuals from all walks of life. We aspire to build incredible things, and that starts with building an incredible team, so we’d love to hear from you! 😄
About the role 🎉
We’re looking for a Senior ML Infrastructure Engineer to help us scale the infrastructure and tooling behind the development, testing, and deployment of our machine-learning-based products. The ideal candidate for this role has experience provisioning large compute clusters for machine learning workflows, has experience helping teams establish best practices for reliability and scalability, and thrives in a fast-paced, high-ownership environment.
A peek at our technical stack 🔍
The rich UI of our video editing and collaboration tools is powered by TypeScript and React/Redux, while the real-time compositing and graphics engine behind our interactive preview runs on WebGL2 and WebAssembly. Our video streaming backend components are written in Python, rely heavily on FFmpeg/libav and HLS for on-the-fly transcoding, use PyTorch and TorchScript for ML inference, and are deployed as containerized services on Kubernetes. Our API endpoints for real-time collaboration and media asset management are written in TypeScript and Node.js and are deployed as serverless functions on AWS Lambda.
What you’ll do 🎨
- Manage large compute clusters for ML training, inference, and development
- Create tooling and infrastructure that abstract compute and storage in ML development workflows
- Build automation and CI/CD pipelines for developing and deploying new machine learning models
What you’ll need 💻
- 3+ years of experience in a DevOps or Infrastructure Engineer role building machine learning infrastructure and working with large GPU clusters
- Knowledge of cloud providers such as AWS, GCP, or Azure, infrastructure-as-code frameworks such as Terraform, and observability tools such as Grafana
- Interest and experience supporting engineering teams in creating robust processes for automation, reliability, and instrumentation
- Strong communication, collaboration, and documentation skills
Working at Runway 🥳
We are a small and growing team of artists, engineers, researchers, and dreamers working together to reimagine creativity, and we’re building a unique team of talented individuals from diverse backgrounds. We believe that this will allow us to continue to up-level each other, our company, and our product. We’re looking for people who will add to our culture, not just fit in.
We’re committed to creating a space where our employees can bring their full selves to work and have equal opportunity to succeed. So regardless of race, gender identity or expression, sexual orientation, religion, origin, ability, age, veteran status, if joining this mission speaks to you, we encourage you to apply!
Keep exploring Runway:
Our Behaviors and Company Mission
Building Impossible Things: A glimpse into the future of Runway