Machine Learning Infrastructure Engineer, ML Platform
Seattle, San Francisco, Remote (North America)
Stripe
Stripe powers online and in-person payment processing and financial solutions for businesses of all sizes. Accept payments, send payouts, and automate financial processes with a suite of APIs and no-code tools.

Stripe's mission is to increase the GDP of the internet. To do this, we need to fight fraud at scale and build great software products, which means assembling strong machine learning teams and equipping them with the technologies they need to be effective. Our mission on Machine Learning Platform is to make these teams more impactful by providing reliable and flexible infrastructure that enables machine learning at scale.
The Machine Learning Platform team does this by designing and engineering the underlying infrastructure that powers experimentation, training, and serving for Stripe’s key machine learning systems. Our flagship products include Railyard and Diorama: Railyard provides an expressive and powerful interface for model training at scale, and Diorama enables real-time model serving with strong reliability and latency guarantees. We work closely with ML engineers, data scientists, and platform infrastructure teams to build the powerful, flexible, and user-friendly systems that substantially increase ML velocity across the company.
You will work on:
- Building powerful, flexible, and user-friendly infrastructure that powers all of ML at Stripe
- Designing and building fast, reliable services for ML model training and serving, and distributing that infrastructure across multiple regions
- Creating services and libraries that enable ML engineers at Stripe to seamlessly transition from experimentation to production across Stripe’s systems
- Pairing with product teams and ML modeling engineers to develop easy-to-use infrastructure for production ML models
We are looking for:
- A strong engineering background and experience with data infrastructure and/or distributed systems
- Experience optimizing the end-to-end performance of distributed systems
- Experience developing and maintaining distributed systems built with open source tools
- Experience with or strong interest in developing ML models
Nice to have:
- Experience with Scala and Python
- Experience with Kubernetes
- Experience with creating developer tools
- Experience with model training and serving in production and at scale
- Experience writing and debugging ETL jobs using a distributed data framework (such as Spark, Kafka, or Flink)
It’s not expected that you’ll have deep expertise in every dimension above, but you should be interested in learning any of the areas that are less familiar.