AI Data Engineer
Foster City, CA (Hybrid)
Full Time | Mid-level / Intermediate | USD 150K - 190K
About Replit
Replit is where the world’s most productive developers write, share, and deploy code. Get started writing an application quickly without spending a second on environments or setup. Remix templates from the world’s top companies. Deploy in one click. Go from idea to software, fast.
About the role:
As an AI Data Engineer at Replit, your core mission is to empower and amplify AI capabilities through the seamless management and transformation of data at scale. You will collaborate closely with our platform and workspace teams to architect and build data pipelines and transformations, enabling us to create training datasets for our novel AI models. Your work will require not only processing data at scale but also focusing (almost obsessively) on data quality.
If you are excited about the data pipeline we built for Replit Code Repair (Blog Post), this is the job for you! We are already working on new AI projects leveraging our unique data at Replit, and your role will help speed them up substantially.
You will:
Develop data pipelines to efficiently collect and process platform and workspace data, seamlessly transferring it to cloud storage and data warehouses.
Create ETL jobs to wrangle, clean, filter, deduplicate, etc. training datasets for AI models.
Quickly iterate over several data pipelines, driven by the insights gained from ablation studies.
Required skills and experience:
A minimum of 5 years of industry experience in a Data Engineering role, with a strong foundation in data manipulation and transformation.
Proficiency in building data pipelines and workflows using Spark/Databricks and other ETL tools, with an emphasis on scalability and performance optimization.
A keen interest in understanding the science (and art) of creating training datasets.
Self-driven and comfortable working autonomously, capable of taking ownership of complex AI data engineering projects.
Bonus Points:
Interest in the Developer Tools space
Full-Time Employee Benefits Include:
🧑‍💻 Flexible Work Hours
💰 Competitive Salary & Equity
🖥 Home Office Set-Up Stipend
⚕️ Health, Dental, Vision and Life Insurance
🩼 Short Term and Long Term Disability
📱 Monthly Expenses Stipend
🚼 Parental and Baby Bonding Leave
🏝 Flexible Time Off (FTO) + Holidays
🚀 Annual company/team offsites (4/year)
Want to Learn More?
Replit Product
Interviewing + Culture
To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds to apply, including and especially candidates from underrepresented and non-traditional backgrounds.
The overall market range of base compensation for roles in this area of Replit is typically $150,000 - $190,000. Compensation offered will be determined by additional factors such as location and experience.
This is a full-time hybrid role with an in-office requirement of Monday, Wednesday, and Friday.