AI Data Engineer

Foster City, CA (Hybrid)

About Replit
Replit is where the world’s most productive developers write, share, and deploy code.  Get started writing an application quickly without spending a second on environments or setup.  Remix templates from the world’s top companies.  Deploy in one click.  Go from idea to software, fast.

About the role: 

As an AI Data Engineer at Replit, your core mission is to empower and amplify AI capabilities through the seamless management and transformation of data at scale. You will collaborate closely with our platform and workspace teams to architect and build data pipelines and transformations, enabling us to create training datasets for our novel AI models. Your work will require not only processing data at scale but also focusing (almost obsessively) on data quality.

If you are excited about the data pipeline we built for Replit Code Repair (Blog Post), this is the job for you! We are already working on new AI projects leveraging our unique data at Replit, and your role will help speed them up substantially.

You will:

  • Develop data pipelines to efficiently collect and process platform and workspace data, seamlessly transferring it to cloud storage and data warehouses.

  • Create ETL jobs that wrangle, clean, filter, deduplicate, and otherwise prepare training datasets for AI models.

  • Quickly iterate over several data pipelines, driven by the insights gained from ablation studies.

Required skills and experience:

  • A minimum of 5 years of industry experience in a Data Engineering role, demonstrating a strong foundation in data manipulation and transformation.

  • Proficiency in building data pipelines and workflows using Spark/Databricks and other ETL tools, with an emphasis on scalability and performance optimization.

  • A keen interest in understanding the science (and art) of creating training datasets.

  • Self-driven and comfortable working autonomously, capable of taking ownership of complex AI data engineering projects.

Bonus Points:

  • Interest in the Developer Tools space

Full-Time Employee Benefits Include:

🧑‍💻 Flexible Work Hours

💰 Competitive Salary & Equity

🖥 Home Office Set-Up Stipend

⚕️ Health, Dental, Vision and Life Insurance

🩼 Short Term and Long Term Disability

📱 Monthly Expenses Stipend 

🚼 Parental and Baby Bonding Leave

🏝 Flexible Time Off (FTO) + Holidays

🚀 Annual company/team offsites (4/year)

Want to Learn More? 

To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds to apply, including and especially candidates from underrepresented and non-traditional backgrounds.

The overall market range of base compensation for roles in this area of Replit is typically $150,000 - $190,000. Compensation offered will be determined by additional factors such as location and experience.

This is a full-time hybrid role with an in-office requirement of Monday, Wednesday, and Friday.

Tags: Databricks Data pipelines Data quality Engineering ETL Pipelines Spark

Perks/benefits: Competitive pay Equity Flex hours Flex vacation Gear Health care Home office stipend Insurance Parental leave Salary bonus

Region: North America
Country: United States