Software Engineer, Data Pipelines

Budapest, Hungary

Scale AI

Trusted by world-class companies, Scale delivers high-quality training data for AI applications such as self-driving cars, mapping, AR/VR, robotics, and more.

Scale is powering this generative AI wave by providing the data and infrastructure for companies to build large-scale foundation models. AI is rapidly changing the world, and Scale is growing to meet that rapid demand. Our customers include OpenAI, Microsoft, Adept, Stability AI and many more major players in this space!

The Platform team is responsible for building the core abstractions and infrastructure on which the products can be built and iterated rapidly. The team owns how data flows throughout the Scale platform. We’re looking for a software engineer with deep experience building and scaling tools for data pipelines and change stream ingestion, on both structured and unstructured database platforms. You have a growth mindset and are comfortable learning new technologies.

You will:

  • Build and mature data pipelines, change streams, and ETL workloads for Scale, leveraging industry-standard platforms
  • Maintain and extend live data ingestion pipelines that power analytics and data science using a mixture of proprietary and commercial solutions
  • Collaborate with stakeholders across the organization, such as software developers, platform engineers, machine learning scientists, and customer operations
  • Own services or systems and define their long-term health goals, while also improving the health of surrounding components
  • Mentor other engineers and become deeply involved in architectural design and database best practices
  • Work directly with our engineering and sales teams to create backend database solutions to meet their challenging data and security needs
  • Build highly available systems capable of handling millions of frames of data every day and making that data accessible to both our workforce and our internal teams

Ideally you'd have:

  • 5+ years of post-graduation industry experience as a software engineer, with a focus on data pipelines
  • Engineering experience building real-time and distributed system architectures
  • Experience designing data platforms on industry-standard public cloud solutions
  • Deep familiarity with designing, architecting, optimizing, and tuning database platforms such as MongoDB, Postgres, MySQL, and Redis
  • Deep familiarity with SQL query optimization, database indexing, scalability (partitioning/sharding), and replication
  • Intermediate experience in at least one coding language: TypeScript, Python, Go, Java, C++ (note that we are mostly language-agnostic and are open to using whatever is the best tech for the problem at hand)
  • Experience working with Docker, Kubernetes, and Infra-as-Code (e.g. Terraform); bonus points for experience supporting GPU/ML workloads

Nice to have:

  • Prior startup experience to help us grow responsibly
  • Experience with AWS, TiDB, Datadog, Elasticsearch
  • Experience with cloud-based data warehouse solutions like Snowflake or Databricks
  • Experience with cost optimization strategies and techniques for database platforms
  • Experience developing and designing intermediary data abstraction layers
  • Experience mentoring and growing members of your team, or serving as a tech lead on large projects

About Us:

At Scale, we believe that the transition from traditional software to AI is one of the most important shifts of our time. Our mission is to make that happen faster across every industry, and our team is transforming how organizations build and deploy AI. Our products power the world's most advanced LLMs, generative models, and computer vision models. We are trusted by generative AI companies such as OpenAI, Meta, and Microsoft, government agencies like the U.S. Army and U.S. Air Force, and enterprises including GM and Accenture. We are expanding our team to accelerate the development of AI applications.

We believe that everyone should be able to bring their whole selves to work, which is why we are proud to be an affirmative action employer and an inclusive, equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability status, gender identity, or veteran status.

We are committed to working with and providing reasonable accommodations to applicants with physical and mental disabilities. If you need assistance and/or a reasonable accommodation in the application or recruiting process due to a disability, please contact us at accommodations@scale.com. Please see the United States Department of Labor's Know Your Rights poster for additional information.

We comply with the United States Department of Labor's Pay Transparency provision.

PLEASE NOTE: We collect, retain and use personal data for our professional business purposes, including notifying you of job opportunities that may be of interest and sharing with our affiliates. We limit the personal data we collect to that which we believe is appropriate and necessary to manage applicants’ needs, provide our services, and comply with applicable laws. Any information we collect in connection with your application will be treated in accordance with our internal policies and programs designed to protect personal data.

Tags: Architecture AWS Computer Vision Databricks Data pipelines Data warehouse Docker Elasticsearch Engineering ETL Generative AI Generative modeling GPU Java Kubernetes LLMs Machine Learning MongoDB MySQL OpenAI Pipelines PostgreSQL Python Security Snowflake SQL Terraform TypeScript

Perks/benefits: Career development Startup environment

Region: Europe
Country: Hungary
Category: Engineering Jobs
