Data Engineer

San Francisco, CA

Segment.io, Inc.

Posted 1 week ago

Tribe and Karma of the Team:

At Segment, we believe that good data is the foundation of good decision-making. Too often, we see product managers and marketers with siloed or conflicting pieces of the picture, and engineers who struggle to keep up with evolving data needs. Segment standardizes and streamlines data infrastructure using a single platform that collects, unifies, and sends data to hundreds of business tools with the flip of a switch. We ensure that all of our customers are equipped with a holistic and detailed view of their customers, unlocking digital transformations, personalized marketing, and a data-driven future.

Data Engineering enables Segment to derive insights about our customers and product usage effectively and efficiently, and it is the backbone of all data-driven decisions we make to move the business forward.

This is a rare opportunity to be one of the founding members of Segment’s internal Data Engineering team. As a founding data engineer, you will be part of the core team responsible for building out our data lake, processing many terabytes of data, and developing efficient ETL pipelines to deliver data and insights to partners across the company. Additionally, you have the opportunity to provide feedback to Product and Engineering teams that will help shape the future of our products.

Drive and Focus of the Role:

  • Design, build and launch efficient and reliable data pipelines for ingesting and transforming data from internal and cloud applications
  • Assist partners in Analytics, Product, and Go-to-Market teams with their data transformation and infrastructure needs
  • Optimize Segment’s internal data storage and compute resources to improve performance, reliability, and availability of the data
  • Execute on our data lake and self-service strategies
  • Own and maintain Segment’s internal data infrastructure assets primarily hosted on AWS
  • Build back-end data services for internal applications
  • Build data engineering tools and frameworks to enable common data processing patterns at scale 

What we're looking for:

  • 2+ years of hands-on experience with Python, Java, or Scala for data processing
  • 2+ years of data or software engineering experience
  • Knowledge of data pipelining and workflow management tools such as Airflow
  • Working experience with data warehouses such as Snowflake
  • Advanced working SQL experience
  • Experience building big data (terabyte scale) processing pipelines using a distributed data processing engine/framework
  • Hands-on experience with versioning, continuous integration, and build & deployment tools and platforms such as GitHub and Buildkite
  • Familiar with dimensional data modeling and data normalization
  • Familiar with data-lake architecture and data serialization formats such as JSON and Parquet
  • Experience working in a dynamic and fast-paced shop
  • BS or MS degree in Computer Science or a related technical field

Segment is an equal opportunity employer. We believe that everyone should receive equal consideration and treatment in all terms and conditions of employment regardless of sex, gender (including pregnancy, childbirth, breastfeeding or related medical conditions), sexual orientation, gender identity, gender expression, race, color, religion, creed, national origin, ancestry, age (over 40), physical disability, mental disability, medical condition, genetic information, marital status, domestic partner status, military or veteran status, height, weight, AIDS/HIV status, and any other protected category under federal, state or local law. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Job tags: Airflow AWS Big Data Engineering ETL Java Marketing Parquet Python Scala SQL