Senior Data Engineer (Remote)

Remote - New York City, United States



PostHog is an open-source product analytics platform. We provide product-led teams with everything they need to understand user behaviour, including funnels, session recordings, user paths, multivariate testing and more. PostHog can be deployed to the cloud, or self-hosted on existing infrastructure, removing the need to send data externally.

We started PostHog as part of Y Combinator's W20 cohort and had the most successful B2B software launch on Hacker News since 2012 - with a product that was just 4 weeks old. Since then, we've raised $27m from some of the world's top investors, grown the team to over 30, and shown strong product-led growth.

We're now looking for a Senior Data Engineer to join our Ingestion team. More than 20,000 developers use PostHog, mostly on the open-source product, and we also have a 1,000+ member Slack community and over 7,000 GitHub stars. All of these numbers are going up, fast.

We hire globally, but are currently restricted to GMT -5 to +2 time zones.

What you’ll be doing:

We are looking for someone to take our ingestion pipeline to the next level. You will work with our super talented Ingestion small team to iteratively build out and shore up the pipeline's functionality. A good chunk of this work will focus on our Plugins service, the core of our data ingestion pipeline: it is responsible for transforming, augmenting, routing, and backfilling data to many different final destinations, including ClickHouse, the warehouse that powers PostHog.
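
To give a flavour of the kind of work involved - this is an illustrative sketch, not PostHog's actual implementation - here is a minimal consume-transform-produce loop in TypeScript using kafkajs. The broker address, topic names, and the augmentation step are all hypothetical:

```typescript
import { Kafka } from "kafkajs";

// Hypothetical broker address, client id, and topic names - for illustration only.
const kafka = new Kafka({ clientId: "plugins-service", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "event-transformers" });
const producer = kafka.producer();

async function run() {
  await consumer.connect();
  await producer.connect();
  await consumer.subscribe({ topics: ["events_raw"] });

  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return;
      const event = JSON.parse(message.value.toString());

      // Augment the event, e.g. stamp it with a processing time.
      event.processed_at = new Date().toISOString();

      // Route the transformed event onward, e.g. to a topic that a
      // downstream ClickHouse ingestion job consumes from.
      await producer.send({
        topic: "events_transformed",
        messages: [{ key: message.key, value: JSON.stringify(event) }],
      });
    },
  });
}

run().catch(console.error);
```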

If you love reading _Designing Data-Intensive Applications_ in your spare time and dream about producing and consuming large amounts of data with Kafka, then this is the spot for you!

What we value:

  • We are open source - building a huge community around a free-for-life product is key to our strategy.
  • We aim to become _the_ most transparent company, ever. In order to enable teams to make great decisions, we share as much information as we can. In our public handbook _everyone_ can read about our roadmap, how we pay people, what our strategy is, and who we have raised money from. We also have regular team-wide feedback sessions, where we share honest feedback with each other.
  • We’re an all-remote company and writing things down is hugely important to us. We use asynchronous communication to reduce time spent in meetings. We are structured for speed and autonomy - we are all about acting fast, innovating and iterating.
  • Being fully remote lets us hire amazing people from all over the world and foster an inclusive culture.

Requirements

  • Experience designing or operating large-scale real-time or near-real-time data pipelines
  • Operational knowledge of and experience with Kafka at scale
  • Solid backend engineering skills

Nice-to-haves (if you don't have any of these, you should still apply!):

  • Experience deploying real-time or near-real-time data pipelines to Kubernetes environments
  • Working knowledge of the internals of Apache Flink or other stateful stream-processing engines
  • Experience working with and operating a data lake / lakehouse / Delta Lake at scale
  • Experience being a user of product analytics is always a huge advantage
  • Experience operating or being a user of ClickHouse or any other data warehouse
  • You enjoy geeking out about serialization formats and their tradeoffs

Benefits

What we offer in return:

  • Career development
  • Equity
  • Flexible, unlimited paid time off
  • Health care and insurance
  • Medical and parental leave
  • Team events and travel

We believe people from diverse backgrounds, with different identities and experiences, make our product and our company better. That's why we dedicated a page in our handbook to diversity and inclusion. No matter your background, we'd love to hear from you!

Also, if you have a disability, please let us know if there's any way we can make the interview process better for you - we're happy to accommodate!

