Data Engineer - Stream Data Processing - Distributed Data Processing

San Francisco, California, United States - Remote

Applications have closed

Pathway

Pathway is the data processing framework which handles streaming data updates for you.


About Pathway

Deeptech start-up, founded in March 2020.

  • Our developer product, Pathway™, is a new stream data processing layer and a game changer for enterprise clients, designed to enable real-time insights based on raw streams of event data.
  • Pathway™ provides application developers with the capacity for real-time, incremental, in-memory transformation of complex event streams. It is built to master scenarios involving real-world data (e.g., IoT), online data (e.g., user activity patterns), and graph data (including graphs that evolve over time).
  • Pathway™ comes complete with a reactive Python programming framework and a rich library of composable application templates, process mining, and machine learning algorithms.
  • The product is available in open beta to all developers at pathway.com, and our first deployed clients include leaders of the logistics industry such as DB Schenker and La Poste.


Pathway is a growing start-up that raised $4.5 million in VC pre-seed funding and is supported by amazing business angels from the machine learning and logistics spaces. In the French tech ecosystem, Pathway is incubated at Agoranov and Ecole Polytechnique, is a member of French Tech Paris Saclay, is supported by the French Public Investment Bank and Réseau Entreprendre, and was accelerated by Wilco. It was named one of the 2021 Hottest Startups to invest in by the magazine Challenges and won the BPI I-Lab award for deeptech startups.


The Team

Pathway is built by and for overachievers. Its co-founders and employees have worked in the best AI labs in the world (Microsoft Research, Google Brain, ETH Zurich), worked at Google, and graduated from top universities (Polytechnique, ENSAE, Sciences Po, HEC Paris, etc.), with one PhD obtained at the age of 20. Pathway's CTO has co-authored with Geoff Hinton and Yoshua Bengio. The team also includes the co-founders of Spoj.com (1M+ developer users) and NK.pl (13.5M+ users).

The opportunity

We are searching for a person with a Data Processing or Data Engineering profile, willing to work with live client datasets, and to test, benchmark, and showcase our brand-new stream data processing technology.

The end users of our product are mostly developers and data engineers working in corporate environments. We expect our development framework to one day become part of their preferred development stack for analytics projects at work: their daily bread and butter.


You Will

You will work closely with our CTO and Head of Product, as well as with key developers. You will be expected to:

  • Implement the flow of data from its location in clients' data warehouses up to Pathway's ingress.
  • Set up change data capture (CDC) interfaces for change streams between client data stores and the input/output data processed by Pathway, and ensure data persistence for Pathway outputs.
  • Design ETL pipelines within Pathway.
  • Contribute to the design of a benchmark framework (throughput, latency, memory footprint, consistency), including in distributed system setups.
  • Contribute to building open-source test frameworks for simulated streaming data scenarios on public datasets.

Requirements

  • In-depth understanding of at least one major distributed data processing framework (e.g., Spark, Dask, Ray).
  • 6+ months of experience working with a streaming dataflow framework (e.g., Flink, Kafka Streams or ksqlDB, Spark in streaming mode, Beam/Dataflow).
  • Ability to set up distributed dataflows independently.
  • Experience with data streams: message queues, message brokers (Kafka), CDC.
  • Working familiarity with data schemas and schema versioning concepts (e.g., Avro, Protobuf).
  • Familiarity with Kubernetes.
  • Familiarity with deployments on both Azure and AWS.
  • Good working knowledge of Python.
  • Good working knowledge of SQL.
  • Experience working at an innovative tech company with a long-term vision (SaaS, IT infrastructure, or similar preferred).
  • Warmly disposed towards open-source and open-core software, but pragmatic about licensing.


Bonus Points

  • Knowledge of how developers work in a corporate environment.
  • Passionate about trends in data.
  • Proficiency in Rust.
  • Experience with Machine Learning pipelines or MLOps.
  • Familiarity with any modern data transformation workflow tooling (e.g., dbt, Airflow, Dagster, Prefect).
  • Familiarity with Databricks Data Lakehouse architecture.
  • Familiarity with Snowflake's data product vision (2022+).
  • Experience in a startup environment.

Benefits

Why You Should Apply

  • Intellectually stimulating work environment. Be a pioneer: you get to work with a new type of stream processing framework.
  • Work in one of the hottest data startups in France, with exciting career prospects
  • Real responsibility and the opportunity to make a significant contribution to the company's success
  • Compensation: contract of up to $150k (full-time-equivalent) + Employee stock option plan.
  • Inclusive workplace culture


Further details

  • Type of contract: Flexible / remote
  • Preferred start date: early 2023.
  • Location: remote (work from home), with the possibility of meeting other team members at one of our offices:
    • Paris – Agoranov (where Doctolib, Alan, and Criteo were born) near Saint-Placide Metro (75006).
    • Paris Area – Drahi X-Novation Center, Ecole Polytechnique, Palaiseau.
    • Wroclaw – University area.

Candidates based anywhere in the United States and Canada will be considered.

Tags: Airflow Architecture Avro AWS Azure Dagster Databricks Dataflow Engineering ETL Flink Kafka Kubernetes Machine Learning MLOps PhD Pipelines Python Research Rust Snowflake Spark SQL Streaming

Perks/benefits: Career development Equity Flex hours Salary bonus Startup environment Team events

Regions: Remote/Anywhere North America
Country: United States
Category: Engineering Jobs
