Staff Data Pipeline Engineer
Remote US, Remote Canada, Remote Toronto/Vancouver Area, Remote San Francisco Bay Area
Mozilla
Mozilla is the not-for-profit behind the lightning fast Firefox browser. We put people over profit to give everyone more power online.
The company
Pocket empowers people to discover, organize, consume, and share content that matters to them. Our apps and platform are essential ways that tens of millions of people discover and consume content on the web. Pocket is the Web, curated: for you and by you.
The opportunity
For content recommendations, everything starts with data. Pocket’s Data Products team builds systems that combine machine learning with editorial expertise to surface high-quality content from across the internet. Ensuring data privacy when collecting, distributing, validating, and securing data at scale is no small task, and every engineer on our team plays a vital role in shaping each user’s experience.
We are looking for a Lead Data Pipeline Engineer to own the design and development of data pipeline applications for complex, extensible, and highly scalable cloud-based data platforms. Are you passionate about building intuitive data models? Do you excel at taking vague requirements and crystallizing them into scalable data solutions? We invite you to apply!
People who excel on our team thrive in small, dynamic environments. We cover many areas, including machine learning, product engineering, machine learning operations, and data modeling.
Who you are
- Enjoy working on small, dynamic teams.
- Understand the data lifecycle and concepts such as lineage, governance, privacy, retention, and anonymity.
- Conceptually familiar with AWS cloud resources (S3, EC2, RDS, etc.).
- A trusted authority in distributed data processing patterns.
- Highly proficient in at least one of Java, Python or Scala.
- Comfortable with complex SQL.
- Experience designing, building, and maintaining data lakes.
What you'll do
- Build and maintain data pipeline applications.
- Design, create and maintain the data platform data model at the conceptual, logical, and physical levels.
- Establish data security, quality, load, transport and performance models.
- Research, design, document and modify data pipeline software specifications throughout the production life cycle.
- Develop and maintain stakeholder documentation and operations procedures, programs, security, etc., and assist in eliminating redundancy and automating manual processes.
- Assist in developing standards and criteria for the successful implementation of new systems.
- Perform code reviews and mentor other engineers.
Bonus experience
- Cloud warehouses: Snowflake, BigQuery, Redshift
- Feature stores: SageMaker, Databricks, Vertex
- Orchestrators: Airflow, Prefect
- Compute frameworks: AWS Glue, Spark, Hadoop, Athena
- Streaming data: Kinesis, Kafka
- Data modeling: DBT
About Pocket
We’re a remote-first team. Video conferencing, Slack chats, and shared documents keep everyone in the loop and make sure no one feels isolated. We value transparency and collaboration from the CEO on down.
As a subsidiary of Mozilla, we have the nimbleness of a small team with the resources of a large company, which means each teammate has the opportunity to make a big impact. We also keep our working hours flexible, not just because we have team members in different time zones, but because we know you have a life outside the office, and we value that. You’re human, we’re human, and everyone at Pocket is treated with the utmost respect.
Commitment to diversity, equity, inclusion, and belonging
Mozilla understands that valuing diverse creative practices and forms of knowledge is crucial to, and enriches, the company’s core mission. We encourage applications from everyone, including members of all equity-seeking communities, such as (but certainly not limited to) women, racialized and Indigenous persons, persons with disabilities, and persons of all sexual orientations, gender identities, and expressions.
We will ensure that qualified individuals with disabilities are provided reasonable accommodations to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment, as appropriate. Please contact us at hiringaccommodation@mozilla.com to request accommodation.
We are an equal opportunity employer. We do not discriminate on the basis of race (including hairstyle and texture), religion (including religious grooming and dress practices), gender, gender identity, gender expression, color, national origin, pregnancy, ancestry, domestic partner status, disability, sexual orientation, age, genetic predisposition, medical condition, marital status, citizenship status, military or veteran status, or any other basis covered by applicable laws. Mozilla will not tolerate discrimination or harassment based on any of these characteristics or any other unlawful behavior, conduct, or purpose.