Data Engineer

New York City, NY

Applications have closed

MongoDB

Get your ideas to market faster with a developer data platform built on the leading modern database. MongoDB makes working with data easy.

The database market is massive (IDC estimates it to be $121B+ by 2025!) and MongoDB is at the head of its disruption. At MongoDB we are transforming industries and empowering developers to build amazing apps that people use every day. We are the leading modern data platform and the first database provider to IPO in over 20 years. Join our team and be at the forefront of innovation and creativity.

MongoDB is growing rapidly and seeking a Data Engineer to be a key contributor to the company’s Internal Data Platform. You will build ETL pipelines that pull data into our Data Lake/Warehouse and that will be used to drive forward our growth as a product and as a company. You will take on complex data-related problems using very diverse data sets, and will work with stakeholder groups throughout the company to help them make better data-informed decisions.

We are looking to speak to candidates who are based in New York City, NY.

Our ideal candidate has experience with

  • Building ETL pipelines at scale that can grow without sacrificing performance
  • Data Lake/Warehouse design patterns and concepts, including Delta Lakes
  • Several programming languages (Python, Scala, Java, etc.)
  • Data processing frameworks such as Spark and Pandas
  • Orchestration tools such as Airflow, Luigi, Azkaban, Cask, etc.
  • AWS services such as S3, Kinesis, EMR, Lambda, Athena, Glue, IAM, RDS, etc.
  • Different storage formats such as Parquet, JSON, Avro, and Arrow
  • Streaming data processing frameworks like Kafka, KSQL, and Spark Streaming
  • A diverse set of databases (MongoDB, Redshift, etc.)

You might be an especially great fit if you

  • Enjoy wrangling huge amounts of data and exploring new data sets
  • Value code simplicity and performance
  • Obsess over data: everything needs to be accounted for and thoroughly tested
  • Plan effective data storage, security, sharing, and publishing within an organization
  • Constantly think of ways to squeeze better performance out of data pipelines

Nice to haves

  • You are deeply familiar with Spark and/or Hive
  • You have expert experience with Airflow
  • You understand the differences between storage formats such as Parquet, Avro, Arrow, and JSON, and when to use each
  • You understand the tradeoffs between schema designs, such as normalization vs. denormalization
  • In addition to data pipelines, you’re also quite good with Kubernetes, Drone, and Terraform
  • You’ve built an end-to-end production-grade data solution that runs on AWS or GCP
  • You have experience building machine learning pipelines using tools such as SparkML, TensorFlow, scikit-learn, etc.

Responsibilities

As a Data Engineer, you will

  • Build large-scale batch and real-time data pipelines with data processing frameworks including Spark and Kinesis
  • Help drive best practices in continuous integration and delivery
  • Help drive optimization, testing, and tooling to improve data quality
  • Collaborate with other software engineers, machine learning experts, and stakeholders, taking learning and leadership opportunities that will arise every single day

To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone. From employee affinity groups, to fertility assistance and a generous parental leave policy, we value our employees’ wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it’s like to work at MongoDB, and help us make an impact on the world!

MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.

MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
