Data Engineer
New York City, NY
MongoDB
Get your ideas to market faster with a developer data platform built on the leading modern database. MongoDB makes working with data easy. The database market is massive (IDC estimates it to be $121B+ by 2025!) and MongoDB is at the head of its disruption. At MongoDB we are transforming industries and empowering developers to build amazing apps that people use every day. We are the leading modern data platform and the first database provider to IPO in over 20 years. Join our team and be at the forefront of innovation and creativity.
MongoDB is growing rapidly and seeking a Data Engineer to be a key contributor to the company’s Internal Data Platform. You will build ETL pipelines that pull data into our Data Lake/Warehouse, where it will be used to drive our growth as a product and as a company. You will take on complex data-related problems across very diverse data sets and will work with stakeholder groups throughout the company to help them make better data-informed decisions.
We are looking to speak to candidates who are based in New York City, NY.
Our ideal candidate has experience with
- Building ETL pipelines at scale that can grow without sacrificing performance
- Data Lake/Warehouse design patterns and concepts, including Delta Lake
- Several programming languages (Python, Scala, Java, etc.)
- Data processing frameworks such as Spark and Pandas
- Orchestration tools such as Airflow, Luigi, Azkaban, Cask, etc.
- AWS services such as S3, Kinesis, EMR, Lambda, Athena, Glue, IAM, RDS, etc.
- Different storage formats such as Parquet, JSON, Avro, and Arrow
- Streaming data processing frameworks like Kafka, KSQL, and Spark Streaming
- A diverse set of databases (MongoDB, Redshift, etc.)
You might be an especially great fit if you
- Enjoy wrangling huge amounts of data and exploring new data sets
- Value code simplicity and performance
- Obsess over data: everything needs to be accounted for and be thoroughly tested
- Plan effective data storage, security, sharing, and publishing within an organization
- Constantly think of ways to squeeze better performance out of data pipelines
Nice to haves
- You are deeply familiar with Spark and/or Hive
- You have expert-level experience with Airflow
- You understand the differences between storage formats like Parquet, Avro, Arrow, and JSON, and when to use each
- You understand the tradeoffs between different schema designs like normalization vs. denormalization
- In addition to data pipelines, you’re also quite good with Kubernetes, Drone, and Terraform
- You’ve built an end-to-end production-grade data solution that runs on AWS or GCP
- You have experience building machine learning pipelines using tools such as SparkML, TensorFlow, scikit-learn, etc.
Responsibilities
As a Data Engineer, you will
- Build large-scale batch and real-time data pipelines with data processing frameworks and services such as Spark and Kinesis
- Help drive best practices in continuous integration and delivery
- Help drive optimization, testing, and tooling to improve data quality
- Collaborate with other software engineers, machine learning experts, and stakeholders, taking advantage of the learning and leadership opportunities that arise every day
To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone. From employee affinity groups to fertility assistance and a generous parental leave policy, we value our employees’ wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it’s like to work at MongoDB, and help us make an impact on the world!
MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.
MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.