Data Engineer

Gurugram

Applications have closed

MongoDB

Get your ideas to market faster with a developer data platform built on the leading modern database. MongoDB makes working with data easy.


The database market is massive (IDC estimates it at $119B+ by 2025!) and MongoDB is leading its disruption. The MongoDB community is transforming industries and empowering developers to build amazing apps that people use every day. We are the leading modern data platform and the first database provider to IPO in over 20 years. Join our team and be at the forefront of innovation and creativity.

Headquartered in New York, with offices across North America, Europe, and Asia-Pacific, MongoDB has more than 17,000 customers, which include some of the largest and most sophisticated businesses in nearly every vertical industry, in over 100 countries.

MongoDB is growing rapidly and seeking a Data Engineer to be a key contributor to the overall internal data platform at MongoDB. You will build data-driven solutions to help drive MongoDB's growth as a product and as a company. You will tackle complex data-related problems using very diverse data sets.

Our ideal candidate has experience with

  • Several programming languages (Python, Scala, Java, etc.)
  • Data processing frameworks like Spark
  • Stream processing frameworks like Kafka, KSQL, and Spark Streaming
  • A diverse set of databases like MongoDB, Cassandra, Redshift, Postgres, etc.
  • Different storage formats like Parquet, Avro, Arrow, and JSON
  • AWS services such as EMR, Lambda, S3, Athena, Glue, IAM, RDS, etc.
  • Orchestration tools such as Airflow, Luigi, Azkaban, Cask, etc.
  • Git and Github
  • CI/CD Pipelines
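As a flavor of the storage formats named above, the row-vs-columnar distinction behind JSON and Parquet can be sketched in plain Python. This is an illustrative stand-in only (hypothetical records, standard library, no Parquet involved):

```python
import json

# Hypothetical event records (illustrative data, not from the posting)
events = [
    {"user": "a", "country": "IN", "ms": 120},
    {"user": "b", "country": "US", "ms": 340},
    {"user": "c", "country": "IN", "ms": 95},
]

# Row-oriented layout (JSON lines): one full record per line,
# easy to append to and stream
row_oriented = "\n".join(json.dumps(e) for e in events)

# Column-oriented layout (Parquet-like in spirit): one array per field,
# which compresses well and lets a query scan only the columns it needs
column_oriented = {k: [e[k] for e in events] for k in events[0]}

print(column_oriented["ms"])  # scan a single column: [120, 340, 95]
```

The same records, two layouts: analytical queries that touch a few fields favor the columnar shape, while record-at-a-time ingestion favors the row shape.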

You might be a phenomenal fit if you

  • Enjoy wrangling huge amounts of data and exploring new data sets
  • Value code simplicity and performance
  • Obsess over data: everything needs to be accounted for and be thoroughly tested
  • Plan effective data storage, security, sharing, and publishing within an organization
  • Constantly think of ways to squeeze better performance out of data pipelines

Nice to haves

  • You are deeply familiar with Spark and/or Hive
  • You have expert-level experience with Airflow
  • You understand the differences between different storage formats like Parquet, Avro, Arrow, and JSON
  • You understand the tradeoffs between different schema designs like normalization vs. denormalization
  • In addition to data pipelines, you’re also quite good with Kubernetes, Drone, and Terraform
  • You’ve built an end-to-end production-grade data solution that runs on AWS
  • You have experience building machine learning pipelines using tools like: SparkML, Tensorflow, Scikit-Learn, etc.
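The normalization vs. denormalization tradeoff mentioned above can be shown with a toy sketch (hypothetical tables, standard library only; real schemas would live in a database, not in dicts):

```python
# Normalized: orders reference a user table by key, so each user
# attribute is stored exactly once and updates touch one place
users = {1: {"name": "Asha", "city": "Gurugram"}}
orders = [{"order_id": 10, "user_id": 1, "total": 250}]

# Denormalized: user attributes are copied onto every order,
# trading update cost for read speed (no join needed at query time)
orders_denorm = [{**o, **users[o["user_id"]]} for o in orders]

print(orders_denorm[0]["city"])  # "Gurugram" without a join at read time
```

Analytical stores often lean denormalized because reads dominate; transactional stores lean normalized because writes and consistency dominate.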

Responsibilities

As a Data Engineer, you will

  • Build large-scale batch and real-time data pipelines with data processing frameworks like Spark on AWS
  • Help drive best practices in continuous integration and delivery
  • Help drive optimization, testing, and tooling to improve data quality
  • Collaborate with other software engineers, machine learning specialists, and partners, taking learning and leadership opportunities that will arise every single day
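As a trivial stand-in for the kind of batch transformation those pipelines perform at scale, here is a pure-Python aggregation (hypothetical click data; the Spark equivalent would be a groupBy/avg over a DataFrame read from S3):

```python
from collections import defaultdict

# Hypothetical page-view events (illustrative data only)
clicks = [
    {"page": "/home", "ms": 100},
    {"page": "/docs", "ms": 250},
    {"page": "/home", "ms": 140},
]

# Aggregate: average latency per page, tracking (sum, count) per key
totals = defaultdict(lambda: [0, 0])
for c in clicks:
    t = totals[c["page"]]
    t[0] += c["ms"]
    t[1] += 1

avg_ms = {page: s / n for page, (s, n) in totals.items()}
print(avg_ms)  # {'/home': 120.0, '/docs': 250.0}
```

In a Spark pipeline this shape of work is distributed across executors; the logic above is only the single-machine intuition.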

Success Measures

In 3 months, you will have familiarized yourself with much of our data platform, be making regular contributions to our codebase, be collaborating regularly with partners to widen your knowledge, and be helping to resolve incidents and respond to user requests.

In 6 months, you will have successfully investigated, scoped, executed, and documented a small- to medium-sized project, and worked with partners to make sure their data needs are satisfied by implementing improvements to our platform.

In 12 months, you will have become the key person for several projects within the team and will have contributed to the data platform's roadmap. You will have made several sizable contributions to the platform and will be regularly looking to improve the overall stability and scalability of the architecture.

This role is remote-optional until 10th January 2022; we are looking to speak to candidates who plan to be available in our India office when we introduce our hybrid model.

To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone. From employee affinity groups, to fertility assistance and a generous parental leave policy, we value our employees’ wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it’s like to work at MongoDB, and help us make an impact on the world!

MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request accommodation due to a disability, please inform your recruiter.

MongoDB is an equal opportunities employer.

Tags: Airflow Arrow Athena Avro AWS Azkaban Cask Cassandra CI/CD Data pipelines Git GitHub JSON Kafka KSQL Kubernetes Lambda Luigi Machine Learning MongoDB Parquet Pipelines PostgreSQL Python Redshift Scala Scikit-learn Security Spark SparkML Streaming TensorFlow Terraform Testing

Perks/benefits: Career development Fertility benefits Parental leave

Region: Asia/Pacific
Category: Engineering Jobs
