Senior Data Engineer
Bengaluru, India
Airbnb
Dec 23, 2023 - Find the perfect place to stay at an amazing price in 191 countries. Belong anywhere with Airbnb.
Airbnb is a mission-driven company dedicated to helping create a world where anyone can belong anywhere. It takes a unified team committed to our core values to achieve this goal. Airbnb's various functions embody the company's innovative spirit, and our fast-moving team is committed to leading as a 21st-century company.
Introduction
Data is critical to the success of every organisation in Airbnb. It is the foundation on which business intelligence insights, experimentation, Machine Learning (ML) models, analysis and publicly shared company performance metrics are built. The best insights and ML models are worthless without trustworthy data.
Data Engineers at Airbnb are responsible for ensuring the company has trustworthy data for innovation & operations. They play a unique role in bridging the gap between data producers (typically Software Engineers who own the online source systems) and data consumers (typically Analysts, Data Scientists and Software Engineers building applications which rely on the data).
The role of a Data Engineer is critical to the success of any data-driven organisation. Data Engineers deliver business value by increasing both confidence in the integrity of decision making and the productivity of data consumers across the company. This document defines the role of a Data Engineer at Airbnb.
What Is A Data Engineer Not Responsible For?
- Implementing and ensuring the syntactic and semantic quality of events sent to the offline data ecosystem. This is owned by the online system’s engineering team or the Data Platform team (depending on the ingestion mechanism).
- Developing artifact-specific datasets for reports or models - generally owned by Data Science or Analysts throughout the company
- Developing or maintaining Minerva assets - generally owned by Data Science
- Developing or maintaining reports & dashboards - generally owned by Data Science or Analysts throughout the company
- Developing & maintaining specific ML features - generally owned by Data Science or ML Engineers throughout the company
Not every Data Engineer will need all of the skills listed below, but we expect most Data Engineers to be strong in a significant number of them to be successful at Airbnb.
- Data Product Management
  - Effective at building partnerships with business stakeholders, engineers and product to understand use cases from intended data consumers
  - Able to create & maintain documentation to support users in understanding how to use tables/columns
- Data Architecture & Data Pipeline Implementation
  - Experience creating and evolving dimensional data models & schema designs to structure data for business-relevant analytics (e.g. familiarity with Kimball's data warehouse lifecycle)
  - Strong experience using an ETL framework (e.g. Airflow, Flume, Oozie) to build and deploy production-quality ETL pipelines
  - Experience ingesting and transforming structured and unstructured data from internal and third-party sources into dimensional models
  - Experience with dispersal of data to OLTP stores (e.g. MySQL, Cassandra, HBase) and fast analytics solutions (e.g. Druid, Elasticsearch)
- Data Systems Design
  - Strong understanding of distributed storage and compute (S3, Hive, Spark)
  - Knowledge of distributed system design, such as how MapReduce and distributed data processing work at scale
  - Basic understanding of OLTP systems like Cassandra, HBase, Mussel, Vitess etc.
- Coding
  - Experience building batch data pipelines in Spark
  - Expertise in SQL
  - General Software Engineering (e.g. proficiency coding in Python, Java, Scala)
  - Experience writing data quality unit and functional tests
- (Optional) Aptitude to learn and utilize data analytics tools to accelerate business needs
- (Optional) Stream Processing:
  - Experience building Stream Processing jobs on Apache Flink, Apache Spark Streaming, Apache Samza, Apache Storm or similar streaming analytics technology
  - Experience with messaging systems (e.g. Apache Kafka or RabbitMQ)
  - Experience designing and implementing distributed and real-time algorithms for stream data processing
  - Understanding of concepts such as schema evolution, sharding and latency
  - Good understanding of Lambda Architecture, along with its advantages and drawbacks
Perks/benefits: Team events