Senior Data Engineer, Search POI
Remote, United States
Mapbox
Mapbox is the leading real-time location platform for a new generation of location-aware businesses. Mapbox is the only platform that equips organizations with the full set of tools to power the navigation of people, packages, and vehicles everywhere. More than 3.5 million registered developers have chosen Mapbox because of the platform’s flexibility, security and privacy compliance. Organizations use Mapbox applications, data, SDKs and APIs to create customized and immersive experiences that delight their customers. Whether you’re watching the delivery of your grocery order on Instacart, tracking your personal best mile on Strava, monitoring your gas budget on Metromile, or checking today’s forecast on The Weather Channel, Mapbox is the location and maps within those apps.
What We Do
The Search POI team normalizes and conflates multiple data sources into consumable, high-quality data layers, such as the road network, places of interest (POIs), buildings, and places, for internal and external customers. We ensure high quality by first maintaining a standardized specification for all map layers that internal customers such as the Navigation, Maps, and Search divisions rely on. These layers are maintained in a data warehouse, which is updated daily; the data is preprocessed, filtered, conflated, and transformed into different formats for the respective consumers. The Search POI team is at the core of building the AI map at Mapbox by integrating features extracted from roadside or aerial imagery as well as data derivatives of the 225M miles of anonymized traffic data Mapbox receives per day.
We are built on top of AWS. We use tools like Airflow for job orchestration and automate our tasks using Lambda, ECS and PySpark applications on Qubole. We store our data in S3 and are heavy users of Hive. We use Amazon Athena to provide and measure our operational and qualitative metrics. We never shy away from building our own internal tools to accelerate our workflows.
What You'll Do
- Work with many geospatial data sets, specifically road networks, buildings, POI and address data
- Implement distributed pipelines using Airflow and Spark to process geospatial data
- Integrate third-party data sources from different geographic areas into the basemap
- Interface with engineers from other teams to analyze their needs for geospatial data and solve their data problems
- Implement automated quality metrics to ensure we are continuously delivering high-quality data to our customers
- Participate in design and code reviews
- Mentor other software developers to develop all aspects of their engineering skill sets
- Create new data products by aggregating proprietary sources and derived data from sensors and aerial imagery
What We Believe are Important Traits for This Role
- 3+ years of experience in software development
- Experience with AWS, GCP, or Azure
- Proficiency in at least one modern programming language (Python, Scala, Java, …) suitable for data processing
- Proficiency in a query language like SQL
- Familiarity with Spark or other Hadoop-based technologies
- Familiarity with CI/CD processes
- Familiarity with processing and normalizing many different datasets into a single coherent product.
- Ability to communicate complex concepts to both peers and leadership. Strong verbal and written communication skills.
Nice-to-Haves
- Experience with introducing quality and operational metrics into a data ETL pipeline.
- Experience with geospatial data analysis and processing is a plus.
- Experience with machine learning is a plus.
What We Value
In addition to our core values, which are not unique to this position and are necessary for Mapbox leaders:
- We value high-performing creative individuals who dig into problems and opportunities.
- We believe in individuals being their whole selves at work. We commit to this through supportive health care, parental leave, flexibility for the things that come up in life, and innovating on how we think about supporting our people.
- We emphasize an environment of teaching and learning to equip employees with the tools needed to be successful in their function and the company.
- We strongly believe in the value of growing a diverse team and encourage people of all backgrounds, genders, ethnicities, abilities, and sexual orientations to apply.
Mapbox is an EEO Employer - Minority/Female/Veteran/Disabled/Sexual Orientation/Gender Identity