Data Engineer, Alexa AI Analytics Platform - Data Warehouse

Cambridge, Massachusetts, USA

Full Time
Amazon.com

Posted 2 weeks ago


Do you have a passion for big data, data warehousing, optimizing data pipelines, and working on a cross-functional team? Are you interested in providing data for machine learning scientists to build, train, and develop models?

The Data Engineer on the Alexa AI Analytics Platform team is responsible for the data pipeline, data warehousing, and forward-looking, strategic data modeling that support the analytics platform's day-to-day operations and three-year roadmap.

The Data Engineer will build and optimize logical data models and data pipelines for difficult, large-scale datasets in the Alexa Spoken Language Understanding space (SLU, covering Automatic Speech Recognition, Natural Language Understanding/Processing, and individual Alexa features). These datasets feed customized analytics software that supports thousands of users and investigations per month. The Data Engineer will be accountable for ongoing data quality, efficiency, testing, and maintenance. The Data Engineer will be required to work independently within our team and across other teams to source, optimize, and warehouse the right data for our customers.

The Data Engineer should thrive and have demonstrated success in an environment that offers ambiguously defined problems, big challenges, and quick changes. They will influence large-scale data solutions and dataset access in the team architecture, advising product managers, program managers, other engineers, and machine learning scientists. The Data Engineer will play a critical role in providing data for these machine learning scientists and will help maintain their data pipelines using SageMaker and other recent AWS offerings.

We are looking for passionate data engineers to optimize the consumption of the very large data sources we need to generate predictive models that uncover unique insights. You influence your team’s technical and business strategy by making insightful contributions to team priorities and overall data approach. You take the lead in identifying and solving ambiguous problems, architecture deficiencies, or areas where your team bottlenecks the innovations of other teams. You make data solutions simpler. We are looking for people who are motivated by thinking big, moving fast, and changing the way customers use data to drive profitability. If you love to implement solutions to hard problems while working hard, having fun, and making history, this may be the opportunity for you.

The Data Engineer:
· Has knowledge of recent advances in distributed systems (e.g., MapReduce, MPP architectures, and NoSQL databases). Is proficient in a broad range of data design approaches and knows when it is appropriate to use them (and when it is not).
· Has knowledge of the specific challenges and opportunities in providing data to machine learning scientists to enable their model building, training, testing, and deployment to production.
· Has knowledge of engineering and operational excellence best practices. Can make enhancements that improve data processes (e.g., data auditing solutions, management of manually maintained tables, automation of ad hoc or manual operational steps).
· Works with engineers to develop efficient data querying and modeling infrastructure.
· Understands how to make appropriate data trade-offs. Can balance customer requirements with technology requirements. Knows when to re-use code. Is judicious about introducing dependencies.
· Writes code that a Data Engineer or Software Development Engineer unfamiliar with the system can understand.
· Can create coherent Logical Data Models that drive physical design.
· Delivers pragmatic solutions. Does things with the proper level of complexity the first time (or at least minimizes incidental complexity).
· Understands how to be efficient with resource usage (e.g., system hardware, data storage, query optimization, AWS infrastructure).
· Collaborates with colleagues from multidisciplinary science, engineering, and business backgrounds.
· Communicates proposals and results clearly, backed by data and coupled with actionable conclusions, to drive business decisions.



Basic Qualifications


· 3+ years of experience as a Data Engineer or in a similar role
· Experience with data modeling, data warehousing, and building ETL pipelines
· Experience in SQL
· Bachelor's degree or higher in a quantitative/technical field (e.g. Computer Science, Statistics, Engineering)
· 3+ years of relevant experience in one of the following areas: Data engineering, database engineering, business intelligence or business analytics
· 3+ years of hands-on experience writing complex, highly optimized SQL queries across large data sets
· Demonstrable experience in scripting languages (Python, Perl, Ruby) and Excel
· Experience in data modeling, ETL development, and data warehousing
· Experience with massively parallel processing (MPP) databases (data warehouse and data lake)
· Experience with Tableau, Matillion, and AWS services (Redshift, S3, AWS Glue, EMR, DynamoDB)
· Experience with cloud data platforms and big data solutions
· Knowledge of distributed systems as they pertain to data storage and computing

Preferred Qualifications

· Master’s degree in a quantitative/technical field (e.g. Computer Science, Statistics, Engineering)
· Experience working with enterprise reporting systems and data analytics
· Experience as an Analytics Engineer or Data Scientist working with cross-functional teams

Amazon is an Equal Opportunity Employer – Minority / Women / Disability / Veteran / Gender Identity / Sexual Orientation / Age



