Big Data Engineer

US, NC, Virtual Location - North Carolina

Full Time
Amazon.com

Posted 1 week ago

Have you ever ordered a product from Amazon and been amazed at how fast it gets to you?

Our team in the Seattle office is building complex, massive-scale data systems that capture data at every step of the automated pipeline and use that data to proactively predict efficiency and cost improvements.

As an Amazon.com Big Data Engineer, you will work in one of the world's largest and most complex data warehouse environments. You should be skilled in architecting enterprise DW solutions across multiple platforms (RDBMS, columnar, cloud), and you should have extensive experience in the design, creation, management, and business use of extremely large data sets. You should have excellent business and communication skills, enabling you to work with business owners to develop and define key business questions and to build data sets that answer them. Above all, you should be passionate about working with huge data sets and love bringing them together to answer business questions and drive change.

As a Big Data Engineer in this role, you will develop new data engineering patterns that leverage a new cloud architecture, and will extend or migrate our existing data pipelines to that architecture as needed. You will also help integrate Redshift as our primary processing platform, creating the curated Amazon.com data model for the enterprise to leverage. You will be part of a team that builds the next-generation data warehouse platform and drives the adoption of new technologies and practices in existing implementations. You will be responsible for designing and implementing complex ETL pipelines in the data warehouse platform and other BI solutions to support the rapidly growing, dynamic business demand for data, delivering data as a service that has an immediate influence on day-to-day decision-making at Amazon.com.
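For a concrete flavor of this kind of pipeline work, below is a minimal sketch of a bulk load from S3 into Redshift using the COPY command. It is illustrative only: the cluster endpoint, database, table, S3 prefix, and IAM role are all hypothetical placeholders, and a production pipeline would add orchestration, retries, and data-quality checks.

```python
# Minimal sketch of a bulk load from S3 into Redshift via COPY.
# Hypothetical names throughout: the cluster endpoint, database, table,
# S3 prefix, and IAM role are placeholders, not real resources.
import psycopg2

REDSHIFT_DSN = (
    "host=example-cluster.abc123.us-east-1.redshift.amazonaws.com "
    "port=5439 dbname=analytics user=etl_user password=REDACTED"
)

COPY_SQL = """
    COPY curated.orders
    FROM 's3://example-bucket/exports/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-loader'
    FORMAT AS PARQUET;
"""

def load_orders() -> None:
    # COPY is Redshift's parallel bulk-ingest path: each slice reads a
    # portion of the S3 prefix, which is far faster than per-row INSERTs.
    with psycopg2.connect(REDSHIFT_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)
    # The connection context manager commits on success, rolls back on error.

if __name__ == "__main__":
    load_orders()
```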

Key responsibilities include:
- Building and migrating complex ETL pipelines from different sources (DynamoDB, S3, SQS, relational databases) to Redshift and Elastic MapReduce so the system can grow elastically
- Optimizing the performance of business-critical queries and resolving ETL job-related issues
- Tuning application and query performance using Unix profiling tools and SQL (see the EXPLAIN sketch after this list)
- Gathering and understanding data requirements, working with the team to achieve high-quality data ingestion, and building systems that process and transform the data and store it in various relational and non-relational stores
- Improving data ingestion models, ETLs, and alarming to maintain data integrity and availability
- Extracting and combining data from various heterogeneous data sources
- Designing, implementing, and supporting a platform that provides ad-hoc access to large data sets
- Modeling data and metadata to support ad-hoc and pre-built reporting
- Working with customers to fulfill their data requirements using DW tables, and maintaining metadata for all DW tables
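As a hedged illustration of the query-tuning responsibility above, the sketch below captures a Redshift query plan with EXPLAIN so plans can be compared before and after a change (for example, a new sort key or a tighter predicate). The connection string and the curated.orders table are hypothetical.

```python
# Minimal sketch of the query-tuning loop: capture the plan with EXPLAIN,
# change one thing (sort key, dist key, predicate), re-EXPLAIN, compare.
# The DSN and the curated.orders table are hypothetical placeholders.
import psycopg2

DSN = (
    "host=example-cluster.abc123.us-east-1.redshift.amazonaws.com "
    "port=5439 dbname=analytics user=etl_user password=REDACTED"
)

QUERY = """
SELECT order_date, COUNT(*) AS orders
FROM curated.orders
WHERE order_date >= DATE '2020-01-01'
GROUP BY order_date
"""

def explain(cur, sql: str) -> str:
    """Return the planner's text plan for sql without executing it."""
    cur.execute("EXPLAIN " + sql)
    return "\n".join(row[0] for row in cur.fetchall())

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    # Look for red flags in the plan, e.g. broadcast joins (DS_BCAST_INNER)
    # or large sequential scans that a sort key could turn into range scans.
    print(explain(cur, QUERY))
```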

Basic Qualifications

• Degree in Computer Science, Engineering, Mathematics, or a related field, and 5+ years of industry experience
• Experience in data modeling, ETL development, and data warehousing
• Data warehousing experience with Oracle, Redshift, Teradata, etc.
• Experience with big data technologies (Hadoop, Hive, HBase, Pig, Spark, etc.)
• Strong customer focus, ownership, urgency, and drive
• Excellent communication skills and the ability to work well in a team
• Effective analytical, troubleshooting, and problem-solving skills

Preferred Qualifications

• Industry experience as a Data Engineer or in a related specialty (e.g., Software Engineer, Business Intelligence Engineer, Data Scientist) with a track record of manipulating, processing, and extracting value from large datasets
• Coding proficiency in at least one modern programming language (Python, Ruby, Java, etc.)
• Experience building and operating highly available distributed systems for data extraction, ingestion, and processing of large data sets
• Experience building data products incrementally and integrating and managing datasets from multiple sources
• Query performance tuning skills using Unix profiling tools and SQL
• Experience leading large-scale data warehousing and analytics projects using AWS technologies such as Redshift, S3, EC2, and Data Pipeline, as well as other big data technologies
• Experience providing technical leadership and mentoring other engineers on best practices in the data engineering space
• Linux/UNIX experience, including using it to process large data sets
• Experience with AWS

Job tags: AWS, Big Data, Business Intelligence, Data Warehousing, Distributed Systems, Engineering, ETL, Hadoop, Java, Linux, Map Reduce, Oracle, Python, Redshift, Ruby, Spark, SQL