Freelance Data Engineer
Warsaw, Masovian Voivodeship, Poland - Remote
Displate
Wall art built to last forever. Official designs from Star Wars™, Marvel, Netflix and 200+ more brands. Hassle-free magnet mounting. 100% made in the EU. We’re Displate, makers of 21st-century metal posters and one of the top 5 most valuable Polish e-commerce companies according to Forbes Magazine. Displate.com is home to artworks from thousands of independent artists and the biggest brands on this planet, like Marvel, DC, Star Wars, Blizzard, NASA, Iron Maiden, Cyberpunk, The Witcher… you name it.
The role
We are looking for a Data Engineer to join one of our product teams on a 6-month contract. The role focuses on productionalizing ML-focused data pipelines, organizing data resources (data lake), leveraging streaming data, and prototyping with new services in the AWS cloud.
You will work together with back-end developers and data scientists to deliver better products and services by building purpose-oriented data lakes and pipelines.
The work will require designing, building, operationalizing, securing, and monitoring data processing systems in the AWS environment with security, scalability, and reliability in mind. As part of the team, you will have an impact on how data is governed, accessed, and distributed across services and data products for ML.
Sounds exciting? Keep reading below.
"OK, so what's in it for me?"
- A team of 40+ people divided into cross-functional squads that make the Displate.com web app full of personalized content, allowing users to discover meaningful items that help geeks build the collections they deserve.
- Working in a team which delivers value to the production system with multiple deployments a day. We constantly iterate with A/B tests and measure and implement solutions at scale.
- We value and reward engagement and proactivity - you will have an impact on how our data stack is developed. Besides constantly implementing production features, you will also be able to prototype and experiment with new services to meet business objectives.
- Software Craftsmanship and Clean Code are not just buzzwords here - we strive for top-notch quality in everything we do - as you’ll find out during our Code Reviews, we’re rather demanding ;) We aim to test as much as possible in a variety of different ways.
- Flextime, open office policy - come to the office, we have cookies, Displates hanging all over the place, nice people, and a convenient space to work (we love it, and so will you!)
Key Responsibilities:
- Building data pipelines (AWS Glue) focused on Machine Learning for discoverability, content acquisition, and content personalization.
- Developing a real-time user profile by combining streaming interaction data (AWS Kinesis) with RDS records (stream enrichment).
- Contributing to a data lake (AWS Glue Data Catalog, Lake Formation, Athena) and data warehouse (AWS Redshift). Advising on the most efficient infrastructure in terms of performance and cost.
- Productionalizing machine learning models (both custom models and off-the-shelf solutions like Amazon Personalize and SageMaker), including working with the model registry, CI/CD pipelines, and managing model artefacts.
- Providing support and expertise on all machine learning product life cycle stages.
Requirements
Want to try your hand at Displate?
General expectations:
- Strong Python coding skills, including experience with creating, optimizing and monitoring PySpark jobs
- Experience with delivering and monitoring production data pipelines at scale with ETL orchestrators (AWS Glue, Cloud Dataflow)
- Experience with real-time stream services (preferably AWS Kinesis)
- Experience with IaC for data-related services (preferably CloudFormation)
- 3+ years of total experience in data engineering, data operations, machine learning development, or a related field, at scale in a production environment
- Expert knowledge of data structures and algorithms used in data processing systems
- Strong problem-solving skills, proficiency in leading technical meetings, and the ability to take ownership of data pipelines for ML
Nice to have:
- Prior experience or willingness to learn concepts related to operations for Machine Learning
- Experience with NoSQL (DynamoDB)
Skillset:
Python (4/5),
Spark (4/5),
AWS (4/5),
Git (3/5),
Docker (2/5),
IaC (3/5),
Java (1/5),
Kubernetes (1/5),
data version control (1/5)
Tech stack (non-exhaustive)
- Data is stored in S3 buckets, PostgreSQL, DynamoDB, and Athena, and organized in the AWS Glue Data Catalog. We process clickstream data as ETL jobs using AWS Glue jobs, crawlers and workflows, leveraging Spark distributed computing. Stream processing is done with AWS Kinesis.
- We profile data using AWS Glue DataBrew, and use Python Jupyter Notebooks and Redshift Notebooks. Ad-hoc analysis is done using SQL queries in AWS Athena.
- Development work follows best programming practices, with GitLab, pytest tests and GitLab CI/CD.
- Orchestration and monitoring include EKS, GitLab CI/CD, Ansible, Terraform, Composer, and New Relic.
- We code in Python (including Spark) and Java, but other experience is welcome.
Tags: A/B testing, Ansible, Athena, AWS, AWS Glue, AWS Glue DataBrew, CI/CD, CloudFormation, Dataflow, DataOps, Data pipelines, Data warehouse, Docker, DynamoDB, E-commerce, Engineering, ETL, Git, GitLab, Java, Jupyter, Kinesis, Kubernetes, Lake Formation, Machine Learning, ML models, NoSQL, Pipelines, PostgreSQL, Prototyping, PySpark, Python, Redshift, SageMaker, Security, Spark, SQL, Streaming, Terraform