Freelance Data Engineer

Warsaw, Masovian Voivodeship, Poland - Remote

Applications have closed

Displate

Wall art built to last forever. Official designs from Star Wars™, Marvel, Netflix and 200+ more brands. Hassle-free magnet mounting. 100% made in the EU.

We’re Displate, makers of 21st-century metal posters and one of the top 5 most valuable Polish e-commerce companies according to Forbes Magazine. Displate.com is home to artworks from thousands of independent artists and the biggest brands on this planet, like Marvel, DC, Star Wars, Blizzard, NASA, Iron Maiden, CyberPunk, The Witcher… you name it.

The role

We are looking for a Data Engineer to join one of the product teams on a 6-month contract. The role focuses on productionalizing ML-focused data pipelines, organizing data resources (data lake), leveraging streaming data and prototyping with new services in the AWS cloud.

You will work together with back-end developers and data scientists to deliver better products and services by building purpose-oriented data lakes and pipelines.

The work will require designing, building, operationalizing, securing, and monitoring data processing systems in the AWS environment with security, scalability, and reliability in mind. As part of the team, you will have an impact on how data is governed, accessed and distributed across services and data products for ML.


Sounds exciting? Keep reading below.

"OK, so what's in it for me?"

  • A team of 40+ people divided into cross-functional squads that make the Displate.com web app full of personalized content, allowing users to discover meaningful items that help geeks build the collections they deserve.
  • Working in a team which delivers value to the production system with multiple deployments a day. We constantly iterate with A/B tests and measure and implement solutions at scale.
  • We value and reward engagement and proactivity - you will have an impact on how our data stack is developed. Besides constantly implementing production features, you will also be able to prototype and experiment with new services to meet business objectives.
  • Software Craftsmanship and Clean Code are not just buzzwords here. We strive for top-notch quality in everything we do and, as you’ll find out during our Code Reviews, we’re rather demanding ;) We aim to test as much as possible in a variety of ways.
  • Flextime, open office policy - come to the office, we have cookies, Displates hanging all over the place, nice people, and a convenient space to work (we love it, and so will you!)


Key Responsibilities:

  • Building data pipelines (AWS Glue) focused on Machine Learning for discoverability, content acquisition, and content personalization.
  • Developing a real-time user profile by combining streaming interaction data (AWS Kinesis) with RDS records (stream enrichment).
  • Contributing to a data lake (AWS Glue Data Catalog, Lake Formation, Athena) and data warehouse (AWS Redshift). Advising on the most efficient infrastructure in terms of performance and cost.
  • Productionalizing machine learning models (both custom models and off-the-shelf solutions like Amazon Personalize and SageMaker), which includes working with the model registry, CI/CD pipelines, and managing model artefacts.
  • Providing support and expertise on all machine learning product life cycle stages.

Requirements

Want to try your hand at Displate?

General expectations:

  • Strong Python coding skills, including experience with creating, optimizing and monitoring PySpark jobs.
  • Experience with delivering and monitoring production data pipelines at scale with ETL orchestrators (AWS Glue, Cloud Dataflow).
  • Experience with real-time stream services (preferably AWS Kinesis).
  • Experience with IaC for data-related services (preferably CloudFormation).
  • 3+ years of total experience in data engineering, data operations, machine learning development, or a related field, at scale in a production environment.
  • Expert knowledge of data structures and algorithms used in data processing systems.
  • Strong problem-solving skills, proficiency in leading technical meetings, and the ability to take ownership of data pipelines for ML.


Nice to have:

  • Prior experience with, or willingness to learn, concepts related to operations for Machine Learning
  • Experience with NoSQL (DynamoDB)



Skillset:

  • Python (4/5)
  • Spark (4/5)
  • AWS (4/5)
  • Git (3/5)
  • Docker (2/5)
  • IaC (3/5)
  • Java (1/5)
  • Kubernetes (1/5)
  • Data version control (1/5)




Tech stack (non-exhaustive)

  • Data is stored in S3 buckets, PostgreSQL, DynamoDB and Athena, and is organized in the AWS Glue Data Catalog. We process clickstream data in ETL jobs built with AWS Glue jobs, crawlers and workflows, leveraging Spark distributed computing. Stream processing is done with AWS Kinesis.
  • We profile data using AWS Glue DataBrew, and use Python Jupyter Notebooks and Redshift Notebooks. Ad-hoc analysis is done with SQL queries in AWS Athena.
  • Development follows best programming practices, using GitLab, pytest tests and GitLab CI/CD.
  • Orchestration and monitoring include EKS, GitLab CI/CD, Ansible, Terraform, Composer, and New Relic.
  • We code in Python (including Spark) and Java, but other experience is welcome.


Tags: A/B testing Ansible Athena AWS AWS Glue AWS Glue DataBrew CI/CD CloudFormation Dataflow DataOps Data pipelines Data warehouse Docker DynamoDB E-commerce Engineering ETL Git GitLab Java Jupyter Kinesis Kubernetes Lake Formation Machine Learning ML models NoSQL Pipelines PostgreSQL Prototyping PySpark Python Redshift SageMaker Security Spark SQL Streaming Terraform

Regions: Remote/Anywhere Europe
Country: Poland
Category: Engineering Jobs
