Data Engineer - Machine Learning Product Catalogue
Poznań, Warsaw, Poland
Job Description
The salary range for this position is (contract of employment):
mid: 12 300 - 17 600 PLN in gross terms
senior: 16 100 - 23 200 PLN in gross terms
We are looking for a Data Engineer to focus on data processing and preparation, and on the deployment and maintenance of our ML/data projects. Join our team to enhance your skills in deploying data-driven processes and MLOps approaches, and to share those skills within the team.
We are looking for people who have:
- 2+ years hands-on experience in Python and its data processing toolset (pandas, NumPy)
- Experience in process/solution monitoring
- Knowledge and experience in processing large datasets with Big Data tools, especially Spark (PySpark)
- Proficiency in using development tools (git, issue tracking, pull requests, code reviews etc.), familiarity with software engineering best practices (PEP8, code review, documentation, CI/CD, testing, automation etc.)
- DevOps experience
- Experience in writing advanced and efficient SQL queries (especially in GCP/BigQuery environment)
- Experience in working on cloud solutions and architecture (GCP, AWS, Azure)
- Understanding of AI-related concepts (classification vs clustering, modeling, precision/recall metrics, model evaluation etc.) and demonstrated ability to use those metrics to back up assumptions and evaluate outcomes
- Positive attitude and ability to work in a team
- Good communication skills and pro-activity in seeking, clarifying and understanding information from end users and stakeholders
An additional advantage would be:
- Previous experience in building, evaluating or deploying ML/AI-based solutions
- Knowledge of ML libraries (sklearn, xgboost, lgbm)
- MLOps practical experience
- Previous experience with GCP tools for data processing (e.g. BigQuery, Dataproc) and workflow automation solutions (e.g. Airflow)
- GCP certifications and/or hands-on experience in GCP, including ML/AI tools (Vertex AI)
Our tech stack:
- Python, BigQuery SQL, Spark
- Google Cloud Platform (Airflow, BigQuery, Composer)
- GitHub (code storage, CI/CD, hosting our own Data Science Python library)
What we offer:
- A hybrid work model that you will agree on with your leader and the team. We have well-located offices (with fully equipped kitchens and bicycle parking facilities) and excellent working tools (height-adjustable desks, interactive conference rooms)
- Annual bonus up to 10% of the annual salary gross (depending on your annual assessment and the company’s results)
- A wide selection of fringe benefits in a cafeteria plan – you choose what you like (e.g. medical, sports or lunch packages, insurance, purchase vouchers)
- English classes, paid for by us, tailored to the specific nature of your job
- Working in a team you can always count on — we have on board top-class specialists and experts in their areas of expertise
- A high degree of autonomy in terms of organizing your team’s work; we encourage you to develop continuously and try out new things
- Hackathons, team tourism, training budget and an internal educational platform, MindUp (including training courses on work organization, means of communications, motivation to work and various technologies and subject-matter issues)
- A 16" or 14" MacBook Pro with an M1 processor and 32 GB RAM, or a corresponding Dell with Windows (if you don’t like Macs), and other gadgets that you may need
What will your responsibilities be?
- You will be responsible for building data processing tools for modeling, analysis and ML, in close cooperation with the Data Science team
- You will support the Data Science team in developing data sources for ad-hoc analyses and Machine Learning projects
- You will process terabytes of data using Google Cloud Platform (BigQuery, Composer, Dataflow) and PySpark, and optimize processes for performance and GCP cloud processing costs
- You will collect process requirements from project groups and automate tasks related to preprocessing and data quality monitoring, prediction serving, as well as Machine Learning model monitoring, alerting and retraining
- You will be responsible for the engineering quality of each project and will cooperate with your colleagues on engineering excellence
Why is it worth working with us?
- Through the supplied data and processes, you will have a meaningful impact on the operation of one of the largest e-commerce platforms in the world
- Thanks to the wide range of projects we are involved in, you will never be without an interesting challenge to take on
- You will have access to vast datasets (measured in petabytes)
- You will get a chance to work in a team of experienced engineers and BigData specialists who are willing to share their knowledge (incl. with the general public, as part of allegro.tech)
- Your professional growth will follow the most recent open-source technological trends
- You will have an actual impact on the directions of product development and on the selection of particular technologies – we use the most recent and best technological solutions available, because we align them closely with our needs
- We are a full-stack provider – we design, code, test, deploy and maintain our solutions
Send us your CV and learn why it’s #goodtobehere