Stagiaire Data Engineering / Data Engineering Intern
Suresnes, France
Talend
Talend Data Fabric offers a scalable, cloud-independent data fabric that supports the full data lifecycle, from integration and quality to observability and governance.
WHO WE ARE:
We are changing the way the world makes decisions! Talend is a global leader in data integration and data integrity. Our software helps companies truly transform their business with data. We believe our company has a certain Je ne sais quoi that makes us special and gives us opportunities with purpose. We pride ourselves on our values of Passion, Agility, Team Spirit and Integrity.
We help companies take their data from chaos to clarity by delivering complete, trusted, and timely data to the business.
With over 1,400 employees, we support more than 4,750 enterprise customers globally who have chosen Talend to put their data to work. We are consistently recognized by Forrester and Gartner as a leader in the Data Integration Market and our plan for the future is even more exciting.
Internship subject: Load data into Talend's data lake

Context

The Data Preparation application lets users easily transform a data set by applying functions to columns. This cloud application generates a great deal of metadata. In the Lab team, where you will be working, we are interested in applying machine learning algorithms to this metadata for various purposes. For instance, the application offers the user suggestions for which function to apply next. These suggestions are currently based on heuristics that do not take into account the choices the user has already made.
The data scientists on the team want to study whether the suggestions can be based on past user choices. To do so, we first need to extract the available metadata and create a data model that the data scientists can easily reuse.
Objectives of the internship

To manage and populate our data lake, we created three zones:
- Raw zone: data is stored as-is from the source ("iso-source").
- Refined zone: single-file transformations are applied.
- Analytics zone: different sources are aggregated to create our data model.
Your job is to develop:
- A data pipeline that reads data in the raw zone and copies it to the refined zone, applying transformations and normalizations along the way.
- A data pipeline that aggregates sources in the refined zone to create a data model and stores the resulting tables in the analytics zone.
- An integration test script that runs the whole pipeline.
- A log directory that stores errors and information about each step of the process.
- A README file documenting your work.
- A Docker container packaging your programs.
Depending on your skills and interests, the data pipelines can be Python scripts, Talend Studio Data Integration jobs, or Talend Pipeline Designer pipelines. If time allows, you will participate in machine learning experiments on the data.
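If the Python route is taken, the raw-to-refined pipeline and its log directory could be sketched roughly as below. The zone paths, the JSON file format, and the `normalize` rule are all illustrative assumptions; the posting does not specify the actual storage layout or transformations.

```python
import json
import logging
from pathlib import Path

# Hypothetical zone locations -- the real data lake paths are not given
# in the posting, so these are placeholders for illustration.
RAW = Path("lake/raw")
REFINED = Path("lake/refined")
LOG_DIR = Path("logs")

# The posting asks for a log directory holding errors and step info.
LOG_DIR.mkdir(exist_ok=True)
logging.basicConfig(
    filename=LOG_DIR / "pipeline.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def normalize(record: dict) -> dict:
    """Example single-file transformation: lowercase keys, trim strings."""
    return {
        k.lower(): v.strip() if isinstance(v, str) else v
        for k, v in record.items()
    }

def raw_to_refined() -> int:
    """Copy each JSON file from the raw zone into the refined zone,
    normalizing every record. Returns the number of files processed."""
    REFINED.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in RAW.glob("*.json"):
        try:
            records = json.loads(src.read_text())
            cleaned = [normalize(r) for r in records]
            (REFINED / src.name).write_text(json.dumps(cleaned))
            logging.info("refined %s (%d records)", src.name, len(cleaned))
            count += 1
        except (json.JSONDecodeError, OSError) as exc:
            logging.error("failed on %s: %s", src.name, exc)
    return count
```

The refined-to-analytics step would follow the same shape, reading several refined files and joining or aggregating them into the model tables; the integration test script could then simply run both steps end to end against a small fixture data set.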
Skills
- Knowledge of databases and SQL
- Knowledge of programming languages (Python, Java)
- Knowledge of ETL

AND NOW, A LITTLE ABOUT US:
Talend has received some pretty impressive accolades along the way:
- CEO named a 2020 Top Diverse Leader by the National Diversity Council
- 5th consecutive year named a Leader for Data Integration Tools in the Gartner Magic Quadrant 2020
- 3rd consecutive year named a Leader for Data Quality Solutions in the Gartner Magic Quadrant 2020
- Recognized as a Challenger for Enterprise Integration Platform as a Service (iPaaS) in the Gartner Magic Quadrant 2020
- "2018 Best Public Cloud Computing Companies To Work For" by Glassdoor
- Named a Leader in The Forrester Wave™: Enterprise Data Fabric
- Ranked in the DBTA "100 Companies that Matter Most in Data"
- Listed in the CRN Big Data 100 Companies

We are passionate about helping companies become more data driven; and, if we can be honest, we are all geeks at heart who pride ourselves on the vibrant company culture we have built.
As a global employer, Talend believes our success depends on diversity, inclusion and mutual respect among our team members. We want to look like our customers, and we recruit, develop and retain the most hardworking people from a diverse candidate pool. We are committed to making all employment decisions on the basis of business need, merit, capability and equality of opportunity. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
Tags: Big Data, Data pipelines, Docker, Engineering, ETL, Machine Learning, Pipelines, Python, SQL, Talend
Perks/benefits: Career development, Flex vacation
Region: Europe
Country: France
Categories: Deep Learning Jobs, Engineering Jobs