Stagiaire Data Engineering / Data Engineering Intern

Suresnes, France

Talend

Posted 4 weeks ago

WHO WE ARE:
We are changing the way the world makes decisions! Talend is a global leader in data integration and data integrity. Our software is used to transform businesses with data. We believe our company has a certain je ne sais quoi that makes us special and gives us opportunities with purpose. We pride ourselves on our values of Passion, Agility, Team Spirit and Integrity.
We help companies take their data from chaos to clarity by delivering complete, trusted, and timely data to the business.
With over 1,400 employees, we support more than 4,750 enterprise customers globally who have chosen Talend to put their data to work. We are consistently recognized by Forrester and Gartner as a leader in the Data Integration Market and our plan for the future is even more exciting.

Internship subject: Load data into Talend’s data lake

Context
The Data Preparation application lets the user easily transform a data set by applying functions to columns. This cloud application generates a lot of metadata. In the Lab team, where you will be working, we are interested in applying machine learning algorithms to this generated data for various purposes. For instance, in the screen capture below, the application suggests which functions to apply. These suggestions are based on heuristics, which do not take into account the choices the user has already made.
The team's data scientists want to study whether the suggestions can be based on past user choices. To do so, we first need to extract the available metadata and create a data model that the data scientists can easily reuse.

Objectives of the internship
To manage and populate our data lake, we created three zones (Raw, Refined and Analytics):
- Raw zone: We store the data as-is, identical to the source.
- Refined zone: We apply single-file transformations.
- Analytics zone: We aggregate different sources to create our data model.
Your job is to develop:
- A data pipeline to read the data in the raw zone and copy it to the refined zone, applying transformations and normalizations to the data.
- A data pipeline that aggregates sources in the refined zone to create a data model and store the tables in the analytics zone.
- An integration test script that runs the whole pipeline.
- A log directory where you store errors and information about the steps of the process.
- A README file to document what you did.
- A Docker container with your programs.
Depending on your skills and interests, the data pipelines can be Python scripts, Talend Studio Data Integration jobs or even Talend Pipeline Designer pipelines. If time allows, you will take part in some machine learning experiments on the data.
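
To give candidates a feel for the Python option, here is a minimal sketch of the two pipelines and the log directory described above. It is an illustration only, not Talend's actual implementation: the CSV file format, the directory names (`raw`, `refined`, `analytics`, `logs`), the column names (`column_type`, `function`) and the lowercase normalization are all assumptions made for the example.

```python
import csv
import logging
from collections import Counter
from pathlib import Path


def get_logger(log_dir: Path) -> logging.Logger:
    """Log pipeline steps and errors to <log_dir>/pipeline.log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("lake_pipeline")
    logger.setLevel(logging.INFO)
    for old in list(logger.handlers):  # avoid duplicate handlers on re-runs
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_dir / "pipeline.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def refine_file(raw_path: Path, refined_dir: Path) -> Path:
    """Single-file step: normalize headers and values, drop incomplete rows."""
    refined_dir.mkdir(parents=True, exist_ok=True)
    out_path = refined_dir / raw_path.name
    with raw_path.open(newline="", encoding="utf-8") as src, \
            out_path.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fields = [name.strip().lower() for name in (reader.fieldnames or [])]
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for row in reader:
            # Assumed normalization: trim whitespace and lowercase everything.
            clean = {k.strip().lower(): (v or "").strip().lower()
                     for k, v in row.items() if k is not None}
            if all(clean.get(f) for f in fields):
                writer.writerow(clean)
    return out_path


def build_analytics(refined_dir: Path, analytics_dir: Path) -> Path:
    """Multi-source step: count how often each function was applied per column type."""
    analytics_dir.mkdir(parents=True, exist_ok=True)
    counts = Counter()
    for path in sorted(refined_dir.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as src:
            for row in csv.DictReader(src):
                # "column_type" and "function" are hypothetical column names.
                counts[(row["column_type"], row["function"])] += 1
    out_path = analytics_dir / "function_usage.csv"
    with out_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["column_type", "function", "times_applied"])
        for (column_type, function), n in sorted(counts.items()):
            writer.writerow([column_type, function, n])
    return out_path


def run_pipeline(base_dir: Path) -> Path:
    """Run raw -> refined -> analytics, logging each step and any failure."""
    logger = get_logger(base_dir / "logs")
    try:
        for raw_path in sorted((base_dir / "raw").glob("*.csv")):
            refine_file(raw_path, base_dir / "refined")
            logger.info("refined %s", raw_path.name)
        result = build_analytics(base_dir / "refined", base_dir / "analytics")
        logger.info("wrote %s", result.name)
        return result
    except Exception:
        logger.exception("pipeline failed")
        raise
```

Calling `run_pipeline(Path("data_lake"))` on a folder that contains a `raw` subdirectory of CSV files would populate the `refined`, `analytics` and `logs` subdirectories next to it. An integration test in the spirit of the list above could build such a folder in a temporary directory, run the pipeline and compare the resulting table with the expected rows.
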

Skills
- Knowledge of databases and SQL
- Knowledge of programming languages (Python, Java)
- Knowledge of ETL

AND NOW, A LITTLE ABOUT US:
Talend has received some pretty impressive accolades along the way:
- CEO named a 2020 Top Diverse Leader by the National Diversity Council
- 5th consecutive year named a Leader for Data Integration Tools in the Gartner Magic Quadrant 2020
- 3rd consecutive year named as a Leader for Data Quality Solutions in Gartner Magic Quadrant 2020
- Recognized as a Challenger for Enterprise Integration Platform as a Service (iPaaS) in Gartner Magic Quadrant 2020
- "2018 Best Public Cloud Computing Companies To Work For" by Glassdoor
- Named Leader in The Forrester Wave™: Enterprise Data Fabric
- Ranked in the DBTA “100 Companies that Matter Most in Data”
- Listed in the CRN Big Data 100 Companies

We are passionate about helping companies become more data driven; and, if we can be honest, we are all geeks at heart who pride ourselves on the vibrant company culture that we have built.

As a global employer, Talend believes our success depends on diversity, inclusion and mutual respect among our team members. We want to look like our customers, and we recruit, develop and retain the most hardworking people from a diverse candidate pool. We are committed to making all employment decisions on the basis of business need, merit, capability and equality of opportunity. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
Job tags: Big Data Engineering ETL Java Machine Learning Python SQL
Job region(s): Europe