Research Data Science Intern - Transfer Learning

Paris, France

Applications have closed

Dataiku

Dataiku is the world’s leading platform for Everyday AI, systemizing the use of data for exceptional business results.


Headquartered in New York City, Dataiku was founded in Paris in 2013 and achieved unicorn status in 2019. Now, more than 850 employees work across the globe in our offices and remotely. Backed by a renowned set of investors and partners including CapitalG, Tiger Global, and ICONIQ Growth, we’ve set out to build the future of AI. 
We are looking for a Research Data Science Intern to join Dataiku’s Lab in our Paris office for a 6-month internship on Transfer Learning for Object Detection. In Transfer Learning, a pre-trained network is fine-tuned on new data to solve a new task. In the current protocol for transferring pre-trained deep models to new object detection tasks, several decisions about the fine-tuning setup (e.g., the number of layers to retrain) fall back on default choices instead of being optimized for the new task. In this internship, we aim to explore data-driven approaches to optimizing the transfer setup for each new task.
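One of the default choices mentioned above is how many layers of the pre-trained network to retrain. The Python sketch below illustrates the idea on a list of backbone stage names: the earliest stages, which tend to capture generic features, stay frozen, and only the last `n_retrain` stages are retrained. The function `split_trainable` and the ResNet-style stage names are hypothetical illustrations, not Dataiku's implementation. For reference, torchvision's `fasterrcnn_resnet50_fpn` exposes a comparable knob through its `trainable_backbone_layers` argument.

```python
# Hypothetical sketch: decide which backbone stages are frozen vs. retrained.
from typing import List, Tuple


def split_trainable(stages: List[str], n_retrain: int) -> Tuple[List[str], List[str]]:
    """Return (frozen, trainable) stages when retraining the last `n_retrain` stages.

    `stages` is ordered from the input (generic features) to the output
    (task-specific features), so only the deepest stages are unfrozen.
    """
    if not 0 <= n_retrain <= len(stages):
        raise ValueError("n_retrain must be between 0 and len(stages)")
    cut = len(stages) - n_retrain
    return stages[:cut], stages[cut:]


# Stage names loosely modeled on a ResNet backbone (illustrative only).
STAGES = ["conv1", "layer1", "layer2", "layer3", "layer4"]

if __name__ == "__main__":
    frozen, trainable = split_trainable(STAGES, 3)
    print("frozen:", frozen)        # frozen: ['conv1', 'layer1']
    print("trainable:", trainable)  # trainable: ['layer2', 'layer3', 'layer4']
```

Treating `n_retrain` as an explicit, per-task parameter (rather than a fixed default) is what makes it possible to search over it in a benchmark.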
The success of the transfer depends on the amount of new data, but also on the complexity of the new task (number of classes, ambiguity between classes) and on the similarity between the new data and the pre-training data. Quantitative measures of task similarity [1,2] and dataset complexity [3] can be used to make sensible decisions about training settings and hyper-parameters. Such data-driven decisions, as opposed to standard default choices, can lead to an optimized Transfer Learning protocol. Moreover, analyzing dataset complexity and similarity to the pre-training data could make it possible to recognize challenging situations in advance: cases where training from scratch is the best solution, or where there is not enough data to retrain the model and non-deep alternatives are required.
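To make "quantitative measures" concrete, here is a deliberately simplified sketch of one similarity proxy and one complexity proxy. It is not the dataset distance of [1], the OTCE score of [2], or the estimators of [3]. The similarity proxy is the exact 1-Wasserstein (optimal transport) distance between two equal-size samples of a single scalar feature; in one dimension with uniform weights, the optimal plan simply matches sorted values. The complexity proxy is the normalized entropy of the class distribution, which captures only class balance (to be read alongside the number of classes), not class ambiguity. Both function names are hypothetical.

```python
# Simplified proxies only: NOT the methods of references [1], [2] or [3].
import math
from collections import Counter
from typing import Hashable, Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Exact 1-Wasserstein distance between two equal-size 1-D samples.

    With uniform weights and equal sizes, the optimal transport plan in 1-D
    matches the i-th smallest value of one sample to the i-th smallest of the other.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def class_balance_entropy(labels: Sequence[Hashable]) -> float:
    """Normalized Shannon entropy of the label distribution, in [0, 1].

    1.0 means perfectly balanced classes; 0.0 means a single class.
    """
    counts = Counter(labels)
    k = len(counts)
    if k < 2:
        return 0.0
    n = len(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


if __name__ == "__main__":
    print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))                 # 1.0
    print(round(class_balance_entropy(["car", "car", "person", "person"]), 3))  # 1.0
    print(round(class_balance_entropy(["a", "a", "a", "b"]), 3))             # 0.811
```

In practice the compared features would be high-dimensional embeddings from the pre-trained network, which calls for a general optimal transport solver rather than sorting.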
This internship focuses on finding the optimal transfer learning setup for a pre-trained Faster R-CNN object detection model, given a new dataset, its complexity, and its similarity to the pre-training data. The intern will first study existing measures of task difficulty and similarity, validate them, and use them to build a testbed of datasets organized by level of complexity and linked by similarity. Next, the intern will run a benchmark on the collected datasets, varying training settings and hyper-parameters, to identify which settings are optimal for each level of complexity and similarity to the pre-training data. The heuristics derived from this study will be integrated into the tool for smart selection of the training setup in the Deep Learning feature of Dataiku’s data science software.
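The benchmark step described above amounts to a grid search per dataset. The sketch below shows only the bookkeeping: every setting is evaluated on every dataset and the best one is kept. The dataset names, the grid values, and `fake_evaluate` are hypothetical; the toy scores are invented purely so the loop runs and are not benchmark results. A real `evaluate` would fine-tune Faster R-CNN with the given setting and return a validation metric such as mAP.

```python
# Hypothetical benchmark loop: all names and numbers are illustrative.
from itertools import product
from typing import Callable, Dict, List, Tuple

Setting = Tuple[int, float]  # (number of backbone stages retrained, learning rate)


def run_benchmark(
    datasets: List[str],
    retrain_options: List[int],
    learning_rates: List[float],
    evaluate: Callable[[str, int, float], float],
) -> Dict[str, Tuple[Setting, float]]:
    """For each dataset, try every setting and keep the one with the best score."""
    best: Dict[str, Tuple[Setting, float]] = {}
    for name in datasets:
        for n_retrain, lr in product(retrain_options, learning_rates):
            score = evaluate(name, n_retrain, lr)  # e.g. validation mAP after fine-tuning
            if name not in best or score > best[name][1]:
                best[name] = ((n_retrain, lr), score)
    return best


def fake_evaluate(dataset: str, n_retrain: int, lr: float) -> float:
    """Toy stand-in for 'fine-tune, then score on validation data' (made-up numbers)."""
    target = {"close_to_pretraining": 1, "far_from_pretraining": 5}[dataset]
    return 1.0 - 0.1 * abs(n_retrain - target) - abs(lr - 1e-3)


if __name__ == "__main__":
    results = run_benchmark(
        ["close_to_pretraining", "far_from_pretraining"],
        retrain_options=[0, 1, 3, 5],
        learning_rates=[1e-3, 1e-2],
        evaluate=fake_evaluate,
    )
    for name, (setting, score) in results.items():
        print(name, setting)
    # close_to_pretraining (1, 0.001)
    # far_from_pretraining (5, 0.001)
```

Keeping the full table of scores, rather than only the winner, would also let the study see how sensitive each dataset is to the setting.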

Your mission will be to:

  • Get familiar with the subject
  • Define and validate metrics of task complexity and similarity
  • Carry out a thorough benchmark on datasets of varying complexity/similarity
  • Derive data-driven heuristics for optimal transfer setup
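The last mission item, deriving data-driven heuristics, could end up as a rule that maps a dataset profile to a transfer strategy, including the two edge cases named earlier (training from scratch, or too little data for a deep model). The sketch below shows only the shape of such a rule. `DatasetProfile`, `recommend_setup`, and every threshold are hypothetical placeholders; the actual thresholds are exactly what the benchmark is meant to determine.

```python
# Hypothetical heuristic: all thresholds are placeholders, not findings.
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    n_images: int       # amount of new labeled data
    complexity: float   # normalized difficulty score in [0, 1] (assumed scale)
    similarity: float   # similarity to the pre-training data in [0, 1] (assumed scale)


def recommend_setup(p: DatasetProfile, n_stages: int = 5) -> str:
    """Map a dataset profile to a transfer strategy using placeholder thresholds."""
    if p.n_images < 100:
        return "too little data: consider a non-deep alternative"
    if p.similarity < 0.2 and p.n_images > 50_000:
        return "train from scratch"
    # Less similar or more complex tasks unfreeze more backbone stages.
    need = max(1.0 - p.similarity, p.complexity)
    n_retrain = min(n_stages, max(1, round(need * n_stages)))
    return f"fine-tune the last {n_retrain} of {n_stages} backbone stages"


if __name__ == "__main__":
    print(recommend_setup(DatasetProfile(50, 0.5, 0.5)))
    # too little data: consider a non-deep alternative
    print(recommend_setup(DatasetProfile(100_000, 0.3, 0.1)))
    # train from scratch
    print(recommend_setup(DatasetProfile(2_000, 0.2, 0.9)))
    # fine-tune the last 1 of 5 backbone stages
    print(recommend_setup(DatasetProfile(5_000, 0.6, 0.5)))
    # fine-tune the last 3 of 5 backbone stages
```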

You are our ideal candidate if:

  • You are eager to get your hands dirty and dive into coding
  • You know that bagging and boosting trees is not about gardening

Ideal technical skills:

  • Good understanding of parametric machine learning algorithms and their optimization
  • Good experience with Python development; alternatively, experience with an object-oriented language such as Java or Scala
  • Some experience with deep learning frameworks, especially Keras or PyTorch, for both supervised and unsupervised text/tabular learning
References:
[1] Geometric Dataset Distances via Optimal Transport
[2] OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations
[3] Efficient Image Dataset Classification Difficulty Estimation for Predicting Deep Learning Accuracy
About Dataiku:
Dataiku is the platform for Everyday AI, systemizing the use of data for exceptional business results. By making the use of data and AI an everyday behavior, Dataiku unlocks the creativity within individual employees to power collective success at companies of all sizes and across all industries. Don’t get us wrong: we are a tech company building software. Our culture is even pretty geeky! But our driving force is and will always remain people, starting with ours. We consider our employees to be our most precious asset, and we are committed to ensuring that each of them gets the most rewarding, enjoyable, and memorable work experience with us. Fly over to Instagram to learn more about our #dataikulife.
Our practices are rooted in the idea that everyone should be treated with dignity, decency and fairness. Dataiku also believes that a diverse identity is a source of strength and allows us to optimize across the many dimensions that are needed for our success. Therefore, we are proud to be an equal opportunity employer. All employment practices are based on business needs, without regard to race, ethnicity, gender identity or expression, sexual orientation, religion, age, neurodiversity, disability status, citizenship, veteran status or any other aspect which makes an individual unique or protected by laws and regulations in the locations where we operate. This applies to all policies and procedures related to recruitment and hiring, compensation, benefits, performance, promotion and termination and all other conditions and terms of employment.

Tags: Classification Deep Learning Keras Machine Learning Python PyTorch Research Scala

Perks/benefits: Career development

Region: Europe
Country: France