Intern, Data Engineer

London City, London, GB

Copyright Clearance Center

Collective licensing pioneer CCC helps you integrate, access, and share information through licensing, content, software and professional services.


Job Overview:

We are looking for a Data Engineer Intern to work with the Architecture team on a number of internal initiatives. These initiatives encompass exploratory work and analytics, as well as nascent services and products. The Data Engineer Intern will be allocated to one of these initiatives.

Our analytics stack includes Spark/PySpark for bulk processing, Zeppelin notebooks and Airflow for process orchestration and data profiling, graph and relational databases for storage, R for visualization, and a variety of techniques for statistical analysis and machine learning.

The individual must be able to communicate effectively in English, both orally and in writing, and will gain experience working within a cross-functional engineering team.

Experience with AWS is a plus.

 

Primary Responsibilities:

  • Work with product owners and technical staff to integrate, profile and analyze internal and external data sets, providing insight into the viability and quality of potential and existing CCC data offerings.
  • Participate as a team member in the analysis, development, implementation, testing and documentation of data engineering projects, setting and meeting realistic timelines and deadlines.
  • Ensure that design and code reviews take place in a timely manner and that systems are documented.

 

Requirements:

  • Python and/or R programming
  • Experience with databases, querying, reporting and ETL
  • Practiced in working with multiple data sets, creating combined views, measuring data quality, and applying insights to business problems
  • Experience working with APIs to query and obtain data
  • An understanding of fuzzy matching and entity matching/deduplication would be beneficial
  • The ability to track and evaluate experiments, communicate findings and propose next steps based on the outcomes
  • Familiarity with GitHub for version control, Jira for task/issue tracking, and structured approaches to working on data-centric tasks (such as CRISP-DM)
  • Ability to work both independently and collaboratively, subject to peer review
  • Capable of setting and meeting deadlines
  • Excellent analytical, interpretative and interpersonal skills, backed up by the ability to convey meaningful information through verbal and written communication
  • May be accountable for other results and activities as assigned.

Tags: Airflow APIs Architecture AWS Data quality Engineering ETL GitHub Jira Machine Learning PySpark Python R RDBMS Spark Statistics Testing

Region: Europe
Country: United Kingdom
Category: Engineering Jobs
