(Lead) NLP Data Scientist / ML Engineer, RegBrain

Remote, UK

Applications have closed


We're a global regtech company engineering a movement to transform regulatory data into regulatory intelligence. 🤖🌎

🤖 AI at CUBE

CUBE uses AI and NLP to machine-read the regulatory internet at global scale. We collect, clean, standardise, translate, monitor, classify, and enrich regulatory data across 180 countries in over 60 languages. All in near real-time.

We've even built our own ontology of regulation—machine-driven and continuously refined by a team of subject matter experts.

At a high level, CUBE uses AI to transform regulatory data into regulatory intelligence. And this is exactly where RegBrain comes in.

🧠 RegBrain

It's always a great time to become a CUBER, but right now could not be a better time. This year, we are building out the core RegBrain team. RegBrain leverages the 10 years of global regulatory data that our existing AI teams have collected, cleaned, standardised, translated, and classified.

🚀 The mission: to create the ultimate semantic map of global regulatory data, and to take CUBE's AI to the next level through data learning.

The RegBrain team will be responsible for the end-to-end research, design, and development of both the semantic map and a suite of AI-driven capabilities—including recommendation systems, prediction, and task automation.

As such, the team will be split into two core areas: research & data science and ML & data engineering. All with an NLP flavour, of course.

⚠️ Please note: While we're hiring across a wide range of experience levels over the next 4-6 months, the most immediate open roles are team lead positions (there will be one lead for each subteam). The leads will directly influence the hiring process for the rest of the team. If you are not interested in a lead role but think you'd be a great fit for RegBrain, you can still fill out the application. It's designed to be versatile.

Here are the core responsibilities of each RegBrain subteam. Note that the responsibilities are extremely complementary, to reflect how closely the subteams will work together.

🧬 Research & data science

🚀  Core mission: Design ML & NLP prototypes for each RegBrain use case, and own the semantic map of CUBE's regulatory data.

  • Prepare, maintain, and refine the semantic map (knowledge graph) of CUBE's regulatory data.
  • Develop, test, and iteratively improve the best-performing ML & NLP models for each RegBrain use case.
  • Present information using data visualisation techniques (especially important for the semantic map).
  • Determine additional data sources and how to include them in the pipeline (another team will help with actually adding them).
  • Stay up-to-date with ML & NLP research, and experiment with new models and techniques.

🏗️ ML & data engineering

🚀  Core mission: Turn the data science team's ML & NLP prototypes into production systems, exposed as APIs that CUBE's core platform can consume.

  • Determine the cloud architecture strategy and overall ML & data systems for RegBrain.
  • Work closely with other AI engineering and data teams to ingest data from our core platform, our transformation engine, and other sources.
  • Improve the efficiency, performance, and scalability of ML & NLP models (this includes data quality, ingestion, loading, cleaning, and processing).
  • Improve the efficiency, performance, and scalability of the semantic map.
  • Verify that the quality of results in production meets the requirements.

💪 Core competencies

Just as the responsibilities of the RegBrain subteams overlap, the core competencies we're looking for overlap too. The good news for you is that we will use your preferences and the interview process to collaboratively determine which side of the spectrum you should sit on. The strongest candidates have competencies across both sides (and are as modular as CUBE's core product!).

  • End-to-end ML model design and development experience (design is more relevant for the data science team; deploying models to production and performance monitoring are especially important for the engineering team) 🌀
  • Experience with cloud infrastructure for data pipelining and model deployment (more relevant for engineering) ☁️
  • Experience with ML platforms, frameworks, and libraries 📚
  • Experience analysing vast volumes of textual data 🔠
  • Strong familiarity with SQL and NoSQL/graph databases 🏦
  • Solid understanding of data structures, data modelling, and software architecture 🏛️
  • Ability to write clear, robust, and testable code, especially in Python 🐍
  • Strong grasp of data visualisation techniques (for dashboarding, reporting, etc.) 📊
  • A systems thinking approach 🌐
  • A mathematically and statistically oriented brain 🔢
  • A healthy sense of humour (you're going to need it... don't say we didn't warn you 😉)

Experience matters. But what matters more than the raw number of years of experience is demonstrated proficiency (through GitHub profiles/online portfolios and the interview process itself). Bonus points for Stack Overflow and Kaggle contributions! 💯

💝 Why you'll love RegBrain (& CUBE)

If there is a best time to join RegBrain, it's now. Here are the many reasons why.

🌍 Immediate global impact. CUBE is a well-established player in regtech (we were around before regtech was even a thing!), and our category-defining product is used by leading financial institutions around the world (including Revolut, Citi, and HSBC). We have an audience across 150 countries, and they love CUBE.

🗽 Freedom & flexibility. Think of RegBrain as a fully-funded startup within a scaleup. The first to join will have a blank canvas, a tabula rasa. You'll be able to choose your own tech stack. GCP or AWS or Azure? To Spark or not to Spark? PyTorch or TensorFlow? You decide. As long as you can justify your choices, the rings of Saturn are the limit.

📊 Quantity & quality of data. The stage has been set: over the past 10 years, the five engineering teams at CUBE have built solid foundations for data collection, transformation, and classification. The RegBrain team will focus solely on learning from this mountain of structured data.

🗣️ A rich & complex dataset. The main dataset is not only already structured, but also longitudinal and multilingual. We've tracked changes to regulation over time and built in-house translation models for 60+ languages.

📚 Always learning. Part of your job is to stay up-to-date with the latest research, and share your learning with the RegBrain team and other AI teams at CUBE. You'll have a training budget and a conference budget. In the medium to long term, we're aiming to collaborate with universities.

⚖️ Responsible AI. We will proactively address the biases that inevitably emerge in any AI system. Our Head of Product was trained at the Oxford Internet Institute and has direct connections with ethicists who are influencing the future of AI regulation.

💻 Employee-first work-life policy. CUBE went fully remote before the pandemic even hit, because we wanted to define the future of work. As a CUBER, you'll be able to design your home office and choose your own work equipment. Unable to work from home one week, or desperate for in-person interaction with colleagues? No problem—book a room in a coworking space.

🌱 Sustainable, customer-driven growth. We are a bootstrapped company funded by customers and strategic private investment. This means that growth is sustainable, and product development is very closely aligned with customer needs.

🌎 Visa sponsorship if required. We know every single nuance of Skilled Worker visas.

🦄 Extremely bespoke hiring process. At CUBE, we're trying to flip hiring on its head: the objective of the process is to create a personalised job description (and title). This page sets the general context. We'll collaboratively determine the best role for you, given your interests, CUBE's needs, and other members of the team.

⏱️ Hiring timeline

We know how insufferably long and complicated hiring processes can be. We've been there before.

That's why at CUBE, we aim to compress the hiring timeline to between 5 and 10 days (from the first-round interview to the final round). There's no HR screen, culture fit interview, or coding on a whiteboard. Just high-quality infoflow in both directions. 🌊

Here's what will happen:

  • Online application (link below 👇)
  • First-round video interview with RegBrain's Head of Product (30–45 min)
  • Second-round video interview with our CTO (30–45 min)
  • Take-home challenge (it'll be fun, we promise, and we won't ask for more than a few hours of your time)
  • Final-round panel interview, again over video (45–60 min)

If you have any questions at this stage, feel free to use the live chat widget on the application page. Otherwise: what are you waiting for? This is your once-in-a-lifetime opportunity to define the future of regulation. The clock is already ticking. 🕰️

Tags: AWS Azure Classification Engineering GCP ML Model deployment Model design NLP NoSQL Python PyTorch Research Spark SQL TensorFlow

Perks/benefits: Career development Flex vacation Gear Salary bonus Startup environment

Regions: Remote/Anywhere Europe
Country: United Kingdom
