Research Engineer - Data Quality

New York City

Character.AI

Meet AIs that feel alive. Chat with anyone, anywhere, anytime. Experience the power of super-intelligent chat bots that hear you, understand you, and remember you.

About us

Character’s mission is to empower everyone with AGI. Our vision is to enable people with our technology so that they can use Character.AI any moment of any day.

Character.AI is one of the world’s leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character.AI is a full-stack AI company with a globally scaled direct-to-consumer platform. As of 2023 that platform was #2 in the space in user engagement. Character.AI is uniquely centered around people, letting users personalize their experience by interacting with AI “Characters.” The company achieved unicorn status in 2023 and was named Google Play’s AI App of the Year.

Noam co-invented the key tech powering LLMs and was recently named to TIME100’s Most Influential People in AI list. TIME called him “one of the most important and impactful people of the space’s past, present, and future.” Daniel created and led LaMDA, the breakthrough conversational tech project currently powering Bard.

To learn more, please visit beta.character.ai.

About the role

As a passionate data miner, you wield big data tools and visualization software to line up research developments. You love uncovering interesting subsets of data, creating clear dashboards to communicate your findings, and proposing ideas for unexplored opportunities. You are also extremely interested in learning how to support large language model development with unparalleled data quality and performance insights.

You are an ML engineer, data engineer, or data scientist who wants to work with world-class LLM researchers to curate, develop, and analyze our data catalog. Your responsibilities are threefold:

  • Curate, mine, and analyze datasets for LLMs

  • Work with our Product org to identify datasets needed for specific user experiences

  • Help maintain and improve core tables in our data lake used for research across the company

Data is the lifeblood of AI. Alongside the data platform team, you will be responsible for making sure this vital resource is available, understood, and of the highest quality.

Who we’re looking for

Required Experience:

  • 5+ years of experience

  • Familiarity with Machine Learning and NLP and willingness to learn more on the job

  • Experience mining text and graphical data

  • Data visualization skills

  • SQL Wizardry

  • Spark Experience

  • Passion for conversational AI or large language models

Additional Desired Experience:

  • Experience with cloud platforms like GCP

  • Experience with Kubernetes

  • Experience training your own LLMs

You will be a good fit if you are proactive and have a “get things done” mindset. Given our current pace of growth and load on our systems, most people have had a significant impact during their first week at the company.

Character is an equal opportunity employer and does not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status. We value diversity and encourage applicants from a range of backgrounds to apply.

Tags: AGI Bard Big Data Conversational AI Data quality Data visualization GCP Kubernetes LLMs Machine Learning ML models NLP Research Spark SQL

Perks/benefits: Career development

Region: North America
Country: United States