Staff Data Engineer - Research/Machine Learning

New York City

Character.AI

Meet AIs that feel alive. Chat with anyone, anywhere, anytime. Experience the power of super-intelligent chat bots that hear you, understand you, and remember you.

View company page

About us

Character’s mission is to empower everyone with AGI. Our vision is to enable people with our technology so that they can use Character.AI any moment of any day.

Character.AI is one of the world’s leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character.AI is a full-stack AI company with a globally scaled direct-to-consumer platform. As of 2023 that platform was #2 in the space in user engagement. Character.AI is uniquely centered around people, letting users personalize their experience by interacting with AI “Characters.” The company achieved unicorn status in 2023 and was named Google Play’s AI App of the Year.

Noam co-invented the key tech powering LLMs and was recently named to TIME100’s Most Influential People in AI list. TIME called him “one of the most important and impactful people of the space’s past, present, and future.” Daniel created and led LaMDA, the breakthrough conversational tech project currently powering Bard.

To learn more, please visit beta.character.ai.

About the role

You would be a great fit for this role if you are an experienced engineer who will be instrumental in building the world's best LLMs by collecting and refining the essential training data that powers them. In pursuit of the best language models, your responsibility is twofold:

  • First, identify and collect data at the scale required to feed our largest models. This involves managing a diverse set of sources, including structured and unstructured content from text and multimedia formats. Your engineering expertise is crucial in crafting the infrastructure and tools necessary to efficiently collect and manage petabytes of data.

  • Second, you will experiment with various methods of extracting a balanced and comprehensive training dataset from the raw data. You will leverage your expertise in data to build datasets reflecting a hypothesis, train models, and evaluate experimental results. Through this experimentation, you will create the training datasets for our largest models.

These are critical steps in the construction of AI. With petabytes of data and numerous design decisions, each step requires careful attention. Expertise in AI is not necessary, but enthusiasm for the space and a track record of adapting to new domains is important.

Who we’re looking for

Required Experience:

  • 5+ years of production software engineering experience

  • Experience building large-scale data processing pipelines, with tools like PySpark, Beam, or Flink

  • Familiarity with Machine Learning and NLP and willingness to learn more on the job

  • Track record of adapting to new domains and a desire to use data to improve products

Additional Desired Experience:

  • ML experience as an ML engineer, Data Scientist, or another similar role

  • Experience with cloud platforms like AWS or Azure, or tools such as Kubernetes and Terraform

  • Passionate about Conversational AI or large language models

You will be a good fit if you are proactive and have a “get things done” mindset. Given our current pace of growth and load on our systems, most people have had a significant impact during their first week at the company.

Character is an equal opportunity employer and does not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status. We value diversity and encourage applicants from a range of backgrounds to apply.

Apply now Apply later
  • Share this job via
  • or

* Salary range is an estimate based on our AI, ML, Data Science Salary Index 💰

Tags: AGI AWS Azure Bard Conversational AI Engineering Flink Kubernetes LLMs Machine Learning NLP Pipelines PySpark Research Terraform

Perks/benefits: Career development

Region: North America
Country: United States

More jobs like this

Explore more AI, ML, Data Science career opportunities

Find even more open roles in Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), Computer Vision (CV), Data Engineering, Data Analytics, Big Data, and Data Science in general - ordered by popularity of job title or skills, toolset and products used - below.