Site Reliability Engineer (SRE), Multimodal

Menlo Park, CA

Applications have closed

Character.AI

Meet AIs that feel alive. Chat with anyone, anywhere, anytime. Experience the power of super-intelligent chat bots that hear you, understand you, and remember you.

About us

Character’s mission is to empower everyone with AGI. Our vision is to enable people with our technology so that they can use Character.AI any moment of any day.

Character.AI is one of the world’s leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character.AI is a full-stack AI company with a globally scaled direct-to-consumer platform. As of 2023 that platform was #2 in the space in user engagement. Character.AI is uniquely centered around people, letting users personalize their experience by interacting with AI “Characters.” The company achieved unicorn status in 2023 and was named Google Play’s AI App of the Year.

Noam co-invented the key tech powering LLMs and was recently named to TIME100’s Most Influential People in AI list. TIME called him “one of the most important and impactful people of the space’s past, present, and future.” Daniel created and led LaMDA, the breakthrough conversational tech project currently powering Bard.

To learn more, please visit beta.character.ai.

About the role

Responsibilities:

As a Multimodal Site Reliability Engineer (SRE) at Character, you will be responsible for ensuring the reliability, scalability, and performance of our app and AI multimodal services (e.g., voice interfacing services). You will work closely with our development team to design and implement processes and systems that ensure the stability and availability of our service.

  • Keep production multimodal services operational.

  • Instrument, monitor and optimize the performance and reliability of our service.

  • Implement and maintain automation tools and processes to prevent and mitigate service disruptions.

  • Collaborate with development teams to design and implement scalable, reliable systems.

  • Participate in on-call rotations to provide support for critical incidents and outages.

Who we’re looking for

Requirements:

  • Possess deep expertise in Python, SQL, Linux, CI/CD, Kubernetes, and Terraform

  • Have worked with multiple cloud computing platforms such as GCP

  • Can reliably troubleshoot across a range of platforms and systems

Desired Experience:

  • Have familiarity with WebRTC stacks / services

  • Have familiarity with GPU clusters and/or HPC environments

  • Have familiarity with monitoring and logging tools such as Prometheus and Grafana

  • Have first-hand experience scaling a consumer product from early days into hypergrowth

You will be a good fit if you are proactive and have a “get things done” mindset. Given our current pace of growth and load on our systems, most people have had a significant impact during their first week at the company.

Character is an equal opportunity employer and does not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status. We value diversity and encourage applicants from a range of backgrounds to apply.

Tags: AGI Bard CI/CD GCP GPU Grafana HPC Kubernetes Linux LLMs Python SQL Terraform

Perks/benefits: Career development

Region: North America
Country: United States
