Data Scientist, NLP (Remote)
Remote - San Francisco, California, United States
Applications have closed
AssemblyAI
With AssemblyAI's industry-leading Speech AI models, transcribe speech to text and extract insights from your voice data. AssemblyAI is an AI company - we build powerful models to transcribe and understand audio data, exposed through simple APIs.
Hundreds of companies and thousands of developers use our APIs to transcribe and understand millions of videos, podcasts, phone calls, and Zoom meetings every day. Our APIs power innovative products like conversational intelligence platforms, Zoom meeting summarizers, content moderation, and automatic closed captioning.
We’ve been growing at breakneck speed, and are backed by leading investors including Y Combinator’s AI Fund, Patrick and John Collison (Founders of Stripe), Nat Friedman (Former CEO of GitHub), and Daniel Gross (Entrepreneur & Investor in companies including GitHub, Uber, Coinbase, SpaceX, Instacart, Notion, and Cruise Automation).
AssemblyAI’s Speech-to-Text APIs are already trusted by Fortune 500s, startups, and thousands of developers around the world, with well-known customers including Spotify, Algolia, Dow Jones, The Wall Street Journal, and NBCUniversal. As part of a huge and emerging market, AssemblyAI is well on its way to becoming the leader in speech recognition and NLP.
Join our world-class, remote team and help us build an iconic deep learning company.
The Role:
AssemblyAI is growing quickly, and we’re searching for a Data Scientist to join our team. With significant investment and strong leadership to fuel our growth, it’s the perfect time to join the AssemblyAI team!
In this role you’ll have the opportunity to:
- Leverage cloud services to process terabytes of data. Data is the lifeblood of deep learning models. Current SOTA models require massive quantities of high-quality, relevant data. Cloud platforms like AWS and GCP enable processing terabytes of data in a matter of hours. You will be responsible for using these platforms efficiently to process new data quickly, and for sharing this knowledge with the team through code libraries and examples.
- Ensure our models are generalizing well during inference. Our customers rely on our models being able to generalize to their data. It's imperative that we have a good understanding of our training data and how well it represents our customers' use cases. Your analysis and automatic monitoring of our models' inference performance will help catch issues before they affect customers and enable us to adapt our models quickly to new domains.
- Be part of a world-class team of creative researchers & engineers. You will help strengthen the position of AssemblyAI as a leading company in AI research. Our deep learning team is a tight-knit group of creative researchers and engineers who are not afraid to try unconventional ideas. You will have the opportunity to use your insights to come up with hypotheses and test them through experimentation.
Responsibilities:
- Engineer scalable pipelines for data procurement and preprocessing
- Drive best practices for the team when working with terabytes of data in a hybrid cloud environment
- Stay up to date on model evaluation research so that we can optimize metrics that align with human preferences
- Streamline the production release QA process for new models to ensure we're not introducing quality regressions
- Present statistical and qualitative analysis of our data and model behaviour to the team
Our Team:
We are a fully remote team made up of problem solvers, innovators and top AI researchers with 20+ years of experience in Machine Learning, Speech Recognition, and NLP from places like DeepMind, Google, Meta, Amazon, Apple, and Cisco. Our culture is super collaborative, low-ego, transparent, and fast-paced. We want to win - and have a flat organization where everyone can openly share ideas (regardless of their title or position) in order to get the best idea.
As a remote company, we give our team members a lot of trust and autonomy to work where and how they want. We look for people who are ambitious, curious, and self-motivated. We want to empower everyone to do their best work with whatever tools, structures, or resources they need to perform at their highest potential.
Requirements
- 2+ years of experience as a data engineer, data scientist, or similar role
- 1+ years of experience with cloud platform technology such as AWS or GCP
- Fluent in Python or similar programming languages
- Excellent written and oral communication skills in English
Benefits
- Competitive Salary
- Equity
- 100% Remote team
- Unlimited PTO
- Premium Healthcare (100% Covered)
- Vision & Dental Care
- $1K budget for your home office setup
- New MacBook Pro (or PC if you prefer)
- 3-4x/year company-paid team retreats