Senior Research Scientist
Cambridge, MA
Kensho
Kensho develops cutting-edge products and technologies that transform businesses. We are the AI Innovation Hub for S&P Global.
At Kensho, we hire talented people and give them the freedom, support, and resources needed to accomplish our shared goals. We believe in flexibility-first and give our employees the opportunity to work from where they feel most productive and engaged (must be in the United States). We also value in-person collaboration, so there may be times when travel to one of our Kensho hubs (e.g., Cambridge, MA or NYC) will be required for team meetings or company events.
About the R&D Lab:
Since 2022, we have been building a world-class R&D lab composed of NLP Research Scientists, and we heavily prioritize publishing in top-tier conferences. Our small team has demonstrated compelling results and is fueling innovation throughout Kensho and S&P Global at large. Specifically, we are continuously developing Large Language Models (LLMs) and are actively working on long-context question-answering (QA), complex reasoning, tokenization, alignment (e.g., factuality), multi-document QA, and more!
Our small team has reserved access to hundreds of fast GPUs (A100s), spanning cloud and on-prem machines.
Our current projects include:
- Long-context document QA, where the answer is contained within documents that are hundreds of pages in length [1]
- Complex reasoning, including better understanding and improving models' ability to approximate numbers (related to commonsense reasoning)
- Creating rigorous evaluation benchmarks, spanning domain knowledge, quantity extraction, and program synthesis [2]
- Improving existing alignment techniques for domain-specific needs, while also addressing factuality
- Dissecting tokenizers to better understand how each sub-component affects intrinsic and extrinsic performance [3][4]
- Multi-document QA, where the answer requires combining information from dozens of sources
- Retrieval-augmented generation (RAG) methods
- Creating high-quality data filters for LLM development
Additionally, we maintain strong relationships with academia, including collaborating on several ongoing projects, providing industry grants, sponsoring conferences, and jointly holding faculty positions.
[1] DocFinQA: A Long-Context Financial Reasoning Dataset (Reddy et al., 2024)
[2] BizBench: A quantitative reasoning benchmark for business and finance (Koncel-Kedziorski et al., 2024)
[3] Tokenization Is More Than Compression (Schmidt et al., 2024)
[4] Greed is All You Need: An Evaluation of Tokenizer Inference Methods (Uzan et al., 2024)
What You'll Do:
- Help identify the most promising problems to pursue
- Develop novel, state-of-the-art NLP models that can scale to millions of documents
- Work closely with other Research Scientists and ML Engineers
- Write clean, readable research code in PyTorch (not expected to write production-level code)
- Contribute to a stellar engineering culture that values excellent design, documentation, testing, and code
- Share your research results with your colleagues (presentations) and the world (published papers, patents, and blog posts)
Who You Are:
- Hold a PhD in Computer Science or related field
- Have several years of post-PhD research experience in industry or academia
- Have a strong publication record with top-tier ML/NLP conferences (e.g., ACL, NAACL, EMNLP, NeurIPS, ICML)
- Are proficient in writing code in PyTorch, TensorFlow, or JAX
- Have experience leading research projects with others (e.g., last-author papers), including directing the vision and providing regular feedback
- Have experience with the techniques required to work effectively with large, messy real-world data
- Prefer to collaborate iteratively on hard problems with your teammates rather than spending stretches of time working alone and presenting your results intermittently
- Have a love for learning new skills and domains
- Are excited to share knowledge freely, proactively, and effectively with others who are interested (e.g., participate in our Reading Group)
- Are a generous teammate who takes work seriously without taking yourself too seriously
Technologies We Love at Kensho:
- ML: PyTorch, Weights & Biases, NetworkX
- Deployment: Airflow, Docker, EC2, Kubernetes, AWS
- Datastores: Postgres, Elasticsearch, S3
Perks/benefits: Career development Conferences Flex vacation Health care Medical leave Parental leave Pet friendly Startup environment Team events Unlimited paid time off