Data Scientist

Washington, District of Columbia, United States

Applications have closed

Sayari

Get instant access to public records, financial intelligence and structured business information on over 455 million companies worldwide.

Sayari’s Data Scientist is a staff role within its Product group reporting to the Director of Data & API Product Management. This is a core role within Product’s Data & API team and can be remote or based in Sayari’s Washington, DC headquarters location. The Data & API team is responsible for the company’s data asset portfolio, as well as its API and bulk data offerings.

The Data Scientist’s primary functional responsibilities will include: 1) blueprinting and delivering the market requirements and design that guide the company’s overall effort to aggregate corporate and non-corporate data from a variety of premium and open-source databases into Sayari’s single data repository; 2) training analytic methods to identify risks and threats across the entirety of the company’s data assets, as well as the proprietary and secure/classified data assets of its customers; 3) providing leadership with reporting on coverage and quality metrics; and 4) creating sample queries and notebooks for use internally and by end-users.

Your primary functional responsibilities include providing technical expertise on data retrieval to both product managers and engineers, assisting in implementing QA/QC practices, and providing best practices to our client base. You should also feel comfortable representing Sayari in presales presentations and industry conferences. Additionally, our sales team may rely on you for best practices on leveraging our bulk data and API offerings.

The scope of your role extends from maintaining our current data assets to scouting for new assets to extend Sayari’s data portfolio. You will expand the depth of our data offering under the leadership of our Director of Data & API.

You should possess a unique blend of business and technical savvy, a big-picture vision, and the drive to make that vision a reality. You should enjoy spending time in the market to understand relevant problem spaces across the Financial Crime and RegTech/FinTech value chain and finding innovative solutions that address them.

You should be able to communicate with all areas of the company. You will work with the company’s Product Management, R&D Application Engineering, R&D Data Engineering & Technical Services teams to define data requirements. You will be an important voice contributing to Product Management’s requirements definition for our data portfolio’s overall capabilities and will assist the Product group’s Global Data Manager in identifying data assets for acquisition based on the ease of their assimilation into the company’s existing data portfolio library.

This is a remote role that offers an office option located in the heart of Washington, DC, a block away from the Chinatown metro. The Product team is a cross-department team working with our Engineering, Marketing, and Content divisions as well as other key stakeholders across the business.

What You Will Do:


We need your help to harvest and transform hundreds of millions of structured and unstructured records from over 150 countries and 30 languages into a dynamic and meaningful graph of entities and relationships. You will also work with data and analytics experts and analysts to find and resolve data quality problems.

Requirements

What You Will Need:

  • Three or more years of experience developing in Python (e.g. pandas, NumPy, Scrapy)
  • Ability to create and maintain complex SQL queries
  • Familiarity with graph databases
  • Ability to conduct exploratory data analysis and data visualization to generate and report key performance indicators to relevant stakeholders
  • Comfortable working in a cloud environment (GCP/AWS)
  • Familiar with data warehousing best practices

What We Would Like:

  • Experience in data warehousing, test planning, writing and executing test cases, and creating automation scripts for ETL testing
  • Ability to identify, evaluate, and deploy new algorithms, data strategies, test plans, and implementation capabilities to drive continuous innovation
  • Passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal and external technology communities, and mentoring other members of the data community
  • Ability to partner closely with software engineering and product stakeholders to support development of innovative analytics solutions and products
  • Familiarity with developing and deploying containerized applications and services, including orchestration, particularly Kubernetes
  • Ability to develop frameworks, approaches, solutions and recommendations that effectively and efficiently address the most impactful opportunities and challenges
  • Experience with or interest in learning Apache Spark and/or other components of the Hadoop ecosystem
  • Experience with Apache Airflow

Who You Are:

  • Strong process-oriented self-starter, with impeccable organizational skills
  • Experienced in supporting and working with cross-functional teams in a dynamic environment
  • Experienced in working with non-English data

Benefits

What We Offer:

  • Limitless growth and learning opportunities
  • A collaborative and positive culture - your team will be as smart and driven as you
  • A strong commitment to diversity, equity & inclusion
  • Exceedingly generous vacation leave, parental leave, floating holidays, flexible schedule, & other remarkable benefits
  • Outstanding competitive compensation & commission package
  • Comprehensive family-friendly health benefits, including full healthcare coverage plans, commuter benefits, & 401(k) matching

Sayari is an equal opportunity employer and strongly encourages diverse candidates to apply. We believe diversity and inclusion mean our team members should reflect the diversity of the United States. No employee or applicant will face discrimination or harassment based on race, color, ethnicity, religion, age, gender, gender identity or expression, sexual orientation, disability status, veteran status, genetics, or political affiliation. We strongly encourage applicants of all backgrounds to apply.

Tags: Airflow APIs AWS Data analysis Data visualization Data Warehousing EDA Engineering ETL FinTech GCP Hadoop Kubernetes NumPy Pandas Python R R&D Spark SQL Testing

Perks/benefits: Career development Competitive pay Conferences Equity Flex hours Flex vacation Health care Parental leave

Region: North America
Country: United States
Category: Data Science Jobs
