Data Scientist
Washington, District of Columbia, United States
Applications have closed
Sayari
Get instant access to public records, financial intelligence and structured business information on over 455 million companies worldwide.
Sayari’s Data Scientist is a staff role within its Product group reporting to the Director of Data & API Product Management. This is a core role within Product’s Data & API team and can be remote or based in Sayari’s Washington, DC headquarters location. The Data & API team is responsible for the company’s data asset portfolio, as well as its API and bulk data offerings.
The Data Scientist’s primary functional responsibilities will include: 1) blueprinting and delivering the market requirements and design that guide the company’s overall effort to aggregate corporate and non-corporate data from a variety of premium & open-source databases into Sayari’s singular data repository; 2) training analytic methods to identify risks & threats across the entirety of the company’s data assets, as well as the proprietary and secure/classified data assets of its customers; 3) providing leadership with reporting on coverage and quality metrics; and 4) creating sample queries and notebooks for use internally and by end-users.
Your primary functional responsibilities include providing technical expertise on data retrieval to both product managers and engineers, assisting in implementing QA/QC practices, and providing best practices to our client base. You should also feel comfortable representing Sayari in presales presentations and industry conferences. Additionally, our sales team may rely on you for best practices on leveraging our bulk data and API offerings.
The scope of your role extends from maintaining our current data assets to scouting for new assets to extend Sayari’s data portfolio. You will expand the depth of our data offering under the leadership of our Director of Data & API.
You should possess a unique blend of business and technical savvy, a big-picture vision, and the drive to make that vision a reality. You should enjoy spending time in the market to understand relevant problem spaces across the Financial Crime & RegTech/FinTech value chain and finding innovative solutions that address them.
You should be able to communicate with all areas of the company. You will work with the company’s Product Management, R&D Application Engineering, R&D Data Engineering & Technical Services teams to define data requirements. You will be an important voice contributing to Product Management’s requirements definition for our data portfolio’s overall capabilities and will assist the Product group’s Global Data Manager in identifying data assets for acquisition based on the ease of their assimilation into the company’s existing data portfolio library.
This is a remote role that offers an office option located in the heart of Washington, DC, a block away from the Chinatown metro. The Product team is a cross-department team working with our Engineering, Marketing, and Content divisions as well as other key stakeholders across the business.
What You Will Do:
We need your help to harvest and transform hundreds of millions of structured and unstructured records from over 150 countries and 30 languages into a dynamic and meaningful graph of entities and relationships. You will also work with data and analytics experts and analysts to find and resolve data quality problems.
Requirements
What You Will Need:
- Three or more years of experience developing in Python (e.g., pandas, NumPy, Scrapy)
- Ability to create and maintain complex SQL queries
- Familiarity with graph databases
- Ability to conduct exploratory data analysis and data visualization to generate and report key performance indicators to relevant stakeholders
- Comfortable working in a cloud environment (GCP/AWS)
- Familiar with data warehousing best practices
What We Would Like:
- Experience in data warehousing, test planning, writing and executing test cases, and creating automation scripts for ETL testing
- Ability to identify, evaluate, and deploy new algorithms, data strategies, test plans, and implementation capabilities to drive continuous innovation
- Passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the data community
- Ability to partner closely with software engineering and product stakeholders to support development of innovative analytics solutions and products
- Familiar with developing and deploying containerized applications and services, including orchestration, particularly Kubernetes
- Ability to develop frameworks, approaches, solutions and recommendations that effectively and efficiently address the most impactful opportunities and challenges
- Experience with or interest in learning Apache Spark and/or other components of the Hadoop ecosystem
- Experience with Apache Airflow
Who You Are:
- Strong process-oriented self-starter, with impeccable organizational skills
- Experienced in supporting and working with cross-functional teams in a dynamic environment
- Experienced in working with non-English data
Benefits
What We Offer:
- Limitless growth and learning opportunities
- A collaborative and positive culture - your team will be as smart and driven as you
- A strong commitment to diversity, equity & inclusion
- Exceedingly generous vacation leave, parental leave, floating holidays, flexible schedule, & other remarkable benefits
- Outstanding competitive compensation & commission package
- Comprehensive family-friendly health benefits, including full healthcare coverage plans, commuter benefits, & 401K matching
Sayari is an equal opportunity employer and strongly encourages diverse candidates to apply. We believe diversity and inclusion mean our team members should reflect the diversity of the United States. No employee or applicant will face discrimination or harassment based on race, color, ethnicity, religion, age, gender, gender identity or expression, sexual orientation, disability status, veteran status, genetics, or political affiliation. We strongly encourage applicants of all backgrounds to apply.
Tags: Airflow APIs AWS Data analysis Data visualization Data Warehousing EDA Engineering ETL FinTech GCP Hadoop Kubernetes NumPy Pandas Python R R&D Spark SQL Testing
Perks/benefits: Career development Competitive pay Conferences Equity Flex hours Flex vacation Health care Parental leave