Data Engineer
San Francisco, California, United States - Remote
Yakoa
Yakoa's AI searches for fraud across many blockchains to protect NFT consumers and defend the intellectual property of brands. We're looking for a forward-thinking, structured problem solver and technical specialist passionate about building systems at scale. You will be among the first to tap into massive blockchain datasets, constructing the data infrastructure that makes analytics, data science, machine learning, and AI workloads possible.
As the data domain specialist, you will partner with a cross-functional team of product engineers, analytics specialists, and machine learning engineers to unify data infrastructure across Yakoa's product suite. Requirements may be vague, but the iterations will be rapid, and you must take thoughtful and calculated risks. Your work will take place at the interface of the AI, blockchain, and intellectual property domains, so you must be a quick learner with a thirst for many types of knowledge.
Responsibilities
- Design, build, test, and maintain scalable data pipelines and microservices that source both first-party and third-party datasets, deploying to distributed cloud storage and other applicable storage forms such as vector and relational databases.
- Index multiple blockchain data standards into responsive data environments, and tune those environments to power real-time query infrastructure.
- Design and optimize data storage schemas to make terabytes of data readily accessible to our API.
- Build utilities, user-defined functions, libraries, and frameworks to better enable data flow patterns.
- Utilize and advance continuous integration and deployment frameworks.
- Research, evaluate, and adopt new technologies, tools, and frameworks centered on high-volume data processing.
- Mentor other engineers while serving as technical lead, contributing to and directing the execution of complex projects.
Requirements
- 4+ years working as a data engineer.
- Proficient in database schema design and in analytical and operational data modeling.
- Proven experience working with large datasets and big data ecosystems, including compute engines (Spark, Kafka, Hive, or similar), orchestration tools (Dagster, Airflow, Oozie, Luigi), and storage (S3, Hadoop, DBFS).
- Experience with modern databases (PostgreSQL, Redshift, DynamoDB, MongoDB, or similar).
- Proficient in one or more programming languages such as Python, Java, Scala, etc., and rock-solid SQL skills.
- Experience building CI/CD pipelines with services like Bitbucket Pipelines or GitHub Actions.
- Proven analytical, communication, and organizational skills, with the ability to prioritize multiple concurrent tasks.
- An open mind to try solutions that may seem surprising at first.
- An MS in Computer Science or equivalent experience.
Exceptional candidates also have:
- Experience with Web3 tooling.
- Experience with artificial intelligence, machine learning, and other big data techniques.
- B2B software design experience.
No crypto or Web3 experience? No problem! We’ll help coach you and cover any costs for educational materials for your growth.
Benefits
- Unlimited PTO.
- Competitive compensation packages.
- Remote friendly & flexible hours.
- Wellness packages for mental and physical health.