Data Engineer

San Francisco, California, United States - Remote

Applications have closed

Yakoa

Yakoa's AI searches for fraud across many blockchains to protect NFT consumers and defend the intellectual property of brands.


We're looking for a forward-thinking, structured problem solver and technical specialist who is passionate about building systems at scale. You will be among the first to tap into massive blockchain datasets and to construct the data infrastructure that makes analytics, data science, machine learning, and AI workloads possible.

As the data domain specialist, you will partner with a cross-functional team of product engineers, analytics specialists, and machine learning engineers to unify data infrastructure across Yakoa's product suite. Requirements may be vague, but iterations will be rapid, and you will need to take thoughtful, calculated risks. Your work will sit at the intersection of the AI, blockchain, and intellectual property domains, so you must be a quick learner with a thirst for many types of knowledge.


Responsibilities

  • Design, build, test, and maintain scalable data pipelines and microservices that source both first-party and third-party datasets, deploying distributed cloud storage alongside other applicable storage forms such as vector and relational databases.
  • Index multiple blockchain data standards into responsive data environments, and tune those environments to power real-time query infrastructure.
  • Design and optimize data storage schemas to make terabytes of data readily accessible to our API.
  • Build utilities, user-defined functions, libraries, and frameworks to better enable data flow patterns.
  • Utilize and advance continuous integration and deployment frameworks.
  • Research, evaluate, and adopt new technologies, tools, and frameworks centered on high-volume data processing.
  • Mentor other engineers while serving as technical lead, contributing to and directing the execution of complex projects.

Requirements

  • 4+ years working as a data engineer.
  • Proficient in database schema design, and analytical and operational data modeling.
  • Proven experience working with large datasets and big data ecosystems for computing (Spark, Kafka, Hive, or similar), orchestration (Dagster, Airflow, Oozie, Luigi), and storage (S3, Hadoop, DBFS).
  • Experience with modern databases (PostgreSQL, Redshift, DynamoDB, MongoDB, or similar).
  • Proficient in one or more programming languages such as Python, Java, Scala, etc., and rock-solid SQL skills.
  • Experience building CI/CD pipelines with services like Bitbucket Pipelines or GitHub Actions.
  • Proven analytical, communication, and organizational skills, and the ability to prioritize multiple concurrent tasks.
  • An open mind to try solutions that may seem unconventional at first.
  • An MS in Computer Science or equivalent experience.


Exceptional candidates also have:

  • Experience with Web3 tooling.
  • Experience with artificial intelligence, machine learning, and other big data techniques.
  • B2B software design experience.


No crypto or Web3 experience? No problem! We’ll help coach you and cover any costs for educational materials for your growth.

Benefits

  • Unlimited PTO.
  • Competitive compensation packages.
  • Remote friendly & flexible hours.
  • Wellness packages for mental and physical health.

