Data Engineer

San Francisco

Applications have closed

Hive

Hive's APIs enable developers to integrate pre-trained AI models that address technically challenging content understanding needs into their applications.

About Hive
Hive is a full-stack deep learning platform helping to bring companies into the AI era. We take complex visual challenges and build custom machine learning models to solve them. For AI to work, companies need large volumes of high-quality training data. We generate this data through Hive Data, our proprietary data labeling platform, whose more than 1,000,000 globally distributed workers produce millions of high-quality pieces of data per day. We then use this training data to build machine learning models for verticals such as Media, Autonomous Driving, Security, and Retail. Today, we work with some of the largest companies in the world to redefine how they think about unstructured visual data. Together, we build solutions that incorporate AI into their businesses and transform their industries.
We are fortunate that investors like Peter Thiel (Founders Fund), General Catalyst, 8VC, and others see Hive's potential to be groundbreaking in AI business solutions. We have over 160 talented individuals across our San Francisco and Delhi offices. Please reach out if you are interested in joining the AI revolution!
Data Engineer Role
To execute our vision, we need to grow our team of best-in-class data engineers. We are looking for developers who follow impeccable data practices and build high-quality data infrastructure. We value hard workers who are comfortable improvising solutions to big data challenges while building a system that can stand the test of time. Our ideal candidate has experience building data infrastructure from the ground up, contributes innovative ideas and ingenious implementations to the team, and is capable of planning scalable, maintainable data pipelines.
As a data engineer, you would at first work primarily on our Hive Media product, taking real-time data from hundreds of television streams and turning it into a combination of real-time and scheduled outputs, especially our signature ads feed. Your work would improve the quality of our results while reducing computational cost and latency. Expect truly novel challenges.

Responsibilities

  • Writing scheduled Spark pipelines that perform sophisticated queries on the entirety of our datasets
  • Writing real-time pipelines that execute complex operations on incoming data
  • Synchronizing large amounts of data between unstructured and structured formats across various data sources
  • Creating testing and alerting for data pipelines
  • Building out our data infrastructure and managing dependencies between data pipelines
  • Defining and implementing metrics that provide visibility into our data quality

Requirements

  • You have an undergraduate and/or graduate degree in computer science or a similar technical field, with a sound understanding of statistics
  • You have 1-2 years of industry experience as a data engineer
  • You have hands-on experience doing ETL and have written data pipelines in Spark, Hadoop, or similar technologies
  • You have a sound understanding of SQL
  • You have worked with data lakes such as S3 or HDFS
  • You have worked with various databases, such as Postgres, Cassandra, or Redshift, and understand their pros and cons
  • You have a working knowledge of the following technologies, or are not afraid of picking them up on the fly: Mesos, Chronos, Marathon, Jenkins
  • You are fluent in at least one scripting language (preferably Node.js or Python) and one compiled language (such as Scala, Java, or C)
  • You have great communication skills and the ability to work well with others
  • You are a strong team player with a do-whatever-it-takes attitude
What We Offer You
We are a group of ambitious individuals who are passionate about creating a revolutionary machine learning company. At Hive, you will have a significant opportunity for career development and a chance to contribute to one of the fastest-growing AI startups in San Francisco. The work you do here will have a noticeable and direct impact on the development of Hive.
Our benefits include competitive pay, equity, health, vision, and dental insurance, catered lunch and dinner, a corporate gym membership, and more.
Thank you for your interest in Hive.

Tags: Autonomous Driving Big Data Cassandra Computer Science Data pipelines Deep Learning ETL Hadoop HDFS Machine Learning ML models Node.js Pipelines PostgreSQL Python Redshift Scala Security Spark SQL Statistics Testing

Perks/benefits: Career development Competitive pay Fitness / gym Flex vacation Health care

Region: North America
Country: United States
Category: Engineering Jobs
