Data Engineer
US Remote
H1
H1 is the convening force for global HCP, clinical, science, and research insights that inform a healthier future. Join the journey.
About the Data Engineering Team
The Data Engineering team is building a suite of data, products, services, and other tools to help solve problems in the healthcare and life sciences space. This includes the classification of researchers and physicians to their scholarly research, simulating how effective drug compounds will be, and much more.
H1 is growing fast, and we are expanding to more countries and offering an expanded catalog of web applications and data-oriented services, so we're looking for people who want to grow fast too. We think an environment that is supportive, collaborative, and sophisticated is the key to making this happen.
The Role
As a Data Engineer, you will pull, clean, augment, and master data coming from a variety of public and private sources. This goes way beyond simple ETL - we’re preparing data sets that will be used to power machine learning algorithms, not just pulling data out of a database to display in a graph. You will be working with stakeholders across the company to ensure that the Data Engineering team’s services improve the visibility, reliability, consistency, and availability of our data, development, and production infrastructures.
You
You are an experienced data engineer who is comfortable scaling software for complex data transformations. This data is product-facing, robustly engineered with testing coverage, and written cleanly and extensively. You're responsible for developing and delivering these innovative, flexible, and scalable solutions, and driving the automation and adoption of data engineering processes on the cloud. You’ll work within a Spark/Scala environment, from notebooks to mature pipelines, as we’re code-centric, not tool-centric. As a data engineer, you’re equally comfortable writing code to normalize names or pulling together disparate notebook functions into a reusable class architecture. You’ll build out data pipelines, schedulers, CI, and Docker images on top of AWS infrastructure. Additionally, cloud functions, networking, writing IaC task definitions for ECS, or a variety of other engineering tasks might be on the table at any given time. Being uncomfortable in the unknown is a familiar place for you. You’ve built something up from scratch and from documentation, and that motivates you.
What You’ll Do at H1
- Design, develop, and implement the pipelines and needed infrastructure that bring our data to life and power our solutions.
- Automate and harden the technology, maximizing speed, efficiency, resiliency, and repeatability without manual intervention.
- Support pipeline execution in production and troubleshoot technical issues.
- Partner closely with data experts to reveal insights and grow the value of our data.
- End-to-end data deployment from inception to customer consumption.
- Cross teaming with the broader engineering and product teams to collaborate and bring winning solutions to the market with accuracy and agility.
- Ensuring proper knowledge transfer amongst the data engineering team to promote proper and innovative use of our solutions.
- Document our methods and processes.
Ideal Skills
- 3+ years of development experience with Scala and Python
- Proficient experience with Spark, Hadoop, JSON
- Experience working with Airflow, Databricks, and Elasticsearch
- Experience working with development SaaS applications and tooling (e.g. GitHub, Bitbucket, Jira, Confluence, etc.)
- Knowledge of CI/CD and DevOps tools
- Proficiency working with RDBMS such as SQL Server and PostgreSQL
- Experience automating tasks with shell and Python scripts, and a background working in Linux/macOS environments.
- Experience developing in a cloud environment (AWS preferred) is a plus
- Ability to discuss problems/solutions with non-technical stakeholders
- Strong logic and algorithm skills and knowledge of data structure principles
- Experience with standing up infrastructure from documentation
- A relevant degree or job-related experience
Benefits and Perks
- Health Insurance
- Work from Home
- Flexible Work Hours
- Computer Setup
Headquartered in New York City, with offices in India (and, post-pandemic, Singapore, Europe (location TBD), Boston, and San Francisco), H1 is changing the way healthcare professionals connect. We’ve been named one of Forbes Best Startup Employers 2021. Visit h1insights.com to learn more about us.
Tags: Airflow AWS Bitbucket CI/CD Classification Databricks Data pipelines DevOps Docker ECS Engineering ETL GitHub Hadoop Healthcare technology Jira JSON Linux Machine Learning Pipelines PostgreSQL Python RDBMS Research Scala Spark SQL Testing
Perks/benefits: Career development Flex hours Health care Startup environment