Data Engineer, Machine Learning (Remote)
San Francisco
Applications have closed
AllTrails
Search over 400,000 trails with trail info, maps, detailed reviews, and photos curated by millions of hikers, campers, and nature lovers like you.
What You’ll Be Doing:
- Build and deploy systems that enable machine learning and artificial intelligence product solutions
- Work cross-functionally to ensure data scientists have access to clean, reliable, and secure data, the backbone for new algorithmic product features
- Build, deploy, and orchestrate large-scale batch and stream data pipelines to transform and move data to/from our data warehouse and third-party systems
- Deliver scalable, testable, maintainable, and high-quality code
- Investigate, test for, monitor, and alert on inconsistencies in our data, data systems, or processing costs
- Create tools to improve data and model discoverability and documentation
- Ensure data collection and storage adhere to GDPR and other privacy and legal compliance requirements
- Uphold data-quality best practices and standards, and promote that knowledge throughout the organization
Requirements:
- Expertise in Python for data cleansing, transformation, modeling, etc.
- Professional experience turning machine-learning prototypes into solutions that scale under real-world constraints, and deploying them to production
- Proficiency with SQL and experience working with high-volume datasets in SQL-based warehouses such as BigQuery, Redshift, Snowflake, or others
- Experience with parallelized data processing frameworks such as Apache Beam, Apache Spark, Google Dataflow, AWS Glue, etc.
- Deep understanding of data modeling, access, storage, caching, replication, and optimization techniques
- Ability to orchestrate data pipelines with tools such as Apache Airflow
- Experience with containerization (e.g. Docker) and container orchestration
- Understanding of the software development lifecycle and CI/CD
- Experience with monitoring and metrics gathering (e.g. Datadog, New Relic, CloudWatch)
- Proficiency with Git and working on a shared codebase
- Excellent documentation skills
- Self-motivation and a deep sense of pride in your work
- Passion for the outdoors
- Comfort with ambiguity and an instinct for moving quickly
- Humility, empathy, and open-mindedness (no egos)
Bonus Points:
- Experience working with machine learning development frameworks such as TensorFlow, Caffe2, PyTorch, Spark ML, scikit-learn, or related frameworks
- Experience with machine learning workflow management frameworks such as MLflow, Kubeflow, SageMaker, Neptune, or related frameworks
- Experience with GPU-optimized data processing (e.g. CUDA)
- Experience with infrastructure-as-code, such as Terraform
- Experience with ELT tools such as dbt or Dataform
What We Offer:
- A competitive and equitable compensation plan. This is a full-time, salaried position that includes equity.
- Support for physical and mental well-being, including health, dental, and vision benefits.
- Trail Days: First Friday of each month off to hit the trails!
- Unlimited PTO.
- Flexible parental leave.
- Annual continuing education stipend.
- Discounts on subscriptions and merchandise for you and your friends & family.
- An authentic investment in you as a human being and your career as a professional.