Software Engineer - Machine Learning Infrastructure

Mountain View, CA

DiDi Labs

Posted 1 week ago

Didi Chuxing (“DiDi”) is the world’s leading mobile transportation platform. We’re committed to working with communities and partners to solve the world’s transportation, environmental, and employment challenges by using big data-driven deep-learning algorithms that optimize resource allocation.

Didi Chuxing’s Autonomous Driving team was established in 2016 and has grown into a comprehensive research and development organization covering HD mapping, perception, behavior prediction, planning and control, infrastructure and simulation, labeling, hardware, mechanical engineering, problem diagnosis, vehicle modifications, connected cars, and security, among other areas. We’re developing and testing self-driving vehicles in China and the United States.

In August 2019, DiDi upgraded its autonomous driving unit to an independent company to focus on R&D, product application, and business development related to self-driving technologies. The new company will integrate the resources and technology of DiDi’s platform, continue to increase investment in R&D, and deepen collaboration with auto industry partners.

 

Software Engineer - Machine Learning Infrastructure

The Machine Learning Infrastructure team builds and supports the essential tools and frameworks for every machine learning engineer in DiDi’s autonomous driving organization. Our goal is to greatly accelerate the development cycle of machine learning models across the whole company, so that machine learning engineers can focus on improving the car’s safety and performance instead of worrying about their infrastructure. Our scope covers the whole machine learning lifecycle: intelligent data collection, feature processing, model training, evaluation and debugging, and deployment to vehicles. We care about the performance, ease of use, and reliability of our products.

Responsibilities:

  • Design, implement, and deploy offboard and onboard infrastructure and tools that support machine learning model training and deployment workflows.
  • Own technical projects from start to finish and be responsible for major technical decisions and tradeoffs. Participate effectively in the team’s planning, code reviews, and design discussions.
  • Consider the effects of projects across multiple teams and proactively manage conflicts. Work closely with partner teams to ensure they benefit from the systems we build.
  • Conduct technical interviews with well-calibrated standards and play an essential role in recruiting activities. Effectively onboard and mentor junior engineers and/or interns.

Qualifications:

  • Strong coding skills in Python
  • Expertise in at least one of the following:
    • Data processing and storage systems (e.g., relational databases, NoSQL databases, stream processing)
    • Architecture of distributed systems (e.g., Apache Spark, Apache Hadoop)
    • Strong coding skills in C++
    • Building frameworks with high-quality, long-lasting APIs
  • Passionate about self-driving technology and its potential impact on the world
  • BS, MS, or PhD in CS or Math, or equivalent real-world experience

Preferred:

  • Experience with deep learning frameworks such as PyTorch or TensorFlow
  • Experience with MLOps platforms such as Kubeflow
  • Understanding of distributed ML model training and model deployment (e.g., TensorRT conversion)
  • Experience building software solutions on cloud infrastructure
  • Experience working with Docker and Kubernetes
  • Knowledge of and experience with machine learning algorithms
Job tags: Autonomous Driving Big Data Deep Learning Distributed Systems Hadoop Kubernetes Machine Learning ML NoSQL Python PyTorch R Research Security Spark TensorFlow
Job region(s): North America