Director, Google Compute Engine, AI/ML Systems and Solutions

Kirkland, WA, USA

Google

Google’s mission is to organize the world's information and make it universally accessible and useful.

Minimum qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • 15 years of experience in machine learning, cloud computing, distributed systems architecture, and software engineering.
  • 10 years of leadership experience, including building and managing teams.
  • 10 years of experience working with hyperscalers and enterprise customers in the AI/ML domain.

Preferred qualifications:

  • Master's degree or PhD in computer science, engineering, or a related field.
  • 20 years of experience in machine learning, cloud computing, large-scale distributed systems architecture, and software engineering.
  • 15 years of experience building and leading high-performing engineering teams in a fast-paced environment.
  • 15 years of experience working with hyperscalers and large enterprise customers in the AI/ML domain.
  • Understanding of AI training and inference workflows, including model architectures, training strategies, and deployment paradigms.
  • Excellent problem-solving skills and ability to think strategically about complex technical challenges.

About the job

The GCE (Google Compute Engine) AI/ML Systems and Solutions team develops turnkey solutions that enable customers to train SOTA machine learning models and serve them at scale on GCE’s platform. Given the diversity of AI workloads, these Systems and Solutions play a key role in scaling customer adoption and business growth for GCE’s AI portfolio of products.

As Director, you will be responsible for the development and operation of AI Solutions on GCE instance families that leverage advancements in accelerators (e.g., HPC, High-Perf, and Cost Optimized GPUs) and the Google Compute Engine fabric. This includes developing AI solutions that enable workloads ranging from single VMs to large-scale AI HyperComputer deployments, with reliability and performance.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

The US base salary range for this full-time position is $278,000-$399,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

  • Develop the strategy for AI solutions on all accelerator instances in Google Cloud Platform (GCP), working closely with partner teams inside and outside of Google to leverage advancements in GPU and TPU technologies.
  • Build and grow a robust team of technical leads and engineers to execute on a multi-year roadmap.
  • Lead the team to maintain and improve product, engineering, operational, and support excellence, ensuring the reliable, high-performance operation of our fleet of accelerator instances and support for our largest ("whale") customers.
  • Collaborate closely with cross-functional teams inside and outside of Google to ensure the timely delivery of new product initiatives.

Tags: Architecture Computer Science Distributed Systems Engineering GCP Google Cloud GPU HPC Machine Learning ML models PhD

Perks/benefits: Career development Equity / stock options Salary bonus

Region: North America
Country: United States
