Sr. Staff ML Ops Technical Lead

Atlanta, US

Dolby Laboratories

Dolby develops audio, imaging, and voice technologies for film, TV, music, and games. Experience it all with stunning sound and breathtaking picture.

Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you’ll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent. We’re big enough to give you all the resources you need, and small enough so you can make a real difference and earn recognition for your work. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is truly flexible to support where, when, and how you do your best work.

Dolby’s consumer entertainment and cinema businesses are bringing Dolby’s breakthrough technologies, which power the world’s top movies, TV shows, music, games, and live sports, to more places around the world and to a wider range of consumer experiences and devices.

We are seeking a talented Staff Machine Learning Operations Engineer to join the Consumer Entertainment Group and help bring the next generation of spectacular audio and video experiences to market. You will partner closely with research and development to establish machine-learning best practices and tools that get the most out of model training and computing resources.

Responsibilities

  • Troubleshoot high-performance computing, storage, and networks for machine-learning workloads.
  • Collaborate with research, development, and engineering to establish machine-learning and data-management workflows, along with supporting tools and processes, that maximize machine-learning productivity and resource utilization.
  • Improve capabilities for exploring, transforming, and managing large to very large datasets.
  • Partner with research and development to proactively iterate on and fine-tune model training for the best performance and efficient use of machine-learning resources.
  • Collaborate with the physical compute, storage, and network experts on the infrastructure teams to improve on-premises and cloud infrastructure.
  • Optimize the use of cloud compute and storage for global research teams while staying within budget.

Education and Experience

  • BS or MS degree in Computer Science or equivalent experience.
  • 6+ years of professional hands-on experience in machine learning operations (MLOps) or equivalent.
  • Comprehensive knowledge of AWS and infrastructure-as-code techniques.
  • Advanced proficiency with Python, Terraform, CloudFormation, Ansible, Git, and related tools.
  • Experience leading a small, internationally distributed team of machine-learning operations engineers.
  • Positive team leader with strong interpersonal skills to build team cohesion and rapport even from half a world away.
  • Proficiency in running and scaling machine-learning workloads in both cloud and on-premises GPU server environments.
  • Experience managing and coordinating storage for large machine-learning datasets.
  • Proficiency in Kubernetes cluster design, deployment and management.
  • Interest in, and understanding of, industry trends in machine-learning development techniques, tools, and processes.
  • Comprehensive knowledge of continuous integration and continuous release processes and tools.

Recommended

  • Exceptional understanding of, and practical experience in, software and infrastructure configuration management for high-performance compute and storage, with a focus on maximizing high availability.
  • Active collaborator who helps build a positive community of researchers, scientists, and engineers around machine-learning operations and resources.
  • AWS resource management and provisioning.
  • Previous experience in system administration and infrastructure.
  • Hands-on experience with:
      • Conda, Python
      • Ray cluster design, setup, provisioning, and monitoring for high availability.
      • MLflow or similar
      • High-performance file systems (Lustre, BeeGFS, Weka, or similar).

The Atlanta Area base salary range for this full-time position is $161,400-$197,200, which can vary if outside this location, plus bonus, benefits, and some roles may also include equity. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, competencies, experience, market demands, internal parity, and relevant education or training. Your recruiter can share more about the specific salary range and perks and benefits for your location during the hiring process.

Dolby will consider qualified applicants with criminal histories in a manner consistent with the requirements of San Francisco Police Code, Article 49, and Administrative Code, Article 12.

Equal Employment Opportunity:
Dolby is proud to be an equal opportunity employer. Our success depends on the combined skills and talents of all our employees. We are committed to making employment decisions without regard to race, religious creed, color, age, sex, sexual orientation, gender identity, national origin, religion, marital status, family status, medical condition, disability, military service, pregnancy, childbirth and related medical conditions or any other classification protected by federal, state, and local laws and ordinances.

Tags: Ansible AWS Classification Computer Science Data management Engineering Git GPU Kubernetes Machine Learning Model training Python Research Terraform Weka

Perks/benefits: Career development Equity Salary bonus

Region: North America
Country: United States