Principal Software Engineer
Bengaluru, Karnataka, India
Microsoft
The Azure AI Infrastructure team is looking for passionate engineers to build the largest deep-learning infrastructure service at Microsoft. In this role you will build new components that bring the latest innovations in AI infrastructure onto the Azure ML platform. You will partner with top engineering talent within Azure AI Infrastructure and across Azure to work on cluster orchestration, job scheduling, storage, networking, containerization and operating system integration. Your work will enable various AI languages and runtimes on Azure AI Infrastructure to bring distributed deep learning training and inferencing to life. In addition, you will build the infrastructure components required to build, deploy, monitor and service the highly available and scalable Microsoft Service Fabric and Kubernetes clusters under your care. You will lead development and customer support from the frontline and establish architecture, service excellence guidelines and a high quality bar.
Candidates must have a track record of delivering engineering and service excellence on a mid-to-large-scale service.
Who Are We?
We are engineers on Azure AI Infrastructure. We believe that building a planet-scale AI Supercomputer from the ground up, one that addresses the fundamental pain points of data scientists and AI practitioners and takes AI to unprecedented scale, is an opportunity of a lifetime. If you share the same dream, come join us!
What Is Azure AI Infrastructure?
High-scale AI workloads are always testing the limits of the infrastructure stack. Large-scale model training and inference with huge volumes of training data on hundreds to thousands of GPUs is a true engineering challenge. Azure AI Infrastructure is a globally distributed, multi-tenant service that provides robust, cost-effective and competitive AI infrastructure (compute, networking and storage) for AI training and inferencing. By abstracting workloads from the underlying infrastructure, Azure AI Infrastructure creates a shared pool of resources that can be dynamically provisioned for full utilization of expensive GPU compute, enabling data scientists to productively build, scale, experiment with, and iterate on their models on top of a robust, performant, scalable and cost-effective distributed infrastructure built for AI. In Azure AI Infrastructure, we are constantly seeking to apply the best ideas from AI, machine learning, distributed systems, distributed databases, information retrieval, networking, and security.
Responsibilities
- Work on the architecture, design, and development of the core AI Infrastructure services that support large scale AI training and inferencing
- Develop, test, and maintain backend services written in C#, Go, Rust, and C++ and hosted on Kubernetes/Service Fabric clusters and in Docker containers
- Enhance systems and applications to ensure high stability, efficiency, and maintainability, low latency, and tight cloud security
- Provide operational support and DRI responsibilities for the product
- Develop and foster a deep understanding of the machine learning systems and concepts and their usage by our customers
- Collaborate closely with engineers and data scientists within the team, internal Microsoft Research teams, and external enterprises to build better solutions together
- Provide vision, expertise, and technical leadership to other team members
- Help to grow talent in these areas
Qualifications
- 10+ years of experience coding in one of C#, C or C++, Rust, or Go
- Experience working with the Linux operating system and Kubernetes cluster orchestration
- Experience with improving service operations or engineering fundamentals
- Excellent collaboration skills
- A Master's or Bachelor's degree in computer science or a related field
- At least 8 years of experience building and shipping production software or services
#IDCAIPlatfromHiring
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
Tags: Architecture Azure Computer Science Deep Learning Distributed Systems Docker Engineering GPU Kubernetes Linux Machine Learning ML infrastructure Model training Research Rust Security Testing
Perks/benefits: Career development Medical leave