2024 - Machine Learning Infrastructure Observability - Expert Software Engineer
Dublin, County Dublin, Ireland
Huawei Ireland
Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices.
Company Overview: Our cutting-edge technology company is at the forefront of the AI revolution, and we’re seeking an Expert to join our talented team. As a global leader in Cloud & ML infrastructure, we operate large fleets with ML accelerators and distributed systems. Our work directly impacts the rapid development and deployment of AI models across various domains.
Role Summary: As an Expert, you will play a pivotal role in shaping the efficiency, reliability, and scalability of our ML infrastructure by designing and developing observability solutions and tools. The role involves close collaboration with technical leaders across multidisciplinary domains, including cloud infrastructure and ML software systems. Together, we aim to build observability that supports operational excellence across our fleet and ensures seamless ML experiences for our customers.
Responsibilities:
- Design and develop observability and monitoring for our ML fleet infrastructure, including GPU clusters, distributed storage, and compute nodes.
- Design and develop observability for AI cluster operations to support proactive maintenance and capacity planning.
- Drive efficiency improvements and guide AI/ML operations engineers on observability best practices.
- Evaluate cutting-edge observability technologies for hardware accelerators and next-generation networking infrastructure.
- Provide technical leadership and mentorship to junior SREs, SDEs and Data Scientists.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or related field.
- Minimum 5 years of hands-on experience in SRE or DevOps roles focused on ML infrastructure, including AI infrastructure and AI monitoring.
- Proficiency in Linux, low-level debugging, and system performance analysis.
- Strong scripting skills (Python, Bash) for automation and monitoring.
- Experience with Kubernetes, Docker, and container orchestration.
- Excellent communication skills and ability to collaborate across teams.
Benefits
- Competitive salary package
- Long-term personal growth space
- Opportunities to work on high-profile initiatives that impact the whole company
- Opportunities to work with the brightest minds in software engineering (including Huawei Fellows and world-renowned professors)
- A multi-cultural, international working environment
- Work for an international world leader, an established yet still rapidly growing Fortune 500 company
Check out Life at Huawei Ireland Research Centre: https://www.youtube.com/watch?v=3gR64sYSnOA&feature=youtu.be
DUE TO THE HIGH VOLUME OF REPLIES, ONLY CANDIDATES WHO ARE SHORTLISTED FOR INTERVIEW WILL BE CONTACTED.
Privacy Statement
Please read our West European Recruitment Privacy Notice before submitting your personal data to Huawei so that you fully understand how we process and manage the personal data we receive.
http://career.huawei.com/reccampportal/portal/hrd/weu_rec_all.html