Observability Tech Lead

United Arab Emirates

Expleo

Expleo is a trusted partner for end-to-end, integrated engineering, quality services and management consulting for digital transformation.

Overview

We are looking for an Observability Tech Lead to design and develop our observability platform. Internal teams use the platform to monitor, diagnose, and optimize products, assets, and services across cloud, on-prem, and data center environments. You will work with a team of engineers, product managers, and partners to define the observability strategy, roadmap, and standard methodologies. You will also mentor and coach other engineers on observability, machine learning, tools, and techniques.

Responsibilities

• Lead the design, development, and deployment of the observability platform, including metrics, logs, traces, events, alerts, dashboards, and visualizations.

• Collaborate with other teams and customers to understand their observability needs and provide solutions that meet their requirements and expectations.

• Establish and implement observability standards, guidelines, and processes.

• Research, evaluate, and adopt new observability technologies and frameworks that can enhance user experience.

• Provide peer reviews for other engineers, including feedback on performance, scalability, security, and correctness.

• Handle large volumes of data and ensure data quality, security, and compliance.

• Develop and operate scalable, reliable, and distributed systems that can handle high traffic and complex workloads.

• Find opportunities to automate remediation of commonly occurring issues to operate systems reliably and efficiently.

Qualifications

• Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.

• 10+ years of experience in product development and full stack engineering, with 5+ years of experience in developing and operating observability platforms and solutions, preferably in a cloud-native environment.

• Strong knowledge of and experience with one or more observability tools, such as Prometheus, Dynatrace, Datadog, New Relic, Splunk, OpenTelemetry, etc.

• Experience with Kubernetes, Docker, and microservices architectures.

• Proficient in one or more programming languages, such as Go, Python, Java, C#, etc.

• Passionate about observability and delivering high-quality internal platforms.

• Demonstrated experience and expertise in using machine learning and Generative AI to develop solutions such as predictive monitoring, incident diagnosis, summarization, and chatbots.

• Experience with developing observability solutions to monitor on-prem and public cloud environments.

• Experience developing a unified cloud observability platform to monitor network, compute, storage, operating systems, security, applications, and SaaS platforms.

• Understanding of how to implement observability solutions for large-scale on-prem infrastructure and networking.

Tags: Architecture Chatbots Computer Science Data quality Distributed Systems Docker Engineering Generative AI Java Kubernetes Machine Learning Microservices Python Research Security Splunk

Perks/benefits: Career development Team events

Region: Middle East
