SRE Manager - Data Analytics, Cloud Platform
Beaverton, OR
Applications have closed
Lucid Motors
With extraordinary design, performance, range, convenience, and utility, Lucid Gravity is the future of sustainable mobility, reimagining the luxury electric SUV.

The Cloud Platform team at Lucid is seeking a Site Reliability Engineering Manager for Data Analytics. In this position, you will lead a team that builds and maintains the reliability of the Data Analytics Platform on public and private cloud infrastructure. Our ideal candidate has a can-do attitude and approaches the work with vigor and determination. We are looking for a hands-on Software Engineering Manager who will collaborate with stakeholders to build tools and services that keep the Data Analytics Cloud Platform within its SLAs.
The Role:
- Manage and lead Site Reliability Engineering for cloud services across Lucid Motors.
- Collaborate with service owners to define SLOs, build SLIs, and ensure the data analytics services meet their SLAs.
- Work closely with developers, DevOps engineers, data scientists, and quality engineers to build reliable services from design to production.
- Build tools and frameworks that automate monitoring to maximize uptime across production-grade environments.
- Identify bottlenecks early in high-availability (HA) big-data systems.
- Take a self-service approach by building tools that let Data Engineers and Data Scientists tune data analytics jobs to their load and keep them cost-effective.
- Manage Kafka as a streaming platform for various workloads, introduce observability tools to monitor and inspect publish/subscribe traffic, and automate scaling based on load.
- Manage orchestration engines such as Airflow, Kubeflow, and others to run data ETL and ML pipelines, providing additional frameworks and tools to improve reliability.
- Manage the cloud connectivity modules and troubleshoot end to end on private or public cloud infrastructure.
- Manage and offer various Query Engines, such as PrestoDB, Hive, Spark SQL, and others, to run analytic queries.
- Apply incident management processes and continually look for ways to improve operational efficiency.
- Navigate incidents swiftly, perform impact analysis, and take appropriate action.
- Understand customer impact and prioritize work between feature development and customer support.
- Create a 24/7 service availability model to proactively monitor systems across geographic locations.
- Support the reliability aspects of services that use MQTT, Kafka, EMQX, RabbitMQ, Spark, Hive, and other open-source software services.
- Track record of hiring and building an SRE organization from the ground up.
Qualifications:
- B.S. or M.S. degree in Computer Science or Engineering
- 5+ years of experience in SRE or DevOps engineering
- 2-5 years of experience in managing one or more SRE teams
- 3-5 years of experience in managing large-scale data analytics platforms that use Spark, Storm, or other similar frameworks
- 1-3 years of experience deploying and maintaining applications built with Docker and orchestrated on Kubernetes on public or private cloud providers
- 1-3 years of experience using cloud automation tools such as Terraform, Pulumi, Cluster API, or other frameworks
- 1-3 years of experience with programming or scripting languages such as Python, Go, Bash/Shell, or others
- 1-3 years of hands-on administration experience with RDBMSs such as Postgres and NoSQL databases such as Cassandra, MongoDB, or others
- Demonstrated experience hiring and building high-performing SRE teams
- Detail-oriented, with strong time management and organizational skills and a dedication to quality
- Experienced with various debugging tools and troubleshooting performance bottlenecks at the infrastructure or application tier
- Nice to have: experience with configuration management and automation using Ansible, Chef, Puppet, or others
- Experienced with networking challenges and able to resolve network bottlenecks under peak load
- Nice to have: familiarity with REST-based APIs and the ability to triage request-response issues
Notice regarding COVID-19 protocols:
At Lucid, we prioritize the health and wellbeing of our employees, families, and friends above all else. In response to the novel coronavirus, all new Lucid employees whose jobs will be based in the United States may be required to provide original documentation confirming that they have received the prescribed vaccine doses. Vaccination requirements depend on location and position; please refer to the job description for more details. Individuals in positions requiring vaccination may seek a medical and/or religious exemption from this requirement, and such an accommodation may be granted after they submit a formal request that is reviewed and approved by our dedicated COVID-19 Response team.

To all recruitment agencies: Lucid Motors does not accept agency resumes. Please do not forward resumes to our careers alias or other Lucid Motors employees. Lucid Motors is not responsible for any fees related to unsolicited resumes.