Principal ML Pipeline and Data Architect

Newark, CA

Applications have closed

Lucid Motors

With extraordinary design, performance, range, convenience, and utility, Lucid Gravity is the future of sustainable mobility, reimagining the luxury electric SUV.

Leading the future in luxury electric and mobility

At Lucid, we set out to introduce the most captivating, luxury electric vehicles that elevate the human experience and transcend the perceived limitations of space, performance, and intelligence. Vehicles that are intuitive, liberating, and designed for the future of mobility. We plan to lead in this new era of luxury electric by returning to the fundamentals of great design, where every decision we make is in service of the individual and environment. Because when you are no longer bound by convention, you are free to define your own experience.

Come work alongside some of the most accomplished minds in the industry. Beyond providing competitive salaries, we’re providing a community for innovators who want to make an immediate and significant impact. If you are driven to create a better, more sustainable future, then this is the right place for you.

Notice regarding COVID-19 Vaccination for positions located in Newark, California

At Lucid (the “Company”), we prioritize the health and wellbeing of our employees, families, and friends above all else. In response to the novel Coronavirus, and the increased transmissibility with recent variants, all Lucid Employees whose employment is based in Newark, current and future, must be fully vaccinated and provide proof thereof as a condition of continued or future employment with the Company. Accommodations due to medical or religious exemptions will be considered.

The Principal ML Pipeline and Data Architect is responsible for defining and leading the Data and Machine Learning Architecture, Performance, and Scalability of Lucid ML operations, for both Vehicle and Operational data, ingesting, processing, and storing trillions of rows of data per day. This hands-on role helps solve real big data problems that most standard tools on the market cannot handle. You will be designing solutions, writing code and automation, defining standards, and establishing best practices across the company.

Role

  • Lead the design and implementation of ML training and inference frameworks
  • Design, implement, and lead the Data and Machine Learning Architecture, Performance, and Scalability of Lucid ML operations
  • Lead the design and deployment of large-scale ML pipelines using open source technologies such as Kubeflow, MLflow, and Airflow
  • Design and implement robust, automated, production-level software using horizontally scalable components
  • Work effectively with cross-functional teams of engineers, product managers, and domain experts
  • Design and build the next generation of ML architecture that will power large-scale data science projects
  • Present project metrics and complex ML concepts to both technical and non-technical audiences
  • Apply a deep understanding of data design systems and experience handling large data sets
  • Implement and manage industry best-practice tools and processes such as Data Lake, Delta Lake, S3, Spark ETL, Airflow, Hive Catalog, Ranger, Redshift, Spline, Kafka, MQTT, time-series databases, Cassandra, Redis, Presto, Kubernetes, Docker, CI/CD, and DevOps
  • Contribute to the overall architecture, implementation and ongoing maintenance of our codebase
  • Optimize the performance and scale of our data ingestion and processing infrastructure to serve ever-increasing volume
  • Translate big data and analytics requirements into data models that operate at large scale and high performance, and guide the data analytics engineers on these data models
  • Provide direction and focus in areas of high ambiguity
  • Mentor junior team members

Qualifications

  • M.S. or PhD in Computer Science, or equivalent.
  • 10+ years of hands-on experience in ML pipelines, ETL, data modeling, and data processing
  • 5+ years of hands-on experience productionizing and deploying Big Data platforms and applications, including hands-on work with relational/SQL databases, distributed columnar data stores/NoSQL databases, time-series databases, Spark Streaming, Kafka, Hive, Parquet, Avro, and more
  • ML engineering or data engineering background with more than 10 years of industry experience.
  • Expert in Spark, Kafka, Presto, Kubeflow, Airflow, or similar technologies.
  • Experience with Kubernetes-based ML Architecture.
  • Proven hands-on experience building solutions for large-scale data infrastructures and ML pipelines
  • Have hands-on experience with Scala, Spark, Python, and GoLang to implement large-scale data flows.
  • Have production experience with open source technologies like Hive, Kafka, Airflow, HBase, etc.
  • Experience developing in a highly concurrent, multi-processor, and multi-threaded environment
  • Experience with heterogeneous computing and GPGPU programming.
  • Strong knowledge and understanding of machine learning pipelines, from standardization and normalization through clustering, modeling, scoring, and validation
  • Understanding of ETL engineering and tools so you can interface with data integration teams
  • Experience working with various data infrastructure technologies such as datastores (SQL/NoSQL dbs) and data streaming/processing (Spark, Kafka, Airflow, AWS Kinesis) is preferred

At Lucid, we don’t just welcome diversity - we celebrate it! Lucid Motors is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, national or ethnic origin, age, religion, disability, sexual orientation, gender, gender identity and expression, marital status, and any other characteristic protected under applicable State or Federal laws and regulations.

To all recruitment agencies: Lucid Motors does not accept agency resumes. Please do not forward resumes to our careers alias or other Lucid Motors employees. Lucid Motors is not responsible for any fees related to unsolicited resumes.

Tags: Airflow Avro AWS Big Data Cassandra CI/CD Computer Science Data Analytics DevOps Docker Engineering ETL Golang HBase Kafka Kinesis Kubernetes Machine Learning MLFlow NoSQL Open Source Parquet PhD Pipelines Python Redshift Scala Spark SQL Streaming

Region: North America
Country: United States