Lead Data Engineer

New York City, United States

Company Description

Read this first: If you have what it takes for this job, but don't meet all the criteria within the job description, we encourage you to apply. We'd like to connect with you to see what skills and experience you could bring to our team.

The Publicis Media Technology group is a centralized practice within Publicis Media, the media-centric solutions hub of Publicis Groupe [Euronext Paris FR0000130577, CAC 40]. PM Technology drives transformation across Publicis Media’s agency brands via the provision of a scalable and consistent approach to data and technology. PM Technology is powered by a leading, data/tech platform; best-in-class technology management and consultancy services; unified workflow solutions; differentiated products and client solution stacks; and strategic oversight of all major data and technology partnerships. Home to over 100 engineers, data architects, consultants, product managers and a robust project management operation, PM Technology helps scale innovation worldwide.

Job Description

An experienced software engineer who will be responsible for the design, development, implementation, and ongoing support of .NET applications, as well as database ETL processes, pipeline management, data cleaning, MLOps, and data modeling. The position will work closely with the data science team, productionizing concepts and models developed in Python and other data science technologies. The candidate should have experience with .NET development of web services and with Git version control repositories, as well as experience with dependency injection, unit testing, and software design patterns.

Should have knowledge of and experience with Machine Learning and AI concepts, and be able to articulate the differences between them and the data requirements each may have. In particular, the candidate should have knowledge of NLP-based services, such as ChatGPT/OpenAI, and experience with prompt engineering, embeddings, fine-tuning, etc. via the API. Should understand the flow of data and be able to extract and clean data into appropriate structures in order to train models. Should understand synchronous and asynchronous data pipelines and training methodologies.

Must be able to work on multiple projects simultaneously, including both enhancements as well as new project development. The software engineer’s responsibilities will be to design and develop these applications, and to coordinate with the rest of the team working on different layers of the infrastructure. The candidate must be a self-starter with a sense of urgency and a commitment to quality and professionalism. 

Responsibilities:

  • Ultimate responsibility for the accuracy of ingested data
  • Work closely with the Dev Lead, Architect, and other data providers to ensure implementation follows best practices
  • Lead the technical design of efficient, scalable, secure, and performant data processing pipelines
  • Ensure modern practices and techniques are brought to the design process
  • Translate application stories and use cases into functional applications
  • Design, build, and maintain efficient, reusable, and reliable C# and/or Python code
  • Integrate with AI models, both first- and third-party
  • Integrate with first- and third-party APIs for data consumption
  • Integrate with data storage solutions such as SQL Server/Spark/Vector DB/Search
  • Ensure the best possible performance, quality, and responsiveness of applications
  • Identify bottlenecks and bugs, and devise solutions to mitigate and address these issues
  • Integrate the front-end and back-end aspects of the web application
  • Collaborate with other team members and stakeholders

Qualifications

  • Experience with .NET Core
  • Experience with RESTful web services in .NET, or with SOAP or WCF
  • Experience managing models using ML frameworks such as SageMaker
  • Experience with data pipelining, ingestion, and transformation
  • Experience with data cleaning and preprocessing strategies
  • Experience with scheduled model training and exposing models as services
  • Experience with Python and working knowledge of Python ML libraries
  • Experience building highly performant and scalable services
  • Experience with NLP services and knowledge of prompt engineering and embeddings
  • Understanding of pipeline automation and monitoring
  • Understanding of data security and compliance
  • Strong relational Database knowledge, especially MSSQL Server
  • Knowledge of other database technologies such as document, vector, graph, time-series, and data warehouse technologies, e.g., Redshift/BigQuery
  • Understanding of event-based technologies and queuing technologies, e.g., SQS and SNS
  • Experience working on cloud-based applications, ideally on AWS (SQS, EC2, S3, SNS, IAM roles, ALB)
  • Knowledge and understanding of JavaScript frameworks and libraries such as ReactJS and D3 is preferred
  • Knowledge of dependency injection and unit testing
  • Experience working with Agile Scrum teams
  • Proficient understanding of version control systems such as SVN or Git
  • Knowledge of, or willingness to learn, virtualization and containerization tools, especially Docker
  • Familiarity with continuous integration tools like TeamCity
  • Bachelor's degree in Computer Science or related field is preferred 
  • 5+ years of development experience in the field of Data Engineering

Additional Information

Compensation Range: $106,500 - $167,500 annually. This is the pay range the Company believes it will pay for this position at the time of this posting. Consistent with applicable law, compensation will be determined based on the skills, qualifications, and experience of the applicant along with the requirements of the position, and the Company reserves the right to modify this pay range at any time. For this role, the Company will offer medical coverage, dental, vision, disability, 401k, and paid time off.

All your information will be kept confidential according to EEO guidelines.

REQ # 23-6747
