Cloud Data Engineer | Bangalore
Bengaluru Millenia
PwC
We are a community of solvers combining human ingenuity, experience and technology innovation to help organisations build trust and deliver sustained outcomes.
Line of Service: Advisory
Industry/Sector: Not Applicable
Specialism: Operations
Management Level: Senior Associate
Job Description & Summary
A career in our New Technologies practice, within Application and Emerging Technology services, will provide you with a unique opportunity to help our clients identify and prioritise emerging technologies that can help solve their business problems. We help clients design approaches to integrate new technologies, skills, and processes so they can drive business results and innovation. Our team helps organisations to embrace emerging technologies to remain competitive and improve their business by solving complex questions. Our team focuses on identifying and prioritising emerging technologies, breaking into new markets, and preparing clients to get the most out of their emerging technology investments.
Job description
• Design and implement data integration, acquisition, cleansing,
harmonisation, and transformation processes to create curated, high-quality
datasets for use with vector stores and embeddings.
• Maintain scalable data processing pipelines, monitor technology
trends and advancements in Generative AI, and incorporate them to improve
these pipelines.
• Collaborate with Solution Architects, Data Scientists, Software
Engineers, DevOps Engineers, Product Owners, researchers and
business stakeholders on the cross-functional team.
Skills and Qualifications:
• 6+ years of experience in data engineering and 6-12 months of
hands-on experience in Generative AI technologies like vector databases.
• Hands-on experience with text processing tools (e.g. spaCy, NLTK,
Word2vec, PyTorch) and data engineering tools (e.g. Hadoop, Airflow,
Pandas)
• Experience with at least one cloud platform like Azure, AWS, GCP
• Hands-on experience in software development with one major
programming language (e.g. Python) and experience in machine learning and
Generative AI models
• Understanding of the differences between the most common large
language models (GPT, Llama), and of their advantages and disadvantages
• Hands-on experience with information retrieval tools (e.g. Chroma DB,
Pinecone, Vector, Elasticsearch or other vector stores)
• Good overview of re-usable frameworks and tools in the field of
Generative AI (both commercial and open source)
Role & responsibilities
• Data for AI:
• Design and implement data pipelines specifically tailored for AI training
and model development.
• Partner with AI data scientists and engineers to understand their data
needs and ensure efficient data access.
• Implement data pre-processing and feature engineering techniques for
optimal model performance.
• AWS expertise:
• Design, develop, and maintain data infrastructure on the AWS cloud
platform, leveraging services like S3, Redshift, Glue, Lambda, and Kinesis.
• Build and maintain data lakes and warehouses optimized for AI
workloads.
• Implement automated data transformations, cleansing, and validation
processes to ensure data quality.
• Coding proficiency:
• Write code using Python, SQL, and other relevant programming
languages, with a focus on libraries and frameworks commonly used in AI
(e.g., TensorFlow, PyTorch).
• Automation and monitoring:
• Automate data pipelines using CI/CD tools and related techniques.
• Monitor and troubleshoot data pipelines for performance, reliability, and
data quality issues.
Preferred candidate profile
• 6 to 10 years of experience as a Data Engineer or in a similar role, with
demonstrated experience in supporting data pipelines for AI applications and
their data requirements.
• Proven experience with AWS services, including S3, Redshift, Glue,
Lambda, and Kinesis.
• Strong experience with Python and SQL.
• Strong understanding of AI libraries and frameworks.
• Experience with data warehousing and data modelling concepts.
• Excellent problem-solving and analytical skills.
• Ability to work independently and as part of a collaborative team.
• Strong communication and collaboration skills.
Role: AI Data Engineer
Industry Type: Technology Consulting
Department: D&A
Employment Type: Full Time, Permanent
Role Category: Cloud Engineer
Education
UG: B.Tech/B.E. in Any Specialization
PG: MCA in Any Specialization, M.Tech in Any Specialization
Key Skills: S3, Redshift, Glue, Lambda, Kinesis, TensorFlow, PyTorch, Python, Spark
Mandatory skill sets: S3, Redshift, Glue, Lambda, Kinesis, TensorFlow, PyTorch, Python, Spark
Preferred skill sets: S3, Redshift, Glue, Lambda, Kinesis, TensorFlow, PyTorch, Python, Spark
Years of experience required: 6 to 8 years
Qualifications: Graduate Engineer or Management Graduate
Education (if blank, degree and/or field of study not specified)
Degrees/Field of Study required:
Degrees/Field of Study preferred:
Certifications (if blank, certifications not specified)
Required Skills
Amazon Redshift, Pyth (Procedural Programming Language), Python (Programming Language)
Optional Skills
Desired Languages (If blank, desired languages not specified)
Travel Requirements
Available for Work Visa Sponsorship?
Government Clearance Required?
Job Posting End Date
Tags: Airflow AWS Azure CI/CD Consulting Data pipelines Data quality Data Warehousing DevOps Elasticsearch Engineering Feature engineering GCP Generative AI GPT Hadoop Kinesis Lambda LLaMA LLMs Machine Learning ML models NLTK Open Source Pandas Pinecone Pipelines Python PyTorch Redshift spaCy Spark SQL TensorFlow Word2Vec
Perks/benefits: Career development