Cloud Data Engineer | Bangalore

Bengaluru Millenia

PwC

We are a community of solvers combining human ingenuity, experience and technology innovation to help organisations build trust and deliver sustained outcomes.

Line of Service

Advisory

Industry/Sector

Not Applicable

Specialism

Operations

Management Level

Senior Associate

Job Description & Summary

A career in our New Technologies practice, within Application and Emerging Technology services, will provide you with a unique opportunity to help our clients identify and prioritise emerging technologies that can help solve their business problems. We help clients design approaches to integrate new technologies, skills, and processes so they can drive business results and innovation.

Our team helps organisations to embrace emerging technologies to remain competitive and improve their business by solving complex questions. Our team focuses on identifying and prioritising emerging technologies, breaking into new markets, and preparing clients to get the most out of their emerging technology investments.

Job description
• Design and implement data integration, acquisition, cleansing,
harmonisation, and transformation processes to create curated, high-quality
datasets for use in vector stores and embeddings.
• Maintain scalable data processing pipelines, monitor technology trends
and advancements in Generative AI, and incorporate them to improve those
pipelines.
• Collaborate with Solution Architects, Data Scientists, Software
Engineers, DevOps Engineers, Product Owners, researchers, and
business stakeholders on the cross-functional team.
Skills and Qualifications:
• 6+ years of experience in data engineering and 6-12 months of
hands-on experience in Generative AI technologies like vector databases.
• Hands-on experience with text processing tools (e.g. spaCy, NLTK,
Word2vec, PyTorch) and data engineering tools (e.g. Hadoop, Airflow,
Pandas)
• Experience with at least one cloud platform like Azure, AWS, GCP
• Hands-on experience in software development with one major
programming language (e.g. Python) and experience in machine learning and
Generative AI models
• Understanding of the differences between the most common large
language models (GPT, Llama), and of their advantages and disadvantages
• Hands-on experience with information retrieval tools (e.g. Chroma DB,
Pinecone, Vector, Elasticsearch or other vector stores)
• Good overview of re-usable frameworks and tools in the field of
Generative AI (both commercial and open source)
Role & responsibilities
• Data for AI:
  • Design and implement data pipelines specifically tailored for AI
  training and model development.
  • Partner with AI data scientists and engineers to understand their data
  needs and ensure efficient data access.
  • Implement data pre-processing and feature engineering techniques for
  optimal model performance.
• AWS expertise:
  • Design, develop, and maintain data infrastructure on the AWS cloud
  platform, leveraging services like S3, Redshift, Glue, Lambda, and Kinesis.
  • Build and maintain data lakes and warehouses optimized for AI
  workloads.
  • Implement automated data transformation, cleansing, and validation
  processes to ensure data quality.
• Coding proficiency:
  • Write code using Python, SQL, and other relevant programming
  languages, with a focus on libraries and frameworks commonly used in AI
  (e.g., TensorFlow, PyTorch).
• Automation and monitoring:
  • Automate data pipelines using CI/CD tools and related techniques.
  • Monitor and troubleshoot data pipelines for performance, reliability, and
  data quality issues.
Preferred candidate profile
• 6 to 10 years of experience as a Data Engineer or in a similar role, with
demonstrated experience supporting data pipelines for AI applications and
their data requirements.
• Proven experience with AWS services, including S3, Redshift, Glue,
Lambda, and Kinesis.
• Strong experience with Python and SQL.
• Strong understanding of AI libraries and frameworks.
• Experience with data warehousing and data modelling concepts.
• Excellent problem-solving and analytical skills.
• Ability to work independently and as part of a collaborative team.
• Strong communication and collaboration skills.

Role: AI Data Engineer
Industry Type: Technology Consulting
Department: D&A
Employment Type: Full Time, Permanent
Role Category: Cloud Engineer
Education
UG: B.Tech/B.E. in Any Specialization
PG: MCA in Any Specialization, M.Tech in Any Specialization
Key Skills: S3, Redshift, Glue, Lambda, Kinesis, TensorFlow, PyTorch,
Python, Spark

Mandatory skill sets: S3, Redshift, Glue, Lambda, Kinesis, TensorFlow, PyTorch, Python, Spark
Preferred skill sets: S3, Redshift, Glue, Lambda, Kinesis, TensorFlow, PyTorch, Python, Spark
Years of experience required: 6 to 8 years
Qualifications: Graduate Engineer or Management Graduate

Education (if blank, degree and/or field of study not specified)

Degrees/Field of Study required:

Degrees/Field of Study preferred:

Certifications (if blank, certifications not specified)

Required Skills

Amazon Redshift, Pyth (Procedural Programming Language), Python (Programming Language)

Optional Skills

Desired Languages (If blank, desired languages not specified)

Travel Requirements

Available for Work Visa Sponsorship?

Government Clearance Required?

Job Posting End Date

Category: Engineering Jobs

Tags: Airflow AWS Azure CI/CD Consulting Data pipelines Data quality Data Warehousing DevOps Elasticsearch Engineering Feature engineering GCP Generative AI GPT Hadoop Kinesis Lambda LLaMA LLMs Machine Learning ML models NLTK Open Source Pandas Pinecone Pipelines Python PyTorch Redshift spaCy Spark SQL TensorFlow Word2Vec

Perks/benefits: Career development

Region: Asia/Pacific
Country: India
