Analyst - Data Engineer
Bengaluru, Karnataka, IN, 560100
Merck Group
Work Your Magic with us!
Ready to explore, break barriers, and discover more? We know you’ve got big plans – so do we! Our colleagues across the globe love innovating with science and technology to enrich people’s lives with our solutions in Healthcare, Life Science, and Electronics. Together, we dream big and are passionate about caring for our rich mix of people, customers, patients, and planet. That’s why we are always looking for curious minds that see themselves imagining the unimaginable with us.
Job Title: Analyst
Job Location: Bangalore
In this role, you will be part of a growing, global team of data engineers who collaborate in DevOps mode to enable the Life Science business with state-of-the-art technology, leveraging data as an asset to make better-informed decisions.
The Life Science Data Engineering Team is responsible for designing, developing, testing, and supporting automated end-to-end data pipelines and applications on Life Science’s data management and analytics platform (Palantir Foundry, AWS and other components).
The Foundry platform comprises multiple technology stacks, which are hosted on Amazon Web Services (AWS) infrastructure. Developing pipelines and applications on Foundry requires:
- Proficiency in SQL / Java / Python (Python is required; proficiency in all three is not necessary)
- Proficiency in PySpark for distributed computation
- Familiarity with Postgres and Elasticsearch
- Familiarity with HTML, CSS, and JavaScript and basic design/visual competency
- Familiarity with common relational databases (e.g. MySQL, Microsoft SQL Server) and access layers such as JDBC; not all are required
- Familiarity with any cloud infrastructure/tools with respect to data engineering
This position will be project based and may work across multiple smaller projects or a single large project utilizing an agile project methodology.
Roles & Responsibilities:
- Develop data pipelines by ingesting various data sources – structured and unstructured – into Palantir Foundry
- Participate in end-to-end project lifecycle, from requirements analysis to go-live and operations of an application
- Act as a business analyst, developing requirements for Foundry pipelines
- Review code developed by other data engineers against platform-specific standards, cross-cutting concerns, coding and configuration standards, and the pipeline's functional specification
- Document technical work professionally and transparently, producing high-quality technical documentation
- Work out the best possible balance between technical feasibility and business requirements (the latter can be quite strict)
- Deploy applications on Foundry platform infrastructure with clearly defined checks
- Implement changes and bug fixes via the change management framework and according to system engineering practices (additional training will be provided)
- Set up DevOps projects following Agile principles (e.g. Scrum)
- Besides project work, act as third-level support for critical applications; analyze and resolve complex incidents and problems, debugging issues across the full Foundry stack and Python/Spark code
- Work closely with business users, data scientists/analysts to design physical data models
Education
- B.Sc. (or higher) degree in Computer Science, Engineering, Mathematics, or related fields
Professional Experience
- 5+ years of experience in system engineering or software development
- 3+ years of experience in data and analytics
- 0-2 years of experience in data analytics for intern-level candidates
Skills
Hadoop General
Deep knowledge of big data, distributed file system concepts, MapReduce principles, and distributed computing. Knowledge of Spark and the differences between Spark and MapReduce. Familiarity with encryption and security in a Hadoop cluster.
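As a toy, single-process illustration of the MapReduce principle referenced above (the record contents and function names are invented for this sketch; real workloads run distributed on Spark or Hadoop, not in one Python process):

```python
from collections import Counter
from functools import reduce

def map_phase(line):
    """Map step: emit a count of 1 for every word in one input record."""
    return Counter(line.split())

def reduce_phase(a, b):
    """Reduce step: merge two partial word counts (associative, so it
    could run in any order across a cluster's partitions)."""
    return a + b

# Three "records" standing in for lines of a distributed file.
lines = ["spark beats mapreduce", "spark scales", "mapreduce scales"]
counts = reduce(reduce_phase, map(map_phase, lines), Counter())
print(counts["spark"])   # 2
print(counts["scales"])  # 2
```

Because the reduce step is associative, partial counts can be computed independently per partition and merged in any order, which is exactly what makes the pattern parallelizable.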
Application Development
Familiarity with HTML, CSS, and JavaScript and basic design/visual competency. Experience with any data visualization tool like Tableau is a plus.
Spark
Deep understanding of the Apache Spark framework and proficiency in building Spark pipelines.
Data management / data structures
Must be proficient in technical data management tasks, i.e., writing code to read, transform, and store data
XML/JSON knowledge
Experience working with REST APIs
SCC/Git
Must be experienced in the use of source code control systems such as Git
ELT/ETL
Experience developing ELT/ETL processes, including loading data from enterprise-sized RDBMS systems such as Oracle, DB2, and MySQL
Authorization
Basic understanding of user authorization and authentication
Programming
Must be able to code in Python
Must have experience in using REST APIs
SQL
Must be an expert in manipulating data using SQL. Familiarity with views, functions, stored procedures, and exception handling.
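As a minimal, self-contained sketch of the kind of SQL manipulation described above — here using Python's built-in sqlite3 with an invented table, purely for illustration (stored procedures are not shown, since SQLite does not support them; in Postgres or Oracle the view definition would look the same):

```python
import sqlite3

# In-memory database; table, column, and view names are made up for this example.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES
        ('EU', 100.0), ('EU', 50.0), ('US', 75.0);
    -- A view encapsulating an aggregation, so consumers query it like a table.
    CREATE VIEW revenue_by_region AS
        SELECT region, SUM(amount) AS revenue
        FROM orders
        GROUP BY region;
""")
rows = con.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 150.0), ('US', 75.0)]
con.close()
```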
AWS
General knowledge of AWS Stack (EC2, S3, EBS, …)
IT Process Compliance
SDLC experience and formalized change controls
Working in DevOps teams, based on Agile principles (e.g. Scrum)
ITIL knowledge (especially incident, problem and change management)
Languages
Fluent English skills
Specific information related to the position:
- Physical presence in primary work location (Bangalore)
- Flexible to work CEST and US EST time zones (according to team rotation plan)
- Willingness to travel to Germany, US and potentially other locations (as per project demand)
What we offer: We are curious minds that come from a broad range of backgrounds, perspectives, and life experiences. We celebrate all dimensions of diversity and believe that it drives excellence and innovation, strengthening our ability to lead in science and technology. We are committed to creating access and opportunities for all to develop and grow at your own pace. Join us in building a culture of inclusion and belonging that impacts millions and empowers everyone to work their magic and champion human progress!
Apply now and become a part of our diverse team!