Data Engineer - I
Bengaluru, Karnataka, India
Job Description:
Chubb is the world’s largest publicly traded property and casualty insurer. With operations in 54 countries, Chubb provides commercial and personal property and casualty insurance, personal accident and supplemental health insurance, reinsurance and life insurance to a diverse group of clients. Chubb is also defined by its extensive product and service offerings, broad distribution capabilities, exceptional financial strength and local operations globally.
Data is at the core of our business. The data engineer role is a technical position that requires substantial expertise across a broad range of software development and programming fields. In particular, the data engineer should have sufficient knowledge of big data solutions to implement them on premises or in the cloud.
A data engineer generally works on complex big data projects, focusing on collecting, parsing, managing, analyzing and visualizing large data sets to turn information into insights across multiple platforms. He or she should be able to determine the necessary hardware and software design and act on those decisions, and should be able to develop prototypes and proofs of concept for the selected solutions.
The ideal candidate for this role has a strong background in computer programming, statistics, and data science, and is eager to tackle problems involving large, complex datasets using the latest Python, R, and/or PySpark. You are a self-starter who takes ownership of your projects and delivers high-quality, data-driven analytics solutions. You are adept at solving diverse business problems using a variety of tools, strategies, algorithms and programming languages.
Specific responsibilities are as follows:
- Apply data engineering skills within and outside the evolving Chubb information ecosystem for discovery, analytics and data management
- Work with data science team to deploy Machine Learning Models
- Apply data wrangling techniques to convert data from one "raw" form into another, including data visualization, data aggregation and training of statistical models
- Work with various relational and non-relational data sources with the target being Azure based SQL Data Warehouse & Cosmos DB repositories
- Clean, unify and organize messy and complex data sets for easy access and analysis
- Create different levels of abstractions of data depending on analytics needs
- Perform hands-on data preparation activities using the Azure technology stack; experience with Azure Databricks is strongly preferred
- Implement discovery solutions for high speed data ingestion
- Work closely with the Data Science team to perform complex analytics and data preparation tasks
- Work with the MLOps/technology members of the team to develop APIs
- Source data from multiple applications; profile, cleanse and conform it to create master data sets for analytics use
- Utilize state-of-the-art methods for data mining, especially on unstructured data
- Experience with Complex Data Parsing (Big Data Parser) and Natural Language Processing (NLP) Transforms on Azure a plus
- Design solutions for managing highly complex business rules within the Azure ecosystem
- Performance tune data loads
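As a small illustration of the data cleaning and aggregation work described above, the sketch below normalizes a handful of messy records in plain Python. The field names and cleaning rules are purely hypothetical; in this role the same pattern would typically be expressed with PySpark transformations on Azure Databricks.

```python
from collections import defaultdict

def clean_record(raw):
    """Normalize one raw record: trim whitespace, unify casing, coerce types."""
    return {
        "region": raw["region"].strip().title(),
        "premium": float(str(raw["premium"]).replace(",", "")),
    }

def aggregate_by_region(raw_records):
    """Clean messy records and sum premiums per region."""
    totals = defaultdict(float)
    for raw in raw_records:
        rec = clean_record(raw)
        totals[rec["region"]] += rec["premium"]
    return dict(totals)

if __name__ == "__main__":
    messy = [
        {"region": "  karnataka ", "premium": "1,200.50"},
        {"region": "KARNATAKA", "premium": 300},
        {"region": "kerala", "premium": "99.5"},
    ]
    print(aggregate_by_region(messy))
    # {'Karnataka': 1500.5, 'Kerala': 99.5}
```

The same clean-then-aggregate shape scales up directly: in PySpark the cleaning step becomes column expressions and the aggregation a `groupBy`.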
Skills Required
- Mid to advanced level knowledge of Python and PySpark is an absolute must
- Knowledge of Azure, the Hadoop 2.0 ecosystem, HDFS, MapReduce, Hive, Pig, Sqoop, Mahout, Spark etc. a must
- Significant programming experience (with above technologies as well as Java, R and Python on Linux) a must
- Experience with Web Scraping frameworks (Scrapy or Beautiful Soup or similar)
- Extensive experience working with Data APIs (Working with RESTful endpoints and/or SOAP)
- Knowledge of a commercial Hadoop distribution such as Hortonworks, Cloudera or MapR a must
- Excellent working knowledge of relational databases, MySQL, Oracle etc.
- Experience with Complex Data Parsing (Big Data Parser) a must. Should have worked on XML, JSON and other custom Complex Data Parsing formats
- Natural Language Processing (NLP) skills with experience in Apache Solr, Python a plus
- Knowledge of High-Speed Data Ingestion, Real-Time Data Collection and Streaming is a plus
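The complex data parsing skills listed above can be illustrated with Python's standard library alone. The policy schema below is hypothetical, chosen only to show the same fields being extracted from equivalent JSON and XML payloads:

```python
import json
import xml.etree.ElementTree as ET

def parse_policy_json(payload):
    """Extract policy id and premium from a JSON payload (hypothetical schema)."""
    doc = json.loads(payload)
    return {
        "policy_id": doc["policy"]["id"],
        "premium": float(doc["policy"]["premium"]),
    }

def parse_policy_xml(payload):
    """Extract the same fields from an equivalent XML payload."""
    root = ET.fromstring(payload)
    return {
        "policy_id": root.findtext("id"),
        "premium": float(root.findtext("premium")),
    }

if __name__ == "__main__":
    j = '{"policy": {"id": "P-1", "premium": "250.0"}}'
    x = "<policy><id>P-1</id><premium>250.0</premium></policy>"
    # Both formats yield the same normalized record.
    assert parse_policy_json(j) == parse_policy_xml(x)
```

For custom or very large formats, the same normalize-to-a-common-record approach carries over to streaming parsers and big data parsing tools.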
Qualifications/Experience
- Bachelor's degree in Computer Science or a related field
- 4-8 years of solid experience in Big Data technologies a must
- Microsoft Azure certifications a huge plus
- Data visualization tool experience a plus