Senior Data Engineer

India, Hyderabad, DVS, SEZ-1 – Orion B4; FL 7,8,9,11 (Hyderabad - Divyasree 3)

About FactSet:

FactSet is a leader in providing research and analytical tools to finance professionals. FactSet offers instant access to accurate financial data and analytics around the world. FactSet clients combine hundreds of databases from industry-leading suppliers into a single powerful information system.

About Enterprise Data & Insights:

The Enterprise Data & Insights Engineering group promotes informed, data-driven decision-making across our organization. This group is on a mission to create enterprise data lakes, develop and maintain connected enterprise data models, build standard reporting layers, and follow a stringent governance process to enable research and development of new insights for our C-suite executives.

Data quality, timeliness, and lineage lie at the heart of our vision. Our team leverages the latest cloud technologies, and we are a strong contingent of FactSet veterans who have worked across varied product lines of FactSet.

Ultimately, our vision is to enable data-driven decision-making at all levels, empowering individuals and nurturing a culture that relies on accurate, reliable, and accessible enterprise data.

VALUES THAT DEFINE OUR CULTURE

We are unified by the spirit of going above and beyond for our clients and each other. We look to foster a globally inclusive culture, enabling our people to be themselves at work and to join in, be heard, contribute, and grow. We continually seek to expand our workforce with diverse perspectives, backgrounds, and experiences. We recognize that our best ideas can come from anyone, anywhere, at any time and help us provide the best solutions for our clients around the globe. Our inclusive work environment maximizes our diversity values, engagement, productivity, and ultimately makes FactSet a fun place to work.

Job Summary

Executes software development projects with high-quality design and architecture, with a focus on performance, scalability, and stability. Independently handles complex software development tasks through to the release management process for their respective applications. Engages with software development teams, business analysts, and stakeholders at different stages of the Software Development Life Cycle to ensure that projects are completed on time and with quality. Provides guidance to junior developers on software development best practices.

JOB REQUIREMENTS

  • Minimum of 2 to 3 years of work experience as a Data Engineer building data/ETL pipelines

  • Proficiency in Python

    • The candidate needs to be well-versed in using Python as a primary language for data extraction, wrangling, cleaning and analysis.

    • Deep knowledge of various Python libraries such as Pandas, NumPy and PySpark.

    • Understanding of Python’s machine learning libraries is an added advantage.
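
To give a flavor of the Python wrangling work this role involves, here is a minimal sketch of a cleaning pass with Pandas; the column names and feed values are hypothetical, not a FactSet dataset.

```python
import pandas as pd

# Hypothetical raw feed with a missing ticker and an unparseable price.
raw = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", None, "GOOG"],
    "price": ["189.5", "412.1", "99.9", "n/a"],
})

# Clean: drop rows with no ticker, coerce price to numeric,
# then drop rows whose price could not be parsed.
clean = (
    raw.dropna(subset=["ticker"])
       .assign(price=lambda df: pd.to_numeric(df["price"], errors="coerce"))
       .dropna(subset=["price"])
       .reset_index(drop=True)
)
print(len(clean))  # rows surviving the cleaning pass
```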

  • Expertise in SQL

    • Strong proficiency in SQL programming, with the ability to write, analyze, and debug complex SQL queries.
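
As an illustration of the kind of aggregate query this role would write and debug (here against an in-memory SQLite table rather than a production warehouse; the schema is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trades (ticker TEXT, qty INTEGER, price REAL)")
con.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAPL", 10, 190.0), ("AAPL", 5, 191.0), ("MSFT", 7, 410.0)],
)

# Notional value per ticker, largest first.
rows = con.execute(
    """
    SELECT ticker, SUM(qty * price) AS notional
    FROM trades
    GROUP BY ticker
    ORDER BY notional DESC
    """
).fetchall()
print(rows)  # [('MSFT', 2870.0), ('AAPL', 2855.0)]
```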

  • Proficiency in PySpark:

    • Solid understanding of distributed computing principles and proficiency in PySpark for processing large-scale data workloads in a distributed environment.

    • Hands-on experience in the use of RDDs and DataFrames in Spark for big data processing and analytics.

    • Able to utilize SparkSQL for working with structured data and running SQL-like queries.

    • Proficient in performance tuning of PySpark applications and optimizing transformations and actions within Spark.

    • Demonstrated experience in handling various data formats - JSON, CSV, and Parquet in PySpark.

    • Knowledge in implementing stream processing applications using PySpark Streaming.

    • Experience with using PySpark to interact with storage systems such as AWS S3.

    • Experience in cloud platforms and running PySpark in a cloud environment would be beneficial.
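
PySpark itself needs a Spark runtime, but the RDD programming model the bullets above describe — map, filter, and reduce over a collection of records — can be sketched with plain Python's functional primitives; this is an illustration of the shape of an `rdd.map(...).filter(...).reduce(...)` chain, not actual Spark code.

```python
from functools import reduce

# Hypothetical (ticker, price) records standing in for a distributed RDD.
records = [("AAPL", 190.0), ("MSFT", 410.0), ("AAPL", 192.0), ("GOOG", 99.0)]

# "map": project out the price; "filter": drop prices under 100;
# "reduce": sum what remains -- the same pipeline shape as in PySpark.
prices = map(lambda rec: rec[1], records)
large = filter(lambda p: p >= 100.0, prices)
total = reduce(lambda a, b: a + b, large)
print(total)  # 792.0
```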

  • Big Data Processing using Data Lakes:

    • Good understanding of the data lake architecture including data ingestion, storage, processing, and security layers.

    • Experience in managing and processing large volumes of structured and unstructured data using data lakes.

    • Experience in the implementation and management of data lakes on various platforms such as AWS S3, Azure Data Lake Storage, and Google Cloud Storage.

    • Knowledge of SQL and NoSQL databases and ability to leverage them for different big data processing requirements in a data lake setup.

    • Familiarity with data lake metadata management and data cataloging tools such as AWS Glue/Unity Catalog.

    • Familiarity with the concept of Data Lakehouse - a blend of data lake and data warehouse, and relevant tools like Databricks Lakehouse platform.
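
The ingestion and storage layers mentioned above commonly rely on Hive-style partitioned paths (`dt=YYYY-MM-DD` directories). A minimal local sketch of that layout, using only the standard library in place of S3/ADLS/GCS (the table and field names are hypothetical):

```python
import json
import tempfile
from pathlib import Path

# Local stand-in for a data lake prefix such as s3://bucket/trades/.
lake = Path(tempfile.mkdtemp()) / "trades"
events = [
    {"dt": "2024-01-01", "ticker": "AAPL", "qty": 10},
    {"dt": "2024-01-02", "ticker": "MSFT", "qty": 7},
]
for ev in events:
    # One directory per partition value, as query engines expect.
    part = lake / f"dt={ev['dt']}"
    part.mkdir(parents=True, exist_ok=True)
    with (part / "part-0000.json").open("a") as f:
        f.write(json.dumps(ev) + "\n")

partitions = sorted(p.name for p in lake.iterdir())
print(partitions)  # ['dt=2024-01-01', 'dt=2024-01-02']
```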

  • Experience with Data Protection/Security and Data Quality Practices

    • Understanding of Data encryption and decryption practices for data engineering, to protect sensitive and confidential information from unauthorized access and misuse. 

    • Experience with data anonymization techniques such as data masking, data swapping, and pseudonymization

    • Experience in implementing access controls and entitlements management in a multi-tenant data store

    • Understanding of data quality tools such as Great Expectations, Deequ, and Soda Core
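
Two of the anonymization techniques named above, masking and pseudonymization, can be sketched in a few lines of standard-library Python; the salt and sample values are illustrative only.

```python
import hashlib

# Pseudonymization: replace an identifier with a salted hash so the value
# stays stable for joins but is not reversible without the salt.
SALT = b"example-salt"  # in practice, a secret from a vault, not a literal

def pseudonymize(value: str) -> str:
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:16]

# Masking: keep only the last four characters visible.
def mask(value: str) -> str:
    return "*" * (len(value) - 4) + value[-4:]

print(mask("4111111111111111"))  # ************1111
print(pseudonymize("user@example.com"))  # stable 16-hex-char token
```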

  • Experience with Jupyter Notebooks

    • Demonstrated experience in developing, running, and debugging code within Jupyter Notebooks.

    • Comprehensive knowledge of Jupyter Notebook extensions and widgets to enhance its functionality.

    • Experience in using Jupyter Notebooks for data visualization using libraries like Matplotlib, Seaborn, Plotly, etc.

    • Ability to effectively use Jupyter Notebooks for exploratory data analysis
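
Exploratory data analysis in a notebook usually starts with quick summary checks before any plotting. A sketch of such a notebook-cell check, using only the standard library and hypothetical latency data:

```python
import statistics

# Hypothetical pipeline latencies; the 240 ms outlier is deliberate.
latencies_ms = [12, 15, 11, 240, 13, 14, 12]

summary = {
    "n": len(latencies_ms),
    "mean": round(statistics.mean(latencies_ms), 1),
    "median": statistics.median(latencies_ms),
    "max": max(latencies_ms),
}
# The outlier pulls the mean well above the median -- the kind of skew
# a quick notebook check surfaces before deeper analysis.
print(summary)
```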

  • Good to Have:

    • Relevant Data Engineering/Data Analytics Certifications

      • AWS Certified Data Analytics – Specialty

      • Databricks Certified Data Engineer

      • Databricks Certified Associate Developer for Apache Spark

    • Experience in using LLMs (large language models) for natural language processing or generative AI tasks.

    • Knowledge of LLM platforms and models such as Hugging Face, OpenAI, and GPT-3.

    • CI/CD pipelines using GitHub Actions
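
A minimal GitHub Actions workflow of the kind this bullet refers to might look as follows; this is an illustrative sketch, not FactSet's actual pipeline, and the dependency and test steps are placeholders.

```yaml
# .github/workflows/ci.yml -- illustrative CI sketch
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest
```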

    • IaC (Infrastructure as Code) using Terraform

Technology Learning Opportunities:

FactSet is committed to investing in the career development of all its engineers, enabling them to upskill or re-skill based on individual interests and project priorities, and offers:

  • Licenses for learning resources like Pluralsight

  • Reimbursement of Technology Certification Fees (AWS, Snowflake, Databricks)

  • Paid Leave for Certification Exam preparation (In addition to Casual Leaves and Privilege Leaves)

  • Vibrant technology communities that organize internal programs, technology symposiums, and guest lectures by internal and external experts

As an aspiring Cloud Data Engineer at FactSet, you will have the opportunity to work alongside experienced professionals and expand your skills in data engineering.

JOB RESPONSIBILITIES

  • Collaborate with the Data engineering and Data Analytics team to design and implement data pipelines and workflows.

  • Develop and maintain ETL processes to extract, transform, and load data from various sources into data platforms.

  • Optimize database performance, troubleshoot issues, and fine-tune queries for cost and storage efficiency.

  • Identify opportunities to improve data infrastructure scalability and reliability.

  • Stay up to date with emerging technologies and industry trends in data engineering.

  • Work with Analytics Dashboards (e.g., Power BI) to create insightful visualizations and reports.

DIVERSITY

At FactSet, we celebrate diversity of thought, experience, and perspective. We are committed to disrupting bias and to a transparent hiring process. All qualified applicants will be considered for employment regardless of race, color, ancestry, ethnicity, religion, sex, national origin, gender expression, sexual orientation, age, citizenship, marital status, disability, gender identity, family status, or veteran status. FactSet participates in E-Verify.


Returning from a break?

We are here to support you! If you have taken time out of the workforce and are looking to return, we encourage you to apply and chat with our recruiters about our available support to help you relaunch your career.
