Senior Data Engineer
India, Hyderabad, DVS, SEZ-1 – Orion B4; FL 7,8,9,11 (Hyderabad - Divyasree 3)
FactSet
About FactSet:
FactSet is a leader in providing research and analytical tools to finance professionals. FactSet offers instant access to accurate financial data and analytics around the world. FactSet clients combine hundreds of databases from industry-leading suppliers into a single powerful information system.
About Enterprise Data & Insights:
The Enterprise Data & Insights Engineering group promotes informed, data-driven decision-making across our organization. This group is on a mission to create enterprise data lakes, develop and maintain connected enterprise data models, build standard reporting layers, and follow a stringent governance process to enable research and development of new insights for our C-suite executives.
Data quality, timeliness, and lineage lie at the heart of our vision. Our team leverages the latest cloud technologies, and we are a strong contingent of FactSet veterans who have worked across FactSet's varied product lines.
Ultimately, our vision is to enable data-driven decision-making at all levels, empowering individuals and nurturing a culture that relies on accurate, reliable, and accessible enterprise data.
VALUES THAT DEFINE OUR CULTURE
We are unified by the spirit of going above and beyond for our clients and each other. We look to foster a globally inclusive culture, enabling our people to be themselves at work and to join in, be heard, contribute, and grow. We continually seek to expand our workforce with diverse perspectives, backgrounds, and experiences. We recognize that our best ideas can come from anyone, anywhere, at any time and help us provide the best solutions for our clients around the globe. Our inclusive work environment maximizes our diversity values, engagement, productivity, and ultimately makes FactSet a fun place to work.
Job Summary
Executes software development projects with high-quality design and architecture, with a focus on performance, scalability, and stability. Independently handles complex software development tasks all the way through the release management processes for their respective applications. Engages with software development teams, business analysts, and stakeholders at different stages of the Software Development Life Cycle to ensure that projects are completed on time and with quality. Provides guidance to junior developers on software development best practices.
JOB REQUIREMENTS
Minimum 2 to 3 years of work experience as a Data Engineer building data pipelines/ETL pipelines
Proficiency in Python
The candidate needs to be well-versed in using Python as a primary language for data extraction, wrangling, cleaning and analysis.
Deep knowledge of various Python libraries such as Pandas, NumPy and PySpark.
Understanding of Python’s machine learning libraries is an added advantage.
Expertise in SQL
Strong proficiency in SQL programming with the ability to write, analyze, and debug complex SQL queries.
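As a concrete (and entirely hypothetical) illustration of the analytical SQL this role calls for, the sketch below uses Python's built-in sqlite3 module to run a window-function query that ranks rows within groups; the table, column names, and data are invented for the example.

```python
import sqlite3

# Hypothetical table and data, assembled in-memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (region TEXT, symbol TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("EU", "AAA", 100.0), ("EU", "BBB", 300.0),
     ("US", "AAA", 250.0), ("US", "CCC", 50.0)],
)

# A typical analytical pattern: rank symbols by amount within each region
# using a window function, then keep only the top entry per region.
query = """
SELECT region, symbol, amount
FROM (
    SELECT region, symbol, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM trades
)
WHERE rnk = 1
ORDER BY region
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('EU', 'BBB', 300.0), ('US', 'AAA', 250.0)]
```

Being able to read, write, and debug subqueries and window functions like this transfers directly to the larger analytical queries used in production warehouses.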
Proficiency in PySpark:
Solid understanding of distributed computing principles and proficiency in PySpark for processing large-scale data workloads in a distributed environment.
Hands-on experience in the use of RDDs and DataFrames in Spark for big data processing and analytics.
Able to utilize SparkSQL for working with structured data and running SQL-like queries.
Proficient in performance tuning of PySpark applications and optimizing transformations and actions within Spark.
Demonstrated experience in handling various data formats - JSON, CSV, and Parquet in PySpark.
Knowledge in implementing stream processing applications using PySpark Streaming.
Experience with using PySpark to interact with storage systems such as AWS S3.
Experience in cloud platforms and running PySpark in a cloud environment would be beneficial.
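To make the RDD requirements above concrete without needing a live Spark cluster (which requires a JVM), here is a standard-library sketch of the functional filter/map/reduce semantics that PySpark's RDD API distributes across executors; the data and pipeline are invented for illustration.

```python
from functools import reduce

# Hypothetical event amounts; in PySpark this list would become an RDD
# via spark.sparkContext.parallelize(amounts).
amounts = [120.0, -5.0, 300.0, 42.0, -1.0]

# RDD-style pipeline: rdd.filter(...).map(...).reduce(...) expressed with
# the equivalent Python built-ins. Each step is a pure function, which is
# what lets Spark ship it to executors and evaluate it in parallel.
positive = filter(lambda x: x > 0, amounts)   # rdd.filter
doubled = map(lambda x: x * 2, positive)      # rdd.map
total = reduce(lambda a, b: a + b, doubled)   # rdd.reduce

print(total)  # 924.0
```

The same mental model carries over to DataFrames and SparkSQL, where the filter and projection steps become `where` clauses and column expressions that Spark's optimizer can reorder.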
Big Data Processing using Data Lakes:
Good understanding of the data lake architecture including data ingestion, storage, processing, and security layers.
Experience in managing and processing large volumes of structured and unstructured data using data lakes.
Experience in the implementation and management of data lakes on various platforms such as AWS S3, Azure Data Lake Storage, and Google Cloud Storage.
Knowledge of SQL and NoSQL databases and ability to leverage them for different big data processing requirements in a data lake setup.
Familiarity with data lake metadata management and data cataloging tools such as AWS Glue/Unity Catalog.
Familiarity with the Data Lakehouse concept, a blend of data lake and data warehouse, and relevant tools like the Databricks Lakehouse platform.
Experience with Data Protection/Security and Data Quality Practices
Understanding of Data encryption and decryption practices for data engineering, to protect sensitive and confidential information from unauthorized access and misuse.
Experience with data anonymization techniques like data masking, data swapping, pseudonymization, etc.
Experience in implementing access controls and entitlements management in a multi-tenant data store
Understanding of data quality tools like Great Expectations, Deequ, Soda Core, etc.
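As a small, hypothetical sketch of the anonymization techniques listed above, the example below implements pseudonymization with a keyed hash (HMAC-SHA256) and static masking of an email address using only the standard library; the key, record, and helper names are invented for illustration.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a secrets
# manager, never from source code.
SECRET_KEY = b"example-only-key"

def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same input always maps to the same token, so joins across
    tables still work, but the original value cannot be recovered
    without the key."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def mask_email(email: str) -> str:
    """Static data masking: keep the domain, hide the local part."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

record = {"user": "jane.doe", "email": "jane.doe@example.com"}
safe = {"user": pseudonymize(record["user"]), "email": mask_email(record["email"])}
```

Keyed hashing is chosen over plain hashing here because an attacker who knows the identifier space cannot rebuild the mapping without the key, which is the property that distinguishes pseudonymization from simple obfuscation.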
Experience with Jupyter Notebooks
Demonstrated experience in developing, running, and debugging code within Jupyter Notebooks
Comprehensive knowledge of Jupyter Notebook extensions and widgets to enhance their functionality.
Experience in using Jupyter Notebooks for data visualization using libraries like Matplotlib, Seaborn, Plotly, etc.
Ability to effectively use Jupyter Notebooks for exploratory data analysis
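The exploratory analysis mentioned above typically starts with a quick summary of a column, as in the hypothetical sketch below; it uses only the standard library (rather than pandas' `describe()`), and the data is invented for illustration.

```python
import statistics

# Hypothetical column of latency measurements, as might be loaded into
# a notebook cell for exploratory analysis.
latencies_ms = [12, 15, 15, 14, 120, 13, 15, 16, 14, 13]

# The kind of quick summary produced interactively in a notebook before
# deciding how to clean the data.
summary = {
    "count": len(latencies_ms),
    "mean": statistics.mean(latencies_ms),
    "median": statistics.median(latencies_ms),
    "stdev": round(statistics.stdev(latencies_ms), 2),
    "mode": statistics.mode(latencies_ms),
}
print(summary)
```

Even this minimal summary surfaces the single outlier (120 ms): the mean (24.7) sits far above the median (14.5), the kind of signal a notebook-based EDA pass would then investigate with a plot.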
Good to Have:
Relevant Data Engineering/Data Analytics Certifications
AWS Certified Data Analytics – Specialty
Databricks Certified Data Engineer
Databricks Certified Associate Developer for Apache Spark
Experience in using LLMs (large language models) for natural language processing or generative AI tasks.
Knowledge of LLM platforms and models such as Hugging Face, OpenAI, and GPT-3.
CI/CD pipelines using GitHub Actions
IAC using Terraform
Technology Learning Opportunities:
FactSet is committed to investing in the career development of all its engineers, supporting upskilling or reskilling based on individual interests and project priorities, and offers:
Licenses for learning resources like Pluralsight
Reimbursement of Technology Certification Fees (AWS, Snowflake, Databricks)
Paid Leave for Certification Exam preparation (In addition to Casual Leaves and Privilege Leaves)
Vibrant technology communities that organize internal programs, technology symposiums, and guest lectures by internal and external experts.
As an aspiring Cloud Data Engineer at FactSet, you will have the opportunity to work alongside experienced professionals and expand your skills in data engineering.
JOB RESPONSIBILITIES
Collaborate with the Data engineering and Data Analytics team to design and implement data pipelines and workflows.
Develop and maintain ETL processes to extract, transform, and load data from various sources into data platforms.
Optimize database performance, troubleshoot issues, and fine-tune queries for cost and storage efficiency.
Identify opportunities to improve data infrastructure scalability and reliability.
Stay up to date with emerging technologies and industry trends in data engineering.
Work with Analytics Dashboards (e.g., Power BI) to create insightful visualizations and reports.
DIVERSITY
At FactSet, we celebrate diversity of thought, experience, and perspective. We are committed to disrupting bias and a transparent hiring process. All qualified applicants will be considered for employment regardless of race, color, ancestry, ethnicity, religion, sex, national origin, gender expression, sexual orientation, age, citizenship, marital status, disability, gender identity, family status or veteran status. FactSet participates in E-Verify.
Returning from a break?
We are here to support you! If you have taken time out of the workforce and are looking to return, we encourage you to apply and chat with our recruiters about our available support to help you relaunch your career.