Data Pipeline Engineer (Secret Clearance Required)

Tampa, FL

Latitude

Latitude Inc is an organization providing staffing solutions and government services for companies and the public sector.


Responsibilities:

  • Data Pipeline Design and Architecture: Design and architect scalable, efficient data pipelines using Kafka and Hadoop technologies. Collaborate with data architects, software engineers, and business stakeholders to define pipeline requirements and ensure alignment with business objectives.
  • Data Ingestion and Processing: Implement data ingestion processes to collect, ingest, and process large volumes of data from diverse sources. Develop streaming and batch processing solutions using Kafka, Apache Spark, and Hadoop ecosystem tools (e.g., HDFS, MapReduce).
  • Data Transformation and Enrichment: Perform data transformation and enrichment operations to cleanse, validate, and enrich raw data before loading it into target systems or data stores. Use technologies such as Apache NiFi, Apache Flink, or Apache Storm for real-time data processing.
  • Data Storage and Management: Manage data storage solutions within the Hadoop ecosystem, including HDFS and Hive. Optimize data storage and retrieval performance, implement data partitioning and compression techniques, and ensure data integrity and security.
  • Data Governance and Compliance: Ensure compliance with data governance policies, regulatory requirements, and best practices for data privacy and security. Implement data lineage tracking, metadata management, and auditing mechanisms to maintain data quality and lineage.
  • Monitoring and Performance Tuning: Set up monitoring and alerting systems to track the health and performance of data pipelines. Conduct performance tuning and optimization to improve throughput, latency, and resource utilization.
  • Documentation and Knowledge Sharing: Document data pipeline designs, configurations, and operational procedures. Provide training and support to other team members to promote knowledge sharing and adoption of best practices.
  • Continuous Improvement: Stay informed about emerging technologies, tools, and trends in the data engineering field. Evaluate new technologies and methodologies to enhance data pipeline scalability, reliability, and efficiency.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field. An advanced degree or relevant certifications (e.g., Certified Kafka Developer, Cloudera Certified Developer for Apache Hadoop) is a plus.
  • 3+ years of experience in data engineering or related roles, with a focus on building and maintaining data pipelines using Kafka and Hadoop ecosystem technologies.
  • Strong proficiency in Apache Kafka, including Kafka Streams, Kafka Connect, and other Kafka ecosystem components.
  • Hands-on experience with Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Hive, and Spark.
  • Proficiency in programming languages commonly used in data engineering (e.g., Java, Scala, Python).
  • Experience with data serialization formats (e.g., Avro, Parquet, JSON), message queuing systems, and distributed computing concepts.
  • Knowledge of data modeling, schema design, and database concepts.
  • Strong problem-solving skills and the ability to troubleshoot complex data pipeline issues.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.


Tags: Architecture Avro Computer Science Data governance Data pipelines Data quality Engineering Flink Hadoop HDFS Java JSON Kafka NiFi Parquet Pipelines Privacy Python Scala Security Spark Streaming

Perks/benefits: Team events

Region: North America
Country: United States
Category: Engineering Jobs
